Duncan NRI Faculty and Staff Publications
Language
English
Publication Date
1-1-2025
Journal
Bioinformatics Advances
DOI
10.1093/bioadv/vbaf148
PMID
40666130
PMCID
PMC12263109
PubMedCentral® Posted Date
6-24-2025
PubMedCentral® Full Text Version
Post-print
Abstract
Motivation: Rare diseases remain difficult to diagnose because of limited patient data and genetic diversity, and many cases go undiagnosed despite advances in variant prioritization tools. While large language models have shown promise in medical applications, how best to apply them for trustworthy and accurate gene prioritization downstream of modern prioritization tools has not been systematically evaluated.
Results: We benchmarked various language models for gene prioritization, using multi-agent and Human Phenotype Ontology classification approaches to categorize patient cases by phenotype-based solvability levels. To address language model limitations in ranking large gene sets, we implemented a divide-and-conquer strategy with mini-batching and token limiting for improved efficiency. GPT-4 outperformed the other language models across all patient datasets, ranking causal genes more accurately. The multi-agent and Human Phenotype Ontology classification approaches effectively distinguished confidently solved cases from challenging ones. However, we observed two notable language model limitations: bias toward well-studied genes and sensitivity to input order. Our divide-and-conquer strategy improved accuracy, overcoming both positional bias and the bias toward genes that appear frequently in the literature. Compared with baseline evaluation, this framework optimized the overall process for identifying disease-causal genes, better enabling targeted diagnostic and therapeutic interventions and streamlining the diagnosis of rare genetic disorders.
Availability and implementation: Software and additional material are available at: https://github.com/LiuzLab/GPT-Diagnosis
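Illustrative sketch (not taken from the paper or its repository): the divide-and-conquer idea with mini-batching described in the Results can be pictured as a tournament. A large candidate list is split into small batches, each batch is ranked on its own, the top few genes from each batch advance, and the process repeats until the survivors fit in a single batch. Everything named here is an assumption for illustration: the function names, the batch_size and keep parameters, and the toy score function that stands in for a language-model ranking call. Token limiting is not modeled.

```python
from typing import Callable, List, Sequence


def rank_batch(genes: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Hypothetical stand-in for one language-model call that ranks a small gene list.

    Here the ranking is simply a sort by a toy score, highest first.
    """
    return sorted(genes, key=score, reverse=True)


def divide_and_conquer_rank(
    genes: Sequence[str],
    score: Callable[[str], float],
    batch_size: int = 5,
    keep: int = 2,
) -> List[str]:
    """Return a ranked shortlist of at most `batch_size` genes.

    Each round ranks mini-batches independently and keeps the top `keep`
    genes of every batch, so no single ranking call sees more than
    `batch_size` candidates.
    """
    if not 0 < keep < batch_size:
        raise ValueError("keep must satisfy 0 < keep < batch_size")

    candidates = list(genes)
    while len(candidates) > batch_size:
        survivors: List[str] = []
        for start in range(0, len(candidates), batch_size):
            batch = candidates[start:start + batch_size]
            survivors.extend(rank_batch(batch, score)[:keep])
        candidates = survivors  # strictly fewer genes than the previous round

    # Final round: the remaining candidates fit in one batch.
    return rank_batch(candidates, score)


if __name__ == "__main__":
    # Toy data: twelve placeholder gene labels with made-up scores.
    toy_scores = {f"G{i}": float(i) for i in range(12)}
    shortlist = divide_and_conquer_rank(list(toy_scores), toy_scores.__getitem__)
    print(shortlist)
```

In a real setting, rank_batch would wrap a model call, and because the abstract reports sensitivity to input order, the order of genes within and across batches would itself be a design choice to evaluate.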
Published Open-Access
yes
Recommended Citation
Neeley, Matthew B; Qi, Guantong; Wang, Guanchu; et al., "Survey and Improvement Strategies for Gene Prioritization With Large Language Models" (2025). Duncan NRI Faculty and Staff Publications. 162.
https://digitalcommons.library.tmc.edu/duncar_nri_pub/162
Included in
Genetic Phenomena Commons, Medical Genetics Commons, Neurology Commons, Neurosciences Commons