Language
English
Publication Date
8-18-2025
Journal
Genome Medicine
DOI
10.1186/s13073-025-01521-w
PMID
40826123
PMCID
PMC12359922
PubMedCentral® Posted Date
8-18-2025
PubMedCentral® Full Text Version
Post-print
Abstract
Background: Diagnosing rare genetic disorders relies on precise phenotypic and genotypic analysis, with the Human Phenotype Ontology (HPO) providing a standardized language for capturing clinical phenotypes. Rule-based HPO extraction tools use concept recognition to identify phenotypes automatically, but they often assign phenotypes incompletely and therefore require significant manual review. Large language models (LLMs) hold promise for more context-driven phenotype extraction, but they are prone to errors and "hallucinations," which makes them less reliable without further refinement. We present RAG-HPO, a Python-based tool that uses retrieval-augmented generation (RAG) to improve the accuracy of LLM-based HPO term assignment. This approach bypasses the limitations of baseline models and eliminates the need for time- and resource-intensive fine-tuning. RAG-HPO integrates a dynamic vector database containing >54,000 phenotypic phrases mapped to HPO IDs, which allows real-time retrieval and contextual matching. The RAG-HPO workflow begins by extracting phenotypic phrases from clinical text via an LLM and then matching them by semantic similarity to entries in the database. The best-matching terms are returned to the LLM as context for the final HPO term assignment of each phrase.
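The retrieval step described above can be illustrated with a minimal sketch. This is not the RAG-HPO implementation: the five-entry phrase list, the function names (`embed`, `cosine`, `retrieve`, `build_context`), and the bag-of-words similarity are all assumptions made for illustration, standing in for the tool's vector database and embedding model, which the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for the phrase database: (phrase, HPO ID) pairs.
# The real database is described as holding >54,000 phrases.
PHRASE_DB = [
    ("seizure", "HP:0001250"),
    ("epileptic seizure", "HP:0001250"),
    ("microcephaly", "HP:0000252"),
    ("small head circumference", "HP:0000252"),
    ("global developmental delay", "HP:0001263"),
]


def embed(text):
    """Bag-of-words counts; an assumption standing in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(phrase, k=3):
    """Return the k most similar database entries as (score, phrase, hpo_id)."""
    query = embed(phrase)
    scored = [(cosine(query, embed(p)), p, hpo) for p, hpo in PHRASE_DB]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_context(phrase, k=3):
    """Format the retrieved candidates as context text for an LLM prompt."""
    lines = [
        f"{p} -> {hpo} (similarity {score:.2f})"
        for score, p, hpo in retrieve(phrase, k)
    ]
    return f"Phrase: {phrase}\nCandidates:\n" + "\n".join(lines)
```

Calling `build_context("small head circumference")` yields a short prompt fragment whose top candidate is HP:0000252; in a full pipeline that text would be passed to the LLM, which makes the final term assignment from the retrieved candidates.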
Results: Performance was benchmarked on 112 published case reports with 1792 manually assigned HPO terms and compared to Doc2HPO, ClinPhen, and FastHPOCR. In evaluations, RAG-HPO + LLaMa-3.1 70B achieved a mean precision of 0.81, recall of 0.76, and an F1 score of 0.78, significantly surpassing conventional tools (p < 0.00001). RAG-HPO returned 1648 terms, of which 19.1% (315) were false positives that did not exactly match our manually annotated standard. Among these, <1% (1/315) represented hallucinations, and 1.3% (4/315) represented terms with no ontological relationship to the desired target; the remaining false positives (95.2%, 300/315) were broader ancestor terms of the target term, which may still be relevant to users in many contexts.
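As a worked illustration of the metrics reported above, the following sketch computes precision, recall, and F1 for one case report and then averages across reports. It assumes exact-match scoring on HPO IDs and a per-report (macro) average; the abstract refers to exact matches and mean scores but does not give the evaluation code, and the function names `prf` and `mean_prf` are hypothetical.

```python
def prf(predicted, gold):
    """Precision, recall, and F1 for one report, using exact HPO ID matches."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_prf(cases):
    """Average each metric over (predicted, gold) pairs, one pair per report."""
    scores = [prf(predicted, gold) for predicted, gold in cases]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

Under exact-match scoring, a returned ancestor term counts as a false positive even when it is clinically related to the target, which is the situation the abstract describes for most of the 315 false positives.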
Conclusions: RAG-HPO is a user-friendly, adaptable tool designed for secure evaluation of clinical text, and it outperforms standard HPO-matching tools in precision, recall, and F1. Its enhanced precision and recall represent a substantial advancement in phenotypic analysis, accelerating the identification of genetic mechanisms underlying rare diseases and driving progress in genetic research and clinical genomics. RAG-HPO is available at https://github.com/PoseyPod/RAG-HPO.
Keywords
Humans, Phenotype, Biological Ontologies, Language, Software, Large Language Models, Large language models (LLMs), Retrieval-augmented generation (RAG), Phenotyping, Human Phenotype Ontology (HPO), Natural language processing (NLP), Clinical genomics, Generative pre-trained transformer (GPT), Generative AI, LLaMa-3
Published Open-Access
yes
Recommended Citation
Garcia, Brandon T; Westerfield, Lauren; Yelemali, Priya; et al., "Improving Automated Deep Phenotyping Through Large Language Models Using Retrieval-Augmented Generation" (2025). Faculty and Staff Publications. 5068.
https://digitalcommons.library.tmc.edu/baylor_docs/5068
Included in
Genetic Phenomena Commons, Genetic Processes Commons, Genetic Structures Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons