Faculty, Staff and Students Publications

Improving Automated Deep Phenotyping Through Large Language Models Using Retrieval-Augmented Generation

Language

English

Publication Date

8-18-2025

Journal

Genome Medicine

DOI

10.1186/s13073-025-01521-w

PMID

40826123

PMCID

PMC12359922

PubMedCentral® Posted Date

8-18-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Background: Diagnosing rare genetic disorders relies on precise phenotypic and genotypic analysis, with the Human Phenotype Ontology (HPO) providing a standardized language for capturing clinical phenotypes. Rule-based HPO extraction tools use concept recognition to identify phenotypes automatically, but they often assign phenotypes incompletely and require significant manual review. While large language models (LLMs) hold promise for more context-driven phenotype extraction, they are prone to errors and "hallucinations," making them less reliable without further refinement. We present RAG-HPO, a Python-based tool that leverages retrieval-augmented generation (RAG) to improve the accuracy of HPO term assignment by LLMs. This approach bypasses the limitations of baseline models and eliminates the need for time- and resource-intensive fine-tuning. RAG-HPO integrates a dynamic vector database containing more than 54,000 phenotypic phrases mapped to HPO IDs, which allows real-time retrieval and contextual matching. The RAG-HPO workflow begins by extracting phenotypic phrases from clinical text via an LLM and then matching them by semantic similarity to entries in the database. The best term matches are returned to the LLM as context for the final HPO term assignment of each phrase.
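The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RAG-HPO implementation: the four-entry `PHRASE_DB`, the bag-of-words `embed` function (a stand-in for a real embedding model), and the helper names `retrieve` and `build_context` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical miniature phrase-to-HPO table for illustration only;
# the database described in the abstract holds more than 54,000 phrases.
PHRASE_DB = [
    ("seizure", "HP:0001250"),
    ("small head circumference", "HP:0000252"),
    ("low muscle tone", "HP:0001252"),
    ("delayed motor development", "HP:0001270"),
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(phrase, k=2):
    """Return the k database entries most similar to the extracted phrase."""
    query = embed(phrase)
    scored = [(cosine(query, embed(p)), p, hpo) for p, hpo in PHRASE_DB]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(p, hpo, round(score, 3)) for score, p, hpo in scored[:k]]


def build_context(phrase, k=2):
    """Format the retrieved candidates as context for a final LLM prompt."""
    lines = [f"- {p} -> {hpo}" for p, hpo, _ in retrieve(phrase, k)]
    return f"Phrase: {phrase}\nCandidate HPO terms:\n" + "\n".join(lines)


if __name__ == "__main__":
    print(build_context("very low muscle tone"))
```

In a real pipeline the phrase would come from an LLM extraction pass and the similarity search would run against a vector store; the sketch only shows how retrieved candidates constrain the final assignment to terms that exist in the ontology.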

Results: Performance was benchmarked on 112 published case reports with 1792 manually assigned HPO terms and compared to Doc2HPO, ClinPhen, and FastHPOCR. In evaluations, RAG-HPO + LLaMa-3.1 70B achieved a mean precision of 0.81, recall of 0.76, and an F1 score of 0.78, significantly surpassing conventional tools (p < 0.00001). RAG-HPO returned 1648 terms, of which 19.1% (315) were false positives that did not exactly match our manually annotated standard. Among these, < 1% (1/315) represented hallucinations, and 1.3% (4/315) represented terms with no ontological relationship to the desired target; the remaining false positives (95.2%, 300/315) were broader ancestor terms of the target term, which may still be relevant to users in many contexts.
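For readers less familiar with these metrics, the sketch below shows how exact-match precision, recall, and F1 could be computed for a single case report. It assumes terms are compared as sets of HPO IDs; the function name `precision_recall_f1` and the example IDs are hypothetical, and the abstract does not state how per-case scores were aggregated beyond reporting means.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match scores for one case, treating HPO IDs as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations for one case report.
    gold = ["HP:0001250", "HP:0000252", "HP:0001252", "HP:0001270"]
    predicted = ["HP:0001250", "HP:0000252", "HP:0001252", "HP:0001263"]
    print(precision_recall_f1(predicted, gold))

    # Share of returned terms counted as false positives in the abstract.
    print(round(315 / 1648, 3))
```

As a consistency check, the harmonic mean of the reported mean precision (0.81) and mean recall (0.76) rounds to 0.78, matching the reported F1, although a mean of per-case F1 scores need not equal this value in general.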

Conclusions: RAG-HPO is a user-friendly, adaptable tool designed for secure evaluation of clinical text and outperforms standard HPO-matching tools in precision, recall, and F1. Its enhanced precision and recall represent a substantial advancement in phenotypic analysis, accelerating the identification of genetic mechanisms underlying rare diseases and driving progress in genetic research and clinical genomics. RAG-HPO is available at https://github.com/PoseyPod/RAG-HPO .

Keywords

Humans, Phenotype, Biological Ontologies, Language, Software, Large language models (LLMs), Retrieval-augmented generation (RAG), Phenotyping, Human Phenotype Ontology (HPO), Natural language processing (NLP), Clinical genomics, Generative pre-trained transformer (GPT), Generative AI, LLaMa-3

Published Open-Access

yes

Recommended Citation

Garcia, Brandon T; Westerfield, Lauren; Yelemali, Priya; et al., "Improving Automated Deep Phenotyping Through Large Language Models Using Retrieval-Augmented Generation" (2025). Faculty, Staff and Students Publications. 5068.
https://digitalcommons.library.tmc.edu/baylor_docs/5068

Included in

Genetic Phenomena Commons, Genetic Processes Commons, Genetic Structures Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons
