Faculty, Staff and Student Publications
Language
English
Publication Date
1-1-2024
Journal
AMIA Annual Symposium
PMID
41726504
PMCID
PMC12919620
PubMedCentral® Posted Date
2-14-2026
PubMedCentral® Full Text Version
Post-print
Abstract
Ensuring the completeness of IS-A relations in SNOMED CT is crucial for maintaining its accuracy in clinical applications. In this study, we propose a hybrid approach leveraging non-lattice subgraphs and pre-trained language models (PLMs) to identify missing IS-A relations in SNOMED CT. We fine-tuned four BERT-based models (BERT, DistilBERT, DeBERTa, and BioClinicalBERT) and four generative large language models, or LLMs (BioMistral, Llama3, Gemma2, and Phi-4). Missing IS-A relations were identified through consensus predictions by all eight models. DeBERTa achieved the best performance for IS-A relation prediction (precision: 0.96, recall: 0.97, F1-score: 0.965). Our approach identified 678 potential missing IS-A relations in SNOMED CT (March 2023 US Edition); a domain expert manually reviewed 100 randomly selected cases and confirmed 93 as valid (93% precision). These results demonstrate the effectiveness of fine-tuned PLMs in detecting missing IS-A relations within non-lattice subgraphs, offering a promising avenue for improving SNOMED CT's quality.
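The consensus step described in the abstract can be illustrated with a minimal Python sketch. It assumes that "consensus" means unanimous agreement, so a candidate (child, parent) concept pair is kept only when all eight models predict an IS-A relation. The function names `consensus_missing_isa` and `f1`, the placeholder concept identifiers, and the vote data are hypothetical and are not the authors' implementation. The sketch also checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
from typing import Dict, List, Tuple

# Model names as listed in the abstract.
MODELS = ["BERT", "DistilBERT", "DeBERTa", "BioClinicalBERT",
          "BioMistral", "Llama3", "Gemma2", "Phi-4"]

Pair = Tuple[str, str]  # hypothetical (child concept, candidate parent concept)


def consensus_missing_isa(predictions: Dict[Pair, Dict[str, bool]]) -> List[Pair]:
    """Keep a candidate pair only if every model predicts IS-A (assumed unanimous vote).

    A model with no recorded vote for a pair counts as a negative prediction.
    """
    return [pair for pair, votes in predictions.items()
            if all(votes.get(model, False) for model in MODELS)]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented placeholder votes: the first pair is unanimous, the second is not.
    example = {
        ("concept_A", "concept_B"): {m: True for m in MODELS},
        ("concept_C", "concept_D"): {m: (m != "Phi-4") for m in MODELS},
    }
    print(consensus_missing_isa(example))  # [('concept_A', 'concept_B')]
    print(round(f1(0.96, 0.97), 3))        # 0.965 (DeBERTa's reported scores)
```

Requiring all eight models to agree trades recall for precision, which fits the goal of producing a short list of high-confidence candidates for expert review.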
Keywords
Systematized Nomenclature of Medicine, Natural Language Processing, Humans
Published Open-Access
yes
Recommended Citation
Hao, Xubing; Abeysinghe, Rashmie; Shi, Jay; et al., "Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs" (2024). Faculty, Staff and Student Publications. 3745.
https://digitalcommons.library.tmc.edu/uthmed_docs/3745