Faculty, Staff and Student Publications

Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs

Language

English

Publication Date

1-1-2024

Journal

AMIA Annual Symposium

PMID

41726504

PMCID

PMC12919620

PubMedCentral® Posted Date

2-14-2026

PubMedCentral® Full Text Version

Post-print

Abstract

Ensuring the completeness of IS-A relations in SNOMED CT is crucial for maintaining its accuracy in clinical applications. In this study, we propose a hybrid approach leveraging non-lattice subgraphs and pre-trained language models (PLMs) to identify missing IS-A relations in SNOMED CT. We fine-tuned four BERT-based models: BERT, DistillBERT, DeBERTa, and BioClinicalBERT, and four generative large language models (LLMs): BioMistral, Llama3, Gemma2, and Phi-4. Missing IS-A relations were identified through consensus predictions by all eight models. De-BERTa achieved the best performance (precision: 0.96, recall: 0.97, F1-score: 0.965) for IS-A relation prediction. Our approach identified 678 potential missing IS-A relations in SNOMED CT (March 2023 US Edition), of which 100 randomly selected cases were manually reviewed by a domain expert, confirming 93 as valid (93% precision). These results demonstrate the effectiveness of fine-tuned PLMs in detecting missing IS-A relations within non-lattice subgraphs, offering a promising avenue for improving SNOMED CT's quality.

Keywords

Systematized Nomenclature of Medicine, Natural Language Processing, Humans

Published Open-Access

yes

Recommended Citation

Hao, Xubing; Abeysinghe, Rashmie; Shi, Jay; et al., "Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs" (2024). Faculty, Staff and Student Publications. 3745.
https://digitalcommons.library.tmc.edu/uthmed_docs/3745

Download

Included in

Medical Sciences Commons

COinS

Faculty, Staff and Student Publications

Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs

Language

Publication Date

Journal

PMID

PMCID

PubMedCentral® Posted Date

PubMedCentral® Full Text Version

Abstract

Keywords

Published Open-Access

Recommended Citation

Included in

Search

Browse

Author Corner

More Info

Library

Faculty, Staff and Student Publications

Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs

Authors

Language

Publication Date

Journal

PMID

PMCID

PubMedCentral® Posted Date

PubMedCentral® Full Text Version

Abstract

Keywords

Published Open-Access

Recommended Citation

Included in

Share

Search

Browse

Author Corner

More Info

Library