Student and Faculty Publications
Publication Date
12-1-2022
Journal
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Abstract
The Orphanet Rare Disease Ontology (ORDO) provides a structured vocabulary for rare diseases. Downstream applications of ORDO depend on its accuracy to perform their tasks effectively. In this paper, we implement an automated quality assurance pipeline to identify missing is-a relations in ORDO. We first obtain lexical features from concept names. We then generate related and unrelated feature-sharing concept pairs, each of which can in turn generate derived term pairs. If an unrelated and a related feature-sharing concept pair generate the same derived term pair, we suggest a potential missing is-a relation between the concepts of the unrelated pair. Applying this approach to the 2022-06-27 release of ORDO, we obtained 705 potential missing is-a relations. Leveraging external ontological information in the Unified Medical Language System, we validated 164 of these missing is-a relations. This indicates that our approach is a promising way to audit is-a relations in ORDO, although domain expert evaluation is still needed to validate the remaining potential missing is-a relations.
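The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define how lexical features or derived term pairs are computed, so the definitions below are assumptions made only for illustration: features are taken to be the lowercase words of a concept name, two concepts share features when their word sets overlap, and the derived term pair is what remains of each name after the shared words are removed. The function names and the concept names in the example are hypothetical toy values, not ORDO content or the authors' implementation.

```python
from itertools import permutations


def features(name):
    # Assumption: lexical features are the lowercase words of the concept name.
    return frozenset(name.lower().split())


def derived_term_pair(first, second):
    # Assumption: a derived term pair is what is left of each name once the
    # shared words are removed. Returns None if the names share no words or
    # have identical word sets.
    f1, f2 = features(first), features(second)
    shared = f1 & f2
    if not shared or f1 == f2:
        return None
    return (f1 - shared, f2 - shared)


def suggest_missing_isa(names, is_a_pairs):
    # is_a_pairs: list of (child, parent) tuples already in the ontology.
    related = set(is_a_pairs)

    # Index the derived term pairs produced by related (child, parent) pairs.
    evidence = {}
    for child, parent in is_a_pairs:
        derived = derived_term_pair(child, parent)
        if derived is not None:
            evidence.setdefault(derived, (child, parent))

    # An unrelated pair that yields the same derived term pair as a related
    # pair is reported as a potential missing is-a relation.
    suggestions = []
    for a, b in permutations(names, 2):
        if (a, b) in related or (b, a) in related:
            continue
        derived = derived_term_pair(a, b)
        if derived in evidence:
            suggestions.append((a, b, evidence[derived]))
    return suggestions


# Toy example with hypothetical concept names.
names = [
    "rare genetic eye disease",
    "rare eye disease",
    "rare genetic skin disease",
    "rare skin disease",
]
is_a_pairs = [("rare genetic eye disease", "rare eye disease")]
suggestions = suggest_missing_isa(names, is_a_pairs)
for child, parent, support in suggestions:
    print(f"{child} is-a {parent}? (same derived term pair as {support})")
```

In this toy input, the only unrelated pair that reproduces the derived term pair of the existing relation is the "skin" pair, so it is the single suggestion. A real audit would also need the transitive closure of is-a when deciding which pairs count as related, and each suggestion would still require validation, as the abstract notes.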
Keywords
Rare diseases, Orphanet, Orphanet Rare Disease Ontology, ontology quality assurance
Comments
PMID: 36776767