Student and Faculty Publications

Publication Date

12-1-2022

Journal

2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

The Orphanet Rare Disease Ontology (ORDO) provides a structured vocabulary encapsulating rare diseases. Downstream applications of ORDO depend on its accuracy to perform their tasks effectively. In this paper, we implement an automated quality assurance pipeline to identify missing is-a relations in ORDO. We first obtain lexical features from concept names. We then generate related and unrelated feature-sharing concept-pairs, each of which can in turn generate derived term-pairs. If an unrelated and a related feature-sharing concept-pair generate the same derived term-pair, we suggest a potential missing is-a relation between the concepts of the unrelated pair. Applying this approach to the 2022-06-27 release of ORDO, we obtained 705 potential missing is-a relations. Leveraging external ontological information in the Unified Medical Language System, we validated 164 of them. This indicates that our approach is a promising way to audit is-a relations in ORDO, although domain expert evaluation is still needed to validate the remaining potential missing is-a relations.
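
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how lexical features or derived term-pairs are computed, so the sketch assumes that features are the lowercased words of a concept name and that a derived term-pair is what remains of each name after the shared words are removed. The function names, the concept identifiers, and the toy concept names are all hypothetical and are not taken from ORDO.

```python
from itertools import permutations


def lexical_features(name):
    """Assumption: lexical features are the lowercased words of a concept name."""
    return frozenset(name.lower().split())


def derived_term_pair(child_name, parent_name):
    """Assumption: a derived term-pair is what is left of each name once the
    shared features are removed. Returns None if the two names share no
    feature or have identical feature sets."""
    child = lexical_features(child_name)
    parent = lexical_features(parent_name)
    shared = child & parent
    if not shared or child == parent:
        return None
    return (child - shared, parent - shared)


def suggest_missing_isa(names, isa_pairs):
    """names: dict mapping concept id -> concept name.
    isa_pairs: set of (child_id, parent_id) pairs; the caller is assumed to
    supply every existing is-a relation, including inherited ones.
    Returns a sorted list of (child_id, parent_id) suggestions."""
    # Derived term-pairs produced by related (existing is-a) concept-pairs.
    related_derived = set()
    for child, parent in isa_pairs:
        pair = derived_term_pair(names[child], names[parent])
        if pair is not None:
            related_derived.add(pair)

    # An unrelated feature-sharing pair that yields the same derived
    # term-pair as some related pair becomes a suggestion.
    suggestions = []
    for a, b in permutations(names, 2):
        if (a, b) in isa_pairs or (b, a) in isa_pairs:
            continue  # already related in one direction
        pair = derived_term_pair(names[a], names[b])
        if pair is not None and pair in related_derived:
            suggestions.append((a, b))
    return sorted(suggestions)


# Toy data invented for illustration only.
names = {
    "C1": "rare epilepsy",
    "C2": "rare genetic epilepsy",
    "C3": "rare skin disease",
    "C4": "rare genetic skin disease",
}
isa_pairs = {("C2", "C1")}  # "rare genetic epilepsy" is-a "rare epilepsy"

print(suggest_missing_isa(names, isa_pairs))  # → [('C4', 'C3')]
```

In the toy data, the related pair (C2, C1) and the unrelated pair (C4, C3) both reduce to the same derived term-pair ("genetic" versus nothing), so the sketch suggests that C4 may be missing an is-a relation to C3. As in the abstract, such a suggestion would still need validation, for example against an external source or by a domain expert.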

Keywords

Rare diseases, Orphanet, Orphanet rare disease ontology, ontology quality assurance

Comments

PMID: 36776767
