Student and Faculty Publications
Publication Date
1-1-2022
Journal
AMIA Summits on Translational Science Proceedings
Abstract
Auditing the Human Phenotype Ontology (HPO) is necessary to provide accurate terminology for its use in clinical research. We investigate an approach that leverages the lexical features of HPO concepts to identify missing IS-A relations among them. We first model the name of each HPO concept as a set of lowercase words. Then, we generate two types of concept-pairs whose names share at least one word: (1) linked concept-pairs, in which the two concepts have an IS-A relation; (2) unlinked concept-pairs, in which the two concepts have no IS-A relation. Each concept-pair generates a Derived Term Pair (DTP) that emphasizes the unique lexical information of each concept. If a linked concept-pair and an unlinked concept-pair generate the same DTP, then we suggest a potential missing IS-A relation between the concepts of the unlinked concept-pair. Applying our approach to the 2022-02-14 release of HPO, we uncovered 2,516 potential missing IS-A relations. We validated 59 of these missing IS-A relations using the Unified Medical Language System (UMLS), by mapping each concept-pair to UMLS concepts and verifying whether UMLS records an IS-A relation between the pair of concepts.
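The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define how a DTP is built, so the sketch assumes a DTP is the pair of word sets left after removing the words the two names share. It also treats only the pairs listed in `is_a` as linked, whereas a fuller version would presumably also consider inferred ancestors. The function names, the identifiers `T1` to `T4`, and the concept labels are hypothetical toy data, not taken from the HPO release.

```python
from itertools import permutations


def words(name):
    """Model a concept name as a set of lowercase words."""
    return frozenset(name.lower().split())


def derived_term_pair(child_words, parent_words):
    # ASSUMPTION: the DTP keeps only the words unique to each side,
    # i.e. the shared words are removed from both names.
    return (child_words - parent_words, parent_words - child_words)


def suggest_missing_is_a(names, is_a):
    """Suggest (child, parent) pairs whose DTP matches a linked pair's DTP.

    names: dict mapping concept id -> concept name
    is_a:  set of (child_id, parent_id) pairs treated as linked
    """
    word_sets = {cid: words(name) for cid, name in names.items()}

    # DTPs generated by linked concept-pairs that share at least one word.
    linked_dtps = set()
    for child, parent in is_a:
        if word_sets[child] & word_sets[parent]:
            linked_dtps.add(derived_term_pair(word_sets[child], word_sets[parent]))

    # Ordered unlinked pairs that share a word and reproduce a linked DTP.
    suggestions = []
    for a, b in permutations(names, 2):
        if (a, b) in is_a or (b, a) in is_a:
            continue  # already linked
        if not (word_sets[a] & word_sets[b]):
            continue  # no common word
        if derived_term_pair(word_sets[a], word_sets[b]) in linked_dtps:
            suggestions.append((a, b))
    return sorted(suggestions)


# Hypothetical toy data for illustration only.
toy_names = {
    "T1": "Bilateral renal cyst",
    "T2": "Renal cyst",
    "T3": "Bilateral hearing loss",
    "T4": "Hearing loss",
}
toy_is_a = {("T1", "T2")}

toy_suggestions = suggest_missing_is_a(toy_names, toy_is_a)
```

In this toy example the linked pair (`T1`, `T2`) yields the DTP ({bilateral}, {}), and the unlinked pair (`T3`, `T4`) yields the same DTP, so `T3` IS-A `T4` is proposed as a candidate. Each candidate would still need validation, for instance against UMLS as the abstract describes.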
Keywords
Humans, Unified Medical Language System, Phenotype
Comments
PMID: 37128366