Dissertations & Theses (Open Access)

Graduation Date

Spring 2025

Degree Name

Doctor of Philosophy (PhD)

School Name

McWilliams School of Biomedical Informatics at UTHealth Houston

Advisory Committee

Licong Cui, PhD

Abstract

Biomedical ontologies or terminologies not only serve as part of the metadata standards for describing data under the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), but also play a vital role in downstream applications such as cohort identification from electronic health records (EHRs). However, two critical barriers can introduce ambiguity, complexity, and inaccuracy into such ontology-based applications. The first is ontology quality: despite curators' efforts to ensure accuracy and comprehensiveness, errors and inconsistencies are unavoidable. The second is semantic heterogeneity: human experts may use different natural language terms to define the same ontological entities. Effective methods are therefore needed both to continually improve the quality of biomedical terminologies and to establish meaningful connections between heterogeneous biomedical concepts.

This dissertation introduces a substring replacement approach for identifying missing IS-A relations in biomedical terminologies, an order-preserving intersection method leveraging non-lattice subgraphs to detect missing concepts, and a Graph Convolutional Network (GCN) and Pre-trained Language Model (PLM)-based approach to identify synonymous concept pairs across different biomedical terminologies. Additionally, this work leverages large-scale EHR data to assess the impact of terminology quality on cohort identification applications.

The research presented in this dissertation addresses critical challenges in biomedical ontology quality and interoperability. In doing so, it enhances downstream ontology-driven biomedical applications, facilitates the clear exchange of health information, and ultimately supports more accurate and reliable clinical and research outcomes.

Keywords

Biomedical ontologies, SNOMED CT, UMLS Metathesaurus, Unified Medical Language System (UMLS), IS-A relations, Graph Convolutional Network (GCN), Pre-trained Language Models (PLMs), Electronic Health Record
