Graduation Date

Summer 8-2021

Degree Name

Doctor of Philosophy (PhD)

School Name

The University of Texas School of Biomedical Informatics at Houston

Advisory Committee

Hua Xu, PhD


To enhance data interoperability, a fast and accurate standardization solution is highly desirable for naming rapidly emerging novel lab tests, thereby reducing confusion in early responses to pandemic outbreaks. This preliminary study explores the roles and implementation of medical informatics technology, especially natural language processing and ontology methods, in standardizing information about emerging lab tests during a pandemic, thereby facilitating rapid responses to the pandemic. The ultimate goal of this study is to develop an informatics framework for rapid standardization of lab test names during a pandemic to better prepare for future public health threats. We first constructed an information model for lab tests approved during the COVID-19 pandemic and built a named entity recognition tool that automatically extracts the lab test information specified in the information model from the Emergency Use Authorization (EUA) documents of the U.S. Food and Drug Administration (FDA), thus creating a catalog of approved lab tests with detailed information. To facilitate the standardization of lab testing data in electronic health records, we further developed COVID-19 TestNorm, a tool that normalizes the names of COVID-19 lab tests used by different healthcare facilities to standard Logical Observation Identifiers Names and Codes (LOINC) codes. The overall accuracy of COVID-19 TestNorm was 98.9% on the development set and 97.4% on the independent test set. Lastly, we conducted a clinical study on COVID-19 re-positivity to demonstrate the utility of standardized lab test information in supporting clinical research. We believe that the results of this study indicate the great potential of medical informatics technologies for facilitating rapid responses to both current and future pandemics.


Work from this dissertation has been published in two journals:

1. Dong X, Li J, Soysal E, Bian J, DuVall SL, Hanchrow E, Liu H, Lynch KE, Matheny M, Natarajan K, Ohno-Machado L, Pakhomov S, Reeves RM, Sitapati AM, Abhyankar S, Cullen T, Deckard J, Jiang X, Murphy R, Xu H. COVID-19 TestNorm: A tool to normalize COVID-19 testing names to LOINC codes. J Am Med Inform Assoc. 2020 Jul 1;27(9):1437-1442. doi: 10.1093/jamia/ocaa145. PMID: 32569358; PMCID: PMC7337837.

2. Dong X, Zhou Y, Shu XO, Bernstam EV, Stern R, Aronoff DM, Xu H, Lipworth L. Comprehensive Characterization of COVID-19 Patients with Repeatedly Positive SARS-CoV-2 Tests Using a Large U.S. Electronic Health Record Database. Microbiol Spectr. 2021 Sep 3;9(1):e0032721. doi: 10.1128/Spectrum.00327-21. Epub 2021 Aug 18. PMID: 34406805; PMCID: PMC8552669.


COVID-19, standardization, diagnostic test results, natural language processing, ontology, Electronic Health Records, LOINC, SNOMED CT, Bi-LSTM-CRF, BERT