Addressing the Analytical and Computational Challenges Using Machine Learning in Biomedical Research
Author ORCID Identifier
0000-0002-1870-0019
Date of Graduation
12-2023
Document Type
Dissertation (PhD)
Program Affiliation
Quantitative Sciences
Degree Name
Doctor of Philosophy (PhD)
Advisor/Committee Chair
Xuelin Huang
Committee Member
Ziyi Li
Committee Member
Bing Z Carter
Committee Member
Wei Peng
Committee Member
Ruitao Lin
Abstract
Healthcare professionals face an ever-growing volume of clinical data stored in electronic health records, alongside genomic data generated by laboratory experiments. In response, machine learning (ML) techniques are gaining popularity because they have proven effective at processing big data and deciphering the complex nonlinear patterns intrinsic to biomedical research.
My research leverages ML to address computational challenges in diverse areas, including adaptive clinical trial design, survival analysis, and high-dimensional genetic data analysis. Specifically, Chapter 2 focuses on the application of ML in response-adaptive randomization designs. Compared with a traditional equal-randomization design, adaptive randomized trials allocate more patients to the superior treatment arm and increase the overall response rate. In Chapter 3, we propose a statistical model to impute survival times for censored observations, allowing any ML method to be applied directly in the downstream analysis. We further improve the accuracy of our method using ML regression built on prognostic covariates. In Chapter 4, we develop an artificial neural network-based framework for analyzing longitudinal single-cell RNA sequencing data. Our pipeline achieves (1) cell annotation across time points, (2) detection of novel cell types that emerge over time, (3) visualization of cell population evolution, and (4) identification of temporally differentially expressed genes. In each chapter, we provide both simulation studies and real-world application results for our methods. In the long run, we aim to narrow the gap between ML and biomedical research and to facilitate healthcare studies.
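To make the contrast between equal and response-adaptive randomization concrete, the following is a minimal sketch using Thompson sampling with Beta-Bernoulli posteriors. This is a generic textbook technique chosen for illustration, not the design developed in Chapter 2; the function name `simulate_trial`, the uniform Beta(1, 1) priors, and the response rates in the usage note are all assumptions.

```python
import random


def simulate_trial(true_rates, n_patients, adaptive=True, seed=0):
    """Simulate a multi-arm trial with binary responses (illustrative only).

    adaptive=True  -> Thompson sampling: each patient is assigned to the arm
                      whose posterior draw of the response rate is highest.
    adaptive=False -> equal randomization across arms.
    Returns (patients allocated per arm, total number of responders).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_patients):
        if adaptive:
            # Draw one plausible response rate per arm from its
            # Beta(1 + successes, 1 + failures) posterior (assumed prior: Beta(1, 1)).
            draws = [
                rng.betavariate(1 + successes[k], 1 + failures[k])
                for k in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda k: draws[k])
        else:
            arm = rng.randrange(n_arms)

        # Simulate the patient's binary response under the (hypothetical) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    allocations = [successes[k] + failures[k] for k in range(n_arms)]
    return allocations, sum(successes)
```

Calling `simulate_trial([0.2, 0.6], 200, adaptive=True)` and then the same call with `adaptive=False` lets one compare how many patients each scheme assigns to the better arm under these hypothetical response rates. A real design would also need to address type I error control, early-phase burn-in, and delayed responses, none of which this sketch attempts.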
Keywords
Biostatistics, machine learning, clinical trial design, adaptive randomization, survival analysis, missing data, bioinformatics, high-dimensional data, scRNA-seq data
Included in
Biostatistics Commons, Clinical Trials Commons, Data Science Commons, Public Health Commons, Survival Analysis Commons