Author ORCID Identifier

0000-0002-1870-0019

Date of Graduation

12-2023

Document Type

Dissertation (PhD)

Program Affiliation

Quantitative Sciences

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Xuelin Huang

Committee Member

Ziyi Li

Committee Member

Bing Z Carter

Committee Member

Wei Peng

Committee Member

Ruitao Lin

Abstract

Healthcare professionals face an ever-growing volume of clinical data stored in electronic health records, alongside genomic data from laboratory experiments. In response, machine learning (ML) techniques are gaining popularity because they are effective at processing large datasets and capturing the complex nonlinear patterns intrinsic to biomedical research.

My research leverages ML's capabilities to address computational challenges in diverse areas, including adaptive clinical trial design, survival analysis, and high-dimensional genetic data analysis. Specifically, Chapter 2 focuses on the application of ML in response-adaptive randomization designs. Compared to a traditional equal-randomization design, response-adaptive randomized trials allocate more patients to the superior treatment arm and increase the overall response rate. In Chapter 3, we propose a statistical model to impute survival times for censored observations, allowing any ML method to be applied directly in the downstream analysis. We further improve the accuracy of our method using ML regression built on prognostic covariates. In Chapter 4, we develop an artificial neural network-based framework for analyzing longitudinal single-cell RNA sequencing data. Our pipeline achieves: (1) cell annotation across time points, (2) detection of novel cell types that emerge over time, (3) visualization of cell population evolution, and (4) identification of temporal differentially expressed genes. In each chapter, we provide both simulation studies and real-world application results for our methods. In the long run, we aim to narrow the gap between ML and biomedical research and to facilitate healthcare studies.
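To make the contrast between equal and response-adaptive randomization concrete, the following is a minimal sketch and not the method developed in Chapter 2. It assumes binary responses and uses Thompson sampling with Beta(1, 1) priors as a simple stand-in for a response-adaptive rule. The function name `simulate_trial`, its parameters, and the response rates in the usage lines are hypothetical choices made for illustration only.

```python
import random


def simulate_trial(true_rates, n_patients, adaptive, seed=0):
    """Simulate a multi-arm trial with binary responses.

    adaptive=False: each patient is assigned to an arm uniformly at random.
    adaptive=True:  Thompson sampling with Beta(1, 1) priors (an assumed,
                    illustrative rule; not the dissertation's ML-based design).
    Returns (patients per arm, total number of responders).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_patients):
        if adaptive:
            # Draw one plausible response rate per arm from its posterior
            # and assign the patient to the arm with the largest draw.
            draws = [
                rng.betavariate(1 + successes[a], 1 + failures[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda a: draws[a])
        else:
            arm = rng.randrange(n_arms)

        # Observe a binary response governed by the arm's true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    allocation = [successes[a] + failures[a] for a in range(n_arms)]
    return allocation, sum(successes)


if __name__ == "__main__":
    # Hypothetical response rates: arm 1 is the superior treatment.
    rates = [0.2, 0.8]
    print("equal:   ", simulate_trial(rates, 200, adaptive=False))
    print("adaptive:", simulate_trial(rates, 200, adaptive=True))
```

Under these assumptions, the adaptive rule tends to shift allocation toward the arm with the higher observed response rate as evidence accumulates, which is the behavior the abstract attributes to response-adaptive designs.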

Keywords

Biostatistics, machine learning, clinical trial design, adaptive randomization, survival analysis, missing data, bioinformatics, high-dimensional data, scRNA data
