Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Peng Wei


Environmental exposures such as cigarette smoking influence health outcomes through intermediate molecular phenotypes, such as the methylome, transcriptome, and metabolome. Mediation analysis is a useful tool for investigating the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposures and health outcomes. The rapid development of high-throughput technologies has made mediation analysis of multi-omics data critical for gaining new insights into the biological mechanisms underlying disease etiology. This dissertation develops mediation analysis methods that use the large volume of available multi-omics data to assess mechanisms of disease etiology. It comprises three projects in which I propose mediation analysis frameworks for multi-omics data in non-linear models. The first and second projects propose novel mediation analysis frameworks for high-dimensional mediators with survival outcomes in Cox regression models and binary outcomes in logistic regression models, respectively. In both, I leverage a second-moment-based measure analogous to the R-squared for linear models to quantify the total mediation effect. In addition, I develop a variable selection procedure for high-dimensional data to reduce the bias introduced by non-mediators. Extensive simulations showed good performance of the proposed methods in estimating the total mediation effect and identifying true mediators. By applying the proposed methods to the Framingham Heart Study and a diffuse large B-cell lymphoma genomics data set, I demonstrate how they can be used to conduct high-dimensional mediation analysis of omics data, such as transcriptomics and metabolomics, to assess mechanisms of disease etiology. In the last project, I propose an integrative mediation analysis framework for multi-omics mediators.
I incorporate biological pathway information into a knowledge-based dimension reduction that projects unmatched multi-omics data, such as methylomics and transcriptomics, into a common lower-dimensional space formed by pathways. Through simulation studies, I show that the proposed pathway-based approach identifies mediators with higher true positive rates than analyses of individual data types. Applying this approach to DNA methylation and gene expression data from the Framingham Heart Study, I identified nine KEGG pathways as significant mediators, providing deeper insight into the biological processes through which aging influences cardiovascular disease risk.
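For orientation, the linear-model R-squared measure that the abstract's second-moment-based measure is described as analogous to can be written as R2_med = R2(Y~X) + R2(Y~M) - R2(Y~X,M): the variance in the outcome Y that is shared by the exposure X and the mediators M. The sketch below is a minimal illustration of that linear-model quantity only, assuming ordinary least squares with an intercept and all mediators entered jointly. It is not the dissertation's Cox or logistic regression estimator, and it omits the variable selection step; the function names `r_squared` and `r2_mediation` are hypothetical.

```python
import numpy as np


def r_squared(y, X):
    """Coefficient of determination from an OLS fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def r2_mediation(y, x, m):
    """Hypothetical helper: linear-model R-squared total mediation measure,
    R2_med = R2(Y~X) + R2(Y~M) - R2(Y~X,M), with all mediators fit jointly."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)  # exposure as an n x 1 matrix
    m = np.asarray(m, dtype=float).reshape(len(y), -1)  # mediators as an n x p matrix
    return r_squared(y, x) + r_squared(y, m) - r_squared(y, np.column_stack([x, m]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    # Simulated data (illustration only): three true mediators, no direct effect.
    m = 0.8 * x[:, None] + rng.normal(size=(n, 3))
    y = 0.5 * m.sum(axis=1) + rng.normal(size=n)
    print(round(r2_mediation(y, x, m), 3))
```

In this simulated setting the population value is 1.44 / 3.19, about 0.45, because all of the exposure's effect on Y passes through M. If M were instead independent of X while still affecting Y, the three R-squared terms would cancel and the measure would be close to zero, which illustrates why non-mediators must be screened out rather than simply added to the model.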


Keywords

mediation analysis

Included in

Biostatistics Commons
