Author ORCID Identifier

0000-0003-4112-2074

Date of Graduation

5-2024

Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Peng Wei

Abstract

Environmental exposures such as cigarette smoking influence health outcomes through intermediate molecular phenotypes, such as the methylome, transcriptome, and metabolome. Mediation analysis is a useful tool for investigating the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposures and health outcomes. The rapid development of high-throughput technologies has made mediation analysis of multi-omics data critical for gaining insight into the biological mechanisms underlying disease etiology. This dissertation aims to develop mediation analysis methods that use the enormous amount of available multi-omics data to assess mechanisms of disease etiology. It contains three projects in which I propose mediation analysis frameworks for multi-omics data in non-linear models. The first and second projects propose novel mediation analysis frameworks for high-dimensional mediators with survival outcomes in Cox regression models and binary outcomes in logistic regression models, respectively. I leverage a second-moment-based measure, analogous to the R-squared for linear models, to quantify the total mediation effect. In addition, I develop a variable selection procedure for high-dimensional data to reduce the bias introduced by non-mediators. Extensive simulations show that the proposed methods perform well in estimating the total mediation effect and identifying true mediators. By applying the proposed methods to data from the Framingham Heart Study and a diffuse large B-cell lymphoma genomics data set, I demonstrate how they can be used to conduct high-dimensional mediation analysis of omics data, such as transcriptomics and metabolomics, in assessing mechanisms of disease etiology. In the last project, I propose an integrative mediation analysis framework for multi-omics mediators. I incorporate biological pathway information in knowledge-based dimension reduction to project unmatched multi-omics data, such as methylomics and transcriptomics, into a common lower-dimensional space formed by pathways. Through simulation studies, I show that the proposed pathway-based approach identifies mediators with higher true positive rates than analysis of individual data types. Applying this approach to DNA methylation and gene expression data from the Framingham Heart Study, I identify nine KEGG pathways as significant mediators, providing deeper insight into the biological processes through which aging influences cardiovascular disease risk.

Keywords

mediation analysis

Included in

Biostatistics Commons
