Author ORCID Identifier

0000-0002-3935-8865

Date of Graduation

5-2025

Document Type

Dissertation (PhD)

Program Affiliation

Quantitative Sciences

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Peng Wei, Ph.D.

Committee Member

Ryan Sun, Ph.D.

Committee Member

Christine B. Peterson, Ph.D.

Committee Member

Ken Chen, Ph.D.

Committee Member

Gaiane M. Rauch, M.D., Ph.D.

Committee Member

Jingfei Ma, Ph.D.

Abstract

Mediation analysis is a widely used statistical method for examining how molecular traits, such as gene or protein expression, act as intermediaries linking an exposure to a health outcome. For example, it can help explain how smoking affects disease risk through molecular changes. Rapid progress in high-throughput omics profiling technologies and large-scale epidemiology consortia, such as the Trans-Omics for Precision Medicine (TOPMed) program from the National Heart, Lung, and Blood Institute (NHLBI) and UK Biobank, has produced an extensive accumulation of genomic data for biomedical research. At the same time, data at this scale pose significant methodological challenges, including computational inefficiency, inter-study heterogeneity, and unmeasured confounding. This dissertation addresses these challenges through three methodological innovations designed to advance high-dimensional mediation analysis for omics mediators. First, a computationally efficient two-stage framework using cross-fitting is introduced for the variance-based R-squared total mediation effect measure, which is specifically developed for high-dimensional omics mediators. The method applies variable selection to identify true mediators and ordinary least squares regression for estimation. A Wald-type confidence interval is then constructed using a newly derived closed-form asymptotic distribution, eliminating the need for resampling techniques such as bootstrapping. The proposed method achieves coverage probability comparable to existing methods while substantially improving computational efficiency. Next, a novel meta-analysis framework is developed to estimate the R-squared-based total mediation effects of high-dimensional mediators while accounting for inter-study heterogeneity. This framework relies only on summary statistics from individual studies within large-scale consortia or biobanks.
We show that using summary statistics alone achieves promising coverage probability and bias comparable to that of individual-level data analysis, which requires more computational and financial resources. Finally, a multivariate Mendelian randomization (MR) framework is introduced for estimating R-squared-based causal mediation effects in high-dimensional multi-omics settings. This method leverages expression quantitative trait loci (eQTLs) as instrumental variables to reduce bias caused by unmeasured confounders. Extensive simulations show that the MR-based method outperforms the standard linear regression-based method in the presence of unmeasured confounders. Additionally, these methods are applied to major studies within the TOPMed program, including the Framingham Heart Study, the Multi-Ethnic Study of Atherosclerosis, and the Women's Health Initiative. These studies include over 7,000 participants from diverse populations with multi-omics data, including transcriptomics and proteomics. The proposed methods are used to identify gene and protein expression as mediators of age-, sex-, and obesity-related effects on cardiovascular traits (e.g., high-density lipoprotein (HDL) cholesterol and systolic blood pressure). To further investigate the biological mechanisms, downstream analyses such as pathway enrichment analysis, functional annotation, canonical correlation analysis, and causal direction analysis are conducted. These analyses provide deeper insight into the findings and help validate their biological plausibility and robustness. In summary, this dissertation provides a cohesive methodological framework for high-dimensional multi-omics mediation analysis, elucidating molecular mechanisms underlying complex diseases and offering foundational insights for precision medicine and therapeutic target discovery.

Keywords

Mediation analysis, high-dimensional analysis, multi-omics, causal inference, meta-analysis, Mendelian randomization

Available for download on Wednesday, April 22, 2026

Included in

Biostatistics Commons
