Sparse structural equation models for causal inference in genetic studies of multiple phenotypes with next-generation sequencing data
Abstract
Despite differences in their specific estimation methods, the widely used methods for genetic analysis of complex traits do not detect, distinguish, and characterize true biological, mediated, and spurious pleiotropic effects, and are unable to unravel causal structures among multiple phenotypes and genotype variants. To overcome these limitations, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we further extend the sparse multivariate SEMs to sparse functional SEMs. To improve computational efficiency and reduce the dimension of the data, functional data analysis techniques and the alternating direction method of multipliers (ADMM) are used to develop a novel sparse two-stage least squares estimation method for structure and parameter estimation of large SEMs. Borrowing causal information from the SEMs and maximizing the power of marginal association analysis, we develop a novel statistic for testing the association of genetic variants with multiple phenotypes. Using large-scale simulations, we show that the true network structure can be accurately recovered by our models and that the new statistic has higher power than the PCA-based statistic. The proposed method is applied to exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) with 11 phenotypes; the analysis identifies a network with 140 genes connected to the 11 phenotypes and 15 genes with pleiotropic genetic effects, and demonstrates that the proposed statistic yields smaller P-values than the PCA-based statistic for testing marginal associations.
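As a rough illustration of the estimation idea named in the abstract, the sketch below fits a single structural equation by two-stage least squares with an L1 penalty in the second stage, solved by ADMM. It is not the dissertation's functional SEM procedure: the function names (`soft_threshold`, `lasso_admm`, `sparse_2sls`, `demo_fit`), the simulated data, the scaled objective (1/(2n))·||y − Wb||² + λ||b||₁, and the values of `lam` and `rho` are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of the L1 norm (elementwise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(W, y, lam, rho=1.0, n_iter=500):
    # Minimizes (1/(2n)) * ||y - W b||^2 + lam * ||b||_1 via ADMM
    # with the splitting b = z (hypothetical helper, fixed iteration count).
    n, p = W.shape
    A = W.T @ W / n + rho * np.eye(p)   # reused in every b-update
    Wty = W.T @ y / n
    z = np.zeros(p)
    u = np.zeros(p)
    for _ in range(n_iter):
        b = np.linalg.solve(A, Wty + rho * (z - u))  # ridge-like update
        z = soft_threshold(b + u, lam / rho)         # sparsity update
        u = u + b - z                                # scaled dual update
    return z

def sparse_2sls(y_target, Y_endog, X_instr, X_incl, lam):
    # Stage 1: regress endogenous variables on all instruments.
    coef, *_ = np.linalg.lstsq(X_instr, Y_endog, rcond=None)
    Y_hat = X_instr @ coef
    # Stage 2: L1-penalized regression on fitted values and included
    # exogenous variables.
    W = np.hstack([Y_hat, X_incl])
    return lasso_admm(W, y_target, lam)

def demo_fit(seed=0, n=5000):
    # Simulated toy data (assumption): five genotype-like instruments G,
    # one endogenous phenotype y1, and a target phenotype y2.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, 5))
    c = rng.standard_normal(n)                     # unobserved confounder
    y1 = G[:, :3].sum(axis=1) + c + 0.5 * rng.standard_normal(n)
    y2 = 0.8 * y1 + 0.5 * G[:, 4] + c + 0.5 * rng.standard_normal(n)
    # Candidate regressors for y2: y1 (endogenous), G[:, 3], G[:, 4].
    return sparse_2sls(y2, y1[:, None], G, G[:, 3:5], lam=0.05)

if __name__ == "__main__":
    print(demo_fit())
```

In this toy setup the true coefficients are 0.8 for `y1`, 0 for `G[:, 3]`, and 0.5 for `G[:, 4]`; the L1 penalty shrinks all estimates slightly toward zero, and the first stage removes the bias that the shared confounder would introduce into an ordinary regression of `y2` on `y1`.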
Subject Area
Biostatistics|Genetics
Recommended Citation
Rahman, Mohammad Lutfur, "Sparse structural equation models for causal inference in genetic studies of multiple phenotypes with next-generation sequencing data" (2015). Texas Medical Center Dissertations (via ProQuest). AAI3720095.
https://digitalcommons.library.tmc.edu/dissertations/AAI3720095