Degree Name: Doctor of Philosophy (PhD)

Statistical analysis has made significant progress in association studies, yet the etiology and mechanisms of complex phenotypes remain poorly understood. Relying on association analysis as the major analytical platform may hamper the theoretical development of biomedical science and its application, and many researchers therefore suggest making the transition from association to causation. Mainstream research in genetic and epigenetic data analysis focuses on statistical association, that is, on exploring statistical dependence between variables. Despite significant progress in dissecting the genetic architecture of complex diseases through genome-wide association studies (GWAS), the signals identified by association analysis explain only a small proportion of the heritability of complex diseases, and a large fraction of risk variants remains hidden. Searching for causal SNPs only within the set of associated SNPs may miss many causal variants. This reliance on association analysis for complex data is a key issue that hampers the theoretical development of genomic science and its practical application. Causality shapes how we view and understand the mechanisms of complex diseases. First, to explore bivariate causal discovery, I will introduce the independence of cause and mechanism (ICM) as a basic principle and use the additive noise model (ANM) as the major tool; large-scale simulations will be performed to evaluate the feasibility of the ANM for bivariate causal discovery. Second, I will introduce machine-learning methods for confounder detection to analyze the case in which two variables are associated but not causally related. Last, I will extend causal analysis from bivariate discovery to network analysis, considering causal relations among multiple variables: entropy methods will be introduced to handle the case of multiple factors and one cause, and structural equation models with nonlinear function scores will be applied to network analysis.
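The idea behind the ANM can be illustrated with a small sketch. Under an additive noise model Y = f(X) + N with the noise N independent of X, regressing the effect on the cause leaves residuals that are independent of the regressor, whereas the reverse regression generally does not. The sketch below, assuming NumPy is available, fits a polynomial regression in each direction and scores the dependence between regressor and residuals with a biased empirical Hilbert-Schmidt Independence Criterion (HSIC) using Gaussian kernels and a median-heuristic bandwidth. The polynomial regressor, the HSIC score, and the names `anm_direction`, `anm_score` and `hsic` are hypothetical, illustrative choices rather than the methods used in this work, which may rely on other regressors (for example Gaussian process regression) and formal independence tests.

```python
import numpy as np


def _rbf_gram(v):
    """Gaussian (RBF) Gram matrix with a median-heuristic bandwidth."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    d2 = (v - v.T) ** 2                       # pairwise squared distances
    med = np.median(d2[d2 > 0])               # median heuristic for the bandwidth
    return np.exp(-d2 / med)


def hsic(a, b):
    """Biased empirical HSIC; larger values indicate stronger dependence."""
    n = len(a)
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    Kc = H @ _rbf_gram(a) @ H
    return float(np.sum(Kc * _rbf_gram(b))) / n ** 2  # trace(HKH L) / n^2


def anm_score(cause, effect, degree=5):
    """Fit effect = f(cause) + noise; return HSIC between cause and residuals."""
    coef = np.polyfit(cause, effect, degree)  # assumption: polynomial f
    resid = effect - np.polyval(coef, cause)
    return hsic(cause, resid)


def anm_direction(x, y, degree=5):
    """Return (direction, score_xy, score_yx); a lower score means a better ANM fit."""
    x = (np.asarray(x, float) - np.mean(x)) / np.std(x)  # standardize both variables
    y = (np.asarray(y, float) - np.mean(y)) / np.std(y)
    s_xy = anm_score(x, y, degree)            # hypothesis x -> y
    s_yx = anm_score(y, x, degree)            # hypothesis y -> x
    return ("x->y" if s_xy < s_yx else "y->x"), s_xy, s_yx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 300)
    y = x ** 3 + rng.normal(0, 1, 300)        # simulated ground truth: x causes y
    print(anm_direction(x, y))
```

Comparing raw HSIC values is only a heuristic decision rule; a fuller analysis would attach a p-value (for example by permutation) to each direction and abstain when both or neither direction is rejected. The Gram matrices are n by n, so this sketch suits small samples rather than genome-scale data.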