Causal Inference in Time-Course and Heterogeneous Data

Nan Lin, The University of Texas School of Public Health

Abstract

With the development of modern science and sensing technology, we are in an era of data explosion. Many types of data are now used to diagnose disease and to understand disease mechanisms. However, current state-of-the-art data analysis frameworks suffer from two problems. First, traditional statistical analysis can only identify associations in the data, whereas biological systems function in a systematic, causal way. In most research and real-world data analysis, researchers use association to infer causation. The fundamental problem with this type of inference is that causation can give rise to association, but association does not guarantee causation. Second, analyzing each type of data separately neglects the correlation between different types of data. As a result, conclusions assembled from separate analyses of various data types may be biased and inconclusive. There is therefore a need for an integrated data analysis framework that draws causal conclusions from multiple types of data. In this dissertation, we propose the structural equation model as the major vehicle for causal integrated data analysis. The current causal inference framework based on structural equation models cannot deal effectively with two types of data: (1) time-course data, such as time-course gene expression data or longitudinal biomedical imaging data; and (2) heterogeneous data, such as data from family-based studies or cell-type-specific gene expression data. We therefore propose a novel causal inference framework that can handle large-scale Bayesian networks, organized into three specific aims. In Aim 1, we developed a sparse dynamic Bayesian network model coupled with integer programming for causal inference in time-course data. In Aim 2, we developed sparse mixed-effects structural equation models as a general framework to unify causal inference for heterogeneous data. In Aim 3, we applied the proposed models, together with other mathematical tools, to real data sets to understand disease mechanisms and progression over time and at the cellular level.

Subject Area

Biostatistics|Genetics|Bioinformatics

Recommended Citation

Lin, Nan, "Causal Inference in Time-Course and Heterogeneous Data" (2018). Texas Medical Center Dissertations (via ProQuest). AAI10790001.
https://digitalcommons.library.tmc.edu/dissertations/AAI10790001
