The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences Dissertations and Theses (Open Access)
A tail-based test for differential expression analysis and pathway analysis in RNA-sequencing data
Author ORCID Identifier
Date of Graduation
Biostatistics, Bioinformatics and Systems Biology
Doctor of Philosophy (PhD)
Jianhua Hu, Ph.D.
Kim-Anh Do, Ph.D.
Jeffrey Morris, Ph.D.
Jing Ning, Ph.D.
Yun Wu, Ph.D.
Ying Yuan, Ph.D.
RNA sequencing data have been abundantly generated in biomedical research for biomarker discovery and pathway analysis. Exon-level data of this kind are usually heavy-tailed and correlated. Conventional statistical tests for differential expression that are based on the mean or median difference are likely to suffer from low power when the between-group difference occurs mostly in the upper or lower tail of the gene expression distribution. We propose a tail-based test that compares groups over a specific region of the distribution rather than at a single location. The proposed test, which is derived from quantile regression, adjusts for covariates and accounts for within-sample dependence among the exons through a specified correlation structure. Through Monte Carlo simulation studies, we show that the proposed test is generally more powerful and robust in detecting differential expression than commonly used tests based on the mean or a single quantile. An application to TCGA lung adenocarcinoma data demonstrates the promise of the proposed method for biomarker discovery. We also extend the proposed test to pathway analysis for a set of genes that belong to the same pathway or share a similar biological function. Genes in such sets are known to be dependent on one another, and our test accounts for their pairwise correlation. In simulation comparisons with commonly used pathway analysis methods, the proposed test yields better results. An application to non-small cell lung cancer pathways from the KEGG pathway database also demonstrates that the proposed test is a powerful method for detecting differentially expressed pathways.
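The idea of comparing groups over a region of the distribution rather than at a single location can be illustrated with a small sketch. The code below is not the test developed in the dissertation: it does not use quantile regression, adjust for covariates, or model within-sample correlation among exons. It is a simplified stand-in, assuming two independent groups, that averages the difference in empirical quantiles over an upper-tail range and calibrates it by permutation. The function names, the quantile range 0.75 to 0.95, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def tail_statistic(x, y, taus):
    """Average difference (y minus x) in empirical quantiles over the levels in taus."""
    return float(np.mean(np.quantile(y, taus) - np.quantile(x, taus)))


def tail_permutation_test(x, y, taus=None, n_perm=1000, seed=0):
    """Two-sided permutation test on the averaged tail-quantile difference.

    Illustrative only: assumes independent observations and no covariates.
    """
    if taus is None:
        # Hypothetical choice of upper-tail region.
        taus = np.linspace(0.75, 0.95, 9)
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    observed = tail_statistic(x, y, taus)
    pooled = np.concatenate([x, y])
    n_x = x.size

    # Count permutations whose statistic is at least as extreme as the observed one.
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = tail_statistic(perm[:n_x], perm[n_x:], taus)
        if abs(stat) >= abs(observed):
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Simulated example: both groups share the same bulk, but about 20% of the
# treated samples are shifted upward, so the groups differ mainly in the upper tail.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 200)
treated = rng.normal(0.0, 1.0, 200)
shifted = rng.random(200) < 0.2
treated[shifted] += 3.0

stat, p = tail_permutation_test(control, treated)
print(f"tail statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```

Averaging over several quantile levels, rather than testing at one level, is what makes the comparison target a distribution area; a mean-based test applied to the same simulated data would dilute the shift across the unshifted majority of samples.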
Differential expression analysis, Correlated data, RNA sequencing, Quantile regression, Robust tail-based test, Pathway and gene set analysis