Author ORCID Identifier

0000-0001-5971-1681

Date of Graduation

8-2017

Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Jianhua Hu, Ph.D.

Committee Member

Kim-Anh Do, Ph.D.

Committee Member

Jeffrey Morris, Ph.D.

Committee Member

Jing Ning, Ph.D.

Committee Member

Yun Wu, Ph.D.

Committee Member

Ying Yuan, Ph.D.

Abstract

RNA sequencing data have been generated in abundance in biomedical research for biomarker discovery and pathway analysis. Such data at the exon level are usually heavy-tailed and correlated. Conventional statistical tests for differential expression that are based on the mean or median difference are likely to suffer from low power when the between-group difference occurs mostly in the upper or lower tail of the gene expression distribution. We propose a tail-based test that compares groups in terms of a specific region of the distribution rather than a single location. The proposed test, which is derived from quantile regression, adjusts for covariates and accounts for within-sample dependence among the exons through a specified correlation structure. Through Monte Carlo simulation studies, we show that the proposed test is generally more powerful and robust in detecting differential expression than commonly used tests based on the mean or a single quantile. An application to TCGA lung adenocarcinoma data demonstrates the promise of the proposed method for biomarker discovery. We also extend the proposed test to perform pathway analysis for sets of genes that lie within the same pathway or share similar biological function. Genes in such sets are known to be dependent on one another, and our test accounts for their pairwise correlation. Through simulation comparisons with commonly used pathway analysis methods, we show that the proposed test yields better results. An application to non-small cell lung cancer pathways from the KEGG PATHWAY database also demonstrates that the proposed test is a powerful method for detecting differentially expressed pathways.

Keywords

Differential expression analysis, Correlated data, RNA sequencing, Quantile regression, Robust tail-based test, Pathway and gene set analysis
