Statistical analysis of TCGA whole transcriptome sequencing data (RNA-Seq)

Wenfang Li, The University of Texas School of Public Health


Background: Brain cancer is a group of clinically, histopathologically and molecularly heterogeneous diseases with different outcomes and responses to treatment. Because the tumor cells lack the necessary receptors, common treatments such as hormone therapy and drugs are ineffective, which makes the disease more aggressive and difficult to treat. Accumulating evidence indicates that aberrant long noncoding RNAs (lncRNAs) play crucial roles in cancer initiation, development and metastasis, and thus can be used for cancer diagnosis, prediction of response to therapy and prognosis of outcome.

Methods: Our study tested functional lncRNA classification in brain tumors and established a genome-wide classification of lncRNAs in a large, well-annotated brain tumor dataset from The Cancer Genome Atlas (TCGA) project. About 301 tumor and normal samples were available for analysis. In addition, subtype analyses using unsupervised consensus clustering were conducted to identify subgroups of low-grade gliomas (LGG) and glioblastoma multiforme (GBM), with the aim of reducing heterogeneity and improving the accuracy of cancer diagnosis and drug efficacy. The list of differentially expressed lncRNAs for each subtype was generated by comparing that subtype against all other subtypes.

Results and conclusions: Using stringent criteria, a series of significant, uniquely overexpressed lncRNAs and mRNAs was identified that could potentially be used as biomarkers. Comparing the lncRNA and mRNA classifications, lncRNAs show more tissue-specific features than mRNAs. The lncRNA classifications identified relatively smaller groups of unique RNAs than the mRNA classifications, suggesting that using lncRNA rather than mRNA for classification may result in greater accuracy and specificity. This lncRNA-based clustering identified four clinically relevant molecular subclasses of brain malignancies, with four subclasses for LGG and four for GBM. It may provide efficient classification tools for establishing clinical prognoses and selecting gene therapy targets in human brain cancers.

Keywords: glioma, lncRNA, mRNA, LGG, GBM, molecular subtype.
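
The one-versus-rest comparison described in the Methods (each subtype against all other subtypes) can be illustrated with a minimal Python sketch. This is not the dissertation's actual pipeline: the function name `one_vs_rest`, the toy expression values, the pseudocount, and the choice of a log2 fold change with a Welch t-statistic are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def one_vs_rest(expr, labels, pseudocount=1.0):
    """Compare each subtype against all remaining samples, gene by gene.

    expr   : dict mapping gene name -> list of expression values (one per sample)
    labels : list of subtype labels, aligned with the sample order in expr
    Returns: dict subtype -> dict gene -> (log2 fold change, Welch t-statistic)

    Hypothetical helper for illustration only; each group needs at least
    two samples so that a sample variance can be computed.
    """
    results = {}
    for subtype in sorted(set(labels)):
        per_gene = {}
        for gene, values in expr.items():
            inside = [v for v, lab in zip(values, labels) if lab == subtype]
            outside = [v for v, lab in zip(values, labels) if lab != subtype]
            # The pseudocount avoids division by zero for unexpressed genes.
            log2fc = math.log2(
                (mean(inside) + pseudocount) / (mean(outside) + pseudocount)
            )
            # Welch's t-statistic does not assume equal variances.
            se = math.sqrt(
                variance(inside) / len(inside) + variance(outside) / len(outside)
            )
            t_stat = (mean(inside) - mean(outside)) / se if se > 0 else float("nan")
            per_gene[gene] = (log2fc, t_stat)
        results[subtype] = per_gene
    return results


# Toy data (invented): two subtypes, three samples each, two lncRNAs.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "lnc1": [7.0, 9.0, 8.0, 1.0, 2.0, 0.0],  # higher in subtype A
    "lnc2": [3.0, 3.0, 4.0, 3.0, 4.0, 3.0],  # similar in both subtypes
}
results = one_vs_rest(expr, labels)
for subtype, genes in results.items():
    for gene, (log2fc, t_stat) in genes.items():
        print(subtype, gene, round(log2fc, 3), round(t_stat, 3))
```

In a real analysis, the statistics would be converted to p-values and corrected for multiple testing before applying any "stringent criteria"; the thresholds used in the dissertation are not stated in this abstract.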

Recommended Citation

Li, Wenfang, "Statistical analysis of TCGA whole transcriptome sequencing data (RNA-Seq)" (2015). Texas Medical Center Dissertations (via ProQuest). AAI1602764.