Author ORCID Identifier


Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Biomedical Sciences

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Paul Scheet

Committee Member

Yasminka A. Jakubek-Swartzlander

Committee Member

Eduardo Vilar-Sanchez

Committee Member

Swathi Arur

Committee Member

Chad Huff

Committee Member

Yin Liu


Comprehensive genomic and transcriptomic characterization of tumors has uncovered enrichment for distinct aneuploidy and expression patterns, demonstrating the utility of molecular-based classification of cancers and their subtypes. Existing cohorts with transcriptomic profiling from next-generation sequencing hold untapped potential to also relate genomics to rich clinical phenotypes. Yet derivation of somatic copy number and expression profiles from analyses of RNA has remained elusive. Further, DNA analysis in these cohorts is not always feasible due to limited tissue availability or financial constraints. Here, we present a statistical approach that overcomes these challenges by using haplotype information to aid detection of somatic chromosomal copy number alterations (SCNAs), which result in allelic imbalance, i.e., deviations from the expected 1-to-1 allelic ratios at heterozygous loci. We initially applied a naive version of our method to 1,970 tumor samples from 7 sites in The Cancer Genome Atlas (TCGA), inferring genotypes directly from RNA-sequencing (RNA-seq). This resulted in an SCNA detection rate of 68%. Encouraged by this, we next leveraged large public genetic reference data and array-derived germline genotypes from matched blood samples to impute millions of germline variants for 4,942 patients across 28 TCGA cancer sites, resulting in improved genotype calling and haplotype inference. This latter approach increased our power for tumor SCNA detection from RNA-seq to 85%, while maintaining a false positive rate of ~5%. SCNA burden inferred from RNA-seq was highly correlated (R = 0.92) with “gold standard” DNA-derived estimates. To demonstrate the approach’s potential clinical utility, we successfully replicated, from RNA-seq, SCNA features associated with clinical subtypes of breast cancer.
Following this work, we investigated the role of X-inactivation in female carcinogenesis through comprehensive profiling of allelic imbalance on the X chromosome, using tumor samples from female patients in the TCGA breast cancer cohort. We observed higher rates of chromosome-level allelic imbalance for the X chromosome, both in comparison to the autosomes (derived from RNA) and to the X chromosome derived from DNA, suggesting that these imbalances are epigenetically driven by X-inactivation. Additionally, our results are in line with findings in the literature that implicate loss of X chromosome inactivation in female carcinogenesis through its association with the more aggressive, basal-like subtype of breast cancer. Taken together, our results suggest a substantial improvement over existing methods, allowing for comprehensive studies of SCNAs from RNA-seq, opening avenues for cost-effective large-scale studies of tumors, and elucidating the contribution of epigenetically driven mechanisms to carcinogenesis.


Keywords

Aneuploidy, genomic instability, SCNAs, allelic imbalance, cancer biology


