Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Paul Scheet, Ph.D.

Committee Member

Swathi Arur, Ph.D.

Committee Member

Veerabhadran Baladandayuthapani, Ph.D.

Committee Member

Chad Huff, Ph.D.

Committee Member

Humam Kadara, Ph.D.


Deviations from a diploid configuration of the human genome, spanning single genes or entire chromosomes, can have wide-ranging impacts on human phenotypic variation, including Mendelian and complex forms of disease. These chromosomal alterations, such as duplications, deletions, or copy-neutral loss of heterozygosity, are thus important forms of genetic variation for phenotyping populations of individuals as well as populations of cells. Indeed, copy number variants (CNVs) serve as hallmarks of critical changes in the development of particular diseases such as cancer and thus may be used as biomarkers. CNVs may be either inherited (transmitted by germ cells, originating in meiosis; “germline”) or acquired (originating in mitosis; “somatic mosaicism”). The complex structure of CNVs and the diverse mechanisms that generate them have been studied at the molecular level, but such studies have generally not been attempted with population data. This dissertation seeks to provide insights into CNV diversity in two complementary settings: 1) the genesis of germline copy number duplications, and 2) the diversity of acquired CNVs within distinct tumor tissues.
First, we develop a novel method to disentangle the haplotype composition (the specific alleles on an inherited chromosome) of de novo germline duplications in order to characterize the “grandparental origin” of the extra piece of a chromosome. Using data from large family-based genome-wide association studies, we report a ratio of 1.07:1 between “bi-allelic” duplications, which arise from inter-chromatid non-allelic homologous recombination (NAHR), and “tri-allelic” duplications, which arise from inter-chromosomal NAHR. In addition, our method reveals a third configuration, consisting of both tri-allelic and bi-allelic duplications, which we hypothesize arose from spontaneous inter-chromosomal and inter-chromatid NAHR. These complex duplications account for 6% of all de novo duplications.
Second, we assess tumor heterogeneity in biphasic uterine carcinosarcoma (UCS) from 10 patients by analyzing data from component-specific samples (carcinomatous, sarcomatous, and normal uterine tissues) generated on multiple platforms (SNP array, targeted DNA sequencing, and whole-transcriptome sequencing). We augment the quantification of tumor heterogeneity by considering the haplotype information within the somatic copy number alterations of each sample, which allows recurrent copy number changes to be annotated more precisely. Our results imply that the carcinomatous and sarcomatous components of UCS originate from the same clone and that the heterogeneity reflects relatively advanced stages. Our work supports the view that profiling carcinomas and sarcomas separately may offer clinical utility. Overall, this dissertation shows the potential utility of incorporating haplotype information in particular settings in population science and cancer biology.


Keywords

copy number variation, haplotype, cancer, non-allelic homologous recombination, meiosis, germline, uterine carcinosarcoma


