Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Human and Molecular Genetics

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Paul Scheet

Committee Member

Swathi Arur

Committee Member

Jeffrey Morris

Committee Member

Ralf Krahe

Committee Member

Nicholas Navin


Somatic copy-number (CN) gains and losses and copy-neutral loss of heterozygosity (CNLOH) frequently occur in tumors and play a major role in the progression of disease by altering gene dosage and unmasking deleterious recessive variants. Characterizing these mutations in an individual tumor sample is therefore critical for research on the relationship of specific mutations to disease outcome and for clinical decision-making based on mutations with known impact. A pervasive hindrance to sensitive detection of these mutations is genetic heterogeneity and high levels of contaminating normal cells in tumor samples, which limit the fraction of cells carrying informative mutations. The method presented here is the first to use population-based haplotype estimates to discover low-frequency somatic kilobase- to megabase-size CN alterations and CNLOH mutations using DNA microarrays. The major innovation of the method is the use of phase concordance as a robust metric to measure evidence of allelic imbalance in the face of sporadic phasing errors in the statistical haplotype estimates and stochastic variation in the microarray data. In addition to presenting a hidden Markov model that uses the phase concordance data to perform agnostic whole-genome discovery of imbalanced regions, we also describe how to test candidate regions and how to infer the haplotype of the major chromosome. We demonstrate through controlled experiments using lab-created tumor-normal mixture samples and in silico simulated data that the sensitivity is higher than that of existing methods, detecting specific imbalance events in samples with 7% tumor or less, while maintaining specificity. We also demonstrate the potential of the method via a real-data analysis of genomic mosaicism in the general population using over 30,000 samples that were previously analyzed using another method. We made nearly three times as many calls in these samples as the previous analysis (1,119 vs. 379), most of which appear to exist at low frequencies. These findings validate recent hypotheses that somatic variation in healthy tissues is more prevalent than had previously been reported, and provide valuable observations of in vivo mutations that can be studied to make inferences about genetic robustness and how these mutations affect cell fitness.
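
As an illustration of the phase concordance idea named in the abstract, the following is a minimal sketch and not the dissertation's implementation. It assumes one plausible reading of the metric: at each heterozygous marker, the B-allele frequency (BAF) indicates which phased haplotype appears over-represented, and concordance is the fraction of adjacent marker pairs that point to the same haplotype. The function name, the 0/1 haplotype encoding, and the 0.5 BAF threshold are all hypothetical choices made for this example.

```python
from typing import Sequence


def phase_concordance(baf: Sequence[float], hap: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-marker pairs whose BAF-implied
    excess haplotype agrees (illustrative definition, assumed here).

    baf: B-allele frequency at each heterozygous marker, in genomic order.
    hap: 1 if the statistical phasing places the B allele on haplotype 1,
         0 if it places the B allele on haplotype 0 (hypothetical encoding).
    """
    if len(baf) != len(hap):
        raise ValueError("baf and hap must have the same length")
    if len(baf) < 2:
        raise ValueError("need at least two heterozygous markers")

    # Haplotype that appears over-represented at each marker: the one
    # carrying B when BAF > 0.5, otherwise the one carrying A.
    excess = [h if b > 0.5 else 1 - h for b, h in zip(baf, hap)]

    # Count adjacent pairs that implicate the same haplotype.
    agree = sum(1 for left, right in zip(excess, excess[1:]) if left == right)
    return agree / (len(excess) - 1)


# Six markers with a consistent imbalance; the phasing contains one
# switch error between the third and fourth markers.
baf = [0.55, 0.45, 0.55, 0.45, 0.55, 0.45]
hap = [1, 0, 1, 1, 0, 1]
print(phase_concordance(baf, hap))
```

Under this definition a single switch error in the phasing disrupts only one adjacent pair, which is consistent with the abstract's description of the metric as robust to sporadic phasing errors. In a region with no imbalance, the excess haplotype at each marker is driven by noise, so the value would be expected to sit near 0.5.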


haplotypes, mosaicism, tumor heterogeneity, somatic mutation, copy-number variation, copy-neutral loss of heterozygosity, allelic imbalance, microarray


