Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Paul Scheet, Ph.D.

Committee Member

Humam Kadara, Ph.D.

Committee Member

Eduardo Vilar Sanchez, M.D. Ph.D.

Committee Member

Sadhan Majumder, Ph.D.

Committee Member

Yin Liu, Ph.D.


Lung cancer, of which non-small cell lung cancer (NSCLC) is the most common form, is the second most prevalent cancer and the leading cause of cancer-related deaths. NSCLCs primarily comprise adenocarcinomas (LUAD) and squamous cell carcinomas (LUSC). Advances in early detection and prevention have been limited by the lack of early-stage biomarkers and targets. A comprehensive molecular characterization of premalignant lesions and tumor-adjacent normal tissue could improve our understanding of NSCLC pathogenesis. However, such investigations are challenged by limited tissue availability and by the low cellular fractions of detectable somatic mutations.

Consequently, little is known about the pathogenesis of premalignant lung lesions, especially atypical adenomatous hyperplasia (AAH), the only known precursor of LUAD. We performed a cross-platform integrative analysis comprising targeted DNA sequencing, genotype array profiling and transcriptome sequencing of matched AAHs, LUADs and normal tissues from 23 patients with early-stage disease. The study revealed potentially divergent pathways according to AAH mutation status (BRAF versus KRAS), recurrent chromosomal aberrations (17p loss) and immune deregulation early in AAH pathogenesis.

Molecular changes characteristic of NSCLCs may also occur in normal tissues, preceding identifiable premalignancy-associated morphological changes. We sought to comprehensively survey the somatic mutational architecture of the normal airway in patients with early-stage NSCLC. Targeted DNA sequencing allowed us to capture driver mutations at the low cellular fractions typical of these non-malignant tissues, and genotype array profiling helped characterize subtle chromosomal aberrations in them. This multi-region study included tumor-adjacent and tumor-distant airways, nasal epithelia and uninvolved normal lung (collectively, the cancerized field), along with matched multi-region NSCLCs and blood cells from 48 patients. Integrative computational analysis revealed genomic airway field carcinogenesis in 52% of cases. The airway field harbored mutations in known driver genes at lower frequencies than in the matched NSCLCs, suggestive of selection-driven clonal expansion. These driver events also included somatic “two-hit” alterations in the matched airway field and NSCLCs.

Our study design offers spatiotemporal insights into NSCLC development and suggests potential targets for early detection and treatment in the possibly less hostile environment of premalignancy. To validate and extend the utility of the bioinformatic techniques devised and implemented for these investigations, I also provide methods for expanding such analyses across multiple tumor sites.


Premalignant, Lung cancer, Field cancerization, Bioinformatics, Cancer genomics, TCGA, Genomic instability, Allelic imbalance


