Author ORCID Identifier

Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Chad Huff

Committee Member

Paul Scheet

Committee Member

Xuelin Huang

Committee Member

Carrie Daniel-MacDougall

Committee Member

Bo Peng


Genetic factors account for a substantial portion of Crohn’s disease and colorectal cancer (CRC) risk. Patients with Crohn’s disease, a condition that causes chronic inflammation of the gastrointestinal tract, are at increased risk of colorectal cancer morbidity and mortality. Genome-wide association studies using single-marker approaches have identified loci associated with these diseases, but the contribution of rare variants to disease susceptibility is incompletely understood. This dissertation includes three chapters: two association studies, one for Crohn’s disease and one for CRC, and a statistical method to improve the power of association tests.

For Crohn’s disease, we performed targeted sequencing of 101 genes in 205 children with Crohn’s disease (including 179 parent-child trios) and in 200 controls. We identified eight novel rare variants in NOD2 that are likely disease-associated. Incorporating rare variation and compound heterozygosity nominally increased the proportion of variance explained from 0.074 to 0.089.

For CRC, we conducted a rare-variant association study using whole-exome sequencing data from 2,161 CRC patients and 3,216 age-matched cancer-free controls. We characterized the known CRC susceptibility genes, estimated effect sizes on CRC risk and the genetic variance explained by each gene, and identified a list of potential novel CRC susceptibility genes. The established associations with CRC were driven mainly by rare pathogenic variants in MLH1 (OR: 11.23) and MSH2 (OR: 9.72), with a proportion of the association attributable to variants of unknown significance in the known CRC genes.

Chapter 4 describes a new method to Leverage External Allele Frequencies (LEAF) in case-control studies to improve statistical power. LEAF detects and removes variants that are unlikely to substantially increase disease risk by comparing allele frequencies in the combined internal case-control dataset to those in external controls, without considering case-control status. Our results show that LEAF can improve statistical power in single-marker and gene-based rare-variant association studies (RVAS) in both simulated and real datasets.
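
The filtering idea behind LEAF can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes a simple one-sided binomial test, a hypothetical minimum relative risk of interest (`min_rr`), a hypothetical significance threshold (`alpha`), and made-up variant records. A variant is dropped when its allele count in the pooled internal sample (cases and controls together, status ignored) is significantly lower than would be expected if the allele raised the case frequency by at least `min_rr` relative to the external control frequency.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(0, min(k, n) + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def expected_pooled_af(external_af, n_cases, n_controls, min_rr):
    """Pooled allele frequency expected if cases carry the allele at
    min_rr times the external control frequency (illustrative assumption)."""
    case_af = min(external_af * min_rr, 1.0)
    return (n_cases * case_af + n_controls * external_af) / (n_cases + n_controls)


def filter_variants(variants, n_cases, n_controls, min_rr=2.0, alpha=0.05):
    """Return IDs of variants that remain plausible risk variants.

    Case-control labels are never used: only the pooled internal allele
    count and the external control allele frequency enter the test.
    """
    n_alleles = 2 * (n_cases + n_controls)  # diploid samples
    kept = []
    for v in variants:
        expected = expected_pooled_af(v["external_af"], n_cases, n_controls, min_rr)
        # Probability of seeing this few alleles (or fewer) if the variant
        # really had at least the assumed effect on risk.
        p_low = binom_cdf(v["internal_ac"], n_alleles, expected)
        if p_low >= alpha:
            kept.append(v["id"])
    return kept


# Hypothetical records: pooled internal allele count and external control AF.
variants = [
    {"id": "varA", "internal_ac": 20, "external_af": 0.001},
    {"id": "varB", "internal_ac": 60, "external_af": 0.01},
]
kept = filter_variants(variants, n_cases=2000, n_controls=3000)
print(kept)
```

In this toy input, `varB` is far rarer in the pooled sample than a risk allele at its external frequency would be, so it is removed before association testing. Because the filter ignores case-control status, it is intended not to bias the subsequent case-control test, while reducing the number of variants tested.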

In conclusion, we identified several potential novel disease susceptibility genes and estimated the effect sizes of rare variants in established disease risk genes for Crohn’s disease and CRC. However, many rare variants in known CRC risk genes remain poorly characterized and require further investigation. LEAF offers a potential way to improve statistical power in rare-variant association studies.


rare variant association study, Crohn’s disease, colorectal cancer, case-control study, statistical power, whole-exome sequencing


