Dissertations & Theses (Open Access)

Date of Award

Spring 5-2020

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Wenyaw Chan, PhD

Second Advisor

Peng Wei, PhD

Third Advisor

Bing Yu, PhD

Abstract

Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the TOPMed program, WGS-based association analysis of complex traits remains a tremendous challenge. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, may help distinguish causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. Our ability to select informative annotations a priori is limited, and incorporating non-informative annotations introduces noise and reduces power. In the first part of this dissertation, we propose FunSPU, a versatile and adaptive test that incorporates multiple biological annotations. In addition to extensive simulations, we illustrate the proposed test on the TWINSUK cohort of the UK10K WGS data using six functional annotations. We identified genome-wide significant genetic loci on chromosome 19 near the genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which were replicated in the UK10K ALSPAC cohort (n=1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge.

Individual-level genetic data is not always accessible due to privacy concerns. Instead, summary association statistics from large-scale meta-analyses of genome-wide association studies (GWASs) are widely available. In the second part of this dissertation, we extend the adaptive tests incorporating functional annotations to summary statistics (FunSPUs). We show that our test identifies more significant genes than the corresponding annotation-ignorant tests. Moreover, from a smaller meta-analysis of GWASs (n=94,595) we obtained several genome-wide significant loci associated with high-density lipoprotein (HDL) levels that were subsequently reported by a follow-up meta-analysis with a larger sample size (n=188,577).

In the third part, we propose to evaluate the performance of functional annotations by partitioning the heritability of complex traits, focusing on rare variants from WGS data. Our proposed method is phenotype-specific and requires no “gold standard” variants. We used the Atherosclerosis Risk in Communities Study (ARIC) WGS data to estimate heritability and evaluated the performance of 12 functional annotations, including conservation scores and ensemble deleteriousness prediction scores.
