Statistical methods for incorporating biological knowledge into association tests of sequencing data
Abstract
Recently, many rare variant analysis methods have been proposed. However, each method has its own advantages and disadvantages depending on the properties of the data, so there is no uniformly most powerful test for rare variant analysis. In this work, I propose a statistical framework to improve the statistical power of existing rare variant analysis methods. Specifically, I incorporate computational biological knowledge into existing rare variant analysis methods. For whole exome sequencing (WES) data, the sources of biological knowledge I use are SIFT, PolyPhen-2, PhyloP, and GERP++. For whole genome sequencing (WGS) data, I use RegulomeDB, which is based on the Encyclopedia of DNA Elements (ENCODE) project. In addition, because RegulomeDB scores are categorized into 6 levels, I propose transforming the categories into numerical scores that can be used as weights in association tests of WGS data. I evaluate and compare the proposed methods with existing methods using extensive simulation studies as well as applications to the Genetic Analysis Workshop (GAW) 17 mini-exome sequencing data and the GAW 19 WGS data. I also show how to combine multiple sources of biological knowledge and discuss how extreme scores from the transformation of categories can lead to false positive discoveries.
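The general idea of using functional annotation as variant weights can be illustrated with a minimal sketch. It assumes a weighted burden test (with no covariates) as the underlying rare variant method and a simple linear mapping from RegulomeDB's six categories to weights. The names `REGULOMEDB_WEIGHTS`, `weighted_burden`, and `burden_score_test` are hypothetical and introduced only for illustration. This is not the dissertation's implementation, and its actual transformation of categories may differ.

```python
import math

# HYPOTHETICAL transform of RegulomeDB's six top-level categories
# (1 = strongest evidence of regulatory function, 6 = weakest) into weights in (0, 1].
# The linear rule (7 - level) / 6 is an illustrative assumption only.
REGULOMEDB_WEIGHTS = {level: (7 - level) / 6 for level in range(1, 7)}


def weighted_burden(genotypes, weights):
    """Per-individual burden B_i = sum_j w_j * G_ij.

    genotypes: one row per individual, holding minor-allele counts (0, 1, 2)
    weights:   one weight per variant (e.g. a transformed functional score)
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_score_test(genotypes, weights, phenotype):
    """Score test of H0: beta = 0 in y_i = alpha + beta * B_i + e_i (no covariates).

    Returns (z, two-sided p-value) using the normal approximation.
    """
    n = len(phenotype)
    burden = weighted_burden(genotypes, weights)
    b_bar = sum(burden) / n
    y_bar = sum(phenotype) / n
    # Score for beta with alpha estimated under H0 (up to a factor of 1 / sigma^2).
    u = sum((b - b_bar) * (y - y_bar) for b, y in zip(burden, phenotype))
    s_bb = sum((b - b_bar) ** 2 for b in burden)
    sigma2 = sum((y - y_bar) ** 2 for y in phenotype) / n  # null MLE of error variance
    if s_bb == 0 or sigma2 == 0:
        raise ValueError("burden or phenotype has zero variance")
    z = u / math.sqrt(sigma2 * s_bb)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


if __name__ == "__main__":
    # Toy data: 4 individuals x 3 variants with hypothetical RegulomeDB categories 1, 2, 5.
    categories = [1, 2, 5]
    w = [REGULOMEDB_WEIGHTS[c] for c in categories]
    G = [[0, 0, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
    y = [0.1, 1.2, 0.9, 2.0]
    print(burden_score_test(G, w, y))
```

Because each weight rescales a variant's contribution to the burden, a very large weight lets a single variant dominate the statistic. This gives one intuition, under the assumptions of this sketch, for why extreme transformed scores call for caution.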
Subject Area
Biostatistics|Genetics
Recommended Citation
Kim, Taebeom, "Statistical methods for incorporating biological knowledge into association tests of sequencing data" (2014). Texas Medical Center Dissertations (via ProQuest). AAI3689775.
https://digitalcommons.library.tmc.edu/dissertations/AAI3689775