Detect genome-wide polymorphism patterns with NGS data

Jin Yu, The University of Texas School of Public Health


With next-generation sequencing (NGS) technologies, a nearly complete set of genetic variants can be sequenced quickly in a large number of genomes. The availability of this enormous volume of sequencing data provides unprecedented opportunities to understand the nature of genomes and their association with human diseases and biological functions. However, genome-wide association studies (GWAS) remain extremely challenging, and in most cases the candidate disease-causing genes explain only a very small proportion of heritability. A fundamental cause of these challenges is the large number and complex patterns of genetic variants. Although local polymorphism patterns have been well studied for common genetic variants and are routinely used in GWAS, genome-wide polymorphism patterns across the full allele frequency spectrum are still largely unknown. In this study, we developed two methods to detect genome-wide polymorphism patterns: (1) a genome-wide genotype profile scan and (2) a functional canonical correlation scan. We applied these methods to 77 million SNP sites sequenced in 2504 individuals from 26 populations in the 1000 Genomes Project Phase 3 data set. The study yielded several notable findings. For example, at least half of the SNPs are completely linked with other SNPs, and the average number of SNPs linked together is about 4 in European populations. Most interestingly, these linked sites can be as far as 1 Mbp apart for rare variants, even for SNPs in protein-coding regions. We also found small regions harboring dozens of SNPs privately shared by a small subset of individuals. Finally, we constructed a correlation map for all gene pairs on chromosome 20 and found segments exhibiting strong population differentiation.
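
The notion of "completely linked" SNPs can be illustrated with a minimal sketch. It assumes, for illustration only, that two SNPs count as completely linked when their genotype vectors across all individuals are identical; the abstract does not specify the definition or the algorithm used in the dissertation, so this is not the author's method. The function name, the 0/1/2 genotype coding, and the toy data are all hypothetical.

```python
from collections import defaultdict


def group_identical_profiles(genotypes):
    """Group SNP ids whose genotype profiles are identical.

    genotypes: dict mapping a SNP id to a sequence of genotype codes
    (assumed coding: 0, 1, 2 = copies of the alternate allele), with one
    entry per individual in the same individual order for every SNP.
    Returns a list of groups, each holding two or more SNP ids.
    """
    groups = defaultdict(list)
    for snp_id, profile in genotypes.items():
        # The full profile serves as the grouping key (assumed definition).
        groups[tuple(profile)].append(snp_id)
    # Keep only profiles shared by at least two SNPs.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical toy data: 6 SNPs genotyped in 4 individuals.
toy = {
    "snp_a": [0, 1, 0, 2],
    "snp_b": [0, 1, 0, 2],
    "snp_c": [1, 1, 0, 0],
    "snp_d": [0, 1, 0, 2],
    "snp_e": [1, 1, 0, 0],
    "snp_f": [2, 0, 0, 0],
}

linked = group_identical_profiles(toy)
n_linked = sum(len(g) for g in linked)
fraction_linked = n_linked / len(toy)   # share of SNPs linked to at least one other SNP
mean_group_size = n_linked / len(linked)  # average number of SNPs linked together
```

Grouping by a hashed profile takes a single pass over the SNPs and places no limit on the physical distance between grouped sites, which is why a scan of this kind could in principle surface linked sites far apart on a chromosome. Summary statistics such as `fraction_linked` and `mean_group_size` correspond in spirit to the "at least half" and "about 4" figures reported above, though the toy values carry no biological meaning.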

Recommended Citation

Yu, Jin, "Detect genome-wide polymorphism patterns with NGS data" (2015). Texas Medical Center Dissertations (via ProQuest). AAI1597646.