In silico prediction of splice-altering single nucleotide variants in the human genome
Abstract
In silico tools have been developed to predict mutations that may affect pre-mRNA splicing. The major obstacle to using these tools is the difficulty of interpreting their output. One reason is that most tools only output prediction scores for potential splice sites in a given DNA sequence and do not directly indicate whether splicing signals change when one allele is replaced by another; another reason is the lack of large-scale evaluation studies of these tools. To help close this interpretation gap, three aims were proposed and achieved. Specifically, (1) I compared eight in silico tools on 2,959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis with ten-fold cross-validation. The Position Weight Matrix model and MaxEntScan outperformed the other methods. (2) Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that combine the strengths of the individual methods and produce directly interpretable prediction scores. Both ensemble methods significantly improved prediction and were validated on an additional test set. (3) Using these two ensemble methods, I pre-computed prediction scores for all potential scSNVs across the human genome and applied them to the scSNVs from the Catalogue of Somatic Mutations in Cancer database. The analysis showed that predicted splice-altering scSNVs are enriched among recurrent scSNVs and in known cancer genes. These results demonstrate that some in silico methods are powerful tools for predicting splice-altering variants, and that ensemble methods can further improve prediction.
The pre-computed prediction scores for all potential scSNVs across the human genome provide a genome-wide resource for identifying splice-altering scSNVs discovered in large-scale sequencing studies. This resource should significantly facilitate the prediction and detection of splicing defects in both basic research and clinical settings, and thus contribute to identifying new targets for gene therapy and newborn screening.
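The interpretation gap described in the abstract (a tool scores a splice site in one sequence, but does not report how the score changes between alleles) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the three-position matrix, its frequencies, the uniform background, and the function names `pwm_score` and `delta_score` are all hypothetical and chosen only to show how a position weight matrix score could be turned into a reference-versus-alternate difference.

```python
import math

# Hypothetical toy position weight matrix: one dict of base frequencies per
# position. The values are invented for illustration, not taken from any
# published splice-site model.
TOY_PWM = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def pwm_score(seq, pwm=TOY_PWM, background=BACKGROUND):
    """Sum of per-position log2 odds of the observed base versus background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(seq))


def delta_score(ref_seq, pos, alt, pwm=TOY_PWM, background=BACKGROUND):
    """Score of the alternate allele minus score of the reference allele.

    A negative value means the substitution weakens the modelled signal.
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return pwm_score(alt_seq, pwm, background) - pwm_score(ref_seq, pwm, background)


if __name__ == "__main__":
    # Substituting the middle G with A in the toy window "GGT".
    print(delta_score("GGT", 1, "A"))
```

A per-variant difference of this kind is one plausible input feature for an ensemble classifier; which features the dissertation's models actually used is not stated in the abstract.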
Subject Area
Genetics|Bioinformatics|Epidemiology
Recommended Citation
Jian, Xueqiu, "In silico prediction of splice-altering single nucleotide variants in the human genome" (2014). Texas Medical Center Dissertations (via ProQuest). AAI3665015.
https://digitalcommons.library.tmc.edu/dissertations/AAI3665015