Identification of Cancer Driver Genes Through a Gene-based Permutation Approach

Alice Blanche Sylvie Djotsa Nono, The University of Texas School of Public Health

Abstract

Background: Identifying cancer “driver” genes (CDGs) is a crucial step in cancer genomics toward the advancement of precision medicine. However, driver gene discovery is very challenging: it involves not only huge amounts of data but also the complexity of the disease, including the heterogeneity of the background somatic mutation rate across cancer patients. It is generally accepted that CDGs harbor positively selected variants that confer a growth advantage on the malignant cell and are critical to cancer development, whereas non-driver genes harbor random mutations with no functional consequence for cancer. On this premise, function-prediction-based approaches for identifying CDGs have been proposed to interrogate the distribution of functional predictions among mutations in cancer genomes (Djotsa Nono et al., 2016). Assuming that most observed mutations are passenger mutations, and given quantitative predictions of the functional impact of the mutations, genes enriched for functional or deleterious mutations are more likely to be drivers. These methods have been continually refined and can therefore be applied to increase accuracy in detecting new candidate CDGs. However, current function-prediction-based approaches focus only on coding mutations and lack a systematic way to choose the best mutation deleteriousness prediction algorithm to use.

Results: In this study, we propose a new function-prediction-based approach to discover CDGs through a gene-based permutation approach. Our method not only covers both coding and non-coding regions of the genes but also accounts for the heterogeneous mutational context in a cohort of cancer patients. The permutation model was implemented independently using seven popular deleteriousness prediction scores covering splicing regions (SPIDEX), coding regions (MetaLR and VEST3) and the pan-genome (CADD, DANN, Fathmm-MKL coding and Fathmm-MKL noncoding). We applied this new approach to somatic single nucleotide variants (SNVs) from whole-genome sequences of five cancer types: 119 breast, 24 lung, 88 liver, 100 medulloblastoma and 101 pilocytic astrocytoma cancer patients. We also compared the performance of the seven deleteriousness prediction scores across the five cancer tissue types.

Conclusion: The new function-prediction-based approach predicted not only known cancer genes listed in the Cancer Gene Census (COSMIC database) but also new candidate CDGs that are worth further investigation. The results showed the advantage of utilizing pan-genome deleteriousness prediction scores in function-prediction-based methods. The four top-ranked methods across the five cancer types are Fathmm-MKL coding, CADD, VEST3 and Fathmm-MKL noncoding.
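
The abstract does not spell out the permutation model, so the following is only a minimal illustrative sketch of the general idea of a gene-based permutation test, not the dissertation's implementation. It assumes a hypothetical function `gene_permutation_pvalue` that takes the deleteriousness scores of the mutations observed in one gene and the scores of all mutations that could occur in that gene, and asks how often randomly placed mutations look at least as deleterious on average as the observed ones. Matching random draws to each patient's mutational context, which the abstract says the method accounts for, is not modeled here.

```python
import random


def gene_permutation_pvalue(observed_scores, background_scores,
                            n_perm=10000, seed=0):
    """Empirical one-sided p-value for enrichment of deleterious mutations.

    Hypothetical helper for illustration only.
    observed_scores:   scores of the somatic mutations seen in the gene.
    background_scores: scores of every possible mutation in the gene
                       (the pool that random "passenger" mutations are
                       assumed to be drawn from).
    """
    rng = random.Random(seed)
    k = len(observed_scores)
    observed_mean = sum(observed_scores) / k

    hits = 0
    for _ in range(n_perm):
        # Place k mutations at random in the gene and score them.
        sample = rng.choices(background_scores, k=k)
        if sum(sample) / k >= observed_mean:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy scores, invented for the example: most possible mutations
    # are benign (low score), a few are highly deleterious.
    background = [0.05] * 95 + [0.99] * 5
    observed = [0.95, 0.97, 0.99]
    print(gene_permutation_pvalue(observed, background, n_perm=2000))
```

A small p-value under this sketch would flag the gene as a candidate driver; in practice the p-values for all genes would also need correction for multiple testing.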

Subject Area

Genetics|Bioinformatics|Epidemiology

Recommended Citation

Nono, Alice Blanche Sylvie Djotsa, "Identification of Cancer Driver Genes Through a Gene-based Permutation Approach" (2018). Texas Medical Center Dissertations (via ProQuest). AAI10930909.
https://digitalcommons.library.tmc.edu/dissertations/AAI10930909
