Faculty, Staff and Students Publications

The Landscape of Tolerated Genetic Variation in Humans and Primates

Authors

Hong Gao
Tobias Hamp
Jeffrey Ede
Joshua G Schraiber
Jeremy McRae
Moriel Singer-Berk
Yanshen Yang
Anastasia S D Dietrich
Petko P Fiziev
Lukas F K Kuderna
Laksshman Sundaram
Yibing Wu
Aashish Adhikari
Yair Field
Chen Chen
Serafim Batzoglou
Francois Aguet
Gabrielle Lemire
Rebecca Reimers
Daniel Balick
Mareike C Janiak
Martin Kuhlwilm
Joseph D Orkin
Shivakumara Manu
Alejandro Valenzuela
Juraj Bergman
Marjolaine Rousselle
Felipe Ennes Silva
Lidia Agueda
Julie Blanc
Marta Gut
Dorien de Vries
Ian Goodhead
R Alan Harris
Muthuswamy Raveendran
Axel Jensen
Idriss S Chuma
Julie E Horvath
Christina Hvilsom
David Juan
Peter Frandsen
Fabiano R de Melo
Fabrício Bertuol
Hazel Byrne
Iracilda Sampaio
Izeni Farias
João Valsecchi do Amaral
Mariluce Messias
Maria N F da Silva
Mihir Trivedi
Rogerio Rossi
Tomas Hrbek
Nicole Andriaholinirina
Clément J Rabarivola
Alphonse Zaramody
Clifford J Jolly
Jane Phillips-Conroy
Gregory Wilkerson
Christian Abee
Joe H Simmons
Eduardo Fernandez-Duque
Sree Kanthaswamy
Fekadu Shiferaw
Dongdong Wu
Long Zhou
Yong Shao
Guojie Zhang
Julius D Keyyu
Sascha Knauf
Minh D Le
Esther Lizano
Stefan Merker
Arcadi Navarro
Thomas Bataillon
Tilo Nadler
Chiea Chuen Khor
Jessica Lee
Patrick Tan
Weng Khong Lim
Andrew C Kitchener
Dietmar Zinner
Ivo Gut
Amanda Melin
Katerina Guschanski
Mikkel Heide Schierup
Robin M D Beck
Govindhaswamy Umapathy
Christian Roos
Jean P Boubli
Monkol Lek
Shamil Sunyaev
Anne O'Donnell-Luria
Heidi L Rehm
Jinbo Xu
Jeffrey Rogers
Tomas Marques-Bonet
Kyle Kai-How Farh

Language

English

Publication Date

6-2-2023

Journal

Science

DOI

10.1126/science.abn8197

PMID

37262156

PMCID

PMC10713091

PubMedCentral® Posted Date

12-11-2023

PubMedCentral® Full Text Version

Post-print

Abstract

INTRODUCTION:

Millions of people have received genome and exome sequencing to date, a collective effort that has illuminated for the first time the vast catalog of small genetic differences that distinguish us as individuals within our species. However, the effects of most of these genetic variants remain unknown, limiting their clinical utility and actionability. New approaches that can accurately discern disease-causing from benign mutations and interpret genetic variants on a genome-wide scale would constitute a meaningful initial step towards realizing the potential of personalized genomic medicine.

RATIONALE:

As a result of the short evolutionary distance between humans and nonhuman primates, our proteins share near-perfect amino acid sequence identity. Hence, the effects of a protein-altering mutation found in one species are likely to be concordant in the other species. By systematically cataloging common variants of nonhuman primates, we aimed to annotate these variants as being unlikely to cause human disease as they are tolerated by natural selection in a closely related species. Once collected, the resulting resource may be applied to infer the effects of unobserved variants across the genome using machine learning.

RESULTS:

Following the strategy outlined above, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and cataloged 4.3 million common missense variants. We confirmed that human missense variants seen in at least one nonhuman primate species were annotated as benign in the ClinVar clinical variant database in 99% of cases. By contrast, common variants from mammals and vertebrates outside the primate lineage were substantially less likely to be benign in the ClinVar database (71 to 87% benign), restricting this strategy to nonhuman primates. Overall, we reclassified more than 4 million human missense variants of previously unknown consequence as likely benign, resulting in a greater than 50-fold increase in the number of annotated missense variants compared to existing clinical databases.
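The concordance check described above (what share of primate-observed variants carry a benign ClinVar annotation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `benign_fraction`, the variant identifiers, and the labels are all made up for illustration, and a real analysis would also need allele-frequency filters and ClinVar review-status handling that the abstract does not specify.

```python
from typing import Dict, Iterable


def benign_fraction(observed: Iterable[str], clinvar: Dict[str, str]) -> float:
    """Among observed variants that have a ClinVar label, return the benign share."""
    # Keep only variants that ClinVar has annotated; unlabeled ones are ignored.
    labeled = [clinvar[v] for v in set(observed) if v in clinvar]
    if not labeled:
        raise ValueError("no observed variant has a ClinVar label")
    return sum(label == "benign" for label in labeled) / len(labeled)


# Hypothetical toy data, not taken from the study.
clinvar_labels = {
    "GENE1:p.A10V": "benign",
    "GENE1:p.R25W": "pathogenic",
    "GENE2:p.G5S": "benign",
    "GENE3:p.L7P": "pathogenic",
}
primate_variants = ["GENE1:p.A10V", "GENE2:p.G5S", "GENE4:p.T3M"]

print(benign_fraction(primate_variants, clinvar_labels))
```

Running the same function on variant sets from more distant species would give the kind of comparison the abstract reports, where the benign share falls outside the primate lineage.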

To infer the pathogenicity of the remaining missense variants in the human genome, we constructed PrimateAI-3D, a semisupervised 3D-convolutional neural network that operates on voxelized protein structures. We trained PrimateAI-3D to separate common primate variants from matched control variants in 3D space as a semisupervised learning task. We evaluated the trained PrimateAI-3D model alongside 15 other published machine learning methods on their ability to distinguish between benign and pathogenic variants in six different clinical benchmarks and demonstrated that PrimateAI-3D outperformed all other classifiers in each of the tasks.
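To make "voxelized protein structures" concrete, here is a minimal sketch of one way to turn atom coordinates into a 3D grid centered on a residue of interest. It is an assumption-laden illustration, not PrimateAI-3D's actual featurization: the function `voxelize`, the grid size, the voxel width, and the simple per-voxel atom count are all hypothetical choices, since the abstract does not describe the input encoding.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize(
    atoms: Sequence[Point],
    center: Point,
    grid_size: int = 3,
    voxel_width: float = 2.0,
) -> List[List[List[int]]]:
    """Count atoms per voxel in a cubic grid centered on `center`."""
    half = grid_size * voxel_width / 2.0
    grid = [
        [[0] * grid_size for _ in range(grid_size)]
        for _ in range(grid_size)
    ]
    for atom in atoms:
        index = []
        for coord, mid in zip(atom, center):
            # Shift so the grid spans [0, grid_size * voxel_width) on each axis.
            i = int((coord - mid + half) // voxel_width)
            if not 0 <= i < grid_size:
                break  # atom lies outside the grid on this axis
            index.append(i)
        else:
            x, y, z = index
            grid[x][y][z] += 1
    return grid


# Hypothetical coordinates in arbitrary length units.
atoms = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(sum(v for plane in grid for row in plane for v in row))
```

A grid like this is the kind of input a 3D convolutional network can consume; the third atom falls outside the grid and is dropped, which is the usual trade-off of a fixed-size local window.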

CONCLUSION:

Our study addresses one of the key challenges in the variant interpretation field, namely, the lack of sufficient labeled data to effectively train large machine learning models. By generating the most comprehensive primate sequencing dataset to date and pairing this resource with a deep learning architecture that leverages 3D protein structures, we were able to achieve meaningful improvements in variant effect prediction across multiple clinical benchmarks.

Keywords

Animals, Humans, Base Sequence, Gene Frequency, Genetic Variation, Primates, Whole Genome Sequencing

Published Open-Access

yes

Recommended Citation

Gao, Hong; Hamp, Tobias; Ede, Jeffrey; et al., "The Landscape of Tolerated Genetic Variation in Humans and Primates" (2023). Faculty, Staff and Students Publications. 2271.
https://digitalcommons.library.tmc.edu/baylor_docs/2271

Included in

Biological Phenomena, Cell Phenomena, and Immunity Commons, Biomedical Informatics Commons, Genetics and Genomics Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons
