Publication Date
5-10-2024
Journal
Science Advances
DOI
10.1126/sciadv.adj1424
PMID
38718126
PMCID
PMC11078195
PubMedCentral® Posted Date
5-8-2024
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Keywords
Humans, Algorithms, Computational Biology, Databases, Genetic, Genetic Predisposition to Disease, Genome-Wide Association Study, Genomics, Neural Networks, Computer, Phenomics, Phenotype, UK Biobank, United Kingdom
Abstract
The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.
Included in
Biological Phenomena, Cell Phenomena, and Immunity Commons, Biomedical Informatics Commons, Genetics and Genomics Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons