Publication Date
3-30-2023
Journal
Cell
DOI
10.1016/j.cell.2023.02.018
PMID
37001506
PMCID
PMC10074325
PubMedCentral® Posted Date
3-30-2024
PubMedCentral® Full Text Version
Author manuscript
Published Open-Access
yes
Keywords
Epigenome, Quantitative Trait Loci, Genome-Wide Association Study, Genomics, Phenotype, Polymorphism, Single Nucleotide
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Included in
Biological Phenomena, Cell Phenomena, and Immunity Commons, Biomedical Informatics Commons, Genetics and Genomics Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons
Comments
Associated Data