Publication Date
2-21-2023
Journal
Genome Biology
DOI
10.1186/s13059-023-02863-7
PMID
36810122
PMCID
PMC9942314
PubMedCentral® Posted Date
2-21-2023
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Keywords
Humans; Genomics; Genome, Human; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Reference; GRCh38; T2T-CHM13; Variant; SNV; INDEL; Medically relevant genes; Remapping; GIAB; eQTL
Abstract
The current version of the human reference genome, GRCh38, contains a number of errors, including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact variant calling in 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome. Applied to an existing alignment file, the approach improves the subsequent analysis across these genes within minutes while maintaining the same coordinates. We showcase these improvements on multi-ethnic control samples, demonstrating benefits for population variant calling as well as eQTL studies.
Included in
Biological Phenomena, Cell Phenomena, and Immunity Commons, Biomedical Informatics Commons, Genetics and Genomics Commons, Medical Genetics Commons, Medical Molecular Biology Commons, Medical Specialties Commons
Comments
Associated Data