
Faculty, Staff and Student Publications
Publication Date
1-1-2025
Journal
Computational and Structural Biotechnology Journal
Abstract
DNA methylations, such as 5-methylcytosine (5mC), are crucial in biological processes, and aberrant methylations are strongly linked to various human diseases. Genomic 5mC is not randomly distributed but exhibits a strong association with genomic sequences. Thus, various computational methods have been developed to predict 5mC status from DNA sequences. These methods have produced promising results and overcome limitations of experimental approaches. However, few studies have comprehensively investigated the dependency of 5mC on genomic sequences, and most existing methods focus on specific genomic regions. In this work, we introduce Deep5mC, a transformer-based deep learning method designed to predict 5mC methylations. Deep5mC leverages long-range dependencies within genomic sequences to estimate the probability of cytosine methylation. In cross-chromosome evaluation, Deep5mC achieves a Matthews correlation coefficient above 0.86 and an F1-score above 0.93, substantially outperforming state-of-the-art methods. Deep5mC not only confirms the influence of long-range sequence context on 5mC prediction but also paves the way for further study of 5mC-sequence dependency across species and in human diseases.
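For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how the Matthews correlation coefficient and the F1-score are conventionally computed for binary labels (1 = methylated, 0 = unmethylated). The function names and the toy labels are illustrative assumptions and are not taken from Deep5mC or its evaluation data.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example (made-up labels, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike F1, the Matthews correlation coefficient also accounts for true negatives, which is presumably why both are reported when methylated and unmethylated sites are imbalanced.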
Keywords
DNA methylation prediction, Deep learning, Association of 5mC and genomic sequences
DOI
10.1016/j.csbj.2025.02.007
PMID
40041569
PMCID
PMC11879672
PubMedCentral® Posted Date
2-14-2025
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Graphical Abstract