Publication Date
2-26-2024
Journal
Cell Reports Methods
DOI
10.1016/j.crmeth.2024.100707
PMID
38325383
PMCID
PMC10921021
PubMedCentral® Posted Date
2-6-2024
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Keywords
Humans, Polyadenylation, RNA-Seq, RNA, Deep Learning, Sequence Analysis, RNA, Algorithms, alternative polyadenylation (APA), post-transcriptional regulation, deep learning, large language model (LLM), bioinformatics, computational biology, gene regulation
Abstract
Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism, yet its regulation and impact on human diseases remain understudied. Existing bulk RNA sequencing (RNA-seq)-based APA methods predominantly rely on predefined annotations, severely limiting their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they only account for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherently noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Evaluation on multiple datasets shows that PolyAMiner-Bulk accurately identifies more APA changes than other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm of APA analysis.
Included in
Diseases Commons, Medical Genetics Commons, Neurology Commons, Neurosciences Commons, Pediatrics Commons