Publication Date
7-1-2020
Journal
RNA
DOI
10.1261/rna.074161.119
PMID
32220894
PMCID
PMC7297119
PubMedCentral® Posted Date
7-26-2020
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Keywords
3' Untranslated Regions; Alternative Splicing; Cell Line, Tumor; Cell Nucleus; Computational Biology; Cytoplasm; HeLa Cells; Humans; K562 Cells; RNA; Sequence Analysis, RNA; Transcriptome; localization mechanism; machine learning model; RNA localization; splicing in localization
Abstract
Subcellular localization is essential to RNA biogenesis, processing, and function across the gene expression life cycle. However, the specific nucleotide sequence motifs that direct RNA localization are incompletely understood. Fortunately, new sequencing technologies have provided transcriptome-wide atlases of RNA localization, creating an opportunity to leverage computational modeling. Here we present RNA-GPS, a new machine learning model that uses nucleotide-level features to predict RNA localization across eight different subcellular locations, the first to provide such a wide range of predictions. RNA-GPS's design enables high-throughput sequence ablation and feature importance analyses to probe the sequence motifs that drive localization prediction. We find localization-informative motifs to be concentrated on 3'-UTRs and scattered along the coding sequence, and motifs related to splicing to be important drivers of predicted localization, even for cytotopic distinctions for membraneless bodies within the nucleus or for organelles within the cytoplasm. Overall, our results suggest transcript splicing is one of many elements influencing RNA subcellular localization.
Included in
Biomedical Informatics Commons, Genetic Phenomena Commons, Medical Genetics Commons, Medical Specialties Commons
Comments
Associated Data