Publication Date
12-22-2025
Journal
Nature Communications
DOI
10.1038/s41467-025-67237-y
PMID
41430046
PMCID
PMC12748580
PubMedCentral® Posted Date
12-22-2025
PubMedCentral® Full Text Version
Post-print
Abstract
Variant calling with long-read RNA sequencing (lrRNA-seq) helps to analyze full-length isoforms and gene expression but is complicated by high error rates, transcript diversity, and RNA editing events, among other factors. Here, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data. Building upon the Clair series' pipelines, Clair3-RNA improves lrRNA-seq variant calling through optimized techniques such as uneven coverage normalization, refined training data, editing site discovery, and haplotype phasing. Clair3-RNA supports various platforms, including PacBio, ONT complementary DNA sequencing (cDNA), and ONT direct RNA sequencing (dRNA). Clair3-RNA achieved a ~91% SNP F1-score on the ONT platform using the latest ONT SQK-RNA004 kit (dRNA004) and a ~92% SNP F1-score in PacBio Iso-Seq and MAS-Seq for variants with at least 4x coverage. With at least 10x coverage and disregarding zygosity, the performance reached ~95% and ~96% F1-score for ONT and PacBio, respectively. After phasing, the performance reached ~97% for ONT and ~98% for PacBio. Across GIAB samples, Clair3-RNA consistently outperformed existing callers and accurately identified RNA editing sites. Clair3-RNA is open-source (https://github.com/HKU-BAL/Clair3-RNA).
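The abstract reports F1-scores both with and without regard to zygosity. As a rough illustration of what that distinction means, the sketch below computes precision, recall, and F1 from two sets of variant records. It is a simplification under stated assumptions, not the evaluation procedure used in the paper: the `Variant` type, the `f1_score` function, and the toy records are all hypothetical, and real benchmarking tools handle variant representation and genotype mismatches in more detail.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a Clair3-RNA data structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str  # e.g. "0/1" (heterozygous) or "1/1" (homozygous alternate)


def f1_score(truth: Iterable[Variant], calls: Iterable[Variant],
             ignore_zygosity: bool = False) -> float:
    """Return the F1-score of `calls` against `truth`.

    With ignore_zygosity=True a call counts as correct when site and allele
    match, even if the genotype differs. Otherwise the genotype must match too.
    """
    def key(v: Variant) -> tuple:
        site = (v.chrom, v.pos, v.ref, v.alt)
        return site if ignore_zygosity else site + (v.genotype,)

    truth_keys = {key(v) for v in truth}
    call_keys = {key(v) for v in calls}

    tp = len(truth_keys & call_keys)   # called and in the truth set
    fp = len(call_keys - truth_keys)   # called but not in the truth set
    fn = len(truth_keys - call_keys)   # in the truth set but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration only).
truth = [
    Variant("chr1", 100, "A", "G", "0/1"),
    Variant("chr1", 200, "C", "T", "1/1"),
    Variant("chr1", 300, "G", "A", "0/1"),
]
calls = [
    Variant("chr1", 100, "A", "G", "0/1"),  # exact match
    Variant("chr1", 200, "C", "T", "0/1"),  # right allele, wrong zygosity
    Variant("chr1", 300, "G", "A", "0/1"),  # exact match
    Variant("chr1", 400, "T", "C", "0/1"),  # false positive
]

strict_f1 = f1_score(truth, calls)
relaxed_f1 = f1_score(truth, calls, ignore_zygosity=True)
print(f"strict F1:  {strict_f1:.3f}")
print(f"relaxed F1: {relaxed_f1:.3f}")
```

In this toy case the call with the wrong zygosity counts as both a false positive and a false negative under strict matching, but as a true positive once zygosity is disregarded, so the relaxed score is higher.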
Keywords
Deep Learning; Humans; Sequence Analysis, RNA; Polymorphism, Single Nucleotide; Software; RNA Editing; High-Throughput Nucleotide Sequencing
Published Open-Access
yes
Recommended Citation
Zheng, Zhenxian; Yu, Xian; Chen, Lei; et al., "Clair3-RNA: A Deep Learning-Based Small Variant Caller for Long-Read RNA Sequencing Data" (2025). Faculty, Staff and Students Publications. 6258.
https://digitalcommons.library.tmc.edu/baylor_docs/6258