Faculty, Staff and Student Publications
Language
English
Publication Date
1-3-2026
Journal
Bioinformatics
DOI
10.1093/bioinformatics/btaf655
PMID
41652996
PMCID
PMC12881829
PubMedCentral® Posted Date
12-5-2023
PubMedCentral® Full Text Version
Post-print
Abstract
Motivation: Single-cell RNA sequencing (scRNA-Seq) enables detailed exploration of gene expression at the level of individual cells, which is crucial for annotating cell types and understanding cellular diversity. Traditional cell-type annotation methods often rely on marker genes and manual labeling, and they are hampered by low data quality and incomplete reference datasets.
Results: We developed CeLLTra, a novel contrastive learning framework that leverages a Transformer-based model integrating biological pathway information to group genes into super tokens, effectively capturing comprehensive gene expression from scRNA-Seq data. By combining this pathway-informed Transformer with a pretrained domain-specific language model, CeLLTra accurately aligns cell-type annotations with gene expression profiles. Evaluations on a large-scale human scRNA-Seq dataset showed that CeLLTra significantly outperformed state-of-the-art methods in supervised and zero-shot cell-type prediction. Additionally, CeLLTra generalized well to external datasets, improving clustering performance and enabling better characterization of cancerous cell states in tumor-infiltrating myeloid cells from non-small cell lung cancer patients.
Availability and implementation: CeLLTra is freely available on GitHub (https://github.com/WJZheng-group/CeLLTra) and Zenodo (https://doi.org/10.5281/zenodo.17666735). The datasets underlying this article, GSE201333 and GSE127465, are publicly available from the Gene Expression Omnibus repository.
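Illustrative sketch (not the authors' code): the Results paragraph describes three ideas, namely grouping genes into pathway-level super tokens, contrastively aligning expression embeddings with cell-type name embeddings, and zero-shot prediction. The NumPy example below shows one minimal way these pieces could fit together. Everything in it is an assumption made for illustration: the function names, the mean pooling over pathway member genes, the symmetric InfoNCE-style loss, and the temperature value are not taken from the paper, and the text embeddings are plain arrays standing in for the output of a pretrained language model. The actual implementation is in the GitHub repository listed above.

```python
import numpy as np


def pathway_super_tokens(expr, gene_names, pathways):
    """Pool per-gene expression into one value per pathway.

    expr: (n_cells, n_genes) array.
    pathways: dict mapping a pathway name to a list of gene symbols.
    Mean pooling is an illustrative assumption, not the paper's method.
    """
    index = {g: i for i, g in enumerate(gene_names)}
    tokens = np.zeros((expr.shape[0], len(pathways)))
    for j, genes in enumerate(pathways.values()):
        cols = [index[g] for g in genes if g in index]  # skip unmeasured genes
        if cols:
            tokens[:, j] = expr[:, cols].mean(axis=1)
    return tokens


def l2_normalize(x):
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def log_softmax_rows(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(cell_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss; row i of each input is a matched pair."""
    logits = l2_normalize(cell_emb) @ l2_normalize(text_emb).T / temperature
    diag = np.arange(logits.shape[0])
    cell_to_text = -log_softmax_rows(logits)[diag, diag].mean()
    text_to_cell = -log_softmax_rows(logits.T)[diag, diag].mean()
    return 0.5 * (cell_to_text + text_to_cell)


def zero_shot_predict(cell_emb, label_emb, labels):
    """Assign each cell the label whose embedding has highest cosine similarity."""
    sims = l2_normalize(cell_emb) @ l2_normalize(label_emb).T
    return [labels[i] for i in sims.argmax(axis=1)]
```

In this sketch, zero-shot prediction needs only an embedding for each candidate cell-type name, so labels absent from training can still be scored, provided the text encoder places them sensibly in the shared space.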
Keywords
Humans, Single-Cell Analysis, Gene Expression Profiling, Software, Sequence Analysis, RNA, Computational Biology, Transcriptome, Algorithms, Cell type annotation, scRNA-Seq, Artificial Intelligence, Pathway informed transformer, Deep learning
Published Open-Access
yes
Recommended Citation
Li, Zhao; Zheng, Zaiyi; Li, Rongbin; et al., "CeLLTra: Aligning Cell Names With Gene Expression via a Pathway-Informed Transformer" (2026). Faculty, Staff and Student Publications. 814.
https://digitalcommons.library.tmc.edu/uthshis_docs/814