Faculty, Staff and Student Publications

Language

English

Publication Date

1-3-2026

Journal

Bioinformatics

DOI

10.1093/bioinformatics/btaf655

PMID

41652996

PMCID

PMC12881829

PubMedCentral® Posted Date

12-5-2023

PubMedCentral® Full Text Version

Post-print

Abstract

Motivation: Single-cell RNA sequencing (scRNA-Seq) enables detailed exploration of gene expression at the level of individual cells, which is crucial for annotating cell types and understanding cellular diversity. Traditional methods for cell type annotation often rely on marker genes and manual labeling, and they are hampered by low data quality and incomplete reference datasets.

Results: We developed CeLLTra, a novel contrastive learning framework that leverages a Transformer-based model integrating biological pathway information to group genes into super tokens, effectively capturing comprehensive gene expression from scRNA-Seq data. By combining this pathway-informed Transformer with a pretrained domain-specific language model, CeLLTra accurately aligns cell-type annotations with gene expression profiles. Evaluations on a large-scale human scRNA-Seq dataset showed that CeLLTra significantly outperformed state-of-the-art methods in supervised and zero-shot cell-type prediction. Additionally, CeLLTra generalized well to external datasets, improving clustering performance and enabling better characterization of cancerous cell states in tumor-infiltrating myeloid cells from non-small cell lung cancer patients.

Availability and implementation: CeLLTra is freely available on GitHub (https://github.com/WJZheng-group/CeLLTra) and Zenodo (https://doi.org/10.5281/zenodo.17666735). The datasets underlying this article, GSE201333 and GSE127465, are publicly available from the Gene Expression Omnibus repository.
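
The two ideas named in the Results paragraph, grouping genes into pathway-level super tokens and matching a cell against cell-type labels without supervised training, can be illustrated with a toy sketch. This is not the CeLLTra implementation: the pathway names, gene sets, expression values, and label vectors below are all hypothetical, mean pooling stands in for the Transformer encoder, and hand-written vectors stand in for the embeddings a pretrained language model would produce.

```python
import math


def pool_super_tokens(expression, pathways):
    """Collapse per-gene expression into one value per pathway (a toy 'super token').

    Mean pooling is an assumption made for illustration only.
    """
    tokens = {}
    for name, genes in pathways.items():
        values = [expression[g] for g in genes if g in expression]
        tokens[name] = sum(values) / len(values) if values else 0.0
    return tokens


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_label(cell_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the cell embedding."""
    return max(label_embeddings, key=lambda name: cosine(cell_embedding, label_embeddings[name]))


# Hypothetical inputs: one cell, two made-up pathways, two candidate labels.
expression = {"CD3E": 4.0, "CD3D": 2.0, "LYZ": 0.0, "CD14": 1.0}
pathways = {"t_cell_pathway": ["CD3E", "CD3D"], "myeloid_pathway": ["LYZ", "CD14"]}
label_embeddings = {"T cell": [1.0, 0.0], "monocyte": [0.0, 1.0]}

tokens = pool_super_tokens(expression, pathways)
cell_embedding = [tokens["t_cell_pathway"], tokens["myeloid_pathway"]]
print(zero_shot_label(cell_embedding, label_embeddings))  # prints "T cell"
```

In a contrastive framework the cell and label embeddings would be learned so that matching pairs score highest under a similarity measure like this one; the sketch only shows the lookup step that such an alignment makes possible.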

Keywords

Humans; Single-Cell Analysis; Gene Expression Profiling; Software; Sequence Analysis, RNA; Computational Biology; Transcriptome; Algorithms; Cell type annotation; scRNA-Seq; Artificial Intelligence; Pathway informed transformer; Deep learning

Published Open-Access

yes
