Faculty, Staff and Student Publications
Publication Date
3-14-2024
Journal
Communications Biology
DOI
10.1038/s42003-024-05988-y
PMID
38486077
PMCID
PMC10940680
PubMedCentral® Posted Date
3-14-2024
PubMedCentral® Full Text Version
Post-print
Abstract
Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered at different times and from different tissues and patients, introduces large between-group distances and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (LAD), a metric that uses the temporal/spatial locality of the batch effect to control for such factors. We validate LAD on simulated data and apply it to a mouse retina development dataset and a lung dataset. We also demonstrate the utility of our approach in understanding the progression of Coronavirus Disease 2019 (COVID-19). LAD provides better cell embeddings than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.
Keywords
Animals, Mice, Cluster Analysis, Gene Expression, Data integration, Computational models
Published Open-Access
yes
Recommended Citation
Liang, Shaoheng; Dou, Jinzhuang; Iqbal, Ramiz; et al., "Label-Aware Distance Mitigates Temporal and Spatial Variability for Clustering and Visualization of Single-Cell Gene Expression Data" (2024). Faculty, Staff and Student Publications. 2801.
https://digitalcommons.library.tmc.edu/uthgsbs_docs/2801
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Data Science Commons, Genetic Phenomena Commons, Medical Genetics Commons, Oncology Commons