Faculty, Staff and Student Publications

A Novel Statistical Method for Decontaminating T-Cell Receptor Sequencing Data

Authors

Ruoxing Li
Mehmet Altan
Alexandre Reuben
Ruitao Lin
John V Heymach
Hai Tran
Runzhe Chen
Latasha Little
Shawna Hubert
Jianjun Zhang
Ziyi Li

Publication Date

7-20-2023

Journal

Briefings in Bioinformatics

DOI

10.1093/bib/bbad230

PMID

37337757

PMCID

PMC10359082

PubMedCentral® Posted Date

June 2023

PubMedCentral® Full Text Version

Post-print

Abstract

The T-cell receptor (TCR) repertoire is highly diverse among the population and plays an essential role in initiating multiple immune processes. TCR sequencing (TCR-seq) has been developed to profile the T cell repertoire. Similar to other high-throughput experiments, contamination can happen during several steps of TCR-seq, including sample collection, preparation and sequencing. Such contamination creates artifacts in the data, leading to inaccurate or even biased results. Most existing methods assume 'clean' TCR-seq data as the starting point with no ability to handle data contamination. Here, we develop a novel statistical model to systematically detect and remove contamination in TCR-seq data. We summarize the observed contamination into two sources, pairwise and cross-cohort. For both sources, we provide visualizations and summary statistics to help users assess the severity of the contamination. Incorporating prior information from 14 existing TCR-seq datasets with minimum contamination, we develop a straightforward Bayesian model to statistically identify contaminated samples. We further provide strategies for removing the impacted sequences to allow for downstream analysis, thus avoiding any need to repeat experiments. Our proposed model shows robustness in contamination detection compared with a few off-the-shelf detection methods in simulation studies. We illustrate the use of our proposed method on two TCR-seq datasets generated locally.

Keywords

Humans, Bayes Theorem, Receptors, Antigen, T-Cell, T-Lymphocytes, Models, Statistical, High-Throughput Nucleotide Sequencing

Published Open-Access

yes

Recommended Citation

Li, Ruoxing; Altan, Mehmet; Reuben, Alexandre; et al., "A Novel Statistical Method for Decontaminating T-Cell Receptor Sequencing Data" (2023). Faculty, Staff and Student Publications. 1929.
https://digitalcommons.library.tmc.edu/uthgsbs_docs/1929

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Medical Sciences Commons, Oncology Commons
