Faculty, Staff and Student Publications

Publication Date

2-10-2025

Journal

Cancer Cell

Abstract

Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology, bringing hope to inform a patient's prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type we provide containerized versions of the top-performing models as a public resource.

Keywords

Humans; Neoplasms; Machine Learning; Genomics; Biomarkers, Tumor; Databases, Genetic; Prognosis

DOI

10.1016/j.ccell.2024.12.002

PMID

39753139

PMCID

PMC11949768

PubMedCentral® Posted Date

3-28-2025

PubMedCentral® Full Text Version

Author MSS

nihms-2046254-f0001.jpg (388 kB)
Graphical Abstract

Published Open-Access

yes
