Faculty, Staff and Student Publications

Language

English

Publication Date

12-1-2025

Journal

Cancer Epidemiology, Biomarkers & Prevention

DOI

10.1158/1055-9965.EPI-25-1032

PMID

40996315

PMCID

PMC12570486

PubMedCentral® Posted Date

10-30-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Background: Sensitivity and specificity are foundational metrics for cancer detection tools. However, most machine learning algorithms optimize for overall accuracy, which does not align with the clinical priorities of early detection. We aim to develop a machine learning algorithm that performs feature selection while maximizing sensitivity at a given specificity.

Methods: We developed SMAGS-LASSO, a machine learning algorithm that combines our Sensitivity Maximization at a Given Specificity (SMAGS) framework with L1 regularization for feature selection. This approach optimizes sensitivity at user-defined specificity thresholds while simultaneously performing feature selection. SMAGS-LASSO uses a custom loss function with L1 regularization and multiple parallel optimization techniques. We used train-test splits and cross-validation and compared SMAGS-LASSO with LASSO and random forest on sensitivity and AUC. We evaluated our method on synthetic datasets and on real-world colorectal cancer protein biomarker data.
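The abstract does not give the SMAGS-LASSO loss function or its optimizers. The following is a minimal, hypothetical sketch of the general idea only, assuming a linear score, a decision threshold set on the negative-class scores so that the target specificity is met, and an L1 penalty applied to the weights after normalizing them to unit L2 norm. That normalization is an assumption added here because sensitivity at a fixed specificity does not change when the weights are rescaled. The names `sensitivity_at_specificity`, `penalized_loss`, `fit`, and `lam`, and the random local search (a stand-in for the paper's parallel optimization techniques), are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], specificity: float
) -> float:
    """Sensitivity when samples with score > t are called positive and t is the
    smallest negative-class score that keeps at least `specificity` of the
    negatives at or below it. Assumes both classes are present."""
    neg = sorted(s for s, lab in zip(scores, labels) if lab == 0)
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    # Number of negatives that must score <= t (small epsilon guards float noise).
    k = min(len(neg), max(1, math.ceil(specificity * len(neg) - 1e-9)))
    t = neg[k - 1]
    return sum(s > t for s in pos) / len(pos)


def penalized_loss(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    specificity: float,
    lam: float,
) -> float:
    """Hypothetical objective: -sensitivity + lam * L1(w / ||w||_2)."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return math.inf  # an all-zero model cannot rank samples
    u = [v / norm for v in w]  # sensitivity is scale-invariant, so fix the scale
    scores = [sum(ui * xi for ui, xi in zip(u, row)) for row in X]
    sens = sensitivity_at_specificity(scores, y, specificity)
    return -sens + lam * sum(abs(v) for v in u)


def fit(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    specificity: float = 0.95,
    lam: float = 0.05,
    iters: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Derivative-free local search (assumed stand-in for the paper's optimizers)."""
    rng = random.Random(seed)
    p = len(X[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    best = penalized_loss(w, X, y, specificity, lam)
    for _ in range(iters):
        cand = list(w)
        j = rng.randrange(p)
        if rng.random() < 0.3:
            cand[j] = 0.0  # propose dropping feature j
        else:
            cand[j] += rng.gauss(0.0, 0.5)  # propose nudging weight j
        loss = penalized_loss(cand, X, y, specificity, lam)
        if loss <= best:  # accept ties to drift across flat regions
            w, best = cand, loss
    return w, best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; features 1 and 2 are noise.
    gen = random.Random(1)
    y = [i % 2 for i in range(200)]
    X = [[gen.gauss(2.0 * lab, 1.0), gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for lab in y]
    w, loss = fit(X, y, specificity=0.9, lam=0.05)
    print("weights:", [round(v, 3) for v in w], "loss:", round(loss, 3))
```

Because sensitivity at a fixed specificity is a step function of the weights, the sketch uses a gradient-free search rather than gradient descent. Weights that end at exactly zero can be read as deselected features; `lam` and the target specificity would be chosen by the user, for example with the cross-validation the Methods mention.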

Results: In synthetic datasets designed to contain strong signals for both sensitivity and specificity, SMAGS-LASSO significantly outperformed standard LASSO, achieving a sensitivity of 1.00 (95% confidence interval, 0.98-1.00) compared with 0.19 (95% confidence interval, 0.13-0.23) for LASSO at 99.9% specificity. In colorectal cancer data, SMAGS-LASSO demonstrated a 21.8% improvement over LASSO (P value = 2.24E-04) and a 38.5% improvement over random forest (P value = 4.62E-08) at 98.5% specificity while selecting the same number of biomarkers.

Conclusions: SMAGS-LASSO enables the development of minimal biomarker panels that maintain high sensitivity at predefined specificity thresholds, offering superior performance for early cancer detection.

Impact: This method provides a promising approach for early cancer detection and other medical diagnostics requiring sensitivity-specificity optimization.

Keywords

Humans, Early Detection of Cancer, Sensitivity and Specificity, Machine Learning, Algorithms, Biomarkers, Tumor, Colorectal Neoplasms

Published Open-Access

yes
