Faculty, Staff and Student Publications

Language

English

Publication Date

11-1-2025

Journal

Briefings in Bioinformatics

DOI

10.1093/bib/bbaf666

PMID

41705520

PMCID

PMC12914468

PubMedCentral® Posted Date

12-12-2025

PubMedCentral® Full Text Version

Post-print

Abstract

High-throughput omics data present challenges for binary classification due to platform variability, batch effects, missing values, and high dimensionality. This study presents a novel Rank-Based Learning (RBL) method that leverages relative feature rankings to improve robustness and generalizability. We evaluated RBL against established methods such as Logistic Regression (LR) and Random Forest (RF) using simulated data and two real-world plasma proteomics datasets: early-stage small cell lung cancer (SCLC) and duodenopancreatic neuroendocrine tumors (dpNET) in patients with Multiple Endocrine Neoplasia type 1 (MEN1). In simulation experiments, RBL outperformed LR under conditions involving batch effects, missing data, and varying numbers of true differential features. In SCLC, RBL yielded a test AUC of 0.76 (95% CI: 0.42-1.00), surpassing LR with Lasso (0.65 [95% CI: 0.47-0.84]) and RF with feature importance (0.59 [95% CI: 0.33-0.87]). In dpNET, RBL achieved an AUC of 0.83 (95% CI: 0.67-0.97) on the development set and 0.80 (95% CI: 0.54-0.98) on the test set, outperforming LR with Lasso (0.57 [95% CI: 0.40-0.77]) and RF with feature importance (0.53 [95% CI: 0.29-0.77]). By emphasizing feature ranking rather than absolute expression levels, RBL effectively mitigates the impact of non-biological variation. Overall, RBL improves the predictive accuracy of diagnostic models for complex diseases and provides a promising framework for developing more reliable and generalizable diagnostic tools from omics data, moving them closer to clinical application.
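
The abstract does not spell out the RBL algorithm, so the following is only a minimal sketch of the general idea it describes: replacing absolute expression values with within-sample feature ranks. The function name `rank_within_sample`, the scaling of ranks by the number of observed features, and the simulated shift-and-scale "batch effect" are all assumptions made for illustration, not the authors' implementation. Ties are ignored here for simplicity.

```python
import numpy as np

def rank_within_sample(X):
    """Hypothetical helper: rank features within each sample (row).

    Missing values (NaN) stay NaN; ranks are divided by the number of
    observed features so samples with different missingness are comparable.
    """
    X = np.asarray(X, dtype=float)
    R = np.full(X.shape, np.nan)
    for i, row in enumerate(X):
        observed = ~np.isnan(row)
        n = int(observed.sum())
        if n == 0:
            continue  # nothing to rank in this sample
        order = np.argsort(row[observed], kind="stable")
        ranks = np.empty(n)
        ranks[order] = np.arange(1, n + 1)  # 1 = lowest observed value
        R[i, observed] = ranks / n
    return R

# Simulated data: 4 samples x 6 features, with one missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
X[1, 2] = np.nan

# Assumed batch effect: a per-sample shift and positive scale.
shift = np.array([[0.0], [2.0], [-1.0], [5.0]])
scale = np.array([[1.0], [3.0], [0.5], [2.0]])
X_batch = X * scale + shift

# Within-sample ranks are unchanged by any strictly increasing
# per-sample distortion, so the two rank matrices should agree.
same = np.allclose(rank_within_sample(X), rank_within_sample(X_batch),
                   equal_nan=True)
print(same)
```

A classifier trained on such ranks sees the same input regardless of this kind of per-sample distortion, which is one plausible reading of how ranking can reduce the impact of non-biological variation. Distortions that affect individual features differently would still change the ranks.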

Keywords

Humans, Algorithms, Sample Size, Proteomics, Machine Learning, Small Cell Lung Carcinoma, Neuroendocrine Tumors, Lung Neoplasms, rank-based learning, high-throughput omics, missing data

Published Open-Access

yes
