Faculty, Staff and Student Publications
Language
English
Publication Date
11-1-2025
Journal
Briefings in Bioinformatics
DOI
10.1093/bib/bbaf666
PMID
41705520
PMCID
PMC12914468
PubMedCentral® Posted Date
12-12-2025
PubMedCentral® Full Text Version
Post-print
Abstract
High-throughput omics data present challenges for binary classification due to platform variability, batch effects, missing values, and high dimensionality. This study presents a novel Rank-Based Learning (RBL) method that leverages relative feature rankings to improve robustness and generalizability. We evaluated RBL against established methods like Logistic Regression (LR) and Random Forest (RF) using simulated data and two real-world plasma proteomics datasets: early-stage small cell lung cancer (SCLC) and duodenopancreatic neuroendocrine tumors (dpNET) in patients with Multiple Endocrine Neoplasia type 1 (MEN1). In simulation experiments, RBL outperformed LR under conditions involving batch effects, missing data, and varying numbers of true differential features. In SCLC, RBL yielded a test AUC of 0.76 (95% CI: 0.42-1.00), surpassing LR with Lasso (0.65 [95% CI: 0.47-0.84]) and RF with feature importance (0.59 [95% CI: 0.33-0.87]). In dpNET, RBL achieved an AUC of 0.83 (95% CI: 0.67-0.97) on the development set and 0.80 (95% CI: 0.54-0.98) on the test set, outperforming LR with Lasso (0.57 [95% CI: 0.40-0.77]) and RF with feature importance (0.53 [95% CI: 0.29-0.77]). By emphasizing feature ranking rather than absolute expression levels, RBL effectively mitigates the impact of non-biological variation. Overall, RBL improves the predictive accuracy of diagnostic models for complex diseases and provides a promising framework for developing more reliable and generalizable diagnostic tools from omics data, moving them closer to clinical application.
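To illustrate why relative rankings can be less sensitive to non-biological variation than absolute expression levels, here is a minimal sketch. It is not the authors' RBL algorithm, which the abstract does not specify. It assumes features are ranked within each sample and that the batch effect is a sample-specific positive scaling plus a shift. The function name `within_sample_ranks` and the simulated data are hypothetical. A feature-specific batch effect would not, in general, leave within-sample ranks unchanged.

```python
import numpy as np

def within_sample_ranks(x):
    """Rank features within each sample (row), scaled to [0, 1].

    Missing values (NaN) stay NaN and are left out of the ranking,
    so the observed features are ranked among themselves.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.full(x.shape, np.nan)
    for i, row in enumerate(x):
        observed = ~np.isnan(row)
        n = int(observed.sum())
        # argsort applied twice yields 0-based ranks (ties broken by position)
        order = row[observed].argsort(kind="stable").argsort(kind="stable")
        ranks[i, observed] = order / (n - 1) if n > 1 else 0.0
    return ranks

# Simulated data: 4 samples x 6 features (illustrative only)
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 6))

# Assumed batch effect: per-sample positive scale and additive shift
scale = rng.uniform(0.5, 2.0, size=(4, 1))
shift = rng.normal(size=(4, 1))
shifted = clean * scale + shift

# Absolute values change, but within-sample ranks do not
ranks_unchanged = np.allclose(within_sample_ranks(clean),
                              within_sample_ranks(shifted))

# A sample with one missing feature is still ranked on what was observed
partial = within_sample_ranks([[3.0, np.nan, 1.0, 2.0]])
```

Under these assumptions, `ranks_unchanged` is true because a positive scale and a shift preserve the ordering of features within a sample, and `partial` shows that a missing value reduces the number of ranked features without requiring imputation.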
Keywords
Humans, Algorithms, Sample Size, Proteomics, Machine Learning, Small Cell Lung Carcinoma, Neuroendocrine Tumors, Lung Neoplasms, rank-based learning, high-throughput omics, missing data
Published Open-Access
yes
Recommended Citation
Song, Lulu; Rudsari, Hamid Khoshfekr; Fahrmann, Johannes F; et al., "Rank-Based Learning: A Novel High-Throughput Algorithm Resilient to Missing Data and Effective for Datasets With Small Sample Size" (2025). Faculty, Staff and Student Publications. 6216.
https://digitalcommons.library.tmc.edu/uthgsbs_docs/6216
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Genetic Phenomena Commons, Medical Genetics Commons, Oncology Commons