Language

English

Publication Date

4-1-2026

Journal

JAMIA Open

DOI

10.1093/jamiaopen/ooaf152

PMID

41873434

PMCID

PMC13006063

PubMedCentral® Posted Date

3-13-2026

PubMedCentral® Full Text Version

Post-print

Abstract

Objectives: We evaluated how much training data modern AI tools require to outperform simpler models in predicting short-term mortality in over 500 000 patients with hemodialysis-dependent kidney failure.

Materials and methods: We compared logistic regression, boosting, and transformers using increasingly complex feature sets (from last-visit data to full trajectories). Performance was measured using the area under the ROC curve (AUC-ROC) and the Precision-Recall curve (AUC-PR) across training data sizes ranging from 500 to 490 197 samples.

Results: Features that carry temporal information improve performance across all models. On the full dataset, transformers (AUC-ROC = 0.8568) and boosting (AUC-ROC = 0.8598) perform similarly.

Discussion: Transformers require large datasets to outperform simpler models such as boosting, which limits their usefulness even on datasets of about 500 000 samples.

Conclusion: Modern AI tools require substantial data to justify their computational cost over simpler approaches. However, a more complex feature set seems to be beneficial across all models.

Keywords

mortality, risk prediction, artificial intelligence, machine learning, hemodialysis

Published Open-Access

yes
