Faculty, Staff and Student Publications

Publication Date

2-1-2025

Journal

Advances in Radiation Oncology

Abstract

Purpose: To evaluate the efficacy of prominent machine learning algorithms in predicting normal tissue complication probability using clinical data obtained from 2 distinct disease sites and to create a software tool that facilitates the automatic determination of the optimal algorithm to model any given labeled data set.

Methods and materials: We obtained 3 sets of radiation toxicity data (478 patients) from our clinic: gastrointestinal toxicity, radiation pneumonitis, and radiation esophagitis. These data comprised clinicopathological and dosimetric information for patients diagnosed with non-small cell lung cancer and anal squamous cell carcinoma. Each data set was modeled using 11 commonly employed machine learning algorithms (elastic net, least absolute shrinkage and selection operator [LASSO], random forest, random forest regression, support vector machine, extreme gradient boosting, light gradient boosting machine, k-nearest neighbors, neural network, Bayesian-LASSO, and Bayesian neural network) by randomly dividing each data set into training and test sets. The training set was used to build and tune the model, and the test set was used to assess it by calculating performance metrics. This process was repeated 100 times for each algorithm on each data set. Figures were generated to visually compare the performance of the algorithms, and a graphical user interface was developed to automate the entire process.
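To give a concrete feel for this kind of repeated random-split comparison, the sketch below evaluates a few candidate classifiers over 100 splits and reports mean ± standard deviation of the area under the precision-recall curve and the area under the ROC curve. It is a minimal illustration only: the synthetic data, the 80/20 split, the scikit-learn model choices, and their hyperparameters are assumptions for this example, not the authors' actual implementation or full set of 11 algorithms.

```python
# Minimal sketch of a repeated train/test comparison of several classifiers.
# Data, split ratio, models, and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import average_precision_score, roc_auc_score

# Stand-in for one labeled toxicity data set (478 patients, imbalanced outcome).
X, y = make_classification(n_samples=478, n_features=20,
                           weights=[0.7, 0.3], random_state=0)

candidates = {
    "LASSO-like logistic": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

n_repeats = 100
scores = {name: {"auprc": [], "auc": []} for name in candidates}

for seed in range(n_repeats):
    # New random stratified split on every repetition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        prob = model.predict_proba(X_te)[:, 1]
        scores[name]["auprc"].append(average_precision_score(y_te, prob))
        scores[name]["auc"].append(roc_auc_score(y_te, prob))

# Summarize each candidate as mean ± standard deviation across repetitions.
for name, s in scores.items():
    print(f"{name}: AUPRC {np.mean(s['auprc']):.3f} ± {np.std(s['auprc']):.3f}, "
          f"AUC {np.mean(s['auc']):.3f} ± {np.std(s['auc']):.3f}")
```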

Results: LASSO achieved the highest area under the precision-recall curve (0.807 ± 0.067) for radiation esophagitis, random forest for gastrointestinal toxicity (0.726 ± 0.096), and the neural network for radiation pneumonitis (0.878 ± 0.060). The area under the curve was 0.754 ± 0.069, 0.889 ± 0.043, and 0.905 ± 0.045, respectively. The graphical user interface was used to compare all algorithms for each data set automatically. When averaging the area under the precision-recall curve across all toxicities, Bayesian-LASSO was the best model.

Conclusions: Our results show that no single algorithm performs best across all data sets. It is therefore important to compare multiple algorithms when training an outcome prediction model on a new data set. The graphical user interface created for this study automatically compares the performance of these 11 algorithms for any given data set.

DOI

10.1016/j.adro.2024.101675

PMID

39717195

PMCID

PMC11665468

PubMedCentral® Posted Date

11-13-2024

PubMedCentral® Full Text Version

Post-print

Published Open-Access

yes
