Faculty, Staff and Students Publications

Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT

Language

English

Publication Date

10-1-2025

Journal

Academic Radiology

DOI

10.1016/j.acra.2025.07.055

PMID

40796469

PubMedCentral® Full Text Version

Post-print

Abstract

Rationale and objectives: To evaluate the post-deployment performance of an artificial intelligence (AI) system (Aidoc) for intracranial hemorrhage (ICH) detection and assess the utility of ChatGPT-4 Turbo for automated AI monitoring.

Materials and methods: This retrospective study evaluated 332,809 head CT examinations from 37 radiology practices across the United States (December 2023-May 2024). Of these, 13,569 cases were flagged as positive for ICH by the Aidoc AI system. A Health Insurance Portability and Accountability Act (HIPAA)-compliant version of ChatGPT-4 Turbo was used to extract data from radiology reports. Ground truth was established through radiologists' review of 200 randomly selected cases. Performance metrics were calculated for ChatGPT, Aidoc, and radiologists.

Results: ChatGPT-4 Turbo demonstrated high diagnostic accuracy in identifying ICH from radiology reports, with a positive predictive value of 1 and a negative predictive value of 0.988 (AUC: 0.996). Aidoc's false positive classifications were influenced by scanner manufacturer, midline shift, mass effect, artifacts, and neurologic symptoms. Multivariate analysis identified Philips scanners (OR: 6.97, p=0.003) and artifacts (OR: 3.79, p=0.029) as significant contributors to false positives, while midline shift (OR: 0.08, p=0.021) and mass effect (OR: 0.18, p=0.021) were associated with a reduced false positive rate. Aidoc-assisted radiologists achieved a sensitivity of 0.936 and a specificity of 1.

Conclusion: This study underscores the importance of continuous performance monitoring for AI systems in clinical practice. The integration of LLMs offers a scalable solution for evaluating AI performance, ensuring reliable deployment and enhancing diagnostic workflows.
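For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are conventionally derived from a 2x2 confusion matrix. It is not the authors' analysis code. The names ConfusionCounts and diagnostic_metrics are hypothetical, and the counts in the usage example are illustrative placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a classifier's output against ground truth."""
    tp: int  # flagged positive, truly positive
    fp: int  # flagged positive, truly negative
    fn: int  # flagged negative, truly positive
    tn: int  # flagged negative, truly negative


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the four metrics named in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),          # precision of positive calls
        "npv": _ratio(c.tn, c.tn + c.fn),          # precision of negative calls
    }


if __name__ == "__main__":
    # Illustrative counts only (assumption); NOT taken from the study.
    example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

A PPV of 1, as reported for ChatGPT-4 Turbo, corresponds to fp = 0 in this notation: no report was labeled positive that the ground truth considered negative.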

Keywords

Humans, Retrospective Studies, Intracranial Hemorrhages, Artificial Intelligence, Male, Tomography, X-Ray Computed, Female, Middle Aged, United States, Aged, Sensitivity and Specificity, Radiographic Image Interpretation, Computer-Assisted, Adult, Generative Artificial Intelligence, Artificial intelligence, ChatGPT-4 Turbo, Intracranial hemorrhage, Large language models

Published Open-Access

yes

Recommended Citation

Rohren, Eric; Ahmadzade, Mohadese; Colella, Sofia; et al., "Post-deployment Monitoring of AI Performance in Intracranial Hemorrhage Detection by ChatGPT" (2025). Faculty, Staff and Students Publications. 6118.
https://digitalcommons.library.tmc.edu/baylor_docs/6118

Included in

Medical Sciences Commons, Radiation Medicine Commons, Radiology Commons
