Language

English

Publication Date

5-1-2025

Journal

Research and Practice in Thrombosis and Haemostasis

DOI

10.1016/j.rpth.2025.102896

PMID

40606764

PMCID

PMC12213262

PubMedCentral® Posted Date

5-21-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Background: Pulmonary embolism (PE) is a leading cause of preventable in-hospital mortality. Advances in diagnosis, risk stratification, and prevention can improve outcomes. Large, publicly available datasets are needed to move research forward, but are lacking in the field of hemostasis and thrombosis.

Objectives: We evaluated whether a machine learning language model could automatically add PE labels to a large dataset.

Methods: We extracted all computed tomography pulmonary angiography radiology reports (N = 19,942) from the Medical Information Mart for Intensive Care IV, a database of adult patients who presented to the emergency room or were admitted to the intensive care unit at one tertiary care center between 2008 and 2019. Two physicians manually labeled each report result as PE positive (acute PE) or PE negative. Using these labels as our gold standard, we compared the performance of a fine-tuned Bio_ClinicalBERT (bidirectional encoder representations from transformers) language model, known as venous thromboembolism-BERT, with that of diagnosis codes in classifying reports as PE positive or PE negative.

Results: Venous thromboembolism-BERT had a sensitivity of 92.4% and a positive predictive value of 87.8% in all 19,942 computed tomography pulmonary angiography reports. Diagnosis codes had a sensitivity of 95.4% and a positive predictive value of 83.8% in the subset of 11,990 reports with an associated discharge diagnosis code.
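The two metrics reported above can be computed from paired gold-standard and predicted labels. The following is a minimal sketch, not the authors' code: the function name `sensitivity_and_ppv` and the eight toy labels are hypothetical and serve only to illustrate the definitions (sensitivity = TP / (TP + FN), positive predictive value = TP / (TP + FP)).

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for binary labels.

    `gold` holds the reference labels (True = PE positive) and `predicted`
    holds the labels from the method being evaluated. Hypothetical helper,
    written for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the three cells of the confusion matrix that the metrics need.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)

    if tp + fn == 0:
        raise ValueError("no gold-standard positives; sensitivity is undefined")
    if tp + fp == 0:
        raise ValueError("no predicted positives; PPV is undefined")

    return tp / (tp + fn), tp / (tp + fp)


# Toy labels (made up, not study data): 4 gold positives, 4 predicted positives.
toy_gold = [True, True, True, False, False, False, False, True]
toy_pred = [True, True, False, True, False, False, False, True]

sens, ppv = sensitivity_and_ppv(toy_gold, toy_pred)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Note that the two methods in the Results were evaluated on different denominators (all 19,942 reports versus the 11,990 reports with a discharge diagnosis code), so a comparison of this kind would need each method scored only against the reports for which it produced a label.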

Conclusion: We successfully added nearly 20,000 PE labels to the publicly available Medical Information Mart for Intensive Care IV database and demonstrated how a transformer language model can automate and accelerate hematologic research.

Keywords

machine learning, natural language processing, pulmonary embolism, venous thromboembolism, databases

Published Open-Access

yes
