Faculty, Staff and Student Publications
Publication Date
4-13-2022
Journal
Journal of the American Medical Informatics Association
Abstract
OBJECTIVES: Scanned documents (SDs) are common in electronic health records and potentially rich in clinically relevant information, yet they rarely fit well into clinician workflows. Here, we identify scanned imaging reports requiring follow-up with high recall and practically useful precision.
MATERIALS AND METHODS: We focused on identifying imaging findings for 3 common causes of malpractice claims: (1) potentially malignant breast lesions (mammography), (2) potentially malignant lung lesions (chest computed tomography [CT]), and (3) long-bone fractures (X-ray). We trained a ClinicalBERT-based pipeline on existing typed/dictated reports classified manually or via ICD-10 codes, evaluated it on a test set of manually classified SDs, and compared it against a string-matching baseline.
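The abstract gives no implementation details; the following is a minimal sketch of how such a classifier could be fine-tuned, assuming the Hugging Face transformers and datasets libraries and the public Bio_ClinicalBERT checkpoint. File names, column names, and hyperparameters are illustrative, not the paper's.

```python
# Sketch: fine-tune a ClinicalBERT classifier to flag imaging reports that
# need follow-up. Training data are typed/dictated reports with a binary
# label from manual review or ICD-10 codes; CSV paths and the column names
# "report_text" / "needs_followup" are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL = "emilyalsentzer/Bio_ClinicalBERT"  # public ClinicalBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

data = load_dataset("csv", data_files={"train": "typed_reports_train.csv",
                                       "validation": "typed_reports_val.csv"})

def tokenize(batch):
    # Reports beyond 512 tokens are truncated here; the paper may instead
    # chunk long documents.
    return tokenizer(batch["report_text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)
data = data.rename_column("needs_followup", "labels")

args = TrainingArguments(output_dir="clf_out",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"],
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
```

At inference, the same model would score OCR'd text from scanned documents; the string-matching baseline would instead search report text for a fixed list of finding-related keywords.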
RESULTS: A total of 393 mammogram, 305 chest CT, and 683 bone X-ray reports were manually reviewed. The string-matching approach had an F1 of 0.667. For mammograms, chest CTs, and bone X-rays, respectively: models trained on manually classified training data and optimized for F1 reached an F1 of 0.900, 0.905, and 0.817, while separate models optimized for recall achieved a recall of 1.000 with precisions of 0.727, 0.518, and 0.275. Models trained on ICD-10-labeled data and optimized for F1 achieved F1 scores of 0.647, 0.830, and 0.643, while those optimized for recall achieved a recall of 1.000 with precisions of 0.407, 0.683, and 0.358.
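The "optimized for F1" and "optimized for recall" models plausibly differ only in the decision threshold applied to the classifier's output probability. The sketch below, using scikit-learn with hypothetical validation labels and scores, shows how both operating points could be chosen: one threshold maximizing F1, and a second, more permissive one lowered until recall reaches 1.000 at the cost of precision.

```python
# Choose two operating points from one classifier's validation scores:
# (a) the threshold with the best F1, and
# (b) the highest threshold that still yields recall = 1.000,
#     i.e. catches every abnormal report while keeping precision as high
#     as possible. y_true / y_score are hypothetical.
import numpy as np
from sklearn.metrics import precision_recall_curve

def choose_thresholds(y_true, y_score):
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one extra trailing entry (precision=1, recall=0)
    # with no associated threshold; drop it to align with `thresholds`.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
    best_f1_thr = thresholds[np.argmax(f1)]
    # Thresholds are sorted ascending and recall is non-increasing, so the
    # last threshold with recall >= 1.0 gives the best precision at full recall.
    full_recall = r >= 1.0
    recall_thr = thresholds[full_recall][-1] if full_recall.any() else thresholds[0]
    return best_f1_thr, recall_thr

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.60])
f1_thr, recall_thr = choose_thresholds(y_true, y_score)
```

In a screening setting, the full-recall threshold trades a longer review worklist (lower precision) for the guarantee that no abnormal report in the validation set is missed.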
DISCUSSION: Our pipeline can identify abnormal reports with potentially useful performance, thereby decreasing the manual effort required to screen for abnormal findings that require follow-up.
CONCLUSION: Clinically significant abnormalities in SDs can be identified automatically, with high recall and practically useful precision, in a generalizable way that requires minimal manual labeling effort.
Keywords
Electronic Health Records, Natural Language Processing, Research Report, Tomography, X-Ray Computed