Faculty, Staff and Student Publications

Publication Date

7-1-2022

Journal

JAMIA Open

Abstract

OBJECTIVE: Scanned documents in electronic health records (EHR) have been a challenge for decades, and are expected to stay in the foreseeable future. Current approaches for processing include image preprocessing, optical character recognition (OCR), and natural language processing (NLP). However, there is limited work evaluating the interaction of image preprocessing methods, NLP models, and document layout.

MATERIALS AND METHODS: We evaluated 2 key indicators for sleep apnea, apnea-hypopnea index (AHI) and oxygen saturation (SaO

RESULTS: Our proposed method using ClinicalBERT reached an AUROC of 0.9743 and document accuracy of 94.76% for AHI, and an AUROC of 0.9523 and document accuracy of 91.61% for SaO

DISCUSSION: There are multiple, inter-related steps to extract meaningful information from scanned reports. While it would be infeasible to experiment with all possible option combinations, we experimented with several of the most critical steps for information extraction, including image processing and NLP. Given that scanned documents will likely be part of healthcare for years to come, it is critical to develop NLP systems to extract key information from this data.

CONCLUSION: We demonstrated the proper use of image preprocessing and document layout could be beneficial to scanned document processing.

DOI

10.1093/jamiaopen/ooac045

PMID

35702624

PMCID

PMC9188320

PubMedCentral® Posted Date

6-11-2022

PubMedCentral® Full Text Version

Post-print

Published Open-Access

yes

