Faculty, Staff and Students Publications

Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing

Yvonne Sada
Jason Hou
Peter Richardson
Hashem El-Serag
Jessica Davila

Language

English

Publication Date

2-1-2016

Journal

Medical Care

DOI

doi: 10.1097/MLR.0b013e3182a30373

PMID

23929403

PMCID

PMC3875602

PubMedCentral® Posted Date

February 2017

PubMedCentral® Full Text Version

Author MSS

Abstract

BACKGROUND: Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated HCC International Classification of Diseases, 9th Revision (ICD-9) codes, and evaluated whether natural language processing by the Automated Retrieval Console (ARC) for document classification improves HCC identification.

METHODS: We identified a cohort of patients with ICD-9 codes for HCC during 2005-2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared with manual classification. PPV, sensitivity, and specificity of ARC were calculated.

RESULTS: A total of 1138 patients with HCC were identified by ICD-9 codes. On the basis of manual review, 773 had HCC. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had PPV of 0.75, sensitivity of 0.94, and specificity of 0.68.

CONCLUSIONS: A combined approach of ICD-9 codes and natural language processing of pathology and radiology reports improves HCC case identification in automated data.
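
The three metrics reported in the abstract (PPV, sensitivity, and specificity) can be illustrated with a minimal Python sketch that compares an algorithm's labels against manual review as the reference standard. This is not the authors' code and does not use study data; the names `ConfusionCounts` and `tally` and the example labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm labels with manual review (hypothetical helper)."""
    tp: int  # algorithm positive, manual review positive
    fp: int  # algorithm positive, manual review negative
    fn: int  # algorithm negative, manual review positive
    tn: int  # algorithm negative, manual review negative


def tally(predicted: Sequence[bool], reference: Sequence[bool]) -> ConfusionCounts:
    """Build confusion counts from paired labels; the reference is the manual review."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = tn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif not pred and ref:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of algorithm positives that are true cases."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cases that the algorithm flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cases that the algorithm correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Made-up labels for six reports; not taken from the study.
    algorithm_labels = [True, True, True, False, False, False]
    manual_labels = [True, True, False, True, False, False]
    counts = tally(algorithm_labels, manual_labels)
    print(counts)
    print(f"PPV={ppv(counts):.2f}, "
          f"sensitivity={sensitivity(counts):.2f}, "
          f"specificity={specificity(counts):.2f}")
```

The same functions apply whether the labels come from ICD-9 codes or from a document classifier, as long as both are compared against the same manually reviewed reference.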

Keywords

Aged, Algorithms, Databases, Factual, Electronic Health Records, Female, Humans, Information Storage and Retrieval, International Classification of Diseases, Liver Neoplasms, Male, Middle Aged, Natural Language Processing, Sensitivity and Specificity, Socioeconomic Factors, United States, United States Department of Veterans Affairs

Published Open-Access

yes

Recommended Citation

Sada, Yvonne; Hou, Jason; Richardson, Peter; et al., "Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing" (2016). Faculty, Staff and Students Publications. 16.
https://digitalcommons.library.tmc.edu/baylor_docs/16
