Faculty, Staff and Student Publications

Cross-Institutional Dental Electronic Health Record Entity Extraction via Generative Artificial Intelligence and Synthetic Notes

Language

English

Publication Date

6-1-2025

Journal

JAMIA Open

DOI

10.1093/jamiaopen/ooaf061

PMID

40584736

PMCID

PMC12205731

PubMedCentral® Posted Date

6-28-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Background: While most health-care providers now use electronic health records (EHRs) to document clinical care, many still treat them as digital versions of paper records. As a result, documentation often remains unstructured, with free-text entries in progress notes. This limits the potential for secondary use and analysis, as machine-learning and data analysis algorithms are more effective with structured data.

Objective: This study aims to use advanced artificial intelligence (AI) and natural language processing (NLP) techniques to improve diagnostic information extraction from clinical notes in a periodontal use case. By automating this process, the study seeks to reduce missing data in dental records and minimize the need for extensive manual annotation, a long-standing barrier to widespread NLP deployment in dental data extraction.

Materials and methods: This research uses large language models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4), to generate synthetic medical notes for fine-tuning a RoBERTa model. The model was trained to interpret and process dental language, with particular attention to periodontal diagnoses. Model performance was evaluated by manually reviewing 360 clinical notes randomly selected from each participating site's dataset.
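To illustrate how labeled synthetic notes can feed a token-classification model such as RoBERTa, the sketch below converts a note with known entity character spans into BIO tags (B- for the first token of an entity, I- for later tokens, O for everything else). It is a minimal sketch under stated assumptions, not the authors' pipeline: the whitespace tokenizer, the `bio_tags` and `span` helpers, the example note, and the `EXTENT`/`STAGE`/`GRADE` label names are all hypothetical.

```python
from typing import List, Tuple

# (start_char, end_char, label); the label names used here are hypothetical.
Span = Tuple[int, int, str]


def bio_tags(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Whitespace-tokenize `text` and give each token a B-, I- or O tag."""
    tagged = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # character offset of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start < e and end > s:  # token overlaps the entity span
                # The token containing the span start opens the entity.
                tag = ("B-" if start <= s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


def span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: locate `phrase` in `text` as a labeled span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


# Hypothetical synthetic note with its known diagnosis entities.
note = "Dx: generalized periodontitis stage III grade B"
spans = [
    span(note, "generalized", "EXTENT"),
    span(note, "stage III", "STAGE"),
    span(note, "grade B", "GRADE"),
]
for token, tag in bio_tags(note, spans):
    print(token, tag)
```

Because the generator of a synthetic note can also emit the entity spans, labels of this kind come at no manual annotation cost. A RoBERTa setup would additionally need to map these word-level tags onto its subword tokens, for example through a fast tokenizer's character offset mapping.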

Results: The results demonstrated high accuracy in extracting periodontal diagnoses, with sites 1 and 2 achieving weighted average scores of 0.97-0.98. This performance held across all dimensions of periodontal diagnosis: stage, grade, and extent.
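The abstract does not define the weighted average score. Assuming it is a support-weighted F1 (per-class F1 averaged with each class weighted by its number of gold-standard instances, the quantity scikit-learn reports for `average='weighted'`), it can be sketched as follows. The function names and the stage labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def per_class_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """F1 for every label that appears in the gold annotations."""
    scores = {}
    for label in set(gold):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return scores


def weighted_f1(gold: List[str], pred: List[str]) -> float:
    """Average per-class F1, weighting each class by its gold support."""
    support = Counter(gold)
    f1 = per_class_f1(gold, pred)
    return sum(f1[label] * n for label, n in support.items()) / len(gold)


# Hypothetical stage labels for four manually reviewed notes.
gold = ["I", "II", "III", "III"]
pred = ["I", "II", "III", "II"]
print(round(weighted_f1(gold, pred), 2))  # 0.75
```

Weighting by support keeps a rare category from dominating the summary, which matters when some stages or grades occur far less often than others in real records.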

Discussion: Synthetic data effectively reduced manual annotation needs while preserving model quality. Generalizability across institutions suggests viability for broader adoption, though future work is needed to improve contextual understanding.

Conclusion: The study highlights the potentially transformative impact of AI and NLP on health-care research. An estimated 40%-80% of clinical documentation is free text, so scaling our method could enhance clinical data reuse.

Keywords

periodontal diseases, natural language processing, large language models, electronic health records, named entity recognition

Published Open-Access

yes

Recommended Citation

Chuang, Yao-Shun; Lee, Chun-Teh; Lin, Guo-Hao; et al., "Cross-Institutional Dental Electronic Health Record Entity Extraction via Generative Artificial Intelligence and Synthetic Notes" (2025). Faculty, Staff and Student Publications. 665.
https://digitalcommons.library.tmc.edu/uthshis_docs/665

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Periodontics and Periodontology Commons