Faculty, Staff and Student Publications

Improving Large Language Models for Clinical Named Entity Recognition via Prompt Engineering

Yan Hu
Qingyu Chen
Jingcheng Du
Xueqing Peng
Vipina Kuttichi Keloth
Xu Zuo
Yujia Zhou
Zehan Li
Xiaoqian Jiang
Zhiyong Lu
Kirk Roberts
Hua Xu

Language

English

Publication Date

9-1-2024

Journal

Journal of the American Medical Informatics Association

DOI

10.1093/jamia/ocad259

PMID

38281112

PMCID

PMC11339492

PubMedCentral® Posted Date

1-27-2024

PubMedCentral® Full Text Version

Post-print

Abstract

IMPORTANCE: The study highlights the potential of large language models, specifically GPT-3.5 and GPT-4, in processing complex clinical data and extracting meaningful information with minimal training data. By developing and refining prompt-based strategies, we can significantly enhance the models' performance, making them viable tools for clinical NER tasks and possibly reducing the reliance on extensive annotated datasets.

OBJECTIVES: This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance.

MATERIALS AND METHODS: We evaluated these models on 2 clinical NER tasks: (1) to extract medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) to identify nervous system disorder-related adverse events from safety reports in the Vaccine Adverse Event Reporting System (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's effectiveness and compared the models to BioClinicalBERT.
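The four prompt components listed above can be pictured as optional sections layered onto a baseline prompt. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `PromptComponents`, the function `build_prompt`, the section headings, and all example strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PromptComponents:
    """Hypothetical container for the four components named in the abstract."""
    task_description: str                      # (1) baseline: what to extract
    format_spec: str                           # (1) baseline: how to mark the output
    guidelines: str = ""                       # (2) annotation guideline-based text
    error_instructions: str = ""               # (3) error analysis-based instructions
    examples: list[tuple[str, str]] = field(default_factory=list)  # (4) few-shot pairs


def build_prompt(components: PromptComponents, note: str) -> str:
    """Assemble one prompt string; empty optional components are skipped."""
    sections = [
        "### Task\n" + components.task_description,
        "### Output format\n" + components.format_spec,
    ]
    if components.guidelines:
        sections.append("### Annotation guidelines\n" + components.guidelines)
    if components.error_instructions:
        sections.append("### Additional instructions\n" + components.error_instructions)
    if components.examples:
        shots = "\n\n".join(
            f"Input: {text}\nOutput: {labeled}" for text, labeled in components.examples
        )
        sections.append("### Examples\n" + shots)
    sections.append("### Input\n" + note)
    return "\n\n".join(sections)


# Illustrative values only; none of these strings come from the study.
baseline = PromptComponents(
    task_description="Extract medical problems, treatments, and tests.",
    format_spec="Wrap each entity in a tag naming its type.",
)
full = PromptComponents(
    task_description=baseline.task_description,
    format_spec=baseline.format_spec,
    guidelines="A problem is an observation about the patient thought to be abnormal.",
    error_instructions="Do not tag section headers as entities.",
    examples=[("Patient has fever.", "Patient has <problem>fever</problem>.")],
)
baseline_prompt = build_prompt(baseline, "Chest x-ray was ordered.")
full_prompt = build_prompt(full, "Chest x-ray was ordered.")
```

Keeping each component optional makes it straightforward to evaluate them incrementally, which mirrors how the abstract describes assessing each prompt's effectiveness.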

RESULTS: Using baseline prompts, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.634, 0.804 for MTSamples and 0.301, 0.593 for VAERS. Additional prompt components consistently improved model performance. When all 4 components were used, GPT-3.5 and GPT-4 achieved relaxed F1 scores of 0.794, 0.861 for MTSamples and 0.676, 0.736 for VAERS, demonstrating the effectiveness of our prompt framework. Although these results trail BioClinicalBERT (F1 of 0.901 for the MTSamples dataset and 0.802 for the VAERS dataset), they are very promising considering that few training samples are needed.
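The abstract reports relaxed F1 scores but does not define the matching rule. A common convention, assumed here, counts a predicted entity as correct when it has the same type as a gold entity and their character spans overlap. The sketch below illustrates that convention only; the names `Span`, `overlaps`, and `relaxed_f1` are hypothetical, and the study's actual evaluation script may differ.

```python
# A span is (start, end_exclusive, entity_type); this layout is an assumption.
Span = tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when both spans share a type and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def relaxed_f1(gold: list[Span], pred: list[Span]) -> float:
    """F1 under overlap-based (relaxed) matching, as assumed above."""
    # Precision: share of predictions that overlap some gold span of the same type.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans that some prediction of the same type overlaps.
    matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only; they are not taken from MTSamples or VAERS.
gold_spans = [(0, 10, "problem"), (20, 30, "test")]
pred_spans = [(2, 8, "problem"), (40, 50, "treatment")]
score = relaxed_f1(gold_spans, pred_spans)
```

In this toy case one of two predictions overlaps a gold span and one of two gold spans is found, so precision and recall are both 0.5. Under exact matching the first prediction would not count, because its boundaries differ from the gold span.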

DISCUSSION: The study's findings suggest a promising direction in leveraging LLMs for clinical NER tasks. However, while the performance of GPT models improved with task-specific prompts, there is a need for further development and refinement. LLMs like GPT-4 show potential to approach the performance of state-of-the-art models like BioClinicalBERT, but they still require careful prompt engineering and an understanding of task-specific knowledge. The study also underscores the importance of evaluation schemas that accurately reflect the capabilities and performance of LLMs in clinical settings.

CONCLUSION: While direct application of GPT models to clinical NER tasks falls short of optimal performance, our task-specific prompt framework, incorporating medical knowledge and training samples, significantly enhances GPT models' feasibility for potential clinical applications.

Keywords

Natural Language Processing, Humans, Electronic Health Records, Data Mining

Published Open-Access

yes

Recommended Citation

Hu, Yan; Chen, Qingyu; Du, Jingcheng; et al., "Improving Large Language Models for Clinical Named Entity Recognition via Prompt Engineering" (2024). Faculty, Staff and Student Publications. 197.
https://digitalcommons.library.tmc.edu/uthshis_docs/197

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Data Science Commons
