
Faculty, Staff and Student Publications
Publication Date
3-10-2025
Journal
Scientific Reports
Abstract
The performance of deep learning-based natural language processing systems depends on large amounts of labeled training data, which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails that of traditional supervised methods trained on moderate amounts of gold-standard data. Moreover, inference with LLMs is computationally expensive. We propose an approach that combines LLM fine-tuning with weak supervision, requires virtually no domain knowledge, and still achieves consistently dominant performance. Using a prompt-based approach, the LLM generates weakly labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold-standard data. We evaluate this approach using Llama2 on three different i2b2/n2c2 datasets for clinical named entity recognition. With no more than 10 gold-standard notes, our final BERT models, weakly supervised by fine-tuned Llama2-13B, consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 scores. With only 50 gold-standard notes, our models achieved performance close to that of fully fine-tuned systems.
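The two-stage pipeline described in the abstract can be illustrated with a minimal sketch using the Hugging Face transformers and datasets libraries: an LLM is prompted to weakly label clinical notes for NER, a PubMedBERT token classifier is trained on those weak labels, and the model is then further fine-tuned on a handful of gold-standard notes. The checkpoint names, the prompt wording, the PROBLEM label set, and the helper functions (generate_weak_labels, encode, train) are illustrative assumptions, not the authors' released code.

"""
Sketch of the weak-supervision pipeline from the abstract, under the
assumptions stated above: (1) prompt an LLM to weakly label notes,
(2) train PubMedBERT on the weak labels, (3) fine-tune on gold notes.
"""
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments, pipeline)

LABELS = ["O", "B-PROBLEM", "I-PROBLEM"]          # assumed label set
LABEL2ID = {l: i for i, l in enumerate(LABELS)}

# ---- Stage 1: weak labeling with a Llama2 model (checkpoint name assumed) ----
def generate_weak_labels(notes, llm_name="meta-llama/Llama-2-13b-hf"):
    """Prompt the LLM to list problem entities, then mark matching words as BIO tags."""
    llm = pipeline("text-generation", model=llm_name)
    weak_data = []
    for note in notes:
        prompt = f"List the clinical problems mentioned in this note:\n{note}\nProblems:"
        answer = llm(prompt, max_new_tokens=128)[0]["generated_text"][len(prompt):]
        entities = [e.strip().lower() for e in answer.split(",") if e.strip()]
        words = note.split()
        tags = ["O"] * len(words)
        for i, w in enumerate(words):                 # crude string matching for the sketch
            if any(w.lower().strip(".,") in e for e in entities):
                tags[i] = "B-PROBLEM" if (i == 0 or tags[i - 1] == "O") else "I-PROBLEM"
        weak_data.append({"words": words, "tags": tags})
    return weak_data

# ---- Stages 2 and 3: train PubMedBERT on weak labels, then on gold notes ----
BERT_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(BERT_NAME)

def encode(example):
    """Align word-level BIO tags with PubMedBERT subword tokens."""
    enc = tok(example["words"], is_split_into_words=True, truncation=True)
    labels, prev = [], None
    for wid in enc.word_ids():
        if wid is None:
            labels.append(-100)                       # ignore special tokens
        elif wid != prev:
            labels.append(LABEL2ID[example["tags"][wid]])
        else:
            labels.append(-100)                       # ignore non-first subwords
        prev = wid
    enc["labels"] = labels
    return enc

def train(model, data, output_dir, epochs):
    """Run one training stage (weak or gold) with the same recipe."""
    ds = Dataset.from_list(data).map(encode, remove_columns=["words", "tags"])
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                               per_device_train_batch_size=8),
        train_dataset=ds,
        data_collator=DataCollatorForTokenClassification(tok),
    )
    trainer.train()
    return model

if __name__ == "__main__":
    notes = ["Patient presents with chest pain and shortness of breath."]
    gold = [{"words": notes[0].split(),
             "tags": ["O", "O", "O", "B-PROBLEM", "I-PROBLEM", "O",
                      "B-PROBLEM", "I-PROBLEM", "I-PROBLEM"]}]
    model = AutoModelForTokenClassification.from_pretrained(BERT_NAME, num_labels=len(LABELS))
    model = train(model, generate_weak_labels(notes), "weak_stage", epochs=3)
    model = train(model, gold, "gold_stage", epochs=10)   # small gold-standard set

The sketch keeps the division of labor from the abstract: the LLM is only used offline to produce weak labels, so the deployed model is the much cheaper BERT classifier; the label set, prompts, and matching heuristics would need to follow the actual i2b2/n2c2 annotation guidelines in practice.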
Keywords
Humans, Deep Learning, Natural language processing, Large language models, Electronic health records, Weak supervision, Translational research, Machine learning, Computer science
DOI
10.1038/s41598-024-68168-2
PMID
40064991
PMCID
PMC11893743
PubMedCentral® Posted Date
3-10-2025
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Computer Sciences Commons, Data Science Commons, Translational Medical Research Commons