
Faculty, Staff and Student Publications
Publication Date
5-17-2025
Journal
npj Digital Medicine
Abstract
Detailed information on social determinants of health (SDoH) is often buried within clinical text in electronic health records (EHRs). Most current natural language processing (NLP) efforts for SDoH are limited: they investigate a limited set of factors, derive data from a single institution, and use specific patient cohorts or note types, with reduced focus on generalizability. We aim to address these issues by creating cross-institutional corpora and by developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to other models, the instruction-tuned LLM achieved top performance, with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. Trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.
DOI
10.1038/s41746-025-01645-8
PMID
40379919
PMCID
PMC12084648
PubMedCentral® Posted Date
5-17-2025
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Included in
Medical Sciences Commons, Mental and Social Health Commons, Psychiatry Commons, Psychiatry and Psychology Commons