
Faculty, Staff and Student Publications
Publication Date
4-14-2025
Journal
npj Digital Medicine
Abstract
Seizure frequency is essential for evaluating epilepsy treatment, ensuring patient safety, and reducing the risk of Sudden Unexpected Death in Epilepsy. Because this information is often recorded in clinical narratives, this study presents an approach to extracting structured seizure frequency details from such unstructured text. We investigated two tasks: (1) extracting phrases describing seizure frequency, and (2) extracting seizure frequency attributes. For both tasks, we fine-tuned three BERT-based models (bert-large-cased, biobert-large-cased, and Bio_ClinicalBERT), as well as three generative large language models (GPT-4, GPT-3.5 Turbo, and Llama-2-70b-hf). The final structured output integrated the results from both tasks. GPT-4 attained the best performance across all tasks, with precision, recall, and F1-score of 86.61%, 85.04%, and 85.79%, respectively, for frequency phrase extraction; 90.23%, 93.51%, and 91.84% for seizure frequency attribute extraction; and 86.64%, 85.06%, and 85.82% for the final structured output. These findings highlight the potential of fine-tuned generative models in extractive tasks from limited text strings.
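For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how precision, recall, and F1-score can be computed for an extraction task under exact-match scoring. The function name `precision_recall_f1` and the example phrases are hypothetical illustrations; the abstract does not state the matching criterion or averaging scheme the study used, so this should not be read as the authors' evaluation code. When scores are averaged over several runs or classes, a reported F1 need not equal the harmonic mean of the reported precision and recall.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two collections of extracted items.

    Assumption: an extracted item counts as correct only if it matches a
    gold item exactly. The study's own matching rule may differ.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical seizure frequency phrases, invented for illustration only.
gold_phrases = {
    "two seizures per week",
    "one seizure in the last month",
    "seizure free for 2 years",
}
predicted_phrases = {
    "two seizures per week",
    "seizure free for 2 years",
    "daily auras",
}

p, r, f = precision_recall_f1(predicted_phrases, gold_phrases)
print(f"precision={p:.2%} recall={r:.2%} f1={f:.2%}")
```

Here two of the three predicted phrases match the gold annotations, and two of the three gold phrases are recovered, so precision, recall, and F1 all come out to two thirds.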
Keywords
Predictive markers, Epilepsy, Risk factors
DOI
10.1038/s41746-025-01592-4
PMID
40229513
PMCID
PMC11997153
PubMedCentral® Posted Date
4-14-2025
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes