
Dissertations & Theses (Open Access)
Graduation Date
Fall 2024
Degree Name
Doctor of Philosophy (PhD)
School Name
McWilliams School of Biomedical Informatics at UTHealth Houston
Advisory Committee
Licong Cui, PhD
Abstract
This dissertation investigates adverse events (AEs) associated with COVID-19 vaccination through a multi-faceted approach that draws on structured and unstructured data. The study is organized into three specific aims, each contributing to a comprehensive understanding of vaccine safety. Aim 1 focuses on the temporal and spatial analysis of AEs reported in the Vaccine Adverse Event Reporting System (VAERS). We performed a detailed temporal analysis to detect patterns over time, revealing a significant increase in reported AEs shortly after the vaccine rollout. Spatial analysis highlighted regional variations in AE reporting, with higher frequencies observed in the central and northern regions of the United States than in other areas. Statistical tests, including zero-truncated Poisson regression, logistic regression, the Spearman rank correlation coefficient, and linear regression, were used to assess the significance of the findings. Aim 2 involves extracting AE-related information from unstructured text in VAERS reports and on social media platforms (Twitter and Reddit). For Named Entity Recognition (NER), we evaluated several models, including BERT, LSTM, Llama 2, GPT-3.5, and GPT-4. We also fine-tuned GPT-3.5 to further improve its performance. To achieve the best performance, we used model ensembles that combine the outputs of these models. The fine-tuned GPT-3.5 model achieved a strict F1 score of 0.716, and the ensemble raised the strict F1 score to 0.903. For relation extraction, we fine-tuned a large language model, GPT-3.5, which initially achieved a precision of 0.86, a recall of 0.41, and an F1 score of 0.55. After applying post-processing rules, the model's performance improved substantially, reaching an F1 score of 0.97. Aim 3 is dedicated to developing RefAI, a tool that combines the language understanding and summarization capabilities of GPT with a novel literature ranking algorithm.
RefAI recommends and summarizes biomedical literature. We tested it on the topic of adverse events following COVID-19 vaccination, where its literature recommendations achieved a relevance score of 3.65 ± 1.06 and a quality score of 3.70 ± 1.19. For literature summarization, we evaluated the tool on accuracy, comprehensiveness, and reference integration, with scores of 4.50 ± 0.50 for accuracy, 4.00 ± 0.00 for comprehensiveness, and a perfect 5.00 ± 0.00 for reference integration. We also developed a web-based platform that integrates results from VAERS, social media, and biomedical literature on COVID-19 vaccine-related adverse events. The platform offers interactive visualizations that let users explore and analyze key findings intuitively. Overall, this dissertation provides valuable insights into the safety profile of COVID-19 vaccines by combining diverse data sources with advanced language models, natural language processing techniques, and statistical analysis. The findings contribute to a deeper understanding of vaccine-associated AEs and support informed decision-making in public health. RefAI and the visualization platform make the data more accessible, promoting transparency and engagement with the research community and the general public.
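As a quick check on the relation extraction figures quoted in the abstract, recall that F1 is the harmonic mean of precision and recall. The minimal Python sketch below applies that standard definition to the rounded precision (0.86) and recall (0.41) given for the fine-tuned GPT-3.5 model before post-processing. The function name `f1_score` is illustrative and does not come from the dissertation. Because the inputs are rounded to two decimals, the result only approximates the reported 0.55.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for relation extraction
# with fine-tuned GPT-3.5, before post-processing rules were applied.
initial_f1 = f1_score(0.86, 0.41)
print(f"{initial_f1:.3f}")
```

The low recall dominates the harmonic mean, which is why the initial F1 sits much closer to 0.41 than to 0.86.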
Recommended Citation
Li, Yiming, "Comprehensive Analysis of Adverse Events Following COVID-19 Vaccination: Insights from Diverse Data Sources" (2024). Dissertations & Theses (Open Access). 66.
https://digitalcommons.library.tmc.edu/uthshis_dissertations/66
Keywords
COVID-19, Vaccine Adverse Event Reporting System (VAERS), natural language processing, large language models, RefAI, spatial analysis, temporal analysis