Faculty, Staff and Student Publications

Language

English

Publication Date

3-19-2025

Journal

Journal of Medical Internet Research

DOI

10.2196/67677

PMID

40106799

PMCID

PMC11966073

PubMedCentral® Posted Date

3-19-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Background: Dietary supplements (DSs) are widely used to improve health and nutrition, but challenges related to misinformation, safety, and efficacy persist due to less stringent regulations compared with pharmaceuticals. Accurate and reliable DS information is critical for both consumers and health care providers to make informed decisions.

Objective: This study aimed to enhance DS-related question answering by integrating an advanced retrieval-augmented generation (RAG) system with the integrated Dietary Supplement Knowledgebase 2.0 (iDISK2.0), a dietary supplement knowledge base, to improve accuracy and reliability.

Methods: We developed iDISK2.0 by integrating updated data from authoritative sources, including the Natural Medicines Comprehensive Database, the Memorial Sloan Kettering Cancer Center database, Dietary Supplement Label Database, and Licensed Natural Health Products Database, and applied advanced data cleaning and standardization techniques to reduce noise. The RAG system combined the retrieval power of a biomedical knowledge graph with the generative capabilities of large language models (LLMs) to address limitations of stand-alone LLMs, such as hallucination. The system retrieves contextually relevant subgraphs from iDISK2.0 based on user queries, enabling accurate and evidence-based responses through a user-friendly interface. We evaluated the system using true-or-false and multiple-choice questions derived from the Memorial Sloan Kettering Cancer Center database and compared its performance with stand-alone LLMs.

Results: iDISK2.0 integrates 174,317 entities across 7 categories, including 8091 dietary supplement ingredients; 163,806 dietary supplement products; 786 diseases; and 625 drugs, along with 6 types of relationships. The RAG system achieved an accuracy of 99% (990/1000) for true-or-false questions on DS effectiveness and 95% (948/100) for multiple-choice questions on DS-drug interactions, substantially outperforming stand-alone LLMs like GPT-4o (OpenAI), which scored 62% (618/1000) and 52% (517/1000) on these respective tasks. The user interface enabled efficient interaction, supporting free-form text input and providing accurate responses. Integration strategies minimized data noise, ensuring access to up-to-date, DS-related information.

Conclusions: By integrating a robust knowledge graph with RAG and LLM technologies, iDISK2.0 addresses the critical limitations of stand-alone LLMs in DS information retrieval. This study highlights the importance of combining structured data with advanced artificial intelligence methods to improve accuracy and reduce misinformation in health care applications. Future work includes extending the framework to broader biomedical domains and improving evaluation with real-world, open-ended queries.

Keywords

Dietary Supplements, Humans, Information Storage and Retrieval, Knowledge Bases, Databases, Factual, Large Language Models, dietary supplements, knowledge representation, knowledge graph, retrieval-augmented generation, large language model, user interface

Published Open-Access

yes

Included in

Public Health Commons

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.