Faculty, Staff and Student Publications

Language

English

Publication Date

10-1-2025

Journal

JAMIA Open

DOI

10.1093/jamiaopen/ooaf101

PMID

41127255

PMCID

PMC12539179

PubMedCentral® Posted Date

10-21-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Objective: The Bridge2AI program is establishing rules of practice for creating ethically sourced health data repositories to support the effective use of ML/AI in biomedical and behavioral research. Given the initially undefined nature of ethically sourced data, this work concurrently developed definitions and guidelines alongside repository creation, grounded in a practical, operational framework.

Materials and methods: A Value Sensitive Design (VSD) approach was used to explore ethical tensions across stages of health data repository development. The conceptual investigation drew from supply chain management (SCM) processes to (1) identify actors who would interact with or be affected by the data repository use and outcomes; (2) determine what values to consider (ie, traceability accountability, security); and (3) analyze and document value trade-offs (ie, balancing risks of harm to improvements in healthcare). This SCM framework provides operational guidance for managing complex, multi-source data flows with embedded bias mitigation strategies.

Results: This conceptual investigation identified the actors, values, and tensions that influence ethical sourcing when creating a health data repository. The SCM steps provide a scaffolding to support ethical sourcing across the pre-model stages of health data repository development. Ethical sourcing includes documenting data provenance, articulating expectations for experts, and practices for ensuring data privacy, equity, and public benefit. Challenges include risks of ethics washing and highlight the need for transparent, value-driven practices.

Discussion: Integrating VSD with SCM frameworks enables operationalization of ethical values, improving data integrity, mitigating biases, and enhancing trust. This approach highlights how foundational decisions influence repository quality and AI/ML system usability, addressing provenance, traceability, redundancy, and risk management central to ethical data sourcing.

Conclusion: To create authentic, impactful health data repositories that serve public health goals, organizations must prioritize transparency, accountability, and operational frameworks like SCM that comprehensively address the complexities and risks inherent in data stewardship.

Keywords

artificial intelligence, machine learning, health data repository, ethically sourced, research ethics

Published Open-Access

yes

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.