Faculty, Staff and Student Publications
Language
English
Publication Date
12-1-2025
Journal
JAMIA Open
DOI
10.1093/jamiaopen/ooaf134
PMID
41334246
PMCID
PMC12668681
PubMedCentral® Posted Date
12-1-2025
PubMedCentral® Full Text Version
Post-print
Abstract
Objectives: The NIH's Bridge2AI Program has funded 4 "new flagship biomedical and behavioral datasets that are properly documented and ready for use with AI [artificial intelligence] or ML [machine learning] technologies" to promote the adoption of AI. This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use.
Materials and methods: We outline major steps involved in creating and using these datasets in ethically acceptable ways, including (1) data selection: what data are being selected and why, (2) increasing attention to public concerns, (3) the role of participant consent depending on data source, (4) ensuring responsible use, (5) where and how data are stored, (6) what control participants have over data sharing, (7) data access, and (8) data download.
Results: We discuss ethical, legal, social, and practical challenges raised at each step of creating AI-ready datasets, noting the importance of addressing issues of future data storage and use. We identify some of the many choices that these projects have made, including how to incorporate public input, where to store data, and how to define criteria for accessing and downloading data.
Discussion: The processes involved in the establishment and governance of the Bridge2AI datasets vary widely but have common elements, suggesting opportunities for future programs to draw on Bridge2AI strategies.
Conclusions: This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use, particularly as confronted by the 4 distinct projects funded by this program.
Keywords
governance, data privacy, informed consent, data access
Published Open-Access
yes
Recommended Citation
Clayton, Ellen Wright; Rose, Susannah; Nebeker, Camille; et al., "Biomedical Data Repositories Require Governance for Artificial Intelligence/Machine Learning Applications at Every Step" (2025). Faculty, Staff and Student Publications. 735.
https://digitalcommons.library.tmc.edu/uthshis_docs/735