Faculty, Staff and Student Publications

Language

English

Publication Date

1-1-2024

Journal

AMIA Annual Symposium Proceedings

PMID

40417476

PMCID

PMC12099396

PubMedCentral® Posted Date

5-22-2025

PubMedCentral® Full Text Version

Post-print

Abstract

Genomic research is becoming increasingly data-intensive, yet properly referencing data remains a persistent challenge. Despite various efforts to establish and standardize data citation practices, scientists frequently fail to reference data accurately in their papers. This deficiency complicates the attribution of contributions to data providers and impedes the reproducibility of findings in genomic research. This study addresses this gap by introducing a gold-standard corpus designed to identify mentions of genomic data sources and their associated attributes, thereby offering insights into data source availability and accessibility. Within this corpus, we categorize entities into six classes: three primary entities (Dataset, Repository, and Contributor) and three attributes (Accession Number, URL, and DOI). We also define and annotate the relations between these primary entities and attributes. We analyze the corpus by assessing inter-annotator agreement and implementing an information extraction pipeline using BERT-based models. Our BERT-based models achieve a best F1 score of 0.94 in recognizing mentions of genomic data sources and 0.76 in extracting relations between these mentions and their associated attributes. By introducing this genomic data source mention corpus, we aim to advance data sharing and reuse in future genomic research.

Keywords

Genomics; Information Storage and Retrieval; Data Mining; Humans; Databases, Genetic; Information Sources

Published Open-Access

yes
