Faculty, Staff and Student Publications

Using Citation Data To Improve Retrieval From Medline

Authors

Elmer V Bernstam, School of Health Information Sciences, The University of Texas Health Science Center at Houston, Houston, TX
Jorge R Herskovic, School of Health Information Sciences, The University of Texas Health Science Center at Houston, Houston, TX
Yindalon Aphinyanaphongs, Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
Constantin F Aliferis, Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
Madurai G Sriram, School of Health Information Sciences, The University of Texas Health Science Center at Houston, Houston, TX
William R Hersh, Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR

Publication Date

1-1-2006

Journal

Journal of the American Medical Informatics Association

Abstract

OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
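The two most effective strategies named in the abstract, simple citation count and PageRank, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy citation graph, the function names (`citation_counts`, `pagerank`, `important_in_top_k`), the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. The last function mirrors the abstract's evaluation measure, the number of important articles among the first k results.

```python
from collections import defaultdict


def citation_counts(citations):
    """Count incoming citations per article.

    `citations` maps each citing article to the list of articles it cites.
    """
    counts = defaultdict(int)
    for citing, cited_list in citations.items():
        counts[citing] += 0  # make sure uncited articles still appear
        for cited in cited_list:
            counts[cited] += 1
    return dict(counts)


def pagerank(citations, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a citation graph.

    Articles that cite nothing (dangling nodes) spread their score
    evenly over all articles, so the scores always sum to 1.
    """
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def important_in_top_k(scores, important, k):
    """Number of important articles among the k highest-scoring results."""
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return sum(1 for a in ranked[:k] if a in important)


# Hypothetical citation graph: A, B and E cite C; C and E cite D.
toy_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["C", "D"]}
toy_counts = citation_counts(toy_citations)
toy_ranks = pagerank(toy_citations)
```

On this toy graph the two scores disagree about the top article: C has the most incoming citations, while D ranks highest under PageRank because it is cited by the well-cited C. Both scores depend on citations having accumulated, which is why the abstract discusses citation lag.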

Keywords

Algorithms, Artificial Intelligence, Bibliometrics, Evidence-Based Medicine, Information Storage and Retrieval, Internet, MEDLINE, PubMed

DOI

10.1197/jamia.M1909

PMID

16221938

PMCID

PMC1380202

PubMedCentral® Posted Date

January 2006

PubMedCentral® Full Text Version

Post-Print

Published Open-Access

yes

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Data Science Commons
