Dissertations & Theses (Open Access)

Graduation Date

12-2008

Degree Name

Doctor of Philosophy (PhD)

Advisory Committee

Elmer V. Bernstam, MD, MSE, MS (chair); Jack W. Smith, MD, PhD; M. Sriram Iyengar, PhD; Devika Subramanian, PhD

Abstract

The biomedical literature is extensively catalogued and indexed in MEDLINE. MEDLINE indexing is performed by trained human indexers who identify the most important concepts in each article; this process is expensive and inconsistent. Automating the indexing task is difficult: the National Library of Medicine produces the Medical Text Indexer (MTI), which suggests candidate indexing terms to the indexers, but MTI's output is not good enough to be used unattended. In my thesis, I propose a different approach to the indexing task, called MEDRank. MEDRank builds graphs representing the concepts in a biomedical article and their relationships within the text, and applies graph-based ranking algorithms to identify the most important concepts in each article. I evaluate the performance of several automated indexing solutions, including my own, by comparing their output to the indexing terms selected by human indexers. MEDRank outperformed all other evaluated indexing solutions, including MTI, in general indexing performance and precision. MEDRank can be used to cluster documents or to index any kind of biomedical text with standard vocabularies, and it could become part of MTI itself.
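The abstract describes the general pipeline (build a concept graph, rank its nodes, keep the top concepts, compare them with human indexing) without specifying how edges are defined or which ranking algorithm is used. The sketch below illustrates that general idea under stated assumptions: concepts are already extracted per sentence, an edge joins two concepts that co-occur in a sentence (weighted by count), and a weighted PageRank-style power iteration stands in for the ranking step. All function names and the toy data are hypothetical; this is not MEDRank's actual implementation.

```python
def build_concept_graph(sentences):
    """Hypothetical graph construction: one node per concept, and an undirected
    edge between two concepts weighted by how many sentences contain both."""
    graph = {}
    for concepts in sentences:
        unique = sorted(set(concepts))
        for concept in unique:
            graph.setdefault(concept, {})  # keep isolated concepts as nodes
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                graph[a][b] = graph[a].get(b, 0.0) + 1.0
                graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (an assumed choice of ranking
    algorithm). Rank held by nodes without edges is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u, weight in graph[v].items():
                # v passes rank to each neighbour in proportion to edge weight
                new_rank[u] += damping * rank[v] * weight / strength[v]
        rank = new_rank
    return rank


def top_concepts(rank, k):
    """The k highest-ranked concepts, proposed as indexing terms."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


def precision_recall(predicted, gold):
    """Compare proposed terms with the terms chosen by human indexers."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy input: concepts already extracted from each sentence (made-up data).
    sentences = [
        ["Indexing", "MEDLINE", "Humans"],
        ["Indexing", "MEDLINE"],
        ["Indexing", "Algorithms"],
        ["Natural Language Processing"],
    ]
    ranks = weighted_pagerank(build_concept_graph(sentences))
    proposed = top_concepts(ranks, 2)
    human_terms = ["MEDLINE", "Indexing", "Natural Language Processing"]
    print(proposed)                                 # ['Indexing', 'MEDLINE']
    print(precision_recall(proposed, human_terms))  # (1.0, 0.6666666666666666)
```

Here the human terms are invented for the example; in the evaluation the abstract describes, the reference set would be the indexing terms actually assigned by MEDLINE indexers. Sentence-level co-occurrence is only the simplest way to relate concepts, chosen here to keep the sketch short.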

Herskovic PhD Dissertation.pdf (1750 kB)

Keywords

MEDRank; Medical Text Indexer; Abstracting and Indexing as Topic/methods; Information Storage and Retrieval/methods
