Doctor of Philosophy (PhD)
Elmer V. Bernstam, MD, MSE, MS (chair); Jack W. Smith, MD, PhD; M. Sriram Iyengar, PhD; Devika Subramanian, PhD
The biomedical literature is extensively catalogued and indexed in MEDLINE. Indexing is performed by trained human indexers, who identify the most important concepts in each article; the process is expensive and its results are inconsistent. Automating the indexing task is difficult: the National Library of Medicine produces the Medical Text Indexer (MTI), which suggests candidate indexing terms to the indexers, but MTI's output is not good enough for it to work unattended. In my thesis, I propose a different approach to the indexing task, called MEDRank. MEDRank creates graphs representing the concepts in biomedical articles and their relationships within the text, and applies graph-based ranking algorithms to identify the most important concepts in each article. I evaluate the performance of several automated indexing solutions, including my own, by comparing their output to the indexing terms selected by the human indexers. MEDRank outperformed all other evaluated indexing solutions, including MTI, in general indexing performance and precision. MEDRank can be used to cluster documents or to index any kind of biomedical text with standard vocabularies, and it could become part of MTI itself.
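As a rough illustration of the idea described in the abstract, the sketch below builds a concept graph and ranks its nodes. It is not the dissertation's implementation: the abstract does not say how MEDRank builds its graphs or which ranking algorithms it applies. The sketch assumes that concepts appearing in the same sentence are linked, uses a plain PageRank iteration as the ranking step, and scores the result against human-assigned terms with precision and recall. The function names `build_graph`, `pagerank`, `top_concepts`, and `precision_recall` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(sentences):
    """Link concepts that occur in the same sentence (assumed relationship).

    `sentences` is a list of lists of concept names. The result is an
    undirected weighted graph stored as a dict of dicts.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for concepts in sentences:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration (an assumed ranking algorithm)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalize contributions.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[neighbor] * weight / out_weight[neighbor]
                for neighbor, weight in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def top_concepts(rank, k):
    """Return the k highest-ranked concepts, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


def precision_recall(predicted, gold):
    """Compare suggested terms with the terms chosen by human indexers."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

In this sketch, a concept that co-occurs with many others accumulates rank from its neighbors, which is one simple way to approximate "most important concept in the article". Mapping free text to concepts from a standard vocabulary is a separate step that is not shown here.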
Herskovic, Jorge R., "UNSUPERVISED INDEXING OF MEDLINE ARTICLES THROUGH GRAPH-BASED RANKING" (2008). Dissertations (Open Access). 11.
MEDRank; Medical Text Indexer; Abstracting and Indexing as Topic/methods; Information Storage and Retrieval/methods