Faculty, Staff and Student Publications

Generalized and Transferable Patient Language Representation For Phenotyping With Limited Data

Yuqi Si, University of Texas Health Science Center at Houston, School of Health Information Sciences, Houston TX, USA
Elmer V Bernstam
Kirk Roberts

Language

English

Publication Date

4-1-2021

Journal

Journal of Biomedical Informatics

DOI

10.1016/j.jbi.2021.103726

PMID

33711541

PMCID

PMC11577729

PubMedCentral® Posted Date

November 2024

PubMedCentral® Full Text Version

Author MSS

Abstract

The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most important, the multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robust. All these results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.

Keywords

Humans, Language, Natural Language Processing

Published Open-Access

yes

Recommended Citation

Yuqi Si, Elmer V Bernstam, and Kirk Roberts, "Generalized and Transferable Patient Language Representation For Phenotyping With Limited Data" (2021). Faculty, Staff and Student Publications. 103.
https://digitalcommons.library.tmc.edu/uthshis_docs/103

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Data Science Commons
