Journal Articles

Journal of Biomedical Informatics


The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained on different but related high-prevalence phenotypes and then fine-tuned on downstream target tasks. Our main contribution concerns the impact this technique can have on low-prevalence phenotypes, a challenging setting given the dearth of training data. We validate the representations learned during pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find that multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most importantly, the multi-task pre-trained model is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robustness. These results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.
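To make the architecture described in the abstract concrete, below is a minimal sketch (not the authors' code) of multi-task pre-training with a shared text encoder and one binary classification head per high-prevalence phenotype, followed by fine-tuning that reuses the pre-trained encoder with a fresh head for a low-prevalence target phenotype. The bag-of-embeddings encoder, all dimensions, the number of tasks, and the synthetic data are illustrative assumptions, not details taken from the paper.

# A minimal sketch of multi-task pre-training then fine-tuning.
# Encoder design, sizes, and data here are assumptions for illustration.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy patient-note encoder: mean-pooled token embeddings + MLP."""
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())

    def forward(self, token_ids):
        return self.mlp(self.embed(token_ids))

class MultiTaskModel(nn.Module):
    """Shared representation with one head per pre-training phenotype."""
    def __init__(self, encoder, num_pretrain_tasks, hidden_dim=256):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_pretrain_tasks)]
        )

    def forward(self, token_ids):
        h = self.encoder(token_ids)
        # One logit per phenotype task, stacked: (batch, num_tasks)
        return torch.cat([head(h) for head in self.heads], dim=1)

# --- Multi-task pre-training on high-prevalence phenotypes (synthetic data) ---
torch.manual_seed(0)
num_tasks = 10                              # e.g. 10 high-prevalence phenotypes
model = MultiTaskModel(SharedEncoder(), num_tasks)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

notes = torch.randint(0, 5000, (32, 50))    # 32 patients x 50 tokens each
labels = torch.randint(0, 2, (32, num_tasks)).float()
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(notes), labels)    # joint loss over all tasks
    loss.backward()
    opt.step()

# --- Fine-tuning: reuse the pre-trained encoder, new head for the target ---
target_head = nn.Linear(256, 1)             # one low-prevalence phenotype
ft_params = list(model.encoder.parameters()) + list(target_head.parameters())
ft_opt = torch.optim.Adam(ft_params, lr=1e-4)
target_labels = torch.randint(0, 2, (32, 1)).float()
for _ in range(5):
    ft_opt.zero_grad()
    loss = loss_fn(target_head(model.encoder(notes)), target_labels)
    loss.backward()
    ft_opt.step()

The key design point this sketch tries to capture is that the encoder parameters are shared across all pre-training phenotypes, so the patient representation is shaped by many related signals before it is transferred to a data-poor target task.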


Humans, Language, Natural Language Processing


