Faculty, Staff and Student Publications

Med-BERT: Pretrained Contextualized Embeddings On Large-Scale Structured Electronic Health Records For Disease Prediction

Language

English

Publication Date

5-20-2021

Journal

NPJ Digital Medicine

DOI

10.1038/s41746-021-00455-y

PMID

34017034

PMCID

PMC8137882

PubMedCentral® Posted Date

May 2021

PubMedCentral® Full Text Version

Post-print

Abstract

Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous successes in the natural language processing domain. The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves the prediction accuracy, boosting the area under the receiver operating characteristics curve (AUC) by 1.21-6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performances on tasks with small fine-tuning training sets and can boost the AUC by more than 20% or obtain an AUC as high as a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare.

Published Open-Access

yes

Recommended Citation

Rasmy, Laila; Xiang, Yang; Xie, Ziqian; et al., "Med-BERT: Pretrained Contextualized Embeddings On Large-Scale Structured Electronic Health Records For Disease Prediction" (2021). Faculty, Staff and Student Publications. 95.
https://digitalcommons.library.tmc.edu/uthshis_docs/95

Included in

Bioinformatics Commons, Biomedical Informatics Commons, Data Science Commons
