Student and Faculty Publications

Publication Date

1-19-2023

Journal

Briefings in Bioinformatics

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has catalyzed the rapid development of mRNA vaccines, but optimizing the mRNA sequence of an exogenous gene, such as the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike gene, for expression in human cells remains a critical challenge. We developed a new algorithm, iDRO (integrated deep-learning-based mRNA optimization), that optimizes multiple components of an mRNA sequence from the given amino acid sequence of the target protein. To respect biological constraints, we divided iDRO into two steps: open reading frame (ORF) optimization, and generation of the 5' and 3' untranslated regions (UTRs). In ORF optimization, a BiLSTM-CRF (bidirectional long short-term memory network with a conditional random field) determines the codon for each amino acid. In UTR generation, RNA-Bart (a bidirectional auto-regressive transformer) outputs the corresponding UTRs. The results show that the optimized sequences of exogenous genes acquire the patterns of human endogenous gene sequences. In experimental validation, the mRNA sequence optimized by our method shows higher protein expression than the sequence produced by a conventional method. To the best of our knowledge, this is the first study to introduce deep-learning methods into integrated mRNA sequence optimization, and these results may contribute to the development of mRNA therapeutics.
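
The ORF step described in the abstract can be read as a sequence-labeling problem: each amino acid is a position, and the label is one of its synonymous codons. The Python sketch below illustrates only that framing. It is not iDRO: the names `SYNONYMOUS_CODONS`, `gc_count`, `optimize_orf`, and `translate` are hypothetical, the codon table is a small subset of the standard genetic code, and the GC-count scorer is an arbitrary stand-in for a trained BiLSTM-CRF. Scoring each position independently also ignores the dependencies between neighboring codons that a CRF layer would model.

```python
# Subset of the standard genetic code (RNA alphabet), keyed by one-letter amino acid.
# "*" denotes a stop codon. A full table would cover all 20 amino acids.
SYNONYMOUS_CODONS = {
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "*": ["UAA", "UAG", "UGA"],
}


def gc_count(codon):
    """Placeholder scorer (an assumption, not the paper's model): count G and C bases."""
    return sum(base in "GC" for base in codon)


def optimize_orf(protein, score=gc_count):
    """Choose, per amino acid, the synonymous codon with the highest score.

    Ties resolve to the first candidate listed in SYNONYMOUS_CODONS.
    """
    return "".join(max(SYNONYMOUS_CODONS[aa], key=score) for aa in protein)


def translate(orf):
    """Translate an ORF back to amino acids to check that the protein is unchanged."""
    codon_to_aa = {
        codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons
    }
    return "".join(codon_to_aa[orf[i:i + 3]] for i in range(0, len(orf), 3))


if __name__ == "__main__":
    protein = "MFKL*"
    orf = optimize_orf(protein)
    print(orf)
    print(translate(orf) == protein)
```

Whatever scorer is used, the check in `translate` captures the constraint that codon choice must leave the encoded amino acid sequence unchanged.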

Keywords

mRNA vaccine optimization, sequence deep learning, transformer-based model
