Faculty, Staff and Student Publications

Language

English

Publication Date

11-18-2025

Journal

BMC Bioinformatics

DOI

10.1186/s12859-025-06302-1

PMID

41254570

PMCID

PMC12625376

PubMedCentral® Posted Date

11-18-2025

PubMedCentral® Full Text Version

Post-print

Abstract

BACKGROUND: Innovations in protein engineering offer promising ways to redesign allergenic proteins so that they cause fewer adverse reactions in sensitive individuals. Earlier allergenicity prediction models relied on physicochemical properties and sequence homology to assess potential risk. To better understand the sequence features of allergenic proteins, however, a novel sequence-based deep learning model for predicting allergenicity is needed.

RESULTS: We present AllergenAI, a novel AI-based tool that quantifies the allergenic potential of a protein from its sequence alone, without using any other known features. We trained a convolutional neural network (CNN) on allergenic protein sequences archived in three well-established databases (SDAP 2.0, COMPARE, and AlgPred 2) and assessed its prediction performance by cross-validation. We then used AllergenAI to identify novel candidate proteins of the cupin family in date palm, spinach, maize, and red clover with high allergenicity scores, which might cause adverse allergic reactions in sensitive individuals. By analyzing the feature importance scores (FIS) of vicilins, we identified a proline-alanine-rich (P-A) motif within the regions ranked in the top 50% by FIS; this motif overlapped with known IgE epitope regions of vicilin allergens. Finally, in a pilot study, we used the approximately 1600 allergen structures in our SDAP database to show the potential of incorporating 3D information into a CNN model, which slightly improved prediction quality.
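The abstract does not state how per-residue feature importance scores are computed or how their overlap with epitopes is measured. The Python sketch below shows one plausible way to select the residues ranked in the top 50% by FIS and compute what fraction of annotated IgE epitope residues fall inside them. The function names, the 0-based end-exclusive interval convention, the tie-breaking rule, and the overlap metric are all assumptions for illustration, not AllergenAI's actual implementation.

```python
import math
from typing import Iterable, List, Set, Tuple


def top_fraction_positions(scores: List[float], fraction: float = 0.5) -> Set[int]:
    """Return indices of the highest-scoring residues (top `fraction` of positions).

    Assumption: ties are broken by sequence position, and the count is rounded up.
    """
    k = math.ceil(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(ranked[:k])


def epitope_positions(epitopes: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand hypothetical 0-based, end-exclusive (start, end) epitope intervals into residue indices."""
    positions: Set[int] = set()
    for start, end in epitopes:
        positions.update(range(start, end))
    return positions


def overlap_fraction(
    scores: List[float],
    epitopes: Iterable[Tuple[int, int]],
    fraction: float = 0.5,
) -> float:
    """Fraction of epitope residues that lie in the top-scoring region (0.0 if no epitopes)."""
    epi = epitope_positions(epitopes)
    if not epi:
        return 0.0
    top = top_fraction_positions(scores, fraction)
    return len(epi & top) / len(epi)


if __name__ == "__main__":
    # Toy per-residue scores for a 6-residue sequence and one epitope covering residues 1-2.
    fis = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
    print(overlap_fraction(fis, [(1, 3)]))  # 1.0: both epitope residues are in the top half
```

Here the top half of the toy sequence is residues {1, 2, 4}, so an epitope spanning residues 1 and 2 is fully covered. A real analysis would also need to decide how to aggregate scores from a sliding-window CNN back to individual residues, which this sketch does not attempt.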

CONCLUSION: Through the development of AllergenAI, our allergenicity prediction study provides a foundation for identifying the critical features that distinguish allergenic proteins.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-025-06302-1.

Keywords

Allergenic proteins, Novel vicilin allergen analogs, Deep learning, 3D structure, CNN

Published Open-Access

yes
