Faculty, Staff and Student Publications
Language
English
Publication Date
11-18-2025
Journal
BMC Bioinformatics
DOI
10.1186/s12859-025-06302-1
PMID
41254570
PMCID
PMC12625376
PubMedCentral® Posted Date
11-18-2025
PubMedCentral® Full Text Version
Post-print
Abstract
BACKGROUND: Innovations in protein engineering offer promising solutions for redesigning allergenic proteins to minimize adverse reactions in sensitive individuals. Earlier models for predicting allergenicity have relied on knowledge of physicochemical properties and sequence homology to assess potential risk. However, better understanding the sequence features of allergenic proteins calls for a novel sequence-based deep learning model for predicting allergenicity.
RESULTS: We present AllergenAI, a novel AI-based tool that quantifies the allergenic potential of a protein from its sequence alone, without using any other known features. We trained a convolutional neural network (CNN) on allergenic protein sequence data archived in three well-established databases (SDAP 2.0, COMPARE, and AlgPred 2) and assessed its prediction performance by cross-validation. We then used AllergenAI to identify novel proteins of the cupin family in date palm, spinach, maize, and red clover with high allergenicity scores, which might have adverse allergenic effects on sensitive individuals. By analyzing the feature importance scores (FIS) of vicilins, we identified a proline-alanine-rich (P-A) motif in the top 50% of FIS regions that overlapped with known IgE epitope regions of vicilin allergens. Finally, in a pilot study, we used the approximately 1600 allergen structures in our SDAP database to show the potential of incorporating 3D information into a CNN model. Prediction quality improved slightly.
CONCLUSION: Our allergenicity prediction study, through the development of AllergenAI, provides a foundation for identifying the critical features that distinguish allergenic proteins.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-025-06302-1.
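The RESULTS paragraph describes a CNN that reads only the protein sequence, followed by an analysis of the top 50% of feature-importance (FIS) regions. The following is a minimal sketch of those ideas, not the AllergenAI implementation, and it rests on several stated assumptions. Residues are one-hot encoded over the 20 standard amino acids in an assumed order. A single hand-set convolution filter stands in for the learned filters of a trained CNN. Per-residue importance scores are taken as given, because the abstract does not say how FIS are computed. The helper names one_hot, conv1d, top_half_regions, and overlaps are hypothetical.

```python
"""Hypothetical sketch of sequence-only CNN inputs and FIS region analysis.
Not the AllergenAI code; encoding, filter, and scores are illustrative assumptions."""
from typing import List, Optional, Sequence, Tuple

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[int]]:
    """Encode a protein sequence as an L x 20 one-hot matrix.
    Non-standard residues (e.g. 'X') become all-zero rows (an assumption)."""
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in _INDEX:
            row[_INDEX[aa]] = 1
        rows.append(row)
    return rows


def conv1d(x: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> List[float]:
    """'Valid' 1D cross-correlation of an L x C input with a w x C kernel:
    what one filter of a CNN convolution layer computes (no bias, no activation)."""
    w = len(kernel)
    out = []
    for start in range(len(x) - w + 1):
        total = 0.0
        for j in range(w):
            for c, weight in enumerate(kernel[j]):
                total += x[start + j][c] * weight
        out.append(total)
    return out


def top_half_regions(scores: Sequence[float]) -> List[Tuple[int, int]]:
    """Return maximal runs (0-based, inclusive) of positions ranked in the top 50%
    by score; ties are broken in favour of earlier positions."""
    n = len(scores)
    k = (n + 1) // 2  # keep ceil(n / 2) positions
    keep = set(sorted(range(n), key=lambda i: -scores[i])[:k])
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n):
        if i in keep and start is None:
            start = i
        elif i not in keep and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, n - 1))
    return regions


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive intervals (e.g. an FIS region and an epitope) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    pa_filter = one_hot("PA")  # hand-set filter that fires on P followed by A
    print(conv1d(one_hot("GPAPAG"), pa_filter))  # → [0.0, 2.0, 0.0, 2.0, 0.0]
    fis = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]  # made-up toy scores, not real FIS
    regions = top_half_regions(fis)
    print(regions)  # → [(1, 2), (4, 4)]
    print([overlaps(r, (2, 5)) for r in regions])  # (2, 5) is a hypothetical epitope → [True, True]
```

The width-2 filter built from one_hot("PA") scores 2.0 at every P-A window. This is the kind of short motif signal that convolution filters can respond to. In a trained model, the filter weights are learned from data rather than set by hand, and the importance scores would come from whatever attribution method the authors used.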
Keywords
Allergenic proteins, Novel vicilin allergen analogs, Deep learning, 3D structure, CNN
Published Open-Access
yes
Recommended Citation
Liu, Jiajia; Negi, Surendra S; Yang, Chengyuan; et al., "AllergenAI: A Deep Learning Model Predicting Allergenicity Based on Protein Sequence" (2025). Faculty, Staff and Student Publications. 752.
https://digitalcommons.library.tmc.edu/uthshis_docs/752