Faculty, Staff and Student Publications
Language
English
Publication Date
3-1-2026
Journal
Quantitative Biology
DOI
10.1002/qub2.70014
PMID
41676319
PMCID
PMC12806030
PubMedCentral® Posted Date
9-28-2025
PubMedCentral® Full Text Version
Post-print
Abstract
With the rapid advancements in large language model technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications of these models. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.
Keywords
bioinformatics-specific language models, biological systems, biomedical AI, large language models, life active factors
Published Open-Access
yes
Recommended Citation
Ruan, Wei; Lyu, Yanjun; Zhang, Jing; et al., "Large language models for bioinformatics" (2026). Faculty, Staff and Student Publications. 6747.
https://digitalcommons.library.tmc.edu/uthgsbs_docs/6747
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Genetic Phenomena Commons, Medical Genetics Commons, Oncology Commons