Machine Learning Methods for Endocrine Disrupting Potential Identification Based on Single-Cell Data
Publication Date
11-5-2023
Journal
Chemical Engineering Science
DOI
10.1016/j.ces.2023.119086
PMID
37637227
PMCID
PMC10448728
PubMedCentral® Posted Date
11-5-2024
PubMedCentral® Full Text Version
Author MSS
Published Open-Access
yes
Keywords
Machine learning, Endocrine disrupting chemicals, Estrogen receptor activity, Predictive modeling, Classification analysis, High throughput microscopy
Abstract
Humans are continuously exposed to a variety of toxicants and chemicals, and this exposure is exacerbated during and after environmental catastrophes such as floods, earthquakes, and hurricanes. The hazardous chemical mixtures generated during these events threaten the health and safety of humans and other living organisms. This necessitates the development of rapid decision-making tools that help mitigate the adverse effects of exposure on key modulators of the endocrine system, such as the estrogen receptor alpha (ERα). The mechanistic stages of estrogenic transcriptional activity can be measured at the single-cell level with high content/high throughput microscopy-based biosensor assays, which generate millions of object-based minable data points. By combining computational modeling and experimental analysis, we built a highly accurate data-driven classification framework to assess the endocrine disrupting potential of environmental compounds. The framework predicts whether a compound acts on the ERα pathway as a receptor agonist or antagonist, using principal component analysis (PCA) projections of high throughput, high content image analysis descriptors. It also combines rigorous preprocessing steps with nonlinear machine learning algorithms, such as Support Vector Machines and Random Forest classifiers, to develop highly accurate mathematical representations of the separation between ERα agonists and antagonists. The results show that Support Vector Machines classify unseen chemicals correctly with more than 96% accuracy using the proposed framework, where the preprocessing and PCA steps play a key role in suppressing experimental noise and unraveling hidden patterns in the dataset.
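The workflow named in the abstract (preprocessing, PCA projection, then a nonlinear classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy are available, uses randomly generated stand-in data in place of the image analysis descriptors, and picks the number of components, kernel, and hyperparameters arbitrarily. Standard scaling stands in for the preprocessing steps, which the abstract does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): each row plays the role of one
# compound summarized by 30 image-derived descriptors. Real descriptors
# would come from the microscopy assay, not from a random generator.
rng = np.random.default_rng(0)
n_per_class, n_features = 100, 30
agonists = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
antagonists = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([agonists, antagonists])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = agonist, 1 = antagonist

# Hold out a stratified test set to mimic evaluation on unseen compounds.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_pipeline(classifier, n_components=5):
    """Scale descriptors, project onto principal components, then classify.

    n_components=5 is an illustrative choice, not a value from the paper.
    """
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), classifier)


# Two nonlinear classifiers of the kinds named in the abstract,
# with hypothetical hyperparameters.
models = {
    "svm_rbf": build_pipeline(SVC(kernel="rbf", C=1.0, gamma="scale")),
    "random_forest": build_pipeline(
        RandomForestClassifier(n_estimators=200, random_state=0)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                # scaler and PCA are fit on training data only
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out set
    print(f"{name}: held-out accuracy = {scores[name]:.3f}")
```

Wrapping the scaler and PCA in one pipeline with the classifier means both are fit on the training split alone, so the held-out compounds do not leak into the projection. Accuracies printed by this sketch describe the synthetic data only and say nothing about the results reported in the paper.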
Included in
Biological Phenomena, Cell Phenomena, and Immunity Commons, Life Sciences Commons, Medical Cell Biology Commons, Medical Microbiology Commons, Medical Molecular Biology Commons, Obstetrics and Gynecology Commons, Oncology Commons
Comments
Associated Data