
Faculty, Staff and Student Publications
Publication Date
5-10-2023
Journal
Genes
Abstract
Biological data at the omics level are highly complex, requiring powerful computational approaches to identify significant intrinsic characteristics and to search further for informative markers involved in the studied phenotype. In this paper, we propose a novel dimension reduction technique, protein-protein interaction-based gene correlation filtration (PPIGCF), which builds on gene ontology (GO) and protein-protein interaction (PPI) structures to analyze microarray gene expression data. PPIGCF first extracts the gene symbols with their expression values from the experimental dataset and then classifies them based on GO biological process (BP) and cellular component (CC) annotations. Every classification group inherits all the information on its CCs, corresponding to the BPs, to establish a PPI network. Then, the gene correlation filter (based on gene rank and the proposed correlation coefficient) is computed on every network and removes a few weakly correlated genes connected with their corresponding networks. PPIGCF finds the information content (IC) of the remaining genes related to the PPI network and keeps only the genes with the highest IC values. The results of PPIGCF are then used to prioritize significant genes. We performed a comparison with current methods to demonstrate our technique's efficiency. From the experiment, it can be concluded that PPIGCF needs fewer genes to reach reasonable accuracy (~99%) for cancer classification. This approach reduces the computational complexity and improves the time complexity of biomarker discovery from datasets.
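The filtering and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain Pearson correlation in place of the paper's proposed correlation coefficient, scores each gene by its mean absolute correlation with the other genes in the same network, and takes the IC values as precomputed inputs. The names `pearson`, `filter_network`, `top_by_ic`, the threshold `min_corr`, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has zero variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def filter_network(expr, network, min_corr=0.6):
    """Keep genes whose mean |r| with the rest of the network reaches min_corr.

    Assumption: mean absolute Pearson correlation stands in for the
    paper's gene-rank-based correlation filter, which the abstract
    does not specify.
    """
    kept = []
    for gene in network:
        others = [h for h in network if h != gene]
        if not others:
            continue
        score = sum(abs(pearson(expr[gene], expr[h])) for h in others) / len(others)
        if score >= min_corr:
            kept.append(gene)
    return kept


def top_by_ic(genes, ic, k):
    """Rank surviving genes by a precomputed IC value and keep the top k."""
    return sorted(genes, key=lambda g: ic[g], reverse=True)[:k]


# Toy expression profiles (hypothetical values, four samples per gene).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [1.5, 2.4, 3.6, 4.4],
    "G4": [3.0, 1.0, 3.0, 1.0],
}
network = ["G1", "G2", "G3", "G4"]                 # one hypothetical PPI network
ic = {"G1": 2.1, "G2": 3.4, "G3": 1.2, "G4": 0.7}  # hypothetical IC values

kept = filter_network(expr, network, min_corr=0.6)
selected = top_by_ic(kept, ic, k=2)
print(kept, selected)
```

In this toy network, G4 is weakly correlated with the other three genes and is dropped by the filter; the two surviving genes with the highest assumed IC values are then selected.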
Keywords
Protein Interaction Maps, dimension reduction, protein–protein interaction, gene ontology, Pearson’s correlation, information content
DOI
10.3390/genes14051063
PMID
37239423
PMCID
PMC10218330
PubMedCentral® Posted Date
5-10-2023
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes