Date of Graduation
8-2021
Document Type
Thesis (MS)
Program Affiliation
Biomedical Sciences
Degree Name
Master of Science (MS)
Advisor/Committee Chair
John A. Tainer
Committee Member
Daniel E. Frigo
Committee Member
Jason T. Huse
Committee Member
Margarida Almeida Santos
Committee Member
Nayun Kim
Committee Member
Traver Hart
Abstract
G-quadruplexes (G4) are non-B DNA structures formed by four or more runs of repeated guanines that confer unique features on the genomes of living organisms. These sequences are enriched in regulatory regions, such as promoters and 5’ UTRs, and have distinct regulatory roles in both health and disease states. Although previous studies have shown the impact of G4 on gene expression, none has summarized the location-specific effect of G4. There is also no broad understanding of the most common G4 repeat in the human genome, named here G4-22, or of how it links to the evolution of mammals and their biology. In this dissertation, we assess the expression patterns of genes containing G4 and seek a biological role for G4-22. Using bioinformatics algorithms, we mapped all potential G4 sequences (PQS) in the human genome, filtered them by gene regulatory location, and evaluated the expression of the associated genes. We also searched for mutations occurring in PQS regions using well-established mutation databases. Twenty mammalian genomes were screened for PQS and their flanking sequences to find conservation patterns. Structural work and G4-ChIP-seq analyses were used to assess the stability and formation of G4-22. The results showed that PQS are present in a wide set of genes, which cluster in different gene ontology (GO) terms depending on PQS location. Overall, PQS in intronic regions are correlated with increased expression, and PQS in exonic regions with lower expression. G4-22 sequences in the reference human genome (hg38 assembly) derive mostly from L1PA2 retrotransposons, and their G4 structures are efficiently resolved by the DHX36 helicase. G4-22 was found mainly within introns, away from splice sites, and PANTHER analysis indicated that G4-22-containing genes are specifically enriched in GO terms related to the brain and nervous system.
These findings reinforce and summarize the biological importance of G4 in gene expression and reveal a potential role for G4-22 in the evolution and function of brain-related genes in both humans and other higher primates. In the future, this knowledge may prove foundational for personalized diagnostics and therapeutics of G4-related disorders as well as ancestry analysis based on G4-22 flanking sequences.
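As an illustration of the PQS screening step described above, the following is a minimal sketch and not the thesis's actual pipeline. The abstract does not name the algorithm used, so this example assumes the commonly used canonical motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the function name `find_pqs` and the loop-length limits are hypothetical choices made for illustration.

```python
import re

# ASSUMPTION: canonical PQS motif, four G-runs (>= 3 G each) separated by
# loops of 1-7 nt. The thesis may use a different definition or tool.
PQS_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# A G-rich motif on the minus strand appears as C-runs on the plus strand.
PQS_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_pqs(seq):
    """Return sorted (start, end, strand, sequence) tuples for non-overlapping
    PQS matches, using 0-based, end-exclusive coordinates on the given strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", PQS_PLUS), ("-", PQS_MINUS)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_pqs("TTGGGAGGGTGGGAGGGTT"):
        print(hit)
```

In a genome-scale setting, the resulting coordinates would then be intersected with gene annotations (promoters, UTRs, exons, introns) to filter PQS by regulatory location; that step is omitted here.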
Keywords
g-quadruplex, G4, non-B DNA, epigenetics, bioinformatics, structural biology, expression, transposon, cancer, disease
Included in
Bioinformatics Commons, Biology Commons, Computational Biology Commons, Laboratory and Basic Science Research Commons, Medicine and Health Sciences Commons, Structural Biology Commons