Author ORCID Identifier

0000-0003-1022-2056

Date of Graduation

8-2021

Document Type

Thesis (MS)

Program Affiliation

Biomedical Sciences

Degree Name

Master of Science (MS)

Advisor/Committee Chair

John A. Tainer

Committee Member

Daniel E. Frigo

Committee Member

Jason T. Huse

Committee Member

Margarida Almeida Santos

Committee Member

Nayun Kim

Committee Member

Traver Hart

Abstract

G-quadruplexes (G4) are non-B DNA structures formed by four or more runs of repeated guanines that confer unique features to the genomes of living organisms. These sequences are enriched in regulatory regions, such as promoters and 5’ UTRs, and have distinct regulatory roles in both health and disease states. Although previous studies have shown the impact of G4 on gene expression, none has summarized the location-specific effect of G4. In addition, there is no broad understanding of the most common G4 repeat in the human genome, named here G4-22, or of how it links to the evolution of mammals and their biology. In this dissertation, we assess the expression patterns of genes containing G4 and seek a biological role for G4-22. Using bioinformatics algorithms, we mapped all potential G4 sequences (PQS) in the human genome, filtered them by gene regulatory location, and evaluated the expression of the associated genes. We also searched for mutations occurring at PQS regions using well-established mutation databases. Twenty mammalian genomes were screened for PQS and their flanking sequences to find conservation patterns. Structural work and G4-ChIP-seq analyses were used to assess the stability and formation of G4-22. The results showed that PQS are present in a wide set of genes, which cluster in different gene ontology (GO) terms depending on PQS location. Overall, PQS in intronic regions correlate with increased expression and PQS in exonic regions with lower expression. G4-22 sequences are present in the reference human genome (hg38 assembly) mostly from L1PA2 retrotransposons, and their G4 structures are efficiently resolved by the DHX36 helicase. G4-22 was found mainly within introns, away from splice sites, and PANTHER analysis indicated that G4-22-containing genes are specifically enriched in GO terms related to the brain and nervous system.
These findings reinforce and summarize the biological importance of G4 in gene expression and reveal a potential role for G4-22 in the evolution and function of brain-related genes in both humans and other higher primates. In the future, this knowledge may prove foundational for personalized diagnostics and therapeutics of G4-related disorders as well as ancestry analysis based on G4-22 flanking sequences.
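
The abstract does not name the algorithm used to locate PQS. As an illustration only, the sketch below scans a sequence with one commonly used pattern: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The pattern, the loop limits, and the function name `find_pqs` are assumptions made for this example and are not taken from the thesis.

```python
import re

# Assumed motif (not stated in the abstract): four runs of >= 3 guanines
# separated by loops of 1-7 nucleotides. The C-rich version corresponds
# to a PQS on the reverse strand.
G_MOTIF = r"G{3,}(?:[ACGT]{1,7}G{3,}){3}"
C_MOTIF = r"C{3,}(?:[ACGT]{1,7}C{3,}){3}"


def find_pqs(seq):
    """Return non-overlapping PQS hits as (start, end, strand, sequence).

    Coordinates are 0-based and end-exclusive. `find_pqs` is a
    hypothetical helper written for this illustration.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", G_MOTIF), ("-", C_MOTIF)):
        for match in re.finditer(motif, seq):
            hits.append((match.start(), match.end(), strand, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGAGGGTT"
    for start, end, strand, motif_seq in find_pqs(example):
        print(start, end, strand, motif_seq)
```

Applied per chromosome, hits of this kind could then be intersected with gene annotations (promoter, 5’ UTR, exon, intron) to group PQS by regulatory location, which is the kind of filtering the abstract describes.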

Keywords

g-quadruplex, G4, non-B DNA, epigenetics, bioinformatics, structural biology, expression, transposon, cancer, disease
