Author ORCID Identifier


Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Edgar T. Walters - Advisory Professor

Committee Member

Olivier Lichtarge - Advisory Professor

Committee Member

Prahlad Ram

Committee Member

Anil Korkut

Committee Member

Marsal Sanches

Abstract


Identifying genes involved in disease pathology has been a goal of genomic research since the early days of the field. However, as technology improves and the body of research grows, we are faced with more questions than answers. Among these is the pressing matter of our incomplete understanding of the genetic underpinnings of complex diseases. Many hypotheses have been offered to explain why direct and independent analyses of variants, as done in genome-wide association studies (GWAS), may not fully elucidate disease genetics. These range from pointing out flaws in statistical testing to invoking the complex dynamics of epigenetic processes. In the studies outlined here, we focus on the hypothesis that interactions between genes may be a contributing factor. To probe this hypothesis, we begin by developing an algorithm, GeneEMBED, to model the total effect of protein-coding variants in various genes across a molecular network of genetic interactions. Given a population of affected and healthy individuals, GeneEMBED systematically evaluates the relative contribution of each gene to disease. The associations are quantified by examining the patterns of differential perturbations in the gene's interactions throughout a biological network. As a proof of concept, we applied GeneEMBED to two late-onset Alzheimer's disease (AD) cohorts of 5,169 exomes and 969 genomes. We identified 143 candidate disease-associated genes across the two cohorts and three biological networks. These candidate genes were differentially expressed in both bulk and single-cell RNA expression data from post-mortem AD brains. Knockouts of these candidates in mice were known to lead to abnormal neurological phenotypes. Lastly, in vivo Drosophila assays of the candidates showed that they modified neurodegenerative phenotypes.
Next, we focus on the discrepancies between the functional impact of mutations across different genes. While tools that predict the functional impact of a given coding mutation on the encoded protein are widely successful, they often make predictions relative to the given gene. To address this, we extend principles of statistical mechanics to biology to measure any given gene's relative mutational intolerance. Importantly, these mutational intolerance scores can distinguish essential genes from non-essential genes in E. coli. In humans, they can segregate genes that cause autosomal dominant Mendelian diseases from non-disease genes. Similarly, highly mutationally intolerant genes were enriched in core and conserved biological processes across three different species. Conversely, mutationally tolerant genes were involved in adaptive processes, again across three different species. Most notably, we found that mutational intolerance scores correlated strongly with experimentally measured fitness effects of gene knockdowns. Together, these efforts provide new tools with which to investigate disease-gene associations and offer insights into the biological dynamics of gene networks.


Keywords

Machine Learning, Statistical Mechanics, Thermodynamics, Genomics, Evolution, Alzheimer's Disease

Available for download on Tuesday, August 01, 2023