A software pipeline for the analysis of genomic data, and functional genomic approaches to explore the molecular mechanisms of amyotrophic lateral sclerosis

  • Christina Vasilopoulou

Student thesis: Doctoral Thesis

Abstract

Genomics is a rapidly developing research area which has contributed immensely to our understanding of the aetiology of various diseases. Genomic quality control (QC) is an essential step to ensure reliable downstream results that reflect true biology. Current genomic QC approaches present several challenges: they involve numerous complicated checks and require independent installation of, and expert familiarity with, multiple bioinformatics tools that may present incompatibilities and numerical instability across computing environments. Such approaches can be time-consuming, lack flexibility and scalability, and lead to poor reproducibility.

Amyotrophic Lateral Sclerosis (ALS) is a fatal, rare, late-onset neurodegenerative disease, characterized by the loss of both the upper and lower motor neurons. Despite the increased efforts of the ALS research community, the aetiology, genetic architecture and underlying biological mechanisms of ALS remain poorly understood.

The two primary aims of this thesis are: (a) the development of flexible, reproducible software that provides comprehensive pipelines for genomic quality control, imputation, and association analysis and (b) the exploration of the molecular mechanisms that are affected in ALS using individual-level genotype data.

To address the challenges in current genomic analysis software, we present snpQT: a scalable, user-friendly stand-alone software pipeline using Nextflow, container engines and environment managers, for comprehensive, reproducible, time-efficient genomic analyses including interactive workflows of human genome build conversion, quality control, population stratification, imputation and association analysis of genomic data with binary and quantitative phenotypes. The snpQT software pipeline is designed to run with minimal user input and coding experience, with numerous user-modifiable thresholds, and workflows that can be flexibly combined in custom combinations.

To explore the underlying molecular mechanisms in ALS we consider two main methods, machine learning gene prioritisation and functional genome-wide gene-set analysis, reviewing current ALS studies that employ such approaches and investigating their reproducibility, strengths and limitations in terms of their results and methodology. We perform our own analyses on individual-level genotype data from two large dbGaP ALS-control studies, applying comprehensive genomic approaches based mainly on our snpQT software, the Sanger Imputation Server, and meta-analysis. This yields a large ALS-control cohort of European descent (N = 22,039 samples; 9,244 ALS cases and 12,795 controls) covering 19,242 genes. Furthermore, we employ Multi-Marker Analysis of GenoMic Annotation (MAGMA) using an extensive number of gene sets from the Molecular Signatures Database, as well as Enrichment Maps, to identify molecular mechanisms that are statistically significantly associated with ALS. Our results show various associations with gene sets related to immune response, development, apoptosis, the nervous system and muscle processes, as well as lipid metabolism and homeostasis. We also report novel interactions between gene sets, representing putative important joint involvement in ALS. We highlight ALS-associated patterns of immune dysregulation, neuroinflammation, innate immune responses and immune cell infiltration processes. In addition, we identify oxidized phospholipid pathways and prion-related mechanisms that show a strong association in our results and that may play a role in the pathology of ALS. Lastly, we report highly associated developmental pathways that relate to neuroprotection, decreased neuroinflammation, and cytoprotective pathways, among which are mechanisms suggested to be protective against oxidative and excitotoxic stress in ALS pathology.
Date of Award: Oct 2022
Original language: English
Sponsors: Department for the Economy
Supervisors: Priyank Shukla, William Duddy & Stephanie Marie Duguez

Keywords

  • ALS
  • GWAS
  • Gene-set analysis
  • snpQT
  • Nextflow
  • Containers
  • Machine learning
  • ALS pathology
  • Functional genomics
  • GSA
