Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field

William Duddy, Stephanie Duguez, Apostolos Malatras

Research output: Contribution to journal › Article

Abstract

Background: Building large collections of gene sets and then systematically testing hypotheses across these collections is a powerful approach in functional genomics, both for the pathway analysis of omics data and for uncovering the polygenic effects associated with complex diseases in genome-wide association studies. The Molecular Signatures Database includes collections of oncogenic and immunologic signatures that enable researchers to compare transcriptional datasets across hundreds of previous studies, and these have led to important insights in those fields, but no such resource currently exists for neuromuscular research. In previous work, we have shown the utility of gene set approaches for understanding muscle cell physiology and pathology.

Methods: Following a systematic survey of public muscle data, we passed gene expression profiles from 4305 samples through a robust pre-processing and standardized data analysis pipeline. Two hundred eighty-two samples were discarded based on a battery of rigorous global quality controls. From among the remaining studies, 578 comparisons of interest were identified by a combination of text mining and manual curation of the study metadata. For each comparison, significantly dysregulated genes (FDR-adjusted p < 0.05) were identified.

Results: Lists of dysregulated genes were divided into upregulated and downregulated genes to give 1156 Muscle Gene Sets (MGS), two per comparison. This resource is available for download (www.sys-myo.com/muscle-gene-sets) and is accessible through three commonly used functional genomics platforms (GSEA, EnrichR, and WebGestalt). Basic guidance and recommendations are provided for the use of MGS through these platforms. In addition, consensus muscle gene sets were created to capture the overlap between the results of similar studies, and analysis of these highlighted the potential for novel disease-relevant findings.

Conclusions: The MGS resource can be used to investigate the behaviour of any list of genes across previous comparisons of muscle conditions, to compare previous studies to one another, and to explore the functional relationship of muscle dysregulation to the Gene Ontology. Its major intended use is in enrichment testing for functional genomics analysis.
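
The workflow summarised in the abstract (FDR filtering, splitting each comparison into an upregulated and a downregulated set, then enrichment testing) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract says only "FDR adjusted", so Benjamini-Hochberg is assumed here. The function names, the (gene, log fold change, p-value) tuple layout, and the hypergeometric over-representation test are likewise illustrative assumptions, and they do not reproduce the statistics used by GSEA, EnrichR, or WebGestalt.

```python
from math import comb


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used as the FDR procedure; the source does not name one.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_gene_sets(results, alpha=0.05):
    """Split one comparison into (upregulated, downregulated) gene sets.

    `results` is a hypothetical list of (gene, log_fold_change, raw_p) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    up, down = set(), set()
    for (gene, lfc, _), q in zip(results, adjusted):
        if q >= alpha:
            continue  # not significantly dysregulated
        if lfc > 0:
            up.add(gene)
        elif lfc < 0:
            down.add(gene)
    return up, down


def overrepresentation_p(query, gene_set, background_size):
    """Hypergeometric upper-tail probability of the observed overlap or more.

    A generic over-representation test, used here only for illustration.
    """
    k = len(query & gene_set)
    n, K, N = len(query), len(gene_set), background_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

Applying `split_gene_sets` to each of the 578 comparisons would give two sets per comparison, which matches the 1156 gene sets reported. A query list could then be scored against each set with `overrepresentation_p`, given a suitable background gene count.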

Language: English
Article number: 9:10
Pages: 1-12
Number of pages: 12
Journal: Skeletal muscle
Volume: 9
Issue number: 10
DOI: 10.1186/s13395-019-0196-z
Publication status: Published - 3 May 2019

Fingerprint

Genomics
Muscles
Genes
Chemical Databases
Cell Physiological Phenomena
Gene Ontology
Data Mining
Genome-Wide Association Study
Transcriptome
Quality Control
Muscle Cells
Down-Regulation
Research Personnel
Pathology

Keywords

  • Gene sets
  • Skeletal muscle
  • Neuromuscular
  • Functional genomics
  • Pathway analysis
  • Functional enrichment
  • GWAS
  • Gene expression
  • Transcriptomics

Cite this

@article{41d476680b3c4262989451dbbaebbda2,
title = "Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field",
abstract = "Background: The approach of building large collections of gene sets and then systematically testing hypotheses across these collections is a powerful tool in functional genomics, both in the pathway analysis of omics data and to uncover the polygenic effects associated with complex diseases in genome-wide association study. The Molecular Signatures Database includes collections of oncogenic and immunologic signatures enabling researchers to compare transcriptional datasets across hundreds of previous studies and leading to important insights in these fields, but such a resource does not currently exist for neuromuscular research. In previous work, we have shown the utility of gene set approaches to understand muscle cell physiology and pathology. Methods: Following a systematic survey of public muscle data, we passed gene expression profiles from 4305 samples through a robust pre-processing and standardized data analysis pipeline. Two hundred eighty-two samples were discarded based on a battery of rigorous global quality controls. From among the remaining studies, 578 comparisons of interest were identified by a combination of text mining and manual curation of the study meta-data. For each comparison, significantly dysregulated genes (FDR adjusted p < 0.05) were identified. Results: Lists of dysregulated genes were divided between upregulated and downregulated to give 1156 Muscle Gene Sets (MGS). This resource is available for download (www.sys-myo.com/muscle-gene-sets) and is accessible through three commonly used functional genomics platforms (GSEA, EnrichR, and WebGestalt). Basic guidance and recommendations are provided for the use of MGS through these platforms. In addition, consensus muscle gene sets were created to capture the overlap between the results of similar studies, and analysis of these highlighted the potential for novel disease-relevant findings. 
Conclusions: The MGS resource can be used to investigate the behaviour of any list of genes across previous comparisons of muscle conditions, to compare previous studies to one another, and to explore the functional relationship of muscle dysregulation to the Gene Ontology. Its major intended use is in enrichment testing for functional genomics analysis.",
keywords = "Gene sets, Skeletal muscle, Neuromuscular, Functional genomics, Pathway analysis, Functional enrichment, GWAS, Gene expression, Transcriptomics",
author = "William Duddy and Stephanie Duguez and Apostolos Malatras",
year = "2019",
month = "5",
day = "3",
doi = "10.1186/s13395-019-0196-z",
language = "English",
volume = "9",
pages = "1--12",
journal = "Skeletal muscle",
issn = "2044-5040",
publisher = "BioMed Central",
number = "10",

}

Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field. / Duddy, William; Duguez, Stephanie; Malatras, Apostolos.

In: Skeletal muscle, Vol. 9, No. 10, 9:10, 03.05.2019, p. 1-12.

TY - JOUR

T1 - Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field

AU - Duddy, William

AU - Duguez, Stephanie

AU - Malatras, Apostolos

PY - 2019/5/3

Y1 - 2019/5/3

AB - Background: The approach of building large collections of gene sets and then systematically testing hypotheses across these collections is a powerful tool in functional genomics, both in the pathway analysis of omics data and to uncover the polygenic effects associated with complex diseases in genome-wide association study. The Molecular Signatures Database includes collections of oncogenic and immunologic signatures enabling researchers to compare transcriptional datasets across hundreds of previous studies and leading to important insights in these fields, but such a resource does not currently exist for neuromuscular research. In previous work, we have shown the utility of gene set approaches to understand muscle cell physiology and pathology. Methods: Following a systematic survey of public muscle data, we passed gene expression profiles from 4305 samples through a robust pre-processing and standardized data analysis pipeline. Two hundred eighty-two samples were discarded based on a battery of rigorous global quality controls. From among the remaining studies, 578 comparisons of interest were identified by a combination of text mining and manual curation of the study meta-data. For each comparison, significantly dysregulated genes (FDR adjusted p < 0.05) were identified. Results: Lists of dysregulated genes were divided between upregulated and downregulated to give 1156 Muscle Gene Sets (MGS). This resource is available for download (www.sys-myo.com/muscle-gene-sets) and is accessible through three commonly used functional genomics platforms (GSEA, EnrichR, and WebGestalt). Basic guidance and recommendations are provided for the use of MGS through these platforms. In addition, consensus muscle gene sets were created to capture the overlap between the results of similar studies, and analysis of these highlighted the potential for novel disease-relevant findings. 
Conclusions: The MGS resource can be used to investigate the behaviour of any list of genes across previous comparisons of muscle conditions, to compare previous studies to one another, and to explore the functional relationship of muscle dysregulation to the Gene Ontology. Its major intended use is in enrichment testing for functional genomics analysis.

KW - Gene sets

KW - Skeletal muscle

KW - Neuromuscular

KW - Functional genomics

KW - Pathway analysis

KW - Functional enrichment

KW - GWAS

KW - Gene expression

KW - Transcriptomics

UR - http://www.scopus.com/inward/record.url?scp=85065231286&partnerID=8YFLogxK

U2 - 10.1186/s13395-019-0196-z

DO - 10.1186/s13395-019-0196-z

M3 - Article

VL - 9

SP - 1

EP - 12

JO - Skeletal muscle

T2 - Skeletal muscle

JF - Skeletal muscle

SN - 2044-5040

IS - 10

M1 - 9:10

ER -