Gene Set Enrichment Analysis (GSEA)

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
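The abstract states what GSEA tests but not how the test is scored. The following is a minimal sketch, assuming the weighted running-sum enrichment score (ES) commonly described for GSEA. Genes are ranked by their correlation with the phenotype. Walking down that ranking, the sum steps up at each gene-set member, weighted by |r|^p, and steps down uniformly at every other gene. The ES is the largest deviation from zero. For significance, the sketch substitutes a simpler gene-set permutation for the phenotype-label permutation that GSEA uses by default, and it omits normalization (NES) and FDR control. The function names `enrichment_score` and `permutation_pvalue` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Running-sum enrichment score (ES) for one gene set.

    ranked_genes: genes sorted from most positively to most negatively
        correlated with the phenotype; scores holds the matching r_j values.
    p: weight exponent; p = 0 gives an unweighted Kolmogorov-Smirnov-like walk.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(hits), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |r_j|^p, normalized so all hits sum to 1.
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("all gene-set members have zero score")
    miss_step = 1.0 / (n - n_hit)  # misses step down uniformly, summing to 1
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / n_r if h else -miss_step
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero, with its sign
    return es


def permutation_pvalue(ranked_genes: Sequence[str],
                       scores: Sequence[float],
                       gene_set: Set[str],
                       n_perm: int = 1000,
                       seed: int = 0) -> Tuple[float, float]:
    """Observed ES and a one-sided p-value from random gene sets of equal size.

    Simplification: permutes gene-set membership, not phenotype labels.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    k = sum(g in gene_set for g in ranked_genes)
    population: List[str] = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        null_es = enrichment_score(ranked_genes, scores,
                                   set(rng.sample(population, k)))
        # Count null scores at least as extreme, in the same direction.
        if (observed >= 0 and null_es >= observed) or \
           (observed < 0 and null_es <= observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = ["A", "B", "C", "D", "E", "F"]  # toy data, not real measurements
    r = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
    print(enrichment_score(genes, r, {"A", "B"}))
    print(permutation_pvalue(genes, r, {"A", "B"}))
```

A set concentrated at the top of the ranking yields a positive ES. A set concentrated at the bottom yields a negative one. This sign carries the "concordant differences" idea from the abstract: the members move together toward one phenotype.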

GSEA provides the following tools and information:

1) A Downloads page containing implementations of GSEA, plus additional resources to analyze, annotate, and interpret enrichment results.

2) The Molecular Signatures Database (MSigDB) - a collection of more than 3,000 gene sets for use with the GSEA software.

The MSigDB gene sets are divided into five major collections:

C1) Positional gene sets - Gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene.
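To illustrate how positional sets of this kind can be derived, here is a minimal sketch. It assumes a hypothetical mapping from gene symbol to a cytogenetic location string such as "17q21.31" and builds one set per chromosome and one per band. It is not MSigDB's actual build procedure, and the `chr17q21.31`-style set names are an illustrative convention. A set is created only when at least one gene maps to that chromosome or band, matching the "at least one gene" rule above.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumed location format: chromosome (1-22, X or Y) optionally followed by an
# arm and band, e.g. "17q21.31" or "X". Other strings are treated as unplaced.
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq][\d.]*)?$")


def positional_gene_sets(gene_to_band: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group genes into one set per chromosome and one set per cytogenetic band."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for gene, location in gene_to_band.items():
        match = BAND_RE.match(location)
        if match is None:
            continue  # skip genes without a parseable location
        chrom, band = match.groups()
        sets[f"chr{chrom}"].add(gene)
        if band:
            sets[f"chr{chrom}{band}"].add(gene)
    # Only chromosomes and bands with at least one gene ever get a set.
    return dict(sets)


if __name__ == "__main__":
    # Hypothetical gene names and locations, for illustration only.
    toy = {"GENE_A": "17q21.31", "GENE_B": "17q21.31", "GENE_C": "17p13.1",
           "GENE_D": "X", "GENE_E": "unknown"}
    for name, members in sorted(positional_gene_sets(toy).items()):
        print(name, sorted(members))
```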

C2) Curated gene sets - Gene sets collected from various sources such as online pathway databases, publications in PubMed, and knowledge of domain experts. The gene set page for each gene set lists its source.

CGP: chemical and genetic perturbations - Gene sets that represent gene expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: an xxx_UP gene set representing genes induced by the perturbation and an xxx_DN gene set representing genes repressed by it. The gene set page for each gene set lists the PubMed citation on which it is based.
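Because of the xxx_UP / xxx_DN naming convention, paired signatures can be matched by name alone. The sketch below assumes nothing beyond that suffix convention. The function `pair_up_dn` and the example set names are hypothetical. Keeping each pair together makes it easy, for example, to check whether the two directions of one perturbation show enrichment of opposite sign.

```python
from typing import Dict, Iterable, List, Tuple


def pair_up_dn(names: Iterable[str]) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Match xxx_UP / xxx_DN gene-set names that share the same prefix xxx."""
    names = list(names)
    ups = {n[:-len("_UP")]: n for n in names if n.endswith("_UP")}
    dns = {n[:-len("_DN")]: n for n in names if n.endswith("_DN")}
    # A perturbation counts as paired only when both directions are present.
    pairs = {base: (ups[base], dns[base])
             for base in sorted(ups.keys() & dns.keys())}
    paired = {n for pair in pairs.values() for n in pair}
    unpaired = sorted(n for n in names if n not in paired)
    return pairs, unpaired


if __name__ == "__main__":
    # Hypothetical gene-set names, for illustration only.
    print(pair_up_dn(["DRUG_X_UP", "DRUG_X_DN", "KNOCKDOWN_Y_UP", "PATHWAY_Z"]))
```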

CP: canonical pathways - Gene sets from the pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts.

C3) Motif gene sets - Gene sets containing genes that share a cis-regulatory motif conserved across the human, mouse, rat, and dog genomes. The motifs are catalogued and represent known or likely regulatory elements in promoters and 3'-UTRs.

These gene sets make it possible to link changes in a microarray experiment to a conserved, putative cis-regulatory element.

TFT: transcription factor targets - Gene sets containing genes that share a transcription factor binding site defined in the TRANSFAC database (see G6G Abstract Number 20121).

Each of these gene sets is annotated by a TRANSFAC record.

MIR: microRNA targets - Gene sets containing genes that share a 3'-UTR microRNA binding motif.

C4) Computational gene sets - Gene sets defined by mining large collections of cancer-oriented microarray data.

CM: cancer modules - Gene sets defined by Segal et al. (Nature Genetics 36, 1090 - 1098, 2004). Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, Gene Ontology (GO), and others.

By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions.

CGN: cancer gene neighborhoods - Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes.

C5) GO gene sets - Gene sets named by GO term, each containing the genes annotated by that term.

Note: GSEA identifies gene sets consisting of co-regulated genes, whereas GO gene sets are based on ontologies and do not generally consist of co-regulated genes.

BP: GO biological process - Gene sets derived from the Biological Process Ontology.

MF: GO molecular function - Gene sets derived from the Molecular Function Ontology.

CC: GO cellular component - Gene sets derived from the Cellular Component Ontology.

3) GSEA Documentation - Documentation of the GSEA software and the GSEA algorithm.

GSEA - What’s New --

A new release of the Molecular Signatures Database (MSigDB) is now available. The release includes new gene sets based on KEGG pathways, GO annotations, and the module map for cancer compiled by Segal et al. (Nature Genetics 36, 1090 - 1098, 2004).

System Requirements

The GSEA program is provided as a standalone R program, available on the Downloads page. Java source code is also provided.

Manufacturer

Manufacturer Web Site GSEA

Price Contact manufacturer.

G6G Abstract Number 20266

G6G Manufacturer Number 101795