ToppCluster

Category Genomics>Gene Expression Analysis/Profiling/Tools and Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools

Abstract ToppCluster is a multiple gene list feature enrichment analyzer for the dissection of biological systems.

This web server application leverages an advanced enrichment analysis and underlying data environment for comparative analyses of multiple gene lists.

ToppCluster generates Heatmaps or connectivity networks that reveal functional features shared among, or specific to, multiple gene lists.

ToppCluster uses hypergeometric tests to obtain list-specific feature enrichment P-values for 17 categories (currently) of annotations of human-ortholog genes, and provides user-selectable cutoffs and multiple testing correction methods to control false discovery.

Each named gene list forms a column of the resulting matrix, whose rows are the overrepresented features; each cell holds the per-list P-value and the corresponding genes for that feature.

ToppCluster provides users with choices of tabular outputs, hierarchical clustering and Heatmap generation, or the ability to interactively select features from the functional enrichment matrix to be transformed into XGMML or GEXF network format documents for use in Cytoscape - (see G6G Abstract Number 20092) or Gephi applications, respectively.

The manufacturers have demonstrated the ability of ToppCluster to enable identification of list-specific phenotypic and regulatory element features (both cis-elements and 3' UTR microRNA binding sites) among tissue-specific gene lists.

ToppCluster’s functionalities enable the identification of specialized biological functions and ‘regulatory networks’ and systems biology-based dissection of biological states.

ToppCluster System workflow --

The primary object of ToppCluster is the identification of biological themes in data sets involving numerous gene sets.

A typical example is a time-series microarray experiment. The principal strength of ToppCluster lies in the ability to co-analyze multiple gene lists and to depict the results in a form that facilitates comparative and contrastive analysis.

The input to ToppCluster consists of multiple gene lists from various experiments involving, for example, different tissues, time-points, cell-types or microRNA targets. The chosen P-value cutoff and multiple testing correction method [Bonferroni, false discovery rate (FDR) or None] are then used as filters.
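The filtering step described above can be sketched as follows. This is a minimal illustration and not ToppCluster's own code: the function names are hypothetical, and the FDR option is assumed here to be the Benjamini-Hochberg procedure, which the description above does not specify.

```python
def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_features(pvalues, cutoff=0.05, method="FDR"):
    """Return indices of features that pass the cutoff after correction."""
    if method == "Bonferroni":
        corrected = bonferroni(pvalues)
    elif method == "FDR":
        # Assumption: FDR means Benjamini-Hochberg in this sketch.
        corrected = benjamini_hochberg(pvalues)
    elif method == "None":
        corrected = list(pvalues)
    else:
        raise ValueError("unknown correction method: " + method)
    return [i for i, p in enumerate(corrected) if p <= cutoff]
```

With the P-values 0.01, 0.04, 0.03 and 0.5 and a cutoff of 0.05, only the first feature survives either correction, while all of the first three pass when no correction is applied.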

The user can select one or more annotation types to be included in the output.

The enrichment functionality of the ToppGene suite - (see G6G Abstract Number 20420) is used by ToppCluster to derive over-represented annotations.

ToppGene contains 17 human gene-based annotation types, including Gene Ontology-Biological Process, Molecular Function, Cellular Component, Mouse Phenotype, Human Phenotype, Pathways, Transcription Factor Binding Sites, predicted MicroRNA targets, PubMed co-citations, Protein domains, Protein-Protein Interactions, Cytoband, Gene Coexpression, Expression Correlation (‘Computational’), Drug/Chemical and Disease.

Links to data sources for these annotations in ToppGene can be found in the ‘Links’ section of the ToppCluster website.

Additional details about the types and numbers of annotations can be found in the ‘Database Info’ section on the ToppCluster homepage under the ‘ToppGene’ header.

After finalizing the input parameters, gene-associated feature enrichments are computed in ToppGene based on the hypergeometric distribution test.
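As an illustration of the hypergeometric test, the enrichment P-value for one feature in one gene list can be computed as the upper tail of the hypergeometric distribution. The sketch below is not ToppGene's implementation; the function name and the explicit background size parameter are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(list_size, hits, annotated, background):
    """P(X >= hits) when list_size genes are drawn from a background of
    `background` genes, of which `annotated` carry the feature."""
    max_hits = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total
```

For example, drawing 3 genes from a background of 10 in which 4 genes carry the feature, the probability of seeing at least 2 annotated genes is 40/120, or one third.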

The initial output is a result matrix that has columns that relate to each input gene list, and rows that represent the overrepresented features of any of the gene lists.

Each named gene list contributes two columns: a significance score equal to the negative log of the P-value, and a comma-delimited list of the genes that have that feature.

If a given feature is significantly associated with several input gene lists, those lists may show an identical significance score for it while contributing completely different sets of genes.
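The layout of the result matrix can be sketched as below. This is an illustration only: the input structure, the function name and the gene and term names are hypothetical, a score of zero for non-enriched features is an assumption, and base 10 is assumed for the logarithm because the description says only "negative log".

```python
from math import log10


def build_enrichment_matrix(results):
    """results maps list name -> {feature: (p_value, [genes])}.

    Returns (header, rows) with two columns per gene list:
    a significance score and a comma-delimited gene string.
    """
    list_names = list(results)
    features = sorted({f for per_list in results.values() for f in per_list})
    header = ["feature"]
    for name in list_names:
        header += [name + " score", name + " genes"]
    rows = []
    for feature in features:
        row = [feature]
        for name in list_names:
            if feature in results[name]:
                p_value, genes = results[name][feature]
                # Base 10 is an assumption; the text says only "negative log".
                row += [-log10(p_value), ",".join(genes)]
            else:
                # Assumption: a feature not enriched in this list scores zero.
                row += [0.0, ""]
        rows.append(row)
    return header, rows


# Hypothetical input: both lists score the same for "termA" with different genes.
example = {
    "liver": {"termA": (1e-4, ["G1", "G2"])},
    "heart": {"termA": (1e-4, ["G3"]), "termB": (1e-6, ["G4", "G5"])},
}
header, rows = build_enrichment_matrix(example)
```

In this toy input the "termA" row carries the same score in both lists but different gene strings, which is the situation described above.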

The resulting functional enrichment matrix can be hierarchically clustered for visualization and analysis as a Heatmap or transformed into a Cytoscape-compatible XGMML network format or a Gephi compatible GEXF network format.

If the Heatmap generation option is chosen, the functional enrichment matrix is subjected to two-dimensional hierarchical clustering, where first the rows and then the columns are reordered according to similar scores.
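The row-then-column reordering can be sketched with a small agglomerative clustering routine. ToppCluster performs this step with R scripts; the pure-Python version below is only an illustration, and the choice of Euclidean distance and average linkage, like the function names, is an assumption not stated in the description.

```python
from math import dist


def cluster_order(vectors):
    """Return vector indices in the leaf order of an average-linkage
    agglomerative clustering using Euclidean distance."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        pairs = [
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        _, x, y = min(pairs)
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0] if clusters else []


def cluster_matrix(scores):
    """Reorder the rows, then the columns, of a score matrix (list of rows)."""
    row_order = cluster_order(scores)
    rows = [scores[i] for i in row_order]
    columns = [list(col) for col in zip(*rows)]
    col_order = cluster_order(columns)
    reordered = [[row[j] for j in col_order] for row in rows]
    return reordered, row_order, col_order
```

For the score matrix [[5, 0], [0, 4], [5, 1], [0, 5]], the rows come back in the order 0, 2, 1, 3, which places the two similar pairs of rows next to each other.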

In the tabular format, the genes from the particular gene list contributing to the significance score are provided in an adjoining table. Third-party software can be used to import and visualize the Heatmaps or networks. The networks can also be obtained as static images.

ToppCluster Data input and interface --

ToppCluster accepts input in one of two ways:

1) As separate lists of genes which can be successively added and named; or

2) Using the ‘alternative entry’ method, as a two-column list with genes in the first column and the name of the gene list in the second column.
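The second entry method amounts to grouping genes by list name. A minimal sketch of such parsing is shown below; the function name is hypothetical, and whitespace (tab or space) is assumed as the column separator because the description does not specify the delimiter ToppCluster expects.

```python
def parse_alternative_entry(text):
    """Parse lines of "<gene> <list name>" into {list name: [genes]}."""
    gene_lists = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        # Split on the first run of whitespace; the rest is the list name.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected two columns, got: " + line)
        gene, list_name = parts[0], parts[1].strip()
        genes = gene_lists.setdefault(list_name, [])
        if gene not in genes:
            genes.append(gene)  # Keep first occurrence only.
    return gene_lists
```

Each resulting key corresponds to one named gene list, equivalent to adding and naming the lists one at a time with the first method.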

Accepted input is limited to human genes at present. Any one or more of the 17 annotation sources can be used for feature enrichment analyses. Each feature analysis can be adjusted based on the P-value cutoff, the multiple testing correction method or the minimum and maximum number of genes present for each annotation type.

For example, limiting enrichments to ontologies that have fewer associated genes can allow for a greater focus on specific classes of gene feature or function. Multiple choices are available for the formatting and delivery of results.
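The gene-count limits can be pictured as a simple filter over the annotation terms. The sketch below is illustrative only; the function name, the input structure and the default bounds are assumptions, not ToppCluster settings.

```python
def filter_by_gene_count(annotations, min_genes=1, max_genes=1500):
    """annotations maps feature -> set of genes annotated with it.

    Keeps features whose gene count lies in [min_genes, max_genes].
    The default bounds are arbitrary values chosen for this sketch.
    """
    return {
        feature: genes
        for feature, genes in annotations.items()
        if min_genes <= len(genes) <= max_genes
    }
```

Lowering max_genes removes broad terms that annotate many genes, leaving the more specific terms for the enrichment analysis.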

The user can opt for results to be obtained in tabular format as comma-separated values, tab-separated values or HTML table format.

It is also possible to obtain the results in various visualization formats --

1) A standard Heatmap in a PDF file generated using R;

2) TreeView clustered data table (CDT) Heatmap files;

3) GenePattern GCT format - (see G6G Abstract Number 20181);

4) Cytoscape-importable XGMML network format;

5) Gephi-importable GEXF network format; or

6) Pre-laid-out network images using the PNG option.
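As an example of one of these formats, a score matrix can be written in the general GCT 1.2 layout (a version line, a dimensions line, a header line, then one row per feature). The sketch below follows that general layout; the function name is hypothetical, and the exact content ToppCluster places in the Description column is not stated, so the feature name is reused there as an assumption.

```python
def to_gct(features, list_names, scores):
    """Return a GCT 1.2 document: rows are features, columns are gene lists."""
    lines = [
        "#1.2",
        f"{len(features)}\t{len(list_names)}",
        "\t".join(["Name", "Description"] + list(list_names)),
    ]
    for feature, row in zip(features, scores):
        values = [format(value, "g") for value in row]
        # Assumption: reuse the feature name as the Description field.
        lines.append("\t".join([feature, feature] + values))
    return "\n".join(lines) + "\n"
```

The returned text can be saved with a .gct extension and opened in tools that read the GCT format.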

ToppCluster Generation of enrichment data map --

Each labeled gene list is fed to the ToppGene web service. The functional enrichment results for each gene list for the selected categories are then compiled and concatenated into a tabular format.

Here, the manufacturers have used a new approach to represent the significance of the functional term in a gene list.

ToppCluster Visualization --

An interactive HTML output format lets the user select features of interest from the results to be included in the network. Following this, the user is allowed to select the type, layout and file-format for the network.

The network can be displayed in two (2) very different ways:

A ‘Gene Level’ option generates the entire network including the genes, while an ‘Abstracted’ option excludes the genes from the network, retaining only the enriched terms as nodes that are related to the input gene lists via edge relationships that subsume the list of specific genes.

In the ‘Abstracted’ option, the network shows the input gene lists connected to the enriched terms by weighted edges; the edge-weight is set to the significance score of the enriched term, and the list of genes is available as an annotation field from Cytoscape’s data panel window in the Edge Attribute Browser.
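The difference between the two network types can be sketched as two ways of turning enrichment results into edges, followed by a minimal GEXF export. This is an illustration under stated assumptions: the function names and input structure are hypothetical, the gene-level topology shown (gene list to gene to term) is assumed rather than taken from the description, and the GEXF output keeps only node labels and edge weights.

```python
import xml.etree.ElementTree as ET


def build_network(results, gene_level=False):
    """results maps list name -> {feature: (score, [genes])}.

    Returns edges as (source, target, attributes) tuples.
    """
    edges = []
    for list_name, features in results.items():
        for feature, (score, genes) in features.items():
            if gene_level:
                # Assumed topology: genes link the gene list to the term.
                for gene in genes:
                    edges.append((list_name, gene, {}))
                    edges.append((gene, feature, {}))
            else:
                # Abstracted: one weighted edge that carries the gene list.
                attrs = {"weight": score, "genes": ",".join(genes)}
                edges.append((list_name, feature, attrs))
    return edges


def to_gexf(edges):
    """Serialize edges as a minimal GEXF 1.2 document (string)."""
    root = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    node_ids = {}
    for source, target, attrs in edges:
        for label in (source, target):
            if label not in node_ids:
                node_ids[label] = str(len(node_ids))
                ET.SubElement(
                    nodes_el, "node", {"id": node_ids[label], "label": label}
                )
        edge_attrs = {
            "id": str(len(edges_el)),
            "source": node_ids[source],
            "target": node_ids[target],
        }
        if "weight" in attrs:
            edge_attrs["weight"] = str(attrs["weight"])
        # The "genes" annotation is omitted here to keep the sketch short.
        ET.SubElement(edges_el, "edge", edge_attrs)
    return ET.tostring(root, encoding="unicode")
```

For a single gene list with one enriched term and two genes, the abstracted form yields one weighted edge, while the gene-level form yields four unweighted edges passing through the two gene nodes.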

ToppCluster Implementation --

ToppCluster is a distributed system implemented in Java that runs across a cluster of Linux servers utilizing the Sun Glassfish Enterprise Server environment.

ToppCluster passes data to ToppGene via the Java Message Service (JMS). JMS automatically distributes all gene-list enrichment jobs to available ToppGene enrichment analysis nodes.
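The queue-based distribution pattern can be pictured with the following sketch. It is an analogy only: ToppCluster uses JMS on Java servers, whereas this example uses Python's standard queue and threading modules as stand-ins, and the function names and the caller-supplied analyze function are hypothetical.

```python
import queue
import threading


def run_enrichment_jobs(gene_lists, analyze, workers=3):
    """Distribute one job per gene list across worker threads.

    gene_lists maps list name -> genes; analyze is a caller-supplied
    function standing in for a ToppGene enrichment analysis node.
    """
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, genes = jobs.get_nowait()
            except queue.Empty:
                return  # No jobs left for this worker.
            outcome = analyze(genes)
            with lock:
                results[name] = outcome

    # Enqueue every job before the workers start pulling from the queue.
    for item in gene_lists.items():
        jobs.put(item)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each worker takes the next waiting job as soon as it is free, so the gene lists are spread across whichever workers are available, and the results are collected by list name.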

The TreeView clustered data files and the PDF Heatmap are generated using embedded R scripts that run as scheduled jobs on the CCHMC Computational Cluster.

Network images are generated using the Java JUNG libraries for analysis and visualization of network data. ToppCluster uses the jQuery AJAX library for dynamic HTML-based user interfaces.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site ToppCluster

Price Contact manufacturer.

G6G Abstract Number 20651

G6G Manufacturer Number 104049