WebGestalt

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) is an integrated data mining system for the management, information retrieval, organization, visualization and statistical analysis of large sets of genes.

WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of large sets of genes.

It enables biologists to manipulate integrated information and find patterns that are not detectable otherwise.

WebGestalt is designed for functional genomic, proteomic and large-scale genetic studies that continuously produce high-throughput data. It currently works for human and mouse studies.

WebGestalt features/capabilities include:

Database - GeneKeyDB -- WebGestalt is based on an ORACLE relational database, GeneKeyDB.

This database offers a strong gene- and protein-centric viewpoint. Gene and gene product information is primarily taken from NCBI LocusLink, Ensembl, Swiss-Prot, HomoloGene, Unigene, CGAP, UCSC, GO Consortium, KEGG, BioCarta and Affymetrix.

WebGestalt modules -- WebGestalt is composed of four (4) modules: gene set management, information retrieval, organization/visualization and statistics.

1) The gene set management module receives gene sets submitted by the users. Received gene sets can be saved, retrieved and deleted. Boolean operations are also provided by this module to generate the unions, intersections or differences between gene sets.

2) The information retrieval module currently retrieves up to 20 attributes for the received gene sets from GeneKeyDB, the manufacturer's local database.

3) The organization/visualization module helps users efficiently explore the retrieved information in various biological contexts, using eight (8) sub-modules.

Subsets of genes based on the organization can be generated and saved as new gene sets.

4) The statistics module (see below).

1) Gene set management module – This module accepts gene sets submitted by files, by GO categories or by chromosome location ranges. The input file should be a plain text file, including the appropriate IDs (required) and corresponding microarray ratios or other values (optional), separated by tabs in the format of one ID per row.
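The input format described above (one ID per row, with an optional tab-separated value) can be read with a few lines of Python. This is only a minimal sketch of the file layout, not WebGestalt's own parser; the function name `parse_gene_set` and the example IDs and values are hypothetical.

```python
import csv
import io


def parse_gene_set(text):
    """Parse tab-separated rows: gene ID (required), numeric value (optional).

    Returns a dict mapping each ID to its value, or to None when no value
    is given. Hypothetical helper for illustration only.
    """
    genes = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank rows
        gene_id = row[0].strip()
        has_value = len(row) > 1 and row[1].strip()
        genes[gene_id] = float(row[1]) if has_value else None
    return genes


# Illustrative input: three IDs, two with a ratio and one without.
example = "1017\t2.5\n7157\n672\t0.4\n"
gene_set = parse_gene_set(example)
```

Keeping the value optional mirrors the description above: only the ID column is required, so a row with no ratio is still a valid entry.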

Gene identifiers that can be recognized are Entrez Gene IDs, Swiss-Prot IDs, Ensembl IDs, Unigene IDs, gene symbols and Affymetrix probe set IDs.

Subsets of genes can be generated from an existing gene set through the organization/visualization module and saved as new gene sets through the management module.

The management module also performs Boolean operations to generate the union, intersection and difference between two (2) existing gene sets.

Recursively applying these Boolean operations makes it possible to combine information from more than two sets of genes.
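The Boolean operations above map directly onto set operations. The sketch below uses Python's built-in `set` type to show the idea, including how repeated pairwise application extends to more than two gene sets; the gene symbols and variable names are illustrative assumptions, and this is not how WebGestalt itself is implemented.

```python
from functools import reduce

# Three illustrative gene sets (the symbols are arbitrary examples).
set_a = {"TP53", "BRCA1", "CDK2"}
set_b = {"BRCA1", "CDK2", "EGFR"}
set_c = {"CDK2", "EGFR", "MYC"}

# Pairwise Boolean operations on two existing gene sets.
union_ab = set_a | set_b         # genes in either set
intersection_ab = set_a & set_b  # genes in both sets
difference_ab = set_a - set_b    # genes in A but not in B

# Applying a pairwise operation repeatedly combines more than two sets:
# ((A & B) & C) keeps only the genes common to all three.
common_all = reduce(set.intersection, [set_a, set_b, set_c])
```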

Orthologs can be retrieved for a gene set using the management module. The orthologs are defined by HomoloGene from NCBI. Inclusion of orthologous information could assist in comparative genomics studies.

2) Information retrieval module - This module provides rapid access to the existing information for all genes in a gene set. The attributes that can be retrieved include nomenclature, identifiers to different databases, map and functional information.

Retrieved information for all genes in a gene set can be downloaded as a tab-delimited file or opened directly in the web browser using Microsoft Excel.

3) Organization/visualization module - This module in WebGestalt is intended to assist biologists in exploring large gene sets by organizing and visualizing the genes in various biological contexts.

4) Statistics module - The statistics module currently provides two statistical tests (the hypergeometric test and Fisher's exact test) to identify interesting patterns in the gene sets.

The users can select different significance levels for the statistical analysis. The users can also specify the minimum number of genes in a significant category.

The hypergeometric test can be used for the evaluation of the over/under-representation of individual genes in a selected tissue type.
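As a rough illustration of the hypergeometric test for over-representation, the sketch below computes the upper-tail probability of seeing at least `k` category genes in a gene set of size `n`, given `K` category genes among `N` reference genes. The function names, the default significance level and the default minimum gene count are assumptions chosen for the example; the source does not specify WebGestalt's exact procedure or defaults.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set
    K: reference genes that belong to the category
    n: genes in the submitted gene set
    k: submitted genes that belong to the category
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def is_over_represented(k, N, K, n, alpha=0.01, min_genes=2):
    """Apply a significance level and a minimum gene count (both assumed defaults)."""
    return k >= min_genes and hypergeom_upper_tail(k, N, K, n) <= alpha


# Toy example: 10 reference genes, 4 in the category, 3 submitted, 2 of them in the category.
p_value = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
significant = is_over_represented(k=2, N=10, K=4, n=3)
```

Under-representation can be assessed analogously with the lower tail, P(X <= k), by summing over `i` from 0 to `k` instead.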

Note: WebGestalt has been implemented in WebQTL (see G6G Abstract Number 20326), which is a unique service that allows biologists to rapidly identify and map genes and Quantitative Trait Loci (QTL).

The WebGestalt modules are used to analyze sets of genes that are highly correlated with various phenotypes in WebQTL.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site WebGestalt

Price Contact manufacturer.

G6G Abstract Number 20327

G6G Manufacturer Number 102881