G6G Directory of Omics and Intelligent Software

ToppGene Suite

Category Genomics>Genetic Data Analysis/Tools and Genomics>Gene Expression Analysis/Profiling/Tools

Abstract The ToppGene Suite is a one-stop portal (web-server) for (a) gene list functional enrichment, (b) candidate gene prioritization using either functional annotations or network analysis or both, and (c) identification and prioritization of novel disease candidate genes in the interactome.

Functional annotation-based disease candidate gene prioritization uses fuzzy-based similarity measures to compute the similarity between any two genes based on their semantic annotations.

The similarity scores from individual features are combined into an overall score using statistical meta-analysis. A p-value for each annotation of a test gene is derived by random sampling from the whole genome.
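The two steps above (a sampling-based p-value per annotation, then a meta-analytic combination) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this abstract: Fisher's method is used here as one common form of statistical meta-analysis, and the function names, the +1 smoothing, and the sample size are hypothetical choices made for illustration only.

```python
import math
import random


def empirical_p_value(observed, background_scores):
    """Share of background scores >= the observed score (+1 smoothing avoids p = 0)."""
    hits = sum(1 for score in background_scores if score >= observed)
    return (hits + 1) / (len(background_scores) + 1)


def sampled_p_value(observed, genome_scores, n_samples=1000, seed=0):
    """Estimate a p-value by randomly sampling scores from the whole genome.

    `genome_scores` is a hypothetical list holding one similarity score per gene.
    """
    rng = random.Random(seed)
    background = [rng.choice(genome_scores) for _ in range(n_samples)]
    return empirical_p_value(observed, background)


def fisher_combined_p(p_values):
    """Combine per-feature p-values with Fisher's method (an assumed choice).

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom; for even degrees of freedom the survival function has the closed
    form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A gene with small p-values across several features receives a small combined p-value, so ranking test genes by the combined value places the most consistently similar genes first.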

The protein-protein interaction network (PPIN) based disease candidate gene prioritization uses social and Web network analysis algorithms (extended versions of the PageRank and HITS algorithms, and the K-Step Markov method).

ToppGene Suite computational software tools enable biomedical researchers to:

1) Perform gene list enrichment analysis (ToppFun);

2) Perform candidate gene prioritization based on functional annotations (ToppGene);

3) Perform candidate gene prioritization based on protein interaction network analysis (ToppNet); and

4) Identify and rank candidate genes in the interactome based on both functional annotations and PPIN analysis (ToppGeNet).

Instructions and ‘help’ for each of these modules can be accessed from the manufacturer's homepage.

The database is updated periodically, and the current status of the data (versions and coverage) can also be accessed from the manufacturer's homepage.

ToppGene Suite modules --

1) ToppFun - Detects functional enrichment of an input gene list based on Transcriptome (gene expression), Proteome (protein domains and interactions), Regulome (TFBS and miRNA), Ontologies (GO, Pathway), Phenotype (human disease and mouse phenotype), Pharmacome (Drug-Gene associations) and Bibliome (literature co-citation).

Hypergeometric distribution with Bonferroni correction is used as the standard method for determining statistical significance.
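As a rough illustration of this kind of test, the following Python sketch computes an upper-tail hypergeometric p-value for one annotation term and applies a Bonferroni correction across terms. The function names and parameter names are hypothetical and do not reflect ToppFun's internal implementation.

```python
from math import comb


def hypergeom_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: number of genes in the background
    annotated:  background genes carrying the term
    sample:     size of the input gene list
    overlap:    input genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `hypergeom_p_value(10, 5, 5, 5)` is the chance that all five genes of a five-gene list carry a term annotated to half of a ten-gene background, which is 1/252.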

a) Input - Supported identifiers include NCBI Entrez gene IDs, approved human gene symbols, NCBI Reference Sequence accession numbers; single gene list.
b) Output - HTML output; Tab-delimited downloadable text file; graphical charts.

2) ToppGene - Prioritize or rank candidate genes based on functional similarity to a training gene list.

ToppGene works by generating a representative profile of the training genes using as many as 14 features and identifying over-represented terms from the training genes.

This first step is performed using ToppFun. The test set genes are then compared to this representative profile of the training set: to the over-represented terms from the training genes for all categorical annotations, and to the average vector for the expression values.

a) Input - Supported identifiers include NCBI Entrez gene IDs, approved human gene symbols, NCBI Reference Sequence accession numbers; two gene lists (1 for training and 1 for test).
b) Output - HTML output.

3) ToppNet - Prioritize or rank candidate genes based on topological features in a protein-protein interaction network.

ToppNet gene prioritization is based on protein-protein interaction network (PPIN) analyses.

Based on the observation that 'biological networks' share many properties with Web and social networks, ToppNet uses extended versions of three (3) algorithms from White and Smyth - a) PageRank with Priors, b) HITS with Priors and c) K-step Markov - to prioritize 'disease candidate genes' by estimating their relative importance in the PPIN to the disease-related genes.
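The idea behind PageRank with Priors can be sketched in a few lines of Python. This is a simplified illustration on an undirected toy network, not ToppNet's extended version: the adjacency format, the back-probability value `beta`, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank_with_priors(adjacency, seeds, beta=0.3, iterations=100):
    """Rank nodes by importance relative to a set of seed (training) genes.

    adjacency: dict mapping each node to a list of its neighbors; every
               neighbor must also appear as a key (assumed format)
    seeds:     set of training genes; the prior is uniform over this set
    beta:      probability of jumping back to the seeds at each step
    """
    nodes = list(adjacency)
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(prior)
    for _ in range(iterations):
        # Every node first receives its share of the jump back to the seeds.
        new_rank = {n: beta * prior[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                continue  # isolated nodes pass nothing on in this sketch
            share = (1.0 - beta) * rank[u] / len(neighbors)
            for v in neighbors:
                new_rank[v] += share
        rank = new_rank
    return rank


# Toy path network A - B - C - D with A as the only training gene.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = pagerank_with_priors(toy_network, seeds={"A"})
ranked_candidates = sorted(["B", "C", "D"], key=lambda g: -scores[g])
```

In the toy network the candidates closest to the training gene score highest, which is the sense in which such methods estimate importance relative to the disease-related genes.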

a) Input - Supported identifiers include NCBI Entrez gene IDs, approved human gene symbols, NCBI Reference Sequence accession numbers; two gene lists (1 for training and 1 for test).
b) Output - HTML output; Cytoscape-compatible (see G6G Abstract Number 20092) input file; graphical networks.

4) ToppGeNet - Identify and prioritize the neighboring genes of the ‘seeds’ in a protein-protein interaction network based on functional similarity to the ‘seed’ list (ToppGene) or topological features in a protein-protein interaction network (ToppNet).

ToppGeNet differs from ToppGene and ToppNet in that the test set is derived from the protein interactome.

In other words, for a training set of known disease genes, the test set is generated by mining the 'protein interactome' and compiling the genes either directly or indirectly interacting (based on user input) with the training set.

After any overlapping or common genes between test and training sets are removed, interactome-based test set genes can be prioritized using either a functional annotation-based method (ToppGene) or PPIN-based method (ToppNet).
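The test set construction described above can be pictured as a bounded neighborhood search followed by removal of the training genes. The Python sketch below is a hypothetical illustration: the edge-list input, the function name, and the `max_distance` parameter (1 for direct interactors, 2 or more to include indirect ones) are assumptions, not ToppGeNet's actual interface.

```python
def interactome_test_set(interactions, training, max_distance=1):
    """Collect genes within `max_distance` interaction steps of the training genes.

    interactions: iterable of (gene_a, gene_b) pairs, treated as undirected
    training:     iterable of known disease (seed) genes
    Returns the reached genes with all training genes removed.
    """
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    training = set(training)
    visited = set(training)
    frontier = set(training)
    for _ in range(max_distance):
        next_frontier = set()
        for gene in frontier:
            next_frontier |= adjacency.get(gene, set()) - visited
        visited |= next_frontier
        frontier = next_frontier

    # Drop genes shared with the training set so only new candidates remain.
    return visited - training


# Toy interactome with placeholder gene names.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G3", "G4"), ("G4", "G5")]
direct = interactome_test_set(toy_edges, {"G1", "G2"}, max_distance=1)
indirect = interactome_test_set(toy_edges, {"G1", "G2"}, max_distance=2)
```

The resulting set would then be handed to either prioritization method as the test list.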

a) Input - Supported identifiers include NCBI Entrez gene IDs, approved human gene symbols, NCBI Reference Sequence accession numbers; single gene list.
b) Output - HTML output; Cytoscape-compatible input file; graphical networks.

ToppGene Suite is capable of identifying true candidate genes --

However, it needs to be emphasized that the manufacturer's aim is not to prove that ToppGene Suite-prioritized genes are true disease genes but rather to aid in the selection of a subset of the most likely 'disease gene' candidates from larger sets of disease-implicated genes identified by high-throughput genome-wide techniques like linkage analysis and microarray analysis.

As the functional annotations of human and mouse genes and the quality of PPIN data improve, the manufacturers envisage a proportional increase in the performance of the ToppGene Suite and strongly believe that it will be a valuable adjunct to 'wet lab experiments' in human genetics and disease research.

The manufacturers further hypothesize that integrating the rankings obtained using functional annotations and PPIN-based approaches may improve the prioritization of disease genes.

System Requirements

Web-based.

Manufacturer

Division of Biomedical Informatics
Cincinnati Children's Hospital Medical Center
3333 Burnet Ave
Cincinnati, OH 45229
USA
Tel: 513-636-0261
Fax: 513-636-2056

Manufacturer Web Site ToppGene Suite

Price Contact manufacturer.

G6G Abstract Number 20420

G6G Manufacturer Number 104049
