G6G Directory of Omics and Intelligent Software

Ontologizer

Category Genomics>Genetic Data Analysis/Tools

Abstract The Ontologizer is a tool for the statistical analysis and visualization of high-throughput biological data using the Gene Ontology (GO).

It is available as a Java WebStart application or as a desktop application for GO term enrichment analysis; its user interface uses Eclipse's Standard Widget Toolkit.

For GO term enrichment analysis, the Ontologizer supports the standard statistical approach (based on the one-sided Fisher's exact test), the manufacturer's newer 'parent-child' method (see below) as described in Grossmann et al. (2007), and 'topology-based' methods as described in Alexa et al., Bioinformatics. 2006 Jul 1;22(13):1600-7. Epub 2006 Apr 10, and Falcon et al., Bioinformatics. 2007 Jan 15;23(2):257-8. Epub 2006 Nov 10.
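The standard approach can be illustrated with a minimal sketch. It assumes the usual formulation in which the one-sided Fisher's exact test for a single term equals the upper tail of a hypergeometric distribution; the function name and the example counts are hypothetical and are not taken from the Ontologizer itself.

```python
from math import comb


def enrichment_p_value(pop_size, pop_annotated, study_size, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated).

    pop_size        -- genes in the population
    pop_annotated   -- population genes annotated to the term
    study_size      -- genes in the study set (drawn from the population)
    study_annotated -- study genes annotated to the term
    """
    total = comb(pop_size, study_size)
    upper = min(pop_annotated, study_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, study_size - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total


# Made-up counts: 10 population genes, 3 annotated to the term,
# and a study set of 3 genes that are all annotated to the term.
p = enrichment_p_value(10, 3, 3, 3)  # 1/120
```

Exact integer binomials are used here for clarity; a production implementation would more likely work with log-probabilities or a statistics library.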

After the analysis with one of the above methods, optionally followed by a 'multiple-testing' correction procedure (see below), the Ontologizer displays rows of terms together with their p-values (or scores), annotation counts and other information.

The Ontologizer produces listings of GO annotations for user-supplied lists of genes or gene products. One situation in which this is useful is the "clustering" analysis of microarray data, but there are many other potential uses.

The Ontologizer assumes that each group of genes resides in its own file, and presently accepts FASTA files as well as files in which each gene is on its own line.
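As a rough illustration of the two input styles, the sketch below reads gene identifiers from either kind of file content. How the Ontologizer itself extracts identifiers from FASTA headers is not stated here, so taking the first token after ">" is an assumption, and the function name is hypothetical.

```python
def parse_gene_list(lines):
    """Return gene identifiers from FASTA content or a one-gene-per-line list.

    Assumption: in FASTA input the identifier is the first
    whitespace-delimited token of each '>' header line.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if any(line.startswith(">") for line in cleaned):
        # FASTA: keep header identifiers, ignore sequence lines.
        return [
            line[1:].split()[0]
            for line in cleaned
            if line.startswith(">") and line[1:].split()
        ]
    # Plain list: every non-empty line is one gene.
    return cleaned
```

Each group (cluster) of genes would be read from its own file, for example by passing an open file object to `parse_gene_list`.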

Enrichment of a term is indicated by 'color coding' according to the sub-ontology to which the term belongs; the intensity of the color correlates with the significance of the enrichment.

Users can click on any term in the table to display properties and results related to the term such as its parents and children, its description, and a list of all genes annotated to the term in the study set.

This information is presented as a hypertext in the lower panel with links to parent and child terms. The Ontologizer also provides a tightly integrated graphical display of the results using GraphViz.

Parent-Child Analysis --

The Parent-Child method represents a new algorithm for identifying overrepresented Gene Ontology (GO) annotations in gene sets. While current methods treat each term independently and hence ignore the structure of the GO hierarchy, the manufacturer's approach takes parent-child relationships into account.

Over-representation of a term is measured with respect to the presence of its parental terms in the set. This resolves the problem that the standard approach tends to falsely detect an over-representation of more specific terms below terms known to be over-represented.

This approach adds no computational complexity compared to the standard approach.
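The idea can be sketched as follows. This is one reading of "with respect to the presence of its parental terms", in which the population and the study set are both restricted to genes annotated to at least one parent of the term before the usual hypergeometric tail is computed. All names and gene identifiers are hypothetical; the cited paper remains the authoritative description.

```python
from math import comb


def hypergeom_tail(pop_size, pop_annotated, study_size, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated)."""
    upper = min(pop_annotated, study_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, study_size - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / comb(pop_size, study_size)


def parent_child_p_value(term_genes, parent_gene_sets, population, study):
    """Test a term relative to genes annotated to any of its parents."""
    parents_union = set().union(*parent_gene_sets)
    pop_parents = population & parents_union
    study_parents = study & parents_union
    return hypergeom_tail(
        len(pop_parents),
        len(pop_parents & term_genes),
        len(study_parents),
        len(study_parents & term_genes),
    )


# Made-up example: the study set is exactly the parent's genes.
population = {f"g{i}" for i in range(1, 11)}
parent = {"g1", "g2", "g3", "g4"}
child = {"g1", "g2"}
study = {"g1", "g2", "g3", "g4"}

standard = hypergeom_tail(len(population), len(child), len(study), len(study & child))
conditioned = parent_child_p_value(child, [parent], population, study)
# standard is about 0.133, conditioned is 1.0
```

In this toy case the child term looks mildly enriched under the standard test only because its parent is enriched; once the parent is taken into account, the child carries no additional signal. The cost is the same single hypergeometric tail per term.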

The parent-child method is described in detail in Grossmann et al., Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 2007 Nov 15;23(22):3024-31. Epub 2007 Sep 11. (It is available as an Open-Access Bioinformatics article.)

Multiple Testing --

Since one will generally test up to thousands of GO terms for overrepresentation, some correction for multiple testing is needed.

At present, the Ontologizer uses a classic Bonferroni correction, meaning it multiplies the nominal p-values calculated as described above by the number of tests performed.
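A minimal sketch of this correction, with hypothetical term names and made-up p-values. Capping the adjusted value at 1 is a common convention and is an assumption here, not something stated about the Ontologizer.

```python
def bonferroni(p_values):
    """Multiply each nominal p-value by the number of tests, capped at 1."""
    number_of_tests = len(p_values)
    return {
        term: min(1.0, p * number_of_tests)
        for term, p in p_values.items()
    }


nominal = {"term_a": 0.01, "term_b": 0.5, "term_c": 0.2}
adjusted = bonferroni(nominal)
# term_a is adjusted to about 0.03, term_b is capped at 1.0
```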

This is a very conservative form of correction for multiple testing. One can limit the number of tests performed by deciding not to test GO terms that do not annotate any genes in the population (since the 'study group' is drawn from the population, if no genes are annotated then overrepresentation of the term is obviously impossible).

Additionally, if a term annotates only one gene in the population, then testing for overrepresentation in the study group has little meaning.

Note that the number of genes in the study group annotated to the term is not, and should not be, taken into account here.
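The selection of terms to test can be sketched as a filter on population counts only. The threshold of two genes follows the reasoning above; the function name, term names and counts are hypothetical.

```python
def select_testable_terms(population_counts, min_annotated=2):
    """Keep terms that annotate at least `min_annotated` population genes.

    Study-set counts are deliberately not consulted, so the number of
    tests does not depend on the outcome being tested.
    """
    return sorted(
        term
        for term, count in population_counts.items()
        if count >= min_annotated
    )


population_counts = {"term_a": 0, "term_b": 1, "term_c": 25, "term_d": 2}
testable = select_testable_terms(population_counts)
number_of_tests = len(testable)  # the multiplier for a Bonferroni correction
```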

It is possible to perform analysis on any number of groups (clusters) of genes simultaneously.

The Ontologizer does not perform multiple testing correction based on the number of clusters analyzed. Depending on the question posed by the user, it may or may not be appropriate to do so.

Finally, note that Gene Ontology annotations are made to the most specific term possible. All ancestors of the term are considered to be implicitly annotated.

Therefore, when calculating the total annotations of a term, one also needs to count annotations to all (more specific) descendants of the term.

Note that one needs to avoid introducing "extra" (duplicate) counts if there are multiple paths from a descendant term to an ancestor term, or if two distinct descendants of a term are annotated for the same gene.
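One way to avoid duplicate counts is to collect genes per term in sets rather than incrementing counters, as in the sketch below. The small diamond-shaped hierarchy and the gene names are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return all ancestors of `term`, visiting each one only once."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Map every term to the set of genes annotated to it or a descendant."""
    genes_by_term = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            genes_by_term[term].add(gene)
            for ancestor in ancestors(term, parents):
                genes_by_term[ancestor].add(gene)
    return dict(genes_by_term)


# Diamond: D has parents B and C, which both have parent A.
parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
# g1 reaches A along two paths; g2 is annotated to two distinct descendants of A.
direct_annotations = {"g1": ["D"], "g2": ["B", "C"]}

genes_by_term = propagate(direct_annotations, parents)
counts = {term: len(genes) for term, genes in genes_by_term.items()}
# counts["A"] is 2, not 4: each gene is counted once per term
```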

A further discussion of these issues is available in Robinson et al. (2004) Gene-Ontology analysis reveals association of tissue-specific 5' CpG-island genes with development and embryogenesis. Hum Mol Genet 13:1969-78.

System Requirements

Contact manufacturer.

Manufacturer

The CBB group
Institute for Medical Genetics
Charité Universitätsmedizin Berlin
Augustenburger Platz 1, 13353
Berlin, Germany

Manufacturer Web Site Ontologizer

Price Contact manufacturer.

G6G Abstract Number 20291

G6G Manufacturer Number 100505
