Gaggle

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract The Gaggle is a framework for exchanging data between independently developed software tools and databases to enable interactive exploration of 'systems biology' data.

Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it combines existing popular programs and web resources into a user-friendly, rich, and easily extended environment in which to do ‘systems biology’.

Note: The practice of systems biology depends upon many software tools, operating on many kinds of data from many different sources. Each of these tools typically excels at one (or a few) types of analysis with one (or a few) types of data.

A crucial challenge, therefore, is to combine the capabilities of these and other, forthcoming tools to create a data exploration and analysis environment which can do justice to the variety and complexity of systems biology. The Gaggle is designed to meet this challenge.

Gaggle currently supports a number of geese -- the manufacturer's name for any 'open source software' that has been adapted to run in the Gaggle. This adaptation generally requires only a small amount of programming work.

Once gaggled, the program can broadcast and receive any of a small number of data types which together constitute an adequate basis for exploratory analysis in systems biology. These data types include:

1) Name list (i.e., these genes are interesting).

2) Name list combined with a condition list (i.e., these genes are interesting in these conditions).

3) HashMap: a collection of name/value pairs.

4) Matrix: rows and columns, each named, containing numerical data.

5) Network: a collection of nodes and edges, with arbitrary hashmaps associated with each.
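The five data types above can be pictured with a small sketch. This is an illustration only and is not the Gaggle's own API: the class names (NameList, ConditionedNameList, Matrix, Network), their fields, and the gene and condition names are all hypothetical, chosen to mirror the descriptions in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameList:
    """1) A list of names, e.g. genes of interest (hypothetical class)."""
    names: List[str]


@dataclass
class ConditionedNameList:
    """2) Names together with the conditions in which they are interesting."""
    names: List[str]
    conditions: List[str]


@dataclass
class Matrix:
    """4) Named rows and named columns holding numerical data."""
    row_names: List[str]
    column_names: List[str]
    values: List[List[float]]

    def get(self, row: str, column: str) -> float:
        # Look a value up by row name and column name.
        return self.values[self.row_names.index(row)][self.column_names.index(column)]


@dataclass
class Network:
    """5) Nodes and edges, each carrying an arbitrary map of attributes."""
    node_attributes: Dict[str, Dict[str, object]] = field(default_factory=dict)
    edge_attributes: Dict[Tuple[str, str], Dict[str, object]] = field(default_factory=dict)

    def add_node(self, name: str, **attributes: object) -> None:
        self.node_attributes.setdefault(name, {}).update(attributes)

    def add_edge(self, source: str, target: str, **attributes: object) -> None:
        # Adding an edge also makes sure both end points exist as nodes.
        self.add_node(source)
        self.add_node(target)
        self.edge_attributes.setdefault((source, target), {}).update(attributes)


# 3) A HashMap of name/value pairs corresponds to a plain dict.
annotations = {"geneA": "transcription regulator", "geneB": "hypothetical protein"}

# Made-up example data, for illustration only.
genes = NameList(["geneA", "geneB"])
interesting = ConditionedNameList(genes.names, ["heat_shock", "low_oxygen"])
ratios = Matrix(genes.names, interesting.conditions, [[1.5, -0.2], [0.1, 2.3]])
network = Network()
network.add_edge("geneA", "geneB", interaction="regulates")
```

Keeping the set of types this small is what lets unrelated tools agree on an exchange format without sharing any deeper data model.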

Gaggle Boss -- The Gaggle Boss is an indispensable part of the Gaggle. Most geese will automatically launch the Boss (in a minimized state) if it is not already running.

Once started, the Boss often retreats into the background, providing the channel over which the geese communicate, but little used by the user.
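The Boss's role as the channel between geese can be sketched as a simple hub. This is a simplified, in-process illustration under stated assumptions: the real Gaggle communicates over Java RMI, and the Boss and Goose classes, their method names, and the goose names used here are hypothetical.

```python
class Boss:
    """Hypothetical hub: keeps track of geese and relays broadcasts."""

    def __init__(self):
        self._geese = {}

    def register(self, goose):
        self._geese[goose.name] = goose

    def broadcast(self, sender_name, data):
        # Relay the data to every registered goose except the sender.
        for name, goose in self._geese.items():
            if name != sender_name:
                goose.receive(sender_name, data)


class Goose:
    """Hypothetical tool that only ever talks to the Boss, never to peers."""

    def __init__(self, name, boss):
        self.name = name
        self.boss = boss
        self.inbox = []
        boss.register(self)

    def broadcast(self, data):
        self.boss.broadcast(self.name, data)

    def receive(self, sender_name, data):
        self.inbox.append((sender_name, data))


boss = Boss()
matrix_viewer = Goose("matrix viewer", boss)
network_tool = Goose("network tool", boss)

# The matrix viewer sends a name list; the network tool receives it via the Boss.
matrix_viewer.broadcast(["geneA", "geneB"])
```

Because each goose knows only the Boss, a new tool can join without any change to the tools already running, which is the separation of concerns described in the abstract.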

The following is a (partial) list of existing 'geese' with some of their features/capabilities:

1) The Annotation Goose -- displays short bits of descriptive text indexed by an identifier, such as Open Reading Frame (ORF), gene, or protein name. Features include keyword search and broadcasting lists of identifiers to and from the Gaggle.

2) Cytoscape Goose -- Some features of Cytoscape:

Networks and name lists are the data types most commonly exchanged between Cytoscape (see G6G Abstract Number 20092) and other geese in a Gaggle.

3) The DMV: Data Matrix Viewer -- This is an Institute for Systems Biology (ISB) goose, with a few useful features:

4) Firegoose -- Firefox toolbar for the Gaggle - The Firegoose toolbar connects the Gaggle to the web. By downloading and installing this extension into your Firefox browser you can broadcast data between the Gaggle and web resources.

Supported web sites include KEGG pathways, EMBL STRING (a database of functional associations), DAVID (which enables clustering by functional annotations), and Entrez Gene and Protein. With a little scripting, the Firegoose can potentially exchange data with practically any bioinformatics website.

5) Genome Browser -- The genome browser is a way of visualizing data plotted against coordinates on the genome. Tiling arrays and ChIP-chip data are a couple of use cases. It's still a work in progress...

6) MeV Goose -- MultiExperiment Viewer (MeV) is a versatile microarray data analysis tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery.

The MeV goose is most commonly used in Gaggle as follows:

7) The R Goose -- The R Goose allows you to use R -- a language and environment for statistical computing and graphics -- for data exploration in the Gaggle. R is especially useful with microarray, massively parallel signature sequencing (MPSS) and proteomics data.

8) Translator -- In biology, there is a large number of naming systems for ORFs, genes, and their products. The Translator attempts to manage some of that complexity by allowing relatively painless conversion between one naming system and another.

Different naming systems are often mutually inconsistent, so mapping between them is destined to be a 'lossy' process. That's lossy as in lossy data compression: a lossy compression method is one where compressing data and then decompressing it retrieves data that may well be different from the original, but is close enough to be useful in some way.

The Translator software supports a loose definition of translation, encompassing scenarios like mapping peptides to genes or mapping across species via Cluster of Orthologous Group (COG) membership.

These go beyond simply exchanging one naming system for another. Maintaining the desired degree of rigor is up to the user's judgement.
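Why such a mapping is lossy can be shown with a minimal sketch. It does not reproduce the Translator's actual implementation: the translate function, the identifiers, and both lookup tables are made up for illustration.

```python
def translate(names, mapping):
    """Map each name through a lookup table; collect names with no entry."""
    translated, unmapped = [], []
    for name in names:
        if name in mapping:
            translated.append(mapping[name])
        else:
            unmapped.append(name)
    return translated, unmapped


# Hypothetical tables: two ORFs share one gene name, and one ORF has no entry.
orf_to_gene = {"ORF0001": "geneA", "ORF0002": "geneA", "ORF0003": "geneB"}
gene_to_orf = {"geneA": "ORF0001", "geneB": "ORF0003"}

orfs = ["ORF0001", "ORF0002", "ORF0003", "ORF0004"]
genes, lost = translate(orfs, orf_to_gene)
round_trip, _ = translate(genes, gene_to_orf)

print(genes)       # ORF0001 and ORF0002 collapse onto the same gene name
print(lost)        # ORF0004 has no counterpart and drops out
print(round_trip)  # translating back does not recover ORF0002
```

The round trip returns a list that is close to the original but not identical, which is the sense of 'lossy' used above; deciding whether that is acceptable is left to the user.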

System Requirements

The Gaggle requires Java version 1.5 or later in order to run. The Gaggle depends upon Remote Method Invocation (RMI) for communication among the geese. In earlier versions of Java, the Gaggle Boss had to be compiled together with every goose; if we did not compile the Boss specifically with your goose, then your goose would not run.

With version 1.5, this restriction is eliminated. We find this very useful, and feel that it justifies the extra burden placed upon users, Mac OSX users in particular, who must specially install Java 1.5 (and who may, additionally, need to upgrade their operating system to 10.4). We provide explicit step-by-step Mac OSX instructions to lighten that burden; these may be found, along with instructions for Windows and Linux, by following the links at Gaggle Prerequisites.

Manufacturer

The Gaggle was originally conceived and implemented at the Baliga Laboratory at the Institute for Systems Biology.

Development continues in collaboration with the Bonneau Laboratory at New York University.

Manufacturer Web Site Gaggle and video tutorial

Price Contact manufacturer.

G6G Abstract Number 20222

G6G Manufacturer Number 101457