Generic Model Organism Database (GMOD)

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract GMOD is the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases.

You can use it to create a small laboratory database of genome annotations, or a large web-accessible community database. GMOD tools are in use at many large and small community databases.

GMOD is a collection of interconnected applications and databases that biologists use as repositories and as tools. This connectivity between components is GMOD's key feature.

GMOD also describes a community. Many of the pieces of GMOD, or components, are mature software with many human-years of software development behind them.

GMOD is not just for model organisms: the current list of GMOD databases shows that GMOD is widely used across many kinds of organisms, and that these databases can hold sequences of any kind.

GMOD is a federation of software applications (components) aimed at providing the functionality that is needed by all organism databases.

Popular GMOD Software Tools --

Genome Browsing and Editing --

Apollo: Genome annotation editor - The Apollo genome editor is a Java-based application for browsing and annotating genomic sequences.

There are currently two branches of Apollo, one primarily used for genome browsing and maintained at Ensembl, and the other primarily used for genome annotation and maintained at the Berkeley Bioinformatics and Ontologies Project. The latter is part of the GMOD project.

GBrowse: Genome annotation viewer - The Generic Genome Browser is a combination of database and interactive Web page for manipulating and displaying annotations on genomes. Some of its features/capabilities are:

1) Simultaneous bird's eye and detailed views of the genome.

2) Scroll, zoom, center.

3) Use a variety of pre-made glyphs (see the BioPerl Glyph documentation for a list) or create your own.

4) Attach arbitrary URLs to any annotation.

5) Order and appearance of tracks are customizable by administrator and end-user.

6) Search by annotation ID, name, or comment.

7) Supports third party annotation using GFF formats.

8) Settings persist across sessions.

9) DNA and General Feature Format (GFF) dumps.

10) Connectivity to different databases, including BioSQL and Chado.

11) Multi-language support.

12) Third-party feature loading.

13) Customizable plug-in architecture (e.g. run BLAST, dump & import many formats, find oligonucleotides, design primers, create restriction maps, edit features).
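
Several of the features above rely on the General Feature Format. As a rough illustration only, the sketch below parses a single GFF3 feature line into its nine tab-separated columns. It is not GBrowse code; the class name GffFeature, the function parse_gff3_line, and the example record are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class GffFeature:
    seqid: str
    source: str
    type: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated tag=value pairs, URL-escaped.
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attributes)


# Made-up example record, for illustration only.
example = "chrI\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo_gene"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["Name"])
```

A production loader would also need to handle directives, multi-valued attributes, and embedded FASTA sections, which this sketch ignores.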

Comparative Genomics --

CMap: Comparative map viewer - CMap is a web-based application for graphically comparing genomic maps.

It was originally written for the Gramene project under the supervision of Drs. Lincoln Stein and Doreen Ware at Cold Spring Harbor Laboratory for comparing crop grasses (rice, wheat, oat, barley, sorghum, etc.).

CMap was then generalized to handle data from organisms other than plants and was subsequently incorporated into the Generic Model Organism Database toolkit.

Sybil: Comparative genome viewer - Sybil is a web-based system for comparative genomics visualizations. It provides a front-end to comparative genome datasets warehoused in a Chado relational database.

Database (DB) Tools --

Chado: Biological database schema - Chado is a relational database schema that underlies many GMOD installations.

It is capable of representing many of the general classes of data frequently encountered in modern biology such as sequence, sequence comparisons, phenotypes, genotypes, ontologies, publications, and phylogeny.

It has been designed to handle complex representations of biological knowledge and should be considered one of the most sophisticated relational schemas currently available in molecular biology.

The price of this capability is that the new user must spend some time becoming familiar with its fundamentals.
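
To give a feel for those fundamentals, here is a minimal sketch that uses an in-memory SQLite database with a heavily simplified subset of three Chado-style tables (feature, featureloc, cvterm). The real Chado schema has many more tables, columns, and constraints and is normally deployed on PostgreSQL; the sample rows and the function name genes_on are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    name TEXT,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    fmin INTEGER,
    fmax INTEGER,
    strand INTEGER
);
""")

# Made-up rows: one chromosome and two genes located on it.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "chromosome"), (2, "gene")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                 [(1, "chr_demo", "chr_demo", 1),
                  (2, "gene_demo_1", "demoA", 2),
                  (3, "gene_demo_2", "demoB", 2)])
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 2, 1, 100, 450, 1),
                  (2, 3, 1, 900, 1300, -1)])


def genes_on(conn, src_uniquename):
    """Return (uniquename, fmin, fmax, strand) for genes on a source feature."""
    query = """
        SELECT f.uniquename, fl.fmin, fl.fmax, fl.strand
        FROM feature f
        JOIN cvterm t ON t.cvterm_id = f.type_id
        JOIN featureloc fl ON fl.feature_id = f.feature_id
        JOIN feature src ON src.feature_id = fl.srcfeature_id
        WHERE t.name = 'gene' AND src.uniquename = ?
        ORDER BY fl.fmin
    """
    return conn.execute(query, (src_uniquename,)).fetchall()


rows = genes_on(conn, "chr_demo")
print(rows)
```

The point of the sketch is the pattern: feature types are rows in a controlled-vocabulary table rather than separate tables, and a feature's location is stored relative to another feature, so even a simple question requires several joins.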

BioMart: Data mining system - BioMart is a robust, query-oriented data integration system, based on distributed data warehousing ideas.

The system can be applied to a single database or to multiple databases. It supports scalable, large-scale querying of individual databases as well as query chaining between them.

All data sources in the system comply with the BioMart data model - a simple, query-optimized database schema. The system consists of a database schema specification, administration tools for deploying and configuring mart-spec databases, and data access software, which includes web and standalone interfaces.

GMODTools: Chado to Fasta, GFF, etc. - Bulkfiles, part of GMODTools, is a GMOD Perl software package that generates Fasta, General Feature Format (GFF), DNA, Gene Ontology (GO) and other 'bulk genome' annotation files from Chado databases.

It works with several FlyBase Chado releases, with SGDLite [The SGD Lite database is now part of Yeast Functional Genomics Database (YFGdb)], and has been tested with other Chado databases.

Once tuned to your project's needs with its organism and site configurations, it can generate public data releases on a regular basis. It produces all the contents needed for a GMOD Standard URL genome data download folder.
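
As a rough picture of what generating a bulk file involves, the sketch below writes sequence records in Fasta format. It is written in Python rather than Perl and is not GMODTools code; the function name to_fasta, the 60-column line width, and the sample record are assumptions for illustration.

```python
def to_fasta(records, width=60):
    """Format (identifier, description, sequence) tuples as Fasta text."""
    lines = []
    for identifier, description, sequence in records:
        # Header line: '>' followed by the identifier and an optional description.
        lines.append(f">{identifier} {description}".rstrip())
        # Wrap the sequence into fixed-width lines.
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Made-up record, for illustration only.
records = [("gene_demo_1", "type=gene; loc=chr_demo:101..170", "ACGT" * 17 + "AC")]
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

In a real release pipeline the records would come from queries against the Chado database instead of a hard-coded list.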

Biological Pathways --

Pathway Tools: Metabolic, Regulatory - Pathway Tools (see G6G Abstract Number 20236) is a bioinformatics software system for predicting the metabolic pathways of an organism from its genome, and for creating Pathway/Genome Databases (PGDBs).

A PGDB such as EcoCyc (see G6G Abstract Number 20231) is a bioinformatics DB that integrates genomic data with detailed functional annotations of the genome, such as descriptions of metabolic and signaling pathways.

A PGDB is a type of model-organism DB. Pathway Tools supports extensive functionality including prediction, interactive editing, querying, and visualization of metabolic pathways and related data-types including reactions, metabolites, and enzymes.

It also includes query, visualization, and editing support for genes and proteins, and for transcriptional regulatory networks.

PGDBs support analysis of omics datasets by painting omics data onto a diagram of the full metabolic map of the organism, or onto a diagram of the full transcriptional regulatory network (see G6G Abstract Number 20237).

The software can also generate a metabolic map poster for the organism. Pathway Tools has its own genome browser, which has a microbial orientation, and it can be coupled with other genome browsers to add pathway support to an existing Model Organism Database (MOD).

Infrastructure --

GMODWeb: Website for Chado DBs - GMODWeb is a Web application that uses Chado, a flexible and modular schema for representing biological data.

GMODWeb is based on Turnkey (see below), a generic Web framework built on Apache, mod_perl, and SQLFairy.

GMODWeb takes a basic Turnkey site built with the Chado schema and adds to the default templates to create a custom look and feel for GMOD.

This GMODWeb skin includes code to display information using a variety of GMOD applications, including GBrowse.

New model organism databases can use the GMODWeb skin as a starting point for creating a new organism website.

Turnkey - Turnkey takes a relational schema of a given database as input and transforms it into a fully-functional and customizable website within minutes. This automated process frees developers to work on the content of a website rather than the underlying architectural details.

Modware: Middleware for Chado - Modware is an object-oriented Perl Application Programming Interface (API) for Chado.

It allows object-oriented (OO) querying and loading of a Chado database and returns data structures that a programmer can readily use without knowing the details of how the object is stored in the relational schema.

Many bioinformatics programmers are familiar with the Bio::SeqFeature object system for representing biological features in BioPerl.

Modware utilizes this framework for programmatic access and manipulation of biological features directly from Chado.
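
The general middleware idea can be sketched as follows. This is a hypothetical Python illustration, not Modware's actual Perl API: the classes Gene and GeneRepository, the helper demo_connection, and the two-table stand-in schema are all assumptions. It relies on Chado's featureloc table storing interbase coordinates, so fmin is converted to a 1-based start for the caller.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Gene:
    uniquename: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: int


class GeneRepository:
    """Hypothetical middleware class that hides the relational layout."""

    def __init__(self, conn):
        self._conn = conn

    def find(self, uniquename):
        """Return a Gene object, or None if no such feature exists."""
        row = self._conn.execute(
            "SELECT f.uniquename, fl.fmin, fl.fmax, fl.strand "
            "FROM feature f JOIN featureloc fl ON fl.feature_id = f.feature_id "
            "WHERE f.uniquename = ?",
            (uniquename,),
        ).fetchone()
        if row is None:
            return None
        name, fmin, fmax, strand = row
        # Interbase fmin (0-based boundary) becomes a 1-based start.
        return Gene(name, fmin + 1, fmax, strand)


def demo_connection():
    """Build a tiny in-memory stand-in for a Chado database (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE featureloc (feature_id INTEGER, fmin INTEGER,
                                 fmax INTEGER, strand INTEGER);
        INSERT INTO feature VALUES (1, 'gene_demo_1');
        INSERT INTO featureloc VALUES (1, 100, 450, 1);
    """)
    return conn


repo = GeneRepository(demo_connection())
gene = repo.find("gene_demo_1")
print(gene)
```

The caller works only with Gene objects and never sees the joins or the coordinate convention, which is the kind of separation an object-oriented layer over Chado is meant to provide.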

System Requirements

Web-based.

Manufacturer

GMOD community and Contributing Organizations

Support

GMOD is supported by a specific cooperative agreement from the USDA Agricultural Research Service, and by NIH grants co-funded from the National Human Genome Research Institute and the National Institute of General Medical Sciences.

Manufacturer Web Site GMOD

Price Free except where noted.

G6G Abstract Number 20310

G6G Manufacturer Number 101133