BioMart Central Portal

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract BioMart Central Portal is a query-oriented data management system. BioMart offers a one-stop shop solution to access a wide array of biological databases.

These include major biomolecular sequence, pathway and annotation databases such as Ensembl, Universal Protein Resource (UniProt), Reactome (see G6G Abstract Number 20267), HUGO Gene Nomenclature Committee (HGNC), WormBase and PRoteomics IDEntifications database (PRIDE); for a complete list, visit the manufacturer's web site.

The purpose of BioMart is to convert one or more data sources (flat files or relational) into 'data marts', which can be accessed via its standardized web browser interface and also via its Perl, Java and web-service APIs.

The system comes with built-in support for query-optimization and database federation. BioMart software provides users with the ability to conduct fast, advanced queries using a selection of web, graphical and text based applications.

Programmatic execution of queries is also available via a web-services Application Programming Interface (API), or direct-access software libraries written in Perl and Java. For data providers, the system simplifies the task of integrating their own data with other datasets hosted on the network.

BioMart System Overview –

BioMart is designed around a three (3) tier architecture. The first tier consists of one or more relational databases. Each of these databases can hold one or more marts, which are schemas compliant with BioMart definitions. Each mart can contain a number of individual datasets.

Dataset configuration is stored in additional tables inside each mart and is created using the MartEditor.

Two (2) tools are provided to build and configure the mart databases in the first tier:

1) MartBuilder, to construct SQL statements that will transform your schema into a mart.

2) MartEditor, to configure the finished mart for use with the rest of the system.

The second tier contains two (2) APIs - one written in Perl (distributed in the Biomart-perl package) and the other in Java (distributed in the martj package).

The third tier consists of the query interfaces:

1) MartView - a web browser interface, based on the Perl API.

2) MartService - a web services interface, based on the Perl API.

3) MartURLAccess - URL-based access to MartView, based on the Perl API.

4) MartExplorer - a standalone GUI tool, based on the Java API.

5) MartShell - a command-line tool, also based on the Java API.

Dataset configuration is stored in an XML format in a special table inside the database schema that the mart lives in.

A registry XML file on the client-side, managed by the user, dictates which datasets in which marts on which database servers are available for querying.
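As a rough illustration of how a client might read such a registry file, the sketch below parses a small registry document and lists the marts it makes available. The element names (MartRegistry, MartURLLocation, MartDBLocation) and attribute names are modeled on the BioMart 0.7 registry layout and should be treated as assumptions to check against the BioMart documentation; all hosts, names and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry content: the layout is an assumption modeled on the
# BioMart 0.7 registry format, and every value below is made up.
REGISTRY_XML = """\
<MartRegistry>
  <MartURLLocation name="example_mart" displayName="Example Mart"
                   host="mart.example.org" port="80"
                   path="/biomart/martservice" visible="1"/>
  <MartDBLocation name="local_mart" displayName="Local Mart"
                  host="db.example.org" port="3306" databaseType="mysql"
                  database="local_mart_1" schema="local_mart_1"
                  user="reader" password="" visible="1"/>
</MartRegistry>
"""


def list_marts(registry_xml):
    """Return one summary dict per mart location declared in the registry."""
    root = ET.fromstring(registry_xml)
    marts = []
    for location in root:
        # A URL location is reached through a remote web service;
        # a DB location is reached by connecting to the database directly.
        if location.tag == "MartURLLocation":
            access = "web service"
        else:
            access = "database"
        marts.append({
            "name": location.get("name"),
            "access": access,
            "host": location.get("host"),
            "port": int(location.get("port")),
        })
    return marts


MARTS = list_marts(REGISTRY_XML)
for mart in MARTS:
    print(mart["name"], mart["access"], mart["host"], mart["port"])
```

The two location types in this sketch mirror the two ways a mart can be reached: through a remote BioMart web server or through direct access to the database server.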

BioMart version 0.7 supports three (3) major relational database platforms for hosting marts: MySQL, Oracle and PostgreSQL.

BioMart Queries –

BioMart queries fall into two (2) types: Metadata access and Data access. A machine-readable, XML-based description of the inputs and outputs of these queries is published in Web Services Description Language (WSDL) and XML Schema Definition (XSD) files.

Metadata Access –

These requests retrieve information about which databases, datasets, filters, attributes and associated formatters are made available by the BioMart Central Portal. These queries not only support programmatic access but also return additional information that may be used to write domain-specific, specialized clients that access the BioMart Central Portal remotely.

These requests are described as follows:

getRegistry - This request retrieves information such as name, location, host, port, etc. about all the databases/marts available at the BioMart Central Portal. The output is equivalent to the list displayed by MartView.

getDatasets - This request retrieves a list of datasets available under each mart, mart name being the input of the request.

getFilters and getAttributes - These two requests retrieve a list of all the filters and attributes available given a dataset. Additional information about hierarchy, limitations and output formatters is also returned.

Most importantly, the W3C suggested property ‘modelReference’ in the output, if configured by the data publisher, provides the Uniform Resource Identifier (URI) of the concept in an ontology that contains the description of the output attribute(s).

This feature offers a framework for 'semantic annotation' of terms in BioMart databases and will improve interoperability of BioMart results with non-BioMart data sources and analysis tools.
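The metadata requests described above can be sketched as plain HTTP URLs. The example below only builds the URLs and does not contact any server. The base URL is a placeholder, and the `type=` parameter values (registry, datasets, filters, attributes) are an assumption based on the REST-style MartService interface, where they would correspond to getRegistry, getDatasets, getFilters and getAttributes; the mart and dataset names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the real MartService URL.
BASE_URL = "http://mart.example.org/biomart/martservice"


def metadata_url(request_type, **params):
    """Build the URL for one metadata request.

    request_type is assumed to be one of: registry, datasets,
    filters, attributes.
    """
    query = {"type": request_type}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Which marts are available.
REGISTRY_URL = metadata_url("registry")
# Which datasets a given mart holds (mart name is the input).
DATASETS_URL = metadata_url("datasets", mart="example_mart")
# Which filters and attributes a given dataset offers.
FILTERS_URL = metadata_url("filters", dataset="example_dataset")
ATTRIBUTES_URL = metadata_url("attributes", dataset="example_dataset")

for url in (REGISTRY_URL, DATASETS_URL, FILTERS_URL, ATTRIBUTES_URL):
    print(url)
```

A client would typically issue these in order, using the mart names from the first response as input to the second, and the dataset names from the second as input to the last two.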

Data Access –

To access the biological content of the marts available through the BioMart web server, a query request is used. A user specifies the attributes of interest, along with any restrictions (filters), from one or more datasets and receives the matching results in return.

Users are expected to know neither the database-specific access protocol nor the physical location of the database. From a user's point of view, all datasets appear to reside at the BioMart Central Portal, which takes care of all underlying federation logic.
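A data access request of this kind can be pictured as a small XML document naming a dataset, its filters and its attributes. The sketch below builds one with the Python standard library. The element and attribute names (Query, Dataset, Filter, Attribute, virtualSchemaName, formatter and so on) follow the commonly used BioMart query layout but are assumptions here, and the dataset, filter and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters=None):
    """Return an XML query string for one dataset.

    attributes: list of attribute names to retrieve.
    filters: optional dict mapping filter name to filter value.
    """
    # Query-level settings are assumptions modeled on the usual query layout.
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns are returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Hypothetical dataset, attribute and filter names.
QUERY_XML = build_query(
    "example_dataset",
    ["gene_id", "gene_name"],
    {"chromosome_name": "1"},
)
print(QUERY_XML)
```

Note that the request names only the dataset; nothing in it refers to a database host or access protocol, which matches the point that the portal hides those details from the user.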

Query processing –

The BioMart server-side software consists of a QueryPlanner and an Aggregator.

The QueryPlanner consumes data access queries and formulates an execution plan. If the BioMart Central Portal has direct access credentials to the database server, SQL statements are compiled; otherwise, XML-based web service requests are sent to the remote BioMart web server over an HTTP stream, and results are retrieved over the same connection.
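The choice the QueryPlanner makes between the two access paths can be sketched as a simple rule. This is not BioMart's actual implementation; the `MartSource` class, its fields and the `plan_access` function are hypothetical names used only to illustrate the decision described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MartSource:
    """Hypothetical description of where a mart can be reached."""
    name: str
    db_credentials: Optional[dict] = None  # present if direct DB access exists
    service_url: Optional[str] = None      # remote BioMart web server, if any


def plan_access(source):
    """Pick the access path for one mart.

    Direct database credentials take priority (SQL is compiled);
    otherwise the query is forwarded as an XML web service request.
    """
    if source.db_credentials is not None:
        return "sql"
    if source.service_url is not None:
        return "web_service"
    raise ValueError("no access path configured for mart " + source.name)


LOCAL = MartSource("local_mart", db_credentials={"user": "reader"})
REMOTE = MartSource("remote_mart",
                    service_url="http://mart.example.org/biomart/martservice")

print(plan_access(LOCAL))
print(plan_access(REMOTE))
```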

The Aggregator component enables merging of data coming from different sources on a common concept. This is achieved by extending the abstractions, Attributes and Filters, to Exportables and Importables.

A dataset that exposes an attribute as an exportable can integrate data from any source in which a filter with a similar name is tagged as an importable.

Exportables and importables are database table columns with similar contents. Results are aggregated in memory, an operation that is not very costly.
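One way to picture the in-memory aggregation is as a hash join on the shared column. The sketch below is an illustration under that assumption, not the Aggregator's actual code; the function name, column names and sample rows are all hypothetical.

```python
from collections import defaultdict


def aggregate(exporting_rows, importing_rows, exportable, importable):
    """Merge rows from two sources where exportable == importable.

    exporting_rows: rows (dicts) from the dataset exposing the exportable.
    importing_rows: rows (dicts) from the dataset exposing the importable.
    """
    # Index the importing side once so each lookup is constant time.
    index = defaultdict(list)
    for row in importing_rows:
        index[row[importable]].append(row)

    merged = []
    for row in exporting_rows:
        for match in index.get(row[exportable], []):
            combined = dict(match)
            combined.update(row)
            merged.append(combined)
    return merged


# Hypothetical rows from two different marts sharing a gene identifier.
GENES = [
    {"gene_id": "G1", "symbol": "ABC"},
    {"gene_id": "G2", "symbol": "XYZ"},
]
PATHWAYS = [
    {"gene": "G1", "pathway": "P-alpha"},
    {"gene": "G1", "pathway": "P-beta"},
]

MERGED = aggregate(GENES, PATHWAYS, exportable="gene_id", importable="gene")
for row in MERGED:
    print(row["gene_id"], row["symbol"], row["pathway"])
```

Because the index is built in a single pass and each lookup is a dictionary access, the cost grows roughly linearly with the number of rows, which is consistent with the aggregation being inexpensive.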

Registry –

The BioMart Central Portal does not store any data locally except meta information about all the datasets. The server maintains a Registry containing references to remote BioMart web servers.

To add a new mart to this registry, the manufacturer only requires the URL of the BioMart server hosting the databases or read access to the database server.

This information is added to the registry file of the web server and following a configuration rerun, the whole bioinformatics community can benefit from the data through the BioMart Central Portal as well as several third party software products.

The web server stays in sync with any data updates carried out on the various databases. Metadata updates, however, become available only after the web server is reconfigured, shortly after the stable release of those updates.

Third party software available via the BioMart Plugin –

Bioclipse; biomaRt-BioConductor; Cytoscape (see G6G Abstract Number 20092); Galaxy; Taverna (see G6G Abstract Number 20514); WebLab (see G6G Abstract Number 20518); and Ruby API.

System Requirements

Contact manufacturer.

Manufacturer

BioMart is developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI).

Manufacturer Web Site BioMart Central Portal

Price Contact manufacturer.

G6G Abstract Number 20517

G6G Manufacturer Number 104133