GeneBrowser

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualization and analysis methods to help identify the most relevant and common biological characteristics.

GeneBrowser provides a single entry point to several visualization and analysis methods, allowing fast and easy analysis of a set of genes.

The functionalities provided include the following:

1) A central point with the most relevant biological information for each inserted gene;

2) A list of the most closely related papers in PubMed and gene expression studies in ArrayExpress; and

3) An extended approach to functional analysis applied to Gene Ontology (GO), homologies, gene chromosomal localization, and pathways.

GeneBrowser Implementation --

GeneBrowser is a web application that combines data from several biological data sources with visualization methods to explore a list of genes.

As input, this tool takes a list of gene identifiers that are subsequently used to retrieve information from several public data sources, such as UniProt, Entrez Gene, Gene Ontology (GO), KEGG and PubMed.

These data are then processed and merged, allowing the user to further explore the results via several visualization perspectives and methods.

Moreover, the system provides direct links to the original repositories, where complementary information is available.

The main requirements of GeneBrowser development were to fill the gap between functional analysis tools and Web portals and to allow a fast response to user requests by means of state-of-the-art Web technologies.

Although GeneBrowser can be used to answer many different biological questions, a particular question set was used to tune its development:

1) What public databases provide relevant information about my dataset and how can I navigate through them?

2) What biological processes are enriched with respect to my input list of genes?

3) What are the most relevant metabolic pathways that contain my genes?

4) What are the genomic regions of these genes?

5) Which are the most relevant homologue classes in my list of genes?

6) What gene expression experiments have been previously conducted with the same genes?

7) What are the most relevant publications associated with my study?

GeneBrowser Integrated access to biological data --

The functionalities provided by GeneBrowser require intensive access to several biological databases.

For each set of genes, GeneBrowser must independently access an array of databases as a means to validate every single entry and obtain additional biological data to provide as much relevant information as possible.

By its nature, this procedure has a response time that is directly proportional to the number of genes evaluated.

Nevertheless, the platform must respond quickly if it is to be of any practical use. To that end, the manufacturers have developed GeNS, a database that works as a name server for biological entities.

GeNS has a generic database schema that supports an unlimited number of biological databases.

Addition of a new database requires the identification of the most suitable method to obtain data and the development of a specific loader responsible for converting the data to a format compatible with the schema.
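
The loader idea described above can be illustrated with a minimal sketch. It is not the GeNS schema or code: the class name NameServer, the to_pairs callback, and the sample identifiers are all hypothetical, and an in-memory dictionary stands in for the database. It only shows how a source-specific loader can reduce records to generic (gene product, identifier) pairs that a name server can then resolve.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# A loader callback turns one source-specific record into
# (gene_product_id, identifier) pairs. Hypothetical interface.
ToPairs = Callable[[dict], Iterable[Tuple[str, str]]]


class NameServer:
    """Toy name server: maps (source, identifier) to gene product ids."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Set[str]] = {}

    def load(self, source: str, records: Iterable[dict], to_pairs: ToPairs) -> None:
        """Run a source-specific loader and store its output generically."""
        for record in records:
            for gene_product, identifier in to_pairs(record):
                self._index.setdefault((source, identifier), set()).add(gene_product)

    def resolve(self, source: str, identifier: str) -> Set[str]:
        """Return the gene products known under this identifier (may be empty)."""
        return set(self._index.get((source, identifier), set()))


# Made-up records from an imaginary source; only the loader knows their layout.
records = [{"gene": "GP1", "ids": ["A1", "A2"]}, {"gene": "GP2", "ids": ["A2"]}]
server = NameServer()
server.load("SourceA", records, lambda r: [(r["gene"], i) for i in r["ids"]])
```

In this sketch, supporting another database means writing one more to_pairs function; the storage and lookup code stay unchanged.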

Currently (as of July 2010), the manufacturers have integrated data for roughly 1,000 species, representing over 7 million gene products with 70 million alternative gene/protein identifiers and 140 million associations to biological entities.

For instance, the species Saccharomyces cerevisiae has 7,421 gene products that can be mapped to 105,000 synonyms and 213,000 associations with biological entities, such as pathways, Gene Ontology (GO) terms or homologues.

Despite the variety of data stored in GeNS, it is more focused on the mapping of biological identifiers than on the actual data (e.g., functional descriptions, structural data, and sequence data).

To compensate for this gap, GeneBrowser performs direct, run-time access to a selected set of data sources.

Some examples include the following:

a) Extended protein details, obtained in XML format from the UniProt REST interface;

b) Bibliographical abstracts, obtained from PubMed; and

c) Other data necessary for construction of the Gene Explorer perspective such as the sequence obtained from GenBank and the protein structure obtained from the Protein Data Bank (PDB).

GeneBrowser Background processing --

As previously mentioned, one of the requirements in developing GeneBrowser was the need to offer a low response time to user requests. While the use of GeNS was a major step towards meeting this goal, some tools possess relatively heavy processing needs that require fine-tuning.

This is the case for computing the GO directed acyclic graph (DAG) and the bibliographical list.

Because two (2) of the seven (7) functionalities provided are computationally intensive and therefore cannot be made available immediately after the dataset is submitted, their processing is executed in the background and the results are made available as soon as it completes.

After insertion of a new dataset, GeneBrowser launches a background process that pre-computes the p-value for each entry and stores it in the database.

While all the other tools are made available immediately, Ontology and bibliographical options may trigger a message informing the user that the values are still being processed.

For registered users, future access to the dataset will not require reprocessing because the results are permanently stored.
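
The background processing pattern described in this section can be sketched as follows. This is an illustration under stated assumptions, not GeneBrowser's implementation: the class name BackgroundAnalysis is hypothetical, a thread pool stands in for the server-side background process, and an in-memory dictionary stands in for the database that stores results.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class BackgroundAnalysis:
    """Runs a slow computation once per dataset and keeps the result."""

    def __init__(self, compute: Callable[[List[str]], object]) -> None:
        self._compute = compute
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: Dict[str, Future] = {}  # stand-in for persistent storage

    def submit(self, dataset_id: str, genes: List[str]) -> None:
        """Start processing in the background; resubmitting reuses the stored job."""
        if dataset_id not in self._jobs:
            self._jobs[dataset_id] = self._pool.submit(self._compute, list(genes))

    def status(self, dataset_id: str) -> str:
        """Report whether results can be shown or are still being processed."""
        job = self._jobs.get(dataset_id)
        if job is None:
            return "unknown"
        return "ready" if job.done() else "processing"

    def result(self, dataset_id: str, timeout: float = None):
        """Block until the stored result is available, then return it."""
        return self._jobs[dataset_id].result(timeout=timeout)
```

A caller would submit the dataset once, serve the fast tools immediately, and show a "still processing" message for the slow ones while status() returns "processing".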

GeneBrowser Extended approach to functional gene clustering --

Gene Ontology (GO) is the most relevant biological ontology, containing structured information about biological processes, cellular components and molecular functions.

It is commonly used by establishing a match between genes in the dataset and terms in the ontology. The terms that accumulate the highest number of genes are the ones of greatest potential interest to the study.

To be valid, this gene accumulation procedure requires the use of statistical measures that consider the number of expected genes in each category and the occurrence of several simultaneous tests.

Despite the major relevance of Gene Ontology (GO), other terminologies can be used to extract commonalities from a dataset. Herein, the manufacturers have extended the use of this approach to pathways, protein domains, orthologues and homologues.

The implemented procedure works as follows.

For each term t of a specific terminology, the manufacturers obtain the associated genes from list L1 (representing the genes of interest) and the genes from list L2 (containing all genes under study - by default, all the genes from the genome).

Then, for each term, the manufacturers use the number of associated genes to calculate the p-value.

Although several methods are available to calculate the p-value, GeneBrowser uses the binomial distribution, mainly due to its good balance between performance and robustness.
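
The counting and p-value steps described so far can be sketched as below. The source does not state the exact parameterization, so this sketch assumes a one-sided upper-tail binomial test in which the success probability is the fraction of L2 genes annotated with the term, and it assumes L1 is a subset of a non-empty L2. The function names are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, Set


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def term_pvalues(
    annotations: Dict[str, Set[str]], l1: Iterable[str], l2: Iterable[str]
) -> Dict[str, float]:
    """Compute one p-value per term.

    annotations maps each term t to the set of genes associated with it;
    l1 holds the genes of interest and l2 all genes under study.
    """
    l1, l2 = set(l1), set(l2)
    pvalues = {}
    for term, genes in annotations.items():
        k = len(genes & l1)   # genes of interest carrying the term
        big_k = len(genes & l2)  # genes under study carrying the term
        if k == 0:
            continue  # term not represented in L1, nothing to test
        pvalues[term] = binomial_upper_tail(k, len(l1), big_k / len(l2))
    return pvalues


result = term_pvalues(
    {"t1": {"g1", "g2"}, "t2": {"g9"}},
    l1=["g1", "g2"],
    l2=["g1", "g2", "g3", "g4"],
)
```

The direct summation is adequate for short lists only; with thousands of genes the binomial coefficients become too large to multiply by floats, and a library routine such as scipy.stats.binom.sf would be the safer choice.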

Because the p-values for all categories are calculated separately, the final step consists of adjusting the p-values to account for multiple testing. GeneBrowser uses the false discovery rate (FDR) correction proposed by Benjamini and Yekutieli.
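
As an illustration of this adjustment, the sketch below applies the standard Benjamini-Yekutieli step-up formula: with m p-values sorted in ascending order, the adjusted value at rank i is the minimum over ranks j >= i of p(j) * m * c(m) / j, where c(m) = 1 + 1/2 + ... + 1/m, capped at 1. The function name is hypothetical, and this is a textbook rendering of the procedure rather than GeneBrowser's own code.

```python
from typing import List, Sequence


def benjamini_yekutieli(pvalues: Sequence[float]) -> List[float]:
    """Return BY-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor that makes the procedure valid
    # under arbitrary dependence between tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
```

Terms whose adjusted value falls below the chosen FDR threshold would then be reported as enriched.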

GeneBrowser Future developments --

Possible future developments include the addition of regulatory information, miRNAs, and phenotype associations.

Another feature that the manufacturers aim to explore is a unified view that merges the different analysis outputs into a single one, providing a rich summary of the main evidence found by the several methods.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site GeneBrowser

Price Contact manufacturer.

G6G Abstract Number 20792

G6G Manufacturer Number 104320