G6G Directory of Omics and Intelligent Software - Max-Planck-Institut Informatik EpiGRAPH

EpiGRAPH

Category Genomics>Genetic Data Analysis/Tools

Abstract EpiGRAPH is user-friendly software that can be used for advanced (epi-)genome analysis and prediction.

It was developed to help biomedical researchers make sense of large-scale datasets, which are now routinely generated with technologies such as ChIP-on-chip, tiling microarrays and resequencing.

EpiGRAPH is both simple and advanced. For occasional users, the EpiGRAPH website provides a default analysis workflow that is applicable to most datasets.

To find out more about any dataset of genomic regions, EpiGRAPH performs statistical analyses and applies advanced machine learning algorithms, drawing on a large database of genome and epigenome information.

For advanced users, EpiGRAPH allows full access to its standardized XML-based analysis and documentation system.

EpiGRAPH addresses two tasks that are common in genome biology: (1) discovering novel associations between a set of genomic regions with a specific biological role (for example, experimentally mapped enhancers, hotspots of epigenetic regulation or sites exhibiting disease-specific alterations) and the bulk of genome annotation data that are available from public databases; and (2) assessing whether additional genomic regions with a similar role can be identified by prediction, without the need for further wet-lab experiments.

EpiGRAPH is designed to facilitate complex bioinformatic analyses of genome and epigenome datasets. Such datasets frequently consist of sets of genomic regions that share certain properties, for example, being bound by a specific transcription factor or exhibiting characteristic patterns of evolutionary conservation.

Typically, these genomic regions fall into opposing classes, for example, transcription factor bound versus unbound promoter regions or significantly conserved versus non-conserved regulatory elements.

Even when this convenient situation does not emerge by default, it is straightforward and common practice to establish it artificially, by generating a randomized set of control regions to complement a given set of genomic regions.

EpiGRAPH thus focuses on the analysis of sets of genomic regions that fall into two classes, which we denote as 'positives' (cases) and 'negatives' (controls).
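The idea of complementing a set of positives with randomized controls can be illustrated with a minimal Python sketch. It is not EpiGRAPH's own procedure: the function names, the toy chromosome sizes and the matching rule (same chromosome, same length, no overlap with any positive) are assumptions made for illustration only.

```python
import random

def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base (half-open coordinates)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def random_controls(positives, chrom_sizes, seed=0, max_tries=1000):
    """Draw one control per positive, matched on chromosome and length.

    Hypothetical helper: real analyses often also match on properties
    such as GC content or distance to genes, which this sketch ignores.
    """
    rng = random.Random(seed)
    controls = []
    for chrom, start, end in positives:
        length = end - start
        for _ in range(max_tries):
            # Choose a start so that the control fits on the chromosome.
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            candidate = (chrom, new_start, new_start + length)
            # Reject candidates that overlap any positive region.
            if not any(overlaps(candidate, p) for p in positives):
                controls.append(candidate)
                break
        else:
            raise RuntimeError(f"no non-overlapping control found on {chrom}")
    return controls

# Toy input: region coordinates and chromosome sizes are made up.
positives = [("chr1", 1000, 1500), ("chr2", 200, 1200)]
chrom_sizes = {"chr1": 100_000, "chr2": 50_000}
controls = random_controls(positives, chrom_sizes)
```

The resulting positives and controls form the two-class input that the rest of this description assumes.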

Key features of EpiGRAPH --

EpiGRAPH analyses are performed with established statistical methods and advanced machine learning algorithms.

Two scenarios are treated separately: the class analysis (which compares different classes of genomic regions) and the use of EpiGRAPH as an advanced genome calculator and data retrieval tool.

1) Class analysis - The class analysis is tailored to the analysis of sets of regions that belong to different classes, such as methylated vs. unmethylated promoters, as experimentally determined for a particular cell line.

EpiGRAPH implements statistical tests that identify attributes exhibiting significant differences between the classes, as well as more sophisticated ‘machine learning’ methods that make it possible to assess the global relationship between the classes and entire groups of logically related attributes.

In addition, classification algorithms such as ‘support vector machines’ (SVMs), logistic regression and ‘ensemble learning’ methods can be used to predict the class value for regions that have not been analyzed experimentally.

2) Genome attribute calculator - EpiGRAPH utilizes a highly customizable genome attribute calculator in order to access a large database of genome and epigenome attributes.

Users who prefer to perform statistical analysis outside EpiGRAPH can use this component directly. In contrast to BioMart (an additional product), EpiGRAPH’s genome calculator not only enables data acquisition but also supports complex attributes as well as frequency and overlap calculations.
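To make the notions of overlap and frequency calculations concrete, here is a small Python sketch of the kind of per-region attribute such a calculator might compute. The function names and the toy data are hypothetical and do not reflect EpiGRAPH's actual interface or attribute definitions.

```python
def overlap_bp(region, annotations):
    """Base pairs of `region` covered by annotation intervals.

    Regions are (chrom, start, end) tuples with half-open coordinates.
    Overlapping annotations are merged so no base is counted twice.
    """
    chrom, start, end = region
    # Keep annotations that intersect the region and clip them to it.
    clipped = sorted(
        (max(start, a_start), min(end, a_end))
        for a_chrom, a_start, a_end in annotations
        if a_chrom == chrom and a_start < end and start < a_end
    )
    covered, cur_end = 0, start
    for s, e in clipped:
        s = max(s, cur_end)  # skip the part already counted
        if e > s:
            covered += e - s
            cur_end = e
    return covered

def pattern_frequency(sequence, pattern):
    """Occurrences of `pattern` per base of `sequence`, counting overlapping matches."""
    sequence, pattern = sequence.upper(), pattern.upper()
    if len(sequence) < len(pattern):
        return 0.0
    hits = sum(
        1
        for i in range(len(sequence) - len(pattern) + 1)
        if sequence.startswith(pattern, i)
    )
    return hits / len(sequence)

# Toy example: one region, a few made-up annotation intervals, a short sequence.
region = ("chr1", 100, 200)
annotations = [("chr1", 50, 120), ("chr1", 110, 150), ("chr1", 190, 300), ("chr2", 100, 200)]
covered = overlap_bp(region, annotations)
cpg_freq = pattern_frequency("ACGCGT", "CG")
```

Attributes like `covered` and `cpg_freq`, computed for every region in both classes, are the kind of values a user could export for statistical analysis elsewhere.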

EpiGRAPH provides four analytical modules --

1) The statistical analysis module identifies attributes that differ significantly between the sets of positives and negatives, based on an attribute database comprising a broad range of genome and epigenome datasets.

2) The diagram generation module draws box plots that visualize the distribution of a selected attribute among the sets of positives versus negatives.

3) The machine learning analysis module evaluates how well prediction algorithms - such as support vector machines - can discriminate between positives and negatives in the input dataset, based on different combinations of (epi)genomic attributes from the database.

4) The prediction analysis module predicts whether a genomic region that is not contained in the input dataset belongs to the set of positives or negatives, thus exploiting any correlations detected by the machine learning analysis module for the prediction of new data.
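The workflow behind modules 1, 3 and 4 can be sketched in plain Python. This is an illustration under stated assumptions, not EpiGRAPH's implementation: a permutation test stands in for the statistical tests, and a nearest-centroid classifier with leave-one-out cross-validation stands in for the support vector machines and other learners named above. The attribute values and all function names are invented for the example.

```python
import random
from statistics import mean

def permutation_p_value(pos_values, neg_values, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of class means for one attribute."""
    rng = random.Random(seed)
    observed = abs(mean(pos_values) - mean(neg_values))
    pooled = list(pos_values) + list(neg_values)
    n_pos = len(pos_values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of positives and negatives
        diff = abs(mean(pooled[:n_pos]) - mean(pooled[n_pos:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def centroid(rows):
    """Per-attribute mean of a list of attribute vectors."""
    return [mean(col) for col in zip(*rows)]

def predict(row, pos_centroid, neg_centroid):
    """Return 1 (positive) if `row` is closer to the positive centroid, else 0."""
    d_pos = sum((x - c) ** 2 for x, c in zip(row, pos_centroid))
    d_neg = sum((x - c) ** 2 for x, c in zip(row, neg_centroid))
    return 1 if d_pos < d_neg else 0

def loo_accuracy(rows, labels):
    """Leave-one-out cross-validation accuracy of the nearest-centroid classifier."""
    correct = 0
    for i in range(len(rows)):
        train = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j != i]
        pos_c = centroid([r for r, y in train if y == 1])
        neg_c = centroid([r for r, y in train if y == 0])
        if predict(rows[i], pos_c, neg_c) == labels[i]:
            correct += 1
    return correct / len(rows)

# Toy attribute table: [CpG frequency, conservation score] per region (made-up values).
rows = [[0.10, 0.90], [0.12, 0.80], [0.11, 0.85], [0.09, 0.95],
        [0.02, 0.20], [0.03, 0.30], [0.01, 0.25], [0.04, 0.15]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Module 1: does the first attribute differ between the classes?
p_value = permutation_p_value([r[0] for r, y in zip(rows, labels) if y == 1],
                              [r[0] for r, y in zip(rows, labels) if y == 0])

# Module 3: how well can the classes be told apart on held-out regions?
accuracy = loo_accuracy(rows, labels)

# Module 4: classify a region that was not part of the input dataset.
pos_c = centroid([r for r, y in zip(rows, labels) if y == 1])
neg_c = centroid([r for r, y in zip(rows, labels) if y == 0])
predicted = predict([0.105, 0.88], pos_c, neg_c)
```

Evaluating on held-out regions before predicting new ones mirrors the division of labour between the machine learning analysis module and the prediction analysis module: prediction is only worthwhile if the cross-validated performance is convincing.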

Things You Can Do with EpiGRAPH --

Whenever you have a set of genomic regions, EpiGRAPH can help you to find out more about these regions and predict other regions of the same type in the genome.

The following are ideas for using EpiGRAPH:

1) Epigenomics: Is it possible to predict and explain which genomic regions are subject to 'tissue-specific methylation', based on DNA sequence and structure?

This analysis requires large-scale DNA methylation data for multiple tissues.

2) Retro-virology: Which genomic regions are preferential targets of integration for retroviruses and transposable elements? What role does the local chromatin structure play?

This analysis requires sequence data for a few hundred retroviral integration sites.

3) Developmental Epigenetics: What are the characteristics of 'Polycomb Response Elements' in mammals as compared to Drosophila? Is it possible to predict their location genome-wide?

This analysis requires ChIP-on-chip data for Polycomb repression complex proteins in several mammals.

4) Cancer Genomics: To what degree do factors like gene richness, local recombination rates and chromatin structure influence cancer-specific micro-deletions and other structural variations?

And what role does tumor evolution play (i.e. do we see different determinants of cancer-specific micro-deletions in early-stage vs. late-stage tumors)?

This analysis requires large-scale resequencing data for a number of tumors.

EpiGRAPH has already been applied in a number of projects. Several other projects that involve EpiGRAPH analyses are currently in progress and will be added to the manufacturer’s website in due time.

The manufacturer believes that, in the future, EpiGRAPH may converge with other web services into a loosely coupled network of (epi-)genome analysis and data mining tools.

Such a network would accept standardized XML-based analysis requests centrally and process them in a decentralized manner, with each web service contributing a specific analysis or access to a particular database.

A descriptive term for this vision could be 'Statistical Genome Browser Network', and the standardized and extensible XML data format specified for EpiGRAPH may provide an appropriate internal language for data exchange within this network.

System Requirements

Contact manufacturer.

Manufacturer

Max-Planck-Institut für Informatik
Computational Biology and Applied Algorithmics
Campus E1 4
66123 Saarbrücken, Germany

Manufacturer Web Site EpiGRAPH

Price Contact manufacturer.

G6G Abstract Number 20409

G6G Manufacturer Number 104039
