GraphWeb

Category Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools

Abstract GraphWeb is a public web server for ‘biological network analysis’ and module discovery.

GraphWeb provides methods to:

1) Integrate heterogeneous and multi-species data for constructing directed and undirected, weighted and unweighted networks;

2) Discover ‘network modules’ using a variety of algorithms and topological filters; and

3) Interpret modules using ‘functional knowledge’ of Gene Ontology (GO) and pathways, as well as regulatory features such as ‘binding motifs’ and microRNA targets.

GraphWeb is designed to analyze individual or multiple ‘merged networks’, search for conserved features across ‘multiple species’, mine large ‘biological networks’ for smaller modules, discover novel candidates and connections for known pathways, and compare results of high-throughput datasets.

Networks in GraphWeb -- The primary input of GraphWeb is a combined biological network of a selected species, consisting of genes, proteins or microarray probe-sets as nodes and corresponding associations as edges. The user may upload the input data as a file or type it into the web form.

Genes, proteins and microarray probe-sets of various databases and platforms are automatically mapped to gene IDs of the Ensembl database using the g:Profiler software (see G6G Abstract Number 20555).

Unrecognized and ambiguous IDs may be optionally removed, but remain unchanged by default in order to keep the input networks intact. Associations between nodes may be represented as directed or undirected edges, and weights may be assigned to edges to convey ‘quantitative relations’ between corresponding nodes.

A collection of pre-defined datasets is available for immediate analysis. These include protein-protein interaction (PPI) data from IntAct (an open source database and software suite for modeling, storing and analyzing molecular interaction data) and the Human Protein Reference Database (HPRD), as well as the S. cerevisiae ‘transcription regulatory network’ by MacIsaac et al. (An improved map of conserved regulatory sites for ‘Saccharomyces cerevisiae’).

GraphWeb Data integration -- GraphWeb allows the user to insert and combine different data sources and align these into a global network. Besides its native plain text format, GraphWeb supports the import of other network files such as SIF, GML, XGMML and BioPAX through the ‘Cytoscape BiNoM’ plug-in.

Labels can be used to distinguish associations of different sources, and a ‘network score’ may be assigned to each label to denote the predictive power of corresponding associations. For example, ‘Transcription Factor’ (TF)-binding networks from ChIP-chip experiments may be combined and aligned with ‘motif discovery’ results, and scored with predictive values learned from ‘gene expression’ data.
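The idea of labelled sources with per-label scores can be sketched in a few lines of Python. This is only an illustration, not GraphWeb's implementation: the function name `merge_networks`, the use of undirected edges, and the rule that an edge weight is the sum of the scores of its supporting labels are all assumptions made for the example.

```python
from collections import defaultdict


def merge_networks(datasets, label_scores):
    """Combine labelled edge lists into one undirected network (illustrative).

    datasets:     dict mapping a label to an iterable of (node_a, node_b) pairs
    label_scores: dict mapping a label to a numeric score (assumed weighting)
    Returns a dict mapping frozenset({a, b}) to {"labels": set, "weight": float}.
    """
    merged = defaultdict(lambda: {"labels": set(), "weight": 0.0})
    for label, edges in datasets.items():
        for a, b in edges:
            entry = merged[frozenset((a, b))]
            # Count each label once per edge, even if the source repeats it.
            if label not in entry["labels"]:
                entry["labels"].add(label)
                entry["weight"] += label_scores.get(label, 1.0)
    return dict(merged)
```

An edge reported by both a ChIP-chip dataset and a motif dataset would then carry both labels and the sum of the two scores, which is one simple way to let later filters favor well-supported associations.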

GraphWeb Multi-species networks -- GraphWeb provides means to incorporate data from different organisms in order to improve network construction. When the user selects a target organism in the GraphWeb interface, the nodes and corresponding associations of the input are automatically mapped to ‘orthologous genes’ in the target. The orthology mapping information is retrieved from Ensembl via the g:Profiler software.

Resulting ortholog networks can be combined with other datasets of the target organism to highlight conserved associations. As in single-species data integration, GraphWeb ignores ‘ambiguous orthologs’ in network alignments to avoid noise and misleading results. This approach retains the cleanest possible network but inevitably results in some loss of information.
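A minimal sketch of this kind of ortholog projection follows. It assumes that "ambiguous" means a source gene with more than one ortholog in the target, and that the mapping is already available as a plain dictionary; the function name `map_to_orthologs` and the data layout are hypothetical and do not reflect how GraphWeb or g:Profiler store orthology data.

```python
def map_to_orthologs(edges, orthologs):
    """Project edges onto a target species, dropping ambiguous mappings.

    edges:     iterable of (gene_a, gene_b) pairs in the source species
    orthologs: dict mapping a source gene to a list of target-species genes
    An edge is kept only if both endpoints have exactly one ortholog.
    """
    mapped = []
    for a, b in edges:
        targets_a = orthologs.get(a, [])
        targets_b = orthologs.get(b, [])
        if len(targets_a) == 1 and len(targets_b) == 1:
            mapped.append((targets_a[0], targets_b[0]))
    return mapped
```

Edges touching a gene with several orthologs, or with none, are discarded here, which illustrates the trade-off between a clean network and lost information.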

GraphWeb Graph filtering -- GraphWeb filters help the user detect network areas with strong associations. Three (3) types of filters may be used for selecting edges: a minimum number of supporting datasets (i.e. labels), a lower threshold on edge weights, and selection of the top-ranking edges.

Node filtering excludes unrecognized or ambiguous genes and proteins, while module filtering limits the result to larger modules or those with significant functional enrichments. Filtering techniques are especially useful when incorporating edges from different datasets or species.
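The three edge filters can be illustrated with a short Python sketch. The edge representation (a dictionary with `source`, `target`, `weight` and `labels` keys) and the parameter names are assumptions made for this example, and the order in which the filters are applied is a design choice of the sketch rather than documented GraphWeb behavior.

```python
def filter_edges(edges, min_labels=1, min_weight=None, top_k=None):
    """Apply three illustrative edge filters and return the surviving edges.

    edges: list of dicts with keys "source", "target", "weight", "labels"
    min_labels: minimum number of supporting datasets (labels) per edge
    min_weight: lower threshold on edge weight, or None to skip
    top_k:      keep only the k highest-weight edges, or None to skip
    """
    kept = [e for e in edges if len(e["labels"]) >= min_labels]
    if min_weight is not None:
        kept = [e for e in kept if e["weight"] >= min_weight]
    # Rank by weight so that top_k selects the strongest associations.
    kept = sorted(kept, key=lambda e: e["weight"], reverse=True)
    if top_k is not None:
        kept = kept[:top_k]
    return kept
```

Calling `filter_edges(edges, min_labels=2)` would keep only associations supported by at least two datasets, returned in descending order of weight.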

GraphWeb Gene module discovery -- GraphWeb provides a number of methods and algorithms for detecting gene modules in directed and undirected networks. Resulting “gene modules” may easily be saved for later use or redirected to input for further analysis. GraphWeb identifies the following types of modules:

1) Connected components - A ‘connected component’ is a group of genes where every pair of genes is connected either directly or indirectly via a path. GraphWeb also supports two (2) extensions to the above: a ‘strongly connected component’ applies to ‘directed networks’ and requires a directed path in both directions between every pair of genes, and a ‘biconnected component’ requires at least two (2) non-overlapping paths between every pair.

Connected component detection is the first step in studying ‘network structure’.
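For an undirected network, connected components can be found with a breadth-first search. The following is a generic textbook sketch, not GraphWeb's code; the function name and the edge-list input are assumptions, and the strongly connected and biconnected variants are not covered.

```python
from collections import deque


def connected_components(edges):
    """Return the connected components of an undirected edge list.

    edges: iterable of (node_a, node_b) pairs
    Returns a list of node sets, largest component first.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```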

2) Neighborhood modules - A neighborhood module is based on a user-defined list of genes and proteins {G} and on a distance d. If d = 0, GraphWeb retrieves modules that consist of the nodes in {G} with internal associations inside the list. If d ≥ 1, modules consist of the initial list {G} and nodes connected to the latter via paths of maximum length d.

Neighborhood modules allow the user to study their focus list in a ‘network context’, and retrieve related nodes and associations to propose new hypotheses.
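The neighborhood definition can be sketched as a multi-source breadth-first search that stops at distance d. This is an illustrative reading of the description above: the function name is hypothetical, the network is treated as undirected, and the result is returned as a single induced subnetwork rather than split into separate modules.

```python
from collections import deque


def neighborhood_module(edges, seeds, d):
    """Collect the seed list and all nodes within distance d of it.

    edges: list of (node_a, node_b) pairs, treated as undirected
    seeds: the user-defined focus list {G}
    d:     maximum path length from any seed (d = 0 keeps only the seeds)
    Returns (nodes, module_edges), where module_edges are the input edges
    with both endpoints among the collected nodes.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == d:
            continue  # do not expand beyond the distance limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    nodes = set(distance)
    module_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, module_edges
```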

3) Hub-based modules - A hub-based module consists of a central hub (a node with many connections) and related genes and proteins within distance d. GraphWeb extracts a list of hub-based modules ranked by the central hub degree (number of connections).

Hubs in protein-protein interaction (PPI) networks have been described in the context of lethality, and proteins linking to the same hub often share similar functions. Hub-based modules may also reflect systems of Transcription Factors (TFs) and target genes.
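A simple way to picture hub-based modules is to rank nodes by degree and take each top-ranked node together with its direct neighbors. The sketch below fixes the distance at d = 1 and breaks degree ties by node name; both choices, and the function name `hub_modules`, are assumptions made for illustration.

```python
def hub_modules(edges, top_n=3):
    """Return the top_n hub-based modules of an undirected edge list.

    Each module is a tuple (hub, degree, members), where members is the
    hub plus its direct neighbors (distance d = 1 is assumed here).
    Modules are ranked by hub degree, highest first.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Sort by descending degree, then by name so the ranking is stable.
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return [
        (hub, len(adjacency[hub]), {hub} | adjacency[hub])
        for hub in ranked[:top_n]
    ]
```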

4) Cliques - A clique is a fully ‘connected module’ where every pair of nodes is directly connected. Cliques in PPI networks have often been related to protein complexes and common functions. Fully connected modules also reflect clusters of ‘co-expressed genes’.

5) Cluster modules - A cluster module corresponds to a tightly connected group of nodes. GraphWeb provides two (2) ‘network clustering’ algorithms: the Markov Cluster (MCL) algorithm and Betweenness Centrality Clustering (BCC).

These algorithms break networks down into separate modules by removing certain edges, and have been successfully applied in a number of studies, such as protein family detection and essentiality assessment.

MCL constructs modules of edges that are frequently visited during random walks, while BCC removes edges that act as bridges between separate tightly connected modules.

Graph clustering is successful in ‘integrative network analysis’ since it prefers associations with evidence from multiple datasets, and allows the detection of ‘hybrid modules’ that combine the characteristics of different module types.
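The random-walk idea behind MCL can be sketched in plain Python as alternating expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing). This is a simplified teaching version on an unweighted, undirected network; the parameter defaults, the convergence test and the way clusters are read from the final matrix are assumptions of this sketch and may differ from the MCL implementation GraphWeb uses.

```python
def mcl(edges, inflation=2.0, iterations=50, tol=1e-6):
    """Simplified Markov Cluster sketch for a small undirected edge list.

    Returns a list of node sets read from the rows of the converged matrix.
    """
    nodes = sorted({n for edge in edges for n in edge})
    size = len(nodes)
    if size == 0:
        return []
    index = {n: i for i, n in enumerate(nodes)}

    # Adjacency matrix with self-loops, a common MCL preprocessing step.
    m = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        m[index[a]][index[b]] = 1.0
        m[index[b]][index[a]] = 1.0
    for i in range(size):
        m[i][i] = 1.0

    def normalize(mat):
        # Scale every column to sum to 1 (column-stochastic matrix).
        for j in range(size):
            total = sum(mat[i][j] for i in range(size))
            for i in range(size):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: one more step of the random walk (matrix square).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize([[v ** inflation for v in row] for row in expanded])
        change = max(
            abs(inflated[i][j] - m[i][j]) for i in range(size) for j in range(size)
        )
        m = inflated
        if change < tol:
            break

    # Each row with remaining flow lists the members of one cluster.
    clusters = set()
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return [set(c) for c in clusters]
```

In MCL, a higher inflation value generally produces finer-grained clusters, so the parameter would need tuning on real data.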

Module interpretation and evaluation -- Interpretation and evaluation are an integral part of ‘module detection’ in GraphWeb. Once a module has been identified, GraphWeb automatically assesses its biological importance through the known properties of its members using the g:Profiler software.

Functional profiling of the module involves statistically enriched annotations of biological processes (bp), cellular locations (cc) and molecular functions (mf) from the Gene Ontology (GO), and related pathways (pw) from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome (see G6G Abstract Number 20267).

Besides functional annotations, the analysis takes into account ‘cis-regulatory motif’ enrichments from TRANSFAC (see G6G Abstract Number 20121), and miRNA target site enrichments from miRBase.

GraphWeb executes on-the-fly ‘functional profiling’ and scoring of ‘detected modules’, displaying the names and P-values of the most important discovered features from all the covered functional domains (GO:bp, GO:cc, GO:mf, KEGG:pw, Reactome:pw, TRANSFAC, and miRBase).
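Enrichment P-values of this kind are commonly computed as the upper tail of a hypergeometric distribution. The sketch below shows that calculation only; it is an assumption that this matches the statistic reported by GraphWeb, it applies no multiple-testing correction, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, overlap):
    """Hypergeometric upper-tail probability P(X >= overlap).

    population:  number of genes in the background
    annotated:   background genes carrying the annotation (e.g. a GO term)
    module_size: number of genes in the module
    overlap:     module genes carrying the annotation
    """
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, with a background of 10 genes of which 3 are annotated, a module of 3 genes that are all annotated has probability 1/120 under this model.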

Hyperlinks to g:Profiler allow the user to access related terms and pathways, ortholog mapping and expression similarity search for related genes.

In addition, a hyperlink to g:Cocoa (a software module of g:Profiler) at the bottom of the GraphWeb interface sends all discovered modules to comparative ‘functional enrichment analysis’.

System Requirements

Web-based.

Manufacturer

Bioinformatics, Algorithmics, and Data Mining group (BIIT)
Institute of Computer Science
University of Tartu
Liivi 2-314 Tartu 50409
Estonia
And
EMBL Outstation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
UK

Manufacturer Web Site GraphWeb

Price Contact manufacturer.

G6G Abstract Number 20556

G6G Manufacturer Number 104027

The G6G Directory of Omics and Intelligent Software
