## GraphCrunch

**Category** Cross-Omics>Pathway Analysis/Tools

**Abstract** GraphCrunch is a software tool that implements some of the latest research on biological network models and properties.

Finding adequate null models for biological networks is a challenging research problem, and GraphCrunch is designed to address it.

It finds well-fitting network models by comparing large real-world networks against random graph models according to various network structural similarity measures.

GraphCrunch has the unique capability of computing two computationally expensive local measures: relative graphlet frequency distance (RGF-distance) and graphlet degree distribution agreement (GDD-agreement).
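RGF-distance is commonly defined from the counts $N_i(G)$ of each of the 29 graphlets on 3 to 5 nodes. With $T(G) = \sum_i N_i(G)$ and $F_i(G) = -\log(N_i(G)/T(G))$, the distance between networks $G$ and $H$ is $D(G,H) = \sum_{i=1}^{29} |F_i(G) - F_i(H)|$. The Python sketch below computes $D$ from count vectors that are assumed to be available already, since counting the graphlets is the expensive part. The function name `rgf_distance`, the use of the natural logarithm (the base only rescales $D$), and rejecting zero counts are assumptions of this sketch. This is not GraphCrunch's code.

```python
import math
from typing import Sequence


def rgf_distance(counts_g: Sequence[int], counts_h: Sequence[int]) -> float:
    """Relative graphlet frequency distance between networks G and H.

    counts_g[i] and counts_h[i] are the numbers of graphlet i in G and H
    (e.g. the 29 graphlets on 3-5 nodes), listed in the same order.
    """
    if len(counts_g) != len(counts_h):
        raise ValueError("graphlet count vectors must have the same length")
    # Assumption of this sketch: counts must be positive so the log is defined.
    if any(c <= 0 for c in counts_g) or any(c <= 0 for c in counts_h):
        raise ValueError("this sketch requires positive graphlet counts")

    total_g = sum(counts_g)
    total_h = sum(counts_h)
    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        # F_i = -log(N_i / T): log-scaled relative frequency of graphlet i.
        f_g = -math.log(n_g / total_g)
        f_h = -math.log(n_h / total_h)
        distance += abs(f_g - f_h)
    return distance


# Toy example with only two graphlet types (the real measure uses 29).
print(rgf_distance([1, 1], [1, 3]))  # mathematically ln 2 + ln 1.5 = ln 3, about 1.0986
```

The logarithm compresses counts that differ by orders of magnitude between graphlet types. Without it, the most frequent graphlets would dominate the sum.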

In addition, it computes several standard global network measures; together, these give it the largest variety of supported network measures of any tool to date.

It is also one of the first software tools to compare real-world networks against a series of network models. It has built-in parallel computing capabilities: the user can specify a list of machines on which to run the compute-intensive searches for local network properties.

Furthermore, GraphCrunch is easily extensible to include additional network measures and models.

GraphCrunch automates the process of generating random networks drawn from user-specified random graph models and of evaluating how well the network models fit a real-world network with respect to global and local network properties.

In a single command, GraphCrunch performs all of the following tasks:

1) computes user-specified global and local properties of an input real-world network;

2) creates a user-specified number of random networks belonging to user-specified random graph models;

3) compares how closely each model network reproduces the global and local properties of the real-world network specified in point 1 above; and

4) produces the statistics of network property similarities between the data and the model networks.
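The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not GraphCrunch's implementation. It uses a single model, Erdős-Rényi graphs with exactly the same numbers of nodes and edges as the data, and a single property, the number of triangles. The names `er_graph`, `triangle_count`, and `compare_to_er` are hypothetical.

```python
import random
import statistics
from itertools import combinations


def er_graph(n, m, rng):
    """Erdős-Rényi G(n, m): m distinct edges drawn uniformly from all node pairs."""
    return set(rng.sample(list(combinations(range(n), 2)), m))


def triangle_count(edges):
    """Number of triangles in a simple undirected graph.

    edges is a set of (u, v) pairs, each undirected edge listed once.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def compare_to_er(edges, num_models=30, seed=0):
    """Steps 1-4 for one model (G(n, m)) and one property (triangle count)."""
    rng = random.Random(seed)
    # Only the data's node and edge counts are needed to build G(n, m) models.
    n = len({x for edge in edges for x in edge})
    m = len(edges)

    observed = triangle_count(edges)                          # step 1
    model_values = [triangle_count(er_graph(n, m, rng))       # steps 2 and 3
                    for _ in range(num_models)]
    return {                                                  # step 4
        "observed": observed,
        "model_mean": statistics.mean(model_values),
        "model_stdev": statistics.stdev(model_values) if num_models > 1 else 0.0,
        "num_models": num_models,
    }


# K4 has 4 triangles. The only graph with 4 nodes and 6 edges is K4 itself,
# so every model network matches the data exactly.
k4 = set(combinations(range(4), 2))
print(compare_to_er(k4, num_models=5))
```

A real run would repeat this loop for every selected model and property. For local properties, the repeated computation on each model network is what makes the parallel option worthwhile.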

*Network Models*

GraphCrunch currently supports five (5) different types of random graph models:

1) Erdős-Rényi random graphs;

2) Random graphs with the same degree distribution as the data;

3) Barabási-Albert type scale-free networks;

4) n-dimensional geometric random graphs for all positive integers n; and

5) Stickiness model networks.

*Note: All generated model networks have numbers of nodes and edges within 1% of those of the real-world network.*
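To illustrate model 4 and the size-matching note above, the sketch below builds an n-dimensional geometric random graph with a prescribed number of edges. Points are placed uniformly at random in the unit hypercube, and the closest `num_edges` pairs are joined. This is equivalent to setting the distance threshold r to the `num_edges`-th smallest pairwise distance. The function name, the Euclidean metric, the unit hypercube, and this edge-matching strategy are assumptions of this sketch. GraphCrunch's own generators are only documented to match node and edge counts to within 1%.

```python
import math
import random
from itertools import combinations


def geometric_graph(num_nodes, num_edges, dim=3, seed=None):
    """Geometric random graph in [0, 1]^dim with exactly num_edges edges.

    Nodes are random points. An edge joins two nodes whose Euclidean distance
    is at most r, where r is the num_edges-th smallest pairwise distance
    (ties are broken by sort order so the edge count is exact).
    """
    max_edges = num_nodes * (num_nodes - 1) // 2
    if not 0 <= num_edges <= max_edges:
        raise ValueError("num_edges must be between 0 and n(n-1)/2")
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(num_nodes)]

    # Sort all node pairs by distance and keep the closest num_edges of them.
    # O(n^2 log n): fine for illustration, too slow for very large networks.
    pairs = sorted(
        combinations(range(num_nodes), 2),
        key=lambda p: math.dist(points[p[0]], points[p[1]]),
    )
    return points, set(pairs[:num_edges])


points, edges = geometric_graph(100, 250, dim=3, seed=42)
print(len(points), len(edges))  # 100 250
```

Choosing the radius from the target edge count, rather than fixing r in advance, is one simple way to make a model network match the data's size.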

*Network Properties Currently Supported by GraphCrunch*

Global Properties:

- a) Degree distribution;
- b) Clustering coefficient;
- c) Clustering spectrum;
- d) Average diameter;
- e) Spectrum of shortest path lengths.
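Most of these global properties are straightforward to compute. The sketch below takes an adjacency dictionary and computes the following:

- the degree distribution;
- the average clustering coefficient;
- the clustering spectrum (average clustering coefficient of the nodes of each degree k);
- the spectrum of shortest path lengths, using breadth-first search.

It is an illustration, not GraphCrunch's code. Two choices are assumptions: nodes of degree below 2 get a clustering coefficient of 0, and "average diameter" is approximated by the mean shortest-path length over all connected node pairs.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each degree k to the number of nodes with degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves adjacent."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: nodes of degree 0 or 1 get coefficient 0
    # Each linked neighbour pair is counted twice (once from each end).
    twice_links = sum(len(adj[u] & nbrs) for u in nbrs)
    return twice_links / (k * (k - 1))


def average_clustering(adj):
    """Clustering coefficient of the network: mean of the local coefficients."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def clustering_spectrum(adj):
    """Average clustering coefficient of the nodes of each degree k."""
    by_degree = {}
    for v, nbrs in adj.items():
        by_degree.setdefault(len(nbrs), []).append(local_clustering(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


def shortest_path_spectrum(adj):
    """Map each distance d to the number of node pairs at shortest-path distance d."""
    spectrum = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for d in dist.values():
            if d > 0:
                spectrum[d] += 1
    # Every unordered pair was reached once from each of its two ends.
    return Counter({d: c // 2 for d, c in spectrum.items()})


def average_path_length(adj):
    """Mean shortest-path length over all connected node pairs."""
    spectrum = shortest_path_spectrum(adj)
    pairs = sum(spectrum.values())
    return sum(d * c for d, c in spectrum.items()) / pairs if pairs else 0.0


# Usage on a path a-b-c:
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(dict(degree_distribution(path)))     # {1: 2, 2: 1}
print(dict(shortest_path_spectrum(path)))  # {1: 2, 2: 1}
print(average_path_length(path))           # 4/3, about 1.333
```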

Local Properties:

- a) Relative graphlet frequency distance (RGF-distance);
- b) Graphlet degree distribution agreement (GDD-agreement).
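GDD-agreement is usually defined orbit by orbit, in four steps:

1. For automorphism orbit $j$, let $d_G^j(k)$ be the number of nodes of $G$ that touch orbit $j$ exactly $k$ times.
2. Scale it to $S_G^j(k) = d_G^j(k)/k$.
3. Normalize it to $N_G^j(k) = S_G^j(k)/\sum_{k \ge 1} S_G^j(k)$.
4. Compute the agreement for orbit $j$ as $A^j(G,H) = 1 - \frac{1}{\sqrt{2}}\sqrt{\sum_k (N_G^j(k) - N_H^j(k))^2}$.

GDD-agreement is then the arithmetic or geometric mean of $A^j$ over the 73 orbits of the 2- to 5-node graphlets. The Python sketch below follows that definition but assumes the per-node orbit counts have already been computed, since obtaining them is the expensive step. The function names and the input layout (one list of per-node counts per orbit) are hypothetical, not GraphCrunch's API.

```python
import math
from collections import Counter


def normalized_gdd(orbit_degrees):
    """N(k) for one orbit, from the list of per-node orbit degrees.

    orbit_degrees[v] is how many times node v touches this orbit. Nodes
    with degree 0 are ignored because the sum over k starts at k = 1.
    """
    d = Counter(k for k in orbit_degrees if k > 0)
    scaled = {k: count / k for k, count in d.items()}   # S(k) = d(k) / k
    total = sum(scaled.values())                         # T = sum of S(k)
    if total == 0:
        return {}
    return {k: s / total for k, s in scaled.items()}     # N(k) = S(k) / T


def orbit_agreement(degrees_g, degrees_h):
    """A^j(G, H) = 1 - (1 / sqrt 2) * Euclidean distance of normalized GDDs."""
    n_g = normalized_gdd(degrees_g)
    n_h = normalized_gdd(degrees_h)
    ks = set(n_g) | set(n_h)
    dist = math.sqrt(sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dist / math.sqrt(2))


def gdd_agreement(orbits_g, orbits_h, mean="arithmetic"):
    """Combine per-orbit agreements; orbits_x[j] lists per-node degrees in orbit j."""
    scores = [orbit_agreement(g, h) for g, h in zip(orbits_g, orbits_h)]
    if mean == "geometric":
        return math.prod(scores) ** (1.0 / len(scores))
    return sum(scores) / len(scores)


# Orbit 0 alone (ordinary node degree): a 3-node path versus a triangle.
print(gdd_agreement([[1, 2, 1]], [[2, 2, 2]]))  # approximately 0.2
```

The division by $k$ keeps the many low-degree nodes from dominating the distribution. Dividing the Euclidean distance by $\sqrt{2}$ keeps each $A^j$ between 0 and 1.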

*Interfaces*

There are three (3) ways of running GraphCrunch: via the command-line interface, the run-dialog interface, or the on-line web interface. Upon installation, a user can choose either the command-line or the run-dialog interface.

The command-line interface allows all of the following to be specified in a single command:

- the real-world network (input graph) to be processed;
- the random graph models against which the data are to be compared;
- the number of networks to be generated per random graph model;
- the network properties to compute and the comparisons to make between the data and the model networks; and
- the name of the output data file.

The run-dialog interface is available for the Linux and MacOS versions of GraphCrunch. It provides the same functionality as the command-line interface in a more user-friendly manner.

*Input Format*

GraphCrunch supports two (2) input graph formats: the LEDA graph format (.gw) and the "edge list" format (.txt). The specifics of the LEDA graph format are given on the GraphCrunch web page.

The edge list format is simply a list of the graph's edges: node pairs separated by tabs or spaces, with one node pair per line.

The current implementation of GraphCrunch handles undirected, simple (i.e., no self-loops or multiple edges), and unweighted graphs. Accordingly, for either input format, GraphCrunch automatically removes all self-loops, multiple edges, and edge directions.
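The clean-up described above can be sketched as a minimal edge-list reader. It splits each line on whitespace and drops self-loops. It also stores each edge as an ordered pair `(min, max)`, so repeated edges and reversed duplicates collapse into one undirected edge. The function name and three behaviours are assumptions of this sketch, not documented GraphCrunch behaviour: blank lines are skipped, columns after the first two are ignored, and a node seen only in a self-loop is kept as an isolated node.

```python
def read_edge_list(text):
    """Parse an edge list into an undirected simple graph.

    Returns (nodes, edges), where edges is a set of (u, v) tuples with u < v.
    Self-loops, repeated edges, and edge directions are discarded.
    """
    nodes = set()
    edges = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()          # tabs or spaces
        if not fields:
            continue                   # assumption: blank lines are skipped
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected two node names")
        u, v = fields[0], fields[1]    # assumption: extra columns are ignored
        nodes.update((u, v))           # assumption: self-loop-only nodes stay as isolated nodes
        if u == v:
            continue                   # drop self-loop
        edges.add((u, v) if u < v else (v, u))  # drop direction and duplicates
    return nodes, edges


sample = "A\tB\nB A\nC C\nA C\nA B\n"
nodes, edges = read_edge_list(sample)
print(sorted(nodes))   # ['A', 'B', 'C']
print(sorted(edges))   # [('A', 'B'), ('A', 'C')]
```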

*Output Format and Results*

GraphCrunch creates three (3) types of output: the tabular output file, the set of intermediate files, and the visualized output.

1) The 'tabular output' file is a spreadsheet of tab-separated values (.tsv) that contains summarized output statistics.

2) The 'set of intermediate files' includes generated model networks corresponding to the input network, in LEDA graph (.gw) format, and the files containing the network properties (e.g., clustering spectra, graphlet counts, graphlet degree distributions, etc.).

The intermediate files allow for additional analyses of the results without performing any additional compute-intensive processing.

Note that the tabular output file contains only the statistics of network property similarities between the data and the model networks, not the results from which those statistics were computed; these results are contained in the intermediate files.

3) The 'visualized output' is a set of PostScript (.ps) files containing user-friendly graphical interpretations of the results presented in the tabular output file. One graphical file (plot) is created per network property.

A plot illustrates the fit of the network models to one or more real-world networks with respect to the given property.

Thus, in a single plot, it is possible to simultaneously illustrate the fit of network models to many real-world networks with respect to one property.

*System Requirements*

1) GraphCrunch runs under Linux, MacOS, and Windows (via Cygwin).

2) The manufacturer recommends installing Perl 5.6+ and either dialog 0.3+ or Xdialog on each of the three (3) operating systems.

3) Installing GraphCrunch requires up to 20 MB of free disk space, depending on the operating system.

4) Note that processing a large number of model networks can require substantial disk space (storing a single network takes about 600 KB).

The manufacturer recommends processing of up to 30 networks per network model.
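Here is a rough worked example based only on the figures above. Assume all five (5) models are used with the recommended maximum of 30 networks each, and each stored network takes about 600 KB. The model networks alone then need approximately:

```latex
5 \times 30 \times 600\ \text{KB} = 90\,000\ \text{KB} \approx 90\ \text{MB}
```

This estimate excludes the intermediate property files and plots, which add to the total.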

*Manufacturer*

- Department of Computer Science
- University of California
- Irvine, CA 92697-3435
- USA

**Manufacturer Web Site**
GraphCrunch

**Price** Contact manufacturer.

**G6G Abstract Number** 20339

**G6G Manufacturer Number** 104000