## CFinder

** Category** Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools

** Abstract** CFinder is a fast software system for locating and visualizing overlapping, densely interconnected groups of nodes in undirected graphs, and allowing the user to easily navigate between the original graph and the web of these groups.

It offers a fast and efficient method for clustering data represented by large graphs, such as genetic or social networks and microarray data.

The manufacturer has shown that in gene (protein) association networks, CFinder can be used to predict the function(s) of a single protein and to discover novel modules.

CFinder is also very efficient for locating the cliques of large sparse graphs.

CFinder reads a list of binary interactions, performs a search for dense subgraphs (groups), and it allows for any node to belong to more than one group.

Owing to its algorithm and implementation, CFinder is efficient for networks with millions of nodes and as a byproduct of its search, the full clique overlap matrix of the network, is determined.

CFinder’s results can be used to predict novel modules and novel individual protein functions.

CFinder overview --

The computational core of CFinder was implemented in C++, while the visualization and analysis components were written in Java.

The search algorithm uses the Clique Percolation Method (CPM) - (see below...) to locate the k-clique percolation clusters of the network that the manufacturers interpret as modules.

The user interface of CFinder offers several views of the analyzed network and its module structure.

Alternative views currently available in CFinder are ‘Communities’ (displaying the identified modules), ‘Cliques’, ‘Stats’ (statistics of, e.g. module and overlap sizes) and ‘Graph of communities’.

1) The Clique Percolation Method --

The original Clique Percolation Method (CPM) used by CFinder is designed to locate the k-clique communities of unweighted, undirected networks. (Extended versions of this algorithm, included in CFinder can handle directed and weighted networks as well - see below...)

This community definition is based on the observation that a typical member in a community is linked to many other members, but Not necessarily to all other nodes in the community.

In other words, a community can be interpreted as a union of smaller complete (fully connected) subgraphs that share nodes.

Such complete subgraphs in a network are called k-cliques, where k refers to the number of nodes in the subgraph, and a k-clique-community is defined as the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques.

Two k-cliques are said to be adjacent if they share k-1 nodes.

An illustration of these communities can be given by *k-clique template rolling*. A k-clique template can be thought of as an object that is isomorphic to a complete graph of k nodes.

Such a template can be placed onto any k-clique of the network, and rolled to an adjacent k-clique by relocating one of its nodes and keeping its other k-1 nodes fixed.

Thus, the k-clique-communities of a graph are all those subgraphs that can be fully explored by rolling a k-clique template in them but cannot be left by this template.

The CPM was inspired by the fact that the k-clique communities also correspond to percolation clusters in the k-clique adjacency graph of the system.

The nodes of the k-clique adjacency graph represent the k-cliques of the original network, and there is an edge between two (2) nodes if the corresponding two k-cliques are adjacent.

The advantages of the above community definition are the following:

- a) It is Not too restrictive (unlike cliques that require each node to be connected to all other nodes);
- b) It is based on the density of links;
- c) It is local;
- d) It does Not yield cut-nodes or cut-links (whose removal would disjoin the community); and
- e) It allows overlaps: 1) a node can be a member of several different communities at the same time, and 2) communities can overlap with each other by sharing nodes.

Naturally, the communities also constitute a network with these overlaps as their links.

The number of such links of a community can be called its community degree.

2) Outline of the community finding algorithm --

- a) The k-clique community finding algorithm implemented in CFinder first extracts all such complete subgraphs of the network that are Not included in any larger complete subgraph.
- These maximal complete subgraphs are simply called cliques (the difference between k-cliques and cliques is that k-cliques can be subsets of larger complete subgraphs).
- b) Once the cliques are located, the clique-clique overlap matrix is prepared.
- In this symmetric matrix each row (and column) represents a clique and the matrix elements are equal to the number of common nodes between the corresponding two cliques, while each diagonal entry is equal to the size of that clique.
- c) The k-clique-communities for a given value of k are equivalent to such connected clique components in which the neighboring cliques are linked to each other by at least k-1 common nodes.

These components can be found by erasing every off-diagonal entry smaller than k-1 and every diagonal element smaller than k in the matrix, replacing the remaining elements by one, and then carrying out a component analysis of this matrix.

The resulting separate components will be equivalent to the different k-clique-communities.

3) Directed and weighted networks --

The original CPM method, as described above, can only handle undirected, unweighted networks.

CFinder, however, also contains algorithms for handling directed (CPMd) and weighted (CPMw) networks. These are natural extensions of the CPM method.

In both cases the key idea is that one can add extra criteria for a clique to be used when building the community.

*Note: For details on the criteria and the above algorithms, see the numerous papers on the manufacturer's web-site under Publications.*

CFinder documentation --

The manufacturer also provides an extensive FAQ section and User Manual.

*System Requirements*

Contact manufacturer.

*Manufacturer*

- Department of Biological Physics
- Eötvös University
- Pázmány P. stny. 1A, H-1117 Budapest, Hungary
- And
- Research Group for Statistical and Biological Physics of the
- Hungarian Academy of Sciences
- Pázmány P. stny. 1A, H-1117 Budapest, Hungary

** Manufacturer Web Site**
CFinder

** Price** Contact manufacturer.

** G6G Abstract Number** 20778

** G6G Manufacturer Number** 104355