ComPath

Category Cross-Omics>Pathway Analysis/Tools

Abstract ComPath (Comparative Pathway Workbench) is a web-based interactive workbench for metabolic pathway reconstruction, annotation, and analysis. Users can perform sequence, domain, and context analyses through an intuitive, interactive spreadsheet-style interface and a set of computational tools.

Sequence, motif, structure, ligand, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), SCOP, SCOPEC, SUPERFAMILY, and Protein Data Bank (PDB) databases is integrated internally.

From KEGG, 205 biological pathways from 11 categories and 467 prokaryotic and eukaryotic genomes are available for pathway and genome selection.

Users may use the 122 subsystems from the SEED system or define their own subsystems in terms of the Enzyme Classification (EC) number.

Upon the selection of a pathway and multiple genomes, a spreadsheet-style enzyme-genome table is created, which can be used in many analyses.

This interactive spreadsheet can be easily edited, updated, downloaded to a local computer, and reloaded later for further analysis.

A union of enzymes (i.e. EC numbers) that belong to a given metabolic pathway is compiled from all genomes in the KEGG database and treated as the generic "backbone" structure of that pathway.

Based on this backbone structure and the selected genome(s), ComPath searches each genome for candidate enzyme genes corresponding to each EC number and fills in the matching table entry with any matches found.

This step is called 'EC-based pathway reconstruction'. Detected genes are considered as candidates for pathway components (enzymes).
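The EC-based reconstruction described above can be illustrated with a minimal Python sketch. It is not ComPath's implementation: the function names, the input shapes, and the genome and gene identifiers are all hypothetical, and the lookup is assumed to be a simple match on annotated EC numbers.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input shape: genome -> {EC number -> genes annotated with that EC}.
Annotations = Dict[str, Dict[str, List[str]]]
Table = Dict[str, Dict[str, List[str]]]


def pathway_backbone(pathway_ecs_by_genome: Dict[str, Set[str]]) -> List[str]:
    """Union of the EC numbers recorded for one pathway across reference genomes."""
    backbone: Set[str] = set()
    for ecs in pathway_ecs_by_genome.values():
        backbone |= ecs
    return sorted(backbone)


def reconstruct(backbone: List[str], genomes: List[str],
                annotations: Annotations) -> Table:
    """Build the enzyme-genome table: one row per EC number, one cell per genome."""
    return {
        ec: {g: list(annotations.get(g, {}).get(ec, [])) for g in genomes}
        for ec in backbone
    }


def pathway_holes(table: Table) -> List[Tuple[str, str]]:
    """Cells with no candidate gene, i.e. the 'pathway holes' left to fill."""
    return [(ec, g) for ec, row in table.items()
            for g, genes in row.items() if not genes]


# Made-up example data; the identifiers below are placeholders, not real records.
reference = {"genomeA": {"1.1.1.1", "2.7.1.1"}, "genomeB": {"2.7.1.1", "4.1.2.13"}}
annotations = {
    "genomeA": {"1.1.1.1": ["geneA1"], "2.7.1.1": ["geneA2"]},
    "genomeB": {"2.7.1.1": ["geneB1"], "4.1.2.13": ["geneB2"]},
}
backbone = pathway_backbone(reference)
table = reconstruct(backbone, ["genomeA", "genomeB"], annotations)
holes = pathway_holes(table)
```

The empty cells returned by `pathway_holes` correspond to the entries that the later gene-finding methods try to fill.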

One major goal of ComPath is to provide an exploratory computational environment to search for missing genes involved with a given pathway and to fill in so-called pathway holes.

ComPath adopts motif- and structural domain-based search techniques in addition to the standard similarity-based gene search.

Whole protein sequences or parts of sequences are searched against the query genomes, either directly or after a Hidden Markov Model (HMM) has been generated from them. Well-known sequence analysis tools such as FASTA, CLUSTALW, and HMMER are used in this step.

Methods used in ComPath are:

1) Whole-HMM search;

2) Common Shared Region (CSR)-HMM search;

3) PDB-domain search; and

4) Simple FASTA search.

The Whole-HMM method builds an HMM from selected 'whole enzyme sequences' that belong to the same EC group and then searches the query genome(s) with this HMM.
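As a rough illustration of the Whole-HMM idea, the sketch below assembles the two HMMER command lines involved: building a profile from an alignment and searching a genome's protein set with it. It assumes HMMER 3 is installed and that a multiple sequence alignment of the EC group's sequences already exists (for example, one produced with CLUSTALW). The file names and function names are hypothetical, and how ComPath itself invokes these tools is not documented here.

```python
import subprocess
from typing import List


def hmmbuild_cmd(alignment: str, hmm_out: str) -> List[str]:
    # HMMER usage: hmmbuild <hmmfile_out> <msafile>
    return ["hmmbuild", hmm_out, alignment]


def hmmsearch_cmd(hmm: str, genome_proteins: str, table_out: str) -> List[str]:
    # HMMER usage: hmmsearch [options] <hmmfile> <seqdb>
    # --tblout writes a parseable per-target hit table.
    return ["hmmsearch", "--tblout", table_out, hmm, genome_proteins]


def whole_hmm_search(alignment: str, genome_proteins: str,
                     hmm_out: str = "ec_group.hmm",
                     table_out: str = "hits.tbl",
                     run: bool = False) -> List[List[str]]:
    """Return the build and search commands; execute them only if run=True."""
    cmds = [
        hmmbuild_cmd(alignment, hmm_out),
        hmmsearch_cmd(hmm_out, genome_proteins, table_out),
    ]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # raises if HMMER reports an error
    return cmds


# Hypothetical file names; nothing is executed because run defaults to False.
commands = whole_hmm_search("ec_group_alignment.sto", "query_genome.faa")
```

Keeping command construction separate from execution makes it easy to inspect or log the exact calls before running them against a real genome.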

In contrast, the 'CSR-HMM method' (developed by the manufacturer) builds its HMM from a 'common shared region' generated by the 'BAG clustering algorithm'.

The PDB-domain search method first converts an EC number into a SCOP identifier using the SCOPEC database (a database of protein catalytic domains), and then retrieves PDB entries, sequences, and HMMs from the SCOP, PDB, and SUPERFAMILY databases, respectively.

A FASTA search using the 'whole enzyme sequence' is also available, but it is not generally recommended because of its low specificity.
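The lookup chain in the PDB-domain method (EC number to SCOP identifier, then to PDB entries, sequences, and HMMs) can be sketched as a series of dictionary lookups. This is only an illustration: plain dictionaries stand in for the SCOPEC, SCOP, PDB, and SUPERFAMILY databases, and every identifier and name below is a made-up placeholder rather than a real record or a ComPath API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DomainResources:
    """Everything retrieved for one SCOP identifier (hypothetical container)."""
    scop_id: str
    pdb_entries: List[str] = field(default_factory=list)
    sequences: List[str] = field(default_factory=list)
    hmm_models: List[str] = field(default_factory=list)


def pdb_domain_lookup(ec: str,
                      scopec: Dict[str, List[str]],
                      scop_to_pdb: Dict[str, List[str]],
                      pdb_to_seq: Dict[str, str],
                      scop_to_hmm: Dict[str, List[str]]) -> List[DomainResources]:
    """Follow EC -> SCOP id, then collect PDB entries, sequences, and HMMs."""
    results = []
    for scop_id in scopec.get(ec, []):
        pdb_entries = list(scop_to_pdb.get(scop_id, []))
        # Skip PDB entries for which no sequence is available.
        sequences = [pdb_to_seq[p] for p in pdb_entries if p in pdb_to_seq]
        hmms = list(scop_to_hmm.get(scop_id, []))
        results.append(DomainResources(scop_id, pdb_entries, sequences, hmms))
    return results


# Placeholder tables; none of these identifiers refer to real database records.
scopec = {"EC_X": ["SCOP_1"]}
scop_to_pdb = {"SCOP_1": ["PDB_A", "PDB_B"]}
pdb_to_seq = {"PDB_A": "MKTAYIAK"}
scop_to_hmm = {"SCOP_1": ["HMM_1"]}

found = pdb_domain_lookup("EC_X", scopec, scop_to_pdb, pdb_to_seq, scop_to_hmm)
missing = pdb_domain_lookup("EC_UNKNOWN", scopec, scop_to_pdb, pdb_to_seq, scop_to_hmm)
```

An EC number with no SCOPEC mapping simply yields an empty result, which would leave the other search methods as the fallback.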

After gene finding, users can freely add or delete genes in the interactive spreadsheet; candidates suspected of being false positives can be deleted.

The candidate matches need to be further examined --

ComPath provides several computational methods for the further evaluation of 'candidate matches'.

Phylogenetic tree analysis is probably one of the most powerful methods for visualizing the relationships among candidates and known enzyme sequences.

ComPath uses the PHYLogeny Inference Package (PHYLIP) to generate a phylogenetic tree with the neighbor-joining algorithm.
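To give a feel for what neighbor joining does, the sketch below implements only its pair-selection step: given a symmetric distance matrix, it computes the standard criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and returns the pair with the smallest value. This is a textbook illustration, not PHYLIP's code; the function name and the toy distance matrix are assumptions made for the example.

```python
from typing import List, Tuple


def nj_pair_to_join(dist: List[List[float]]) -> Tuple[Tuple[int, int], float]:
    """Return the pair (i, j) minimizing the neighbor-joining Q criterion, and its Q."""
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair = (0, 1)
    best_q = (n - 2) * dist[0][1] - row_sums[0] - row_sums[1]
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # ties keep the earliest pair found
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Toy symmetric distance matrix for five sequences (illustrative values only).
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pair_to_join(toy)
```

A full neighbor-joining run repeats this selection, replaces the joined pair with a new internal node, recomputes distances to that node, and continues until the tree is resolved.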

Multiple sequence alignment is also available to users while generating the phylogenetic tree.

The BAG sequence clustering program is used in both gene finding and the refinement step.

Gibbs Motif Sampler is also available to predict conserved regions and motifs of the sequences.

The motif information can be searched against the Conserved Domain Database (CDD) and PROSITE databases. Chromosomal neighborhood and metabolic network neighborhood search tools are also provided for 'context analysis'.

ComPath can also be used as a genome annotation system --

Once an un-annotated or poorly annotated genome is uploaded onto the pathway analysis spreadsheet, users can perform pathway searches with ComPath as described earlier, and the results can be used as evidence for 'gene function' determination.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site ComPath

Price Contact manufacturer.

G6G Abstract Number 20279

G6G Manufacturer Number 101385