PathExpress

Category Genomics>Gene Expression Analysis/Profiling/Tools and Cross-Omics>Pathway Analysis/Tools

Abstract PathExpress is a web-based tool for interpreting gene expression data obtained from microarray experiments by identifying the most relevant metabolic pathways associated with a subset of genes (e.g. differentially expressed genes).

A graphical pathway representation permits the visualization of the expressed genes in a functional context.

Based on the publicly accessible Kyoto Encyclopedia of Genes and Genomes (KEGG) Ligand database, PathExpress can be adapted to any organism (see below), and it currently supports 28 Affymetrix 3' Gene-expression Analysis Arrays, representing 32 distinct organisms.

Probe sets of each array have been assigned to Enzyme Commission (EC) numbers by homology relationship and linked to corresponding metabolic pathways.

PathExpress can be extended to any organism because it uses sequence similarity between the probe sets of supported genome arrays and genes with known EC numbers to link probe sets to the metabolic networks.

To take into account how reactions are linked in a pathway, sub-pathways are defined as a chain of reactions linked to each other by a common compound (substrate or product).
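The sub-pathway definition can be illustrated with a minimal sketch. The reaction names, compounds and the `chains` helper below are hypothetical and are not taken from PathExpress; the sketch assumes that two reactions are linked whenever they share any compound, and it enumerates chains of a fixed number of reactions.

```python
from itertools import combinations

# Hypothetical toy reactions: name -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"X"}, {"Y"}),
}


def compounds(reactions, name):
    """All compounds (substrates and products) of one reaction."""
    substrates, products = reactions[name]
    return substrates | products


def linked_pairs(reactions):
    """Reaction pairs that share at least one compound."""
    return {
        frozenset((r1, r2))
        for r1, r2 in combinations(sorted(reactions), 2)
        if compounds(reactions, r1) & compounds(reactions, r2)
    }


def chains(reactions, length):
    """Chains of `length` distinct reactions whose consecutive members are linked."""
    links = linked_pairs(reactions)
    found = set()

    def extend(chain):
        if len(chain) == length:
            # Keep one orientation so a chain and its reverse count once.
            found.add(min(tuple(chain), tuple(reversed(chain))))
            return
        for nxt in reactions:
            if nxt not in chain and frozenset((chain[-1], nxt)) in links:
                extend(chain + [nxt])

    for start in reactions:
        extend([start])
    return sorted(found)


two_step = chains(REACTIONS, 2)
three_step = chains(REACTIONS, 3)
```

In this toy network, R4 shares no compound with the other reactions, so it never appears in a chain of two or more reactions.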

Two statistical approaches can be considered to perform a pathway analysis. The first compares a gene list to a pathway using a chi-squared test, a Fisher exact test or the hypergeometric distribution to calculate the probability of observing a specific number of genes from one pathway in the list.

The second is based on the analysis of all genes present on the genome array and measures the significance of pathway-level statistics computed from the gene-level statistics, using gene set enrichment analysis (GSEA), random forest methods, Hotelling's T-squared statistic or random-set methods.

With the aim of providing some flexibility to the user in defining their genes of interest, PathExpress compares a submitted list of genes to the genes involved in annotated pathways.

The significantly overrepresented sets of reactions (pathways or sub-pathways) in the query list of genes are identified using a hypergeometric distribution test as developed in the BlastSets system (a tool that allows the user to automatically identify verified relationships between correlated expression profiles and biological pathways).
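As an illustration of a hypergeometric overrepresentation test, the sketch below computes the upper-tail probability of finding at least k pathway enzymes in a query list. The function name and parameter names are hypothetical, and the use of the upper tail is an assumption; the exact formulation used by PathExpress and BlastSets is not given here.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for a hypergeometric variable X.

    N: enzymes represented on the array (population size)
    K: enzymes of the array that belong to the pathway
    n: enzymes in the submitted list (sample size)
    k: enzymes of the list that belong to the pathway
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 enzymes on the array, 4 in the pathway,
# 3 in the query list, 2 of which belong to the pathway.
p = hypergeom_pvalue(10, 4, 3, 2)
```

With these toy numbers the sum is (6 * 6 + 4 * 1) / 120, so `p` is one third.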

As the comparisons are based on enzyme compositions rather than single probe assignments, problems that arise from a multiplicity of genes coding for the same enzyme are largely overcome and the functional activities become apparent.
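The effect of comparing enzyme compositions can be sketched as follows. The probe set identifiers and the `PROBE_TO_EC` mapping are invented for illustration; the point is only that several probe sets assigned to the same EC number contribute a single enzyme to the comparison.

```python
# Hypothetical probe-set-to-EC assignments (not real array annotations).
PROBE_TO_EC = {
    "probe_a": {"1.1.1.1"},
    "probe_b": {"1.1.1.1"},  # a second gene coding for the same enzyme
    "probe_c": {"2.7.1.1"},
    "probe_d": set(),        # no enzyme assignment
}


def enzymes_for(probes, probe_to_ec):
    """Collapse a list of probe sets to the set of EC numbers they cover."""
    ecs = set()
    for probe in probes:
        ecs |= probe_to_ec.get(probe, set())
    return ecs


query_enzymes = enzymes_for(["probe_a", "probe_b", "probe_d"], PROBE_TO_EC)
```

Here three submitted probe sets reduce to one enzyme, so the duplicated gene does not inflate the count used in the pathway comparison.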

In addition, an automatically generated graphical representation of the metabolic pathways allows the visualization of differential gene expression in a functional context.

PathExpress Data representation --

PathExpress is based on a directed graph modeling enzymatic reactions as used in the Petri net representation of biological networks. Two types of nodes are used to represent compounds and reactions. Specific reactions can encompass one or more enzymes.

Directed edges, connecting these nodes, correspond to the consumption or the production of compounds by the reaction.
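A minimal sketch of such a two-node-type directed graph is shown below. The `MetabolicGraph` class and its attribute names are hypothetical and do not reflect the internal PathExpress data model; the example reaction is used only for illustration.

```python
from collections import defaultdict


class MetabolicGraph:
    """Directed graph with compound nodes and reaction nodes."""

    def __init__(self):
        self.successors = defaultdict(set)  # node -> nodes it points to
        self.kind = {}                      # node -> "compound" or "reaction"
        self.enzymes = defaultdict(set)     # reaction -> EC numbers

    def add_reaction(self, reaction, substrates, products, ec_numbers):
        self.kind[reaction] = "reaction"
        self.enzymes[reaction].update(ec_numbers)
        for compound in substrates:
            # Edge compound -> reaction: the reaction consumes the compound.
            self.kind[compound] = "compound"
            self.successors[compound].add(reaction)
        for compound in products:
            # Edge reaction -> compound: the reaction produces the compound.
            self.kind[compound] = "compound"
            self.successors[reaction].add(compound)


graph = MetabolicGraph()
graph.add_reaction(
    "R1",
    substrates=["glucose", "ATP"],
    products=["glucose 6-phosphate", "ADP"],
    ec_numbers=["2.7.1.1"],
)
```

Edges only ever run between a compound and a reaction, never between two nodes of the same type, which is what makes the structure comparable to a Petri net.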

The manufacturer first built the global metabolic network consisting of 2,276 enzymes and 3,810 compounds involved in 3,663 reactions as specified in the KEGG Ligand database.

In order to avoid annotation errors due to the misinterpretation of partial Enzyme Commission (EC) numbers, the manufacturer only utilized enzymes defined by a full EC term.
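One simple way to separate full from partial EC numbers is a pattern check, sketched below. The `is_full_ec` helper is hypothetical, and the rule it applies (four numeric fields, no "-" placeholder) is an assumption about what counts as a full EC term.

```python
import re

# Four dot-separated numeric fields, e.g. "2.7.1.1".
FULL_EC = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_full_ec(ec_number):
    """True when every one of the four EC fields is a number."""
    return FULL_EC.fullmatch(ec_number) is not None


candidates = ["2.7.1.1", "1.1.1.-", "3.-.-.-", "1.14.13.39"]
full_only = [ec for ec in candidates if is_full_ec(ec)]
```

Partial entries such as "1.1.1.-" describe a whole class of enzymes, so dropping them avoids linking a probe set to every reaction of that class.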

The KEGG Ligand database has the advantage of providing a manually curated representation of enzymatic reactions involved in metabolic pathways where most secondary metabolites (very common and highly connected compounds such as water, oxygen, major coenzymes and prosthetic groups) have been removed, thus avoiding invalid metabolic connections and unspecified pathways.

PathExpress Input --

The input data for PathExpress consists of a list of genes of interest (Affymetrix probe set identifiers and/or GenBank accession numbers) present in the selected genome array.

Other parameters can be specified: the type of comparison (pathway or sub-pathway), the P-value significance threshold and the adjustment method for multiple testing.
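The adjustment methods offered by PathExpress are not listed here. As an illustration of what a multiple-testing adjustment does, the sketch below implements two widely used procedures, Bonferroni and Benjamini-Hochberg; their choice is an assumption made for this example, and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and usually retains more pathways at the same threshold.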

PathExpress Output --

The PathExpress output contains the list of pathways or sub-pathways that are significantly associated with the enzymes in a list of submitted sequence identifiers.

Metabolic pathways are ranked by increasing P-values whereas sub-pathways are grouped according to the pathway to which they belong.

In each case, those that are significant (according to the P-value threshold defined by the user) are highlighted.
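The ranking and highlighting described above can be sketched in a few lines. The pathway names and P-values are invented, the `rank_pathways` helper is hypothetical, and treating a P-value equal to the threshold as significant is an assumption.

```python
def rank_pathways(pvalues, threshold):
    """Sort pathways by increasing P-value and flag the significant ones."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    return [(name, p, p <= threshold) for name, p in ranked]


# Invented example values.
results = rank_pathways(
    {"Pathway B": 0.20, "Pathway A": 0.001, "Pathway C": 0.04},
    threshold=0.05,
)
```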

Each pathway can be displayed as an automatically generated graphical representation and as an enumeration of reactions. On these pictures, reactions are highlighted if the corresponding enzyme was identified in the genome array and in the submitted list of identifiers.

The names of the compounds and the definitions of the reactions are displayed as a tool-tip when the mouse pointer hovers over any of the nodes in the graph.

In addition, compounds are linked to the corresponding KEGG entry.

If the user clicks on a reaction node, a new page opens with the description of the enzymes, together with the list of probe sets assigned to them in the selected genome array.

Basic Local Alignment Search Tool (BLAST) results used for the EC assignments are available for each probe set in its ‘detail’ page.

All results can be downloaded as tab-delimited text files for further statistical analyses.
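A downloaded tab-delimited file can be read with standard tools. The sketch below uses Python's `csv` module; the column names (`pathway`, `enzymes_in_list`, `p_value`) and the sample rows are assumptions made for illustration, since the actual layout of the PathExpress export is not described here.

```python
import csv
import io

# Invented sample standing in for a downloaded results file.
SAMPLE = (
    "pathway\tenzymes_in_list\tp_value\n"
    "Pathway A\t5\t0.0004\n"
    "Pathway B\t2\t0.21\n"
)


def read_results(handle):
    """Parse tab-delimited rows and convert numeric columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append(
            {
                "pathway": row["pathway"],
                "enzymes_in_list": int(row["enzymes_in_list"]),
                "p_value": float(row["p_value"]),
            }
        )
    return rows


table = read_results(io.StringIO(SAMPLE))
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper.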

Pictures representing the pathways can be saved in Portable Network Graphics (PNG) format or in DOT format, a plain-text graph description language; DOT files can be visualized locally using the Graphviz software.

To enhance the visualization of the expression of individual probe sets, all resources (EC assignments and pictures with XML descriptions) are available to be imported into MapMan (a user-driven tool that displays large data sets onto diagrams of metabolic pathways or other processes).
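To give an idea of what a DOT description looks like, the sketch below writes a small compound/reaction graph as DOT text. The node names, shapes and highlight colour are arbitrary choices for this example and are not the styling used in PathExpress exports; the `to_dot` helper is hypothetical.

```python
def to_dot(compounds, reactions, edges, highlighted=()):
    """Return DOT text for a small pathway graph."""
    lines = ["digraph pathway {"]
    for compound in compounds:
        lines.append(f'  "{compound}" [shape=ellipse];')
    for reaction in reactions:
        if reaction in highlighted:
            # Filled box marks a reaction whose enzyme is in the query list.
            lines.append(f'  "{reaction}" [shape=box, style=filled, fillcolor=yellow];')
        else:
            lines.append(f'  "{reaction}" [shape=box];')
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    compounds=["A", "B", "C"],
    reactions=["R1", "R2"],
    edges=[("A", "R1"), ("R1", "B"), ("B", "R2"), ("R2", "C")],
    highlighted={"R1"},
)
```

Saved to a file such as `pathway.dot`, text of this kind can be rendered with the Graphviz command `dot -Tpng pathway.dot -o pathway.png`.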

Recent development in PathExpress --

Recently, the manufacturer has added the enzyme neighborhood (EN) method to PathExpress. The manufacturer defines the EN as a sub-network of linked enzymes with a limited path length.

The EN method enables the user to explore the metabolic network and identify the most relevant sub-networks affected in gene-expression experiments without being restricted to predefined pathways.
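The idea of a neighborhood bounded by path length can be sketched as a breadth-first search. The enzyme labels and links below are invented, the `enzyme_neighborhood` helper is hypothetical, and treating the network as a plain enzyme-to-enzyme adjacency map is a simplifying assumption; the actual EN construction in PathExpress may differ.

```python
from collections import deque


def enzyme_neighborhood(links, start, max_length):
    """Enzymes reachable from `start` within `max_length` steps, with distances."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_length:
            continue  # do not expand beyond the path-length limit
        for neighbor in links.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


# Invented linear network E1 - E2 - E3 - E4.
LINKS = {
    "E1": {"E2"},
    "E2": {"E1", "E3"},
    "E3": {"E2", "E4"},
    "E4": {"E3"},
}
neighborhood = enzyme_neighborhood(LINKS, "E1", max_length=2)
```

With a limit of two steps, E4 falls outside the neighborhood of E1, so the resulting sub-network is defined by distance in the network and not by membership in a predefined pathway.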

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site PathExpress

Price Contact manufacturer.

G6G Abstract Number 20493

G6G Manufacturer Number 104114