MetaboAnalyst

Category Metabolomics/Metabonomics>Metabolic Profiling/Analysis Systems/Tools

Abstract MetaboAnalyst is a web server/tool for metabolomic data analysis and interpretation. This web-based metabolomic data processing tool is not unlike many of today’s web-based microarray analysis packages.

The purpose of MetaboAnalyst is to provide a user-friendly and easily accessible tool for analyzing data arising from high-throughput metabolomics experiments.

It is designed to address two (2) common types of problems:

1) To identify features that are significantly different between two conditions (biomarker discovery); and

2) To use the metabolomic data to predict the conditions under study (classification).

It accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It also offers a number of options for metabolomic data processing, data normalization, multivariate statistical analysis, graphing, metabolite identification and pathway mapping.

In particular, MetaboAnalyst supports such techniques as: fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering and a number of more sophisticated statistical or machine learning methods.
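
As an illustration of the two simplest techniques listed above, the following is a minimal sketch of a fold change and an unpaired t statistic for a single feature. The function names and the example values are hypothetical, and the choice of Welch's (unequal-variance) form of the t-test is an assumption; it is not taken from MetaboAnalyst itself.

```python
import math
from statistics import mean, variance


def fold_change(group_a, group_b):
    """Return the ratio of group means (a / b) and its log2 value."""
    fc = mean(group_a) / mean(group_b)
    return fc, math.log2(fc)


def welch_t(group_a, group_b):
    """Welch's t statistic for two unpaired groups (unequal variances allowed)."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical concentrations of one metabolite in two conditions.
control = [10.0, 12.0, 11.0, 9.0]
treated = [20.0, 22.0, 19.0, 23.0]

fc, log2_fc = fold_change(treated, control)
t = welch_t(treated, control)
print(f"fold change = {fc:.2f}, log2 = {log2_fc:.2f}, t = {t:.2f}")
```

A feature would typically be flagged as a candidate biomarker when both the fold change and the test statistic pass user-chosen thresholds.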

It also employs a large library of reference spectra to facilitate compound identification from most kinds of input spectra.

MetaboAnalyst guides users through a step-by-step analysis pipeline (workflow) using a variety of menus, information hyperlinks and check boxes. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs.

MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses.

MetaboAnalyst on-line analysis pipeline --

MetaboAnalyst is an on-line analysis pipeline similar in concept to several existing on-line microarray analysis tools such as GEPAS and CARMAweb.

It is primarily designed to allow users to conduct two-group discriminant analysis (i.e. control vs. non-control -- the most common type of metabolomic analysis) for classification and ‘significant feature’ identification. MetaboAnalyst also supports both paired and unpaired data analyses.

A typical MetaboAnalyst run consists of six (6) steps:

Step 1) data upload;

Step 2) processing;

Step 3) normalization;

Step 4) statistical analysis;

Step 5) annotation; and

Step 6) summary report download.

Users are guided through these steps by MetaboAnalyst’s intuitive interface and the navigation bar on the left panel of each page.

Detailed descriptions, help files or helpful hints are either shown on the corresponding web pages or provided as mouse-over pop-up balloons. This support is further enhanced by several step-by-step tutorials, sample data sets (NMR, GC/LC-MS, binned data, etc.), sample summary files and frequently asked questions (FAQs), all available on MetaboAnalyst's web-site.

Step 1: data upload -- Users can begin a MetaboAnalyst analysis by pressing the ‘Click Here to Start’ link on MetaboAnalyst’s Home page. This takes users to the data upload page.

Because there is no widely-accepted standard format for reporting metabolomics experiments, MetaboAnalyst has been designed to accept diverse data types including compound concentration tables (from quantitative metabolomic studies), binned spectral data, NMR or MS peak lists, as well as raw GC-MS and raw LC-MS spectra.

Detailed instructions on how to specify paired information (for paired data analysis) as well as examples for each data type are available through MetaboAnalyst’s ‘Data Formats’ link on the manufacturer’s home page.

Step 2: data processing and data integrity checking -- Depending on the type of uploaded data, different processing strategies can be employed to convert the raw numbers into a data matrix suitable for downstream analysis. For compound concentration lists, the data can be used immediately after MetaboAnalyst’s data integrity check. For binned spectral data, a linear filter is first applied in order to remove baseline noise.

Often there are large numbers of missing values in a typical quantitative metabolomics dataset (10%-40% in the manufacturer’s experience). To allow selected analyses to proceed (without divide-by-zero problems), these missing values are replaced by default with half of the minimum value found in the dataset.
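
The default replacement rule described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and representing the data as a list of rows with missing values stored as None is an assumption about the data layout.

```python
def replace_missing_with_half_min(table):
    """Replace None entries with half of the smallest observed value in the table."""
    observed = [value for row in table for value in row if value is not None]
    if not observed:
        raise ValueError("table contains no observed values")
    fill = min(observed) / 2.0  # half of the dataset-wide minimum
    return [[fill if value is None else value for value in row] for row in table]


# Hypothetical concentration table: rows are samples, columns are compounds.
data = [
    [4.0, None, 8.0],
    [2.0, 6.0, None],
]
filled = replace_missing_with_half_min(data)
print(filled)
```

Because the fill value is small but non-zero, later steps such as log transformation or ratio calculations do not fail on the replaced entries.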

The manufacturer has also implemented a variety of methods that enable users to manually or automatically perform missing value exclusion, missing value replacement, as well as missing value imputation by Probabilistic PCA (PPCA), Bayesian PCA (BPCA) and Singular Value Decomposition Imputation (SVDImpute).

In addition, as part of the data integrity check, MetaboAnalyst also verifies class labels and pair specification (if applicable) to make sure all the required information is present and consistent before proceeding to the next step.
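
A check of this kind might look like the sketch below. The representation is hypothetical: class labels are given as one value per sample, and pairing is given as one pair identifier per sample, with each pair expected to contain exactly one sample from each class. MetaboAnalyst's actual input format for pairs is described on its ‘Data Formats’ page and may differ.

```python
from collections import defaultdict


def check_integrity(labels, pair_ids=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if any(label is None or label == "" for label in labels):
        problems.append("missing class label")
    classes = {label for label in labels if label not in (None, "")}
    if len(classes) != 2:
        problems.append(f"expected 2 classes, found {len(classes)}")
    if pair_ids is not None:
        if len(pair_ids) != len(labels):
            problems.append("pair list length differs from sample count")
        else:
            # Collect the class labels that belong to each pair identifier.
            members = defaultdict(list)
            for pair_id, label in zip(pair_ids, labels):
                members[pair_id].append(label)
            for pair_id, pair_labels in members.items():
                if len(pair_labels) != 2 or len(set(pair_labels)) != 2:
                    problems.append(f"pair {pair_id} is not one sample from each class")
    return problems


ok = check_integrity(["ctrl", "case", "ctrl", "case"], [1, 1, 2, 2])
bad = check_integrity(["ctrl", "ctrl", "case"], [1, 1, 2])
print(ok, bad)
```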

Step 3: data normalization -- At this stage, the uploaded data is compiled into a table in which each sample is represented by a row and each feature by a column. With the data structured in this format, two (2) types of data normalization protocols - row-wise normalization and column-wise normalization - may be used.

Row-wise normalization aims to normalize each sample (row) so that it is comparable to the others. Four (4) commonly used metabolomic normalization methods have been implemented in MetaboAnalyst, including normalization to a constant sum, normalization to a reference sample (probabilistic quotient normalization), normalization to a reference feature (creatinine or an internal standard) and sample-specific normalization (dry weight or tissue volume).
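
Three of these row-wise methods can be sketched for a single sample as shown below. The function names and example values are hypothetical. The probabilistic quotient version follows the commonly described form (divide the sample by the median of its feature-wise quotients against a reference sample); in practice it is usually applied after an initial sum normalization, and how the reference sample is chosen is left open here.

```python
from statistics import median


def normalize_to_constant_sum(row, total=1.0):
    """Scale one sample so that its feature values add up to `total`."""
    row_sum = sum(row)
    return [value * total / row_sum for value in row]


def normalize_to_reference_feature(row, ref_index):
    """Divide one sample by the value of a chosen feature (e.g. an internal standard)."""
    reference_value = row[ref_index]
    return [value / reference_value for value in row]


def probabilistic_quotient_normalize(row, reference_row):
    """Divide one sample by the median quotient against a reference sample."""
    quotients = [v / r for v, r in zip(row, reference_row) if r != 0]
    factor = median(quotients)  # most probable dilution factor
    return [value / factor for value in row]


sample = [2.0, 4.0, 6.0, 8.0]
reference = [1.0, 2.0, 3.0, 5.0]  # hypothetical reference sample

by_sum = normalize_to_constant_sum(sample)
by_feature = normalize_to_reference_feature(sample, ref_index=0)
by_pqn = probabilistic_quotient_normalize(sample, reference)
print(by_sum, by_feature, by_pqn)
```

Sample-specific normalization (dry weight or tissue volume) would amount to dividing each row by a user-supplied per-sample constant.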

In contrast to row-wise normalization, column-wise normalization aims to make each feature (column) more comparable in magnitude to the others. Four widely-used methods are offered in MetaboAnalyst - log transformation, auto-scaling, Pareto scaling, and range scaling.
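
The four column-wise methods can be sketched for a single feature as follows, using their usual textbook definitions: auto-scaling divides the mean-centered values by the standard deviation, Pareto scaling by the square root of the standard deviation, and range scaling by the difference between maximum and minimum. The function names are hypothetical, and the use of base-2 logarithms is an assumption.

```python
import math
from statistics import mean, stdev


def log_transform(column):
    """Log-transform one feature (assumes strictly positive values; base 2 assumed)."""
    return [math.log2(value) for value in column]


def auto_scale(column):
    """Mean-center and divide by the standard deviation (unit variance)."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def pareto_scale(column):
    """Mean-center and divide by the square root of the standard deviation."""
    m, s = mean(column), stdev(column)
    return [(value - m) / math.sqrt(s) for value in column]


def range_scale(column):
    """Mean-center and divide by the range (max - min)."""
    m = mean(column)
    span = max(column) - min(column)
    return [(value - m) / span for value in column]


feature = [2.0, 4.0, 6.0, 8.0]  # hypothetical values of one feature across samples
logged = log_transform([2.0, 4.0, 8.0])
auto = auto_scale(feature)
pareto = pareto_scale(feature)
ranged = range_scale(feature)
print(logged, auto, pareto, ranged)
```

Pareto scaling sits between no scaling and auto-scaling: large features are shrunk, but less aggressively, so they retain somewhat more weight than small ones.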

Step 4: data analysis -- MetaboAnalyst’s data analysis module is a collection of well-established statistical and machine learning algorithms that have been shown to be particularly robust for high-dimensional data analysis. These algorithms are organized into five (5) analysis ‘paths’ for users to explore.

PLS-DA based feature selection and classification was previously discussed in the chemometrics analysis path (see above...).

Random forest uses an ensemble of classification trees, each of which is grown on a bootstrap sample of the data, with a random subset of features considered at each branch. Class prediction is based on the majority vote of the ensemble.
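
The ensemble idea can be illustrated with a deliberately reduced sketch. Instead of full classification trees it uses one-split decision stumps, so the random feature selection happens once per stump rather than at every branch; it is a stand-in for, not an implementation of, random forest. All names and data are hypothetical, and class labels are assumed to be 0 and 1.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature_indices):
    """Pick the (feature, threshold, labels) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({s[f] for s in samples}):
            for low_label, high_label in ((0, 1), (1, 0)):
                errors = sum(
                    (high_label if s[f] > threshold else low_label) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, low_label, high_label)
    return best[1:]


def predict_stump(stump, sample):
    f, threshold, low_label, high_label = stump
    return high_label if sample[f] > threshold else low_label


def train_ensemble(samples, labels, n_trees=25, n_candidate_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    ensemble = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        features = rng.sample(range(n_features), n_candidate_features)  # random features
        ensemble.append(
            train_stump([samples[i] for i in rows], [labels[i] for i in rows], features)
        )
    return ensemble


def predict_ensemble(ensemble, sample):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, sample) for stump in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data: class 0 has low values, class 1 has high values.
X = [[1.0, 10.0], [1.2, 11.0], [0.9, 10.5], [1.1, 9.5],
     [3.0, 20.0], [3.2, 21.0], [2.9, 19.5], [3.1, 20.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = train_ensemble(X, y)
print(predict_ensemble(ensemble, [0.5, 5.0]), predict_ensemble(ensemble, [4.0, 25.0]))
```

Individual stumps trained on an unlucky bootstrap sample can be wrong; the majority vote is what makes the ensemble prediction stable.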

The SVM classification algorithm aims to find a nonlinear decision function in the input space by mapping the data into a higher dimensional feature space and separating it by means of a maximum margin hyper-plane. MetaboAnalyst's SVM analysis is done through recursive feature selection and sample classification using a linear kernel.

Step 5: Data annotation (peak search and pathway mapping) -- A key step in placing statistically significant findings from chemometric analyses (as opposed to quantitative metabolomic analyses) into a ‘biological context’ is to identify significantly altered compounds represented by certain spectral bins or certain clusters of spectral peaks.

Once a user has identified lists of MS or NMR peaks that exhibit statistically significant changes, they may use one of several spectral comparison routines and spectral libraries to attempt to identify the compound(s) based on either lists of MS peaks, GC-MS peaks or NMR peaks.
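
The simplest form of such a comparison, matching observed MS peaks to library entries within a mass tolerance, can be sketched as follows. This is an assumption about the general approach, not a description of MetaboAnalyst's routines; the function name, the compound names, the library values and the 10 ppm tolerance are all hypothetical.

```python
def match_peaks(observed_mz, library, tolerance_ppm=10.0):
    """Return {observed m/z: [library compounds whose m/z lies within the tolerance]}."""
    matches = {}
    for mz in observed_mz:
        hits = [
            name
            for name, ref_mz in library.items()
            if abs(mz - ref_mz) / ref_mz * 1e6 <= tolerance_ppm  # difference in ppm
        ]
        matches[mz] = hits
    return matches


# Toy reference library with placeholder names and m/z values.
toy_library = {
    "compound_A": 180.0634,
    "compound_B": 89.0477,
    "compound_C": 192.0270,
}

result = match_peaks([180.0640, 150.0000], toy_library)
print(result)
```

A peak with several hits would need further evidence (additional peaks, retention time or NMR shifts) before a compound could be assigned.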

These compound identification routines and spectral reference libraries were originally developed for the Human Metabolome DataBase (HMDB) and for MetaboMiner. While not as comprehensive as some commercial libraries or commercial software, these freely available tools have been shown to be quite capable of identifying many common compounds.

Once compound information becomes available (via quantitative routes or via MetaboAnalyst's metabolite ID software), more insight can be obtained by determining which metabolic pathways are involved. Pathway mapping has been implemented in MetaboAnalyst using more than 70 pathway diagrams and metabolite libraries derived from the HMDB.
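
At its core, pathway mapping is a lookup of identified compounds against per-pathway metabolite lists. The sketch below shows that lookup only; the function name is hypothetical and the two-pathway table is a toy example, not the HMDB-derived library used by MetaboAnalyst.

```python
def map_to_pathways(compounds, pathway_library):
    """Return {pathway: sorted list of the given compounds found in that pathway}."""
    wanted = {compound.lower() for compound in compounds}
    hits = {}
    for pathway, members in pathway_library.items():
        found = sorted(wanted & {member.lower() for member in members})
        if found:
            hits[pathway] = found
    return hits


# Toy pathway table for illustration only.
toy_pathways = {
    "TCA cycle": ["citrate", "succinate", "fumarate", "malate"],
    "Glycolysis": ["glucose", "phosphoenolpyruvate", "pyruvate"],
}

mapped = map_to_pathways(["Citrate", "Malate", "Glucose", "Unknown"], toy_pathways)
print(mapped)
```

Pathways with several altered compounds are natural starting points for biological interpretation.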

Step 6: summary report download -- When users finish their analyses and click the download link, a comprehensive report is generated containing a detailed description of each step performed, embedded with graphical and tabular outputs. The processed numeric data, high-resolution images (PNG format), R scripts and the R command history are also available for downloading.

Users familiar with R can easily reproduce the results on their local machine after installation of R and the required packages. Users have the option of providing an email address (to which the summary report is sent) or simply downloading the compressed file that contains all the data (graphs, tables, etc.) produced during the analysis.

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site MetaboAnalyst

Price Contact manufacturer.

G6G Abstract Number 20654

G6G Manufacturer Number 104301