Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract BRB-ArrayTools is an integrated software package for the visualization and statistical analysis of DNA microarray gene expression data.

It was developed by professional statisticians experienced in the analysis of microarray data and involved in the development of improved methods for the design and analysis of microarray based experiments.

The array tools package utilizes an Excel front end. Scientists are familiar with Excel, and using it as the front end makes the system portable and not tied to any database.

The input data is assumed to be in the form of Excel spreadsheets describing the expression values and a spreadsheet providing user-specified phenotypes for the samples arrayed.

The analytic and visualization tools are integrated into Excel as an add-in. The tools themselves are developed in the R statistical system, in C and FORTRAN, and in Java applications.

Visual Basic for Applications is the glue that integrates the components and hides the complexity of the analytic methods from the user.

The system incorporates a variety of advanced analytic and visualization tools developed specifically for microarray data analysis.

BRB-ArrayTools can be used for performing the following analysis tasks:

1) Collating data - Importing your data to the program and aligning genes from different experiments. The software can load a maximum of 249 experiments consisting of up to 65,000 genes. Both dual-channel and single-channel (such as Affymetrix) microarrays can be analyzed.

A data import wizard prompts the user for specifications of the data, or a special interface may be used for Affymetrix or National Cancer Institute (NCI) format data. Data should be in tab-delimited text format.

Data which is in Excel workbook format can also be used, but will automatically be converted by BRB-ArrayTools into tab-delimited text format.
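
The gene alignment step described above can be illustrated with a minimal Python sketch. It assumes each experiment has already been read into a mapping from gene identifier to expression value; the function name `collate`, the gene identifiers, and the values are hypothetical and are not part of BRB-ArrayTools.

```python
def collate(experiments):
    """Align per-experiment {gene_id: value} tables into one table.

    Returns {gene_id: [value in experiment 0, value in experiment 1, ...]},
    with None where an experiment has no measurement for that gene.
    """
    gene_ids = sorted({gene for exp in experiments for gene in exp})
    return {gene: [exp.get(gene) for exp in experiments] for gene in gene_ids}


# Hypothetical gene identifiers and log-expression values.
exp_a = {"GENE1": 1.2, "GENE2": -0.4}
exp_b = {"GENE2": 0.3, "GENE3": 2.1}
table = collate([exp_a, exp_b])
for gene, row in table.items():
    print(gene, row)
```

Genes missing from an experiment are kept with an empty entry rather than dropped, so that a later filter on the percentage of missing values can still see them.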

2) Gene annotations - Data can be automatically annotated using standard gene identifiers, either using the SOURCE database, or by importing automatic annotations for specific Affymetrix chips.

If data has been annotated using the gene annotation tool, then annotations will appear with all output results, and Gene Ontology (GO) classification terms may be analyzed for the class comparison, class prediction, survival, and quantitative traits analyses.

Gene Ontology structure files may also be automatically updated from the GO website.

3) Filtering, normalization, and gene sub-setting - Filter individual spots (or probe sets) based on channel intensities (either by excluding the spot or thresholding the intensity), and by spot flag and spot size values.

Affymetrix data can also be filtered based on the Detection Call.

Each array in a multi-chip set is normalized separately. Outlying expression levels may be truncated.

Genes may be filtered based on the percentage of expression values that are at least a specified fold-difference from the median expression over all the arrays, by the variance of log-expression values across arrays, by the percentage of missing values, and by the percentage of “Absent” detection calls over all the arrays (for Affymetrix data only).

Genes may be excluded from analyses based on strings contained in gene identifiers (for example, excluding genes with “Empty” contained in the Description field). Genes may also be included or excluded from analyses based on membership within defined gene-lists.
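
The fold-difference gene filter described above can be sketched as follows. This is an illustration only: it assumes log2-transformed expression values, and the default thresholds (a 1.5-fold difference in at least 20% of arrays) are hypothetical choices, not documented BRB-ArrayTools defaults.

```python
import math
from statistics import median


def passes_fold_filter(log2_values, fold=1.5, min_fraction=0.2):
    """Keep a gene if enough arrays differ from the gene's median by >= fold.

    log2_values: log2 expression of one gene across arrays (None = missing).
    fold, min_fraction: illustrative thresholds, assumed for this sketch.
    """
    observed = [v for v in log2_values if v is not None]
    if not observed:
        return False
    center = median(observed)
    cutoff = math.log2(fold)  # a fold-difference becomes a distance on the log scale
    n_changed = sum(1 for v in observed if abs(v - center) >= cutoff)
    return n_changed / len(observed) >= min_fraction


flat_gene = [0.0, 0.1, -0.1, 0.05, 0.0]
variable_gene = [0.0, 0.0, 2.0, -2.0, 0.0]
print(passes_fold_filter(flat_gene), passes_fold_filter(variable_gene))
```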

4) Scatter-plot of experiment v. experiment - For dual-channel data, create clickable scatter-plots using the log-red, log-green, average log-intensity of the red and green channels, or log-ratio, for any pair of experiments (or for the same experiment).

For single-channel data, create clickable scatter-plots using the log-intensity for any pair of experiments. All genes or a defined subset of genes may be plotted. Hyperlinks are provided to NCI feature reports, GenBank, NetAffx, and other genomic databases.
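
The four dual-channel quantities named above (log-red, log-green, average log-intensity, and log-ratio) can be computed for a single spot as in the sketch below. Base-2 logarithms and the function name `dual_channel_axes` are assumptions made for illustration.

```python
import math


def dual_channel_axes(red, green):
    """Return the plottable quantities for one spot of a dual-channel array."""
    log_red = math.log2(red)
    log_green = math.log2(green)
    return {
        "log_red": log_red,
        "log_green": log_green,
        "avg_log_intensity": (log_red + log_green) / 2,
        "log_ratio": log_red - log_green,  # log2(red / green)
    }


# Hypothetical channel intensities for a single spot.
spot = dual_channel_axes(red=1024.0, green=256.0)
print(spot)
```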

5) Scatter-plot of phenotype classes - Create clickable scatter-plots of average log-expression within phenotype classes, for all genes or a defined subset of genes. If more than two (2) class labels are present, then a scatter-plot is created for each pair of class labels.

6) Hierarchical cluster analysis of genes - Create cluster dendrogram and color image plot of all genes. For each cluster, provides a hyperlinked list of genes, and a line-plot of median expression levels within the cluster versus experiments.

A color image plot of median expression levels for each gene cluster versus experiments is also provided. The cluster analysis may be based on all data or on a user-specified subset of genes and experiments.

7) Hierarchical cluster analysis of experiments - Produces cluster dendrogram, and statistically-based cluster-specific reproducibility measures for a given cut of the cluster dendrogram.

The cluster analysis may be based on all data or on a user-specified subset of genes and experiments.

8) Interface for Cluster 3.0 and TreeView - Clustering and other analyses can now be performed using the Cluster 3.0 and TreeView software, which were originally produced by the Stanford group.

This feature is only available for academic, government and other non-profit users.

9) Multidimensional scaling of samples - Produces clickable 3-D rotating scatter-plot where each point represents an experiment, and the distance between points is proportional to the dissimilarity of expression profiles represented by those points.

10) Global test of clustering - Statistical significance tests for presence of any clustering among a set of experiments, using either the correlation or Euclidean distance metric.

This analysis is given as an option under the multidimensional scaling tool.
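
The two distance metrics mentioned above can be sketched as a pairwise dissimilarity matrix between experiments, which is the usual input to multidimensional scaling and clustering. Defining the correlation-based dissimilarity as one minus the Pearson correlation is an assumption of this sketch; the profiles are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.dist(a, b)


def correlation_dissimilarity(a, b):
    """One minus the Pearson correlation (assumed definition for this sketch)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def distance_matrix(profiles, metric):
    """Pairwise dissimilarities between experiments (one profile per experiment)."""
    return [[metric(p, q) for q in profiles] for p in profiles]


# Hypothetical log-expression profiles of three experiments over three genes.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
corr_d = distance_matrix(profiles, correlation_dissimilarity)
eucl_d = distance_matrix(profiles, euclidean)
print(corr_d[0][1], corr_d[0][2], eucl_d[0][2])
```

Under the correlation metric the first two experiments are indistinguishable, because one profile is a scaled copy of the other, while the Euclidean metric still separates them.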

11) Class comparison between groups of arrays - Uses univariate parametric and non-parametric tests to find genes that are differentially expressed between two or more phenotype classes.

This tool is designed to analyze either single-channel data or dual-channel reference design data. The class comparison analysis may also be performed on paired samples.

The tool also includes an option to analyze randomized block design experiments, i.e., to take into account the influence of one additional covariate (such as gender) while analyzing differences between classes.
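
As an illustration of a univariate parametric test for class comparison, the sketch below computes a two-sample t statistic with pooled variance for one gene. The exact form of the tests used by BRB-ArrayTools is not stated here, so this is an assumed textbook variant; the data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical log-expression values of one gene in two phenotype classes.
class_1 = [1.0, 2.0, 3.0]
class_2 = [4.0, 5.0, 6.0]
t_stat = pooled_t_statistic(class_1, class_2)
print(round(t_stat, 4))
```

In practice the statistic would be computed gene by gene and converted to a p-value, and the resulting gene list would need adjustment for the large number of genes tested.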

12) Class comparison between red and green channels - Uses univariate parametric and non-parametric tests to find genes that are differentially expressed between two phenotype classes.

This tool is designed to analyze data from a non-reference design experiment where the red and green samples represent the two distinct phenotype classes. As a special case, this tool can also analyze a reference design with one class compared with the common reference.

13) Class prediction - Constructs predictors for classifying experiments into phenotype classes based on expression levels.

Six (6) methods of prediction are used: compound covariate predictor, diagonal linear discriminant analysis, 1-nearest neighbor, 3-nearest neighbor, nearest centroid, and support vector machines.
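
Of the methods listed above, nearest centroid is the simplest to sketch. The version below uses Euclidean distance and hypothetical two-gene profiles; the function names and the choice of distance are assumptions for illustration, not the tool's implementation.

```python
import math


def fit_centroids(profiles, labels):
    """Average the expression profiles of each phenotype class."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Assign a new profile to the class with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


# Hypothetical training data: four experiments, two genes, two classes.
train_profiles = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
train_labels = ["A", "A", "B", "B"]
centroids = fit_centroids(train_profiles, train_labels)
print(predict(centroids, [1.0, 1.1]), predict(centroids, [5.0, 4.5]))
```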

14) Binary tree prediction - The multistage algorithm constructs a binary tree for classifying experiments into phenotype classes based on expression levels. Each node of the tree provides a classifier for distinguishing two (2) groups of classes.

The structure of the tree is optimized to minimize the cross-validated misclassification rate. The binary tree prediction method can be based on any of the six 'prediction methods' stated above.
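
The cross-validated misclassification rate mentioned above can be illustrated with leave-one-out cross-validation of a 1-nearest-neighbor classifier. Both the leave-one-out scheme and the Euclidean distance are assumptions of this sketch, and it does not build the binary tree itself.

```python
import math


def nearest_neighbor_label(train_profiles, train_labels, profile):
    """Label of the training profile closest to the given profile."""
    best = min(
        range(len(train_profiles)),
        key=lambda i: math.dist(train_profiles[i], profile),
    )
    return train_labels[best]


def loocv_error_rate(profiles, labels):
    """Fraction of experiments misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(profiles)):
        rest_profiles = profiles[:i] + profiles[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_label(rest_profiles, rest_labels, profiles[i])
        if predicted != labels[i]:
            errors += 1
    return errors / len(profiles)


# Hypothetical data; the last experiment sits among class A but is labeled B.
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.1]]
labels = ["A", "A", "B", "B", "B"]
error_rate = loocv_error_rate(profiles, labels)
print(error_rate)
```

Any gene selection done before classification would have to be repeated inside each cross-validation fold for the error estimate to be unbiased; the sketch skips gene selection entirely.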

15) Survival analysis - Uses Cox regression (with Efron handling of ties) to identify genes that are significantly correlated with survival. The output contains a listing of genes that were significant and hyperlinks to NCI feature reports, GenBank, NetAffx, and other genomic databases.

16) Quantitative traits analysis - Correlates gene expression with any quantitative trait of the samples. Either Spearman or Pearson correlation tests are used.
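
The two correlation measures named above can be sketched in plain Python as follows. Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values given their average rank; the expression and trait values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression of one gene and a quantitative trait in four samples.
expression = [1.0, 2.0, 3.0, 4.0]
trait = [1.0, 4.0, 9.0, 16.0]
print(pearson(expression, trait), spearman(expression, trait))
```

Because the trait increases monotonically but not linearly with expression, the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1.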

17) Gene Ontology comparison tool - Classes are compared by GO category rather than with regard to individual genes. This tool provides a list of GO categories that have more genes differentially expressed among the classes than expected by chance.
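
One common way to ask whether a category contains more differentially expressed genes than expected by chance is a hypergeometric upper-tail probability. This description does not say which test the tool applies, so the sketch below is an assumed stand-in for illustration; the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total_genes, category_size, n_significant, n_overlap):
    """P(overlap >= n_overlap) when n_significant genes are drawn at random.

    total_genes:   genes on the array
    category_size: genes annotated to the GO category
    n_significant: genes called differentially expressed
    n_overlap:     significant genes that fall in the category
    """
    denominator = comb(total_genes, n_significant)
    upper = min(category_size, n_significant)
    numerator = sum(
        comb(category_size, k) * comb(total_genes - category_size, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 genes, a category of 3, 3 significant, all 3 in the category.
p_value = hypergeom_upper_tail(10, 3, 3, 3)
print(p_value)
```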

18) Gene List comparison tool - Investigates user-defined gene-lists and selects a set of gene-lists with more genes differentially expressed among the classes than expected by chance.

19) Plug-ins - Allows users to share their own analysis tools with other users. Advanced users may create their own analysis tools using the R language, which can then be distributed to other users who have no knowledge of R.

System Requirements

The array tools package is an add-in utilizing an Excel front end.

Java Runtime Environment, version 1.2.2 or later.

R 1.8.1 or later


Manufacturer Web Site BRB-ArrayTools

Price See manufacturer web site.

G6G Abstract Number 20287

G6G Manufacturer Number 101838