HarvESTer Analysis System

Category Cross-Omics>Next Generation Sequence Analysis/Tools

Abstract The HarvESTer™ Analysis System is a sophisticated high-throughput system for the analysis of sequence tags.

The system performs clustering and assembly of sequence tags to obtain consensus sequences. Additionally, heterozygote positions, insertions, deletions and single nucleotide polymorphisms (SNPs) are identified in the assembled sequence data.

Researchers today have access to a range of sequencing techniques, from Sanger sequencing to next-generation sequencing (e.g., 454 Life Sciences).

Due to the massive amount and diversity of data, sequence clustering projects require customized schemes and varying computational power.

The HarvESTer system meets these needs by coupling optimized algorithms with intelligent parallelization and highly flexible process configurations.

The software's flexible modules are designed to process sequence data that is added incrementally.

The HarvESTer system processes sequence data efficiently. With it you can:

a) Analyze sequences for the development of diagnostic and molecular markers.

b) Detect SNPs and genotypes for plant breeding.

c) Perform clustering and assembly of Expressed Sequence Tags (ESTs) to obtain the expressed gene set of an organism.

d) Customize your analysis workflow with flexible processing modules.

e) Keep up with ongoing projects by incrementally adding new sequences.

f) Get a fast, clear overview of your projects through the statistical report.

1) Build the processing pipeline --

The HarvESTer system has several modules that can be organized into a customized processing pipeline.

The pre-processing modules identify and eliminate identical, low-quality, vector and poly-A sequences, as well as contaminants and repeats.

The main processing modules cluster and assemble the sequences.

The post-processing modules detect heterozygote positions and SNPs.
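The idea of chaining modules into a customized pipeline can be sketched in a few lines of Python. This is only an illustration of the concept, not the HarvESTer implementation: the function names (remove_identical, trim_poly_a, drop_short, run_pipeline) and all thresholds are hypothetical, and only simplified pre-processing steps are shown.

```python
from typing import Callable, Dict, List

Reads = Dict[str, str]            # read id -> nucleotide sequence
Module = Callable[[Reads], Reads]  # every module maps reads to reads


def remove_identical(reads: Reads) -> Reads:
    """Keep only the first read seen for each distinct sequence."""
    seen = set()
    kept = {}
    for read_id, seq in reads.items():
        if seq not in seen:
            seen.add(seq)
            kept[read_id] = seq
    return kept


def trim_poly_a(reads: Reads, min_run: int = 8) -> Reads:
    """Strip a trailing run of A's when it is at least min_run bases long."""
    trimmed = {}
    for read_id, seq in reads.items():
        body = seq.rstrip("A")
        if len(seq) - len(body) >= min_run:
            seq = body
        trimmed[read_id] = seq
    return trimmed


def drop_short(reads: Reads, min_length: int = 10) -> Reads:
    """Discard reads that are shorter than min_length after trimming."""
    return {rid: seq for rid, seq in reads.items() if len(seq) >= min_length}


def run_pipeline(reads: Reads, modules: List[Module]) -> Reads:
    """Apply the modules in the order given (hypothetical pipeline runner)."""
    for module in modules:
        reads = module(reads)
    return reads


example_reads = {
    "r1": "ACGTACGTACGTAAAAAAAAAA",
    "r2": "ACGTACGTACGTAAAAAAAAAA",  # identical to r1
    "r3": "ACGTAAAAAAAAAA",          # too short once the poly-A tail is removed
    "r4": "GGCCTTAAGGCCTTAAGG",
}
cleaned = run_pipeline(example_reads, [remove_identical, trim_poly_a, drop_short])
```

Because each module takes and returns the same data structure, modules can be reordered, omitted or replaced without changing the runner, which is the property a configurable pipeline relies on.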

2) Mix data from different sources --

ABI™ trace files and Standard Chromatogram Format (SCF) files can be uploaded together with Standard Flowgram Format (SFF) files from 454 Life Sciences. These file types can be assembled and visualized in one analysis.

3) Incorporate pre-assembled data --

Pre-assembled data in Phred/Phrap, Pileup (MSF), CAP3 or FASTA formats can be loaded into the HarvESTer system for further analysis (SNP and heterozygote detection).
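As a rough illustration of what SNP detection on pre-assembled data involves, the following Python sketch reads an aligned FASTA text and reports alignment columns in which a second base is supported by several reads. It is a simplified stand-in, not the algorithm used by HarvESTer: the functions parse_fasta and candidate_snps, the min_minor_count threshold and the example alignment are all assumptions made for this example, and quality values are ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a mapping of record name -> sequence."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def candidate_snps(
    aligned: Dict[str, str], min_minor_count: int = 2
) -> List[Tuple[int, str, str]]:
    """Return (zero-based column, major base, minor base) for each column
    where the second most common base occurs at least min_minor_count times.
    Gaps and ambiguity codes are ignored."""
    rows = list(aligned.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for pos in range(length):
        counts = Counter(row[pos] for row in rows if row[pos] in "ACGT")
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            hits.append((pos, ranked[0][0], ranked[1][0]))
    return hits


alignment_text = """
>read1
ACGTTGCA
>read2
ACGTTGCA
>read3
ACGTTGCA
>read4
ACATTGCA
>read5
ACATTGCA
>read6
AC-TTGCT
"""
snps = candidate_snps(parse_fasta(alignment_text))
```

In this example the G/A difference in the third column is supported by two reads and is reported, while the single T at the end of read6 falls below the threshold and is treated as a likely sequencing error.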

4) Navigate the results --

In the HarvESTer system, sequence changes are tracked throughout the processing pipeline.

The convenient and logical organization of clusters and specific assemblies makes rapid identification of gene families and splice variants possible.

Data are summarized in statistics that allow the user to assess the quality of the analyzed data set easily.

To accommodate the dynamic nature of sequencing projects, the HarvESTer system allows new sequences to be added incrementally to existing analyses.

Early results can be viewed while sequencing and subsequent processing continue.

A history function is available to follow changes made to clusters and assemblies as well as individual sequences.
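Incremental addition combined with a change history can be pictured with the following Python sketch. It is a toy model under stated assumptions and does not describe how HarvESTer clusters sequences: the class IncrementalClusters, the shared k-mer criterion and the value k=8 are hypothetical, and the sketch does not merge two existing clusters when a new read overlaps both.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class IncrementalClusters:
    """Toy clustering: a read joins the first cluster it shares a k-mer with."""

    def __init__(self, k: int = 8) -> None:
        self.k = k
        self.members: Dict[int, List[str]] = {}      # cluster id -> read ids
        self.index: Dict[int, Set[str]] = {}         # cluster id -> known k-mers
        self.history: List[Tuple[str, str, int]] = []  # (event, read id, cluster id)
        self._next_id = 1

    def add(self, read_id: str, seq: str) -> int:
        """Add one read to the existing analysis and record what happened."""
        read_kmers = kmers(seq, self.k)
        for cluster_id, known in self.index.items():
            if read_kmers & known:
                self.members[cluster_id].append(read_id)
                known |= read_kmers
                self.history.append(("joined", read_id, cluster_id))
                return cluster_id
        cluster_id = self._next_id
        self._next_id += 1
        self.members[cluster_id] = [read_id]
        self.index[cluster_id] = set(read_kmers)
        self.history.append(("created", read_id, cluster_id))
        return cluster_id


clusters = IncrementalClusters(k=8)
# First batch of reads
clusters.add("e1", "ACGTACGTGGCCTTAA")
clusters.add("e2", "TTTTGGGGCCCCAAAATT")
# A read that arrives later and overlaps e1
clusters.add("e3", "GTGGCCTTAACCGGTT")
```

After the later read is added, clusters.members holds e1 and e3 in one cluster and e2 in another, and clusters.history lists each change in the order it was made, so earlier results stay available while new data keeps arriving.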

5) Functional and structural annotation --

HarvESTer results are optimized for further analysis with the Pedant-Pro™ Sequence Analysis Suite (see G6G Abstract Number 20094), which provides access to diverse types of information ranging from protein function to the sequence level of trace files.

6) Technical description --

The HarvESTer system is designed for a distributed environment. Database servers, processes and user applications can reside on different machines within a local network.

The layered and open design of the system architecture allows new modules to be integrated easily.

A relational database management system (MySQL™ or Oracle®) is used to store all information from HarvESTer projects.
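To make the idea of relational project storage concrete, here is a minimal Python sketch. The HarvESTer schema is not described in this abstract, so the table and column names below are purely hypothetical, and Python's built-in sqlite3 module is used only as a self-contained stand-in for MySQL or Oracle.

```python
import sqlite3

# In-memory database as a stand-in for a MySQL or Oracle server
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per cluster, one row per sequence
conn.executescript(
    """
    CREATE TABLE cluster (
        id      INTEGER PRIMARY KEY,
        project TEXT NOT NULL
    );
    CREATE TABLE sequence (
        id         TEXT PRIMARY KEY,
        cluster_id INTEGER REFERENCES cluster(id),
        bases      TEXT NOT NULL
    );
    -- Index so that all members of a cluster can be fetched quickly
    CREATE INDEX idx_sequence_cluster ON sequence(cluster_id);
    """
)

conn.execute("INSERT INTO cluster (id, project) VALUES (?, ?)", (1, "demo"))
conn.executemany(
    "INSERT INTO sequence (id, cluster_id, bases) VALUES (?, ?, ?)",
    [("e1", 1, "ACGTACGTGGCCTTAA"), ("e3", 1, "GTGGCCTTAACCGGTT")],
)

# Retrieve the members of cluster 1
rows = conn.execute(
    "SELECT id FROM sequence WHERE cluster_id = ? ORDER BY id", (1,)
).fetchall()
member_ids = [row[0] for row in rows]
```

Keeping clusters and sequences in separate, linked tables lets servers, processing jobs and user applications on different machines work against the same project data.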

The optimized indexing structure allows efficient clustering and assembly. The easy-to-use graphical and command line user interfaces are based on Java™ technology.

System Requirements

The HarvESTer system is available for various UNIX® operating systems (e.g., Linux®) and can be integrated with other Biomax software components to build comprehensive bioinformatics solutions fully optimized to meet specific needs.

Manufacturer

Manufacturer Web Site HarvESTer Analysis System

Price Contact manufacturer.

G6G Abstract Number 20397

G6G Manufacturer Number 100421