Bioinformatics Solutions ZOOM

Category Cross-Omics>Next Generation Sequence Analysis/Tools

Abstract ZOOM (Zillions Of Oligos Mapped) is designed to map millions to hundreds of millions of short reads produced by next-generation sequencing technologies back to reference genomes.

Based on a newly designed multiple spaced seeds theory, ZOOM guarantees great mapping accuracy with unparalleled speed.

Both single-end and paired-end reads of various lengths, from 15 bp to 240 bp, are handled.

The best (or top N) mapping results for each mapped read are reported, subject to the user-specified maximum numbers of mismatches and insertions/deletions (indels) allowed between the read and its target positions.
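
The reporting rule above can be sketched as follows. This is a minimal illustration, not ZOOM's interface: the function name, the (position, mismatches, indels) tuple layout and the ranking by total edits are all assumptions made for the example.

```python
def best_hits(candidates, max_mismatches, max_indels, top_n=1):
    """Return up to top_n candidate hits within the user-set limits.

    candidates: iterable of (position, mismatches, indels) tuples
    (a hypothetical layout chosen for this sketch).
    """
    allowed = [
        hit for hit in candidates
        if hit[1] <= max_mismatches and hit[2] <= max_indels
    ]
    # Assumed ranking: fewest total edits first, ties broken by position.
    allowed.sort(key=lambda hit: (hit[1] + hit[2], hit[0]))
    return allowed[:top_n]


hits = [(1200, 2, 0), (530, 0, 0), (9071, 1, 1), (77, 3, 0)]
print(best_hits(hits, max_mismatches=2, max_indels=1, top_n=2))
```

Setting top_n=1 gives the single best hit; a larger value gives the top N.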

Based on the information from the mapped reads, ZOOM can reconstruct a ‘consensus sequence’ and output the coverage and heterozygote frequency of each position, which could be helpful for ChIP-Seq analysis (ChIP-Seq is used primarily to determine how ‘transcription factors’ and other chromatin-associated proteins influence phenotype-affecting mechanisms) or Single Nucleotide Polymorphism (SNP) identification.
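
A per-position report of this kind can be sketched as a simple pileup. The code below is an illustration under stated assumptions: reads are gap-free (start, sequence) pairs, and the fraction of reads disagreeing with the consensus base stands in for the heterozygote frequency. How ZOOM actually computes these values is not described here.

```python
from collections import Counter


def pileup(reads, ref_len):
    """Per-position coverage, consensus base and non-consensus fraction.

    reads: list of (start, sequence) pairs, 0-based, gap-free alignments
    (an assumption of this sketch; indels are ignored).
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < ref_len:
                columns[start + offset][base] += 1

    report = []
    for counts in columns:
        coverage = sum(counts.values())
        if coverage == 0:
            report.append((0, "N", 0.0))  # no read covers this position
            continue
        consensus, top = counts.most_common(1)[0]
        # Fraction of reads disagreeing with the consensus base.
        report.append((coverage, consensus, (coverage - top) / coverage))
    return report


reads = [(0, "ACGT"), (1, "CGTA"), (1, "CTTA")]
report = pileup(reads, 5)
for pos, row in enumerate(report):
    print(pos, *row)
```

A position with high coverage and a sizeable non-consensus fraction is a candidate SNP site.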

ZOOM supports both Illumina/Solexa and ABI SOLiD instruments. For Illumina/Solexa data, quality scores are used to reduce ambiguity of read mapping. For ABI SOLiD data, ZOOM differentiates true polymorphism from sequencing errors.

After error correction, all color space reads mapped to the reference sequence are decoded to base space, with both true polymorphisms and sequencing errors marked.

Methods --

Several techniques were used to improve the speed while maintaining high sensitivity. The ‘spaced seed’ method was developed to speed up the DNA similarity search in the PatternHunter software (Bioinformatics 18: 440-445, 2002).

A spaced seed is a given pattern such as 11*1**11**1*1*1111. The number of 1-positions is called the weight of the seed.
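
To make the definition concrete, the sketch below computes the weight of the example pattern and tests whether two windows produce a hit. The function names are illustrative only and are not taken from PatternHunter or ZOOM.

```python
SEED = "11*1**11**1*1*1111"  # example pattern quoted in the text


def seed_weight(seed):
    """Number of 1-positions, i.e. positions that must match."""
    return seed.count("1")


def seed_hits(seed, a, b):
    """True if a and b agree at every 1-position of the seed.

    a and b are windows of the same length as the seed; '*' positions
    are "don't care" and may match or mismatch.
    """
    if not (len(a) == len(b) == len(seed)):
        raise ValueError("windows must have the seed's length")
    return all(x == y for s, x, y in zip(seed, a, b) if s == "1")


a = "ACGTACGTACGTACGTAC"
b = a[:2] + "T" + a[3:]  # mismatch under a '*': still a hit
c = "T" + a[1:]          # mismatch under a '1': no hit

print(seed_weight(SEED), len(SEED))
print(seed_hits(SEED, a, b), seed_hits(SEED, a, c))
```

The example seed spans 18 positions but requires matches at only 11 of them.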

Different spaced seeds have different hit probabilities in a randomly sampled similarity. In PatternHunter (see G6G Abstract Number 20379), one or several optimized spaced seeds are determined.
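
The hit probability of a seed can be computed exactly for short regions by enumeration. The sketch below assumes the usual simplified model in which each position of the similarity matches independently with probability p. It is a brute-force illustration, not the optimization procedure used by PatternHunter.

```python
from itertools import product


def hit_probability(seed, region_len, p):
    """Exact probability that the seed hits a random similarity region.

    Each position of the region matches independently with probability p
    (a simplifying model, assumed here). Enumerates all 2**region_len
    match/mismatch patterns, so keep region_len small (say, under 20).
    """
    ones = [i for i, s in enumerate(seed) if s == "1"]
    total = 0.0
    for pattern in product((True, False), repeat=region_len):
        hit = any(
            all(pattern[start + i] for i in ones)
            for start in range(region_len - len(seed) + 1)
        )
        if hit:
            matches = sum(pattern)
            total += p ** matches * (1 - p) ** (region_len - matches)
    return total


# Two weight-4 seeds on a 14-position region at 70% similarity.
for seed in ("1111", "11*11"):
    print(seed, round(hit_probability(seed, 14, 0.7), 4))
```

Running the loop with other seeds of equal weight shows how the choice of pattern changes the probability.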

To find all high-scoring local alignments between two long DNA sequences, PatternHunter first finds all the hits and then extends the alignment around each hit.

This saves computing time on most of the low-scoring local alignments, because they usually do not produce a hit.

Consequently, the speed is greatly improved. The main difference between PatternHunter and BLAST is that BLAST uses a consecutive seed (one with no * positions), which results in lower sensitivity.
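
The hit-finding stage can be sketched with a hash index keyed on the characters under the 1-positions. The code covers hit finding only (the extension step is omitted), and its names and sequences are made up for illustration. It does not reproduce the PatternHunter or BLAST implementations.

```python
from collections import defaultdict


def spaced_key(seed, text, start):
    """Characters of text under the seed's 1-positions, from start."""
    return "".join(text[start + i] for i, s in enumerate(seed) if s == "1")


def find_hits(seed, reference, query):
    """All (ref_pos, query_pos) pairs at which the seed hits."""
    span = len(seed)
    index = defaultdict(list)
    for r in range(len(reference) - span + 1):
        index[spaced_key(seed, reference, r)].append(r)
    hits = []
    for q in range(len(query) - span + 1):
        for r in index.get(spaced_key(seed, query, q), []):
            hits.append((r, q))
    return hits


reference = "TTGACCTAGGCATT"
query = "GACCAAGGC"  # reference[2:11] with one substitution (T to A)

print(find_hits("11111", reference, query))   # consecutive, weight 5
print(find_hits("111*11", reference, query))  # spaced, weight 5
```

In this hand-picked example the single substitution breaks every consecutive 5-mer shared by the two sequences, while the spaced seed of the same weight still hits because the substitution falls under its * position. One example does not establish sensitivity in general.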

Researchers have extended the spaced seed strategy to short-read mapping and optimized it specifically for this new application area. Low memory consumption and high throughput are the two (2) main goals of ZOOM.

The new improvements in the spaced seed design guarantee ZOOM’s high speed and 100% sensitivity for a wide range of read lengths and mismatch numbers.
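
What 100% sensitivity means for a seed set can be checked by brute force on small cases. The sketch below assumes that a seed may be placed at any offset within the read, and it tests every placement of mismatches. It is a verification aid written for this example, not the seed design method used by ZOOM.

```python
from itertools import combinations


def is_fully_sensitive(seeds, read_len, max_mismatches):
    """Check by brute force that every mismatch pattern is hit.

    Returns True if, for every way of placing max_mismatches mismatches
    in a read of length read_len, at least one seed at some offset has
    all of its 1-positions on matching bases. Patterns with fewer
    mismatches are then covered automatically.
    """
    placements = []
    for seed in seeds:
        ones = [i for i, s in enumerate(seed) if s == "1"]
        for start in range(read_len - len(seed) + 1):
            placements.append(frozenset(start + i for i in ones))
    for mismatches in combinations(range(read_len), max_mismatches):
        bad = set(mismatches)
        if not any(bad.isdisjoint(p) for p in placements):
            return False
    return True


print(is_fully_sensitive(["1111"], read_len=12, max_mismatches=2))
print(is_fully_sensitive(["11111"], read_len=12, max_mismatches=2))
```

The first call succeeds by a pigeonhole argument: two mismatches split the 10 matching positions into at most three runs, so one run has at least four matches. The second fails for mismatches at positions 3 and 7, which leave no run of five.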

ZOOM supports the mapping of paired-end reads. The mapping information for a pair is reported and collected only when the mapping distance between the two reads falls within a set range.
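
The distance rule can be sketched as a filter over the candidate positions of the two mates. The function name, the distance bounds and the positions are illustrative assumptions, and strand orientation is ignored.

```python
def concordant_pairs(hits1, hits2, min_dist, max_dist):
    """Mate position pairs whose separation lies in [min_dist, max_dist].

    hits1, hits2: candidate mapping positions for the two mates on the
    same reference sequence (strand handling is omitted in this sketch).
    """
    return [
        (p1, p2)
        for p1 in hits1
        for p2 in hits2
        if min_dist <= abs(p2 - p1) <= max_dist
    ]


mate1 = [1_000, 52_300, 910_000]
mate2 = [1_250, 400_000]
print(concordant_pairs(mate1, mate2, min_dist=150, max_dist=350))
```

Each mate has several candidate positions on its own, but only one combination satisfies the distance constraint, which is how pairing resolves ambiguous mappings.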

Experiments show that the ‘paired information’ helps to identify the true mapping positions and contributes significantly to mapping accuracy.

ZOOM also utilizes the quality scores of Illumina/Solexa reads. Low-quality Illumina/Solexa reads are recognized, and reads are mapped relying only on high-quality bases.
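
One simple way to rely only on high-quality bases is to ignore low-quality positions when counting mismatches. The cutoff of 20 and the function name below are assumptions made for this sketch. The way ZOOM actually weighs quality scores is not specified here.

```python
def high_quality_mismatches(read, quals, ref_window, min_qual=20):
    """Count mismatches, ignoring read bases below min_qual.

    quals: per-base integer quality scores, already decoded from the
    instrument's encoding. min_qual=20 is an arbitrary example cutoff.
    """
    if not (len(read) == len(quals) == len(ref_window)):
        raise ValueError("read, quals and ref_window must be equal length")
    return sum(
        1
        for base, q, ref in zip(read, quals, ref_window)
        if q >= min_qual and base != ref
    )


read = "ACGTTA"
quals = [30, 30, 8, 30, 30, 5]
ref_window = "ACCTTG"
print(high_quality_mismatches(read, quals, ref_window))
print(high_quality_mismatches(read, quals, ref_window, min_qual=0))
```

Both disagreements in the example sit on low-quality bases, so the read counts as a perfect match at the default cutoff and as a two-mismatch hit when quality is ignored.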

For ABI SOLiD data, sequencing errors in color space can be corrected; polymorphisms in base space and sequencing errors in color space are marked separately.
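
The distinction rests on a property of two-base (color space) encoding: a single-base polymorphism changes two adjacent colors in a consistent way, whereas an isolated color change points to a sequencing error. The sketch below uses the standard encoding (A=0, C=1, G=2, T=3, color = XOR of neighbouring bases). The classification function is a simplified illustration with assumed names, not ZOOM's error-correction procedure.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colors(bases):
    """Color string for a base sequence: one color per adjacent pair."""
    return "".join(str(CODE[x] ^ CODE[y]) for x, y in zip(bases, bases[1:]))


def decode_colors(first_base, colors):
    """Base sequence from a known first base and a color string."""
    seq = [first_base]
    for c in colors:
        seq.append(BASE[CODE[seq[-1]] ^ int(c)])
    return "".join(seq)


def classify_color_mismatches(ref_colors, read_colors):
    """Label mismatching color positions as 'snp' or 'error'.

    Two adjacent mismatches whose XOR equals that of the reference
    colors are consistent with one changed base; anything else is
    treated as a sequencing error (a simplification).
    """
    labels, i, n = {}, 0, len(ref_colors)
    while i < n:
        if ref_colors[i] == read_colors[i]:
            i += 1
        elif (i + 1 < n and ref_colors[i + 1] != read_colors[i + 1]
              and int(ref_colors[i]) ^ int(ref_colors[i + 1])
              == int(read_colors[i]) ^ int(read_colors[i + 1])):
            labels[i] = labels[i + 1] = "snp"
            i += 2
        else:
            labels[i] = "error"
            i += 1
    return labels


ref_colors = encode_colors("ATGGAC")
snp_colors = encode_colors("ATCGAC")           # one base changed (G to C)
read_colors = snp_colors[:4] + "3"             # plus one bad last color
print(ref_colors, read_colors)
print(classify_color_mismatches(ref_colors, read_colors))
print(decode_colors("A", ref_colors))
```

Decoding needs the first base to be known, because each color only records the relation between two neighbouring bases.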

ZOOM's features/capabilities include:

ZOOM is fast - A single CPU with only 6.5 GB of memory is capable of mapping 15X coverage of a human genome in one day using ZOOM at full sensitivity while tolerating two (2) mismatches.

ZOOM is accurate - The spaced seed strategy, specially extended for the short-read mapping problem, guarantees 100% sensitivity for a wide range of read lengths and mismatch numbers. Tests on benchmarks also show great accuracy in the presence of insertions and deletions.

ZOOM is flexible --

1) Supports Illumina/Solexa and ABI SOLiD instruments.

2) Easily maps reads 15 to 240 bp long.

3) Handles both mismatches and insertions/deletions.

4) Supports paired-end read mapping.

5) Assesses alignment probability using read sequence quality scores.

6) Reports uniquely mapping results or best N mapping results for each read.

7) Offers integrated multiple sequence alignment for consensus reconstruction and SNP identification.

8) Offers coverage and heterozygosity information for each position of the consensus sequence reported.

9) Automatically detects and corrects sequencing errors for ABI SOLiD data.

10) Decodes reads in color space into base space, with sequencing errors and polymorphisms highlighted.

ZOOM is scalable - Time complexity increases approximately linearly with genome length and the number of reads.

False positives - ZOOM increases the number of uniquely mapped reads by using quality scores and paired-end statistics. This reduces the ambiguity of read mapping and increases the likelihood of identification.

System Requirements



Manufacturer Web Site Bioinformatics Solutions ZOOM

Price Contact manufacturer.

G6G Abstract Number 20378

G6G Manufacturer Number 100432