Bioinformatics Solutions PatternHunter

Category Cross-Omics>Sequence Analysis/Tools

Abstract PatternHunter is a general-purpose homology search tool, based on innovative and proprietary technologies.

It provides all the tools necessary for a fast and sensitive homology search in all flavors including DNA-DNA, Protein-Protein, translated DNA-protein, and translated DNA-DNA searches.

It is recognized and used by leading Genomics centers around the world.

PatternHunter is fast - Working with the Mouse Genome Sequence Consortium, PatternHunter finished comparing the mouse genome with the human genome in 20 CPU-days (Nature, 420(6915):520-522. December 2002), while the Basic Local Alignment Search Tool (BLAST) needs at least 20 CPU-years to do the same comparison at the same sensitivity.

PatternHunter is sensitive - BLAST was designed to trade sensitivity for speed. With the manufacturer's proprietary ‘multiple optimal spaced seed’ technology, PatternHunter improves speed and sensitivity simultaneously (see Bioinformatics, 18(3):440-445. March 2002).

PatternHunter can even approach the sensitivity of Smith-Waterman's exhaustive dynamic programming while running 3,000 times faster.

Why PatternHunter is better --

In the 1980s, BLAST was designed to speed up the Smith-Waterman algorithm for homology search by trading sensitivity for speed. Today, BLAST and Smith-Waterman are no longer sufficient to keep up with the exponential growth of genomics data.

PatternHunter (Bioinformatics, 18(3):440-445, 2002; Journal of Bioinformatics and Computational Biology, 2004) uses modern homology search technology invented by the manufacturer's founders.

One such technology is called 'optimized multiple spaced seeds'.

With new algorithms and ideas, PatternHunter is changing the way homology search is done. One no longer needs to trade sensitivity for speed. As mentioned above, PatternHunter can approach Smith-Waterman sensitivity and yet run thousands of times faster.

Spaced seeds --

Because of the large size of databases and queries, comparing each position in a query with each position in the database, as in the Smith-Waterman algorithm, is too computationally intensive.

For better speed, heuristic methods have been used in the homology search.

One heuristic method, as in BLAST, uses a short, ‘continuous sequence of letters’ as a "seed". An exact match of this seed is a hint that there may be a longer match surrounding it.

Hence, BLAST only tries to find homologies in those regions with hits.
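The consecutive-seed idea can be illustrated with a minimal sketch. This is not BLAST's actual implementation; the function name find_seed_hits and the default seed length k are assumptions made for illustration. The sketch indexes every length-k word of the database, then reports each position pair where the query contains exactly the same word. Only the regions around such pairs would be examined further.

```python
from collections import defaultdict


def find_seed_hits(query, database, k=11):
    """Return (query_pos, database_pos) pairs where k consecutive letters match exactly."""
    # Index every length-k word of the database by its start position.
    index = defaultdict(list)
    for j in range(len(database) - k + 1):
        index[database[j:j + k]].append(j)

    # Look up every length-k word of the query in the index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences with a short seed so the result is easy to check by eye.
    print(find_seed_hits("ACGT", "TTACGTAA", k=4))
```

Each reported pair is only a hint; a real search tool would then try to extend an alignment around it.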

PatternHunter also uses seeds; the difference being that PatternHunter uses a 'discontinuous sequence of letters' as its seeds.

By adjusting the relative positions of letters in a discontinuous sequence, one can optimize the seed to increase sensitivity.

The relative positions of the letters are denoted by a 0-1 string.

For example, in the seed model "111010010100110111", a "1" means the letter at that position is required to match, and a "0" means the letter at that position is not required to match. The number of 1s is called the weight of the seed.
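The 0-1 notation can be made concrete with a short sketch. The function names seed_weight and spaced_seed_hits are hypothetical, and the sketch assumes two sequences that are already lined up position by position without gaps. It reports every offset at which all positions marked "1" in the seed agree, ignoring the positions marked "0".

```python
def seed_weight(seed):
    """Number of required-match positions (the 1s) in a 0-1 seed model."""
    return seed.count("1")


def spaced_seed_hits(seed, a, b):
    """Offsets at which the seed hits the ungapped alignment of a and b."""
    required = [i for i, c in enumerate(seed) if c == "1"]
    n = min(len(a), len(b))
    return [
        p
        for p in range(n - len(seed) + 1)
        if all(a[p + i] == b[p + i] for i in required)
    ]


if __name__ == "__main__":
    seed = "111010010100110111"
    a = "ACGTACGTACGTACGTAC"
    # b differs from a only at index 3, which is a "0" position of the seed.
    b = "ACGAACGTACGTACGTAC"
    print(seed_weight(seed), spaced_seed_hits(seed, a, b))
```

A mismatch at a "0" position leaves the hit intact, while a mismatch at any "1" position removes it.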

Why spaced seeds are better than consecutive seeds --

There are two (2) factors that affect the performance of a seed: the selectivity and the sensitivity.

Selectivity determines the search speed - more required matches (more 1's) in a seed means fewer hits, and a faster search.

Sensitivity determines the search quality - not all homologies can be hit by a given seed.

For example, the seed 11111111111 cannot hit an alignment that contains no run of 11 consecutive matching letters. We want to optimize the seed, so that, on average, the number of homologies hit by the seed is maximized.

Two seeds with the same weight will generate approximately the same number of hits (Bioinformatics, 18(3):440-445, 2002). That is to say, a ‘spaced seed’ and a ‘consecutive seed’ with the same weight will have very similar selectivity.

However, the spaced seed will have better sensitivity. This is because, when a consecutive seed finds a hit, a second hit at the next position of the homology is very likely -- it requires only one more letter match.

The second hit is redundant because only one hit is required to find the homology.

The hits of a spaced seed at neighboring positions are more independent of each other, so it is less likely that several hits fall within the same homology. Therefore, with approximately the same number of hits, a spaced seed will detect more homologies.
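The sensitivity argument can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the manufacturer's method: each alignment is modeled as a row of independent match/mismatch positions, and the alignment length (64), the match probability (0.7), the trial count, and the function names are all chosen for illustration. Both seeds have weight 11 and are tested on the same random alignments.

```python
import random


def seed_hits_alignment(seed, matches):
    """True if the seed hits anywhere in a list of match/mismatch booleans."""
    required = [i for i, c in enumerate(seed) if c == "1"]
    last_start = len(matches) - len(seed)
    return any(
        all(matches[p + i] for i in required) for p in range(last_start + 1)
    )


def estimate_sensitivity(seed, length=64, identity=0.7, trials=2000, rng_seed=0):
    """Fraction of random alignments that the seed hits at least once."""
    rng = random.Random(rng_seed)  # fixed seed: every call sees the same alignments
    detected = 0
    for _ in range(trials):
        matches = [rng.random() < identity for _ in range(length)]
        if seed_hits_alignment(seed, matches):
            detected += 1
    return detected / trials


if __name__ == "__main__":
    spaced = estimate_sensitivity("111010010100110111")
    consecutive = estimate_sensitivity("11111111111")
    print(f"spaced: {spaced:.3f}  consecutive: {consecutive:.3f}")
```

Under these assumptions the spaced seed should detect a noticeably larger fraction of the alignments than the consecutive seed of the same weight.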

The multiple seed technique and why it increases sensitivity --

As explained before, any given seed may fail to detect some homologies.

Because different seeds tend to fail at different homologies, using several different seeds simultaneously can significantly improve the success rate.

However, it is very important to optimize the combination of the multiple seeds, so that their detection ability is complementary to each other.
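The effect of combining seeds can be sketched in the same simulated setting. Everything here is an assumption made for illustration: the independent match/mismatch model, the alignment length and match probability, the function names, and in particular the second seed, which is an arbitrary weight-11 pattern and not one of PatternHunter's optimized seeds. A homology counts as detected when at least one seed in the set hits it.

```python
import random


def seed_hits_alignment(seed, matches):
    """True if the seed hits anywhere in a list of match/mismatch booleans."""
    required = [i for i, c in enumerate(seed) if c == "1"]
    last_start = len(matches) - len(seed)
    return any(
        all(matches[p + i] for i in required) for p in range(last_start + 1)
    )


def multi_seed_sensitivity(seeds, length=64, identity=0.7, trials=2000, rng_seed=0):
    """Fraction of random alignments hit by at least one seed in the set."""
    rng = random.Random(rng_seed)  # fixed seed: every call sees the same alignments
    detected = 0
    for _ in range(trials):
        matches = [rng.random() < identity for _ in range(length)]
        if any(seed_hits_alignment(s, matches) for s in seeds):
            detected += 1
    return detected / trials


SEED_A = "111010010100110111"  # seed model quoted in the text
SEED_B = "11011000110101111"   # hypothetical second seed, not optimized

if __name__ == "__main__":
    print("A alone: ", multi_seed_sensitivity([SEED_A]))
    print("B alone: ", multi_seed_sensitivity([SEED_B]))
    print("A and B: ", multi_seed_sensitivity([SEED_A, SEED_B]))
```

Because the combined set detects every alignment that either seed detects alone, its sensitivity can never be lower than that of a single seed. How much it gains depends on how complementary the seeds are, which is why the combination needs to be optimized; each extra seed also produces extra hits to process.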

System Requirements

The choice of Java Virtual Machine (VM) can change PatternHunter's performance significantly. Some old Java VMs without JIT or HotSpot may slow execution by a factor of 10. Make sure that you have an up-to-date Java VM installed on your system.

Manufacturer

Manufacturer Web Site Bioinformatics Solutions PatternHunter

Price Contact manufacturer.

G6G Abstract Number 20379

G6G Manufacturer Number 100432