HHpred

Category Proteomics>Protein Structure/Modeling Systems/Tools

Abstract HHpred is a fast server for remote protein homology detection and structure prediction and is one of the first software products to implement pairwise comparison of profile Hidden Markov Models (HMMs).

It allows you to search a wide choice of databases, such as the PDB, SCOP, PFAM, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input.

Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring of secondary structure similarity.

HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, and 3D structural models that the MODELLER software calculates from these alignments.

Note: A detailed help facility is also available.

Why was HHpred developed?

The primary aim in developing HHpred was to provide biologists with a method for sequence database searching and structure prediction that is as easy to use as BLAST or PSI-BLAST and that is at the same time much more sensitive in finding remote homologs.

In fact, HHpred’s sensitivity is competitive with the most advanced servers for structure prediction currently available.

As noted above, HHpred is one of the first servers based on the pairwise comparison of profile hidden Markov models (HMMs).

Whereas most conventional sequence search methods search sequence databases such as UniProt or the non-redundant (NR) protein sequence database, HHpred searches alignment databases, like PFAM or SMART.

This greatly simplifies the list of hits to a number of sequence families instead of a clutter of single sequences. All major publicly available profile and alignment databases are available through HHpred.

When can HHpred be useful for you?

It is well known that sequence search methods such as the Basic Local Alignment Search Tool (BLAST), FASTA, or PSI-BLAST are of prime importance for biological research because functional information of a protein or gene can be inferred from homologous proteins or genes identified in a sequence search.

But quite often no significant relationship to a protein of known function can be established. This is certainly the case for the most interesting group of proteins, those for which no ortholog has yet been studied.

It is less well known that in cases where conventional sequence search methods fail, the recently developed, highly sensitive methods for homology detection or structure prediction quite often allow you to make inferences from more remotely homologous relationships.

If the relationship is so remote that no common function can be assumed, one can generally still derive hypotheses about possible mechanisms, active site positions and residues, or the class of substrate bound.

When a homologous protein with known structure can be identified, its structure can be used as a template to model the 3D structure for the protein of interest, since even remotely homologous proteins generally have quite a similar 3D structure.

The 3D model may then help you to generate hypotheses to guide your experiments.

What is HMM-HMM comparison and why is it so advanced?

When searching for remote homologs, it is wise to make use of as much information about the query and database proteins as possible in order to better distinguish true from false positives and to produce optimal alignments.

This is the reason why sequence-sequence comparison is inferior to profile-sequence comparison.

Sequence profiles contain, for each column of a multiple alignment, the frequencies of the 20 amino acids. They therefore contain detailed information about the conservation of each residue position, i.e. how important each position is for defining other members of the protein family, and about the preferred amino acids.
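The idea of a sequence profile can be illustrated with a minimal Python sketch. It is not HHpred's own code: the function name `column_frequencies` and the toy alignment are hypothetical, and the sketch simply counts residues per column, whereas real profile builders typically also apply sequence weighting and pseudocounts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_frequencies(alignment):
    """Return one dict per alignment column mapping amino acid -> relative frequency.

    Gaps ('-') and non-standard symbols are ignored when counting.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical toy alignment of four sequences, four columns each.
toy_alignment = ["MKVL", "MKIL", "MRVL", "M-VL"]
profile = column_frequencies(toy_alignment)
```

In this toy case the first column is fully conserved (only M), while the second column shows a preference for K over R, which is the kind of per-position information a profile carries and a single sequence does not.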

Profile Hidden Markov Models (HMMs) are similar to simple sequence profiles, but in addition to the amino acid frequencies in the columns of a multiple sequence alignment they contain information about the frequency of inserts and deletions at each column.

Using profile HMMs in place of simple sequence profiles should therefore further improve sensitivity.
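The extra information a profile HMM carries can be sketched as per-column insertion and deletion frequencies. The following is a simplified illustration, not HHpred's implementation: the name `gap_statistics` is hypothetical, and the sketch assumes that columns where the first (query) sequence has a residue are match columns while all other columns are insert columns. A real profile HMM stores transition probabilities between match, insert and delete states rather than these raw fractions.

```python
def gap_statistics(alignment):
    """Per match column, return the fraction of sequences with a deletion
    and the fraction with an insertion before the next match column.

    Assumption for this sketch: columns where the first sequence has a
    residue are match columns; all other columns are insert columns.
    """
    if not alignment:
        return []
    query = alignment[0]
    if any(len(seq) != len(query) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n = len(alignment)
    stats = []      # one entry per match column
    inserters = []  # per match column: sequences that insert after it
    for col, symbol in enumerate(query):
        if symbol == "-":
            # Insert column: credit it to the preceding match column.
            # Insert columns before the first match column are skipped.
            if inserters:
                inserters[-1].update(
                    i for i, seq in enumerate(alignment) if seq[col] != "-"
                )
            continue
        deletions = sum(1 for seq in alignment if seq[col] == "-")
        stats.append({"column": col, "delete_freq": deletions / n})
        inserters.append(set())

    for entry, who in zip(stats, inserters):
        entry["insert_freq"] = len(who) / n
    return stats


# Hypothetical toy alignment; the first sequence plays the role of the query.
toy_alignment = ["MK-VL", "MKAVL", "M--VL", "MKGIL"]
gap_stats = gap_statistics(toy_alignment)
```

Here the second match column records that one sequence in four deletes the residue and that two in four insert a residue after it, information that a plain table of amino acid frequencies would discard.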

HHpred is one of the first servers to employ HMM-HMM comparison, based on a novel statistical method that the manufacturers have recently developed.

Using HMMs both on the query and the database side greatly enhances the sensitivity and selectivity over sequence-profile based methods such as PSI-BLAST (Position Specific Iterated BLAST).
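The text does not spell out how two HMM columns are compared. As an assumption, the sketch below uses a log-sum-of-odds column score of the kind described in the profile-profile comparison literature: the co-emission probability of two columns divided by what the background frequencies would give by chance. The function name, the four-letter toy alphabet and all numbers are hypothetical and chosen only for brevity; a real comparison uses all 20 amino acids and also scores insertions and deletions.

```python
import math


def column_score(q, p, background):
    """Log-sum-of-odds score (in bits) between two profile columns.

    q, p and background map each amino acid to a probability.
    Assumed scoring form for illustration, not taken from the text above.
    """
    odds = sum(q[a] * p[a] / background[a] for a in background)
    if odds <= 0.0:
        raise ValueError("columns share no amino acid; add pseudocounts first")
    return math.log2(odds)


# Toy four-letter alphabet with uniform background frequencies.
alphabet = "ACDE"
background = {a: 0.25 for a in alphabet}
conserved_a = {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1}
conserved_d = {"A": 0.1, "C": 0.1, "D": 0.7, "E": 0.1}
uniform = dict(background)

same = column_score(conserved_a, conserved_a, background)
different = column_score(conserved_a, conserved_d, background)
neutral = column_score(uniform, uniform, background)
```

Under these assumptions, two columns that prefer the same amino acid score above zero, columns that prefer different amino acids score below zero, and uninformative columns score zero, so information on both the query side and the database side contributes to the score.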

Structure prediction with HHpred --

The most successful techniques for protein structure prediction rely on identifying homologous sequences with a known structure to be used as a template.

This works well because structures diverge much more slowly than sequences and homologous proteins may have very similar structures even when their sequences have diverged beyond recognition.

But sensitivity in homology detection is crucial for success since many proteins have only remote relatives in the structure database.

Most publicly available alignment databases can be searched --

A large number of publicly available (partly redundant) and manually or automatically annotated databases of protein family alignments can be searched through the HHpred webserver, such as:

PFAM (multiple sequence alignments and HMM profiles of protein domains);

SMART (identification and annotation of domains from signaling and extracellular protein sequences);

PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System;

TIGRFAMs (protein families based on Hidden Markov Models or HMMs);

Protein Information Resource SuperFamily (PIRSF); and

Clusters of Orthologous Groups of proteins (COGs)/KOG.

Two (2) composite databases in HHpred include most of these original databases. The first is InterPro, from the European Bioinformatics Institute (EBI), an integrated database of predictive protein “signatures” used for the classification and automatic annotation of proteins and genomes.

The second is the Conserved Domain Database (CDD), from the National Center for Biotechnology Information (NCBI), a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

A number of alignment databases are built around sequences of known structure, generally by starting with such a sequence as a seed and searching for sequence homologs.

The manufacturers have built two (2) such databases, one using full-length sequences from the Protein Data Bank (PDB) as seeds, and another database that uses PDB sequences cut into structural and evolutionary domains as defined by the Structural Classification of Proteins (SCOP) database.

These two (2) databases are referred to simply as pdb70 and scop70 in HHpred.

An alignment database very much like scop70 is SUPERFAMILY, a database of structural and functional annotation for all proteins and genomes. The SUPERFAMILY annotation is based on a collection of Hidden Markov Models that represent structural protein domains at the SCOP superfamily level.

CATH/Gene3D, in contrast, is based on CATH, a database for hierarchical classification of structural domains that is similar to SCOP. Gene3D extends the CATH superfamilies to sequenced genomes and the major protein sequence repositories (i.e. UniProt) through the generation of a set of statistical models (Hidden Markov Models or HMMs) for each superfamily.

Note: Open-source HH-suite 2.0 is now available (February 2012) with new functionality and better performance according to the manufacturers.

System Requirements

Web-based; contact manufacturer.

Manufacturer

Manufacturer Web Site HHpred

Price Contact manufacturer.

G6G Abstract Number 20717

G6G Manufacturer Number 104286