Structure ALignment And Match Inquiry (SALAMI)

Category Proteomics>Protein Structure/Modeling Systems/Tools

Abstract SALAMI is a protein structure search server.

Given the coordinates of a protein chain, the SALAMI server will search the Protein Data Bank (PDB) and return a set of similar structures without using sequence information.

The results page lists the related proteins, details of the sequence and structure similarity, and the implied sequence alignments.

Via a simple structure viewer, one can view superpositions of the query and library structures and download the superimposed coordinates.

The alignment method is very tolerant of large gaps and insertions and tends to produce slightly longer alignments than other similar systems.

Purpose of SALAMI --

Sequence similarity is the classic measure for finding related proteins and the starting point for assigning function, building phylogenies and protein modeling. Sequence similarity will Not, however, be enough to detect remote relationships.

For this, one needs methods that detect pure structural similarity. Given the coordinates of a protein chain, the SALAMI server will search the Protein Data Bank for similar chains, calculate structural alignments, and generate a list of structurally related proteins.

In some sense, structure is preserved more than sequence during evolution, so even within a family of related proteins, there may be members with No significant sequence similarity to one another.

This means that questions of function or phylogenetic relations will often only be answerable given structural relationships. Furthermore, there is the question of alignment quality.

In the case of weak sequence similarity, the alignment implied by a structural superposition should be more reliable and more useful for problems such as predicting functional sites.

SALAMI Methods --

The following is a short summary of the underlying methods.

Protein Library -

Every week, the manufacturers download the “90 % sequence identity” list from the Protein Data Bank.

No two (2) proteins in the list have more than 90 % sequence identity. This means that your favorite protein may Not appear. If you expect to find 4pti, it is Not there, but 1bth is.

Despite the different names, the two (2) proteins are almost identical.

Calculating alignments -

The method is based on fragments of 6 residues. Some time ago, a set of 1.5 × 10^6 fragments was taken from the Protein Data Bank and classified, based on backbone dihedral angles, into a bit more than 200 probabilistic classes.

If you have a protein fragment, you can now calculate its probability of being in each of these classes. You could also say that you can take any fragment and represent it as a vector of these probabilities.

You can then take any two fragments and say how similar they are by taking the dot product of these vectors. If a fragment is some typical structure, it has a normal vector with most of the probability in one or two elements.

If a fragment is very unusual, it may have a probability distributed over a few more elements, but it will still be most similar to fragments with similar patterns of probability.
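The idea can be sketched in a few lines of Python. This is only an illustration: the five-class vectors and the function name fragment_similarity are made up for the example, whereas the server works with a bit more than 200 classes and its own code is not shown here.

```python
def fragment_similarity(p, q):
    """Dot product of two class-probability vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of classes")
    return sum(a * b for a, b in zip(p, q))


# Hypothetical probabilities over 5 classes (toy values, not real data).
helix_like = [0.9, 0.1, 0.0, 0.0, 0.0]    # typical: mass in one or two classes
helix_like_2 = [0.8, 0.2, 0.0, 0.0, 0.0]
strand_like = [0.0, 0.0, 0.1, 0.9, 0.0]
unusual = [0.3, 0.3, 0.2, 0.1, 0.1]       # unusual: mass spread more widely

same = fragment_similarity(helix_like, helix_like_2)     # high
different = fragment_similarity(helix_like, strand_like)  # zero: no shared class
mixed = fragment_similarity(helix_like, unusual)          # in between
```

Two fragments that put their probability in the same classes score highly, fragments with no class in common score zero, and an unusual fragment falls in between.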

To do an alignment, each protein is broken into all overlapping fragments. A similarity matrix is built by taking the dot products of the corresponding vectors of the probabilities.
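A minimal sketch of these two steps follows. The helper names and the two-class toy vectors are assumptions for illustration; in practice each vector would come from classifying a real fragment.

```python
def fragments(chain, k=6):
    """Return every overlapping window of k consecutive residues."""
    return [chain[i:i + k] for i in range(len(chain) - k + 1)]


def similarity_matrix(vectors_a, vectors_b):
    """Entry [i][j] is the dot product of fragment vector i of protein A
    and fragment vector j of protein B."""
    return [
        [sum(x * y for x, y in zip(va, vb)) for vb in vectors_b]
        for va in vectors_a
    ]


# An 8-residue toy chain yields 8 - 6 + 1 = 3 overlapping fragments.
toy_fragments = fragments(list("ABCDEFGH"))

# Hypothetical probability vectors, one per fragment (2 classes only).
vectors_a = [[1.0, 0.0], [0.5, 0.5]]
vectors_b = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
matrix = similarity_matrix(vectors_a, vectors_b)
```

A chain of n residues gives n - k + 1 fragments, so the matrix for two chains of lengths n and m has (n - k + 1) rows and (m - k + 1) columns.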

The manufacturers then calculate an alignment using the Gotoh version of a conventional Smith and Waterman alignment.
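For readers unfamiliar with the technique, here is a generic sketch of Smith and Waterman local alignment with Gotoh's affine gap recursion, returning only the best score. It is a textbook formulation, not the server's code. The gap convention (a gap of length L costs gap_open + (L - 1) * gap_extend), the penalty values and the toy score matrix are all assumptions. Because dot products of probabilities are never negative, the toy matrix uses hypothetical shifted scores so that mismatches are penalized, which a local alignment needs.

```python
NEG_INF = float("-inf")


def local_affine_score(score, gap_open=2.0, gap_extend=0.5):
    """Best local alignment score for a precomputed score matrix."""
    n = len(score)
    m = len(score[0]) if n else 0
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending in a gap along the row
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending in a gap along the column
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + score[i - 1][j - 1]
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])  # 0.0 restarts the alignment
            best = max(best, H[i][j])
    return best


# Toy case: 4 fragments of A match fragments 0, 1, 3, 4 of B (B has one insertion).
MATCH, MISMATCH = 3.0, -1.0
pairs = {(0, 0), (1, 1), (2, 3), (3, 4)}
toy = [[MATCH if (i, j) in pairs else MISMATCH for j in range(5)] for i in range(4)]
best_score = local_affine_score(toy)  # 3 + 3 - 2 (one gap) + 3 + 3 = 10.0
```

Recovering the alignment itself would additionally require a traceback through the three matrices, which is omitted here for brevity.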

The alignments are dominated by local properties.

For fragments of length k = 6, an element in the score matrix is sensitive to 2k - 1 = 11 residues.
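One possible reading of this figure, offered as an assumption rather than the authors' stated derivation, is that a given residue lies in k overlapping fragments, and together those fragments span 2k - 1 residues. The short check below counts that span for a residue away from the chain ends; the function name is hypothetical.

```python
def residues_influencing(i, k=6):
    """Residue indices covered by any length-k fragment that contains residue i.

    Chain ends are ignored, so i should be at least k - 1.
    """
    covered = set()
    for start in range(i - k + 1, i + 1):  # the k fragments containing residue i
        covered.update(range(start, start + k))
    return sorted(covered)


window = residues_influencing(20)  # residues 15 through 25
span = len(window)                 # 2 * 6 - 1 = 11
```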

The gap penalties were also optimized once using a simplex method where the cost function measured the quality of models.

SALAMI Input data and library --

The server takes the coordinates of a protein chain in PDB format and an e-mail address to which the results are sent. The only adjustable parameter is the number of aligned structures to return.
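When preparing input, it can help to check that a file really contains the chain you intend to submit. The sketch below reads the C-alpha coordinates of one chain from PDB-format text using the fixed column layout of ATOM records. It is a local convenience, not part of the server; both function names are hypothetical, and atom_line exists only to build demo input with correct columns.

```python
def read_ca_coordinates(pdb_text, chain_id="A"):
    """Return (x, y, z) tuples for the CA atoms of one chain."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        chain = line[21]                  # column 22
        if atom_name != "CA" or chain != chain_id:
            continue
        # x, y, z occupy columns 31-38, 39-46 and 47-54.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Format one ATOM record in fixed columns (demo helper only)."""
    head = "ATOM".ljust(6) + "%5d %-4s %3s %1s%4d" % (serial, name, res, chain, resseq)
    return head + " " * 4 + "%8.3f%8.3f%8.3f" % (x, y, z)


demo = "\n".join([
    atom_line(1, " N  ", "ARG", "A", 1, 31.0, 14.0, -12.0),
    atom_line(2, " CA ", "ARG", "A", 1, 32.0, 15.5, -13.25),
    atom_line(3, " CA ", "PRO", "A", 2, 33.5, 16.0, -10.0),
    atom_line(4, " CA ", "GLY", "B", 1, 5.0, 6.0, 7.0),
])
chain_a = read_ca_coordinates(demo, "A")
```

Only ATOM records are read, so HETATM entries such as calcium ions, which also carry the name CA, are skipped.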

SALAMI Output from the Web server --

The server sends a rather minimal mail message as its result. It contains only a link to a temporary web page (lifetime 1 week) containing a list of candidate structurally related proteins.

Selecting a candidate brings up a view of the superposition using Jmol by E. Willighagen et al. (requires Java plug-in).

Jmol - Jmol is an open-source Java viewer for chemical structures in 3D with features for chemicals, crystals, materials and biomolecules.

In another pane, the implied sequence alignment is shown. From there, the superimposed coordinates can be downloaded, and a list of further proteins with 90 % or more sequence similarity to the candidate is given.

Each alignment is evaluated by scoring functions such as the alignment length, root mean squared difference (rmsd) of Cα atoms of aligned residues, a z-score calculated from a distribution of random alternative alignments, Smith and Waterman alignment scores and a quality score based on the fraction of distance matrices which are similar between the query and aligned protein.

This last measure, the quality score, is used for the initial sorting of the list, but one can select a ranking by any of the other scores.
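Two of these scores are simple enough to illustrate. The sketch below computes an rmsd over paired coordinates that are assumed to be superimposed already (no fitting is done), and a z-score against a set of scores from random alignments. The use of the population standard deviation and all numbers are assumptions for the example; the server's exact definitions are not given here.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root mean squared difference between paired, superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (a - b) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for a, b in zip(pa, pb)
    )
    return math.sqrt(total / len(coords_a))


def z_score(score, random_scores):
    """How many standard deviations a score lies above the random mean."""
    mean = statistics.mean(random_scores)
    spread = statistics.pstdev(random_scores)
    return (score - mean) / spread


# Toy data: every paired atom is displaced by exactly 1 unit along z.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
library = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(query, library)

# Hypothetical scores of random alternative alignments (mean 5, spread 2).
toy_z = z_score(10, [2, 4, 4, 4, 5, 5, 7, 9])
```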

SALAMI advantages and disadvantages --

SALAMI has the disadvantage that it relies on chain connectivity and can be confused by broken structures.

This means it may Not be very useful for the broken skeletons that one can encounter in crystallographic structures with initial phasing.

The same reliance on chain connectivity is also an advantage: because the alignments are dominated by local properties, SALAMI has No problem finding similarities when there are hinge-bending or domain motions.

The graduated similarity measures mean that poor quality structures and deviations from regular geometry are well treated.

The methodology here has another interesting property. The graduated measure of similarity leads to a scoring function which is reliable and applies to any kind of structural unit.

The use of a dynamic programming method then guarantees that the alignments are optimal within this scoring function.

This, together with the good results for difficult structures and the flexible interface, makes it a valuable alternative to existing web servers.

System Requirements

Web-based; contact manufacturer.

Manufacturer

Manufacturer Web Site SALAMI

Price Contact manufacturer.

G6G Abstract Number 20729

G6G Manufacturer Number 104293