SUPERFAMILY

Category Proteomics>Protein Structure/Modeling Systems/Tools

Abstract SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.

The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the Structural Classification of Proteins (SCOP) superfamily level.

A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 1,700 completely sequenced genomes against the hidden Markov models.

1) For each protein you can:

Submit sequences for SCOP classification; and view domain organization, sequence alignments and protein sequence details.

2) For each genome you can:

Examine superfamily assignments, phylogenetic trees, domain organization lists and networks; and check for over- and under-represented superfamilies within a genome.

3) For each superfamily you can:

Inspect SCOP classification, functional annotation, Gene Ontology (GO) annotation, InterPro abstracts and genome assignments; and explore the taxonomic distribution of a superfamily across the tree of life.

Note: All annotation, models and the database dump are freely available to everyone for download.

SUPERFAMILY has been used in structural, functional, evolutionary and phylogenetic research projects.

Server Purpose --

The purpose of this server is to provide structural (and hence implied functional) assignments to protein sequences primarily at the SCOP superfamily level.

A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor.

What this service offers is sophisticated remote homology detection, built on expertly chosen models.

What it does not offer is faster searching, or assignment of superfamilies that have no known structure.

There is a facility to compute assignments for your own DNA or protein sequences, and there is access to genome assignments and to multiple sequence alignments of SCOP superfamilies.

Note: If you want to run large numbers of sequences, please contact the manufacturer.

The web site includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences, and keyword searches.
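The comparison of one genome against others to find unusual superfamilies can be illustrated with a minimal Python sketch. It assumes a simple z-score on each superfamily's share of a genome's domains, measured against a set of comparison genomes; this statistic, the 2.0 cut-off and all function names are illustrative assumptions, not the server's documented method. The SCOP superfamily identifiers are real, but the counts are toy numbers.

```python
from statistics import mean, pstdev

def proportions(counts):
    """Share of a genome's superfamily domains that fall in each superfamily."""
    total = sum(counts.values())
    return {sf: n / total for sf, n in counts.items()}

def superfamily_zscores(target, others):
    """Z-score of each superfamily's share in `target` against `others`.

    target: dict mapping superfamily ID -> domain count in the genome of interest
    others: non-empty list of dicts of the same shape, one per comparison genome
    """
    t = proportions(target)
    background = [proportions(g) for g in others]
    scores = {}
    for sf in sorted(set(t).union(*background)):
        shares = [g.get(sf, 0.0) for g in background]
        sd = pstdev(shares)
        if sd == 0:
            continue  # no variation among comparison genomes: skip this superfamily
        scores[sf] = (t.get(sf, 0.0) - mean(shares)) / sd
    return scores

def unusual_superfamilies(scores, cutoff=2.0):
    """Split superfamilies into over- and under-represented lists (assumed cut-off)."""
    over = [sf for sf, z in scores.items() if z >= cutoff]
    under = [sf for sf, z in scores.items() if z <= -cutoff]
    return over, under

# Toy counts (not real genome data).
target = {"a.4.5": 30, "c.37.1": 60, "b.1.1": 10}
others = [
    {"a.4.5": 10, "c.37.1": 80, "b.1.1": 10},
    {"a.4.5": 12, "c.37.1": 78, "b.1.1": 10},
    {"a.4.5": 8, "c.37.1": 82, "b.1.1": 10},
]
scores = superfamily_zscores(target, others)
over, under = unusual_superfamilies(scores)
print("over:", over, "under:", under)
```

Working with proportions rather than raw counts keeps large and small genomes comparable; a real analysis would also need to account for the phylogenetic relatedness of the comparison genomes.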

Sequence Search Description --

The sequence search method uses a library of models covering all proteins of known structure, representing over 1,776 superfamilies from SCOP classes a to g (SCOP 1.75 release, June 2009).

Each superfamily is represented by a group of hidden Markov models. Your query sequences are scored against all models, each hit is given an E-value, and the significant hits are returned.

A sequence may hit the same superfamily more than once, because there are several overlapping models for each superfamily; however, it is the hit to the superfamily that is meaningful.
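Reducing several overlapping model hits to one superfamily hit can be sketched as a post-processing step: keep hits below an E-value cut-off, then, starting from the most significant hit, discard any further hit to the same superfamily that overlaps a region already assigned. The `ModelHit` record, the model names, the 1e-4 cut-off and the greedy merging rule are all assumptions for illustration; this is not the server's actual procedure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelHit:
    model_id: str     # hypothetical model identifier
    superfamily: str  # SCOP superfamily the model represents
    start: int        # query start (1-based, inclusive)
    end: int          # query end (1-based, inclusive)
    evalue: float

def overlaps(a, b):
    """True if two hits share at least one query residue."""
    return a.start <= b.end and b.start <= a.end

def superfamily_hits(hits, max_evalue=1e-4):
    """Collapse overlapping model hits into one hit per superfamily region."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # hits are sorted by E-value, so the rest are weaker still
        if any(k.superfamily == hit.superfamily and overlaps(k, hit) for k in kept):
            continue  # another model already placed this superfamily here
        kept.append(hit)
    return sorted(kept, key=lambda h: h.start)

hits = [
    ModelHit("model_1", "c.37.1", 10, 250, 1e-30),
    ModelHit("model_2", "c.37.1", 15, 240, 1e-20),  # overlaps model_1
    ModelHit("model_3", "a.4.5", 260, 330, 1e-8),
    ModelHit("model_4", "b.1.1", 400, 480, 0.5),    # not significant
]
result = superfamily_hits(hits)
for h in result:
    print(h.superfamily, h.start, h.end, h.evalue)
```

Processing hits in order of significance means the reported region for each superfamily comes from whichever model matched it best.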

Each model is created from a seed sequence that is aligned to many homologues from its superfamily. The model is built from this alignment using the Sequence Alignment and Modeling System (SAM) (see below).

Note: A hit to a model is not a hit to the seed, but a hit to the superfamily that the model represents.

You can view the sequences aligned to each model; these give a view of the superfamily, although it may be biased towards the seed. You can also see the genome assignments for each superfamily, or view alignments of the genome sequences.

The SUPERFAMILY server is based upon release 1.75 of SCOP (Structural Classification of Proteins), the corresponding sequences from ASTRAL, and the SAM and HMMER3 hidden Markov model software packages.

ASTRAL - The ASTRAL compendium provides databases and tools useful for analyzing protein structures and their sequences. It is partially derived from, and augments, the SCOP (Structural Classification of Proteins) database.

Most of the resources provided by ASTRAL depend upon the coordinate files maintained and distributed by the Protein Data Bank (PDB).

SAM - The Sequence Alignment and Modeling system (SAM) is a collection of flexible software tools for creating, refining, and using linear hidden Markov models for biological sequence analysis.

The model states can be viewed as representing the sequence of columns in a multiple sequence alignment, with provisions for arbitrary position-dependent insertions and deletions in each sequence.
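The correspondence between model states and alignment columns can be illustrated by deriving match-state emission probabilities from a small alignment. This sketch assumes a simple rule (a column becomes a match state when fewer than half of its entries are gaps; the other columns are left to insert states) and add-one pseudocounts. It is not SAM's procedure, which trains and refines models iteratively; the function names and toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = "-."

def match_columns(alignment, max_gap_fraction=0.5):
    """Indices of columns modelled as match states (assumed gap-fraction rule)."""
    n_seqs = len(alignment)
    match = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] in GAP_CHARS)
        if gaps / n_seqs < max_gap_fraction:
            match.append(j)
    return match  # remaining columns would be handled by insert states

def match_emissions(alignment, pseudocount=1.0):
    """Residue probabilities for each match state, with additive pseudocounts."""
    profile = []
    for j in match_columns(alignment):
        # Gaps in a match column correspond to delete states, so they are not counted.
        counts = Counter(seq[j] for seq in alignment if seq[j] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

alignment = ["AC-DE", "AC--E", "GCWDE", "AC--E"]  # toy alignment
print(match_columns(alignment))
profile = match_emissions(alignment)
print(round(profile[0]["A"], 3))
```

Pseudocounts keep every residue's probability above zero, so a model built from a few seed homologues can still score residues it has not yet seen at a position.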

The models are trained on a family of protein or nucleic acid sequences using an expectation-maximization algorithm and a variety of algorithmic heuristics.

A trained model can then be used both to generate multiple alignments and to search databases for new members of the family.

HMMER3 - HMMER3 is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.

It implements methods using probabilistic models called “profile hidden Markov models” (profile HMMs).
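The way a profile scores a query can be sketched, in heavily simplified form, as an ungapped log-odds search. Each match state contributes log2(P(residue | state) / P(residue | background)) bits, and the best-scoring window is reported. Real profile HMMs, as implemented in HMMER3 and SAM, also model insertions and deletions and use dynamic programming (Viterbi or Forward), so this only illustrates the log-odds idea. The uniform background, the toy three-state profile and the helper names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumed uniform background frequency

def peaked_column(favoured, p=0.81):
    """Hypothetical toy match state: `favoured` gets p, the others share the rest."""
    rest = (1 - p) / (len(AMINO_ACIDS) - 1)
    return {aa: (p if aa == favoured else rest) for aa in AMINO_ACIDS}

def log_odds_matrix(profile):
    """Convert match-state emission probabilities to log2-odds scores (bits)."""
    return [{aa: math.log2(col[aa] / BACKGROUND) for aa in AMINO_ACIDS} for col in profile]

def best_ungapped_hit(sequence, profile):
    """Best (score, start) over all ungapped placements of the profile.

    Only the 20 standard residues are handled; insert and delete states are ignored.
    """
    matrix = log_odds_matrix(profile)
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][aa] for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best

profile = [peaked_column(aa) for aa in "CWH"]  # toy three-state profile
score, start = best_ungapped_hit("AACWHAA", profile)
print(f"best window starts at index {start} with {score:.1f} bits")
```

Positive scores mean a window looks more like the modelled family than like background sequence; in a full search such scores are then converted into E-values that account for database size.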

Comparative Genomics Tools --

The SUPERFAMILY web site provides a number of comparative genomics tools for the analysis of superfamily- and family-level domains from across the tree of life.

These tools include: lists of unusual (over- and under-represented) superfamilies and families, adjacent domain pair lists and graphs, unique domain pairs, domain combinations, domain architecture co-occurrence networks, and domain distribution across taxonomic kingdoms for each organism.
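Adjacent domain pairs, unique pairs and co-occurrence network edges can all be derived from domain architectures, i.e. the ordered list of superfamily assignments along each protein. The sketch below assumes architectures are given N- to C-terminal as lists of SCOP superfamily identifiers. Whether repeated domains count as self-pairs, and whether adjacent pairs are ordered, are choices made here for illustration and may differ from SUPERFAMILY's own definitions; the data are toy examples.

```python
from collections import Counter
from itertools import combinations

def adjacent_pairs(architectures):
    """Count ordered (N-terminal, C-terminal) neighbouring domain pairs."""
    counts = Counter()
    for arch in architectures:
        counts.update(zip(arch, arch[1:]))
    return counts

def cooccurrence_edges(architectures):
    """Undirected edges between distinct superfamilies found in the same protein."""
    edges = Counter()
    for arch in architectures:
        for a, b in combinations(sorted(set(arch)), 2):
            edges[(a, b)] += 1
    return edges

def unique_pairs(target_archs, other_genomes):
    """Adjacent pairs found in the target genome but in none of the others."""
    seen_elsewhere = set()
    for archs in other_genomes:
        seen_elsewhere.update(adjacent_pairs(archs))
    return set(adjacent_pairs(target_archs)) - seen_elsewhere

# Toy architectures (SCOP superfamily IDs, N- to C-terminal).
genome = [["a.4.5", "c.37.1"], ["c.37.1", "b.1.1", "b.1.1"], ["b.1.1"]]
other_genomes = [[["c.37.1", "b.1.1"]]]
print(adjacent_pairs(genome))
print(cooccurrence_edges(genome))
print(unique_pairs(genome, other_genomes))
```

The edge counts from `cooccurrence_edges` could be passed to any graph library to draw a co-occurrence network, with superfamilies as nodes and edge weights given by the number of proteins containing both.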

A detailed description of what these tools can do, and how to use them, can be found on the comparative genomics page.

System Requirements

Contact manufacturer.

Manufacturer

MRC Laboratory of Molecular Biology
Hills Road, Cambridge CB2 2QH, UK
E-mail: superfamily@cs.bris.ac.uk

Manufacturer Web Site SUPERFAMILY

Price Contact manufacturer.

G6G Abstract Number 20727

G6G Manufacturer Number 104297

The G6G Directory of Omics and Intelligent Software
