Variant Annotation, Analysis & Search Tool (VAAST) and Additional Tools

Category Genomics>Genetic Data Analysis/Tools and Cross-Omics>Next Generation Sequence Analysis/Tools

Abstract VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences.

VAAST builds upon existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single, unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy and in an easy-to-use fashion.

VAAST can score both coding and non-coding variants, evaluating the cumulative impact of both types of variants simultaneously.

VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases.

According to the manufacturer, VAAST has a much broader scope of use than any existing methodology.
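
To illustrate the general idea of an aggregative likelihood approach, the sketch below sums a per-site likelihood-ratio statistic across all variant sites in one gene, comparing allele counts in cases and controls. It is a toy illustration only and is not VAAST's scoring model: the function names, the binomial model, and the input layout are assumptions made for this example, and it omits the amino acid substitution weighting described above.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood of k variant alleles among n chromosomes.

    The constant binomial coefficient is omitted because it cancels
    in the likelihood ratio.
    """
    if p <= 0.0 or p >= 1.0:
        # Boundary frequencies: only data fully consistent with p are possible.
        consistent = (p <= 0.0 and k == 0) or (p >= 1.0 and k == n)
        return 0.0 if consistent else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def gene_lrt(sites):
    """Sum a per-site likelihood-ratio statistic over one gene.

    sites: list of (case_alt, case_total, control_alt, control_total)
    allele counts (hypothetical input layout).
    """
    score = 0.0
    for case_alt, case_n, ctrl_alt, ctrl_n in sites:
        # Null model: cases and controls share one allele frequency.
        p_null = (case_alt + ctrl_alt) / (case_n + ctrl_n)
        null = (binom_loglik(case_alt, case_n, p_null)
                + binom_loglik(ctrl_alt, ctrl_n, p_null))
        # Alternative model: each group has its own allele frequency.
        alt = (binom_loglik(case_alt, case_n, case_alt / case_n)
               + binom_loglik(ctrl_alt, ctrl_n, ctrl_alt / ctrl_n))
        score += 2.0 * (alt - null)
    return score

# Hypothetical gene with two variant sites, both enriched in cases.
example_sites = [(8, 20, 1, 20), (3, 20, 0, 20)]
print(gene_lrt(example_sites))
```

A higher score means the variant sites in the gene, taken together, differ more between cases and controls than a shared allele frequency would explain.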

MAKER 2 --

MAKER is a portable and easily configurable genome annotation pipeline.

Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.

MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.

MAKER is also easily trainable: outputs of preliminary runs can be used to automatically retrain its gene prediction algorithm, producing higher quality gene-models on subsequent runs.

MAKER’s inputs are minimal and its outputs can be directly loaded into a GMOD database.

They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database.

MAKER should prove especially useful for emerging model organism projects with minimal bioinformatics expertise and computer resources.

MAKER Web Annotation Service --

The MAKER Web Annotation Service (MWAS) is an easily configurable web-accessible genome annotation pipeline.

Its purpose is to allow research groups with small to intermediate amounts of eukaryotic and prokaryotic genome sequence (e.g. BAC clones, small whole genomes, preliminary sequencing data) to independently annotate and analyze their data and produce output that can be loaded into a genome database.

MWAS was built on the standalone genome annotation pipeline MAKER (see above…), and users who wish to annotate larger datasets and whole genomes are free to download MAKER for use on their own systems.

MWAS identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values.

MWAS can also automatically train popular gene prediction algorithms for use on new genomes for which pre-existing information is limited.

MAKER is a member of the Generic Model Organism Database (GMOD) project, and output produced by this site can be used directly with other GMOD tools.

Annotations can be directly viewed online by the user via GBrowse, JBrowse, and Apollo, or they can be downloaded for local analysis and integration into a genome database.

MWAS also supplies summary statistics on sequence features via the ‘Sequence Ontology’ tool SOBA.

SOBA - The Sequence Ontology Bioinformatics Analysis Tool provides a high-level overview of the features in a GFF3 sequence annotation file.

While GFF3 - the standard file format for genome annotation - is simple to produce and work with, whole genome annotation data still present a large and complex dataset. SOBA automatically calculates and displays some common statistics and graphics used when working with GFF3 files.
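
As an illustration of the kind of summary statistics described above, the following minimal sketch counts features by type and computes their mean length from GFF3 lines. It is not SOBA's code; the function name and the sample records are hypothetical. It assumes only the standard nine tab-separated GFF3 columns, with the feature type in column 3 and 1-based inclusive start and end coordinates in columns 4 and 5.

```python
from collections import Counter

def summarize_gff3(lines):
    """Return {feature_type: (count, mean_length)} for GFF3 lines."""
    counts = Counter()
    total_length = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        counts[ftype] += 1
        total_length[ftype] += end - start + 1  # coordinates are inclusive
    return {t: (counts[t], total_length[t] / counts[t]) for t in counts}

# Hypothetical annotation records, for illustration only.
example = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t100\t500\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1",
    "ctg1\tmaker\texon\t100\t200\t.\t+\t.\tParent=m1",
    "ctg1\tmaker\texon\t300\t500\t.\t+\t.\tParent=m1",
]

for ftype, (n, mean_len) in sorted(summarize_gff3(example).items()):
    print(f"{ftype}\t{n}\t{mean_len:.1f}")
```

To summarize a file on disk, an open file handle can be passed in place of the list, since the function only iterates over lines.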

MWAS should prove especially useful for emerging model organism genome projects with minimal bioinformatics expertise and computer resources, since a user can produce final genome annotations without having to install and configure any software locally.

RepeatRunner --

RepeatRunner is a Comparative Genomics Library (CGL)-based program (see below…) that integrates RepeatMasker with BLASTX to provide a comprehensive means of identifying repetitive elements.

Because RepeatMasker identifies repeats by means of similarity to a nucleotide library of known repeats, it often fails to identify highly divergent repeats and divergent portions of repeats, especially near repeat edges.

To remedy this problem, RepeatRunner uses BLASTX to search a database of repeat encoded proteins (reverse transcriptases, gag, env, etc.).

Because protein homologies can be detected across larger phylogenetic distances than nucleotide similarities, this BLASTX search allows RepeatRunner to identify divergent protein-coding portions of retro-elements and retro-viruses that RepeatMasker does not detect.

RepeatRunner merges its BLASTX and RepeatMasker results to produce a single, comprehensive XML-based output. It also masks the input sequence appropriately.
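
The merge-and-mask step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not RepeatRunner's implementation: the function names are hypothetical, hits are represented as 1-based inclusive (start, end) pairs rather than parsed from real BLASTX or RepeatMasker output, and the XML report is not reproduced.

```python
def merge_intervals(*interval_sets):
    """Merge 1-based inclusive (start, end) intervals from several sources.

    Overlapping or directly adjacent intervals are combined into one.
    """
    merged = []
    all_hits = sorted(iv for hits in interval_sets for iv in hits)
    for start, end in all_hits:
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval when the new one touches it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def mask_sequence(seq, intervals, mask_char="N"):
    """Replace every base inside the given intervals with mask_char."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)

# Hypothetical hit coordinates, for illustration only.
repeatmasker_hits = [(3, 6)]
blastx_hits = [(5, 9), (15, 16)]

repeats = merge_intervals(repeatmasker_hits, blastx_hits)
print(repeats)
print(mask_sequence("ACGTACGTACGTACGTACGT", repeats))
```

Merging before masking means that a region found by both searches is reported once, and a BLASTX hit that extends past the edge of a nucleotide-level hit widens the masked region.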

In practice, RepeatRunner has been shown to greatly improve the efficacy of repeat identification.

RepeatRunner can also be used in conjunction with PILER-DF - a program designed to identify novel repeats - and RepeatMasker to produce a comprehensive system for repeat identification, characterization, and masking in newly sequenced genomes.

CGL --

CGL is a software library designed to facilitate the use of genome annotations as substrates for computation and experimentation. The name “CGL” is an acronym for Comparative Genomics Library, and the manufacturer pronounces it “Seagull”.

The purpose of CGL is to provide an informatics infrastructure for a laboratory, department, or research institute engaged in the large-scale analysis of genomes and their annotations.

System Requirements

Contact manufacturer.


Manufacturer Web Site VAAST

Price Contact manufacturer.

G6G Abstract Number 20803

G6G Manufacturer Number 104296