G6G Directory of Omics and Intelligent Software - Max Planck Society MIGenAS Toolkit

MIGenAS Toolkit

Category Cross-Omics>Sequence Analysis/Tools and Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract The MIGenAS Toolkit is a versatile and extensible integrated workflow engine/bioinformatics toolkit for the analysis of biological sequences over the Internet.

This web portal offers interactive access to a growing pool of ‘chainable bioinformatics software tools’ and databases that are centrally installed and maintained by the Garching Computing Centre of the Max-Planck-Society (RZG).

Currently supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein structure prediction.

Individual tools can be seamlessly chained into pipelines, allowing the user to process complex workflows without having to handle format conversions or parse intermediate results by hand.

This toolkit is part of the Max Planck Integrated Gene Analysis System (MIGenAS).

MIGenAS features/capabilities include –

The MIGenAS workflow engine/bioinformatics toolkit is a web-based application for processing basic bioinformatics tasks as well as orchestrating them into complex workflows within a single, coherent web interface.

End-users are only assumed to be familiar with the basic functionality offered by these popular sequence analysis tools.

Apart from a modern web browser (Mozilla/Firefox, Opera or Internet Explorer) with JavaScript enabled, neither additional computational prerequisites nor in-depth bioinformatics experience is considered necessary for working with this toolkit.

MIGenAS Tools –

The web application supports the main categories of classic bioinformatics tasks (see the list of ‘Tools’ located on the manufacturer's website). The manufacturer has opted for a manageable selection of packages for each functional category rather than providing an anonymous collection of a large number of tools.

Packages are carefully selected according to their performance, circulation and computational efficiency.

MIGenAS Databases –

For efficient access by the MIGenAS server, the following FASTA nucleic and amino acid sequence databases are mirrored locally at RZG with at least a weekly update interval: National Center for Biotechnology Information (NCBI), Swiss-Prot, TrEMBL (UniProt), UniRef, RCSB Protein Data Bank (PDB) and KEGG GENES database.

A complete and up-to-date collection of organism-specific FASTA databases of the completed microbial genomes from NCBI is available together with a number of eukaryotic genomes.

Clustered EST sequences are provided as FASTA databases for Homo sapiens, Mouse and Drosophila. In addition, HMM libraries based on Pfam-A can be searched.

Uploading of user-supplied sequence databases is supported by the majority of tools. Such (private) data are not visible outside of the user's session.
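As an illustration of the kind of data a user-supplied FASTA database contains, the following is a minimal sketch of reading FASTA records in Python. It is not part of the MIGenAS Toolkit; the function name parse_fasta and the sample records are assumptions made for this example only.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (text after '>') to its concatenated sequence.

    Hypothetical helper for illustration; not a MIGenAS function.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


# Made-up sample data in FASTA layout.
sample = ">seq1 example protein\nMKTAY\nIAK\n>seq2\nGGSMK\n"
records = parse_fasta(sample.splitlines())
for name, sequence in records.items():
    print(name, len(sequence))
```

Each record starts with a header line beginning with ">", followed by one or more lines of sequence, which the sketch joins into a single string per record.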

MIGenAS Basic user interface –

The essential user interaction occurs in the large, central part of the web portal which displays the forms prompting the user for input data and parameters and renders the output of completed computations. The set of supported tools is arranged in a hierarchical tabbed structure.

The user navigates between tools by first selecting the tab with the corresponding tool category and then clicking a particular tool. Basic controls for working with a tool are located in the narrow horizontal bar shown at the top of the page.

This control bar hosts a number of pull-down menus which allow the user to switch between different runs with the same tool (Runs), to navigate between input form, documentation and output display (View), to redirect results to other tools (Forward) and to download results (Export).

MIGenAS Pipelining –

The notion of a ‘run’ with a tool is the central concept underlying the pipelining capabilities of this application: if output data of tool A can (in principle) be used as input for another tool B, all runs the user has already performed with tool A are offered as selectable input for tool B.

For example, the target sequences found in a run with a search tool such as BLAST can be immediately used as input for an alignment tool such as ClustalW.
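The run concept described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the toolkit's actual implementation: the classes Run and Tool, the function selectable_inputs, the data-type labels and the tool named TreeBuilder are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """One completed execution of a tool (hypothetical model)."""
    run_id: int
    tool: str
    output_type: str  # e.g. "sequences" or "alignment" (assumed labels)


@dataclass(frozen=True)
class Tool:
    """A tool described only by the data types it consumes and produces."""
    name: str
    input_type: str
    output_type: str


def selectable_inputs(target: Tool, history: List[Run]) -> List[Run]:
    """Return earlier runs whose output type matches the target's input type."""
    return [run for run in history if run.output_type == target.input_type]


clustalw = Tool("ClustalW", input_type="sequences", output_type="alignment")
tree = Tool("TreeBuilder", input_type="alignment", output_type="tree")  # hypothetical tool

# Runs the user has already performed in this session (made-up history).
history = [
    Run(1, "BLAST", "sequences"),
    Run(2, "ClustalW", "alignment"),
    Run(3, "BLAST", "sequences"),
]

offered = selectable_inputs(clustalw, history)
print([run.run_id for run in offered])  # run ids offered as input for ClustalW
```

In this model both BLAST runs are offered as input for the alignment tool, while the alignment run is offered only to a tool that accepts alignments.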

In addition, the above-mentioned Forward pull-down menu, which is displayed when inspecting tool results, facilitates the forwarding of results to another tool for further processing.

In addition to such semi-automatic workflow management, where the user interactively coordinates the succession of tools, it is also possible to pre-configure a custom Meta-tool (tab-group Pipelines) as a pipeline of individual tools and intermediate filters.

The same pipeline can then be employed for conveniently processing different sets of input data and parameters.
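The idea of a pre-configured pipeline that is reused on different inputs can be sketched as function composition in Python. This is only an illustrative model: make_pipeline, toy_search, min_length and toy_align are hypothetical stand-ins and do not call BLAST, ClustalW or any MIGenAS component.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


def make_pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single reusable callable."""
    def run(data: List[str]) -> List[str]:
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


def toy_search(motif: str) -> Step:
    """Stand-in for a search tool: keep sequences containing the motif."""
    return lambda seqs: [s for s in seqs if motif in s]


def min_length(n: int) -> Step:
    """Intermediate filter: drop sequences shorter than n residues."""
    return lambda seqs: [s for s in seqs if len(s) >= n]


def toy_align(seqs: List[str]) -> List[str]:
    """Stand-in for an alignment tool: right-pad with gaps to equal length."""
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


# Configure the pipeline once: tool, filter, tool.
pipeline = make_pipeline(toy_search("MK"), min_length(5), toy_align)

# Reuse it on different (made-up) input sets.
batch_a = ["MKTAYIAK", "MKV", "GGSMKLLQRR", "AAAA"]
batch_b = ["MKQ", "TTMKAAGGH"]
print(pipeline(batch_a))
print(pipeline(batch_b))
```

Because the pipeline is a single callable, changing the input set or swapping one step for another does not require touching the remaining steps.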

MIGenAS Customization of results, data integration –

All relevant results of computations are internally interpreted (parsed) by the server. This is not only a fundamental prerequisite for the pipelining capabilities described above but also allows the manufacturer to add value to the raw results delivered by the underlying software packages.

As an example of a more advanced feature the manufacturer points out the capability for comprehensive and reliable annotation of sequences by species and gene names, protein names as well as possible synonyms and accession codes in various sequence databases.

The manufacturer also shows literature links to PubMed that are related to the protein under consideration, according to the information provided by UniProt.

The complete text of PubMed abstracts is retrieved asynchronously and displayed in a small frame when the user moves the mouse over the PubMed icon, which is displayed next to, e.g., a BLAST hit.

Tasks for display and post processing of results, which require a higher degree of interactivity than an HTML-based web application can conceivably offer, are delegated to Java Applets.

Examples are the applets named ‘ATV’ for tree-viewing, ‘JalView’ for editing alignments, ‘Jmol’ for rendering 3D protein structures and ‘CLANS’ for interactive visualization of pairwise sequence similarities.

Parallel processing –

The majority of tools supported by the MIGenAS toolkit allow parallel processing of multiple, mutually independent input data.

When pasting or uploading a set of protein sequences, for example, or selecting multiple outputs from a preceding run for further processing with another tool, a new run with this tool is created automatically and executed in parallel for each individual input with only a single step of user interaction.
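The behaviour described above, one run per independent input executed in parallel, can be sketched with Python's standard concurrent.futures module. The function names toy_tool and run_in_parallel and the sample sequences are assumptions for illustration; how the MIGenAS server actually schedules its runs is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def toy_tool(sequence: str) -> Dict[str, object]:
    """Stand-in for one tool run on a single input (hypothetical)."""
    return {"input": sequence, "length": len(sequence)}


def run_in_parallel(sequences: List[str], max_workers: int = 4) -> List[Dict[str, object]]:
    """Start one independent run per input; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(toy_tool, sequences))


# A single call fans out into one run per pasted sequence (made-up data).
results = run_in_parallel(["MKTAYIAK", "GGSMKLLQRR", "MKV"])
for run_number, result in enumerate(results, start=1):
    print(run_number, result["input"], result["length"])
```

The inputs must be mutually independent for this to be valid, since no run may rely on the result of another.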

Additional parts of the Max-Planck Integrated Gene Analysis System (MIGenAS) are –

1) GenDB - An Annotation system for ‘prokaryotic genomes’ (provided by the University of Bielefeld) –

a) Software system for automatic identification, classification and annotation of genes;
b) Web interface allows manual annotation with geographically dispersed teams of experts; and
c) Local installation of GenDB 2.2 available at RZG with a connection to the dedicated computing facilities.

2) HaloLex - A ‘Genome information system’ for archaea and other prokaryotic genomes –

a) Data management and analysis platform for microbial genomes and related omics data;
b) Web interface supports browsing and versatile searches in annotated (public) microbial genomes; and
c) Provides access to high-quality, expert-curated annotations of a number of halophilic archaea.

System Requirements

Contact manufacturer.

Manufacturer

MIGenAS (Max Planck Integrated Gene Analysis System) is a joint initiative of several research groups of the Max-Planck-Society in collaboration with partners from numerous University institutes.

Manufacturer Web Site MIGenAS Toolkit

Price Contact manufacturer.

G6G Abstract Number 20523

G6G Manufacturer Number 104140
