GEMS Launcher

Category Cross-Omics>Sequence Analysis/Tools

Abstract GEMS Launcher is a set of tools for regulatory sequence analysis. GEMS Launcher integrates the highest quality databases and gold standard algorithms for in-depth analysis of transcriptional regulation.

The tasks are divided into the following categories:

1) Transcription factor binding sites;

2) Complex regulatory patterns;

3) Alignment;

4) Miscellaneous; and

5) Genomatix sequence tools.

GEMS Launcher consists of the following advanced tools:

1) MatInspector -- is a tool that can be used for transcription factor analysis. It utilizes MatBase, a comprehensive transcription factor knowledge base, to locate transcription factor binding sites in sequences of unlimited length.

MatInspector assigns a quality rating (called matrix similarity) to each match, which allows similarity-based filtering and selection of results. For each matrix, an individually optimized similarity threshold is specified, minimizing the number of false positive hits in non-regulatory sequences.

Each matrix definition is quality tested, resulting in superior usability for functional analysis of regulatory regions.
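
As a rough illustration of similarity-based filtering, the Python sketch below scans a sequence with a toy count matrix and keeps only windows at or above a threshold. The matrix values, the function names and the scoring formula (observed counts divided by the sum of the column maxima) are assumptions made for this example; they are not MatBase data and not MatInspector's actual matrix similarity calculation. Only the forward strand is scanned.

```python
# Hypothetical, simplified weight-matrix scan; not Genomatix's scoring scheme.
BASES = "ACGT"

# Toy count matrix for a 4-bp site: one dict of base counts per position.
TOY_MATRIX = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def similarity(matrix, window):
    """Score a window as (sum of observed counts) / (sum of column maxima)."""
    best = sum(max(col.values()) for col in matrix)
    got = sum(col[base] for col, base in zip(matrix, window))
    return got / best


def scan(matrix, sequence, threshold):
    """Return (position, window, score) for each window at or above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous characters
        score = similarity(matrix, window)
        if score >= threshold:
            hits.append((i, window, round(score, 3)))
    return hits


hits = scan(TOY_MATRIX, "TTACGTAA", threshold=0.85)
```

Raising the threshold trades sensitivity for fewer false positives, which is the role the per-matrix optimized threshold plays in the description above.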

2) MatDefine -- is a tool that can be used for the fully automatic definition and evaluation of weight matrices from a set of short DNA sequences. The resulting weight matrix can be used by MatInspector to scan nucleic acid sequences for matches to the described binding site.

The quality of a matrix is estimated by a value for random expectation (RE-value), which is defined as the number of matches with high matrix similarity (>= 0.85) expected in a random sequence of 1000 base pairs (bp). This RE-value is assigned to each matrix.
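
The RE-value can be approximated for a small matrix by enumerating every possible site and counting those that pass the similarity cut-off. The sketch below does this under stated assumptions: bases are uniformly and independently distributed, only one strand is counted, and the similarity score is the same simplified ratio of counts used for illustration here, not MatDefine's own scoring. The toy matrix and function names are hypothetical.

```python
from itertools import product

# Toy count matrix for a 4-bp site (illustrative values only).
TOY_MATRIX = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def similarity(matrix, window):
    """Simplified score: observed counts divided by the best possible counts."""
    best = sum(max(col.values()) for col in matrix)
    return sum(col[base] for col, base in zip(matrix, window)) / best


def re_value(matrix, threshold=0.85, length=1000):
    """Expected number of matches >= threshold in a random sequence of `length` bp.

    Assumes equal base frequencies and counts a single strand only.
    """
    width = len(matrix)
    kmers = ["".join(p) for p in product("ACGT", repeat=width)]
    passing = sum(1 for kmer in kmers if similarity(matrix, kmer) >= threshold)
    probability = passing / len(kmers)
    return probability * (length - width + 1)


expected = re_value(TOY_MATRIX)
```

Exhaustive enumeration grows as 4 to the power of the matrix width, so for longer matrices a sampling estimate would be the more practical choice.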

3) CoreSearch -- is a tool that can be used to define unknown 'common motifs' in a set of unaligned DNA sequences. CoreSearch starts with a search for a highly conserved core sequence which occurs in almost all of the input sequences.

In most cases, this initial search yields more than one core. Successive selection steps are then applied to reduce the number of core candidates as early as possible.

The selection is based on the maximization of the information content (consensus index), first of the core and then of regions around the core.
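
To make the selection criterion concrete, the sketch below ranks candidate cores by the standard Shannon information content of their aligned columns (2 bits minus the column entropy). This is an assumption chosen for illustration: the consensus index used by CoreSearch may be defined and scaled differently, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column):
    """Information content in bits of one alignment column: 2 - Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def core_information(sites):
    """Sum the column information over equal-length, ungapped candidate sites."""
    return sum(column_information(col) for col in zip(*sites))


def best_core(candidates):
    """Pick the candidate (a list of sites) with the highest total information."""
    return max(candidates, key=core_information)


candidates = [
    ["ACGT", "ACGA", "ACGC"],  # last column is variable
    ["TTGA", "TTGA", "TTGA"],  # fully conserved
]
chosen = best_core(candidates)
```

A fully conserved column contributes 2 bits and a column with all four bases equally frequent contributes 0, so higher totals indicate better-conserved cores.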

4) FrameWorker -- is a complex software tool that allows users to extract a common framework of elements from a set of DNA sequences. These elements are usually transcription factor binding sites since this tool is designed for the comparative analysis of promoter sequences (e.g. inter-species analysis).

FrameWorker returns the most complex models that are common to the input sequences (satisfying the user’s parameters). A model (framework) is defined as a set of elements (TF sites) that occur in the same order and within a given distance range in all, or a subset, of the input sequences.

The resulting models can be saved in a user directory and subsequently used to scan any set of sequences.

5) FastM -- is a method to develop user-defined models of transcriptional regulatory DNA units (e.g. promoters), so that the modular organization of functional sequence regions can be modeled.

Models generated by FastM can then be used with ModelInspector to scan any DNA sequence(s) or sequence databases for matches to the model.

6) ModelInspector -- uses a library of predefined models or models defined with FastM or FrameWorker to scan DNA sequences for matches to these models. A model consists of various individual elements (like transcription factor binding sites, repeats, hairpins), their strand orientation, their sequential order, and their distance ranges.

ModelInspector uses a proprietary scoring algorithm to allow inclusion of very different element types into the composite scoring of matches.

Thus, International Union of Pure and Applied Chemistry (IUPAC) sequence elements can be successfully combined with different types of weight matrices and structural elements (e.g. hairpins) in the assessment of match quality.
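
The structure of such a model (elements, strand orientation, sequential order and distance ranges) can be sketched as a small data structure plus a matcher. Everything named below (Element, Hit, match_model, the element labels and the positions) is hypothetical and for illustration only. The sketch checks order, strand and distance, and it does not attempt to reproduce the proprietary composite scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str         # illustrative label, e.g. a binding site family or "hairpin"
    strand: str       # "+" or "-"
    min_gap: int = 0  # allowed distance to the previous element
    max_gap: int = 0  # (both gaps are ignored for the first element)


@dataclass(frozen=True)
class Hit:
    name: str
    strand: str
    position: int


def match_model(model, hits):
    """Return every ordered combination of hits that satisfies the model."""
    matches = []

    def extend(index, chosen):
        if index == len(model):
            matches.append(tuple(chosen))
            return
        element = model[index]
        for hit in hits:
            if hit.name != element.name or hit.strand != element.strand:
                continue
            if chosen:
                gap = hit.position - chosen[-1].position
                if not element.min_gap <= gap <= element.max_gap:
                    continue
            extend(index + 1, chosen + [hit])

    extend(0, [])
    return matches


model = [Element("SP1", "+"), Element("TATA", "+", min_gap=20, max_gap=40)]
hits = [Hit("SP1", "+", 10), Hit("TATA", "+", 35),
        Hit("TATA", "+", 80), Hit("TATA", "-", 30)]
found = match_model(model, hits)
```

Here only the TATA hit at position 35 lies on the required strand and within 20 to 40 bp of the SP1 hit, so a single match is reported.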

7) SequenceShaper -- is a software tool developed for the design of regulatory sequences. It allows for the generation and deletion of transcription factor binding sites.

8) SNPInspector -- analyses the potential effects of a single nucleotide polymorphism (SNP) associated with your sequence. For each SNP allele, it determines which transcription factor binding sites are deleted or generated by the nucleotide exchange.

The analysis is based on MatInspector (see above) and Genomatix' library of matrix descriptions for transcription factor binding sites.
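
The comparison of alleles can be sketched as a set difference between the sites found before and after the nucleotide exchange. In this illustration, exact string matching on a toy motif table stands in for a MatInspector scan, only the forward strand is considered, and all names (TOY_SITES, sites_at, snp_effect) and sequences are hypothetical.

```python
# Hypothetical motif table: site name -> exact recognition sequence (toy data).
TOY_SITES = {"SITE_A": "TGACG", "SITE_B": "GATAA"}


def sites_at(sequence, snp_pos, sites=TOY_SITES):
    """Return (name, start) for each site whose match window covers snp_pos."""
    found = set()
    for name, motif in sites.items():
        start = sequence.find(motif)
        while start != -1:
            if start <= snp_pos < start + len(motif):
                found.add((name, start))
            start = sequence.find(motif, start + 1)
    return found


def snp_effect(sequence, snp_pos, alt_base, sites=TOY_SITES):
    """Return (lost, gained) sites when sequence[snp_pos] becomes alt_base."""
    variant = sequence[:snp_pos] + alt_base + sequence[snp_pos + 1:]
    before = sites_at(sequence, snp_pos, sites)
    after = sites_at(variant, snp_pos, sites)
    return before - after, after - before


lost, gained = snp_effect("CCTGACGCC", snp_pos=4, alt_base="T")
```

Positions are 0-based. In the example, changing the A at position 4 destroys the only SITE_A match and creates no new site.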

9) DiAlign -- is a (DNA or protein) alignment program that relies on comparison of whole segments of sequences instead of comparison of single nucleotides or amino acids.

The program DiAlign constructs alignments from gap-free pairs of similar segments of the sequences. Such segment pairs are referred to as diagonals.

Every possible diagonal is given a weight reflecting the degree of similarity between the two segments involved. The overall score of an alignment is defined as the sum of the weights of the diagonals it consists of. The program searches for an alignment with a maximum score; in other words, it tries to find a consistent collection of diagonals with a maximum sum of weights.
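
The idea of selecting a consistent set of diagonals with a maximum sum of weights can be sketched for two sequences as follows. This is a simplified illustration and not DiAlign's implementation: diagonals are restricted to runs of identical residues, the weight of a diagonal is assumed to be its length, and consistency is taken to mean that the chosen diagonals do not overlap and keep the same order in both sequences. The function names are hypothetical.

```python
def find_diagonals(a, b, min_len=3):
    """Maximal gap-free runs of identical residues, as (start_a, start_b, length)."""
    diagonals = []
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    diagonals.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # a run that reaches the end of either sequence
            diagonals.append((i - run, j - run, run))
    return diagonals


def best_chain(diagonals):
    """Consistent chain of diagonals with the maximum summed weight (= length)."""
    diagonals = sorted(diagonals)
    best = []  # best[k] = (score, chain) for the best chain ending in diagonal k
    for k, (sa, sb, length) in enumerate(diagonals):
        score, chain = length, [(sa, sb, length)]
        for m in range(k):
            pa, pb, plen = diagonals[m]
            fits = pa + plen <= sa and pb + plen <= sb
            if fits and best[m][0] + length > score:
                score = best[m][0] + length
                chain = best[m][1] + [(sa, sb, length)]
        best.append((score, chain))
    return max(best, default=(0, []))


diagonals = find_diagonals("ACGTTTGGCA", "ACGTAAGGCA")
score, chain = best_chain(diagonals)
```

The two sequences share the segments ACGT and GGCA, separated by a mismatching region, so the best chain consists of both diagonals and the region between them is left unaligned.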

10) DiAlign TF -- displays transcription factor (TF) binding site matches within a 'multiple alignment'. It is possible to display all TF binding site matches, TF binding site matches common to all or a subset of the input sequences, or common TF binding site matches that are located in aligned regions.

The TF binding sites are visualized in the alignment as colored boxes.

The input sequences are aligned with the multiple alignment program DiAlign, and TF binding site matches are identified by MatInspector (see above).

11) SMARTest -- is a software tool that utilizes a proprietary library currently containing 97 S/MAR-associated weight matrices to test genomic DNA sequences for the occurrence of potential regions of S/MARs (Scaffold/Matrix Attachment Regions).

12) ExonMapper -- is a software tool that can be used to map your cDNA sequences to genomic databases.

13) PromoterInspector -- is a program that predicts eukaryotic pol II promoter regions with high specificity in mammalian genomic sequences. The program PromoterInspector focuses on the genomic context of promoters rather than their exact location.

Prediction is based on context-specific features previously extracted from training sequences (all mammalian) by a heuristic-free approach. The novel aspect of the PromoterInspector approach is the way features are defined:

Features are defined by 'equivalence classes' of IUPAC groups which allow a fuzzy description of the promoter context. Prediction is based on the analysis of feature frequencies.
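
How an IUPAC-based feature gives a fuzzy description of sequence context, and how its frequency can be measured, is sketched below. The IUPAC nucleotide codes are standard; the example pattern, the function names and the use of a simple per-window frequency are assumptions for illustration. The sketch does not implement equivalence classes or PromoterInspector's prediction method.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, window):
    """True if every base in the window is allowed by the IUPAC code at that position."""
    return all(base in IUPAC[code] for code, base in zip(pattern, window))


def feature_frequency(pattern, sequence):
    """Fraction of sequence windows that match the IUPAC pattern."""
    width = len(pattern)
    windows = len(sequence) - width + 1
    if windows <= 0:
        return 0.0
    count = sum(matches(pattern, sequence[i:i + width]) for i in range(windows))
    return count / windows


# "SSG" matches any window made of C or G, C or G, then G.
frequency = feature_frequency("SSG", "CCGGCGAT")
```

Comparing such frequencies between promoter and non-promoter training sequences is one plausible way to judge whether a feature is informative.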

System Requirements

GEMS Launcher is available as:

An intranet installation

An online account

An evaluation account with registration

Manufacturer

Manufacturer Web Site GEMS Launcher

Price Contact manufacturer.

G6G Abstract Number 20229

G6G Manufacturer Number 101114