Web site and design © 2008-2010 by G6G Consulting Group. All Rights Reserved. Most product content has been taken directly from manufacturer's web
sites; other product content is assembled by G6G Consulting Group. G6G welcomes any corrections and/or comments.
    mGene

    Category  Genomics>Genetic Data Analysis/Tools

    Abstract  mGene is an advanced computational tool for the genome-
    wide prediction of protein coding genes from eukaryotic DNA
    sequences.

    It is based on recent advances in machine learning and uses
    discriminative training techniques, such as Support Vector Machines
    (SVMs) and Hidden Semi-Markov Support Vector Machines (HSMSVMs).

    Its strong performance was demonstrated in an objective competition
    based on the genome of the nematode Caenorhabditis elegans.

    The evaluated developmental version of mGene exhibited the best
    prediction performance (in terms of the average of sensitivity and
    specificity) for the multiple-genome prediction tasks on all four
    evaluation levels (nucleotides, exons, transcripts and genes).
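
    The ranking criterion mentioned above can be sketched in a few lines.
    This is a minimal illustration, not the competition's scoring code: it
    assumes the usual gene-finding convention in which "specificity" means
    TP / (TP + FP), and the counts are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of annotated features that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp: int, fp: int) -> float:
    """Fraction of predictions that are correct: TP / (TP + FP).

    Assumption: this follows the gene-finding convention, which matches
    what other fields call precision.
    """
    return tp / (tp + fp) if (tp + fp) else 0.0


def mean_sn_sp(tp: int, fp: int, fn: int) -> float:
    """Average of sensitivity and specificity."""
    return (sensitivity(tp, fn) + specificity(tp, fp)) / 2.0


# Hypothetical exon-level counts, for illustration only.
score = mean_sn_sp(tp=80, fp=20, fn=10)
print(f"{score:.3f}")
```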

    The manufacturer tackles the 'gene prediction' problem with a
    two (2)-layered approach.

    1) In a first step, state-of-the-art kernel machines are employed to detect
    signal sequences in genomic DNA (like splice sites or transcription
    start sites) and to discriminate the content of different DNA sequences
    (like coding exons, introns, etc.).

    2) In a second step their outputs are combined to predict whole gene
    structures. In this step, the manufacturer uses a discriminative training
    approach based on HSMSVMs.
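
    To make the idea of combining first-layer outputs concrete, here is a
    toy sketch. It is not the HSMSVM method: it is a brute-force search
    that places a single intron so that the summed signal and content
    scores are maximal. The function name, the score lists, and the
    single-intron restriction are all assumptions made for illustration.

```python
def best_single_intron(donor, acceptor, coding, intron):
    """Return (total_score, donor_pos, acceptor_pos) for the best intron.

    donor[i], acceptor[i]: first-layer signal scores at position i.
    coding[i], intron[i]:  first-layer content scores at position i.
    Positions donor_pos..acceptor_pos (inclusive) are labelled intron,
    all remaining positions are labelled coding.
    """
    n = len(coding)
    best = (float("-inf"), -1, -1)
    # Keep at least one coding position on each side of the intron.
    for d in range(1, n - 1):
        for a in range(d, n - 1):
            total = (
                donor[d]
                + acceptor[a]
                + sum(intron[d:a + 1])
                + sum(coding[:d])
                + sum(coding[a + 1:])
            )
            if total > best[0]:
                best = (total, d, a)
    return best


# Hypothetical first-layer outputs for an 8-position sequence.
donor = [0, 0, 2, 0, 0, 0, 0, 0]
acceptor = [0, 0, 0, 0, 0, 3, 0, 0]
coding = [1, 1, -1, -1, -1, -1, 1, 1]
intron = [-1, -1, 1, 1, 1, 1, -1, -1]

result = best_single_intron(donor, acceptor, coding, intron)
print(result)
```

    A real gene finder would replace the exhaustive search with dynamic
    programming over many segment types and would learn how to weight the
    individual scores, which is the part the source attributes to HSMSVMs.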

    The manufacturer offers mGene via two (2) options - 1) Standalone
    Tools for Training and Prediction with mGene and 2) mGene as a web
    service (mGene.web).

    mGene.web --

    mGene.web is a ‘web service’ for the genome-wide prediction of protein
    coding genes from eukaryotic DNA sequences.

    It offers pre-trained models for the recognition of gene structures,
    including untranslated regions in an increasing number of organisms.

    mGene.web additionally allows you to train the system for other
    organisms at the push of a button, a functionality that greatly
    accelerates the annotation of newly sequenced genomes.

    The system is built in a highly modular way, such that individual
    components of the framework, like the promoter prediction tool or the
    splice site predictor, can be used autonomously.

    mGene.web is free of charge, and can be used for eukaryotic genomes
    of small to moderate size (several hundred Mbp).

    mGene.web main features/capabilities include:

    1) Simple one-step procedure to train an ab initio gene predictor for a
    new organism based on a FASTA and a GFF3 (or GTF) file.

    2) Gene prediction for a growing list of organisms from a given FASTA
    file using pretrained mGene instances.

    3) Easy access to the signal predictions, e.g. for splice sites,
    transcription start sites, etc.

    4) Integration of externally provided signal or content predictions/tracks
    into the mGene gene finder.

    5) High accuracy of mGene's gene and signal predictions.

    mGene.web modules --

    The web service (mGene.web) currently provides fourteen (14) core
    modules. These fall into four (4) groups: Data preparation; Signal
    training and prediction; Content training and prediction; and Gene
    structure training and prediction.

    Each tool requires a set of inputs and provides at least one output. They
    are managed by the Galaxy system according to their data types.

    Data preparation --

    GenomeTool - takes a FASTA file of genomic sequences as input and
    creates a Genome Information Object (GIO) that the other mGene
    modules use.

    Additionally, one may create a GIO from an internal database of more
    than 50 genomes.
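
    As a rough picture of the data-preparation step, the sketch below
    reads FASTA text into a dictionary keyed by sequence name. The
    dictionary is only a hypothetical stand-in for a GIO; the actual GIO
    layout is not described here, and read_fasta is an illustrative name
    rather than part of mGene.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {sequence_name: sequence}."""
    genome = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                genome[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # identifier before whitespace
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


# Small in-memory example instead of a real genome file.
example = io.StringIO(">chrI test contig\nACGTAC\ngtaagt\n>chrII\nTTAG\n")
genome = read_fasta(example)
print({name: len(seq) for name, seq in genome.items()})
```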

    Signal training and prediction --

    Anno2SignalLabel - uses an Annotation Gene Structure (AGS) to collect
    labeled genomic positions for the selected genomic signal. Possible
    signals include transcription start and stop sites, translation initiation
    and termination sites, as well as donor and acceptor splice sites.

    Within the regions covered by annotated features, it generates
    negative examples at every consensus position that is not annotated
    as a true site. The output is a file in signal prediction format (SPF)
    providing chromosome/contig name, position, strand, and the label of
    the example.
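
    The labeling rule can be sketched for donor splice sites on the
    forward strand. This is an illustration under stated assumptions: the
    consensus is taken to be the dinucleotide GT, positions are 0-based,
    and the returned tuples merely mirror the four SPF fields named above
    (the real SPF file layout is not specified here).

```python
def label_donor_sites(contig, seq, region, true_sites, consensus="GT"):
    """Return (contig, position, strand, label) rows for the forward strand.

    region:     half-open (start, end) span covered by annotated features.
    true_sites: 0-based positions annotated as real donor sites.
    Annotated sites get label +1; every other consensus occurrence inside
    the region gets label -1.
    """
    start, end = region
    labels = {pos: 1 for pos in true_sites if start <= pos < end}
    pos = seq.find(consensus, start)
    while pos != -1 and pos + len(consensus) <= end:
        labels.setdefault(pos, -1)  # never overwrite an annotated site
        pos = seq.find(consensus, pos + 1)
    return [(contig, p, "+", label) for p, label in sorted(labels.items())]


# Hypothetical sequence with one annotated donor site at position 3.
seq = "ATGGTAAGTCCGTTAGGTACC"
rows = label_donor_sites("chrI", seq, region=(0, 18), true_sites={3})
for row in rows:
    print("\t".join(str(field) for field in row))
```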

    SignalTrain - trains a signal predictor using SVMs with pre-selected
    kernels for each signal. Input is a genome information object (GIO) and
    an SPF file with labeled genomic positions. The output is a trained
    signal predictor (TSP) that can be used with SignalPredict to perform
    predictions on genomic sequences.

    SignalPredict - uses a GIO and TSP to predict signals on the given DNA
    sequences. The output is given in signal prediction format (SPF).

    SignalEval - takes a label file and a prediction file (both SPF files) as
    input and computes several accuracy measures for the predictions,
    including the areas under the Receiver Operating Characteristic (ROC)
    curve and the Precision-Recall Curve (PRC). This tool is useful for
    monitoring prediction quality.
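
    Both measures can be computed from a list of labels and prediction
    scores. The sketch below is not SignalEval itself: it uses the
    pairwise-ranking formulation for the area under the ROC curve and
    average precision as one common estimate of the area under the PRC.
    The example labels and scores are hypothetical.

```python
def area_under_roc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l > 0]
    neg = [s for l, s in zip(labels, scores) if l <= 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Hypothetical labels (+1 true site, -1 decoy) and predictor outputs.
labels = [1, -1, 1, -1, -1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auroc = area_under_roc(labels, scores)
auprc = average_precision(labels, scores)
print(f"auROC={auroc:.3f} auPRC={auprc:.3f}")
```

    Because true signal sites are vastly outnumbered by decoys, the
    precision-recall view is usually the more informative of the two for
    this kind of data.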

    Content training and prediction --

    Anno2ContentLabel - collects labeled genomic segments for the
    selected content types, analogous to Anno2SignalLabel. Possible
    content types include 5' UTR, exonic, intronic, 3' UTR, and intergenic.
    Any included segment that is not of the specified type is used as a
    negative example.

    The output is a file in content prediction format (CPF) providing
    chromosome/contig name, start position, end position, strand, and the
    label of the example.
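
    The content labeling rule is simple enough to show directly. In this
    sketch the input tuples and the function name are hypothetical, and
    the output rows only mirror the five CPF fields listed above rather
    than the actual file layout.

```python
def label_content(segments, wanted):
    """Label each segment +1 if it is of the wanted type, otherwise -1.

    segments: iterable of (contig, start, end, strand, kind) tuples.
    Returns rows of (contig, start, end, strand, label).
    """
    return [
        (contig, start, end, strand, 1 if kind == wanted else -1)
        for contig, start, end, strand, kind in segments
    ]


# Hypothetical annotated segments on one contig.
segments = [
    ("chrI", 0, 50, "+", "5' UTR"),
    ("chrI", 50, 200, "+", "exonic"),
    ("chrI", 200, 350, "+", "intronic"),
    ("chrI", 350, 500, "+", "exonic"),
]
cpf_rows = label_content(segments, wanted="exonic")
for row in cpf_rows:
    print("\t".join(str(field) for field in row))
```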

    ContentTrain - is analogous to SignalTrain, with a GIO and a CPF file
    as inputs and a Trained Content Predictor object (TCP) as output.

    ContentPredict - is analogous to SignalPredict, with a GIO and a trained
    content predictor (TCP) as input and a CPF file as output.

    ContentEval - analogous to SignalEval, takes a label file and a
    prediction file (both CPF files) as input and performs an evaluation.

    Gene structure training and prediction --

    GeneTrain - trains the 'second layer' of mGene.web. Based on the GIO,
    genome-wide predictions for all relevant signals and content types, and
    a set of annotated genes, GeneTrain learns to predict gene structures
    from genomic DNA.

    The output is an internal data structure containing the Trained Gene
    Predictor (TGP) that can be used with GenePredict to predict genes.

    GenePredict - uses the TGP (either from the current history or from a list
    of pre-trained predictors) as well as genome-wide signal and content
    predictions to predict genes from the provided DNA sequences. The
    output is provided as a GFF3 (genome annotation and gene prediction)
    file.

    GeneEval - takes two GFF3 files, one containing an annotation, the
    other the genome-wide gene predictions, and evaluates the prediction
    performance by comparing the two annotations.

    Note that the annotated genes used for evaluation should be distinct
    from those used for training; otherwise the reported figures reflect
    training error rather than performance on unseen genes.

    Evaluation criteria include sensitivity and specificity on nucleotide, exon,
    and gene levels.
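
    A comparison of this kind can be sketched for the nucleotide and exon
    levels. This is not GeneEval's implementation: exons are given as
    hypothetical half-open (start, end) intervals on a single strand
    instead of being read from GFF3, an exon counts as correct only if
    both boundaries match exactly, and "specificity" is assumed to mean
    the fraction of predictions that are correct.

```python
def covered(exons):
    """Set of nucleotide positions covered by half-open (start, end) exons."""
    return {pos for start, end in exons for pos in range(start, end)}


def sn_sp(annotated, predicted):
    """Return (sensitivity, specificity) for two sets of items.

    sensitivity = shared / annotated, specificity = shared / predicted.
    """
    shared = len(annotated & predicted)
    sn = shared / len(annotated) if annotated else 0.0
    sp = shared / len(predicted) if predicted else 0.0
    return sn, sp


# Hypothetical annotation and prediction for one gene region.
annotation = [(100, 200), (300, 400)]
prediction = [(100, 200), (310, 400), (500, 520)]

exon_sn, exon_sp = sn_sp(set(annotation), set(prediction))
nuc_sn, nuc_sp = sn_sp(covered(annotation), covered(prediction))
print(f"exon level:       Sn={exon_sn:.2f} Sp={exon_sp:.2f}")
print(f"nucleotide level: Sn={nuc_sn:.2f} Sp={nuc_sp:.2f}")
```

    The example shows why the levels are reported separately: a
    prediction with one shifted exon boundary still scores well per
    nucleotide but loses the whole exon at the exon level.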

    ComposeMGenePredictor - bundles all necessary trained signal,
    content, and gene predictor objects into a trained mGene predictor that
    can be used with mGenePredict to predict genes.

    DecomposeMGenePredictor - decomposes a trained mGene predictor
    into its components, i.e. the individual predictors.

    System Requirements  

    Web-based; contact the manufacturer for details.

    Manufacturer   

    Rätsch Lab: Machine Learning in Biology
    Friedrich Miescher Laboratory of the Max Planck Society
    Spemannstraße 39
    72076 Tübingen, Germany

    Manufacturer's Web Site   

    http://www.mgene.org/

    Price   Contact manufacturer

    G6G Product Number  20470

    G6G Manufacturer Number 104095