miRanalyzer

Category Cross-Omics>Next Generation Sequence Analysis/Tools and Genomics>Gene Expression Analysis/Profiling/Tools

Abstract miRanalyzer is a web server tool which implements all the necessary methods for a comprehensive analysis of deep-sequencing experiments of small RNA molecules.

It detects known microRNAs annotated in miRBase and maps the sequence reads against other libraries of transcribed sequences (RNA, RFam and RepBase).

Furthermore, miRanalyzer implements a highly accurate machine learning algorithm to predict new microRNAs [Area Under the Curve (AUC) value of 97.9%].

The algorithm is based on the random forest classifier and was trained on experimental data. This high accuracy matters for the identification of novel microRNAs, a task that usually suffers from high false-positive rates.

The tool also includes a Perl script for the proper generation of the input file using the Genome Analyzer (Illumina Inc.) pipeline results.

Currently, miRanalyzer works for seven frequently used model species (human, mouse, rat, fruit fly, roundworm, zebrafish and dog).

miRanalyzer processing/analysis --

miRanalyzer processes small-RNA data obtained from next-generation sequencing platforms such as the Genome Analyzer (Illumina Inc.) or the Genome Sequencer™ FLX (454 Life Sciences™ and Roche Applied Science).

The input consists of grouped sequence reads (sequence tags, or unique reads), typically 16 to 26 bp long depending on the sequencing protocol, together with their expression values (the number of times each unique read was observed in the experiment).
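The grouping of raw reads into unique reads with counts can be illustrated with a minimal Python sketch. This is not the Perl script shipped with the tool; the function name `collapse_reads`, the length window defaults and the tab-separated printout are assumptions made for illustration only.

```python
from collections import Counter

def collapse_reads(reads, min_len=16, max_len=26):
    """Group raw reads into (unique read, count) pairs.

    The 16-26 length window mirrors the typical range mentioned in the
    text; it is an illustrative default, not a documented tool setting.
    """
    counts = Counter(
        r.strip().upper()
        for r in reads
        if min_len <= len(r.strip()) <= max_len
    )
    # Most abundant first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative reads (the last one is too short and is dropped).
raw = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "tgaggtagtaggttgtatagtt",
    "TAGCTTATCAGACTGATGTTGA",
    "ACGT",
]
for seq, count in collapse_reads(raw):
    print(f"{seq}\t{count}")
```

Each output line pairs one unique read with the number of times it occurred, which is the shape of input described above.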

Then, the tool performs several analysis steps and outputs detailed results.

The analysis steps are as follows:

1) Alignment of all reads against the libraries of known mature microRNAs, including the mature-star libraries (the sequences that pair with the mature microRNAs in the secondary structure of the pre-microRNAs).

Additionally, the tool outputs the predicted target genes together with direct links to ontological analyses of the predictions.

2) Mapping against all theoretically possible mature-star microRNAs, including those which are not annotated in miRBase. This allows you to detect the expression of previously undetected mature-star microRNAs.

3) Alignment against other libraries of transcribed sequences. For example, the number of matches found in the transcriptome is inversely related to the RNA sample quality (the fewer the degradation products, the fewer the matches).

Furthermore, matches found in RFam will indicate the expression of other small ncRNA molecules.

RFam - RFam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs).

The primary aim of RFam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs.

4) Prediction of previously unknown microRNAs.

This is important because: 1) experiments can be mined for previously unknown microRNAs and 2) for many species no microRNAs, or only a few, are known, so their detection relies almost entirely on the computational prediction of new microRNAs.
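As a rough illustration of step 1, the following Python sketch matches a read against a small library of mature sequences while tolerating up to two mismatches. It is a simplified stand-in, not the aligner miRanalyzer uses: it compares only equal-length, ungapped sequences, and the names `hamming`, `best_match`, `miR-A` and `miR-B` are placeholders introduced here.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def best_match(read, library, max_mismatches=2):
    """Return (name, mismatches) for the closest library entry, or None.

    Simplifying assumption: only library entries with the same length as
    the read are compared, and no gaps or offsets are considered.
    """
    best = None
    for name, mature in library.items():
        if len(mature) != len(read):
            continue
        d = hamming(read, mature)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (name, d)
    return best

# Placeholder library: names are hypothetical, sequences are illustrative.
library = {
    "miR-A": "TGAGGTAGTAGGTTGTATAGTT",
    "miR-B": "TAGCTTATCAGACTGATGTTGA",
}

print(best_match("TGAGGTAGTAGGTTGTATAGTT", library))  # perfect match
print(best_match("TGAGGTAGTAGGTTGTATAGTA", library))  # one mismatch
print(best_match("ACGTACGTACGTACGTACGTAC", library))  # no match within limit
```

A real aligner would index the library rather than scan it linearly; the scan is kept here because it makes the mismatch threshold easy to see.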

miRanalyzer detailed output pages --

Known microRNAs -

This output is contained in a file in which the microRNAs are sorted by expression level, in descending order. The total expression value of a microRNA is the sum of the copy numbers of all reads that mapped to it.

The table also shows the normalized expression values:

The first value is the expression count normalized to the total number of expressed sequence reads, and the second is the same count normalized to the number of reads that mapped to microRNAs.
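The total and normalized expression values can be sketched as follows. This assumes both normalizations are plain fractions of the raw count (the tool may apply a scaling factor that is not stated here), and the function name `expression_table` and the sample numbers are made up for illustration.

```python
from collections import defaultdict

def expression_table(read_hits, total_read_count):
    """Build rows of (microRNA, total count, norm. to all reads, norm. to mapped reads).

    read_hits: iterable of (microRNA name, read count) pairs, one pair per
    unique read that mapped to a known microRNA.
    total_read_count: sum of the counts of all reads in the experiment.
    """
    totals = defaultdict(int)
    for mirna, count in read_hits:
        totals[mirna] += count  # total expression = sum over mapped reads
    mapped_total = sum(totals.values())
    rows = [
        (mirna, count, count / total_read_count, count / mapped_total)
        for mirna, count in totals.items()
    ]
    rows.sort(key=lambda row: -row[1])  # descending by total expression
    return rows

# Made-up counts: two unique reads map to miR-A, one to miR-B.
hits = [("miR-B", 50), ("miR-A", 120), ("miR-A", 30)]
for row in expression_table(hits, total_read_count=1000):
    print(row)
```

With these numbers, miR-A has a total count of 150, which is 15% of all reads and 75% of the reads that mapped to microRNAs.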

Furthermore, the number of reads and their expression counts are given for perfect matches, 1 mismatch and 2 mismatches.

Note that if the query was limited to perfect matches or 1 mismatch, the columns corresponding to the other options will just show zero (0) counts.

Finally, the last columns are dedicated to putative target genes and ontological analyses. The target genes can be viewed by clicking the corresponding link.

By clicking the “launch AM” link, the target genes are sent automatically to the Annotation-Modules algorithm and the results are depicted in another window.

Once an ontological analysis has been run for a list of target genes, the output page detects that results are available and, after reloading (i.e. pressing F5), replaces the “launch AM” link with the “see AM results” link.

Mapping to other libraries of transcribed sequences -

For each sequence read which maps perfectly to an mRNA, the following information is given: its expression count; the normalized expression based on the count-sum of all sequence reads; the name(s) of the mRNAs (RefSeq identifiers), separated by a “:”; and the number of mRNAs to which the read matched perfectly.
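One row of this output could be assembled as in the sketch below. Only the “:” separator between identifiers comes from the description above; the tab-delimited layout, the six-decimal formatting, the function name `mrna_row` and the identifiers are assumptions or placeholders.

```python
def mrna_row(read, count, total_count, refseq_ids):
    """Format one output row for a read that maps perfectly to mRNAs.

    Columns follow the order given in the text: expression count,
    normalized expression, ':'-joined mRNA names, number of mRNAs.
    The tab delimiter is an assumption, not the tool's documented format.
    """
    ids = sorted(set(refseq_ids))  # de-duplicate and keep a stable order
    return "\t".join([
        read,
        str(count),
        f"{count / total_count:.6f}",
        ":".join(ids),
        str(len(ids)),
    ])

# Placeholder identifiers, not real RefSeq accessions.
row = mrna_row("ACGTACGTACGTACGTAC", 12, 1000, ["NM_placeholder_2", "NM_placeholder_1"])
print(row)
```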

microRNA prediction -

The following information is provided for all the reads that are potential sequences of a new microRNA: the sequence read, the read count, the normalized expression value, the chromosome, the start and end coordinates on the chromosome, the sequence and secondary structure of the new microRNA, and the minimum free energy of the pre-microRNA.

Overlap with repetitive sequences and transposons -

Each sequence read can map in several positions in the genome, and therefore, one sequence read may have several overlaps. For each match, the RepeatMasker annotation is checked. In case of an overlap, the repeat is assigned to the sequence read and written out.
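The overlap check can be sketched in Python as a comparison of genomic intervals. The coordinate convention (1-based, inclusive), the tuple layouts, the function names and the sample intervals are assumptions for illustration; they are not the RepeatMasker file format or the tool's internal representation.

```python
def overlapping_repeats(match, repeats):
    """Names of repeats that overlap one genomic match.

    match:   (chromosome, start, end)
    repeats: list of (chromosome, start, end, repeat name)
    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    chrom, start, end = match
    return [
        name
        for r_chrom, r_start, r_end, name in repeats
        if r_chrom == chrom and r_start <= end and start <= r_end
    ]

def annotate_read(read_matches, repeats):
    """Collect the repeats overlapping any genomic position of one read."""
    found = []
    for match in read_matches:
        for name in overlapping_repeats(match, repeats):
            if name not in found:  # report each repeat once per read
                found.append(name)
    return found

# Made-up annotation and match positions.
repeats = [("chr1", 100, 400, "AluSx"), ("chr2", 50, 90, "L1")]
matches = [("chr1", 390, 411), ("chr2", 200, 221), ("chr3", 5, 26)]
print(annotate_read(matches, repeats))
```

Here only the first match overlaps an annotated repeat, so the read is assigned that single repeat. At genome scale the linear scan would normally be replaced by an interval index.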

Unmapped sequence reads -

Counts and expression values are shown for the sequence reads that were not identified in the previous steps (or were filtered out).

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site miRanalyzer

Price Contact manufacturer.

G6G Abstract Number 20749

G6G Manufacturer Number 104332