Category Genomics>Gene Expression Analysis/Profiling/Tools and Cross-Omics>Next Generation Sequence Analysis/Tools

Abstract mirTools is a web server developed to allow researchers to comprehensively characterize the small RNA transcriptome. With the aid of mirTools, users can:

1) Filter low-quality reads and 3'/5' adapters from raw sequenced data;

2) Align large-scale short reads to the reference genome and explore their length distribution;

3) Classify small RNA candidates into known categories, such as known miRNAs, non-coding RNA, genomic repeats and coding sequences;

4) Provide detailed annotation information for known miRNAs, such as miRNA/miRNA*, absolute/relative read counts and the most abundant tag;

5) Predict novel miRNAs that have not been characterized before; and

6) Identify differentially expressed miRNAs between samples based on two (2) different counting strategies: total read tag counts and the most abundant tag counts.

The manufacturers believe that the integration of multiple computational approaches in mirTools will greatly facilitate current microRNA research.

mirTools Analysis Workflow --

The current mirTools procedure for annotating the small RNA transcriptome from high-throughput sequencing data is as follows:

Read filter --

Briefly, for deep sequencing reads produced by the Illumina Genome Analyzer or the 454 FLX instrument, low-quality reads are filtered out to exclude those most likely to represent sequencing errors.

Subsequently, 3'/5' adapter sequences are trimmed to yield clean full-length reads, which are formatted into a non-redundant FASTA file. Each unique read sequence is counted as a sequence tag, and the number of reads for each tag reflects its relative expression level.
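The filtering, trimming and collapsing steps described above can be sketched in a few lines of Python. This is only an illustration, not the mirTools implementation: the function names, the quality and length thresholds, the restriction to a 3' adapter and the FASTA header layout are all assumptions made for the example.

```python
from collections import Counter

def clean_read(seq, quals, adapter_3p, min_qual=20, min_len=15):
    """Trim a 3' adapter, then reject short or low-quality reads.

    The thresholds are illustrative assumptions, not mirTools defaults.
    Returns the clean read, or None if the read is rejected.
    """
    cut = seq.find(adapter_3p)
    if cut != -1:
        # Keep only the bases (and their qualities) before the adapter.
        seq, quals = seq[:cut], quals[:cut]
    if len(seq) < min_len:
        return None
    if min(quals) < min_qual:
        return None
    return seq

def collapse_to_tags(reads):
    """Count identical clean reads; each unique sequence is one tag."""
    return Counter(reads)

def to_fasta(tag_counts):
    """Write tags as non-redundant FASTA, most abundant first.

    The header layout (tag number plus read count) is hypothetical.
    """
    lines = []
    for i, (seq, count) in enumerate(tag_counts.most_common(), start=1):
        lines.append(f">tag{i}_x{count}")
        lines.append(seq)
    return "\n".join(lines)
```

Keeping the read count in the header means the collapsed file still carries the expression information after redundancy is removed.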

Small RNA annotation --

All unique sequence tags that pass through the above filtering criteria are mapped onto the reference genome using the SOAP2 program (SOAP2 is an improved ultrafast tool for short read alignment).

Subsequently, these unique sequence tags are also aligned against miRBase; Rfam [Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars]; the repeat database produced by RepeatMasker; and the coding genes of the reference genome.

By this process, the unique sequence tags can be classified into the following categories: known miRNA, degradation fragments of non-coding RNA, genomic repeats, and mRNA.

In case of a conflict, a hierarchy is applied to assign the tag to a unique category: non-coding RNA first, then known miRNA, followed by repeat-associated RNA and mRNA.

Sequences that match none of these annotations but can be mapped to the reference genome are classified as ‘unclassified’.
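The conflict-resolution hierarchy described above amounts to a fixed priority order over the annotation categories. A minimal sketch, assuming each tag arrives with the set of categories it matched and a flag for whether it maps to the genome (the names `PRIORITY` and `classify_tag` are hypothetical):

```python
# Priority order as stated in the text: non-coding RNA first,
# then known miRNA, then repeat-associated RNA, then mRNA.
PRIORITY = ["non-coding RNA", "known miRNA", "genomic repeat", "mRNA"]

def classify_tag(hits, maps_to_genome):
    """Assign a tag to exactly one category.

    hits: set of category names the tag matched in the annotation
          databases (an assumed input representation).
    maps_to_genome: True if the tag aligned to the reference genome.
    """
    for category in PRIORITY:
        if category in hits:
            return category
    # No annotation matched: keep the tag only if it maps to the genome.
    return "unclassified" if maps_to_genome else None
```

For example, a tag matching both a known miRNA and an mRNA is reported as a known miRNA, while a genome-mapped tag with no annotation hit becomes ‘unclassified’.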

Differential expression detection --

To identify differentially expressed miRNAs between samples, the read count of each identified miRNA is normalized to the total number of miRNA reads matched to the reference genome in each sample.

The statistical significance (P-value) is inferred using a Bayesian method that was developed for analyzing digital ‘gene expression’ profiles and can account for the sampling variability of tags with low counts.

By default, a specific miRNA is deemed significantly differentially expressed when the P-value given by this method is ≤0.01 and there is at least a 2-fold change in normalized sequence counts.
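The default decision rule can be illustrated as follows. This sketch does not implement the Bayesian test itself; the P-value is taken as an input. The function names, the per-million scaling and the handling of zero counts are assumptions made for the example.

```python
def normalized(count, total_mirna_reads, scale=1_000_000):
    """Normalize a read count to the sample's total miRNA reads."""
    return count * scale / total_mirna_reads

def is_differential(count_a, total_a, count_b, total_b, p_value,
                    p_cutoff=0.01, min_fold=2.0):
    """Apply the P-value and fold-change thresholds to one miRNA."""
    norm_a = normalized(count_a, total_a)
    norm_b = normalized(count_b, total_b)
    if norm_a == 0 or norm_b == 0:
        # Assumption: a miRNA seen in only one sample has infinite fold
        # change; one absent from both samples has no change.
        fold = float("inf") if norm_a != norm_b else 1.0
    else:
        fold = max(norm_a, norm_b) / min(norm_a, norm_b)
    return p_value <= p_cutoff and fold >= min_fold
```

Both conditions must hold: a large fold change with a P-value above the cutoff, or a small P-value with less than a 2-fold change, is not reported as differential.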

Novel miRNA prediction --

Sequences that do not fall into the above annotation categories but map to the reference genome are used to detect candidate novel miRNA genes.

By default, 100 nucleotides of genomic sequence flanking each side of these sequences are extracted and their RNA secondary structures are predicted using RNAfold (RNAfold is an RNA secondary structure server).
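Extracting the flanking sequence is a simple windowing operation on the genome. The sketch below assumes 0-based, end-exclusive coordinates and a chromosome held in memory as a string; the function name is hypothetical, and the folding step itself (RNAfold) is not shown.

```python
def extract_with_flanks(chrom_seq, start, end, flank=100):
    """Return the mapped region plus up to `flank` nt on each side.

    start/end are 0-based, end-exclusive coordinates of the mapped tag.
    The window is clamped at the chromosome ends, so a tag near an edge
    yields a shorter sequence.
    """
    left = max(0, start - flank)
    right = min(len(chrom_seq), end + flank)
    return chrom_seq[left:right]
```

For a 22-nt tag far from the chromosome ends, this yields a 222-nt candidate precursor sequence that could then be passed to a folding program.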

Novel miRNAs are identified by folding the flanking genomic sequence using the miRDeep program.

mirTools Data output --

A typical output of mirTools consists of six (6) parts: length distribution, reference genome mapping, annotation, known miRNAs, novel miRNAs and differentially expressed miRNAs.

All these components are organized with examples that show users the correct input and the expected results.

The first two (2) parts give an overview of the length distribution of miRNAs and their mapping ratios against the reference genome. mirTools plots both the unique read distribution and expression levels (the number of reads for each tag reflects its relative abundance).

This is useful to allow users to easily determine the efficiency of the deep sequencing procedure for miRNA detection and to simultaneously compare length distributions between samples.
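The two length distributions mentioned above (unique tags versus total reads) can be computed from the collapsed tag counts. A minimal sketch, assuming the tags are available as a mapping from sequence to read count; the function name is hypothetical and no plotting is shown.

```python
from collections import Counter

def length_distributions(tag_counts):
    """Return (unique, total) length histograms.

    tag_counts: mapping of tag sequence -> number of reads.
    unique[n] counts distinct tags of length n;
    total[n] counts all reads of length n (the expression level).
    """
    unique = Counter()
    total = Counter()
    for seq, count in tag_counts.items():
        unique[len(seq)] += 1
        total[len(seq)] += count
    return unique, total
```

Comparing the two histograms shows whether a length class is dominated by a few highly expressed tags or by many distinct low-count tags.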

The third part summarizes the percentage of small RNAs classified into different functional categories.

Currently, mirTools assigns each small RNA sequence to one of the following categories: known miRNA, degradation fragments of non-coding RNA (tRNA, rRNA, snRNA/snoRNA, etc.), genomic repeat, mRNA and unclassified.

In the fourth part, mirTools provides a detailed annotation for each known miRNA. On the left side of the output, a table shows known miRNA ID, 5'/3' arm, absolute count, relative counts (normalized to the total number of miRNA reads and then multiplied by 10^6), miRNA sequence and most abundant tags with Tag ID, absolute/relative counts and corresponding tag sequence.

Visual sequence alignments matched to a specific miRNA are listed in tabulated text files on the right side of the output.

In the fifth part, novel miRNAs identified by miRDeep are provided, with the novel miRNA sequence, tag number, tag count and corresponding hairpin structure. The hairpin structure is displayed in SVG format and thus requires an SVG plug-in to be installed on the user's computer.

In the last (sixth) part, the relative expression level of all miRNAs is illustrated in a scatter plot with red dots representing differentially expressed miRNAs.

Also, detailed annotation of these differentially expressed miRNAs is provided, including miRNA ID, relative sequenced count, fold change, up-regulated/down-regulated and P-value.

It should be noted that mirTools employs two (2) different measures to evaluate miRNA expression levels:

One is based on the total tag count (number of specific miRNA tags/number of total miRNA sequence tags) and the other is based on the most abundant tag count (number of the most abundant tags of specific miRNA/number of total miRNA sequence tags).
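The two measures differ only in the numerator, as the following sketch shows. It assumes the tags are already grouped by miRNA and uses the per-million scaling mentioned for relative counts; the function name and input layout are hypothetical.

```python
def expression_measures(tags_by_mirna, scale=1_000_000):
    """Compute both expression measures for every miRNA.

    tags_by_mirna: {miRNA ID: {tag sequence: read count}}, where every
    miRNA has at least one tag (an assumed input representation).
    """
    # Denominator shared by both measures: all miRNA sequence tags.
    total = sum(sum(tags.values()) for tags in tags_by_mirna.values())
    result = {}
    for mirna, tags in tags_by_mirna.items():
        result[mirna] = {
            # All reads assigned to this miRNA.
            "total_tag": sum(tags.values()) * scale / total,
            # Only the reads of its single most abundant tag.
            "most_abundant_tag": max(tags.values()) * scale / total,
        }
    return result
```

For a miRNA with a single dominant tag the two values are nearly identical; they diverge when reads are spread across several length or sequence variants.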

System Requirements

Contact manufacturer.


Manufacturer Web Site mirTools

Price Contact manufacturer.

G6G Abstract Number 20754

G6G Manufacturer Number 104335