DataBase of CpG islands and Analytical Tool (DBCAT)

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract DBCAT (DataBase of CpG islands and Analytical Tool) is a web-based application and methylation database containing several convenient tools for investigating epigenetic regulation in human diseases.

DBCAT was developed to characterize comprehensive DNA methylation profiles in human cancers.

To the manufacturer’s knowledge, DBCAT is one of the first online methylation analytical tools, and is composed of three (3) parts: a CpG island finder, a genome query browser, and a tool for analyzing methylation microarray data.

The analytical tools can quickly identify genes with methylated regions from microarray data, compare the methylation status changes between different arrays, and provide functional analysis in addition to co-localizing transcription factor binding sites.

DBCAT Structure and Implementation --

DBCAT includes three (3) analytical tools that act on methylation microarray data (as stated above):

1) CpG Island Finder;

2) Database and Queries; and

3) Methylation Microarray Data Analyzer.

1) CpG island finder --

CpG island finder is a tool for identifying CpG islands. Gardiner-Garden and Frommer [Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282.] defined CpG islands using three (3) parameters: sequence length, G+C content, and the ratio of observed CpG to expected CpG.

DBCAT uses these three (3) parameters to set the stringency of selection criteria for CpG islands. DNA sequences in the FASTA format are imported into the finder, and the output consists of both text and graphics.
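
The three (3) parameters can be illustrated with a minimal Python sketch. This is not DBCAT's own code (which the manufacturer wrote in PERL); the function names are hypothetical, and the default thresholds (length of at least 200 bp, G+C content above 50%, observed/expected CpG ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer values, assumed here for illustration only. DBCAT's actual defaults may differ.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were independent: (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three criteria; the threshold defaults are assumptions."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp
```

Raising min_gc or min_obs_exp corresponds to the stricter selection that the three (3) parameters allow.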

According to the manufacturer, their CpG island finder algorithm is very fast: it processes 1 million nucleotides in only 10 seconds.

2) Database construction and queries --

To map CpG island regions in the human genome, a database for genes and CpG islands was established.

This database not only serves as a human genome browser for CpG islands, but also supplies information to the algorithm for analyzing microarray data.

Complete human genome sequence files and annotations were downloaded from the UCSC Genome Browser system.

Transcription Start Sites (TSSs) were obtained from DBTSS, a database that contains precise positional information for the TSSs of eukaryotic mRNAs.

Transcription Factor Binding Sites (TFBS) matrices were extracted using TRANSFAC Professional 10.2 software (BIOBASE).

Biological processes, molecular functions, and related human gene pathways were obtained from the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.

Under the original selection criteria defined by Gardiner-Garden and Frommer (1987), many false-positive CpG islands were observed because of the presence of repetitive sequences.

Therefore, a higher stringency for the observed/expected (o/e) CpG ratio and Guanine-Cytosine (GC) content was used to exclude repetitive sequences.

Three (3) query types are provided to the user for browsing human genes: RefSeq ID, Entrez Gene ID, and gene symbol.

3) Methylation microarray data analyzer --

This tool is designed for analyzing data from methylation microarrays. Probe intensities were normalized by Lowess, and a sliding window approach was used to identify the methylated DNA regions.

A methylated DNA region was defined as a segment of length l containing at least d probes, in which more than 50% of the probes had a log2 ratio larger than r.

Note: These three (3) parameters (l, d, and r) must be determined by the user prior to analysis.

As the window moved from probe to probe at each step, overlapping methylated windows were grouped together into a single methylated region.
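
The sliding window procedure can be sketched in Python as follows. This is an illustrative reading of the description above, not DBCAT's implementation: the function name, the (position, log2 ratio) input format, and the choice to treat each window as the half-open interval from a probe's position to that position plus l are all assumptions.

```python
from bisect import bisect_left


def methylated_regions(probes, l, d, r):
    """Return merged (start, end) regions from (position, log2_ratio) probes.

    l: window length, d: minimum probes per window, r: log2 ratio cutoff.
    """
    probes = sorted(probes)
    positions = [pos for pos, _ in probes]
    regions = []
    for i, start in enumerate(positions):  # the window steps from probe to probe
        end = start + l
        j = bisect_left(positions, end, lo=i)  # probes[i:j] fall in [start, end)
        window = probes[i:j]
        if len(window) < d:
            continue
        high = sum(1 for _, ratio in window if ratio > r)
        if high / len(window) <= 0.5:  # require more than 50% of probes above r
            continue
        if regions and start <= regions[-1][1]:
            # Overlaps the previous methylated window: extend that region.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

For example, with l = 200, d = 3, and r = 1.0, two overlapping qualifying windows starting at positions 100 and 150 are reported as one region spanning 100 to 350.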

Many studies have indicated that methylated regions near promoters or transcription start sites may prevent transcription factors from binding to specific DNA sequences and, as a result, markedly reduce the levels of gene expression.

Therefore, the distance between methylated regions and TSSs was also considered in the manufacturer’s analytical tool.
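
One simple way to express such a distance is sketched below in Python. How DBCAT actually measures or weights the distance is not stated here, so the definition used (zero when the TSS lies inside the region, otherwise the gap to the nearer region boundary), the function names, and the gene labels in the usage note are all hypothetical.

```python
def distance_to_tss(region_start, region_end, tss):
    """Distance in bases from a methylated region to a TSS (0 if the TSS is inside)."""
    if region_start <= tss <= region_end:
        return 0
    return min(abs(region_start - tss), abs(region_end - tss))


def nearest_tss(region, tss_by_gene):
    """Return (gene, distance) for the TSS closest to the region.

    tss_by_gene maps a gene label to a TSS coordinate on the same chromosome.
    """
    start, end = region
    gene = min(tss_by_gene, key=lambda g: distance_to_tss(start, end, tss_by_gene[g]))
    return gene, distance_to_tss(start, end, tss_by_gene[gene])
```

With TSSs at 300 for GENE_A and 900 for GENE_B, nearest_tss((400, 500), ...) returns GENE_A at a distance of 100 bases.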

All analyzed results can be stored in the database, and users can compare results from different arrays.

After analysis, genes with different methylation status levels are listed. These genes are classified according to their Gene Ontology (GO) biological processes and molecular functions.

The probe sites, intensities, and transcription factor binding sites are incorporated into the analysis and shown in the genome browser, and the corresponding transcription factors are listed.

Through the graphical display, this tool provides comprehensive information regarding methylated DNA regions and TFBSs, helping users evaluate the relationship between gene methylation and expression profiles.

DBCAT also supports cross-array comparison, which reveals changes in methylation status between different arrays.

For cross-array comparison, users must determine 1) the methylation status change at a single probe measured in log2 scale, and 2) the percentage of all probes from one gene that meet the parameter requirements set in (1).
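
The two user-set parameters can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than DBCAT's method: the function name, the nested dictionary input (gene, then probe ID, then log2 ratio), and the "A"/"B" direction labels are hypothetical.

```python
def compare_arrays(array_a, array_b, min_change, min_fraction):
    """Return {gene: (direction, fraction)} for genes that pass both thresholds.

    min_change: required per-probe change on the log2 scale (parameter 1).
    min_fraction: required share of a gene's probes meeting it (parameter 2).
    """
    results = {}
    for gene in array_a.keys() & array_b.keys():
        shared = array_a[gene].keys() & array_b[gene].keys()
        if not shared:
            continue
        diffs = [array_b[gene][p] - array_a[gene][p] for p in shared]
        toward_b = sum(1 for x in diffs if x >= min_change) / len(diffs)
        toward_a = sum(1 for x in diffs if x <= -min_change) / len(diffs)
        if toward_b >= min_fraction:
            results[gene] = ("B", toward_b)  # more methylated on array B
        elif toward_a >= min_fraction:
            results[gene] = ("A", toward_a)  # more methylated on array A
    return results
```

The direction label plays the role of the bidirectional display: a gene is reported as shifting toward one sample or the other.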

All qualified genes will be listed, and the changes in methylation status of every listed gene are graphically shown. These graphs are bidirectional.

Methylation changes toward either one sample or the other are indicated by arrows with different colors.

DBCAT Implementation --

DBCAT is built on a MySQL database and runs on a Linux system using an Apache web server.

The search engine and results display were written in Perl, as were the algorithms for locating CpG islands and analyzing methylation microarray data.

Currently, DBCAT can only analyze data from Agilent and Illumina methylation microarrays, but it will accommodate other commercial or in-house printed microarray formats in the future.

DBCAT documentation --

The manufacturer provides an extensive User Guide for DBCAT.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site DataBase of CpG islands and Analytical Tool (DBCAT)

Price Contact manufacturer.

G6G Abstract Number 20776

G6G Manufacturer Number 104353