SchizophreniaGene (SZGene) database

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract The SchizophreniaGene (SZGene) database aims to provide a comprehensive, unbiased and regularly updated collection of genetic association studies performed on schizophrenia phenotypes.

Eligible publications are identified following systematic searches of scientific literature databases, as well as the table of contents of journals in genetics and psychiatry.

The database can be searched either through a variety of dropdown menus or by specific keywords. For each gene, summary overviews display key characteristics of each publication, including links to genotype distributions of the polymorphisms studied, random-effects allelic meta-analyses, and funnel plots for an assessment of publication bias.

Database Organization and Methods --

If an association study also included subjects with disorders other than schizophrenia (e.g., bipolar disorder or schizoaffective disorder), generally only the samples fulfilling diagnostic criteria for schizophrenia are included in the database and subsequent analyses, provided they were listed separately in the original publication.

Data selected for display summarize key characteristics of the investigated study cohorts (e.g., gene overview), as well as genotype distributions in cases and controls (e.g., polymorphism details).

For polymorphisms with genotype data in at least four (4) case-control samples, continuously updated random-effects meta-analyses are presented (see Meta-Analysis Methods below).

Note that data obtained from family-based studies are not included in the meta-analyses, as crude odds ratios cannot be readily calculated from overall genotype distributions.

However, these studies and their qualitative results are still listed on the ‘gene-summary’ pages of the SZGene website.

Meta-Analysis Methods --

For all polymorphisms with a minor allele frequency above 1% in healthy controls, and for which case-control genotype data are available in four (4) or more independent samples, crude odds ratios (ORs) and 95 percent confidence intervals (CIs) are calculated for each study from the reported allele distributions.
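The per-study calculation described above can be sketched as follows. This is a minimal illustration, not the SZGene implementation: it assumes the common Woolf (log-based) method for the confidence interval, which the source does not specify, and the function names and genotype counts are hypothetical.

```python
import math


def allele_counts(n_minor_hom, n_het, n_major_hom):
    """Convert genotype counts into (minor, major) allele counts."""
    minor = 2 * n_minor_hom + n_het
    major = 2 * n_major_hom + n_het
    return minor, major


def allelic_odds_ratio(case_genotypes, control_genotypes, z=1.96):
    """Crude allelic OR with a log-based CI (assumed method, z=1.96 for 95%).

    Each argument is a tuple:
    (minor homozygotes, heterozygotes, major homozygotes).
    """
    case_minor, case_major = allele_counts(*case_genotypes)
    ctrl_minor, ctrl_major = allele_counts(*control_genotypes)
    cells = (case_minor, case_major, ctrl_minor, ctrl_major)
    if min(cells) <= 0:
        raise ValueError("every allele count must be positive")

    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    log_or = math.log(odds_ratio)
    # Standard error of log(OR) from the four cells of the 2x2 allele table.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    return {
        "or": odds_ratio,
        "ci_lower": math.exp(log_or - z * se_log_or),
        "ci_upper": math.exp(log_or + z * se_log_or),
        "se_log_or": se_log_or,
    }


# Made-up genotype counts for illustration only (not taken from SZGene).
result = allelic_odds_ratio((30, 100, 70), (20, 90, 90))
print(result)
```

The standard error of the log OR is returned alongside the interval because a random-effects summary and a funnel plot both need it as input.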

Summary ORs and 95 percent CIs are calculated using the DerSimonian and Laird (1986) random-effects model, which utilizes weights that incorporate both within-study and between-study variance.
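As a rough sketch of the DerSimonian and Laird approach named above, the following computes a random-effects summary OR from per-study log ORs and standard errors. The function name, return structure, and input values are assumptions made for illustration; the database's own code is not shown in the source.

```python
import math


def dersimonian_laird(log_ors, std_errors, z=1.96):
    """Random-effects summary of log ORs using the DerSimonian-Laird estimator."""
    k = len(log_ors)
    if k != len(std_errors) or k < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed-effect (inverse-variance) weights use within-study variance only.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w

    # Cochran's Q and the moment estimate of between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    return {
        "summary_or": math.exp(pooled),
        "ci_lower": math.exp(pooled - z * se_pooled),
        "ci_upper": math.exp(pooled + z * se_pooled),
        "tau2": tau2,
        "se_log_or": se_pooled,
    }


# Illustrative inputs only: four hypothetical studies.
summary = dersimonian_laird([0.10, 0.35, -0.05, 0.22], [0.12, 0.20, 0.15, 0.25])
print(summary)
```

When the studies agree closely, tau2 is truncated at zero and the result coincides with the fixed-effect estimate; with heterogeneous studies the weights become more equal and the interval widens.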

This procedure is performed on all studies irrespective of ethnicity (denoted by "All Studies" on the meta-analysis figures), and repeated after exclusion of the initial study ("All Excl Initial Study"), after exclusion of studies in which a deviation from Hardy-Weinberg equilibrium (HWE) was detected in controls ("All Excl HWE Deviations"), and after exclusion of samples of non-Caucasian ancestry ("All Caucasian Studies").
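One way to flag control samples for the HWE-based exclusion is a chi-square goodness-of-fit test on the control genotype counts, sketched below. The source does not say which test or significance threshold is used, so the one-degree-of-freedom chi-square test, the 0.05 cut-off, and all names here are assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium in one sample."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic sample: HWE test is undefined")

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def deviates_from_hwe(control_genotypes, alpha=0.05):
    """True if controls depart from HWE at the assumed threshold alpha."""
    _, p_value = hwe_chi_square(*control_genotypes)
    return p_value < alpha


# Hypothetical control samples: the first fits HWE, the second lacks heterozygotes.
print(deviates_from_hwe((25, 50, 25)))
print(deviates_from_hwe((50, 0, 50)))
```

For small samples or rare alleles an exact test is often preferred over the chi-square approximation; the approximation is used here only to keep the sketch short.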

Overlapping samples (of which usually only the largest is included), studies with missing data, and control samples deviating from HWE are indicated on the meta-analysis graphs.

Please note that when only a few studies are included in the meta-analyses (i.e. fewer than ~10), the random-effects model may yield summary ORs and confidence bounds that are slightly anti-conservative.

To allow a visual assessment of publication bias (or other sorts of reporting bias), the manufacturer uses a Begg modified funnel plot, which depicts the allele-specific OR (on a logarithmic scale) against its standard error for each study (Egger, 1997), including studies of all ethnicities.

Note that the power to detect deviations from a symmetrical distribution is limited, especially for analyses based on fewer than ~20 individual studies.
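The coordinates behind such a funnel plot can be sketched as below. This is an illustration under stated assumptions: the funnel is centred on the inverse-variance weighted mean log OR and drawn with pseudo 95% limits, neither of which is confirmed by the source, and the function name and inputs are hypothetical.

```python
import math


def funnel_plot_points(odds_ratios, std_errors, z=1.96):
    """Return the funnel centre and one point per study.

    Each point holds the log OR, its standard error, and whether the study
    falls inside the pseudo confidence limits centre +/- z * SE.
    """
    if len(odds_ratios) != len(std_errors) or not odds_ratios:
        raise ValueError("inputs must be non-empty and of equal length")

    log_ors = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in std_errors]
    # Assumed centre line: inverse-variance weighted mean of the log ORs.
    centre = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

    points = []
    for y, se in zip(log_ors, std_errors):
        inside = (centre - z * se) <= y <= (centre + z * se)
        points.append({"log_or": y, "se": se, "inside_funnel": inside})
    return centre, points


# Hypothetical studies: two precise ones near OR = 1 and one small outlier.
centre, points = funnel_plot_points([1.0, 1.0, 20.0], [0.1, 0.1, 0.5])
for point in points:
    print(point)
```

Plotting log_or on the horizontal axis against se on an inverted vertical axis gives the usual funnel shape; studies flagged as outside the limits, or a one-sided scatter among the least precise studies, are what a visual check looks for.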

Inclusion of Genome-wide Association (GWA) Analyses --

The manufacturer has devised the following step-wise protocol, which they believe allows them to capture the most relevant genetic information without the need to include every data-point from these studies.

Note that this feature of SZGene is new and still under development.

Please visit the "Overview of all published large-scale and genome-wide association studies in SZ" page to see a summary of all published large-scale studies currently included in SZGene.

SZGene Stage 1 - Represents the inclusion of genes and polymorphisms “featured” or highlighted by the authors of the large-scale study, usually because they show some degree of genetic association after completion of all analyses, e.g. testing multiple independent samples.

These genes and polymorphisms probably represent the most important findings of each large-scale analysis and are therefore included here with highest priority.

This stage has already been implemented in the current version of SZGene (e.g. for the CSF2RA gene featured in the GWA study by Lencz et al. [2007]).

For large-scale/GWA studies that have made their genotype data publicly available, the manufacturer will also make use of “non-featured” genotype distributions, i.e. of polymorphisms not believed to be associated with schizophrenia in the original publications:

SZGene Stage 2 - Will add large-scale/GWA genotype data for polymorphisms already available in SZGene, i.e. usually derived from candidate gene studies published prior to 2007.

Large-scale/GWA data for such overlapping polymorphisms will be added to the gene-specific entries and, if genotype data are then available in a total of at least four (4) independent case-control samples, included and displayed in the meta-analyses.

This stage adds valuable information to the existing SZGene meta-analyses as it is derived from assessments that are largely unbiased with respect to gene function, in contrast to most conventional candidate gene studies. This feature is not yet available in SZGene.

SZGene Stage 3 - Applies to GWA studies only. If genotype distributions are publicly available for multiple GWA scans, the manufacturer will perform systematic meta-analyses for all markers overlapping in at least four (4) independent case-control samples. Only those showing significant summary ORs will be displayed on the SZGene website.

The threshold for declaring statistical significance (and thus for display at the front-end of the database) in this context will be more stringent, due to the large number of tests performed (i.e. P-values of the summary ORs much less than 0.05).

Procedures for implementing this stage and the definition of appropriate threshold criteria are currently under way and will follow guidelines suggested previously (Evangelou, 2007). This feature is not yet available in SZGene.

Summary of Meta-analysis Highlights: The "Top Results" List --

In an effort to facilitate the identification of the most promising meta-analysis results available in SZGene, a continuously updated list displaying the most strongly associated genes ("Top Results") has been added to the manufacturer's homepage.

The list is ranked by effect size, and only includes genes that contain at least one variant showing a nominally significant summary OR in the analysis of all ethnic groups (“All”), or in the analysis limited to samples of Caucasian ancestry (“Caucasian only”).
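The filtering and ranking rule above could be sketched as follows. Several details are assumptions, since the source does not define them: effect size is measured as the absolute log of the summary OR (so that ORs below and above 1 are comparable), nominal significance is taken to mean a 95% CI that excludes 1, and each gene is represented by its strongest qualifying variant. The record layout and gene labels are placeholders, not SZGene data.

```python
import math


def top_results(variants):
    """Rank genes by the effect size of their strongest significant variant.

    Each variant is a dict with keys: gene, summary_or, ci_lower, ci_upper.
    """
    best_per_gene = {}
    for variant in variants:
        # Assumed significance rule: the 95% CI must exclude OR = 1.
        if variant["ci_lower"] <= 1.0 <= variant["ci_upper"]:
            continue
        effect = abs(math.log(variant["summary_or"]))
        gene = variant["gene"]
        if gene not in best_per_gene or effect > best_per_gene[gene][0]:
            best_per_gene[gene] = (effect, variant)

    ranked = sorted(best_per_gene.values(), key=lambda item: item[0], reverse=True)
    return [variant for _, variant in ranked]


# Placeholder records for illustration only.
records = [
    {"gene": "GENE_A", "summary_or": 1.20, "ci_lower": 1.05, "ci_upper": 1.37},
    {"gene": "GENE_B", "summary_or": 0.70, "ci_lower": 0.55, "ci_upper": 0.89},
    {"gene": "GENE_C", "summary_or": 1.50, "ci_lower": 0.90, "ci_upper": 2.50},
]
for entry in top_results(records):
    print(entry["gene"], entry["summary_or"])
```

In this sketch GENE_C is dropped despite having the largest point estimate, because its interval includes 1.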

While the manufacturer believes that this list represents an up-to-date summary of particularly promising schizophrenia candidate genes that warrant follow-up with high priority, it notes that many of these may represent false-positive findings, in particular those based on small (less than 10) sample sizes.

System Requirements



Manufacturer Web Site SchizophreniaGene (SZGene) database

Price Contact manufacturer.

G6G Abstract Number 20478

G6G Manufacturer Number 104103