Sidekick

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract Sidekick is a web-based biological decision-making framework that helps you explore relationships among genes.

Sidekick enables scientists without training in computation and data management to pursue answers to research questions such as “What are the mechanisms of disease X?” or “Does the set of genes associated with disease X also influence other diseases?”

Sidekick supports combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on the evidence, and managing the multi-step research tasks needed to answer such questions.

Sidekick is designed to be an assistant that eases the analysis burden (both computation and bookkeeping) and helps you organize both your results and your belief in their quality.

Initial exploration -- Before going to the wet lab, you can start Sidekick with an initial list of genes or concepts of interest and quickly explore interactions, orthology, and enrichment to focus possible directions for research.

Explanation of results -- After experiments or analyses have produced results, you can explore those results further and develop explanations of what they mean.

Sidekick features/capabilities include:

1) Sidekick uses up-to-date information -- Sidekick draws on core resource sites such as NCBI, NCIBI, and EBI to obtain the most recent available information, and manages this information with the Sidecache caching system. Sidekick clients access web services through Sidecache, which saves web-service results locally and refreshes any result older than a specified age (usually one week).

For frequently used information, or when web services are not available, Sidecache downloads files containing the information from the public website and stores them on the Sidekick server. These files are updated automatically during off-peak hours on a pre-specified schedule that follows the update schedules of the source websites.
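The refresh rule in item 1 behaves like a time-to-live (TTL) cache keyed by request. The sketch below is a minimal illustration under stated assumptions, not Sidecache's actual code: `TTLCache`, the `fetch` callback, and the one-week `max_age` default are hypothetical names chosen for the example, and the scheduled file-download fallback is not modeled.

```python
import time
from typing import Callable, Dict, Tuple

ONE_WEEK = 7 * 24 * 60 * 60  # seconds; "usually a week" per the description


class TTLCache:
    """Hypothetical Sidecache-style cache: reuse a saved web-service result
    until it is older than max_age, then query the source again."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = ONE_WEEK,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch        # calls the remote web service (assumed)
        self._max_age = max_age    # refresh threshold in seconds
        self._clock = clock        # injectable so the behavior can be tested
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (saved_at, result)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]        # still fresh: serve the locally saved result
        result = self._fetch(url)  # missing or stale: refresh from the source
        self._store[url] = (now, result)
        return result


if __name__ == "__main__":
    fake_time = [0.0]
    cache = TTLCache(lambda url: f"fetched at t={fake_time[0]:.0f}",
                     clock=lambda: fake_time[0])
    print(cache.get("example-query"))  # first call goes to the source
    fake_time[0] = ONE_WEEK - 1
    print(cache.get("example-query"))  # served from the local copy
    fake_time[0] = ONE_WEEK + 1
    print(cache.get("example-query"))  # older than a week, so refreshed
```

Injecting the clock keeps the expiry logic deterministic; a production cache would also need persistence and concurrency control, which this sketch leaves out.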

2) Easy workspace management -- Sidekick displays each gene list or gene-pair list in the workspace as an icon. You can save and restore the resulting collections of workflows and annotate each list with additional information.

3) Gene-pair lists -- Gene-pair lists represent arbitrary relationships between genes, such as protein-protein interactions (PPIs) between their products. Sidekick represents gene-pair lists in the workspace as double-rectangle icons.

You can produce gene-pair lists from gene lists using orthology or protein interactions as the relationship, or input your own gene-pair lists representing other types of relationships. You can translate (transitively compose) two gene-pair lists (i.e., A->B and B->C give A->C) or combine pair lists using standard set operations. You can also run enrichment analysis on gene-pair lists.
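The translate operation in item 3 is relational composition, and the set operations apply to pair lists directly. A minimal sketch, assuming pair lists are modeled as Python sets of directed `(left, right)` tuples; `translate` is a hypothetical helper name and the gene identifiers are placeholders, not Sidekick's internal representation.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (left gene, right gene)


def translate(ab: Set[Pair], bc: Set[Pair]) -> Set[Pair]:
    """Compose two pair lists: (a, b) in ab and (b, c) in bc yield (a, c)."""
    right_of: Dict[str, Set[str]] = defaultdict(set)
    for b, c in bc:
        right_of[b].add(c)  # index bc by its left gene for fast lookup
    return {(a, c) for a, b in ab for c in right_of.get(b, set())}


if __name__ == "__main__":
    ab = {("A", "B"), ("A", "X")}
    bc = {("B", "C"), ("B", "D")}
    composed = translate(ab, bc)
    print(sorted(composed))          # [('A', 'C'), ('A', 'D')]
    # Standard set operations apply directly to pair lists.
    other = {("A", "C"), ("E", "F")}
    print(sorted(composed & other))  # intersection: [('A', 'C')]
    print(sorted(composed | other))  # union
    print(sorted(composed - other))  # difference: [('A', 'D')]
```

Pairs whose middle gene has no partner in the second list (here `("A", "X")`) simply drop out of the translation.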

4) Orthology -- Apply orthology to move from one species to another and back again. Once you have a gene list, you can find orthologs using Sidekick. You can use these orthologous gene-pair lists to draw conclusions in one species from data available in another species.

Apply orthology with the Sidekick translation operation:

Use: Combine: Translate (geneA-geneB translate geneB-geneC -> geneA-geneC) to move back and forth between species.
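Drawing conclusions in one species from data in another amounts to following ortholog pairs in one direction or the other. A minimal sketch, assuming an ortholog pair list is a mapping from `(species-1 gene, species-2 gene)` to the percent-identity score; `invert` and `carry_over` are hypothetical helpers, and the gene identifiers and scores in the demo are made up.

```python
from typing import Dict, Set, Tuple

# (gene in species 1, ortholog in species 2) -> percent identity, used as the score
OrthologPairs = Dict[Tuple[str, str], float]


def invert(pairs: OrthologPairs) -> OrthologPairs:
    """Swap the direction of a pair list, e.g. species 1->2 becomes 2->1."""
    return {(b, a): score for (a, b), score in pairs.items()}


def carry_over(pairs: OrthologPairs, hits: Set[str]) -> Dict[str, float]:
    """Return left-hand genes whose ortholog is in `hits`, keeping the best score."""
    best: Dict[str, float] = {}
    for (left, right), score in pairs.items():
        if right in hits:
            best[left] = max(score, best.get(left, score))
    return best


if __name__ == "__main__":
    pairs = {("h1", "m1"): 90.0, ("h2", "m2"): 80.0, ("h2", "m2b"): 85.0}
    # Genes flagged in a species-2 experiment, mapped back to species 1.
    print(carry_over(pairs, {"m2", "m2b"}))      # {'h2': 85.0}
    # The same pair list, inverted, moves species-1 genes into species 2.
    print(carry_over(invert(pairs), {"h1"}))     # {'m1': 90.0}
```

Keeping the highest percent identity when a gene has several orthologs is one reasonable choice; other policies (keep all, average) are equally possible.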

5) Finding interactions -- Once you have a gene list, you can find published interactions with Sidekick and then use the resulting gene-pair list in additional analyses.

a) Use: Query: genes->interacting pair list to create a gene-pair list of interactions.
b) Use: Manage: import NCBI gene-pair id list to import a gene-pair list representing arbitrary gene relationships into Sidekick.
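Turning a gene list into an interacting pair list can be pictured as filtering a table of known interactions. A minimal sketch, assuming interactions are unordered gene pairs and that results are oriented with the query gene on the left; `genes_to_interacting_pairs` is a hypothetical name and the interaction table is a placeholder, not MiMI or NCBI data.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def genes_to_interacting_pairs(genes: Iterable[str],
                               interactions: Iterable[Pair]) -> Set[Pair]:
    """Keep each interaction with at least one partner in `genes`,
    oriented so that the query gene is on the left."""
    query = set(genes)
    result: Set[Pair] = set()
    for a, b in interactions:
        if a in query:
            result.add((a, b))
        if b in query:
            result.add((b, a))
    return result


if __name__ == "__main__":
    known = [("G1", "G2"), ("G3", "G4"), ("G2", "G5")]  # placeholder interaction table
    print(sorted(genes_to_interacting_pairs(["G2"], known)))
    # [('G2', 'G1'), ('G2', 'G5')]
```

When both partners are in the query list, both orientations are kept, so the result can be used directly as the left-hand side of a translation.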

6) Enrichment -- Enrichment analysis determines whether a particular concept appears in a gene list or gene-pair list more frequently than would be expected in a randomly selected subset of the population.

Sidekick enrichment calculations return a score for each concept; for most concept types this is a p-value giving the probability of seeing such overrepresentation by chance. Sidekick supports enrichment based on the following concepts (score type in parentheses):

a) GO gene ontology terms (p-value);
b) disease terms (p-value);
c) gene expression (number of up-regulated minus number of down-regulated experiments); and
d) chromosomal proximity (p-value).
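Item 6 does not name the statistical test behind its p-values. A common choice for this kind of overrepresentation question, assumed here rather than taken from Sidekick, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). `enrichment_p_value` is a hypothetical helper; Sidekick's GO and disease enrichment use a hierarchical parent-child variant rather than this plain form.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       study: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: genes in the background population
    annotated:  population genes carrying the concept
    study:      genes in your list
    hits:       genes in your list carrying the concept
    """
    top = min(annotated, study)
    tail = sum(comb(annotated, k) * comb(population - annotated, study - k)
               for k in range(hits, top + 1))
    return tail / comb(population, study)


if __name__ == "__main__":
    # All 5 genes in the list carry a concept held by 5 of 10 population genes.
    print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.004
```

Real analyses would also correct for testing many concepts at once (for example with a false discovery rate procedure), which this sketch omits.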

7) Confidence and belief -- Sidekick lets you assert confidence or belief in results as you proceed through an analysis. Sidekick uses Dempster-Shafer theory to combine scores and credibilities into a final confidence value (which you are free to ignore if you are skeptical).

For example, NCIBI returns a p-value indicating the significance of its search results. Sidekick displays this p-value in the Score-from-source column of its Results section. The Score-from-source-credibility value indicates how reliable you consider that returned score to be. You can also assign credibility to the source itself and to individual results.
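The description does not say how Sidekick turns a score and a credibility into Dempster-Shafer belief masses. The sketch below assumes one common choice, Shafer's discounting: evidence that supports a result with strength `support` (for example 1 minus a p-value, itself an assumption) is scaled by its `credibility`, and the remainder stays uncommitted. Two such mass functions on the frame {result holds, result does not hold} are then merged with Dempster's rule of combination. `Mass`, `from_evidence`, and `combine` are hypothetical names.

```python
from typing import NamedTuple


class Mass(NamedTuple):
    holds: float   # m({result holds})
    fails: float   # m({result does not hold})
    either: float  # m({holds, does not hold}): uncommitted belief


def from_evidence(support: float, credibility: float) -> Mass:
    """Hypothetical mapping: discount `support` (in [0, 1]) by `credibility` (in [0, 1])."""
    return Mass(credibility * support, credibility * (1.0 - support), 1.0 - credibility)


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination on the two-element frame."""
    conflict = m1.holds * m2.fails + m1.fails * m2.holds
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict  # renormalize after discarding conflicting mass
    holds = (m1.holds * m2.holds + m1.holds * m2.either + m1.either * m2.holds) / norm
    fails = (m1.fails * m2.fails + m1.fails * m2.either + m1.either * m2.fails) / norm
    return Mass(holds, fails, m1.either * m2.either / norm)


if __name__ == "__main__":
    # A p-value of 0.05 read as support 0.95 with credibility 0.8 (assumed values),
    # combined with a second result of support 0.9 and credibility 0.5.
    combined = combine(from_evidence(0.95, 0.8), from_evidence(0.9, 0.5))
    print(f"belief that the result holds: {combined.holds:.3f}")  # 0.860
```

A credibility of 0 produces a vacuous mass (everything uncommitted), which leaves the other evidence unchanged when combined; that is the behavior you would expect from a source you do not trust at all.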

8) Documented workflows -- Sidekick represents your workflows visually with rectangular icons, enabling you to recognize analysis steps and trace the flow of information. You can annotate, save, and restore workflows.

Sidekick Web Sources --

1) Search with NCBI ESummary -- The National Center for Biotechnology Information (NCBI) ESummary web service produces a gene list for a given search term (not necessarily a disease term). In the NCBI search input filter, you can choose disease only, to restrict the search to disease terms, or general, to allow generalized searches.

NCBI does not return a quality measure for its results, so Sidekick assigns a default p-value of 0.05, which you can adjust to reflect stronger or weaker belief.

2) Search with NCIBI’s Gene2MeSH -- NCIBI Gene2MeSH uses a statistical approach to automatically annotate genes with the concepts defined in MeSH, the National Library of Medicine’s controlled vocabulary for biology and medicine.

Gene2MeSH returns a p-value representing the significance of the association, derived from PubMed abstracts, between the input disease and each gene.

3) Gene interactions with MiMI -- MiMI (Michigan Molecular Interactions), from the National Center for Integrative Biomedical Informatics (NCIBI), compiles several publicly available data sources, including:

Biomolecular Interaction Network Database (BIND); Biological General Repository for Interaction Datasets (BioGRID); Center for Cancer Systems Biology Interaction Datasets (CCSB); Database of Interacting Proteins (DIP); Human Protein Reference Database (HPRD); Molecular Interaction Database (IntAct); Unified Human Interactome (MDC); Molecular Interaction Database (MINT); and the Curated Knowledge-Base of Biological Pathways (Reactome).

MiMI web-services return the number of articles that describe a specific interaction.

4) NCBI gene interactions -- NCBI provides bulk transfer of information on genes and their interactions through the Gene database. Sidecache refreshes this data locally based on the update schedule provided by NCBI.

5) Gene orthology -- Orthology maps genes in one species to corresponding genes in another species. Sidekick uses Ensembl orthologous gene lists, with the percent identity between the two orthologs as the score.

6) NCBI GO terms -- Sidekick uses NCBI’s GO term database for its Gene Ontology (GO) term enrichment analysis. NCBI provides the GO annotation for each gene, including an evidence code that indicates how the annotation was made.

IEA (Inferred from Electronic Annotation) denotes annotations based only on computational analysis, which some consider less trustworthy. You can search either Curated Only (No IEA) or All Types (include IEA).

The Score-from-source is the p-value: the probability that a randomly chosen subset would share the GO terms at least as strongly as the gene subset does.

7) NCBI gene chromosomal proximity -- Sidekick uses NCBI’s gene chromosomal location for its chromosomal proximity enrichment analysis. Proximity of genes along the chromosome can indicate functional relationships between genes.

Sidekick retrieves the gene groups that are most enriched for chromosomal proximity, as measured by the number of base pairs separating the genes’ start positions.

The Score-from-source is the p-value: the probability that a randomly chosen subset would show proximity at least as close as that of the gene subset.
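The proximity analysis measures base pairs between gene start positions but does not specify how groups are formed. One simple approach, assumed here and not taken from Sidekick, is single-linkage grouping: sort genes by chromosome and start position, and start a new group whenever the gap to the previous start exceeds a threshold. `proximity_groups` and `max_gap` are hypothetical, and the coordinates in the demo are made up; each resulting group could then be scored for enrichment with a standard test.

```python
from itertools import groupby
from typing import List, Tuple

Gene = Tuple[str, str, int]  # (gene id, chromosome, start position in bp)


def proximity_groups(genes: List[Gene], max_gap: int) -> List[List[str]]:
    """Group genes on the same chromosome whose consecutive start positions
    are at most `max_gap` base pairs apart; singletons are dropped."""
    groups: List[List[str]] = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))  # by chromosome, then start
    for _, on_chrom in groupby(ordered, key=lambda g: g[1]):
        current: List[str] = []
        last_start = None
        for gene_id, _, start in on_chrom:
            if last_start is not None and start - last_start > max_gap:
                groups.append(current)  # gap too large: close the current group
                current = []
            current.append(gene_id)
            last_start = start
        groups.append(current)
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    genes = [("g1", "1", 1_000), ("g2", "1", 5_000), ("g3", "1", 900_000),
             ("g4", "2", 100), ("g5", "2", 200)]
    print(proximity_groups(genes, max_gap=10_000))  # [['g1', 'g2'], ['g4', 'g5']]
```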

8) EBI gene expression -- Sidekick uses the Gene Expression Atlas web service of the European Bioinformatics Institute (EBI), within ArrayExpress, to calculate enrichment based on gene expression. ArrayExpress provides a large corpus of microarray data sets that have been hand-curated and labeled with attributes such as cell type, developmental stage, and disease state, among many others.

You can select general conditions such as disease state or select from the more complete list provided by ArrayExpress.

Sidekick returns the terms that are enriched in your gene list. The Score-from-source is the number of up-regulated experiments minus the number of down-regulated experiments reported by ArrayExpress.
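The expression score is simple arithmetic on experiment counts. A minimal sketch, assuming you already have the up- and down-regulated experiment counts for each condition; `expression_scores`, the condition names, and the counts are hypothetical, and ranking conditions by score is an illustrative choice rather than documented Sidekick behavior.

```python
from typing import Dict, List, Tuple


def expression_scores(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Score each condition as (#up-regulated - #down-regulated experiments),
    ranked from most up-regulated to most down-regulated."""
    scored = [(condition, up - down) for condition, (up, down) in counts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    counts = {"condition A": (12, 3), "condition B": (2, 9), "condition C": (4, 4)}
    print(expression_scores(counts))
    # [('condition A', 9), ('condition C', 0), ('condition B', -7)]
```

A positive score means more experiments report up-regulation than down-regulation for that condition; a score near zero means the evidence is balanced, not necessarily absent.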

Sidekick Additional Info --

1) Sidekick Caching -- Sidekick uses the Sidecache caching system developed by the Visualization and Modeling Laboratory at UTSA.

2) Sidekick Enrichment -- Sidekick uses a modified version of the Grossmann hierarchical enrichment algorithm, based on parent-child analysis, for both GO term enrichment and disease term enrichment (see Sidekick Hierarchical enrichment below).

Sidekick’s chromosomal enrichment calculation uses a standard enrichment algorithm based on the distance between gene start positions (using chromosomal positions provided by NCBI).

Sidekick’s gene expression enrichment calculation uses a standard enrichment algorithm based on the difference between the number of up-regulated and down-regulated experiments reported by ArrayExpress for a given term or condition.

3) Sidekick Hierarchical enrichment -- GO terms and disease terms are not independent; they form a directed acyclic graph in which more specific terms are children of more general parents.

Sidekick uses a novel approach introduced by Grossmann et al. (see the reference below) to detect overrepresentation of GO terms and disease terms using parent-child analysis. The Grossmann method addresses not only the hierarchical nature of these terms but also occurrences of the same term in multiple branches of the graph.

Grossmann et al. paper -- Grossmann S, Bauer S, Robinson PN, Vingron M. Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 2007, 23:3024-3031.
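The parent-child idea can be sketched in a few lines. Sidekick's modifications to the Grossmann method are not described, so the sketch below shows only the basic parent-child-union variant from that paper, under the assumption that annotations follow the true-path rule (a gene annotated to a child term is also annotated to its parents). `hypergeom_tail` and `parent_child_union_p` are hypothetical names, and the demo terms and genes are made up.

```python
from math import comb
from typing import Dict, Set


def hypergeom_tail(n_pop: int, n_annot: int, n_study: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_pop, n_annot, n_study)."""
    top = min(n_annot, n_study)
    tail = sum(comb(n_annot, i) * comb(n_pop - n_annot, n_study - i)
               for i in range(k, top + 1))
    return tail / comb(n_pop, n_study)


def parent_child_union_p(term: str, parents: Dict[str, Set[str]],
                         annotations: Dict[str, Set[str]],
                         population: Set[str], study: Set[str]) -> float:
    """Parent-child (union) p-value for `term`: the background is the set of
    genes annotated to at least one parent, not the whole population."""
    term_parents = parents.get(term, set())
    if term_parents:
        pa_pop = set().union(*(annotations[p] for p in term_parents)) & population
    else:
        pa_pop = set(population)  # root term: fall back to the full population
    pa_study = pa_pop & study
    t_pop = annotations[term] & pa_pop
    t_study = t_pop & study
    return hypergeom_tail(len(pa_pop), len(t_pop), len(pa_study), len(t_study))


if __name__ == "__main__":
    population = {f"g{i}" for i in range(1, 11)}
    annotations = {"P": {f"g{i}" for i in range(1, 7)}, "T": {"g1", "g2", "g3"}}
    parents = {"T": {"P"}}
    study = {"g1", "g2", "g3", "g7"}
    print(parent_child_union_p("T", parents, annotations, population, study))  # 0.05
    print(hypergeom_tail(10, 3, 4, 3))  # term-for-term: 1/30, about 0.033
```

In the demo, term T looks less significant once its parent's annotations are taken into account, because most of the study genes already sit under the parent; this is the kind of inherited signal the parent-child approach is designed to discount.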

System Requirements

Contact manufacturer.

Manufacturer

UTSA Visualization and Modeling Laboratory
Department of Computer Science
University of Texas at San Antonio (UTSA)
One UTSA Circle
San Antonio, TX 78249 USA
Tel: 1-210-458-5543

Manufacturer Web Site Sidekick

Price Contact manufacturer.

G6G Abstract Number 20735

G6G Manufacturer Number 104321

The G6G Directory of Omics and Intelligent Software
