Gene Interaction Network (GIN)
Category Cross-Omics>Data/Text Mining Systems/Tools
Abstract Gene Interaction Network (GIN) is a system for browsing articles and molecule interaction information.
What makes GIN stand out from other similar systems is that it uses automated methods (such as dependency parsing) to mine the text for relevant information (such as protein interactions) and computes statistics for the interaction network.
The user can browse articles with highlighted summary sentences, citing sentences (sentences from other articles that cite the article in question), and interaction sentences.
The user can also browse molecules to view their interactions, neighborhood, and other network statistics.
GIN uses text mining and network analysis methods to predict gene-disease associations.
Disease-specific interaction networks are built by starting with initial lists of "Seed Disease Genes" that are known to be related to a disease.
All the interactions among the seed genes and the genes that interact with them are extracted automatically from the literature.
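The construction step described above can be sketched as follows. This is a minimal illustration under one reading of the description (nodes are the seed genes plus their direct interaction partners, and every extracted interaction between two such nodes is kept). The function name build_disease_network and the placeholder gene names are hypothetical, and the interaction pairs are invented for the example rather than taken from GIN.

```python
def build_disease_network(seeds, interactions):
    """Return the interactions among seed genes and their direct neighbors.

    seeds: iterable of seed gene names.
    interactions: iterable of (gene, gene) pairs, assumed to have been
    extracted from the literature by some upstream text mining step.
    """
    seeds = set(seeds)
    # Treat interactions as undirected and ignore self-interactions.
    edges = {frozenset(pair) for pair in interactions if len(set(pair)) == 2}

    # Step 1: the node set is the seeds plus every gene that touches a seed.
    nodes = set(seeds)
    for edge in edges:
        if edge & seeds:
            nodes |= edge

    # Step 2: keep every interaction whose endpoints are both in the node set.
    return {edge for edge in edges if edge <= nodes}


# Placeholder data, not real biology.
example_interactions = [
    ("SEED1", "GENE_A"),
    ("SEED2", "GENE_B"),
    ("GENE_A", "GENE_B"),   # both are neighbors of a seed, so this is kept
    ("GENE_B", "GENE_C"),   # GENE_C never touches a seed, so this is dropped
    ("GENE_C", "GENE_D"),
]
network = build_disease_network({"SEED1", "SEED2"}, example_interactions)
```

In this toy input the resulting network holds three interactions; GENE_C and GENE_D are left out because neither interacts directly with a seed gene.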
Centrality metrics are used to rank the genes in the constructed disease-specific networks. The hypothesis is that the central genes in these networks ("Inferred Disease Genes") are likely to be related to the diseases.
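As a rough illustration of ranking genes by centrality, the sketch below computes degree centrality and an eigenvector centrality estimate on a small undirected network and returns the top-ranked nodes. It is not GIN's implementation: the helper names, the power-iteration approach, the iteration count, and the toy edges are all assumptions made for the example.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Fraction of the other nodes that each node interacts with."""
    denom = max(len(adj) - 1, 1)
    return {gene: len(nbrs) / denom for gene, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=200):
    """Estimate eigenvector centrality by power iteration.

    Each update adds the node's own score (iterating with A + I), which
    avoids oscillation on bipartite graphs and leaves the leading
    eigenvector unchanged.
    """
    score = {gene: 1.0 for gene in adj}
    for _ in range(iterations):
        new = {g: score[g] + sum(score[n] for n in adj[g]) for g in adj}
        norm = max(new.values())
        score = {g: value / norm for g, value in new.items()}
    return score


def rank(scores, top=20):
    """Genes sorted by descending score; ties are broken by name."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]


# Toy network with placeholder node names.
toy_adj = adjacency([("HUB", "A"), ("HUB", "B"), ("HUB", "C"),
                     ("A", "B"), ("C", "D")])
by_degree = rank(degree_centrality(toy_adj))
by_eigenvector = rank(eigenvector_centrality(toy_adj))
```

Both rankings place HUB first in this toy network, which matches the intuition that a gene with many well-connected partners is central.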
The gene names in GIN are normalized to their official HGNC (HUGO Gene Nomenclature Committee) symbols.
HGNC is in the process of giving a unique and meaningful name to every human gene. For each known human gene, HGNC approves a gene name and a symbol (short-form abbreviation).
All approved symbols are stored in the HGNC database. Each symbol is unique and HGNC ensures that each gene is only given one approved gene symbol. It is necessary to provide a unique symbol for each gene so that HGNC and others can talk about them; it also facilitates electronic data retrieval from publications.
Where possible, each symbol maintains parallel construction across the members of a gene family and can also be used in other species, especially the mouse. HGNC has approved over 26,000 human gene symbols and names.
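Normalizing gene names to approved symbols can be pictured as a lookup from aliases to symbols. The sketch below is only an illustration: the table ALIAS_TO_SYMBOL is a tiny hand-written sample (p53 for TP53 and HER2 for ERBB2 are well-known aliases), the function name normalize is hypothetical, and how GIN actually performs normalization is not described here.

```python
# Hand-written sample table; a real system would load the full HGNC data.
ALIAS_TO_SYMBOL = {
    "TP53": "TP53",
    "P53": "TP53",
    "ERBB2": "ERBB2",
    "HER2": "ERBB2",
}


def normalize(name):
    """Map a gene mention to its approved symbol, or None if unknown.

    Matching is case-insensitive and ignores surrounding whitespace.
    This is a simplification: real mentions need more careful handling.
    """
    return ALIAS_TO_SYMBOL.get(name.strip().upper())


mentions = ["p53", " HER2 ", "TP53", "unknown-gene"]
normalized = [normalize(m) for m in mentions]
```

Mentions that map to the same symbol end up as a single node in the interaction network, which is what makes the per-gene statistics meaningful.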
GIN features/capabilities/options include:
1) Molecule Search -- Used to search for information about a molecule. The search returns a list of molecules whose names contain the given molecule name as a substring. A list of Global Network Statistics is also displayed, showing:
- a) Degree - the number of molecules this molecule interacts with.
- b) Clustering coefficient - a number that describes how well connected this molecule's neighbors are. It is defined as the number of interactions between this molecule's neighbors divided by the number of possible interactions between them.
- c) Disease-Specific Networks, such as Prostate Cancer.
- d) Eigenvector Centrality Percentile;
- e) Degree Centrality Percentile;
- f) Closeness Centrality Percentile; and
- g) Betweenness Centrality Percentile - Each of the four percentiles (d through g) is a number that describes how central this molecule is in the graph of interactions.
- h) MiMI - Listed is a link to extensive information about the molecule via MiMI (see G6G Abstract Number 20311).
- i) Cytoscape - Listed is a link to extensive information about the molecule via Cytoscape (see G6G Abstract Number 20092).
- j) Second Neighbors - a list of Second Neighbor links that can be viewed.
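The degree, clustering coefficient, and second-neighbor entries in the list above follow directly from a molecule's neighborhood. The sketch below works through them on a toy adjacency map. The function names and the placeholder molecules are assumptions for illustration, and treating "second neighbors" as the neighbors of neighbors (excluding the molecule itself and its direct neighbors) is one plausible reading rather than a documented GIN definition.

```python
from itertools import combinations

# Toy undirected network: molecule -> set of interaction partners.
adj = {
    "M": {"A", "B", "C"},
    "A": {"M", "B"},
    "B": {"M", "A"},
    "C": {"M", "D"},
    "D": {"C"},
}


def degree(adj, molecule):
    """Number of molecules this molecule interacts with."""
    return len(adj[molecule])


def clustering_coefficient(adj, molecule):
    """Interactions among the neighbors divided by the possible ones."""
    nbrs = adj[molecule]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbor pairs exist, so report 0 by convention
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def second_neighbors(adj, molecule):
    """Neighbors of neighbors, excluding the molecule and its neighbors."""
    first = adj[molecule]
    second = set()
    for nbr in first:
        second |= adj[nbr]
    return second - first - {molecule}


m_degree = degree(adj, "M")
m_clustering = clustering_coefficient(adj, "M")
m_second = second_neighbors(adj, "M")
```

For molecule M, only one of the three possible neighbor pairs (A and B) interacts, so the clustering coefficient is one third, and D is its only second neighbor.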
2) Article Search -- The search results list links to PubMed article excerpts (highlighted in yellow) that pertain to the interactions of the molecule being searched and of all the other seed genes that interact with it.
3) Disease-Specific Networks -- Some of the molecules available in GIN have been sorted into disease-specific networks. The user clicks on a link to get a list of molecules in each of these networks. The Disease-Specific Network example currently offered is for Prostate Cancer.
4) Seed Disease Genes -- Disease-specific networks are built automatically from the literature by starting with lists of seed genes that are known to be related to a disease. New disease genes are inferred by using centrality metrics.
The user clicks on a link to get a list of seed genes in each of these networks. The Seed Disease Genes example currently offered is for Prostate Cancer.
5) Inferred Disease Genes -- Disease-specific networks are built automatically from the literature by starting with lists of seed genes that are known to be related to a disease. New disease genes are inferred by using centrality metrics.
The user clicks on the link to get a list of inferred disease genes, which are ranked among the top 20 by eigenvector, degree, closeness, or betweenness centrality metrics. The Inferred Disease Genes example currently offered is for Prostate Cancer.
6) Prostate Cancer Case Study -- The manufacturer collected an initial list of seed genes known to be related to a disease (Prostate Cancer) and constructed a disease-specific 'gene-interaction network'. The interactions among the seed genes and their neighbors were extracted automatically from the biomedical literature using Support Vector Machines (SVMs) with the dependency path edit kernel.
Next, the manufacturer used degree, eigenvector, closeness and betweenness centrality metrics to rank the genes in the network according to their relevance to the disease.
The manufacturer hypothesized that the genes that are central in the constructed disease-specific network are likely to be associated with the disease.
The manufacturer evaluated this approach for prostate cancer and showed that the degree and eigenvector centrality metrics achieve highly accurate results (95% of the top 20 genes are actually related to the disease), whereas the closeness and betweenness centrality metrics introduce genes that are not currently known to be related to the disease.
The manufacturer was also able to extract genes that are not marked as being related to prostate cancer by the curated Prostate Gene DataBase (PGDB), even though recent articles confirm the association of these genes with the disease.
This approach can be used to extract known 'gene-disease associations' from the literature, as well as to infer unknown gene-disease associations that are good candidates for experimental analysis.
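The "95% of the top 20 genes" figure in the case study is a precision-at-k measurement: 19 of the 20 top-ranked genes are known to be related to the disease. A minimal sketch of that arithmetic follows. The function name precision_at_k and the placeholder gene lists are hypothetical and do not reproduce the actual evaluation data.

```python
def precision_at_k(ranked_genes, known_disease_genes, k=20):
    """Fraction of the top-k ranked genes found in the known set."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    hits = sum(1 for gene in top if gene in known_disease_genes)
    return hits / len(top)


# Placeholder ranking of 30 genes; 19 of the top 20 are marked as known.
ranked = [f"GENE{i}" for i in range(1, 31)]
known = set(ranked[:20]) - {"GENE7"}
precision = precision_at_k(ranked, known, k=20)
```

A top-ranked gene missing from the known set (GENE7 in this toy data) is not necessarily an error: as noted above, such genes may be candidates that curated resources have not yet recorded.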
System Requirements
GIN is both a web service that integrates data and an application for research. It applies open source software to the problem of supporting the web service.
Manufacturer
GIN is developed by the Computational Linguistics And Information Retrieval (CLAIR) group at the University of Michigan.
The National Center for Integrative Biomedical Informatics (NCIBI) is based at the University of Michigan as part of the Center for Computational Medicine and Biology (CCMB).
Manufacturer Web Site Gene Interaction Network (GIN)
Price Contact manufacturer.
G6G Abstract Number 20313
G6G Manufacturer Number 102857




