PhenoGO
Category Genomics>Genetic Data Analysis/Tools
Abstract PhenoGO is a computationally derived resource intended primarily to provide phenotypic context (cell type, tissue, organ, and disease) for mining existing associations between gene products and GO terms specified in the Gene Ontology databases. Automated natural language processing (BioMedLEE; see below) and computational ontology (PhenOS; see below) methods were used to derive these relationships from the literature.
It includes over 600,000 phenotypic contexts spanning eleven (11) species from five GO annotation databases.
A comprehensive evaluation of the mappings (n = 300) found precision (positive predictive value) of 85% and recall (sensitivity) of 76%.
A web portal is provided, allowing for advanced filtering and querying of the database as well as download of the entire dataset (see below).
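For reference, the precision and recall figures quoted above follow the conventional definitions below. This is only a reminder of the standard formulas; TP, FP, and FN denote true positives, false positives, and false negatives among the evaluated mappings, and the underlying counts are not reported here.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```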
PhenoGO is a multi-organism database that provides phenotypic context to existing associations between gene products and GO terms as specified in the Gene Ontology Annotations (GOA).
Context identifiers are mapped to widely used ‘biological ontologies’, including the Cell Type Ontology (CO), the Unified Medical Language System (UMLS), and the National Library of Medicine's Medical Subject Headings terminology (MeSH), as well as some specialized ontologies such as the Mammalian Phenotype Ontology (MP) and the adult Mouse Anatomy (MA).
This set of ontologies and terminologies allows for the contextualization at multiple scales of biology; mutations in a gene can be analyzed from multiple perspectives, from the resulting disruption of a biological process, and subsequent dysfunction in a cellular context, to changes in anatomy and morphology, and scaling up to the manifest disorder on an organismal level.
The database includes annotations for eleven (11) of the species defined in the National Center for Biotechnology Information (NCBI) taxonomy, including Schizosaccharomyces pombe, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Drosophila sp., Danio rerio, Gallus gallus, Homo sapiens, Bos taurus, Mus musculus, and Rattus norvegicus.
Data sources include GO annotations from the Saccharomyces Genome Database (SGD), WormBase, FlyBase, the Zebrafish Information Network (ZFIN), the European Bioinformatics Institute (EBI), Mouse Genome Informatics at The Jackson Laboratory (MGI), and the Rat Genome Database (RGD).
The integration of knowledge from these heterogeneous sources using established, standardized coding schemes enables broader application of multi-scale systems approaches to the analysis of complex disease and biological processes.
The PhenoGO dataset was developed to facilitate high-throughput mining of experimental, phenotypic, or disease contexts associated with gene-to-GO annotations.
Automated mapping of phenotypes --
The natural language processing (NLP) component of PhenoGO utilizes an existing system, called BioMedLEE, which is under development jointly by the Friedman and Lussier research groups.
The BioMedLEE system is an adaptation of the MedLEE system, which accurately extracts and encodes clinical phenotypic information in patient reports. BioMedLEE extracts and encodes genotype-phenotype relations from information in text.
Computational ontologies --
The Phenotype Organizer System (PhenOS) is a system developed by the Lussier Research Group with the purpose of bridging the gaps among heterogeneous biomedical terminologies.
The system provides lexico-semantic and model-theoretic methods for automatically mapping one ontology to another independently of the UMLS, and organizing and structuring phenotypes across heterogeneous datasets.
PhenoGO web portal --
The PhenoGO database is made publicly accessible through a web portal using Java Server Pages to access an underlying MySQL database.
The web portal provides access and filtering functionality for the database, with two (2) modes of querying the data.
The first is a simple query, presented to users on the front page of the portal.
It allows for a search by all the fields of the database, including PubMed ID, gene accession number, gene name, gene description, GO ID code, GO Term name, phenotype or experimental context code, and phenotype or experimental context description.
This query mechanism is designed to provide users with a large number of results from the database, essentially corresponding to a logical OR query for all the query terms.
An advanced query system is also made available to provide more exact results.
The advanced query allows for searches based on the same fields as the basic interface; however, it focuses on returning sets of results that pass a number of strict criteria.
This equates to a logical AND query between all the search terms specified by the user in specific fields.
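The difference between the two query modes can be illustrated with a minimal sketch. The records, field names, and the functions `simple_query` and `advanced_query` below are hypothetical placeholders chosen for illustration; they are not the portal's actual schema or implementation, which runs against a MySQL database.

```python
from typing import Dict, List

# Hypothetical placeholder records; not real PhenoGO data or field names.
RECORDS: List[Dict[str, str]] = [
    {"gene_name": "geneA", "go_term": "process1", "context": "tissueX"},
    {"gene_name": "geneB", "go_term": "process2", "context": "tissueX"},
    {"gene_name": "geneA", "go_term": "process2", "context": "tissueY"},
]


def simple_query(records: List[Dict[str, str]], terms: List[str]) -> List[Dict[str, str]]:
    """OR semantics: keep a record if ANY term matches ANY field."""
    lowered = [t.lower() for t in terms]
    return [
        r for r in records
        if any(t in value.lower() for t in lowered for value in r.values())
    ]


def advanced_query(records: List[Dict[str, str]], criteria: Dict[str, str]) -> List[Dict[str, str]]:
    """AND semantics: keep a record only if EVERY field-specific criterion matches."""
    return [
        r for r in records
        if all(value.lower() in r.get(field, "").lower()
               for field, value in criteria.items())
    ]


# The simple query returns every record mentioning either term.
broad = simple_query(RECORDS, ["geneA", "tissueX"])

# The advanced query returns only records satisfying both criteria.
strict = advanced_query(RECORDS, {"gene_name": "geneA", "context": "tissueX"})
```

With these placeholder records, the simple query matches all three rows, while the advanced query narrows the result to the single row that has both `geneA` and `tissueX`.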
The interface also makes use of the structured organization of the Gene Ontology, the UMLS, and the Cell Ontology to provide hierarchical query functionality for the GO and context fields.
This is done through the generation of a number of ancestor-descendant tables, which are recursively processed at query time to determine all descendants, or descendants and subclasses, of user-specified contextual or GO terms.
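Hierarchical expansion of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the `EDGES` rows, the term identifiers, and the functions `build_children`, `descendants`, and `expand_query` are all hypothetical, and the traversal is written iteratively with an explicit stack rather than recursively, which yields the same set of descendants.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical (parent, child) rows standing in for an ancestor-descendant table.
EDGES = [
    ("term:root", "term:a"),
    ("term:root", "term:b"),
    ("term:a", "term:a1"),
    ("term:a1", "term:a1x"),
]


def build_children(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index the edge rows by parent for fast lookup of direct children."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    return children


def descendants(term: str, children: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable below `term` (the term itself excluded)."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:  # guards against revisiting shared subterms
                found.add(child)
                stack.append(child)
    return found


def expand_query(term: str, children: Dict[str, Set[str]]) -> Set[str]:
    """Terms to match when a user asks for `term` and everything beneath it."""
    return {term} | descendants(term, children)


CHILDREN = build_children(EDGES)
expanded = expand_query("term:a", CHILDREN)
```

A query for `term:a` would then match annotations tagged with `term:a` itself or with any of its more specific terms.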
Note: PhenoGO is computed using natural language processing and thus some mappings are inaccurate.
System Requirements
Web-based.
Manufacturer
- Center for Biomedical Informatics
- Department of Medicine
- The University of Chicago
- Chicago, IL
- USA
- And
- Department of Biomedical Informatics
- Columbia University
- New York, NY
- USA
Manufacturer Web Site PhenoGO
Price Contact manufacturer.
G6G Abstract Number 20423
G6G Manufacturer Number 104052




