Genomic Metadata for Infectious Agents (GeMInA)

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract GeMInA is an open source web-based pathogen-centric tool designed to provide an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease metadata anchored on the taxonomic ID of the pathogen and host.

The Gemina project has developed a rigorous system of ontological standards that enable the tracking of pathogen-related metadata, based on thorough literature data mining.

The Gemina system enables biomedical, bioforensics, and biodefense users to ask which pathogens and hosts are being affected (Who and What), when and where the incidents are occurring (When and Where), and what diseases and symptoms are being reported in the current month, year or decade.

The Gemina system links unique genomic representations of each pathogen with ontology-regularized metadata for the associated epidemiological information.

Gemina provides a metadata selection query interface to guide identification of the NIAID category A-C viral and bacterial pathogens. It connects the pathogen metadata to a selection tool, the 'Insignia Detection Tool', which calculates unique regions within the genomes of these pathogens to identify DNA signatures.

(Insignia Detection Tool is a computational pipeline that is used to generate unique DNA signatures for any and all pathogens in the manufacturer's database).
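
The idea of a DNA signature as a region unique to one genome can be illustrated with a toy sketch. This is not the Insignia pipeline itself, whose internals are not described here; it simply assumes a signature is a fixed-length substring (k-mer) present in a target sequence and absent from every background sequence. The function names and the short sequences are hypothetical, and reverse complements are ignored for simplicity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(target, background, k):
    """Return k-mers found in target but in none of the background sequences."""
    background_kmers = set()
    for seq in background:
        background_kmers |= kmers(seq, k)
    return kmers(target, k) - background_kmers


# Hypothetical toy sequences, not real genomes.
target = "ACGTACGGA"
background = ["ACGTACGTT", "TTACGGTT"]
signatures = sorted(unique_signatures(target, background, 4))
print(signatures)
```

A real signature pipeline would also have to handle both DNA strands, much longer genomes, and near matches, none of which this sketch attempts.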

Gemina supports the development of DNA signature-based assays for the detection of pathogens or sets of pathogens.

The Gemina database and query web interface provide a greater understanding of the interactions of viral and bacterial pathogens, their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development, metadata curation and representative genomic sequence identification and standards development.

The Gemina system enables users to explore the diversity of outbreak data for each NIAID category A-C pathogen reported in the literature, including the CDC's published Morbidity and Mortality Weekly Reports. These data have been regularized through a set of mature community-adopted ontologies, so users can identify the breadth of hosts and diseases known for these pathogens, see where in the world these pathogens have been reported to occur, and link to the Insignia Detection Tool to calculate the unique regions within the genomes of these pathogens.

Outbreak surveillance reporting sites, such as BioCaster, HealthMap, ProMED-mail and the World Health Organization (WHO) Disease Outbreak News, identify outbreaks in real time from reports contributed by member institutions, news reports, and personal accounts.

RSS feeds and online outbreak reports are rich resources of automated data feeds.

Online reporting site data includes a mixture of suspected and documented outbreak cases and contains unfiltered data that requires additional quality control and data cleanup.

Mining of the online data sets through the filter of controlled vocabularies has provided a rich resource of additional metadata for published outbreak cases.

Gemina and the real-time reporting sites provide complementary resources for outbreak surveillance.

Gemina Database Access --

Query interface - The Gemina database query web interface provides a suite of metadata types as a query selection tool. Users can explore the diversity of infectious pathogens, select pathogens to identify their associated DNA signatures via the Insignia Selection pipeline, or examine a pathogen's current outbreaks and outbreak history along the geospatial axis in Google Earth and Google Maps.

Pathogens can be selected and explored based on their curated infection metadata (host, source, reservoir, disease, transmission method, or symptoms) or the incident metadata (location, date, gender or age).
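
Selection by metadata can be pictured as filtering records on one or more fields. The sketch below is only an illustration under assumed names: the `Incident` record, its fields, and `select_pathogens` are hypothetical and do not reflect Gemina's actual schema or query interface, and the records are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record layout; field names are assumptions.
    pathogen: str
    host: str
    disease: str
    location: str
    year: int


def select_pathogens(records, **criteria):
    """Return sorted pathogen names with a record matching every given field."""
    matches = {
        r.pathogen
        for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    }
    return sorted(matches)


# Placeholder data for illustration only.
records = [
    Incident("pathogen A", "human", "disease X", "country 1", 2005),
    Incident("pathogen B", "cattle", "disease Y", "country 1", 2005),
    Incident("pathogen C", "human", "disease Z", "country 2", 2007),
]

human_pathogens = select_pathogens(records, host="human")
print(human_pathogens)
```

Combining criteria, for example `select_pathogens(records, host="human", year=2005)`, narrows the selection in the same way that choosing several metadata types would.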

The Gemina system provides a standardized set of data types of infectious pathogen information.

This standardization provides the research community with reliable, quality controlled data prepared in a format amenable for data exchange, comparisons and analysis.

Therefore, data collected from diverse studies over many years are comparable. The Gemina system is unique in that it contains the breadth of literature-reported outbreak data for each pathogen, standardized into a uniform format and set of vocabularies.

Gemina's web interface provides a gateway to explore data in user-specified ways.

Data output - The Gemina Search Report results page is pathogen-centric, with the metadata for Infection Systems or Incidents reported in separate columns under the strain name of each pathogen.

Each pathogen result set is presented as a separate block of results, with pathogens presented in alphabetical order from top to bottom.

Gemina internal identifiers (infection system transmission IDs or incident IDs) are shown in column 2.

The report page provides links to reservoir and toxin lists and curated references, along with options to select and submit pathogens to Insignia's DNA Sequence pipeline, download results as a table, or download location data in Keyhole Markup Language (KML) format.

(KML is an XML grammar and file format for modeling and storing geographic features such as points, lines, images, polygons, and models for display in Google Earth, Google Maps and other applications).
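
To show what location data in KML looks like, the following minimal sketch writes one Placemark per incident using Python's standard library. It is an illustration, not Gemina's export code: the function name `incidents_to_kml` and the example incident are hypothetical. The namespace is the standard KML 2.2 one, and KML coordinates are written longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def incidents_to_kml(incidents):
    """Build a KML document string with one Placemark per (name, lat, lon)."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in incidents:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Placeholder incident for illustration only.
kml_text = incidents_to_kml([("Example incident", 38.9, -77.0)])
print(kml_text)
```

Saving the resulting string to a file with a `.kml` extension should let it be opened in Google Earth or other KML-aware applications.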

Location data is viewable for each incident as place names, in Google Maps and Google Earth.

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site GeMInA

Price Contact manufacturer.

G6G Abstract Number 20504

G6G Manufacturer Number 104123