Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract PHIDIAS (Pathogen-Host Interaction Data Integration and Analysis System) is a web-based modular system and centralized resource (database) for biomedical researchers to investigate integrated genome sequences, curated literature information, and gene expression data related to pathogen-host interactions (PHI, also called host-pathogen interactions or HPI) for pathogens with high priority in public health and biological defense.

Infectious diseases remain among the most common and deadly diseases. According to World Health Organization estimates, infectious diseases caused 14.7 million deaths in 2001, accounting for 26% of total global mortality.

Infectious disease results from the interaction between a pathogen and its host. Integrating and analyzing data on pathogens and pathogen-host interactions should yield a better understanding of pathogen-induced infectious diseases and better means of controlling them.

PHIDIAS aims to organize these data and elucidate fundamental PHI insights.

Genomic information from completely sequenced host and pathogen organisms is valuable not only for identifying and reconstructing intra-organismic processes but also for studying interactions between host and microbial organisms.

To facilitate genome analysis and comparison, PHIDIAS integrates genome data from more than 20 sources [e.g., National Center for Biotechnology Information (NCBI), RefSeq, and Swiss-Prot] and provides a genome browser that allows users to browse and compare more than 30 microbial genomes.

PHIDIAS also links publicly available human and mouse genome browsers for users to browse and analyze human and mouse genomes.

Conserved domains are critical for inferring protein functions, which in turn provide important clues to microbial pathogenesis and to interactions between pathogens and hosts.

While NCBI's Conserved Domain Database (CDD) contains conserved domains derived from a wide range of eukaryotic and prokaryotic organisms, it does not readily support comparison and analysis of pathogen-specific conserved domains.

Therefore, the manufacturer built into PHIDIAS the ability to store and search all pathogen-specific conserved domains.
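To illustrate what "storing and searching pathogen-specific conserved domains" involves, the following is a minimal sketch, assuming domain hits have already been computed (e.g., by a CDD search) as (genome, protein, domain) triples. The genome names, protein IDs, and domain accessions are hypothetical placeholders, and the in-memory index is an illustrative stand-in, not how PHIDIAS stores its data.

```python
from collections import defaultdict

# Hypothetical domain hits as (genome, protein_id, domain_accession).
# Genome names, protein IDs, and accessions are placeholders, not Pacodom data.
HITS = [
    ("Pathogen_A", "protA1", "DOM001"),
    ("Pathogen_A", "protA2", "DOM002"),
    ("Pathogen_B", "protB1", "DOM001"),
    ("Host_X", "protX1", "DOM002"),
]
PATHOGEN_GENOMES = {"Pathogen_A", "Pathogen_B"}


def build_domain_index(hits, pathogen_genomes):
    """Map each domain accession to the (genome, protein) pairs carrying it,
    keeping only hits that come from pathogen genomes."""
    index = defaultdict(list)
    for genome, protein, domain in hits:
        if genome in pathogen_genomes:
            index[domain].append((genome, protein))
    return index


def search_domain(index, domain):
    """Return the pathogen proteins that contain `domain`, sorted."""
    return sorted(index.get(domain, []))


index = build_domain_index(HITS, PATHOGEN_GENOMES)
print(search_domain(index, "DOM001"))  # [('Pathogen_A', 'protA1'), ('Pathogen_B', 'protB1')]
```

Because the host hit is filtered out at indexing time, a search for `DOM002` returns only the pathogen protein `protA2`.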

All sequence information is available for comparison and analysis using the manufacturer's customized Basic Local Alignment Search Tool (BLAST) programs.
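BLAST results are commonly post-processed by filtering on e-value. The sketch below assumes, as an illustration only, that results are available in (or exported to) the standard BLAST+ tabular format (`-outfmt 6`, default 12 columns), such as output from a local `blastp -query q.fa -db mydb -outfmt 6` run; it does not describe the output of the manufacturer's customized BLAST programs. The sample lines are made up.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_blast_tabular(text, max_evalue=1e-5):
    """Parse tabular BLAST lines, keep hits with evalue <= max_evalue,
    and return them sorted from lowest to highest e-value."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            rec[key] = float(rec[key])
        if rec["evalue"] <= max_evalue:
            hits.append(rec)
    return sorted(hits, key=lambda r: r["evalue"])


# Made-up example output (two hits for one query), not real PHIDIAS results.
SAMPLE = (
    "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "query1\tsubjectB\t40.00\t150\t80\t5\t10\t160\t20\t170\t0.01\t45.2\n"
)

for hit in parse_blast_tabular(SAMPLE):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

With the default threshold only `subjectA` survives; passing `max_evalue=1.0` keeps both hits.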

A large amount of information about human and animal pathogens has been acquired, stored, and displayed through different resources, both electronically and by other means.

Most electronic resources are formatted in HyperText Markup Language (HTML) and/or Portable Document Format (PDF) files.

While these resources are convenient for viewing and navigation, they do not permit machine-based data transfer and querying.

To allow machine-readable data exchange of the voluminous pathogen information, Dr. Yongqun "Oliver" He and colleagues at the Virginia Bioinformatics Institute (VBI) at Virginia Tech developed the Extensible Markup Language (XML)-based Pathogen Information Markup Language (PIML).

This language represents comprehensive pathogen-oriented information, including pathogen taxonomy, genomic information, life cycle, epidemiology, diseases induced in hosts, diagnosis, treatment, and relevant laboratory analysis.
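The advantage of a markup language over HTML or PDF is that a program can pull out specific fields directly. The minimal sketch below parses a PIML-style fragment with Python's standard XML library. The element and attribute names (`pathogen`, `taxonomy/species`, `genome/accession`, `disease host=...`) are hypothetical and are not taken from the published PIML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical PIML-style fragment. Element and attribute names are
# illustrative assumptions, NOT the published PIML schema.
PIML_DOC = """
<pathogen>
  <taxonomy><species>Brucella melitensis</species></taxonomy>
  <genome><accession>ACCESSION_PLACEHOLDER</accession></genome>
  <disease host="human">brucellosis</disease>
</pathogen>
"""


def parse_piml(xml_text):
    """Extract a few fields from a PIML-like document into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "species": root.findtext("taxonomy/species"),
        "genome_accessions": [e.text for e in root.findall("genome/accession")],
        "diseases": [(e.get("host"), e.text) for e in root.findall("disease")],
    }


print(parse_piml(PIML_DOC))
```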

PIML documents have been created for a list of pathogens and are available through a public VBI web service.

However, compared with relational databases, XML databases offer less efficient querying and poorer scalability. These limitations prompted the manufacturer to design a web-based relational database for general PHI information.

This allows storage, integration, query and data mining of parsed PIML data and other PHI-related information, for example, data related to the pathobiology and management of pathogen-infected laboratory animals from the Hazards in Animal Research Database (HazARD).
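To show why parsed records are easier to query once they sit in relational tables, here is a minimal sketch using SQLite from the Python standard library as a stand-in (the Phinfo module described below is MySQL-based). The two-table schema and the records are hypothetical illustrations, not the actual PhiDB design.

```python
import sqlite3

# Hypothetical parsed records (e.g., the output of a PIML parser).
RECORDS = [
    {"species": "Pathogen A", "diseases": [("human", "disease 1"), ("mouse", "disease 2")]},
    {"species": "Pathogen B", "diseases": [("mouse", "disease 3")]},
]


def build_db(records):
    """Load records into a small two-table schema (hypothetical, not PhiDB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathogen (id INTEGER PRIMARY KEY, species TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE disease (pathogen_id INTEGER REFERENCES pathogen(id), "
        "host TEXT, name TEXT)"
    )
    # An index lets host lookups avoid a full table scan as the data grows.
    conn.execute("CREATE INDEX idx_disease_host ON disease(host)")
    for rec in records:
        cur = conn.execute("INSERT INTO pathogen (species) VALUES (?)", (rec["species"],))
        conn.executemany(
            "INSERT INTO disease (pathogen_id, host, name) VALUES (?, ?, ?)",
            [(cur.lastrowid, host, name) for host, name in rec["diseases"]],
        )
    conn.commit()
    return conn


def pathogens_for_host(conn, host):
    """Return the species linked to at least one disease in `host`."""
    rows = conn.execute(
        "SELECT DISTINCT p.species FROM pathogen p "
        "JOIN disease d ON d.pathogen_id = p.id "
        "WHERE d.host = ? ORDER BY p.species",
        (host,),
    )
    return [row[0] for row in rows]


conn = build_db(RECORDS)
print(pathogens_for_host(conn, "mouse"))  # ['Pathogen A', 'Pathogen B']
```

A single indexed join answers a cross-document question ("which pathogens affect this host?") that would otherwise require opening and scanning every XML document.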

The molecular functions of pathogen and host genes as well as their roles in microbial pathogenesis and host immunity have been extensively studied. However, a systematic collation from the literature of these molecules and their PHI functions is lacking.

Although richly documented in the literature, descriptions of the networks of microbial and host molecular and cellular interactions that occur during pathogenic infections of hosts are underrepresented in current database systems.

PHIDIAS aims to integrate and curate PHI-specific molecules and their interaction networks, both from publicly available databases (e.g., KEGG and MiMi) and through manual curation.

PHIDIAS also incorporates data from MINet, which is based on the XML-based Molecular Interaction Network Markup Language (MINetML, also known as ProNet).

PHIDIAS also includes a program to convert PHI-specific network data into the Biological Pathways Exchange format (BioPAX). Additionally, a ‘network visualization’ tool has been developed to graphically browse PHI-specific networks.
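As a rough illustration of what converting network data into BioPAX involves, the sketch below writes pairwise interactions as a minimal BioPAX Level 3 RDF/XML fragment (`bp:Protein` entities referenced as `bp:participant` of `bp:MolecularInteraction`). It is a simplified assumption of the conversion, not the manufacturer's program; the protein IDs are hypothetical, and the output omits elements (such as `bp:ProteinReference`) that a complete, validated BioPAX file would normally include.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
ET.register_namespace("rdf", RDF)
ET.register_namespace("bp", BP)


def to_biopax(edges):
    """Serialize (protein_a, protein_b) pairs as a minimal BioPAX L3 fragment."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # One bp:Protein per distinct identifier, with a human-readable name.
    for pid in sorted({p for edge in edges for p in edge}):
        prot = ET.SubElement(root, f"{{{BP}}}Protein", {f"{{{RDF}}}ID": pid})
        name = ET.SubElement(prot, f"{{{BP}}}displayName",
                             {f"{{{RDF}}}datatype": XSD_STRING})
        name.text = pid
    # One bp:MolecularInteraction per edge, pointing at both participants.
    for i, (a, b) in enumerate(edges, start=1):
        inter = ET.SubElement(root, f"{{{BP}}}MolecularInteraction",
                              {f"{{{RDF}}}ID": f"interaction_{i}"})
        for p in (a, b):
            ET.SubElement(inter, f"{{{BP}}}participant", {f"{{{RDF}}}resource": f"#{p}"})
    return ET.tostring(root, encoding="unicode")


print(to_biopax([("pathogen_P1", "host_H1")]))
```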

Large-scale experimental techniques such as microarrays and mass spectrometry have generated abundant PHI data previously unavailable to investigators. While a large amount of PHI-related gene expression data is publicly available in various databases, these data are often difficult to query.

PHIDIAS stores information on gene expression experiments related to pathogens and host-pathogen interactions, drawn from public gene expression repositories including NCBI GEO (see G6G Abstract Number 20013) and the European Bioinformatics Institute (EBI) ArrayExpress (see G6G Abstract Number 20012).

PHIDIAS provides a one-stop gateway for querying PHI gene expression data and linking to the original data sources, thereby permitting further analysis.
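The "one-stop gateway" idea (one query over experiment records from several repositories, each result linked back to its source) can be sketched as below. The record fields, the placeholder accessions, and the URL templates are assumptions for illustration; the link templates in particular should be checked against the repositories' current sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionRecord:
    source: str      # "GEO" or "ArrayExpress"
    accession: str   # experiment accession in the source repository
    pathogen: str
    host: str


# Assumed link templates; check them against the repositories' current URLs.
LINK_TEMPLATES = {
    "GEO": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}",
    "ArrayExpress": "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{acc}",
}

# Placeholder records; accessions are made up for illustration.
RECORDS = [
    ExpressionRecord("GEO", "GSE_EXAMPLE_1", "Pathogen A", "human"),
    ExpressionRecord("ArrayExpress", "E-EXAMPLE-1", "Pathogen A", "mouse"),
    ExpressionRecord("GEO", "GSE_EXAMPLE_2", "Pathogen B", "mouse"),
]


def query_experiments(records, pathogen=None, host=None):
    """Filter records by optional pathogen/host and attach a source link."""
    results = []
    for rec in records:
        if pathogen is not None and rec.pathogen != pathogen:
            continue
        if host is not None and rec.host != host:
            continue
        link = LINK_TEMPLATES[rec.source].format(acc=rec.accession)
        results.append((rec.source, rec.accession, link))
    return sorted(results)


for row in query_experiments(RECORDS, pathogen="Pathogen A"):
    print(row)
```

Keeping only metadata plus a link, rather than copying the expression matrices, matches the gateway role described above: detailed analysis happens at the original source.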

PHIDIAS uses online data submission systems to make data curation more efficient and the integrated PHI data more comprehensive.

All PHIDIAS components are scalable. More pathogens and PHI systems may be added to the system in the future.

With an ever-increasing number of pathogens in PHIDIAS and a growing body of published literature, an ongoing challenge will be to curate all significant genes and to keep the PHI-related information in the PHI DataBase (PhiDB) current.

A future direction will be to explore ontology-based natural language processing and statistical methods to promote efficient literature acquisition and curation.

The manufacturer is currently upgrading and implementing a literature mining and curation system (Limix) that the manufacturer originally developed for the Brucella genome annotation in the Brucella Bioinformatics Portal (BBP).

Additionally, a web-based pipeline for PHI gene expression data analysis and modeling will be developed.

The manufacturer anticipates that PHIDIAS will become a system that researchers use to address PHI questions, with the ultimate goal of successfully fighting infectious diseases.

Overall, PHIDIAS includes the following components:

1) PGBrowser – The Pathogen Genome Browser can be used to search and analyze genome features.

2) Pacodom - Pathogen Protein Conserved Domains - stores and allows searching of the conserved domains of proteins from 77 pathogen genomes.

3) BLAST - BLAST search programs with customized BLAST libraries.

4) Phinfo - Pathogen-Host Interaction Information - Phinfo is a MySQL-based module that stores pathogen and pathogen-host interaction information curated from the biomedical literature or other curated data sources (particularly PathInfo, HazARD, MINet, and KEGG). It also allows users to search this information.

5) Phigen - Pathogen-Host Interaction Genes - Phigen is designed to provide annotation for PHI-related pathogen and host genes. It contains the manufacturer’s manually curated PHI gene information.

It also links to all the genes from all the pathogen genomes available in PHIDIAS.

Users are also encouraged to submit manually annotated PHI data for any particular gene through the manufacturer's online submission system.

6) Phinet - Pathogen-Host Interaction Networks - Phinet is aimed at analyzing molecular networks responsible for pathogen-host interactions.

Phinet data is mainly derived from the MINetML XML database extracted through a web service, other curated databases (e.g., KEGG), and local manual annotation based on literature curation.

The manufacturer has also developed a new visualization program to dynamically display Phinet information.

7) Phix - PHI Gene Expression - The Phix module stores all available gene expression experiment records from the NCBI GEO and EBI ArrayExpress databases that relate to a list of priority pathogens and relevant host-pathogen interactions.

These experiment records can be searched using one of the manufacturer's Phix search programs. Phix also supports searching for specific gene profiles from studies in GEO or ArrayExpress involving one or all of the manufacturer’s priority pathogens.
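A common query on an interaction network such as Phinet is "which host molecules lie within a few interaction steps of a given pathogen protein?" The sketch below answers it with a breadth-first search over an undirected graph. The edge list and the `pathogen:`/`host:` node naming are hypothetical, and this is a generic graph technique, not a description of Phinet's internals or its visualization program.

```python
from collections import defaultdict, deque

# Hypothetical interaction edges; "pathogen:" / "host:" prefixes mark the organism.
EDGES = [
    ("pathogen:P1", "host:H1"),
    ("host:H1", "host:H2"),
    ("host:H2", "host:H3"),
]


def build_graph(edges):
    """Undirected adjacency sets from (a, b) interaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighborhood(graph, start, max_hops):
    """Breadth-first search returning {node: hop_distance} for nodes
    within max_hops of start (start itself excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


graph = build_graph(EDGES)
print(neighborhood(graph, "pathogen:P1", 2))  # {'host:H1': 1, 'host:H2': 2}
```

Breadth-first order guarantees each node's recorded distance is its shortest hop count, which is what a "within k steps" neighborhood view needs.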

System Requirements

Contact manufacturer.


Manufacturer Web Site PHIDIAS

Price Contact manufacturer.

G6G Abstract Number 20308

G6G Manufacturer Number 101201