Analysis Tool for Heritable and Environmental Network Associations (ATHENA)

Category Genomics>Genetic Data Analysis/Tools

Abstract Analysis Tool for Heritable and Environmental Network Associations (ATHENA) is a multi-functional analytical tool designed to perform the three (3) main functions essential to determining the genetic architecture of complex diseases:

1) Performing variable selection from categorical or continuous independent variables;

2) Modeling main and interaction effects that best predict categorical or continuous outcome data; and

3) Interpreting the significant models for use in further biomedical research.

ATHENA is unique in its ability to integrate multiple types of phenotype-relevant information [examples: single nucleotide polymorphisms (SNPs), microarray, proteomics, and clinical data] and combine them in a single analysis.

In addition, ATHENA incorporates Biofilter (see below...), which uses publicly available biological domain knowledge to filter out statistical noise in favor of signals with true biological relevance.

Several different strategies that are effective in modeling genetic susceptibility factors are available in the software package [examples: grammatical evolution neural networks (GENN - see below...), regression and classification trees, and support vector machines].

The goal of ATHENA is to support sophisticated, integrated association analyses of genetic, genomic, and other biological data.

Biofilter --

Biofilter is a tool for knowledge-driven multi-SNP analysis of large-scale SNP data. The Biofilter differs fundamentally from other methods in the way knowledge is incorporated into the analysis pipeline.

The Biofilter uses biological information about gene-gene relationships and gene-disease relationships to construct multi-SNP models before conducting any statistical analysis.

Rather than annotating the independent effect of each SNP in a Genome-Wide Association Study (GWAS) dataset, the Biofilter allows explicit detection and modeling of interactions among a set of SNPs.

In this manner, the Biofilter makes it possible to discover significant multi-SNP models that have established biological plausibility, even when the individual SNPs have non-significant main effects.

This approach has the added benefit of reducing both the computational and statistical burden of exhaustively evaluating all possible multi-SNP models.
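The scale of that burden can be illustrated with a quick calculation. The sketch below uses a Bonferroni correction purely as one common way to express the statistical cost of many tests; it is not necessarily what Biofilter or ATHENA applies, and both the panel size and the number of knowledge-supported models are hypothetical.

```python
from math import comb


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_snps = 500_000              # hypothetical GWAS panel size
n_knowledge_models = 10_000   # hypothetical number of knowledge-supported two-SNP models

# Every unordered pair of SNPs is a candidate two-SNP model in an exhaustive scan.
exhaustive = comb(n_snps, 2)

print(f"exhaustive two-SNP models: {exhaustive:,}")  # 124,999,750,000
print(f"Bonferroni threshold (exhaustive): {bonferroni_threshold(exhaustive):.1e}")
print(f"Bonferroni threshold (knowledge-based): {bonferroni_threshold(n_knowledge_models):.1e}")
```

Restricting the tests to knowledge-supported models shrinks both the number of models to fit and the multiple-testing penalty each one must survive.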

Overall, the Biofilter provides a systematic way to assess the level of knowledge-based support for a given genetic model, to produce a ranked list of all possible knowledge-based models, and to statistically test each of these hypotheses in genome-wide association data.

Biofilter is capable of integrating information from several publicly available biological databases in order to assess specific combinations of genetic variations and their effect on the outcome based on prior statistical and biological knowledge.

Specifically, this tool uses the Gene Ontology (GO), the Database of Interacting Proteins (DIP), the Protein Families Database (Pfam), the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome (see G6G Abstract Number 20267), NetPath, and the Genetic Association Database (GAD) (see G6G Abstract Number 20314) to construct two-SNP models that are supported by the biological literature.

The degree of literature support for each model is characterized by an implication index, which counts how many times a relationship between a pair of genes appears across the databases incorporated into Biofilter.
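The implication index can be illustrated with a short Python sketch. It assumes that each source database is represented as a list of gene groups (for example, pathways, protein families, or interaction pairs), that two genes are related in a database if they share a group there, and that each database adds at most one to a pair's count. The database contents, gene names, and SNP names are hypothetical placeholders; this is not Biofilter's implementation.

```python
from collections import Counter
from itertools import combinations


def implication_index(databases):
    """Count, for each unordered gene pair, how many databases relate the two genes.

    Assumption of this sketch: a pair is related in a database if both genes
    appear in the same group, and each database contributes at most 1.
    """
    index = Counter()
    for groups in databases.values():
        pairs_in_db = set()
        for group in groups:
            for a, b in combinations(sorted(set(group)), 2):
                pairs_in_db.add((a, b))
        index.update(pairs_in_db)
    return index


def rank_snp_pairs(index, snp_to_gene):
    """Turn gene-pair support into a ranked list of (support, snp, snp) models."""
    ranked = []
    for (snp1, gene1), (snp2, gene2) in combinations(sorted(snp_to_gene.items()), 2):
        if gene1 == gene2:
            continue
        support = index[tuple(sorted((gene1, gene2)))]  # Counter returns 0 if absent
        if support > 0:
            ranked.append((support, snp1, snp2))
    ranked.sort(key=lambda t: (-t[0], t[1], t[2]))
    return ranked


# Hypothetical toy knowledge sources.
databases = {
    "KEGG": [["GENE_A", "GENE_B", "GENE_C"]],
    "Reactome": [["GENE_A", "GENE_B"]],
    "DIP": [["GENE_B", "GENE_C"]],
}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_B", "snp3": "GENE_C"}

index = implication_index(databases)
ranked = rank_snp_pairs(index, snp_to_gene)
print(ranked)  # [(2, 'snp1', 'snp2'), (2, 'snp2', 'snp3'), (1, 'snp1', 'snp3')]
```

Sorting by descending support yields the kind of ranked list of knowledge-based two-SNP models that can then be tested statistically.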

Grammatical Evolution Neural Networks (GENN) --

GENN is a variation on Genetic Programming of Neural Networks (GPNN), a technique that uses genetic programming to optimize neural networks for classification and for identifying gene-gene interactions.

The main difference is that in GPNN, evolutionary operators such as crossover and mutation act directly on the neural network, whereas in GENN, evolution occurs at the level of a binary string that is later translated into a neural network (NN) using a set of rules, or grammar.

GENN applies grammatical evolution (GE) to optimize neural nets for detection and modeling of gene-gene interactions.

Grammatical evolution (GE) is an evolutionary algorithm that uses linear genomes and grammars to define the population. In GE, each individual consists of a binary genome divided into codons. Mutation acts on individual bits, but crossover takes place only between codons, so codons are exchanged whole.
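A minimal Python sketch of a GE genotype and its two operators follows. It assumes 8-bit codons and single-point crossover whose cut point is restricted to codon boundaries; ATHENA's actual codon width and crossover scheme are not specified here, and all function names are hypothetical.

```python
import random

CODON_BITS = 8  # assumed codon width for this sketch


def random_genome(n_codons, rng):
    """A genome is a flat list of bits, n_codons * CODON_BITS long."""
    return [rng.randint(0, 1) for _ in range(n_codons * CODON_BITS)]


def codons(genome):
    """Split the bit string into integer codons (most significant bit first)."""
    values = []
    for start in range(0, len(genome), CODON_BITS):
        value = 0
        for bit in genome[start:start + CODON_BITS]:
            value = (value << 1) | bit
        values.append(value)
    return values


def bit_mutation(genome, rate, rng):
    """Mutation acts on individual bits: flip each bit with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in genome]


def codon_crossover(parent1, parent2, rng):
    """One-point crossover whose cut point always falls between two codons."""
    n_codons = min(len(parent1), len(parent2)) // CODON_BITS
    cut = rng.randint(1, n_codons - 1) * CODON_BITS
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


rng = random.Random(42)
parent1 = random_genome(4, rng)
parent2 = random_genome(4, rng)
child1, child2 = codon_crossover(parent1, parent2, rng)
mutant = bit_mutation(child1, 0.01, rng)
print(codons(parent1), codons(parent2), codons(child1), codons(mutant))
```

Because the cut never splits a codon, every codon in a child is copied intact from one parent; only mutation can change a codon's value.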

An individual or phenotype is produced by translating the codons using the grammar. The resulting individual can then be tested for fitness in the population and the usual evolutionary operators can be carried out.
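In standard GE, each codon selects a production rule for the leftmost unexpanded non-terminal as (codon value) mod (number of choices for that non-terminal), wrapping back to the start of the genome if codons run out. The Python sketch below applies that rule to a toy arithmetic grammar; the grammar, the wrap limit, and the function name are hypothetical, and ATHENA's own grammars (which build neural networks) are not reproduced here.

```python
# Hypothetical toy grammar: each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x1"], ["x2"], ["x3"]],
}


def map_genome(codon_values, grammar, start="<expr>", max_wraps=2):
    """Translate integer codons into a phenotype string, or None if mapping fails."""
    symbols = [start]
    output = []
    used = 0
    limit = len(codon_values) * (max_wraps + 1)  # codons available including wraps
    while symbols:
        symbol = symbols.pop(0)  # always expand the leftmost symbol
        if symbol not in grammar:
            output.append(symbol)  # terminal: emit it
            continue
        rules = grammar[symbol]
        if len(rules) == 1:
            choice = 0  # no decision to make, so no codon is consumed
        else:
            if used >= limit:
                return None  # ran out of codons even after wrapping: invalid individual
            choice = codon_values[used % len(codon_values)] % len(rules)
            used += 1
        symbols = list(rules[choice]) + symbols
    return " ".join(output)


print(map_genome([0, 1, 0, 2, 1, 1], GRAMMAR))  # x1 * x2
print(map_genome([0], GRAMMAR))                 # None (keeps expanding <expr>)
```

Because the grammar is plain data, removing "*" from `<op>` or adding a new operator changes which phenotypes can be produced without touching `map_genome`, which mirrors the point that GENN's behavior can be altered by editing only the grammar file.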

By using a grammar to define the phenotype, GE separates the genotype from the phenotype and allows greater genetic diversity within the population than other evolutionary algorithms.

Since GENN uses a grammar to define the structure of the resulting neural network (NN), one can easily vary the behavior of the program with changes to the grammar.

In GPNN, the genetic programming (GP) was constrained so that only valid neural networks could be produced, and any change in behavior required changes to the code.

In GENN, the constraints are provided by the grammar itself and can be changed without modifying the code. For example, Boolean operators can be added or removed by changing only the grammar file used as input to the program.

In addition, GPNN uses a binary tree for the genome, so each node can have only two (2) inputs. In GENN, the grammar allows the algorithm to select multiple connections between nodes.

Variable numbers of connections allow more complicated neural networks to be evolved, potentially making GENN more flexible than GPNN.
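To make the difference concrete, here is a hypothetical node representation in Python in which a neuron accepts any number of weighted inputs, sums them, and applies a sigmoid. The representation and the sigmoid activation are assumptions for illustration, not GENN's actual node definitions. The three-input neuron below cannot be expressed as a single node in a strictly binary tree, where each node has exactly two inputs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def evaluate(node, sample):
    """Evaluate a hypothetical network node on one sample.

    A node is either an input variable name (str) or a neuron, given as a list
    of (weight, child) pairs of any length; the neuron returns
    sigmoid(sum of weight * child_output).
    """
    if isinstance(node, str):
        return sample[node]
    return sigmoid(sum(weight * evaluate(child, sample) for weight, child in node))


# One neuron with three incoming connections.
neuron = [(1.0, "x1"), (-1.0, "x2"), (0.5, "x3")]
# A second neuron that takes the first neuron's output plus one raw input.
network = [(2.0, neuron), (0.0, "x3")]

sample = {"x1": 1, "x2": 1, "x3": 0}
print(evaluate(neuron, sample))   # 0.5, i.e. sigmoid(0.0)
print(evaluate(network, sample))  # sigmoid(1.0), about 0.731
```

The list of connections can be any length, so a grammar that generates a variable-length input list can express wider nodes directly instead of nesting binary ones.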

GENN is also computationally efficient, because evolution takes place on a simple binary string rather than on an entire neural network; this separation of genotype from phenotype also increases genetic diversity.

GENN uses evolutionary computation (EC) to optimize neural networks and detect genetic models associated with a particular phenotype. This is not a novel concept; EC has been used to evolve NNs in other fields, such as aircraft simulation and robot path planning.

A key benefit of EC is that it does not require a priori variable selection or architecture definition; instead, it allows the user to optimize weights, inputs, and architecture simultaneously.

Because the underlying model differs from one disease to another, widening the search space to include all possible networks is advantageous.
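The idea of optimizing inputs and weights together can be illustrated with a deliberately small (1+1) evolutionary algorithm. This is a swapped-in technique, not GENN: the candidate is a single neuron, and each mutation either adds or removes an input (changing which variables the model uses) or perturbs a weight or the bias. Fitness is classification error on a hypothetical toy data set. Thresholding a sigmoid output at 0.5 is equivalent to testing whether the weighted sum is positive, which is what `predict` does. All names and parameters are illustrative assumptions.

```python
import random


def predict(model, sample):
    """model = {"bias": b, "weights": {var: w}}; only listed variables are inputs."""
    z = model["bias"] + sum(w * sample[v] for v, w in model["weights"].items())
    return 1 if z > 0 else 0  # same decision as sigmoid(z) > 0.5


def classification_error(model, data):
    return sum(predict(model, x) != y for x, y in data) / len(data)


def mutate(model, variables, rng):
    child = {"bias": model["bias"], "weights": dict(model["weights"])}
    move = rng.random()
    if move < 0.3:  # structural move: add or remove one input
        v = rng.choice(variables)
        if v in child["weights"]:
            del child["weights"][v]
        else:
            child["weights"][v] = rng.gauss(0.0, 1.0)
    elif move < 0.9 and child["weights"]:  # perturb an existing weight
        v = rng.choice(sorted(child["weights"]))
        child["weights"][v] += rng.gauss(0.0, 0.5)
    else:  # perturb the bias
        child["bias"] += rng.gauss(0.0, 0.5)
    return child


def evolve(data, variables, generations, seed=0):
    """(1+1)-EA: keep the child whenever it is at least as fit as the parent."""
    rng = random.Random(seed)
    best = {"bias": 0.0, "weights": {}}
    best_err = classification_error(best, data)
    history = [best_err]
    for _ in range(generations):
        child = mutate(best, variables, rng)
        err = classification_error(child, data)
        if err <= best_err:
            best, best_err = child, err
        history.append(best_err)
    return best, history


variables = ["x1", "x2", "x3"]
# Hypothetical toy data: outcome is 1 only when x1 and x2 are both 1; x3 is noise.
data = [({"x1": a, "x2": b, "x3": c}, int(a == 1 and b == 1))
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
best, history = evolve(data, variables, generations=200)
print(best, history[0], history[-1])
```

Because no inputs or weights are fixed in advance, the search decides which variables enter the model at the same time as it tunes their weights.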

GENN can outperform back-propagation and random search strategies in finding gene-gene interactions in simulated data sets.

ATHENA Additional Information --

For additional information on Grammatical Evolution Neural Networks (GENN), Biofilter, and ATHENA, see: Stephen D Turner, Scott M Dudek, and Marylyn D Ritchie. ATHENA: A knowledge-based hybrid Backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci. BioData Mining 2010, 3:5. doi:10.1186/1756-0381-3-5. You can also contact the manufacturer via its website.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site ATHENA

Price Contact manufacturer.

G6G Abstract Number 20681

G6G Manufacturer Number 104259