## PRojection Onto the Most Interesting Statistical Evidence (PROMISE)

** Category** Genomics>Gene Expression Analysis/Profiling/Tools

** Abstract** PRojection Onto the Most Interesting Statistical Evidence (PROMISE) is a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables.

Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable.

A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics.

By definition, this test statistic is proportional to the length of the projection of the observed vector of association statistics onto the vector of most interesting values. Statistical significance is determined via permutation.
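The statistic and its permutation test can be sketched for a single genomic variable as follows. This is a minimal illustration, not the manufacturer's implementation: the use of Pearson correlation as the association statistic, the function names, the toy data, and the choice to permute the endpoint rows together (so that the relationships among endpoints are preserved) are all assumptions made for the example.

```python
import numpy as np

def association_stats(gene, endpoints):
    """Pearson correlation of one genomic variable with each endpoint column (assumed statistic)."""
    g = (gene - gene.mean()) / gene.std()
    e = (endpoints - endpoints.mean(axis=0)) / endpoints.std(axis=0)
    return (g[:, None] * e).mean(axis=0)

def promise_stat(gene, endpoints, interesting):
    """Dot product of the observed association statistics with the 'most interesting' vector."""
    return float(association_stats(gene, endpoints) @ interesting)

def permutation_pvalue(gene, endpoints, interesting, n_perm=1000, seed=0):
    """One-sided permutation p-value; endpoint rows are shuffled jointly across subjects."""
    rng = np.random.default_rng(seed)
    observed = promise_stat(gene, endpoints, interesting)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(gene))
        if promise_stat(gene, endpoints[perm], interesting) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): one gene, two endpoints with opposite associations.
rng = np.random.default_rng(1)
n = 60
gene = rng.normal(size=n)
endpoints = np.column_stack([
    gene + rng.normal(size=n),    # endpoint positively associated with the gene
    -gene + rng.normal(size=n),   # endpoint negatively associated with the gene
])
interesting = np.array([1.0, -1.0])  # pattern of interest: positive with the first, negative with the second

stat = promise_stat(gene, endpoints, interesting)
observed_vector = association_stats(gene, endpoints)
# Signed length of the projection of the observed vector onto the interesting vector.
projection_length = (observed_vector @ interesting) / np.linalg.norm(interesting)
p_value = permutation_pvalue(gene, endpoints, interesting)
```

Here `stat` equals `projection_length` multiplied by the length of `interesting`, which is the proportionality described above.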

In simulation studies, PROMISE has shown greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses, or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis.

PROMISE performs one hypothesis test for the specified pattern for each genomic variable. Additionally, PROMISE is a flexible procedure that can accommodate various types of endpoints, which classical multivariate procedures are not designed to handle.

Approaches Other Than PROMISE --

1) There are other approaches that could be taken to identify genomic variables that exhibit a specific pattern of association with multiple endpoints.

A seemingly straightforward approach would be to screen the association of the genomic variables with each endpoint individually and then identify genes that are significant in each analysis and have the desired pattern of association.

This approach is problematic because it lacks statistical power, and the results are difficult to interpret statistically. The analysis for each endpoint involves multiple testing.

After adjusting each endpoint’s results for multiple testing, it is quite likely that no gene will meet the criteria for inclusion as a ‘significant’ result because it is unlikely that a genomic variable could meet the stringent P-value threshold for each endpoint.

Additionally, with this approach, if any genes are identified, it would be difficult to assign a meaningful false discovery rate (FDR) estimate to the result.

For example, what FDR estimate should be ascribed to a set of genes that are inferred to have the association pattern of interest if these genes are selected because they meet a certain FDR or P-value threshold in several single-endpoint analyses?
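The stringency of the endpoint-by-endpoint screen can be illustrated with a small sketch. It assumes a Bonferroni adjustment within each endpoint (the source does not name a specific adjustment), and the function name, gene count, and P-values are hypothetical.

```python
import numpy as np

def significant_in_every_endpoint(pvals, alpha=0.05):
    """Return indices of genes that pass a per-endpoint Bonferroni threshold in ALL endpoints."""
    n_genes = pvals.shape[0]
    threshold = alpha / n_genes          # Bonferroni threshold within one endpoint
    passes = pvals <= threshold          # genes x endpoints boolean matrix
    return np.flatnonzero(passes.all(axis=1)), threshold

# Hypothetical P-value matrix: 10,000 genes by 3 endpoints, mostly null.
n_genes, n_endpoints = 10_000, 3
rng = np.random.default_rng(0)
pvals = rng.uniform(size=(n_genes, n_endpoints))
pvals[0] = [1e-4, 1e-4, 1e-4]   # consistent, moderately strong evidence in all three endpoints
pvals[1] = [1e-9, 1e-9, 0.01]   # very strong in two endpoints, weaker in the third

selected, threshold = significant_in_every_endpoint(pvals)
```

With 10,000 genes the per-endpoint threshold is 0.05 / 10,000 = 5e-6, so neither planted gene is selected even though both show the pattern across all three endpoints.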

PROMISE avoids these problems by performing a single test for the pattern of association for each gene.

For each genomic variable, PROMISE performs one test that directly addresses the question of whether a gene shows the association pattern of interest. This improves the statistical power and simplifies the interpretation.
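Because there is one P-value per gene, a single multiplicity adjustment over that one list is enough. The sketch below applies the Benjamini-Hochberg procedure as an example of such an adjustment; this is an assumed choice for illustration, the source does not state which FDR method is used with PROMISE, and the P-values are made up.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values for a 1-D array of P-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Hypothetical per-gene P-values from a single pattern test per gene.
promise_pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.60])
q_values = benjamini_hochberg(promise_pvals)
# q_values -> [0.005, 0.02, 0.05125, 0.05125, 0.6]
```

Each gene receives one adjusted value that refers directly to the pattern hypothesis, rather than a combination of thresholds from several single-endpoint analyses.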

2) Gene Set Enrichment Analysis (GSEA) and other gene-set analysis approaches could be used to determine whether gene sets identified from the analysis of association with one endpoint are associated with another endpoint.

For example, one could identify genes that are associated with one endpoint and then explore whether the set of identified genes is associated with another endpoint. While this exercise may provide useful biological insights, it does not give results with the same interpretation as PROMISE.

PROMISE provides a P-value for each gene, whereas gene-set methods give a P-value for each gene set. Additionally, the interpretation of gene-set results may be difficult.

For instance, what if the list of genes associated with endpoint A is associated with endpoint B, but the list of genes associated with endpoint B is not associated with endpoint A?

Such questions quickly become difficult to resolve when more than two endpoints are involved. Nevertheless, permutation-based gene-set analyses can be performed in conjunction with PROMISE to identify gene sets that are enriched among genes that show the association pattern of interest.

Integrating gene-set methods with PROMISE may prove to be a synergistic combination in terms of improving statistical power to reveal important biological insights.
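One way a permutation-based gene-set analysis could be combined with PROMISE is sketched below. The gene-set statistic used here (the mean per-gene PROMISE-style statistic over the set's members), the Pearson correlations, the function names, and the toy data are all assumptions for illustration; the source does not specify how the gene-set statistic is defined.

```python
import numpy as np

def standardize(x):
    """Center and scale each column to mean 0 and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def promise_stats(expr, endpoints, interesting):
    """PROMISE-style statistic for every gene (columns of `expr`), using Pearson correlations."""
    n = expr.shape[0]
    corr = standardize(expr).T @ standardize(endpoints) / n   # genes x endpoints
    return corr @ interesting

def gene_set_pvalue(expr, endpoints, interesting, members, n_perm=500, seed=0):
    """Permutation P-value for the mean per-gene statistic within one gene set."""
    rng = np.random.default_rng(seed)
    observed = promise_stats(expr, endpoints, interesting)[members].mean()
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(expr.shape[0])     # shuffle subjects' endpoint rows jointly
        null = promise_stats(expr, endpoints[perm], interesting)[members].mean()
        exceed += int(null >= observed)
    return (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): 50 subjects, 40 genes; genes 0-4 track a latent factor
# that raises the first endpoint and lowers the second.
rng = np.random.default_rng(2)
n_subjects, n_genes = 50, 40
latent = rng.normal(size=n_subjects)
expr = rng.normal(size=(n_subjects, n_genes))
expr[:, :5] += latent[:, None]
endpoints = np.column_stack([
    latent + rng.normal(size=n_subjects),
    -latent + rng.normal(size=n_subjects),
])
interesting = np.array([1.0, -1.0])

set_members = np.arange(5)   # hypothetical gene set containing the driven genes
p_set = gene_set_pvalue(expr, endpoints, interesting, set_members)
```

Because the same permutations are applied to all genes at once, the correlation among genes in the set is retained under the null distribution.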

PROMISE Advantages and Future Research --

PROMISE is a general procedure designed specifically to increase statistical power to identify genomic features that show the biologically most interesting pattern of association with multiple endpoint variables.

PROMISE defines a test statistic that measures the evidence for the association pattern of interest by projecting the observed vector of association statistics onto the vector of conceptually most interesting values for those statistics.

Permutation is used to compute P-values. Unlike classical multivariate statistical methods such as principal components (PC) or canonical correlation (CC), which are designed for data with a multivariate normal distribution, PROMISE can handle ordinal and censored time-to-event endpoints.
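As an illustration of how different endpoint types might be accommodated, the sketch below uses a rank-based (Spearman) correlation for an ordinal endpoint and a Pearson correlation for a continuous one before combining them. The choice of these statistics, the function names, and the data are assumptions; the source does not state which association statistic PROMISE uses for each endpoint type, and a censored time-to-event endpoint would need a statistic suited to censoring, which is not shown here.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values sharing the average of their ranks."""
    _, inverse, counts = np.unique(np.asarray(x), return_inverse=True, return_counts=True)
    last = np.cumsum(counts)          # highest rank in each tie group
    first = last - counts + 1         # lowest rank in each tie group
    return ((first + last) / 2.0)[inverse]

def pearson(a, b):
    """Pearson correlation of two 1-D arrays."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical data for six subjects.
expression = np.array([0.2, 1.1, 1.9, 3.5, 4.0, 6.3])
grade = np.array([0, 0, 1, 1, 2, 2])                    # ordinal endpoint with ties
response = np.array([5.1, 4.0, 3.2, 2.9, 1.5, 0.3])     # continuous endpoint

stats = np.array([spearman(expression, grade), pearson(expression, response)])
interesting = np.array([1.0, -1.0])   # higher grade and lower response are "interesting"
promise_like = float(stats @ interesting)
```

Since significance comes from permutation rather than from a distributional assumption, the per-endpoint statistics can differ in type as long as each is recomputed on every permuted data set.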

Furthermore, as observed in a completed simulation study, CC and PC are not designed to detect a specific pattern of association and therefore do not have as much statistical power to detect the association pattern of interest as PROMISE does.

In an example application performed by the manufacturer, PROMISE showed better power to identify genes with an interesting pattern of association than searching for such genes within lists of significant genes identified by individual endpoint analyses.

Finally, GSEA can be incorporated into PROMISE so that the advantages of both approaches may be realized simultaneously.

In an example application performed by the manufacturer, the PROMISE-based GSEA gave biologically interesting results and showed much greater statistical power than identifying overlap among the results of the individual-endpoint GSEAs.

PROMISE is a very general procedure that must be customized to specific applications. PROMISE can also be adapted so that Single Nucleotide Polymorphism (SNP) genotypes can be used as the genomic variables.

Future research could explore how to modify or generalize the correlation statistics and the way they are combined to form the PROMISE statistic.

Additionally, it would be interesting to develop methods to define interesting result vectors and test statistics for applications with thousands of endpoint variables and thousands of genomic variables.

*System Requirements*

Contact manufacturer.

*Manufacturer*

- Department of Biostatistics and Department of Pharmaceutical Sciences
- St. Jude Children’s Research Hospital
- 262 Danny Thomas Place, Memphis, TN 38105 USA

** Manufacturer Web Site**
PROMISE

** Price** Contact manufacturer.

** G6G Abstract Number** 20736

** G6G Manufacturer Number** 104322