Corporate Intellect Capri

Category Intelligent Software>Data Mining Systems/Tools

Abstract Corporate Intellect's Capri software is a model for sophisticated data mining. It is a leading Sequence Detection Algorithm currently installed in over 60 companies throughout the world.

'Sequence Detection Algorithms' discover patterns (rules) consisting of frequently co-occurring events in time-stamped data. The data input to the algorithm consists of a set of objects, each identified by a "Primary Key". Each object has a number of events attributed to it. An event may be a change in commodity price, the purchase of a product, the accessing of a web page, etc. A "Secondary Key", often a date-time stamp, defines the order in which the events occurred.
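To make the input layout concrete, here is a minimal Python sketch of grouping time-stamped events into one ordered sequence per object. The row layout, the sample data, and the function name `build_sequences` are assumptions made for illustration; they do not describe Capri's actual input format.

```python
from collections import defaultdict

# Hypothetical input rows: (primary_key, secondary_key, event).
# The secondary key here is an ISO 8601 date-time stamp.
rows = [
    ("cust-2", "2001-03-02T10:00", "snacks"),
    ("cust-1", "2001-03-05T09:30", "beer"),
    ("cust-1", "2001-03-01T14:00", "snacks"),
    ("cust-1", "2001-03-03T08:15", "newspaper"),
    ("cust-2", "2001-03-04T16:45", "beer"),
]

def build_sequences(rows):
    """Group events by primary key and order them by secondary key."""
    grouped = defaultdict(list)
    for primary_key, secondary_key, event in rows:
        grouped[primary_key].append((secondary_key, event))
    # ISO 8601 stamps of equal precision sort chronologically as plain strings.
    return {
        key: [event for _, event in sorted(pairs)]
        for key, pairs in grouped.items()
    }

sequences = build_sequences(rows)
for key in sorted(sequences):
    print(key, sequences[key])
```

Each object ends up with a single event list in time order, which is the shape a sequence detection algorithm then searches for recurring patterns.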

Sequence Detection Algorithms have their origin in basket analysis and web mining, though they have also been applied successfully to fraud detection and the analysis of commodity prices.

Capri belongs to the Apriori family of data mining algorithms, with its origin in 'association rule discovery'. Capri is used within third-party applications to discover different types of sequences across records (and therefore, over time). Prior to Capri, general sequence questions could not always be answered unless you knew the sequences in advance or had narrow constraints on the problem.

Typical sequential patterns that can be found in data sets using Capri are:

1) Products bought by customers across multiple transactions.

2) Financial transactions made by a business in a fiscal year.

3) Click-streams or Web site paths for understanding purchases, exits, traffic, and crime on the Web.

4) Frequent sequences of the chemical bases that make up human DNA.

5) Patterns of non-compliance or fraud over time.

Association Algorithms --

Association Algorithms such as General Rule Induction (GRI) and Apriori (the algorithm that Capri is based on) generate rules showing which things (events, attributes, purchases, etc.) typically occur together. Using an association algorithm one produces a list of rules. The rules describe the conditions under which certain conclusions occur. A typical rule from GRI/Apriori might look like this:

Conclusion <= condition A & condition B & ...

beer <= snacks & newspaper

This rule is interpreted as follows: Customers who buy snacks and a newspaper are also likely to buy beer.

Note: This rule does not show a causal relationship; it merely shows the likelihood of certain things occurring together.

Association rules normally include information on:

1) Coverage (or Support). Indicates how often the conditions and conclusion occur together.

2) Accuracy (or Confidence). Indicates how often the conclusion also occurs when the conditions occur.
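The two measures can be illustrated with a short Python sketch for the rule "beer <= snacks & newspaper". The baskets and the function name `rule_metrics` are made up for this example; this is plain counting under the definitions above, not Capri's or GRI's implementation.

```python
# Hypothetical transactions, each a set of purchased items.
baskets = [
    {"snacks", "newspaper", "beer"},
    {"snacks", "newspaper"},
    {"snacks", "beer"},
    {"newspaper", "beer", "snacks"},
    {"milk"},
]

def rule_metrics(baskets, conditions, conclusion):
    """Return (support, confidence) for the rule: conclusion <= conditions."""
    conditions = set(conditions)
    # Baskets in which every condition is present.
    n_conditions = sum(1 for b in baskets if conditions <= b)
    # Baskets in which the conditions and the conclusion occur together.
    n_both = sum(1 for b in baskets if conditions <= b and conclusion in b)
    support = n_both / len(baskets)
    confidence = n_both / n_conditions if n_conditions else 0.0
    return support, confidence

support, confidence = rule_metrics(baskets, {"snacks", "newspaper"}, "beer")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this sample, two of the five baskets contain all three items, and three baskets contain both conditions, so the rule holds in two of the three cases where it applies.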

Capri's Uniqueness -- Since it is based on the Apriori association algorithm, Capri finds association rules like GRI or Apriori. The strength of Capri, however, is its ability to discover associations over time. These associations between things that occur together across a set of records (and therefore, over time) are known as sequences.
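The difference between an association and a sequence is that order matters. The Python sketch below counts the fraction of objects whose events contain a pattern in the given order; the data and the helper names `contains_in_order` and `sequence_support` are illustrative assumptions, and the counting is a simple brute-force check rather than Capri's algorithm.

```python
# Hypothetical per-customer event lists, already ordered by time.
sequences = {
    "cust-1": ["snacks", "newspaper", "beer"],
    "cust-2": ["beer", "snacks"],
    "cust-3": ["snacks", "beer", "newspaper"],
    "cust-4": ["newspaper"],
}

def contains_in_order(events, pattern):
    """True if pattern occurs in events in order (gaps between items allowed)."""
    remaining = iter(events)
    # 'item in remaining' advances the iterator past the first match,
    # so each later pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)

def sequence_support(sequences, pattern):
    """Fraction of objects whose event list contains the pattern in order."""
    hits = sum(1 for events in sequences.values()
               if contains_in_order(events, pattern))
    return hits / len(sequences)

print(sequence_support(sequences, ["snacks", "beer"]))
```

Customer cust-2 bought both items but in the opposite order, so an association rule would count that customer while this sequence pattern does not.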

Capri also enables you to incorporate knowledge of what you are looking for in sequences by allowing you to specify features such as the start and end items for a sequence, sequence length, and time constraints like maximum time allowed between items in the sequence.
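As a rough illustration of such constraints, the Python sketch below checks one candidate occurrence of a sequence against a start item, an end item, a maximum length, and a maximum gap between consecutive items. The function `satisfies_constraints`, its parameter names, and the sample click-stream are hypothetical; Capri's own parameter names and semantics are not documented here.

```python
from datetime import datetime, timedelta

def satisfies_constraints(occurrence, start_item=None, end_item=None,
                          max_length=None, max_gap=None):
    """Check a time-ordered list of (datetime, item) pairs against constraints."""
    if not occurrence:
        return False
    items = [item for _, item in occurrence]
    if start_item is not None and items[0] != start_item:
        return False
    if end_item is not None and items[-1] != end_item:
        return False
    if max_length is not None and len(items) > max_length:
        return False
    if max_gap is not None:
        times = [stamp for stamp, _ in occurrence]
        # Compare each event time with the one immediately before it.
        if any(later - earlier > max_gap
               for earlier, later in zip(times, times[1:])):
            return False
    return True

# Hypothetical click-stream for one visitor.
visit = [
    (datetime(2001, 3, 1, 9, 0), "home"),
    (datetime(2001, 3, 1, 9, 5), "catalog"),
    (datetime(2001, 3, 1, 9, 50), "checkout"),
]

within_hour = satisfies_constraints(visit, start_item="home",
                                    end_item="checkout", max_length=3,
                                    max_gap=timedelta(hours=1))
within_half_hour = satisfies_constraints(visit, max_gap=timedelta(minutes=30))
print(within_hour, within_half_hour)
```

Applying constraints like these during discovery, rather than filtering afterwards, is what keeps the number of candidate sequences manageable.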

Capri key features/capabilities also include:

Highly Scalable -- Capri has been successfully applied to gigabytes of data consisting of millions of records.

Ability to incorporate Domain Knowledge -- The user can define a taxonomy on each of the attributes describing the events from which the sequence patterns are to be discovered. This allows sequences to be discovered at varying levels of generalization.
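A small Python sketch shows why a taxonomy helps: two sequences that differ at the product level become identical one level up, so they can be counted as the same pattern. The child-to-parent dictionary and the function names are assumptions for illustration only, not Capri's taxonomy format.

```python
# Hypothetical taxonomy: each item maps to its parent category.
taxonomy = {
    "lager": "beer",
    "stout": "beer",
    "beer": "alcohol",
    "crisps": "snacks",
    "peanuts": "snacks",
}

def generalize(item, taxonomy, levels=1):
    """Replace an item by its ancestor up to 'levels' steps up the taxonomy."""
    for _ in range(levels):
        if item not in taxonomy:
            break  # already at the top of this branch
        item = taxonomy[item]
    return item

def generalize_sequence(events, taxonomy, levels=1):
    """Generalize every event in a sequence by the same number of levels."""
    return [generalize(event, taxonomy, levels) for event in events]

first = generalize_sequence(["crisps", "lager"], taxonomy)
second = generalize_sequence(["peanuts", "stout"], taxonomy)
print(first, second, first == second)
```

Raising `levels` trades detail for frequency: patterns too rare to surface at the product level may become frequent at the category level.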

Template description language -- Sequence Discovery Algorithms, in keeping with the Apriori family of algorithms, can discover a large number of sequences. Capri provides an Extensible Markup Language (XML) based language to describe the patterns of interest in a specific discovery run of Capri. Only sequences matching the defined templates are discovered.

PMML and XML -- Capri provides XML representations of the data input and results that ease its integration with other software systems. Predictive Model Markup Language (PMML) is a standard developed by Corporate Intellect in collaboration with Oracle, IBM, SPSS, SAS, Magnify and other key data-mining vendors.

The goal of PMML is to represent knowledge discovered by data mining algorithms in an open XML-based standard enabling interoperability of models between data mining vendors, knowledge generation providers and knowledge consumer software systems in general.

Complex Sequence Pattern Discovery -- Capri provides a wide range of parameters, giving users the flexibility to discover more specific sequence types based on their needs.

Unique Visualization -- Visualization of the output sequences is enabled through a 3-dimensional cone-tree visualizer that describes the event space and displays individual sequences on the structure presented. The event space can be defined by the user in XML format or extracted from the sequences discovered.

System Requirements

Capri is supported on the Sun Solaris and Microsoft Windows platforms. The system requirements for installing and running Capri are:

Hardware Pentium-compatible processor or higher and a monitor with 1024 x 768 resolution or higher (support for 65,536 colors is recommended). A CD-ROM drive for installation is also required.

Operating system Windows 98, Windows 2000, or Windows NT 4.0 with Service Pack 6 or higher. Solaris 2.6 is also supported.

Min free disk space 5MB is required for Capri.

Min RAM 128MB or more

Manufacturer

Manufacturer Web Site Corporate Intellect Capri

Price Contact manufacturer.

G6G Abstract Number 20162

G6G Manufacturer Number 101036