## XLMiner® Version 3

**Category** Intelligent Software>Data Mining Systems/Tools

**Abstract** XLMiner for Windows is a comprehensive data mining add-in for Excel.

XLMiner Capabilities -- XLMiner provides a comprehensive set of analysis features based on both statistical and machine learning methods. A problem or a data set can be analyzed by several methods. It is usually a good idea to try different approaches, compare their results, and then choose a model that suits the problem well.

Limits -- XLMiner can work with large data sets which may exceed the limits in Excel. A standard procedure is to sample data from a larger database, bring it into Excel to fit a model, and, in the case of supervised learning routines, score output back out to the database.

In the standard edition of XLMiner, sampling from and scoring back to a database is supported for Oracle, SQL Server and Access databases. This feature is not available in the education or free web trial versions. The free web trial demo version handles a maximum of 200 records per partition.

Operations -- There are six (6) broad groups of operations in XLMiner:

1) Partitioning -- A data set with known values of an outcome (response) variable is necessary to train a data mining model. For training a model, the analyst usually chooses (at random) a fraction of the available data -- the 'training partition'.

Trained models can then be applied to another partition -- the 'validation partition' -- of the same data set to see how well they do with data that they were not trained with.

In this phase, models can be adjusted and the best performing model selected. After a final model is selected, it can be applied to a third partition -- the 'test partition' -- to test how well the final model will do with data that have been used neither in training nor in validation.

XLMiner also supports partitioning with over-sampling, used when rare events are modeled and you need to ensure an adequate supply of those events in the modeling process.
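The three-way partitioning idea can be illustrated with a short Python sketch. This is not XLMiner's implementation (XLMiner is an Excel add-in); the function name `partition`, the 60/20/20 split, and the fixed random seed are assumptions chosen only for illustration.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Randomly split records into training, validation, and test partitions.

    The fractions and the seed are illustrative assumptions, not XLMiner defaults.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # random assignment, reproducible via seed
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains is held out for the final test
    return train, valid, test

# Hypothetical data set of 100 record IDs.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Each record lands in exactly one partition, so the test partition stays unseen during both training and model selection.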

2) Classification -- When the outcome variable is discrete or categorical, the objective of the data mining exercise is to classify the records into the discrete classes or categories.

XLMiner offers several techniques for 'Classification':

- a) Discriminant analysis - is a technique for classifying a set of observations into predefined classes.
- b) Logistic Regression with best subset selection - Logistic regression is a variation of ordinary regression used when the dependent (response) variable is dichotomous (i.e., it takes only two values, which usually represent the occurrence or non-occurrence of some outcome event, usually coded as 0 or 1) and the independent (input) variables are continuous, categorical, or both.
- c) Classification tree (also known as decision tree) methods - are a good choice when the data mining task is the classification or prediction of outcomes and the goal is to generate rules that can be easily understood, explained, and translated into SQL or a natural query language.
- d) Naïve Bayes Classifiers - assume that the effect of a variable value on a given class is independent of the values of the other variables. This assumption is called 'class conditional independence'. It is made to simplify the computation and in this sense is considered "naïve".
- e) Artificial Neural Networks (NN) - are relatively crude electronic networks of "neurons" based on the neural structure of the brain. They process records one at a time, and "learn" by comparing their classification of the record (which, at the outset, is largely arbitrary) with the known actual classification of the record. The errors from the initial classification of the first record are fed back into the network and used to modify the network's algorithm the second time around, and so on for many iterations.
- f) k-Nearest Neighbors (k-NN) - In k-nearest-neighbor classification, the training dataset is used to classify each member of a "target" dataset. The structure of the data is that there is a classification (categorical) variable of interest ("buyer," or "non-buyer," for example), and a number of additional predictor variables (age, income, location...).
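As an illustration of the k-nearest-neighbor idea described in item f), the following Python sketch classifies one "target" record by majority vote among its closest training records. It is a minimal sketch, not XLMiner's algorithm: the function `knn_classify`, the Euclidean distance measure, and the (age, income) training records are all hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Return the majority class among the k training records nearest to query.

    training: list of (features, label) pairs; query: a feature tuple.
    Euclidean distance is an assumption; other distance measures are possible.
    """
    by_distance = sorted(training, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (age, income in thousands) -> class.
training = [
    ((25, 30), "non-buyer"),
    ((30, 35), "non-buyer"),
    ((28, 32), "non-buyer"),
    ((45, 80), "buyer"),
    ((50, 90), "buyer"),
    ((48, 85), "buyer"),
]

print(knn_classify(training, (47, 82)))  # nearest three records are all buyers
```

In practice, predictors on different scales would normally be standardized first so that no single variable dominates the distance.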

3) Prediction -- When the outcome variable is continuous, the objective is to predict the value of the outcome variable for each of the data records. XLMiner offers the following methods of prediction:

- a) Multiple Linear Regression with best subset selection;
- b) k-Nearest Neighbors (k-NN);
- c) Regression Trees;
- d) Neural Networks (NN).

4) Affinity Analysis -- Some problems involve detecting association among the properties of data records. XLMiner supports the generation of 'Association Rules' for showing which attributes of the data occur frequently together.

One common application is to determine groups of products customers are likely to buy together, also known as Market Basket Analysis.
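The two measures usually used to rank association rules, support and confidence, can be sketched in a few lines of Python. This is an illustrative sketch restricted to rules between pairs of items; the function `pair_rules`, the thresholds, and the baskets are hypothetical and do not reflect XLMiner's own procedure.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Find rules 'lhs -> rhs' between single items.

    support    = share of baskets containing both items
    confidence = share of baskets containing lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical market baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, only bread and butter appear together often enough (3 of 5 baskets) to pass the support threshold, and the rule holds in both directions with confidence 0.75.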

5) Time Series -- XLMiner offers time series forecasting, with the exploratory techniques ACF (Autocorrelation function) and PACF (Partial autocorrelation function), smoothing techniques (moving average, exponential, double exponential and Holt-Winters), as well as ARMA and ARIMA modeling.
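Two of the smoothing techniques named above, moving average and (single) exponential smoothing, are simple enough to sketch directly in Python. The function names, the window length, and the smoothing constant `alpha` below are assumptions for illustration, not XLMiner settings.

```python
def moving_average(series, window=3):
    """Average of the last `window` observations at each point where a full window exists."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha=0.5):
    """Single exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    The first smoothed value is initialized to the first observation (one common convention).
    """
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_smoothing([10, 20, 30], alpha=0.5))
```

A larger `alpha` makes the smoothed series follow recent observations more closely; a smaller one gives a smoother, slower-reacting series.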

6) Data Reduction and Exploration -- It is often useful or necessary to reduce the dimensionality of data into only a few attributes that matter more than others. In this situation, the analyst does not attempt to classify or predict an outcome variable.

Instead, the objective is to discover similarities in records and group them together using the available attributes (variables).

One such method involves deciding which variables matter most in explaining differences among records. Other methods categorize data into clusters that can be represented as a new categorical variable added to the data.

XLMiner supports the following methods of data exploration and reduction:

- a) Principal Components Analysis;
- b) k-Means Clustering;
- c) Hierarchical Clustering.
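To show how clustering can yield a new categorical variable for each record, here is a minimal k-means sketch in Python. It is an assumption-laden illustration rather than XLMiner's method: the function `k_means`, the naive choice of the first k points as starting centroids, the fixed iteration count, and the sample points are all hypothetical.

```python
import math

def k_means(points, k, iterations=10):
    """Basic k-means: alternate between assigning points and recomputing centroids."""
    centroids = list(points[:k])  # naive initialization (assumption): first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    # The cluster index of each record is the new categorical variable.
    labels = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels

# Hypothetical two-attribute records forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Results of k-means generally depend on the starting centroids, so practical tools typically try several random starts.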

Output Presentation and Graphics -- XLMiner provides special graphics to enhance the understanding of the data and the analysis outcomes. For instance, tree diagrams in classification and regression trees, and dendrograms in hierarchical clustering give very useful insights.

In conjunction with XLMiner outputs, you can use Excel's built-in features to work with the output. For instance, histograms, scatter plots, and bubble plots are very useful to provide an insight into the data and the fitted outcomes.

Lift charts and gain charts can be easily generated from XLMiner outputs to see the benefit produced by the data mining exercise.
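The numbers behind a gains (or lift) chart can be sketched as follows. This Python example is illustrative only: the functions `cumulative_gains` and `lift_at`, and the scores and outcomes, are hypothetical and are not taken from XLMiner output.

```python
def cumulative_gains(scores, actuals):
    """Cumulative count of positive outcomes when records are ranked by model score, highest first."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    gains, running = [], 0
    for _, actual in ranked:
        running += actual
        gains.append(running)
    return gains

def lift_at(gains, n_records):
    """Positives found in the top n_records divided by the number expected from random selection."""
    total_positives = gains[-1]
    expected_at_random = total_positives * n_records / len(gains)
    return gains[n_records - 1] / expected_at_random

# Hypothetical model scores and actual outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actuals = [1, 1, 0, 1, 0, 0, 1, 0]

gains = cumulative_gains(scores, actuals)
print(gains)
print(lift_at(gains, 4))
```

Here the top four scored records contain three of the four positives, against two expected from a random selection of four, for a lift of 1.5.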

XLMiner comes in four (4) editions. All require Windows and Excel 2000, 2003 or 2007:

1) Demo edition (functional 30-day web download);

2) Education edition;

3) Professional edition;

4) Academic Research edition.

*Note: The Academic Research edition is virtually the same as the
Professional edition, except that purchasers must be on the faculty of a
college or university.*

*System Requirements*

*Minimum* \*

- Pentium 133 MHz Processor
- 25 MB free disk space
- 64 MB RAM
- Windows 2000, XP or Vista (use US regional settings)
- Microsoft Excel 2000 / 2003 / 2007 (English version**)

*Recommended*

- Pentium P4 Processor
- 60 MB free disk space
- 256 MB RAM
- Windows 2000, XP or Vista (use US regional settings)
- Microsoft Excel 2000 / 2003 / 2007 (English version**)

\* Minimum configuration is sufficient for Education or Classroom editions only.

\*\* Multilingual pack will not work.

*Manufacturer*

- statistics.com
- 612 N. Jackson St.
- Arlington, VA 22201
- USA
- Tel: (703) 522-5410
- Fax: (703) 522-5846
- Email: sales@xlminer.com

**Manufacturer Web Site**
XLMiner Version 3

**Price** Contact manufacturer.

**G6G Abstract Number** 20240

**G6G Manufacturer Number** 102522