DTREG

Category Intelligent Software>Neural Network Systems/Tools, Intelligent Software>Data Mining Systems/Tools and Intelligent Software>Gene Expression Programming Systems/Tools

Abstract DTREG is a decision tree building software product that can be used for Predictive Modeling (Data Mining) and Forecasting. DTREG offers the most advanced 'predictive modeling' methods: Decision Trees (Classification and Regression Trees); TreeBoost -- Boosted Decision Trees (Stochastic Gradient Boosting); Decision Tree Forests (Ensemble of Trees); Multilayer Perceptron Neural Networks; Radial Basis Function (RBF) Neural Networks; Cascade Correlation Neural Networks (“self organizing” networks); Probabilistic Neural Networks (PNN); General Regression Neural Networks (GRNN); Support Vector Machines (SVM); Gene Expression Programming (Symbolic Regression); Linear Discriminant Analysis (LDA); and Logistic Regression. DTREG is the ideal tool for modeling business and medical data with categorical variables such as sex, race and marital status.

Note: The process of extracting useful information from a set of data values is called “data mining”. This information can be used to create models that make predictions. Many techniques have been developed for predictive modeling, and there is an art to selecting and applying the best method for a particular situation. DTREG implements the most advanced 'predictive modeling' methods that have been developed (listed above).

DTREG features/capabilities of Decision Tree Based Models:

1) Decision trees are easy to build -- Just feed a dataset into DTREG, and it will do all the work of building a decision tree, support vector machine (SVM), gene expression programming, linear discriminant function or logistic regression model.

2) Decision trees are easy to understand -- Decision trees provide a clear, logical representation of the data model. They can be understood and used by people who are not mathematically gifted.

3) Decision trees handle both continuous and categorical variables -- Categorical variables such as gender, race, religion, marital status and geographic region are difficult to model using numerically-oriented techniques such as regression. In contrast, categorical variables are handled easily by decision trees.

4) Decision trees can perform classification as well as regression -- The predicted value from a decision tree is not limited to a numerical value; it can also be a predicted category such as male/female, malignant/benign, frequent buyer/occasional buyer, etc.

5) Decision trees automatically handle interactions between variables -- There may be significant differences between men/women, people living in the North and the South, etc.; these effects are known as variable interactions. Decision trees automatically deal with these interactions by partitioning the cases and then analyzing each group separately.

6) Highly accurate "ensemble" tree models -- DTREG provides classical, single-tree models and also TreeBoost and Decision Tree Forest models. For many applications these "ensemble" tree methods produce the most accurate results of any modeling methods.

7) Decision trees identify important variables -- By examining which variables are used to split nodes near the top of the tree, you can quickly determine the most important variables. DTREG carries this further by analyzing all of the splits generated by each variable and the selection of surrogate splitters. A table ranking overall variable importance is included in the analysis report.
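As a rough illustration of how a tree can treat continuous and categorical predictors alike, the following minimal sketch searches for the best binary split on one predictor by minimizing the summed squared error of the two child nodes. It is a generic textbook-style procedure, not DTREG's implementation; the function names (sse, best_split) and the sample rows are assumptions made up for this example.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, predictor, target):
    """Return (cost, description) of the best binary split on one predictor.

    Continuous predictors are split at a threshold (value <= t);
    categorical (string) predictors are split one category against the rest.
    """
    values = sorted({r[predictor] for r in rows})
    is_categorical = isinstance(values[0], str)
    best = None
    for v in values:
        if is_categorical:
            left = [r[target] for r in rows if r[predictor] == v]
            right = [r[target] for r in rows if r[predictor] != v]
            label = f"{predictor} == {v!r}"
        else:
            left = [r[target] for r in rows if r[predictor] <= v]
            right = [r[target] for r in rows if r[predictor] > v]
            label = f"{predictor} <= {v}"
        if not left or not right:
            continue  # a split must send cases to both children
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, label)
    return best


# Hypothetical sample data: one continuous and one categorical predictor.
rows = [
    {"age": 25, "region": "North", "sales": 10.0},
    {"age": 32, "region": "North", "sales": 12.0},
    {"age": 47, "region": "South", "sales": 30.0},
    {"age": 51, "region": "South", "sales": 33.0},
    {"age": 38, "region": "North", "sales": 11.0},
]

for name in ("age", "region"):
    print(name, best_split(rows, name, "sales"))
```

Applying the same search recursively to each child group is what lets a tree analyze each partition separately, which is how interactions between variables are picked up without being specified in advance.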

Features of Neural Network models:

1) Wide applicability -- Neural networks have been successfully applied to a wide variety of classification and regression problems. Neural networks have the theoretical capability of modeling any type of function.

2) Accuracy -- Probabilistic neural networks are extremely accurate and fast to train.

3) DTREG variety -- DTREG supports 3- and 4-layer perceptron network models, Radial Basis Function (RBF) neural networks, self-organizing Cascade Correlation neural networks, Probabilistic neural networks and General Regression neural networks.

4) Automated architecture -- DTREG includes an automated search for the optimal number of hidden neurons.
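To give a feel for why probabilistic neural networks are quick to train, here is a minimal sketch of the textbook formulation, in which "training" amounts to storing the cases and classification averages a Gaussian kernel over each class. The function name pnn_classify, the fixed sigma value and the sample points are assumptions for illustration only; DTREG's own implementation and its parameter tuning may differ.

```python
import math


def pnn_classify(train, x, sigma=0.5):
    """Classify x with a minimal probabilistic neural network (Parzen window).

    train is a list of (feature_tuple, label) pairs. Nothing is fitted in
    advance: the stored cases themselves act as the pattern layer.
    """
    scores = {}
    counts = {}
    for features, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, x))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    # Average kernel activation per class, then pick the largest.
    return max(scores, key=lambda c: scores[c] / counts[c])


# Hypothetical training cases with two continuous predictors.
train = [
    ((0.0, 0.0), "benign"),
    ((0.2, 0.1), "benign"),
    ((1.0, 1.0), "malignant"),
    ((0.9, 1.2), "malignant"),
]

print(pnn_classify(train, (0.1, 0.1)))
print(pnn_classify(train, (1.0, 1.1)))
```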

Features of Support Vector Machine (SVM) models:

1) SVM is a modern outgrowth of artificial neural networks -- Support Vector Machine models are close cousins to neural networks. In fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer, feed-forward neural network.

2) Highly accurate models -- Research has shown that for some classes of problems, such as pattern recognition, SVM models outperform all other types of models.

3) Classification and Regression analyses -- The DTREG implementation of SVM models supports binary and multi-class classification problems as well as regression. DTREG implements the most popular kernel functions, including radial basis function, sigmoid, polynomial and linear.

4) Automatic grid search and pattern search for optimal parameters -- The accuracy of SVM models depends on selecting appropriate parameter values. DTREG provides an automatic grid and pattern search facility that allows it to iterate through ranges of parameters and perform cross-validation to find the optimal parameter values.

5) Model building performance -- The DTREG implementation of SVM is capable of handling very large problems. Kernel matrix row caching, shrinking heuristics to eliminate outlying vectors and an SMO-type algorithm are used to boost the speed of modeling.

6) Continuous, categorical and non-numeric variables -- DTREG supports continuous and categorical (nominal) variables. Categorical variables can have symbolic values such as "Male"/"Female", "Live"/"Die", etc.

7) Missing value substitution -- If there are scattered missing values for predictor variables, DTREG can replace those missing values with median values so that the case can be salvaged and the other, non-missing variable values used to the maximum extent.

8) V-fold cross validation -- DTREG provides V-fold cross validation both during the search process to select the optimal parameters and as a verification method for the final model. You also have the option of using a hold-back sample for verification.
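The combination of a parameter grid search with V-fold cross-validation can be sketched as follows. This is only an outline of the general technique: the stand-in model (kernel_classifier) is deliberately not an SVM, and the parameter names gamma and bias, the fold assignment rule and the sample data are all assumptions chosen to keep the example short.

```python
import math
from itertools import product


def v_fold_indices(n, v):
    """Yield (train_idx, test_idx) pairs; case i is held out in fold i % v."""
    for fold in range(v):
        test = [i for i in range(n) if i % v == fold]
        train = [i for i in range(n) if i % v != fold]
        yield train, test


def kernel_classifier(train, x, gamma, bias):
    """Stand-in model (NOT an SVM): sign of an RBF-weighted vote plus a bias."""
    vote = sum(y * math.exp(-gamma * (xi - x) ** 2) for xi, y in train)
    return 1 if vote + bias >= 0 else -1


def cv_accuracy(data, v, **params):
    """Fraction of held-out cases predicted correctly across all V folds."""
    correct = 0
    for train_idx, test_idx in v_fold_indices(len(data), v):
        train = [data[i] for i in train_idx]
        for i in test_idx:
            x, y = data[i]
            correct += kernel_classifier(train, x, **params) == y
    return correct / len(data)


def grid_search(data, grid, v=4):
    """Try every parameter combination; return (best_accuracy, best_params)."""
    names = sorted(grid)
    best = None
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        acc = cv_accuracy(data, v, **params)
        if best is None or acc > best[0]:
            best = (acc, params)
    return best


# Hypothetical one-dimensional data with labels -1 and +1.
data = [(0.0, -1), (0.5, -1), (1.0, -1), (1.2, -1),
        (2.0, 1), (2.4, 1), (3.0, 1), (3.3, 1)]
grid = {"gamma": [0.01, 1.0, 10.0], "bias": [-2.0, 0.0, 2.0]}

best_acc, best_params = grid_search(data, grid, v=4)
print(best_acc, best_params)
```

Because every case is held out exactly once, the accuracy used to rank parameter combinations is always measured on data the model did not see while fitting, which is the point of using cross-validation inside the search.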

Features of Gene Expression Programming -- Symbolic Regression models:

1) Gene Expression Programming (GEP) is a new, highly efficient genetic algorithm that evolves symbolic expressions to fit data.

2) GEP expressions are usually very compact and ideal for implementation in real-time control systems with embedded processors.

3) DTREG can evolve both mathematical and logical expressions.

4) DTREG fully supports categorical target and predictor variables.

5) Parsimony pressure and post-training simplification can be used to simplify expressions.

6) Random constants are supported and nonlinear regression is used to optimize their final values.
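The idea of parsimony pressure can be illustrated with a small sketch that scores symbolic expressions by their error plus a penalty per node, so that of two expressions fitting the data equally well, the more compact one wins. The nested-tuple representation, the function names and the penalty weight are assumptions for this example; they do not reproduce GEP's linear chromosome encoding or DTREG's fitness functions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env):
    """Evaluate a nested-tuple expression such as ("+", "x", 1.0)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(expr, str):
        return env[expr]  # variable lookup
    return expr  # numeric constant


def size(expr):
    """Number of nodes in the expression tree."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def fitness(expr, cases, parsimony=0.01):
    """Mean squared error plus a small penalty per node (lower is better)."""
    mse = sum((evaluate(expr, env) - y) ** 2 for env, y in cases) / len(cases)
    return mse + parsimony * size(expr)


# Hypothetical target: y = 2x + 1, sampled at x = 0..4.
cases = [({"x": float(x)}, 2.0 * x + 1.0) for x in range(5)]

compact = ("+", ("*", 2.0, "x"), 1.0)
bloated = ("+", ("*", 2.0, "x"), ("-", ("+", 1.0, "x"), "x"))

print(size(compact), fitness(compact, cases))
print(size(bloated), fitness(bloated, cases))
```

Both expressions fit the sample data exactly, so the difference in fitness comes entirely from the size penalty.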

DTREG Features/Capabilities:

1) Ease of use -- DTREG is a robust application that can be installed on any Windows system. DTREG reads Comma Separated Value (CSV) data files that can be created from almost any data source. Once you create your data file, just feed it into DTREG, and let DTREG do all of the work of creating a decision tree, Support Vector Machine, Linear Discriminant Function or Logistic Regression model. Even complex analyses can be set up in minutes.

2) Classification and Regression Trees -- DTREG can build Classification Trees where the target variable being predicted is categorical and Regression Trees where the target variable is continuous like income or sales volume.

3) Automatic tree pruning -- DTREG uses V-fold cross-validation to determine the optimal tree size. This procedure avoids the problem of "overfitting" where the generated tree fits the training data well but does not provide accurate predictions of new data.

4) Surrogate splitters for missing data -- DTREG uses a sophisticated technique involving "surrogate splitters" to handle cases with missing values. This allows cases with some available values and some missing values to be utilized to the maximum extent when building the model. It also enables DTREG to predict the values of cases that have missing values.

5) Visual display of the tree -- DTREG can display the generated decision tree on the screen, write it to a .jpg or .png disk file or print it. When printed, DTREG uses a sophisticated technique for paginating trees that cross multiple pages.

6) DTREG accepts text data as well as numeric data -- If you have categorical variables with data values such as “Male”, “Female”, “Married”, “Protestant”, etc., there is no need to code them as numeric values.

7) Data Transformation Language (DTL) -- DTREG includes a full Data Transformation Language (DTL) programming language for transforming variables, creating new variables and selecting which cases are to be included in the analysis.

8) Project files for saving analyses -- DTREG saves all of the information about variables and analysis parameters, as well as the generated report and tree, in a project file. You can later open the project file, alter parameters or rerun it with a different dataset.

9) Scoring to predict values -- Once a decision tree has been built, you can use DTREG to "score" a new dataset and predict values for the target variable.

10) Generated scoring source code -- The "Translate" function in DTREG generates C, C++ and SAS® source code to compute predicted values. This source code can be included in application programs to perform high performance scoring of large volumes of data.

11) Heavy duty capability -- The Enterprise Version of DTREG can handle an unlimited number of data rows -- hundreds of thousands or millions are no problem. DTREG can build classification trees with predictor variables that have hundreds of categories by using an efficient clustering algorithm. Many other decision tree programs limit predictor variables to 16 or fewer categories.

12) DTREG COM Library -- The DTREG COM Library can be called from application programs to compute predicted target values using a decision tree generated by DTREG.
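As a loose illustration of scoring and of how a surrogate splitter lets a case with a missing value still be routed through the tree, the sketch below reads cases from CSV text and walks a small hand-written tree. It is not the code DTREG generates and does not use the DTREG COM Library; the tree structure, the variable names (income, marital) and the data are hypothetical.

```python
import csv
import io

# Hypothetical hand-written tree; a real model would come from a training run.
TREE = {
    "split": ("income", 50000.0),        # primary: go left if income <= 50000
    "surrogate": ("marital", "Single"),  # used only when income is missing
    "left": {"predict": "occasional buyer"},
    "right": {"predict": "frequent buyer"},
}


def score(node, case):
    """Walk the tree for one case and return the predicted category."""
    while "predict" not in node:
        name, threshold = node["split"]
        raw = case.get(name, "")
        if raw != "":
            go_left = float(raw) <= threshold
        else:
            # Primary value is missing: fall back to the surrogate splitter.
            s_name, s_value = node["surrogate"]
            go_left = case.get(s_name) == s_value
        node = node["left"] if go_left else node["right"]
    return node["predict"]


# CSV input with text categories and two cases that lack an income value.
CSV_TEXT = """income,marital
42000,Married
91000,Single
,Single
,Married
"""

predictions = [score(TREE, row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
for p in predictions:
    print(p)
```

Note that the categorical values are compared as text, so no numeric recoding of "Single" or "Married" is needed in this sketch.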

System Requirements

Manufacturer

Manufacturer Web Site DTREG

Price Complete pricing information.

G6G Abstract Number 20114

G6G Manufacturer Number 102150