Semcasting Modeler

Category Intelligent Software>Genetic Algorithm Systems/Tools and Intelligent Software>Data Mining Systems/Tools

Abstract Semcasting Modeler is a software solution that takes a radically different approach to predictive analytics. Based on patented “genetic algorithms”, Semcasting Modeler uses hundreds, rather than tens, of data variables throughout the modeling process.

Since the software uses a much broader set of data during the model building process, there is a greater likelihood that subtle predictors will be found.

Semcasting Modeler uses automation to enhance the speed and accuracy of predictive modeling. In traditional regression-based approaches, data cleaning and variable reduction consume large amounts of time, while in the end only a small number of variables are used to actually build the regression model.

Unlike time-consuming traditional regression-based modeling approaches, Semcasting Modeler takes hours rather than days or weeks to complete and often produces models that outperform them.

What is the underlying theory of genetic algorithms?

Genetic algorithms (GA) are computational problem-solving procedures modeled after the evolutionary theory put forth in Charles Darwin’s “On the Origin of Species”, a theory often summarized as “Survival of the Fittest”.

In the 1970s, John Holland of the University of Michigan expanded upon this concept when he presented the genetic algorithm as a way of logically reproducing the workings of evolution to perform optimization functions on a computer.

In his book “Adaptation in Natural and Artificial Systems”, Holland introduced the creation of new offspring using natural selection together with genetics-based operators of crossover and mutation.

Inspired by the theories of Charles Darwin, genetic algorithms are a way to accelerate the building of predictive models while using a large number of data attributes.

This model building approach begins with the system creating a random set of models; the models then compete, with only the most predictive and strongest models surviving to the next generation - much like survival of the fittest.

Each succeeding generation competes on predictive power and strength; only the best-fitting models with the greatest predictive power move on to the next generation. After thousands of generations, which take only hours to process, the model with the greatest predictive power emerges.
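The generational competition described above can be illustrated with a minimal sketch. This is a generic textbook-style genetic algorithm for variable selection, not Semcasting's patented implementation. The function names, the bit-mask encoding, the survival fraction, the mutation rate and the toy fitness function are all assumptions made for illustration.

```python
import random

def evolve(fitness, n_vars, pop_size=250, generations=50,
           keep=0.2, mutation_rate=0.02, seed=0):
    """Evolve bit masks over n_vars variables; return the fittest mask found."""
    rng = random.Random(seed)
    # Generation 0: a random set of candidate models (1 = variable included).
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    n_keep = max(2, int(pop_size * keep))
    for _ in range(generations):
        # Competition: rank by fitness; only the strongest survive.
        population.sort(key=fitness, reverse=True)
        survivors = population[:n_keep]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Hypothetical toy problem: variables 3, 7 and 11 are the true predictors,
# and every included variable carries a small complexity penalty.
PREDICTIVE = {3, 7, 11}

def toy_fitness(mask):
    hits = sum(mask[i] for i in PREDICTIVE)
    return hits - 0.1 * sum(mask)

best = evolve(toy_fitness, n_vars=20)
print(best, toy_fitness(best))
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next. A real fitness function would score each candidate model against customer data rather than against a known answer.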

How is the theory of genetic algorithms applied within the Semcasting software?

Semcasting software is based on the broad concept of genetic algorithms, applied specifically to predictive modeling. As potential solutions to business problems, models are genetically encoded into digital chromosomes (patent pending).

Gene groups are used to represent data attributes, with separate genes used for modeling transformations of the data (e.g., coefficients, outlier trimming thresholds, missing values substitutions, and categorical combinations).
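As a rough illustration of this encoding, the sketch below represents one gene group per data attribute, with separate fields for the coefficient, the outlier trimming thresholds and the missing value substitution. The class names, field names and the linear scoring rule are hypothetical; the actual chromosome layout is proprietary and is not described in this abstract. Categorical combinations are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GeneGroup:
    attribute: str        # data attribute this gene group represents
    coefficient: float    # weight applied to the transformed value
    trim_low: float       # lower outlier trimming threshold
    trim_high: float      # upper outlier trimming threshold
    missing_value: float  # substituted when the raw value is missing

    def transform(self, raw: Optional[float]) -> float:
        # Substitute missing values, then clamp outliers to the thresholds.
        value = self.missing_value if raw is None else raw
        return min(max(value, self.trim_low), self.trim_high)

@dataclass
class Chromosome:
    intercept: float
    groups: List[GeneGroup]

    def score(self, record: dict) -> float:
        # Assumed scoring rule: intercept plus weighted transformed attributes.
        return self.intercept + sum(
            g.coefficient * g.transform(record.get(g.attribute))
            for g in self.groups
        )

model = Chromosome(
    intercept=1.0,
    groups=[
        GeneGroup("income", 0.5, 0.0, 100.0, 40.0),
        GeneGroup("age", -0.1, 18.0, 90.0, 45.0),
    ],
)
# income is trimmed to 100.0; the missing age is replaced by 45.0.
result = model.score({"income": 250.0, "age": None})
```

Encoding the transformations as genes means that crossover and mutation can tune the data preparation along with the choice of variables.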

Semcasting Modeler Benefits --

1) Delivers stronger predictive accuracy.

2) Reduces model development time by 50-75% through automation.

3) Explores hundreds of data attributes to provide better insight.

4) Updates models on a daily or weekly basis to reflect the most recent customer data.

Modeler Process - How it Works --

Loading the current customer data set into the Modeler software initiates a process in which the data is cleaned and sampled automatically, eliminating manual data preparation.

Next, a dependent variable is selected and the model building process begins. The Modeler software creates and tests 250 individual models simultaneously, graduating the “most fit” to the next generation of models.

This process continues for thousands of generations and for millions of candidate models until the best combination of variables and predictive power emerges.

Semcasting Modeler is not a black box. Analysts have complete access to the application toolset, where they can take advantage of the automated variable selection capabilities and generation building process while also directing how the final model is composed.

Analysts can change parameters or supply more input at any point; the model treats these changes as part of its learning process and continues to adapt. The final product of the modeling process is a scoring formula which is accessible as SAS, SPSS or XML code.

Modeler is also capable of scoring a file at high speed directly through the application.

What’s the best approach to using the Semcasting Modeler?

Regardless of data mining technique or technology, there are many factors to consider when building predictive models. These factors include the business objective, schedule, implementation considerations, policies and processes.

Model development is an iterative process where you adjust parameters and restructure data to best fit the model to the business problem. With this in mind, there are two (2) basic approaches to using Semcasting software to create predictive models:

1. An end-to-end process where the final model from the genetic algorithm (GA) is used to score the customer base.

2. A two-step process where the variables from genetic algorithm (GA) models are used as input into regression techniques.
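The second approach can be sketched as follows: variables chosen by a genetic algorithm run (step one) become the inputs to an ordinary least-squares regression (step two). The variable list ga_selected, the toy data and the helper functions are assumptions for illustration only. In practice the regression would normally be fitted in a statistics package rather than with the hand-written solver shown here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, target, variables):
    """Least-squares fit of target on the chosen variables plus an intercept."""
    design = [[1.0] + [row[v] for v in variables] for row in rows]
    k = len(variables) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * row[target] for d, row in zip(design, rows))
           for i in range(k)]
    return dict(zip(["intercept"] + list(variables), solve(xtx, xty)))

# Step 1 (assumed output): variables that survived the genetic algorithm run.
ga_selected = ["a", "b"]

# Step 2: regress on those variables only; "c" is ignored.
# The toy data follows y = 2 + 3*a - 1*b exactly.
rows = [
    {"a": 0, "b": 0, "c": 9, "y": 2},
    {"a": 1, "b": 0, "c": 4, "y": 5},
    {"a": 0, "b": 1, "c": 7, "y": 1},
    {"a": 1, "b": 1, "c": 1, "y": 4},
    {"a": 2, "b": 1, "c": 3, "y": 7},
]
coefficients = fit_ols(rows, "y", ga_selected)
```

Here the genetic algorithm acts as a variable-reduction stage, while the final scoring formula comes from a regression that analysts already know how to interpret.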

Regardless of the approach, there are different steps you can take to adjust the model to meet the business objectives. When these adjustments are made, the genetic algorithm will dynamically refocus its efforts based on the new parameters, which will often lead to improvements in the “king” model.

Keep in mind that the fitness level may take an immediate hit when dynamically switching modeling parameters. However, if the model is left alone for 50 to 100 generations, the fitness may improve significantly. If not, switching the setting back will return the model to the prior state with the higher fitness level.

Five (5) ways to adjust the model --

1) Multiple Fitness Metrics: Create several models with different fitness metrics such as Linear Lift, Left Curve and Upper Lift. Since these fitness metrics focus on different aspects of the business problem, this can lead to different insights from the data. The best of the models can be used “as is”, or the superset of variables from all models can be used as input.

2) Adjust Variable Limits: Start a model allowing fewer variables, and then increase the limits over time. Reviewing the “king” models along the way can provide valuable insight into the business problem. Another alternative is to increase or decrease the maximum number of variables after the model has completed several hundred generations.

3) Variable Analytical Type: Using Semcasting software, modeling analysts can change the analytic type a variable was assigned during sampling. Numeric variables can be defined as continuous numeric, bucketed numeric, or categorical numeric, with an option for ordering buckets or categories with the “next nearest neighbor”. Changing this analytical type will allow the genetic algorithm to construct the model from a different point of view.

4) Drop/Force Variables: Within the metadata tab, analysts can drop variables from consideration, or they can force variables into the model. Dropping variables forces the genetic algorithm to explore a smaller set of variables, while forcing variables focuses it on optimizing specific variables.

This R&D experimentation allows the modeling analyst to explore hypotheses in the data. For example, some clients have used the genetic algorithm to evaluate the value of demographic data by building several models with and without this data.

5) Interaction Terms: Sometimes, interactions pose a challenge when trying to implement the model due to interpretability and business policies. However, interactions can add significant lift for a model and should always be considered at some level.

Many Semcasting software users will create one model without interactions and a separate one with interactions for added learning, or will create a genetic algorithm model with interactions and then use them as separate input variables for a second regression model.
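Three of the adjustments above can be made concrete with small helper functions. These are hedged sketches: the abstract does not define Linear Lift, Left Curve or Upper Lift, so top_fraction_lift below is a generic lift measure and not one of the product's metrics. All function names are hypothetical.

```python
def top_fraction_lift(scores, outcomes, fraction=0.1):
    """Response rate in the top-scored fraction divided by the overall rate.

    Assumes at least one positive outcome, so the overall rate is non-zero.
    """
    ranked = sorted(zip(scores, outcomes),
                    key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate

def variable_superset(*models):
    """Union of the variables used by several models (adjustment 1)."""
    return sorted(set().union(*models))

def apply_constraints(selected, dropped=(), forced=()):
    """Remove dropped variables and add forced ones (adjustment 4)."""
    return (set(selected) - set(dropped)) | set(forced)

def add_interaction(record, left, right):
    """Copy a record, adding the product left*right as a new input (adjustment 5)."""
    extended = dict(record)
    extended[f"{left}*{right}"] = record[left] * record[right]
    return extended

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = top_fraction_lift(scores, outcomes, fraction=0.2)

inputs = variable_superset(["age", "income"], ["income", "tenure"])
constrained = apply_constraints(inputs, dropped=["age"], forced=["region"])
row = add_interaction({"age": 40, "income": 3}, "age", "income")
```

Writing the interaction out as its own column is what allows it to be passed to a second regression model as a separate input variable.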

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site Semcasting Modeler

Price Contact manufacturer.

G6G Abstract Number 20148R

Note: 20148 was previously listed as "Genalytics Model"

G6G Manufacturer Number 101095