Custom R Modules in Predictive Analysis

With the release of version 1.0.11 of Predictive Analysis in early June 2013, SAP added a feature allowing users to add new R algorithms to the Predictive Analysis algorithm library. This might be desirable for several reasons:

An organization has an existing model or algorithm that it would like to make available within Predictive Analysis for either analytical or comparison purposes.
A modeler/analyst would like to build a model using an algorithm that is not currently supported in Predictive Analysis.
A modeler/analyst would like to implement an algorithm that already exists within Predictive Analysis, but with additional processing, different visualizations, or different settings available.

In this blog post, we’ll walk through an example of how to create a custom R component for Logistic Regression.

Logistic Regression Background

Logistic regression is one of my favorite predictive algorithms because it is accurate, versatile, and granular. I’ve heard logistic regression described as the equivalent of linear regression for binary responses. Where a modeler might use linear regression to predict a continuous output (like what our profit will be next quarter based on economic factors), they would use logistic regression to predict a binary outcome, which the model returns as a probability.
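To make the link to linear regression concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the R component this post builds. The intercept and coefficients are hypothetical values chosen by hand, not fitted to any data.

```python
import math

def logistic(z):
    """Map a linear predictor z (any real number) to a probability in (0, 1)."""
    # Branching on the sign of z avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical model parameters, for illustration only (not fitted to data).
INTERCEPT = -1.5
COEFFICIENTS = [0.8, 0.03]

def predict_probability(features):
    """Return the predicted probability of the positive outcome."""
    # Same weighted sum of inputs that linear regression would output directly...
    z = INTERCEPT + sum(b * x for b, x in zip(COEFFICIENTS, features))
    # ...but passed through the logistic function so the result is a probability.
    return logistic(z)

probability = predict_probability([1.0, 10.0])
```

The only difference from a linear regression prediction is the final step: the weighted sum is squeezed into the range between 0 and 1 instead of being reported as-is.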

While there are lots of classification algorithms that predict categorical or binary values, what I really like about logistic regression is the granularity of the prediction. Because logistic regression can take in continuous variables (potentially many of them), the user gets back extremely granular results, with probabilities anywhere between 0 and 1. This allows the user to choose one or more cutoff points for handling records, or simply to prioritize records. For example, if we are trying to predict which customers are likely to respond to a marketing offer, we can use a logistic regression model in any of the following ways:

Send the marketing offer to customers predicted to respond (logistic regression predicts a probability of response > 0.5).
If we have a budget to send only 100,000 offers, we can send the marketing offer to the 100,000 customers with the highest predicted response probability.
Perhaps we determined, based on model accuracy, response rates, and marketing costs, that the marketing offer only breaks even for customers with a response probability of 68% or higher; in that case we can use 0.68 as our cutoff.
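The three approaches above can be sketched as follows. This is Python used for illustration only, not part of the Predictive Analysis component. The customer IDs and probabilities are made-up sample values, and the helper names are hypothetical.

```python
# Hypothetical scored customers: (customer_id, predicted response probability).
scores = [("A", 0.91), ("B", 0.72), ("C", 0.68), ("D", 0.55), ("E", 0.31)]

def select_by_cutoff(scored, cutoff, inclusive=True):
    """Keep customers whose probability clears the cutoff."""
    if inclusive:
        return [cid for cid, p in scored if p >= cutoff]
    return [cid for cid, p in scored if p > cutoff]

def select_top_n(scored, n):
    """Keep the n customers with the highest predicted probability."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in ranked[:n]]

# 1. Everyone predicted to respond (probability strictly above 0.5).
predicted_responders = select_by_cutoff(scores, 0.5, inclusive=False)

# 2. A fixed budget: here 2 offers stand in for the 100,000 in the text.
budget_limited = select_top_n(scores, 2)

# 3. A break-even cutoff of 0.68 or higher.
break_even = select_by_cutoff(scores, 0.68)
```

All three selections come from the same scored list; only the rule applied to the probabilities changes, which is what makes the granular output so flexible.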

About Hillary Bliss

Hillary Bliss is the Analytics Practice Lead at Decision First Technologies, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.