Implementation is the piece of your analytics strategy that makes your models available to everyone. Your models aren’t valuable until they can be accessed from wherever they need to be, whether that is in a business intelligence environment or within an application. If no one can get to the information, it’s useless!
There are two ways to implement your predictive models, batch scoring and real-time scoring, and each has pros and cons. It is helpful to weigh the pros and cons of each method against your needs, rather than against the other method.
With batch scoring, data is scored in the modeling application, and the scores are then written back to the database via a JDBC/ODBC connection or flat-file transfer. This is the best option for something like a direct mail drive.
Pros:
- Generally easier to implement; faster implementation and minimal testing required
- Model score vintage/versioning/stability is easier to control
- Model updates require no programming changes
- Scores can be accessed by multiple applications
Cons:
- Lag in obtaining model scores, not an instantaneous response
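To make the batch approach concrete, here is a minimal sketch of a scoring run that writes scores back to a database. Everything specific in it is an assumption for illustration: the logistic coefficients, the `customers` and `customer_scores` tables, and the column names are hypothetical, and Python's built-in `sqlite3` stands in for whatever JDBC/ODBC connection your environment actually uses.

```python
import math
import sqlite3

# Hypothetical logistic-regression model; these numbers are illustrative only.
MODEL_VERSION = "v1"
INTERCEPT = -2.0
COEFFICIENTS = {"recency_months": -0.10, "prior_orders": 0.40}


def score(row):
    """Return the predicted response probability for one customer row (a dict)."""
    z = INTERCEPT + sum(coef * row[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def batch_score(conn, run_date):
    """Score every customer and append the results to a scores table.

    Storing the model version and run date with each score is what makes
    score vintage easy to control in a batch setup.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_scores ("
        "customer_id INTEGER, score REAL, model_version TEXT, scored_on TEXT)"
    )
    cursor = conn.execute(
        "SELECT customer_id, recency_months, prior_orders FROM customers"
    )
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    conn.executemany(
        "INSERT INTO customer_scores VALUES (?, ?, ?, ?)",
        [(r["customer_id"], score(r), MODEL_VERSION, run_date) for r in rows],
    )
    conn.commit()
    return len(rows)


def demo():
    """Build a tiny in-memory database, run one batch, and return the scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER, recency_months REAL, prior_orders REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)", [(1, 2, 5), (2, 24, 0)]
    )
    batch_score(conn, "2024-01-31")
    return conn.execute(
        "SELECT customer_id, score, model_version, scored_on "
        "FROM customer_scores ORDER BY customer_id"
    ).fetchall()
```

Because the scores land in an ordinary table, any application that can query the database can read them, and refreshing the model only means changing what runs inside `score`.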
With real-time scoring, you export the model’s scoring algorithm to other applications. The form of the scoring equation depends on the algorithm. This is best for something like an online price quote based on customer data.
Pros:
- Fresh model scores immediately available
Cons:
- Requires significant programming/integration effort/testing
- Updating model requires re-development/testing
- May be difficult to get consistent scoring snapshots/history
- Requires a bit more statistical expertise
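As a rough illustration of exporting a scoring equation, the sketch below assumes the model is a logistic regression, so the "algorithm" is just an intercept and a set of coefficients. The JSON layout, the `export_model` function, the `RealTimeScorer` class, and the input names are all hypothetical; other algorithms (trees, neural networks) would need a different export format.

```python
import json
import math


def export_model(intercept, coefficients, version):
    """Modeling side: serialize the scoring equation as JSON text."""
    return json.dumps(
        {"version": version, "intercept": intercept, "coefficients": coefficients}
    )


class RealTimeScorer:
    """Application side: load an exported equation and score one request."""

    def __init__(self, exported):
        spec = json.loads(exported)
        self.version = spec["version"]
        self.intercept = spec["intercept"]
        self.coefficients = spec["coefficients"]

    def score(self, inputs):
        """Return a probability for a single request (a dict of inputs)."""
        # Fail loudly if the calling application did not supply every input
        # the model needs; silent defaults would give misleading scores.
        missing = set(self.coefficients) - set(inputs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        z = self.intercept + sum(
            coef * inputs[name] for name, coef in self.coefficients.items()
        )
        return 1.0 / (1.0 + math.exp(-z))
```

A quoting application might call `RealTimeScorer(exported).score({"age": 50, "prior_claims": 0})` on each request. Note that rebuilding the model means re-exporting and re-testing this integration, which is the maintenance cost listed above.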
The pros and cons alone may make the choice look simple, but before deciding which way to implement your model, ask yourself the following:
- How will the model be used and what are the user requirements?
  - Is there a need for an instantaneous response? For example, an online insurance quote would require real-time scoring, while a monthly update to a customer database would do well with batch scoring.
- What areas/applications will need to use the scores?
  - How spread out does the information need to be? Will the data be in your sales system, customer system, and operational system? All of the above? Or perhaps just one, like your marketing system?
- What data is required to calculate scores and where is this data available?
  - Do you need fifty pieces of data from fifty different departments in order to calculate, or just one or two pieces of information?
- How often is the model going to be updated (refreshed or rebuilt)?
  - If you will need to rebuild your model every month, you are not going to want to extract that algorithm and put it into different software.
Who should be involved in the implementation process?
Model implementation is really a team effort between the data scientists and/or model developers and the systems team who make the systems “talk” to each other. So it’s really important to take a holistic approach when building and deploying your models. Everyone is a potential stakeholder, so ensure you have the right people involved from the beginning of the process.
Deciding which method to use isn’t a matter of one method being better than the other. It is a matter of which method better suits a particular output, the type of information you’re working with, and the skills of the people on your team who will be using the data. Asking yourself the questions above, and considering future model maintenance and potential model degradation, is what should drive your choice of method.
Good thing I’ll be discussing model maintenance in my next blog!
About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.