Since September is drawing to a close and October is rapidly approaching, I decided to hunt down some baseball data and see whether MLB statistics could offer any insight into postseason performance. I’m definitely not an expert in sports statistics, but I pulled the well-known Lahman baseball database and calculated some summary statistics to evaluate team performance. I used SAP Predictive Analysis (with SAP Lumira visualization components) to visualize the data and perform some predictive analytics for the 2013 postseason.
Metrics and Data
I pulled just a few metrics to summarize each team’s batting and fielding performance:
- Putouts and errors per inning out for each of the main positions (1B, 2B, 3B, C, CF, RF, LF, SS)
- Home runs (HR), hits (H), runs (R), doubles (2B), triples (3B), strikeouts (SO), stolen bases (SB), caught stealing (CS), sacrifice flies (SF), and sacrifice hits (SH), each per at bat (AB)
I calculated these for all teams from 1981 to 2012, using regular-season games only.
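To make the normalization concrete, here is a minimal sketch of the two kinds of rates listed above. It assumes, hypothetically, that each team season has been loaded as a dict keyed by Lahman-style abbreviations ("AB", "HR", and so on), and it treats "per inning out" as putouts or errors divided by outs played in the field, which is my reading of the Lahman Fielding table’s InnOuts column. The function names are my own and not part of any SAP or Lahman tool.

```python
# Minimal sketch of the rate calculations. Each team season is assumed
# (hypothetically) to be a dict keyed by Lahman-style abbreviations such as
# "AB" and "HR"; this is not necessarily how the original data was loaded.

BATTING_EVENTS = ("HR", "H", "R", "2B", "3B", "SO", "SB", "CS", "SF", "SH")


def per_at_bat_rates(season: dict) -> dict:
    """Divide each batting event count by the season's at bats (AB).

    Events missing from `season` are treated as zero.
    """
    ab = season["AB"]
    if ab <= 0:
        raise ValueError("AB must be positive")
    return {event: season.get(event, 0) / ab for event in BATTING_EVENTS}


def per_inning_out(count: int, inn_outs: int) -> float:
    """Putouts or errors per out recorded while playing a position.

    Assumes `inn_outs` is time in the field expressed as outs (three per
    inning), which is my reading of the Lahman Fielding InnOuts column.
    """
    if inn_outs <= 0:
        raise ValueError("inn_outs must be positive")
    return count / inn_outs


# Usage with made-up numbers (not real team data):
season = {"AB": 5500, "HR": 165, "H": 1430, "SO": 1100}
print(per_at_bat_rates(season)["HR"])  # about 0.03
print(per_inning_out(8, 4000))         # 0.002
```

Dividing by at bats (and by outs in the field) rather than by games keeps teams and seasons comparable even when the number of plate appearances or innings differs, for example in the strike-shortened 1981 and 1994 seasons.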
Visualizing Changes Over Time
Not being familiar with the ins and outs (get it?) of baseball, I decided to look for trends over time – would these metrics be consistent, or have strategies changed over the years?
Broken out by league, the trends show home runs per at bat increasing through the early 2000s and decreasing since then.
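A league-by-year view like this can be sketched by pooling team seasons. The sketch below assumes rows keyed by the Lahman Teams column names lgID, yearID, and AB plus an event column such as HR. Pooling (summing counts and at bats before dividing) is my own choice and may differ from how the original charts aggregated teams; the helper names are hypothetical.

```python
from collections import defaultdict

# Sketch of a league-by-year view. Rows are assumed to be dicts keyed by the
# Lahman Teams column names lgID, yearID, and AB plus an event column such
# as HR; the pooling choice below is an assumption, not the original method.


def league_rate_by_year(seasons, event="HR"):
    """Map (league, year) to total event count divided by total at bats.

    Summing before dividing weights each team by its at bats, rather than
    averaging the teams' individual rates.
    """
    totals = defaultdict(lambda: [0, 0])  # (lgID, yearID) -> [count, AB]
    for s in seasons:
        key = (s["lgID"], s["yearID"])
        totals[key][0] += s.get(event, 0)
        totals[key][1] += s["AB"]
    return {key: count / ab for key, (count, ab) in totals.items() if ab > 0}


def peak_year(rates, league):
    """Return the year with the highest pooled rate for one league."""
    by_year = {year: r for (lg, year), r in rates.items() if lg == league}
    return max(by_year, key=by_year.get)


# Usage with made-up rows (not real team data):
rows = [
    {"lgID": "AL", "yearID": 2000, "AB": 5000, "HR": 200},
    {"lgID": "AL", "yearID": 2000, "AB": 5000, "HR": 100},
    {"lgID": "AL", "yearID": 2010, "AB": 5000, "HR": 100},
]
rates = league_rate_by_year(rows)
print(peak_year(rates, "AL"))  # 2000
```

Finding the peak year per league is one simple way to put a date on "increased through the early 2000s" rather than reading it off a chart by eye.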
Looking at individual metrics, doubles per at bat appear to have been increasing, while triples have been steadily decreasing. Interestingly, though, runs per AB have remained relatively steady since 1993.
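Words like "increasing", "decreasing", and "steady" can be pinned down with a least-squares slope over the seasons of interest. This is a minimal sketch, not the method used in SAP Predictive Analysis; the function names and the `tolerance` threshold are hypothetical choices of mine.

```python
# Sketch: label a per-season series by the sign of its least-squares slope.


def trend_slope(years, values):
    """Ordinary least-squares slope of `values` regressed on `years`."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def describe_trend(years, values, tolerance):
    """Call a series increasing, decreasing, or steady.

    `tolerance` is the per-season slope below which the series counts as
    steady; picking it is a judgment call, not a statistical test.
    """
    slope = trend_slope(years, values)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "steady"


# Usage with made-up triples-per-AB values for four seasons:
print(describe_trend([1993, 1994, 1995, 1996], [0.006, 0.0055, 0.005, 0.0045], 1e-4))
# decreasing
```

A single straight line cannot capture a rise-then-fall pattern like the home run trend, so for a claim such as "steady since 1993" it makes sense to fit only the seasons from 1993 onward.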
Perhaps these declining offensive trends are driven by improvements in fielding or pitching? Steady decreases in errors per inning out at all infield positions, along with increases in strikeouts per at bat, suggest this could be the case.
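One rough way to probe this idea is to correlate a scoring series (say, HR per AB by season) with a fielding or pitching series (errors per inning out, strikeouts per AB). The sketch below computes a Pearson correlation by hand; Python 3.10+ also ships `statistics.correlation`. The function name and the toy inputs are my own, for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return sxy / math.sqrt(sxx * syy)


# Usage with made-up league-wide series, one value per season:
errors_per_out = [0.0060, 0.0058, 0.0055, 0.0053]
so_per_ab = [0.16, 0.17, 0.18, 0.19]
print(pearson_r(errors_per_out, so_per_ab))
```

Two series that both drift over time will tend to correlate even without any causal link, so a strong r here is at most consistent with the fielding and pitching explanation, not proof of it. Correlating year-over-year changes instead of raw levels is one simple way to reduce that effect.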
Hillary Bliss is the Analytics Practice Lead at Decision First Technologies, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.