One of the first statistical tools many new analysts use is the Analysis of Variance (ANOVA), a collection of statistical methods used to decompose and understand the sources of variation within a set of data. One-Way ANOVA is perhaps the most basic of these methods and a staple of most statistical software. While SAP has included many predictive algorithms in its SAP Predictive Analytics Expert application, the application lacks many of the common descriptive algorithms that data scientists use to better understand their data. Luckily, it is relatively easy to build custom R extensions to accommodate any descriptive statistical needs.
What is One-Way ANOVA?
One-Way Analysis of Variance compares the means of three or more samples to evaluate whether all groups share the same mean (in effect, whether there is no difference in the monitored statistic between the groups). Because data vary naturally, the observed group means will almost always differ slightly, so the age-old question persists: is the difference statistically significant?
One-Way ANOVA is an omnibus test: if the null hypothesis (all means are the same) is rejected, the test offers no additional information about which groups differ from each other, only that at least one group differs enough to reject the hypothesis that they are all the same.
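To make the comparison concrete, here is a minimal sketch of the F statistic behind One-Way ANOVA, written in plain Python with only the standard library. The function name `one_way_anova_f` and the sample data are hypothetical and chosen purely for illustration; this is not the R extension discussed in this article. The statistic is the ratio of between-group variability to within-group variability, and a large value suggests that the group means are not all equal.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Hypothetical helper for illustration: computes the classic one-way
    ANOVA F statistic from the between-group and within-group sums of squares.
    """
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of each group mean around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k

    # Ratio of the two mean squares
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: three groups of three observations each
groups = [[4, 5, 6], [5, 6, 7], [8, 9, 10]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

For this made-up data the between-group mean square is 13 and the within-group mean square is 1, so F is 13 on (2, 6) degrees of freedom. The statistic is then compared against an F distribution with those degrees of freedom to decide whether to reject the null hypothesis. Note that the result is a single number for all groups combined: it can say that some difference exists, but not which group is responsible, which is exactly the omnibus limitation described above.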
For additional background on One-Way ANOVA, see the relevant Wikipedia article.
About Hillary Bliss
Hillary is a Senior Manager – Data & Analytics at Protiviti, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.