Thursday, May 29, 2014

Overview of Single-Factor ANOVA

This is one of the following sixteen articles on Single-Factor ANOVA in Excel

Overview of Single-Factor ANOVA

Single-Factor ANOVA in 5 Steps in Excel 2010 and Excel 2013

Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group

Kruskal-Wallis Test Alternative For Single Factor ANOVA in 7 Steps in Excel 2010 and Excel 2013

Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison

Single-Factor ANOVA - All Excel Calculations

Overview of Post-Hoc Testing For Single-Factor ANOVA

Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA

Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA

Overview of Effect Size For Single-Factor ANOVA

ANOVA Effect Size Calculation Eta Squared in Excel 2010 and Excel 2013

ANOVA Effect Size Calculation Psi – RMSSE – in Excel 2010 and Excel 2013

ANOVA Effect Size Calculation Omega Squared in Excel 2010 and Excel 2013

Power of Single-Factor ANOVA Test Using Free Utility G*Power

Welch’s ANOVA Test in 8 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar

Brown-Forsythe F-Test in 4 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar

 

Overview of Single-Factor

ANOVA

Single-factor ANOVA is used to determine if there is a real difference between three or more sample groups of continuous data. ANOVA answers the following question: Is it likely that all sample groups came from the same population?

Single-factor ANOVA is useful in the following two circumstances:

Determining if three or more independent samples are different. In this case Single-Factor ANOVA might be used to determine whether there is a real difference between the test scores of three or more separate groups of people. Another example would be to use Single-Factor ANOVA to determine whether there is a real difference between retail sales of groups of stores in different regions.

Determining if three or more different treatments applied to similar groups have produced different results. A common example for this case is to compare test scores from groups that underwent different training programs.

 

ANOVA = Analysis of Variance

ANOVA stands for Analysis of Variance. ANOVA determines whether or not all sample groups are likely to have come from the same population by performing a comparison of the variance between sample groups to the variance within the sample groups.

Single-factor ANOVA represents groupings of objects that described by two variables. One of the variables describing each grouped object is a categorical variable. The value of each object’s categorical variable determines into which group the object is placed. The other variable describing each object is continuous and is the object’s displayed value in the data group.

The categorical variable is sometimes referred to as the independent variable while the continuous variable is sometimes referred to as the dependent variable. In the case of Simple-Factor ANOVA, the independent variable simply predicts which group each object’s continuous measurement will be placed. This independent-dependent relationship is different from that in regression because the independent variable does not predict the value of the dependent variable, only the group into which it will be placed.

ANOVA is a parametric test because one of ANOVA’s requirements is that the data in each sample group are normally-distributed. ANOVA is relative robust against minor deviations from normality. When normality of sample group data cannot be confirmed or if the sample data is ordinal instead of continuous, a nonparametric test called the Kruskal-Wallis test should be substituted for ANOVA.

Ordinal data are data whose order matter but the specific distances between units is not measurable. Customer-rating survey data and Likert scales data can be examples of ordinal data. These types of data can, however, be treated as continuous data if distances between successive units are considered equal.

 

Null and Alternative Hypotheses for Single-factor ANOVA

The Null Hypothesis for Single-Factor ANOVA states that the samples ALL come from the same population. This would be written as follows:

Null Hypothesis = H0: µ1 = µ2 = … = µk (k equals the number of sample groups)

Note that Null Hypothesis is not referring to the sample means, s1 , s2 , … , sk, but to the population means, µ1 , µ2 , … , µk.

The Alternative Hypothesis for Single-Factor ANOVA states that at least one sample group is likely to have come from a different population. Single-Factor ANOVA does not clarify which groups are different or how large any of the differences between the groups are. This Alternative Hypothesis only states whether at least one sample group is likely to have come from a different population.

Alternative Hypothesis = H0: µi ≠ µj for some i and j

 

Single-Factor ANOVA vs.Two-Sample, Pooled t-Test

Single-Factor ANOVA is nearly the same test as the two-independent-sample, pooled t-test. The major difference is that Single-Factor ANOVA is used to compare more than two samples groups. Performing Single-Factor ANOVA or a two-independent sample, pooled t-test on the same two sample groups will produce exactly the same results.

As stated, ANOVA compares the variance between the samples groups to the variance within the sample groups. If the ratio of the variance between sample groups over variance within sample groups is high enough, the samples said to be different from each other.

Another way to understand ANOVA (or the two-independent sample, pooled t-test) is to state that the sample groups become easier to tell apart as the sample groups become more spread out from each other or as each of the sample groups become smaller and tighter. That might be more intuitive if presented visually.

Below are box plots of three sample groups:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

Each of the sample groups are easy to differentiate from the others. The measures of spread - standard deviation and variance - are shown for each sample group. Remember that variance equals standard deviation squared. Each sample group is a small, tightly-bunched group as a result of having a small standard deviation.

If each sample group’s spread is increased (widened), the sample groups become much harder to differentiate from each other. The graph shown below is of three sample groups having the same means as above but much wider spread. The between-groups variance has remained the same but the within-groups variance has increased.

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

It is easy to differentiate the sample groups in the top graph but much less easy to differentiate the sample groups in the bottom graph simply because the sample groups in the bottom graph have much wider spread.

In statistical terms, one could say that it is easy to tell that the samples in the top graph were drawn from different populations. It is much more difficult to say whether the sample groups in the bottom graph were drawn from different populations.

That is the underlying principle behind both t-tests and ANOVA tests. The main purpose of t-tests and ANOVA tests is to determine whether samples are from the same populations or from different populations. The variance (or equivalently, the standard deviation) of the sample groups is what is what determines how difficult it is to tell the sample groups apart.

The two-independent-sample, pooled t-test is essentially the same test as single-factor ANOVA. The two-independent-sample, pooled t-test can only be applied to two sample groups at one time. Single-Factor ANOVA can be applied to three or more groups at one time.

 

2-Sample One-Way ANOVA = 2-Sample, Pooled t-Test

We will apply both the two-independent sample, pooled t-test and single-factor ANOVA to the first two samples in each of the above graphs to verify that the results are equivalent.

 

Sample Groups With Small Variances (the first graph)

Applying a two-independent sample t-test to the first two samples with the small variances would produce the following result:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

This result would have been obtained by filling in the Excel dialogue box as follows:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

Running Single-Factor ANOVA on those same two sample groups would produce this result:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

This blog article has not covered how to perform ANOVA in Excel but this result would have been obtained by filling in the Excel dialogue box as follows:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

Both the t-test and the ANOVA test produce the same result when applied to these two sample groups. They both produce the same p Value (1.51E-10) which is extremely small. This indicates that the result is statistically significant and that the difference in the means of the two groups is real. More correctly put, it can be stated that there is a very small chance (1.51E-10) that the samples came from the same population and that the result obtained (that their means are different) was merely a random occurrence.

 

Sample Groups With Large Variances (the second graph)

Applying a two-independent sample t-test to the first two samples with the large variances would produce the following result:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

This result would have been obtained by filling in the Excel dialogue box as follows:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

Running Single-Factor ANOVA on those same two sample groups would produce this result:

anova, single-factor anova, one-way anova, excel, excel 2010, excel 2013, statistics (Click Image To See a Larger Version)

This blog article has not yet covered how to perform ANOVA in Excel but this result would have been obtained by filling in the Excel dialogue box as follows:

clip_image010 (Click Image To See a Larger Version)

Both the t-test and the ANOVA test produce the same result when applied to these two sample groups. They both produce the same p Value (0.230876). This is relatively large. 95 percent is the standard level of confidence usually required in statistical hypothesis tests to conclude that the results are statistically significant (real). The p value needs to be less than 0.05 to achieve a 95 percent confidence level that a difference really exists. The sample groups with the large spread produced a p Value greater than 0.05 and we can therefore not reject the Null Hypothesis which states that the sample groups are the same. The results are not statistically significant and we cannot conclude that the two samples were not drawn from the same population.

 

Single-Factor ANOVA Should Not Be Done By Hand

Excel provides an excellent ANOVA tool that can perform Single-factor or two-Factor ANOVA with equal ease. Doing the calculations by hand would be tedious and provide lots of opportunities to make a mistake. Excel produces a very detailed output when the ANOVA tool is run. A blog article several after this one shows the example of Single-Factor ANOVA with all calculations performed individually in Excel.

It will probably be clear from viewing this that it is wise to let Excel do the ANOVA calculations. A number of statistics textbook place probably too much emphasis on teaching the ability to perform the ANOVA equations by hand. In the real world that would not likely be done for Single-Factor ANOVA because the Excel tool is so convenient to use.

The best way to understand Single-Factor ANOVA is to perform an example as follows in the next blog article.

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

No comments:

Post a Comment