Thursday, May 29, 2014

Single-Factor ANOVA in 5 Steps in Excel 2010 and Excel 2013

This is one of the following sixteen articles on Single-Factor ANOVA in Excel

Overview of Single-Factor ANOVA

Single-Factor ANOVA in 5 Steps in Excel 2010 and Excel 2013

Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group

Kruskal-Wallis Test Alternative For Single Factor ANOVA in 7 Steps in Excel 2010 and Excel 2013

Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison

Single-Factor ANOVA - All Excel Calculations

Overview of Post-Hoc Testing For Single-Factor ANOVA

Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA

Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA

Overview of Effect Size For Single-Factor ANOVA

ANOVA Effect Size Calculation Eta Squared in Excel 2010 and Excel 2013

ANOVA Effect Size Calculation Psi – RMSSE – in Excel 2010 and Excel 2013

ANOVA Effect Size Calculation Omega Squared in Excel 2010 and Excel 2013

Power of Single-Factor ANOVA Test Using Free Utility G*Power

Welch’s ANOVA Test in 8 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar

Brown-Forsythe F-Test in 4 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar

 

Single-Factor ANOVA

in 5 Steps in Excel

A company was attempting to determine whether there was a difference in results produced by three different training programs. The three unique training programs had the same objective and the training results were by a single, common test taken by participants at the end of the training.

In this test three groups of similar employees underwent the training. Each of the three groups was put through one of the three training programs so no group was given the same training program. At the end of the training, all participants in each group were given the same test. The groups all had a different number of participants. The test results from all three groups were as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013
(Click Image To See a Larger Version)

Group 1 had 22 participants. Group 2 had 23 participants, Group 3 had 19 participants.

 

Single-Factor ANOVA Step 1 – Place Data in Excel Group Columns

The Excel Single-Factor ANOVA tool requires that the data be arranged in columns. Each data column will hold only data whose categorical variable is the same. In this case, all data whose categorical variable is Group 1 will be in the first column, Group 2 in the second column, and Group 3 data in the third column.

Quite often the data is not conveniently arranged that way. Very often the data is arranged in one long column with each row containing each observation’s independent (categorical) variable value and its dependent (measured) value as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013
(Click Image To See a Larger Version)

The data now has to be separated into columns so that each column contains data from one level of the independent variable. In other words, each column will contain a unique group of data that will consist of all data having a single level of the independent variable. This will be done as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013 (Click Image To See a Larger Version)

The blank cells now have to be removed from the columns. This is accomplished as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013
(Click Image To See a Larger Version)

Cell J3 contains the formula:

=IF(ISNUMBER(LARGE($E$3:$E$22,ROW()-ROW($J$2))),LARGE($E$3:$E$22,ROW()-ROW($J$2)),"")

Cell K3 contains the formula:

=IF(ISNUMBER(LARGE($F$3:$F$22,ROW()-ROW($K$2))),LARGE($F$3:$F$22,ROW()-ROW($K$2)),"")

Cell L3 contains the formula:

=IF(ISNUMBER(LARGE($G$3:$G$22,ROW()-ROW($L$2))),LARGE($G$3:$G$22,ROW()-ROW($L$2)),"")

These three formulas are copied down to row 22 to produce the result shown here.

It is easier to work with sorted data columns when performing Single-Factor ANOVA so the data will be sorted in the next step. Data can be sorted in Excel by copying a single command down a column as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013
(Click Image To See a Larger Version)

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013 (Click Image To See a Larger Version)

 

Single-Factor ANOVA Step 2 – Remove Extreme Outliers

Calculation of the mean is one of the fundamental computations when performing ANOVA. The mean is unduly affected by outliers. Extremely outliers should be removed before ANOVA. Not all outliers should be removed. An outlier should be removed if it is obviously extreme and inconsistent with the remainder of the data.

 

Find Outliers From the Sorted Data

An easy way to spot extreme outliers is to look at the sorted data. Extremely high or low outlier values will appear at the ends of the sort. A convenient, one-step method to sort a column of data in Excel is shown here.

The formula is cell H2 is the following:

=IF($D2=””,””,LARGE($D$2:$D$19,ROW()-ROW($D$1)))

Copy this formula down as shown to create a descending sort of the data in cells D2 to D19.

Exchanging the word SMALL for LARGE would create an ascending sort instead of the descending sort performed here.

No extreme outliers are apparent from the sort.

 

Find Outliers By Standardizing Residuals

Another way to evaluate data for outliers is to calculate the standardized residual value for each data point. In the case of ANOVA, the residual for each data point is the difference between the data point and its group mean. The standardized residual value is simply this residual length expressed as the number of standard deviations.

For example, the value in cell G3 is calculated by the following formula:

=ABS((C3-AVERAGE($C$3:$C$20))/STDEV($C$3:$C$20))

Quite often outliers are considered to be those data that are more than three standard deviations from the group mean. No data points are that far from the column mean. The farthest data point is only 2.15 standard deviations from its column mean. These numbers are shown as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013 (Click Image To See a Larger Version)

After obvious outliers have been removed, it is good idea to visually inspect a box plot of data to get a better feel for the dispersion between groups (how spread out the group means are) and within the groups (how dispersed is the data within each group).

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013
(Click Image To See a Larger Version)

All data points that are deemed extreme outliers and removed should be recorded. Before an outlier is removed, causes of the outlying value should be considered. It is always a good idea to ensure that no data recording errors or data measurement errors have cause outlying values. Any reports that record and interpret the results of the ANOVA test should list any outlier values that were removed and the reason that they were removed.

 

Single-Factor ANOVA Step 3 – Verify Required Assumptions

Single-Factor ANOVA Required Assumptions

Single-Factor ANOVA has six required assumptions whose validity should be confirmed before this test is applied. The six required assumptions are the following:

1) Independence of Sample Group Data - Sample groups must be differentiated in such a way that there can be no cross-over of data between sample groups. No data observation in any sample group could have been legitimately placed in another sample group. No data observation affects the value of another data observation in the same group or in a different group. This is verified by an examination of the test procedure.

2) Sample Data Are Continuous - Sample group data (the dependent variable’s measured value) can be ratio or interval data, which are the two major types of continuous data. Sample group data cannot be nominal or ordinal data, which are the two major types of categorical data.

3) Independent Variable is Categorical - The determinant of which group each data observation belongs to is a categorical, independent variable. Single-factor ANOVA uses a single categorical variable that has at least two levels. All data observations associated with each variable level represent a unique data group and will occupy a separate column on the Excel worksheet.

4) Extreme Outliers Removed If Necessary - ANOVA is a parametric test that relies upon calculation of the means of sample groups. Extreme outliers can skew the calculation of the mean. Outliers should be identified and evaluated for removal in all sample groups. Occasional outliers are to be expected in normally-distributed data but all outliers should be evaluated to determine whether their inclusion will produce a less representative result of the overall data than their exclusion.

5) Normally-Distributed Data In All Sample Groups - Single-factor ANOVA is a parametric test having the required assumption the data from each sample group comes from a normally-distributed population. Each sample group’s data should be tested for normality. Normality testing becomes significantly less powerful (accurate) when a group’s size fall below 20. An effort should be made to obtain group sizes that exceed 20 to ensure that normality tests will provide accurate results.

6) Relatively Similar Variances In All Sample Groups - Single-Factor ANOVA requires that sample groups are obtained from populations that have similar variances. This requirement is often worded to state that the populations must have equal variances. The variances do not have to be exactly equal but do have to be similar enough so the variance testing of the sample groups will not detect significant differences. Variance testing becomes significantly less powerful (accurate) when a group’s size fall below 20. An effort should be made to obtain group sizes that exceed 20 to ensure that variance tests will provide accurate results.

 

Determining If Sample Groups Are Normally-Distributed

There are a number of normality test that can be performed on each group’s data. The normality test that is preferred because it is considered to be more powerful (accurate) than the others, particularly with smaller sample sizes is the Shapiro-Wilk test.

 

Nonparametric Alternatives To Single-Factor ANOVA For Non-Normal Data

When groups cannot be shown to all have normally-distributed data, a nonparametric test called the Kruskal-Wallis test should be performed instead of Single-Factor ANOVA. The Kruskal-Wallis test will be performed on the original sample data in a blog article shortly following this one.

 

Determining If Sample Groups Have Similar Variances

Single-Factor ANOVA requires that the variances of all sample groups be similar. Sample groups that have similar variances are said to be homoscedastistic. Sample groups that have significantly different variances are said to be heteroscedastistic.

A rule-of-thumb is as follows: Variances are considered similar if the standard deviation of any one group is no more than twice as large as the standard deviation of any other group. That is the case here as the following are true:

s1 = Group1 standard deviation = 1.495

s2 = Group2 standard deviation = 1.514

s3 = Group3 standard deviation = 1.552

The variances of all three groups are very similar. A quick look at the box plot of the data would have confirmed that as well.

Two statistical tests are commonly performed when it is necessary to evaluate the equality of variances in sample groups. These tests are Levene’s Test and the Brown-Forsythe Test. The Brown-Forsythe Test is more robust against outliers but Levene’s Test is the more popular test.

 

Alternative Tests To Single-Factor ANOVA When Groups Variances Are Not Similarity

When groups cannot be shown to have homogeneous (similar) variances, either Welch’s ANOVA or the Brown-Forsythe F test should be used in place of Single-Factor ANOVA. Both of these tests will be performed on the same data set in blog articles after this one.

 

Single-Factor ANOVA Step 4 – Run the Single-Factor ANOVA Tool in Excel

The Single-Factor ANOVA tool can be found in Excel 2007 and later by clicking the Data Analysis link located under the Data tab. In Excel 2003, the Data Analysis link is located in the Tool drop-down menu. Clicking Anova: Single-Factor brings up the Excel dialogue box for this tool.

The data need to be arranged in contiguous (columns touching with the rows correctly lined up) columns. The completed dialogue box for this data set would appear as follows:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013 (Click Image To See a Larger Version)

Hitting OK runs the tools and produces the following output:

anova,single-factor anova,one-way anova,statistics,excel,excel 2010,excel 2013 (Click Image To See a Larger Version)

The meaning of this output can be understood by reviewing the Null and Alternative Hypotheses that Single-Factor ANOVA evaluates.

The Null Hypothesis states that all populations from which all samples were drawn have the same mean.

Null Hypothesis = H0: µ1 = µ2 = … = µk (k equals the number of sample groups)

Note that Null Hypothesis is not referring to the sample means, s1 , s2 , … , sk, but to the population means, µ1 , µ2 , … , µk.

The Alternative Hypothesis for Single-Factor ANOVA states that at least one sample group is likely to have come from a different population. Single-Factor ANOVA does not clarify which groups are different or how large any of the differences between the groups are. This Alternative Hypothesis only states whether at least one sample group is likely to have come from a different population.

Alternative Hypothesis = H0: µi ≠ µj for some i and j

 

Single-Factor ANOVA Step 5 – Interpret the Excel Output

The Null Hypothesis is rejected if ANOVA’s calculated p Value is smaller than the designated Level of Significance (alpha). Alpha is most commonly set at 0.05. In this case the Null Hypothesis would be rejected because the p Value (0.0369) is smaller than Alpha (0.05).

The exact interpretation of a p value of 0.0369 is that there is only a 3.96 percent chance that samples having these values could have been drawn if all of the populations had the same means.

Although a Hypothesis Test can only result in the rejection of the Null Hypothesis, we can conclude with at least 95 percent certainty that at least one sample has been drawn from a population with a different mean than the other samples.

ANOVA can only indicate that at least one sample is different but ANOVA does not provide specific information about where that difference comes from. Further testing called Post-Hoc testing can indicate from where the specific differences have come from. Post-Hoc testing on this data set will shortly be performed in blog articles after this one.

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

No comments:

Post a Comment