# Sphericity Testing in 9 Steps For Repeated-Measures ANOVA in Excel

This is one of the following four articles on Repeated-Measures ANOVA

Single-Factor Repeated-Measures ANOVA in 4 Steps in Excel

Sphericity Testing For Repeated-Measures ANOVA in 9 Steps in Excel

Effect Size For Repeated-Measures ANOVA in Excel

Friedman Testing For Repeated-Measures ANOVA in 3 Steps in Excel

## Overview

Single-factor ANOVA has a requirement that the variances of all sample groups be similar. The rule of thumb for single-factor ANOVA is that no sample group’s variance should be more than four times the variance of any other sample group.

Repeated-measures ANOVA has a similar but more rigorous requirement called sphericity. Sphericity exists when the variances of the differences between data pairs from the same subjects are the same across all possible combinations of sample groups.

Violation of sphericity makes repeated-measures ANOVA more likely to produce Type 1 Errors, i.e., detecting a significant difference when one does not exist – a false positive. When sphericity is shown to be violated, a correction is applied to the test that makes the test more conservative (less likely to detect a significant difference) by increasing the test result’s p Value. Tests of sphericity and the correction calculation will be demonstrated in Excel in this blog article.

### Two Sphericity Tests

Two hypothesis tests are used to determine whether sphericity exists. The weaker but more popular of the two is __Mauchly’s Test of Sphericity__. The more powerful but less frequently employed test is __John, Nagao, and Sugiura’s Test of Sphericity__. Both tests will be demonstrated in detail on the data used in this example.

If the Null Hypothesis of either Mauchly’s Test or John, Nagao, and Sugiura’s Test can be rejected, then a correction should be applied to df_{between} and df_{error} that will ultimately make the test more conservative and less powerful by increasing the final p Value of the repeated-measures ANOVA.

### Epsilon Estimation Corrections

If the sphericity requirement has been violated, then the degree to which sphericity is violated needs to be calculated. The statistic that describes how much sphericity is violated is called Epsilon (Є). Epsilon is a number between 0 and 1. The further from 1 that Epsilon is, the greater the violation of sphericity.

Sphericity can only be estimated because the available data are sample data and not population data. There are two methods commonly used to estimate Epsilon: the __Greenhouse-Geisser procedure__ and the __Huynh-Feldt procedure__. The estimate of Sphericity (Epsilon) that each of these procedures calculates is used to correct df_{between} and df_{error} in a way that makes the test less powerful by increasing the final p Value.

The steps below provide detailed instructions on how to perform the Greenhouse-Geisser procedure and the Huynh-Feldt procedure in Excel on the data used in this example and then make the corrections to the degrees of freedom.

## Sphericity Testing For Repeated-Measures ANOVA Example in Excel

The example used in this article to demonstrate sphericity testing for repeated-measures ANOVA in Excel is the same example as in the previous blog article about how to perform repeated-measures ANOVA in Excel. A company implemented a four-week training program to reduce clerical errors. Five employees underwent this training program. The number of clerical errors that each trainee committed during each week as the training progressed was recorded. These data are shown as follows:


The data were analyzed with both single-factor ANOVA and with repeated-measures ANOVA with the following results:


Repeated-measures ANOVA detects a significant difference between sample groups (p Value = 0.0123) while single-factor ANOVA does not (p Value = 0.0767) if a 95-percent level of confidence (alpha = 0.05) is required.

Single-factor repeated-measures ANOVA has the following five required assumptions:

1) Sample data are continuous.

2) The independent variable is categorical.

3) Extreme outliers have been removed if necessary.

4) Sample groups are normally distributed.

5) Sphericity.

Assuming that the first four required assumptions for repeated-measures ANOVA have been met, sphericity should now be evaluated.

Sphericity exists when the variances of the differences between data pairs from the same subjects are the same across all possible combinations of sample groups. Remember that all sample groups for a repeated-measures ANOVA test consist of measurements taken from the same set of subjects at different time intervals or in different conditions. Violation of sphericity makes a repeated-measures ANOVA test more likely to produce a false positive (a Type 1 error). If sphericity is shown to be violated, a correction should be applied to both degrees of freedom, which will increase the final p Value of the repeated-measures ANOVA test. Increasing the p Value reduces the power of the test (makes it less likely that the test will detect a difference) in order to compensate for the test’s increased tendency to produce a false positive result due to the data’s violation of sphericity.

Following are the 9 steps of sphericity testing and Epsilon estimation correction:

__Sphericity Testing and Estimated Epsilon Correction for Single-Factor Repeated-Measures ANOVA in Excel - The 9 Steps__

**Step 1) Create the Covariance Matrix From the Sample Groups**

**Step 2) Calculate the Eigenvalues of the Covariance Matrix**

**Step 3) Conduct Mauchly’s Test of Sphericity**

**Step 4) Conduct John, Nagao, and Sugiura’s Test of Sphericity**

**Step 5) Determine Whether Sphericity Exists**

**Step 6) Calculate the Greenhouse-Geisser Correction**

**Step 7) Calculate the Huynh-Feldt Correction**

**Step 8) Determine Which Epsilon Correction To Apply When Sphericity Is Violated**

**Step 9) Apply the Correction to the Degrees of Freedom and Recalculate the p Value**

Here is a detailed description of the performance of each step:

## Step 1) Create the Covariance Matrix From the Sample Groups

Both sphericity tests require the use of Eigenvalues that are calculated from the covariance matrix of the raw data sample matrix. Eigenvalues are the characteristic roots of a set of linear equations that a matrix represents.

The first step in determining the Eigenvalues of the covariance matrix is to create the covariance matrix from the raw data sample matrix. Creating the covariance matrix from the raw data matrix is performed in Excel as follows. The Excel formulas are shown for cells BE29:BE32.


The diagonals of the covariance matrix should agree with the results of the single-factor ANOVA previously performed on the raw data. Variance, σ^{2}, is a special case of covariance, σ_{xy}, when the two variables are the same.

σ(X,X) = σ^{2}(X)
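Step 1 can be sketched in a few lines of Python. The article’s actual error counts live in a screenshot, so the 5 × 4 array below is hypothetical stand-in data; the point is that the diagonal of the resulting covariance matrix equals the sample variance of each weekly group:

```python
import numpy as np

# Hypothetical stand-in for the worksheet data: 5 trainees (rows),
# clerical-error counts over 4 weeks (columns).
errors = np.array([
    [12, 10,  9,  6],
    [15, 12, 10,  8],
    [ 9,  8,  6,  5],
    [14, 11,  9,  7],
    [11, 10,  8,  6],
])

# Covariance matrix of the four weekly groups (columns are the
# variables), using the sample (n-1) denominator.
S = np.cov(errors, rowvar=False)

# cov(X, X) = var(X): the diagonal equals each week's sample variance.
print(np.allclose(np.diag(S), errors.var(axis=0, ddof=1)))  # True
```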


## Step 2) Calculate the Eigenvalues of the Covariance Matrix

### Method 1 – Use an Online Tool

The easiest way to find all Eigenvalues for any symmetric matrix is to use an automated online tool such as the following:

http://www.bluebit.gr/matrix-calculator/

Input the Covariance matrix data as follows:


Hitting the Calculate button will produce the following output:


We now know that the Covariance matrix has the following 4 Eigenvalues:

λ_{4} = 5073.607973

λ_{3} = 211.9174057

λ_{2} = 7.37740705

λ_{1} = 0.087529249

### Method 2 – Calculate the Eigenvalues in Excel

This method requires a little more work and a bit of guessing. The first step is to create an Identity matrix with the same dimensions as the Covariance matrix as follows:


The next step is to create the [A-cI] matrix.

[A – cI] = Covariance Matrix – Covariance Matrix Eigenvalue * Identity Matrix

The Covariance Matrix, A, is contained in cells BZ23:CC26

The Identity Matrix is contained in cells BZ29:CC32

The Covariance Matrix Eigenvalue is contained in cell BZ47. It has been initially set to a value of 10000.

In Excel, to create the [A-cI] matrix in cells BZ39:CC42, perform the following actions:

1) Highlight cells BZ39:CC42

2) While these cells are still highlighted (the active cells), type =BZ23:CC26-BZ47*BZ29:CC32

3) Simultaneously hit CTRL/SHIFT/ENTER and the [A-cI] matrix will be created in cells BZ39:CC42


Once again, the [A-cI] matrix is contained in cells BZ39:CC42.

[A – cI] = Covariance Matrix – Covariance Matrix Eigenvalue * Identity Matrix

The Determinant of the [A – cI] is contained in cell BZ50 and is calculated by the following formula:

MDETERM(BZ39:CC42)

The Eigenvalues that are needed for both sphericity tests are the Eigenvalues of the Covariance matrix.

An Eigenvalue of matrix A is a number that causes the determinant of matrix [A – cI] to be 0. In this case matrix A is the Covariance matrix.

The Excel tool Goal Seek can be used to find the values of the Eigenvalue in cell BZ47 that causes Det [A – cI] to assume a value of 0. The Goal Seek tool is part of the What-If Analysis found under the Data tab in Excel.

Goal Seek is a relatively straightforward tool that calculates the value in a specific cell (the Eigenvalue in cell BZ47) that will cause another cell (Det [A – cI] in cell BZ50) to assume a specified value (0).

A 4X4 matrix should have 4 Eigenvalues. The quickest way to find those 4 Eigenvalues is to first find the highest and lowest values and then use a bit of guesswork to find the middle 2 values.

Find the highest Eigenvalue by setting the value in cell BZ47 to 10000 and running Goal Seek to find the value of BZ47 that will make Det[A – cI] (cell BZ50) equal 0. This Goal Seek operation is shown as follows:


Clicking OK will calculate the number in cell BZ47 that is closest to 10000 that will cause cell BZ50 to assume a value of 0. This value in cell BZ47 is the highest of the 4 Eigenvalues and has the following value:

λ_{4} = 5073.607973

Note that this value agrees with the largest Eigenvalue found by the online tool.


The next step is to find the lowest Eigenvalue. Do this by setting cell BZ47 (the Eigenvalue of the Covariance matrix) to a very low value. In this case cell BZ47 was set to -10000 and Goal Seek was run again.


Clicking OK will calculate the number in cell BZ47 that is closest to -10000 that will cause cell BZ50 to assume a value of 0. This new value in cell BZ47 is the lowest of the 4 Eigenvalues and has the following value:

λ_{1} = 0.087529249

Note that this value agrees with the lowest Eigenvalue found by the online tool.


Finding the middle 2 Eigenvalues involves a bit of guesswork. One of those 2 Eigenvalues can be found by setting the Eigenvalue in cell BZ47 to the halfway point between the lowest Eigenvalue (0.087529249) and the highest Eigenvalue (5073.607973). This midpoint is approximately 2536. Running Goal Seek with cell BZ47 set at 2536 is shown as follows:


This newly calculated Eigenvalue of 211.9174057 shown below is the 2^{nd} highest of the 4 Eigenvalues.


Running Goal Seek a final time with cell BZ47 set at the halfway point between the 2^{nd} highest Eigenvalue (211.9174057) and the lowest Eigenvalue (0.087529249) will produce the 3^{rd} highest Eigenvalue. The halfway point is 105 and that Goal Seek operation is shown as follows:


This final Goal Seek operation produces the 3^{rd} highest Eigenvalue which is the following:

λ_{2} = 7.37740705


We now know that the Covariance matrix has the following 4 Eigenvalues:

λ_{4} = 5073.607973

λ_{3} = 211.9174057

λ_{2} = 7.37740705

λ_{1} = 0.087529249

These Eigenvalues of the Covariance Matrix agree with the Eigenvalues calculated by the online tool.
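The Goal Seek root-hunt above can be cross-checked in Python: `np.linalg.eigvalsh` returns every Eigenvalue of a symmetric matrix at once, and each one drives det[A – cI] to 0, which is exactly the condition Goal Seek searches for one root at a time. A small 2 × 2 example matrix is used here since the article’s 4 × 4 covariance matrix lives in the spreadsheet:

```python
import numpy as np

# Small symmetric example matrix standing in for the covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigvalsh returns all eigenvalues of a symmetric matrix, ascending --
# no guessing at starting values is needed.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # [1. 3.]

# Each eigenvalue c makes det(A - c*I) = 0, the quantity Excel's
# MDETERM computes and Goal Seek drives to zero.
for c in eigenvalues:
    assert abs(np.linalg.det(A - c * np.eye(2))) < 1e-9
```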

## Step 3) Conduct Mauchly’s Test of Sphericity in Excel

Mauchly’s Test of Sphericity is the most widely used hypothesis test to evaluate whether sphericity exists in the raw data. Mauchly’s test is, however, not the most accurate test because of its small-sample tendency toward Type 2 errors (a false negative, i.e., failing to detect a difference when there is one) and its large-sample tendency toward Type 1 errors (a false positive, i.e., detecting a difference when none exists). A more powerful test is John, Nagao, and Sugiura’s Test of Sphericity, which will be discussed in the next step.

Mauchly’s test, which is a hypothesis test, can be summed up as follows:

The Null Hypothesis states that sphericity exist, i.e., the variances of the differences between data pairs from the same subjects are the same across all possible combinations of sample groups. This is another way of stating that the covariances (the off-diagonal elements of the covariance matrix) are equal.

Test Statistic W is calculated from k (the number of sample groups), n (the number of data points in each sample group), and the 4 Eigenvalues that were just calculated.

Critical values of Test Statistic W are available, but W can be quickly transformed into X_{w}^{2}.

The distribution of X_{w}^{2} can be approximated by the Chi-Square distribution with degrees of freedom df = k(k-1)/2.

This hypothesis test’s p Value can then be calculated in Excel as follows:

p Value = CHISQ.DIST.RT(X_{w}^{2},df)

A p Value of less than alpha (usually set at 0.05) indicates that the Null Hypothesis stating that sphericity exists can be rejected.
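The chain above can be sketched in Python. Note that this sketch follows the textbook form of Mauchly’s test, which works on the (k − 1) × (k − 1) covariance matrix of orthonormal contrasts rather than directly on the k × k covariance matrix of the worksheet, and uses the textbook df of k(k−1)/2 − 1, so its intermediate numbers will differ somewhat from the spreadsheet’s:

```python
import numpy as np

def mauchly(data):
    """Textbook Mauchly's test on an n-subjects x k-conditions array."""
    n, k = data.shape
    # Orthonormal contrasts: the last k-1 columns of a complete QR
    # decomposition of the all-ones column are orthogonal to the mean.
    Q, _ = np.linalg.qr(np.ones((k, 1)), mode="complete")
    C = Q[:, 1:].T                       # (k-1) x k contrast matrix
    S = C @ np.cov(data, rowvar=False) @ C.T
    lam = np.linalg.eigvalsh(S)          # eigenvalues of contrast covariance
    p = k - 1
    W = np.prod(lam) / lam.mean() ** p   # Mauchly's W, between 0 and 1
    f = 1 - (2 * p**2 + p + 2) / (6 * p * (n - 1))
    chi2 = -(n - 1) * f * np.log(W)      # approximate chi-square statistic
    df = p * (p + 1) // 2 - 1            # textbook df; Excel p value is
    return W, chi2, df                   # then CHISQ.DIST.RT(chi2, df)

# Synthetic 5-subjects x 4-conditions data just to exercise the function.
rng = np.random.default_rng(0)
W, chi2, df = mauchly(rng.normal(size=(5, 4)))
```

W near 1 indicates sphericity; a small p Value from the chi-square approximation rejects it, just as in the worksheet.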

All of these calculations in Excel are shown as follows:

The first step is to calculate k (the number of sample groups), n (the number of data points in each sample group), and the degrees of freedom, df_{W}. The degrees of freedom for Mauchly’s test is found by this formula:

df_{W} = k(k-1)/2

Calculation of k, n, and df_{W} are shown as follows:


The next step is to calculate f from k and n with this formula:


Test Statistic W is then calculated using k and the 4 Eigenvalues of the Covariance matrix using the following formula:

These calculations in Excel are shown as follows:


There are critical values calculated for Test Statistic W but an easier way to perform this hypothesis test is to convert W into X_{w}^{2} using the following formula:


These calculations are shown in Excel as follows:


The distribution of X_{w}^{2} can be approximated by the Chi-Square distribution with df_{w} degrees of freedom.

A p Value for this hypothesis test can then be found with the following formula:

p Value = CHISQ.DIST.RT(X_{w}^{2},df_{W})

These calculations in Excel are shown as follows:


The small p Value of this Mauchly’s Test of Sphericity indicates a departure from sphericity. It is important to note that not using all of the Eigenvalues significantly weakens the power of the test. When Eigenvalues are calculated manually in Excel instead of with an automated tool, the highest Eigenvalue is occasionally not located.

If the p Value for Mauchly’s Test is calculated using only the 3 lowest Eigenvalues and not the highest one, the p Value is significantly larger. This means that the test has lost some of its power.

## Step 4) Conduct John, Nagao, and Sugiura’s Test of Sphericity in Excel

As noted in Step 3, Mauchly’s Test of Sphericity is the most widely used sphericity test but not the most accurate one, given its small-sample tendency toward Type 2 errors and its large-sample tendency toward Type 1 errors. The more powerful John, Nagao, and Sugiura’s Test of Sphericity is demonstrated here.

John, Nagao, and Sugiura’s test, which is a hypothesis test, can be summed up as follows:

The Null Hypothesis states that sphericity exist, i.e., the variances of the differences between data pairs from the same subjects are the same across all possible combinations of sample groups. This is another way of stating that the covariances (the off-diagonal elements of the covariance matrix) are equal.

Test Statistic V is calculated from the 4 Eigenvalues that were just calculated.

Critical values of Test Statistic V are available, but V can be quickly transformed into X_{V}^{2}.

The distribution of X_{V}^{2} can be approximated by the Chi-Square distribution with degrees of freedom df = k(k-1)/2 - 1.

This hypothesis test’s p Value can then be calculated in Excel as follows:

p Value = CHISQ.DIST.RT(X_{V}^{2},df)

A p Value of less than alpha (usually set at 0.05) indicates that the Null Hypothesis stating that sphericity exists can be rejected.

All of these calculations in Excel are shown as follows:

The first step is to calculate k (the number of sample groups), n (the number of data points in each sample group), and the degrees of freedom, df_{V}. The degrees of freedom for John, Nagao, and Sugiura’s test is found by this formula:

df_{V} = k(k-1)/2 - 1

These calculations are performed in Excel as follows:


Test Statistic V is then calculated from the 4 Eigenvalues with the following formula:

These calculations are performed in Excel as follows:


There are critical values calculated for Test Statistic V but an easier way to perform this hypothesis test is to convert V into X_{V}^{2} using the following formula:

These calculations are shown in Excel as follows:


The distribution of X_{V}^{2} can be approximated by the Chi-Square distribution with df_{V} degrees of freedom.

A p Value for this hypothesis test can then be found with the following formula:

p Value = CHISQ.DIST.RT(X_{V}^{2},df_{V})

These calculations in Excel are shown as follows:


The small p Value of this John, Nagao, and Sugiura’s Test of Sphericity indicates a departure from sphericity. Usually this test is more powerful than Mauchly’s Test, but it was not in this case, as evidenced by Mauchly’s smaller p Value.

## Step 5) Determine Whether Sphericity Exists

Sphericity exists if the Null Hypothesis for both Mauchly's test of Sphericity and John, Nagao, and Sugiura's Test of Sphericity cannot be rejected.

The Null Hypotheses of both sphericity tests state that the covariances (the off-diagonal elements of the covariance matrix) are equal. If either test's p Value is less than alpha (usually set at 0.05), then the test's Null Hypothesis that sphericity exists can be rejected.

If this Null Hypothesis is rejected, an adjustment should be made that reduces df_{between} and df_{error}. The reduction in degrees of freedom will increase the final p Value for the repeated-measures ANOVA, thereby reducing the overall power of the test.

p Value_{Mauchly} = 0.00167

p Value_{John, Nagao, and Sugiura} = 0.00536

Both p Values are less than alpha (0.05).

The Null Hypothesis that sphericity exists is rejected.

The degrees of freedom for the error term and for between groups must now be adjusted by one of two methods. The next step is to determine which of the two adjustments to apply to the degrees of freedom used to calculate the final p Value for the repeated-measures ANOVA.

If sphericity cannot be rejected, the original p Value calculated for the repeated-measures ANOVA is the final answer. If sphericity is violated, the p value calculated for the repeated-measures ANOVA test will need to be increased (making the test less powerful in order to account for the violation of sphericity) by applying a correction to both degrees of freedom used to calculate that p Value. The two variations of this correction are discussed in the following steps.

## Step 6) Calculate the Greenhouse-Geisser Correction in Excel

If the required assumption of sphericity has been violated, the repeated-measures ANOVA test becomes too liberal, i.e., develops a tendency to produce Type 1 errors (false positives – detecting a difference when none exists). To counteract this tendency an adjustment should be made which will increase the ANOVA test’s final p Value by an amount that corresponds to the degree to which sphericity has been violated. Increasing the p Value makes the test less powerful and therefore less likely to register a false positive.

When sphericity is found to be violated, there are two corrections that are commonly applied that will increase the test’s p Value, thereby reducing the likelihood of the test registering a false positive (a Type 1 error). The two corrections are the Greenhouse-Geisser correction and the Huynh-Feldt correction. The degree to which sphericity is violated and also the relative cost of a Type 1 versus a Type 2 error determine which of the two corrections should be applied.

### Epsilon, Є, Equals the Index of Sphericity Between Populations

The index of sphericity between population data is called Epsilon, Є. Epsilon is a number between 0 and 1. A value of 1 indicates perfect sphericity between populations. Epsilon can only be estimated because only samples of the populations are available for analysis. The two methods of estimating the population Epsilon are Є_{GG} (the Greenhouse-Geisser estimate of Epsilon) and Є_{HF} (the Huynh-Feldt estimate of Epsilon). One of these estimates of Epsilon is designated as the correction that the degrees of freedom (df_{between} and df_{error}) are each multiplied by when calculating the revised (increased) p Value for the repeated-measures ANOVA test to account for the test’s increased tendency to register a false positive result due to violation of data sphericity.

As mentioned, Epsilon is a number between 0 and 1. The smaller the estimate of Epsilon, the greater the revised p Value will be when the correction is applied by multiplying the degrees of freedom by Epsilon. The repeated-measures ANOVA test’s p Value is calculated in Excel as follows:

p Value = F.DIST.RT(F Value, df_{between}, df_{error})

The correction does not have any effect on the F value. Decreasing both degrees of freedom will increase the p Value, which makes the test less likely to register a false positive (commit a Type 1 error).

Є_{GG}, the Greenhouse-Geisser estimate of Epsilon, is viewed as being slightly too low and therefore too conservative. The lower the estimate of Epsilon is, the greater will be the test’s p Value. The greater the test’s p Value, the more conservative the test becomes. Є_{GG} is often used when Epsilon is estimated to be less than 0.75. The overall estimate of Epsilon is the average between Є_{GG} and Є_{HF}.

Є_{HF}, the Huynh-Feldt estimate of Epsilon, is viewed as being slightly too high and overestimates Epsilon. Є_{HF} is used when Epsilon is estimated to be more than 0.75.

Є_{GG}, the Greenhouse-Geisser estimate of Epsilon, is calculated using the following formula:

Є_{GG} = [k * (Mean_{Cov_Matrix_Diag} - Mean_{Cov_Matrix})]^{2} / [(k - 1) * (SS_{Cov_Matrix} - 2k * SS_{Cov_Matrix_Row_Means} + k^{2} * Mean_{Cov_Matrix}^{2})]

k = Number of groups

Mean_{Cov_Matrix_Diag} = Mean of the Diagonal of the Covariance Matrix = Mean of the Raw Data Group Variances

Mean_{Cov_Matrix} = Mean of All Elements of the Covariance Matrix

SS_{Cov_Matrix} = Sum of Squares of All Elements of the Covariance Matrix

SS_{Cov_Matrix_Row_Means} = Sum of Squares of the Means of the Covariance Matrix Rows

These calculations are shown in Excel as follows:


Є_{GG}, the Greenhouse-Geisser estimate of Epsilon, is calculated to be 0.415
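The Greenhouse-Geisser calculation can be sketched as a short Python function operating on the covariance matrix. The compound-symmetric matrix used to exercise it below is made up for the demonstration; since such a matrix satisfies sphericity exactly, the estimate should come out at the upper bound of 1:

```python
import numpy as np

def epsilon_gg(S):
    """Greenhouse-Geisser estimate of Epsilon from a k x k covariance matrix."""
    k = S.shape[0]
    mean_diag = np.trace(S) / k                  # mean of group variances
    grand_mean = S.mean()                        # mean of all elements
    ss_matrix = (S ** 2).sum()                   # sum of squares of elements
    ss_row_means = (S.mean(axis=1) ** 2).sum()   # sum of squared row means
    num = (k * (mean_diag - grand_mean)) ** 2
    den = (k - 1) * (ss_matrix - 2 * k * ss_row_means
                     + k ** 2 * grand_mean ** 2)
    return num / den

# A compound-symmetric (equal variances, equal covariances) matrix is
# perfectly spherical, so Epsilon should equal 1.
S = 3.0 * np.eye(4) + 2.0 * np.ones((4, 4))
print(round(epsilon_gg(S), 6))  # 1.0
```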

## Step 7) Calculate the Huynh-Feldt Correction in Excel

Є_{HF}, the Huynh-Feldt estimate of Epsilon, is calculated from Є_{GG} using the following formula:

Є_{HF} = [n * (k - 1) * Є_{GG} - 2] / [(k - 1) * (n - 1 - (k - 1) * Є_{GG})]

These calculations are shown in Excel as follows:


Є_{HF}, the Huynh-Feldt estimate of Epsilon, is calculated to be 0.51302.
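The Huynh-Feldt correction is a simple function of Є_GG, n, and k. Plugging in this example’s values (5 subjects, 4 groups, and the Greenhouse-Geisser estimate from Step 6) reproduces the worksheet’s value; the formula used here is the standard Huynh-Feldt adjustment:

```python
def epsilon_hf(e_gg, n, k):
    """Huynh-Feldt estimate of Epsilon from the Greenhouse-Geisser estimate,
    the number of subjects n, and the number of groups k."""
    return (n * (k - 1) * e_gg - 2) / ((k - 1) * (n - 1 - (k - 1) * e_gg))

# This example: n = 5 trainees, k = 4 weekly groups, Є_GG = 0.415770093.
print(round(epsilon_hf(0.415770093, n=5, k=4), 5))  # 0.51302
```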

## Step 8) Determine Which Epsilon Correction To Apply When Sphericity Is Violated

The relationship between the repeated-measures ANOVA test’s power, the p Value, Є_{HF}, and Є_{GG} is shown in the following diagram:


The lower bound of the estimates of Epsilon is 1/(k-1). Note that the lower the estimate of Epsilon is, the lower the test's power becomes and the greater the p Value becomes. The higher the test’s p Value, the less likely it is that the test will detect a difference.

__Determining Which Correction To Apply__

There are two rules that can be applied to determine which of the two corrections to apply when sphericity is found to be violated.

**Rule 1) **

If Type 1 Error (False Positive) is more expensive, use the conservative (lower power) adjustment. This is the Greenhouse-Geisser adjustment, Є_{GG}.

If Type 2 Error (False Negative) is more expensive, use the adjustment that increases test power. This is the Huynh-Feldt adjustment, Є_{HF}.

**Rule 2) **

If the Estimated Epsilon (the average of Є_{GG} and Є_{HF}) is less than 0.75, use Є_{GG}. If the Estimated Epsilon is greater than 0.75, use Є_{HF}. If the Estimated Epsilon is very low and the sample size is relatively large (n equals at least k + 10), MANOVA can be used in place of repeated-measures ANOVA because MANOVA does not require sphericity.
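Rule 2 is easy to encode. This sketch averages the two estimates the way the article does and applies the 0.75 cutoff (the function name is illustrative):

```python
def choose_correction(e_gg, e_hf):
    """Pick an Epsilon correction per Rule 2: average the two estimates
    and compare the result against 0.75."""
    estimated_epsilon = (e_gg + e_hf) / 2
    return "Greenhouse-Geisser" if estimated_epsilon < 0.75 else "Huynh-Feldt"

# This example's estimates average to about 0.46, below 0.75,
# so the Greenhouse-Geisser correction is chosen.
print(choose_correction(0.415770093, 0.51302))  # Greenhouse-Geisser
```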

In this case, the Estimated Epsilon is approximately 0.46 and there is no mention of the cost difference between Type 1 and Type 2 errors. The Greenhouse-Geisser correction, Є_{GG}, will therefore be used to adjust the degrees of freedom used to calculate the final p Value for this Repeated-Measure ANOVA.

## Step 9) Apply the Correction to the Degrees of Freedom and Recalculate the p Value in Excel

Below is the repeated-measures ANOVA table before the correction is applied:


The correction to be applied is the following:

The p Value will be calculated by multiplying df_{between} and df_{error} by the Greenhouse-Geisser correction, Є_{GG}. Note that this only affects the final p Value but not the F Value.

Є_{GG} = 0.415770093

Below is the repeated-measures ANOVA table after the Greenhouse-Geisser Correction is applied to df_{between} and df_{error}.


The correction used to compensate for the lack of sphericity reduced the power of the test to the point that a difference was not detected, if alpha is set to 0.05.
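The corrected p Value can be sketched with SciPy’s F distribution; `scipy.stats.f.sf` plays the role of Excel’s F.DIST.RT and, like Excel, accepts fractional degrees of freedom. The F value below is illustrative (the real one is in the ANOVA table screenshot), but the degrees of freedom match this example’s k = 4 groups and n = 5 subjects:

```python
from scipy.stats import f

def corrected_p(F_value, df_between, df_error, epsilon):
    """Recompute the repeated-measures ANOVA p value after multiplying
    both degrees of freedom by the Epsilon correction. The F value
    itself is unchanged by the correction."""
    return f.sf(F_value, epsilon * df_between, epsilon * df_error)

# k = 4 groups, n = 5 subjects: df_between = 3, df_error = 12.
F_value = 5.0                       # illustrative F; the real F is in the table
p_before = f.sf(F_value, 3, 12)     # uncorrected p value
p_after = corrected_p(F_value, 3, 12, 0.415770093)
print(p_after > p_before)  # True: here the correction raises the p value
```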

