Monday, May 4, 2015

How To Create a Completely Automated Shapiro-Wilk Normality Test in Excel in 8 Steps

How To Create a

Completely Automated

Shapiro-Wilk Normality

Test in Excel in 8 Steps

The purpose of this blog article is to show how to create a completely automated Shapiro-Wilk test for normality in Excel. The test is completely automated because the user has only to enter the raw, unsorted data sample to be tested for normality along with the alpha level desired for the test. Upon insertion of the sample data and alpha, the entire test will be automatically run and the test’s output will be immediately returned.

As with any hypothesis test, the output states whether to reject or not reject the Null Hypothesis and the specified alpha level. In this case, the Null Hypothesis states that the data sample came from a normally distributed population. This Null Hypothesis is rejected only if Test Statistic W is less than the critical value of W for a given alpha level and sample size.

 

Overview of the Shapiro-Wilk Test For Normality

The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample comes from a normally distributed population is rejected. Keep in mind that passing a hypothesis test for normality only allows one to state that no significant departure from normality was found.

The Shapiro-Wilk Test is a robust normality test and is widely-used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis that the data are not normally distributed a slightly higher percentage of times than most other normality tests, particularly at small sample sizes.

The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-Smirnov normality test. The Shapiro-Wilk normality test is affected by tied data values but less so than the Anderson-Darling normality test.

An abbreviated description of the steps of the Shapiro-Wilk normality test is as follows:

  • Sort the raw data in an ascending sort

  • Arrange the sorted data into pairs of successive highest and lowest data values

  • Calculate the difference between the high and low value of each pair

  • Multiply each difference by an “a Value” from a lookup table

  • Calculate test statistic W

  • Compare test statistic W with a W Critical Value obtained from a lookup table

  • Reject the Null Hypothesis stating normality only if W is smaller than W Critical

The complete Shapiro-Wilk test of normality appears in Excel as follows:

Shapiro-Wilk Normality Test in Excel - Complete Shapiro-Wilk Test in Excel
(Click On Image To See a Larger Version)

 

Step 1 – Sort Data in an Ascending Sort

 

Enter the Raw Data and Alpha Level

First, paste in the raw, unsorted data and set the alpha level in the yellow cells as follows:

Shapiro-Wilk Normality Test in Excel - Inserting Raw Data and Alpha
(Click On Image To See a Larger Version)

This Shapiro-Wilk normality test in Excel has the capability to handle up to 20 data points. This data sample contains only 15 data points but the Excel worksheet below indicates the capability for an additional 5 data points in the empty yellow cells. Expanding the capability of this Excel test to handle a larger number of data points would be a straightforward matter of making adjustments to the Excel formulas.

The alpha level should be set at 0.01, 0.05, or 0.10 because critical values of W are provided here only for those common alpha levels. If alpha is set at a different level, the following warning appears as a result of the If-Then-Else statement seen in the previous image:

Shapiro-Wilk Normality Test in Excel - Incorrect Alpha
(Click On Image To See a Larger Version)

 

Sort the Raw Data Using Formulas, Not the Excel Sorting Tool

The most efficient way to sort data is with the use of formulas as shown. The Excel sorting tool must be manually re-run whenever the raw data is changed in any way. The following formula will automatically resort the data if any data are changed or added. The following formulas produce an ascending sort. To create a descending sort, simply swap the word LARGE for SMALL within the formula.

Shapiro-Wilk Normality Test in Excel - Sorting Data With Formulas
(Click On Image To See a Larger Version)

 

Step 2 – Obtain “a Values” From the “a Value Lookup Table”

 

The Table of a Values

The a Values are constants that are calculated from the means, variances, and covariances of the order statistics of a sample of size n from a normal distribution. The a Value table is shown as follows for data samples varying in size from n = 2 up to n = 50.

Shapiro-Wilk Normality Test in Excel - a Table
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - a Table0
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - a Table
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - a Table
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - a Table
(Click On Image To See a Larger Version)

 

Create Labels for the a Values

The first step in obtaining the correct a Values is to create the proper number of labels for the a Values.

The number of a Values needed depends on the sample size n. If n is an even number, the number of a Values equals n / 2. If n is an odd number, the number of a Values equals (n-1) / 2. This can be implemented with the following formulas:

Shapiro-Wilk Normality Test in Excel - Numbered As
(Click On Image To See a Larger Version)

 

Lookup a Values on the a Value Lookup Table

The a Values can now be looked up on the a Value table previously shown. An Index formula is a straightforward way implementing this. The Excel Index function is its simplest form has the following format:

Index(array, row number, column number) and returns the data value located in the specified row and column of the given array. The row and column number are relative to the cell in the upper-left corner of the array. Note that the array containing the a Values starts in cell H58 at the upper left corner and finishes in cell BE82 in the lower right corner.

Shapiro-Wilk Normality Test in Excel - A Values
(Click On Image To See a Larger Version)

 

Step 3 – Pair Up Successive Highest-Lowest Data Values

 

Create X Labels for Data Sample Values

Each sorted data sample value will have an X Label. This is implemented with the following formulas:

Shapiro-Wilk Normality Test in Excel - Numbered Xs
(Click On Image To See a Larger Version)

The following is a close-up of the formulas to create the labels for the X Values:

Shapiro-Wilk Normality Test in Excel - Closeup Numbered Xs
(Click On Image To See a Larger Version)

The X Values are then obtained using the following formulas:

Shapiro-Wilk Normality Test in Excel - X Values
(Click On Image To See a Larger Version)

 

Create the Upper Value of Each Data Pair

The labels for the upper X Values of each data pair are created as follows:

Shapiro-Wilk Normality Test in Excel - Upper Xs The following is a close-up of the formulas to create the labels for the upper X Values:

Shapiro-Wilk Normality Test in Excel - Closeup Upper Xs
(Click On Image To See a Larger Version)

The upper X Values of each data pair are the obtained using the following set of formulas:

Shapiro-Wilk Normality Test in Excel - Upper X Values
(Click On Image To See a Larger Version)

Here is a close-up of those formulas:

Shapiro-Wilk Normality Test in Excel - Closeup Upper X Values
(Click On Image To See a Larger Version)

 

Create the Lower Value of Each Data Pair

The labels for the lower X Values of each data pair are created as follows:

Shapiro-Wilk Normality Test in Excel - Lower Xs
(Click On Image To See a Larger Version)

The following is a close-up of the formulas to create the labels for the lower X Values:

Shapiro-Wilk Normality Test in Excel - Closeup Lower Xs
(Click On Image To See a Larger Version)

The lower X Values of each data pair are the obtained using the following set of formulas:

Shapiro-Wilk Normality Test in Excel - Lower X Values
(Click On Image To See a Larger Version)

Here is a close-up of those formulas:

Shapiro-Wilk Normality Test in Excel - Closeup Lower X Values
(Click On Image To See a Larger Version)

 

Step 4 – Calculate the Difference Within Each Pair

Shapiro-Wilk Normality Test in Excel - Pair Differences
(Click On Image To See a Larger Version)

Here is a close-up of the difference formulas:

Shapiro-Wilk Normality Test in Excel - Closeup Pair Differences
(Click On Image To See a Larger Version)

 

Step 5 – Calculate a * Difference

Shapiro-Wilk Normality Test in Excel - A Times Differences
(Click On Image To See a Larger Version)

Here is a close-up of those formulas:

Shapiro-Wilk Normality Test in Excel - Closeup A Times Differences
(Click On Image To See a Larger Version)

 

Step 6 – Calculate b, SS, and Test Statistic W

b equals the sums of the product of a * pair difference.

SS is the sum of the squared deviations of x from the mean x. Excel formula DEVSQ() performs this.

Test Statistic W equals b2/SS.

Shapiro-Wilk Normality Test in Excel - b, SS, Test Statistic W
(Click On Image To See a Larger Version)

Here is a close-up of these formulas:

Shapiro-Wilk Normality Test in Excel - Closeup b, SS, Test Statistic W
(Click On Image To See a Larger Version)

 

Step 7 – Lookup W Critical Values

Each unique combination of sample size and alpha level has its own critical W value. The following is a table of the respective critical W value for each combination of n and alpha up to a sample size of 50 and 3 common levels of alpha.

Shapiro-Wilk Normality Test in Excel - Critical W Table
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - Critical W Table
(Click On Image To See a Larger Version)

Shapiro-Wilk Normality Test in Excel - Loookup Critical W
(Click On Image To See a Larger Version)

Here is a close-up of the formulas. The If-Then-Else statement that looks up the critical W value is the following:

=IF(OR(B1=0.01,B1=0.05,B1=0.1),INDEX(BL18:BN67,U14,IF(U12=0.01,1,IF(U12=0.05,2,IF(U12=0.1,3)))),”alpha must be set to 0.01, 0.05, or 0.10”)

The INDEX() function looks up the critical W values in the array starting in cell BL18 in the upper left corner and ending in cell BN67 in the lower right corner.

Shapiro-Wilk Normality Test in Excel - Closeup Loookup Critical W
(Click On Image To See a Larger Version)

 

Step 8 – Determine Whether or Not To Reject Null Hypothesis By Comparing W to W Critical

The Null Hypothesis is rejected only if Test Statistic W is smaller the critical W value for the given sample size and alpha level. The Null Hypothesis states that the sample came from a normally distributed population. The Null Hypothesis is never accepted but only rejected or not rejected. Not rejecting the Null Hypothesis does not automatically imply that the population from which the sample was taken is normally distributed. This merely indicates that there is not even evidence to state that the population is likely not normally distributed for the given alpha level. As with all hypothesis tests, alpha = 1 - level of certainty. If a 95-percent level of certainty is required to reject the Null Hypothesis, the alpha level will be 0.05.

Shapiro-Wilk Normality Test in Excel - Comparison W and Critical W
(Click On Image To See a Larger Version)

Here is a close-up of the formulas to compare W with W Critical:

Shapiro-Wilk Normality Test in Excel - Closeup Comparison W and Critical W
(Click On Image To See a Larger Version)

The complete Shapiro-Wilk test of normality appears in Excel as follows:

Shapiro-Wilk Normality Test in Excel - Complete Shaprio-Wilk Normality Test in Excel
(Click On Image To See a Larger Version)

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

5 comments:

  1. Excell is must needed working sheet in many official purposes.There are many ways to learn how the people can improve the lesson of the excell sheet.The http://www.procustomwriting.com/custom-term-papers will help the people to get an idea about how they can improve the quality of the learning the Sahpiro on excell sheet.

    ReplyDelete
  2. Not really much of a formula user in excel because I only know the basics. But with these tips on how to completely automate sheet if will be more easy for me to generate reports, not like the tips posted via Custom Papers review which are not helpful and legit after all.

    ReplyDelete
  3. I always enjoy your blog, thank you! I want to tell about an excellent way to write something. My friend advised me the resource https://www.thewritingkid.com/ last year. Since that time, I always use it when I need to write something which I don't want to write. I buy only interesting and well-written articles and abstracts. Such saving of time! I hope the information will be useful for you too.

    ReplyDelete
  4. I'd like an explanation of the "a Value" constants. How are they calculated from the means, variances, and covariances of the order statistics of a sample of size n from a normal distribution? What does this mean?

    ReplyDelete