Saturday, May 31, 2014

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

This is one of the following six articles on 1-Sample t-Tests in Excel

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013

1-Sample t-Test Power With G*Power Utility

Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

 

Evaluating the Normality of the Sample Data For the One-Sample t-Test in Excel

The following five normality tests will be performed in Excel on the sample data here:

An Excel histogram of the sample data will be created.

A normal probability plot of the sample data will be created in Excel.

The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.

The Anderson-Darling test for normality of the sample data will be performed in Excel.

The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

 

Histogram in Excel

The quickest way to check the sample data for normality is to create an Excel histogram of the data as shown below, or to create a normal probability plot of the data if you have access to an automated method of generating that kind of a graph.

[Image: Excel histogram of the sample data]

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

[Image: Excel Histogram dialogue box settings]

The sample group appears to be distributed reasonably closely to the bell-shaped normal distribution. Note that bin size in an Excel histogram is set manually by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram's output. Different bin sizes could result in an output that would not appear bell-shaped at all. What the user actually sets in an Excel histogram is the upper boundary of each bin.
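To see how much the choice of upper bin boundaries matters, the counting rule can be sketched in Python. This is a minimal sketch rather than the Excel tool itself: the function name `excel_style_histogram` and the sample values are hypothetical, and it assumes a value is counted in the first bin whose upper boundary is greater than or equal to it, with a final overflow bin for anything larger.

```python
from bisect import bisect_left


def excel_style_histogram(data, upper_bounds):
    """Count values per bin, where each bin is defined by its upper boundary.

    A value x is counted in the first bin whose upper boundary is >= x.
    Values above the last boundary go into a final overflow bin.
    """
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # last slot is the overflow bin
    for x in data:
        counts[bisect_left(bounds, x)] += 1
    return counts


# Hypothetical sample values, for illustration only.
sample = [12, 15, 17, 18, 19, 20, 20, 21, 22, 23, 24, 25, 27, 30]

# The same data counted with two different sets of upper boundaries.
wide_bins = excel_style_histogram(sample, [15, 20, 25, 30])
narrow_bins = excel_style_histogram(sample, [14, 16, 18, 20, 22, 24, 26, 28, 30])
print(wide_bins)
print(narrow_bins)
```

Running the same data through both sets of boundaries gives two differently shaped sets of counts, which is the effect described above.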

 

Normal Probability Plot in Excel

Another way to graphically evaluate the normality of the data sample is to create a normal probability plot of the sample group. This can be implemented in Excel and appears as follows:

[Image: normal probability plot of the sample data]

The normal probability plot for the sample group shows that the data appear to be very close to being normally distributed. The actual sample data (red) match very closely the values the data would have if the sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval boundaries (green).
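The points behind such a plot can be sketched in Python. This is only an illustration under stated assumptions: the function name `normal_probability_points` is hypothetical, the plotting position (i - 0.5) / n is one common convention and may differ from the one used for the chart above, and the confidence interval boundaries are not computed.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(data):
    """Return (expected_value, actual_value) pairs for a normal probability plot.

    expected_value is where the i-th smallest point would fall if the sample
    followed a normal distribution with the sample's own mean and standard
    deviation.
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # assumed plotting position, always between 0 and 1
        points.append((fitted.inv_cdf(p), x))
    return points


# Hypothetical sample values, for illustration only.
points = normal_probability_points([1, 2, 3, 4, 5])
for expected, actual in points:
    print(round(expected, 3), actual)
```

If the data are close to normal, the actual values track the expected values closely, so the pairs fall near a straight line.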

 

Kolmogorov-Smirnov Test For Normality in Excel

The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test states that the distribution of actual data points matches the distribution that is being tested. In this case the data sample is being compared to the normal distribution.

The largest distance between the CDF of any data point and its expected CDF is compared to the Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

0.1500 = Max Difference Between Actual and Expected CDF

20 = n = Number of Data Points

0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Max Difference Between the Actual and Expected CDF (0.1500) is less than the Kolmogorov-Smirnov Critical Value for n = 20 and α = 0.05, so the Null Hypothesis cannot be rejected.

The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data are normally distributed, is rejected if the maximum difference between the expected and actual CDF of any of the data points exceeds the Critical Value for the given n and α.
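The maximum-difference calculation can be sketched in Python as a cross-check on the worksheet. This is a minimal sketch under stated assumptions: the function names and sample values are hypothetical, the expected CDF uses the sample mean and the sample (n - 1) standard deviation as in the NORM.DIST formula above, and the actual CDF is checked on both sides of each step, which may differ slightly from the worksheet layout. The critical value of 0.294 in the last line is the figure commonly tabulated for n = 20 and α = 0.05; treat it as an assumption and check it against your own table.

```python
from statistics import NormalDist, mean, stdev


def ks_max_difference(data):
    """Largest gap between the actual (empirical) CDF and the expected normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d_max = 0.0
    for k, x in enumerate(xs, start=1):
        expected = fitted.cdf(x)  # same role as NORM.DIST(x, mean, sd, TRUE)
        # The empirical CDF steps from (k - 1) / n up to k / n at x.
        d_max = max(d_max, k / n - expected, expected - (k - 1) / n)
    return d_max


def ks_decision(d_max, critical_value):
    """Reject normality only if the max difference exceeds the critical value."""
    return "reject" if d_max > critical_value else "cannot reject"


# Hypothetical sample values, for illustration only.
sample = [12, 15, 17, 18, 19, 20, 20, 21, 22, 23, 24, 25, 27, 30]
d = ks_max_difference(sample)
print(round(d, 4))

# Decision for the article's figures, using an assumed critical value.
print(ks_decision(0.1500, 0.294))
```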

 

Anderson-Darling Test For Normality in Excel

The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Anderson-Darling Test calculates a Test Statistic based upon the actual value of each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were perfectly normally distributed.

The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test for the following two reasons:

The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.

The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberration in outer values than the A-D test.

If the Test Statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the Test Statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

Adjusted Test Statistic A* = 0.407

Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally distributed, if any of the following are true:

A* > 0.576 When Level of Significance (α) = 0.15

A* > 0.656 When Level of Significance (α) = 0.10

A* > 0.787 When Level of Significance (α) = 0.05

A* > 1.092 When Level of Significance (α) = 0.01

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given α.

The Adjusted Test Statistic (A*) for the sample group (0.407) is less than the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test for the sample group cannot be rejected.
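The adjusted statistic can be sketched in Python. This is an illustration, not the worksheet's exact method: the function name and sample values are hypothetical, and it assumes the commonly used small-sample adjustment A* = A² (1 + 0.75/n + 2.25/n²) with the mean and standard deviation estimated from the sample. The critical values are the ones listed above.

```python
from math import log
from statistics import NormalDist, mean, stdev

# Critical values for A* as listed above, keyed by significance level.
AD_CRITICAL = {0.15: 0.576, 0.10: 0.656, 0.05: 0.787, 0.01: 1.092}


def anderson_darling_adjusted(data):
    """Adjusted Anderson-Darling statistic A* for a normality test."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    f = [fitted.cdf(x) for x in xs]  # expected CDF of each sorted point
    # Pair the i-th smallest point with the i-th largest point.
    total = sum(
        (2 * i - 1) * (log(f[i - 1]) + log(1 - f[n - i]))
        for i in range(1, n + 1)
    )
    a_squared = -n - total / n
    # Assumed small-sample adjustment for estimated mean and standard deviation.
    return a_squared * (1 + 0.75 / n + 2.25 / n ** 2)


# Hypothetical sample values, for illustration only.
sample = [12, 15, 17, 18, 19, 20, 20, 21, 22, 23, 24, 25, 27, 30]
a_star = anderson_darling_adjusted(sample)
print(round(a_star, 3))

# Decision for the article's A* of 0.407 at alpha = 0.05.
reject_article = 0.407 > AD_CRITICAL[0.05]
print("reject" if reject_article else "cannot reject")
```

Because the sum weights the logarithms of the CDF values at both ends of the sorted sample, points in the outer tails contribute more here than they do in the Kolmogorov-Smirnov maximum difference.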

 

Shapiro-Wilk Test For Normality in Excel

The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. A Test Statistic W is calculated. If this Test Statistic is less than a critical value of W for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is normally distributed is rejected.

The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that, when the data are in fact not normally distributed, it correctly rejects the Null Hypothesis that the data are normally distributed a slightly higher percentage of times than most other normality tests, particularly at small sample sizes.

The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-Smirnov normality test.

Sample Data

[Image: sample data]

0.967452 = Test Statistic W

0.905 = W Critical for the following n and Alpha

20 = n = Number of Data Points

0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Shapiro-Wilk Test Statistic W (0.967452) is larger than W Critical 0.905. The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state that the data are not normally distributed with a confidence level of 95 percent.
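The structure of the W calculation can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the Shapiro-Wilk coefficients must be supplied from a published table for the given n, and the n = 5 coefficients and sample values below are used only to keep the illustration short (verify them against your own table). The decision lines reuse the W and W Critical figures reported above.

```python
def shapiro_wilk_w(data, coefficients):
    """Shapiro-Wilk W from tabulated coefficients (one per pair of points)."""
    xs = sorted(data)
    n = len(xs)
    if len(coefficients) != n // 2:
        raise ValueError("expected n // 2 coefficients for a sample of size n")
    m = sum(xs) / n
    sum_of_squares = sum((x - m) ** 2 for x in xs)
    # Each coefficient weights the gap between the i-th largest and
    # the i-th smallest value.
    b = sum(a * (xs[n - 1 - i] - xs[i]) for i, a in enumerate(coefficients))
    return b * b / sum_of_squares


def shapiro_wilk_decision(w, w_critical):
    """Reject normality only if W falls below the critical value."""
    return "reject" if w < w_critical else "cannot reject"


# Assumed tabulated coefficients for n = 5; hypothetical sample values.
A_N5 = (0.6646, 0.2413)
w = shapiro_wilk_w([2.1, 3.4, 3.9, 4.6, 6.0], A_N5)
print(round(w, 4))

# Decision for the article's figures (n = 20, alpha = 0.05).
print(shapiro_wilk_decision(0.967452, 0.905))
```

Note that the direction of the comparison is the reverse of the Kolmogorov-Smirnov and Anderson-Darling tests: a small W, not a large one, is evidence against normality.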

 

Correctable Reasons That Normal Data Can Appear Non-Normal

If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation of whether any of the following factors have caused normally-distributed data to appear to be non-normally-distributed:

 

1) Outliers

– Too many outliers can easily skew normally-distributed data. An outlier can often be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.

 

2) Data Has Been Affected by More Than One Process

– Variations to a process such as shift changes or operator changes can change the distribution of data. Multiple modal values in the data are common indicators that this might be occurring. The effects of different inputs must be identified and eliminated from the data.

 

3) Not Enough Data

– Normally-distributed data will often not assume the appearance of normality until at least 25 data points have been sampled.

 

4) Measuring Devices Have Poor Resolution

– Sometimes (but not always) this problem can be solved by using a larger sample size.

 

5) Data Approaching Zero or a Natural Limit

– If a large number of data values approach a limit such as zero, calculations using very small values might skew computations of important values such as the mean. A simple solution might be to raise all the values by a certain amount.

 

6) Only a Subset of a Process’ Output Is Being Analyzed

– If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.

 

When Data Are Not Normally Distributed

The Sign Test and the Wilcoxon One-Sample Signed-Rank Test are nonparametric alternatives to the one-sample t-test when the normality assumption of the sampled data is questionable. The one-sample t-test is used to evaluate whether a population from which samples are drawn has the same mean as a known value. The nonparametric tests evaluate whether the population from which the sample is drawn has the same median as a known value.

The Sign Test is a much less powerful alternative to the Wilcoxon One-Sample Signed-Rank Test. Unlike the Wilcoxon test, however, it does not assume that the differences between the samples and the known value are symmetrical about a median when used as a nonparametric alternative to the one-sample t-test. The Sign Test is non-directional and can be substituted only for a two-tailed test but not for a one-tailed test.

The parametric one-sample, two-tailed t-Test performed in this section detected a difference at alpha = 0.05. The Wilcoxon One-Sample Signed-Rank Test also detected a difference at alpha = 0.05. The Sign Test was not able to detect a difference at alpha = 0.25.

Both the Wilcoxon One-Sample Signed-Rank Test and the Sign Test will be performed on the data in this example in blog articles shortly following this one.
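As a preview of the simpler of the two alternatives, a two-tailed Sign Test can be sketched in Python. This is only an illustration under stated assumptions: the function name and data are hypothetical, values equal to the hypothesized median are dropped, and the p-value comes from the exact binomial distribution with probability 0.5, which may differ from the step-by-step method used in the later articles.

```python
from math import comb


def sign_test_two_tailed(data, hypothesized_median):
    """Two-tailed Sign Test p-value based on the exact binomial distribution."""
    plus = sum(1 for x in data if x > hypothesized_median)
    minus = sum(1 for x in data if x < hypothesized_median)
    n = plus + minus  # values equal to the hypothesized median are dropped
    k = min(plus, minus)
    # Probability of k or fewer of the rarer sign when each sign has chance 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: every value lies above the hypothesized median of 0.
p_one_sided_data = sign_test_two_tailed([1, 2, 3, 4, 5, 6, 7, 8], 0)
# Hypothetical data: values split evenly around the hypothesized median.
p_balanced_data = sign_test_two_tailed([1, 2, 3, 4], 2.5)
print(p_one_sided_data, p_balanced_data)
```

Only the sign of each difference is used, not its size, which is why the Sign Test needs no symmetry assumption and also why it has less power than the Wilcoxon test.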

 
