Saturday, May 31, 2014

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

This is one of the following six articles on 1-Sample t-Tests in Excel

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013

1-Sample t-Test Power With G*Power Utility

Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

 

One-Sample t-Test in 4

Steps in Excel

This hypothesis test determines whether the mean of the population from which the sample was taken is equal to (two-tailed test) or else greater than or less than (one-tailed test) than a constant. This constant is often the known mean of a population from which the sample may have come from. The constant is the constant on the right side of the Null Hypothesis.

x_bar = Observed Sample Mean

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test

(Click On Image and See a larger Version)

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

df = n - 1

Null Hypothesis H0: x_bar = Constant

The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:

1) The observed x_bar is beyond the Critical Value.

2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.

3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

 

Example of a 1-Sample, 2-Tailed t-

Test in Excel

This problem is very similar to the problem solved in the z-test section for a one-sample, two-tailed t-test. Similar problems were used in each of these sections to show the similarities and also contrast the differences between the one-sample z-Test and t-test as easily as possible.

This problem compares average monthly sales from one fast food chain’s retail stores in one region with the average monthly sales of all of the fast food chain’s retails in the entire country. The region being evaluated has more than 1,000 very similar stores. The national mean monthly retail store sales equals $186,000.

Determine with at least 95% certainty whether the average monthly sales of all of the fast food chain’s stores in the one region is different than the national average monthly sales of all of the fast food chain’s stores.

The data sample of sales for the same month for a random sample of 20 fast food retail stores in a region is as follows:

$240,000 $180,000 $200,000 $260,000 $200,000

$220,000 $180,000 $200,000 $200,000 $180,000

$160,000 $180,000 $200,000 $220,000 $220,000

$140,000, $220,000 $200,000 $240,000 $160,000

Running the Excel data analysis tool Descriptive Statistics will provide the Sample Mean, the Sample Standard Deviation, the Standard Error, and the Sample Size. The output of this tool appears as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

 

Summary of Problem Information

x_bar = sample mean = AVERAGE() = 200,000

µ = national (population) mean = 186,000

s = sample standard deviation =STDEV.S() = 29735.68

σ (Greek letter “sigma”) = population standard deviation = Not Known

n = sample size = COUNT() = 20

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

Note that this calculation of the Standard Error using the sample standard deviation, s, is an estimate of the true Standard Error which would be calculated using the population standard deviation, σ.

SE = 6649.10

df = degrees of freedom = n – 1 = 20 – 1 = 19

Level of Certainty = 0.95

Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05

As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then proceed to the four-step method of solving the hypothesis test that follows.

 

The Initial Two Questions That Must be Answered Satisfactorily

What Type of Test Should Be Done?

Have All of the Required Assumptions For This Test Been Met?

 

The Four-Step Method For Solving All Hypothesis Tests of Mean

Step 1) Create the Null Hypothesis and the Alternate Hypothesis

Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions To Be Answered Before Performing the Four-Step Hypothesis Test of Mean are as follows:

 

Question 1) Type of Test?

a) Hypothesis Test of Mean or Proportion?

This is a Hypothesis Test of Mean because each individual observation (each sampled monthly sales figure) within the sample can have a wide range of values. Data observations for Hypothesis Tests of Proportion are binary: they can take only one of two possible values.

b) One-Sample or a Two-Sample Test?

This is a one-sample hypothesis test because only one sample containing monthly sales figures from twenty stores has been taken and is being compared to the national monthly retail store average for the same month.

c) Independent (Unpaired) Test or a Dependent (Paired) Test?

It is neither. The designation of “paired” or “unpaired” applies only for two-sample hypothesis tests.

d) One-Tailed or Two-Tailed Hypothesis?

The problem asks to determine whether the twenty-store monthly average is simply different than the national average. This is a non-directional inequality making this hypothesis test a two-tailed test. If the problem asked whether the twenty-store average was greater than or less than the national average, the inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?

Assuming that the population or sample can pass a normality test, a hypothesis test of mean must be performed as a t-Test when the sample size is small (n < 30) or if the population variance is unknown.

In this case the sample size is small as n = 20. This Hypothesis Test of Mean must therefore be performed as a t-Test and not as a z-Test.

The t Distribution with degrees of freedom = df = n – 1 is defined as the distribution of random data sample of sample size n taken from a normal population.

The means of samples taken from a normal population are also distributed according to the t Distribution with degrees of freedom = df = n – 1.

The Test Statistic (the t Value, which is based upon sample mean (x_bar) because it equals (x_bar – Constant)/(s/SQRT(n)) will therefore also be distributed according to the t Distribution. A t-Test will be performed if the Test Statistic is distributed according to the t Distribution.

The distribution of the Test Statistic for sample taken from a normal population is always described by the t Distribution. The shape of the t Distribution converges to (very closely resembles) the shape of the standard normal distribution when sample size becomes large (n > 30).

The Test Statistic’s distribution can be approximated by the normal distribution only if the sample size is large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if the Test Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in all other cases.

It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All z-Tests can be replaced be their equivalent t-Tests. As a result, some major commercial statistical software packages including the well-known SPSS provide only t-Tests and no direct z-Tests.

This hypothesis test is a t-Test that is one-sample, two-tailed hypothesis test of mean as long as all required assumptions have been met.

 

Question 2) Test Requirements Met?

a) t-Distribution of Test Statistic

A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under the Null Hypothesis by the t Distribution. The Test Statistic is derived from the mean of the sample taken and therefore has the same distribution that the sample mean would have if multiple similar samples were taken from the same population.

The sample size indicates how to determine the distribution of the sample mean and therefore the distribution of the Test Statistic as follows:

 

When Sample Size Is Large

When the sample size is large (n > 30), the distribution of means of similar samples drawn from the same population is described by the t Distribution. As per the Central Limit Theorem, as sample size increases, the distribution of the sample means converges to the normal distribution as does the t Distribution. When sample size approaches infinity, the t Distribution converges to the standard normal distribution.

When sample size is large, the distribution of the sample mean, and therefore the distribution of the Test Statistic, is always described by the t Distribution. A t-Test can therefore always be used when sample size is large, regardless of the distribution of the population or sample.

 

When Sample Size is Small

The data in a sample taken from a normally-distributed population will be distributed according to the t Distribution regardless of sample size.

The means of similar random samples taken from a normally-distributed population are also distributed according to the t Distribution regardless of sample size.

The sample mean, and therefore the Test Statistic, are distributed according to the t Distribution if the population is normally distributed.

The population is considered to be normally distributed if any of the following are true:

1) The population from which the sample was taken is shown to be normally distributed.

2) The sample is shown to be normally distributed. If the sample passes a test of normality then the population from which the sample was taken can be assumed to be normally distributed.

The population or the sample must pass a normality test before a t-Test can be performed. If the only data available are the data of the single data taken, then sample must pass a normality test before a t-Test can be performed.

 

Evaluating the Normality of the Sample Data

The following five normality tests will be performed in Excel on the sample data in a blog article shortly following this one:

An Excel histogram of the sample data will be created.

A normal probability plot of the sample data will be created in Excel.

The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.

The Anderson-Darling test for normality of the sample data will be performed in Excel.

The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

 

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:

Step 1) Create the Null Hypothesis and the Alternate Hypothesis

Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Determine Whether to Accept or Reject theNull Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test

Proceeding through the four steps is done is follows:

 

Step 1 – Create the Null and Alternate Hypotheses

The Null Hypothesis is always an equality that states that the items being compared are the same. In this case, the Null Hypothesis would state that the average monthly sales of all stores in the region (the population from which the twenty-store sample was taken) is not different than the national monthly store average sales, µ, which is $186,000. We will use the variable x_bar to represent the sample mean of the twenty stores. The Null Hypothesis is as follows:

H0: x_bar = Constant = 186,000

The Constant is quite often the known population mean, µ, to which the sample mean is being compared.

The Alternative Hypothesis is always in inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the average monthly sales of all stores in the region (the population from which the twenty-store sample was taken) is merely different than the national monthly store average sales, µ, which is $186,000.

The Alternative Hypothesis is as follows:

H1: x_bar ≠ Constant, which is 186,000

H1: x_bar ≠ 186,000

The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous (requires a greater differences between the two entities being compared before the test shows that there is a difference) than a one-tailed test.

It is important to note that the Null and Alternative Hypotheses refer to the means of the populations from which the samples were taken. A one-sample t-Test determines whether to reject or fail to reject the Null Hypothesis that states that that population from which the sample was taken (the entire region) has a mean equal to the Constant. The Constant in this case is equal to known national average.

Parameters necessary to map the distributed variable, x_bar, to the t Distribution are the following:

s = sample standard deviation =STDEV.S() = 29735.68

n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

df = degrees of freedom = n – 1 = 20 – 1 = 19

 

Step 2 – Map the Distributed Variable to t-Distribution

A t-Test can be performed if the sample mean, and the Test Statistic (the t Value) are distributed according to the t Distribution. If the sample has passed a normality test, the sample mean and closely-related Test Statistic are distributed according to the t Distribution.

The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its degrees of freedom, which equals n – 1 if n = sample size.

The means of similar, random samples taken from a normal population are distributed according to the t Distribution. This means that the distribution of a large number of means of samples of size n taken from a normal population will have the same shape as a t Distribution with its degrees of equal to n – 1.

The sample mean and the Test Statistic are both distributed according to the t Distribution with degrees of freedom equal to n – 1 if the sample or population is shown to be normally distributed. This step will map the sample mean to a t Distribution curve with a degrees of freedom equal to n – 1.

The t Distribution is usually presented in its finalized form with standardized values of a mean that equals zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the distributed variable is the t Value (the Test Statistic) as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure used to take the samples. The distributed variable would be the sample mean, x_bar.

The variable x_bar is distributed according to the t Distribution. Mapping this distributed variable to a t Distribution curve is shown as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null Hypothesis, which is:

H0: x_bar = Constant = 186,000

This non-standardized t Distribution curve is constructed from the following parameters:

Mean = 186,000

Standard Error = 6,649.10

Degrees of Freedom = 19

Distributed Variable = x_bar

 

Step 3 – Map the Regions of Acceptance and Rejection

The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given level of certainty. If the two things being compared are far enough apart from each other, the Null Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying to show graphically how different the sample mean, x_bar = $200.000, is from the national average of $186,000.

The non-standardized t Distribution curve can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical Value.

If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null Hypothesis is not rejected.

The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.

This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve that is the Region of Rejection.

The boundary between Regions of Acceptance and Regions of Rejection are called Critical Values. The locations of these Critical values need to be calculated.

 

Calculate Critical Values

A Critical Value is the boundary between a Region of Acceptance and a Region of Rejection. In the case of a two-tailed test, the Region of rejection is split between two outer tails. There are therefore two Critical Values.

The Critical Value is the boundary on either side of the curve beyond which 2.5 percent of the total area under the curve exists. In this case both Critical Values can be found by the following:

Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE

Critical Values = Mean ± T.INV(1-α/2,df) * SE

Critical Values = 186,000 ± T.INV(0.975, 19) * 6649.1

Critical Values = 186,000 ± 13,916

Critical Values = 172,083 and 199,916

The Region of Rejection is therefore everything that is to the right of 199,916 and everything to the left of 172,083.

The non-standardized t Distribution curve with the blue Region of Acceptance and the yellow Regions of Rejection divided by the Critical Values is shown is in the following Excel-generated graph of this non-standardized t Distribution curve:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

 

Step 4 – Determine Whether to Reject Null Hypothesis

The object of a hypothesis test is to determine whether to accept of reject the Null Hypothesis. There are three equivalent-Tests that determine whether to accept or reject the Null Hypothesis. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:

 

1) Compare Sample Mean x-bar With Critical Value

Reject the Null Hypothesis if the sample mean, x_bar = 200,000, falls into the Region of Rejection.

Equivalently, reject the Null Hypothesis if the sample mean, x_bar, is further the curve’s mean of 186,000 the Critical Value.

The Critical Values have been calculated to be 172,083 on the left and 199,916 on the right. X_bar (200,000) is further from the curve mean (186,000) than right

Critical Value (199,916). The Null Hypothesis would therefore be rejected.

 

2) Compare the t Value With Critical t Value

The t Value is the number of Standard Errors that x_bar is from the curve’s mean of 186,000.

The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.

Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t Value.

Equivalently, reject the Null Hypothesis if the t Value is closer to the standardized mean of zero than the Critical t Value.

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

t Value (Test Statistic) = (200,000 – 186,000)/6,649.1

t Value (Test Statistic) = 2.105

This means that the sample mean, x_bar, is 2.105 standard errors from the curve mean (186,000).

Critical t Values = ±T.INV(1-α/2,df)

Critical t Values = ±T.INV(1-0.05/2,19)

Critical t Values = ±2.093

This means that the boundaries of the Region of Rejection are 2.093 standard errors from the curve mean (186,000) on each side since this is a two-tailed test.

The Null Hypothesis is rejected because the t Value (2.105) is farther from the standardized mean of zero than the Critical t Value on that side (+2.093) indicating that x_bar is in the Region of Rejection.

 

3) Compare the p Value With Alpha

The p Value is the percent of the curve that is beyond x_bar (200,000). If the p Value is smaller than Alpha/2 (since this is a two-tailed test), the Null Hypothesis is rejected.

p Value = T.DIST.RT(ABS(t Value), df)

p Value = T.DIST.RT(ABS(2.105), 19)

p Value = 0.0244

The p Value (0.0244) is smaller than Alpha/2 (0.025) Region of Rejection in the right tail and we therefore reject the Null Hypothesis. A graph below shows that the red p Value (the curve area beyond x_bar) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails. This is shown in the following Excel-generated graph of this non-standardized t Distribution curve:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test (Click On Image and See a larger Version)

It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test, the Null Hypothesis would still have been rejected because:

1) The p Value (0.0244) would still be smaller than the Alpha (0.05) Region of Rejection, which is now entirely contained in the right tail

2) x_bar (200,000) would still be outside the Region of Acceptance, which would now have its outer right boundary at 197,497.2 (mean + T.INV(1 - Alpha,df)*SE)

3) The t Value (2.105) would still be larger than the critical t Value which would now be 1.73 (Critical t Value = T.INV(1 - Alpha,df))

 

Excel Shortcut to Performing a One-Sample t-Test

All of the three other types of t-Tests (two-independent-sample pooled and unpooled t-Tests along with the paired t-Test) can be solved in one step with a built-in Excel formula and also with a built-in Data Analysis tool for each t-Test.

Excel unfortunately does not provide a formula or tool that can perform or solve a one-sample t-Test in one step. Interestingly enough, a one-sample z-Test can be solved in Excel in one step with the following formula:

p Value = MIN(Z.TEST(array,Constant,σ),1- Z.TEST(array,Constant,σ))

array = Set of sample data

Constant = the Constant in the Null Hypothesis

σ = Population standard deviation

There is no such method in Excel to perform a one-sample t-Test similarly in a single step. The other three types of t-Tests each have a one-step tool and a one-step formula. One of the main reasons that these tools and formulas are one-step is that the t Value is calculated automatically. There is no one-sample t-Test tool or formula that automatically calculates the t Value while performing the t-Test or calculating the p Value. The t Value must be calculated in its own step when performing a one-sample t-Test in Excel.

The formula needed to perform a one-sample t-Test is the following as previously shown:

p Value = T.DIST.RT(ABS(t Value), df)

This formula requires that the t Value be calculated first. This must be done manually using the following steps:

t Value = (x-bar – Constant)/SE

SE = s/SQRT(n)

The one-sample t-Test is a very common statistical test so it is surprising that Excel does not have a one-step formula or a Data Analysis tool to directly calculate either the p Value or t Value given the array and the Constant from the Null Hypothesis. Each of the other three types of t-Tests has its own specific formulas and its own Data Analysis tools to perform either entire the t-Test or calculate the p Value in a single step.

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

This is one of the following six articles on 1-Sample t-Tests in Excel

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013

1-Sample t-Test Power With G*Power Utility

Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

 

Evaluating the Normality

of the Sample Data For

the One-Sample t-Test in

Excel

The following five normality tests will be performed in Excel on the sample data here:

An Excel histogram of the sample data will be created.

A normal probability plot of the sample data will be created in Excel.

The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.

The Anderson-Darling test for normality of the sample data will be performed in Excel.

The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

 

Histogram in Excel

The quickest way to check the sample data for normality is to create an Excel histogram of the data as shown below, or to create a normal probability plot of the data if you have access to an automated method of generating that kind of a graph.

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

The sample group appears to be distributed reasonably closely to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can has a significant influence on the shape of the histogram’s output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.

 

Normal Probability Plot in Excel

Another way to graphically evaluate normality of each data sample is to create a normal probability plot for each sample group. This can be implemented in Excel and appears as follows:

clip_image004 (Click On Image and See a larger Version)

The normal probability plots for the sample group show that the data appears to be very close to being normally distributed. The actual sample data (red) matches very closely the data values of the sample were perfectly normally distributed (blue) and never goes beyond the 95 percent confidence interval boundaries (green).

 

Kolmogorov-Smirnov Test For

Normality in Excel

The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test states that the distribution of actual data points matches the distribution that is being tested. In this case the data sample is being compared to the normal distribution.

The largest distance between the CDF of any data point and its expected CDF is compared to Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

0.1500 = Max Difference Between Actual and Expected CDF

20 = n = Number of Data Points

0.05 = α

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Max Difference Between the Actual and Expected CDF (0.1500) is less than the Kolmogorov-Smirnov Critical Value for n = 20 and α = 0.05 so do not reject the Null Hypothesis.

The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data are normally distributed, is rejected if the maximum difference between the expected and actual CDF of any of the data points exceed the Critical Value for the given n and α.

 

Anderson-Darling Test For

Normality in Excel

The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Anderson-Darling Test calculates a Test Statistic based upon the actual value of each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were perfectly normally distributed.

The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test for the following two reasons:

The Kolmogorov-Smirnov test is distribution-free. i.e., its critical values are the same for all distributions tested. The Anderson-darling tests requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.

The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberration in outer values than the A-D test.

If the Test Statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the Test Statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

Adjusted Test Statistic A* = 0.407

Reject the Null Hypothesis of the Anderson-Darling Test which states that the data are normally distributed if any the following are true:

A* > 0.576 When Level of Significance (α) = 0.15

A* > 0.656 When Level of Significance (α) = 0.10

A* > 0.787 When Level of Significance (α) = 0.05

A* > 1.092 When Level of Significance (α) = 0.01

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given n and α.

The Adjusted Test Statistic (A*) for the Difference Sample Group (0.407) is significantly less than the Anderson-Darling Critical Value for α = 0.05 so the Null Hypotheses of the Anderson-Darling Test for the sample group is accepted.

 

Shapiro-Wilk Test For Normality in

Excel

The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. A Test Statistic W is calculated. If this Test Statistic is less than a critical value of W for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is normally distributed is rejected.

The Shapiro-Wilk Test is a robust normality test and is widely-used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis that the data are not normally distributed a slightly higher percentage of times than most other normality tests, particularly at small sample sizes.

The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-Smirnov normality test.

Sample Data

normality, t test, kolmogorov, anderson-darling, shapiro-wilk, histogram, excel, excel 2010, excel 2013, statistics (Click On Image and See a larger Version)

0.967452 = Test Statistic W

0.905 = W Critical for the following n and Alpha

20 = n = Number of Data Points

0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Shapiro-Wilk Test Statistic W (0.967452) is larger than W Critical 0.905. The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state that the data are not normally distributed with a confidence level of 95 percent.

 

Correctable Reasons That Normal

Data Can Appear Non-Normal

If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation of whether any of the following factors have caused normally-distributed data to appear to be non-normally-distributed:

 

1) Outliers

– Too many outliers can easily skew normally-distributed data. An outlier can oftwenty be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.

 

2) Data Has Been Affected by More Than One Process

– Variations to a process such as shift changes or operator changes can change the distribution of data. Multiple modal values in the data are common indicators that this might be occurring. The effects of different inputs must be identified and eliminated from the data.

 

3) Not Enough Data

– Normally-distributed data will often not assume the appearance of normality until at least 25 data points have been sampled.

 

4) Measuring Devices Have Poor Resolution

– Sometimes (but not always) this problem can be solved by using a larger sample size.

 

5) Data Approaching Zero or a Natural Limit

– If a large number of data values approach a limit such as zero, calculations using very small values might skew computations of important values such as the mean. A simple solution might be to raise all the values by a certain amount.

 

6) Only a Subset of a Process’ Output Is Being Analyzed

– If only a subset of data from an entire process is being used, a representative sample in not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.

 

When Data Are Not Normally

Distributed

The Sign Test and Wilcoxon One-Sample Signed-Rank Test are nonparametric alternative to the one-sample t-test when the normality assumption of sampled data is questionable. The one-sample t-test is used to evaluate whether a population from which samples are drawn has the same mean as a known value. The nonparametric tests evaluate whether the sample have the same median as a known value.

The Sign Test is a much less powerful alternative to the Wilcoxon One-Sample Signed-Rank test, but does not assume that the differences between the samples and the known value is symmetrical about a median, as does the Wilcoxon One-Sample Signed-Rank test when used as a nonparametric alternative to the one-sample t-test. The Sign Test is non-directional and can be substituted only for a two-tailed test but not for a one-tailed test.

The parametric one-sample, two-tailed t-Test that is currently being in this section detected a difference at alpha = 0.05. The Wilcoxon One-Sample Signed-Rank Test also detected a difference at alpha = 0.05. The Sign Test was not able to detect a difference at alpha = 0.25.

Both the Wilcoxon One-Sample Signed-Rank Test and the Sign Test will be performed on the data in this example in blog articles shortly following this one.

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic