Saturday, May 31, 2014

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

This is one of the following six articles on 1-Sample t-Tests in Excel

1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013

Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013

1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013

1-Sample t-Test Power With G*Power Utility

Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013

 

One-Sample t-Test in 4

Steps in Excel

This hypothesis test determines whether the mean of the population from which the sample was taken is equal to (two-tailed test) or else greater than or less than (one-tailed test) than a constant. This constant is often the known mean of a population from which the sample may have come from. The constant is the constant on the right side of the Null Hypothesis.

x_bar = Observed Sample Mean

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test

(Click On Image and See a larger Version)

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

df = n - 1

Null Hypothesis H0: x_bar = Constant

The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:

1) The observed x_bar is beyond the Critical Value.

2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.

3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

 

Example of a 1-Sample, 2-Tailed t-

Test in Excel

This problem is very similar to the problem solved in the z-test section for a one-sample, two-tailed t-test. Similar problems were used in each of these sections to show the similarities and also contrast the differences between the one-sample z-Test and t-test as easily as possible.

This problem compares average monthly sales from one fast food chain’s retail stores in one region with the average monthly sales of all of the fast food chain’s retails in the entire country. The region being evaluated has more than 1,000 very similar stores. The national mean monthly retail store sales equals $186,000.

Determine with at least 95% certainty whether the average monthly sales of all of the fast food chain’s stores in the one region is different than the national average monthly sales of all of the fast food chain’s stores.

The data sample of sales for the same month for a random sample of 20 fast food retail stores in a region is as follows:

$240,000 $180,000 $200,000 $260,000 $200,000

$220,000 $180,000 $200,000 $200,000 $180,000

$160,000 $180,000 $200,000 $220,000 $220,000

$140,000, $220,000 $200,000 $240,000 $160,000

Running the Excel data analysis tool Descriptive Statistics will provide the Sample Mean, the Sample Standard Deviation, the Standard Error, and the Sample Size. The output of this tool appears as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

 

Summary of Problem Information

x_bar = sample mean = AVERAGE() = 200,000

µ = national (population) mean = 186,000

s = sample standard deviation =STDEV.S() = 29735.68

σ (Greek letter “sigma”) = population standard deviation = Not Known

n = sample size = COUNT() = 20

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

Note that this calculation of the Standard Error using the sample standard deviation, s, is an estimate of the true Standard Error which would be calculated using the population standard deviation, σ.

SE = 6649.10

df = degrees of freedom = n – 1 = 20 – 1 = 19

Level of Certainty = 0.95

Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05

As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then proceed to the four-step method of solving the hypothesis test that follows.

 

The Initial Two Questions That Must be Answered Satisfactorily

What Type of Test Should Be Done?

Have All of the Required Assumptions For This Test Been Met?

 

The Four-Step Method For Solving All Hypothesis Tests of Mean

Step 1) Create the Null Hypothesis and the Alternate Hypothesis

Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions To Be Answered Before Performing the Four-Step Hypothesis Test of Mean are as follows:

 

Question 1) Type of Test?

a) Hypothesis Test of Mean or Proportion?

This is a Hypothesis Test of Mean because each individual observation (each sampled monthly sales figure) within the sample can have a wide range of values. Data observations for Hypothesis Tests of Proportion are binary: they can take only one of two possible values.

b) One-Sample or a Two-Sample Test?

This is a one-sample hypothesis test because only one sample containing monthly sales figures from twenty stores has been taken and is being compared to the national monthly retail store average for the same month.

c) Independent (Unpaired) Test or a Dependent (Paired) Test?

It is neither. The designation of “paired” or “unpaired” applies only for two-sample hypothesis tests.

d) One-Tailed or Two-Tailed Hypothesis?

The problem asks to determine whether the twenty-store monthly average is simply different than the national average. This is a non-directional inequality making this hypothesis test a two-tailed test. If the problem asked whether the twenty-store average was greater than or less than the national average, the inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?

Assuming that the population or sample can pass a normality test, a hypothesis test of mean must be performed as a t-Test when the sample size is small (n < 30) or if the population variance is unknown.

In this case the sample size is small as n = 20. This Hypothesis Test of Mean must therefore be performed as a t-Test and not as a z-Test.

The t Distribution with degrees of freedom = df = n – 1 is defined as the distribution of random data sample of sample size n taken from a normal population.

The means of samples taken from a normal population are also distributed according to the t Distribution with degrees of freedom = df = n – 1.

The Test Statistic (the t Value, which is based upon sample mean (x_bar) because it equals (x_bar – Constant)/(s/SQRT(n)) will therefore also be distributed according to the t Distribution. A t-Test will be performed if the Test Statistic is distributed according to the t Distribution.

The distribution of the Test Statistic for sample taken from a normal population is always described by the t Distribution. The shape of the t Distribution converges to (very closely resembles) the shape of the standard normal distribution when sample size becomes large (n > 30).

The Test Statistic’s distribution can be approximated by the normal distribution only if the sample size is large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if the Test Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in all other cases.

It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All z-Tests can be replaced be their equivalent t-Tests. As a result, some major commercial statistical software packages including the well-known SPSS provide only t-Tests and no direct z-Tests.

This hypothesis test is a t-Test that is one-sample, two-tailed hypothesis test of mean as long as all required assumptions have been met.

 

Question 2) Test Requirements Met?

a) t-Distribution of Test Statistic

A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under the Null Hypothesis by the t Distribution. The Test Statistic is derived from the mean of the sample taken and therefore has the same distribution that the sample mean would have if multiple similar samples were taken from the same population.

The sample size indicates how to determine the distribution of the sample mean and therefore the distribution of the Test Statistic as follows:

 

When Sample Size Is Large

When the sample size is large (n > 30), the distribution of means of similar samples drawn from the same population is described by the t Distribution. As per the Central Limit Theorem, as sample size increases, the distribution of the sample means converges to the normal distribution as does the t Distribution. When sample size approaches infinity, the t Distribution converges to the standard normal distribution.

When sample size is large, the distribution of the sample mean, and therefore the distribution of the Test Statistic, is always described by the t Distribution. A t-Test can therefore always be used when sample size is large, regardless of the distribution of the population or sample.

 

When Sample Size is Small

The data in a sample taken from a normally-distributed population will be distributed according to the t Distribution regardless of sample size.

The means of similar random samples taken from a normally-distributed population are also distributed according to the t Distribution regardless of sample size.

The sample mean, and therefore the Test Statistic, are distributed according to the t Distribution if the population is normally distributed.

The population is considered to be normally distributed if any of the following are true:

1) The population from which the sample was taken is shown to be normally distributed.

2) The sample is shown to be normally distributed. If the sample passes a test of normality then the population from which the sample was taken can be assumed to be normally distributed.

The population or the sample must pass a normality test before a t-Test can be performed. If the only data available are the data of the single data taken, then sample must pass a normality test before a t-Test can be performed.

 

Evaluating the Normality of the Sample Data

The following five normality tests will be performed in Excel on the sample data in a blog article shortly following this one:

An Excel histogram of the sample data will be created.

A normal probability plot of the sample data will be created in Excel.

The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.

The Anderson-Darling test for normality of the sample data will be performed in Excel.

The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

 

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:

Step 1) Create the Null Hypothesis and the Alternate Hypothesis

Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Determine Whether to Accept or Reject theNull Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test

Proceeding through the four steps is done is follows:

 

Step 1 – Create the Null and Alternate Hypotheses

The Null Hypothesis is always an equality that states that the items being compared are the same. In this case, the Null Hypothesis would state that the average monthly sales of all stores in the region (the population from which the twenty-store sample was taken) is not different than the national monthly store average sales, µ, which is $186,000. We will use the variable x_bar to represent the sample mean of the twenty stores. The Null Hypothesis is as follows:

H0: x_bar = Constant = 186,000

The Constant is quite often the known population mean, µ, to which the sample mean is being compared.

The Alternative Hypothesis is always in inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the average monthly sales of all stores in the region (the population from which the twenty-store sample was taken) is merely different than the national monthly store average sales, µ, which is $186,000.

The Alternative Hypothesis is as follows:

H1: x_bar ≠ Constant, which is 186,000

H1: x_bar ≠ 186,000

The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous (requires a greater differences between the two entities being compared before the test shows that there is a difference) than a one-tailed test.

It is important to note that the Null and Alternative Hypotheses refer to the means of the populations from which the samples were taken. A one-sample t-Test determines whether to reject or fail to reject the Null Hypothesis that states that that population from which the sample was taken (the entire region) has a mean equal to the Constant. The Constant in this case is equal to known national average.

Parameters necessary to map the distributed variable, x_bar, to the t Distribution are the following:

s = sample standard deviation =STDEV.S() = 29735.68

n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)

df = degrees of freedom = n – 1 = 20 – 1 = 19

 

Step 2 – Map the Distributed Variable to t-Distribution

A t-Test can be performed if the sample mean, and the Test Statistic (the t Value) are distributed according to the t Distribution. If the sample has passed a normality test, the sample mean and closely-related Test Statistic are distributed according to the t Distribution.

The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its degrees of freedom, which equals n – 1 if n = sample size.

The means of similar, random samples taken from a normal population are distributed according to the t Distribution. This means that the distribution of a large number of means of samples of size n taken from a normal population will have the same shape as a t Distribution with its degrees of equal to n – 1.

The sample mean and the Test Statistic are both distributed according to the t Distribution with degrees of freedom equal to n – 1 if the sample or population is shown to be normally distributed. This step will map the sample mean to a t Distribution curve with a degrees of freedom equal to n – 1.

The t Distribution is usually presented in its finalized form with standardized values of a mean that equals zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the distributed variable is the t Value (the Test Statistic) as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure used to take the samples. The distributed variable would be the sample mean, x_bar.

The variable x_bar is distributed according to the t Distribution. Mapping this distributed variable to a t Distribution curve is shown as follows:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null Hypothesis, which is:

H0: x_bar = Constant = 186,000

This non-standardized t Distribution curve is constructed from the following parameters:

Mean = 186,000

Standard Error = 6,649.10

Degrees of Freedom = 19

Distributed Variable = x_bar

 

Step 3 – Map the Regions of Acceptance and Rejection

The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given level of certainty. If the two things being compared are far enough apart from each other, the Null Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying to show graphically how different the sample mean, x_bar = $200.000, is from the national average of $186,000.

The non-standardized t Distribution curve can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical Value.

If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null Hypothesis is not rejected.

The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.

This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve that is the Region of Rejection.

The boundary between Regions of Acceptance and Regions of Rejection are called Critical Values. The locations of these Critical values need to be calculated.

 

Calculate Critical Values

A Critical Value is the boundary between a Region of Acceptance and a Region of Rejection. In the case of a two-tailed test, the Region of rejection is split between two outer tails. There are therefore two Critical Values.

The Critical Value is the boundary on either side of the curve beyond which 2.5 percent of the total area under the curve exists. In this case both Critical Values can be found by the following:

Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE

Critical Values = Mean ± T.INV(1-α/2,df) * SE

Critical Values = 186,000 ± T.INV(0.975, 19) * 6649.1

Critical Values = 186,000 ± 13,916

Critical Values = 172,083 and 199,916

The Region of Rejection is therefore everything that is to the right of 199,916 and everything to the left of 172,083.

The non-standardized t Distribution curve with the blue Region of Acceptance and the yellow Regions of Rejection divided by the Critical Values is shown is in the following Excel-generated graph of this non-standardized t Distribution curve:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

 

Step 4 – Determine Whether to Reject Null Hypothesis

The object of a hypothesis test is to determine whether to accept of reject the Null Hypothesis. There are three equivalent-Tests that determine whether to accept or reject the Null Hypothesis. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:

 

1) Compare Sample Mean x-bar With Critical Value

Reject the Null Hypothesis if the sample mean, x_bar = 200,000, falls into the Region of Rejection.

Equivalently, reject the Null Hypothesis if the sample mean, x_bar, is further the curve’s mean of 186,000 the Critical Value.

The Critical Values have been calculated to be 172,083 on the left and 199,916 on the right. X_bar (200,000) is further from the curve mean (186,000) than right

Critical Value (199,916). The Null Hypothesis would therefore be rejected.

 

2) Compare the t Value With Critical t Value

The t Value is the number of Standard Errors that x_bar is from the curve’s mean of 186,000.

The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.

Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t Value.

Equivalently, reject the Null Hypothesis if the t Value is closer to the standardized mean of zero than the Critical t Value.

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test
(Click On Image and See a larger Version)

t Value (Test Statistic) = (200,000 – 186,000)/6,649.1

t Value (Test Statistic) = 2.105

This means that the sample mean, x_bar, is 2.105 standard errors from the curve mean (186,000).

Critical t Values = ±T.INV(1-α/2,df)

Critical t Values = ±T.INV(1-0.05/2,19)

Critical t Values = ±2.093

This means that the boundaries of the Region of Rejection are 2.093 standard errors from the curve mean (186,000) on each side since this is a two-tailed test.

The Null Hypothesis is rejected because the t Value (2.105) is farther from the standardized mean of zero than the Critical t Value on that side (+2.093) indicating that x_bar is in the Region of Rejection.

 

3) Compare the p Value With Alpha

The p Value is the percent of the curve that is beyond x_bar (200,000). If the p Value is smaller than Alpha/2 (since this is a two-tailed test), the Null Hypothesis is rejected.

p Value = T.DIST.RT(ABS(t Value), df)

p Value = T.DIST.RT(ABS(2.105), 19)

p Value = 0.0244

The p Value (0.0244) is smaller than Alpha/2 (0.025) Region of Rejection in the right tail and we therefore reject the Null Hypothesis. A graph below shows that the red p Value (the curve area beyond x_bar) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails. This is shown in the following Excel-generated graph of this non-standardized t Distribution curve:

t test, t-test, excel, excel 2010, excel 2013, statistics, t distribution,hypothesis test (Click On Image and See a larger Version)

It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test, the Null Hypothesis would still have been rejected because:

1) The p Value (0.0244) would still be smaller than the Alpha (0.05) Region of Rejection, which is now entirely contained in the right tail

2) x_bar (200,000) would still be outside the Region of Acceptance, which would now have its outer right boundary at 197,497.2 (mean + T.INV(1 - Alpha,df)*SE)

3) The t Value (2.105) would still be larger than the critical t Value which would now be 1.73 (Critical t Value = T.INV(1 - Alpha,df))

 

Excel Shortcut to Performing a One-Sample t-Test

All of the three other types of t-Tests (two-independent-sample pooled and unpooled t-Tests along with the paired t-Test) can be solved in one step with a built-in Excel formula and also with a built-in Data Analysis tool for each t-Test.

Excel unfortunately does not provide a formula or tool that can perform or solve a one-sample t-Test in one step. Interestingly enough, a one-sample z-Test can be solved in Excel in one step with the following formula:

p Value = MIN(Z.TEST(array,Constant,σ),1- Z.TEST(array,Constant,σ))

array = Set of sample data

Constant = the Constant in the Null Hypothesis

σ = Population standard deviation

There is no such method in Excel to perform a one-sample t-Test similarly in a single step. The other three types of t-Tests each have a one-step tool and a one-step formula. One of the main reasons that these tools and formulas are one-step is that the t Value is calculated automatically. There is no one-sample t-Test tool or formula that automatically calculates the t Value while performing the t-Test or calculating the p Value. The t Value must be calculated in its own step when performing a one-sample t-Test in Excel.

The formula needed to perform a one-sample t-Test is the following as previously shown:

p Value = T.DIST.RT(ABS(t Value), df)

This formula requires that the t Value be calculated first. This must be done manually using the following steps:

t Value = (x-bar – Constant)/SE

SE = s/SQRT(n)

The one-sample t-Test is a very common statistical test so it is surprising that Excel does not have a one-step formula or a Data Analysis tool to directly calculate either the p Value or t Value given the array and the Constant from the Null Hypothesis. Each of the other three types of t-Tests has its own specific formulas and its own Data Analysis tools to perform either entire the t-Test or calculate the p Value in a single step.

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

No comments:

Post a Comment