Saturday, May 31, 2014

2-Sample Unpooled z-Test in 4 Steps in Excel 2010 and Excel 2013

This is one of the following four articles on z-Tests in Excel

Overview of Hypothesis Tests Using the Normal Distribution in Excel 2010 and Excel 2013

One-Sample z-Test in 4 Steps in Excel 2010 and Excel 2013

2-Sample Unpooled z-Test in 4 Steps in Excel 2010 and Excel 2013

Overview of the Paired (Two-Dependent-Sample) z-Test in 4 Steps in Excel 2010 and Excel 2013

 

Two-Independent-Sample, Unpooled z-Test in 4 Steps in Excel

 

This hypothesis test evaluates two independent samples to determine whether the difference between the two sample means (x_bar1 and x_bar2) is equal to a constant (two-tailed test) or is greater than or less than a constant (one-tailed test).

This is an unpooled test. An unpooled test can always be used in place of a pooled test. An unpooled test must be used when population variances are not similar. An unpooled test calculates the Standard Error using separate standard deviations instead of combining them into a single, pooled standard deviation as a pooled test does.

In the real world, only the sample variances are usually known; the population variances usually are not. t-Tests are therefore nearly always used to perform a two-independent-sample hypothesis test of mean. For this reason, only the unpooled, two-independent-sample z-Test will be explained; the pooled version of this z-Test will not be covered. The unpooled z-Test can always be used in place of the pooled z-Test that could be used if population variances were known to be similar enough.

x_bar1 - x_bar2 = Observed difference between the sample means

SE = SQRT[ (var1/n1) + (var2/n2) ]

Note that this is the same formula for SE for the two-independent-sample, unpooled t-test except that the variance for the z-Test is the population variance as follows:

var1 = σ1²

var2 = σ2²

and not the sample variance used for the t-test as follows:

var1 = s1²

var2 = s2²


Null Hypothesis H0: x_bar1 - x_bar2 = Constant

The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:

1) The observed x_bar1 - x_bar2 is beyond the Critical Value.

2) The z Score (the Test Statistic) is farther from zero than the Critical z Value.

3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

 

Example of 2-Sample, 2-Tailed, Unpooled z-Test in Excel

This problem is very similar to the problem solved in the t-test section for a two-independent-sample, two-tailed t-test. Similar problems were used in each of these sections to show the similarities and also contrast the differences between the two-independent-sample z-Test and t-test as easily as possible.

Two shifts on a production line are being compared to determine if there is a difference in the average daily number of units produced by each shift. The two shifts operate eight hours per day under nearly identical conditions that remain fairly constant from day to day. A sample of the total number of units produced by each shift on a random selection of days is taken. Determine with a 95 percent Level of Confidence if there is a difference between the average daily number of units produced by the two shifts.

Note that when performing two-sample z-tests in Excel, always designate Sample 1 (Variable 1) to be the sample with the larger mean.

The results of the two-independent-sample z-Test will be more intuitive if the sample group with the larger mean is designated as the first sample and the sample group with the smaller mean is designated as the second sample.

Details about both data samples are shown as follows:

 

Summary of Problem Information

 

Sample Group 1 – Shift A (Variable 1)

x_bar1 = sample1 mean = 46.55

µ1 (Greek letter “mu”) = population mean from which Sample 1 was drawn = Not Known

σ1 (Greek letter “sigma”) = population standard deviation from which Sample 1 was drawn = 25.5

Var1 = population1 variance = σ1² = 650.25

n1 = sample1 size = 40

 

Sample Group 2 – Shift B (Variable 2)

x_bar2 = sample2 mean = 42.24

µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known

σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was taken = 11.2

Var2 = population2 variance = σ2² = 125.44

n2 = sample2 size = 36

x_bar1 - x_bar2 = 46.55 – 42.24 = 4.31

Level of Certainty = 0.95

Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05

As mentioned, always designate Sample 1 (Variable 1) to be the sample with the larger mean when performing two-sample z-Tests in Excel.

The results of the unpooled z-Test will be more intuitive if the sample group with the larger mean is designated as the first sample and the sample group with the smaller mean is designated as the second sample.

Another reason for designating the sample group with the larger mean as the first sample is to obtain the correct result from the Excel data analysis tool for two-independent-sample, unpooled z-Tests, which is called z-Test: Two Sample for Means. The test statistic (z in the Excel output, which stands for z Score) and the Critical z Value (z Critical in the Excel output) will have the same sign (as they always should) only if the sample group with the larger mean is designated the first sample.
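The sign issue described above can be illustrated with a short Python sketch. It only reproduces the arithmetic of the z Score with the two shifts entered in either order and is not a reproduction of the Excel tool itself; the helper name z_score is an assumption made for this illustration.

```python
import math
from statistics import NormalDist


def z_score(mean_first, mean_second, var_first, n_first, var_second, n_second):
    """z Score for H0: mean_first - mean_second = 0, unpooled Standard Error."""
    standard_error = math.sqrt(var_first / n_first + var_second / n_second)
    return (mean_first - mean_second) / standard_error


# The two-tailed Critical z Value is reported as a positive number (about 1.96).
critical_z = NormalDist().inv_cdf(1 - 0.05 / 2)

# Shift A (larger mean) entered first, then Shift B entered first.
z_a_first = z_score(46.55, 42.24, 650.25, 40, 125.44, 36)
z_b_first = z_score(42.24, 46.55, 125.44, 36, 650.25, 40)

print(f"Shift A first: z = {z_a_first:+.2f}, critical z = {critical_z:+.2f}")
print(f"Shift B first: z = {z_b_first:+.2f}, critical z = {critical_z:+.2f}")
```

Only the sign of the z Score changes when the samples are swapped. Its distance from zero, and therefore the conclusion of the test, stays the same.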

As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then proceed to the four-step method of solving the hypothesis test that follows.

 

The Initial Two Questions That Must be Answered Satisfactorily

What Type of Test Should Be Done?

Have All of the Required Assumptions For This Test Been Met?

 

The Four-Step Method For Solving All Hypothesis Tests of Mean

Step 1) Create the Null Hypothesis and the Alternative Hypothesis

Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical z Value (or Critical t Value) Test

 

The initial two questions that need to be answered before performing the Four-Step Hypothesis Test of Mean are as follows:

 

Question 1) What Type of Test Should Be Done?

a) Hypothesis Test of Mean or Proportion?

This is a test of mean because each individual observation (each sampled shift’s output) within each of the two sample groups can have a wide range of values. Data points for tests of proportion are binary: they can take only one of two possible values.

 

b) One-Sample or Two-Sample Test?

This is a two-sample hypothesis test because two independent samples are being compared with each other. The two sample groups are the daily units produced by Shift A and the daily units produced by Shift B.

 

c) Independent (Unpaired) Test or Dependent (Paired) Test?

It is an unpaired test because data observations in each sample group are completely unrelated to data observations in the other sample group. The designation of “paired” or “unpaired” applies only for two-sample hypothesis tests.

 

d) One-Tailed or Two-Tailed Test?

The problem asks to determine whether there is a difference in the average number of daily units produced by Shift A and by Shift B. This is a non-directional inequality making this hypothesis test a two-tailed test. If the problem asked to determine whether Shift A really does have a higher average than Shift B, the inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.

 

e) t-Test or z-Test?

A z-Test is a statistical test in which the distribution of the Test Statistic under the Null Hypothesis can be approximated by the normal distribution.

The Test Statistic follows the normal distribution if both samples are large and both population standard deviations are known. Both samples are considered to be large samples because both sample sizes (n1 = 40 and n2 = 36) exceed 30. Both population standard deviations (σ1 = 25.5 and σ2 = 11.2) are known.

Because both sample sizes (n1 = 40 and n2 = 36) exceed 30, both sample means are approximately normal-distributed as per the Central Limit Theorem. The difference between two normal-distributed sample means is also normal-distributed. The Test Statistic is derived from the difference between the two means and is therefore normal-distributed. A z-Test can be performed if the Test Statistic is normal-distributed.

It should be noted that a two-independent-sample, unpooled t-Test can always be used in place of a two-independent-sample, unpooled z-Test. All z-Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages, including the well-known SPSS, provide only t-Tests and no direct z-Tests.

 

f) Pooled or Unpooled z-Test?

A pooled z-Test can be performed if the variances of both populations are similar, i.e., one population’s standard deviation is no more than twice as large as the other population’s standard deviation. An unpooled z-Test must be performed otherwise.
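The rule of thumb above can be checked for the two shifts with a few lines of Python. This is only a sketch of the stated rule (the larger standard deviation being no more than twice the smaller); the variable names are chosen for this illustration.

```python
# Population standard deviations given in the problem.
sigma_1 = 25.5  # Shift A
sigma_2 = 11.2  # Shift B

# Rule of thumb from the text: pooling is acceptable only if the larger
# standard deviation is no more than twice the smaller one.
ratio = max(sigma_1, sigma_2) / min(sigma_1, sigma_2)
pooling_allowed = ratio <= 2.0

print(f"ratio of standard deviations = {ratio:.3f}")
print("pooled test allowed" if pooling_allowed else "use the unpooled test")
```

For this problem the ratio is above 2, so the unpooled test is the appropriate choice even before considering what Excel provides.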

An unpooled z-Test can always be performed in the place of a pooled z-Test. Excel only provides a tool and formula for an unpooled z-test but not a pooled z-Test. For this reason the only type of two-independent-sample z-Test covered in this section will be the unpooled one.

t-Tests can always be performed in place of z-Tests. Excel does have separate tools and formulas for pooled and unpooled, two-independent-sample t-Tests.

This hypothesis test is a two-independent-sample, unpooled, two-tailed z-Test of mean as long as all required assumptions have been met.

 

Question 2) Test Requirements Met?

a) Normal Distribution of Both Sample Means

The normal distribution can be used to map the distribution of the difference of the sample means (and therefore the Test Statistic, which is derived from this difference) only if the following conditions exist:

1) Both Population Standard Deviations, σ1 and σ2, Are Known

Those values are σ1 = 25.5 and σ2 = 11.2. Population standard deviation, σ, is one of the two required parameters needed to fully describe a unique normal distribution curve and must therefore be known in order to perform a z-Test (which uses the normal distribution).

and

2) Both sample sizes are large (n > 30).

Because both sample sizes (n1 = 40 and n2 = 36) exceed 30, both sample means are approximately normal-distributed as per the Central Limit Theorem. The difference between two normal-distributed sample means is also normal-distributed. The Test Statistic is derived from the difference between the two means and is therefore normal-distributed.

The distributions of both samples and populations do not have to be verified because both sample means are known to be normal-distributed as a result of the large size.

The difference between the sample means and therefore the Test Statistic are normal-distributed because both samples are large and both population standard deviations are known.
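The Central Limit Theorem argument above can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the shift problem: it draws samples of size 40 from a hypothetical, strongly skewed population (an exponential distribution with mean 1) and shows that the sample means are far less skewed than the population itself.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
sample_size = 40    # same size as Sample 1 in the example
repeats = 5000      # number of simulated samples

# Hypothetical population: exponential with mean 1 (skewness of 2).
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(repeats)
]

center = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)
# Skewness = average cubed standardized deviation; 0 for a symmetric curve.
skewness = statistics.fmean([((m - center) / spread) ** 3 for m in sample_means])

print(f"mean of sample means     = {center:.3f}")
print(f"std dev of sample means  = {spread:.3f}")
print(f"skewness of sample means = {skewness:.3f}")
```

The standard deviation of the sample means should come out close to 1/SQRT(40), which is about 0.158, and the skewness should be well below the population's value of 2.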

 

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:

Step 1) Create the Null Hypothesis and the Alternative Hypothesis

Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical z Value Test

 

Proceeding through the four steps is done as follows:

 

Step 1 – Create the Null and Alternative Hypotheses

The Null Hypothesis is always an equality and states that the items being compared are the same. In this case, the Null Hypothesis would state that the average daily number of units produced by both shifts is the same. We will use the variable x_bar1-x_bar2 to represent the difference between the means of the two groups. If the means for both groups are the same, then the difference between the two means, x_bar1-x_bar2, would equal zero. The Null Hypothesis is as follows:

H0: x_bar1-x_bar2 = Constant = 0

The Alternative Hypothesis is always an inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the first mean (x_bar1) is different than the second mean (x_bar2). The Alternative Hypothesis is as follows:

H1: x_bar1-x_bar2 ≠ Constant

H1: x_bar1-x_bar2 ≠ 0

The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous (requires a greater difference between the two entities being compared before the test shows that there is a difference) than a one-tailed test.

Parameters necessary to map the distributed variable, x_bar1-x_bar2, to the normal distribution are the following:

 

Step 2 – Map the Distributed Variable on a Normal Distribution Curve

H0: x_bar1-x_bar2 = Constant = 0

n1 = 40

n2 = 36

Var1 = σ1² = (25.5)² = 650.25

Var2 = σ2² = (11.2)² = 125.44

 

Unpooled Population Standard Error


SE = SQRT[ (Var1/n1) + (Var2/n2) ]

SE = SQRT[ (650.25/40) + (125.44/36) ]

SE = 4.443
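The Standard Error calculation above can be double-checked outside of Excel. The following Python sketch simply evaluates the same formula with the known population variances; the variable names are chosen for this illustration.

```python
import math

var_1, n_1 = 650.25, 40   # Shift A: population variance and sample size
var_2, n_2 = 125.44, 36   # Shift B: population variance and sample size

# Unpooled Standard Error: each variance is divided by its own sample size.
standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)

print(f"SE = {standard_error:.3f}")
```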

(Image: non-standardized normal distribution curve of x_bar1-x_bar2)

This non-standardized normal distribution curve has its mean set to equal the Constant taken from the Null Hypothesis, which is:

H0: x_bar1-x_bar2 = Constant = 0

This non-standardized normal distribution curve is constructed from the following parameters:

Mean = 0

Standard Error = 4.443

Distributed Variable = x_bar1-x_bar2

 

Step 3 – Map the Regions of Acceptance and Rejection

The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given level of certainty. If the two things being compared are far enough apart from each other, the Null Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying to show graphically how different x_bar1 is from x_bar2 by showing how different x_bar1-x_bar2 (4.31) is from zero.

The non-standardized normal distribution curve can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical Value.

If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a Region of Rejection, the Null Hypothesis is rejected. If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a Region of Acceptance, the Null Hypothesis is not rejected.

The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve.

The operator in the Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed, which outer tail contains the Region of Rejection. The Alternative Hypothesis is as follows:

H1: x_bar1-x_bar2 ≠ 0

A “not equal” operator indicates that this will be a two-tailed test. This means that the Region of Rejection is split between both outer tails.

The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The locations of these Critical Values need to be calculated.

Calculate the Critical Values

Two-Tailed Critical Values

Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE

Critical Values = Mean ± NORM.S.INV(1-α/2) * SE

Critical Values = 0 ± NORM.S.INV(1 - 0.05/2) * 4.443

Critical Values = 0 ± NORM.S.INV(0.975) * 4.443

Critical Values = 0 ± 8.708

Critical Values = -8.708 and 8.708

The Region of Rejection is therefore everything that is to the right of 8.708 and everything to the left of -8.708.
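The Critical Values can be reproduced with Python's standard library, where NormalDist().inv_cdf plays the role of Excel's NORM.S.INV. This is a sketch of the same calculation, with variable names chosen for this illustration.

```python
from statistics import NormalDist

alpha = 0.05
null_mean = 0.0          # Constant from the Null Hypothesis
standard_error = 4.443   # from Step 2

# Number of Standard Errors from the mean to each Region of Rejection.
critical_z = NormalDist().inv_cdf(1 - alpha / 2)
margin = critical_z * standard_error

lower_critical_value = null_mean - margin
upper_critical_value = null_mean + margin

print(f"Critical Values = {lower_critical_value:.3f} and {upper_critical_value:.3f}")
```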

The following Excel-generated distribution curve shows the blue Region of Acceptance and the yellow Regions of Rejection:

(Image: normal distribution curve of x_bar1-x_bar2 with the Region of Acceptance between -8.708 and 8.708 and a Region of Rejection in each outer tail)

 

Step 4 – Determine Whether to Reject Null Hypothesis

The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that make this determination. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:

 

1) Compare x_bar1-x_bar2 With Critical Value

Reject the Null Hypothesis if the observed difference between the sample means, x_bar1-x_bar2 = 4.31, falls into a Region of Rejection. Fail to reject the Null Hypothesis if the observed difference, x_bar1-x_bar2 = 4.31, falls into the Region of Acceptance.

Equivalently, reject the Null Hypothesis if x_bar1-x_bar2 is farther from the curve’s mean of 0 than the Critical Value. Fail to reject the Null Hypothesis if x_bar1-x_bar2 is closer to the curve’s mean of 0 than the Critical Value.

The Critical Values have been calculated to be -8.708 on the left and +8.708 on the right. x_bar1-x_bar2 (4.31) is closer to the curve mean (0) than the right Critical Value (+8.708). The Null Hypothesis is therefore not rejected.
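The Critical Value Test amounts to checking which region the observed difference falls into. A minimal Python sketch of that check follows; the function name in_region_of_rejection is an assumption made for this illustration.

```python
def in_region_of_rejection(observed, lower_critical_value, upper_critical_value):
    """True if the observed value lies beyond either two-tailed Critical Value."""
    return observed < lower_critical_value or observed > upper_critical_value


observed_difference = 4.31   # x_bar1 - x_bar2
reject_null = in_region_of_rejection(observed_difference, -8.708, 8.708)

print("Reject H0" if reject_null else "Fail to reject H0")
```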

 

2) Compare the z Score with the Critical z Value

The z Score is the number of Standard Errors that x_bar1-x_bar2 (4.31) is from the curve’s mean of 0.

The Critical z Value is the number of Standard Errors that the Critical Value is from the curve’s mean.

Reject the Null Hypothesis if the z Score is farther from the standardized mean of zero than the Critical z Value. Fail to reject the Null Hypothesis if the z Score is closer to the standardized mean of zero than the Critical z Value.


Z Score (Test Statistic) = [ (x_bar1-x_bar2) – Constant ] / SE

The Constant is the Constant from the Null Hypothesis (H0: x_bar1-x_bar2 = Constant = 0)

Z Score (Test Statistic) = (4.31 – 0)/4.443

Z Score (Test Statistic) = 0.97

This means that the observed difference between the sample means, x_bar1-x_bar2 (4.31), is 0.97 standard errors from the curve mean (0).

Two-tailed Critical z Values = ±NORM.S.INV(1-α/2)

Two-tailed Critical z Values = ±NORM.S.INV(1-0.05/2)

Two-tailed Critical z Values = ±NORM.S.INV(0.975) = ±1.9599

This means that the boundaries between the Region of Acceptance and the Region of Rejection are 1.9599 standard errors from the curve mean on each side since this is a two-tailed test.

The Null Hypothesis is not rejected because the z Score (+0.97) is closer to the standardized mean of zero than the Critical z Value on the right side (+1.9599).
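The comparison of the z Score with the Critical z Value can be sketched in Python as follows. The variable names are chosen for this illustration, and NormalDist().inv_cdf is used as the counterpart of Excel's NORM.S.INV.

```python
from statistics import NormalDist

observed_difference = 4.31       # x_bar1 - x_bar2
hypothesized_difference = 0.0    # Constant from the Null Hypothesis
standard_error = 4.443
alpha = 0.05

# z Score: distance of the observed difference from the Constant, in Standard Errors.
z = (observed_difference - hypothesized_difference) / standard_error
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed test: compare distances from zero, ignoring sign.
reject_null = abs(z) > critical_z

print(f"z Score = {z:.2f}, Critical z Value = ±{critical_z:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```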

 

3) Compare the p Value With Alpha

The p Value is the percent of the curve that is beyond x_bar1-x_bar2 (4.31). If the p Value is smaller than Alpha/2, the Null Hypothesis is rejected. If the p Value is larger than Alpha/2, the Null Hypothesis is not rejected.

p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))

p Value =MIN(NORM.S.DIST(0.97,TRUE),1-NORM.S.DIST(0.97,TRUE))

p Value = 0.1660

The p Value (0.1660) is larger than Alpha/2 (0.025), the size of the Region of Rejection in the right tail, and we therefore do not reject the Null Hypothesis.
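The p Value formula above translates directly to Python, with NormalDist().cdf standing in for NORM.S.DIST(z,TRUE). This is a sketch of the same one-tail calculation; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

z = 0.97
alpha = 0.05

# Area under the standard normal curve to the left of z.
area_left = NormalDist().cdf(z)

# Mirrors MIN(NORM.S.DIST(z,TRUE), 1-NORM.S.DIST(z,TRUE)): the smaller tail area.
p_value = min(area_left, 1 - area_left)

# Two-tailed test: the one-tail p Value is compared with Alpha/2.
reject_null = p_value < alpha / 2

print(f"p Value = {p_value:.4f}, Alpha/2 = {alpha / 2}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Doubling this one-tail p Value and comparing the result with Alpha itself leads to the same decision.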

The following Excel-generated graph shows that the red p Value (the curve area beyond x_bar1-x_bar2) is larger than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails.

(Image: normal distribution curve showing the red p Value area beyond x_bar1-x_bar2 and the yellow Regions of Rejection in both outer tails)

 

Excel Data Analysis Tool Shortcut

This two-independent-sample, unpooled z-Test can be solved much more quickly using the following Excel data analysis tool:

z-Test: Two Sample for Means. This tool calculates the z Score and p Value using the same formulas for an unpooled, two-sample z-Test that are shown above. This tool can be accessed by clicking Data Analysis under the Data tab. The entire Data Analysis ToolPak is an add-in that ships with Excel but must first be activated by the user before it is available.

Note that this tool requires that all data in each sample group be placed in a single column. In the following image, only the first 19 data points of each sample are showing.

(Image: worksheet data for both samples and the output of the z-Test: Two Sample for Means tool)

The completed dialogue box for this tool which produced the preceding output is as follows:

(Image: completed dialogue box for the z-Test: Two Sample for Means tool)
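For readers who want to check the tool's results outside of Excel, the whole calculation can be sketched as one Python function that takes two columns of raw data plus the known population variances. The function name, the dictionary keys, and the sample data below are all assumptions made for this illustration; the data is hypothetical and is not the data shown in the article's images, and the keys are not guaranteed to match the labels in Excel's output.

```python
import math
from statistics import NormalDist, fmean


def two_sample_z_test(sample_1, sample_2, var_1, var_2,
                      hypothesized_difference=0.0, alpha=0.05):
    """Unpooled two-sample z-Test using known population variances."""
    n_1, n_2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = fmean(sample_1), fmean(sample_2)
    standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
    z = (mean_1 - mean_2 - hypothesized_difference) / standard_error
    std_normal = NormalDist()
    p_one_tail = 1 - std_normal.cdf(abs(z))
    return {
        "mean_1": mean_1,
        "mean_2": mean_2,
        "z": z,
        "p_one_tail": p_one_tail,
        "p_two_tail": 2 * p_one_tail,
        "z_critical_one_tail": std_normal.inv_cdf(1 - alpha),
        "z_critical_two_tail": std_normal.inv_cdf(1 - alpha / 2),
    }


# Hypothetical daily unit counts, NOT the article's data.
shift_a = [50, 44, 47, 45]
shift_b = [41, 43, 42, 42]

result = two_sample_z_test(shift_a, shift_b, var_1=650.25, var_2=125.44)
for name, value in result.items():
    print(f"{name:>20}: {value:.4f}")
```

The sample with the larger mean is passed first, following the advice given earlier, so that the z Score comes out positive.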

 

