Friday, May 30, 2014

2-Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups

This is one of the following eight articles on 2-Independent-Sample Pooled t-Tests in Excel

2-Independent-Sample Pooled t-Test in 4 Steps in Excel 2010 and Excel 2013

Excel Variance Tests: Levene’s, Brown-Forsythe, and F Test For 2-Sample Pooled t-Test in Excel 2010 and Excel 2013

Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro Wilk Tests For Two-Sample Pooled t-Test

Two-Independent-Sample Pooled t-Test - All Excel Calculations

2-Sample Pooled t-Test – Effect Size in Excel 2010 and Excel 2013

2-Sample Pooled t-Test Power With G*Power Utility

Mann-Whitney U Test in 12 Steps in Excel as 2-Sample Pooled t-Test Nonparametric Alternative in Excel 2010 and Excel 2013

2-Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups

 

How Sample Standard Deviation Affects t-Test Results

When the standard deviation in sample groups is increased, the sample groups become harder to tell apart. This might be more intuitive to understand if presented visually.

Below are box plots of three sample groups each having a small sample standard deviation:

[Image: box plots of three sample groups, each with a small standard deviation]

Each of the sample groups is visually easy to differentiate from the others. The measures of spread - standard deviation and variance - are shown for each sample group. Remember that variance equals standard deviation squared.

If each sample group’s spread is increased (widened), the sample groups become much harder to differentiate from each other. The graph shown below is of three sample groups having the same means as above but much wider spread.

[Image: box plots of three sample groups with the same means but much wider spread]

It is easy to differentiate the sample groups in the top graph but much harder to differentiate the sample groups in the bottom graph, simply because the sample groups in the bottom graph have much wider spread.

In statistical terms, one could say that it is easy to tell that the samples in the top graph were drawn from different populations. It is much more difficult to say whether the sample groups in the bottom graph were drawn from different populations.

 

Relationship Between the Two-Independent-Sample, Pooled t-Test and Single-Factor ANOVA

The preceding illustrates the underlying principle behind both t-tests and ANOVA tests. One of the main purposes of both t-tests and ANOVA tests is to determine whether samples are from the same populations or from different populations. The variance (or, equivalently, the standard deviation) of the sample groups is what determines how difficult it is to tell the sample groups apart.

When applied to two sample groups, the two-independent-sample, pooled t-test is essentially the same test as single-factor ANOVA. The two-independent-sample, pooled t-test can only be applied to two sample groups at one time. Single-Factor ANOVA can be applied to two or more groups at one time. Both the two-independent-sample, pooled t-test and single-factor ANOVA require that the variances of the sample groups be similar.

We will apply both the two-independent sample t-test and single-factor ANOVA to the first two samples in each of the above graphs to verify that the results are equivalent.

 

Sample Groups With Small Variances (the first graph)

[Image: box plots of the three sample groups with small variances]

Applying a two-independent-sample, pooled t-test to the first two of the three sample groups of this graph would produce the following result:

[Image: Excel output of the two-independent-sample, pooled t-test]

This result would have been obtained by filling in the Excel dialogue box as follows:

[Image: Excel t-test dialogue box]

Running Single-Factor ANOVA on those same two sample groups would produce this result:

[Image: Excel Single-Factor ANOVA output]

This result would have been obtained by filling in the Excel dialogue box as follows:

[Image: Excel Single-Factor ANOVA dialogue box]

Both the Two-Independent-Sample, Pooled t-test and the Single-Factor ANOVA test produce the same result when applied to these two sample groups. They both produce the same p Value (1.51E-10), which is extremely small. This indicates that the difference in the means of the two groups is statistically significant. More precisely, if the two samples had come from the same population, there would be only a very small chance (1.51E-10) of obtaining a difference in means at least this large merely as a random occurrence.

 

Sample Groups With Large Variances (the second graph)

[Image: box plots of the three sample groups with large variances]

Applying a two-independent sample t-test to the first two of the three sample groups in this graph would produce the following result:

[Image: Excel output of the two-independent-sample, pooled t-test]

This result would have been obtained by filling in the Excel dialogue box as follows:

[Image: Excel t-test dialogue box]

Running Single-Factor ANOVA on those same two sample groups would produce this result:

[Image: Excel Single-Factor ANOVA output]

This result would have been obtained by filling in the Excel dialogue box as follows:

[Image: Excel Single-Factor ANOVA dialogue box]

Both the t-test and the ANOVA test produce the same result when applied to these two sample groups. They both produce the same p Value (0.230876), which is relatively large. A 95 percent level of confidence, corresponding to a significance level (alpha) of 0.05, is the standard usually required in statistical hypothesis tests to conclude that results are statistically significant. The p Value must be less than 0.05 to meet that standard. The sample groups with the large spread produced a p Value greater than 0.05, and we therefore cannot reject the Null Hypothesis, which states that the two sample groups were drawn from populations having the same mean. The results are not statistically significant, and we cannot conclude that the two samples were drawn from different populations.

 

Showing How the Formulas For Both the t-Test and for ANOVA Produce the Same Result

 

t-Test Formula

The Two-Independent-Sample, Pooled t-Test is used to determine with a specific degree of certainty whether there really is a difference between the mean values of two sample groups given a similar amount of variance in each of the two sample groups.

If the sample standard deviation in each of the two sample groups, s1 and s2, is large, then the Pooled Standard Deviation will also be large, as can be seen from the following equation:

 

Pooled Sample Standard Deviation

sPooled = SQRT[{(n1-1)*s1^2 + (n2-1)*s2^2}/df]

where df = n1 + n2 - 2 (the degrees of freedom).

This, in turn, increases the length of one Standard Error, SEPooled, as can be seen in the following equation:

 

Pooled Sample Standard Error

SEPooled =  sPooled *SQRT(1/n1 + 1/n2)

This, in turn, decreases the t Value, as can be seen in the following equation:

t Value = (x_bar1-x_bar2) / SEPooled

The larger the absolute t Value, the stronger the evidence that the sample groups are different, i.e., came from different populations.

The bottom line is that increased variance (or, equivalently, standard deviation) in the sample groups causes the t Value to be smaller. This makes it less likely that a t-Test will show that the sample groups are really different.
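The three formulas above can be sketched in a few lines of Python as a cross-check outside Excel. This is only an illustrative sketch: the function name `pooled_t` and the four data lists are hypothetical and are not the data behind the graphs in this article. Both pairs of groups have means of 10 and 12; only the spread differs.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return the t Value of a two-independent-sample, pooled t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # variance() returns the sample variance s^2 (n - 1 denominator)
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    )
    se_pooled = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    return (mean(group1) - mean(group2)) / se_pooled


# Hypothetical data: both pairs have means of 10 and 12; only the spread differs.
tight_1 = [9.8, 10.0, 10.2, 9.9, 10.1]
tight_2 = [11.8, 12.0, 12.2, 11.9, 12.1]
wide_1 = [6.0, 10.0, 14.0, 8.0, 12.0]
wide_2 = [8.0, 12.0, 16.0, 10.0, 14.0]

t_tight = pooled_t(tight_1, tight_2)  # roughly -20
t_wide = pooled_t(wide_1, wide_2)     # roughly -1

print(f"small spread: t = {t_tight:.2f}")
print(f"large spread: t = {t_wide:.2f}")
```

With the same difference in means, the wider groups give a t Value about twenty times smaller in magnitude, which is the effect described above.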

 

ANOVA

The ANOVA outputs of the previous two comparisons demonstrate the following:

The smaller the p Value is, the more certainty exists that sample groups are really different, i.e., that the sample groups came from different populations.

The p Value is derived from the F value. The larger the F Value, the smaller is the p Value.

The F value can be roughly described as being the variation between groups divided by the variation within groups (the spread of the groups).

As the spread (standard deviation) of the sample groups increases, the F Value becomes smaller. When the F Value becomes smaller, the p Value becomes larger. The larger the p Value becomes, the less certainty exists that the ANOVA results are statistically significant (real). If the results are not statistically significant, we cannot reject the Null Hypothesis, which states that the sample groups are the same (drawn from the same population).
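For readers who want to see why the two tests must agree, the following short derivation shows that, for two sample groups, the ANOVA F Value equals the square of the pooled t Value. The labels SS_Between, MS_Between, and MS_Within are the usual ANOVA sums of squares and mean squares and are notation assumed here; the remaining symbols follow the t-Test formulas given earlier.

```latex
\begin{align*}
\bar{x} &= \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \\
SS_{Between} &= n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2
  = \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2
  = \frac{(\bar{x}_1 - \bar{x}_2)^2}{1/n_1 + 1/n_2} \\
MS_{Between} &= \frac{SS_{Between}}{2 - 1} = SS_{Between} \\
MS_{Within} &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = s_{Pooled}^2 \\
F &= \frac{MS_{Between}}{MS_{Within}}
  = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_{Pooled}^2\,(1/n_1 + 1/n_2)}
  = \left(\frac{\bar{x}_1 - \bar{x}_2}{SE_{Pooled}}\right)^2 = t^2
\end{align*}
```

Because an F distribution with 1 and df degrees of freedom is the distribution of the square of a t variable with df degrees of freedom, the ANOVA p Value is the same as the two-tailed p Value of the pooled t-test.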

Bottom line: the larger the standard deviation of sample groups being compared with a two-independent-sample, pooled t-Test or single-factor ANOVA, the harder it is to state that the sample groups are truly different, i.e., that the sample groups come from different populations.
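The equivalence of the two tests can also be checked numerically. The sketch below computes the pooled t Value and the single-factor ANOVA F Value for the same two groups and compares F with t squared. The function names and the two data lists are hypothetical, chosen only for illustration, and the groups are deliberately of unequal size.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """t Value of a two-independent-sample, pooled t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    )
    return (mean(group1) - mean(group2)) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))


def anova_f(*groups):
    """F Value of single-factor ANOVA: variation between groups / variation within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical sample groups of unequal size
group_a = [4.1, 5.0, 5.6, 4.7, 5.2]
group_b = [6.3, 5.9, 7.1, 6.6]

t = pooled_t(group_a, group_b)
f = anova_f(group_a, group_b)

print(f"t^2 = {t ** 2:.6f}")
print(f"F   = {f:.6f}")
print("F equals t^2:", math.isclose(f, t ** 2, rel_tol=1e-9))
```

Only the test statistics are computed here; turning them into p Values would require the t or F distribution, which Excel provides and the Python standard library does not.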

 
