# Overview of ANOVA

Single-Factor ANOVA tests whether a significant proportion of the variation present in a data set can be accounted for by a single factor that affects the objects being measured.

Two-Factor ANOVA tests whether a significant proportion of the variation present in a data set can be accounted for by either or both of two factors that simultaneously affect the objects being measured.

Two-Factor ANOVA can also be used to test whether a significant proportion of the variation present in a data set can be accounted for by the interaction between two factors that simultaneously affect the objects being measured.

## Two-Factor ANOVA Without Replication Example in Excel

Excel provides two options for Two-Factor ANOVA. This Excel test can be performed with replication or without replication. The difference is fairly simple. Two-Factor ANOVA without replication contains exactly one data point for each possible combination of levels between the two factors.

Two-Factor ANOVA without replication should not be considered a reliable statistical test because the data samples on which it is based are always too small. This will be discussed shortly.

An example of a data set for two-factor ANOVA without replication is shown below:

Factor 1 contains four levels and Factor 2 contains three levels. There are 12 possible combinations of levels between Factors 1 and 2. Each of these 12 combinations is a unique treatment cell and contains a single data observation, so the data set holds 12 observations in total.

Two-factor ANOVA with replication contains more than one observation for each combination of factor levels, and it requires an equal number of data observations for every combination. This arrangement of data is referred to as being “balanced”: each treatment cell (unique combination of factor levels) contains the same number of data observations. It is possible to conduct an unbalanced two-factor ANOVA, but that is much more complicated and will not be discussed here.

Performing two-factor ANOVA without replication can be done by selecting the Data Analysis tool entitled Anova: Two-Factor Without Replication and then completing the tool’s dialogue box as follows:

Hitting the OK button will produce the following output:

The output shown here can be interpreted as follows:

The p Value associated with the main effect of Factor 1 (the factor whose levels are arranged in rows) is 0.0734. This is not significant at an alpha of 0.05. By this measure, Factor 1 has not had a significant effect on the data.

The p Value associated with the main effect of Factor 2 (the factor whose levels are arranged in columns) is 0.0417. This is significant at an alpha of 0.05. By this measure, Factor 2 has had a significant effect on the data.
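The arithmetic behind Excel’s output can be reproduced outside Excel. The following sketch, using a hypothetical 4 x 3 data grid (the actual values from this example are not reproduced here), computes the same sums of squares, F statistics, and p Values that Anova: Two-Factor Without Replication reports:

```python
import numpy as np
from scipy import stats

# Hypothetical 4 x 3 data grid: rows = levels of Factor 1, columns = levels of Factor 2.
# One observation per treatment cell (no replication).
data = np.array([
    [31.0, 27.0, 22.0],
    [34.0, 30.0, 25.0],
    [35.0, 33.0, 28.0],
    [40.0, 33.0, 31.0],
])

r, c = data.shape                       # 4 row levels, 3 column levels
grand_mean = data.mean()

# Sums of squares
ss_rows = c * ((data.mean(axis=1) - grand_mean) ** 2).sum()   # Factor 1 (rows)
ss_cols = r * ((data.mean(axis=0) - grand_mean) ** 2).sum()   # Factor 2 (columns)
ss_total = ((data - grand_mean) ** 2).sum()
ss_error = ss_total - ss_rows - ss_cols

# Degrees of freedom: (r - 1), (c - 1), and (r - 1)(c - 1) for error
df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols

# One F-test per factor, each using the same error term
f_rows = (ss_rows / df_rows) / (ss_error / df_error)
f_cols = (ss_cols / df_cols) / (ss_error / df_error)
p_rows = stats.f.sf(f_rows, df_rows, df_error)
p_cols = stats.f.sf(f_cols, df_cols, df_error)

print(f"Factor 1 (rows):    F = {f_rows:.3f}, p = {p_rows:.4f}")
print(f"Factor 2 (columns): F = {f_cols:.3f}, p = {p_cols:.4f}")
```

Note that the error term has only (4 − 1)(3 − 1) = 6 degrees of freedom, which is the root of the reliability problem discussed next.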

There is, however, one major issue that dramatically reduces the validity of the conclusions just shown. Two-Factor ANOVA without replication nearly always tests too little data to be considered reliable. Because each combination of levels contains only a single data observation, the number of observations in each level group is very small, as is the total number of observations. This affects the validity of the test results in the following two important ways:

### 1) Small Sample Size Makes ANOVA’s Required Assumptions Unverifiable.

ANOVA’s required assumptions that data come from normally-distributed populations having similar variances cannot be verified. ANOVA’s required assumptions of data normality and homoscedasticity (similarity of variances) are derived from the requirements of the F-tests that are performed in the ANOVA tests. Two-Factor ANOVA performs a separate F-test for each factor that is tested. This can be seen in the Excel ANOVA output shown in this section. Each F-test requires that the data from all data groups used to construct the Sum of Squares be taken from populations that are normally distributed and have similar variances. Group sizes for Two-Factor ANOVA without replication are nearly always smaller than ten. This size is too small to credibly validate ANOVA’s required assumptions of data normality and similar variances within the groups of each F test.
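To illustrate how little a normality test can do with groups this small, the following sketch (a hypothetical simulation, not part of the original example) repeatedly draws clearly non-normal samples of size 3 and counts how often the Shapiro-Wilk test detects the non-normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
trials = 2000

# Draw clearly non-normal (exponential) samples of size 3 -- a typical group
# size in two-factor ANOVA without replication -- and count how often the
# Shapiro-Wilk test rejects normality.
rejections = 0
for _ in range(trials):
    sample = rng.exponential(scale=1.0, size=3)
    _, p = stats.shapiro(sample)
    if p < alpha:
        rejections += 1

rejection_rate = rejections / trials
print(f"Shapiro-Wilk detected the non-normality in {rejection_rate:.1%} of trials")
```

Even against strongly skewed data, the detection rate at these sample sizes stays close to the false-positive rate alpha, so a non-significant normality test at such sizes tells us almost nothing.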

### 2) Small Sample Size Reduces the Test’s Power to an Unacceptably Low Level.

The small group sizes reduce the ANOVA test’s power to an unacceptable level. A statistical test’s power is its probability of detecting an effect of a specified size. Power is defined as 1 – β, where beta (β) is the test’s probability of a type 2 error. A type 2 error is a false negative. In other words, β is a test’s probability of not detecting an effect that should have been detected, and 1 – β (the power of the test) is its probability of detecting such an effect. Calculating the power of an ANOVA test is tedious, but fortunately a number of freely available online utilities can quickly calculate an ANOVA test’s power. The power of Two-Factor ANOVA without replication is discussed in detail below.

## Power Analysis of Two-Factor ANOVA Without Replication

The accuracy of a statistical test is very dependent upon the sample size: the larger the sample size, the more reliable the test’s results will be. The accuracy of a statistical test is specified as the Power of the test. A statistical test’s Power is the probability that the test will detect an effect of a given size at a given level of significance (alpha). The relationships are as follows:

α (“alpha”) = Level of Significance = 1 – Level of Confidence

α = probability of a type 1 error (a false positive)

α = probability of detecting an effect where there is none

β (“beta”) = probability of a type 2 error (a false negative)

β = probability of not detecting a real effect

1 – β = probability of detecting a real effect

Power = 1 – β

Power needs to be clarified further. Power is the probability of detecting a real effect of a given size at a given Level of Significance (alpha) at a given total sample size and number of groups.

The term Power can be described as the accuracy of a statistical test. The Power of a statistical test is related to alpha, sample size, and effect size in the following ways:

1) The larger the sample size, the larger is a test’s Power because a larger sample size increases a statistical test’s accuracy.

2) The larger alpha is, the larger is a test’s Power because a larger alpha reduces the amount of confidence needed to validate a statistical test’s result. Alpha = 1 – Level of Confidence. The lower the Level of Confidence needed, the more likely a statistical test will detect an effect.

3) The larger the specified effect size, the larger is a test’s Power because a larger effect size is more likely to be detected by a statistical test.
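These three relationships can be verified numerically with the noncentral F distribution, which underlies ANOVA power calculations. The sketch below assumes G*Power’s model for a fixed-effects ANOVA (noncentrality λ = f² x N, denominator df = N − number of groups); the parameter values are illustrative only:

```python
from scipy import stats

def anova_power(f_effect, n_total, df_num, n_groups, alpha=0.05):
    """Power of a fixed-effects ANOVA F-test (G*Power-style model).

    Noncentrality parameter lambda = f^2 * N; denominator df = N - k.
    """
    df_den = n_total - n_groups
    nc = f_effect ** 2 * n_total
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return stats.ncf.sf(f_crit, df_num, df_den, nc)

# Baseline: medium effect (f = 0.25), N = 48, numerator df = 3, 12 groups.
p_base = anova_power(0.25, 48, 3, 12)

print(anova_power(0.25, 96, 3, 12) > p_base)               # 1) larger N -> more Power
print(anova_power(0.25, 48, 3, 12, alpha=0.10) > p_base)   # 2) larger alpha -> more Power
print(anova_power(0.40, 48, 3, 12) > p_base)               # 3) larger effect -> more Power
```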

If any three of the four related factors (Power, alpha, sample size, and effect size) are known, the fourth can be calculated. These calculations can be very tedious. Fortunately, there are a number of free utilities available online that can calculate a test’s Power or the sample size needed to achieve a specified Power. One very convenient and easy-to-use downloadable Power calculator called G*Power is available at the following link at the time of this writing:

http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

Power calculations are generally used in two ways:

1) A priori - Calculation of the minimum sample size needed to achieve a specified Power to detect an effect of a given size at a given alpha. This is the most common use of Power analysis and is normally conducted a priori (before the test is conducted) when designing the test. A Power level of 80 percent for a given alpha and effect size is a common target. Sample size is increased until the desired Power level is achieved. Since Power equals 1 – β, the β implied by the targeted Power level represents the highest acceptable probability of a type 2 error (a false negative – failing to detect a real effect). Calculation of the sample size necessary to achieve a specified Power requires three input variables:

a) Power level – This is often set at 0.8, meaning that the test has an 80 percent chance of detecting an effect of a given size.

b) Effect size - Effect sizes are specified by the variable f. Effect size f is calculated from a different measure of effect size called η² (eta squared): η² = SS_Between_Groups / SS_Total. These two terms are part of the ANOVA calculations found in the Single-Factor ANOVA output.

The relationship between effect size f and effect size η² is f = √(η² / (1 – η²)). Jacob Cohen, in his landmark 1988 book Statistical Power Analysis for the Behavioral Sciences, proposed that effect sizes could be generalized as follows:

η² = 0.01 for a small effect. A small effect is one that is not easily observable.

η² = 0.059 for a medium effect. A medium effect is more easily detected than a small effect but less easily detected than a large effect.

η² = 0.14 for a large effect. A large effect is one that is readily observable.

The above values of η2 produce the following values of effect size f:

f = 0.1 for a small effect

f = 0.25 for a medium effect

f = 0.4 for a large effect

c) Alpha – This is commonly set at 0.05.
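These benchmark f values follow from the conversion formula f = √(η² / (1 − η²)). The following sketch checks Cohen’s benchmarks using the unrounded η² figures (0.0099, 0.0588, and 0.1379):

```python
import math

def eta_squared_to_f(eta_sq):
    """Convert eta-squared to Cohen's effect size f: f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))

# Cohen's conventional benchmarks for small, medium, and large effects
for label, eta_sq in [("small", 0.0099), ("medium", 0.0588), ("large", 0.1379)]:
    print(f"{label}: eta^2 = {eta_sq} -> f = {eta_squared_to_f(eta_sq):.2f}")
```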

### Performing a priori Power Analysis for the Main Effect of Factor 1

The G*Power utility will be used in an a priori manner to demonstrate how low the Power of two-factor ANOVA without replication is. The example used earlier in this chapter will be analyzed.

Two-Factor ANOVA without replication has two factors but no term to account for the interaction between them; interaction effects cannot be estimated with only one observation per treatment cell. Each factor has its own unique Power that must be calculated. The Power for each factor is the probability that the ANOVA test will detect an effect of a given size caused by that factor. A separate Power calculation can be performed for each of the two factors in this example.

Power analysis performed a priori calculates how large the total sample size must be to achieve a specific Power level to detect an effect of a specified size at a given alpha level. A priori Power analysis of the main effect of factor 1 of this example is done as follows:

The following parameters must be entered into the G*Power dialogue box for an a priori analysis of a general ANOVA:

Power (1 – β): 0.8 – This is a commonly used Power target. A test that achieves a Power level of 0.8 has an 80 percent chance of detecting the specified effect.

Effect size: 0.4 – This is a large effect. This analysis will calculate the sample size needed to achieve an 80 percent probability of detecting an effect of this size.

α (alpha): 0.05

Numerator df: 3 – The degrees of freedom specified for a test of a main effect of a factor equals the number of factor levels – 1. Factor 1 has 4 levels. This numerator df therefore equals 4 – 1 = 3. Note that this is the same df that is specified in the Excel ANOVA output for factor 1.

Number of groups: 12 – The number of groups equals (number of levels in factor 1) x (number of levels in factor 2). This equals 4 x 3 = 12. The number of groups equals the total number of unique treatment cells; one unique treatment cell exists for each unique combination of levels between the factors.

Running the G*Power analysis produces the following output:

This indicates that a total sample size of 73 is needed to achieve a Power level of 0.8 to detect a large main effect (f = 0.4) of factor 1. The total sample size for this example is only 12 because there are 12 total data observations in this ANOVA test.
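The required sample size can be approximated without G*Power by searching for the smallest N that reaches 80 percent power under the noncentral-F model that G*Power uses for fixed-effects ANOVA (λ = f² x N, denominator df = N − number of groups). The search should land near the 73 that G*Power reports; small differences can arise from implementation details:

```python
from scipy import stats

def anova_power(f_effect, n_total, df_num, n_groups, alpha=0.05):
    """Power of a fixed-effects ANOVA F-test (G*Power-style model)."""
    df_den = n_total - n_groups
    if df_den < 1:
        return 0.0  # test not computable with zero error df
    nc = f_effect ** 2 * n_total          # noncentrality parameter lambda
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return stats.ncf.sf(f_crit, df_num, df_den, nc)

# Smallest total N reaching 80 percent power for a large main effect of
# Factor 1 (f = 0.4, numerator df = 3, 12 treatment cells, alpha = 0.05).
n = 13
while anova_power(0.4, n, 3, 12) < 0.80:
    n += 1
print(f"Required total sample size: {n}")
```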

G*Power also creates an additional plot showing the Power of this test across a range of values for the total sample size. This plot confirms how low the Power of two-factor ANOVA without replication really is:

This plot shows the Power of this particular test using a total sample size of 12 to be slightly less than 0.1. This means that this two-factor ANOVA test has less than a 10 percent chance of detecting a large main effect caused by factor 1 if the total sample size is 12.

Two-factor ANOVA without replication is a two-factor ANOVA test performed on a data set having only a single data observation in each treatment cell. Performing this same test on a data set with two data observations in each treatment cell (total sample size of 24) would still only attain a Power level of approximately 0.25.

This plot shows that this two-factor ANOVA test would require at least 6 data observations in each treatment cell (total sample size equals 72) to achieve a Power level of 0.8 for a large main effect (f = 0.4) of factor 1 at alpha = 0.05.

## Conclusion

### Two-Factor ANOVA without replication nearly always tests too little data to be considered reliable.

The small group sizes that occur with two-factor ANOVA without replication reduce the test’s Power to an unacceptable level. Small group size also prevents validation of ANOVA’s required assumptions of data normality within groups and similar variances of all groups within each factor. The Excel output of the two-factor ANOVA without replication test conducted in this section shows Factor 2 to have a significant effect on the output (p Value = 0.0417) and Factor 1 not having a significant effect (p Value = 0.0734) at a significance level of alpha = 0.05. This would clearly not be a valid conclusion given the small group sizes and resulting lack of Power of this ANOVA test.
