ANOVA Using Randomized Block Design in Excel
Randomized Block Design ANOVA Overview
Randomized block design is a method used to perform single-factor ANOVA while partially removing the effects of another variable, sometimes called a confounding variable. A confounding variable is an additional variable that might be affecting all of the data values in unpredictable ways and possibly obscuring the intended result of the single-factor ANOVA test, which is to determine whether the original factor has a significant effect on data values.
Randomized block design is equivalent to two-factor ANOVA without replication. In Excel, randomized block design is implemented with the following Data Analysis tool: Two-Factor ANOVA Without Replication.
Data most suitable for analysis with randomized block design have much of the overall variance explained by two relatively unrelated factors. The data are placed into unique cells, each having a unique combination of levels of the two factors. Randomized block design requires that each unique cell contain only one data point, so that each unique combination of levels of the two factors is represented by a single data point.
The levels of the confounding factor are referred to as blocks. Each block contains a single random data point from each of the levels of the main factor being isolated and tested. Hence the name randomized block design. Following is an example of data correctly arranged for ANOVA testing using randomized block design. The main factor has its levels divided across columns, with each level being referred to in this case as a treatment. The confounding factor has its levels divided among rows, with each level being referred to as a block. Each unique combination of treatment and block is represented by a single data point as follows:
The main purpose of performing ANOVA testing using randomized block design is to isolate the effect of the main factor (whose levels are called treatments) from the effect of the confounding factor (whose levels are called blocks) so that the effect of the main factor can be analyzed with greater clarity.
One simple example of randomized block design might be to test the effects of four different types of fertilizer on crop yields. Each treatment of the main factor represents the use of one of the four available fertilizer types. Each measurement taken represents the total amount of the same crop harvested from a same-sized plot of farm land treated with the respective type of fertilizer.
This same experiment was performed on four different farms. Each farm has its own unique set of conditions that affect crop growth, such as different rainfall and soil nutrients. The confounding factor is the farm upon which each fertilizer experiment took place. Each farm is described as being its own block, or level of the confounding factor. The fertilizer experiment was replicated exactly on each of the four different farms, but individual differences inherent to each farm have an unpredictable effect on crop growth and are considered to be a confounding factor. ANOVA using randomized block design attempts to separate the effects of the confounding factor from the effects of the main factor so that the main factor can be analyzed with greater clarity.
ANOVA using randomized block design has the same overall purpose as single-factor ANOVA, i.e., determining whether the main factor has a significant effect on data values. A significant effect is defined as the occurrence of at least one of the sample groups of the main factor (one of the treatment groups) having a significantly different mean than any of the other treatment groups.
ANOVA using randomized block design has more power than single-factor ANOVA when applied to the same data set. Test power in ANOVA is the ability to detect a significant difference between sample group means if such a difference exists. The greater power of randomized block design is evidenced by the lower p Value that it calculates, compared with single-factor ANOVA, when both tests are applied to the same data set.
ANOVA tests are hypothesis tests. The Null Hypothesis of every ANOVA test states that all sample groups analyzed with the same F test came from the same population. The lower the p Value calculated by a hypothesis test, the stronger the evidence that at least one of the sample groups came from a different population than the other sample groups analyzed in the same F test. ANOVA using randomized block design will calculate a lower p Value than single-factor ANOVA when analyzing the significance of the effect of the main factor.
The reason is that ANOVA using randomized block design attempts to separate the effects of the confounding factor from those of the main factor, while single-factor ANOVA does not. ANOVA using randomized block design is still not the most desirable (powerful) test for determining whether a factor significantly affects data values when a confounding factor also exists. The small sample group sizes mandated by the structure of randomized block design generally cause this test to have relatively low power, as discussed below.
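The power advantage described above can be illustrated outside of Excel. The Python sketch below uses an invented 4x4 raw data matrix (the article's actual yields are shown only in images), computes the randomized block design p Value for the treatments by hand, and then runs single-factor ANOVA on the same four treatment columns. Because the invented block-to-block differences are large, removing them from the error term makes the blocked p Value far lower:

```python
# Hypothetical 4x4 randomized block data: rows are blocks (farms),
# columns are treatments (fertilizers). All numbers are invented.
from scipy import stats

data = [
    [10.1, 11.0, 12.2, 13.1],   # farm 1
    [20.3, 21.1, 22.0, 23.2],   # farm 2
    [30.2, 30.9, 32.1, 33.0],   # farm 3
    [40.0, 41.2, 42.1, 43.1],   # farm 4
]
b = len(data)       # number of blocks
t = len(data[0])    # number of treatments

grand = sum(sum(row) for row in data) / (b * t)
col_means = [sum(data[i][j] for i in range(b)) / b for j in range(t)]
row_means = [sum(row) / t for row in data]

# Partition the total sum of squares: treatments + blocks + error
ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_treat = b * sum((m - grand) ** 2 for m in col_means)
ss_block = t * sum((m - grand) ** 2 for m in row_means)
ss_error = ss_total - ss_treat - ss_block

df_treat, df_block = t - 1, b - 1
df_error = df_treat * df_block    # (t-1)(b-1) = 9

f_treat = (ss_treat / df_treat) / (ss_error / df_error)
p_rbd = stats.f.sf(f_treat, df_treat, df_error)

# Single-factor ANOVA on the same columns ignores the block structure,
# so the large farm-to-farm variance inflates the error term.
cols = [[data[i][j] for i in range(b)] for j in range(t)]
p_single = stats.f_oneway(*cols).pvalue
print(round(p_rbd, 6), round(p_single, 6))
```

With this invented data the blocked analysis detects the treatment effect easily while single-factor ANOVA does not; the ranking of the two p Values is the point, not the specific numbers.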
ANOVA With Randomized Block Design Usually Has Very Low Power
Randomized block design specifies that only one data value be collected for each unique combination of main-factor level and confounding-factor level. This nearly always ensures that the test will be of extremely low power because of the small total number of data points collected for the entire test. The power of a test is its ability to detect a significant difference if one exists. A test with low power has a high probability of making a type 2 error, i.e., a false negative (failing to detect a difference that actually exists).
Test power is nearly always increased when more random, representative sample data are collected and added to the correct sample groups. If the overall intention of the randomized block design test is to isolate the effects of the main factor from the effects of the confounding variable, a more effective (powerful) alternative is simply to collect more data and analyze the larger sample groups with two-factor ANOVA with replication using the following one-step Excel Data Analysis tool: Two-Factor ANOVA With Replication. Implementation of this technique will be shown in detail later in this blog article.
Statistical tests become more and more powerful as the total number of data points within sample groups increases. With this data layout, that is accomplished by replicating the blocks (the levels of the confounding variable). Each additional row of data consists of one new data point for every treatment within the replicated block.
Null and Alternative Hypotheses for ANOVA With Randomized Block Design
The Null Hypothesis for ANOVA with randomized block design is exactly like that of single-factor ANOVA and states that the sample groups of the main factor ALL come from the same population. An equivalent statement is that the respective populations from which the sample groups were drawn all have the same population mean. This would be written as follows:
Null Hypothesis = H0: µ1 = µ2 = … = µk (k equals the number of sample groups)
Note that Null Hypothesis is not referring to the sample means, x_bar1 , x_bar2 , … , x_bark, but to the population means, µ1 , µ2 , … , µk.
The Alternative Hypothesis for ANOVA with randomized block design states that at least one sample group is likely to have come from a different population. Like single-factor ANOVA, ANOVA with randomized block design is an omnibus test, meaning that the test does not clarify which groups are different or how large any of the differences between the groups are. The Alternative Hypothesis only states whether at least one sample group is likely to have come from a different population. An equivalent statement is that the population from which at least one sample group was drawn has a different population mean than the populations from which the other sample groups were drawn. This would be written as follows:
Alternative Hypothesis = H1: µi ≠ µj for some i and j
ANOVA With Randomized Block Design Example in Excel
The article will continue using the example of comparing the effects of four types of fertilizer (the four treatments, which are the levels of the main factor) on plots of farm land that exist on four different farms (the four blocks, which are the levels of the confounding factor).
Each farm has four identical plots of land, and each of the four plots will be fertilized with one of the four types of fertilizer. Growing conditions are identical in all plots within the same farm, but each farm has its own unique set of conditions that affect crop growth, such as different rainfall and soil nutrients. As before, the confounding factor is the farm upon which each fertilizer experiment took place, and each farm is its own block, or level of the confounding factor.
The same type of crop will be planted in all 16 plots (each of the four farms has four plots). The weight of the total crop harvested from each of the 16 plots is recorded as follows:
Converting the above into a generic randomized block design would be done as follows:
It is best to arrange the raw data as shown above with blocks on separate rows and treatments on separate columns. The reason is that ANOVA using randomized block design usually has very low power due to the small amount of sample data. Test power is defined as the ability to detect a difference from sample data when a difference actually exists between the populations from which the samples were drawn.
This test can be made more powerful by collecting more data. Each block of data would have to be replicated the same number of times. For example, if the entire experiment were replicated three times on each farm, the data would appear similar to the following:
This data matrix would now be analyzed with the Excel Data Analysis tool Two-Factor ANOVA With Replication. This would no longer be randomized block design because randomized block design specifies that only one data point be collected for each unique combination of block and treatment, as shown once again in the original raw data matrix.
ANOVA using randomized block design can be performed in Excel as follows:
Required Assumptions For ANOVA Using Randomized Block Design
ANOVA Using Randomized Block Design Has Assumptions That Are Very Similar To Single-Factor ANOVA
ANOVA using randomized block design has seven required assumptions whose validity should be confirmed before this test is applied. They are the following:
1) Independence of Sample Group Data Sample groups must be differentiated in such a way that there can be no cross-over of data between sample groups. No data observation in any sample group could have been legitimately placed in another sample group. No data observation affects the value of another data observation in the same group or in a different group. This is verified by an examination of the test procedure. In this case each of the 16 data points (the crop yields from the 16 plots of farm land) is independent because no crop yield affects the size of any other crop yield.
2) Sample Data Are Continuous Sample group data (the dependent variable’s measured value) can be ratio or interval data, which are the two major types of continuous data. Sample group data cannot be nominal or ordinal data, which are the two major types of categorical data. The measurement taken is the weight of the yield of each of the plots. Weight is a continuous variable that is classified as ratio because the zero point indicates an absence of the variable and not an arbitrary point on a scale.
3) Both Independent Variables (Factors) Are Categorical The main factor and the confounding factor are both categorical variables. The levels of the main factor are often called treatments and the levels of the confounding factor are called blocks. The main factor is the type of fertilizer being used. The confounding factor is the choice of farm.
4) Extreme Outliers Removed If Necessary ANOVA is a parametric test that relies upon calculation of the means of sample groups. Extreme outliers can skew the calculation of the mean. Outliers should be identified and evaluated for removal in all sample groups. Occasional outliers are to be expected in normally-distributed data, but all outliers should be evaluated to determine whether their inclusion will produce a less representative result of the overall data than their exclusion. There were no values of crop yields that could be considered extreme outliers.
5) Normally-Distributed Data In All Sample Groups ANOVA is a parametric test having the required assumption that the dependent-variable data from each sample group come from a normally-distributed population. Each sample group’s dependent-variable data should be tested for normality. Normality testing becomes significantly less powerful (accurate) when a group’s size falls below 20. An effort should be made to obtain group sizes that exceed 20 to ensure that normality tests will provide accurate results. Like single-factor ANOVA, ANOVA using randomized block design is relatively robust to minor deviations from sample group normality. Verifying normality of treatment groups (sample groups that represent levels of the main factor) is much more important than verifying normality of blocks (sample groups that represent levels of the confounding variable) because the objective of this ANOVA is to determine whether the main factor has a significant effect on data values. The only important F test is the one performed on the treatment groups, and the sample groups that are part of this F test should be normally distributed.
Normality testing cannot be reliably performed here because the main factor sample groups (treatment groups) have only four data points each. That is too few.
6) Relatively Similar Variances In Sample Groups of the Same F Test ANOVA requires that sample groups analyzed in the same F test have similar variances. Similar sample variances indicate that the populations from which the samples were taken have similar population variances. The sample variances do not have to be exactly equal but do have to be similar enough that variance testing of the sample groups will not detect significant differences. The variances of sample groups are considered similar if no sample group variance is more than twice as large as the variance of another sample group. Variance testing becomes significantly less powerful (accurate) when a group’s size falls below 20. An effort should be made to obtain group sizes that exceed 20 to ensure that variance tests will provide accurate results. Verifying similarity of variances of treatment groups (sample groups that represent levels of the main factor) is much more important than verifying similarity of variances of blocks (sample groups that represent levels of the confounding variable) because the objective of this ANOVA is to determine whether the main factor has a significant effect on data values. The only important F test is the one performed on the treatment groups, and the sample groups that are part of this F test should have similar sample variances.
Variance comparison cannot be reliably performed here because the main factor sample groups (treatment groups) have only four data points each. That is too few.
7) Only One Data Point Is Collected For Each Unique Combination of Block and Treatment
This structure is what defines the randomized block design. Because there is no additional data collection for any levels of any factors, this test is performed with two-factor ANOVA without replication (which is the equivalent of ANOVA with randomized block design). If additional data were collected, e.g., two data points collected for each unique combination of treatment/block, then two-factor ANOVA with replication would be used instead of randomized block design.
With all of the required assumptions of ANOVA using randomized block design validated except sample group normality and sample group homogeneity of variance for the main factor, this test can now be performed in Excel as follows:
Step 1) Run Two-Way ANOVA Without Replication on the Raw Data Matrix
Two-Factor ANOVA Without Replication is one of the Excel Data Analysis tools.
The dialogue box for the Data Analysis tool Two-Factor ANOVA Without Replication is completed as follows:
The following output is produced:
In this case the ANOVA test using randomized block design did not detect a difference for an alpha of 0.05 as evidenced by the p value of 0.109581 for the Column factor (treatments – types of fertilizer).
Although ANOVA using randomized block design is a relatively weak test (as will soon be demonstrated), it still has more power than single-factor ANOVA when applied to the same raw data set. Applying the Excel Data Analysis tool Single-Factor ANOVA, here is the completed dialogue box:
This produces the resulting output:
Single-factor ANOVA calculated a significantly larger p Value of 0.312711 for the same raw data set. Even so, ANOVA using randomized block design has relatively low power because of the small number of total data points. Test power can be quickly obtained using the well-known free utility G*Power as follows:
Power Analysis of Two-Factor ANOVA Without Replication
The accuracy of a statistical test is very dependent upon the sample size. The larger the sample size, the more reliable will be the test’s results. The accuracy of a statistical test is specified as the Power of the test. A statistical test’s Power is the probability that the test will detect an effect of a given size at a given level of significance (alpha). The relationships are as follows:
α (“alpha”) = Level of Significance = 1 – Level of Confidence
α = probability of a type 1 error (a false positive)
α = probability of detecting an effect where there is none
β (“beta”) = probability of a type 2 error (a false negative)
β = probability of not detecting a real effect
1 - β = probability of detecting a real effect
Power = 1 - β
Power needs to be clarified further. Power is the probability of detecting a real effect of a given size at a given Level of Significance (alpha) at a given total sample size and number of groups.
The term Power can be described as the accuracy of a statistical test. The Power of a statistical test is related with alpha, sample size, and effect size in the following ways:
1) The larger the sample size, the larger is a test’s Power because a larger sample size increases a statistical test’s accuracy.
2) The larger alpha is, the larger is a test’s Power because a larger alpha reduces the amount of confidence needed to validate a statistical test’s result. Alpha = 1 – Level of Confidence. The lower the Level of Confidence needed, the more likely a statistical test will detect an effect.
3) The larger the specified effect size, the larger is a test’s Power because a larger effect size is more likely to be detected by a statistical test.
If any three of the four related factors (Power, alpha, sample size, and effect size) are known, the fourth factor can be calculated. These calculations can be very tedious. Fortunately there are a number of free utilities available that can calculate a test’s Power or the sample size needed to achieve a specified Power. One very convenient and easy-to-use downloadable Power calculator called G*Power is available at the following link at the time of this writing:
Power calculations are generally used in two ways:
1) A priori - Calculation of the minimum sample size needed to achieve a specified Power to detect an effect of a given size at a given alpha. This is the most common use of Power analysis and is normally conducted a priori (before the test is conducted) when designing the test. A Power level of 80 percent for a given alpha and effect size is a common target. Sample size is increased until the desired Power level can be achieved. Since Power equals 1 - β, the resulting β of the targeted Power level represents the highest acceptable level of a type 2 error (a false negative - failing to detect a real effect). Calculation of the sample size necessary to achieve a specified Power requires three input variables:
a) Power level – This is often set at .8, meaning that the test has an 80 percent chance of detecting an effect of a given size.
b) Effect size - Effect sizes are specified by the variable f. Effect size f is calculated from a different measure of effect size called η2 (eta squared): η2 = SSBetween_Groups / SSTotal. These two terms are part of the ANOVA calculations found in the single-factor ANOVA output.
The relationship between effect size f and effect size η2 is as follows: f = √(η2 / (1 − η2))
Jacob Cohen, in his landmark 1988 book Statistical Power Analysis for the Behavioral Sciences, proposed that effect sizes could be generalized as follows:
η2 = 0.01 for a small effect. A small effect is one that is not easily observable.
η2 = 0.05 for a medium effect. A medium effect is more easily detected than a small effect but less easily detected than a large effect.
η2 = 0.14 for a large effect. A large effect is one that is readily detected with the current measuring equipment.
The above values of η2 produce the following values of effect size f:
f = 0.1 for a small effect
f = 0.25 for a medium effect
f = 0.4 for a large effect
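The conversion behind these conventional values can be checked directly with the formula f = √(η2 / (1 − η2)). Note that the exact conversion of η2 = 0.05 is f ≈ 0.23; Cohen's conventional medium value f = 0.25 corresponds to η2 ≈ 0.059, so the table above pairs rounded conventions rather than exact conversions. A short Python sketch (the function name eta2_to_f is my own):

```python
# Convert eta-squared to Cohen's effect size f:  f = sqrt(eta2 / (1 - eta2))
from math import sqrt

def eta2_to_f(eta2):
    """Cohen's f from eta-squared (proportion of variance explained)."""
    return sqrt(eta2 / (1 - eta2))

# Cohen's small / medium / large eta-squared benchmarks
for eta2 in (0.01, 0.05, 0.14):
    print(eta2, round(eta2_to_f(eta2), 3))
```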
c) Alpha – This is commonly set at 0.05.
Performing a priori Power Analysis for the Main Effect of Factor 1
The G*Power utility will be used in an a priori manner to demonstrate how incredibly low the Power of two-factor ANOVA without replication is. The example used in this article will be analyzed. The data set and the Excel output of this example are shown as follows:
Two-factor ANOVA without replication has two factors, and there is no third term to account for the effect of interaction between them. Each factor has its own unique Power that must be calculated. The Power for each factor is the probability that the ANOVA test will detect an effect of a given size caused by that factor. A separate Power calculation can be performed for each of the two factors in this example.
Power analysis performed a priori calculates how large the total sample size must be to achieve a specific Power level to detect an effect of a specified size at a given alpha level. A priori Power analysis of the main effect of factor 1 of this example is done as follows:
The following parameters must be entered on the left side of the G*Power dialogue box (shown below) for an a priori analysis of a general ANOVA:
Power (1 – β): 0.8 – This is a commonly used Power target. A test that achieves a Power level of 0.8 has an 80 percent chance of detecting the specified effect.
Effect size: 0.55 – This is a very large effect. This analysis will calculate the sample size needed to achieve an 80 percent probability of detecting an effect of this size. The effect size was set at such a large number in order to keep the required number of data points to a manageable size for further analysis, which will be done as part of this article. The larger the effect to be detected by the test, the smaller is the total number of samples needed by the test.
α (alpha): 0.05
Numerator df: 3 – The degrees of freedom specified for a test of a main effect of a factor equals the number of factor levels – 1. Factor 1 (Fertilizer Type) has 4 levels or treatments. This numerator df therefore equals 4 – 1 = 3. Note that this is the same df that is specified in the Excel ANOVA output for factor 1.
Number of groups: 16 – The number of groups equals (number of levels in factor 1) x (number of levels in factor 2). This equals 4 x 4 = 16. The number of groups is equal to the total number of unique treatment cells. Each unique treatment cell exists for each unique combination of levels between the factors.
The calculated output, which appears on the right side of the dialogue box after it is run, specifies that 43 data points would be required to achieve a power level of 0.8007730 to detect an effect of size 0.55 for an alpha = 0.05 if the main factor has 4 levels (treatments) and the 2nd factor (the confounding factor) has 4 levels (blocks).
G*Power also produces the following graph which indicates that the current test with 16 data points has a test power of less than 0.20. This means that the probability of this test indicating a difference in the populations from which the samples were taken if a significant difference actually exists is less than 20 percent.
The solution to increase test power is clearly to increase the amount of data being tested. G*Power calculations and the previous G*Power graph indicate that at least 43 data points must be analyzed to bring this test up to a power level of 80 percent.
This would be accomplished by simply collecting more data. Collecting more data would involve replicating all blocks an equal number of times. Each unique combination of treatment and block on the raw data matrix is called a treatment cell. All treatment cells for two-factor ANOVA must have the same number of data points. This is known as having balanced data.
The initial test analyzed 16 data points. G*Power indicates that at least total 43 data points must be analyzed to raise the test power to at least 80 percent, which is generally considered to be an acceptable power level.
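This a priori calculation can be approximated without G*Power by using the noncentral F distribution, the standard method for fixed-effects ANOVA power. The sketch below is my own, with assumed conventions: the noncentrality parameter is taken as λ = f² · N (the convention G*Power documents for this test family), the function names anova_power and required_n are invented, and the inputs match the example (f = 0.55, alpha = 0.05, numerator df = 3, 16 cells):

```python
# A priori power analysis for a fixed-effects ANOVA main effect,
# approximating what G*Power computes for this example.
from scipy import stats

def anova_power(n_total, f_effect, df_num, n_groups, alpha=0.05):
    """Power of the F test for a main effect via the noncentral F method."""
    df_den = n_total - n_groups
    if df_den < 1:
        return 0.0
    nc = (f_effect ** 2) * n_total                 # noncentrality lambda
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return stats.ncf.sf(f_crit, df_num, df_den, nc)

def required_n(target=0.80, **kw):
    """Smallest total sample size whose power reaches the target."""
    n = kw["n_groups"] + 1
    while anova_power(n, **kw) < target:
        n += 1
    return n

params = dict(f_effect=0.55, df_num=3, n_groups=16)
n_req = required_n(target=0.80, **params)
print(n_req, round(anova_power(n_req, **params), 4))
```

The search lands in the low forties, consistent with the G*Power result quoted above, and confirms that the original 16-point design sits far below the 80 percent target.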
Analyzing the Data With Two-Factor ANOVA With Replication in Excel
If each block were replicated three times, the total number of data points would be 48 (16 * 3 = 48). Practical implementation of that would require that each farm conduct the experiment three times and record each result. The raw data matrix from such an effort would look similar to the following:
Note that each treatment cell has three data points. The data is therefore balanced as required by two-factor ANOVA. Single-factor ANOVA does not require that sample groups be the same size.
Data arranged in this fashion can be evaluated using two-factor ANOVA with replication. There is an Excel Data Analysis tool exactly for this. Its dialogue box can be brought up and completed as follows:
This would produce the following output:
This test displays more power as evidenced by the lower p Value of 0.061243 for the column factor (the choice of fertilizer treatments). Larger amounts of appropriate data nearly always increase the power of a statistical test.
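The arithmetic behind that replicated analysis can be sketched as follows. The Python code below uses invented, perfectly additive data (a farm offset plus a fertilizer effect plus replicate jitter) rather than the article's actual yields, partitions the total sum of squares into block (row), treatment (column), interaction, and within-cell components, and then tests the column factor the same way the Excel tool does:

```python
# Hypothetical two-factor ANOVA WITH replication: 4 blocks x 4 treatments,
# 3 replicates per treatment cell. All numbers are invented.
from scipy import stats

reps, b, t = 3, 4, 4
jitter = [0.3, -0.2, 0.1]   # replicate-to-replicate noise within a cell
data = [[[10 * i + 1.5 * j + jitter[r] for r in range(reps)]
         for j in range(t)] for i in range(b)]   # data[block][treatment]

n = reps * b * t
grand = sum(x for row in data for cell in row for x in cell) / n
cell_mean = [[sum(cell) / reps for cell in row] for row in data]
row_mean = [sum(means) / t for means in cell_mean]
col_mean = [sum(cell_mean[i][j] for i in range(b)) / b for j in range(t)]

ss_total = sum((x - grand) ** 2 for row in data for cell in row for x in cell)
ss_rows = reps * t * sum((m - grand) ** 2 for m in row_mean)    # blocks
ss_cols = reps * b * sum((m - grand) ** 2 for m in col_mean)    # treatments
ss_within = sum((x - cell_mean[i][j]) ** 2
                for i in range(b) for j in range(t) for x in data[i][j])
ss_inter = ss_total - ss_rows - ss_cols - ss_within             # interaction

df_cols, df_within = t - 1, b * t * (reps - 1)
f_cols = (ss_cols / df_cols) / (ss_within / df_within)
p_cols = stats.f.sf(f_cols, df_cols, df_within)
print(round(f_cols, 3), p_cols)
```

Because the invented data are exactly additive, the interaction sum of squares comes out at essentially zero, and the column-factor test now has 32 within-cell error degrees of freedom instead of the 9 available without replication.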
An additional benefit of more data is the ability to verify the two required assumptions that remained unverified for the two-factor ANOVA without replication: normality and variance similarity in all groups of the main factor (the treatment groups).
Normality Testing of Treatment Groups
A histogram for each treatment group can be quickly constructed using the Excel Data Analysis histogram tool. The results are shown in the following diagrams. Normality is not perfect but the histograms don’t deviate from normality too significantly given that the modal values (most frequently occurring) are generally near the center of each histogram.
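Alongside eyeballing histograms, a formal test such as Shapiro-Wilk can be run on each treatment group once the groups have 12 data points each (4 blocks x 3 replicates). The sketch below uses invented groups, not the article's data; a p value above 0.05 means the test found no evidence against normality at that level:

```python
# Sketch: Shapiro-Wilk normality test on each treatment group.
# The 12-point groups are invented stand-ins for the article's raw data.
from scipy import stats

groups = {
    "Fertilizer 1": [55.2, 57.1, 56.4, 58.0, 54.9, 56.7,
                     57.5, 55.8, 56.1, 57.9, 55.4, 56.9],
    "Fertilizer 2": [60.1, 61.4, 59.8, 62.0, 60.7, 61.1,
                     59.5, 60.9, 61.8, 60.3, 59.9, 61.5],
}
results = {}
for name, values in groups.items():
    stat, p = stats.shapiro(values)
    results[name] = (stat, p)
    # p > 0.05: no evidence against normality at the 5 percent level
    print(name, round(stat, 3), round(p, 3))
```

Keep in mind the caveat from the assumptions section: normality tests on groups this small remain fairly weak, so the histogram check is still worth doing.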
Variance Comparison of Treatment Groups
Sample groups display homoscedasticity (similarity of variances) if the variance of any sample group is not more than 4 times the variance of any other sample group. The following ANOVA output, which was previously calculated, shows that the variances of each of the treatment sample groups are similar to each other.
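That rule of thumb is easy to check directly: compute each treatment group's sample variance and compare the largest to the smallest. The sketch below uses invented treatment groups, not the article's data:

```python
# Homoscedasticity rule of thumb: largest group variance should be
# no more than 4x the smallest. Group data are invented.
from statistics import variance

treatment_groups = [
    [55.2, 57.1, 56.4, 58.0, 54.9, 56.7],
    [60.1, 61.4, 59.8, 62.0, 60.7, 61.1],
    [63.3, 64.8, 62.9, 65.1, 63.7, 64.2],
    [66.0, 67.5, 65.8, 68.1, 66.4, 67.0],
]
variances = [variance(g) for g in treatment_groups]
ratio = max(variances) / min(variances)
print(round(ratio, 2), ratio <= 4)
```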
It can now be stated that all of the required assumptions for two-factor ANOVA have been verified.
ANOVA using randomized block design (two-factor ANOVA without replication) nearly always tests too little data to be considered reliable. The small group sizes that occur with two-way ANOVA without replication reduce the test’s Power to an unacceptable level. Small group size also prevents validation of ANOVA’s required assumptions of data normality within groups and similar variances of all groups within each factor.
The correct solution is to collect more data and perform two-factor ANOVA with replication.