# Chi-Square Independence Test in 7 Steps in Excel

The Chi-Square Independence Test is used to determine whether two categorical variables associated with the same item act independently on that item. The example presented in this section analyzes whether the gender of the purchaser of a car is independent of the color of the car. This Chi-Square Independence Test answers the question of whether gender plays a role in the color selection of a purchased car.

Each item (each purchased car) has two attributes associated with it. These two attributes are the categorical variables of the purchaser’s gender and the car’s color. The counts of the number of cars purchased for each unique combination of gender and color are placed in a matrix called a contingency table.

## Contingency Table

A contingency table is a two-way cross-tabulation. Each row in the contingency table is associated with one of the levels of one of the categorical attributes (such as gender) and each column is associated with one of the levels of the other categorical attribute (such as color).

The number of rows in the contingency table, r, is equal to the number of levels of the row attribute. The number of columns in the contingency table, c, is equal to the number of levels of the column attribute. The contingency table is therefore an r x c table and has r x c cells representing r x c unique combinations of levels of row and column attributes.

## Test Compares Actual vs. Expected Bin Counts

The Chi-Square Independence Test determines whether the actual counts for each unique combination of levels of the two variables are significantly different from the counts that would be expected if the attributes were totally independent of each other.

## Null Hypothesis

A Null Hypothesis is created which states there is no significant difference between the actual and expected counts of data for the unique combinations of levels of the two factors.

## Test Statistic

The Chi-Square Independence Test calculates a Test Statistic called a Chi-Square Statistic, Χ^{2}. The distribution of this Test Statistic can be approximated by the Chi-Square distribution if several conditions are met.

### When to Reject Null Hypothesis

The Null Hypothesis is rejected if that Chi-Square Statistic is larger than a Critical Chi-Square Value based upon the specified alpha level and degrees of freedom associated with that test. Equivalently, the Null Hypothesis is rejected if the p value derived from the test is smaller than the specified alpha level.

## Required Assumptions

The distribution of this Test Statistic, Χ^{2}, can be approximated by the Chi-Square distribution with degrees of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met:

1) The number of cells in the contingency table (r x c) is at least 5. A 2 x 2 contingency table is not large enough. One of the two attributes must have at least 3 levels.

2) The average value of all of the expected counts is at least 5.

3) All of the expected counts equal at least 1.
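
The three conditions above can be checked mechanically once the expected counts are known. The following is a minimal Python sketch, not part of the original Excel workflow; the function name `check_assumptions` and the expected counts are hypothetical and chosen only for illustration.

```python
def check_assumptions(expected):
    """Return the three approximation conditions as booleans.

    `expected` is a list of rows holding the expected count of each cell.
    """
    cells = [value for row in expected for value in row]
    return {
        "at_least_5_cells": len(cells) >= 5,
        "mean_expected_at_least_5": sum(cells) / len(cells) >= 5,
        "all_expected_at_least_1": min(cells) >= 1,
    }

# Hypothetical expected counts for a 2 x 3 table (not the article's data).
expected = [[25.0, 30.0, 45.0],
            [25.0, 30.0, 45.0]]

result = check_assumptions(expected)
print(result)
```

Under the first condition as stated above, a 2 x 2 table of expected counts would fail the `at_least_5_cells` check because it has only 4 cells.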

## Example of Chi-Square Independence Test in Excel

We will examine whether gender and product color selection are independent of each other. A car company in the United States sold 12,000 new cars of one brand in one month. The car company recorded the gender of each customer and also the color of the car. The car was available in only three colors: red, blue, and green. The actual counts of cars purchased in that month for each unique combination of gender/color are shown as follows:

Determine with 95-percent certainty whether the car purchaser’s gender and the selected color of the car are independent of each other. A 95-percent level of certainty corresponds to an alpha level of α = 0.05.

### Step 1 – Place Actual Counts In Contingency Table

The actual counts of the number of items having each unique combination of row and column attribute level are placed into the proper cell in the r x c contingency table. In this case the counts of the number of cars associated with each unique combination of gender/color are placed into the correct cells of the 2 x 3 contingency table as follows:

### Creating the Contingency Table From an Excel Pivot Table

The contingency table can be created with Excel’s Pivot Table tool if the data are initially presented in the following fashion as they often are:

The Pivot Table is accessed from within the Insert tab.

**Insert / Pivot Table / Pivot Table** brings up the initial Pivot Table dialogue box. Fill in the table range and the output location there.

Clicking OK brings up the final Pivot Table dialogue box.

Dragging the label Color down to the Column Labels box and to the Σ Values box and then dragging the label Gender down to the Row Labels box produces the completed Pivot Table as follows. This Pivot Table is an exact match of the contingency table containing the actual values for this data set.

Note that the Excel Pivot Table is an exact match for the contingency table of actual counts created in Step 1.
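
For readers who want to see the counting that the Pivot Table performs, the following Python sketch tallies raw records into a contingency table. It is only an illustration outside Excel: the six records are hypothetical, not the article's 12,000 sales, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical raw records, one (gender, color) pair per purchased car.
records = [
    ("Male", "Red"), ("Female", "Blue"), ("Male", "Green"),
    ("Female", "Red"), ("Male", "Red"), ("Female", "Green"),
]

genders = sorted({g for g, _ in records})   # row labels
colors = sorted({c for _, c in records})    # column labels
pair_counts = Counter(records)              # count of each (gender, color) pair

# Rows follow `genders`, columns follow `colors`; missing pairs count as 0.
table = [[pair_counts[(g, c)] for c in colors] for g in genders]

print(colors)
for g, row in zip(genders, table):
    print(g, row)
```

Each inner list is one row of the contingency table, in the same way that each Row Label in the Pivot Table holds the counts for one gender.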

### Step 2 – Place Expected Counts In Contingency Table

The expected counts for each unique combination of levels of row/column attributes are placed into the correct cells of an identical contingency table as follows:

The expected counts are based upon the assumption that the row and column attributes act independently of each other. The method of calculating the expected numbers based upon this assumption is shown below:
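
Under independence, the standard calculation for each cell is Expected count = (row total x column total) / grand total. The following Python sketch applies that formula; the function name `expected_counts` and the actual counts are hypothetical and are not the article's data.

```python
def expected_counts(actual):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in actual]
    col_totals = [sum(col) for col in zip(*actual)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of actual counts (rows: gender, columns: color).
actual = [[20, 30, 50],
          [30, 30, 40]]

expected = expected_counts(actual)
for row in expected:
    print(row)
# [25.0, 30.0, 45.0]
# [25.0, 30.0, 45.0]
```

The expected counts always sum to the same grand total as the actual counts, which makes a quick sanity check in a spreadsheet as well.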

### Step 3 – Create Null and Alternative Hypotheses

The Null Hypothesis states that there is no difference between the expected and actual counts of items for each unique combination of levels of row and column attributes, which is what would be observed if the two attributes were independent. The Test Statistic, Χ^{2}, would equal 0 in this case. The Null Hypothesis is therefore specified as follows:

H_{0}: Χ^{2} = 0

The Chi-Square Statistic, Χ^{2}, is distributed according to the Chi-Square distribution if the required assumptions for this test, listed in the Required Assumptions section of this article, are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability density function of the Chi-Square distribution calculated at x is written as f(x,df) and is defined only for positive values of x.

Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the alternative hypothesis specifies that the Chi-Square Independence Test is a one-tailed test in the right tail and is specified as follows:

H_{1}: Χ^{2} > 0

### Step 4 – Verify Required Assumptions

The distribution of this Test Statistic, Χ^{2}, can be approximated by the Chi-Square distribution with degrees of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met:

1) The number of cells in the contingency table (r x c) is at least 5. The contingency table is a 2 x 3 table with 6 cells so this condition is met.

2) The average value of all of the expected counts is at least 5. The expected counts sum to the 12,000 cars sold, so their average over the 6 cells is 2,000. This condition is met.

3) All of the expected counts equal at least 1. This condition is met.

### Step 5 – Calculate Chi-Square Statistic, Χ^{2}

The Test Statistic, which is the Chi-Square Statistic, Χ^{2}, is calculated for n = r x c unique cells in the contingency table as follows:

Χ^{2} = Σ_{i=1}^{n} (Actual_{i} – Expected_{i})^{2} / Expected_{i}

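This can be quickly implemented in a convenient table with one row per cell that holds the actual count, the expected count, and the resulting term (Actual – Expected)^{2} / Expected. The sum of these terms is the Chi-Square Statistic, which equals 6.17 for this data set.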
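
The sum over cells described above can also be sketched in Python. The counts below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the article's data and do not reproduce its statistic of 6.17.

```python
def chi_square_statistic(actual, expected):
    """Sum (actual - expected)^2 / expected over every cell of the table."""
    return sum(
        (a - e) ** 2 / e
        for actual_row, expected_row in zip(actual, expected)
        for a, e in zip(actual_row, expected_row)
    )

# Hypothetical counts for illustration only (not the article's data).
actual = [[20, 30, 50],
          [30, 30, 40]]
expected = [[25.0, 30.0, 45.0],
            [25.0, 30.0, 45.0]]

chi_square = chi_square_statistic(actual, expected)
print(round(chi_square, 4))  # 3.1111
```

Each term of the generator corresponds to one row of the calculation table, so the result can be compared cell by cell with a spreadsheet.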

### Step 6 – Calculate Critical Chi-Square Value and p Value

The degrees of freedom for the Chi-Square Independence Test is calculated as follows:

r = number of rows = 2

c = number of columns = 3

df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2

**The Critical Chi-Square Value is calculated as follows:**

Chi-Square Critical = CHISQ.INV.RT(α,df)

Chi-Square Critical = CHISQ.INV.RT(0.05,2) = 5.99

Prior to Excel 2010, the formula is calculated as follows:

Chi-Square Critical = CHIINV(α,df)

**The p Value is calculated as follows:**

p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)

p Value = CHISQ.DIST.RT(6.17,2) = 0.0457

Prior to Excel 2010, the formula is calculated as follows:

p Value = CHIDIST(Chi-Square Statistic,df)
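
The two Excel results above can be cross-checked outside Excel. For the special case df = 2, the right-tail probability of the Chi-Square distribution has the closed form exp(–x/2), so the critical value is –2 ln(α). The Python sketch below relies on that special case only; the function names are hypothetical, and other degrees of freedom would need a general Chi-Square routine.

```python
import math

def chi_square_right_tail_df2(x):
    """Right-tail probability of the Chi-Square distribution, valid for df = 2 only."""
    return math.exp(-x / 2.0)

def chi_square_critical_df2(alpha):
    """Critical value with right-tail area alpha, valid for df = 2 only."""
    return -2.0 * math.log(alpha)

rows, cols = 2, 3
df = (rows - 1) * (cols - 1)
assert df == 2  # the closed forms above apply only in this case

critical = chi_square_critical_df2(0.05)
p_value = chi_square_right_tail_df2(6.17)
print(round(critical, 2), round(p_value, 4))  # 5.99 0.0457
```

Both values agree with CHISQ.INV.RT(0.05,2) and CHISQ.DIST.RT(6.17,2) as reported above.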

### Step 7 – Determine Whether To Reject Null Hypothesis

The Null Hypothesis is rejected if either of the following two equivalent conditions is shown to exist:

1) Chi-Square Statistic > Critical Chi-Square Value

2) p Value < α

Both of these conditions exist as follows.

Chi-Square Statistic = 6.17

Critical Chi-Square value = 5.99

p Value = 0.0457

α = 0.05
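
The comparison of these four values can be written as a short Python check. This is only a sketch of the decision rule; the function name `reject_null` is hypothetical, and the numbers are the ones listed above.

```python
def reject_null(chi_square, critical, p_value, alpha):
    """Evaluate both rejection conditions; they should agree with each other."""
    by_statistic = chi_square > critical   # condition 1
    by_p_value = p_value < alpha           # condition 2
    return by_statistic, by_p_value

by_statistic, by_p_value = reject_null(
    chi_square=6.17, critical=5.99, p_value=0.0457, alpha=0.05
)
print(by_statistic, by_p_value)  # True True
```

If the two conditions ever disagree, the likely cause is a mismatch in the degrees of freedom or alpha level used for the critical value and the p Value.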

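In this case we reject the Null Hypothesis because the Chi-Square Statistic (6.17) is larger than the Critical Value (5.99) or, equivalently, the p Value (0.0457) is smaller than Alpha (0.05). We therefore conclude, at the 95-percent level of certainty, that the purchaser’s gender and the color of the car are not independent of each other. A graphical representation of this problem is shown as follows in this Excel-generated graph: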
