Monday, June 2, 2014

Chi-Square Goodness-of-Fit Test Overview

This is one of the following three articles on Chi-Square Goodness-Of-Fit Tests in Excel

Overview of the Chi-Square Goodness-of-Fit Test

Chi-Square Goodness- of-Fit Test With Pre-Determined Bins Sizes in 7 Steps in Excel 2010 and Excel 2013

Chi-Square Goodness-Of-Fit-Normality Test in 9 Steps in Excel 2010 and Excel 2013

 

Chi-Square Goodness-of-

Fit Test Overview

Chi-Square Goodness-Of-Fit (GOF) tests are hypothesis tests that determine how closely a sample of data fits a hypothesized distribution. The actual data observations are divided up into groups called bins. The same number of data points is divided up into identical bins in the groupings that would be expected if these data points exactly matched the hypothesized distribution.

The counts of actual data observations in each bin are compared with the expected number of data points that would be in identical bins if the data exactly matched the hypothesized distribution.

 

Test Statistic Χ2

A Test Statistic called the Chi-Square Statistic, Χ2, is calculated based upon the comparison of the counts of actual data points in each bin and the counts of expected data points in each of the bins. The formula for the Chi-Square Statistic is as follows:

chi-square, chi square, goodness-of-fit, goodness of fit, statistics, excel, excel 2010, excel 2013
(Click On Image To See a Larger Version)

n = the total number of bins that containing expected groupings of data points

Actuali = the number of actual observed data points that fall into the ith bin

Expectedi = the number of expected data points in the ith bin if the data exactly matched hypothesized distribution.

 

Required Assumptions

The distribution of the Chi-Square Statistic, Χ2, can be approximated by the Chi-Square distribution if the following 3 conditions are met:

1) n ≥ 5

2) The minimum expected number of data points in any of the bins is at least 1

3) The average number of expected data points in a bin is at least 5

 

Null Hypothesis

The Null Hypothesis of this hypothesis test states that Χ2 = 0. This would mean that actual and expected counts of data points in each bin are the same. This Null Hypothesis is rejected if either of the following two equivalent conditions exist:

1) The Chi-Square Statistic is larger than the Critical Chi-Square Value

2) The p Value is smaller than the specified alpha.

 

Basic Excel Formulas

The formulas for the Critical Chi-Square Value and p Value in Excel are the following:

Critical Chi-Square Value = CHISQ.INV.RT(α, df)

p Value = CHISQ.DIST.RT(Chi-Square Statistic, df)

df = degrees of freedom and is calculated using one of two different formulas depending on which of the two types of GOF tests is being performed.

 

The Two Types of GOF Tests

 

1) Bin Sizes Are Pre-Determined

An example would be to test whether the weekly sales count is uniformly distributed throughout the seven days of the week. The actual sales count for each day would be compared with expected bins each containing one seventh of the total weekly sales count. The sales count for each day would be expected equal one-seventh of the week’s total sales count if sales were uniformly distributed throughout the seven week days. This type of a GOF test often starts with the actual observed data already allocated to bins. This is the case here in that actual sales are grouped at the start into bins each holding the sales of a separate day. This example will be performed shortly within this section.

df = n - 1

n = number of bins of expected data

This type of Chi-Square GOF Test will be performed in Excel in the next blog article.

 

2) Bin Sizes Arbitrarily Set To Match a Distribution

An example would be to perform a Chi-Square Goodness-of-Fit Test for normality on a large single group of data values. This type of a GOF test starts with the actual observed data in a single group and therefore not yet allocated to bins. The expected bins are created by establishing arbitrary CDF endpoints of each bin. The upper and lower CDF endpoints of each expected bin determine the total number of data points that should be placed in each of these expected bins. The actual data values will be grouped in bins whose endpoints match those of the expected bins. Standardizing the actual observed data points is a way of simplifying their bin allocation. The Chi-Square GOF Test for Normality will be performed in this section using this method.

df = n – 1 – m

n = number of bins of expected data

m = number of parameters needed to fully describe the distribution, e.g. m = 2 for the normal distribution, which is fully described by two parameters; the mean and standard deviation.

The Chi-Square Goodness-of-Fit Test For Normality will be performed in detail in Excel in a later article in this blog.

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

1 comment:

  1. \\m = number of parameters needed to fully describe the distribution, e.g. m = 2 for the normal distribution, which is fully described by two parameters; the mean and standard deviation\\ In that case, Poisson distribution has an m-value of 1, weibull 3-parameter distribution has a m-value of 3. So what is the m-value of Exponential and other family of distributions ?

    ReplyDelete