Monday, June 2, 2014

Chi-Square Goodness-Of-Fit Test in 7 Steps With Pre-Determined Bin Sizes in Excel

This is one of the following three articles on Chi-Square Goodness-Of-Fit Tests in Excel

Overview of the Chi-Square Goodness-of-Fit Test

Chi-Square Goodness- of-Fit Test With Pre-Determined Bins Sizes in 7 Steps in Excel 2010 and Excel 2013

Chi-Square Goodness-Of-Fit-Normality Test in 9 Steps in Excel 2010 and Excel 2013

 

The Two Types of Chi-

Square Goodness-of-Fit

Tests

 

1) Bin Sizes Are Pre-Determined

An example would be to test whether the weekly sales count is uniformly distributed throughout the seven days of the week. The actual sales count for each day would be compared with expected bins each containing one seventh of the total weekly sales count. The sales count for each day would be expected equal one-seventh of the week’s total sales count if sales were uniformly distributed throughout the seven week days. This type of a GOF test often starts with the actual observed data already allocated to bins. This is the case here in that actual sales are grouped at the start into bins each holding the sales of a separate day. This example will be performed shortly within this section.

df = n - 1

n = number of bins of expected data

This blog article will perform this type of Chi-Square GOF Test.

 

2) Bin Sizes Arbitrarily Set To Match

a Distribution

An example would be to perform a Chi-Square Goodness-of-Fit Test for normality on a large single group of data values. This type of a GOF test starts with the actual observed data in a single group and therefore not yet allocated to bins. The expected bins are created by establishing arbitrary CDF endpoints of each bin. The upper and lower CDF endpoints of each expected bin determine the total number of data points that should be placed in each of these expected bins. The actual data values will be grouped in bins whose endpoints match those of the expected bins. Standardizing the actual observed data points is a way of simplifying their bin allocation. The Chi-Square GOF Test for Normality will be performed in this section using this method.

df = n – 1 – m

n = number of bins of expected data

m = number of parameters needed to fully describe the distribution, e.g. m = 2 for the normal distribution, which is fully described by two parameters; the mean and standard deviation.

An article further in this blog will perform a Chi-Square Goodness-Of-Fit Test for Normality, which is this type of Chi-Square GOF Test.

 

GOF Example – Type 1 - Bin Sizes

Are Pre-Determined

In this example, sales counts for each weekday have been averaged over the course of an entire year. The average number of sales for each weekday is shown in the follow figure. Determine with 90 percent certainty whether sales counts are uniformly distributed over the seven days of the week.

chi square, chi-square, goodness of fit, goodness-of-fit, gof test, statistics, excel, excel 2010, excel 2013
(Click Image To See a Larger Version)

 

Problem Information

Required Level of Certainty = 90 percent

α = 0.10

Actual data observations divided up into 7 bins.

The 7 Actual bins contain the average count of sales that occurred on each of the seven weekdays.

The average number of total sales each week was 105. This is the total number of actual data observations.

 

Step 1 – Create Expected Bins

The framework of expected and actual bins must match. There are seven bins containing actual sales counts for each of the seven days of the week. There must also be seven bins containing expected sales counts for each of the seven weekdays.

 

Step 2 – Calculate Counts in Expected Bins

The expected bins contain the data counts that would be expected if the total number of actual data points (105) were divided up according to the hypothesized grouping, i.e., uniformly distributed among all seven weekday.

Each of the seven expected bins will contain the expected number of the daily sales if all of the 105 total sales are uniformly distributed across seven days. Expected bin counts are calculated as follows:

chi square, chi-square, goodness of fit, goodness-of-fit, gof test, statistics, excel, excel 2010, excel 2013
(Click Image To See a Larger Version)

 

Step 3 – Verify Required Assumptions

The distribution of the Chi-Square Statistic, Χ2, can be approximated by the Chi-Square distribution if the following 3 conditions are met:

1) n ≥ 5

2) The minimum expected number of data points in any of the bins is at least 1

3) The average number of expected data points in a bin is at least 5

All of these conditions have been met.

 

Step 4 – Create Null and Alternative Hypotheses

The Null Hypothesis states that actual distribution of the data matches the hypothesized distribution. The Null Hypothesis for the Chi-Square GOF is always specified as the following:

H0: Χ2 = 0

The Chi-Square Statistic, Χ2, is distributed according to the Chi-Square distribution if certain conditions are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined for positive values of x.

Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the alternative hypothesis specifies that that the Chi-Square Independence Test is a one-tailed test in the right tail and is specified as follows:

H1: Χ2 > 0

 

Step 5 – Calculate Chi-Square Statistic, Χ2

The Test Statistic, which is the Chi-Square Statistic, Χ2, is calculated by this formula as follows:

chi square, chi-square, goodness of fit, goodness-of-fit, gof test, statistics, excel, excel 2010, excel 2013

This can be quickly implemented in a convenient table as follows:

chi square, chi-square, goodness of fit, goodness-of-fit, gof test, statistics, excel, excel 2010, excel 2013 (Click Image To See a Larger Version)

 

Step 6 – Calculate Critical Chi-Square Value and p Value

The degrees of freedom for the Chi-Square Independence Test is calculated as follows:

df = n – 1 = 7 – 1 = 6

n = k = number of expected bins

The Critical Chi-Square Value is calculated as follows:

Chi-Square Critical = CHISQ.INV.RT(α,df)

Chi-Square Critical = CHISQ.INV.RT(0.10,6) = 10.64

Prior to Excel 2010, the formula is calculated as follows:

Chi-Square Critical = CHIINV(α,df)

The p Value is calculated as follows:

p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)

p Value = CHISQ.DIST.RT(11.07,6) = 0.0863

Prior to Excel 2010, the formula is calculated as follows:

p Value = CHIDIST(Chi-Square Statistic,df)

 

Step 7 – Determine Whether To Reject Null Hypothesis

The Null Hypothesis is rejected if either of the two equivalent conditions are shown to exist:

1) Chi-Square Statistic > Critical Chi-Square Value

2) p Value < α

Both of these equivalent conditions exist as follows:

Chi-Square Statistic = 11.07

Critical Chi-Square value = 10.64

p Value = 0.0863

α = 0.10

In this case we reject the Null Hypothesis because the Chi-Square Statistic (11.07) is larger than the Critical Value (10.64) or, equivalently, the p Value (0.0863) is smaller than Alpha (0.10).

A graphical representation of this problem is shown as follows:

chi square, chi-square, goodness of fit, goodness-of-fit, gof test, statistics, excel, excel 2010, excel 2013 (Click Image To See a Larger Version)

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

No comments:

Post a Comment