Monday, June 2, 2014

Chi-Square Goodness-Of-Fit Test in 7 Steps With Pre-Determined Bin Sizes in Excel

This is one of the following three articles on Chi-Square Goodness-Of-Fit Tests in Excel

Overview of the Chi-Square Goodness-of-Fit Test

Chi-Square Goodness-of-Fit Test With Pre-Determined Bin Sizes in 7 Steps in Excel 2010 and Excel 2013

Chi-Square Goodness-Of-Fit-Normality Test in 9 Steps in Excel 2010 and Excel 2013

The Two Types of Chi-Square Goodness-of-Fit Tests

1) Bin Sizes Are Pre-Determined

An example would be to test whether the weekly sales count is uniformly distributed throughout the seven days of the week. The actual sales count for each day would be compared with expected bins each containing one-seventh of the total weekly sales count. The sales count for each day would be expected to equal one-seventh of the week’s total sales count if sales were uniformly distributed throughout the seven days of the week. This type of GOF test often starts with the actual observed data already allocated to bins. This is the case here, in that actual sales are grouped at the start into bins that each hold the sales of a separate day. This example will be performed shortly within this article.

df = n - 1

n = number of bins of expected data

This blog article will perform this type of Chi-Square GOF Test.

2) Bin Sizes Arbitrarily Set To Match a Distribution

An example would be to perform a Chi-Square Goodness-of-Fit Test for normality on a large single group of data values. This type of GOF test starts with the actual observed data in a single group and therefore not yet allocated to bins. The expected bins are created by establishing arbitrary CDF endpoints for each bin. The upper and lower CDF endpoints of each expected bin determine the total number of data points that should be placed in that bin. The actual data values are then grouped into bins whose endpoints match those of the expected bins. Standardizing the actual observed data points is a way of simplifying their bin allocation. The Chi-Square GOF Test for Normality uses this method.

df = n – 1 – m

n = number of bins of expected data

m = number of parameters needed to fully describe the distribution, e.g., m = 2 for the normal distribution, which is fully described by two parameters: the mean and the standard deviation.
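The two degrees-of-freedom rules above can be captured in one small helper. The following Python sketch is only an illustration; the function name degrees_of_freedom and its parameter names are hypothetical and are not part of Excel or of this article's workbook.

```python
def degrees_of_freedom(n_bins, n_params=0):
    """Return df = n - 1 - m for a Chi-Square GOF test.

    n_bins:   number of expected bins (n).
    n_params: number of parameters needed to describe the distribution (m).
              Use 0 when bin sizes are pre-determined (Type 1) and
              2 for a normality test (mean and standard deviation).
    """
    df = n_bins - 1 - n_params
    if df < 1:
        raise ValueError("too few bins for this many parameters")
    return df


# Type 1: seven bins with pre-determined sizes
print(degrees_of_freedom(7))       # 6
# Type 2: a hypothetical ten-bin normality test (m = 2)
print(degrees_of_freedom(10, 2))   # 7
```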

An article further in this blog will perform a Chi-Square Goodness-Of-Fit Test for Normality, which is this type of Chi-Square GOF Test.

GOF Example – Type 1 - Bin Sizes Are Pre-Determined

In this example, sales counts for each day of the week have been averaged over the course of an entire year. The average number of sales for each day is shown in the following figure. Determine with 90 percent certainty whether sales counts are uniformly distributed over the seven days of the week.


Problem Information

Required Level of Certainty = 90 percent

α = 0.10

The actual data observations are divided into 7 bins.

The 7 actual bins contain the average count of sales that occurred on each of the seven days of the week.

The average number of total sales each week was 105. This is the total number of actual data observations.

Step 1 – Create Expected Bins

The framework of expected and actual bins must match. There are seven bins containing actual sales counts for each of the seven days of the week. There must also be seven bins containing expected sales counts for each of the seven weekdays.

Step 2 – Calculate Counts in Expected Bins

The expected bins contain the data counts that would be expected if the total number of actual data points (105) were divided up according to the hypothesized grouping, i.e., uniformly distributed among all seven days of the week.

Each of the seven expected bins will contain the expected number of daily sales if all of the 105 total sales are uniformly distributed across seven days. Expected bin counts are calculated as follows:

Expected count in each bin = 105 / 7 = 15
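For readers who want to check the expected bin counts outside of Excel, the following minimal Python sketch performs the same division. The variable names are hypothetical; only the total of 105 and the seven bins come from the problem statement.

```python
total_sales = 105   # average total weekly sales, from the problem statement
n_bins = 7          # one bin per day of the week

# Under a uniform distribution every bin receives the same share of the total
expected_counts = [total_sales / n_bins] * n_bins

print(expected_counts[0])    # 15.0
print(sum(expected_counts))  # 105.0
```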

Step 3 – Verify Required Assumptions

The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the following 3 conditions are met:

1) n ≥ 5 (n = the number of bins)

2) The minimum expected number of data points in any of the bins is at least 1

3) The average number of expected data points in a bin is at least 5

All of these conditions have been met: there are 7 bins, and every expected bin contains 15 data points.
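The three conditions can also be checked programmatically. The Python sketch below is one possible way to do so; the function name gof_assumptions_met is hypothetical, and the check interprets n as the number of bins, as defined earlier in this article.

```python
def gof_assumptions_met(expected_counts):
    """Check the three conditions listed in Step 3 for a list of expected bin counts."""
    n_bins = len(expected_counts)
    if n_bins == 0:
        return False
    enough_bins = n_bins >= 5                          # condition 1
    minimum_ok = min(expected_counts) >= 1             # condition 2
    average_ok = sum(expected_counts) / n_bins >= 5    # condition 3
    return enough_bins and minimum_ok and average_ok


# Seven expected bins of 15 each, as in this example
print(gof_assumptions_met([15.0] * 7))   # True
```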

Step 4 – Create Null and Alternative Hypotheses

The Null Hypothesis states that the actual distribution of the data matches the hypothesized distribution. The Null Hypothesis for the Chi-Square GOF Test is always specified as follows:

H₀: Χ² = 0

The Chi-Square Statistic, Χ², is distributed according to the Chi-Square distribution if certain conditions are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined for positive values of x.

Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, and larger differences between the actual and expected counts produce larger values of Χ², the alternative hypothesis specifies that the Chi-Square Goodness-of-Fit Test is a one-tailed test in the right tail and is specified as follows:

H₁: Χ² > 0

Step 5 – Calculate Chi-Square Statistic, Χ²

The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated by the following formula:

Χ² = Σ [ (Actualᵢ – Expectedᵢ)² / Expectedᵢ ], summed over all n bins

This can be quickly implemented in a convenient table with one row per bin.
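The calculation of the Chi-Square Statistic can be sketched in Python as follows. Note that the daily counts below are hypothetical placeholders, not the article's data: the original figure is not reproduced in this text, so the values were chosen only so that they total 105 and yield a statistic close to the 11.07 used in the following steps.

```python
# HYPOTHETICAL actual counts for the seven bins (not the article's data);
# chosen only so that they sum to 105
actual_counts = [23, 8, 20, 11, 17, 13, 13]

total = sum(actual_counts)
n_bins = len(actual_counts)
expected_counts = [total / n_bins] * n_bins   # 15 in every bin

# Chi-Square Statistic = sum of (Actual - Expected)^2 / Expected over all bins
chi_square_stat = sum(
    (actual - expected) ** 2 / expected
    for actual, expected in zip(actual_counts, expected_counts)
)

print(total)                      # 105
print(round(chi_square_stat, 2))  # 11.07
```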

Step 6 – Calculate Critical Chi-Square Value and p Value

The degrees of freedom for the Chi-Square Goodness-of-Fit Test are calculated as follows:

df = n – 1 = 7 – 1 = 6

n = k = number of expected bins

The Critical Chi-Square Value is calculated as follows:

Chi-Square Critical = CHISQ.INV.RT(α,df)

Chi-Square Critical = CHISQ.INV.RT(0.10,6) = 10.64

Prior to Excel 2010, the equivalent formula is as follows:

Chi-Square Critical = CHIINV(α,df)

The p Value is calculated as follows:

p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)

p Value = CHISQ.DIST.RT(11.07,6) = 0.0863

Prior to Excel 2010, the equivalent formula is as follows:

p Value = CHIDIST(Chi-Square Statistic,df)
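As a cross-check on the Excel results, the right-tail probability and the critical value can be approximated with the Python standard library alone. The sketch below is not how Excel computes these functions; it relies on a closed-form expression for the right tail of the Chi-Square distribution that is valid only when df is even, which happens to hold here (df = 6). The function names are hypothetical.

```python
import math


def chi_square_right_tail_even_df(x, df):
    """P(X > x) for a Chi-Square variable with a positive, even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which holds only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def chi_square_critical_even_df(alpha, df, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (even df only)."""
    low, high = 0.0, 1.0
    while chi_square_right_tail_even_df(high, df) > alpha:
        high *= 2.0   # widen the bracket until the tail area drops below alpha
    while high - low > tol:
        mid = (low + high) / 2.0
        if chi_square_right_tail_even_df(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


critical_value = chi_square_critical_even_df(0.10, 6)
p_value = chi_square_right_tail_even_df(11.07, 6)
print(round(critical_value, 2))   # 10.64
print(p_value)
```

With the rounded statistic 11.07 as input, the p value from this sketch comes out near 0.086; a small difference in the fourth decimal place relative to the figure quoted above would be consistent with the statistic having been rounded before it was displayed.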

Step 7 – Determine Whether To Reject Null Hypothesis

The Null Hypothesis is rejected if either of the following two equivalent conditions is shown to exist:

1) Chi-Square Statistic > Critical Chi-Square Value

2) p Value < α

Both of these equivalent conditions exist as follows:

Chi-Square Statistic = 11.07

Critical Chi-Square value = 10.64

p Value = 0.0863

α = 0.10

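In this case we reject the Null Hypothesis because the Chi-Square Statistic (11.07) is larger than the Critical Value (10.64) or, equivalently, the p Value (0.0863) is smaller than Alpha (0.10). It can therefore be stated with 90 percent certainty that sales counts are not uniformly distributed over the seven days of the week.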
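The decision rule in this step amounts to two comparisons, which the following Python sketch spells out using the values reported above. The function name reject_null is hypothetical.

```python
def reject_null(chi_square_stat, critical_value, p_value, alpha):
    """Return the outcome of both equivalent rejection conditions."""
    by_critical_value = chi_square_stat > critical_value
    by_p_value = p_value < alpha
    return by_critical_value, by_p_value


# Values taken from Steps 5 and 6 of this example
by_critical_value, by_p_value = reject_null(11.07, 10.64, 0.0863, 0.10)
print(by_critical_value, by_p_value)   # True True
```

Returning both comparisons separately makes it easy to confirm that the two conditions agree, as they should when the critical value and the p value are computed from the same statistic and degrees of freedom.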

A graphical representation of this problem is shown as follows:




