Sunday, June 1, 2014

Hosmer-Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013

This is one of the following seven articles on Logistic Regression in Excel

Logistic Regression Overview

Logistic Regression in 7 Steps in Excel 2010 and Excel 2013

R Square For Logistic Regression Overview

Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013

Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013

Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013

Hosmer- Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013

 

Hosmer-Lemeshow Test in

Excel – Logistic Regression

Goodness-of-Fit Test

Another goodness-of-fit test commonly applied to logistic regression results is the Hosmer-Lemeshow test. This is a Chi-Square Goodness-Of-Fit test that quantifies how closely the predicted results match the actual observations. The test can be summarized as follows:

The total number of observations is split up into ten groups, called quintiles. The number of expected (predicted) positives and negatives in each quintile is compared with the observed number of positives and negatives in each quintile. The comparison of expected numbers and observed numbers produces a test statistic called the Chi-Square Statistic. A p Value is then derived which determines whether or not the model is a good fit.

A large p Value indicates that the difference between the number of observed and expected values is insignificant and the model is therefore considered valid. If the p Value is smaller than the specified level of significance (usually set at 0.05), the difference between the number of observed and expected values is statistically significant and the model is therefore considered not valid.

 

Hosmer-Lemeshow Test in Excel

The data should be divided up into 10 equally-sized groups called quintiles or bins. Produce the following four counts of the data in each bin:

- Positive values observed in that bin

- Positive values expected in the bin

- Negative values observed in that bin

- Negative values expected in that bin

Arrange all of that data is done in the following diagram. Place the positive observed and expected values together on one side. Place the negative observed and expected values together on the other side. This is shown as follows:

hosmer-lemeshow, goodness-of-fit, chi-square, logistic regression, excel, excel 2010, excel 2013, statistics, excel solver, optimization (Click On Image To See a Larger Version)

A Chi-Square Goodness-Of-Fit test requires that the average number of values in each “Expected” bin is at least 5 and that every “Expected” bin has a value of at least 1.

This test suffers when the total number of observations is not large. Test creators David Hosmer and Stanley Lemeshow recommend that the minimum number of observations be at least 200.

This test is performed almost exactly like any other Chi-Square Goodness-Of-Fit test except the degrees of freedom equals the number of bins – 2. In this case, that would be as follows:

df = Number of bins – 2 = 10 – 2 = 8

Calculate the following for each positive observed/expected group and for each negative observed/expected group:

(Number observed – number expected)2 / (Number of expected)

Calculate the test statistic called the Chi-Square Statistic, Χ2.

Χ2 = ∑ (Number observed – number expected)2 / (Number of expected)

Χ2 = 6.08418

This test statistic, Χ2, is distributed approximately according to the Chi-Square distribution with (Number of bins) – 2 degrees of freedom if the average number of values in each “Expected” bin is at least 5 and that every “Expected” bin has a value of at least 1.

A p Value can be derived from the Chi-Square Statistic as follows:

p Value = CHISQ.DIST.RT(Χ2,2) = CHISQ.DIST.RT(6.08418,2) = 0.63780

This p Value states that there is a 63.78 percent chance the difference between the observed and expected values is merely a random result and is not significant. The model is therefore considered to be a good model because the predicted values appear to be a good fit to the observed values. The Null Hypothesis stating that there is no difference between the Expected and Observed values cannot be rejected.

A small p Value would indicate that the model was not that good of a fit.

The p Value indicates the percentage of area under the Chi-Square distribution curve that is to the right of the Chi-Square Statistics of 6.08418. This is illustrated in the following diagram.

hosmer-lemeshow, goodness-of-fit, chi-square, logistic regression, excel, excel 2010, excel 2013, statistics, excel solver, optimization (Click On Image To See a Larger Version)

In Excel 2010 and later the formula CHIDIST(Χ2,df) can be replaced with the following formula: CHISQ.DIST.RT(Χ2,df)

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

1 comment:

  1. Thanks . very helpful..was validating my model but wanted to use Excel.

    ReplyDelete