Sunday, June 1, 2014

Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013

This is one of the following seven articles on Logistic Regression in Excel

Logistic Regression Overview

Logistic Regression in 7 Steps in Excel 2010 and Excel 2013

R Square For Logistic Regression Overview

Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013

Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013

Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013

Hosmer- Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013

 

Likelihood Ratio in Excel

Is More Effective Than

Wald Statistic To

Determine if the Variable

Coefficients Are Significant

An essential part of linear regression analysis is the determination of whether the calculated coefficients of input variables are statistically significant. A variable coefficient is considered to be statistically significant if the probability that it has a nonzero value is less than the specified level of significance. This probability is shown in the Excel linear regression output as the P-value next to the coefficient. The normal level of significance is α = 0.05. A P-value of less than 0.05 indicates that the variable coefficient is statistically significant if α = 0.05.

The significance of the variable coefficients b1, b2 , …, bk in logistic regression is calculated by different methods.

 

The Wald Statistic

Until recently the most common metric used to evaluate the significance of a variable coefficient in binary logistic regression was the Wald Statistic. The Wald Statistic for each coefficient is calculated by the following formula:

Wald Statistic = (Coefficient)2 / (Standard Error of Coefficient)2

The Standard errors of the coefficients are equal to the square roots of the diagonals of the covariance matrix of the coefficients. This requires a bit of matrix work to compute. The Wald Statistic is approximately distributed according to the Chi-Square distribution with one degree of freedom. The p Value for the Wald Statistic is calculated as follows:

p Value = CHIDIST(Wald Statistic,1)

In Excel 2010 and later, this formula can be replaced by the following:

p Value = CHISQ.DIST.RT(Wald Statistic, 1)

The coefficient is considered statistically significant is less than the specified level of significance, which is commonly set at 0.05.

The 95 percent confidence interval for the coefficient is calculated in Excel as follows:

95 percent C.I. = Coefficient ± S.E.* NORM.S.INV(1 – α/2)

 

Problems With the Wald Statistic

The Wald Statistic is currently the main logistic regression metric of variable coefficient significance calculated by well-known statistical packages such as SAS and SPSS. The reliability of the Wald Statistic is, however, considered questionable. If the case that a large coefficient is produced, the standard error of the coefficient can be inflated. This will result in an undersized Wald Statistic, which could lead to a conclusion that a significant coefficient was not significant. This is a false negative, which is a Type 2 Error. A false positive is a Type 1 Error. An additional reliability issue occurs with the Wald Statistic when sample size is small. The Wald Statistic is often biased for small sample sizes.

Due to the reliability issues associated with the Wald Statistic, the preferred method to evaluate the significance of logistic regression variable coefficients is the Likelihood Ratio calculated for each coefficient.

 

The Likelihood Ratio

The Likelihood Ratio is a statistical test that compares the likelihood of obtaining the data using a full model with the likelihood of obtaining the same data with a model that is missing the coefficient being evaluated. The Likelihood Ratio for logistic regression is a Chi-Square test that compares the goodness of fit of two models when one of the models is a subset of the other.

The general formula for the Likelihood Ratio is as follows:

Likelihood RatioReduced_Model = -2*MLLReduced_Model + 2*MLLFull_Model

MLLFull_Model is equal to the MLLm that was initially calculated to determine the values of all coefficients b1, b2, …, bk that create the most accurate P(X). This has already been calculated to equal the following:

MLLm = -6.6545

MLLReduced_Model is simply the calculation of MLL with the coefficient being evaluated set to zero.

For example, MLLb1=0 would be the MLL calculated with coefficient b1 set to zero.

MLLb2=0 would be the MLL calculated with coefficient b\2 set to zero.

The Likelihood Ratio is approximately distributed according to the Chi-Square distribution with the degrees of freedom equal to number of coefficients that have been set to zero. This will be one.

The p value of the Likelihood Ratio determines whether the removal of the coefficient made a real difference. If the p value is lower than the specified level of significance (usually 0.05) the coefficient is considered significant.

The p value of the Likelihood Ratio is calculated with the following Excel formula:

p Value = CHISQ.DIST.RT(MLLReduced_Model,1)

 

Calculating the Likelihood Ratio to Determine Whether Coefficient b1 Is Significant With Excel Solver

The Solver will be used to calculate MLLb1=0. The p Value of MLLb1=0 (CHISQ.DIST.RT(MLLb1=0,1) will determine whether coefficient b1 is significant. Setting the value of coefficient b1 to zero before calculating MLLb1=0 with the Solver is done as follows:

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics
(Click On Image To See a Larger Version)

The Solver dialogue is configured as follows to calculate MLLb1=0. Note that b1 (cell C3) is no longer a Solver Decision Variable.

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics (Click On Image To See a Larger Version)

Running the Solver produces the following calculation of MLLb1=0.

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics (Click On Image To See a Larger Version)

MLLm = MLLFull_Model = -6.6546

MLLb1=0 = MLLReduced_Model = -10.9104

Likelihood RatioReduced_Model = -2*MLLReduced_Model + 2*MLLFull_Model

Likelihood Ratio b1 = LR b1 = -2*MLLb1=0 + 2*MLLm

LR b1 = 8.5117

This statistic is distributed according to the Chi-Square distribution with its degrees of freedom equal to the difference between the number of variables in the full model and the reduced model. In this case that difference is one variable so df = 1.

p Value = CHIDIST(LR b1,1) = CHIDIST(8.5117,1) = 0.0035

The very low p Value indicates that LR b1 is statistically significant. Variable b1 is therefore significant.

 

Calculating the Likelihood Ratio to Determine Whether Coefficient b2 Is Significant With Excel Solver

The Solver will be used to calculate MLLb2=0. The p Value of MLLb2=0 (CHISQ.DIST.RT(MLLb2=0,1) will determine whether coefficient b2 is significant. Setting the value of coefficient b\2 to zero before calculating MLLb2=0 with the Solver is done as follows:

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics
(Click On Image To See a Larger Version)

The Solver dialogue is configured as follows to calculate MLLb2=0. Note that b2 cell (C4) is no longer a Solver Decision Variable.

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics
(Click On Image To See a Larger Version)

Running the Solver produces the following calculation of MLLb2=0.

wald statistic, likelihood ratio, logistic regression, excel, excel 2010, excel 2013, statistics
(Click On Image To See a Larger Version)

MLLm = MLLFull_Model = -6.6546

MLLb2=0 = MLLReduced_Model = -9.5059

Likelihood RatioReduced_Model = -2*MLLReduced_Model + 2*MLLFull_Model

Likelihood Ratio b1 = LR b2 = -2*MLLb2=0 + 2*MLLm

LR b2 = 5.7025

This statistic is distributed approximately according to the Chi-Square distribution with its degrees of freedom equal to the difference between the number of variables in the full model and the reduced model. In this case that difference is one variable so df = 1.

p Value = CHISQ.DIST.RT(LR b2,1) = CHISQ.DIST.RT(5.7025,1) = 0.0169

The very low p Value indicates that LR b2 is statistically significant. Variable b2 is therefore significant.

http://excelmasterseries.com/ClickBank/Thank_You_New_Manual_Order/Thank-You-For-Your-Order-Optimization.php

 

No comments:

Post a Comment