This is one of the following seven articles on Logistic Regression in Excel

Logistic Regression in 7 Steps in Excel 2010 and Excel 2013

R Square For Logistic Regression Overview

Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013

# Likelihood Ratio in Excel

Is More Effective Than

Wald Statistic To

Determine if the Variable

Coefficients Are Significant

An essential part of linear regression analysis is the determination of whether the calculated coefficients of input variables are statistically significant. A variable coefficient is considered to be statistically significant if the probability that it has a nonzero value is less than the specified level of significance. This probability is shown in the Excel linear regression output as the P-value next to the coefficient. The normal level of significance is α = 0.05. A P-value of less than 0.05 indicates that the variable coefficient is statistically significant if α = 0.05.

The significance of the variable coefficients b** _{1}**, b

**, …, b**

_{2}**in logistic regression is calculated by different methods.**

_{k}

## The Wald Statistic

Until recently the most common metric used to evaluate the significance of a variable coefficient in binary logistic regression was the Wald Statistic. The Wald Statistic for each coefficient is calculated by the following formula:

Wald Statistic = (Coefficient)** ^{2}** / (Standard Error of Coefficient)

^{2}The Standard errors of the coefficients are equal to the square roots of the diagonals of the covariance matrix of the coefficients. This requires a bit of matrix work to compute. The Wald Statistic is approximately distributed according to the Chi-Square distribution with one degree of freedom. The p Value for the Wald Statistic is calculated as follows:

p Value = CHIDIST(Wald Statistic,1)

In Excel 2010 and later, this formula can be replaced by the following:

p Value = CHISQ.DIST.RT(Wald Statistic, 1)

The coefficient is considered statistically significant is less than the specified level of significance, which is commonly set at 0.05.

The 95 percent confidence interval for the coefficient is calculated in Excel as follows:

95 percent C.I. = Coefficient ± S.E.* NORM.S.INV(1 – α/2)

### Problems With the Wald Statistic

The Wald Statistic is currently the main logistic regression metric of variable coefficient significance calculated by well-known statistical packages such as SAS and SPSS. The reliability of the Wald Statistic is, however, considered questionable. If the case that a large coefficient is produced, the standard error of the coefficient can be inflated. This will result in an undersized Wald Statistic, which could lead to a conclusion that a significant coefficient was not significant. This is a false negative, which is a Type 2 Error. A false positive is a Type 1 Error. An additional reliability issue occurs with the Wald Statistic when sample size is small. The Wald Statistic is often biased for small sample sizes.

Due to the reliability issues associated with the Wald Statistic, the preferred method to evaluate the significance of logistic regression variable coefficients is the Likelihood Ratio calculated for each coefficient.

## The Likelihood Ratio

The Likelihood Ratio is a statistical test that compares the likelihood of obtaining the data using a full model with the likelihood of obtaining the same data with a model that is missing the coefficient being evaluated. The Likelihood Ratio for logistic regression is a Chi-Square test that compares the goodness of fit of two models when one of the models is a subset of the other.

The general formula for the Likelihood Ratio is as follows:

Likelihood Ratio** _{Reduced_Model}** = -2*MLL

**+ 2*MLL**

_{Reduced_Model}

_{Full_Model}MLL** _{Full_Model}** is equal to the MLL

**that was initially calculated to determine the values of all coefficients b**

_{m}**, b**

_{1}**, …, b**

_{2}**that create the most accurate P(X). This has already been calculated to equal the following:**

_{k}MLL** _{m}** = -6.6545

MLL** _{Reduced_Model}** is simply the calculation of MLL with the coefficient being evaluated set to zero.

For example, MLL** _{b1=0}** would be the MLL calculated with coefficient b

**set to zero.**

_{1}MLL** _{b2=0}** would be the MLL calculated with coefficient b

**set to zero.**

_{\2}The Likelihood Ratio is approximately distributed according to the Chi-Square distribution with the degrees of freedom equal to number of coefficients that have been set to zero. This will be one.

The p value of the Likelihood Ratio determines whether the removal of the coefficient made a real difference. If the p value is lower than the specified level of significance (usually 0.05) the coefficient is considered significant.

The p value of the Likelihood Ratio is calculated with the following Excel formula:

p Value = CHISQ.DIST.RT(MLL** _{Reduced_Model}**,1)

### Calculating the Likelihood Ratio to Determine Whether Coefficient b_{1} Is Significant With Excel Solver

The Solver will be used to calculate MLL** _{b1=0}**. The p Value of MLL

**(CHISQ.DIST.RT(MLL**

_{b1=0 }**,1) will determine whether coefficient b**

_{b1=0}**is significant. Setting the value of coefficient b**

_{1}**to zero before calculating MLL**

_{1}**with the Solver is done as follows:**

_{b1=0} *(Click On Image To See a Larger Version)*

The Solver dialogue is configured as follows to calculate MLL** _{b1=0}**. Note that b

**(cell C3) is no longer a Solver Decision Variable.**

_{1} *(Click On Image To See a Larger Version)*

Running the Solver produces the following calculation of MLL** _{b1=0}**.

*(Click On Image To See a Larger Version)*

MLL** _{m}** = MLL

**= -6.6546**

_{Full_Model}MLL** _{b1=0}** = MLL

**= -10.9104**

_{Reduced_Model}Likelihood Ratio** _{Reduced_Model}** = -2*MLL

**+ 2*MLL**

_{Reduced_Model}

_{Full_Model}Likelihood Ratio** _{ b1}** = LR

**= -2*MLL**

_{ b1}**+ 2*MLL**

_{b1=0}

_{m}LR** _{ b1}** = 8.5117

This statistic is distributed according to the Chi-Square distribution with its degrees of freedom equal to the difference between the number of variables in the full model and the reduced model. In this case that difference is one variable so df = 1.

p Value = CHIDIST(LR** _{ b1}**,1) = CHIDIST(8.5117,1) = 0.0035

The very low p Value indicates that LR** _{ b1}** is statistically significant. Variable b

**is therefore significant.**

_{1}

### Calculating the Likelihood Ratio to Determine Whether Coefficient b_{2} Is Significant With Excel Solver

The Solver will be used to calculate MLL** _{b2=0}**. The p Value of MLL

**(CHISQ.DIST.RT(MLL**

_{b2=0 }**,1) will determine whether coefficient b**

_{b2=0}**is significant. Setting the value of coefficient b**

_{2}**to zero before calculating MLL**

_{\2}**with the Solver is done as follows:**

_{b2=0} *(Click On Image To See a Larger Version)*

The Solver dialogue is configured as follows to calculate MLL** _{b2=0}**. Note that b

**cell (C4) is no longer a Solver Decision Variable.**

_{2} *(Click On Image To See a Larger Version)*

Running the Solver produces the following calculation of MLL** _{b2=0}**.

*(Click On Image To See a Larger Version)*

MLL** _{m}** = MLL

**= -6.6546**

_{Full_Model}MLL** _{b2=0}** = MLL

**= -9.5059**

_{Reduced_Model}Likelihood Ratio** _{Reduced_Model}** = -2*MLL

**+ 2*MLL**

_{Reduced_Model}

_{Full_Model}Likelihood Ratio** _{ b1}** = LR

**= -2*MLL**

_{ b2}**+ 2*MLL**

_{b2=0}

_{m}LR** _{ b2}** = 5.7025

This statistic is distributed approximately according to the Chi-Square distribution with its degrees of freedom equal to the difference between the number of variables in the full model and the reduced model. In this case that difference is one variable so df = 1.

p Value = CHISQ.DIST.RT(LR** _{ b2}**,1) = CHISQ.DIST.RT(5.7025,1) = 0.0169

The very low p Value indicates that LR** _{ b2}** is statistically significant. Variable b

**is therefore significant.**

_{2}

## No comments:

## Post a Comment