Binary Logistic Regression
Overview
Binary logistic regression is a predictive technique that is applied when the dependent variable (y) is dichotomous (binary), i.e., there are only two possible outcomes. Binary logistic regression calculates the probability of the event designated as the positive event occurring.
Logistic regression is widely used in many fields. Engineers often use logistic regression to predict the probability of a system or part failing. Marketers use logistic regression to calculate the probability of a prospective customer making a purchase or a subscriber cancelling a subscription. Bankers might use logistic regression to calculate the probability of a homeowner defaulting on a mortgage. Doctors use logistic regression to calculate the probability of a patient surviving trauma or serious disease.
Binary logistic regression is sometimes called Dummy Dependent Variable Regression because its binary dependent variable resembles a dummy variable. Dummy variables are binary variables that must be substituted when categorical independent variables are used as inputs to multiple linear regression. Multiple linear regression requires that independent variables be continuous or binary, so categorical independent variables must be converted to binary dummy variables before they can serve as inputs. Another chapter in this book covers this type of dummy variable regression in detail.
The Goal of Binary Logistic Regression
The goal of binary logistic regression analysis is to create an equation, P(X), that most accurately calculates the probability of the occurrence of binary event X for a given set of inputs X1, X2, …, Xk.
Variable Y describes the observed occurrence of event X. Y takes the value of 1 when event X actually occurred and the value of 0 when event X did not occur for a given set of inputs X1, X2, …, Xk.
P(X) should calculate a probability close to 1 as often as possible for any given set of inputs for which event X occurred (Y = 1). P(X) should also calculate a probability close to 0 as often as possible for any given set of inputs for which event X did not occur (Y = 0).
Allowed Variable Types For Binary Logistic Regression
The dependent variable of binary logistic regression is a categorical variable with two possible outcomes.
The independent variables (the inputs, a.k.a. the predictor variables) can be any of the four variable types: nominal, ordinal, interval, and ratio.
Nominal variables are categorical and are simply arbitrary labels whose order doesn’t matter.
Ordinal variables are categorical variables whose order has meaning but the distance between units is usually not measurable.
Interval variables have measurable distance between units and a zero point that is arbitrarily chosen. Fahrenheit and Celsius temperatures are interval data.
Ratio variables have measurable distance between units and a zero point that indicates that there is none of the variable present. Absolute temperature is an example of ratio data.
Logistic Regression Calculates the Probability of an Event Occurring
Logistic regression calculates the probability of the positive event (the event whose observed occurrence is designated by Y = 1) occurring for a given set of inputs X1, X2, …, Xk.
Binary logistic regression therefore calculates the following conditional probability:
Pr(Y=1 | X1, X2, …, Xk)
This is the probability that the actual observed output, Y, equals 1 given the inputs X1, X2, …, Xk.
The Difference Between Linear Regression and Logistic Regression
Linear regression requires that the dependent variable (y) be continuous. The dependent variable of binary logistic regression is binary and is therefore not continuous. Logistic regression is a method for calculating a continuous probability for a discontinuous, binary event. A brief description of how that continuous probability is created follows.
The Relationship Between Probability and Odds
Event X is the event whose actual occurrence is designated by Y = 1. The probability of event X occurring is given as P(X). The odds of event X occurring are given as O(X). The “X” is somewhat of a strange variable name in P(X), O(X), and Event X because it is not related to the logistic regression inputs X1, X2, … , Xk.
The relationship between the probability of event X occurring and the odds of event X occurring is given as follows:
O(X) = P(X) / (1 – P(X))
For example, if the probability of event X occurring is 75 percent, the odds of event X occurring are 3-to-1 (0.75 / 0.25 = 3).
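As a quick check of this relationship, here is a minimal Python sketch; the function names are illustrative, not from any particular library:

```python
def odds_from_probability(p):
    """O(X) = P(X) / (1 - P(X))"""
    return p / (1.0 - p)

def probability_from_odds(o):
    """P(X) = O(X) / (1 + O(X)), the inverse of the conversion above"""
    return o / (1.0 + o)

print(odds_from_probability(0.75))   # 3.0, i.e., odds of 3-to-1
print(probability_from_odds(3.0))    # 0.75
```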
The odds, O(X), of discontinuous, binary event X occurring can be expressed as a continuous variable by taking the natural log of the odds. A complicated derivation proving this will not be shown here.
The Logit – The Natural Log of the Odds
The natural log of the odds is called the Logit, L (pronounced LOH-jit), and is calculated as follows:
Given the following k inputs, X1, X2, …, Xk, and the following k + 1 constants, b0, b1, b2, …, bk, the Logit equals the following:
ln[O(X)] = Logit = L = b0 + b1X1 + b2X2 + … + bkXk
Since ln[O(X)] = Logit = L, O(X) therefore equals e^L:
O(X) = e^L = e^(b0 + b1X1 + b2X2 + … + bkXk)
Since O(X) = P(X) / (1 – P(X)), simple algebra can be applied to define P(X) as follows:
P(X) = O(X) / (1 + O(X))
or
P(X) = e^L / (1 + e^L)
With algebraic manipulation, this can also be expressed as follows for occasions when this form is simpler to work with:
P(X) = 1 / (1 + e^(-L))
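As an illustration, here is a minimal Python sketch of these two formulas; the coefficient and input values are made up purely for demonstration:

```python
import math

def logit(b, x):
    """Logit L = b0 + b1*X1 + ... + bk*Xk for coefficients b = [b0, b1, ..., bk]."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

def p_of_x(b, x):
    """P(X) = 1 / (1 + e^(-L)), the logistic (sigmoid) function of the Logit."""
    return 1.0 / (1.0 + math.exp(-logit(b, x)))

b = [-1.5, 0.8, 0.3]    # made-up b0, b1, b2
x = [2.0, 1.0]          # one input record: X1, X2
print(p_of_x(b, x))     # ≈ 0.599, the estimated probability that Y = 1
```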
Keep in mind that P(X) is the conditional probability Pr(Y=1 | X1, X2, …, Xk).
Showing How Closely The Predicted Value Matches The Actual Value
P(X) is the estimated probability of Event X occurring. Variable Y records the actual occurrence of Event X. The goal of binary logistic regression analysis is to create an equation P(X) that most closely matches Y for each set of inputs X1, X2, …, Xk.
P(X) should calculate a probability close to 1 as often as possible for any given set of inputs for which event X occurred (Y = 1). P(X) should also calculate a probability close to 0 as often as possible for any given set of inputs for which event X did not occur (Y = 0).
The conditional probability Pr(Yi=yi | X1i, X2i, …, Xki) is the probability that the predicted dependent variable yi equals the actual observed value Yi, given the values of the independent variable inputs X1i, X2i, …, Xki.
The conditional probability Pr(Yi=yi | X1i, X2i, …, Xki) will be abbreviated Pr(Y=y|X) from here forward for convenience.
The conditional probability Pr(Y=y|X) is calculated by the following formula:
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)
Pr(Y=y|X) can take values between 0 and 1 just like any other probability.
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y) is maximized (approaches 1) when P(X) matches Y:
In other words, Pr(Y=y|X) is maximized (approaches 1) when either of the following occur:
1) Y = 1 and P(X) approaches 1
2) Y = 0 and P(X) approaches 0
To demonstrate this, here are several scenarios. In the first two scenarios Y and P(X) are nearly the same and Pr(Y=y|X) is maximized (approaches 1):
Y = 1 and P(X) = 0.995:
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)
Pr(Y=y|X) = 0.995^1 * [1-0.995]^(1-1) = 0.995

Y = 0 and P(X) = 0.005:
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)
Pr(Y=y|X) = 0.005^0 * [1-0.005]^(1-0) = 0.995
In the third scenario Y and P(X) are very different and Pr(Y=y|X) is not maximized (does not approach 1):
Y = 0 and P(X) = 0.45
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)
Pr(Y=y|X) = 0.45^0 * [1-0.45]^(1-0) = 0.55
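The three scenarios above can be reproduced with a short Python sketch of this formula:

```python
def bernoulli_term(y, p):
    """Pr(Y=y|X) = P(X)^Y * [1 - P(X)]^(1 - Y) for a single data record."""
    return (p ** y) * ((1.0 - p) ** (1 - y))

print(bernoulli_term(1, 0.995))   # 0.995 -- P(X) closely matches Y = 1
print(bernoulli_term(0, 0.005))   # 0.995 -- P(X) closely matches Y = 0
print(bernoulli_term(0, 0.45))    # 0.55  -- P(X) is far from Y = 0
```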
LE - The Likelihood Estimation
As explained, the following equation is maximized (approaches 1) when P(X) matches Y:
Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)
If that conditional probability were calculated for each data record (each set of inputs and the associated output, Y), the product of all of these conditional probabilities is called the Likelihood Estimation, LE. The Likelihood Estimation is given by the following formula:
Likelihood Estimation = LE = ∏ Pr(Yi=yi|Xi)
LE = ∏ P(Xi)^Yi * [1-P(Xi)]^(1-Yi)
In simple language, the LE is equal to the product of all of the P(X)^Y * [1-P(X)]^(1-Y) terms, one calculated for each data record.
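Here is a minimal sketch of the LE calculation over a small, made-up set of data records (each record pairs the observed Y with the predicted P(X)):

```python
# Each record is (Y, P(X)): the observed outcome and the predicted probability
records = [(1, 0.90), (0, 0.20), (1, 0.70), (0, 0.05)]

le = 1.0
for y, p in records:
    le *= (p ** y) * ((1.0 - p) ** (1 - y))   # one Pr(Y=y|X) term per record

print(le)   # 0.9 * 0.8 * 0.7 * 0.95 = 0.4788
```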
MLE – The Maximum Likelihood Estimation
The goal of binary logistic regression analysis is to create an equation P(X) that most accurately calculates the probability of the occurrence of binary event X for a given set of inputs X1, X2, …, Xk:
P(X) = e^L / (1 + e^L)
Logit = L = b0 + b1X1 + b2X2 + … + bkXk
The highest possible value of the Likelihood Estimation, LE, is called the Maximum Likelihood Estimation, the MLE. The specific P(X) equation that maximizes the Likelihood Estimation, LE, to produce the Maximum Likelihood Estimation, the MLE, is the most accurate predictive equation.
The goal is therefore to determine the values of the constants b0, b1, b2, …, bk that create an equation P(X) that maximizes the LE, producing the MLE.
LL - The Log-Likelihood Function
The Likelihood Estimation has been given by the following formula:
LE = ∏ P(Xi)^Yi * [1-P(Xi)]^(1-Yi)
Taking the natural logarithm, ln(), of both sides of that equation creates LL, the Log-Likelihood Function. The formula for the Log-Likelihood Function is as follows:
ln[ LE ] = LL = ln[ ∏ P(Xi)^Yi * [1-P(Xi)]^(1-Yi) ]
LL = ∑ Yi*ln[P(Xi)] + (1 – Yi)*ln[1-P(Xi)]
This is due to the following property of logarithms:
ln(a^b * c^d) = b*ln(a) + d*ln(c)
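As a check, a Python sketch shows that the LL computed as a sum of logs equals the natural log of the LE computed as a product (the records are the same made-up values used earlier):

```python
import math

records = [(1, 0.90), (0, 0.20), (1, 0.70), (0, 0.05)]   # (Y, P(X)) pairs

# LL as a sum: sum of Yi*ln[P(Xi)] + (1 - Yi)*ln[1 - P(Xi)]
ll = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y, p in records)

# LE as a product, then logged
le = 1.0
for y, p in records:
    le *= (p ** y) * ((1.0 - p) ** (1 - y))

print(ll)            # ≈ -0.7365
print(math.log(le))  # the same value: ln(LE) = LL
```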
MLL – Maximum Log Likelihood Function
It is often more convenient to work with the logarithm of a number than with the number itself. That is the case here. Each LE term, P(Xi)^Yi * [1-P(Xi)]^(1-Yi), is a value between zero and one. The MLE is equal to the maximum possible ∏ P(Xi)^Yi * [1-P(Xi)]^(1-Yi). The product of a large number of such terms, e.g., 1,000 of them, each between zero and one, would produce an unwieldy, extremely small number.
A better solution is to maximize the natural log of the LE. Maximizing the log of the LE involves calculating a sum of terms rather than a product of terms. Maximizing a sum of small terms is much more convenient than maximizing a product of small terms.
The Log-Likelihood Function, LL, is given as follows:
LL = ∑ Yi*ln[P(Xi)] + (1 – Yi)*ln[1-P(Xi)]
The Maximum Log-Likelihood Function, MLL, is the maximum possible value of LL.
The LE is maximized when its natural log, the LL, is maximized, since the logarithm is a monotonically increasing function. Two variables are monotonic if they always move either in the same direction or in opposite directions. Two variables are monotonically increasing if one variable always increases when the other increases. Variables X and ln(X) are monotonically increasing because ln(X) always increases when X increases. The maximum value of X will produce the maximum value of ln(X) and vice versa.
The parameters that produce the MLE (the Maximum Likelihood Estimation) also produce the MLL (the Maximum Log-Likelihood Function). In other words, the values of the constants b0, b1, b2, …, bk that create an equation P(X) that maximizes the LE to create the MLE are the same constants that maximize the LL to produce the MLL.
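A tiny numerical illustration of this equivalence, using made-up LE values for three hypothetical sets of coefficients:

```python
import math

# Made-up LE values for three hypothetical sets of coefficients b0, b1, ..., bk
le_values = {"b_set_A": 0.20, "b_set_B": 0.47, "b_set_C": 0.31}
ll_values = {name: math.log(le) for name, le in le_values.items()}

# The coefficient set with the largest LE also has the largest LL
print(max(le_values, key=le_values.get))   # b_set_B
print(max(ll_values, key=ll_values.get))   # b_set_B
```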
Using the Excel Solver To Calculate the MLL and the Optimal P(X)
The coefficients b0, b1, b2, …, bk that produce the MLL are the same coefficients that produce the most accurate predictive equation P(X). The ultimate goal of binary logistic regression is to produce the most accurate predictive equation P(X). The Excel Solver is a quick and easy way to calculate the values of the coefficients b0, b1, b2, …, bk that produce the MLL, the Maximum Log-Likelihood Function.
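Outside of Excel, the same optimization can be sketched in Python, with scipy.optimize.minimize standing in for the Solver; the training data below is made up, and minimizing the negative LL is equivalent to maximizing the LL:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up training data: each row of X is one record's inputs (X1, X2),
# and y holds the observed binary outcomes
X = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0],
              [4.0, 3.0], [2.0, 4.0], [3.0, 3.0]])
y = np.array([0, 1, 0, 1, 0, 1])

def neg_log_likelihood(b):
    """Negative LL; minimizing this maximizes the Log-Likelihood Function."""
    L = b[0] + X @ b[1:]                  # Logit: b0 + b1*X1 + b2*X2
    p = 1.0 / (1.0 + np.exp(-L))          # P(X) for every record
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(3))   # start from b = 0
print(result.x)     # the fitted coefficients b0, b1, b2
print(-result.fun)  # the MLL, the Maximum Log-Likelihood
```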
Working step-by-step through the example in the following blog article will provide clarity to what has just been covered in this article.