Sunday, June 1, 2014

This is one of the following seven articles on Logistic Regression in Excel

Binary Logistic Regression in 7 Steps in Excel

The purpose of this example of binary logistic regression is to create an equation that will calculate the probability that a production machine is currently producing output that conforms to desired specifications based upon the age of the machine in months and the average number of shifts that the machine has operated during each week of its lifetime.

Data was collected on 20 similar machines as follows:

1) Whether the machine produces output that meets specifications at least 99 percent of the time. (1 = Machine Meets Spec – It Does Produce Conforming Output at Least 99 Percent of the Time, 0 = Machine Does Not Meet Spec – It Does Not Produce Conforming Output at Least 99 Percent of the Time)

2) The Machine’s Age in Months

3) The Average Number of Shifts That the Machine Has Operated Each Week During Its Lifetime.


Logistic Regression Steps in Excel

Logistic Regression Step 1 – Sort the Data

The purpose of sorting the data is to make data patterns more evident. Using Excel's data sorting tool, perform the primary sort on the dependent variable. In this case, the dependent variable is the response variable indicating whether the machine meets spec. Perform subordinate sorts (secondary, tertiary, etc.) on the remaining variables.

The following data was sorted initially according to the response variable (Y). The secondary sort was done according to Machine Age and the tertiary sort was done according to Average Number of Shifts of Operation Per Week. The results are as follows:


Patterns are evident from the data sort. Machines that did not produce conforming output tended to be the older machines and/or machines that operate during a higher average number of shifts per week.

Logistic Regression Step 2 – Calculate a Logit For Each Data Record

Given the following inputs, X₁, X₂, …, X_k, the Logit equals the following:

Logit = L = b₀ + b₁X₁ + b₂X₂ + …+ b_kX_k

If the explanatory variables are Age and Average Number of Shifts, the Logit, L, is as follows:

Logit = L = b₀ + b₁*Age + b₂*(Average Number of Weekly Shifts)

The Excel Solver will ultimately optimize the variables b₀, b₁, and b₂ in order to create an equation that will accurately predict the probability of a machine producing conforming output given the machine's age and average number of operating shifts per week.

The Decision Variables are the variables that the Solver adjusts during the optimization process. It is a good idea to initially set the Solver Decision Variables so that the resulting Logit is well below 20 for each record, because Logits that exceed 20 cause extreme values to occur in later steps of logistic regression. Here the Decision Variables b₀, b₁, and b₂ have been arbitrarily set to 0.1 before the Solver is run, which initially produces reasonably small Logits as shown next.

A unique Logit is created for each of the 20 data records based on the initial settings of the Decision Variables as follows:
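For readers who want to check the arithmetic outside Excel, the Logit calculation can be sketched in Python. This is only an illustration and not part of the original workbook: the function name `logit` is hypothetical, and the single input record (a 40-month-old machine averaging 7 shifts per week) is borrowed from Scenario 1 in Step 7 rather than taken from the 20-record data set.

```python
def logit(b, x):
    """Return L = b0 + b1*x1 + ... + bk*xk for one data record."""
    b0, *slopes = b
    return b0 + sum(bi * xi for bi, xi in zip(slopes, x))


# Decision Variables at their initial setting of 0.1, as described in the text.
b_initial = [0.1, 0.1, 0.1]

# One record: (age in months, average number of weekly shifts).
# These inputs are borrowed from Scenario 1 for illustration.
record = (40, 7)

L = logit(b_initial, record)
print(round(L, 4))  # 0.1 + 0.1*40 + 0.1*7, well below 20
```

With every coefficient at 0.1, the Logit is simply 0.1 plus one tenth of the sum of the inputs, which is why this starting point keeps the Logits small for machines of moderate age.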


Logistic Regression Step 3 – Calculate e^L For Each Data Record

The number e is the base of the natural logarithm. It is approximately equal to 2.71828183 and is the limit of (1 + 1/n)ⁿ as n approaches infinity. e^L must be calculated for each data record. This step will be shown in the image in the next step, Step 4.

Logistic Regression Step 4 – Calculate P(X) For Each Data Record

P(X) is the probability of event X occurring. Event X occurs when a machine produces conforming output. P(X) is the probability of a machine producing conforming output.

P(X) = e^L / (1 + e^L)

L = Logit = b₀ + b₁*X₁ + b₂*X₂ + …+ b_k*X_k

Calculating e^L and P(X) for each of the data records is done as follows:


e^L can also be calculated in Excel as exp(L).
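The same two calculations can be sketched in Python to show why large Logits are a problem. The function name `probability` is hypothetical, and the three Logit values are chosen only for illustration: 0 as a reference point, 4.8 as a moderate Logit, and 20 as the threshold mentioned in Step 2.

```python
import math


def probability(L):
    """P(X) = e^L / (1 + e^L) for a single Logit L."""
    e_L = math.exp(L)
    return e_L / (1 + e_L)


# Illustrative Logits only; they are not records from the data set.
for L in (0, 4.8, 20):
    p = probability(L)
    # Print the Logit, e^L, P(X) and 1 - P(X) side by side.
    print(L, math.exp(L), p, 1 - p)
```

At a Logit of 20, e^L is in the hundreds of millions and 1 - P(X) is only a few billionths, so ln[1 - P(X)] in Step 5 becomes a very large negative number. Keeping the starting Logits small avoids this.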

Logistic Regression Step 5 – Calculate LL, the Log-Likelihood Function

The conditional probability Pr(Y_i=y_i|X_1i,X_2i,…,X_ki) is the probability that the predicted dependent variable y_i equals the actual observed value Y_i, given the values of the independent variable inputs X_1i,X_2i,…,X_ki.

The conditional probability Pr(Y_i=y_i|X_1i,X_2i,…X_ki) will be abbreviated Pr(Y=y|X) from here forward for convenience.

The conditional probability Pr(Y=y|X) is calculated by the following formula:

Pr(Y=y|X) = P(X)^Y * [1-P(X)]^(1-Y)

Taking the natural log of both sides yields the following:

ln [ Pr(Y=y|X) ] = Y*ln [ P(X) ] + (1-Y)*ln [ 1-P(X) ]

The Log-Likelihood Function, LL, is the sum of the ln [ Pr(Y=y|X) ] terms for all data records as per the following formula:

LL = ∑ [ Y_i*ln [ P(X_i) ] + (1 – Y_i)*ln [ 1-P(X_i) ] ]

Calculating LL is done as follows:
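As a rough cross-check outside Excel, the Log-Likelihood sum can be sketched in Python. The four records below are hypothetical, made up purely to exercise the formula, and are not the 20 machines from this example. The function names are likewise assumptions.

```python
import math


def probability(b, x):
    """P(X) = e^L / (1 + e^L) with L = b0 + b1*x1 + ... + bk*xk."""
    L = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return math.exp(L) / (1 + math.exp(L))


def log_likelihood(b, records):
    """Sum of Y*ln[P(X)] + (1 - Y)*ln[1 - P(X)] over all records."""
    total = 0.0
    for x, y in records:
        p = probability(b, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# HYPOTHETICAL records: ((age in months, average weekly shifts), Y).
records = [((12, 4), 1), ((40, 7), 0), ((30, 5), 1), ((55, 6), 0)]

# LL at the initial Decision Variable setting of 0.1 used in the text.
print(log_likelihood([0.1, 0.1, 0.1], records))
```

Each term is the natural log of a probability, so LL is never positive. Maximizing LL means moving it as close to zero as the data allow.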


Logistic Regression Step 6 – Use the Excel Solver to Calculate MLL, the Maximum Log-Likelihood Function

The objective of Logistic Regression is to find the coefficients of the Logit (b₀, b₁, b₂, …, b_k) that maximize LL, the Log-Likelihood Function in cell H30, to produce MLL, the Maximum Log-Likelihood Function.

The functionality of the Excel Solver is fairly straightforward: the Excel Solver adjusts the numeric values in specific cells in order to maximize or minimize the value in a single other cell.

The cell that the Solver is attempting to maximize or minimize is called the Solver Objective. This is LL in cell H30.

The cells whose values the Solver adjusts are called the Decision Variables. The Solver Decision Variables are therefore in cells C2, C3, and C4. These contain b₀, b₁, and b₂, the coefficients of the Logit. These cells will be adjusted to maximize LL, which is in cell H30.

The Excel Solver is an add-in that is included with most Excel packages. The Solver must be manually activated by the user before it can be utilized for the first time. Different versions of Excel require different methods of activation for the Solver. The best advice is to search Microsoft’s documentation online to locate instructions for activating the add-ins that are included with your version of Excel. YouTube videos are often another convenient source for step-by-step instructions for activating Solver in your version of Excel. Once activated, the Solver is normally found in the Data tab of versions of Excel from 2007 onward that use the ribbon navigation. Excel 2003 provides a link to the Solver in the drop-down menu under Tools.

These Decision Variables and Objective are entered into the Solver dialogue box as follows:


Make sure not to check the checkbox next to Make Unconstrained Variables Non-Negative.
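The maximization that the Solver performs can also be sketched in plain Python for comparison. Note the assumptions: this sketch uses Newton's method rather than the Solver's GRG Nonlinear method, the ten records are hypothetical (they are not the 20 machines in this example), and the starting point is 0 for every coefficient instead of 0.1 because Newton's method is more sensitive to large starting Logits. All function names are illustrative.

```python
import math

# HYPOTHETICAL records: (age in months, average weekly shifts, meets spec).
DATA = [
    (10, 4, 1), (15, 6, 1), (20, 5, 0), (25, 4, 1), (30, 7, 0),
    (35, 5, 1), (40, 6, 0), (45, 4, 0), (50, 5, 1), (55, 7, 0),
]
# A leading 1.0 in each input row pairs with the intercept b0.
ROWS = [((1.0, age, shifts), y) for age, shifts, y in DATA]


def prob(b, x):
    """P(X) = e^L / (1 + e^L), written in the equivalent form 1 / (1 + e^-L)."""
    L = sum(bj * xj for bj, xj in zip(b, x))
    return 1.0 / (1.0 + math.exp(-L))


def log_likelihood(b):
    return sum(y * math.log(prob(b, x)) + (1 - y) * math.log(1 - prob(b, x))
               for x, y in ROWS)


def gradient(b):
    """Partial derivatives of LL: sum of (Y - P(X)) * x_j over all records."""
    return [sum((y - prob(b, x)) * x[j] for x, y in ROWS) for j in range(len(b))]


def solve(A, g):
    """Solve A * step = g by Gauss-Jordan elimination with partial pivoting."""
    n = len(g)
    M = [row[:] + [gi] for row, gi in zip(A, g)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(b, iterations=25):
    """Newton's method: b <- b + (X'WX)^-1 * gradient, with W = P(X)(1 - P(X))."""
    k = len(b)
    for _ in range(iterations):
        A = [[sum(prob(b, x) * (1 - prob(b, x)) * x[i] * x[j] for x, _y in ROWS)
              for j in range(k)] for i in range(k)]
        step = solve(A, gradient(b))
        b = [bj + sj for bj, sj in zip(b, step)]
    return b


b_hat = fit([0.0, 0.0, 0.0])
print("coefficients:", b_hat)
print("LL at start:", log_likelihood([0.0, 0.0, 0.0]))
print("MLL:", log_likelihood(b_hat))
```

At the maximum, every partial derivative of LL is zero, which gives a simple way to check any candidate set of coefficients, including those returned by the Solver: evaluate `gradient` at them and confirm the result is close to zero.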

Excel Solver’s GRG Nonlinear Solving Method

The GRG Nonlinear solving method should be selected if any of the equations involving Decision variables or Constraints is nonlinear and smooth (uninterrupted, continuous, i.e., having no breaks). GRG stands for Generalized Reduced Gradient and is a long-time, proven, reliable method for solving nonlinear problems.

The equations on the path to the calculation of the Objective (maximizing LL) involve the calculation of e^L, P(X), and Pr(Y=y|X). Each of these three equations is nonlinear and smooth. An equation is “smooth” if that equation and the derivative of that equation have no breaks (are continuous). The GRG Nonlinear solving method should therefore be selected.

One way to determine whether an equation or function is non-smooth (the graph has a sharp point indicating that the derivative is discontinuous) or discontinuous (the equation’s graph abruptly changes values at certain points – the graph is disconnected at these points) is to graph the equation over its expected range of values.

The Solver Should Be Run Through Several Trials To Ensure an Optimal Solution

When the Solver runs the GRG algorithm, it starts its calculations from the values currently in the Decision Variable cells. Each time the Solver is re-run, it therefore starts from the solution left by the previous run, which is a slightly different starting point. This is why different answers will often appear after each run of the GRG Nonlinear solving method. The Solver should be re-run several times until the Objective (LL) is not maximized further. This should produce the best locally optimal values of the Decision Variables (b₀, b₁, b₂, …, b_k).

The GRG Nonlinear solving method is guaranteed to produce locally optimal solutions but not globally optimal solutions. The GRG nonlinear solving method will produce a Globally Optimal solution if all functions in the path to the Objective and all Constraints are convex. If any of the functions or Constraints is non-convex, the GRG Nonlinear solving method may find only Locally Optimal Solutions.

A function is convex if it has only one peak either up or down. A convex function can always be solved to a Globally Optimal solution. A function is non-convex if it has more than one peak or is discontinuous. Non-convex problems can often be solved only to Locally Optimal solutions.

A Globally Optimal solution is the best possible solution that meets all Constraints. A Globally Optimal solution might be comparable to Mount Everest since Mount Everest is the highest of all mountains.

A Locally Optimal solution is the best nearby solution that meets all Constraints. It may not be the best overall solution, but it is the best nearby solution. A Locally Optimal solution might be comparable to Mount McKinley, which is the highest mountain in North America but not the highest of all mountains.

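In general, the GRG Nonlinear solving method is only guaranteed to find a Locally Optimal Solution. For binary logistic regression, however, the inputs X₁, X₂, …, X_k are fixed data rather than variables, and LL is a concave function of the coefficients b₀, b₁, …, b_k, so a local maximum of LL is also the global maximum. In practice the Solver can still stop short of that maximum because of its convergence settings or because of extreme values of e^L, so the precautions below remain worthwhile.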

How to Increase the Chance That the Solver Will Find a Globally Optimal Solution

There are three ways to increase the chance that the Solver will arrive at a Globally Optimal solution:

The first is to run the Solver multiple times using different sets of values for the Decision Variables. This option allows you to select initial sets of Decision Variables based on your understanding of the overall problem and is often the best way to arrive at the most desirable solution.

The second way is to select “Use Multistart.” This runs the GRG Solver a number of times and randomly selects a different set of initial values for the Decision Variables during each run. The Solver then presents the best of all of the Locally Optimal solutions that it has found.

The third way is to set constraints in the Solver dialogue box that will force the Solver to try a new set of values. Constraints are limitations manually placed on the Decision Variables. Constraints can be useful if the Decision variables should be limited to a specific range of values. A Globally Optimal solution will not likely be found by applying constraints but a more realistic solution can be obtained by limiting Decision Variables to likely values.
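The idea behind running from several starting points can be illustrated with a small Python sketch. This is not the Solver's Multistart implementation. It uses a made-up one-variable function with two peaks and a simple gradient-ascent climb, and every name in it is hypothetical.

```python
import random


def f(x):
    """Toy objective with two peaks: a lower one near x = -1, a higher one near x = +1."""
    return -(x * x - 1) ** 2 + 0.3 * x


def f_prime(x):
    """Derivative of f, used to decide which direction is uphill."""
    return -4 * x * (x * x - 1) + 0.3


def climb(x, step=0.01, iterations=2000):
    """Simple gradient ascent: follow the slope uphill from the starting point."""
    for _ in range(iterations):
        x += step * f_prime(x)
    return x


def multistart(starts):
    """Run the local search from every start and keep the best result found."""
    return max((climb(x0) for x0 in starts), key=f)


# A start on the left settles on the lower peak, a start on the right on the higher one.
print(climb(-1.5), climb(1.5))

# Random starting points, seeded so that the run is repeatable.
rng = random.Random(0)
starts = [rng.uniform(-2, 2) for _ in range(10)]
best = multistart(starts)
print(best, f(best))
```

A single climb only ever finds the peak nearest its starting point (the Mount McKinley of the analogy above). Keeping the best result over many starts raises the chance of reaching the highest peak, but does not guarantee it.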

Interpreting Excel Solver Results

Running the Solver produces the following results for this problem:


MLL, the Maximum Log-Likelihood was calculated to be -6.654560484 when the constants were adjusted as Solver Decision Variables to the values of:

b₀ = 12.48285608

b₁ = -0.117031374

b₂ = -1.469140055

Logistic Regression Step 7 – Test the Solver Output By Running Scenarios

Validate the output by running several scenarios through the Solver results. Each scenario will employ a different variation of input variables X₁, X₂, …, X_k to produce outputs that should be consistent with the initial data set.

The sort of the initial data showed a pattern that nonconforming product was more likely on older machines and/or machines that were run more often.

The following three scenarios were run:

Scenario 1

Machine Age = 40 months

Average Number of Weekly Shifts = 7

P(X) = Probability of Conforming Output = 8 percent


Scenario 2

Machine Age = 40 months

Average Number of Weekly Shifts = 4

P(X) = Probability of Conforming Output = 87 percent


Scenario 3

Machine Age = 12 months

Average Number of Weekly Shifts = 7

P(X) = Probability of Conforming Output = 69 percent


The outcomes of these three scenarios are consistent with the pattern apparent in the initial sorted data set: nonconforming product was more likely to be produced by older machines and/or machines that were run more often.
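The three scenario probabilities can be reproduced from the reported coefficients with a few lines of Python. The coefficients and scenario inputs are taken from the text above. The function name `p_conforming` is an assumption made for this sketch.

```python
import math

# Coefficients reported by the Solver in "Interpreting Excel Solver Results".
B0, B1, B2 = 12.48285608, -0.117031374, -1.469140055


def p_conforming(age_months, weekly_shifts):
    """P(X) = e^L / (1 + e^L) with L = b0 + b1*Age + b2*Shifts."""
    L = B0 + B1 * age_months + B2 * weekly_shifts
    return math.exp(L) / (1 + math.exp(L))


# (Machine Age in months, Average Number of Weekly Shifts) for Scenarios 1 to 3.
scenarios = [(40, 7), (40, 4), (12, 7)]
percents = [round(100 * p_conforming(age, shifts)) for age, shifts in scenarios]
print(percents)  # [8, 87, 69]
```

Both slope coefficients are negative, so increasing either the age or the number of weekly shifts lowers the Logit and with it the probability of conforming output.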




8 comments:

Anonymous, April 28, 2016 at 12:30 PM
Excellent page - good job

Anonymous, May 10, 2016 at 8:19 AM
Excellent stuff

duncanwil, May 26, 2016 at 9:23 PM
Very Good. I followed a method and it worked. I applied that method to another problem: it failed to work. Your approach is a little different but it works! Thank you.

Unknown, December 7, 2016 at 5:20 AM
you can share with me this file excel

Unknown, February 15, 2017 at 7:47 PM
Thanks for the post, i found it useful

Anonymous, July 21, 2018 at 2:14 PM
One of the best descriptions and walk throughs I have read! Great points in both the usage and theory behind it. Thank you for your tremendous effort!

Anonymous, May 25, 2020 at 5:19 PM
The material looks interesting but none of the images referenced in the tutorial is visible with any browser I have tried so it is a bit hard to follow the steps involved.

Bryson M, June 26, 2024 at 12:58 PM
Great read tthankyou
