# Logistic Regression Analysis in Excel For Marketing

Wouldn’t it be great if there were a more accurate way to predict whether your prospect will buy than just taking an educated guess? Well, there is, if you have enough data on your previous prospects. The tool that makes this possible is called Logistic Regression, and it can be implemented in Excel.

**Customer Quality Scores Are Created With Logistic Regression**

Marketers use Logistic Regression to rank their prospects with a quality score that indicates each prospect’s likelihood of buying. The more data you’ve collected from previous prospects, the more accurately you’ll be able to use Logistic Regression in Excel to calculate a new prospect’s probability of purchasing.

The video below shows how to perform Logistic Regression in Excel and why it works. The example presented in the video is also covered in the article below:

**Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel:**


**What is Logistic Regression?**

Logistic Regression calculates the probability of an event occurring, such as the purchase of a product. In general, the thing being predicted in a Regression equation is represented by the dependent variable or output variable and is usually labeled as the Y variable in the Regression equation. In the case of Logistic Regression, this “Y” is binary. In other words, the output or dependent variable can only take the values of 1 or 0. The predicted event either occurs or it doesn’t occur; your prospect either will buy or won’t buy. This type of output variable is occasionally also referred to as a Dummy Dependent Variable.

**An Example of Logistic Regression In Action**

Here is a marketing example showing how Logistic Regression works. The embedded video walks through this example in Excel as well:

Suppose that you have collected three pieces of data on each of your previous prospects. The data you have collected on each prospect was:

1) The prospect’s age

2) The prospect’s gender (1 = Male and 0 = Female)

3) Whether the prospect purchased or not (did purchase: Y = 1; did not purchase: Y = 0).

**Create the Predictive Equation**

With the above data, you could create a predictive equation that would calculate a new prospect’s probability of purchasing by inputting this new prospect’s age and gender. This predictive equation will be in the form of:

$$P(X) = \frac{e^{L}}{1 + e^{L}}$$

**P(X)** represents the probability of event X occurring.

**The Logit**

Event X is a purchase. In other words, P(X) is the probability that Y = 1.

P(X) has only one variable. That is L, which is called the Logit.
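As a rough illustration of how this equation turns a Logit into a probability, here is a minimal Python sketch. The function name `purchase_probability` and the sample Logit values are assumptions made for this example only; they do not come from the video or the spreadsheet.

```python
import math


def purchase_probability(logit: float) -> float:
    """Convert a Logit L into P(X) = e^L / (1 + e^L)."""
    # For large positive L, e^L overflows a float, so that side uses the
    # algebraically identical form 1 / (1 + e^-L).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_l = math.exp(logit)
    return exp_l / (1.0 + exp_l)


# Illustrative Logit values only: negative L gives P(X) below 0.5,
# L = 0 gives exactly 0.5, and positive L gives P(X) above 0.5.
for logit in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"L = {logit:+.1f} -> P(X) = {purchase_probability(logit):.4f}")
```

Whatever value L takes, the result always stays between 0 and 1, which is what allows it to be read as a probability.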

**The Logit, L = Constant + A * Age + B * Gender**

L, the Logit, contains 3 variables whose values are unknown at the start: Constant, A, and B. (Age and Gender are already known for each prospect.) These 3 values must be known before P(X) can be calculated, and they can be found in Excel by using the Excel Solver. The Excel Solver will find the combination of those 3 variables that causes the resulting P(X) to most accurately predict whether Y = 1 or 0 for all previous prospects.

Here’s how the optimal set of Logit variables (Constant, A, and B) is found in Excel:

Using Excel, each recorded prospect has the following calculation performed:

$$P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$$

The Y refers to Y = 1 if the prospect bought and Y = 0 if the prospect didn’t buy.

P(X) is the probability of purchase calculated using the equation listed above. In Excel, P(X) is initially calculated using starting values of the Logit variables (Constant, A, and B) that are not optimal. The Excel Solver then repeatedly tries new combinations of these variables until the optimal P(X) is found.
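The per-prospect calculation described above can be sketched in Python as follows. The eight prospect records and the starting values for Constant, A, and B are invented for illustration; they are not the data used in the video.

```python
import math

# Hypothetical prospect records: (age, gender, purchased).
# gender: 1 = male, 0 = female; purchased: 1 = bought, 0 = did not buy.
PROSPECTS = [
    (22, 1, 0), (25, 0, 0), (31, 1, 0), (38, 0, 1),
    (42, 1, 0), (47, 0, 0), (53, 1, 1), (60, 0, 1),
]

# Arbitrary starting guesses for the Logit variables, not fitted values.
CONSTANT, A, B = -4.0, 0.1, 0.0


def purchase_probability(age, gender):
    """P(X) = e^L / (1 + e^L) with L = Constant + A * Age + B * Gender."""
    logit = CONSTANT + A * age + B * gender
    return math.exp(logit) / (1.0 + math.exp(logit))


def prospect_term(p, y):
    """P(X)^Y * [1 - P(X)]^(1 - Y) for one prospect."""
    return (p ** y) * ((1.0 - p) ** (1 - y))


# One term per recorded prospect, like one row per prospect in a worksheet.
terms = [prospect_term(purchase_probability(age, gender), y)
         for age, gender, y in PROSPECTS]

for (age, gender, y), term in zip(PROSPECTS, terms):
    print(f"age={age} gender={gender} Y={y} term={term:.4f}")
print(f"sum of terms = {sum(terms):.4f}")
```

Each printed row corresponds to one prospect, and the final line is the total that would change as Constant, A, and B are adjusted.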

**Optimizing the Logit Variables in the Excel Solver**

Here’s how the Excel Solver knows when it has found the correct combinations of these 3 variables so that the resulting P(X) equation most accurately predicts whether Y = 1 or 0:

The expression $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ is maximized when P(X) is most accurate. It approaches its highest value (1) when Y = 1 and P(X) approaches 1. It also approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1 and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0 and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0.
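Because Y can only be 1 or 0, the expression collapses to one of two simple cases, which may make the behavior described above easier to see:

$$
P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)} =
\begin{cases}
P(X) & \text{if } Y = 1 \\
1 - P(X) & \text{if } Y = 0
\end{cases}
$$

As an illustration with made-up numbers: a prospect who bought (Y = 1) and was given P(X) = 0.9 contributes 0.9, while a prospect who did not buy (Y = 0) but was given the same P(X) = 0.9 contributes only 0.1.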

Each prospect has a separate $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ value calculated for him or her.

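The $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values for all prospects are then added together. This sum is the target that the Solver maximizes in this example. (Textbook maximum likelihood estimation maximizes the product of these values, or equivalently the sum of their natural logs, rather than their plain sum.)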

For each prospect, Y, Age, and Gender are fixed by the data, so the only values that can be adjusted when calculating $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ are the variables of P(X), which are Constant, A, and B. Using the Excel Solver, these variables are adjusted until their values maximize the sum of all $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values.
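To give a feel for what "adjusting the variables until the sum is maximized" involves, here is a small Python sketch. It uses a plain coordinate hill climb as a stand-in; this is not the algorithm the Excel Solver itself runs, and the prospect data, function names, and step sizes are all assumptions made for the example.

```python
import math

# Hypothetical records: (age, gender, purchased); gender 1 = male, 0 = female.
PROSPECTS = [
    (22, 1, 0), (25, 0, 0), (31, 1, 0), (38, 0, 1),
    (42, 1, 0), (47, 0, 0), (53, 1, 1), (60, 0, 1),
]


def purchase_probability(logit):
    """P(X) = e^L / (1 + e^L), written so that large |L| cannot overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_l = math.exp(logit)
    return exp_l / (1.0 + exp_l)


def target_sum(coeffs):
    """Sum over all prospects of P(X)^Y * [1 - P(X)]^(1 - Y)."""
    constant, a, b = coeffs
    total = 0.0
    for age, gender, bought in PROSPECTS:
        p = purchase_probability(constant + a * age + b * gender)
        # With Y restricted to 1 or 0 the term is P(X) or 1 - P(X).
        total += p if bought == 1 else 1.0 - p
    return total


def coordinate_search(objective, start, step=1.0, min_step=1e-6,
                      max_rounds=10_000):
    """Nudge one variable at a time, keeping only changes that improve."""
    best = list(start)
    best_value = objective(best)
    for _ in range(max_rounds):
        if step < min_step:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                value = objective(trial)
                if value > best_value:
                    best, best_value, improved = trial, value, True
        if not improved:
            step /= 2.0  # nothing helped at this step size, so refine it
    return best, best_value


start = [0.0, 0.0, 0.0]  # Constant, A, B all start at zero
fitted, fitted_value = coordinate_search(target_sum, start)
print("starting sum:", target_sum(start))
print("fitted Constant, A, B:", fitted)
print("maximized sum:", fitted_value)
```

The search only ever accepts a change that raises the sum, so the final total can never be lower than the starting total. Replacing each term with its natural logarithm would give the log-likelihood that statistical packages conventionally maximize.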

**The Final, Most Accurate Predictive Equation**

Once the sum of $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ is maximized, the final resulting P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.

*(Image: The Excel Solver Dialogue Box)*

Stated another way, we now have a predictive equation **P(X)** that uses the optimal combination of Constant, A, and B to calculate, as accurately as possible, the probability that Y = 1 given a prospect’s age and gender.
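Once Constant, A, and B are known, scoring and ranking new prospects is a short calculation. The sketch below assumes hypothetical fitted values and three made-up prospects; real values would come from your own Solver run.

```python
import math

# Hypothetical fitted values for illustration only.
CONSTANT, A, B = -6.0, 0.12, 0.5


def purchase_probability(age, gender):
    """P(X) for one prospect, with gender coded 1 = male, 0 = female."""
    logit = CONSTANT + A * age + B * gender
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up new prospects: name -> (age, gender).
NEW_PROSPECTS = {
    "prospect_1": (28, 0),
    "prospect_2": (45, 1),
    "prospect_3": (61, 0),
}

# Rank prospects from most to least likely to buy.
ranked = sorted(
    ((name, purchase_probability(age, gender))
     for name, (age, gender) in NEW_PROSPECTS.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: probability of purchase = {score:.3f}")
```

The sorted list is the quality-score ranking: the prospect at the top has the highest calculated probability of purchasing.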

The embedded video provides a clear picture of all of this in action in Excel.

The use of the Excel Solver does require some hand-tweaking to ensure that the most accurate answer is obtained. The video shows an example of this. Ultimately what the Solver is doing is adjusting the variables Constant, A, and B to maximize the sum of the column of $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values. The answer obtained by the Solver should maximize that sum and provide realistic probabilities for each prospect, including the new one.

**You'll Have To Tweak the Constraints in the Excel Solver**

You’ll probably find that you have to experiment by applying constraints to the variables that Solver is adjusting in order to maximize the target sum. The variables that Solver adjusts are called Decision Variables. Solver allows you to create constraints on the value of any Decision Variable.

**Adding a Constraint to the Solver**

In the video, you will be able to watch how a Decision Variable is constrained to make the final answer more accurate. The Decision Variable called Constant was constrained to always remain at or above -25 during the Solver analysis. This resulted in the most accurate and realistic maximization of the sum of the $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values.
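The effect of a lower bound such as Constant >= -25 can be illustrated with a deliberately simple Python sketch. The objective here is a toy function chosen so the result is easy to check, not the article's target sum, and clamping each candidate to the bound is just one simple way to respect a constraint; it is not a description of how the Excel Solver handles constraints internally.

```python
def clamp(value, lower, upper):
    """Force a candidate value back inside its allowed range."""
    return max(lower, min(upper, value))


def bounded_search(objective, start, lower, upper, step=1.0, min_step=1e-6):
    """One-variable hill climb that never leaves [lower, upper]."""
    best = clamp(start, lower, upper)
    best_value = objective(best)
    while step >= min_step:
        improved = False
        for delta in (step, -step):
            trial = clamp(best + delta, lower, upper)
            value = objective(trial)
            if value > best_value:
                best, best_value, improved = trial, value, True
        if not improved:
            step /= 2.0
    return best


def toy_objective(constant):
    """Toy stand-in whose unconstrained peak sits at Constant = -30."""
    return -(constant + 30.0) ** 2


unconstrained = bounded_search(toy_objective, 0.0, float("-inf"), float("inf"))
constrained = bounded_search(toy_objective, 0.0, -25.0, float("inf"))
print("without a constraint:", unconstrained)
print("with Constant >= -25:", constrained)
```

Without the bound the search settles on the toy peak; with the bound it stops at the edge of the allowed range, which is the same kind of effect a constraint on a Decision Variable has.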

**Conclusion - Incredible Predictor but Not the Simplest Analysis**

Logistic Regression is not the simplest type of analysis to understand or perform. Hopefully this article and video have provided a much clearer picture for you.

If you have any comments, questions, or suggestions regarding the use of Logistic Regression, your input is welcome and appreciated.


sorry for the duplicate comment

Thanks so much for such a helpful tutorial!

Why Thanks! Really great to hear that you found it useful :>D

Can you please explain how you determined >=-25 constraint in your example?

Many Thanks!

Is there any way to calculate dispersion statistics, too? Thanks!