Showing posts with label regression excel. Show all posts
Showing posts with label regression excel. Show all posts

Wednesday, April 28, 2010

Logistic Regression - How To Predict If a Prospect Will Buy Using Logistic Regression in Excel

Logistic Regression

Analysis in Excel

For Marketing



Wouldn’t it be great if there was a more accurate way to predict whether your prospect will buy rather than just taking an educated guess? Well, there is…if you have enough data on your previous prospects. The tool that makes this possible is called Logistic Regression and can be easily implemented in Excel.




Customer Quality Scores Are Created With Logistic Regression


Marketers use Logistic Regression to rank their prospects with a quality score which indicates that prospect’s likelihood to buy. The more data you’ve collected from previous prospects, the more accurately you’ll be able to use Logistic Regression in Excel to calculate your new prospect’s probability of purchasing.


Here is a video which will show you how to perform Logistic Regression in Excel and why it works. The example that will be presented in the video will also be covered below in the article:




Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel:
(Is Your Sound Turned On?)




What is Logistic Regression?


Logistic Regression calculates the probability of the event occurring, such as the purchase of a product. In general, the thing being predicted in a Regression equation is represented by the dependent variable or output variable and is usually labeled as the Y variable in the Regression equation. In the case of Logistic Regression, this “Y” is binary. In other words, the output or dependent variable can only take the values of 1 or 0. The predicted event either occurs or it doesn’t occur – your prospect either will buy or won’t buy. Occasionally this type of output variable also referred to as a Dummy Dependent Variable.




An Example of Logistic Regression In Action


Here is a marketing example showing how Logistic Regression works. The embedded video walks through this example in Excel as well:


Suppose that you have collected three pieces of data on each of your previous prospects. The data you have collected on each prospect was:


1) The prospect’s age
2) The prospect’s gender (1 = Male and 0 = Female)
3) Whether the prospect purchased or not (Did purchase Y = 1, Did not purchase, Y = 0).
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel






Create the Predictive Equation


With the above data, you could create a predictive equation that would calculate a new prospect’s probability of purchasing by inputting this new prospect’s age and gender. This predictive equation will be in the form of:


P(X) = eL/ (1+eL)


P(X) represents the possibility of event X occurring.




The Logit


Event X is a purchase. In other words, P(X) is the probability that Y = 1.


P(X) has only one variable. That is L, which is called the Logit.


The Logit, L = Constant + A * Age + B * Gender


L, the Logit, has 3 variables: Constant, A, and B. They must be known before P(X) can be calculated. Those 3 variables can be found in Excel by using the Excel Solver. The Excel Solver will find the optimal combination of those 3 variables that causes the resulting P(X) to most accurately predict whether Y = 1 or 0 for all previous prospects.




regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel
Click On Image To See Larger Version



Calculating the Logit Variables - A, B, and Constant


Here’s how the most optimal set of Logit variables (Constant, A, and B) are found in Excel:


Using Excel, each recorded prospect has the following calculation performed:


P(X)Y * [ 1 - P(X) ](1-Y)


The Y refers to Y = 1 if the prospect bought and Y = 0 if the prospect didn’t buy.


The P(X) is the probability of purchase that will be calculated using the equation listed above. In Excel, the P(X) calculation is initially performed by the Excel Solver using Logit variables (Constant, A, and B) which are not optimal. The Excel Solver will then continuously try new combinations of these variables until the optimal P(X) is found.

Optimizing the Logit Variables in the Excel Solver


Here’s how the Excel Solver knows when it has found the correct combinations of these 3 variables so that the resulting P(X) equation most accurately predicts whether Y = 1 or 0:


The equation P(X )Y * [ 1 - P(X) ](1-Y) is maximized when P(X) is most accurate. It approaches it highest value (1) when Y = 1 and P(X) approaches 1. It also approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1 and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0 and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0.


Each prospect has a separate P(X )Y * [ 1 - P(X) ](1-Y) value calculated for him or her.


The sum of each P(X )Y * [ 1 - P(X) ](1-Y) calculation for all prospects is taken.


The only variables that exist when calculating P(X )Y * [ 1 - P(X) ](1-Y) are Y and the variables of P(X), which are Constant, A, and B. Use the Excel Solver, these variable are adjusted until their values maximize the sum of all P(X )Y * [ 1 - P(X) ](1-Y).


regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




The Final, Most Accurate Predictive Equation


When the sum of P(X )Y * [ 1 - P(X) ](1-Y) is maximized, then the final resulting P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.


regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel
Click On Image To See Larger Version



The Excel Solver Dialogue Box
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel


Stated another way, we now have a predictive equation P(X ) which uses the optimal combination of Constant, A, and B which most accurately calculates the probability that Y = 1 given a prospect’s age and gender.


The embedded video provides a clear picture of all of this in action in Excel.


The use of the Excel Solver does require some hand-tweeking to ensure that the most accurate answer is obtained. The video shows an example of this. Ultimately what the Solver is doing is adjusting variables Constant, A, and B to maximize the sum of the column of


P(X )Y * [ 1 - P(X) ](1-Y) equations. The answer obtained by the Solver should maximize that sum and provide realistic answers for the probabilities of each prospect, including the new one.






You'll Have To Tweek the Constraints in the Excel Solver


You’ll probably find that you have to experiment by applying constraints to the variables that Solver is adjusting in order to maximize the target sum. The variables that Solver adjusts are called Decision Variables. Solver allows you to create constraints on the value of any Decision Variable.




Adding a Constraint to the Solver
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




In the video, you will be able to watch how a Decision Variable is constrained to make the final answer more accurate. The Decision Variable called Constant was constrained to always remain above -25 during the Solver analysis. This resulted in the most accurate and realistic maximization of the sum of the P(X )Y * [ 1 - P(X) ](1-Y) equations.




Conclusion - Incredible Predictor but Not the Simplest Analysis


Logistic Regression is not the simplest type of analysis to understand or perform. Hopefully this article and video have provided a much clearer picture for you.

 regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel





If you have any comments, questions, suggestions regarding the use of Logistic Regression, your input is welcome and appreciated.



If You Like This, Then Share It...
Dig this Stumble upon Delicious Technorati Reddit Buzz it Twitthis

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

Wednesday, March 24, 2010

Regression - The Two Crucial Steps to Excel Regression That Most People Skip

Regression Analysis

Done in Excel

2 Most Important Steps



Running a Regression in Excel is fairly easy. So is running one incorrectly. There are two crucial steps that should always be performed on the data before any Regression should be run. Fortunately these two steps are very quick and easy to do in Excel. They are:

1) Graph the Data
2) Run Correlation Analysis On All Variables

Here is a video of this article showing how to perform all four steps to Regression in Excel, including the above two crucial steps at the beginning:


Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does, But Should

(Is Your Sound Turned On?)




Why You Need To Run The 2 Crucial Steps Before Doing Regression


Here’s why you need to run the two crucial steps prior to regressing any data in Excel:



Crucial Step 1) Graphing the Data

Whether or not you are using Excel to run a Regression, you should always graph the data before doing anything else. Eyeballing the data will allow you to quickly determine whether there is any relationship between the independent (input) variables and the dependent (output) variable. You also want to evaluate whether the graph generally appears to be linear or possibly quadratic. Excel’s Regression Tool works well only for reasonably linear data. Eyeballing the data upfront will tell you very quickly whether Excel’s Linear Regression is the right tool for the job.


Graphing The Data To Check If It Is Linear
regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View

The input and output variables will be graphed together. The y-axis of the chart will provide the scale for plotting of those values. The x-axis will provide a measure of whatever continuum was used, e.g. time, to collect the values of all of the variables. Excel’s charting function is the way to go here. The above linked video shows exactly how to chart all the data in Excel.



Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously

There are two good reasons for doing this. First, we want to remove any input variables which are clearly not good predictors of the output variable. Second, we want to make sure that none of the input variables have a high correlation with (are good predictors of) other input variables.

Running Correlation Analysis on the Data To Prevent Collinearity and also To Remove Input Variables That Have Low Correlation With the Output Variable
regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View

Correlation of multiple variables is easily done in Excel using the Correlation Data Analysis tool. The linked video shows exactly how to do that.


Remove Input Variables That Have Low Correlation With Output Variable

After you have run Correlation Analysis on the data, you will want to remove any input variables that have a low correlation with the output variable. A Correlation Coefficient of with an absolute value of less than 0.4 (between -0.4 and +0.4) between the output variable and an input variable indicates that the input variable is not a good predictor of the output. That input variable should be removed from the Regression Analysis. The attached video provides an example of this.



Data Columns Before Removing Input Variable With Low Correlation To Output

regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View


Data Columns After Removing Input Variables With Low Correlation To Output

regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View



Remove Inputs Variables Highly Correlated With Other Input Variables

After looking at the Correlation Coefficients between the input and output variables, look at the Correlation Coefficients between the input variables themselves. You do not want to use pairs of input variables that are good predictors of each other in a Regression. This will cause a Regression error known as Collinearity or Multicollinearity. One variable from any pair of highly-correlated input variables should be removed prior to running the Regression Analysis. Variables can be considered highly-Correlated if the absolute value of their Correlation Coefficient is greater the 0.7 (greater than +0.7 or less than -0.7).



Adding New Input Variables To The Regression Analysis

Here are a few hints about adding new input variables to a Regression Analysis. First, build up a Regression by starting with a small number of input variables and add any new ones one at a time. Second, good new input variables noticeably increase Adjusted R Square and also lower Standard Error without significantly changing the existing Regression Coefficients.

When you are satisfied with the output of the data graph and the Correlation Analysis, go ahead and run the Regression with Excel. An example of how to do this is shown in the above video.


The Excel Regression Dialogue Box
regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View




The final step of Excel Regression is Analysis of the Excel output. Here is a link to another video which shows you how to quickly read the most important parts of the Excel Regression output: http://bit.ly/Quickly-Understanding-Excel-Regression-Output



Excel Regression Output With Color Coding Added
regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel
Click On Image To See Enlarged View



Conclusion - Plotting the Data and Running Correlation Can Be BIG Time Savers

Plotting the data and running Correlation Analysis prior to running a Regression can save you lots of time that you might otherwise have to spend making adjustments to your Regression after running it.



regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel



If you have any comments about this article, feel free to post them right here. Your input and opinions are highly valued!



If You Like This, Then Share It...
Dig this Stumble upon regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel regression, multiple regression, regression model, regression excel, regression analysis, multiple regression, regression coefficient, statistical analysis in excel

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic