Wednesday, April 28, 2010

Logistic Regression - How To Predict If a Prospect Will Buy Using Logistic Regression in Excel

Logistic Regression

Analysis in Excel

For Marketing



Wouldn’t it be great if there was a more accurate way to predict whether your prospect will buy rather than just taking an educated guess? Well, there is…if you have enough data on your previous prospects. The tool that makes this possible is called Logistic Regression and can be easily implemented in Excel.




Customer Quality Scores Are Created With Logistic Regression


Marketers use Logistic Regression to rank their prospects with a quality score which indicates that prospect’s likelihood to buy. The more data you’ve collected from previous prospects, the more accurately you’ll be able to use Logistic Regression in Excel to calculate your new prospect’s probability of purchasing.


Here is a video which will show you how to perform Logistic Regression in Excel and why it works. The example that will be presented in the video will also be covered below in the article:




Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel:
(Is Your Sound Turned On?)




What is Logistic Regression?


Logistic Regression calculates the probability of the event occurring, such as the purchase of a product. In general, the thing being predicted in a Regression equation is represented by the dependent variable or output variable and is usually labeled as the Y variable in the Regression equation. In the case of Logistic Regression, this “Y” is binary. In other words, the output or dependent variable can only take the values of 1 or 0. The predicted event either occurs or it doesn’t occur – your prospect either will buy or won’t buy. Occasionally this type of output variable also referred to as a Dummy Dependent Variable.




An Example of Logistic Regression In Action


Here is a marketing example showing how Logistic Regression works. The embedded video walks through this example in Excel as well:


Suppose that you have collected three pieces of data on each of your previous prospects. The data you have collected on each prospect was:


1) The prospect’s age
2) The prospect’s gender (1 = Male and 0 = Female)
3) Whether the prospect purchased or not (Did purchase Y = 1, Did not purchase, Y = 0).
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel






Create the Predictive Equation


With the above data, you could create a predictive equation that would calculate a new prospect’s probability of purchasing by inputting this new prospect’s age and gender. This predictive equation will be in the form of:


P(X) = eL/ (1+eL)


P(X) represents the possibility of event X occurring.




The Logit


Event X is a purchase. In other words, P(X) is the probability that Y = 1.


P(X) has only one variable. That is L, which is called the Logit.


The Logit, L = Constant + A * Age + B * Gender


L, the Logit, has 3 variables: Constant, A, and B. They must be known before P(X) can be calculated. Those 3 variables can be found in Excel by using the Excel Solver. The Excel Solver will find the optimal combination of those 3 variables that causes the resulting P(X) to most accurately predict whether Y = 1 or 0 for all previous prospects.




regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel
Click On Image To See Larger Version



Calculating the Logit Variables - A, B, and Constant


Here’s how the most optimal set of Logit variables (Constant, A, and B) are found in Excel:


Using Excel, each recorded prospect has the following calculation performed:


P(X)Y * [ 1 - P(X) ](1-Y)


The Y refers to Y = 1 if the prospect bought and Y = 0 if the prospect didn’t buy.


The P(X) is the probability of purchase that will be calculated using the equation listed above. In Excel, the P(X) calculation is initially performed by the Excel Solver using Logit variables (Constant, A, and B) which are not optimal. The Excel Solver will then continuously try new combinations of these variables until the optimal P(X) is found.

Optimizing the Logit Variables in the Excel Solver


Here’s how the Excel Solver knows when it has found the correct combinations of these 3 variables so that the resulting P(X) equation most accurately predicts whether Y = 1 or 0:


The equation P(X )Y * [ 1 - P(X) ](1-Y) is maximized when P(X) is most accurate. It approaches it highest value (1) when Y = 1 and P(X) approaches 1. It also approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1 and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0 and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0.


Each prospect has a separate P(X )Y * [ 1 - P(X) ](1-Y) value calculated for him or her.


The sum of each P(X )Y * [ 1 - P(X) ](1-Y) calculation for all prospects is taken.


The only variables that exist when calculating P(X )Y * [ 1 - P(X) ](1-Y) are Y and the variables of P(X), which are Constant, A, and B. Use the Excel Solver, these variable are adjusted until their values maximize the sum of all P(X )Y * [ 1 - P(X) ](1-Y).


regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




The Final, Most Accurate Predictive Equation


When the sum of P(X )Y * [ 1 - P(X) ](1-Y) is maximized, then the final resulting P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.


regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel
Click On Image To See Larger Version



The Excel Solver Dialogue Box
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel


Stated another way, we now have a predictive equation P(X ) which uses the optimal combination of Constant, A, and B which most accurately calculates the probability that Y = 1 given a prospect’s age and gender.


The embedded video provides a clear picture of all of this in action in Excel.


The use of the Excel Solver does require some hand-tweeking to ensure that the most accurate answer is obtained. The video shows an example of this. Ultimately what the Solver is doing is adjusting variables Constant, A, and B to maximize the sum of the column of


P(X )Y * [ 1 - P(X) ](1-Y) equations. The answer obtained by the Solver should maximize that sum and provide realistic answers for the probabilities of each prospect, including the new one.






You'll Have To Tweek the Constraints in the Excel Solver


You’ll probably find that you have to experiment by applying constraints to the variables that Solver is adjusting in order to maximize the target sum. The variables that Solver adjusts are called Decision Variables. Solver allows you to create constraints on the value of any Decision Variable.




Adding a Constraint to the Solver
regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel




In the video, you will be able to watch how a Decision Variable is constrained to make the final answer more accurate. The Decision Variable called Constant was constrained to always remain above -25 during the Solver analysis. This resulted in the most accurate and realistic maximization of the sum of the P(X )Y * [ 1 - P(X) ](1-Y) equations.




Conclusion - Incredible Predictor but Not the Simplest Analysis


Logistic Regression is not the simplest type of analysis to understand or perform. Hopefully this article and video have provided a much clearer picture for you.

 regression, logistic regression, logit regression, logit, regression model, logit model, logistic regression spss, logistic regression models, statistical analysis in excel





If you have any comments, questions, suggestions regarding the use of Logistic Regression, your input is welcome and appreciated.



If You Like This, Then Share It...
Dig this Stumble upon Delicious Technorati Reddit Buzz it Twitthis

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

Thursday, April 22, 2010

ANOVA - How To Improve Your PPC Marketing Using All 3 Types of ANOVA in Excel

ANOVA Test

Done in Excel

To Improve Marketing


The 3 ANOVA functions in Excel can be real power tools for a marketer, if you know how to use them. Let’s take a look at what they can do for you. Below is a video that will show in detail how to use all 3 types of ANOVA that are built into Excel in order to improve your pay-per-click marketing campaigns. In the video, each of the 3 ANOVA types will be applied to the same data set. The video will demonstrate how the data is input, how the ANOVA is run, and how the output is interpretted for each ANOVA type. The article below also discusses the same information. Here is the video:


Step-By-Step Video Showing How To Improve Your Pay-Per-Click Marketing Using All 3 Types of ANOVA Built Into Excel
(Is Your Sound Turned On?)


What Is ANOVA?

Marketers generally use ANOVA to tell them whether changing one element of a marketing campaign actually had an effect on the outcome. ANOVA testing should be applied only when that element being tested has at least three variations. For example, suppose you wanted to test whether a product’s color makes a difference in sales. To answer that question using ANOVA, your product would have to come in at least three colors to run an ANOVA test.

ANOVA, Analysis of Variance, tells the marketer whether or not the test output had enough variance in it to support the claim that varying the tested element actually did make a difference in the output.


ANOVA Output

As is common in most statistical problems, ANOVA’s output does not provide a definitive Yes or No answer but gives the probability that varying an input had an effect on the output. In ANOVA, this probability is called the Degree of Certainty. ANOVA testing requires a Degree of Certainty to be specified up front.

The ANOVA test answers the question of whether varying an element affected the output within a required degree of certainty. The output of an ANOVA might be explained as follows, “Yes, the variance observed in the output was large enough to conclude within 95% certainty that varying the element being tested did affect the output. We can reject the Null Hypothesis which stated that varying the tested element did not have effect on the output.”




The 3 Types of ANOVA Built Into Excel

The first type is called Single-Factor ANOVA. This is used to test only one input element. This is how the data must be arranged on the Excel spread sheet for this type analysis. The yellow-highlighted cells below are those that are selected as the input for Single-Factor ANOVA in Excel.

Excel Data Ready For Input - Single-Factor ANOVA
(Yellow-Highlighted Cells Are the Input Range)
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View

Excel Output of Single-Factor ANOVA
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View



The second type is called Two-Factor ANOVA Without Replication and tests two elements simultaneously. This is how the data must be arranged on the Excel spread sheet for this type analysis. The yellow-highlighted cells below are those that are selected as the input for Single-Factor ANOVA in Excel.


Excel Data Ready for Input
Two-Factor ANOVA Without Replication
(Yellow-Highlighted Cells Are the Input Range)
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View

Excel Output of
Two-Factor ANOVA Without Replication
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View



The third and final type of built-in ANOVA is called Two-Factor ANOVA With Replication. This ANOVA test simply replicates a two-factor test in at least two places in order to test whether interaction between the two factors also has an affect on the output.


Excel Data Ready for Input
Two-Factor ANOVA With Replication
(Yellow-Highlighted Cells Are the Input Range)
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View




Excel Output of
Two-Factor ANOVA With Replication
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View

In Excel, we are going to conduct one experiment and run the output of that experiment through all three types of ANOVA so you can observe the differences in how the data is inserted and how the Excel output is interpreted in each case. The above video shows the output of this experiment being run through all 3 ANOVA tests in Excel and how the different ANOVA outputs are interpreted.





The ANOVA Test We Ran


In a nutshell, the test is conducted as follows: we are testing ads in a pay-per-click campaign to determine whether varying the headline, ad text, or interaction between headline and ad text affects the click-through rate. Click-Through Rate (CTR) of an ad equals the number of ad impressions divided by the number of clicks that the ad generated.

We will use Excel to apply all three types of ANOVA testing to the same set of output data so that we can observe the differences in how data needs to be input into Excel for each ANOVA type and how the Excel output is interpreted for each case.

Single-Factor ANOVA will be applied to the output to determine if the varying the headlines affected CTR. Two-Factor ANOVA Without Replication will applied to the output to determine if varying headlines OR varying the ad text affected CTR. Two-Factor ANOVA With Replication will be applied to the output to determine whether variation in headlines OR ad text OR interaction between headlines and ad text affected CTR.

Once again. viewing the embedded video is probably the easiest way to quickly understand how data is input into Excel and how the Excel output is interpreted for each test.

In the video, all three types of Excel ANOVA are applied to the same set of output data.


Here Is a Summary of the Test That Was Run:


Step 1) Create 5 Sets of Ad Text For 1 Ad Group

5 different sets of ad text were created to be used within a single ad group. Ads within a single ad group should be tightly focused on a single theme, product, and landing page. We tried as much as possible for that to be the case here. One key to successful ANOVA testing to remove all other variation from the test except the specific elements being tested.


Step 2) Create 3 Sets of Headlines for That Ad Group

3 sets of headlines were created to be used in this ad group and were then combined with each of the 5 sets of ad text. This created a total of 15 possible combinations of ad text / headline.


Step 3) Run All 15 Ad Text / Headline Combinations In 1 Ad Group


Each of these 15 combinations of ad text / headline were run on both the Google and Yahoo pay-per-click search networks under similar conditions until 1 million ad impressions on each network were reached for each of the 15 ad text / headline combinations.


Step 4) Prepare the Test Results For Excel ANOVA

The output of this test was recorded as shown in the video. The linked video shows how output data must be recorded for insertion into Excel ANOVA. Each type of ANOVA in Excel requires the input data to be formatted slightly differently.


Step 5) Run All 3 Types of ANOVA On Test Results

Single-Factor ANOVA was applied to the output from Google to determine within 95% certainty whether varying the headline affects the output. Two-Factor ANOVA Without Replication was applied to the Google data to determine whether varying headline or ad text affects the output. Finally, Two-Factor ANOVA With Replication was applied to the output from both Google and Yahoo to determine whether varying headlines, ad text, or the interaction between headlines and ad text affect the output.


Step 6) Evaluate the Excel Output For Each ANOVA

The Excel output from each ANOVA test run in Excel was then interpreted. Each element being tested (headlines, ad text, and interaction) will have a separate P Value that will appear in the Excel output. The P Value for each tested element can be interpreted as being the probability of the observed variance in the output was due only to chance and not the result of varying the tested element.



Here is the output for the Single-Factor ANOVA test we ran:
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View

Note that the P Value associated with Headlines = 0.001307, less that the Alpha of 0.05. We can therefore reject the Headlines Null Hypothesis and state with 95% certainty that varying the Headlines affected Click-Through Rate.



Here is the output for the Two-Factor ANOVA Without Replication test we ran:
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View

Note that the P Value associated with Headlines (in Blue) is 0.00528, which is less that the Alpha of 0.05. However, the P Value associated with Ad Text (in yellow) is 0.627, which is greater than the Alpha of 0.05.

We can therefore reject the Headlines Null Hypothesis and state with 95% certainty that varying the Headlines affected the CTR. We cannot, howver, reject the Ad Text Null Hypothesis and we cannot state with 95% certainty that varying the Ad Text affected CTR.



Here is the output for the Two-Factor ANOVA With Replication test we ran:
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel
Click On Image To See Enlarged View


Note that the P Value associated with Headlines (in Blue) is 1.18 E-06 which is less that the Alpha of 0.05. The P Value associated with Interaction Between Headlines and Ad Text (in Green) is 0.0402 which is less that the Alpha of 0.05. However, the P Value associated with Ad Text (in yellow) is 0.156, which is greater than the Alpha of 0.05.


We can therefore reject the Headlines Null Hypothesis and state with 95% certainty that varying the Headlines affected the CTR. We can also reject the Interaction Null Hypothesis and state with 95% certainty that varying the Interaction Between Headlines and Ad Text affected the CTR.   We cannot, howver, reject the Ad Text Null Hypothesis and we cannot state with 95% certainty that varying the Ad Text affected CTR.




Example of How To Interpret the P Value

For example, if the P Value associated with varying the headline has a value of 0.010, it means that there is only a 1% chance that the variance in the output could have occurred entirely by random chance and not by varying the headline. In other words, there is a 99% chance the varying the headline affected the output. If the degree of certainty that you required for this ANOVA test was 95%, then alpha equals 0.05 (1 – 95% = 0.05).

In the case that the P Value associated with headline equals 0.01, which is smaller than the alpha of 0.05, you can state that you are at least 95% certain that you can reject the Null Hypothesis which states that varying the headline had no effect on the output.


The Correct Interpretation of the ANOVA Test

The correct interpretation of an ANOVA is to state whether or not you can reject the Null Hypothesis (varying the element did not affect the output), not its corollary that you can now accept the Alternate Hypothesis (varying the element did affect the output).


The Big "However"....

However, we’re in business and not in statistics class, so if the P Value associated with an element is less than alpha, go ahead and state that varying the tested element did affect output within your required degree of certainty.

Once again, the embedded video above which will show you exactly how to setup, run, and interpret all three Excel ANOVA tests on the same set of pay-per-click ad data.

In a Nutshell, Use ANOVA in Excel To Find Out If Changes You Made To Your Marketing Campaigns Really Made a Difference.
anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel



Please feel free to submit any comments you have on this article. Your input is highly valued!



If You Like This, Then Share It...
Dig this anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel anova, analysis of variance, anova testing, one way anova, anova test, 2 way anova, anova spss, two factor anova, anova two way, anova analysis, anova assumption, statistical analysis in excel

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic