Tuesday, March 9, 2010

Using Dummy Independent Variable Regression in Excel in 7 Steps To Perform Basic Conjoint Analysis

Overview of Dummy Independent Variable Regression

Dummy independent variable regression is a technique that allows linear regression to be performed when one or more of the input independent variables are categorical. Categorical variables cannot act as the input independent variables in a linear regression analysis in their current form as nominal variables. Nominal variables are simply categorical labels that provide no indication of relative value or importance.

The categorical variables can be used as inputs to a linear regression analysis if each categorical variable is converted to dummy variables that are binary, i.e., can only take the value of either 1 or 0. The number of binary variables for each choice category will equal the number of choices available for that category.

One dummy variable from each choice category must be discarded as an input for the linear regression analysis. The values of the independent variables of a regression should not be predictable from the values of the other independent variables. An error called multicollinearity occurs if the value of any independent variable can be predicted from the values of the other independent variables. If every dummy variable of a category were kept, exactly one of them would equal 1 in each record, so the value of any one could be calculated from the values of the others.

If one level of each attribute is removed, it is no longer possible to predict exactly the values of the remaining dummy variables of that attribute. It does not matter which dummy variable from each choice category is removed. Removing one level of each attribute does not affect the accuracy of the regression analysis, as will be demonstrated at the end of this article.
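The dependency between dummy variables can be illustrated with a minimal Python sketch. The six sample records are hypothetical and are not taken from this article's data; only the three brand levels match the example used later.

```python
# Hypothetical sample: the brand shown on six cards (not the article's data).
brands = ["A", "B", "C", "A", "B", "C"]
levels = ["A", "B", "C"]

# One binary dummy per level: 1 if the card shows that level, otherwise 0.
full = [{lvl: int(b == lvl) for lvl in levels} for b in brands]

# With all levels kept, the dummies in every row sum to exactly 1, so any one
# of them can be predicted from the others: A = 1 - B - C.
row_sums = [sum(row.values()) for row in full]
recovered_a = [1 - row["B"] - row["C"] for row in full]

# Dropping one level (here Brand A, the baseline) removes that dependency.
reduced = [{lvl: row[lvl] for lvl in levels[1:]} for row in full]

print(row_sums)     # [1, 1, 1, 1, 1, 1]
print(recovered_a)  # [1, 0, 0, 1, 0, 0]
print(reduced[1])   # {'B': 1, 'C': 0}
```

Because the Brand A column can be rebuilt from the other two, keeping it would add no information to the regression.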

The independent variables in a linear regression analysis can be both binary dummy variables and continuous variables. The number of choices for each category should be relatively small, or the regression analysis will quickly become unmanageably large because of the number of dummy variables needed.

Dummy Dependent Variables

Linear regression can be performed if the independent variables are categorical by applying the dummy variable conversion described in this article. Linear regression cannot be performed if the dependent (Y) variable is categorical.

The simplest case of a categorical dependent variable is a binary dependent variable. An example might be an attempt to use independent variables to predict the outcome of a binary event, such as a potential customer making a purchase or not. The technique to be applied in this circumstance is called Binary Logistic Regression. Here is a link to a series of articles in this blog which explain how this technique can be performed in Excel:

http://blog.excelmasterseries.com/2014/06/logistic-regression-overview.html

Overview of Conjoint Marketing Analysis

Conjoint analysis is a statistical technique employed in market research to create an equation that can be used to predict the degree of preference that people have for different combinations of product attributes. Conjoint analysis also enables market researchers to determine the relative level of importance that consumers place on attribute choice categories and on the individual choices available in each category.

A product can be described by the attribute choices available to the consumer. At its most basic level, conjoint analysis requires that a test subject assign a preference rating to every possible combination of attribute choices available for a product. In this article the preference rating scale goes from 1 (lowest preference) to 10 (highest preference).

The information obtained from this consumer test can be directly analyzed with linear regression if the categorical choices are converted to binary dummy variables. The resulting binary dummy variables can be used as part of the set of input independent variables.

The output of this linear regression analysis is a regression equation that can be used to predict the test respondent’s preference rating for any combination of attribute choices. The coefficients of the regression equation indicate the relative degree of importance that the test respondent places on each of the attribute choices.
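One common convention in conjoint analysis, which this article does not itself state and which is used here as an assumption, measures the importance of an attribute as the range of its coefficients divided by the sum of the ranges of all attributes. The sketch below applies that convention to the coefficients reported in the regression output in Step 7, with each omitted baseline level counted as 0.

```python
# Coefficients from the regression output in Step 7; the omitted baseline
# level of each attribute (Brand A, Red, $50) has an implied coefficient of 0.
part_worths = {
    "Brand": {"A": 0.0, "B": 1.67, "C": 3.5},
    "Color": {"Red": 0.0, "Blue": 1.33},
    "Price": {"$50": 0.0, "$100": -2.17, "$150": -4.17},
}

# Range of each attribute = highest coefficient minus lowest coefficient.
ranges = {attr: max(v.values()) - min(v.values())
          for attr, v in part_worths.items()}
total = sum(ranges.values())

# Each attribute's share of the total range, as a percentage.
importance = {attr: round(100 * r / total, 1) for attr, r in ranges.items()}
print(importance)  # {'Brand': 38.9, 'Color': 14.8, 'Price': 46.3}
```

Under this convention, price and brand would matter far more to this respondent than color.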

The following describes the 7-step process of using dummy independent variable regression to perform a very basic Conjoint analysis:

Step 1 – List All Attributes

List all of the available choices that a consumer has for one product. Start by listing the overall attribute categories. In this case the attribute categories are brand, color, and price. Then list all of the available choices within each attribute category as follows:

Brand: Brand A, Brand B, Brand C
Color: Red, Blue
Price: $50, $100, $150

Step 2 – List All Possible Combinations of Attributes

Every possible combination of attributes should be listed. In actual Conjoint Analysis each unique combination of attributes is placed on a separate card.
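The full set of cards can be generated programmatically. This is a minimal Python sketch; the attribute levels are the ones that appear in the regression equation later in this article, and the variable names are illustrative.

```python
from itertools import product

brands = ["A", "B", "C"]
colors = ["Red", "Blue"]
prices = [50, 100, 150]

# One tuple per card: every brand paired with every color and every price.
cards = list(product(brands, colors, prices))

print(len(cards))   # 18
print(cards[0])     # ('A', 'Red', 50)
print(cards[-1])    # ('C', 'Blue', 150)
```

Three brands, two colors, and three prices give 3 x 2 x 3 = 18 unique combinations for the test subject to rate.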

Step 3 – Rate All Combinations

The test subject will then rate each combination on a scale of preference from 1 to 10, with 10 being the most desirable. Placing each unique combination on a separate card facilitates the rating process.

Step 4 – Create Dummy Variables

In this step the categorical variables are converted to binary variables that can now serve as inputs to a linear regression analysis. Each level of each attribute will have its own binary dummy variable as shown below. The number of binary dummy variables for each attribute category will equal the number of choices available for that category. For example, there are three choices of brand, with each choice being assigned to a single binary dummy variable.

One dummy variable from each attribute category should be removed from the analysis. The values of the independent variables of a regression should not be predictable from the values of the other independent variables. An error called multicollinearity occurs if the value of any independent variable can be predicted from the values of the other independent variables.

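The following is the listing of binary dummy variables for each of the attribute choice categories. The dummy variables removed in this example are Brand A, Red, and $50:

Brand: Brand A (removed), Brand B, Brand C
Color: Red (removed), Blue
Price: $50 (removed), $100, $150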
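The conversion described in this step can be sketched in Python as follows. The function name `encode` and the dictionary keys are illustrative choices; the dropped levels (Brand A, Red, $50) are the ones whose coefficients are shown as 0 at the end of this article.

```python
def encode(brand, color, price):
    """Return the five dummies kept after dropping Brand A, Red, and $50."""
    return {
        "Brand B": int(brand == "B"),
        "Brand C": int(brand == "C"),
        "Blue": int(color == "Blue"),
        "$100": int(price == 100),
        "$150": int(price == 150),
    }

# The baseline card (Brand A, Red, $50) is represented by all zeros.
print(encode("A", "Red", 50))
# A card showing Brand C, Blue, $100 switches on three of the five dummies.
print(encode("C", "Blue", 100))
```

Each card becomes one record of five 0/1 values, which is the form needed for the regression input range.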

Step 5 – Arrange Data For Regression Analysis

The remaining dummy variables are input into the regression analysis as the independent variables while the preference rating is input as the dependent variable. Each record of data includes the binary dummy variables and preference rating from one of the cards. The data is arranged as follows:

Step 6 – Perform Regression in Excel

The Excel Regression dialogue box is then completed as follows:

(Image: completed Excel Regression dialogue box)
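For readers who want to check the fit outside Excel, the same ordinary least squares calculation can be sketched with NumPy. The respondent's actual ratings are not reproduced in the text of this article, so the sketch uses SYNTHETIC ratings generated from the coefficients reported in Step 7. It only demonstrates that the fit recovers those coefficients from the dummy variable layout; it does not reproduce the original data.

```python
import numpy as np
from itertools import product

# Published coefficients: intercept, Brand B, Brand C, Blue, $100, $150.
published = np.array([5.61, 1.67, 3.5, 1.33, -2.17, -4.17])

# Design matrix: a column of ones plus the five remaining dummies per card.
rows = []
for brand, color, price in product("ABC", ["Red", "Blue"], [50, 100, 150]):
    rows.append([1, brand == "B", brand == "C", color == "Blue",
                 price == 100, price == 150])
X = np.array(rows, dtype=float)

# SYNTHETIC ratings computed from the published equation, standing in for
# the respondent's actual ratings, which are not available here.
y = X @ published

# Ordinary least squares fit of y on the columns of X.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))
```

With real ratings in `y`, the fitted values in `coef` would correspond to the Intercept and coefficient column of the Excel output.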

Step 7 – Analyze Regression Output

The Excel regression output appears as follows:

(Image: Excel regression output)

The most important parts of the output are highlighted and described as follows:

The regression equation is calculated to be the following:

Preference Rating = 5.61 + 1.67*(Brand B) + 3.5*(Brand C) + 1.33*(Blue) – 2.17*($100) – 4.17*($150)

The value of each of the dummy variables is either 1 or 0 from the input data for each data record.
The relatively high R Square, 0.87, indicates that the regression equation is a good predictor of Preference Rating. Approximately 87 percent of the variance of the Preference Rating is explained by the input variables.
The low Significance of F (which is a p Value) indicates that the overall regression equation is statistically significant.
The low p Values for the Intercept and for the coefficients indicate that each of them is statistically significant.
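The regression equation above can be turned into a small prediction function. This is a minimal Python sketch using the reported coefficients; the function name `predict` and its argument format are illustrative.

```python
def predict(brand, color, price):
    """Predicted preference rating from the regression equation above."""
    return (5.61
            + 1.67 * (brand == "B")
            + 3.5 * (brand == "C")
            + 1.33 * (color == "Blue")
            - 2.17 * (price == 100)
            - 4.17 * (price == 150))

print(round(predict("A", "Red", 50), 2))    # 5.61 (baseline card: intercept only)
print(round(predict("C", "Blue", 50), 2))   # 10.44
print(round(predict("A", "Red", 150), 2))   # 1.44
```

Note that a linear equation is not bounded by the 1 to 10 rating scale: the most preferred combination (Brand C, Blue, $50) is predicted slightly above 10.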

Confirming the Validity of the Dummy Variable Regression Analysis

Plugging the values of the input independent variables for each data record into the regression equation produces the following comparison between the actual Preference Ratings given by the test subject and the Predicted Preference Ratings. The dummy variable regression analysis is seen to be relatively accurate. The removal of one dummy variable for each attribute choice category did not adversely affect the accuracy of the analysis.

The effect of removing a single dummy variable for each attribute choice category was simply to assign the value of 0 to the coefficient that would represent that dummy variable in the overall regression equation. The other coefficients have values relative to that value of 0.

The regression equation is shown by the Excel regression output to be the following:

Preference Rating = 5.61 + 1.67*(Brand B) + 3.5*(Brand C) + 1.33*(Blue) – 2.17*($100) – 4.17*($150)

If the dummy variables that were removed from the analysis were added back to the regression equation, the resulting equation would be the following:

Preference Rating = 5.61 + 0*(Brand A) + 1.67*(Brand B) + 3.5*(Brand C) + 0*(Red) + 1.33*(Blue) + 0*($50) – 2.17*($100) – 4.17*($150)

Both of the above regression equations would produce the same calculation of predicted Preference Rating.
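The equivalence of the two equations can be checked with a short Python sketch. The third function is an added illustration, not part of the Excel output: its coefficients are derived by simple arithmetic from the reported ones and show what the equation would look like if Brand C rather than Brand A had been the removed brand.

```python
def predict_reduced(b, c, blue, p100, p150):
    # Equation as reported, with Brand A, Red, and $50 removed.
    return 5.61 + 1.67*b + 3.5*c + 1.33*blue - 2.17*p100 - 4.17*p150

def predict_full(a, b, c, red, blue, p50, p100, p150):
    # Same equation with the removed dummies added back at coefficient 0.
    return (5.61 + 0*a + 1.67*b + 3.5*c + 0*red + 1.33*blue
            + 0*p50 - 2.17*p100 - 4.17*p150)

def predict_c_baseline(a, b, blue, p100, p150):
    # Derived re-expression with Brand C as the removed brand: the intercept
    # absorbs Brand C's 3.5 and the brand coefficients shift by -3.5.
    return 9.11 - 3.5*a - 1.83*b + 1.33*blue - 2.17*p100 - 4.17*p150

# Card: Brand B, Red, $100
reduced = predict_reduced(b=1, c=0, blue=0, p100=1, p150=0)
full = predict_full(a=0, b=1, c=0, red=1, blue=0, p50=0, p100=1, p150=0)
rebased = predict_c_baseline(a=0, b=1, blue=0, p100=1, p150=0)

print(round(reduced, 2), round(full, 2), round(rebased, 2))  # 5.11 5.11 5.11
```

All three forms give the same prediction, which is why the choice of removed dummy variable does not change the accuracy of the analysis.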

The following image shows the difference between the test respondent’s actual preference ratings for each combination and the preference ratings predicted by the regression equation.

(Image: actual versus predicted preference ratings for each combination)
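The comparison between actual and predicted ratings can be summarized with residuals and R Square. In the Python sketch below, the three actual ratings are HYPOTHETICAL values chosen for illustration, since the respondent's ratings appear only in the image. The predicted values come from the regression equation for the cards named in the comments.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# HYPOTHETICAL ratings for three cards; not the respondent's data.
actual = [10, 2, 6]
# Predictions from the regression equation for (C, Blue, $50),
# (A, Red, $150), and (A, Red, $50).
predicted = [10.44, 1.44, 5.61]

# Residual = actual rating minus predicted rating.
residuals = [a - p for a, p in zip(actual, predicted)]
print([round(r, 2) for r in residuals])        # [-0.44, 0.56, 0.39]
print(round(r_squared(actual, predicted), 2))  # 0.98
```

Applied to all of the respondent's actual ratings, the same calculation should correspond to the R Square value that Excel reports.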


