Tuesday, May 27, 2014

Basics of Multiple Regression in Excel 2010 and Excel 2013

This is one of the following seven articles on Multiple Linear Regression in Excel

Basics of Multiple Regression in Excel 2010 and Excel 2013

Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013

Multiple Linear Regression’s Required Residual Assumptions

Normality Testing of Residuals in Excel 2010 and Excel 2013

Evaluating the Excel Output of Multiple Regression

Estimating the Prediction Interval of Multiple Regression in Excel

Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel

 

Overview of Multiple Variable Linear Regression in Excel

Linear regression is a statistical technique used to model the relationship between one or more independent, explanatory variables and a single dependent variable. The linear regression type is classified as Simple Linear Regression if there is only a single explanatory variable. The regression type is classified as Multiple Linear Regression if there is more than one explanatory variable.

 

The Regression Equation

The end result of linear regression is a linear equation that models actual data as closely as possible. This equation is called the Regression Equation. The more linear the relationship is between each of the explanatory variables and the single dependent variable, the more closely the Regression Equation will model the actual data.

In the Regression Equation, the variable Y is usually designated as the single dependent variable. The independent explanatory variables are usually labeled X1, X2, …, Xk.

The Regression Equation for multiple regression appears as follows:

Y = b0 + b1X1 + b2X2 + … + bkXk

The Regression Equation for simple regression appears as follows:

Y = b0 + b1X

b0 is the Y-intercept of the Regression Equation.

b1, b2, …, bk are the coefficients of the independent variables.

The most important part of regression analysis is the calculation of b0, b1, b2, …, bk in order to be able to construct the Regression Equation

Y = b0 + b1X for simple regression

or

Y = b0 + b1X1 + b2X2 + … + bkXk for multiple regression.
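In Excel 2010 and Excel 2013, these coefficients do not have to be calculated by hand. As a brief illustration (the cell addresses below are placeholders only), assume the Y values are in cells A2:A11 and a single X variable is in cells B2:B11. The simple regression coefficients can then be calculated with built-in worksheet functions as follows:

=INTERCEPT(A2:A11,B2:B11) returns b0

=SLOPE(A2:A11,B2:B11) returns b1

If the data instead contained three X variables in cells B2:D11, the formula =LINEST(A2:A11,B2:D11), entered as an array formula (Ctrl+Shift+Enter in Excel 2010 and Excel 2013) across a single row of four cells, would return the coefficients in reverse order: b3, b2, b1, and then the Y-intercept b0.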

 

Purposes of Linear Regression

Linear regression, whether simple or multiple, generally has two main uses. They are as follows:

1) To quantify the linear relationship between the dependent variable and the independent variable(s) by calculating a regression equation.

2) To quantify how much of the movement or variation of the dependent variable is explained by the independent variable(s).

 

The Inputs For Linear Regression

The input data for linear regression analysis consists of a number of data records each having a single Y (dependent variable) value and one or more X (explanatory independent variable) values. Simple regression has only a single X variable for each Y value. Multiple regression has more than one X (independent) variable for each Y (dependent) variable.

Each data record occupies its own unique row in the regression input. Each data record contains the specific values of the input (independent) X variables that are associated with a specific value of the dependent Y variable shown in that data record.

The input data for multiple regression analysis appear as separate data records on each row as follows:

Y     X1    X2    …     Xk
4     6     10    …     15
5     7     11    …     16
6     8     12    …     17
7     9     13    …     18
8     10    14    …     19

Multiple linear regression has more than one X (independent) variable. These independent variables (X’s) are known as the explanatory, predictor, or regressor variables. The single dependent variable (Y) is known as the target or outcome variable.

Multiple linear regression requires that both the dependent variable and the independent variables be continuous. If ordinal data such as a Likert scale is used as a dependent or independent variable, it must be treated as a continuous variable that has equal distance between values. Ordinal data is normally defined as data whose order matters but whose differences between values do not.
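As an illustration only, if the data records shown above were typed into a worksheet with the Y values in cells A2:A6 and three X variables in cells B2:D6 (the number of X columns and the cell addresses are merely assumptions for this sketch), Excel's regression tool would use A2:A6 as the Input Y Range and B2:D6 as the Input X Range. The same layout can also be handed directly to the LINEST worksheet function:

=LINEST(A2:A6,B2:D6,TRUE,TRUE)

Entered as an array formula across five rows and four columns, LINEST returns the coefficients in its first row and additional statistics such as the standard errors, R Square, and the F statistic in the rows beneath. The Y range and the X range must contain the same number of rows because each row is one data record.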

 

Null and Alternative Hypotheses

The Null Hypothesis of linear regression states that the coefficient(s) of the independent variable(s) in the regression equation equal(s) zero. The Alternative Hypothesis for linear regression therefore states that at least one of these coefficients does not equal zero.

For multiple linear regression this Null Hypothesis is expressed as follows:

H0: b1 = b2 = … = bk = 0

For simple linear regression this Null Hypothesis is expressed as follows:

H0: b1 = 0

b1 is the slope of the regression line for simple regression.

The Alternative Hypothesis, H1, for linear regression states that at least one of these coefficients does not equal zero.

The Y Intercept b0 is not included in the Null Hypothesis.
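In Excel's regression output, this Null Hypothesis is evaluated by an overall F test (reported as Significance F) and by a t test with a p-value for each individual coefficient. As a rough sketch — F_statistic, k, and n below are placeholders for the calculated F statistic, the number of independent variables, and the number of data records — the overall p-value can also be computed directly in a worksheet cell:

=F.DIST.RT(F_statistic, k, n-k-1)

The Null Hypothesis is rejected when this p-value is smaller than the chosen level of significance, such as 0.05.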

 

X and Y Variables Must Have a Linear Relationship

Linear regression is a technique that provides accurate information only if a linear relationship exists between the dependent variable and each of the independent variables. Independent variables that do not have a linear relationship with the dependent variable should not be included as inputs. An X-Y scatter plot of each independent variable against the dependent variable provides a good indication of whether the relationship is linear.
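A quick numeric supplement to the scatter plot — assuming, purely for illustration, that the dependent variable is in cells A2:A11 and one independent variable is in cells B2:B11 — is the correlation between the two columns:

=CORREL(A2:A11,B2:B11)

=RSQ(A2:A11,B2:B11)

A correlation near +1 or -1 indicates a strong linear relationship, while a correlation near 0 indicates a weak one. RSQ returns the square of that correlation for a single X variable.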

When data are nonlinear, there are often two solutions available that allow regression analysis to be performed. They are the following:

1) Transform the nonlinear data to linear using a logarithmic transformation. This will not be discussed in this section.

2) Perform nonlinear regression on the data. One way to do that is to apply curve-fitting software that will calculate the mathematical equation that most closely models the data. Another section in this book will focus on using the Excel Solver to fit a curve to nonlinear data. The least-squares method is the simplest way to do this and will be employed in this section.

 

Do Not Extrapolate Regression Beyond Existing Data

The major purpose of linear regression is to create a Regression Equation that accurately predicts a Y value based on a new set of independent, explanatory X values. The new set of X values should not contain any X values that are outside of the range of the X values used to create the original regression equation. The following simple example illustrates why a Regression Equation should not be extrapolated beyond the original X values.
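Once the regression has been run, a predicted Y value for a new set of X values can be obtained with the TREND worksheet function. As a sketch only — the cell addresses are assumptions — with the original Y values in A2:A11, the original X values in B2:D11, and the new X values in B15:D15, the prediction would be:

=TREND(A2:A11,B2:D11,B15:D15)

The new X values in B15:D15 should lie inside the range of the original X values in B2:D11, for the reason illustrated by the following example.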

 

Example of Why Regression Should Not Be Extrapolated

Imagine that the height of a boy was measured every month from when the boy was one year old until the boy was eighteen years old. The independent, explanatory X variable would be the month number (12 months to 216 months) and the dependent Y variable would be the height measured in inches. Typically, most boys stop growing in height when they reach their upper teens.

If the Regression Equation was created from the above data and then extrapolated to predict the boy’s height when he reached 50 years of age, the Regression Equation might predict that the boy would be fifteen feet tall.

 

Linear Regression Should Not Be Done By Hand

Excel provides an excellent data analysis regression tool that can perform simple or multiple regression with equal ease. Doing the calculations by hand would be very tedious and would provide many opportunities to make a mistake. Excel produces a very detailed output when the regression tool is run. I have recreated all of the simple regression calculations that Excel performs in this chapter. It will probably be clear from viewing this that it is wise to let Excel do the regression calculations. A number of statistics textbooks probably place too much emphasis on teaching the ability to perform the regression calculations by hand. In the real world regression analysis would never be done manually.
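For reference, the regression tool referred to above is found in Excel 2010 and Excel 2013 at Data > Data Analysis > Regression. If Data Analysis does not appear on the Data tab, the Analysis ToolPak add-in must first be enabled as follows:

1) Click File > Options > Add-Ins.

2) In the Manage box, select Excel Add-ins and click Go.

3) Check Analysis ToolPak and click OK.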

The best way to understand multiple-variable linear regression is to perform an example in the following blog article.

 


 
