This is one of the following seven articles on Multiple Linear Regression in Excel
Basics of Multiple Regression in Excel 2010 and Excel 2013
Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013
Multiple Linear Regression’s Required Residual Assumptions
Normality Testing of Residuals in Excel 2010 and Excel 2013
Evaluating the Excel Output of Multiple Regression
Estimating the Prediction Interval of Multiple Regression in Excel
Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel
Overview of Multiple
Variable Linear Regression
in Excel
Linear regression is a statistical technique used to model the relationship between one or more independent, explanatory variables and a single dependent variable. The linear regression type is classified as Simple Linear Regression if there is only a single explanatory variable. The regression type is classified as Multiple Linear Regression if there is more than one explanatory variable.
The Regression Equation
The end result of linear regression is a linear equation that models actual data as closely as possible. This equation is called the Regression Equation. The more linear the relationship is between each of the explanatory variables and the single dependent variable, the more closely the Regression Equation will model the actual data.
In the Regression Equation, the variable Y is usually designated as the single dependent variable. The independent explanatory variables are usually labeled X1, X2, …, Xk.
The Regression Equation for multiple regression appears as follows:
Y = b0 + b1X1 + b2X2 + … + bkXk
The Regression Equation for simple regression appears as follows:
Y = b0 + b1X
b0 is the Y-intercept of the Regression Equation.
b1, b2, ..,, bk are the coefficients of the independent variables.
The most important part of regression analysis is the calculation of b0, b1, b2, ..,, bk in order to be able to construct the Regression Equation
Y = b0 + b1X for simple regression
or
Y = b0 + b1X1 + b2X2 + … + bkXk for multiple regression.
Purposes of Linear Regression
Linear regression, both simple and multiple linear regression, generally have two main uses. They are as follows:
1) To quantify the linear relationship between the dependent variable and the independent variable(s) by calculating a regression equation.
2) To quantify how much of the movement or variation of the dependent variable is explained by the independent variable(s).
The Inputs For Linear Regression
The input data for linear regression analysis consists of a number of data records each having a single Y (dependent variable) value and one or more X (explanatory independent variable) values. Simple regression has only a single X value. Multiple regression has more than one X (independent) variable for each Y (dependent) variable.
Each data record occupies its own unique row in the regression input. Each data record contains the specific values of the input (independent) X variables that are associated with a specific value of the dependent Y variable shown in that data record.
The input data for multiple regression analysis appear as separate data records on each row as follows:
Y X1 X2 … Xk
4 6 10 …15
5 7 11 …16
6 8 12 …17
7 9 13 …18
8 10 14 …19
Multiple linear regression has more than one X (independent) variable. These independent variables (X’s) known as the explanatory, predictor, or regressor variables. The single dependent variable (Y) is the target or outcome variable.
Multiple linear regression requires that both the dependent variable and the independent variables be continuous. If ordinal data such as a Likert scale is used as a dependent or independent variable, it must be treated as a continuous variable that has equal distance between values. Ordinal data is normally defined data whose order matters but not the differences between values.
Null and Alternative Hypotheses
The Null Hypothesis of linear regression states that the coefficient(s) of the independent variable(s) in the regression equation equal(s) zero. The Alternative Hypothesis for linear regression therefore states that these coefficients do not equal zero.
For multiple linear regression this Null Hypothesis is expressed as follows:
H0: b1 = b2 = … = bk = 0
For simple linear regression this Null Hypothesis is expressed as follows:
H0: b1 = 0
b1 is the slope of the regression line for simple regression.
The Alternative Hypothesis, H1, for linear regression states that these coefficients do not equal zero.
The Y Intercept b0 is not included in the Null Hypothesis.
X and Y Variables Must Have a
Linear Relationship
Linear regression is a technique that provides accurate information only if a linear relationship exists between the dependent variable and each of the independent variables. Independent variables that do not have a linear relationship with the dependent variable should not be included as inputs. An X-Y scatterplot diagram of between each independent variable and the dependent variable provides a good indication of whether the relationship is linear.
When data are nonlinear, there often two solutions available to allow regression analysis to be performed. They are the following:
1) Transform the nonlinear data to linear using a logarithmic transformation. This will not be discussed in this section.
2) Perform nonlinear regression on the data. One way to do that is to apply curve-fitting software that will calculate the mathematical equation that most closely models the data. Another section in this book will focus on using the Excel Solver to fit a curve to nonlinear data. The least-squares method is the simplest way to do this and will be employed in this section.
Do Not Extrapolate Regression
Beyond Existing Data
The major purpose of linear regression is to create a Regression Equation that accurately predicts a Y value based on a new set of independent, explanatory X values. The new set of X values should not contain any X values that are outside of the range of the X values used to create the original regression equation. The following simple example illustrates why a Regression Equation should not be extrapolated beyond the original X values.
Example of Why Regression Should Not Be Extrapolated
Imagine that the height of a boy was measured every month from when the boy was one year old until the boy was eighteen years old. The independent, explanatory X variable would the month number (12 months to 216 months) and the dependent y variable would be the height measured in inches. Typically most boys stopped growing in height when they reach their upper teens.
If the Regression Equation was created from the above data and then extrapolated to predict the boy’s height when he reached 50 years of age, the Regression Equation might predict that the boy would be fifteen feet tall.
Linear Regression Should Not
Be Done By Hand
Excel provides an excellent data analysis regression tool that can perform simple or multiple regression with equal ease. Doing the calculations by hand would be very tedious and provide lots of opportunities to make a mistake. Excel produces a very detailed output and the regression tool is run. I have recreated all of the simple regression calculations that Excel performs in this chapter. It will probably be clear from viewing this that it is wise to let Excel do the regression calculations. A number of statistics textbooks probably place too much emphasis on teaching the ability to perform the regression equations by hand. In the real world regression analysis would never be done manually.
The best way to understand multiple-variable linear regression is to perform an example in the following blog article.
Excel Master Series Blog Directory
Statistical Topics and Articles In Each Topic
- Histograms in Excel
- Bar Chart in Excel
- Combinations & Permutations in Excel
- Normal Distribution in Excel
- Overview of the Normal Distribution
- Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
- Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
- Solving Normal Distribution Problems in Excel 2010 and Excel 2013
- Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013
- An Important Difference Between the t and Normal Distribution Graphs
- The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean
- Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way
- t-Distribution in Excel
- Binomial Distribution in Excel
- z-Tests in Excel
- Overview of Hypothesis Tests Using the Normal Distribution in Excel 2010 and Excel 2013
- One-Sample z-Test in 4 Steps in Excel 2010 and Excel 2013
- 2-Sample Unpooled z-Test in 4 Steps in Excel 2010 and Excel 2013
- Overview of the Paired (Two-Dependent-Sample) z-Test in 4 Steps in Excel 2010 and Excel 2013
- t-Tests in Excel
- Overview of t-Tests: Hypothesis Tests that Use the t-Distribution
- 1-Sample t-Tests in Excel
- 1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013
- 1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013
- 1-Sample t-Test Power With G*Power Utility
- Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- 2-Independent-Sample Pooled t-Tests in Excel
- 2-Independent-Sample Pooled t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Variance Tests: Levene’s, Brown-Forsythe, and F Test For 2-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro Wilk Tests For Two-Sample Pooled t-Test
- Two-Independent-Sample Pooled t-Test - All Excel Calculations
- 2- Sample Pooled t-Test – Effect Size in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test Power With G*Power Utility
- Mann-Whitney U Test in 12 Steps in Excel as 2-Sample Pooled t-Test Nonparametric Alternative in Excel 2010 and Excel 2013
- 2- Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups
- 2-Independent-Sample Unpooled t-Tests in Excel
- 2-Independent-Sample Unpooled t-Test in 4 Steps in Excel 2010 and Excel 2013
- Variance Tests: Levene’s Test, Brown-Forsythe Test, and F-Test in Excel For 2-Sample Unpooled t-Test
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk For 2-Sample Unpooled t-Test
- 2-Sample Unpooled t-Test Excel Calculations, Formulas, and Tools
- Effect Size for a 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Test Power of a 2-Independent Sample Unpooled t-Test With G-Power Utility
- Paired (2-Sample Dependent) t-Tests in Excel
- Paired t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Normality Testing of Paired t-Test Data
- Paired t-Test Excel Calculations, Formulas, and Tools
- Paired t-Test – Effect Size in Excel 2010, and Excel 2013
- Paired t-Test – Test Power With G-Power Utility
- Wilcoxon Signed-Rank Test in 8 Steps As a Paired t-Test Alternative
- Sign Test in Excel As A Paired t-Test Alternative
- Hypothesis Tests of Proportion in Excel
- Hypothesis Tests of Proportion Overview (Hypothesis Testing On Binomial Data)
- 1-Sample Hypothesis Test of Proportion in 4 Steps in Excel 2010 and Excel 2013
- 2-Sample Pooled Hypothesis Test of Proportion in 4 Steps in Excel 2010 and Excel 2013
- How To Build a Much More Useful Split-Tester in Excel Than Google's Website Optimizer
- Chi-Square Independence Tests in Excel
- Chi-Square Goodness-Of-Fit Tests in Excel
- F Tests in Excel
- Correlation in Excel
- Pearson Correlation in Excel
- Spearman Correlation in Excel
- Confidence Intervals in Excel
- z-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
- t-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
- Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean
- Confidence Interval of Population Proportion in 2 Steps in Excel 2010 and Excel 2013
- Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013
- Simple Linear Regression in Excel
- Overview of Simple Linear Regression in Excel 2010 and Excel 2013
- Complete Simple Linear Regression Example in 7 Steps in Excel 2010 and Excel 2013
- Residual Evaluation For Simple Regression in 8 Steps in Excel 2010 and Excel 2013
- Residual Normality Tests in Excel – Kolmogorov-Smirnov Test, Anderson-Darling Test, and Shapiro-Wilk Test For Simple Linear Regression
- Evaluation of Simple Regression Output For Excel 2010 and Excel 2013
- All Calculations Performed By the Simple Regression Data Analysis Tool in Excel 2010 and Excel 2013
- Prediction Interval of Simple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression in Excel
- Basics of Multiple Regression in Excel 2010 and Excel 2013
- Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013
- Multiple Linear Regression’s Required Residual Assumptions
- Normality Testing of Residuals in Excel 2010 and Excel 2013
- Evaluating the Excel Output of Multiple Regression
- Estimating the Prediction Interval of Multiple Regression in Excel
- Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel
- Logistic Regression in Excel
- Logistic Regression Overview
- Logistic Regression in 6 Steps in Excel 2010 and Excel 2013
- R Square For Logistic Regression Overview
- Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013
- Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013
- Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013
- Hosmer- Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013
- Single-Factor ANOVA in Excel
- Overview of Single-Factor ANOVA
- Single-Factor ANOVA in 5 Steps in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group
- Kruskal-Wallis Test Alternative For Single Factor ANOVA in 7 Steps in Excel 2010 and Excel 2013
- Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison
- Single-Factor ANOVA - All Excel Calculations
- Overview of Post-Hoc Testing For Single-Factor ANOVA
- Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA
- Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA
- Overview of Effect Size For Single-Factor ANOVA
- ANOVA Effect Size Calculation Eta Squared in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Psi – RMSSE – in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Omega Squared in Excel 2010 and Excel 2013
- Power of Single-Factor ANOVA Test Using Free Utility G*Power
- Welch’s ANOVA Test in 8 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Brown-Forsythe F-Test in 4 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Two-Factor ANOVA With Replication in Excel
- Two-Factor ANOVA With Replication in 5 Steps in Excel 2010 and Excel 2013
- Variance Tests: Levene’s and Brown-Forsythe For 2-Factor ANOVA in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication Effect Size in Excel 2010 and Excel 2013
- Excel Post Hoc Tukey’s HSD Test For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication – Test Power With G-Power Utility
- Scheirer-Ray-Hare Test Alternative For 2-Factor ANOVA With Replication
- Two-Factor ANOVA Without Replication in Excel
- Randomized Block Design ANOVA in Excel
- Repeated-Measures ANOVA in Excel
- Single-Factor Repeated-Measures ANOVA in 4 Steps in Excel 2010 and Excel 2013
- Sphericity Testing in 9 Steps For Repeated Measures ANOVA in Excel 2010 and Excel 2013
- Effect Size For Repeated-Measures ANOVA in Excel 2010 and Excel 2013
- Friedman Test in 3 Steps For Repeated-Measures ANOVA in Excel 2010 and Excel 2013
- ANCOVA in Excel
- Normality Testing in Excel
- Creating a Box Plot in 8 Steps in Excel
- Creating a Normal Probability Plot With Adjustable Confidence Interval Bands in 9 Steps in Excel With Formulas and a Bar Chart
- Chi-Square Goodness-of-Fit Test For Normality in 9 Steps in Excel
- Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk Normality Tests in Excel
- Nonparametric Testing in Excel
- Mann-Whitney U Test in 12 Steps in Excel
- Wilcoxon Signed-Rank Test in 8 Steps in Excel
- Sign Test in Excel
- Friedman Test in 3 Steps in Excel
- Scheirer-Ray-Hope Test in Excel
- Welch's ANOVA Test in 8 Steps Test in Excel
- Brown-Forsythe F Test in 4 Steps Test in Excel
- Levene's Test and Brown-Forsythe Variance Tests in Excel
- Chi-Square Independence Test in 7 Steps in Excel
- Chi-Square Goodness-of-Fit Tests in Excel
- Chi-Square Population Variance Test in Excel
- Post Hoc Testing in Excel
- Creating Interactive Graphs of Statistical Distributions in Excel
- Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013
- Interactive Graph of the Normal Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Chi-Square Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution’s PDF in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution’s CDF in Excel 2010 and Excel 2013
- Interactive Graph of the Binomial Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Exponential Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Beta Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Gamma Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Poisson Distribution in Excel 2010 and Excel 2013
- Solving Problems With Other Distributions in Excel
- Solving Uniform Distribution Problems in Excel 2010 and Excel 2013
- Solving Multinomial Distribution Problems in Excel 2010 and Excel 2013
- Solving Exponential Distribution Problems in Excel 2010 and Excel 2013
- Solving Beta Distribution Problems in Excel 2010 and Excel 2013
- Solving Gamma Distribution Problems in Excel 2010 and Excel 2013
- Solving Poisson Distribution Problems in Excel 2010 and Excel 2013
- Optimization With Excel Solver
- Maximizing Lead Generation With Excel Solver
- Minimizing Cutting Stock Waste With Excel Solver
- Optimal Investment Selection With Excel Solver
- Minimizing the Total Cost of Shipping From Multiple Points To Multiple Points With Excel Solver
- Knapsack Loading Problem in Excel Solver – Optimizing the Loading of a Limited Compartment
- Optimizing a Bond Portfolio With Excel Solver
- Travelling Salesman Problem in Excel Solver – Finding the Shortest Path To Reach All Customers
- Chi-Square Population Variance Test in Excel
- Analyzing Data With Pivot Tables
- SEO Functions in Excel
- Time Series Analysis in Excel
- VLOOKUP
No comments:
Post a Comment