## Thursday, May 29, 2014

### Evaluation of Simple Regression Output For Excel 2010 and Excel 2013

This is one of the following seven articles on Simple Linear Regression in Excel:

• Overview of Simple Linear Regression in Excel 2010 and Excel 2013
• Complete Simple Linear Regression Example in 7 Steps in Excel 2010 and Excel 2013
• Residual Evaluation For Simple Regression in 8 Steps in Excel 2010 and Excel 2013
• Residual Normality Tests in Excel – Kolmogorov-Smirnov Test, Anderson-Darling Test, and Shapiro-Wilk Test For Simple Linear Regression
• Evaluation of Simple Regression Output For Excel 2010 and Excel 2013
• All Calculations Performed By the Simple Regression Data Analysis Tool in Excel 2010 and Excel 2013
• Prediction Interval of Simple Regression in Excel 2010 and Excel 2013

# Step 7 – Evaluate the Excel Regression Output

The Excel regression output that will now be evaluated is as follows:

Explanations of the most important individual parts of the Excel regression output are as follows:

## 1) Regression Equation

The regression equation is shown to be the following:

Yi = b0 + b1 * Xi

Number of Parts Produced = 1,345.09 + 1.875 (Weight of Input Pellets in kg.)

On an X-Y graph the Y intercept of the regression line would be 1,345.09 and the slope of the regression line would be 1.875.

For example, if 5,000 kg. of pellets were input into the molding machine, then approximately 10,720 parts would be expected to be produced by the machine. This regression equation calculation is as follows:

Number of Parts Produced = 1,345.09 + 1.875 (Weight of Input Pellets in kg.)

Number of Parts Produced = 1,345.09 + 1.875 (5,000)

Number of Parts Produced = 10,720.09

It is very important to note that a regression equation should never be extrapolated outside the range of the original data set used to create the regression equation. The inputs for a regression prediction should not be outside of the following range of the original data set:

Weight of Input Pellets (kg.): 103 to 6,592

A simple example to illustrate why a regression line should never be extrapolated is as follows: Imagine that the height of a child was recorded every six months from ages one to seventeen. Most people stop growing in height at approximately age seventeen. If a regression line was created from that data and then extrapolated to predict that person’s height at age 50, the regression equation might predict that the person would be fifteen feet tall. Conditions are often very different outside the range of the original data set.
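As a sketch, the fitted equation and the no-extrapolation rule can be expressed in code. The function name is illustrative, not part of the Excel output:

```python
# Sketch: applying the fitted regression equation while refusing to
# extrapolate outside the observed input range (103 to 6,592 kg).
def predict_parts(pellet_kg):
    """Predict number of parts produced from pellet weight in kg."""
    if not 103 <= pellet_kg <= 6592:
        raise ValueError("input is outside the range of the original data set")
    return 1345.09 + 1.875 * pellet_kg

# The worked example above: 1,345.09 + 1.875 * 5,000
print(round(predict_parts(5000), 2))
```

A request such as `predict_parts(10000)` would raise an error rather than return an unreliable extrapolated prediction.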

## 2) R Square – The Equation’s Overall Predictive Power

R Square tells how closely the regression equation approximates the data. R Square tells what percentage of the variance of the output variable is explained by the input variable; the remainder of the variance of the output is unexplained. We would like to see at least 0.6 or 0.7 for R Square. R Square here is a relatively high value of 0.904. This indicates that 90.4 percent of the total variance in the output variable (number of parts produced) is explained by the variance of the input variable (weight of the input pellets).

Adjusted R Square is quoted more often than R Square because it is more conservative. Adjusted R Square increases only if a newly added independent variable improves the equation’s predictive ability. When adding independent variables to a regression equation, add them one at a time and check whether Adjusted R Square goes up with each addition. The value of Adjusted R Square here is 0.898.
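The two statistics can be sketched with a least-squares fit in code. The data below are hypothetical, not the pellet data from this article:

```python
# Sketch: R Square and Adjusted R Square for a simple regression
# fit by ordinary least squares, on a small hypothetical data set.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b1 and intercept b0 by least squares
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# R Square = 1 - SSE/SST: the explained share of output variance
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - sse / sst

# Adjusted R Square penalizes extra predictors (k = 1 here)
k = 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 4), round(adj_r2, 4))
```

Note that Adjusted R Square is always at or below R Square, which is why it is the more conservative figure to quote.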

## 3) Significance of F – Overall p Value and Validity Measure

The Significance of F is the overall p Value of the regression equation. A very small Significance of F confirms the validity of the regression equation. For example, if the Significance of F is 0.03, there is only a 3 percent chance that the relationship captured by the regression equation occurred randomly. That would be strong evidence of the validity of the regression equation.

To be more specific, this p value (Significance of F) indicates whether to reject the overall Null Hypothesis of this regression analysis. The overall Null Hypothesis for this regression equation states that all coefficients of the independent variables equal zero. In other words, that for this multiple regression equation:

Y = b0 + b1X1 + b2X2 + … + bkXk

The Null Hypothesis for multiple regression states that the coefficients b1, b2, … , bk all equal zero. The Y intercept, b0, is not included in this Null Hypothesis.

For this simple regression equation:

Y = b0 + b1X

The Null Hypothesis for simple regression states that the Coefficient b1 equals zero. The Y intercept, b0, is not included in this Null Hypothesis. Coefficient b1 is the slope of the regression line in simple regression.

In this case, the p Value (Significance of F) is extremely low (1.39666E-10), so we have very strong evidence that this is a valid regression equation. There is almost no probability that the relationship shown to exist between the dependent and independent variables (the nonzero values of coefficients b1, b2, … , bk) was obtained merely by chance.

This low p Value (or corresponding high F Value) indicates that there is enough evidence to reject the Null Hypothesis of this regression analysis.

The 95 percent Level of Confidence is usually required to reject the Null Hypothesis. This translates to a 5 percent Level of Significance. The Null Hypothesis is rejected if the p Value (Significance of F) is less than 0.05. If the Null Hypothesis is rejected, the regression output stating that the regression coefficients b1, b2, … , bk do not equal zero is deemed to be statistically significant.
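The overall F statistic behind this p Value can be recovered from R Square alone. The sample size n = 18 used below is not stated in this article; it is inferred from the reported R Square (0.904) and Adjusted R Square (0.898), which are consistent with 18 observations and one predictor:

```python
# Sketch: recovering the overall F statistic from R Square.
# n = 18 is an inference from the article's reported R Square and
# Adjusted R Square; k = 1 predictor in simple regression.
r2 = 0.904
n, k = 18, 1

# Cross-check: Adjusted R Square = 1 - (1 - R2)(n - 1)/(n - k - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F = (R2 / k) / ((1 - R2) / (n - k - 1))
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(adj_r2, 3), round(f_stat, 1))
```

The large F value this produces corresponds to the extremely small Significance of F reported in the output.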

## 4) p Value of Intercept and Coefficients – Measure of Their Validity

The lower the p Value for each, the more likely it is that the Y Intercept or coefficient is valid. The Intercept’s low p Value of 0.017 indicates that there is only a 1.7 percent chance that this calculated value of the Intercept is a random occurrence.

The coefficient’s extremely low p Value of 1.4E-10 indicates that there is almost no chance that this calculated value of the coefficient is a random occurrence.
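It is no coincidence that the coefficient’s p Value matches the Significance of F: in simple regression the slope’s t statistic satisfies t² = F, so the two tests are equivalent. A sketch on a small hypothetical data set:

```python
# Sketch: in simple regression, the slope's t statistic squared
# equals the overall F statistic. Hypothetical data for illustration.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
b0 = my - b1 * mx

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)                   # residual variance estimate
t_stat = b1 / math.sqrt(mse / sxx)    # slope / its standard error

sst = sum((y - my) ** 2 for y in ys)
f_stat = (sst - sse) / mse            # regression MS / residual MS
print(t_stat ** 2, f_stat)            # t squared and F agree
```

This is why, in the output above, the coefficient’s p Value of 1.4E-10 and the Significance of F of 1.39666E-10 are the same quantity up to rounding.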
