## Tuesday, May 27, 2014

### Estimating the Prediction Interval of Multiple Regression in Excel

This is one of the following seven articles on Multiple Linear Regression in Excel

Basics of Multiple Regression in Excel 2010 and Excel 2013

Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013

Multiple Linear Regression’s Required Residual Assumptions

Normality Testing of Residuals in Excel 2010 and Excel 2013

Evaluating the Excel Output of Multiple Regression

Estimating the Prediction Interval of Multiple Regression in Excel

Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel

# Overview of Prediction Interval of Multiple Regression In Excel

A prediction interval is a confidence interval about a Y value that is estimated from a regression equation. A regression prediction interval is a value range above and below the Y estimate calculated by the regression equation that would contain the actual value of a sample with, for example, 95 percent certainty.

The Prediction Error for a point estimate of Y is always slightly larger than the Standard Error of the Regression Equation shown in the Excel regression output directly under Adjusted R Square.

The Standard Error of the Regression Equation is used to calculate a confidence interval about the mean Y value. The Prediction Error is use to create a confidence interval about a predicted Y value. There will always be slightly more uncertainty in predicting an individual Y value than in estimating the mean Y value.

For that reason, a Prediction Interval will always be larger than a Confidence Interval for any type of regression analysis.

Calculating an exact prediction interval for any regression with more than one independent variable (multiple regression) involves some pretty heavy-duty matrix algebra. Fortunately there is an easy short-cut that can be applied to multiple regression that will give a fairly accurate estimate of the prediction interval.

## Prediction Interval Formula

The formula for a prediction interval about an estimated Y value (a Y value calculated from the regression equation) is found by the following formula:

Prediction Interval = Yest ± t-Valueα/2 * Prediction Error

Prediction Error = Standard Error of the Regression * SQRT(1 + distance value)

Distance value, sometimes called leverage value, is the measure of distance of the combinations of values, x1, x2,…, xk from the center of the observed data. Calculation of Distance value for any type of multiple regression requires some heavy-duty matrix algebra. This is given in Bowerman and O’Connell (1990).

Some software packages such as Minitab perform the internal calculations to produce an exact Prediction Error for a given Alpha. Excel does not. Fortunately there is an easy substitution that provides a fairly accurate estimate of Prediction Interval. The following fact enables this:

The Prediction Error for a point estimate of Y is always slightly larger than the Standard Error of the Regression Equation shown in the Excel regression output directly under Adjusted R Square.

The Standard Error (highlighted in yellow in the Excel regression output) is used to calculate a confidence interval about the mean Y value. The Prediction Error is use to create a confidence interval about a predicted Y value. There will always be slightly more uncertainty in predicting an individual Y value than in estimating the mean Y value.

## Prediction Interval Estimate Formula

The Prediction Error is always slightly bigger than the Standard Error of a Regression. The Prediction Error can be estimated with reasonable accuracy by the following formula:

Prediction Errorest = P.E.est

P.E.est = (Standard Error of the Regression)* 1.1

Prediction Intervalest = Yest ± t-Valueα/2 * P.E.est

Prediction Intervalest = Yest ± t-Valueα/2 * (Standard Error of the Regression)* 1.1

Prediction Intervalest = Yest ± TINV(α, dfResidual) * (Standard Error of the Regression)* 1.1

The t-value must be calculated using the degrees of freedom, df, of the Residual (highlighted in Yellow in the Excel Regression output and equals n – 2).

dfResidual = n – 2 = 20 – 2 = 18

t-Valueα/2,df=n-2 = TINV(0.05,20-2)

t-Valueα/2,df=n-2 = TINV(0.05,18) = 2.1009

In Excel 2010 and later TINV(α, df) can be replaced be T.INV(1-α/2,df)

## Example in Excel of Estimating the Prediction Interval

Create a 95 percent prediction interval about the estimated value of Y if a company had 10,000 production machines and added 500 new employees in the last 5 years.

In this case the company’s annual power consumption would be predicted as follows:

Yest = Annual Power Consumption (kW) = 37,123,164 + 10.234 (Number of Production Machines X 1,000) + 3.573 (New Employees Added in Last 5 Years X 1,000)

Yest = Annual Power Consumption (kW) = 37,123,164 + 10.234 (10,000 X 1,000) + 3.573 (500 X 1,000)

Yest = Estimated Annual Power Consumption = 49,143,690 kW

Yest = 49,143,690

Prediction Intervalest = Yest ± TINV(α, dfResidual) * (Standard Error of the Regression)* 1.1

In Excel 2010 and later TINV(α, df) can be replaced be T.INV(1-α/2,df)

The Standard Error of the Regression is found to be 21,502,161 in the Excel regression output as follows: (Click On Image To See a Larger Version)

Prediction Intervalest = 49,143,690 ± TINV(0.05, 18) * (21,502,161)* 1.1

Prediction Intervalest = [49,143,690 ± 49,691,800 ]

Prediction Intervalest = [ -549,110, 98,834,490 ]

This is a relatively wide Prediction Interval that results from a large Standard Error of the Regression (21,502,161).

It is very important to note that a regression equation should never be extrapolated outside the range of the original data set used to create the regression equation. The inputs for a regression prediction should not be outside of the following ranges of the original data set:

Number of machine: 442 to 28,345

New employees added in last 5 years: -1,460 to 7,030

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

• Histograms in Excel
• Bar Chart in Excel
• Combinations & Permutations in Excel
• Normal Distribution in Excel
• t-Distribution in Excel
• Binomial Distribution in Excel
• z-Tests in Excel
• t-Tests in Excel
• Hypothesis Tests of Proportion in Excel
• Chi-Square Independence Tests in Excel
• Chi-Square Goodness-Of-Fit Tests in Excel
• F Tests in Excel
• Correlation in Excel
• Pearson Correlation in Excel
• Spearman Correlation in Excel
• Confidence Intervals in Excel
• Simple Linear Regression in Excel
• Multiple Linear Regression in Excel
• Logistic Regression in Excel
• Single-Factor ANOVA in Excel
• Two-Factor ANOVA With Replication in Excel
• Two-Factor ANOVA Without Replication in Excel
• Randomized Block Design ANOVA in Excel
• Repeated-Measures ANOVA in Excel
• ANCOVA in Excel
• Normality Testing in Excel
• Nonparametric Testing in Excel
• Post Hoc Testing in Excel
• Creating Interactive Graphs of Statistical Distributions in Excel
• Solving Problems With Other Distributions in Excel
• Optimization With Excel Solver
• Chi-Square Population Variance Test in Excel
• Analyzing Data With Pivot Tables
• SEO Functions in Excel
• Time Series Analysis in Excel
• VLOOKUP