Tuesday, May 27, 2014

Estimating the Prediction Interval of Multiple Regression in Excel

This is one of the following seven articles on Multiple Linear Regression in Excel

Basics of Multiple Regression in Excel 2010 and Excel 2013

Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013

Multiple Linear Regression’s Required Residual Assumptions

Normality Testing of Residuals in Excel 2010 and Excel 2013

Evaluating the Excel Output of Multiple Regression

Estimating the Prediction Interval of Multiple Regression in Excel

Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel

 

Overview of Prediction

Interval of Multiple

Regression In Excel

A prediction interval is a confidence interval about a Y value that is estimated from a regression equation. A regression prediction interval is a value range above and below the Y estimate calculated by the regression equation that would contain the actual value of a sample with, for example, 95 percent certainty.

The Prediction Error for a point estimate of Y is always slightly larger than the Standard Error of the Regression Equation shown in the Excel regression output directly under Adjusted R Square.

The Standard Error of the Regression Equation is used to calculate a confidence interval about the mean Y value. The Prediction Error is use to create a confidence interval about a predicted Y value. There will always be slightly more uncertainty in predicting an individual Y value than in estimating the mean Y value.

For that reason, a Prediction Interval will always be larger than a Confidence Interval for any type of regression analysis.

Calculating an exact prediction interval for any regression with more than one independent variable (multiple regression) involves some pretty heavy-duty matrix algebra. Fortunately there is an easy short-cut that can be applied to multiple regression that will give a fairly accurate estimate of the prediction interval.

 

Prediction Interval Formula

The formula for a prediction interval about an estimated Y value (a Y value calculated from the regression equation) is found by the following formula:

Prediction Interval = Yest ± t-Valueα/2 * Prediction Error

Prediction Error = Standard Error of the Regression * SQRT(1 + distance value)

Distance value, sometimes called leverage value, is the measure of distance of the combinations of values, x1, x2,…, xk from the center of the observed data. Calculation of Distance value for any type of multiple regression requires some heavy-duty matrix algebra. This is given in Bowerman and O’Connell (1990).

Some software packages such as Minitab perform the internal calculations to produce an exact Prediction Error for a given Alpha. Excel does not. Fortunately there is an easy substitution that provides a fairly accurate estimate of Prediction Interval. The following fact enables this:

The Prediction Error for a point estimate of Y is always slightly larger than the Standard Error of the Regression Equation shown in the Excel regression output directly under Adjusted R Square.

The Standard Error (highlighted in yellow in the Excel regression output) is used to calculate a confidence interval about the mean Y value. The Prediction Error is use to create a confidence interval about a predicted Y value. There will always be slightly more uncertainty in predicting an individual Y value than in estimating the mean Y value.

 

Prediction Interval Estimate

Formula

The Prediction Error is always slightly bigger than the Standard Error of a Regression. The Prediction Error can be estimated with reasonable accuracy by the following formula:

Prediction Errorest = P.E.est

P.E.est = (Standard Error of the Regression)* 1.1

Prediction Intervalest = Yest ± t-Valueα/2 * P.E.est

Prediction Intervalest = Yest ± t-Valueα/2 * (Standard Error of the Regression)* 1.1

Prediction Intervalest = Yest ± TINV(α, dfResidual) * (Standard Error of the Regression)* 1.1

The t-value must be calculated using the degrees of freedom, df, of the Residual (highlighted in Yellow in the Excel Regression output and equals n – 2).

dfResidual = n – 2 = 20 – 2 = 18

t-Valueα/2,df=n-2 = TINV(0.05,20-2)

t-Valueα/2,df=n-2 = TINV(0.05,18) = 2.1009

In Excel 2010 and later TINV(α, df) can be replaced be T.INV(1-α/2,df)

 

Example in Excel of Estimating

the Prediction Interval

Create a 95 percent prediction interval about the estimated value of Y if a company had 10,000 production machines and added 500 new employees in the last 5 years.

In this case the company’s annual power consumption would be predicted as follows:

Yest = Annual Power Consumption (kW) = 37,123,164 + 10.234 (Number of Production Machines X 1,000) + 3.573 (New Employees Added in Last 5 Years X 1,000)

Yest = Annual Power Consumption (kW) = 37,123,164 + 10.234 (10,000 X 1,000) + 3.573 (500 X 1,000)

Yest = Estimated Annual Power Consumption = 49,143,690 kW

Yest = 49,143,690

Prediction Intervalest = Yest ± TINV(α, dfResidual) * (Standard Error of the Regression)* 1.1

In Excel 2010 and later TINV(α, df) can be replaced be T.INV(1-α/2,df)

The Standard Error of the Regression is found to be 21,502,161 in the Excel regression output as follows:

excel, excel 2010, excel 2013, regression, multiple regression, prediction interval, statistics
(Click On Image To See a Larger Version)

Prediction Intervalest = 49,143,690 ± TINV(0.05, 18) * (21,502,161)* 1.1

Prediction Intervalest = [49,143,690 ± 49,691,800 ]

Prediction Intervalest = [ -549,110, 98,834,490 ]

This is a relatively wide Prediction Interval that results from a large Standard Error of the Regression (21,502,161).

It is very important to note that a regression equation should never be extrapolated outside the range of the original data set used to create the regression equation. The inputs for a regression prediction should not be outside of the following ranges of the original data set:

Number of machine: 442 to 28,345

New employees added in last 5 years: -1,460 to 7,030

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

No comments:

Post a Comment