Monday, February 9, 2015

Normal Probability Plot With Adjustable Confidence Bands in 9 Steps in Excel

Normal Probability Plot

With Adjustable

Confidence Interval Bands

in 9 Steps in Excel

(Click On Image To See a Larger Version)

Overview

The Normal Probability Plot is a graphical tool used to determine whether sample data is approximately normally distributed. Sample data are plotted against a line that the sample data would fall on if the sample was perfectly normally distributed. Sample data points that do not fall on this line represent deviations from the normal distribution.

Confidence interval bands can be added to a Normal Probability Plot to provide a visual representation of how wide a confidence interval would need to be to contain the majority of the sample data points if the data were normally distributed. The wider that the confidence interval must be to contain the data points, the less likely it is that the data are normally distributed.

The Normal Probability Plot is based upon the following unique characteristic of the normal distribution:

There is a linear relationship between data values and Z Scores of normally-distributed data. A graph of data value vs. data Z Score will be a perfectly straight line if the data are perfectly normally distributed. A Normal Probability Plot graphs the sample data values against the line of perfect normal distribution, i.e., the line that the data values would fall on if the data were perfectly normally distributed.

The line of perfect normal distribution for a given data set is fairly straightforward to construct on an Excel scatterplot. Upper and lower confidence interval bands can also be added to that scatterplot. Finally the actual sample data points can be plotted to show their proximity to the line of perfect normally distribution and also to determine whether they fall within the upper and lower bands of the specified confidence interval.

A histogram is probably a better tool to apply to make a quick determination of whether a data set might be normally distributed. A histogram is highly dependent on the choice of bin size and location but will directly show the shape of the data set’s distribution. A Normal Probability Plot does provide visual indication of the length of tail and whether the data is skewed to either side but the histogram is the more intuitive tool to evaluate the shape of a data set’s distribution.

Placing adjustable confidence interval bands on a Normal Probability Plot makes the Normal Probability Plot more useful. The bands can be adjusted to determine how large of a confidence interval must be to contain the data if they are normally distributed.

Here is the 9-step process to construct a Normal Probability Plot with adjustable confidence interval bans in Excel:

Step 1 – Sort Sample Data

The first step in plotting a data sample within a Normal Probability Plot is to sort the data in ascending order. Sorting can be performed automatically by copying the formula in cell D10 down to cell D28 as shown below. Changing the word SMALL to LARGE convert this ascending sort to a descending sort. This method of sorting using formulas is more convenient than using the Excel sorting tool because the formulas will automatically resort the data if any of the raw data changes. Excel’s sorting tool must be re-run manually every time a new sort is needed. The automatic sorting formula in cell D10 to be copied down to cell D28 is the following:

=IF($B10=””,””,SMALL($B$10:$B$28,ROW()-ROW($D$9)))

(Click On Image To See a Larger Version)

Step 2 – Calculate Expected Z

Scores

The actual Z Scores of the sorted sample data must first be calculated. The expected Z Scores will correspond to the actual Z Score except that each expected Z Score will be evenly spaced from each other as they increase from smallest to largest Z Score. Each expected Z Score will have the same incremental increase from the previous Z Score.

Z Score is calculated by the following formula:

Z Score = (Data Value – Average Data Value) / (Data Sample Standard Deviation)

The lowest and highest Z Scores are used to determine the total range in the Z Score values of the data set. The increment of Z Score between each expected Z Score point is calculated by the following formula:

Z Score increment = (Total Z Score Range) / (n – 1)

n = the number of data points in the data set

These calculations in Excel are shown as follows:

Data sample count, mean, and sample standard deviation are first calculated in Excel.

(Click On Image To See a Larger Version)

The actual Z Score of each sample data point can now be calculated.

(Click On Image To See a Larger Version)

The total range of actual Z Score values can now be calculated. The expected Z Score incremental increase can be calculated as soon as the total range of Z Score values and the number of data points is known.

(Click On Image To See a Larger Version)

The expected Z Scores for the sample data points if they were normally distributed can now be calculated by starting at the lowest actual Z Score value and adding the incremental increase to each consecutive Z Score until the highest actual Z Score value is reached.

It is important to note that the expected Z Scores calculated in this step will be the X-axis values of all points plotted on the Normal Probability Plot. The four sets of data values that will be plotted on the Normal Probability Plot using these expected Z Scores as their x-axis values are the following:

The normal distribution line
The upper confidence interval band
The lower confidence interval band
The actual sample data points

Expected Z Scores are calculated in Excel as follows:

(Click On Image To See a Larger Version)

Step 3 – Calculate Expected Data

Values if Data Are Normally

Distributed

Calculating the data values that would occur if the data were normally distributed is implemented with the following two-step process:

Calculate the CDF (Cumulative Distribution Function or F(Y)) of each data point. This can be done using the Excel formula NORM.S.DIST(Z Score, TRUE). The Z Scores used are the expected Z Scores if the data were normally distributed.
Calculate each expected data point’s value with the following Excel formula:

NORM.INV( F(Y), Sample Mean, Sample Standard Deviation)

The normal distribution has the unique property that there is a linear relationship between Z Scores and data values if the data are normally distributed.

(Click On Image To See a Larger Version)

Step 4 – Calculate Half Width of

Confidence Interval

The half-width of the confidence interval of the mean Y value at any X value of a fitted line is shown here. This formula is commonly used to construct the confidence interval about a regression line.

It should be noted that the confidence interval bands for the normal probability plot normal distribution line are curved with the narrowest part of the band occurring at the mean X value. Intuitively this occurs because there is less certainty of mean Y values as X values get further and further from the mean X value. The (X_i – X_bar) term in the formula has its lowest value at the mean X value. The confidence interval bands will be at their narrowest point here at the mean X value as a result.

(Click On Image To See a Larger Version)

The individual pieces of this formula will be calculated in this section. The Alpha (The level of significance) is user-adjustable and is currently set at a = 0.05. Reducing the size of a will cause the confidence interval bands to widen. The smaller alpha is, the greater the specified level of confidence is. The wider the confidence interval, the greater level of certainty exists that the confidence interval will contain the actual mean. For the Normal Probability Plot, the wider the confidence bands needs to be to contain the sample data points surrounding the normal distribution line, the less likely it is that the sample data are normally distributed.

The degrees of freedom is set at n – 2 because the values of two parameters, X and Y, must be known before an error term can be calculated.

(Click On Image To See a Larger Version)

The t Value above is calculated using the following Excel formula:

=T.INV(1 – Alpha/2, n-2)

=T.INV(1-T2/2,T4)

(Click On Image To See a Larger Version)

The Standard Error of the Y Values is calculated using the following Excel formula:

SE of Y Values = STEYX(Array Sorted Sample Values, Array Z Scores)

SE of Y Values = STEYX(D10:D28,I10:I28)

(Click On Image To See a Larger Version)

The Sum of Squared X Deviations From the Mean is calculated with the following Excel formula:

=DEVSQ(X Array) = DEVSQ(Array Z Scores) = DEVSQ(I10:I28)

(Click On Image To See a Larger Version)

The above piece of the Confidence Interval formula is the reason that the upper and lower confidence interval bands are curved. The confidence interval about the mean estimated Y value (the normal distribution line) is tightest at the mean X value (mean Z Score). The confidence interval about the mean Y value gets wider the further the X value is from the X mean.

The individual pieces of the formula can now be combined to calculate the half-width of the confidence interval at each X value. The expected Z Scores are the X Values.

(Click On Image To See a Larger Version)

Step 5 – Calculate Upper and Lower

Bands of the Confidence Interval

The location and width of the confidence interval at each X value (Expected Z Score) can now be calculated.

(Click On Image To See a Larger Version)

The following four data sets of Y values will be graphed to form the Normal Probability Plot:

Expected sample values will be plotted to form the normal distribution line.
Upper confidence interval values will be plotted to form the upper band of the confidence interval.
Lower confidence interval values will be plotted to form the lower band of the confidence interval.
Actual sorted sample values will be plotted individually.

Each of the above data sets of Y values will use the same X values, i.e., the Expected Z Scores.

This graphing will be performed as follows:

Step 6 – Graph the Normal

Distribution Line

All four data sets will be graphed using the following Excel scatterplot graph with straight lines and markers.

(Click On Image To See a Larger Version)

The first data series to be added the chart is the Normal Distribution Line which consists of Expected Z Scores as the X values and Expected Sample Values as they values. This data series is added as follows:

(Click On Image To See a Larger Version)

This data will appear on the graph as follows. Note that the data graphs as a straight line because of the unique property of the normal distribution that a linear relationship exists between the Z Scores and data values of normally-distributed data.

(Click On Image To See a Larger Version)

By default Excel configures the vertical (Y) axis to cross the horizontal (X) axis at the point X = 0. The vertical axis needs to be moved out of the way to the left side of the graph. This is accomplished by double-clicking the horizontal axis and specifying the vertical axis crosses at X = -2 as is shown when the format Axis dialogue box pops up with the double-click.

(Click On Image To See a Larger Version)

The vertical axis is now moved over. This plotted Normal Distribution Line should be only a line. The markers (round dots in the line) should be removed. Double-click on the line and select Format Data Series. Change the Marker Options from Automatic to None and the markers will disappear, leaving only the line.

(Click On Image To See a Larger Version)

There is a lot of empty space in the graph above and below the plotted line. Double-click on the vertical axis and reset the minimum and maximum axis values to 130,000 and 270,000 in the Format Axis dialogue box.

(Click On Image To See a Larger Version)

The first data series, the Normal Distribution Line, will now appear as follows:

(Click On Image To See a Larger Version)

Step 7 – Graph the Upper and

Lower Confidence Interval Bands

The upper and lower confidence interval bands are designated as the next two data series to be included in the graph. Right-click on the graph anywhere and click Select Data. Add the two data series in the following manner. Once again, the X values for both data series will be the Expected Z Scores.

(Click On Image To See a Larger Version)

Remove the markers from of the confidence interval series and they should appear graphed as follows. Note that the confidence interval band is at its narrowest point at the mean X value, which is the mean Expected Z Score of 0.

(Click On Image To See a Larger Version)

Step 8 – Graph Sorted Sample Values

The sorted sample values are the last series to be plotted. Once again the X values of this series will be the Expected Z Scores.

(Click On Image To See a Larger Version)

In the case of this plotted data series, only the markers should be plotted and not the connecting lines. Double-click on the plotted data series to bring up the Format Data Series dialogue box. Select No Line to remove the connecting lines.

(Click On Image To See a Larger Version)

The markers can also be configured here to have a blue color. The graph will now appear as follows:

(Click On Image To See a Larger Version)

Step 9 – Add Chart Title, Axes

Titles, and Legend

Chart elements such as chart title, legend, and axis title can be added to the chart by clicking on the chart and selecting Design / Add Chart Elements.

(Click On Image To See a Larger Version)

The final chart appears as follows:

(Click On Image To See a Larger Version)

The alpha of the chart was initial set to 0.05. Reducing alpha will widen the confidence interval. This the result of the following formula:

Level of Confidence = 1 – Level of Significance (alpha)

The above chart shows that the 95-percent confidence level does not contain many of the data points.

Reducing alpha to 0.0001 causes the confidence interval bands to widen as shown.

(Click On Image To See a Larger Version)

The Normal Probability Plot is a quick indicator of normality and should be used in combination with other normality-testing techniques. Here is an Excel histogram of this sample data set which shows the general shape of the data’s distribution:

(Click On Image To See a Larger Version)

Excel Master Series Blog Directory

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

You Will Become an Excel Statistical Master!

9 comments:

UnknownMarch 17, 2016 at 12:50 AM
Could you please calculate confidence interval as MiniTab?
ReplyDelete
Replies
AnonymousJuly 17, 2016 at 9:47 AM
Great post, but I question the result of the standard deviation as 33499.585. When I run your example problem, I get 3349.95 (1/10th your example value).
ReplyDelete
Replies
AnonymousMarch 20, 2017 at 8:08 AM
Would it be possible to get the file. I would like to walk it step by steps with your instructiosn
ReplyDelete
Replies
Dr. Najm AlghazaliJune 9, 2019 at 11:11 AM
clicking on image does not working (giving nothing)
ReplyDelete
Replies
AnonymousNovember 9, 2023 at 7:46 PM
it would really help if we could actually see the figures. can you re-post the images please?
ReplyDelete
Replies
AnonymousOctober 25, 2024 at 5:32 PM
Been returning to this page every couple of years as there are no other sites with this particular topic. Has anyone figured out how to perform these calculations?
ReplyDelete
Replies

Add comment

Become an Excel Statistical Master

Excel Master Series - MBA-level statistics - Over 1,100+ Pages of Easy-To-Follow Instructions in Excel

It's a Full
Easy-To-Follow
MBA Course in Business Statistics

ALL IN EXCEL

&

MUCH Clearer

Than Your Text

Book

Download the
1,100+ Page Excel Statistical Master now

More Easy-To-

Follow eManuals

That You Will

Master Quickly

Step-By-Step Optimization With Excel Solver

What's In It?

For anyone who wants to be operating at a high level with the Excel Solver quickly, this is the book for you. Step-By-Step Optimization With Excel Solver is a 200+ page .pdf e-manual of simple yet thorough explanations on how to use the Excel Solver to solve today’s most widely known optimization problems. Loaded with screen shots that are coupled with easy-to-follow instructions, this book will simplify many difficult optimization problems and make you a master of the Excel Solver almost immediately.

Here are just some of the Solver optimization problems that are solved completely with simple-to-understand instructions and screen shots in this e-manual:

• The famous “Traveling Salesman” problem using Solver’s Alldifferent constraint and the Solver’s Evolutionary method to find the shortest path to reach all customers. This also provides an advanced use of the Excel INDEX function.

• The well-known “Knapsack Problem” which shows how optimize the use of limited space while satisfying numerous other criteria.

• How to perform nonlinear regression and curve-fitting on the Solver using the Solver’s GRG Nonlinear solving method.

• How to solve the “Cutting Stock Problem” faced by many manufacturing companies who are trying to determine the optimal way to cut sheets of material to minimize waste while satisfying customer orders.

• Portfolio optimization to maximize return or minimize risk.

• Venture capital investment selection using the Solver’s Binary constraint to maximize Net Present Value of selected cash flows at year 0. Clever use of the If-Then-Else statements makes this a simple problem.

• How use Solver to minimize the total cost of purchasing and shipping goods from multiple suppliers to multiple locations.

• How to optimize the selection of different production machine to minimize cost while fulfilling an order.

• How to optimally allocate a marketing budget to generate the greatest reach and frequency or number of inbound leads at the lowest cost.

Step-By-Step Optimization With Excel Solver has complete instructions and numerous tips on every aspect of operating the Excel Solver. You’ll fully understand the reports and know exactly how to tweek all of the Solver’s settings for total custom use. This e-manual also provides lots of inside advice and guidance on setting up the model in Excel so that it will be as simple and intuitive as possible to work with.

All of the optimization problems in this book are solved step-by-step using a 6-step process that works every time. In addition to detailed screen shots and easy-to-follow explanations on how to solve every optimization problem in the book, a link is provided to download an Excel workbook that has all problems completed exactly as they are in this e-manual.

Step-By-Step Optimization With Excel Solver is exactly the e-manual you need if you want to be optimizing at an advanced level with the Excel Solver quickly.

*******************

Become an Excel Statistical Master

It's a Full
Easy-To-Follow
MBA Course in Business Statistics

ALL IN EXCEL

&

MUCH Clearer

Than Your Text

Book

Download the
1,100+ Page Excel Statistical Master now

More Easy-To-

Follow eManuals

That You Will

Master Quickly

*******************

Become an Excel Statistical Master

It's a Full
Easy-To-Follow
MBA Course in Business Statistics

ALL IN EXCEL

&

MUCH Clearer

Than Your Text

Book

Download the
1,100+ Page Excel Statistical Master now

More Easy-To-

Follow eManuals

That You Will

Master Quickly

*******************

Become an Excel Statistical Master

It's a Full
Easy-To-Follow
MBA Course in Business Statistics

ALL IN EXCEL

&

MUCH Clearer

Than Your Text

Book

Download the
1,100+ Page Excel Statistical Master now

Immediate, Absolute, No-Questions-Asked, Money-Back Guarantee If Not TOTALLY, 100% Satisfied. In Other Words, If Any Excel Master Series eManual That You've Purchased Here Does Not Provide Instructions That Are CRYSTAL CLEAR and EASY TO UNDERSTAND, You Get All Of Your Money Back Immediately and Keep the eManual. Guaranteed!

Meet The Author