Tuesday, June 3, 2014

Creating an Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013

Creating an Interactive

Statistical Distribution

Graph in Excel 2010 and

Excel 2013

This is one of the following eleven articles on creating user-interactive graphs of statistical distributions in Excel

Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013

Interactive Graph of the Normal Distribution in Excel 2010 and Excel 2013

Interactive Graph of the Chi-Square Distribution in Excel 2010 and Excel 2013

Interactive Graph of the t-Distribution in Excel 2010 and Excel 2013

Interactive Graph of the t-Distribution’s PDF in Excel 2010 and Excel 2013

Interactive Graph of the t-Distribution’s CDF in Excel 2010 and Excel 2013

Interactive Graph of the Binomial Distribution in Excel 2010 and Excel 2013

Interactive Graph of the Exponential Distribution in Excel 2010 and Excel 2013

Interactive Graph of the Beta Distribution in Excel 2010 and Excel 2013

Interactive Graph of the Gamma Distribution in Excel 2010 and Excel 2013

Interactive Graph of the Poisson Distribution in Excel 2010 and Excel 2013

Interactive area charts are very useful for demonstrating how changes made to a statistical distribution’s parameters affect the distribution’s shape, scale, and location.

This section shows how to create a user-interactive normal distribution graph in Excel that instantaneously adjusts its shape, scale, location, and size of outer tails when a user changes any of the yellow cells that contain the graph’s parameters, label and legend text, and percentage of curve area in each outer tail. The following user-controlled settings produce the Excel area chart that is shown here:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Changes to each of the nine user-controlled settings instantly update the chart as shown here:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This section will provide instructions to create this interactive Excel area chart.

The X and Y values used to create this chart are based upon z Score(X), the z Score of each X value. The z Score is the number of standard deviations that a specific point (X value) is from the mean.

The z Score provides the basis of every element on this chart and therefore should be constructed first. 99.7 percent of all values in a normally distributed population will be within 3 standard deviations from the mean. A sufficient range of z Score values would include z Scores from 3 standard deviations below the mean (z Score = -3) to 3 standard deviations above the mean (z Score = +3).

The z Score data will be presented as a list of z Scores starting at -3 and increasing incrementally to a value of +3. The increments need to be small enough so that Excel graph based upon these z Scores will be smooth. The list of z Scores will increase by increments of 0.10 from -3 to +3. This means that there be 60 z Score values and therefore 60 points graphed in this Excel chart.

As mentioned the z Score will start at -3 and increase in 0.1 increments until the z Score has reached +3. There are many ways in Excel to incrementally increase the values in cells on the way down a column. ROW() generates the number of the current row and is used in the following fashion to incrementally increase z Score value:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The Excel Split View tool was used to simultaneously view the beginning and the end of a data column. The beginning and end values of data columns are usually the most important parts when creating interactive Excel graphs. The Split View tool is found under the View tab and is labeled Split. The horizontal and vertical partitions of the Split can be dragged-and-dropped to any location on the worksheet by the user.

The next set of graph parameters that should be created are the X-axis values. The values of the X-axis are calculated from the z Score, the population mean (µ) and the population standard deviation (σ). The calculation of the X values on the horizontal axis is given as follows:

X = µ + (z Score(X) * σ)

z Score data has already been created. The population mean, µ, and population standard deviation, σ, will be tied to user-inputs on the Excel worksheet.

Changing the Excel graph’s user inputs µ and σ changes the values of the X-axis. The graph’s shape remains the same but the values of the X axis are shifted and scaled based upon changes made to the user inputs µ and σ. The two graphs at the beginning of this section show the changes in the X-axis location and scale based upon changes to user inputs for µ and σ. The following image shows how X-axis values on the Excel worksheet are calculated based upon z Scores and the user inputs µ and σ:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

After creating the column of z Scores and the column of X-axis values tied to the z Scores, the next step is to create a column of PDF data. The z Score data is the fundamental basis of the entire graph and each PDF value in the PDF column is based upon the z Score that is directly across from it in the z Score column.

These PDF values are calculated in Excel as follows:

PDF(X)

= f(X,µ,σ)

=NORM.DIST(X, µ,σ,FALSE)

= NORM.S.DIST(z Score(X),FALSE)

= NORM.S.DIST((X - µ)/σ,FALSE)

This is implemented in the Excel worksheet as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The next data set that should be created are the values of the normal distribution’s CDF (Cumulative Distribution Function) at each z Score increment. This is calculated in Excel as follows:

CDF(X)

= F(X,µ,σ)

=NORM.DIST(X, µ,σ,TRUE)

= NORM.S.DIST(z Score(X),TRUE)

= NORM.S.DIST((X - µ)/σ,TRUE)

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The data sets created so far include z Score values, X-axis values, and the PDF and CDF values at each z Score. The final data set to be created are the Y-axis values.

This Excel area chart is a graph of the normal distribution’s bell-shaped PDF curve. The Y value at each point on this curve will therefore have the same value as the normal distribution’s PDF at that point. Setting the Y-values to equal the PDF values for each z Score increment is implemented as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The Y-values are given the label Series 1 because this represents the first data series created for the area chart.

Series 1 can now be graphed because both the X and Y values are now available. Creating this graph in Excel is done as follows:

The first step is to select the entire column of data including the label for Series 1 as shown.

With that selected data highlighted, insert an area chart into the worksheet. That chart is inserted by going under the Insert tab and then selecting Area Charts / 2-D Area Chart as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The following graph of the normal distribution’s PDF is automatically created as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The default color of the first data series graphed is blue. This can be changed if desired by right-clicking on blue bell-shaped curve and selecting Format Graph Area.

The area chart has been created based upon the column of Y values. No specific values have been assigned to the X-Axis. The default X values are created from the numbering of the points starting at 1 on the left.

The next step is to assign specific X-values to the data points. The X-value data set column has been created.

To assign specific X-values to the graph’s data points, click anywhere on the graph and choose Select Data from the pop-up short-cut menu as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This brings up the Select Data Source dialogue box. Select the Edit button on the right side under Horizontal (Category) Axis Labels.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will bring up the Axis Label dialogue box. Highlight the data in the X-Axis data column on the worksheet and hit OK as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The Excel graph will now attach the selected X values to the data points as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

These X-values change instantly when the user inputs for µ and σ are changed. This will be demonstrated shortly.

The values in the X-axis should have additional formatting applied to them. The X axis can be made more legible if the X-axis values have their font weight changed to Bold, are set to show at least one decimal place, and become spaced out in larger increments than the current crowded setting of displaying than every third X value. These changes can be made to the X–axis values as follows:

Right-click anywhere on the X-axis to bring up both the short-cut font menu and the short-cut pop-up menu. Select B on the short-cut font menu and the X values will have their font weight changed to bold.

Right-click again on the numbers on the X-axis to bring up both menus again. Select Format Axis as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will bring up the Format Axis dialogue box. Select the first option on the left which is Axis Options. The following settings shown here will display every tenth X-axis value instead of every third value:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The Number category brings up the Number dialogue page. The following setting will configure the X values to display one decimal place:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

These settings configure values on the X axis to be displayed as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

As mentioned, this graph is user-interactive. Changing user-inputs µ and σ produce the following changes in the Excel graph:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The graph’s shape did not change. The location of the mean was shifted to 50 and the graphed was scaled so that its standard deviation is now equal to 20. These changes are all implemented in the X axis.

Excel graphs can be configured to link the text of the chart title and axis labels to external cells that are user inputs. The first step to implementing this is to create the external cells that will be the user inputs. Place these inputs in the following locations:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The current generic Excel area chart does not have axis labels. The labels for the X-axis and the Y-axis have to created.

A generic X-axis label is created by first clicking anywhere on the chart. This will bring up the Chart Tools menu. From this menu select the Layout tab / Axis Titles / Primary Horizontal Axis Title / Title Below Axis.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will create a generic X-axis title on the bottom of the graph shown in the next image of the graph.

A generic Y-axis label is also created by first clicking anywhere on the chart. This once again bring up the Chart Tools menu. From that menu select the Layout tab / Axis Titles / Primary Vertical Axis Title / Horizontal Title.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will create a generic Y-axis title on the left side of the graph that has horizontal text that is shown in the next image of the graph.

The Excel chart is now showing generic X-axis and Y-axis labels, a chart title, and a legend for data Series 1.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Linking the text in the chart title to text in an external user-input cell is fairly simple. Just click anywhere on the chart title and then type an equal sign in the formula bar as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

After typing the equal sign in the formula bar, click on the external cell that contains text. The address of this cell will appear in the formula bar including the worksheet’s name as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The result is that the chart title now displays the text contained in the external cell as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

If the text in the external cell is changed, the updated text will be immediately displayed in the Excel chart.

That exact same procedure can be used to link the X and Y-axis labels to external cells that are controlled by the user. Performing this procedure for both labels produces the following result:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The next step is to link the legend of data Series 1 to an external cell. To do this click anywhere on the chart to bring up the short-cut menu. From this menu, choose Select Data as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will bring up the Select Data Source dialogue box. Select Series 1 on the left side and then click the Edit button also on the left side as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will bring up the Edit Series dialogue box. Under the Series name input shown below in this dialogue box, select the external cell containing the text for the legend as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The preceding steps have produced an Excel area chart that has the chart title, the labels for the X and Y axes, and the legend for Series 1 linked to external cells that can each be changed by the user as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Any changes in these user inputs are instantly reflected in the Excel graph as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The next step is to create an additional data series that will display the outer tails of the graph. This data series will have its data values controlled by the user inputs. These inputs will control the curve area displayed in each of the outer tails of the graph.

The first step in creating this data series (Series 2) is to add new user inputs as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The data values for Series 2 can now be created.

The Y values of Series 2 will also be set to equal the PDF values at each point on the graph as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Now that the column of Y values for Series 2 has been created, Series 2 can be added to the area chart and linked to the column of Y values. To do that click anywhere on the chart to bring up the pop-up short-cut menu. Choose Select Data on the menu as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

This will bring up the following Select Data Source dialogue box. Select Add on the left side under Legend Entries (Series).

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

For the Series name data input box, click on the external cell that contains the text that will be the legend for Series 2.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

From the Series Values input box, select the column of Series 2 Y values on the worksheet as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Any data series that is added to an Excel area chart will sit on top of all previously-added ones. Series 2 (default color is red) sits on top of Series 1 (default color is blue) and completely covers Series 1 on the chart as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Series 2 sits on top of and covers Series 1 on the chart. The only way to see any of the blue Series 1 is to zero out Y values of Series 2.

For example, if the following outer values of Series 2 were set to zero:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

The blue Y values of Series 1 underneath are uncovered and show through as the outer tails on each side. The resulting chart from the above data is shown as follow:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

It is possible to create a user input that automatically determines how many outer Y values on each side of Series 2 will be set to zero. The number of Y values set to zero on each side of Series 2 determines how much curve area each blue outer tail in the graph will contain.

Each outer tail is addressed separately and has a separate user input assigned to it. Each of the two user inputs specify the percentage of the total curve area that will be displayed as a blue outer tail on its respective side.

The upper user input set to the value of 0.05 controls the outer left tail. The user-controlled value of 0.05 specifies that the blue outer left tail will contain 5 percent of the total area under the curve.

Each of the Y values in Series 2 is calculated in Excel by an If-Then-Else statement. The If-Then-Else statement sets the Y value to zero if the user input (currently set at 0.05 for both tails) is equal or greater than its respective CDF value (in the left tail) or 1 – CDF (in the right tail).

When the above condition is not met, the If-Then-Else statement sets the Y value to its respective PDF value. The Y value will either be set to zero (if the condition is met) or set to its PDF value (if the condition is not met).

The If-Then-Else formula is changed when the z Score reaches zero. This formula change ensures that the user input for each of the two outer tails will always be compared to the area in that outer tail.

The CDF(X) states the percentage of the total bell-shaped normal curve is to the left of X. If point X is in the left tail, the curve area outside of point X in the left tail is specified by CDF(X). If point X is in the right tail, the curve area outside of point X in the right tail is specified by 1 – CDF(X).

The following image shows the formula change at the point that the z Score equals zero. The Excel formulas for the Y values of Series 2 change when the z Score reaches zero. Y values in the left tail have the tan background. Y values in the right tail have the light blue background.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

As a result, the settings shown below automatically configure the graph to display blue outer left and right tails that each contain 5 percent of the total area under the normal curve.

This was implemented by setting the Y values of Series 2 to zero in the outer 5 percent of each tail. This enables the blue Y values of Series 1 underneath to be displayed in those outer tails.

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

Changes to any of the yellow user inputs are automatically reflected in the Excel graph.

The following graph shows how the changes made to all nine user inputs are instantly and automatically reflected in the Excel area graph as follows:

statistics, graph, excel excel 2010 graph, excel 2010, excel 2013
(Click On Image To See a Larger Version)

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

Overview of the Normal Distribution

Overview of the Normal

Distribution

This is one of the following eight articles on the normal distribution in Excel

Overview of the Normal Distribution

Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013

Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013

Solving Normal Distribution Problems in Excel 2010 and Excel 2013

Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013

An Important Difference Between the t and Normal Distribution Graphs

The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean

Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way

The normal distribution is a very useful and widely-occurring distribution. The normal distribution curve has the well-known bell shape and is symmetrical about a mean. The normal distribution is actually a family of distributions with each unique normal distribution being fully described by the following two parameters: the population mean, µ (Greek letter “mu”), and the population standard deviation, σ (Greek letter “sigma”).

The following Excel-generated image is an example of the PDF (Probability Density Function) of a normal distribution curve whose population mean, µ, equals 10 and population standard deviation, σ, equals 5.

normal distribution, statistics
(Click On Image To See a Larger Version)

Following is image of the PDF of a different normal distribution curve with a population mean, µ, equals 0 and population standard deviation, σ, equals 1. This is a special normal distribution known as the Standard Normal Distribution as shown in the following Excel-generated image:

normal distribution, statistics
(Click On Image To See a Larger Version)

You may notice that the shapes of both normal distribution PDF graphs are the same but the x axis has been shifted and scaled. The entire family of normal distributions is symmetrical about a mean and contains approximately 68 percent of the entire area under the curve within one standard deviation of the mean. Approximately 95 percent of the total area under the curve lies within two standard deviations of the mean, and approximately 99.7 percent of the total curve area within three standard deviations of the mean.

If PDF curves of different normal distributions are created with the mean and standard deviation in the same locations on the horizontal axis, the curves will be in the same place and have exactly the same shape. The two previous images demonstrate this.

If the PDF curves of different normal distributions are placed on the same horizontal and vertical axes, each curve will be shifted left or right so it is symmetrical about its population mean, µ, and its width will be scaled (widened or narrowed) depending on the size of its population standard deviation, σ.

The PDF graph of a normal distribution with a smaller population standard deviation would appear thinner and taller than the PDF graph of a normal distribution with a larger population standard deviation, if both curves were graphed on the same set of horizontal and vertical axes.

 

Uses of the Normal Distribution

Many real-valued variables are modeled using the normal distribution. An example of normally-distributed natural phenomenon would be the velocities of the molecules in an ideal gas. Test scores for large numbers of people follow the normal distribution. Random variation in output of a machine barring special causes is often normal-distributed.

In other cases it is often a variable’s logarithm that is normal-distributed. Biological sciences provide many such examples such as the length of appendages and blood pressure. A variable whose logarithm is normal-distributed is said to have a log-normal distribution.

An outcome that is the result of a number of small variables that are additive, independent, and similar in magnitude is often normal-distributed. Measurement error of a physical experiment might be one example of such an outcome.

In some fields the changes in the logarithm of a variable are assumed to be normal-distributed. In the financial fields, changes in the logarithms of exchange rates, price indices, and the stock market indices are modeled by the normal distribution.

 

Useful Normal Approximations of

Other Distributions

Several distributions can be approximated by the normal distribution under certain conditions. When the normal approximation is valid, tools such as normal-distribution-based hypothesis testing and hypothesis testing can be conveniently applied directly to samples from that-Distribution.

 

Normal Approximation of the Binomial Distribution

The normal approximation of the binomial distribution is of particular importance. Each unique binomial distribution, B(n,p), is described with the following two parameters: n (sample size) and p (probability of each binary sample having a positive outcome of the two possible outcomes). A binomial distribution is approximately normal-distributed with a mean np and variance np(1-p) if n is large an p is not too close to zero. A unique normal distribution, N(µ,σ), is completely described by the following two parameters: µ (population mean) and σ (population standard deviation). When the normal approximation of the binomial distribution is appropriate, normal-distribution-based hypothesis testing and confidence intervals can be performed with binomially-distributed data by applying the following substitution: N(µ,σ) = N(np, SQRT(np(1-p))).

The normal approximation of the binomial distribution enables the convenient analysis of population proportions with normal-distribution-based hypothesis testing and confidence intervals. A well-known example of this type of analysis would be the estimation of the percentage of a population along with a percent margin of error that will vote for or against a particular political candidate. This is known as a confidence interval of a population proportion.

The t-Distribution (sometimes called the Student’s t-Distribution) is closely related to the normal distribution in function and appearance. The t-Distribution is used to analyze normally-distributed data when the sample size is small (n< 30) or the population standard deviation is not known. The t-Distribution is always centered about its mean of zero and closely resembles the standard normal curve. The standard normal curve is the unique normal distribution curve whose mean equals 0 and standard deviation equals 1. The t-Distribution has a lower peak and slightly thicker outer tails than the standard normal distribution, but converges to the exact shape of the standard normal distribution as sample size increases.

In the real world, normally-distributed data are much more frequently analyzed with t-Distribution-based tools than normal-distribution-based tools. The reason is that the t-Distribution more correctly describes the distribution normally-distributed data in the common occurrences of small sample size and unknown population standard deviation. t-Distribution-based tools are also equally appropriate for analysis of large samples of normally-distributed data because the t-Distribution converges to nearly an exact match of the standard normal distribution when sample size exceeds 30.

Normal Approximation of the Poisson Distribution

Of lesser importance is the normal distribution’s approximation of the Poisson and chi-Square distribution under certain conditions. Each unique Poisson distribution curve is completely described by a single parameter λ. When λ is large, data distributed according to the Poisson distribution can be analyzed with normal-distribution-based tools by making the following substitution:

N(µ,σ) = N(λ, SQRT(λ)).

Each unique Chi-Square distribution curve is completely described by the single parameter that is its degrees of freedom, k. When k is large, data distributed according to the Chi-Square distribution can be analyzed with normal-distribution-based tools by making the following substitution:

N(µ,σ) = N(k, SQRT(2k)).

One of the most useful modeling applications of the normal distribution is due to the fact the means of large random samples from a population are approximately normal-distributed no matter how the underlying population is distributed. This property is described by the Central Limit Theorem. Normal-distribution based analysis tools such as hypothesis testing and confidence intervals can therefore be applied to the means of samples taken from populations that are not normal-distributed.

 

Statistical Tests That Require

Normality of Data or Residuals

Statistical tests that as classified a parametric tests have a requirement that the sample data or the residuals are normal-distributed. For example, linear regression requires that the residuals be normal-distributed. t-tests performed on small samples (n < 30) require that the samples are taken from a normally-distributed population. ANOVA requires normality of the samples being compared.

When normality requirements cannot be met, nonparametric tests can often be substituted for parametric tests. Nonparametric tests do not requirements that data or residuals follow a specific distribution. Nonparametric tests are usually not as powerful as parametric tests.

 

History of the Normal Distribution

The first indirect mention of the normal distribution is credited to French mathematician and friend of Isaac Newton, Abraham De Moivre. De Moivre developed such repute as a mathematician that Newton often referred questions to him, saying “Go to Mr. de Moivre. He knows these things better than I do.”

In 1738 De Moivre published the second edition of his The Doctrine of Chances that contained a study of binomial coefficients which is considered to the first reference, albeit, indirect, to the normal distribution. That particular book became highly valued among gamblers of the time. The book noted that the distribution of the number of times that a binary event produces a positive outcome, such as a coin toss resulting in heads, becomes a smooth, bell-shaped curve when the number of trials is high enough. This bell-shaped curve approaches the normal curve. DeMoivre was alluding to what we know today as the normal distribution’s approximation of the binomial distribution. DeMoivre did not speak of the normal distribution in terms of a probability density function and therefore does not receive full credit for discovering the normal distribution.

De Moivre, who spent his adult life in London, remained somewhat poor because he was unable to secure a professorship at any local university, partially due to his French origins. He earned a substantial part of his living from tutoring mathematics and being a consultant to gamblers. One day when De Moivre became older, he noticed that he required more sleep every night. He determined that he was sleeping an extra 15 minutes every night. He correctly calculated the date his own death to be November 27, 1754, the date that the total required sleep time would reach 24 hours.

The first true mention of the normal distribution was made by German mathematician Carl Friedrich Gauss in 1809 as a way to rationalize the nonlinear method of least squares. The normal distribution curve is often referred to as the Gaussian curve as a result.

Gauss was one of the greatest mathematicians who ever lived and is sometimes referred to as “the Prince of Mathematicians” and “the greatest mathematician since antiquity.” Gauss was a child prodigy and made some of his groundbreaking mathematical discoveries as a teenager. Gauss was an astonishingly prolific scientist in many fields. Here is a link to a partial list of over 100 scientific topics named after him:

http://en.wikipedia.org/wiki/List_of_things_named_after_Carl_Friedrich_Gauss

Significant contributions to the normal distribution were made by another of the greatest mathematicians of all-time, Frenchman Pierre Simon LePlace, who was sometimes referred to as the French Newton. LePlace is credited with providing the first proof of one of statistics’ most important tenets related to the normal distribution, the Central Limit Theorem. LePlace has an interesting life and was appointing by Napoleon to be the French Minister of the Interior shortly after Napoleon seized power in a coup. The appointment lasted about six weeks until nepotism took over and the post was given to Napoleon’s brother.

The distribution’s moniker, the “normal distribution,” was made popular by Englishman Karl Pearson, another giant in the field of mathematics. Karl Pearson is credited with establishing the discipline of mathematical statistics and founded the world’s first university statistics department in 1911 at the University College of London. Many of topics covered in all basic statistics courses such a p Values and correlation are the direct result of Karl Pearson’s work.

Some of Pearson’s publishings, particularly his book The Grammar of Science, provided a number of themes that Einstein would weave into several of his most well-known theories of relativity. Pearson postulated that a person traveling faster than the speed of light would experience time being reversed. Pearson also discussed the concept that the operation of physical laws depended on the relative position of the observer. These are central themes in Einstein’s relativity theories.

 

Properties of the Normal

Distribution

This graph of the standard normal distribution’s PDF is once again presented to assist in the understanding of each listed property of the normal distribution.

normal distribution, statistics
(Click On Image To See a Larger Version)

The normal distribution’s probability density function, f(x), has the following properties:

1) It is symmetric about its population mean µ. Half of the values of a normally-distributed population will be less than (to the left of) the population mean and the other half of the population’s value will be greater than (to the right of) the population mean.

2) Its mode and median are equal to the population mean µ.

3) It is unimodal. This means that it has only one peak, i.e., only one point that is a local maximum.

4) The total area under the normal distribution’s PDF is equal to 1.

5) Each unique normal distribution curve is entirely defined by its two parameters population mean µ and population standard deviation σ.

6) The density of the normal distribution’s PDF is the highest at its mean and always decreases as distance from the mean increases.

7) 50 percent of values of a normally-distributed population are less than the population mean and 50 percent of the values are greater than the mean.

8) Approximately 68 percent of the total area under the PDF curve resides within one σ from the mean, approximately 95 percent of the total resides within two σs, and approximately 99.7 percent of the total area resides within three σs from the mean. This is sometimes known as the Empirical Rule or the 68-95-99.7 Rule.

9) f(x) is infinitely differentiable.

10) The first derivative of f(x) is positive for all x < µ and negative for all x > µ.

11) The second derivative of f(x) has two inflection points which are located one population standard deviation above and below the population mean. These inflection points are located at x = µ ± σ. An inflection point occurs at the point that the 2nd derivative equals zero and changes sign as x continues.

12) It is log-concave. A function f(x) is log-concave if its natural log, ln[f(x)], is concave. A log-concave function does not have multiple separate maxima and its tails are not “too thick.” Other well-known distributions that are log-concave include the following:

- exponential distribution

- uniform distribution over any convex set

- logistic distribution

- gamma distribution if its shape parameter is >=1

- Chi-Square distribution if the number of degrees of freedom >=2

- beta distribution if both shape parameters are >=1

- Weibull distribution if the shape parameter is >=1

The following well-known distributions are non-log-concave for all parameters:

- t-Distribution

- log-normal distribution

- F-distribution

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

Normal Distribution’s PDF in Excel 2010 and Excel 2013

Normal Distribution’s PDF

(Probability Density

Function)
in Excel

This is one of the following eight articles on the normal distribution in Excel

Overview of the Normal Distribution

Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013

Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013

Solving Normal Distribution Problems in Excel 2010 and Excel 2013

Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013

An Important Difference Between the t and Normal Distribution Graphs

The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean

Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way

The normal distribution is a family of distributions with each unique normal distribution being fully described by its two parameters µ (“mu,” population mean) and σ (“sigma,” population standard deviation). The population mean, µ, is a location parameter and the population standard deviation, σ, is a scale parameter. When two different normal distribution curves are plotted on the same set of horizontal and vertical axes, the means determine how shifted one curve is to the left of right of the other curve. The standard deviations detail how much wider or more narrow the first normal curve is than to the second.

The normal distribution is often denoted as N(µ,σ2). σ2 equals the population variance. When a random variable X is normal-distributed with a population mean µ and population variance σ2, it is written in the following form:

X ~ N(µ, σ2)

It is important to note that the two parameters of the normal distribution are population parameters, not measurements taken from a sample. Sample statistics would provide only estimates of population parameters. The t-Distribution is used to analyze normally-distributed data when only sample statistics and/or population parameters are not known. In the real world it is much more common to analyze normally-distributed data with t-Distribution based tests than normal-distribution-based tests because only data from small samples (n<30) are available.

As with all distributions, the normal distribution has a PDF (Probability Density Function) and a CDF (Cumulative Distribution Function).

The normal distribution’s PDF (Probability Density Function) equals the probability that sampled point from a normal-distributed population has a value EXACTLY EQUAL TO X given the population’s mean, µ, and standard deviation, σ.

The normal distribution’s PDF is expressed as f(X,µ,σ).

f(X,µ,σ) = the probability that a randomly-sampled point taken from normally-distributed population with a mean µ and standard deviation σ has the value of X. It is given by the following formula:

normal distribution pdf in excel 2010 and excel 2013 statistical distribution
(Click On Image to See a Larger Version)

exp refers to the value of the mathematical constant e which is the base of the natural logarithm. e is equal to 2.71828 and is the limit of (1 + 1/n)n and n approaches infinity. ea would be expressed in Excel as =exp(a).

The mathematical constant π (“pi”) is equal to 3.14159 and is the ratio of a circle’s circumference to its diameter.

In Excel 2010 and beyond, the normal distribution’s PDF can be calculated directly by the following Excel formula:

f(X,µ,σ) = NORM.DIST(X,µ,σ,FALSE)

The Excel formula parameter “FALSE” indicates that the formula is calculating the normal distribution’s PDF (Probability Density Function) and not its CDF (Cumulative Distribution Function)

Prior to Excel 2010, the normal distribution’s PDF was calculated in Excel by this formula:

f(X,µ,σ) = NORMDIST(X,µ,σ,FALSE)

Statistical formulas that worked in Excel versions prior to 2010 will also work in Excel 2010 and 2013.

The following Excel-generated graph shows the PDF of a normal distribution that has a population mean of 10 and population standard deviation equal to 5.

normal distribution, pdf, probability density function, excel, excel 2010, excel 2013, statistics
(Click On Image to See a Larger Version)

 

Normal Distribution PDF Example in

Excel

Determine the probability that a randomly-selected variable X taken from a normally-distributed population has the value of 5 if the population mean equals 10 and the population standard deviation equals 5. The preceding Excel-generated image shows a normal distribution PDF curve with the population mean equaling 10 and the population standard deviation equaling 5.

X = 5

µ = 10

σ = 5

f(X,µ,σ) = NORM.DIST(X,µ,σ,FALSE)

f(X=5,µ=10,σ=5) = NORM.DIST(5,10,5,FALSE) = 0.04834

There is a 4.834 percent chance that randomly-selected X = 5 if X is taken from a normally-distributed population with a population mean µ = 10 and population standard deviation σ = 5. The PDF diagram of this normal distribution curve also shows the probability of X at X = 5 to that value.

Performing the same calculation in Excel using the full normal distribution PDF formula as shown as follows:

normal distribution, pdf, probability density function, excel, excel 2010, excel 2013, statistics
(Click On Image to See a Larger Version)

f(X=5,µ=10,σ=5) =(1/(SQRT(2*3.14159*5^2)))*EXP((-1)*((5-10)^2)/(2*5^2))

= 0.04834

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!

Normal Distribution’s CDF in Excel 2010 and Excel 2013

This is one of the following eight articles on the normal distribution in Excel

Overview of the Normal Distribution

Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013

Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013

Solving Normal Distribution Problems in Excel 2010 and Excel 2013

Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013

An Important Difference Between the t and Normal Distribution Graphs

The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean

Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way

The Normal Distribution’s

CDF (Cumulative

Distribution Function)in

Excel

The normal distribution’s CDF (Cumulative Distribution Function) equals the probability that sampled point from a normal-distributed population has a value UP TO X given the population’s mean, µ, and standard deviation, σ.

The normal distribution’s CDF is expressed as F(X,µ,σ).

normal distribution, cdf, cumulative distribution function, excel, excel 20101, excel 2013, statistics
(Click on Image To See a Larger Version)

Unlike the normal distribution’s PDF, the CDF has no convenient closed form of its equation, which is the integral just shown.

In Excel 2010 and beyond, the normal distribution’s CDF must be calculated by the following Excel formula:

F(X,µ,σ) = NORM.DIST(X,µ,σ,TRUE)

The Excel formula parameter “TRUE” indicates that the formula is cumulative, i.e., it is calculating the normal distribution’s CDF (Cumulative Distribution Function) and not its PDF (Probability Density Function).

Prior to Excel 2010, the normal distribution’s PDF was calculated in Excel by this formula:

F(X,µ,σ) = NORMDIST(X,µ,σ,TRUE)

Statistical formulas that worked in Excel versions prior to 2010 will also work in Excel 2010 and 2013.

Note that the CDF has asymptotic values of 0 as X decreases and an asymptotic value of 1 as X increases as shown in the following Excel-generated image:

normal distribution, cdf, cumulative distribution function, excel, excel 20101, excel 2013, statistics
(Click on Image To See a Larger Version)

The normal distribution’s CDF has a value of exactly 0.5 when X equals the population mean. This indicates that 50 percent of the entire area under the normal distribution’s PDF is contained under the curve before X reaches a value of the population mean. The underlying meaning is that a randomly-sample point from a normally-distributed population has a 50 percent chance of having a value less than or equal the population mean. This can be seen in the Excel-generated graph of the normal distribution’s PDF as follows:

normal distribution, cdf, cumulative distribution function, excel, excel 20101, excel 2013, statistics
(Click on Image To See a Larger Version)

This CDF always has a value of 0.15866 when X is one standard deviation below the mean. This indicates that 15.866 percent of the area under the normal distribution’s PDF curve occurs before X reaches the value of the point that is one standard deviation below the population’s mean. The underlying meaning is that a randomly-sample point from a normally-distributed population has a 15.866 percent chance of having a value less than or equal the value that is one standard deviation below the population mean.

This CDF always has a value of 0.84135 when X is one standard deviation above the mean. This indicates that 84.135 percent of the area under the normal distribution’s PDF curve occurs before X reaches the value of the point that is one standard deviation below the population’s mean. The underlying meaning is that a randomly-sample point from a normally-distributed population has an 84.135 percent chance of having a value less than or equal the value that is one standard deviation below the population mean.

A randomly-selected point from a normally-distributed population has a 68.269 percent chance of having a value X that is within one standard deviation above or below the mean (84.135 – 15.866 = 68.269). In other words, 68.269 percent of normally-distributed data lie within one standard deviation of the mean. Similar analysis shows that approximately 95 percent of all normally-distributed data lie within two standard deviations of the mean and 99.7 percent of the data are within three standard deviations of the mean. This rule is often referred to as the Empirical Rule or the 68-95-99.5 Rule.

 

Normal Distribution CDF Example in

Excel

Determine the probability that a randomly-selected variable X taken from a normally-distributed population has the value of UP TO 5 if the population mean equals 10 and the population standard deviation equals 5.

X = 5

µ = 10

σ = 5

F(X,µ,σ) = NORM.DIST(X,µ,σ,TRUE)

F(X=5,µ=10,σ=5) = NORM.DIST(5,10,5,TRUE) = 0.15866

There is a 15.866 percent chance that randomly-selected X equals UP TO 5 if X is taken from a normally-distributed population with a population mean µ = 10 and population standard deviation σ = 5. The CDF diagram of this normal distribution curve also shows the probability of X at X = 5 to that value in the following Excel-generated image.

normal distribution, cdf, cumulative distribution function, excel, excel 20101, excel 2013, statistics
(Click on Image To See a Larger Version)

The following PDF diagram of this normal distribution curve shows 15.866 percent of the total area under the bell-shaped curve that is to the left of X = 5.

.normal distribution, cdf, cumulative distribution function, excel, excel 20101, excel 2013, statistics
(Click on Image To See a Larger Version)

 

Excel Master Series Blog Directory

 

Click Here To See a List Of All

Statistical Topics And Articles In

This Blog

 

You Will Become an Excel Statistical Master!