This is one of the following eight articles on the normal distribution in Excel
Overview of the Normal Distribution
Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
Solving Normal Distribution Problems in Excel 2010 and Excel 2013
Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013
An Important Difference Between the t and Normal Distribution Graphs
Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way
Overview of the Central
Limit Theorem
The Central Limit Theorem, or CLT, is one of statistics’ most basic principles. The CLT states the following points:
When an entire population is grouped into random samples of size n, the following are true:
1) As sample size n increases, the distribution of the sample means approaches the normal distribution regardless of how the population is distributed. A sample size of 30 is often quoted as being the minimum sample size necessary to ensure that the sample means will be normal-distributed. In the real world, sample means will be normal-distributed when the sample size is much smaller. This will be demonstrated with Excel shortly.
2) The sample standard deviation will equal the population standard deviation divided by the square root of the sample size. This will also be demonstrated in Excel shortly
Why the Central Limit Theorem
Is So Important to Statistics
The Central Limit Theorem is very important to statistical work because it enables convenient normal-distribution-based and t-Distribution-based hypothesis testing and confidence-interval creation to be performed based upon a single large sample of data without having to verify that the population from which the sample was taken is normal-distributed. Samples are considered to be large if the size of each sample, n, is greater than or equal to 30.
Performing normal-distribution-based and t-Distribution-based hypothesis testing and creating confidence intervals based upon a small sample of data requires verification that the population from which the sample was taken is normal-distributed. The normality of the population is confirmed if the data within the data sample is verified to be normal-distributed. This step is not necessary when the sample is large. Samples are considered to be small if the size of each sample, n, is smaller than 30.
Hypothesis testing of a population mean and confidence intervals of a population mean are discussed in much greater detail in their own sections of this book. Hypothesis testing and confidence intervals of a population mean are calculated based upon a single sample of data. The fundamental assumption that underlies normal-distribution-based and t-Distribution-based hypothesis testing and confidence interval creation of a population mean is that the sample mean is normal-distributed.
The mean of a single sample is normal-distributed if the means of other similar samples are also normal-distributed. In other words, if the means of a number of other same-sized samples randomly taken from the same population were normal-distributed, then the mean of the single sample is normal-distributed.
When the single sample taken is large enough (sample size n is greater than or equal to 30), the mean of that sample is assumed to be normal-distributed as per the Central Limit Theorem. When the size of the single sample is less than 30, the sample’s mean cannot be assumed to be normal-distributed unless one of the following is true:
1) The population from which the sample was drawn is known to be normal-distributed.
2) The data within the single sample is verified through normality testing to be normal-distributed.
Quite often it is not possible to confirm the normality of the population. In this case normality testing should be performed on the data within the single sample taken before normal-distribution-based or t-Distribution-based hypothesis testing or the confidence interval creation can be performed based upon that single sample of data.
A Demonstration of the Central
Limit Theorem Using Excel
The Central Limit Theorem is not immediately intuitive can be more quickly understood when an example of how it works is shown in graphical form. The following example will be performed in Excel as follows:
A population of 5,000 randomly-generated data points whose distribution is highly skewed will be created in Excel. The population of 5,000 will be randomly into samples of sample size n = 2, sample size n = 5, and sample n = 10.
The distribution of the sample means will be shown in an Excel histogram created for each of the three sample sizes.
The histograms will demonstrate that the distribution of the sample means become closer and closer to the normal distribution. This exercise will also show that the sample standard deviation equals the population standard deviation divided by the square root of the sample size. This means that the distribution’s shape become tighter as the sample size increases.
Here are the steps in demonstrating how the Central Limit Theorem works using Excel.
The first step is to use Excel’s random-number generator to generate 5,000 random numbers that are highly skewed. In this case, the data will be highly skewed to the right and will following this model, which is an Excel-generated histogram:
(Click on the Image To See an Enlarged Version)
This population of 5,000 skewed random numbers by creating the following sets of random numbers in Excel:
● 1,500 random numbers between 0 and 100. Each of these numbers was created with the Excel formula RANDBETWEEN(0,100)
● 1,100 random numbers between 100 and 200. Each of these numbers was created with the Excel formula RANDBETWEEN(100,200)
● 700 random numbers between 200 and 300. Each of these numbers was created with the Excel formula RANDBETWEEN(200,300)
● 500 random numbers between 300 and 400. Each of these numbers was created with the Excel formula RANDBETWEEN(300,400)
● 400 random numbers between 400 and 500. Each of these numbers was created with the Excel formula RANDBETWEEN(400,500)
● 300 random numbers between 500 and 600. Each of these numbers was created with the Excel formula RANDBETWEEN(600,700)
● 200 random numbers between 600 and 700. Each of these numbers was created with the Excel formula RANDBETWEEN(600,700)
● 100 random numbers between 700 and 800. Each of these numbers was created with the Excel formula RANDBETWEEN(700,800)
● 100 random numbers between 800 and 900. Each of these numbers was created with the Excel formula RANDBETWEEN(800,900)
● 100 random numbers between 900 and 1,000. Each of these numbers was created with the Excel formula RANDBETWEEN(900,1,000)
These 5,000 numbers need to be randomly placed in a table so that samples can be taken. One way of doing that is to initially place all 5,000 numbers in a single Excel column. These numbers need to be randomly scrambled in the column. This is done as follows:
The columns of numbers on the right in column D are the randomly-generated numbers from several of the ranges previously created. The column of numbers on the left in column C are randomly-generated numbers from a single range using the Excel formula RANDBETWEEN(0,1000).
(Click on the Image To See an Enlarged Version)The objective is to randomly scramble the numbers in column D. The numbers in Column D will be scrambled randomly be performing a single sort of both column based upon an ascending sort of the numbers in Column C. Prior to the sort, the numbers in the two columns appear as follows:
After the sort according to Column C, the numbers appear as follows:
The next step is to take the numbers from the column on the right and place them into a table. This is necessary for taking samples and creating histograms in Excel. Converting column data into a table can be done in Excel using the OFFSET() command. As soon as any cell from C2 to E4 has the correct OFFSET() formula typed into it, that cell can be copied to all of the other cells in the table. The result is the following:
(Click on the Image To See an Enlarged Version)The data in this exercise was arrange into a table of dimensions 20 rows by 500 columns as follows:
Population (Sample size n = 1)
(Click on the Image To See an Enlarged Version)A histogram was created in Excel of this population of 5,000 data points. It population parameters are as follows:
Population size = N = 5,000 (Note that capital N is used to describe population size)
Population mean = µ = 260
Population standard deviation = σ = 230
Note that the following Excel histogram shows the population’s distribution to be highly skewed to the right.
(Click on the Image To See an Enlarged Version)
Sample Size n = 2
A table of samples of sample size n = 2 was created by taking the average of every 2 data points going down each column. A table with these samples appears as follows:
(Click on the Image To See an Enlarged Version)
Total number of samples = 2,500
Sample size = n = 2
Sample mean = x_bar = 260
Sample standard deviation = 163 = σ / SQRT(n) (with rounding error)
Note that the shape of the distribution of the sample means when the sample size is 2 is still highly skewed to the right. The bell shape of the normal distribution is already starting to form at a sample size of 2 even though the population from which the samples were taken is highly skewed.
Note how much more narrow the distribution’s shape has become as sample size increase from n = 1 to n = 2.
(Click on the Image To See an Enlarged Version)
Sample Size n = 5
A table of samples of sample size n = 5 was created by taking the average of every 5 data points going down each column. A table with these samples appears as follows:
(Click on the Image To See an Enlarged Version)Total number of samples = 1,000
Sample size = n = 5
Sample mean = x_bar = 260
Sample standard deviation = 104 = σ / SQRT(n) (with rounding error)
Note that the shape of the distribution of the sample means when the sample size is 5 already has a strong resemblance to the bell-shape normal distribution’s PDF curve even though the population from which the samples were taken is highly skewed. A mild skew to the right can still be seen.
Note how much more narrow the distribution’s shape has become as sample size increase from n = 2 to n = 5. The reduction in the sample standard deviation indicates how much thinner the distribution’s shape has become.
(Click on the Image To See an Enlarged Version)
Sample Size n = 10
A table of samples of sample size n = 10 was created by taking the average of every 10 data points going down each column. A table with these samples appears as follows:
(Click on the Image To See an Enlarged Version)Total number of samples = 500
Sample size = n = 10
Sample mean = x_bar = 260
Sample standard deviation = 72 = σ / SQRT(n) (with rounding error)
Note that the shape of the distribution of the sample means when the sample size is 10 now nearly exactly resembles the bell-shape normal distribution’s PDF curve even though the population from which the samples were taken is highly skewed.
Note how much more narrow the distribution’s shape has become as sample size increase from n = 5 to n = 10. The reduction in the sample standard deviation indicates how much thinner the distribution’s shape has become.
(Click on the Image To See an Enlarged Version)The most important point of the Central Limit Theorem is that the distribution of sample means become closer and closer to the normal distribution as sample size increases regardless of the distribution of the population from which the samples were taken.
Most statistics texts state that the sample size must reach at least 30 before the sample means are nearly certain to be normal-distributed. In the real world, the distribution of the sample means converges to normality at significantly smaller sample sizes. The example just provided shows a reasonable resemblance to the normal distribution’s PDF curve when the sample size is only 5 and the population was highly skewed. A sample size of 10 shows an almost exact resemblance to the normal distribution’s PDF curve.
The Resulting Robustness
of the t-Test
The quick convergence of sample mean distribution to normality as sample size increases means that the t–test is relatively robust on non-normally-distributed data as long as the sample size is not too small and the data is not too skewed.
The example just provided shows that even highly-skewed data will still have near-normal distribution of sample means when the sample size is as low as 10. Data that is not skewed will have sample means achieving near-normality at smaller sample sizes than 10.
The t-test is therefore very robust in the face of non-normal data. The t-test can produce a reasonably accurate result for sample sizes as low as 5 to 10 for data that is not skewed. Many statistics texts state that sample size must be large (n > 30) for a t-test to be reliable but that is not the case. The example provided in this section shows the sample means of highly-skewed data converging to near-normality at a sample size of 10.
Derivation of the Central Limit
Theorem
The derivation of the Central Limit Theorem is readily available on the Internet. The CLT’s derivation is not trivial and involves a significant amount of calculus. Understanding this complicated derivation does not significantly add to one’s understanding of when and how to apply the CLT. For that reason the derivation of the CLT is not discussed here.
Excel Master Series Blog Directory
Statistical Topics and Articles In Each Topic
- Histograms in Excel
- Bar Chart in Excel
- Combinations & Permutations in Excel
- Normal Distribution in Excel
- Overview of the Normal Distribution
- Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
- Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
- Solving Normal Distribution Problems in Excel 2010 and Excel 2013
- Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013
- An Important Difference Between the t and Normal Distribution Graphs
- The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean
- Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way
- t-Distribution in Excel
- Binomial Distribution in Excel
- z-Tests in Excel
- Overview of Hypothesis Tests Using the Normal Distribution in Excel 2010 and Excel 2013
- One-Sample z-Test in 4 Steps in Excel 2010 and Excel 2013
- 2-Sample Unpooled z-Test in 4 Steps in Excel 2010 and Excel 2013
- Overview of the Paired (Two-Dependent-Sample) z-Test in 4 Steps in Excel 2010 and Excel 2013
- t-Tests in Excel
- Overview of t-Tests: Hypothesis Tests that Use the t-Distribution
- 1-Sample t-Tests in Excel
- 1-Sample t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013
- 1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013
- 1-Sample t-Test Power With G*Power Utility
- Wilcoxon Signed-Rank Test in 8 Steps As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- 2-Independent-Sample Pooled t-Tests in Excel
- 2-Independent-Sample Pooled t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Variance Tests: Levene’s, Brown-Forsythe, and F Test For 2-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro Wilk Tests For Two-Sample Pooled t-Test
- Two-Independent-Sample Pooled t-Test - All Excel Calculations
- 2- Sample Pooled t-Test – Effect Size in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test Power With G*Power Utility
- Mann-Whitney U Test in 12 Steps in Excel as 2-Sample Pooled t-Test Nonparametric Alternative in Excel 2010 and Excel 2013
- 2- Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups
- 2-Independent-Sample Unpooled t-Tests in Excel
- 2-Independent-Sample Unpooled t-Test in 4 Steps in Excel 2010 and Excel 2013
- Variance Tests: Levene’s Test, Brown-Forsythe Test, and F-Test in Excel For 2-Sample Unpooled t-Test
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk For 2-Sample Unpooled t-Test
- 2-Sample Unpooled t-Test Excel Calculations, Formulas, and Tools
- Effect Size for a 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Test Power of a 2-Independent Sample Unpooled t-Test With G-Power Utility
- Paired (2-Sample Dependent) t-Tests in Excel
- Paired t-Test in 4 Steps in Excel 2010 and Excel 2013
- Excel Normality Testing of Paired t-Test Data
- Paired t-Test Excel Calculations, Formulas, and Tools
- Paired t-Test – Effect Size in Excel 2010, and Excel 2013
- Paired t-Test – Test Power With G-Power Utility
- Wilcoxon Signed-Rank Test in 8 Steps As a Paired t-Test Alternative
- Sign Test in Excel As A Paired t-Test Alternative
- Hypothesis Tests of Proportion in Excel
- Hypothesis Tests of Proportion Overview (Hypothesis Testing On Binomial Data)
- 1-Sample Hypothesis Test of Proportion in 4 Steps in Excel 2010 and Excel 2013
- 2-Sample Pooled Hypothesis Test of Proportion in 4 Steps in Excel 2010 and Excel 2013
- How To Build a Much More Useful Split-Tester in Excel Than Google's Website Optimizer
- Chi-Square Independence Tests in Excel
- Chi-Square Goodness-Of-Fit Tests in Excel
- F Tests in Excel
- Correlation in Excel
- Pearson Correlation in Excel
- Spearman Correlation in Excel
- Confidence Intervals in Excel
- z-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
- t-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
- Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean
- Confidence Interval of Population Proportion in 2 Steps in Excel 2010 and Excel 2013
- Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013
- Simple Linear Regression in Excel
- Overview of Simple Linear Regression in Excel 2010 and Excel 2013
- Complete Simple Linear Regression Example in 7 Steps in Excel 2010 and Excel 2013
- Residual Evaluation For Simple Regression in 8 Steps in Excel 2010 and Excel 2013
- Residual Normality Tests in Excel – Kolmogorov-Smirnov Test, Anderson-Darling Test, and Shapiro-Wilk Test For Simple Linear Regression
- Evaluation of Simple Regression Output For Excel 2010 and Excel 2013
- All Calculations Performed By the Simple Regression Data Analysis Tool in Excel 2010 and Excel 2013
- Prediction Interval of Simple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression in Excel
- Basics of Multiple Regression in Excel 2010 and Excel 2013
- Complete Multiple Linear Regression Example in 6 Steps in Excel 2010 and Excel 2013
- Multiple Linear Regression’s Required Residual Assumptions
- Normality Testing of Residuals in Excel 2010 and Excel 2013
- Evaluating the Excel Output of Multiple Regression
- Estimating the Prediction Interval of Multiple Regression in Excel
- Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel
- Logistic Regression in Excel
- Logistic Regression Overview
- Logistic Regression in 6 Steps in Excel 2010 and Excel 2013
- R Square For Logistic Regression Overview
- Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013
- Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013
- Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013
- Hosmer- Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013
- Single-Factor ANOVA in Excel
- Overview of Single-Factor ANOVA
- Single-Factor ANOVA in 5 Steps in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group
- Kruskal-Wallis Test Alternative For Single Factor ANOVA in 7 Steps in Excel 2010 and Excel 2013
- Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison
- Single-Factor ANOVA - All Excel Calculations
- Overview of Post-Hoc Testing For Single-Factor ANOVA
- Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA
- Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA
- Overview of Effect Size For Single-Factor ANOVA
- ANOVA Effect Size Calculation Eta Squared in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Psi – RMSSE – in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Omega Squared in Excel 2010 and Excel 2013
- Power of Single-Factor ANOVA Test Using Free Utility G*Power
- Welch’s ANOVA Test in 8 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Brown-Forsythe F-Test in 4 Steps in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Two-Factor ANOVA With Replication in Excel
- Two-Factor ANOVA With Replication in 5 Steps in Excel 2010 and Excel 2013
- Variance Tests: Levene’s and Brown-Forsythe For 2-Factor ANOVA in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication Effect Size in Excel 2010 and Excel 2013
- Excel Post Hoc Tukey’s HSD Test For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication – Test Power With G-Power Utility
- Scheirer-Ray-Hare Test Alternative For 2-Factor ANOVA With Replication
- Two-Factor ANOVA Without Replication in Excel
- Randomized Block Design ANOVA in Excel
- Repeated-Measures ANOVA in Excel
- Single-Factor Repeated-Measures ANOVA in 4 Steps in Excel 2010 and Excel 2013
- Sphericity Testing in 9 Steps For Repeated Measures ANOVA in Excel 2010 and Excel 2013
- Effect Size For Repeated-Measures ANOVA in Excel 2010 and Excel 2013
- Friedman Test in 3 Steps For Repeated-Measures ANOVA in Excel 2010 and Excel 2013
- ANCOVA in Excel
- Normality Testing in Excel
- Creating a Box Plot in 8 Steps in Excel
- Creating a Normal Probability Plot With Adjustable Confidence Interval Bands in 9 Steps in Excel With Formulas and a Bar Chart
- Chi-Square Goodness-of-Fit Test For Normality in 9 Steps in Excel
- Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk Normality Tests in Excel
- Nonparametric Testing in Excel
- Mann-Whitney U Test in 12 Steps in Excel
- Wilcoxon Signed-Rank Test in 8 Steps in Excel
- Sign Test in Excel
- Friedman Test in 3 Steps in Excel
- Scheirer-Ray-Hope Test in Excel
- Welch's ANOVA Test in 8 Steps Test in Excel
- Brown-Forsythe F Test in 4 Steps Test in Excel
- Levene's Test and Brown-Forsythe Variance Tests in Excel
- Chi-Square Independence Test in 7 Steps in Excel
- Chi-Square Goodness-of-Fit Tests in Excel
- Chi-Square Population Variance Test in Excel
- Post Hoc Testing in Excel
- Creating Interactive Graphs of Statistical Distributions in Excel
- Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013
- Interactive Graph of the Normal Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Chi-Square Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution’s PDF in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution’s CDF in Excel 2010 and Excel 2013
- Interactive Graph of the Binomial Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Exponential Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Beta Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Gamma Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Poisson Distribution in Excel 2010 and Excel 2013
- Solving Problems With Other Distributions in Excel
- Solving Uniform Distribution Problems in Excel 2010 and Excel 2013
- Solving Multinomial Distribution Problems in Excel 2010 and Excel 2013
- Solving Exponential Distribution Problems in Excel 2010 and Excel 2013
- Solving Beta Distribution Problems in Excel 2010 and Excel 2013
- Solving Gamma Distribution Problems in Excel 2010 and Excel 2013
- Solving Poisson Distribution Problems in Excel 2010 and Excel 2013
- Optimization With Excel Solver
- Maximizing Lead Generation With Excel Solver
- Minimizing Cutting Stock Waste With Excel Solver
- Optimal Investment Selection With Excel Solver
- Minimizing the Total Cost of Shipping From Multiple Points To Multiple Points With Excel Solver
- Knapsack Loading Problem in Excel Solver – Optimizing the Loading of a Limited Compartment
- Optimizing a Bond Portfolio With Excel Solver
- Travelling Salesman Problem in Excel Solver – Finding the Shortest Path To Reach All Customers
- Chi-Square Population Variance Test in Excel
- Analyzing Data With Pivot Tables
- SEO Functions in Excel
- Time Series Analysis in Excel
- VLOOKUP
This comment has been removed by the author.
ReplyDelete