Correctable Reasons
Why Your Data Is
Not Normally Distributed
In the ideal world, all of your data samples are normally distributed. In this case you can usually apply the well-known parametric statistical tests such as ANOVA, the t Test, and regression to the sampled data.
What can you do if your data does not appear to be normally distributed?
You can either:
- Apply nonparametric tests to the data. Nonparametric tests do not rely on the underlying data to have any specific distribution
- Evaluate whether your “non-normal” data was really normally- distributed before it was affected by one of the seven correctable causes listed below:
The Biggest 7 Correctable Causes of Non-Normality in Data Samples
1) Outliers – Too many outliers can easily skew normally-distributed data. If you can identify and remove outliers that are caused by error in measurement or data entry, you might be able to obtain normally-distributed data from your skewed data set. Outliers should only be removed if a specific cause of their extreme value is identified. The nature of the normal distribution is that some outliers will occur. Outliers should be examined carefully if there are more than would be expected.
2) Data has been affected by more than one process – It is very important to understand all of the factors that can affect data sample measurement. Variations to process inputs might skew what would otherwise be normally-distributed output data. Input variation might be caused by factors such as shift changes, operator changes, or frequent changes in the underlying process. A common symptom that the output is being affected by more than one process is the occurrence of more than one mode (most commonly occurring value) in the output. In such a situation, you must isolate each input variation that is affecting the output. You must then isolate the overall effect which that variation had on the output. Finally, you must remove that input variation’s effect from output measurement. You may find that you now have normally-distributed data.
3) Not enough data – A normal process will not look normal at all until enough samples have been collected. It is often stated that 30 is the where a “large” sample starts. If you have collected 50 or fewer samples and do not have a normally-distributed sample, collect at least 100 samples before re-evaluating the normality of the population from which the samples are drawn.
4) Measuring devices that have poor resolution – Devices with poor resolution may round off incorrectly or make continuous data appear discrete. You can, of course, use a more accurate measuring device. A simpler solution is to use a much larger sample size to smooth out sharp edges.
5) A different distribution describes the data – Some forms of data inherently follow different distributions. For example, radioactive decay is described by the exponential distribution. The Poisson distribution describes events event that tend to occur at predictable intervals over time, such as calls over a switchboard, number of defects, or demand for services. The lengths of time between occurrences of Poisson-distributed processes are described by the exponential distribution. The uniform distribution describes events that have an equal probability of occurring. Application of the Gamma distribution often based on intervals between Poisson-distributed events, such as queuing models and the flow of items through a manufacturing process. The Beta distribution is often used for modeling planning and control systems such are PERT and CPM. The Weibull distribution is used extensively to model time between failure of manufactured items, finance, and climatology. It is important to become familiar with the applications of other distributions. If you know that the data is described by a different distribution than the normal distribution, you will have to apply the techniques of that distribution or use nonparametric analysis techniques.
6) Data approaching zero or a natural limit – If the data has a large number of value than are near zero or a natural limit, the data may appear to be skewed. In this case, you may have to adjust all data by adding a specific value to all data being analyzed. You need to make sure that all data being analyzed is “raised” to the same extent.
7) Only a subset of process’ output is being analyzed – If you are sampling only a specific subset of the total output of a process, you are likely not collecting a representative sample from the process and therefore will not have normally distributed samples. For example, if you are evaluating manufacturing samples that occur between 4 and 6AM and not an entire shift, you might not obtain the normally-distributed sample that a whole shift would provide. It is important to ensure that your sample is representative of an entire process.
If you are unable to obtain a normally-distributed data sample, you can usually apply non-parametric tests to the data.
If you would like to create a link to this blog article, here is the link to copy for your convenience:
Correctable Reasons Why Your Data Is Not Normally Distributed
Please post any comments you have on this article. Your opinion is highly valued!
Excel Master Series Blog Directory
Statistical Topics and Articles In Each Topic
- Histograms in Excel
- Bar Chart in Excel
- Combinations & Permutations in Excel
- Normal Distribution in Excel
- Overview of the Normal Distribution
- Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
- Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
- Solving Normal Distribution Problems in Excel 2010 and Excel 2013
- Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013
- An Important Difference Between the t and Normal Distribution Graphs
- The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean
- Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way
- t-Distribution in Excel
- Binomial Distribution in Excel
- z-Tests in Excel
- t-Tests in Excel
- Overview of t-Tests: Hypothesis Tests that Use the t-Distribution
- 1-Sample t-Tests in Excel
- Overview of the 1-Sample t-Test in Excel 2010 and Excel 2013
- Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013
- 1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013
- 1-Sample t-Test Power With G*Power Utility
- Wilcoxon Signed-Rank Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- 2-Independent-Sample Pooled t-Tests in Excel
- Overview of 2-Independent-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Variance Tests: Levene’s, Brown-Forsythe, and F Test For 2-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro Wilk Tests For Two-Sample Pooled t-Test
- Two-Independent-Sample Pooled t-Test - All Excel Calculations
- 2-Sample Pooled t-Test – Effect Size in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test Power With G*Power Utility
- Mann-Whitney U Test in Excel as 2-Sample Pooled t-Test Nonparametric Alternative in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups
- 2-Independent-Sample Unpooled t-Tests in Excel
- 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Variance Tests: Levene’s Test, Brown-Forsythe Test, and F-Test in Excel For 2-Sample Unpooled t-Test
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk For 2-Sample Unpooled t-Test
- 2-Sample Unpooled t-Test Excel Calculations, Formulas, and Tools
- Effect Size for a 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Test Power of a 2-Independent Sample Unpooled t-Test With G-Power Utility
- Paired (2-Sample Dependent) t-Tests in Excel
- Paired t-Test in Excel 2010 and Excel 2013
- Excel Normality Testing of Paired t-Test Data
- Paired t-Test Excel Calculations, Formulas, and Tools
- Paired t-Test – Effect Size in Excel 2010, and Excel 2013
- Paired t-Test – Test Power With G-Power Utility
- Wilcoxon Signed-Rank Test As a Paired t-Test Alternative
- Sign Test in Excel As A Paired t-Test Alternative
- Hypothesis Tests of Proportion in Excel
- Hypothesis Tests of Proportion Overview (Hypothesis Testing On Binomial Data)
- 1-Sample Hypothesis Test of Proportion in Excel 2010 and Excel 2013
- 2-Sample Pooled Hypothesis Test of Proportion in Excel 2010 and Excel 2013
- How To Build a Much More Useful Split-Tester in Excel Than Google's Website Optimizer
- Chi-Square Independence Tests in Excel
- Chi-Square Goodness-Of-Fit Tests in Excel
- F Tests in Excel
- Correlation in Excel
- Pearson Correlation in Excel
- Spearman Correlation in Excel
- Confidence Intervals in Excel
- Overview of z-Based Confidence Intervals of a Population Mean in Excel 2010 and Excel 2013
- t-Based Confidence Intervals of a Population Mean in Excel 2010 and Excel 2013
- Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean
- Confidence Interval of Population Proportion in Excel 2010 and Excel 2013
- Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013
- Simple Linear Regression in Excel
- Overview of Simple Linear Regression in Excel 2010 and Excel 2013
- Simple Linear Regression Example in Excel 2010 and Excel 2013
- Residual Evaluation For Simple Regression in Excel 2010 and Excel 2013
- Residual Normality Tests in Excel – Kolmogorov-Smirnov Test, Anderson-Darling Test, and Shapiro-Wilk Test For Simple Linear Regression
- Evaluation of Simple Regression Output For Excel 2010 and Excel 2013
- All Calculations Performed By the Simple Regression Data Analysis Tool in Excel 2010 and Excel 2013
- Prediction Interval of Simple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression in Excel
- Basics of Multiple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression Example in Excel 2010 and Excel 2013
- Multiple Linear Regression’s Required Residual Assumptions
- Normality Testing of Residuals in Excel 2010 and Excel 2013
- Evaluating the Excel Output of Multiple Regression
- Estimating the Prediction Interval of Multiple Regression in Excel
- Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel
- Logistic Regression in Excel
- Logistic Regression Overview
- Logistic Regression Performed in Excel 2010 and Excel 2013
- R Square For Logistic Regression Overview
- Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013
- Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013
- Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013
- Hosmer-Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013
- Single-Factor ANOVA in Excel
- Overview of Single-Factor ANOVA
- Single-Factor ANOVA Example in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group
- Kruskal-Wallis Test Alternative For Single Factor ANOVA in Excel 2010 and Excel 2013
- Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison
- Single-Factor ANOVA - All Excel Calculations
- Overview of Post-Hoc Testing For Single-Factor ANOVA
- Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA
- Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA
- Overview of Effect Size For Single-Factor ANOVA
- ANOVA Effect Size Calculation Eta Squared (?2) in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Psi (?) – RMSSE – in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Omega Squared (?2) in Excel 2010 and Excel 2013
- Power of Single-Factor ANOVA Test Using Free Utility G*Power
- Welch’s ANOVA Test in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Brown-Forsythe F-Test in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Two-Factor ANOVA With Replication in Excel
- Two-Factor ANOVA With Replication in Excel 2010 and Excel 2013
- Variance Tests: Levene’s and Brown-Forsythe For 2-Factor ANOVA in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication Effect Size in Excel 2010 and Excel 2013
- Excel Post Hoc Tukey’s HSD Test For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication – Test Power With G-Power Utility
- Scheirer-Ray-Hare Test Alternative For 2-Factor ANOVA With Replication
- Two-Factor ANOVA Without Replication in Excel
- Creating Interactive Graphs of Statistical Distributions in Excel
- Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013
- Interactive Graph of the Normal Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Chi-Square Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Binomial Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Exponential Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Beta Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Gamma Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Poisson Distribution in Excel 2010 and Excel 2013
- Solving Problems With Other Distributions in Excel
- Solving Uniform Distribution Problems in Excel 2010 and Excel 2013
- Solving Multinomial Distribution Problems in Excel 2010 and Excel 2013
- Solving Exponential Distribution Problems in Excel 2010 and Excel 2013
- Solving Beta Distribution Problems in Excel 2010 and Excel 2013
- Solving Gamma Distribution Problems in Excel 2010 and Excel 2013
- Solving Poisson Distribution Problems in Excel 2010 and Excel 2013
- Optimization With Excel Solver
- Maximizing Lead Generation With Excel Solver
- Minimizing Cutting Stock Waste With Excel Solver
- Optimal Investment Selection With Excel Solver
- Minimizing the Total Cost of Shipping From Multiple Points To Multiple Points With Excel Solver
- Knapsack Loading Problem in Excel Solver – Optimizing the Loading of a Limited Compartment
- Optimizing a Bond Portfolio With Excel Solver
- Travelling Salesman Problem in Excel Solver – Finding the Shortest Path To Reach All Customers
- Chi-Square Population Variance Test in Excel
- Analyzing Data With Pivot Tables
- SEO Functions in Excel
- Time Series Analysis in Excel
This is a good summary thanks
ReplyDeleteMua vé máy bay tại Aivivu, tham khảo
ReplyDeleteVe may bay di My
vé máy bay từ los angeles về việt nam
bay từ đức về việt nam mấy tiếng
vé máy bay từ nga về tphcm
giá vé máy bay từ anh về hà nội
máy bay từ pháp về việt nam
chuyến bay chuyên gia trung quốc