This is one of the following five articles on Confidence Intervals in Excel
z-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
t-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013
Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean
Confidence Interval of Population Proportion in 2 Steps in Excel 2010 and Excel 2013
Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013
Confidence Interval of Population Proportion in 2 Steps in Excel
Confidence intervals covered in this manual will either be Confidence Intervals of a Population Mean or Confidence Intervals of a Population Proportion. A data point of a sample taken for a confidence interval of a population mean can have a range of values. A data point of a sample taken for a confidence interval of a population proportion is binary; it can take only one of two values.
Data observations in the sample taken for a confidence interval of a population proportion are required to be distributed according to the binomial distribution. Data that are binomially distributed are independent of each other, binary (can assume only one of two states), and all have the same probability of assuming the positive state.
A basic example of a confidence interval of a population proportion would be to create a 95-percent confidence interval of the overall proportion of defective units produced by one production line based upon a random sample of completed units taken from that production line. A sampled unit is either defective or it is not. The 95-percent confidence interval is a range of values that has a 95-percent certainty of containing the proportion defective (the defect rate) of all of the production from that production line.
The data sample used to create a confidence interval of a population proportion must be distributed according to the binomial distribution. The confidence interval is created by using the normal distribution to approximate the binomial distribution. The normal approximation of the binomial distribution allows the widely understood z-based confidence interval to be applied to binomially distributed data.
The binomial distribution can be approximated by the normal distribution under the following two conditions:
1) p (the probability of a positive outcome on each trial) and q (q = 1 – p) are not too close to 0 or 1.
2) np > 5 and nq > 5
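The two conditions above can be sketched as a small check. This is a minimal illustration in Python rather than Excel; the function name `normal_approximation_ok` and the `threshold` parameter are hypothetical and chosen only for this example.

```python
def normal_approximation_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Return True if both np and nq exceed the threshold (rule of thumb: 5)."""
    q = 1.0 - p  # probability of a negative outcome on each trial
    return n * p > threshold and n * q > threshold


# A large sample with p well away from 0 and 1 passes the check.
print(normal_approximation_ok(1000, 0.7))  # True

# A small sample with p close to 0 fails it (np = 2).
print(normal_approximation_ok(20, 0.1))    # False
```

When p is very close to 0 or 1, a much larger n is needed before both products exceed 5, which is why the first condition is stated alongside the second.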
The Standard Error and half the width of a confidence interval of a proportion are calculated as follows:
Standard Error = SQRT[ (p_bar * q_bar) / n]
Margin of Error = Half Width of C.I. = z Value(α, 2-tailed) * Standard Error
Margin of Error = Half Width of C.I. = NORM.S.INV(1 – α/2) * SQRT[ (p_bar * q_bar) / n]
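For readers who want to cross-check the Excel formula outside of a worksheet, the same calculation can be sketched in Python. This is an illustrative sketch only: `NormalDist().inv_cdf` from the standard library plays the role of NORM.S.INV, the function name `margin_of_error` is hypothetical, and the sample values in the usage lines are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_bar: float, n: int, alpha: float = 0.05) -> float:
    """Half-width of a z-based confidence interval of a proportion."""
    q_bar = 1.0 - p_bar
    z_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # like NORM.S.INV(1 - alpha/2)
    standard_error = sqrt(p_bar * q_bar / n)           # like SQRT((p_bar*q_bar)/n)
    return z_value * standard_error


# Hypothetical sample: 25 percent positive outcomes among 400 observations.
half_width = margin_of_error(p_bar=0.25, n=400, alpha=0.05)
print(round(half_width, 4))
```

The margin of error shrinks in proportion to the square root of n, so quadrupling the sample size halves the width of the interval.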
Example of a Confidence Interval of a Population Proportion in Excel
In this example a 95 percent confidence interval of a population proportion is created around a sample proportion using the normal distribution to approximate the binomial distribution.
This example evaluates a group of shoppers who prefer to pay either by credit card or by cash. A random sample of 1,000 shoppers was taken. 70% of the sampled shoppers preferred to pay with a credit card. The remaining 30% of the sampled shoppers preferred to pay with cash.
Determine the 95% Confidence Interval for the proportion of the general population that prefers to pay with a credit card. In other words, determine the endpoints of the interval that is 95 percent certain to contain the true proportion of the total shopping population that prefers to pay by credit card.
Summary of Problem Information
p_bar = sample proportion = 0.70
q_bar = 1 – p_bar = 1 – 0.70 = 0.30
p = population proportion = Unknown (This is what the confidence interval will contain.)
n = sample size = 1,000
α = Alpha = 1 – Level of Certainty = 1 – 0.95 = 0.05
SE = Standard Error = SQRT[ (p_bar * q_bar) / n]
SE = SQRT[ (0.70 * 0.30) / 1000] = 0.014491
As when creating any Confidence Interval of Proportion, we must satisfactorily answer the following two questions and then proceed to the two-step method of creating the Confidence Interval of Proportion.
The Initial Two Questions That Must be Answered Satisfactorily
What Type of Confidence Interval Should Be Created?
Have All of the Required Assumptions For This Confidence Interval Been Met?
The Two-Step Method For Creating Confidence Intervals of Proportion is the following:
Step 1 - Calculate the Half-Width of the Confidence Interval (Sometimes Called the Margin of Error)
Step 2 – Create the Confidence Interval By Adding to and Subtracting From the Sample Proportion Half the Confidence Interval’s Width
The Initial Two Questions That Need To Be Answered Before Creating a Confidence Interval of the Mean or Proportion Are as Follows:
Question 1) Type of Confidence Interval?
a) Confidence Interval of Population Mean or Population Proportion?
This is a Confidence Interval of a population proportion because sampled data observations are binary: they can take only one of two possible values. A shopper sampled either prefers to pay with a credit card or prefers to pay with cash.
The data sample is distributed according to the binomial distribution because each observation has only two possible outcomes, the probability of a positive outcome is the same for all sampled data observations, and each data observation is independent from all others.
Sampled data points used to create a confidence interval of a population mean can take multiple values or values within a range. This is not the case here because sampled data observations can have only two possible outcomes: a sampled shopper either prefers to pay with credit card or with cash.
b) t-Based or z-Based Confidence Interval?
A Confidence Interval of proportion is always created using the normal distribution. The binomial distribution of binary sample data is closely approximated by the normal distribution under certain conditions.
The next step in this example will evaluate whether the correct conditions are in place that permit the approximation of the binomial distribution by the normal distribution.
It should be noted that the sample size (n) equals 1,000. At that sample size, the t distribution is nearly identical to the normal distribution. Using the t distribution to create this Confidence Interval would produce nearly the same result as the normal distribution produces.
This confidence interval will be a confidence interval of a population proportion and will be created using the normal distribution to approximate the binomial distribution of the sample data.
Question 2) All Required Assumptions Met?
Binomial Distribution Can Be Approximated By Normal Distribution?
The most important requirement of a Confidence Interval of a population proportion is that the binomial distribution, which the binary sampled observations follow, can validly be approximated by the normal distribution.
The binomial distribution can be approximated by the normal distribution if the sample size, n, is large enough and p is not too close to 0 or 1. This can be summed up with the following rule:
The binomial distribution can be approximated by the normal distribution if np > 5 and nq > 5. In this case, the following are true:
n = 1,000
p = 0.70 (p is approximated by p_bar)
q = 0.30 (q is approximated by q_bar)
np = 700 and nq = 300
It is therefore valid to approximate the binomial distribution with the normal distribution.
The binomial distribution has the following parameters:
Mean = np
Variance = npq
Standard Deviation = SQRT(npq)
Each unique normal distribution can be completely described by two parameters: its mean and its standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
If np is substituted for the normal distribution’s mean and SQRT(npq), the square root of the binomial variance, is substituted for the normal distribution’s standard deviation, then
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq)), which approximates Binomial (n,p)
This can be demonstrated with Excel using data from this problem.
n = 1000
n = the number of trials in one sample
p = 0.7 (p is approximated by p_bar)
p = the probability of obtaining a positive result in a single trial
q = 0.3 (q is approximated by q_bar)
q = 1 - p
np = 700
npq = 210
SQRT(npq) = 14.4914
at arbitrary point X = 700
(X equals the number of positive outcomes in n trials)
The Excel formula to calculate the probability of exactly X positive outcomes under the binomial distribution is the following:
BINOM.DIST(X, n, p, FALSE) = BINOM.DIST(700, 1000, 0.7, FALSE) = 0.0275
The Excel formula to calculate the PDF (Probability Density Function) of the normal distribution at point X is the following:
NORM.DIST(X, Mean, Stan. Dev, FALSE)
The binomial distribution can now be approximated by the normal distribution in Excel by the following substitutions:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)
NORM.DIST(X, np, SQRT(npq), FALSE) = NORM.DIST(700, 700, SQRT(210), FALSE) = 0.0275
BINOM.DIST(X, n, p, FALSE) = BINOM.DIST(700, 1000, 0.7, FALSE) = 0.0275
Both results round to 0.0275, so the approximation is very close at this point. Note that the final argument FALSE in the above formulas returns the PDF (Probability Density Function); replacing FALSE with TRUE would calculate the CDF (Cumulative Distribution Function) instead. The normal approximation can also be applied to the CDF, where a continuity correction (evaluating the normal CDF at X + 0.5) is commonly used to improve accuracy.
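The comparison can also be reproduced outside of Excel. The following is a minimal sketch in Python, assuming the normal distribution is given the binomial mean np and the square root of the binomial variance, SQRT(npq), as its standard deviation. The helper name `binomial_pmf` is hypothetical; it works in log space so that the very large binomial coefficient does not cause overflow.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x positive outcomes in n trials,
    the quantity BINOM.DIST(x, n, p, FALSE) returns in Excel."""
    log_coefficient = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coefficient + x * log(p) + (n - x) * log(1.0 - p))


n, p, x = 1000, 0.7, 700
q = 1.0 - p

exact = binomial_pmf(x, n, p)

# Normal approximation: mean = np, standard deviation = SQRT(npq)
approximation = NormalDist(mu=n * p, sigma=sqrt(n * p * q)).pdf(x)

print(f"binomial: {exact:.4f}  normal: {approximation:.4f}")
```

Changing `x` to other values near 700 shows how closely the two curves track each other around the mean when np and nq are both large.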
We now proceed to the two-step method for creating all Confidence Intervals of a population proportion. These steps are as follows:
Step 1) Calculate the Width of Half of the Confidence Interval
Step 2) Create the Confidence Interval By Adding the Width of Half of the Confidence Interval to, and Subtracting It from, the Sample Proportion
Proceeding through the two steps is done as follows:
Step 1) Calculate Half-Width of Confidence Interval
Half the Width of the Confidence Interval is sometimes referred to as the Margin of Error. The Margin of Error will always be measured in the same units as the sample proportion, which is expressed here as a percentage. Calculating the Half Width of the Confidence Interval using the normal distribution would be done as follows in Excel:
Margin of Error = Half Width of C.I. = z Value(α, 2-tailed) * Standard Error
Margin of Error = Half Width of C.I. = NORM.S.INV(1 – α/2) * SQRT[ (p_bar * q_bar) / n]
Margin of Error = Half Width of C.I. = NORM.S.INV(0.975) * SQRT[ (0.7 * 0.3) / 1000]
Margin of Error = Half Width of C.I. = 1.95996 * 0.014491
Margin of Error = Half Width of C.I. = 0.0284, which equals 2.84 percent
Step 2) Confidence Interval = Sample Proportion ± C.I. Half-Width
Confidence Interval = Sample Proportion ± (Half Width of Confidence Interval)
Confidence Interval = p_bar ± 0.0284
Confidence Interval = 0.70 ± 0.0284
Confidence Interval = [ 0.6716, 0.7284 ], which equals 67.16 percent to 72.84 percent
We now have 95 percent certainty that the true proportion of all shoppers who prefer to pay with a credit card is between 67.16 percent and 72.84 percent.
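As a cross-check on the worksheet arithmetic, both steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `proportion_confidence_interval` is hypothetical, and `NormalDist().inv_cdf` from the standard library stands in for Excel's NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_bar: float, n: int, confidence: float = 0.95):
    """z-based confidence interval of a population proportion."""
    alpha = 1.0 - confidence
    z_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # NORM.S.INV(1 - alpha/2)
    standard_error = sqrt(p_bar * (1.0 - p_bar) / n)     # SQRT((p_bar*q_bar)/n)
    half_width = z_value * standard_error                # Step 1: margin of error
    return p_bar - half_width, p_bar + half_width        # Step 2: p_bar ± half-width


# Values from this example: 70 percent of 1,000 sampled shoppers prefer credit.
lower, upper = proportion_confidence_interval(p_bar=0.70, n=1000, confidence=0.95)
print(f"{lower:.4f} to {upper:.4f}")  # 0.6716 to 0.7284
```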
An Excel-generated graphical representation of this confidence interval is shown as follows: