Using the Confidence
Interval in Excel To
Find Customer Preferences
This article will show you exactly how to calculate a 95% Confidence Interval in Excel of the proportion of customers on a commercial web site who prefer to pay with a credit card instead of Paypal. The 95% Confidence Interval is the interval for which we are 95% sure that the true proportion of credit card-preferring customers lies within.
The larger the sample size, the more accurate and smaller the confidence interval will be. We are taking a random sample of 1,000 Internet shoppers on this e-commerce web site.
The advantages of statistical analysis in Excel to solve business statistics problems is that most problems can be solved in just one or two steps and there is no more need to look anything up on Normal Distribution tables.
The 95% Confidence Interval calculated here is a Confidence Interval for Proportion, not Means. A Confidence Interval for Means uses an entirely different set of statistical formulas. Another article in this blog shows exactly how to calculate a Confidence Interval Means. Here is the problem:
Problem: A random sample of 1,000 Internet shoppers on one web site was taken. 70% preferred to pay with a credit card. 30% preferred to pay with PayPal. Determine the 95% Confidence Interval for the proportion of the general population of shoppers on that web site that prefers to pay with a credit card.
Before we begin solving this problem, we need to know about creating Confidence Intervals in Excel. Here is a brief lesson:
Confidence Interval of a Population Proportion
Creating a Confidence Interval for a population's proportion is very similar to creating a Confidence Interval for a population's mean. The only real difference is how the standard error is calculated. Everything else is the same.
First, the difference between using sampling to estimate a population mean and using sampling to estimate a population proportion will be explained below:
Mean Sampling vs. Proportion Sampling
What determines whether a mean is being estimated or a proportion is being estimated is the number of possible outcomes of each sample taken.
Proportion samples have only two possible outcomes. For example, if you are comparing the proportion of click-through visitors in two different PPC ads who converted (purchased), each sample has only two possible values; the visitor sampled either converted (purchased) or they didn't.
Mean samples have multiple possible outcomes. For example, if you are comparing the mean sales of salespeople in two different cities, each salesperson's sales sampled can have numerous values.
Below is a description of how to calculate a Confidence Interval for a population's proportion. Note that everything is almost the same as the calculation of the Confidence Interval for a mean, except sample standard error.
Levels of Confidence and Significance
Level of Significance, α ("alpha"), equals the maximum allowed percent of error. If the maximum allowed error is 5%, then α = 0.05.
Level of Confidence is selected by the user. A 95% Level is the most common. A 95% Confidence Level would correspond to a 95% Confidence Interval of the Proportion.
This would state that the actual population Proportion has a 95% probability of lying within the calculated interval. A 95% Confidence Level corresponds to a 5% Level of Significance, or α = 0.05. The Confidence Level therefore equals 1 - α.
Population Proportion vs. Sample Proportion
Population Proportion = µp = p (This is what we are trying to estimate)
Sample Proportion = pavg
Standard Deviation and Standard Error
Standard Deviation is not calculated during the creation of Confidence Interval for a population proportion.
Standard Error is an estimate of population Standard Deviation from data taken from a sample. Sample Standard Error will be an estimate taken from the sample proportion, pavg, and sample size, n. This is the major difference between calculating a Confidence Interval for a proportion and for a mean. Binomial distribution rules apply to proportions because a proportion sample has only two possible outcomes, just like a binomial variable.
Sample Standard Error of a Proportion
= σpavg = SQRT(p * q / n) ≈ spavg
= spavg = SQRT ( pavg * qavg / n )
p = Population proportion - This is the unknown that will be estimated with a Confidence Interval
q = 1 - p
n = Sample size
pavg = Sample proportion
qavg = 1 - pavg
Region of Certainty vs. Region of Uncertainty
Region of Certainty is the area under the Normal curve that corresponds to the required Level of Confidence. If a 95% percent Level of Confidence is required, then the Region of Certainty will contain 95% of the area under the Normal curve. The outer boundaries of the Region of Certainty will be the outer boundaries of the Confidence Interval.
The Region of Certainty, and therefore the Confidence Interval, will be centered about the mean. Half of the Confidence Interval is on one side of the mean and half on the other side.
Region of Uncertainty is the area under the Normal curve that is outside of the Region of Certainty. Half of the Region of Uncertainty will exist in the right outer tail of the Normal curve and the other half in the left outer tail. This is similar to the concept of the "two-tailed test" that is used in Hypothesis testing in further sections of this course. The concepts of one and two-tailed testing are not used when calculating Confidence Intervals. Just remember that the Region of Certainty, and therefore the Confidence Interval, are always centered about the mean on the Normal curve.
Relationship Between Region of Certainty, Uncertainty, and Alpha
The Region of Uncertainty corresponds to α ("alpha"). If α = 0.05, then that Region of Uncertainty contains 5% of the area under the Normal curve. Half of that area (2.5%) is in each outer tail. The 95% area centered about the mean will be the Region of Certainty. The outer boundaries of this Region of Certainty will be the outer boundaries of the 95% Confidence Interval.
The Level of Confidence is 95% and the Level of Significance, or maximum error allowed, is 5%.
Z Score
Z Score is the number of Standard Errors from the mean to outer right boundary of the Region of Certainty (and therefore to the outer right boundary of the Confidence Interval).
Standard Errors are used and not Standard Deviations because sample data is being used to calculate the Confidence Interval.
Z Score is calculated by the following Excel function:
Z Score(1-α) = NORMSINV (1 - α/2) - This will be discussed shortly.
Excel Functions Used When Calculating Confidence Interval for a Population Proportion
Note that Excel functions STDEV and AVERAGE are not used when working with proportions. The CONFIDENCE function is not used either.
Excel functions that are used are:
COUNT (Highlighted block of cells) = Sample size = n
This counts the number of cells in highlighted block
NORMSINV (1 - α/2) = Z Score(1 - α)
Number of Standard Errors from mean to boundary of Confidence Interval. Note that (1 - α/2)
= the entire area in the Normal curve to the left of outer right boundary of the Region of Certainty, or Confidence Interval. This includes the entire Region of Certainty and the half of the Region of Uncertainty that exists in the left tail.
For example:
Level of Confidence = 95% for a 95% Confidence Interval
Level of Significance = 5% (α = 0.05)
1 - α = 0.95 = 95%
Z Score95% = NORMSINV (1 - α/2) = NORMSINV (1 - .05/2) = NORMSINV(1 - 0.025)
Z Score95% = NORMSINV (0.975) = 1.96
The outer right boundary of the 95% Confidence Interval, and the Region of Certainty, is 1.96 Standard Errors from the mean. The left boundary is the same distance from the mean because the Confidence Interval is centered about the mean.
Formula for Calculating Confidence Interval Boundaries from Sample Data for a Population Proportion
Confidence Interval Boundaries = Sample proportion +/- Z Score(1-α) * Sample Standard Error
Confidence Interval Boundaries = pavg +/- Z Score(1-α) * spavg
Sample Proportion = pavg
Z Score(1 - α) = NORMSINV (1 - α/2)
Sample Standard Error of a Proportion = σpavg ≈ spavg = SQRT ( pavg * qavg / n )
Sample size = n = COUNT (Highlighted block of cells containing samples)
Confidence Interval Boundaries = pavg +/- Z Score(1-α) * spavg
***************************************************
Before we use any statistical test that relies upon the underlying population data (from which samples are drawn) to be normally distributed, we must first test the data for normality. The Confidence Interval is an interval in the Normal curve. This requires that the underlying population data be Normally distributed.
***************************************************
Always Test for Normality First
Normality tests should be performed on the independent random sample of Internet shoppers. This blog has numerous articles about how to perform normality testing and nonparametric testing if the data is not normally distributed.
***************************************************
Now we're ready to solve the problem and create a Confidence Interval in Excel.
Problem: Determine Confidence Interval of Internet Shoppers Who Prefer to Pay By Credit Card Based Upon Sample Data
A random sample of 1,000 Internet shoppers on one web site was taken. 70% preferred to pay with a credit card. 30% preferred to pay with PayPal. Determine the 95% Confidence Interval for the proportion of the general population of Internet shoppers on that web site that prefers to pay with a credit card.
Level of Confidence = 95% = 1 - α
Level of Significance = α = 0.05
Sample Size = n = 1,000
Sample Proportion = pavg = 0.70
qavg = 1 - pavg = 0.30
Sample Standard Error of a Proportion = σpavg ≈ spavg = SQRT ( pavg * qavg / n )
spavg = SQRT ( 0.70 * 0.30 / 1,000 ) = 0.014
Z Score(1 - α) = Z Score95% = NORMSINV (1 - α/2)
= NORMSINV (1 - 0.025) = NORMSINV (0.975) = 1.96
Width of Half the Confidence Interval = Z Score(1-α) * spavg
= 1.96 * 0.014 = 0.0274
Confidence Interval Boundaries = pavg +/- Z Score(1-α) * spavg
= 0.70 +/- (1.96) * (0.014)
0.70 +/- 0.0274 = 0.6726 to 0.7274 = 67.26% to 72.74%
We can be 95% certain that the percentage of Internet shoppers on this e-commerce web site who prefer to pay with a credit card instead of Paypal is between 67.26% and 72.74%.
Confidence Interval in Excel To Find Customer Preferences
Excel Master Series Blog Directory
Statistical Topics and Articles In Each Topic
- Histograms in Excel
- Bar Chart in Excel
- Combinations & Permutations in Excel
- Normal Distribution in Excel
- Overview of the Normal Distribution
- Normal Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
- Normal Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
- Solving Normal Distribution Problems in Excel 2010 and Excel 2013
- Overview of the Standard Normal Distribution in Excel 2010 and Excel 2013
- An Important Difference Between the t and Normal Distribution Graphs
- The Empirical Rule and Chebyshev’s Theorem in Excel – Calculating How Much Data Is a Certain Distance From the Mean
- Demonstrating the Central Limit Theorem In Excel 2010 and Excel 2013 In An Easy-To-Understand Way
- t-Distribution in Excel
- Binomial Distribution in Excel
- z-Tests in Excel
- t-Tests in Excel
- Overview of t-Tests: Hypothesis Tests that Use the t-Distribution
- 1-Sample t-Tests in Excel
- Overview of the 1-Sample t-Test in Excel 2010 and Excel 2013
- Excel Normality Testing For the 1-Sample t-Test in Excel 2010 and Excel 2013
- 1-Sample t-Test – Effect Size in Excel 2010 and Excel 2013
- 1-Sample t-Test Power With G*Power Utility
- Wilcoxon Signed-Rank Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- Sign Test As a 1-Sample t-Test Alternative in Excel 2010 and Excel 2013
- 2-Independent-Sample Pooled t-Tests in Excel
- Overview of 2-Independent-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Variance Tests: Levene’s, Brown-Forsythe, and F Test For 2-Sample Pooled t-Test in Excel 2010 and Excel 2013
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro Wilk Tests For Two-Sample Pooled t-Test
- Two-Independent-Sample Pooled t-Test - All Excel Calculations
- 2-Sample Pooled t-Test – Effect Size in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test Power With G*Power Utility
- Mann-Whitney U Test in Excel as 2-Sample Pooled t-Test Nonparametric Alternative in Excel 2010 and Excel 2013
- 2-Sample Pooled t-Test = Single-Factor ANOVA With 2 Sample Groups
- 2-Independent-Sample Unpooled t-Tests in Excel
- 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Variance Tests: Levene’s Test, Brown-Forsythe Test, and F-Test in Excel For 2-Sample Unpooled t-Test
- Excel Normality Tests Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk For 2-Sample Unpooled t-Test
- 2-Sample Unpooled t-Test Excel Calculations, Formulas, and Tools
- Effect Size for a 2-Independent-Sample Unpooled t-Test in Excel 2010 and Excel 2013
- Test Power of a 2-Independent Sample Unpooled t-Test With G-Power Utility
- Paired (2-Sample Dependent) t-Tests in Excel
- Paired t-Test in Excel 2010 and Excel 2013
- Excel Normality Testing of Paired t-Test Data
- Paired t-Test Excel Calculations, Formulas, and Tools
- Paired t-Test – Effect Size in Excel 2010, and Excel 2013
- Paired t-Test – Test Power With G-Power Utility
- Wilcoxon Signed-Rank Test As a Paired t-Test Alternative
- Sign Test in Excel As A Paired t-Test Alternative
- Hypothesis Tests of Proportion in Excel
- Hypothesis Tests of Proportion Overview (Hypothesis Testing On Binomial Data)
- 1-Sample Hypothesis Test of Proportion in Excel 2010 and Excel 2013
- 2-Sample Pooled Hypothesis Test of Proportion in Excel 2010 and Excel 2013
- How To Build a Much More Useful Split-Tester in Excel Than Google's Website Optimizer
- Chi-Square Independence Tests in Excel
- Chi-Square Goodness-Of-Fit Tests in Excel
- F Tests in Excel
- Correlation in Excel
- Pearson Correlation in Excel
- Spearman Correlation in Excel
- Confidence Intervals in Excel
- Overview of z-Based Confidence Intervals of a Population Mean in Excel 2010 and Excel 2013
- t-Based Confidence Intervals of a Population Mean in Excel 2010 and Excel 2013
- Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean
- Confidence Interval of Population Proportion in Excel 2010 and Excel 2013
- Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013
- Simple Linear Regression in Excel
- Overview of Simple Linear Regression in Excel 2010 and Excel 2013
- Simple Linear Regression Example in Excel 2010 and Excel 2013
- Residual Evaluation For Simple Regression in Excel 2010 and Excel 2013
- Residual Normality Tests in Excel – Kolmogorov-Smirnov Test, Anderson-Darling Test, and Shapiro-Wilk Test For Simple Linear Regression
- Evaluation of Simple Regression Output For Excel 2010 and Excel 2013
- All Calculations Performed By the Simple Regression Data Analysis Tool in Excel 2010 and Excel 2013
- Prediction Interval of Simple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression in Excel
- Basics of Multiple Regression in Excel 2010 and Excel 2013
- Multiple Linear Regression Example in Excel 2010 and Excel 2013
- Multiple Linear Regression’s Required Residual Assumptions
- Normality Testing of Residuals in Excel 2010 and Excel 2013
- Evaluating the Excel Output of Multiple Regression
- Estimating the Prediction Interval of Multiple Regression in Excel
- Regression - How To Do Conjoint Analysis Using Dummy Variable Regression in Excel
- Logistic Regression in Excel
- Logistic Regression Overview
- Logistic Regression Performed in Excel 2010 and Excel 2013
- R Square For Logistic Regression Overview
- Excel R Square Tests: Nagelkerke, Cox and Snell, and Log-Linear Ratio in Excel 2010 and Excel 2013
- Likelihood Ratio Is Better Than Wald Statistic To Determine if the Variable Coefficients Are Significant For Excel 2010 and Excel 2013
- Excel Classification Table: Logistic Regression’s Percentage Correct of Predicted Results in Excel 2010 and Excel 2013
- Hosmer-Lemeshow Test in Excel – Logistic Regression Goodness-of-Fit Test in Excel 2010 and Excel 2013
- Single-Factor ANOVA in Excel
- Overview of Single-Factor ANOVA
- Single-Factor ANOVA Example in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For Each Single-Factor ANOVA Sample Group
- Kruskal-Wallis Test Alternative For Single Factor ANOVA in Excel 2010 and Excel 2013
- Levene’s and Brown-Forsythe Tests in Excel For Single-Factor ANOVA Sample Group Variance Comparison
- Single-Factor ANOVA - All Excel Calculations
- Overview of Post-Hoc Testing For Single-Factor ANOVA
- Tukey-Kramer Post-Hoc Test in Excel For Single-Factor ANOVA
- Games-Howell Post-Hoc Test in Excel For Single-Factor ANOVA
- Overview of Effect Size For Single-Factor ANOVA
- ANOVA Effect Size Calculation Eta Squared (?2) in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Psi (?) – RMSSE – in Excel 2010 and Excel 2013
- ANOVA Effect Size Calculation Omega Squared (?2) in Excel 2010 and Excel 2013
- Power of Single-Factor ANOVA Test Using Free Utility G*Power
- Welch’s ANOVA Test in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Brown-Forsythe F-Test in Excel Substitute For Single-Factor ANOVA When Sample Variances Are Not Similar
- Two-Factor ANOVA With Replication in Excel
- Two-Factor ANOVA With Replication in Excel 2010 and Excel 2013
- Variance Tests: Levene’s and Brown-Forsythe For 2-Factor ANOVA in Excel 2010 and Excel 2013
- Shapiro-Wilk Normality Test in Excel For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication Effect Size in Excel 2010 and Excel 2013
- Excel Post Hoc Tukey’s HSD Test For 2-Factor ANOVA With Replication
- 2-Factor ANOVA With Replication – Test Power With G-Power Utility
- Scheirer-Ray-Hare Test Alternative For 2-Factor ANOVA With Replication
- Two-Factor ANOVA Without Replication in Excel
- Creating Interactive Graphs of Statistical Distributions in Excel
- Interactive Statistical Distribution Graph in Excel 2010 and Excel 2013
- Interactive Graph of the Normal Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Chi-Square Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the t-Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Binomial Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Exponential Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Beta Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Gamma Distribution in Excel 2010 and Excel 2013
- Interactive Graph of the Poisson Distribution in Excel 2010 and Excel 2013
- Solving Problems With Other Distributions in Excel
- Solving Uniform Distribution Problems in Excel 2010 and Excel 2013
- Solving Multinomial Distribution Problems in Excel 2010 and Excel 2013
- Solving Exponential Distribution Problems in Excel 2010 and Excel 2013
- Solving Beta Distribution Problems in Excel 2010 and Excel 2013
- Solving Gamma Distribution Problems in Excel 2010 and Excel 2013
- Solving Poisson Distribution Problems in Excel 2010 and Excel 2013
- Optimization With Excel Solver
- Maximizing Lead Generation With Excel Solver
- Minimizing Cutting Stock Waste With Excel Solver
- Optimal Investment Selection With Excel Solver
- Minimizing the Total Cost of Shipping From Multiple Points To Multiple Points With Excel Solver
- Knapsack Loading Problem in Excel Solver – Optimizing the Loading of a Limited Compartment
- Optimizing a Bond Portfolio With Excel Solver
- Travelling Salesman Problem in Excel Solver – Finding the Shortest Path To Reach All Customers
- Chi-Square Population Variance Test in Excel
- Analyzing Data With Pivot Tables
- SEO Functions in Excel
- Time Series Analysis in Excel
Greetings! Non-violent action training equips community members with essential skills for impactful activism. These sessions focus on peaceful protest and advocacy, promoting constructive dialogue and collaborative engagement. The Resistance Center https://theresistancecenter.org/ fosters non-violence, cultivating a respectful and understanding environment where individuals can express their beliefs while upholding community unity. This approach amplifies individual voices and strengthens solidarity across the community.
ReplyDelete