Monday, June 2, 2014

t-Based Confidence Interval of a Population Mean in 2 Steps in Excel 2010 and Excel 2013

This is one of the following five articles on Confidence Intervals in Excel

z-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013

t-Based Confidence Intervals of a Population Mean in 2 Steps in Excel 2010 and Excel 2013

Minimum Sample Size to Limit the Size of a Confidence interval of a Population Mean

Confidence Interval of Population Proportion in 2 Steps in Excel 2010 and Excel 2013

Min Sample Size of Confidence Interval of Proportion in Excel 2010 and Excel 2013

 

t-Based Confidence Interval of a Population Mean in 2 Steps in Excel

This confidence interval of a population mean is based upon the sample mean being distributed according to the t distribution. A 95-percent confidence interval of a population mean is an interval that has a 95-percent chance of containing the population mean.

The sample mean is distributed according to the t distribution if any of the following sets of conditions is in effect:

1) The sample size is large and the population standard deviation is not known.

2) The sample size is small (n < 30) and the population is shown to be normally distributed.

3) The sample size is small (n < 30) and the sample is proven to be normally distributed.

x_bar = Observed Sample Mean


Margin of Error = Half Width of C.I. = t Value(α/2) * Standard Error

Margin of Error = Half Width of C.I. = T.INV(1 – α/2,df) * s/SQRT(n)

A confidence interval of a population mean that is based on the normal distribution is z-based. A confidence interval of a population mean that is based on the t distribution is t-based.

It is much more common to use the t distribution than the normal distribution to create a confidence interval of the population mean. Requirements for t-based confidence intervals are much less restrictive than the requirements for a z-based confidence interval.

 

Example of a t-Based Confidence Interval of a Population Mean in Excel

In this example a 95 percent Confidence Interval is created around a sample mean using the t distribution. There is a 95 percent chance that the population mean is contained within this Confidence Interval.

A company is evaluating whether to purchase a large number of electric-powered machines. An important purchase criterion is how long a machine can operate after being fully charged.

To determine how long the machine can be expected to operate on a single charge, the company purchased 20 machines and fully charged each. Each of these machines was then operated at full speed until the charge ran out and the machine stopped running. The number of hours that each machine was able to operate after a full charge at full speed was recorded.

Calculate the interval that contains the average operating length of all of the machines with 95 percent certainty. In other words, calculate the 95 percent Confidence Interval of the mean operating length for all machines based upon the sample of 20 machines that was tested. The data provided are as follows:

[Image: table of the sample data (hours of operation for each of the 20 machines)]

Running the Excel data analysis tool Descriptive Statistics will provide the Sample Mean, the Sample Standard Deviation, the Standard Error, and the Sample Size. The output of this tool appears as follows:

[Image: Descriptive Statistics output for the sample data]

The above Descriptive Statistics are obtained by running the Excel Descriptive Statistics data analysis tool as shown below. It is important to select the Confidence Level checkbox and specify the confidence level desired (95 percent in this case). Doing so calculates half of the width of the 95 percent Confidence Interval using the t distribution, as this example will also do. Below is the completed Descriptive Statistics dialogue box:

[Image: completed Descriptive Statistics dialogue box]

 

Summary of Problem Information

x_bar = sample mean = AVERAGE() = 250

µ = (population) mean of all machines = Unknown

s = sample standard deviation = STDEV.S() = 37.170

σ (Greek letter “sigma”) = population standard deviation = Not Known

n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 37.170 / SQRT(20)

SE = 8.311

Note that this calculation of the Standard Error using the sample standard deviation, s, is an estimate of the true Standard Error which would be calculated using the population standard deviation, σ.

df = degrees of freedom = n – 1 = 20 – 1 = 19

Level of Certainty = 0.95

Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
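
If you would like to double-check these summary values outside of Excel, the following minimal Python sketch (an illustration only, not part of the original Excel workbook) reproduces the Standard Error and degrees of freedom from the summary statistics above:

```python
import math

x_bar = 250.0   # sample mean, from Excel's AVERAGE()
s = 37.170      # sample standard deviation, from STDEV.S()
n = 20          # sample size, from COUNT()

se = s / math.sqrt(n)   # Standard Error of the mean
df = n - 1              # degrees of freedom

print(f"SE = {se:.3f}")   # approximately 8.311
print(f"df = {df}")       # 19
```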

As when creating any Confidence Interval of a Mean, we must satisfactorily answer the following two questions and then proceed to the two-step method of creating the Confidence Interval.

 

The Initial Two Questions That Must be Answered Satisfactorily

What Type of Confidence Interval Should Be created?

Have All of the Required Assumptions For This Confidence Interval Been Met?

 

The Two-Step Method For Creating a Confidence Interval of the Mean Is the Following:

Step 1 - Calculate the Half-Width of the Confidence Interval (Sometimes Called the Margin of Error)

Step 2 – Create the Confidence Interval By Adding to and Subtracting From the Sample Mean Half the Confidence Interval’s Width

 

The Initial Two Questions That Need To Be Answered Before Creating a Confidence Interval of the Mean or Proportion Are as Follows:

 

Question 1) Type of Confidence Interval?

a) Confidence Interval of Population Mean or Population Proportion?

This is a Confidence Interval of a population mean because each individual observation (each sampled machine’s length of operation) within the entire sample can have a wide range of values. The sample values are spread out between 175 and 300. On the other hand, sampled data points used to create a Confidence Interval of a population proportion are binary: they can take only one of two possible values.

 

b) t-Based or z-Based Confidence Interval?

A Confidence Interval created using the t distribution is said to be t-based. A Confidence Interval created using the normal distribution is said to be z-based. It is much more common to use the t distribution to create Confidence Intervals of a population mean because the t distribution is much less restrictive. The t distribution can always be used. The normal distribution can only be used if:

Sample size is large (n > 30)

AND

The population standard deviation, σ, is known.

In this case sample size is small (n < 30) and the population standard deviation is not known. The t distribution must therefore be used to create this Confidence Interval of a population mean. This Confidence Interval of a population mean will be t-based.

This Confidence Interval will be a Confidence Interval of a population mean and will be created using the t distribution.
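
To illustrate why this choice matters, the short Python sketch below (using scipy.stats; it is not part of the original article) compares the t critical value with the z critical value for this sample size. The t value is noticeably larger at n = 20, so a t-based Confidence Interval is wider than a z-based one would be:

```python
from scipy import stats

alpha = 0.05
n = 20
df = n - 1

t_crit = stats.t.ppf(1 - alpha / 2, df)   # comparable to Excel's T.INV(0.975, 19)
z_crit = stats.norm.ppf(1 - alpha / 2)    # comparable to Excel's NORM.S.INV(0.975)

print(f"t critical value = {t_crit:.3f}")  # about 2.093
print(f"z critical value = {z_crit:.3f}")  # about 1.960
```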

 

Question 2) All Required Assumptions Met?

a) Normal Distribution of the Sample Mean

We are attempting to create a confidence interval about the sample mean which contains the population mean. To create a confidence interval that is based on the normal distribution or t distribution, the sample mean must be normally distributed. In other words, if we took multiple samples just like the one mentioned here, the means of those samples would have to be normally distributed in order to be able to create a confidence interval that is based upon the normal or t distributions.

For example, 30 independent, random samples of 20 machines each could be tested for mean length of operation just like the single sample of 20 machines in this example was tested. If those means of all 30 samples are normally distributed, a confidence interval based on the t distribution can be created around the mean of the single sample taken.

 

Sample Means Are Normally Distributed If Any of the Following Are True:

1) Sample Size of Each Sample, n, Is Greater Than 30.

The Central Limit Theorem states that the means of similar-sized, random, independent samples will be normally distributed if the sample size is large (n > 30) no matter how the underlying population from which the samples came is distributed. In reality, the distribution of sample means converges toward normality when n is as small as 5 as long as the underlying population is not too skewed.

 

2) Population Is Normally Distributed.

If this is the case, the means of similar sized, random, independent samples will also be normally distributed. It is quite often the case that the distribution of the underlying population is not known and the normal distribution of a population should not be assumed until proven.

 

3) Sample Is Normally Distributed.

If the sample is normally distributed, the means of other similar-sized, independent, random samples will also be normally distributed. Normality testing must be performed on the sample to determine whether the sample is normally distributed.

In this case the sample size is small (n = 20, which is less than 30) and the population’s distribution is unknown. The only remaining way to verify normal distribution of the sample mean is to verify normal distribution of the sample. The sample must therefore be tested and confirmed for normality before a Confidence Interval based on the t distribution can be created.

 

Evaluating the Normality of the Sample Data

The following five normality tests will be performed on the sample data here:

1) An Excel histogram of the sample data will be created.

2) A normal probability plot of the sample data will be created in Excel.

3) The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.

4) The Anderson-Darling test for normality of the sample data will be performed in Excel.

5) The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

 

Histogram in Excel

The quickest way to check the sample data for normality is to create an Excel histogram of the data as shown below, or to create a normal probability plot of the data if you have access to an automated method of generating that kind of a graph.

[Image: Excel histogram of the sample data]

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

[Image: completed Excel Histogram dialogue box]

The sample group appears to be distributed reasonably closely to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.
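
A comparable histogram can be built outside of Excel as well. The sketch below uses matplotlib with hypothetical sample values and bin boundaries (the actual 20 recorded values appear only in the data image above), so treat it as a pattern rather than a reproduction of this example:

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 20 recorded operating times (hours);
# the real values appear only in the data image above.
sample = [175, 190, 205, 210, 220, 225, 230, 240, 245, 250,
          255, 260, 265, 270, 275, 280, 285, 290, 295, 300]

# Excel asks for the upper boundary of each bin; matplotlib takes
# the full list of bin edges instead.
bin_edges = [175, 200, 225, 250, 275, 300]

plt.hist(sample, bins=bin_edges, edgecolor="black")
plt.xlabel("Hours of operation per charge")
plt.ylabel("Frequency")
plt.title("Histogram of sample operating times")
plt.show()
```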

 

Normal Probability Plot in Excel

Another way to graphically evaluate normality of each data sample is to create a normal probability plot for each sample group. This can be implemented in Excel and appears as follows:

[Image: normal probability plot of the sample data]

The normal probability plot for the sample group shows that the data appear to be very close to normally distributed. The actual sample data (red) match very closely the values that would be expected if the sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval boundaries (green).
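
A normal probability plot can also be generated outside of Excel with scipy’s probplot function. The sketch below again uses hypothetical stand-in values for the sample:

```python
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 20 recorded operating times (hours).
sample = [175, 190, 205, 210, 220, 225, 230, 240, 245, 250,
          255, 260, 265, 270, 275, 280, 285, 290, 295, 300]

# probplot orders the sample and plots it against the quantiles that a
# perfectly normal sample of the same size would have.
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal probability plot of the sample")
plt.show()
```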

 

Kolmogorov-Smirnov Test For Normality in Excel

The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test states that the distribution of actual data points matches the distribution that is being tested. In this case the data sample is being compared to the normal distribution.

The largest distance between the CDF of any data point and its expected CDF is compared to Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

[Image: Kolmogorov-Smirnov test calculations in Excel]

0.1500 = Max Difference Between Actual and Expected CDF

20 = n = Number of Data Points

0.05 = α

[Image: Kolmogorov-Smirnov Critical Values table]

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Max Difference Between the Actual and Expected CDF (0.1500) is less than the Kolmogorov-Smirnov Critical Value for n = 20 and α = 0.05 so do not reject the Null Hypothesis.

The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data are normally distributed, is rejected if the maximum difference between the expected and actual CDF of any of the data points exceeds the Critical Value for the given n and α.
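
A rough Python equivalent of this calculation is sketched below with scipy.stats (again with hypothetical stand-in data). It mirrors the approach above by comparing the sample’s CDF with a normal CDF built from the sample mean and sample standard deviation; strictly speaking, the critical values should be adjusted when the parameters are estimated from the sample (the Lilliefors correction), so treat the p-value as approximate:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the 20 recorded operating times (hours).
sample = np.array([175, 190, 205, 210, 220, 225, 230, 240, 245, 250,
                   255, 260, 265, 270, 275, 280, 285, 290, 295, 300])

# Compare the empirical CDF with a normal CDF that uses the sample mean and
# sample standard deviation, mirroring NORM.DIST(x, mean, s, TRUE) in Excel.
mean, s = sample.mean(), sample.std(ddof=1)
statistic, p_value = stats.kstest(sample, "norm", args=(mean, s))

print(f"max CDF difference D = {statistic:.4f}")
print(f"approximate p-value = {p_value:.4f}")
# If D stays below the Critical Value (or the p-value stays above alpha),
# the Null Hypothesis of normality is not rejected.
```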

 

Anderson-Darling Test For Normality in Excel

The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were perfectly normally distributed.

The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test for the following two reasons:

The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.

The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberrations in the outer values than the A-D test.

If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution

F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

[Image: Anderson-Darling test calculations in Excel]

Adjusted Test Statistic A* = 0.407

Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally distributed, if any of the following are true:

A* > 0.576 When Level of Significance (α) = 0.15

A* > 0.656 When Level of Significance (α) = 0.10

A* > 0.787 When Level of Significance (α) = 0.05

A* > 1.092 When Level of Significance (α) = 0.01

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given n and α.

The Adjusted Test Statistic (A*) for the sample group (0.407) is significantly less than the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test is not rejected for the sample group.
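
For readers working outside of Excel, scipy.stats offers an Anderson-Darling test that returns the test statistic together with critical values appropriate to the sample size. A minimal sketch with hypothetical stand-in data:

```python
from scipy import stats

# Hypothetical stand-in for the 20 recorded operating times (hours).
sample = [175, 190, 205, 210, 220, 225, 230, 240, 245, 250,
          255, 260, 265, 270, 275, 280, 285, 290, 295, 300]

result = stats.anderson(sample, dist="norm")

print(f"A-D statistic = {result.statistic:.3f}")
for crit, sig in zip(result.critical_values, result.significance_level):
    print(f"critical value at {sig}% significance = {crit:.3f}")
# Normality is not rejected at a given significance level as long as the
# statistic stays below that level's critical value.
```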

 

Shapiro-Wilk Test For Normality in Excel

The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is normally distributed is rejected.

The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis (that the data are normally distributed) when the data are in fact not normally distributed a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.

The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-Smirnov normality test.

Sample Data

[Image: Shapiro-Wilk test calculations on the sample data]

0.967452 = Test Statistic W

0.905 = W Critical for the following n and Alpha

20 = n = Number of Data Points

0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

Test Statistic W (0.967452) is larger than W Critical 0.905. The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state that the data are not normally distributed with a confidence level of 95 percent.
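
The Shapiro-Wilk test is also available outside of Excel; scipy’s shapiro function returns the W statistic along with a p-value, which removes the need to look up W Critical in a table. A minimal sketch with hypothetical stand-in data:

```python
from scipy import stats

# Hypothetical stand-in for the 20 recorded operating times (hours).
sample = [175, 190, 205, 210, 220, 225, 230, 240, 245, 250,
          255, 260, 265, 270, 275, 280, 285, 290, 295, 300]

w_statistic, p_value = stats.shapiro(sample)

print(f"W = {w_statistic:.6f}")
print(f"p-value = {p_value:.4f}")
# Normality is not rejected when W exceeds W Critical for the chosen
# alpha and n (equivalently, when the p-value exceeds alpha).
```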

******************

One note on the data set used for this example – The data set used for this example is the same data set used for the one-sample hypothesis test except that each data value has now been divided by 800 for this example. If you go to that section in this blog, you will observe that the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Shapiro-Wilk test produce exactly the same results for both sets of numbers.

******************

Correctable Reasons That Normal Data Can Appear Non-Normal

If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation of whether any of the following factors have caused normally-distributed data to appear to be non-normally-distributed:

1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.

2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes or operator changes can change the distribution of data. Multiple modal values in the data are common indicators that this might be occurring. The effects of different inputs must be identified and eliminated from the data.

3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until at least 25 data points have been sampled.

4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved by using a larger sample size.

5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such as zero, calculations using very small values might skew computations of important values such as the mean. A simple solution might be to raise all the values by a certain amount.

6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.

 

We now proceed to the two-step method for creating all Confidence intervals of a population mean. These steps are as follows:

Step 1 – Calculate the Width of Half of the Confidence Interval

Step 2 – Create the Confidence Interval By Adding and Subtracting the Width of Half of the Confidence Interval from the Sample Mean

Proceeding through these two steps is done as follows:

 

Step 1) Calculate the Half-Width of the Confidence Interval

Half the Width of the Confidence Interval is sometimes referred to as the Margin of Error. The Margin of Error will always be measured in the same type of units as the sample mean is measured in. Calculating the Half Width of the Confidence Interval using the t distribution would be done as follows in Excel:

Margin of Error = Half Width of C.I. = t Value(α/2) * Standard Error

Margin of Error = Half Width of C.I. = T.INV(1-α/2, df) * s/SQRT(n)

Margin of Error = Half Width of C.I. = T.INV(0.975, 19) * 37.17/SQRT(20)

Margin of Error = Half Width of C.I. = 2.093 * 8.311 = 17.396 hours
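
The same Step 1 calculation can be sketched in Python, where scipy’s t.ppf plays the role of Excel’s T.INV (an illustration of the arithmetic above, not part of the original workbook):

```python
import math
from scipy import stats

alpha = 0.05
n = 20
df = n - 1
s = 37.170          # sample standard deviation

t_value = stats.t.ppf(1 - alpha / 2, df)      # T.INV(0.975, 19), about 2.093
margin_of_error = t_value * s / math.sqrt(n)  # about 17.396 hours

print(f"t value = {t_value:.3f}")
print(f"margin of error = {margin_of_error:.3f} hours")
```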

 

Step 2) Confidence Interval = Sample Mean ± C.I. Half-Width

Confidence Interval = Sample Mean ± (Half Width of Confidence Interval)

Confidence Interval = x_bar ± 17.396

Confidence Interval = 250 ± 17.396

Confidence Interval = [ 232.604 hours, 267.396 hours ]

A graphical representation of this Confidence Interval is shown as follows:

[Image: graphical representation of the Confidence Interval]

It should be noted that the legacy formula TINV(α,df) can be replaced in Excel 2010 and later by the following formula: T.INV(1-α/2,df)

 

An Excel Shortcut For Calculating the t-Based Confidence Interval

The formula for calculating the Confidence Interval is the following:

Confidence Interval = Sample Mean ± (Half Width of Confidence Interval)

Descriptive Statistics in Excel instantly calculates the following:

Sample mean

Half the width of the Confidence Interval at the specified level of confidence using the t distribution

Here, once again, is the Descriptive Statistics for this data set:

[Image: Descriptive Statistics output for the sample data]

This is created with the following information filled in the Excel dialogue box for the Descriptive Statistics data analysis tool:

[Image: completed Descriptive Statistics dialogue box]

Note the following from the Descriptive Statistics:

Sample mean = x_bar = 250

Half the Width of the Confidence Interval = 17.396 hours

These numbers can simply be plugged into the Confidence Interval formula below to obtain the t-based C. I. as long as the sample mean has been proven to be normally distributed.

Confidence Interval = Sample Mean ± (Half Width of Confidence Interval)

The half-width of a t-based confidence interval can also be quickly found by the following Excel formula:

Half-width of a t-based confidence interval = CONFIDENCE.T(α,s,n)

Half-width of a t-based confidence interval = CONFIDENCE.T(0.05,37.17,20)

Half-width of a t-based confidence interval = 17.396
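
A rough Python counterpart to this shortcut is scipy’s t.interval, which returns both ends of the interval in one call from the summary statistics (an illustration only; the shortcut described above is the Excel CONFIDENCE.T function):

```python
import math
from scipy import stats

x_bar = 250.0   # sample mean
s = 37.170      # sample standard deviation
n = 20

se = s / math.sqrt(n)
lower, upper = stats.t.interval(0.95, n - 1, loc=x_bar, scale=se)

print(f"95% CI = [{lower:.3f}, {upper:.3f}]")  # about [232.604, 267.396]
```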

 

