Monday, June 2, 2014

Overview of the t-Distribution

This is one of the following three articles about the t-Distribution in Excel:

Overview of the t Distribution

t Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013

t Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013

 

Overview of the t-Distribution

The t-Distribution is used much more often than the normal distribution to perform several basic parametric statistical tests such as hypothesis tests of a population mean and confidence intervals of a population mean. Requirements for statistical tests are generally less rigorous when a statistical test can be based upon the t-Distribution instead of the normal distribution.

The t-Distribution (also called the Student’s t-Distribution) describes the distribution of a sample mean, measured in standard errors from the population mean, when the sample is taken from a normally-distributed population and the population standard deviation is unknown. The t-Distribution closely resembles the standard normal distribution (the normal distribution when the mean equals zero and the standard deviation equals one) except that the t-Distribution’s outer tails have more weight (are thicker) and its peak over the mean is lower than that of the standard normal distribution.

As sample size increases, the t-Distribution converges to (more closely resembles) the standard normal distribution. When the sample size becomes large (n > 30), the t-Distribution almost exactly resembles the standard normal distribution.

Following is an Excel-generated image of the PDF (Probability Density Function) of the t-Distribution with very few degrees of freedom. Sample size, n, equals 3 and degrees of freedom, df, equals n – 1 = 2. The PDF of the t-Distribution has a lower peak and thicker tails when its degrees of freedom are few.

[Image: PDF of the t-Distribution with 2 degrees of freedom]

As the degrees of freedom increase, the PDF of the t-Distribution converges toward (resembles more and more) the standard normal distribution. The standard normal distribution is a normal distribution curve with its mean, μ, equal to zero and its standard deviation, σ, equal to one. The standard normal curve is a special case of the t-Distribution with a sample size, n, equal to infinity. Note how the shape of the PDF curve of the t-Distribution changes as the sample size increases from n = 3 to n = ∞. The height of the peak over the mean has risen significantly and the tails are quite a bit thinner. This is shown in the following Excel-generated image:

[Image: PDF of the t-Distribution with infinite degrees of freedom (the standard normal curve)]
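The convergence described above can also be checked numerically. The following is a minimal Python sketch, not the Excel method used in this series: it evaluates the standard density formula of the t-Distribution using only the standard library. The function names t_pdf and normal_pdf are chosen here for illustration.

```python
import math

def t_pdf(t, df):
    """Density of the t-Distribution with df degrees of freedom at t."""
    # lgamma keeps the gamma-function ratio from overflowing at large df.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Peak height (t = 0) and tail height (t = 3) for increasing degrees of freedom.
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}  peak={t_pdf(0, df):.4f}  tail at 3={t_pdf(3, df):.4f}")
print(f"normal   peak={normal_pdf(0):.4f}  tail at 3={normal_pdf(3):.4f}")
```

As the degrees of freedom grow, the printed peak rises toward the normal peak and the tail value falls toward the normal tail value.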

These differences are also reflected in the CDF (Cumulative Distribution Function) graphs of the t-Distribution with the same degrees of freedom. The t-Distribution’s CDF approaches its asymptotic values of 0 and 1 much further from t = 0 at smaller degrees of freedom than at larger degrees of freedom.

The t-Distribution CDF graph with only 2 degrees of freedom is still a significant distance from the asymptotic values of 0 and 1 at 3 standard errors above and below t = 0. This is shown in the following Excel-generated graph:

[Image: CDF of the t-Distribution with 2 degrees of freedom]

The t-Distribution CDF graph with the degrees of freedom at its limiting value of ∞ has nearly reached the asymptotic values of 0 and 1 well within 3 standard errors above and below t = 0. This is shown in the following Excel-generated graph:

[Image: CDF of the t-Distribution with infinite degrees of freedom]

The t-Distribution’s CDF is simply a graph of the accumulation of its PDF as the t Value goes from -∞ to +∞. The t-Distribution’s PDF is bell-shaped and symmetrical about a t Value of 0. The t-Distribution’s CDF will therefore show that 50 percent of the area under the PDF curve lies to the left of the t Value of 0, i.e., F(0, ν) = 0.50.
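To illustrate the CDF as accumulated PDF area, here is a minimal Python sketch that integrates the density numerically with Simpson's rule. Simpson's rule is a choice made for this example only, not how Excel computes the CDF, and the names t_cdf and normal_cdf are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of the t-Distribution with df degrees of freedom at t."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus (or minus) the PDF area between 0 and |t|."""
    if t == 0:
        return 0.5  # symmetry: half the area lies to the left of 0
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# At 3 standard errors the CDF with 2 degrees of freedom is still well
# short of 1, while the standard normal CDF has almost reached it.
print(f"t CDF (df=2) at 3: {t_cdf(3, 2):.4f}")
print(f"t CDF (df=30) at 3: {t_cdf(3, 30):.4f}")
print(f"normal CDF at 3:   {normal_cdf(3):.4f}")
```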

 

History of the t-Distribution

One of the most colorful and well-known stories in the annals of statistics is the origin of the name of the Student’s t-Distribution, which the t-Distribution is often called. This distribution was first presented in the English language by William Sealy Gosset under the pseudonym “Student” in his article “The probable error of a mean” in the scientific journal Biometrika in March 1908. At the time Gosset was employed at the Guinness Brewery in Dublin, Ireland and was studying the nature of small samples of brewery ingredients such as barley. Gosset published his article under the pen name “Student” because his employer either did not allow staff to publish scientific papers or did not want competitors to know that the Guinness Brewery was using this test on small samples of raw materials.

The name “Student’s distribution” was conferred on the distribution by Ronald Fisher in his 1925 article “Applications of Student’s distribution.” This article also assigned the label “t” to the value of the Test Statistic for this distribution.

Prior to Gosset’s English-language introduction, the t-Distribution was first described by German mathematicians Friedrich Helmert and Jacob Lüroth in 1876.

 

Properties of the t-Distribution

- The mean of the t-Distribution equals 0 when the degrees of freedom exceed 1. The t-Distribution forms a bell-shaped curve about a mean of 0. This differs from the normal distribution because the normal distribution can be symmetrical about a mean of any real number.

- The variance of the t-Distribution is equal to ν / (ν – 2) when ν (Greek letter “nu”) exceeds 2. ν equals the degrees of freedom, which equals the sample size minus 1. Whenever it is defined, the variance is greater than 1, but it converges to 1 as the sample size gets larger. The t-Distribution converges to the standard normal distribution (which has a variance equaling 1) as sample size approaches infinity.

- The standard error used with the t-Distribution is equal to the sample standard deviation divided by the square root of the sample size.

- The t-Distribution has only one parameter which is the degrees of freedom. Degrees of freedom is usually designated as df or ν (Greek letter “nu”). This differs from the normal distribution because the normal distribution is described by the following two parameters: mean (μ - Greek letter “mu”) and standard deviation (σ - Greek letter “sigma”).

- The graph of the t-Distribution is symmetrical about a mean of 0 and the units on its horizontal axis describing the distance from the mean of 0 are units of standard errors. This differs from the normal distribution because the normal distribution can be symmetrical about a mean of any real number and the units of its horizontal axis describing the distance from its mean use the same real number scale in which the mean was measured. The t-Distribution’s PDF or CDF at any real number X requires that the X value be converted to the number of standard errors that the X value is from the sample mean. The standard error is equal to the sample standard deviation divided by the square root of the sample size.
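The variance formula and the conversion of a measured value to standard errors, both described in the properties above, can be sketched in Python as follows. The sample values are made up for illustration, and the function names t_variance and t_value are not from the original article.

```python
import math
import statistics

def t_variance(df):
    """Variance of the t-Distribution: df / (df - 2), for df > 2 only."""
    if df <= 2:
        raise ValueError("variance is finite only when df exceeds 2")
    return df / (df - 2)

def t_value(sample, hypothesized_mean):
    """Distance from the sample mean to a hypothesized mean, in standard errors."""
    n = len(sample)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - hypothesized_mean) / std_error
    return t, n - 1  # t Value and its degrees of freedom

sample = [4.8, 5.1, 5.4, 4.9, 5.3]  # made-up measurements
t, df = t_value(sample, 5.0)
print(f"t Value = {t:.4f} with {df} degrees of freedom")

# The variance shrinks toward 1 as the degrees of freedom increase.
for nu in (3, 5, 30, 1000):
    print(f"df={nu:>4}  variance={t_variance(nu):.4f}")
```

The resulting t Value, together with its degrees of freedom, is the input that the PDF or CDF of the t-Distribution expects.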

The t-Distribution can be used to describe any statistic that has a bell-shaped distribution, i.e., unimodal, symmetrical, and without significant outliers. The t-Distribution is used to analyze samples taken from a normally-distributed population when either of the following is true:

1) Sample size is small (n < 30).

2) The population standard deviation is not known, which is often the case.

The t-Distribution is used much more often than the normal distribution when performing hypothesis tests or creating confidence intervals based upon samples taken from a normally-distributed population. If a sample is found to be normally distributed, its population is assumed to be normally distributed.

The t-Distribution more closely describes the distribution of a small sample (n < 30) taken from a normally-distributed population than the normal distribution does. Small samples taken from a normally-distributed population have a slightly higher probability that sample values will occupy the outer tails than do larger samples. The t-Distribution has slightly thicker tails and a lower peak than does the normal distribution. The t-Distribution is therefore used to describe the distribution of small samples taken from a normally-distributed population.

The extra weight in the outer tails of the t-Distribution accounts for the additional uncertainty of having to use the sample standard deviation to estimate the population standard deviation. This estimate becomes more uncertain as sample size decreases. The t-Distribution’s shape reflects this: its outer tails become thicker as sample size decreases.

The t-Distribution can be used to perform hypothesis tests or create confidence intervals of a normally-distributed population when the population standard deviation is not known. The normal distribution should not be used for these types of analysis when the population standard deviation is not known. In the real world it is a much more common scenario that the standard deviation of the population from which the sample was drawn is not known.
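The effect of the thicker tails on a hypothesis test can be seen by comparing two-tailed p Values. The following is a rough Python sketch under the same assumptions as before: the CDF is approximated with Simpson's rule rather than computed the way Excel does it, and all function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of the t-Distribution with df degrees of freedom at t."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), by Simpson's rule on the PDF between 0 and |t|."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def two_tailed_p_t(t, df):
    """Two-tailed p Value of a t Value with df degrees of freedom."""
    return 2 * (1 - t_cdf(abs(t), df))

def two_tailed_p_normal(z):
    """Two-tailed p Value if the normal distribution were used instead."""
    return math.erfc(abs(z) / math.sqrt(2))

# The same test statistic of 3 standard errors, judged both ways.
for df in (2, 5, 30):
    print(f"df={df:>2}  p (t) = {two_tailed_p_t(3, df):.4f}")
print(f"       p (normal) = {two_tailed_p_normal(3):.4f}")
```

With few degrees of freedom, the normal distribution reports a much smaller p Value than the t-Distribution for the same test statistic, which is why it overstates the evidence when the population standard deviation has to be estimated from a small sample.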

 
