This is one of the following three articles about the t distribution in Excel
Overview of the t Distribution
t Distribution’s PDF (Probability Density Function) in Excel 2010 and Excel 2013
t Distribution’s CDF (Cumulative Distribution Function) in Excel 2010 and Excel 2013
The t-Distribution’s CDF
The t-Distribution’s CDF (Cumulative Distribution Function) equals the probability that sample point X taken from a normally distributed population is up to a certain distance from the sample mean for the given sample size.
More specifically, the t-Distribution’s PDF calculates the probability that the standardized distance between sample point X and the sample mean is up to the t Value of t for sample of size n when n = df + 1 = v + 1.
The t Value is the number of standard errors that a sampled point X is from the sample mean.
The t-Distribution’s CDF is expressed as F(t,v).
F(t,v) = the probability that a sample point is up to a distance of t standard errors from the sample mean if the sample size n = v + 1 and the population is normally-distributed. It is given by the following formula:
(Click On Image To See a Larger Version)
In Excel 2010 and beyond, the t-Distribution’s CDF can be calculated directly be the following Excel formula:
F(t,v) = T.DIST(t, v, TRUE)
Prior to Excel 2010, the t-Distribution’s CDF had to be calculated by the following Excel formula:
F(t,v) = IF(t>0,1-TDIST(ABS(t),v,1),TDIST(ABS(t),v,1))
t-Distribution CDF Example With
Sample Data
The following is the previous random sample of 20 data points drawn from a population. The population standard deviation is not known. The population’s distribution is not known. It will not be assumed to be normally distributed.
This exercise will calculate the probability that a single sample point from this sample has the value of up to X = 48.
If the population or the sample can be shown to be normally distributed, the t-Distribution can be used to calculate this probability, which would be the t-Distribution’s CDF at sample X equals up to 48. The t-Distribution would be used because the sample size is small (n < 30) and the population’s standard deviation is not known.
(Click On Image To See a Larger Version)
it is known from the previous example that these sample data are normally distributed so the t-Distribution can be used to calculate the CDF.
Descriptive Statistics was already run on this sample data in the previous example calculating the PDF at X = 48. The output of this Excel data analysis tools is as follows:
(Click On Image To See a Larger Version)
The t-Distribution is a family of distribution curves that are all symmetrical about a mean of zero. Each unique t-Distribution curve is completely described and differentiated from all other t-Distribution curves by only a single parameter: its degrees of freedom. Degrees of freedom equal sample size minus one as follows:
n = 20
df = v = n – 1 = 19
Note also that the t-Distribution’s CDF formula calculates a point’s PDF up to the t Value of the sample point, not up to the sample point’s actual value, X. The inputs to the t-Distribution’s PDF formula are the degrees of freedom and the point’s t Value, t.
(Click On Image To See a Larger Version)
The t Value of a point is the distance that the point is from the sample mean expressed as the number of standard errors. The length of one standard error is calculated from the sample as follows:
SE = Standard Error = Standard Deviation / SQRT(n)
SE = 7.434 / SQRT(20) = 1.662
Note that SE is also automatically calculated in the Excel Descriptive Statistics.
The t Value of point X = 48 can now be calculated since the sample mean (x_bar) and SE are known. This is done as follows:
X = 48
t = t Value = (X – x_bar) / SE
t = (48 – 50) / 1.662
t = -1.20337
The two inputs required to calculate the t-Distribution’s CDF are the point’s t Value (t = -1.20337) and the degrees of freedom (df = v = 19) from the sample.
Excel 2010 provides the following formula that enables the calculation of the t-Distribution’s PDF in one quick step. That calculation is as follows:
F(t,v) = T.DIST(t, df, TRUE) =
F(t=-1.20337, v=19) = T.DIST(-1.20337, 19, TRUE) = 0.12181
This results states that there is an 12.181 percent chance the one of the data observations from the sample group of 20 samples taken from a normally distributed population will have a t Value of up to -1.20337, i.e., will be up to 1.20337 standard errors to the left of the sample mean.
Given that the sample mean is 50, the length of one standard error = 1.662, and the d point is 1.20337 standard errors to the left of the mean, the point being evaluated is equal to the following:
X = x_bar + t Value*SE
X = 50 + (-1.20337)*(1.662) = 48
The t-Distribution PDF formula calculated that the probability that a single point of the 20 sampled points has a value of up to X = 48 (or equivalently has a t Value of t = -1.20337) is 12.181 percent.
Prior to Excel 2010, there was no Excel formula that calculated the correct CDF value for the t-Distribution in every situation. The following If-Then-Else statement correctly calculates the t-Distribution’s CDF in all situations for Excel versions prior to 2010:
F(t,v) = IF(t>0,1-TDIST(ABS(t),v,1),TDIST(ABS(t),v,1))
= IF(-1.20337>0,1-TDIST(ABS(-1.20337),19,1),TDIST(ABS(-1.20337),19,1))
F(t=-1.20337,v = 19) = 0.12181
This is the same result that is arrived at in the following calculation from Excel 2010:
T.DIST(t, df, TRUE) =
T.DIST(-1.20337, 19, TRUE) = 0.12181
Below is an Excel-generated graphical representation of this example.
(Click On Image To See a Larger Version)
Excel Master Series Blog Directory
Click Here To See a List Of All
Statistical Topics And Articles In
This Blog
You Will Become an Excel Statistical Master!
No comments:
Post a Comment