## Monday, June 2, 2014

### Distributions Related To the Binomial Distribution

This is one of the following four articles on the Binomial Distribution in Excel

Overview of the Binomial Distribution in Excel 2010 and Excel 2013

Solving Problems With the Binomial Distribution in Excel 2010 and Excel 2013

Normal Approximation of the Binomial Distribution in Excel 2010 and Excel 2013

Distributions Related to the Binomial Distribution

# Distributions Related To the Binomial Distribution

### Binomial Distribution

– Calculates the probability of k successes in n Bernoulli trials with replacement. Sampling with replacement means that each sample is placed back into the population before another sample is taken.

### Hypergeometric Distribution

- Calculates the probability of k successes in n Bernoulli trials without replacement. Sampling without replacement means that each sample taken is not placed back into the population. This is not a binomial experiment because a binomial experiment requires that the probability of success be constant on every trial.

### Multinomial distribution

– This is a generalization of the binomial distribution. Each multinomial trial has two or more possible outcomes. A binomial trial is a multinomial trial with only two possible outcomes. The multinomial distribution calculates the probability of a specific combination of outcomes occurring in n multinomial trials. Sampling is done with replacement during a multinomial experiment.

### Negative Binomial Distribution

– Calculates probability of a specific number of successes occurring before a specified number of failures. Each trial is a Bernoulli trial. Sampling is done with replacement during a negative binomial experiment. Like the binomial distribution, the probability of success on every trial is the same and the trials are independent, i.e., the outcome of one trial does not affect the outcome of any other trial.

### Geometric Distribution

– Calculates the probability that a success will occur for the first time on the nth Bernoulli trial. This is a special case of the negative binomial distribution in which the number of successes that will occur before a specified number of failures is equal to one. Sampling is done with replacement during a geometric experiment. Like the binomial distribution, the probability of success on every trial is the same and the trials are independent, i.e., the outcome of one trial does not affect the outcome of any other trial.

Following will be presented more information about each of the distributions just mentioned. Information about each distribution will include its PDF, CDF, Excel formulas, and a brief problem solved with the PDF of that distribution in Excel.

## Binomial Distribution

Calculates the probability of k successes in n Bernoulli trials with replacement. Sampling with replacement means that each sample is placed back into the population before another sample is taken.

The binomial distribution’s PDF is given by the following:

(Click On Image To See a Larger Version)

for k = 0, 1, 2, …, n,

where

(Click On Image To See a Larger Version)

is the binomial coefficient and is sometimes referred to as “n choose k.”

f(k;n,p) = Pr(X = k) is the probability that X, the number of successes, equals k for n independent, successive Bernoulli trials each having the probability p of success.

The Excel formula to calculate the binomial distribution’s PDF is the following:

f(k;n,p) = Pr(X = k) = BINOM.DIST(k, n, p , FALSE)

FALSE indicates that this Excel formula is calculating the binomial distribution’s PDF and not the CDF for this k, n, and p. “False” answers the question of the calculation is cumulative (which is the case if calculating the CDF – Cumulative Distribution Function) on not cumulative (which is the case if calculating the PDF – Probability Density Function).

Excel 2010 and later also use the formula BINOM.DIST which is equivalent to BINOMDIST used by earlier versions of Excel.

The binomial distribution’s CDF is as follows:

F(k;n,p) = Pr(X k). This is the probability that X, the number of successes in n Bernoulli trials each having the probability p of a successful outcome, equals up to k.

The binomial distribution’s CDF is given by the following:

(Click On Image To See a Larger Version)

is the “floor” under k, i.e., the greatest integer less than or equal to k.

The Excel formula to calculate the binomial distribution’s CDF is the following:

F(k;n,p) = Pr(X k) = BINOM.DIST(k, n, p , TRUE)

A basic problem solved by the binomial distribution’s PDF in Excel is as follows:

What is the probability of getting exactly 4 “fives” if a single die is rolled 7 times?

The probability of getting a five on any roll of a die is one sixth.

p = 1/6 = 0.1667

n = 7 trials (rolls of the die)

k = 4

This requires the PDF to solve for Pr(X = k)

f(k;n,p) = Pr(X = k) = BINOM.DIST(k, n, p , FALSE)

f(4;7,0.1667) = BINOM.DIST(4,7,0.1667,FALSE) = 0.0156

There is a 1.56 percent of getting exactly 4 “fives” if a single die is rolled 7 times.

## Hypergeometric Distribution

Calculates the probability of k successes in n Bernoulli trials without replacement. Sampling without replacement means that each sample taken is not placed back into the population. This is not a binomial experiment because a binomial experiment requires that the probability of success be constant on every trial.

The hypergeometric’s distribution’s PDF is given by the following:

k = targeted number of successes in n trials

K = total number of successes in the population

n = number of trials

N = population size

(Click On Image To See a Larger Version)

where

(Click On Image To See a Larger Version)

is the binomial coefficient and is sometimes referred to as “n choose k.”

The Excel formula to calculate the hypergeometric distribution’s PDF is the following:

Pr(X = k) = HYPGEOM.DIST(k, n, K, N, FALSE)

Excel 2010 and later also use the formula HYPGEOM.DIST which is similar but not exactly equivalent to HYPGEOMDIST used by earlier versions of Excel. The earlier version of the formula, HYPGEOMDIST(), calculates only the PDF.

HYPGEOMDIST(k, n, K, N) = HYPGEOM.DIST(k, n, K, N, FALSE)

The hypergeometric distribution’s CDF is as follows:

(Click On Image To See a Larger Version)

Until Excel 2010 there is no Excel formula to calculate the hypergeometric distribution’s CDF. Prior to 2010 the hypergeometric distribution’s CDF has to be calculated in Excel by summing successive PDF calculations as follows:

(Click On Image To See a Larger Version)

The hypergeometric distribution is a discrete distribution whose CDF is the sum of sum of the specific point values that make up the PDF of the hypergeometric distribution because it is a discrete distribution.

In Excel 2010 and beyond, the CDF of hypergeometric distribution can be calculated as follows:

Pr(X≤k) = HYPGEOM.DIST(k, n, K, N, TRUE)

A basic problem solved by the hypergeometric distribution’s PDF in Excel is as follows:

What is the probability of getting two kings in ten successive draws from a deck of 52 cards. There are four kings in a deck of cards. Each card drawn is not replaced back into the deck?

k = 2

K = 4

n = 10

N = 52

Pr(X = k) = HYPGEOM.DIST(k, n, K, N , FALSE)

Pr(X = 2) = HYPGEOM.DIST(2, 10, 4, 52 , FALSE) = 0.1431

There is a 14.31 percent probability of getting two kings in 10 successive draws from a 52-card that initial contains four kings if each card is discarded after being drawn.

## Multinomial Distribution

This is a generalization of the binomial distribution. Each multinomial trial has two or more possible outcomes. A binomial trial is a multinomial trial with only two possible outcomes. The multinomial distribution calculates the probability of a specific combination of outcomes occurring in n multinomial trials. Sampling is done with replacement during a multinomial experiment.

The multinomial’s distribution’s PDF is given by the following:

n = number of trials

m = number of possible outcomes in each trial

i as i goes from 1 to m = ith possible outcome of the 10 total possibilities. For example, if the three colors of marbles that can be drawn are red, blue, and green, then red is the 1st possible outcome (ired = 1), blue is the second possible outcome (iblue = 2), etc.

ki = the targeted number of successes for the ith possible outcome. For example, if the targeted number of blue marbles is 2 then k2 = 2

pi = the probability of each trial producing a success for the ith possible outcome. For example, if 5 out of 20 total marbles are blue, then p3 = 5/20 = 0.25. These probabilities remain the same because each sample is replaced before the next trial.

(Click On Image To See a Larger Version)

There is no Excel formula to calculate multinomial probabilities.

This formula probably appears more onerous that it actually is. The following example will hopefully simplify it.

A bag contains 21 marbles whose colors are distributed as follows:

6 marbles are red

5 marbles are blue

10 marbles are green

9 marbles are individually drawn, evaluated, and then placed back into the bag before the next marble is drawn. Calculate the probability that the 9 trials will produce exactly the following results:

4 red marbles

2 blue marbles

3 green marbles

n = 9 trials

m = 3 possible outcomes (red, blue, green)

p1 = 6/21 = probability of getting outcome 1 (red) in each trial

p2 = 5/21 = probability of getting outcome 2 (blue) in each trial

p3 = 10/21 = probability of getting outcome 3 (green) in each trial

k1 = 4 = the targeted number of successes of outcome 1 (red) in n trials

k2 = 2 = the targeted number of successes of outcome 2 (blue) in n trials

k3 = 3 = the targeted number of successes of outcome 3 (green) in n trials

(Click On Image To See a Larger Version)

Pr(X1=4, X2=2, X3=3)= 9! / (4!*2!*3!) * (6/21)4 * (5/21)2 * (10/21)3

Pr(X1=4, X2=2, X3=3)= 0.0514

There is a 5.14 percent probability of obtaining 4 red marbles, 2 blue marbles, and 3 green marbles in 9 successive random samples taken with replacement from a set of 6 red marbles, 5 blue marbles, and 10 green marbles.

The CDF of the multinomial distribution can be calculated but its formula quickly becomes unmanageably large as the number of outcomes and targeted number of each outcome grows.

## Negative Binomial Distribution

Calculates probability that a specific number of failures will occur before a specified number of successes will. Each trial is a Bernoulli trial with the same probability of a successful outcome. Sampling is done with replacement during a negative binomial experiment. Like the binomial distribution, the probability of success on every trial is the same and the trials are independent, i.e., the outcome of one trial does not affect the outcome of any other trial.

The negative binomial distribution calculates the probability of a specific number of trials to obtain a fixed number of successes. The number of trials is varied and the number of successes is fixed.

The binomial distribution calculates the probability of a specific number of successes in a fixed number of trials. The number of successes is varied and the number of trials is fixed.

The negative binomial distribution’s PDF is given by the following:

x = targeted number of failures occurring before the rth success

r = total number of successes that occur. The rth success occurs immediately following the xth failure.

p = the probability of success on each trial

(Click On Image To See a Larger Version)

where

(Click On Image To See a Larger Version)

is the binomial coefficient and is sometimes referred to as “n choose k.”

The Excel formula to calculate the hypergeometric distribution’s PDF is the following:

Prior to Excel 2010: Pr(X = x) = NEGBINOMDIST(x, r, p)

or

Excel 2010 and beyond: Pr(X = x) = NEGBINOM.DIST(x, r, p, FALSE)

NEGBINOM.DIST can calculate both the PDF and the CDF. The earlier version of this formula prior to Excel 2010, NEGBINOMDIST, will only calculate the PDF.

The negative binomial distribution’s CDF is as follows:

(Click On Image To See a Larger Version)

Only Excel 2010 has a formula to calculate the negative binomial distribution’s CDF. The CDF can be calculated in Excel 2010 or later as follows:

Pr(X k) = NEGBINOM.DIST(x, r, p, TRUE)

Calculating the negative binomial distribution’s CDF in versions of Excel earlier than 2010 would require summing the PDF as i goes from 0 to x as follows:

(Click On Image To See a Larger Version)

A basic problem solved by the negative binomial distribution’s PDF in Excel is as follows:

A fair coin is continuously flipped. What is the probability that the 4th head will occur immediately following the 10th tail? A “head” will be counted as a success.

x = 10 = targeted number of failures (tails) occurring before the rth success (heads)

r = 4 = total number of successes (heads) that occur. The rth success occurs immediately following the xth failure (tails).

p = 0.5 = the probability of success on each trial

Pr(X = x) = NEGBINOM.DIST(x, r, p, FALSE)

Pr(X = 4) = NEGBINOM.DIST(10, 4, 0.5, FALSE) = 0.0174

There is a 1.74 percent probability the coin flip following the 10th tail will produce the 4th head.

## Geometric Distribution

Calculates the probability that a success will occur for the first time on the nth Bernoulli trial. This is a special case of the negative binomial distribution in which the number of successes that will occur before a specified number of failures is equal to one. Sampling is done with replacement during a geometric experiment. Like the binomial distribution, the probability of success on every trial is the same and the trials are independent, i.e., the outcome of one trial does not affect the outcome of any other trial.

As mentioned, the geometric distribution is equivalent to the negative binomial distribution with r (the number of successes) set to one.

A basic problem solved by the geometric distribution’s PDF in Excel is as follows:

A fair coin is continuously flipped. What is the probability that the 1st head will occur immediately following the 3rd tail? A “head” will be counted as a success.

x = 3 = targeted number of failures (tails) occurring before the rth success (heads)

r = 1 = total number of successes (heads) that occur. The rth success occurs immediately following the xth failure (tails).

p = 0.5 = the probability of success on each trial

Pr(X = x) = NEGBINOM.DIST(x, r, p, FALSE)

Pr(X = 1) = NEGBINOM.DIST(3, 1, 0.5, FALSE) = 0.0625

There is a 6.25 percent probability the coin flip following the 3rd tail will produce the 1st head.

Excel Master Series Blog Directory

Statistical Topics And Articles In

This Blog

You Will Become an Excel Statistical Master!