Tuesday, May 27, 2014

Spearman Correlation Coefficient in 6 Steps in Excel 2010 and Excel 2013

This is one of the following four articles on Correlations in Excel

Overview of Correlation In Excel 2010 and Excel 2013

Pearson Correlation in 3 Steps in Excel 2010 and Excel 2013

Pearson Correlation – Calculating r Critical and p Value of r in Excel

Spearman Correlation in 6 Steps in Excel 2010 and Excel 2013

 

Spearman Correlation in 6

Steps in Excel

The Spearman Correlation Coefficient is designated by either rs or by the Greek letter ρ, “rho.” As mentioned the Spearman correlation should be used instead of the Pearson correlation in any of the following circumstances:

1) An X-Y scatterplot of the data indicates that there is a nonlinear monotonic relationship between two variables. Monotonic simply means that one variable generally goes in one direction (either always up or always down) when the other variable moves in one direction.

2) There are significant outliers. The Pearson Correlation is very sensitive to outliers. The Spearman Correlation is not because the Spearman Correlation bases its calculation on the ranks and not the mean (as the Pearson Correlation does).

3) At least one of the variables is ordinal. An ordinal variable is one in which the order matters but the difference between values does not have meaning. A customer satisfaction scale or a Likert scale are examples of ordinal data. Satisfaction scales and Likert scales can be analyzed as interval data if the distance between values is considered to be the same. A Pearson correlation can be used if the variables are either interval or ration but cannot be used if any of the variables are ordinal.

 

Spearman Correlation Formula

The Spearman Correlation Coefficient is defined as the Pearson Correlation Coefficient between ranked variables. The Spearman Correlation is sometimes called the Spearman Rank-Order Correlation or simply Spearman’s rho (ρ) and is calculated as follows:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

For a sample of n (X-Y) data pairs, each Xi,Yi are converted to ranks xi,yi that appear in the preceding formula for Spearman’s rho.

 

Tied Data Values

Tied valued of X or Y are assigned the average rank of the tied values. For example, if the 2rd, 3rd, and 4th X value were equal to 19, the rank assigned to each would be 3. This is the average rank, which would be calculated as follows:

Average rank = (Sum of ranks)/(Number of ranks) = (2 + 3 + 4)/3 = 3

The RANK.AVG() formula will correctly calculate ranks if there are tied values among the values to be ranked.

 

No Ties Among Data Values

If there are no tied values of X or Y, the following simpler formula can be used to calculate Spearman’s rho:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

 

Spearman Correlation’s Only

Two Required Assumptions

1) The variables can be ratio, interval, or ordinal, but not nominal. Nominal variables are simply labels whose order doesn’t mean anything. The Spearman Correlation is nonparametric, i.e., the test’s outcome is not affected by the distributions of the data being compared.

2) There is a monotonic relationship between the two variables.

 

Example of Spearman

Correlation in Excel

The Spearman Correlation Coefficient will be calculated for the following data:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation

 

Step 1 – Plot the Data to Check For a Monotonic Relationship

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

A monotonic relationship exists is one variable generally moves in a single direction (either increasing or decreasing) as the other variable moves in a single direction. A monotonic relationship does not imply linearity. A monotonic relationship appears to exist between the X and Y variables. X values generally increase as Y values increase.

 

Step 2 – Automatically Checking For Tied X or Y Values in a Data Column

Checking a column of data for tied values can be automated in Excel. The cell U8 has the following formula:

=IF(SUM(IF(FREQUENCY($S$9:$S$15,$S$9:$S$15)>0,1))=COUNT($S$9:$S$15),“There Are No Tied Values”,”There Are Tied Values”)

In this case, there were no tied values from S9 to S15, as is demonstrated by the output of the above formula in cell U8 which states “There Are No Tied Values.”

Cell U18 contains a similar formula.

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

 

No Tied Values

 

Step 3 – Calculate the Ranks of the X and Y Values

This can be done in a single step in Excel with the RANK.AVG() formula as follows:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

The formula RANK.AVG() would also correctly calculate the ranks if there were any tied values.

 

Step 4 – Calculate the Sum of the Square of the Rank Differences

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

 

Step 5 – Calculate rs

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

rs is calculated in cell BH13 by the following formula:

=1-((6*BH8)/((BH10*(BH10^2-1))))

rs = 0.6786

 

Step 6 – Determine If rs Is Significant

 

Method 1 – Compare rs to r Critical

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

r Critical in cell BS18 is calculated by the following formula:

=T.INV(1-AG16/2,AG14)/SQRT(T.INV(1-AG16/2,AG14)^2+AG14)

r Critical = 0.7545

rs is not significant at α = 0.05 because rs (0.6786) is less than r Critical (0.7545).

 

Method 2 – Compare the p Value to Alpha

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

The p Value in cell BS29 is calculated by the following formula:

=1-F.DIST(((AG14*S15^2)/(1-S22^2)),1,AG14,TRUE) = 0.0536

rs is not significant at α = 0.05 because the p Value (0.0536) is greater than Alpha (0.05). This rs would be significant at α = 0.10 but not at α = 0.05.

 

If There Are Any Tied X or Y Values

 

Step 3 – Calculate rs

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

Cell AN58 contains the formula: =SUM(AN50:AN56) = 68.96

Cell AP58 contains the formula: =SUM(AP50:AP56) = 105.93

Cell AQ58 contains the formula: =SUM(AQ50:AQ58) = 83.36

Cell AP61 contains the formula: =SQRT(AP58*AQ58) = 93.97

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

rs is calculated in cell AP66 by the formula: =AN58/AP61 = 0.7339

 

Step 4 – Determine If rs Is Significant

 

Method 1 – Compare rs to r Critical

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

r Critical in cell BS18 is calculated by the following formula:

=T.INV(1-AG16/2,AG14)/SQRT(T.INV(1-AG16/2,AG14)^2+AG14)

r Critical = 0.7545

rs is not significant at α = 0.05 because rs (0.7339) is less than r Critical (0.7545). This rs would be significant at α = 0.10 but not at α = 0.05.

 

Method 2 – Compare the p value to Alpha

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

p Value is calculated in cell BS29 by the following formula:

=1-F.DIST(((AG14*S15^2)/(1-S22^2)),1,AG14,TRUE) = 0.0536

rs is not significant at α = 0.05 because the p Value (0.0536) is greater than Alpha (0.05). This rs would be significant at α = 0.10 but not at α = 0.05.

 

Two Different Methods Used to Calculate rs Critical Values

There is slight disagreement in the statistical community about how to calculate rs Critical Values.

Some use a table of Critical rs values. This table of values was created in 1938 in the journal The Annals of Mathematical Statistics.

Others use the formula for Critical rs as was done in this example. This formula is once again shown here as follows:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

The results are quite close. As sample size increase, the results of both methods converge. This is shown in the following comparison with α set to 0.05:

excel, excel 2010, excel 2013, statistics, correlation, spearman, r critical, p value, nonlinear correlation(Click On Image To See Larger Version)

 

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

No comments:

Post a Comment