## Tuesday, May 27, 2014

### Spearman Correlation Coefficient in 6 Steps in Excel 2010 and Excel 2013

This is one of the following four articles on Correlations in Excel

Overview of Correlation In Excel 2010 and Excel 2013

Pearson Correlation in 3 Steps in Excel 2010 and Excel 2013

Pearson Correlation – Calculating r Critical and p Value of r in Excel

Spearman Correlation in 6 Steps in Excel 2010 and Excel 2013

# Spearman Correlation in 6 Steps in Excel

The Spearman Correlation Coefficient is designated by either rs or by the Greek letter ρ, “rho.” As mentioned the Spearman correlation should be used instead of the Pearson correlation in any of the following circumstances:

1) An X-Y scatterplot of the data indicates that there is a nonlinear monotonic relationship between two variables. Monotonic simply means that one variable generally goes in one direction (either always up or always down) when the other variable moves in one direction.

2) There are significant outliers. The Pearson Correlation is very sensitive to outliers. The Spearman Correlation is not because the Spearman Correlation bases its calculation on the ranks and not the mean (as the Pearson Correlation does).

3) At least one of the variables is ordinal. An ordinal variable is one in which the order matters but the difference between values does not have meaning. A customer satisfaction scale or a Likert scale are examples of ordinal data. Satisfaction scales and Likert scales can be analyzed as interval data if the distance between values is considered to be the same. A Pearson correlation can be used if the variables are either interval or ration but cannot be used if any of the variables are ordinal.

## Spearman Correlation Formula

The Spearman Correlation Coefficient is defined as the Pearson Correlation Coefficient between ranked variables. The Spearman Correlation is sometimes called the Spearman Rank-Order Correlation or simply Spearman’s rho (ρ) and is calculated as follows: (Click On Image To See Larger Version)

For a sample of n (X-Y) data pairs, each Xi,Yi are converted to ranks xi,yi that appear in the preceding formula for Spearman’s rho.

### Tied Data Values

Tied valued of X or Y are assigned the average rank of the tied values. For example, if the 2rd, 3rd, and 4th X value were equal to 19, the rank assigned to each would be 3. This is the average rank, which would be calculated as follows:

Average rank = (Sum of ranks)/(Number of ranks) = (2 + 3 + 4)/3 = 3

The RANK.AVG() formula will correctly calculate ranks if there are tied values among the values to be ranked.

### No Ties Among Data Values

If there are no tied values of X or Y, the following simpler formula can be used to calculate Spearman’s rho: (Click On Image To See Larger Version)

## Spearman Correlation’s Only Two Required Assumptions

1) The variables can be ratio, interval, or ordinal, but not nominal. Nominal variables are simply labels whose order doesn’t mean anything. The Spearman Correlation is nonparametric, i.e., the test’s outcome is not affected by the distributions of the data being compared.

2) There is a monotonic relationship between the two variables.

## Example of Spearman Correlation in Excel

The Spearman Correlation Coefficient will be calculated for the following data: ### Step 1 – Plot the Data to Check For a Monotonic Relationship (Click On Image To See Larger Version)

A monotonic relationship exists is one variable generally moves in a single direction (either increasing or decreasing) as the other variable moves in a single direction. A monotonic relationship does not imply linearity. A monotonic relationship appears to exist between the X and Y variables. X values generally increase as Y values increase.

### Step 2 – Automatically Checking For Tied X or Y Values in a Data Column

Checking a column of data for tied values can be automated in Excel. The cell U8 has the following formula:

=IF(SUM(IF(FREQUENCY(\$S\$9:\$S\$15,\$S\$9:\$S\$15)>0,1))=COUNT(\$S\$9:\$S\$15),“There Are No Tied Values”,”There Are Tied Values”)

In this case, there were no tied values from S9 to S15, as is demonstrated by the output of the above formula in cell U8 which states “There Are No Tied Values.”

Cell U18 contains a similar formula. (Click On Image To See Larger Version)

## No Tied Values

### Step 3 – Calculate the Ranks of the X and Y Values

This can be done in a single step in Excel with the RANK.AVG() formula as follows: (Click On Image To See Larger Version)

The formula RANK.AVG() would also correctly calculate the ranks if there were any tied values.

### Step 4 – Calculate the Sum of the Square of the Rank Differences (Click On Image To See Larger Version)

### Step 5 – Calculate rs (Click On Image To See Larger Version)

rs is calculated in cell BH13 by the following formula:

=1-((6*BH8)/((BH10*(BH10^2-1))))

rs = 0.6786

### Step 6 – Determine If rs Is Significant

Method 1 – Compare rs to r Critical (Click On Image To See Larger Version) (Click On Image To See Larger Version)

r Critical in cell BS18 is calculated by the following formula:

=T.INV(1-AG16/2,AG14)/SQRT(T.INV(1-AG16/2,AG14)^2+AG14)

r Critical = 0.7545

rs is not significant at α = 0.05 because rs (0.6786) is less than r Critical (0.7545).

Method 2 – Compare the p Value to Alpha (Click On Image To See Larger Version) (Click On Image To See Larger Version)

The p Value in cell BS29 is calculated by the following formula:

=1-F.DIST(((AG14*S15^2)/(1-S22^2)),1,AG14,TRUE) = 0.0536

rs is not significant at α = 0.05 because the p Value (0.0536) is greater than Alpha (0.05). This rs would be significant at α = 0.10 but not at α = 0.05.

## If There Are Any Tied X or Y Values

### Step 3 – Calculate rs (Click On Image To See Larger Version)

Cell AN58 contains the formula: =SUM(AN50:AN56) = 68.96

Cell AP58 contains the formula: =SUM(AP50:AP56) = 105.93

Cell AQ58 contains the formula: =SUM(AQ50:AQ58) = 83.36

Cell AP61 contains the formula: =SQRT(AP58*AQ58) = 93.97 (Click On Image To See Larger Version)

rs is calculated in cell AP66 by the formula: =AN58/AP61 = 0.7339

### Step 4 – Determine If rs Is Significant

Method 1 – Compare rs to r Critical

r Critical in cell BS18 is calculated by the following formula:

=T.INV(1-AG16/2,AG14)/SQRT(T.INV(1-AG16/2,AG14)^2+AG14)

r Critical = 0.7545

rs is not significant at α = 0.05 because rs (0.7339) is less than r Critical (0.7545). This rs would be significant at α = 0.10 but not at α = 0.05.

Method 2 – Compare the p value to Alpha

p Value is calculated in cell BS29 by the following formula:

=1-F.DIST(((AG14*S15^2)/(1-S22^2)),1,AG14,TRUE) = 0.0536

rs is not significant at α = 0.05 because the p Value (0.0536) is greater than Alpha (0.05). This rs would be significant at α = 0.10 but not at α = 0.05.

### Two Different Methods Used to Calculate rs Critical Values

There is slight disagreement in the statistical community about how to calculate rs Critical Values.

Some use a table of Critical rs values. This table of values was created in 1938 in the journal The Annals of Mathematical Statistics.

Others use the formula for Critical rs as was done in this example. This formula is once again shown here as follows:

The results are quite close. As sample size increase, the results of both methods converge. This is shown in the following comparison with α set to 0.05: (Click On Image To See Larger Version)

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

• Histograms in Excel
• Bar Chart in Excel
• Combinations & Permutations in Excel
• Normal Distribution in Excel
• t-Distribution in Excel
• Binomial Distribution in Excel
• z-Tests in Excel
• t-Tests in Excel
• Hypothesis Tests of Proportion in Excel
• Chi-Square Independence Tests in Excel
• Chi-Square Goodness-Of-Fit Tests in Excel
• F Tests in Excel
• Correlation in Excel
• Pearson Correlation in Excel
• Spearman Correlation in Excel
• Confidence Intervals in Excel
• Simple Linear Regression in Excel
• Multiple Linear Regression in Excel
• Logistic Regression in Excel
• Single-Factor ANOVA in Excel
• Two-Factor ANOVA With Replication in Excel
• Two-Factor ANOVA Without Replication in Excel
• Randomized Block Design ANOVA in Excel
• Repeated-Measures ANOVA in Excel
• ANCOVA in Excel
• Normality Testing in Excel
• Nonparametric Testing in Excel
• Post Hoc Testing in Excel
• Creating Interactive Graphs of Statistical Distributions in Excel
• Solving Problems With Other Distributions in Excel
• Optimization With Excel Solver
• Chi-Square Population Variance Test in Excel
• Analyzing Data With Pivot Tables
• SEO Functions in Excel
• Time Series Analysis in Excel
• VLOOKUP