17.1.11.2 Algorithm (Correlation Coefficient)
Several correlation coefficients are available, each appropriate under different circumstances. The most frequently used is Pearson's product moment correlation coefficient.
Correlation Coefficients
Pearson's product moment correlation coefficient
Pearson's product moment correlation coefficient measures the strength of the linear relationship between two variables.
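Let $\sigma_X$ and $\sigma_Y$ be the standard deviations of two random variables $X$ and $Y$, respectively. Then the Pearson's product moment correlation coefficient between the variables is
$$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\big((X - E(X))(Y - E(Y))\big)}{\sigma_X \sigma_Y}$$
where E(.) denotes the expected value of the variable, and cov(.) means covariance.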
To use this method, make sure that the interval data come from paired observations and that the variables are normally distributed. The data should not contain extreme values, because outliers can strongly affect the result. Pearson's product moment correlation coefficient can also be misleadingly small when the variables have a non-linear relationship.
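As an illustration of the definition and of the non-linearity caveat, the following is a minimal Python sketch that computes the sample version of the coefficient from paired data. The function name `pearson_r` and the sample data are hypothetical and are not part of the product; the sketch only assumes the covariance-over-standard-deviations definition given above.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y) for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the common 1/n (or 1/(n-1)) factor cancels in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_linear = [2.0, 4.0, 6.0, 8.0, 10.0]    # exact linear relationship
y_parabola = [4.0, 1.0, 0.0, 1.0, 4.0]   # y = (x - 3)**2, strong but non-linear

r_linear = pearson_r(x, y_linear)
r_parabola = pearson_r(x, y_parabola)
print(r_linear, r_parabola)
```

In this example the parabola data are perfectly determined by x, yet the coefficient is zero because the relationship has no linear component.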
Spearman Rank Correlation Coefficient
Spearman rank correlation coefficient is a non-parametric measure; therefore, it is suitable for data that is not normally distributed. It works better than Pearson's coefficient in detecting a monotonic non-linear relationship between two variables. It can be defined as
$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$
where d is the difference in statistical rank of corresponding variables and N is the number of observations.
Because statistical rank is just the ordinal number of a value in a list, Spearman Rank correlation coefficient can be computed even when actual values of the variables are unknown.
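The rank-difference idea can be sketched in a few lines of Python. This sketch assumes the commonly used rank-difference formula 1 - 6*sum(d^2) / (N*(N^2 - 1)), which is exact only when there are no tied values; the helper names `ranks` and `spearman_rho` are hypothetical, and the sketch is not necessarily how the product handles ties.

```python
def ranks(values):
    """Rank of each value (1 = smallest). Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman coefficient from rank differences (no-ties formula)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    rank_x, rank_y = ranks(x), ranks(y)
    # d is the difference between the ranks of corresponding observations.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Monotonic but non-linear data (y = x**3): the ranks agree perfectly.
rho_monotonic = spearman_rho([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 8.0, 27.0, 64.0, 125.0])

# Only the ordering matters, so any values with the same ordering give the same result.
rho_mixed = spearman_rho([10, 20, 30, 40, 50], [2, 1, 4, 3, 5])
print(rho_monotonic, rho_mixed)
```

Because only the ranks enter the computation, rescaling either variable by any strictly increasing transformation leaves the result unchanged.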
Kendall correlation coefficient
Kendall correlation coefficient, or Kendall tau, is equivalent to Spearman R in terms of assumptions and statistical power. However, the Kendall correlation coefficient has a more intuitive interpretation and a simpler algebraic structure. Furthermore, it does not require ordering of the data before the computation.
Kendall correlation coefficient can be computed by
where C is the number of concordant pairs (pairs of observations whose differences in the two variables have the same sign), D is the number of discordant pairs (pairs of observations whose differences in the two variables have opposite signs), and q is defined in Significance Level of r.
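Counting concordant and discordant pairs can be sketched as follows. The normalization used here, (C - D) divided by the total number of pairs N(N - 1)/2, is the textbook tau-a form for data without ties; it is an assumption made for illustration and may differ from the q-based expression referred to above. The function names are hypothetical.

```python
from itertools import combinations

def concordance_counts(x, y):
    """Return (C, D): the numbers of concordant and discordant pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:        # differences in x and y have the same sign
            concordant += 1
        elif sign < 0:      # differences in x and y have opposite signs
            discordant += 1
        # sign == 0 means a tie in x or y; such pairs are counted in neither group
    return concordant, discordant

def kendall_tau_a(x, y):
    """Tau-a: (C - D) over the total number of pairs (assumed normalization)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    c, d = concordance_counts(x, y)
    return (c - d) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
c, d = concordance_counts(x, y)
tau = kendall_tau_a(x, y)
print(c, d, tau)
```

Note that the loop visits every pair directly, so no prior sorting of the data is needed.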
Significance of R
Pearson and Spearman types
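For Pearson and Spearman correlation types, let
$$t = r \sqrt{\frac{N - 2}{1 - r^2}}$$
where r is the correlation of two variables and N is the number of observations.
Then, under the null hypothesis of zero correlation, t follows a t-distribution with N-2 degrees of freedom. The two-tailed significance level can be calculated as:
$$p = 2\,\big(1 - F_{N-2}(|t|)\big)$$
where $F_{N-2}$ is the cumulative distribution function of the t-distribution with N-2 degrees of freedom.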
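The test can be sketched with the Python standard library only. This sketch assumes the usual statistic t = r * sqrt((N - 2) / (1 - r^2)) and obtains the two-tailed level by numerically integrating the t density with Simpson's rule; a statistics library would normally supply the t-distribution CDF directly. All function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for a correlation r from n observations."""
    if n < 3:
        raise ValueError("at least three observations are required")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=20000):
    """Two-tailed level 2 * (1 - F(|t|)), using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

r, n = 0.8, 5
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(t, p)
```

With only five observations, even a correlation of 0.8 does not reach the conventional 0.05 level, which shows how strongly the result depends on N.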
Kendall type
For Kendall correlation type, let
where
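Then the distribution of z is approximated by a standard normal distribution, and the two-tailed significance level is:
$$p = 2\,\big(1 - \Phi(|z|)\big)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.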
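For a statistic that is approximately standard normal, the two-tailed level is 2(1 - Phi(|z|)), which can be evaluated with the complementary error function. The sketch below takes z as given and does not compute it from the data; the function name is hypothetical.

```python
import math

def two_tailed_p_normal(z):
    """Two-tailed level for a standard normal statistic z."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

p_zero = two_tailed_p_normal(0.0)     # no evidence against the null hypothesis
p_typical = two_tailed_p_normal(1.96) # close to the conventional 0.05 level
print(p_zero, p_typical)
```

Using `erfc` rather than `1 - erf` avoids loss of precision for large |z|, where the level is very small.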