Thursday, October 13, 2011

Some sentences on correlation

Correlation measures are among the most basic tools in statistical data analysis and machine learning.
They are applied to pairs of observations to measure the extent to which the two observations
comply with a certain model. The most prominent representative is surely Pearson’s product
moment coefficient [1, 13], often simply called the correlation coefficient for short. Pearson’s
product moment coefficient can be applied to numerical data and assumes a linear relationship as
the underlying model; therefore, it can be used to detect linear relationships, but not non-linear
ones.
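
To make the last point concrete, here is a minimal sketch in plain Python (not taken from the cited paper). The helper name `pearson` and the toy data are assumptions chosen for illustration: the same x values are paired once with an exact linear function and once with an exact quadratic one.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's product moment coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Illustrative data only: symmetric x values around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # exact linear relationship
quadratic = [v * v for v in x]    # exact relationship, but not linear

r_linear = pearson(x, linear)        # the linear dependence is picked up
r_quadratic = pearson(x, quadratic)  # the quadratic dependence is not
print(r_linear, r_quadratic)
```

On this data the coefficient is 1 for the linear pairing and 0 for the quadratic one, even though y is fully determined by x in both cases.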

Rank correlation measures [7, 10, 12] are intended to measure the extent to which a monotonic
function is able to model the inherent relationship between the two observables. They neither
assume a specific parametric model nor specific distributions of the observables. They can be
applied to ordinal data and, if some ordering relation is given, to numerical data too. Therefore,
rank correlation measures are ideally suited for detecting monotonic relationships, in particular if
more specific information about the data is not available. The two most common approaches are
Spearman’s rank correlation coefficient (Spearman’s rho for short) [14, 15] and Kendall’s tau (rank
correlation coefficient) [2, 9, 10]. Another simple rank correlation measure is the gamma rank
correlation measure according to Goodman and Kruskal [7].
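
The three measures named above can be sketched in a few lines of plain Python. This is an illustration using the common textbook definitions (Spearman's rho as Pearson's coefficient on average ranks, Kendall's tau in its tau-a form, gamma as the normalized difference of concordant and discordant pairs), not code from the cited paper. All function names are made up for this sketch.

```python
from itertools import combinations
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson's coefficient applied to the ranks instead of the raw values."""
    return pearson(average_ranks(x), average_ranks(y))

def pair_counts(x, y):
    """Count concordant and discordant pairs; pairs tied in x or y count as neither."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return conc, disc

def kendall_tau_a(x, y):
    conc, disc = pair_counts(x, y)
    n = len(x)
    return (conc - disc) / (n * (n - 1) / 2)

def goodman_kruskal_gamma(x, y):
    conc, disc = pair_counts(x, y)
    return (conc - disc) / (conc + disc)

# Illustrative data: mostly increasing, with two neighbouring swaps.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rho(x, y), kendall_tau_a(x, y), goodman_kruskal_gamma(x, y))
```

Tau-a and gamma differ only in the denominator: tau-a divides by the number of all pairs, gamma by the number of pairs that are not tied, so the two agree when there are no ties.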

The rank correlation measures cited above are designed for ordinal data. However, as argued in
[5], they are not ideally suited for measuring rank correlation for numerical data that are perturbed
by noise. Consequently, [5] introduces a family of robust rank correlation measures. The idea
is to replace the classical ordering of real numbers used in Goodman and Kruskal’s gamma [7]
by some fuzzy ordering [8, 3, 4] with smooth transitions, thereby ensuring that the correlation
measure is continuous with respect to the data.
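
The following is only a rough sketch of that idea in plain Python; it is not the RoCoCo package's API and may differ from the exact definitions in [5]. It assumes a strict fuzzy ordering that ramps linearly from 0 to 1 over a tolerance width, and it combines degrees with the minimum. The names `fuzzy_less`, `fuzzy_gamma`, `rx`, and `ry` are hypothetical.

```python
def fuzzy_less(a, b, r):
    """Degree (0..1) to which a < b; assumed linear ramp of width r.

    With r == 0 this is the classical crisp ordering.
    """
    if r == 0:
        return 1.0 if a < b else 0.0
    return min(1.0, max(0.0, (b - a) / r))

def fuzzy_gamma(x, y, rx, ry):
    """Gamma-style coefficient with graded concordance and discordance."""
    conc = disc = 0.0
    n = len(x)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            lx = fuzzy_less(x[i], x[j], rx)
            # Assumption: degrees are combined with the minimum.
            conc += min(lx, fuzzy_less(y[i], y[j], ry))
            disc += min(lx, fuzzy_less(y[j], y[i], ry))
    if conc + disc == 0:
        return float("nan")  # undefined when no pair is ordered at all
    return (conc - disc) / (conc + disc)

# Two data sets that differ only by a tiny swap of 0.01 in y.
x = [1.0, 2.0, 3.0, 4.0]
y_a = [1.00, 1.01, 2.0, 3.0]
y_b = [1.01, 1.00, 2.0, 3.0]

crisp_jump = fuzzy_gamma(x, y_a, 0, 0) - fuzzy_gamma(x, y_b, 0, 0)
smooth_jump = fuzzy_gamma(x, y_a, 0.5, 0.5) - fuzzy_gamma(x, y_b, 0.5, 0.5)
print(crisp_jump, smooth_jump)
```

With the crisp ordering, the tiny swap turns one concordant pair into a fully discordant one and the coefficient drops by a third. With the graded ordering, the same pair contributes only a small degree either way, so the coefficient barely moves.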

Quoted from "RoCoCo - An R Package Implementing a Robust Rank Correlation Coefficient and a Corresponding Test".
