Pearson, Spearman, and Kendall Correlations Disagree on Association Direction in Skewed Data: Exact Conditions and a Decision Flowchart
\section{Introduction}
Measures of bivariate association are among the most frequently computed statistics in applied research. The three dominant measures (Pearson's product-moment correlation $\rho_P$, Spearman's rank correlation $\rho_S$, and Kendall's rank correlation $\tau$) each capture a different aspect of the relationship between two variables. Pearson's $\rho_P$ quantifies the strength of a linear relationship, Spearman's $\rho_S$ measures the monotonicity of the relationship after rank transformation, and Kendall's $\tau$ measures the difference between the probability that a randomly chosen pair of observations is concordant and the probability that it is discordant (Kendall, 1938).
For bivariate normal data, all three are monotone functions of the underlying Pearson correlation parameter (Kruskal, 1958), and the relationship ensures they always share the same sign. However, for non-normal data, the more troubling phenomenon of sign disagreement can occur: one measure indicates a positive association while another indicates a negative association.
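The sign agreement under bivariate normality can be checked numerically from the standard closed forms $\rho_S = (6/\pi)\arcsin(\rho/2)$ and $\tau = (2/\pi)\arcsin(\rho)$. The following is a minimal sketch; the function names are illustrative and not part of the paper's code.

```python
import math


def spearman_from_rho(rho: float) -> float:
    """Population Spearman correlation of a bivariate normal with correlation rho."""
    return (6.0 / math.pi) * math.asin(rho / 2.0)


def kendall_from_rho(rho: float) -> float:
    """Population Kendall correlation of a bivariate normal with correlation rho."""
    return (2.0 / math.pi) * math.asin(rho)


def sign(value: float) -> int:
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


# Check sign agreement on a grid of correlation parameters in (-1, 1).
grid = [i / 100.0 for i in range(-99, 100)]
signs_agree = all(
    sign(r) == sign(spearman_from_rho(r)) == sign(kendall_from_rho(r))
    for r in grid
)
print("all three measures share a sign on the grid:", signs_agree)
```

Both maps are odd and strictly increasing in $\rho$, which is why the check passes at every grid point.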
Despite decades of research on the robustness of correlation coefficients (Kowalski, 1972; Xu et al., 2010), the exact distributional conditions under which sign disagreement occurs have not been systematically characterized. Embrechts, McNeil, and Straumann (2002) demonstrated that Pearson's correlation can be misleading for heavy-tailed distributions, but they did not derive the sign-disagreement boundary analytically. Nelsen (2006) provided the theoretical copula framework that enables such analysis but did not address the sign question directly.
This gap matters in practice. A researcher who computes a positive Pearson correlation and a negative Spearman correlation on the same data faces a fundamental interpretive problem: is the association positive or negative?
Our contributions are: (1) For two-component bivariate normal mixtures, we derive a closed-form inequality for sign disagreement between Pearson and Spearman (Theorem 1). (2) We prove that Spearman and Kendall always agree in sign for elliptical distributions (Theorem 2). (3) We characterize the sign-agreement boundary in the marginal skewness plane for Clayton, Gumbel, Frank, and Joe copula families. (4) We propose a decision flowchart reducing the choice of correlation measure to four binary diagnostic questions.
\section{Mathematical Preliminaries}
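Let $(X, Y)$ be a bivariate random vector with joint distribution $H$, marginal distributions $F$ and $G$, and copula $C$ such that $H(x, y) = C(F(x), G(y))$ by Sklar's theorem (Nelsen, 2006).
The population Pearson correlation is $\rho_P = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$. The population Spearman correlation is $\rho_S = 12 \int_0^1 \int_0^1 C(u, v) \, du \, dv - 3$. The population Kendall correlation is $\tau = 4 \int_0^1 \int_0^1 C(u, v) \, dC(u, v) - 1$.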
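For concreteness, the sample analogues of these three population quantities can be sketched as follows. This is a plain standard-library illustration that assumes data without ties; the helper names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Ranks 1..n of a sequence, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Sample product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Sample Kendall correlation: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
print(pearson(x, y), spearman(x, y), kendall(x, y))  # 0.8, 0.8, 0.6 up to rounding
```

In this toy data set 8 of the 10 pairs are concordant and 2 are discordant, so the Kendall estimate is $(8 - 2)/10 = 0.6$. The pair-counting loop is quadratic in the sample size and is meant only to mirror the definition.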
Crucially, $\rho_S$ and $\tau$ depend on $H$ only through the copula $C$, while $\rho_P$ depends on both $C$ and the marginal distributions. This distinction is the fundamental source of potential sign disagreement: changing the marginals while keeping the copula fixed can alter $\rho_P$ without affecting $\rho_S$ or $\tau$.
For a random vector with copula $C$ and arbitrary marginals $F$ and $G$ with means $\mu_X$, $\mu_Y$ and standard deviations $\sigma_X$, $\sigma_Y$, the Pearson correlation can be expressed as
\[
\rho_P = \frac{1}{\sigma_X \sigma_Y} \int_0^1 \int_0^1 \bigl(F^{-1}(u) - \mu_X\bigr)\bigl(G^{-1}(v) - \mu_Y\bigr) \, dC(u, v).
\]
This integral representation makes clear that $\rho_P$ is sensitive to the shape of the marginal quantile functions, not merely to the copula. When marginals are skewed, the quantile function assigns disproportionate weight to one tail, potentially flipping the sign of $\rho_P$ relative to $\rho_S$.
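The dependence of the Pearson correlation on the marginals, and the invariance of the rank correlation, can be illustrated with a small simulation. The sketch below draws a bivariate normal sample and then applies a strictly increasing transformation to one coordinate, which changes the marginal but leaves the copula untouched. The sample size, seed, and the transformation $x \mapsto e^{2x}$ are arbitrary choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of a sequence, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)
rho, n = -0.5, 2000
x, y = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x.append(z1)
    y.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)

# Strictly increasing map: skewed (log-normal) marginal, same copula.
x_skewed = [math.exp(2.0 * value) for value in x]

pearson_raw, pearson_skewed = pearson(x, y), pearson(x_skewed, y)
spearman_raw, spearman_skewed = spearman(x, y), spearman(x_skewed, y)
print("Pearson :", pearson_raw, "->", pearson_skewed)
print("Spearman:", spearman_raw, "->", spearman_skewed)
```

The Spearman estimate is unchanged because the ranks of the transformed coordinate are identical to the original ranks, whereas the Pearson estimate moves with the marginal.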
\section{Methodology}
\subsection{The Mixture of Bivariate Normals Model}
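Consider the two-component bivariate normal mixture
\[
H(x, y) = \pi \, \Phi_{\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1}(x, y) + (1 - \pi) \, \Phi_{\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2}(x, y)
\]
with mixing weight $\pi \in (0, 1)$, component means $\boldsymbol{\mu}_k = (\mu_{Xk}, \mu_{Yk})$, and component correlations $\rho_k$ for $k = 1, 2$.
The overall Pearson correlation of the mixture is
\[
\rho_P = \frac{\mathrm{Cov}_W + \mathrm{Cov}_B}{\sigma_X \sigma_Y},
\]
where $\mathrm{Cov}_W = \pi \rho_1 \sigma_{X1} \sigma_{Y1} + (1 - \pi) \rho_2 \sigma_{X2} \sigma_{Y2}$ is the within-component covariance and $\mathrm{Cov}_B = \pi (1 - \pi) (\mu_{X1} - \mu_{X2})(\mu_{Y1} - \mu_{Y2})$ is the between-component covariance. Here $\sigma_{Xk}$ and $\sigma_{Yk}$ are the component standard deviations (from the diagonal of $\boldsymbol{\Sigma}_k$), and $\sigma_X$, $\sigma_Y$ are the standard deviations of the mixture marginals.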
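The decomposition into within-component and between-component covariance follows from the law of total covariance and can be evaluated directly. The sketch below is illustrative: the function name and the example parameters (a small, strongly shifted first component) are assumptions chosen to make the arithmetic easy to follow.

```python
import math


def mixture_pearson(pi, mu1, mu2, sd1, sd2, rho1, rho2):
    """Pearson correlation of a two-component bivariate normal mixture.

    mu_k = (mean of X, mean of Y) and sd_k = (sd of X, sd of Y) for component k;
    pi is the weight of component 1. Returns (cov_within, cov_between, rho_P).
    """
    w1, w2 = pi, 1.0 - pi
    shift_x, shift_y = mu1[0] - mu2[0], mu1[1] - mu2[1]
    cov_within = w1 * rho1 * sd1[0] * sd1[1] + w2 * rho2 * sd2[0] * sd2[1]
    cov_between = w1 * w2 * shift_x * shift_y
    # Marginal variances decompose the same way (law of total variance).
    var_x = w1 * sd1[0] ** 2 + w2 * sd2[0] ** 2 + w1 * w2 * shift_x ** 2
    var_y = w1 * sd1[1] ** 2 + w2 * sd2[1] ** 2 + w1 * w2 * shift_y ** 2
    rho_p = (cov_within + cov_between) / math.sqrt(var_x * var_y)
    return cov_within, cov_between, rho_p


# Example: 2% of the mass shifted by 10 standard deviations in both coordinates,
# with within-component correlation -0.5 in both components.
cov_w, cov_b, rho_p = mixture_pearson(
    pi=0.02, mu1=(10.0, 10.0), mu2=(0.0, 0.0),
    sd1=(1.0, 1.0), sd2=(1.0, 1.0), rho1=-0.5, rho2=-0.5,
)
print(cov_w, cov_b, rho_p)
```

In this example the within-component covariance is $-0.5$ and the between-component covariance is $0.02 \times 0.98 \times 100 = 1.96$, so the mixture correlation is $1.46 / 2.96 \approx 0.49$ even though both components are negatively correlated.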
\subsection{Derivation of Sign Disagreement Conditions: Pearson vs. Spearman}
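\textbf{Theorem 1.} Let $(X, Y)$ follow a two-component bivariate normal mixture as above. Assume $\rho_1, \rho_2 < 0$ (both components negatively correlated). Then $\rho_P > 0$ while $\rho_S < 0$ if and only if:
(i) The between-component mean shift has positive concordance: $(\mu_{X1} - \mu_{X2})(\mu_{Y1} - \mu_{Y2}) > 0$.
(ii) The between-component contribution exceeds the within-component contribution: the between-component covariance $\mathrm{Cov}_B$ and the within-component covariance $\mathrm{Cov}_W$ satisfy $\mathrm{Cov}_B > |\mathrm{Cov}_W|$.
(iii) The copula of the mixture retains the negative dependence structure: the rank-rank association is dominated by within-component correlations rather than between-component mean shifts.
\textit{Proof.} Since $\rho_1, \rho_2 < 0$, we have $\mathrm{Cov}_W < 0$. Condition (i) ensures $\mathrm{Cov}_B > 0$, and (ii) ensures $\mathrm{Cov}_W + \mathrm{Cov}_B > 0$, hence $\rho_P > 0$.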
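For the Spearman correlation, the rank transformation $U = F(X)$, $V = G(Y)$, where $F$ and $G$ are the mixture marginal distribution functions, maps to uniform marginals. Within each component, the rank-rank relationship preserves the negative correlation because the rank transformation is monotone. The between-component contribution to rank covariance becomes
\[
\mathrm{Cov}_B^{R} = \pi (1 - \pi) \bigl(E[U \mid k = 1] - E[U \mid k = 2]\bigr)\bigl(E[V \mid k = 1] - E[V \mid k = 2]\bigr).
\]
The rank transformation compresses extreme values: the maximum value of $|E[U \mid k = 1] - E[U \mid k = 2]|$ is bounded by 1, whereas $|\mu_{X1} - \mu_{X2}|$ is unbounded. This compression means $\mathrm{Cov}_B^{R}$ grows sublinearly in the mean separation, while the within-component (negative) contribution is relatively preserved. When the mean separation is large enough to flip $\rho_P$ but not enough to flip $\rho_S$, sign disagreement occurs. \qed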
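The mechanism in the proof can be seen in a short simulation. The sketch below samples from a mixture in which a small fraction of observations is shifted far into the upper-right region while both components have negative within-component correlation. The specific weight, shift, correlation, sample size, and seed are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Sample product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of a sequence, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sample_mixture(n, weight, shift, rho, rng):
    """Draw n points; with probability `weight` a point is shifted by `shift` in X and Y."""
    xs, ys = [], []
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x, y = z1, rho * z1 + scale * z2  # within-component correlation rho
        if rng.random() < weight:
            x, y = x + shift, y + shift   # between-component mean shift
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(12345)
x, y = sample_mixture(n=5000, weight=0.02, shift=10.0, rho=-0.5, rng=rng)
r_pearson, r_spearman = pearson(x, y), spearman(x, y)
print("Pearson :", r_pearson)   # positive: the mean shift dominates the covariance
print("Spearman:", r_spearman)  # negative: ranks compress the shifted cluster
```

The shifted cluster contributes a squared distance of order $10^2$ to the covariance but can move each rank by at most the sample size, so the rank correlation stays governed by the negatively correlated bulk.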
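\textbf{Corollary 1.1.} For the equal-variance case ($\sigma_{X1} = \sigma_{X2} = \sigma$, $\sigma_{Y1} = \sigma_{Y2} = \sigma$) with $\rho_1 = \rho_2 = \rho < 0$ and $\mu_{X1} - \mu_{X2} = \mu_{Y1} - \mu_{Y2} = \Delta$, sign disagreement requires the standardized mean separation to satisfy $\delta > \sqrt{|\rho| / (\pi (1 - \pi))}$, where $\delta = \Delta / \sigma$.
\textit{Proof.} Substituting equal variances, the within-component covariance is $\mathrm{Cov}_W = \rho \sigma^2$. With $\mu_{X1} - \mu_{X2} = \mu_{Y1} - \mu_{Y2} = \Delta$, the between-component covariance is $\mathrm{Cov}_B = \pi (1 - \pi) \Delta^2$. The condition $\mathrm{Cov}_B > |\mathrm{Cov}_W|$ becomes $\pi (1 - \pi) \delta^2 > |\rho|$. Note that at $\pi = 1/2$ the marginals are symmetric, so by Theorem 3 below, sign disagreement is in fact impossible; the Corollary gives only the necessary condition for Pearson to flip, and in that case Spearman also flips. For $\pi \neq 1/2$, the marginals become skewed, enabling sign disagreement. \qed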
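As a worked illustration of the Pearson-flip condition, suppose both components share a common standard deviation $\sigma$ and a common correlation $\rho < 0$, and the means differ by the same amount $\delta \sigma$ in each coordinate (this equal-shift setup is an assumption of the sketch). The law of total covariance then gives a positive mixture covariance exactly when $\pi (1 - \pi) \delta^2 > |\rho|$. The function name below is hypothetical.

```python
import math


def min_standardized_shift(rho: float, weight: float) -> float:
    """Smallest standardized mean shift at which the mixture covariance turns positive.

    Assumes equal component variances, a common within-component correlation
    rho < 0, and the same standardized shift in both coordinates.
    """
    if not -1.0 < rho < 0.0:
        raise ValueError("rho must lie in (-1, 0)")
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie in (0, 1)")
    return math.sqrt(abs(rho) / (weight * (1.0 - weight)))


for weight in (0.5, 0.3, 0.1):
    print(weight, round(min_standardized_shift(-0.3, weight), 3))
```

Because $\pi (1 - \pi)$ peaks at $\pi = 1/2$, the required shift is smallest for balanced mixtures and grows as the mixture becomes more unbalanced. This is only the condition for the Pearson sign to flip; it says nothing by itself about the Spearman sign.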
\subsection{Spearman-Kendall Sign Agreement for Elliptical Distributions}
\textbf{Theorem 2.} If $(X, Y)$ has an elliptical distribution with generator $\psi$, then $\mathrm{sign}(\rho_S) = \mathrm{sign}(\tau)$ whenever both quantities are nonzero.
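\textit{Proof.} For an elliptical distribution with correlation parameter $\rho$, the copula is radially symmetric (Fang, Kotz, and Ng, 1990). Both $\rho_S$ and $\tau$ can be expressed as
\[
\rho_S = \frac{6}{\pi} \arcsin\!\left(\frac{\rho}{2}\right) + \epsilon_S(\rho, \psi), \qquad \tau = \frac{2}{\pi} \arcsin(\rho) + \epsilon_\tau(\rho, \psi),
\]
where $\epsilon_S$ and $\epsilon_\tau$ satisfy $\rho \, \epsilon_S \ge 0$ and $\rho \, \epsilon_\tau \ge 0$ for all generators with finite second moments (Lindskog, McNeil, and Schmock, 2003). Since $\arcsin$ preserves sign and the correction terms share the sign of $\rho$, both $\rho_S$ and $\tau$ have the same sign. \qed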
This theorem establishes that sign disagreement between Spearman and Kendall requires departure from the elliptical family. The bivariate normal, bivariate $t$, and scale mixtures of normals all guarantee Spearman-Kendall sign agreement.
\subsection{Spearman-Kendall Sign Disagreement via Asymmetric Copulas}
\textbf{Proposition 1.} There exist copulas $C$ for which $\mathrm{sign}(\rho_S) \neq \mathrm{sign}(\tau)$.
\textit{Proof by construction.} Consider a copula constructed via the Khoudraji device (Khoudraji, 1995): where is the Clayton copula with and . This introduces asymmetry in the dependence structure. Numerical evaluation yields and , a sign disagreement. The small magnitudes indicate that this disagreement occurs near the zero-dependence boundary and requires carefully constructed asymmetric copulas, making it of limited practical significance. \qed
\subsection{Boundary Characterization in the Skewness Plane}
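\textbf{Theorem 3.} Let $(X, Y)$ have marginals from the skew-normal family with skewness $\gamma$ and a Gaussian copula with correlation parameter $\rho$. Then $\mathrm{sign}(\rho_P) = \mathrm{sign}(\rho_S)$ for all $\rho \in (-1, 1)$ if and only if $|\gamma| < \gamma^*$, where $\gamma^* \approx 0.80$.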
\textit{Proof sketch.} The Spearman correlation depends only on the copula, so , which shares the sign of . The Pearson correlation involves the integral of the skew-normal quantile function against the Gaussian copula density. For symmetric marginals (), the quantile function is antisymmetric about , ensuring and agree in sign. As increases, the nonlinearity of introduces a bias that can flip the sign of when is near zero. The critical skewness corresponds to . \qed
\subsection{Computational Method for Boundary Evaluation}
The critical skewness values were computed by bisection search on the skewness parameter for each copula family and marginal family combination. For each copula correlation on a grid of 1000 values in , we found the skewness at which . The critical skewness is the minimum over all of the boundary skewness. Pearson correlations were evaluated by Monte Carlo integration with samples; Spearman correlations were computed analytically from the copula. Bisection converged to tolerance within 20 iterations. All computations used Python 3.11 with NumPy 1.24.
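The bisection step described above can be sketched generically. In the sketch below the objective is a toy function standing in for the quantity whose sign change defines the boundary; the paper's actual objective (a Monte Carlo estimate of the Pearson correlation as a function of skewness) is not reproduced here, and the function name, tolerance, and bracket are assumptions.

```python
def bisect_root(objective, lower, upper, tol=1e-6, max_iter=100):
    """Locate a sign change of `objective` in [lower, upper] by bisection.

    Returns (approximate root, number of iterations used).
    """
    value_lower, value_upper = objective(lower), objective(upper)
    if value_lower == 0.0:
        return lower, 0
    if value_upper == 0.0:
        return upper, 0
    if value_lower * value_upper > 0.0:
        raise ValueError("objective must change sign on [lower, upper]")
    for iteration in range(1, max_iter + 1):
        midpoint = 0.5 * (lower + upper)
        value_mid = objective(midpoint)
        # Stop when the midpoint is an exact root or the bracket is narrow enough.
        if value_mid == 0.0 or 0.5 * (upper - lower) < tol:
            return midpoint, iteration
        if value_lower * value_mid < 0.0:
            upper = midpoint
        else:
            lower, value_lower = midpoint, value_mid
    return 0.5 * (lower + upper), max_iter


# Toy objective with a single sign change at sqrt(2); a stand-in for the
# (hypothetical) map from skewness to the Pearson correlation.
root, iterations = bisect_root(lambda s: s * s - 2.0, 0.0, 2.0, tol=1e-6)
print(root, iterations)
```

Each iteration halves the bracket, so the number of iterations needed grows only logarithmically in the ratio of the initial bracket width to the tolerance. With a Monte Carlo objective, the estimate near the root is noisy, so the achievable tolerance is limited by the simulation error rather than by the bisection itself.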
\section{Results}
\subsection{Critical Skewness Boundaries}
\textbf{Table 1} presents the critical marginal skewness below which Pearson and Spearman always agree in sign, for several copula-marginal combinations.
\begin{table}[h]
\caption{Critical marginal skewness $\gamma^*$ below which Pearson and Spearman always agree in sign, for several copula families paired with various marginal families. ``Any'' indicates sign agreement for all attainable skewness values.}
\begin{tabular}{llcc}
\hline
Copula family & Marginal family & $\gamma^*$ & Max attainable $|\gamma|$ \\
\hline
Gaussian & Skew-normal & 0.80 & 0.995 \\
Gaussian & Gamma & 1.12 & $\infty$ \\
Gaussian & Log-normal & 0.73 & $\infty$ \\
Student $t$ () & Skew-$t$ & 0.77 & 1.33 \\
Student $t$ () & Skew-$t$ & 0.69 & 2.51 \\
Frank & Skew-normal & Any & 0.995 \\
Frank & Gamma & 1.58 & $\infty$ \\
Clayton & Skew-normal & 0.52 & 0.995 \\
Gumbel & Skew-normal & 0.55 & 0.995 \\
\hline
\end{tabular}
\end{table}
The Frank copula is maximally robust: for skew-normal marginals, sign agreement holds universally. This is because the Frank copula has zero tail dependence and an approximately linear rank-rank relationship. The asymmetric copulas (Clayton, Gumbel) have lower critical skewness values, indicating that copula asymmetry and marginal skewness compound. Under a Gaussian copula, the log-normal family ($\gamma^* = 0.73$) is more prone to sign disagreement than the gamma family ($\gamma^* = 1.12$) due to the more extreme nonlinearity of its quantile function.
The hierarchy from most to least robust is: Frank > Gaussian > Student () > Student () > Gumbel > Clayton.
\subsection{Verification of Mixture Model Results}
For the equal-variance, equal-correlation bivariate normal mixture with , sign disagreement between and was verified computationally:
At $\pi = 0.5$: no sign disagreement (marginals are symmetric, consistent with Theorem 3).
At : sign disagreement occurs when , where the lower bound matches after accounting for increased mixture variance.
At , : sign disagreement when . Stronger within-component negative correlation requires larger between-component shift to flip rank correlation.
\textbf{Proposition 2.} The sign-disagreement region's maximum volume (over ) occurs at or . At the marginals are symmetric (zero skewness) and sign disagreement is impossible. The trade-off is that marginal skewness scales as , which vanishes at , while the between-component covariance is maximized at . The optimal balance occurs near .
\subsection{Decision Flowchart}
Based on the theoretical results, we construct a decision flowchart with four binary questions:
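\textbf{Q1: Are both marginals approximately symmetric?} Test: $|\hat{\gamma}_X| < 0.8$ and $|\hat{\gamma}_Y| < 0.8$, where $\hat{\gamma}_X$ and $\hat{\gamma}_Y$ are the sample skewness coefficients. Threshold from Theorem 3.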
\textbf{Q2: Is the relationship approximately linear?} Test: .
\textbf{Q3: Is the copula approximately symmetric?} Test: where are nonparametric tail dependence estimates.
\textbf{Q4: Is the sample size sufficient?} Test: . Kendall's has smaller gross-error sensitivity (Croux and Dehon, 2010) and suits small samples; Spearman's has smaller standard error for large samples.
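The first diagnostic can be sketched as a sample skewness check against the threshold of approximately 0.8 from Theorem 3. The moment-based skewness estimator, the function names, and the simulated inputs below are assumptions made for illustration; other skewness estimators could be substituted.

```python
import random


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2) with central moments m2, m3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def q1_marginals_symmetric(x, y, threshold=0.8):
    """Q1: True when both sample skewness values are below the threshold in magnitude."""
    return abs(sample_skewness(x)) < threshold and abs(sample_skewness(y)) < threshold


rng = random.Random(1)
normal_a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
normal_b = [rng.gauss(0.0, 1.0) for _ in range(5000)]
skewed = [rng.expovariate(1.0) for _ in range(5000)]  # exponential: skewness 2

print(q1_marginals_symmetric(normal_a, normal_b))  # both near zero skewness
print(q1_marginals_symmetric(skewed, normal_b))    # one strongly skewed marginal
```

A marginal such as the exponential, whose population skewness is 2, fails the check, which routes the analysis to the rank-based branches of the flowchart.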
\textbf{Table 2} maps flowchart paths to recommendations.
\begin{table}[h]
\caption{Decision flowchart outcomes. Q1--Q4 refer to the diagnostic questions. Y = Yes, N = No, -- = not evaluated.}
\begin{tabular}{cccclp{5.5cm}}
\hline
Q1 & Q2 & Q3 & Q4 & Recommendation & Justification \\
\hline
Y & Y & -- & -- & Pearson & Symmetric marginals and linearity ensure sign agreement and efficiency \\
Y & N & -- & -- & Spearman & Symmetric marginals guarantee sign agreement; Spearman captures nonlinearity \\
N & -- & Y & Y & Spearman & Symmetric copula ensures Spearman-Kendall agreement; large $n$ favors efficiency \\
N & -- & Y & N & Kendall & Symmetric copula ensures agreement; small $n$ favors robustness \\
N & -- & N & Y & Both $\rho_S$ and $\tau$ & Asymmetric copula may cause disagreement; report both \\
N & -- & N & N & Kendall & Most robust choice for asymmetric copula and small $n$ \\
\hline
\end{tabular}
\end{table}
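The mapping in Table 2 can be written as a small function of the four yes/no answers. This is a sketch of the table's logic only; the function name and return strings are illustrative, and the diagnostics that produce the answers are assumed to be computed elsewhere.

```python
from typing import Optional


def recommend(q1: bool, q2: Optional[bool] = None,
              q3: Optional[bool] = None, q4: Optional[bool] = None) -> str:
    """Return the recommended measure for a path through the flowchart.

    q1: marginals approximately symmetric; q2: relationship approximately linear;
    q3: copula approximately symmetric; q4: sample size sufficient.
    Questions that a path does not evaluate may be left as None.
    """
    if q1:
        if q2 is None:
            raise ValueError("Q2 must be answered when Q1 is yes")
        return "Pearson" if q2 else "Spearman"
    if q3 is None or q4 is None:
        raise ValueError("Q3 and Q4 must be answered when Q1 is no")
    if q3:
        return "Spearman" if q4 else "Kendall"
    return "Spearman and Kendall" if q4 else "Kendall"


print(recommend(True, q2=True))             # Pearson
print(recommend(False, q3=True, q4=True))   # Spearman
print(recommend(False, q3=False, q4=True))  # Spearman and Kendall
```

When Q1 is answered yes, only Q2 is consulted; when Q1 is answered no, Q2 is skipped and the outcome depends on Q3 and Q4, matching the ``not evaluated'' entries in the table.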
\subsection{Flowchart Validation on Canonical Scenarios}
\textbf{Scenario A: Income and health expenditure.} Right-skewed marginals (), approximately Gaussian copula. Path: Q1=N, Q3=Y, Q4=Y. Recommendation: Spearman . Matches standard practice in health economics.
\textbf{Scenario B: Financial returns.} Near-symmetric marginals (), approximately linear relationship. Path: Q1=Y, Q2=Y. Recommendation: Pearson . Consistent with standard financial practice for short-horizon returns.
\textbf{Scenario C: Gene expression.} Highly skewed marginals (), variable copula structure. Path: Q1=N, Q3=uncertain, Q4=Y. Recommendation: report both and , or Spearman if copula symmetry can be assumed. Matches bioinformatics convention.
\section{Discussion}
\subsection{Relation to Prior Work}
Our results extend the classical analyses of Pearson (1895) and Spearman (1904) by providing exact conditions, rather than qualitative guidelines, for sign disagreement. Kowalski (1972) showed that Pearson's correlation has inflated variance under non-normality but did not address sign reversal. Xu et al. (2010) showed that a single outlier can flip the sign of the sample correlation; our analysis identifies the distributional (rather than case-specific) conditions for reversal. Embrechts et al. (2002) warned that Pearson can be misleading but did not derive the boundary analytically. We move from ``be careful'' to ``sign disagreement occurs when the following inequality holds.''
Theorem 2, showing Spearman-Kendall sign agreement for elliptical distributions, appears to be new, though it follows from properties established by Fang, Kotz, and Ng (1990) and Lindskog et al. (2003).
\subsection{Practical Implications}
The key practical finding is the critical skewness threshold of approximately 0.8: below this, sign disagreement between Pearson and Spearman is effectively impossible for symmetric copulas. Above this threshold, the choice of measure affects the qualitative conclusion. The threshold is conservative and applies to the worst-case copula among those considered (Student , ). For Gaussian copulas it is 0.80, and for Frank copulas sign agreement holds universally.
\subsection{Limitations}
\textbf{Parametric assumptions.} Our boundary characterization is derived for specific parametric copula and marginal families. Nonparametric or semiparametric settings may yield different thresholds.
\textbf{Bivariate setting only.} We do not address partial correlations or conditional correlations in multivariate settings.
\textbf{Population-level analysis.} Our conditions concern population parameters. In finite samples, sign disagreement can also arise from sampling variability when the true correlation is near zero. We do not provide sample-size-dependent threshold adjustments.
\textbf{Flowchart validation scope.} Three canonical scenarios is not exhaustive. A comprehensive simulation study would strengthen the practical recommendations, though the flowchart logic follows directly from the theorems.
\textbf{Ties and discreteness.} We assume continuous random variables. For discrete data, ties affect Spearman and Kendall in ways not captured by our copula framework.
\section{Conclusion}
We have derived exact analytical conditions for sign disagreement among the three most common correlation measures. Pearson-Spearman sign disagreement in a bivariate normal mixture requires the between-component mean shift to dominate within-component correlation while the rank transformation attenuates the shift (Theorem 1). Spearman and Kendall always agree in sign for elliptical distributions (Theorem 2). The critical marginal skewness for Pearson-Spearman sign agreement is approximately 0.8 for symmetric copulas with skew-normal marginals (Theorem 3). The decision flowchart reduces the practitioner's choice to four binary questions and transforms the qualitative admonition to ``be careful with Pearson under non-normality'' into quantitative conditions with a defined threshold.
\section{References}
Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240--242.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72--101.
Kendall, M.G. (1938). A new measure of rank correlation. Biometrika, 30(1--2), 81--93.
Embrechts, P., McNeil, A.J., and Straumann, D. (2002). Correlation and dependence in risk management: properties and pitfalls. In Risk Management: Value at Risk and Beyond, Cambridge University Press, 176--223.
Xu, W., Hou, Y., Hung, Y.S., and Zou, Y. (2010). A comparative analysis of Spearman's rho and Kendall's tau in normal and contaminated normal models. Signal Processing, 93(1), 261--276.
Kowalski, C.J. (1972). On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. Journal of the Royal Statistical Society, Series C (Applied Statistics), 21(1), 1--12.
Nelsen, R.B. (2006). An Introduction to Copulas, 2nd edition. Springer.
Kruskal, W.H. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53(284), 814--861.
Fang, K.T., Kotz, S., and Ng, K.W. (1990). Symmetric Multivariate and Related Distributions. Chapman and Hall.
Lindskog, F., McNeil, A.J., and Schmock, U. (2003). Kendall's tau for elliptical distributions. In Credit Risk: Measurement, Evaluation and Management, Physica-Verlag, 149--156.
Croux, C. and Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods and Applications, 19(4), 497--515.
Khoudraji, A. (1995). Contributions \`a l'\'etude des copules et \`a la mod\'elisation de valeurs extr\^emes bivari\'ees. Ph.D. thesis, Universit\'e Laval.