Covariance and Correlation

Measuring Linear Dependence

Independence is an all-or-nothing property. In practice, we often need a quantitative measure of how strongly two random variables are related. The covariance and correlation coefficient provide exactly this — they measure the strength and direction of the linear relationship between $X$ and $Y$. These quantities are central to estimation theory, principal component analysis, and the definition of wide-sense stationarity for stochastic processes.

Definition: Covariance

The covariance of random variables $X$ and $Y$ is

$$\text{Cov}(X, Y) = \mathbb{E}\bigl[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\bigr] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y].$$

Properties:

  1. $\text{Cov}(X, X) = \text{Var}(X)$.
  2. $\text{Cov}(X, Y) = \text{Cov}(Y, X)$ (symmetry).
  3. $\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X, Y)$.
  4. $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$.

If $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$ (but not conversely in general).
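These properties are easy to verify numerically. Below is a minimal sketch (assuming NumPy; the linear model, coefficients, and sample size are arbitrary choices for illustration) that estimates covariances from simulated data and checks properties 1, 3, and 4 above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # sample size (illustrative choice)

# Two correlated variables: Y depends linearly on X plus independent noise.
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 1.0, n)

cov_xy = np.cov(x, y)[0, 1]  # sample covariance Cov(X, Y)

# Property 1: Cov(X, X) = Var(X)
print(np.cov(x, x)[0, 1], np.var(x, ddof=1))

# Property 3: Cov(aX + b, cY + d) = ac Cov(X, Y), here with a = 2, c = -4
print(np.cov(2 * x + 3, -4 * y + 1)[0, 1], 2 * (-4) * cov_xy)

# Property 4: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
print(np.var(x + y, ddof=1),
      np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * cov_xy)
```

The paired numbers printed on each line agree up to sampling error.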

Definition: Correlation Coefficient

The correlation coefficient (Pearson's $\rho$) of $X$ and $Y$, assuming both have positive variance, is

$$\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}.$$

The correlation coefficient satisfies $-1 \le \rho_{X,Y} \le 1$.

$\rho_{X,Y} = \pm 1$ if and only if $Y$ is an affine function of $X$ (i.e., $Y = aX + b$ with probability 1). The sign of $\rho$ indicates the direction: positive means $Y$ tends to increase with $X$; negative means $Y$ tends to decrease.
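Since $\rho$ is dimensionless, rescaling or shifting either variable leaves it unchanged except for a possible sign flip. A small sketch, assuming NumPy (the helper `pearson` is an illustrative name, not a library function):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # true rho = 0.5 / sqrt(1.25) ~ 0.447

def pearson(a, b):
    """Sample correlation: Cov(a, b) / sqrt(Var(a) * Var(b))."""
    return np.cov(a, b)[0, 1] / np.sqrt(np.var(a, ddof=1) * np.var(b, ddof=1))

print(pearson(x, y))             # close to 0.447
print(pearson(10 * x + 7, y))    # unchanged: rho ignores positive affine rescaling
print(pearson(-x, y))            # sign flips when one variable is negated
```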

Theorem: Cauchy–Schwarz Bound on Correlation

For any random variables $X, Y$ with finite second moments:

$$|\text{Cov}(X, Y)| \le \sqrt{\text{Var}(X) \cdot \text{Var}(Y)},$$

or equivalently $|\rho_{X,Y}| \le 1$. Equality holds iff $Y = aX + b$ a.s. for some constants $a, b$ with $a \ne 0$.
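A quick way to see this bound: for every real $t$, the variance of $Y - tX$ is nonnegative,

$$0 \le \text{Var}(Y - tX) = \text{Var}(X)\,t^2 - 2\,\text{Cov}(X, Y)\,t + \text{Var}(Y),$$

and a quadratic in $t$ that never goes negative must have a nonpositive discriminant, i.e., $\text{Cov}(X, Y)^2 \le \text{Var}(X)\,\text{Var}(Y)$. Equality forces $\text{Var}(Y - tX) = 0$ for some $t$, meaning $Y = tX + b$ almost surely.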

Theorem: Variance of a Sum

For any random variables $X_1, \ldots, X_n$:

$$\text{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i) + 2\sum_{i < j} \text{Cov}(X_i, X_j).$$

If $X_1, \ldots, X_n$ are pairwise uncorrelated (in particular, if independent), the cross terms vanish:

$$\text{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i).$$
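Both cases can be checked numerically. A sketch assuming NumPy (the three-variable model below is an arbitrary illustration, not part of the theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Independent case: cross terms vanish, Var(sum) ~ 1 + 4 + 9 = 14.
xs = rng.normal(0.0, [1.0, 2.0, 3.0], size=(n, 3))   # std devs 1, 2, 3
print(np.var(xs.sum(axis=1), ddof=1),
      np.var(xs, axis=0, ddof=1).sum())

# Correlated case: the 2 * sum of pairwise covariances must be included.
xs[:, 1] += 0.8 * xs[:, 0]                            # introduce correlation
c = np.cov(xs, rowvar=False)                          # 3x3 sample covariance matrix
print(np.var(xs.sum(axis=1), ddof=1),
      np.trace(c) + 2 * np.triu(c, k=1).sum())        # diagonal + 2 * upper triangle
```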

Scatter Plot and Correlation Coefficient

Generate $n$ samples from a bivariate Gaussian with correlation $\rho$ and observe the scatter pattern. The empirical correlation coefficient is displayed.

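A minimal offline version of this demo, sketched with NumPy and Matplotlib (the function name and parameter defaults below are illustrative, not taken from the interactive widget):

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_demo(n=500, rho=0.7, seed=0):
    """Sample n points from a zero-mean bivariate Gaussian with correlation rho
    and report the empirical correlation coefficient."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho],
                    [rho, 1.0]])               # unit variances, correlation rho
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    rho_hat = np.corrcoef(x, y)[0, 1]          # empirical correlation
    plt.scatter(x, y, s=8, alpha=0.6)
    plt.title(f"n = {n}, true rho = {rho}, empirical rho = {rho_hat:.3f}")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.show()

scatter_demo(n=500, rho=0.7)
```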

Example: Computing Covariance — Dice Example

Roll two fair dice. Let $X$ be the result of the first die and $S = X + Y$, where $Y$ is the result of the second die. Compute $\text{Cov}(X, S)$ and $\rho_{X,S}$.
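A sketch of the solution. Analytically, $\text{Cov}(X, S) = \text{Cov}(X, X) + \text{Cov}(X, Y) = \text{Var}(X) = 35/12$, since $X$ and $Y$ are independent, and $\text{Var}(S) = 2\,\text{Var}(X)$, so $\rho_{X,S} = \text{Var}(X)/\sqrt{\text{Var}(X) \cdot 2\,\text{Var}(X)} = 1/\sqrt{2} \approx 0.707$. The exhaustive check below (plain Python, enumerating all 36 outcomes) confirms this:

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely (X, Y) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def E(f):
    """Exact expectation of f(x, y) under the uniform distribution on outcomes."""
    return sum(p * f(x, y) for x, y in outcomes)

EX, ES = E(lambda x, y: x), E(lambda x, y: x + y)
cov_XS = E(lambda x, y: x * (x + y)) - EX * ES   # E[XS] - E[X]E[S]
var_X = E(lambda x, y: x * x) - EX ** 2
var_S = E(lambda x, y: (x + y) ** 2) - ES ** 2

print(cov_XS)                               # 35/12, equal to Var(X)
print(cov_XS / math.sqrt(var_X * var_S))    # 0.7071... = 1/sqrt(2)
```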

Covariance

A measure of the joint variability of two random variables: $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$. Positive covariance means the variables tend to move together.

Related: Independent random variables

Correlation coefficient

The normalized covariance $\rho_{X,Y} = \text{Cov}(X,Y)/\sqrt{\text{Var}(X)\,\text{Var}(Y)}$, always in $[-1, 1]$. It measures the strength and direction of the linear relationship between two random variables.

Related: Covariance

Historical Note: Pearson's Correlation Coefficient

1896

Karl Pearson introduced the product-moment correlation coefficient in 1896, building on Francis Galton's earlier work on regression. Galton had observed that the heights of children "regress toward the mean" relative to their parents — and the correlation coefficient $\rho$ quantifies precisely how much. Pearson's contribution was to define $\rho$ as a dimensionless quantity, bounded between $-1$ and $1$, that is unchanged by positive affine rescaling of either variable. This simple idea became one of the most widely used statistics in all of science.

⚠️ Engineering Note

Correlation Is Not Causation, and Not Even Full Dependence

The correlation coefficient measures only the linear component of the relationship between $X$ and $Y$. Two variables can have $\rho = 0$ yet be perfectly dependent (e.g., $X \sim \mathcal{N}(0,1)$ and $Y = X^2$). In modern practice, measures of statistical dependence such as mutual information, distance correlation, or the maximal information coefficient capture nonlinear relationships. However, correlation remains the dominant tool in linear signal processing because, for jointly Gaussian variables, uncorrelated implies independent.
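The $Y = X^2$ example is easy to reproduce numerically (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)      # X ~ N(0, 1)
y = x ** 2                        # Y is completely determined by X

# The correlation is (near) zero anyway, because Cov(X, X^2) = E[X^3] = 0
# for a zero-mean symmetric distribution: rho misses this purely nonlinear link.
print(np.corrcoef(x, y)[0, 1])    # close to 0, up to sampling noise
```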

Key Takeaway

Covariance $\text{Cov}(X,Y)$ measures linear co-movement; the correlation coefficient $\rho$ normalizes it to $[-1, 1]$. For independent RVs, $\text{Cov}(X,Y) = 0$ and the variance of a sum equals the sum of variances. The converse (uncorrelated implies independent) fails in general, but it does hold for jointly Gaussian variables.