Covariance and Correlation

Measuring Linear Dependence

Independence is an all-or-nothing property. In practice, we often need a quantitative measure of how strongly two random variables are related. The covariance and correlation coefficient provide exactly this — they measure the strength and direction of the linear relationship between $X$ and $Y$. These quantities are central to estimation theory, principal component analysis, and the definition of wide-sense stationarity for stochastic processes.

Definition: Covariance

The covariance of random variables $X$ and $Y$ is

$$\text{Cov}(X, Y) = \mathbb{E}\bigl[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\bigr] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y].$$

Properties:

  1. $\text{Cov}(X, X) = \text{Var}(X)$.
  2. $\text{Cov}(X, Y) = \text{Cov}(Y, X)$ (symmetry).
  3. $\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X, Y)$.
  4. $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$.

If $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$ (but not conversely in general).
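These properties are easy to verify numerically. Below is a minimal sketch (assuming NumPy; the linear model, coefficients, and sample size are arbitrary choices for illustration) that estimates covariances from simulated data and checks properties 1, 3, and 4 above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # sample size (illustrative choice)

# Two correlated variables: Y depends linearly on X plus independent noise.
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 1.0, n)

cov_xy = np.cov(x, y)[0, 1]  # sample covariance Cov(X, Y)

# Property 1: Cov(X, X) = Var(X)
print(np.cov(x, x)[0, 1], np.var(x, ddof=1))

# Property 3: Cov(aX + b, cY + d) = ac Cov(X, Y), here with a = 2, c = -4
print(np.cov(2 * x + 3, -4 * y + 1)[0, 1], 2 * (-4) * cov_xy)

# Property 4: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
print(np.var(x + y, ddof=1),
      np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * cov_xy)
```

The paired numbers printed on each line agree up to sampling error.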

Definition: Correlation Coefficient

The correlation coefficient (Pearson's $\rho$) of $X$ and $Y$, assuming both have positive variance, is

$$\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}.$$

The correlation coefficient satisfies $-1 \le \rho_{X,Y} \le 1$.

$\rho_{X,Y} = \pm 1$ if and only if $Y$ is an affine function of $X$ (i.e., $Y = aX + b$ with probability 1). The sign of $\rho$ indicates the direction: positive means $Y$ tends to increase with $X$; negative means $Y$ tends to decrease.
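Since $\rho$ is dimensionless, rescaling or shifting either variable leaves it unchanged except for a possible sign flip. A small sketch, assuming NumPy (the helper `pearson` is an illustrative name, not a library function):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # true rho = 0.5 / sqrt(1.25) ~ 0.447

def pearson(a, b):
    """Sample correlation: Cov(a, b) / sqrt(Var(a) * Var(b))."""
    return np.cov(a, b)[0, 1] / np.sqrt(np.var(a, ddof=1) * np.var(b, ddof=1))

print(pearson(x, y))             # close to 0.447
print(pearson(10 * x + 7, y))    # unchanged: rho ignores positive affine rescaling
print(pearson(-x, y))            # sign flips when one variable is negated
```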

Theorem: Cauchy–Schwarz Bound on Correlation

For any random variables $X, Y$ with finite second moments:

$$|\text{Cov}(X, Y)| \le \sqrt{\text{Var}(X) \cdot \text{Var}(Y)},$$

or equivalently $|\rho_{X,Y}| \le 1$. Equality holds iff $Y = aX + b$ a.s. for some constants $a, b$ with $a \ne 0$.
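A quick way to see this bound: for every real $t$, the variance of $Y - tX$ is nonnegative,

$$0 \le \text{Var}(Y - tX) = \text{Var}(X)\,t^2 - 2\,\text{Cov}(X, Y)\,t + \text{Var}(Y),$$

and a quadratic in $t$ that never goes negative must have a nonpositive discriminant, i.e., $\text{Cov}(X, Y)^2 \le \text{Var}(X)\,\text{Var}(Y)$. Equality forces $\text{Var}(Y - tX) = 0$ for some $t$, meaning $Y = tX + b$ almost surely.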

Theorem: Variance of a Sum

For any random variables $X_1, \ldots, X_n$:

$$\text{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i) + 2\sum_{i < j} \text{Cov}(X_i, X_j).$$

If $X_1, \ldots, X_n$ are pairwise uncorrelated (in particular, if independent), the cross terms vanish:

$$\text{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i).$$
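Both cases can be checked numerically. A sketch assuming NumPy (the three-variable model below is an arbitrary illustration, not part of the theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Independent case: cross terms vanish, Var(sum) ~ 1 + 4 + 9 = 14.
xs = rng.normal(0.0, [1.0, 2.0, 3.0], size=(n, 3))   # std devs 1, 2, 3
print(np.var(xs.sum(axis=1), ddof=1),
      np.var(xs, axis=0, ddof=1).sum())

# Correlated case: the 2 * sum of pairwise covariances must be included.
xs[:, 1] += 0.8 * xs[:, 0]                            # introduce correlation
c = np.cov(xs, rowvar=False)                          # 3x3 sample covariance matrix
print(np.var(xs.sum(axis=1), ddof=1),
      np.trace(c) + 2 * np.triu(c, k=1).sum())        # diagonal + 2 * upper triangle
```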

Scatter Plot and Correlation Coefficient

Generate $n$ samples from a bivariate Gaussian with correlation $\rho$ and observe the scatter pattern. The empirical correlation coefficient is displayed.

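A minimal offline version of this demo, sketched with NumPy and Matplotlib (the function name and parameter defaults below are illustrative, not taken from the interactive widget):

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_demo(n=500, rho=0.7, seed=0):
    """Sample n points from a zero-mean bivariate Gaussian with correlation rho
    and report the empirical correlation coefficient."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho],
                    [rho, 1.0]])               # unit variances, correlation rho
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    rho_hat = np.corrcoef(x, y)[0, 1]          # empirical correlation
    plt.scatter(x, y, s=8, alpha=0.6)
    plt.title(f"n = {n}, true rho = {rho}, empirical rho = {rho_hat:.3f}")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.show()

scatter_demo(n=500, rho=0.7)
```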

Example: Computing Covariance — Dice Example

Roll two fair dice. Let $X$ be the result of the first die and $S = X + Y$, where $Y$ is the result of the second die. Compute $\text{Cov}(X, S)$ and $\rho_{X,S}$.
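A sketch of the solution. Analytically, $\text{Cov}(X, S) = \text{Cov}(X, X) + \text{Cov}(X, Y) = \text{Var}(X) = 35/12$, since $X$ and $Y$ are independent, and $\text{Var}(S) = 2\,\text{Var}(X)$, so $\rho_{X,S} = \text{Var}(X)/\sqrt{\text{Var}(X) \cdot 2\,\text{Var}(X)} = 1/\sqrt{2} \approx 0.707$. The exhaustive check below (plain Python, enumerating all 36 outcomes) confirms this:

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely (X, Y) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def E(f):
    """Exact expectation of f(x, y) under the uniform distribution on outcomes."""
    return sum(p * f(x, y) for x, y in outcomes)

EX, ES = E(lambda x, y: x), E(lambda x, y: x + y)
cov_XS = E(lambda x, y: x * (x + y)) - EX * ES   # E[XS] - E[X]E[S]
var_X = E(lambda x, y: x * x) - EX ** 2
var_S = E(lambda x, y: (x + y) ** 2) - ES ** 2

print(cov_XS)                               # 35/12, equal to Var(X)
print(cov_XS / math.sqrt(var_X * var_S))    # 0.7071... = 1/sqrt(2)
```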

Covariance

A measure of the joint variability of two random variables: $\text{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$. Positive covariance means the variables tend to move together.

Related: Independent random variables

Correlation coefficient

The normalized covariance $\rho_{X,Y} = \text{Cov}(X,Y)/\sqrt{\text{Var}(X)\,\text{Var}(Y)}$, always in $[-1, 1]$. It measures the strength and direction of the linear relationship between two random variables.

Related: Covariance

Historical Note: Pearson's Correlation Coefficient

1896

Karl Pearson introduced the product-moment correlation coefficient in 1896, building on Francis Galton's earlier work on regression. Galton had observed that the heights of children "regress toward the mean" relative to their parents — and the correlation coefficient $\rho$ quantifies precisely how much. Pearson's contribution was to define $\rho$ as a dimensionless quantity, bounded between $-1$ and $1$, that is unchanged by positive affine rescaling of either variable. This simple idea became one of the most widely used statistics in all of science.

⚠️ Engineering Note

Correlation Is Not Causation, and Not Even Full Dependence

The correlation coefficient measures only the linear component of the relationship between $X$ and $Y$. Two variables can have $\rho = 0$ yet be perfectly dependent (e.g., $X \sim \mathcal{N}(0,1)$ and $Y = X^2$). In modern practice, measures of statistical dependence such as mutual information, distance correlation, or the maximal information coefficient capture nonlinear relationships. However, correlation remains the dominant tool in linear signal processing because, for jointly Gaussian variables, uncorrelated implies independent.
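The $Y = X^2$ example is easy to reproduce numerically (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)      # X ~ N(0, 1)
y = x ** 2                        # Y is completely determined by X

# The correlation is (near) zero anyway, because Cov(X, X^2) = E[X^3] = 0
# for a zero-mean symmetric distribution: rho misses this purely nonlinear link.
print(np.corrcoef(x, y)[0, 1])    # close to 0, up to sampling noise
```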

Key Takeaway

Covariance $\text{Cov}(X,Y)$ measures linear co-movement; the correlation coefficient $\rho$ normalizes it to $[-1, 1]$. For independent RVs, $\text{Cov}(X,Y) = 0$ and the variance of a sum equals the sum of variances. The converse (uncorrelated implies independent) fails in general, but it does hold for jointly Gaussian variables.