Variance and Higher Moments

The Spread Around the Mean

The expectation tells us where the center of a distribution is, but says nothing about how spread out the values are. A random variable that is always equal to its mean and one that fluctuates wildly can have the same expectation. The variance quantifies this spread. It is defined as the expected squared deviation from the mean, and it plays a central role in everything from the central limit theorem to the design of communication systems.

Definition: Variance and Standard Deviation

The variance of a random variable $X$ with mean $\mu = \mathbb{E}[X]$ is

$$\text{Var}(X) \triangleq \mathbb{E}\!\left[(X - \mu)^2\right] = \sum_{x \in \mathcal{X}} (x - \mu)^2 \, P(x).$$

The standard deviation is $\sigma_X = \sqrt{\text{Var}(X)}$, which has the same units as $X$.

The variance is always non-negative, and $\text{Var}(X) = 0$ if and only if $X$ is a constant (i.e., $\mathbb{P}(X = c) = 1$ for some $c$).
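To make the definition concrete, here is a minimal NumPy sketch that evaluates the variance of a small discrete distribution directly from the definition (the support and pmf values are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical pmf over the support {1, 2, 3, 4}.
values = np.array([1.0, 2.0, 3.0, 4.0])
probs = np.array([0.1, 0.2, 0.3, 0.4])     # must sum to 1

mu = np.sum(values * probs)                # E[X] = 3.0
var = np.sum((values - mu) ** 2 * probs)   # E[(X - mu)^2] = 1.0
sigma = np.sqrt(var)                       # standard deviation = 1.0

print(mu, var, sigma)
```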


Theorem: Variance Shortcut Formula

For any random variable $X$ with finite second moment:

$$\text{Var}(X) = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2.$$

This "computational formula" is almost always easier to use than the definition, because computing E[X2]\mathbb{E}[X^2] via LOTUS and E[X]\mathbb{E}[X] separately is typically simpler than computing E[(Xμ)2]\mathbb{E}[(X - \mu)^2] directly.

Theorem: Variance Under Affine Transformation

For constants $a, b \in \mathbb{R}$:

$$\text{Var}(aX + b) = a^2 \, \text{Var}(X).$$

Adding a constant shifts the mean but does not affect the spread; scaling by $a$ multiplies the standard deviation by $|a|$.
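A quick numerical sanity check of the affine rule, sketched with NumPy (the distribution and the constants $a$, $b$ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution works

a, b = 3.0, 7.0
print(np.var(a * x + b))   # empirical Var(aX + b)
print(a**2 * np.var(x))    # a^2 Var(X): agrees up to sampling noise
```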

Definition: Moments and Central Moments

The $k$-th moment of $X$ is $\mathbb{E}[X^k]$ (if it exists). The $k$-th central moment is $\mathbb{E}[(X - \mu)^k]$.

  • $k = 1$: the first moment is the mean.
  • $k = 2$: the second central moment is the variance.
  • $k = 3$: the third central moment, normalized by $\sigma^3$, gives the skewness $\gamma_1 = \mathbb{E}[(X - \mu)^3] / \sigma^3$, measuring asymmetry.
  • $k = 4$: the fourth central moment, normalized by $\sigma^4$, gives the kurtosis $\kappa = \mathbb{E}[(X - \mu)^4] / \sigma^4$, measuring tail heaviness ($\kappa = 3$ for the Gaussian; excess kurtosis $= \kappa - 3$).
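The skewness and kurtosis definitions translate directly into code. Below is a small sketch; `moment_summary` is a hypothetical helper name, and the two test distributions are chosen because their exact values are well known (skewness $0$ and kurtosis $3$ for the Gaussian, skewness $2$ and kurtosis $9$ for the exponential):

```python
import numpy as np

def moment_summary(x):
    """Sample skewness and kurtosis, computed from the definitions above."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    skew = np.mean((x - mu) ** 3) / sigma**3
    kurt = np.mean((x - mu) ** 4) / sigma**4
    return skew, kurt

rng = np.random.default_rng(1)
print(moment_summary(rng.normal(size=1_000_000)))       # ~ (0, 3)
print(moment_summary(rng.exponential(size=1_000_000)))  # ~ (2, 9)
```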

Theorem: Variance of a Sum of Independent Random Variables

If $X_1, \ldots, X_n$ are independent random variables, then

$$\text{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i).$$

Unlike linearity of expectation, this property requires independence (or at least uncorrelatedness). When variables are positively correlated, the variance of the sum exceeds the sum of the variances; when negatively correlated, it can be smaller.
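A simulation sketch contrasting the independent case with an extreme dependent case ($X + X = 2X$), using standard normal samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(size=n)   # Var(X) = 1
y = rng.normal(size=n)   # independent of x, Var(Y) = 1

print(np.var(x + y))     # ~ 2: Var(X) + Var(Y), since X and Y are independent
print(np.var(x + x))     # ~ 4: Var(2X) = 4 Var(X), not Var(X) + Var(X) = 2
```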


Example: Variance of the Bernoulli Distribution

Compute $\text{Var}(X)$ for $X \sim \text{Bernoulli}(p)$. Since $X$ takes only the values $0$ and $1$, we have $X^2 = X$, hence $\mathbb{E}[X^2] = \mathbb{E}[X] = p$, and the shortcut formula gives $\text{Var}(X) = p - p^2 = p(1-p)$.

[Figure: the Bernoulli variance $\text{Var}(X) = p(1-p)$ as a function of $p$. Maximum uncertainty (variance $1/4$) occurs at $p = 1/2$.]
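A quick empirical check of the $p(1-p)$ formula, sketched with NumPy's binomial sampler using a single trial to draw Bernoulli samples:

```python
import numpy as np

rng = np.random.default_rng(3)
for p in (0.1, 0.5, 0.9):
    x = rng.binomial(1, p, size=1_000_000)   # Bernoulli(p) samples
    print(p, np.var(x), p * (1 - p))         # empirical vs. exact variance
```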

Common Mistake: Variance of a Sum Requires Independence

Mistake:

Applying $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ when $X$ and $Y$ are dependent.

Correction:

The general formula is $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$. The covariance term vanishes only when $X$ and $Y$ are uncorrelated (which is implied by, but weaker than, independence).
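A sketch verifying the general formula on deliberately correlated variables (the construction of $Y$ from $X$ is arbitrary, chosen only to make $\text{Cov}(X, Y) > 0$):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)   # y is positively correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # agree up to floating-point rounding
```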

Variance and Heavy Tails

The variance measures the spread of a distribution, but it is sensitive only to the "body" of the distribution. For heavy-tailed distributions — where extreme values occur with non-negligible probability — the variance may be infinite or, even when finite, may not adequately capture the risk of extreme outcomes. In such cases, higher moments (if they exist) or quantile-based measures provide more informative summaries. This issue arises in network traffic modeling, where packet inter-arrival times can exhibit heavy-tailed behavior.
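To see this instability concretely, here is a sketch that draws from a Pareto distribution with tail index $1.5$ (finite mean, infinite variance) and tracks the running sample variance, which never settles down:

```python
import numpy as np

rng = np.random.default_rng(5)
# Classic Pareto with x_m = 1 and tail index 1.5: the mean is finite (= 3),
# but the variance is infinite because the tail index is below 2.
x = 1.0 + rng.pareto(1.5, size=10_000_000)

# The running sample variance jumps whenever an extreme value arrives,
# no matter how many samples have already been seen.
for n in (10**3, 10**4, 10**5, 10**6, 10**7):
    print(n, np.var(x[:n]))
```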

🔧 Engineering Note

Noise Power Is Variance

In communication systems, the "noise power" of a zero-mean noise process is precisely its variance: $\sigma^2 = \text{Var}(W) = \mathbb{E}[W^2]$. The signal-to-noise ratio (SNR) is the ratio of signal power to noise variance. Reducing noise variance (e.g., by bandwidth filtering or averaging) is the fundamental mechanism by which receivers improve performance.
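A minimal sketch of the averaging mechanism: averaging $n$ independent zero-mean noise samples reduces the noise variance by a factor of $n$, improving SNR correspondingly (the numbers below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 4.0   # noise power (variance) per sample
n = 16         # independent samples averaged per measurement

w = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))
w_avg = w.mean(axis=1)             # receiver averages n noisy samples

print(np.var(w_avg), sigma2 / n)   # noise power drops from 4 to 0.25
```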

Quick Check

What is $\text{Var}(X + 5)$ if $\text{Var}(X) = 9$?

  • 14
  • 9
  • 4
  • 25

Common Mistake: Standard Deviation Has the Same Units as $X$

Mistake:

Reporting the variance when the units of $X$ are, say, seconds, which leads to statements like "the variance is 4 seconds" when the correct statement is "the variance is 4 seconds$^2$" or, equivalently, "the standard deviation is 2 seconds."

Correction:

Variance has the units of $X$ squared. The standard deviation $\sigma_X = \sqrt{\text{Var}(X)}$ restores the original units and is more interpretable for practical purposes.

Variance

$\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$. Measures the expected squared deviation from the mean.

Related: Expectation

Key Takeaway

The shortcut formula $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ is almost always easier to compute than the definition. Remember: the variance of a sum equals the sum of the variances only for independent (or uncorrelated) random variables.