Conditional Variance and the Law of Total Variance

Decomposing Uncertainty

A fundamental question in statistics and signal processing: when we observe $Y$, how much of the variability in $X$ does $Y$ "explain"? The law of total variance gives a clean decomposition of $\text{Var}(X)$ into two interpretable terms — one measuring the variability explained by $Y$, the other measuring what remains unexplained.

Definition: Conditional Variance

The conditional variance of $X$ given $Y$ is the random variable

$$\text{Var}(X|Y) = \mathbb{E}\bigl[(X - \mathbb{E}[X|Y])^2 \,\big|\, Y\bigr] = \mathbb{E}[X^2|Y] - (\mathbb{E}[X|Y])^2.$$

For each fixed value $Y = y$, $\text{Var}(X|Y=y)$ is the variance of the conditional distribution of $X$ given $Y = y$. Since $Y$ is random, $\text{Var}(X|Y)$ is a random variable (a function of $Y$).

For jointly Gaussian $(X,Y)$, the conditional variance $\text{Var}(X|Y) = \sigma_X^2(1 - \rho^2)$ does not depend on $Y$ — it is a constant. This is a special (and very convenient) Gaussian property.
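A minimal numerical sketch of this property, assuming NumPy is available: draw a jointly Gaussian pair, bin the samples by the value of $Y$, and check that the within-bin variance of $X$ stays close to $\sigma_X^2(1-\rho^2)$ no matter which bin you look at. The specific parameter values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameters for the jointly Gaussian pair
sigma_x, sigma_y, rho = 2.0, 1.5, 0.7
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# Approximate Var(X | Y = y) by the sample variance of X within narrow Y-bins
bins = np.quantile(y, np.linspace(0, 1, 21))     # 20 equal-probability bins
which = np.digitize(y, bins[1:-1])
within_bin_var = [x[which == k].var() for k in range(20)]

print("theoretical sigma_x^2 (1 - rho^2):", sigma_x**2 * (1 - rho**2))
print("within-bin variances (roughly constant):")
print(np.round(within_bin_var, 3))
```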

Theorem: Law of Total Variance (Eve's Law)

For any random variables $X$ and $Y$ with $\mathbb{E}[X^2] < \infty$:

$$\text{Var}(X) = \underbrace{\mathbb{E}[\text{Var}(X|Y)]}_{\text{unexplained variance}} + \underbrace{\text{Var}(\mathbb{E}[X|Y])}_{\text{explained variance}}.$$

The total variance of $X$ splits into two parts:

  • $\mathbb{E}[\text{Var}(X|Y)]$: the average residual uncertainty in $X$ after observing $Y$ — what $Y$ cannot explain.
  • $\text{Var}(\mathbb{E}[X|Y])$: how much the conditional mean $\mathbb{E}[X|Y]$ itself varies as $Y$ changes — what $Y$ does explain.

If $Y$ is very informative about $X$, the explained variance is large and the unexplained variance is small; if $Y$ is uninformative, the opposite holds.
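As a sanity check of the identity, here is a minimal sketch that computes both sides exactly for a small discrete joint distribution (the joint pmf is a made-up example):

```python
import numpy as np

# Hypothetical joint pmf p(x, y): rows indexed by x-values, columns by y-values
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])
p = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.05, 0.25]])        # entries sum to 1

p_y = p.sum(axis=0)                 # marginal pmf of Y
p_x = p.sum(axis=1)                 # marginal pmf of X

# Conditional mean and variance of X for each value of Y
e_x_given_y = (x_vals[:, None] * p).sum(axis=0) / p_y
e_x2_given_y = (x_vals[:, None]**2 * p).sum(axis=0) / p_y
var_x_given_y = e_x2_given_y - e_x_given_y**2

# Law of total variance: both sides should agree
var_x = (x_vals**2 * p_x).sum() - (x_vals * p_x).sum()**2
unexplained = (var_x_given_y * p_y).sum()                                   # E[Var(X|Y)]
explained = (e_x_given_y**2 * p_y).sum() - ((e_x_given_y * p_y).sum())**2   # Var(E[X|Y])

print(var_x, unexplained + explained)   # identical up to floating-point error
```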


Example: Total Variance for a Hierarchical Model

Let $\Lambda \sim \text{Gamma}(\alpha, \beta)$ and $X \mid \Lambda \sim \text{Poisson}(\Lambda)$. Find $\text{Var}(X)$ using the law of total variance.
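A sketch of the solution, assuming the rate parameterization in which $\mathbb{E}[\Lambda] = \alpha/\beta$ and $\text{Var}(\Lambda) = \alpha/\beta^2$: since a Poisson distribution has $\mathbb{E}[X|\Lambda] = \text{Var}(X|\Lambda) = \Lambda$,

$$\text{Var}(X) = \mathbb{E}[\text{Var}(X|\Lambda)] + \text{Var}(\mathbb{E}[X|\Lambda]) = \mathbb{E}[\Lambda] + \text{Var}(\Lambda) = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2}.$$

The variance exceeds the mean $\alpha/\beta$: this overdispersion is exactly the extra term $\text{Var}(\Lambda)$ contributed by the randomness of the rate, and it is characteristic of the negative binomial distribution that $X$ follows marginally.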

Example: Variance Decomposition in Fading Channels

Let the received signal power be $P = |H|^2 P_t$ where $H$ is a Rayleigh fading channel ($|H|^2 \sim \text{Exp}(1)$) and $P_t$ is the transmit power, uniformly distributed on $[P_{\min}, P_{\max}]$ independently of $H$. Decompose $\text{Var}(P)$ using the law of total variance, conditioning on $P_t$.
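A minimal Monte Carlo sketch of this decomposition (the numeric values chosen for $P_{\min}$ and $P_{\max}$ are arbitrary): conditioning on $P_t$, $\mathbb{E}[P|P_t] = P_t$ and $\text{Var}(P|P_t) = P_t^2$ because $|H|^2 \sim \text{Exp}(1)$ has unit mean and unit variance, so the law of total variance predicts $\text{Var}(P) = \mathbb{E}[P_t^2] + \text{Var}(P_t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p_min, p_max = 1.0, 4.0                      # illustrative transmit-power range
n = 1_000_000

h2 = rng.exponential(scale=1.0, size=n)      # |H|^2 ~ Exp(1)
pt = rng.uniform(p_min, p_max, size=n)       # P_t ~ Uniform[p_min, p_max]
p = h2 * pt                                  # received power P = |H|^2 * P_t

# Analytic pieces of the decomposition, conditioning on P_t:
#   E[Var(P|P_t)] = E[P_t^2],   Var(E[P|P_t]) = Var(P_t)
e_pt2 = (p_max**3 - p_min**3) / (3 * (p_max - p_min))   # E[P_t^2] for a uniform r.v.
var_pt = (p_max - p_min)**2 / 12

print("empirical Var(P):     ", p.var())
print("E[P_t^2] + Var(P_t):  ", e_pt2 + var_pt)
```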

Law of Total Variance: Decomposition Visualization

Visualize the decomposition $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)] + \text{Var}(\mathbb{E}[X|Y])$ for different joint distributions. A bar chart shows total, explained, and unexplained variance. Adjust the "informativeness" of $Y$ to see how the decomposition shifts.

Parameter: how informative $Y$ is about $X$ (default 0.7).

Theorem: Law of Total Expectation (Review)

For completeness, recall the law of total expectation (tower property):

$$\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|Y]].$$

Together with the law of total variance, these form the "iterated conditioning" toolkit for computing moments via conditioning.

Average the conditional mean over $Y$ to get the unconditional mean. Average the conditional variance and add the variance of the conditional mean to get the unconditional variance. The pattern extends to higher moments via more elaborate decompositions.
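As a quick illustration of both laws together, consider a hypothetical two-component mixture: let $Y \sim \text{Bernoulli}(p)$ with $X|Y=1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $X|Y=0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$. Iterated conditioning gives

$$\mathbb{E}[X] = p\mu_1 + (1-p)\mu_0, \qquad \text{Var}(X) = \underbrace{p\sigma_1^2 + (1-p)\sigma_0^2}_{\mathbb{E}[\text{Var}(X|Y)]} + \underbrace{p(1-p)(\mu_1 - \mu_0)^2}_{\text{Var}(\mathbb{E}[X|Y])},$$

where the last term is the spread of the two component means around the overall mean.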

Definition: Explained Variance Ratio

The explained variance ratio (or coefficient of determination) is

$$\eta^2 = \frac{\text{Var}(\mathbb{E}[X|Y])}{\text{Var}(X)} = 1 - \frac{\mathbb{E}[\text{Var}(X|Y)]}{\text{Var}(X)}.$$

It measures the fraction of the total variance of $X$ that is "explained" by $Y$. For the linear case, $\eta^2 = \rho_{XY}^2$ (the squared correlation coefficient, i.e., the $R^2$ of regression).

In general, $\eta^2 \geq \rho_{XY}^2$. Equality holds when $\mathbb{E}[X|Y]$ is linear in $Y$, which is the case for jointly Gaussian $(X,Y)$.
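A minimal sketch of the gap between $\eta^2$ and $\rho_{XY}^2$, using a made-up nonlinear example: with $Y \sim \mathcal{N}(0,1)$ and $X = Y^2 + \varepsilon$, the conditional mean $\mathbb{E}[X|Y] = Y^2$ is nonlinear, the correlation $\rho_{XY}$ is essentially zero by symmetry, yet $Y$ explains most of the variance of $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

y = rng.normal(size=n)
x = y**2 + 0.5 * rng.normal(size=n)          # nonlinear dependence plus noise

rho2 = np.corrcoef(x, y)[0, 1]**2            # squared correlation (near 0)

# Estimate eta^2 = Var(E[X|Y]) / Var(X) by binning on Y
bins = np.quantile(y, np.linspace(0, 1, 51))
which = np.digitize(y, bins[1:-1])
cond_means = np.array([x[which == k].mean() for k in range(50)])
weights = np.array([(which == k).mean() for k in range(50)])
eta2 = np.sum(weights * (cond_means - x.mean())**2) / x.var()

print("rho^2  ~", round(rho2, 4))
print("eta^2  ~", round(eta2, 4))
```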

Common Mistake: Conditional Variance Is Not Always Constant

Mistake:

Assuming $\text{Var}(X|Y)$ is a constant (not depending on $Y$). This is true for the Gaussian case but false in general.

Correction:

For non-Gaussian distributions, $\text{Var}(X|Y=y)$ can depend on $y$. For example, if $X|Y=y \sim \text{Poisson}(y)$ and $Y > 0$, then $\text{Var}(X|Y) = Y$, which varies with $Y$. The property that $\text{Var}(X|Y)$ is constant (homoscedasticity) is special to the Gaussian family and a few others.

Key Takeaway

The law of total variance $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)] + \text{Var}(\mathbb{E}[X|Y])$ is the fundamental tool for analyzing how much of the variability in $X$ can be attributed to $Y$. It connects estimation theory (the MMSE is $\mathbb{E}[\text{Var}(X|Y)]$) to statistical analysis (the explained variance ratio $\eta^2$).

Quick Check

If $\mathbb{E}[X|Y] = c$ (constant), what does the law of total variance tell us?

  • $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)]$
  • $\text{Var}(X) = 0$
  • $\text{Var}(X) = \text{Var}(\mathbb{E}[X|Y])$
  • $\text{Var}(X|Y) = \text{Var}(X)$ for all $Y$

Connection to ANOVA

The law of total variance is the population-level version of the analysis of variance (ANOVA) decomposition in statistics. In ANOVA, the total sum of squares is split into "between-group" (explained) and "within-group" (unexplained) components. The law of total variance provides the theoretical justification for this decomposition.
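A minimal sketch of the sample-level analogue, using made-up group data: the total sum of squares splits exactly into between-group and within-group pieces, mirroring $\text{Var}(X) = \text{Var}(\mathbb{E}[X|Y]) + \mathbb{E}[\text{Var}(X|Y)]$ with $Y$ playing the role of the group label.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observations from three groups (Y = group label)
groups = [rng.normal(loc=mu, scale=1.0, size=200) for mu in (0.0, 1.5, 3.0)]
x = np.concatenate(groups)
grand_mean = x.mean()

ss_total = ((x - grand_mean)**2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)   # "explained"
ss_within = sum(((g - g.mean())**2).sum() for g in groups)              # "unexplained"

print(ss_total, ss_between + ss_within)   # equal up to floating-point error
```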