Conditional Variance and the Law of Total Variance
Decomposing Uncertainty
A fundamental question in statistics and signal processing: when we observe $Y$, how much of the variability in $X$ does $Y$ "explain"? The law of total variance gives a clean decomposition of $\text{Var}(X)$ into two interpretable terms: one measuring the variability explained by $Y$, the other measuring what remains unexplained.
Definition: Conditional Variance
Conditional Variance
The conditional variance of $X$ given $Y$ is the random variable
$$\text{Var}(X \mid Y) = \mathbb{E}\big[(X - \mathbb{E}[X \mid Y])^2 \,\big|\, Y\big].$$
For each fixed value $y$, $\text{Var}(X \mid Y = y)$ is the variance of the conditional distribution of $X$ given $Y = y$. Since $Y$ is random, $\text{Var}(X \mid Y)$ is a random variable (a function of $Y$).
For jointly Gaussian $(X, Y)$, the conditional variance does not depend on the observed value $y$; it is a constant. This is a special (and very convenient) Gaussian property.
Theorem: Law of Total Variance (Eve's Law)
For any random variables $X$ and $Y$ with $\mathbb{E}[X^2] < \infty$:
$$\text{Var}(X) = \mathbb{E}\big[\text{Var}(X \mid Y)\big] + \text{Var}\big(\mathbb{E}[X \mid Y]\big).$$
The total variance of $X$ splits into two parts:
- $\mathbb{E}[\text{Var}(X \mid Y)]$: the average residual uncertainty in $X$ after observing $Y$, i.e. what $Y$ cannot explain.
- $\text{Var}(\mathbb{E}[X \mid Y])$: how much the conditional mean itself varies as $Y$ changes, i.e. what $Y$ does explain.
If $Y$ is very informative about $X$, the explained variance is large and the unexplained variance is small (and vice versa).
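Before proving the theorem, it can be checked numerically. The sketch below runs a Monte Carlo check on a simple hierarchical model; the model $Y \sim \mathcal{N}(0, 1)$, $X \mid Y \sim \mathcal{N}(2Y, 1)$ and the sample size are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative model: Y ~ N(0, 1), X | Y ~ N(2Y, 1).
y = rng.standard_normal(n)
x = 2 * y + rng.standard_normal(n)

total = x.var()            # Var(X), estimated directly from samples
unexplained = 1.0          # Var(X | Y) = 1 for every y, so E[Var(X|Y)] = 1
explained = (2 * y).var()  # Var(E[X | Y]) = Var(2Y), true value 4

print(total, unexplained + explained)  # both sides close to 1 + 4 = 5
```

Both sides of the decomposition come out near $5$, matching the exact value $\mathbb{E}[\text{Var}(X \mid Y)] + \text{Var}(\mathbb{E}[X \mid Y]) = 1 + 4$.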
Start from the definition
$\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$.
We will compute $\mathbb{E}[X^2]$ by conditioning on $Y$.
Apply the tower property to $X^2$
$\mathbb{E}[X^2] = \mathbb{E}\big[\mathbb{E}[X^2 \mid Y]\big]$.
Now $\mathbb{E}[X^2 \mid Y] = \text{Var}(X \mid Y) + (\mathbb{E}[X \mid Y])^2$.
Therefore $\mathbb{E}[X^2] = \mathbb{E}[\text{Var}(X \mid Y)] + \mathbb{E}\big[(\mathbb{E}[X \mid Y])^2\big]$.
Use the tower property for $\mathbb{E}[X]$
$\mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X \mid Y]\big]$.
Combining:
$$\text{Var}(X) = \mathbb{E}[\text{Var}(X \mid Y)] + \mathbb{E}\big[(\mathbb{E}[X \mid Y])^2\big] - \big(\mathbb{E}[\mathbb{E}[X \mid Y]]\big)^2.$$
The last two terms are $\text{Var}(\mathbb{E}[X \mid Y])$.
Example: Total Variance for a Hierarchical Model
Let $\Lambda \sim \text{Gamma}(\alpha, \beta)$ (shape $\alpha$, rate $\beta$) and $X \mid \Lambda \sim \text{Poisson}(\Lambda)$. Find $\text{Var}(X)$ using the law of total variance.
Compute $\mathbb{E}[X|\Lambda]$ and $\text{Var}(X|\Lambda)$
For a Poisson random variable with parameter $\lambda$, both the mean and the variance equal $\lambda$. Hence $\mathbb{E}[X \mid \Lambda] = \Lambda$ and $\text{Var}(X \mid \Lambda) = \Lambda$.
Apply the law of total variance
$$\text{Var}(X) = \mathbb{E}[\text{Var}(X \mid \Lambda)] + \text{Var}(\mathbb{E}[X \mid \Lambda]) = \mathbb{E}[\Lambda] + \text{Var}(\Lambda).$$
Interpret
The first term, $\mathbb{E}[\Lambda]$, is the average Poisson variance (unexplained), and the second term, $\text{Var}(\Lambda)$, reflects the variability in the Poisson rate itself (explained by $\Lambda$). Together, they give the negative binomial variance, since marginally $X$ follows a negative binomial distribution.
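A short simulation confirms the decomposition for this Gamma-Poisson model; the shape and rate values $\alpha = 3$, $\beta = 2$ below are illustrative choices assumed for concreteness.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
alpha, beta = 3.0, 2.0  # illustrative shape and rate for the Gamma prior

lam = rng.gamma(alpha, 1.0 / beta, size=n)  # Lambda ~ Gamma(shape=alpha, rate=beta)
x = rng.poisson(lam)                        # X | Lambda ~ Poisson(Lambda)

# Law of total variance: Var(X) = E[Lambda] + Var(Lambda)
predicted = alpha / beta + alpha / beta**2  # 1.5 + 0.75 = 2.25
print(x.var(), predicted)                   # both close to 2.25
```

The sample variance of $X$ lands near $\alpha/\beta + \alpha/\beta^2 = 2.25$, which exceeds the Poisson variance $\alpha/\beta = 1.5$: mixing over $\Lambda$ adds the explained component.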
Example: Variance Decomposition in Fading Channels
Let the received signal power be $P_r = H^2 P_t$, where $H$ is a Rayleigh fading channel gain (so $H^2$ is exponentially distributed) and $P_t$ is the transmit power, uniformly distributed on $[P_{\min}, P_{\max}]$ independently of $H$. Decompose $\text{Var}(P_r)$ using the law of total variance, conditioning on $P_t$.
Compute the conditional moments
Given $P_t$: $\mathbb{E}[P_r \mid P_t] = \mathbb{E}[H^2]\, P_t$ and $\text{Var}(P_r \mid P_t) = \text{Var}(H^2)\, P_t^2$ (since $\text{Var}(cZ) = c^2\,\text{Var}(Z)$ for a constant $c$).
Apply the decomposition
With $P_t \sim \text{Uniform}[P_{\min}, P_{\max}]$, we have $\mathbb{E}[P_t^2] = (P_{\min}^2 + P_{\min}P_{\max} + P_{\max}^2)/3$ and $\text{Var}(P_t) = (P_{\max} - P_{\min})^2/12$. Therefore
$$\text{Var}(P_r) = \underbrace{\text{Var}(H^2)\,\mathbb{E}[P_t^2]}_{\mathbb{E}[\text{Var}(P_r \mid P_t)]} + \underbrace{(\mathbb{E}[H^2])^2\,\text{Var}(P_t)}_{\text{Var}(\mathbb{E}[P_r \mid P_t])}.$$
Interpret
The unexplained variance, $\mathbb{E}[\text{Var}(P_r \mid P_t)]$, comes from fading: even if we knew $P_t$, the random channel would still cause variability. The explained variance, $\text{Var}(\mathbb{E}[P_r \mid P_t])$, comes from the varying transmit power. A power control scheme that reduces $\text{Var}(P_t)$ reduces the explained component but cannot touch the fading component.
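A minimal simulation sketch of this decomposition, assuming the common normalization $\mathbb{E}[H^2] = 1$ (so $H^2 \sim \text{Exp}(1)$ and $\text{Var}(H^2) = 1$) and an illustrative power range $[1, 4]$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
p_min, p_max = 1.0, 4.0  # illustrative transmit-power range (assumed values)

# Rayleigh fading normalized so E[H^2] = 1, i.e. H^2 ~ Exp(1), Var(H^2) = 1.
h2 = rng.exponential(1.0, size=n)
pt = rng.uniform(p_min, p_max, size=n)
pr = h2 * pt  # received power P_r = H^2 * P_t

e_pt2 = (p_min**2 + p_min * p_max + p_max**2) / 3  # E[P_t^2] = 7
var_pt = (p_max - p_min) ** 2 / 12                 # Var(P_t) = 0.75
unexplained = 1.0 * e_pt2    # Var(H^2) * E[P_t^2]: the fading term
explained = 1.0**2 * var_pt  # (E[H^2])^2 * Var(P_t): the power-control term

print(pr.var(), unexplained + explained)  # both close to 7 + 0.75 = 7.75
```

With these numbers the fading term dominates ($7$ vs. $0.75$), illustrating why power control alone cannot remove most of the variability.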
Law of Total Variance: Decomposition Visualization
Visualize the decomposition for different joint distributions. A bar chart shows total, explained, and unexplained variance. Adjust the "informativeness" of $Y$ to see how the decomposition shifts.
Parameters
- Informativeness: how informative $Y$ is about $X$
Theorem: Law of Total Expectation (Review)
For completeness, recall the law of total expectation (tower property): $\mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X \mid Y]\big]$.
Together with the law of total variance, these form the "iterated conditioning" toolkit for computing moments via conditioning.
Average the conditional mean over $Y$ to get the unconditional mean. Average the conditional variance and add the variance of the conditional mean to get the unconditional variance. The pattern extends to higher moments via more elaborate decompositions.
Already proved
See Tower Property (Law of Iterated Expectations) in Section 12.1 for the full proof.
Definition: Explained Variance Ratio
Explained Variance Ratio
The explained variance ratio (or coefficient of determination) is
$$\eta^2 = \frac{\text{Var}(\mathbb{E}[X \mid Y])}{\text{Var}(X)}.$$
It measures the fraction of the total variance of $X$ that is "explained" by $Y$. For the linear case, $\eta^2 = \rho^2$ (the squared correlation coefficient, i.e., the $R^2$ of simple linear regression).
In general, $\eta^2 \ge \rho^2$. Equality holds when $\mathbb{E}[X \mid Y]$ is linear in $Y$, which is the case for jointly Gaussian $(X, Y)$.
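The gap between $\eta^2$ and $\rho^2$ is easy to see numerically when the conditional mean is nonlinear; the quadratic model below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Nonlinear relation: Y ~ N(0, 1), X = Y^2 + noise, so E[X | Y] = Y^2.
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)

eta2 = np.var(y**2) / np.var(x)      # Var(E[X|Y]) / Var(X), true value 2/3
rho2 = np.corrcoef(x, y)[0, 1] ** 2  # squared correlation, true value 0

print(eta2, rho2)  # eta^2 near 2/3, rho^2 near 0
```

Here $Y$ explains two-thirds of the variance of $X$, yet the correlation is zero: $\rho^2$ only detects the linear part of the dependence, while $\eta^2$ captures all of it.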
Common Mistake: Conditional Variance Is Not Always Constant
Mistake:
Assuming $\text{Var}(X \mid Y)$ is a constant (not depending on $Y$). This is true for the Gaussian case but false in general.
Correction:
For non-Gaussian distributions, $\text{Var}(X \mid Y)$ can depend on $Y$. For example, if $Y$ is a positive random variable and $X \mid Y \sim \text{Poisson}(Y)$, then $\text{Var}(X \mid Y) = Y$, which varies with $Y$. The property that $\text{Var}(X \mid Y)$ is constant (homoscedasticity) is special to the Gaussian family and a few others.
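This heteroscedasticity is easy to observe in simulation; the uniform range for $Y$ below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Y uniform on [1, 10]; X | Y ~ Poisson(Y), so Var(X | Y = y) = y.
y = rng.uniform(1.0, 10.0, size=n)
x = rng.poisson(y)

# Empirical variance of X in two slices of Y: clearly not constant.
# (Each slice's variance also picks up the variation of E[X|Y] within it.)
low = x[y < 3.0]   # Y in [1, 3): variance of order 2
high = x[y > 8.0]  # Y in (8, 10]: variance of order 9
print(low.var(), high.var())
```

The two slice variances differ by roughly a factor of four, which a homoscedastic model would never produce.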
Key Takeaway
The law of total variance is the fundamental tool for analyzing how much of the variability in $X$ can be attributed to $Y$. It connects estimation theory (the minimum mean squared error is $\mathbb{E}[\text{Var}(X \mid Y)]$) to statistical analysis (the explained variance ratio $\eta^2$).
Quick Check
If $\mathbb{E}[X \mid Y]$ is constant, what does the law of total variance tell us?
$\mathbb{E}[X \mid Y = y] = c$ for all $y$
If $\mathbb{E}[X \mid Y] = c$, then $\text{Var}(\mathbb{E}[X \mid Y]) = \text{Var}(c) = 0$. So $\text{Var}(X) = \mathbb{E}[\text{Var}(X \mid Y)]$. All the variance is "unexplained": $Y$ cannot explain any of the variability in $X$ (at least not through the mean).
Connection to ANOVA
The law of total variance is the population-level version of the analysis of variance (ANOVA) decomposition in statistics. In ANOVA, the total sum of squares is split into "between-group" (explained) and "within-group" (unexplained) components. The law of total variance provides the theoretical justification for this decomposition.
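The correspondence can be sketched with equal-sized groups, where the sample-level between/within split is exact; the group means and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Three groups (the "Y") with different means; X is the measurement.
groups = [rng.normal(mu, 1.0, size=2000) for mu in (0.0, 2.0, 5.0)]
x = np.concatenate(groups)

# Within-group (unexplained) and between-group (explained) components.
within = np.mean([g.var() for g in groups])   # analogue of E[Var(X|Y)]
between = np.var([g.mean() for g in groups])  # analogue of Var(E[X|Y])

print(x.var(), within + between)  # the two sides of Eve's law agree
```

With equal group sizes and population-style variances (`ddof=0`), the identity holds to floating-point precision, mirroring the ANOVA sum-of-squares decomposition.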