Conditional Variance and the Law of Total Variance

Decomposing Uncertainty

A fundamental question in statistics and signal processing: when we observe $Y$, how much of the variability in $X$ does $Y$ "explain"? The law of total variance gives a clean decomposition of $\text{Var}(X)$ into two interpretable terms — one measuring the variability explained by $Y$, the other measuring what remains unexplained.

Definition: Conditional Variance

The conditional variance of $X$ given $Y$ is the random variable

$$\text{Var}(X|Y) = \mathbb{E}\bigl[(X - \mathbb{E}[X|Y])^2 \,\big|\, Y\bigr] = \mathbb{E}[X^2|Y] - (\mathbb{E}[X|Y])^2.$$

For each fixed value $Y = y$, $\text{Var}(X|Y=y)$ is the variance of the conditional distribution of $X$ given $Y = y$. Since $Y$ is random, $\text{Var}(X|Y)$ is a random variable (a function of $Y$).

For jointly Gaussian $(X,Y)$, the conditional variance $\text{Var}(X|Y) = \sigma_X^2(1 - \rho^2)$ does not depend on $Y$ — it is a constant. This is a special (and very convenient) Gaussian property.
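A minimal numerical sketch of this property, assuming NumPy is available: draw a jointly Gaussian pair, bin the samples by the value of $Y$, and check that the within-bin variance of $X$ stays close to $\sigma_X^2(1-\rho^2)$ no matter which bin you look at. The specific parameter values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameters for the jointly Gaussian pair
sigma_x, sigma_y, rho = 2.0, 1.5, 0.7
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# Approximate Var(X | Y = y) by the sample variance of X within narrow Y-bins
bins = np.quantile(y, np.linspace(0, 1, 21))     # 20 equal-probability bins
which = np.digitize(y, bins[1:-1])
within_bin_var = [x[which == k].var() for k in range(20)]

print("theoretical sigma_x^2 (1 - rho^2):", sigma_x**2 * (1 - rho**2))
print("within-bin variances (roughly constant):")
print(np.round(within_bin_var, 3))
```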

Theorem: Law of Total Variance (Eve's Law)

For any random variables $X$ and $Y$ with $\mathbb{E}[X^2] < \infty$:

$$\text{Var}(X) = \underbrace{\mathbb{E}[\text{Var}(X|Y)]}_{\text{unexplained variance}} + \underbrace{\text{Var}(\mathbb{E}[X|Y])}_{\text{explained variance}}.$$

The total variance of $X$ splits into two parts:

  • $\mathbb{E}[\text{Var}(X|Y)]$: the average residual uncertainty in $X$ after observing $Y$ — what $Y$ cannot explain.
  • $\text{Var}(\mathbb{E}[X|Y])$: how much the conditional mean $\mathbb{E}[X|Y]$ itself varies as $Y$ changes — what $Y$ does explain.

If $Y$ is very informative about $X$, the explained variance is large and the unexplained variance is small; if $Y$ is uninformative, the opposite holds.
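As a sanity check of the identity, here is a minimal sketch that computes both sides exactly for a small discrete joint distribution (the joint pmf is a made-up example):

```python
import numpy as np

# Hypothetical joint pmf p(x, y): rows indexed by x-values, columns by y-values
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])
p = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.05, 0.25]])        # entries sum to 1

p_y = p.sum(axis=0)                 # marginal pmf of Y
p_x = p.sum(axis=1)                 # marginal pmf of X

# Conditional mean and variance of X for each value of Y
e_x_given_y = (x_vals[:, None] * p).sum(axis=0) / p_y
e_x2_given_y = (x_vals[:, None]**2 * p).sum(axis=0) / p_y
var_x_given_y = e_x2_given_y - e_x_given_y**2

# Law of total variance: both sides should agree
var_x = (x_vals**2 * p_x).sum() - (x_vals * p_x).sum()**2
unexplained = (var_x_given_y * p_y).sum()                                   # E[Var(X|Y)]
explained = (e_x_given_y**2 * p_y).sum() - ((e_x_given_y * p_y).sum())**2   # Var(E[X|Y])

print(var_x, unexplained + explained)   # identical up to floating-point error
```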


Example: Total Variance for a Hierarchical Model

Let $\Lambda \sim \text{Gamma}(\alpha, \beta)$ and $X \mid \Lambda \sim \text{Poisson}(\Lambda)$. Find $\text{Var}(X)$ using the law of total variance.
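A sketch of the solution, assuming the rate parameterization in which $\mathbb{E}[\Lambda] = \alpha/\beta$ and $\text{Var}(\Lambda) = \alpha/\beta^2$: since a Poisson distribution has $\mathbb{E}[X|\Lambda] = \text{Var}(X|\Lambda) = \Lambda$,

$$\text{Var}(X) = \mathbb{E}[\text{Var}(X|\Lambda)] + \text{Var}(\mathbb{E}[X|\Lambda]) = \mathbb{E}[\Lambda] + \text{Var}(\Lambda) = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2}.$$

The variance exceeds the mean $\alpha/\beta$: this overdispersion is exactly the extra term $\text{Var}(\Lambda)$ contributed by the randomness of the rate, and it is characteristic of the negative binomial distribution that $X$ follows marginally.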

Example: Variance Decomposition in Fading Channels

Let the received signal power be $P = |H|^2 P_t$ where $H$ is a Rayleigh fading channel ($|H|^2 \sim \text{Exp}(1)$) and $P_t$ is the transmit power, uniformly distributed on $[P_{\min}, P_{\max}]$ independently of $H$. Decompose $\text{Var}(P)$ using the law of total variance, conditioning on $P_t$.
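A minimal Monte Carlo sketch of this decomposition (the numeric values chosen for $P_{\min}$ and $P_{\max}$ are arbitrary): conditioning on $P_t$, $\mathbb{E}[P|P_t] = P_t$ and $\text{Var}(P|P_t) = P_t^2$ because $|H|^2 \sim \text{Exp}(1)$ has unit mean and unit variance, so the law of total variance predicts $\text{Var}(P) = \mathbb{E}[P_t^2] + \text{Var}(P_t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p_min, p_max = 1.0, 4.0                      # illustrative transmit-power range
n = 1_000_000

h2 = rng.exponential(scale=1.0, size=n)      # |H|^2 ~ Exp(1)
pt = rng.uniform(p_min, p_max, size=n)       # P_t ~ Uniform[p_min, p_max]
p = h2 * pt                                  # received power P = |H|^2 * P_t

# Analytic pieces of the decomposition, conditioning on P_t:
#   E[Var(P|P_t)] = E[P_t^2],   Var(E[P|P_t]) = Var(P_t)
e_pt2 = (p_max**3 - p_min**3) / (3 * (p_max - p_min))   # E[P_t^2] for a uniform r.v.
var_pt = (p_max - p_min)**2 / 12

print("empirical Var(P):     ", p.var())
print("E[P_t^2] + Var(P_t):  ", e_pt2 + var_pt)
```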

Law of Total Variance: Decomposition Visualization

Visualize the decomposition $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)] + \text{Var}(\mathbb{E}[X|Y])$ for different joint distributions. A bar chart shows total, explained, and unexplained variance. Adjust the "informativeness" of $Y$ to see how the decomposition shifts.

Parameter: how informative $Y$ is about $X$ (default 0.7).

Theorem: Law of Total Expectation (Review)

For completeness, recall the law of total expectation (tower property):

$$\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|Y]].$$

Together with the law of total variance, these form the "iterated conditioning" toolkit for computing moments via conditioning.

Average the conditional mean over $Y$ to get the unconditional mean. Average the conditional variance and add the variance of the conditional mean to get the unconditional variance. The pattern extends to higher moments via more elaborate decompositions.
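As a quick illustration of both laws together, consider a hypothetical two-component mixture: let $Y \sim \text{Bernoulli}(p)$ with $X|Y=1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $X|Y=0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$. Iterated conditioning gives

$$\mathbb{E}[X] = p\mu_1 + (1-p)\mu_0, \qquad \text{Var}(X) = \underbrace{p\sigma_1^2 + (1-p)\sigma_0^2}_{\mathbb{E}[\text{Var}(X|Y)]} + \underbrace{p(1-p)(\mu_1 - \mu_0)^2}_{\text{Var}(\mathbb{E}[X|Y])},$$

where the last term is the spread of the two component means around the overall mean.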

Definition: Explained Variance Ratio

The explained variance ratio (or coefficient of determination) is

$$\eta^2 = \frac{\text{Var}(\mathbb{E}[X|Y])}{\text{Var}(X)} = 1 - \frac{\mathbb{E}[\text{Var}(X|Y)]}{\text{Var}(X)}.$$

It measures the fraction of the total variance of $X$ that is "explained" by $Y$. For the linear case, $\eta^2 = \rho_{XY}^2$ (the squared correlation coefficient, i.e., the $R^2$ of regression).

In general, $\eta^2 \geq \rho_{XY}^2$. Equality holds when $\mathbb{E}[X|Y]$ is linear in $Y$, which is the case for jointly Gaussian $(X,Y)$.
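A minimal sketch of the gap between $\eta^2$ and $\rho_{XY}^2$, using a made-up nonlinear example: with $Y \sim \mathcal{N}(0,1)$ and $X = Y^2 + \varepsilon$, the conditional mean $\mathbb{E}[X|Y] = Y^2$ is nonlinear, the correlation $\rho_{XY}$ is essentially zero by symmetry, yet $Y$ explains most of the variance of $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

y = rng.normal(size=n)
x = y**2 + 0.5 * rng.normal(size=n)          # nonlinear dependence plus noise

rho2 = np.corrcoef(x, y)[0, 1]**2            # squared correlation (near 0)

# Estimate eta^2 = Var(E[X|Y]) / Var(X) by binning on Y
bins = np.quantile(y, np.linspace(0, 1, 51))
which = np.digitize(y, bins[1:-1])
cond_means = np.array([x[which == k].mean() for k in range(50)])
weights = np.array([(which == k).mean() for k in range(50)])
eta2 = np.sum(weights * (cond_means - x.mean())**2) / x.var()

print("rho^2  ~", round(rho2, 4))
print("eta^2  ~", round(eta2, 4))
```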

Common Mistake: Conditional Variance Is Not Always Constant

Mistake:

Assuming $\text{Var}(X|Y)$ is a constant (not depending on $Y$). This is true for the Gaussian case but false in general.

Correction:

For non-Gaussian distributions, $\text{Var}(X|Y=y)$ can depend on $y$. For example, if $X|Y=y \sim \text{Poisson}(y)$ and $Y > 0$, then $\text{Var}(X|Y) = Y$, which varies with $Y$. The property that $\text{Var}(X|Y)$ is constant (homoscedasticity) is special to the Gaussian family and a few others.

Key Takeaway

The law of total variance $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)] + \text{Var}(\mathbb{E}[X|Y])$ is the fundamental tool for analyzing how much of the variability in $X$ can be attributed to $Y$. It connects estimation theory (the MMSE is $\mathbb{E}[\text{Var}(X|Y)]$) to statistical analysis (the explained variance ratio $\eta^2$).

Quick Check

If $\mathbb{E}[X|Y] = c$ (constant), what does the law of total variance tell us?

  • $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)]$
  • $\text{Var}(X) = 0$
  • $\text{Var}(X) = \text{Var}(\mathbb{E}[X|Y])$
  • $\text{Var}(X|Y) = \text{Var}(X)$ for all $Y$

Connection to ANOVA

The law of total variance is the population-level version of the analysis of variance (ANOVA) decomposition in statistics. In ANOVA, the total sum of squares is split into "between-group" (explained) and "within-group" (unexplained) components. The law of total variance provides the theoretical justification for this decomposition.
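A minimal sketch of the sample-level analogue, using made-up group data: the total sum of squares splits exactly into between-group and within-group pieces, mirroring $\text{Var}(X) = \text{Var}(\mathbb{E}[X|Y]) + \mathbb{E}[\text{Var}(X|Y)]$ with $Y$ playing the role of the group label.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observations from three groups (Y = group label)
groups = [rng.normal(loc=mu, scale=1.0, size=200) for mu in (0.0, 1.5, 3.0)]
x = np.concatenate(groups)
grand_mean = x.mean()

ss_total = ((x - grand_mean)**2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)   # "explained"
ss_within = sum(((g - g.mean())**2).sum() for g in groups)              # "unexplained"

print(ss_total, ss_between + ss_within)   # equal up to floating-point error
```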