Conditional Expectation: The Deeper View

Why Conditional Expectation Deserves Its Own Chapter

In Chapter 4, we computed conditional expectations $\mathbb{E}[X|Y=y]$ — plugging in a specific observed value $y$ and obtaining a number. That perspective is useful for computation but misses the deeper structure.

The key shift in this chapter: we treat $\mathbb{E}[X|Y]$ as a random variable — a function of the random variable $Y$, not of a particular value $y$. This shift unlocks the tower property, the orthogonality principle, and the entire theory of optimal estimation.

The payoff is immediate: $\mathbb{E}[X|Y]$ turns out to be the best predictor of $X$ given $Y$ in the mean square sense — and understanding why requires thinking of it as a random variable.

Definition: Conditional Expectation as a Random Variable

Let $X$ and $Y$ be random variables with joint density $f_{X,Y}(x,y)$. The conditional expectation of $X$ given $Y$, denoted $\mathbb{E}[X|Y]$, is the random variable defined by

$$\mathbb{E}[X|Y] = g(Y), \quad \text{where} \quad g(y) = \mathbb{E}[X|Y=y] = \int_{-\infty}^{\infty} x \, f(x|y) \, dx.$$

The function $g: \mathbb{R} \to \mathbb{R}$ maps each possible value of $Y$ to the conditional mean of $X$ given that value. Since $Y$ is random, $g(Y)$ is random.

The distinction matters: $\mathbb{E}[X|Y=y]$ is a number (for each fixed $y$), while $\mathbb{E}[X|Y]$ is a random variable (a function of the random $Y$). The former is a function of $y$; the latter is a function of $\omega$ through $Y(\omega)$.
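A minimal numerical sketch of this distinction, under an assumed toy model (not from the text) with $Y \sim \text{Uniform}(0,1)$ and $X \mid Y=y \sim \mathcal{N}(y, 1)$, so that $g(y) = y$ exactly:

```python
# Sketch: g(y) = E[X | Y=y] is a plain number for each fixed y,
# while g(Y) = E[X|Y] is a random variable (one value per realization of Y).
# Toy model (an assumption for illustration): Y ~ Uniform(0,1), X | Y=y ~ Normal(y, 1).
import numpy as np

rng = np.random.default_rng(0)

def g(y, n_samples=100_000):
    """Monte Carlo estimate of E[X | Y=y]: a number for the fixed value y."""
    x = rng.normal(loc=y, scale=1.0, size=n_samples)
    return x.mean()

print(g(0.3))   # a number, close to 0.3

# E[X|Y] = g(Y): evaluating g at random draws of Y gives a random quantity.
y_draws = rng.uniform(0.0, 1.0, size=5)
print([round(g(y), 3) for y in y_draws])   # a different value for each realization of Y
```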

Example: Conditional Expectation for Exponential-Gamma Pair

Let $Y \sim \text{Gamma}(\alpha, \beta)$ and $X \mid Y=y \sim \text{Exp}(y)$. Find $\mathbb{E}[X|Y]$ as a random variable.
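A sketch of the solution, assuming $\text{Exp}(y)$ denotes the exponential distribution with rate $y$ (and hence mean $1/y$): the conditional mean for fixed $y$ is

$$g(y) = \mathbb{E}[X \mid Y=y] = \frac{1}{y}, \qquad \text{so} \qquad \mathbb{E}[X|Y] = g(Y) = \frac{1}{Y}.$$

By the tower property below, $\mathbb{E}[X] = \mathbb{E}[1/Y]$, which equals $\beta/(\alpha-1)$ for $\alpha > 1$ under the rate parameterization of the Gamma distribution.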

Theorem: Tower Property (Law of Iterated Expectations)

For any random variables $X$ and $Y$ with $\mathbb{E}[|X|] < \infty$:

$$\mathbb{E}\bigl[\mathbb{E}[X|Y]\bigr] = \mathbb{E}[X].$$

More generally, if $Y$ is a function of $Z$ (i.e., $Y = h(Z)$), then

$$\mathbb{E}\bigl[\mathbb{E}[X|Z] \,\big|\, Y\bigr] = \mathbb{E}[X|Y].$$

Averaging over $Y$ after conditioning on $Y$ recovers the unconditional average. Refining information (conditioning on more) and then coarsening (averaging out the extra) brings you back to the coarser conditioning.
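A one-line check in the density case (assuming the joint density $f_{X,Y}$ exists, as in the definition above) shows why the first identity holds:

$$\mathbb{E}\bigl[\mathbb{E}[X|Y]\bigr] = \int g(y)\, f_Y(y)\, dy = \iint x\, f(x|y)\, f_Y(y)\, dx\, dy = \iint x\, f_{X,Y}(x,y)\, dx\, dy = \mathbb{E}[X].$$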


Theorem: Properties of Conditional Expectation

Let $X$, $Y$, $Z$ be random variables with finite expectations. Then:

  1. Linearity: $\mathbb{E}[\alpha X + \beta Z \mid Y] = \alpha\,\mathbb{E}[X|Y] + \beta\,\mathbb{E}[Z|Y]$ for constants $\alpha, \beta$.

  2. Pulling out what is known: If $h(Y)$ is a function of $Y$, then $\mathbb{E}[h(Y) \cdot X \mid Y] = h(Y) \cdot \mathbb{E}[X|Y]$.

  3. Independence: If $X$ and $Y$ are independent, then $\mathbb{E}[X|Y] = \mathbb{E}[X]$.

  4. Tower property: $\mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X]$.

  5. Conditional Jensen: If $\varphi$ is convex, then $\varphi(\mathbb{E}[X|Y]) \leq \mathbb{E}[\varphi(X) \mid Y]$.

Properties 1-2 say that conditional expectation behaves like an "expectation operator" in which $Y$ plays the role of a constant. Property 3 says that if $Y$ tells you nothing about $X$, conditioning on $Y$ does not change your estimate. Property 4 is the tower property. Property 5 extends Jensen's inequality to the conditional setting.
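A minimal simulation sketch (my own illustration, assuming the toy model $Y \sim \text{Uniform}(0,2)$ and $X \mid Y=y \sim \mathcal{N}(y^2, 1)$, so that $\mathbb{E}[X|Y] = Y^2$) that checks properties 2 and 4 numerically:

```python
# Numerical check of "pulling out what is known" (property 2) and the tower
# property (property 4) in an assumed toy model where E[X|Y] = Y^2.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

y = rng.uniform(0.0, 2.0, size=n)
x = rng.normal(loc=y**2, scale=1.0, size=n)

# Property 4 (tower): E[E[X|Y]] = E[Y^2] should match E[X]; both are near 4/3 here.
print(np.mean(y**2), np.mean(x))

# Property 2 (pulling out what is known), combined with the tower property:
# E[h(Y) X] = E[h(Y) E[X|Y]] = E[h(Y) Y^2] for, say, h(y) = sin(y).
print(np.mean(np.sin(y) * x), np.mean(np.sin(y) * y**2))   # should agree closely
```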

Quick Check

If $\mathbb{E}[X|Y] = c$ (a constant) for all values of $Y$, what can we conclude?

$X$ and $Y$ are independent

$c = \mathbb{E}[X]$

$X$ is a constant

$\text{Var}(X|Y) = 0$

Example: Conditional Expectation for Jointly Gaussian $(X,Y)$

Let $(X,Y)$ be jointly Gaussian with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation coefficient $\rho$. Find $\mathbb{E}[X|Y]$.
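A sketch of the answer, using the standard bivariate normal conditioning result: the conditional distribution of $X$ given $Y=y$ is Gaussian with a mean that is linear in $y$, so

$$\mathbb{E}[X|Y] = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - \mu_Y).$$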

Key Takeaway

For jointly Gaussian random variables, $\mathbb{E}[X|Y]$ is a linear function of $Y$. Very few distribution families share this exact linearity, and it is a central reason why Gaussian models are so tractable in estimation theory.
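A minimal simulation sketch (arbitrary illustrative parameters, not from the text) comparing binned sample means of $X$ near $Y = y$ with the linear formula above:

```python
# Sketch: estimate E[X | Y ≈ y] by binning samples on Y and compare with
# mu_X + rho * (sigma_X / sigma_Y) * (y - mu_Y). Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -0.5, 2.0, 1.5, 0.7

cov = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2             ]])
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000).T

for y0 in (-2.0, -0.5, 1.0):
    in_bin = np.abs(y - y0) < 0.05                       # samples with Y near y0
    empirical = x[in_bin].mean()                         # Monte Carlo E[X | Y ≈ y0]
    linear = mu_x + rho * (sigma_x / sigma_y) * (y0 - mu_y)
    print(f"y={y0:+.1f}: empirical {empirical:.3f}  vs  linear formula {linear:.3f}")
```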

Common Mistake: $\mathbb{E}[X|Y]$ Is Not a Number

Mistake:

Writing "$\mathbb{E}[X|Y] = 3$" and treating it as a fixed quantity.

Correction:

$\mathbb{E}[X|Y]$ is a random variable. It takes different values for different realizations of $Y$. The statement "$\mathbb{E}[X|Y] = 3$" means that the function $g(y) = \mathbb{E}[X|Y=y]$ happens to equal 3 for all $y$ — which implies $\mathbb{E}[X] = 3$ by the tower property. In most cases, $\mathbb{E}[X|Y]$ varies with $Y$.

Definition: Conditional Expectation for Random Vectors

For random vectors $\mathbf{X} \in \mathbb{R}^n$ and $\mathbf{Y} \in \mathbb{R}^m$, the conditional expectation $\mathbb{E}[\mathbf{X}|\mathbf{Y}]$ is the random vector whose $i$-th component is $\mathbb{E}[X_i|\mathbf{Y}]$:

$$\mathbb{E}[\mathbf{X}|\mathbf{Y}] = \begin{pmatrix} \mathbb{E}[X_1|\mathbf{Y}] \\ \vdots \\ \mathbb{E}[X_n|\mathbf{Y}] \end{pmatrix}.$$

All the properties (linearity, tower, pulling out known, independence) extend component-wise.
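As an illustration, a minimal sketch for the jointly Gaussian vector case (extending the scalar Gaussian example above; the numbers below are arbitrary assumptions), using the standard Gaussian conditioning formula $\mathbb{E}[\mathbf{X}|\mathbf{Y}] = \boldsymbol{\mu}_X + \Sigma_{XY}\Sigma_{YY}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$:

```python
# Sketch: vector-valued conditional expectation for a jointly Gaussian pair
# (X in R^2, Y in R^2) via E[X|Y] = mu_X + Sigma_XY @ inv(Sigma_YY) @ (Y - mu_Y).
# All numerical values are arbitrary illustrative choices.
import numpy as np

mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0, -1.0])
sigma_xy = np.array([[0.8, 0.2],
                     [0.1, 0.5]])        # Cov(X, Y)
sigma_yy = np.array([[1.0, 0.3],
                     [0.3, 2.0]])        # Cov(Y, Y)

def conditional_mean(y):
    """E[X | Y=y] evaluated at a realization y; applied component-wise to the
    random vector Y, this function defines the random vector E[X|Y]."""
    return mu_x + sigma_xy @ np.linalg.solve(sigma_yy, y - mu_y)

print(conditional_mean(np.array([2.5, -0.5])))   # one realization of E[X|Y]
```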

Conditional Density $f(x|y)$ and $\mathbb{E}[X|Y=y]$ for Jointly Gaussian $(X,Y)$

Visualize the joint Gaussian density, a slice at $Y=y$, and the conditional mean $\mathbb{E}[X|Y=y]$ as $y$ varies. The red line traces the conditional mean across all $y$ values.

Adjustable parameters: the correlation coefficient $\rho$ (default 0.7) and the conditioning value of $Y$ (default 1).

Historical Note: Kolmogorov and the Measure-Theoretic Foundation


The rigorous definition of conditional expectation as a random variable was established by Andrey Kolmogorov in his 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung. Before Kolmogorov, conditional expectation was defined only for discrete random variables or via Bayes' rule when densities exist. Kolmogorov's approach — defining $\mathbb{E}[X|\mathcal{F}]$ as a Radon-Nikodym derivative — extended the concept to arbitrary $\sigma$-algebras, laying the foundation for martingale theory and modern stochastic processes.

Conditional Expectation

The random variable $\mathbb{E}[X|Y] = g(Y)$ where $g(y) = \mathbb{E}[X|Y=y]$. It is the best predictor of $X$ given $Y$ in the mean square error sense.

Related: Minimum Mean Square Error (MMSE) Estimator, Tower Property

Tower Property

The identity $\mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X]$, also called the law of iterated expectations or the smoothing property. Averaging the conditional expectation over $Y$ recovers the unconditional expectation.

Related: Conditional Expectation