Conditional Expectation: The Deeper View
Why Conditional Expectation Deserves Its Own Chapter
In Chapter 4, we computed conditional expectations of the form $E[X \mid Y = y]$ — plugging in a specific observed value $y$ and obtaining a number. That perspective is useful for computation but misses the deeper structure.
The key shift in this chapter: we treat $E[X \mid Y]$ as a random variable — a function of the random variable $Y$, not of a particular value $y$. This shift unlocks the tower property, the orthogonality principle, and the entire theory of optimal estimation.
The payoff is immediate: $E[X \mid Y]$ turns out to be the best predictor of $X$ given $Y$ in the mean square sense — and understanding why requires thinking of it as a random variable.
Definition: Conditional Expectation as a Random Variable
Let $X$ and $Y$ be random variables with joint density $f_{X,Y}(x, y)$. The conditional expectation of $X$ given $Y$, denoted $E[X \mid Y]$, is the random variable defined by
$$E[X \mid Y] = g(Y), \qquad \text{where } g(y) = E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y) \, dx.$$
The function $g$ maps each possible value $y$ of $Y$ to the conditional mean of $X$ given that value. Since $Y$ is random, $g(Y)$ is random.
The distinction matters: $E[X \mid Y = y]$ is a number (for each fixed $y$), while $E[X \mid Y]$ is a random variable (a function of the random $Y$). The former is a function of $y$; the latter is a function of $Y$ through $g$.
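As a quick illustration, the simulation sketch below uses a made-up toy setup (not from the text): $Y$ uniform on $\{1, 2, 3\}$ and $X \mid Y = y \sim \text{Expo}(y)$, so that $g(y) = 1/y$. Each draw of $Y$ yields one realization of the random variable $g(Y) = E[X \mid Y]$, and averaging those realizations matches the average of $X$:

```python
import random

random.seed(0)

def g(y):
    """g(y) = E[X | Y = y] = 1/y when X | Y = y ~ Expo(rate y)."""
    return 1.0 / y

n = 200_000
ys = [random.choice([1, 2, 3]) for _ in range(n)]   # Y uniform on {1, 2, 3}
xs = [random.expovariate(y) for y in ys]            # X | Y = y ~ Expo(y)

# E[X | Y] is the random variable g(Y): one value per realization of Y.
cond_exp = [g(y) for y in ys]

# Tower-property check: the average of g(Y) should match the average of X.
mean_cond = sum(cond_exp) / n
mean_x = sum(xs) / n
exact = (1 / 1 + 1 / 2 + 1 / 3) / 3   # E[g(Y)] = (1 + 1/2 + 1/3)/3
print(mean_cond, mean_x, exact)
```

Note that `cond_exp` takes only three distinct values, one per value of $Y$ — a concrete reminder that $E[X \mid Y]$ is a random variable, not a number.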
Example: Conditional Expectation for Exponential-Gamma Pair
Let $\Lambda \sim \text{Gamma}(\alpha, \beta)$ and $X \mid \Lambda = \lambda \sim \text{Expo}(\lambda)$. Find $E[X \mid \Lambda]$ as a random variable.
Compute the conditional mean
For fixed $\lambda$, if $X \mid \Lambda = \lambda \sim \text{Expo}(\lambda)$, then $E[X \mid \Lambda = \lambda] = 1/\lambda$.
Express as a random variable
Therefore $E[X \mid \Lambda] = 1/\Lambda$. This is a random variable because $\Lambda$ is random. Its distribution is determined by the distribution of $\Lambda$.
Verify via tower property
$E[X] = E[E[X \mid \Lambda]] = E[1/\Lambda]$. For $\Lambda \sim \text{Gamma}(\alpha, \beta)$ with rate $\beta$ and $\alpha > 1$, $E[1/\Lambda] = \beta/(\alpha - 1)$.
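This example can be checked by simulation. A minimal sketch, assuming the arbitrary parameter choices $\alpha = 3$ and rate $\beta = 2$ (note that `random.gammavariate` takes shape and scale, so the scale argument is $1/\beta$):

```python
import random

random.seed(1)

alpha, beta = 3.0, 2.0          # shape alpha, rate beta (needs alpha > 1)
n = 300_000

# random.gammavariate(shape, scale); scale = 1/rate for a rate parameterization.
lams = [random.gammavariate(alpha, 1.0 / beta) for _ in range(n)]
xs = [random.expovariate(lam) for lam in lams]   # X | Lambda ~ Expo(Lambda)

mean_x = sum(xs) / n
mean_inv_lam = sum(1.0 / lam for lam in lams) / n   # sample mean of E[X | Lambda]
exact = beta / (alpha - 1)       # E[1/Lambda] = beta/(alpha - 1)
print(mean_x, mean_inv_lam, exact)
```

All three printed values should agree to Monte Carlo accuracy, confirming both $E[X \mid \Lambda] = 1/\Lambda$ and the tower property.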
Theorem: Tower Property (Law of Iterated Expectations)
For any random variables $X$ and $Y$ with $E[|X|] < \infty$:
$$E[E[X \mid Y]] = E[X].$$
More generally, if $Z$ is a function of $Y$ (i.e., $Z = h(Y)$), then
$$E\big[E[X \mid Y] \,\big|\, Z\big] = E[X \mid Z].$$
Averaging $E[X \mid Y]$ over $Y$ recovers the unconditional average of $X$. Refining information (conditioning on more) and then coarsening (averaging out the extra) brings you back to the coarser conditioning.
Prove the basic form
$$E[E[X \mid Y]] = \int_{-\infty}^{\infty} E[X \mid Y = y] \, f_Y(y) \, dy = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y) \, dx \right) f_Y(y) \, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \, f_{X,Y}(x, y) \, dx \, dy = E[X].$$
Intuition for the general form
The general form says: if $Y$ carries more information than $Z$ (because $Z = h(Y)$), then conditioning on $Y$ and averaging over the "extra" information in $Y$ beyond $Z$ recovers the conditional expectation given $Z$ alone. The formal proof uses the defining property of conditional expectation (as a Radon-Nikodym derivative) and the tower property of $\sigma$-algebras.
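The basic form can be verified exactly on a small discrete joint pmf. The table below is hypothetical, chosen only so the probabilities sum to 1:

```python
# Joint pmf of (X, Y) on a small grid (hypothetical values summing to 1).
pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

ys = {y for (_, y) in pmf}
# Marginal P(Y = y).
p_y = {y: sum(p for (_, y2), p in pmf.items() if y2 == y) for y in ys}

# g(y) = E[X | Y = y] = sum_x x * P(X = x | Y = y).
g = {y: sum(x * p for (x, y2), p in pmf.items() if y2 == y) / p_y[y] for y in ys}

# Tower property: E[E[X|Y]] = sum_y g(y) P(Y = y) should equal E[X].
lhs = sum(g[y] * p_y[y] for y in ys)
e_x = sum(x * p for (x, _), p in pmf.items())
print(lhs, e_x)
```

The two printed numbers agree exactly (up to floating-point error), mirroring the integral computation above in discrete form.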
Theorem: Properties of Conditional Expectation
Let $X$, $Y$, $Z$ be random variables with finite expectations. Then:
1. Linearity: $E[aX + bZ \mid Y] = a\,E[X \mid Y] + b\,E[Z \mid Y]$ for constants $a, b$.
2. Pulling out what is known: If $h(Y)$ is a function of $Y$, then $E[h(Y)\,X \mid Y] = h(Y)\,E[X \mid Y]$.
3. Independence: If $X$ and $Y$ are independent, then $E[X \mid Y] = E[X]$.
4. Tower property: $E[E[X \mid Y]] = E[X]$.
5. Conditional Jensen: If $g$ is convex, then $g(E[X \mid Y]) \le E[g(X) \mid Y]$.
Properties 1-2 say that conditional expectation behaves like an "expectation operator" in which any function of $Y$ plays the role of a constant. Property 3 says that if $Y$ tells you nothing about $X$, conditioning on $Y$ does not improve your estimate. Property 4 is the tower property. Property 5 extends Jensen's inequality to the conditional setting.
Proof of linearity
By definition, $E[aX + bZ \mid Y = y] = a\,E[X \mid Y = y] + b\,E[Z \mid Y = y]$, since the conditional expectation given $Y = y$ is an integral against the conditional distribution and integration is linear. As this holds for every $y$, the identity holds between random variables.
Proof of pulling out known
$E[h(Y)\,X \mid Y = y] = \int h(y)\, x\, f_{X \mid Y}(x \mid y)\, dx = h(y) \int x\, f_{X \mid Y}(x \mid y)\, dx = h(y)\,E[X \mid Y = y]$. Since $h(y)$ does not depend on the integration variable $x$, it factors out.
Proof of independence case
If $X$ and $Y$ are independent, then $f_{X \mid Y}(x \mid y) = f_X(x)$, so $E[X \mid Y = y] = E[X]$ for all $y$. Hence $E[X \mid Y] = E[X]$ (a constant).
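A simulation sketch of "pulling out what is known," combined with the tower property, reusing the same made-up toy setup as earlier ($Y$ uniform on $\{1, 2, 3\}$, $X \mid Y = y \sim \text{Expo}(y)$) with the arbitrary choice $h(y) = y^2$: the properties predict $E[Y^2 X] = E[Y^2\,E[X \mid Y]] = E[Y^2/Y] = E[Y]$.

```python
import random

random.seed(2)
n = 400_000

# Y uniform on {1, 2, 3}; X | Y = y ~ Expo(y), so E[X | Y] = 1/Y.
ys = [random.choice([1, 2, 3]) for _ in range(n)]
xs = [random.expovariate(y) for y in ys]

# Pulling out what is known + tower property with h(y) = y^2:
# E[Y^2 X] = E[Y^2 * E[X|Y]] = E[Y^2 / Y] = E[Y].
lhs = sum(y * y * x for y, x in zip(ys, xs)) / n
rhs = sum(ys) / n                      # sample mean of Y; E[Y] = 2 exactly
print(lhs, rhs)
```

The two averages agree to Monte Carlo accuracy, even though $Y^2$ and $X$ are far from independent.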
Quick Check
If $E[X \mid Y = y] = c$ (a constant) for all values of $y$, what can we conclude?
$X$ and $Y$ are independent
$E[X \mid Y]$ is a constant
By the tower property: $E[X] = E[E[X \mid Y]] = E[c] = c$. Note that a constant conditional mean does not imply independence; for example, the conditional variance of $X$ may still depend on $Y$.
Example: Conditional Expectation for Jointly Gaussian
Let $X$ and $Y$ be jointly Gaussian with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation coefficient $\rho$. Find $E[X \mid Y]$.
Recall the conditional distribution
For jointly Gaussian random variables, the conditional distribution of $X$ given $Y = y$ is Gaussian with mean
$$E[X \mid Y = y] = \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$$
and variance $\sigma_X^2 (1 - \rho^2)$.
Express as a random variable
Therefore
$$E[X \mid Y] = \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(Y - \mu_Y).$$
This is a linear function of $Y$. This is a special property of the Gaussian distribution — for non-Gaussian pairs, $E[X \mid Y]$ is generally a nonlinear function of $Y$.
Check the tower property
$E[E[X \mid Y]] = \mu_X + \rho \frac{\sigma_X}{\sigma_Y}(E[Y] - \mu_Y) = \mu_X = E[X]$. Checks out.
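A simulation check of the linear conditional mean; the parameter values below are arbitrary choices for illustration. Jointly Gaussian pairs are generated from two independent standard normals, and the least-squares slope of $X$ on $Y$ should approach the conditional-mean coefficient $\rho\,\sigma_X/\sigma_Y$:

```python
import math
import random

random.seed(3)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 0.5, 0.6
n = 200_000

xs, ys = [], []
for _ in range(n):
    # Standard construction of a correlated Gaussian pair from iid N(0, 1).
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y = mu_y + sig_y * z1
    x = mu_x + sig_x * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    xs.append(x)
    ys.append(y)

# The least-squares slope of X on Y estimates the coefficient of the
# conditional-mean line, rho * sig_x / sig_y.
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_y = sum((y - my) ** 2 for y in ys) / n
slope = cov / var_y
print(slope, rho * sig_x / sig_y)   # theoretical slope: 0.6 * 2.0 / 0.5 = 2.4
```

Because the true conditional mean is exactly linear here, the linear regression is not merely an approximation: it recovers $E[X \mid Y]$ itself as $n$ grows.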
Key Takeaway
For jointly Gaussian random variables, $E[X \mid Y]$ is an exactly linear function of $Y$. This linearity is a hallmark of the Gaussian family, and it is the reason why Gaussian models are so tractable in estimation theory.
Common Mistake: $E[X \mid Y]$ Is Not a Number
Mistake:
Writing "$E[X \mid Y] = 3$" and treating it as a fixed quantity.
Correction:
$E[X \mid Y]$ is a random variable. It takes different values for different realizations of $Y$. The statement "$E[X \mid Y] = 3$" means that the function $g(y) = E[X \mid Y = y]$ happens to equal 3 for all $y$ — which implies $E[X] = 3$ by the tower property. In most cases, $E[X \mid Y]$ varies with $Y$.
Definition: Conditional Expectation for Random Vectors
For random vectors $\mathbf{X} = (X_1, \dots, X_n)$ and $\mathbf{Y}$, the conditional expectation $E[\mathbf{X} \mid \mathbf{Y}]$ is the random vector whose $i$-th component is $E[X_i \mid \mathbf{Y}]$:
$$E[\mathbf{X} \mid \mathbf{Y}] = \big( E[X_1 \mid \mathbf{Y}], \dots, E[X_n \mid \mathbf{Y}] \big).$$
All the properties (linearity, tower, pulling out known, independence) extend component-wise.
Conditional Density and $E[X \mid Y]$ for Jointly Gaussian
[Interactive figure: the joint Gaussian density, a slice at $Y = y$, and the conditional mean as $y$ varies; the red line traces the conditional mean across all values of $y$. Adjustable parameters: correlation coefficient $\rho$ and conditioning value of $Y$.]
Historical Note: Kolmogorov and the Measure-Theoretic Foundation
1933: The rigorous definition of conditional expectation as a random variable was established by Andrey Kolmogorov in his 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung. Before Kolmogorov, conditional expectation was defined only for discrete random variables or via Bayes' rule when densities exist. Kolmogorov's approach — defining $E[X \mid \mathcal{G}]$ as a Radon-Nikodym derivative — extended the concept to arbitrary $\sigma$-algebras, laying the foundation for martingale theory and modern stochastic processes.
Conditional Expectation
The random variable $g(Y)$, where $g(y) = E[X \mid Y = y]$. It is the best predictor of $X$ given $Y$ in the mean square error sense.
Related: Minimum Mean Square Error (MMSE) Estimator, Tower Property
Tower Property
The identity $E[E[X \mid Y]] = E[X]$, also called the law of iterated expectations or the smoothing property. Averaging the conditional expectation over $Y$ recovers the unconditional expectation.
Related: Conditional Expectation