Conditional Distributions
From Events to Random Variables
In Chapter 2 we defined the conditional probability $P(A \mid B)$ for events. Now we extend this idea to random variables: given that $X$ takes a particular value $x$, what is the distribution of $Y$? This is the conditional distribution, and it is the mathematical foundation for Bayesian inference, channel estimation, and signal detection.
Definition: Conditional PMF (Discrete Case)
Conditional PMF (Discrete Case)
For discrete RVs $X$ and $Y$ with joint PMF $p_{X,Y}(x,y)$, the conditional PMF of $Y$ given $X = x$ is
$$p_{Y \mid X}(y \mid x) = \frac{p_{X,Y}(x,y)}{p_X(x)},$$
defined for all $x$ with $p_X(x) > 0$.
For each fixed $x$, the function $y \mapsto p_{Y \mid X}(y \mid x)$ is a valid PMF: it is non-negative and sums to 1 over $y$. The conditional PMF is simply the $x$-th row of the joint PMF table, normalized by the row sum $p_X(x)$.
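The row-normalization view is easy to check numerically. Below is a minimal sketch using a hypothetical 2×3 joint PMF table (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical joint PMF table for illustration: rows index x, columns index y.
p_xy = np.array([
    [0.10, 0.20, 0.10],   # x = 0
    [0.05, 0.25, 0.30],   # x = 1
])
assert np.isclose(p_xy.sum(), 1.0)   # a valid joint PMF sums to 1

# Marginal of X: row sums.
p_x = p_xy.sum(axis=1)

# Conditional PMF of Y given X = 1: the x = 1 row, normalized by its row sum.
p_y_given_x1 = p_xy[1] / p_x[1]
print(p_y_given_x1)
print(p_y_given_x1.sum())   # sums to 1, as a PMF must
```

Each conditional PMF is just one row of the table rescaled, so conditioning never requires more information than the joint table itself.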
Definition: Conditional PDF (Continuous Case)
Conditional PDF (Continuous Case)
For jointly continuous RVs $(X, Y)$ with joint PDF $f_{X,Y}(x,y)$, the conditional PDF of $Y$ given $X = x$ is
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)},$$
defined for all $x$ with $f_X(x) > 0$.
The conditional CDF is
$$F_{Y \mid X}(y \mid x) = \int_{-\infty}^{y} f_{Y \mid X}(t \mid x)\,dt.$$
The conditioning event $\{X = x\}$ has probability zero for continuous $X$, so the definition is a limit: we condition on the thin strip $\{x \le X \le x + \delta\}$ and let $\delta \to 0$. The resulting object is well-defined as a Radon–Nikodym derivative.
Slicing the Joint Density
Intuitively, conditioning on $X = x$ amounts to "slicing" the joint density at a fixed $x$ and then renormalizing so that the slice integrates to 1. The shape of $f_{Y \mid X}(y \mid x)$ as a function of $y$ is proportional to $f_{X,Y}(x,y)$, but the normalization constant $1/f_X(x)$ ensures it is a proper density.
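The slice-and-renormalize picture can be reproduced numerically. The sketch below (parameter values chosen for illustration) slices a standard bivariate Gaussian at $x_0$ and renormalizes on a grid; for that density, $Y \mid X = x_0$ is known to be $\mathcal{N}(\rho x_0,\, 1 - \rho^2)$, so the numerical mean should land near $\rho x_0$:

```python
import numpy as np

# Standard bivariate Gaussian with correlation rho (illustrative values).
rho = 0.8
x0 = 1.0                      # conditioning value X = x0
y = np.linspace(-4, 4, 2001)
dy = y[1] - y[0]

# Unnormalized slice: f_{X,Y}(x0, y) as a function of y (constants dropped).
slice_xy = np.exp(-(x0**2 - 2*rho*x0*y + y**2) / (2*(1 - rho**2)))

# Renormalize so the slice integrates to 1 -> conditional PDF f_{Y|X}(y|x0).
f_cond = slice_xy / (slice_xy.sum() * dy)

# Numerical conditional mean; theory says E[Y | X = x0] = rho * x0 = 0.8.
mean_num = (y * f_cond).sum() * dy
print(mean_num)
```

Note that the overall constants of the joint density cancel in the normalization, which is exactly why only the shape of the slice matters.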
Conditional PDF as $x$ Varies
[Interactive figure: a slider moves the conditioning value $x$ to show how the conditional density of $Y$ changes shape. The joint density is a bivariate Gaussian with adjustable correlation.]
Theorem: Bayes' Rule for Continuous Random Variables
For jointly continuous $(X, Y)$ with $f_X(x) > 0$ and $f_Y(y) > 0$:
$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)},$$
where $f_Y(y) = \int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x)\, f_X(x)\,dx$.
This is the continuous analogue of Bayes' theorem: the prior density $f_X(x)$ is updated to the posterior density $f_{X \mid Y}(x \mid y)$ via the likelihood $f_{Y \mid X}(y \mid x)$. The denominator $f_Y(y)$ serves as the normalizing constant.
Direct calculation
By definition of conditional PDF:
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)},$$
where the second equality uses $f_{X,Y}(x,y) = f_{Y \mid X}(y \mid x)\, f_X(x)$.
Normalization
The denominator is the law of total probability in continuous form:
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_{-\infty}^{\infty} f_{Y \mid X}(y \mid x)\, f_X(x)\,dx.$$
This ensures $\int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y)\,dx = 1$.
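Continuous Bayes updating can be sketched on a grid. The choices below (a $\mathcal{N}(0,1)$ prior, likelihood $Y \mid X = x \sim \mathcal{N}(x, \sigma^2)$, and a particular observed value) are illustrative assumptions; for this conjugate Gaussian pair the posterior mean is known to be $y_{\text{obs}}/(1+\sigma^2)$, which the numerical normalization should reproduce:

```python
import numpy as np

# Grid-based Bayes update (a sketch; the Gaussian model is for illustration).
x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]
sigma2 = 0.5        # noise variance of the likelihood Y | X = x ~ N(x, sigma2)
y_obs = 1.2         # observed value of Y

prior = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # X ~ N(0, 1)
likelihood = np.exp(-(y_obs - x)**2 / (2 * sigma2))   # f_{Y|X}(y_obs | x), up to a constant

# Posterior = prior * likelihood, normalized; the normalizer plays the role of f_Y(y_obs).
posterior = prior * likelihood
posterior /= posterior.sum() * dx

# Conjugate Gaussian result: posterior mean = y_obs / (1 + sigma2) = 0.8.
post_mean = (x * posterior).sum() * dx
print(post_mean)
```

Because the posterior is renormalized at the end, any constant factors dropped from the likelihood are harmless, mirroring how $f_Y(y)$ acts purely as a normalizing constant.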
Definition: Conditional Expectation
Conditional Expectation
For jointly continuous $(X, Y)$, the conditional expectation of $Y$ given $X = x$ is
$$\mathbb{E}[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\,dy.$$
Viewed as a function of $x$, $g(x) = \mathbb{E}[Y \mid X = x]$ is a real-valued function. The random variable $g(X) = \mathbb{E}[Y \mid X]$ is called the conditional expectation of $Y$ given $X$.
Theorem: Law of Iterated Expectation (Tower Property)
For any random variables $X$ and $Y$ with $\mathbb{E}[|Y|] < \infty$:
$$\mathbb{E}\bigl[\mathbb{E}[Y \mid X]\bigr] = \mathbb{E}[Y].$$
The tower property says: average the conditional averages, weighted by the distribution of what you conditioned on, and you recover the unconditional average. This identity is the workhorse behind performance analysis in wireless: whenever you want to average a rate or an error probability over a fading channel, you first compute the conditional quantity given the channel realization, then average over the channel distribution.
Continuous case
$$\mathbb{E}\bigl[\mathbb{E}[Y \mid X]\bigr] = \int_{-\infty}^{\infty} \mathbb{E}[Y \mid X = x]\, f_X(x)\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\, f_X(x)\,dy\,dx.$$
Simplify
Using $f_{Y \mid X}(y \mid x)\, f_X(x) = f_{X,Y}(x,y)$, the double integral becomes
$$\int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx \right) dy = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = \mathbb{E}[Y].$$
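The identity is easy to spot-check by Monte Carlo with the same two-stage sampling used in the proof. The model below ($X$ uniform on $(0,1)$ and $Y \mid X = x \sim \mathcal{N}(2x, 1)$) is an illustrative assumption, chosen so that $\mathbb{E}[\mathbb{E}[Y \mid X]] = \mathbb{E}[2X] = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two-stage sampling: draw X, then Y | X = x ~ N(2x, 1) (an illustrative model).
x = rng.uniform(0, 1, n)
y = rng.normal(2 * x, 1.0)

# Tower property: E[Y] should match E[E[Y|X]] = E[2X] = 1.
print(y.mean())        # direct average of Y
print((2 * x).mean())  # average of the conditional means
```

Both estimates converge to the same value, which is exactly the two-step "condition, then average" recipe used for fading channels.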
Theorem: Law of Total Variance
For random variables $X$ and $Y$ with $\mathrm{Var}(Y) < \infty$:
$$\mathrm{Var}(Y) = \mathbb{E}\bigl[\mathrm{Var}(Y \mid X)\bigr] + \mathrm{Var}\bigl(\mathbb{E}[Y \mid X]\bigr).$$
The total variance of $Y$ decomposes into two parts: the average of the conditional variances (the "within-group" variability) plus the variance of the conditional means (the "between-group" variability). This identity is used extensively in Bayesian analysis: the posterior variance averages the conditional variance, and the remaining uncertainty comes from not knowing $X$.
Expand using tower property
Recall $\mathrm{Var}(Y) = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2$. Apply the tower property to both terms:
$$\mathrm{Var}(Y) = \mathbb{E}\bigl[\mathbb{E}[Y^2 \mid X]\bigr] - \bigl(\mathbb{E}[\mathbb{E}[Y \mid X]]\bigr)^2.$$
Add and subtract $\mathbb{E}\bigl[(\mathbb{E}[Y \mid X])^2\bigr]$
Let $g(X) = \mathbb{E}[Y \mid X]$. Then
$$\mathrm{Var}(Y) = \Bigl(\mathbb{E}\bigl[\mathbb{E}[Y^2 \mid X]\bigr] - \mathbb{E}[g(X)^2]\Bigr) + \Bigl(\mathbb{E}[g(X)^2] - (\mathbb{E}[g(X)])^2\Bigr).$$
The first term is $\mathbb{E}\bigl[\mathbb{E}[Y^2 \mid X] - g(X)^2\bigr] = \mathbb{E}[\mathrm{Var}(Y \mid X)]$ and the second is $\mathrm{Var}(g(X)) = \mathrm{Var}(\mathbb{E}[Y \mid X])$.
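A Monte Carlo sanity check of the decomposition, using an illustrative model where both terms are known in closed form: with $X \sim \mathrm{Uniform}(0,1)$ and $Y \mid X = x \sim \mathcal{N}(x, x^2)$, we have $\mathbb{E}[\mathrm{Var}(Y \mid X)] = \mathbb{E}[X^2] = 1/3$ and $\mathrm{Var}(\mathbb{E}[Y \mid X]) = \mathrm{Var}(X) = 1/12$, so $\mathrm{Var}(Y) = 5/12$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Illustrative model: X ~ Uniform(0, 1), Y | X = x ~ N(x, x^2) (mean x, std x).
x = rng.uniform(0, 1, n)
y = rng.normal(x, x)

within = (x**2).mean()    # estimate of E[Var(Y|X)] = E[X^2] ~ 1/3
between = x.var()         # estimate of Var(E[Y|X]) = Var(X) ~ 1/12
print(y.var(), within + between)   # both should be close to 5/12
```

The "within" term captures the noise around each conditional mean, while the "between" term captures how much the conditional mean itself moves with $X$.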
Example: Conditional Expectation on the Triangle
Let $(X, Y)$ be uniform on the triangle $\{(x, y) : 0 \le y \le x \le 1\}$ (so $f_{X,Y}(x,y) = 2$ on this region). Compute $\mathbb{E}[X \mid Y = y]$ and verify the tower property.
Conditional PDF
The marginal is $f_Y(y) = 2(1 - y)$ for $0 \le y \le 1$ (computed in Section 7.1). Therefore:
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{2}{2(1-y)} = \frac{1}{1-y}, \qquad y \le x \le 1.$$
Given $Y = y$, $X$ is uniform on $[y, 1]$.
Conditional expectation
$$\mathbb{E}[X \mid Y = y] = \int_y^1 x \cdot \frac{1}{1-y}\,dx = \frac{1 - y^2}{2(1-y)} = \frac{1+y}{2}.$$
Verify tower property
$\mathbb{E}\bigl[\mathbb{E}[X \mid Y]\bigr] = \mathbb{E}\bigl[\frac{1+Y}{2}\bigr] = \frac{1 + \mathbb{E}[Y]}{2}$, with $\mathbb{E}[Y] = \int_0^1 y \cdot 2(1-y)\,dy = 2\bigl(\frac{1}{2} - \frac{1}{3}\bigr) = \frac{1}{3}$, giving $\frac{1 + 1/3}{2} = \frac{2}{3}$. Directly, $\mathbb{E}[X] = \int_0^1 x \cdot 2x\,dx = \frac{2}{3}$. They agree.
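The same numbers can be confirmed by simulation; the sketch below samples the triangle by rejection (keeping uniform points of the unit square with $y \le x$) and checks both sides of the tower property:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Sample (X, Y) uniform on the triangle 0 <= y <= x <= 1 by rejection:
# draw points uniform on the unit square and keep those with y <= x.
u = rng.uniform(0, 1, (2 * n, 2))
pts = u[u[:, 1] <= u[:, 0]][:n]
x, y = pts[:, 0], pts[:, 1]

# E[X | Y = y] = (1 + y)/2, so the tower property gives E[X] = (1 + E[Y])/2.
print(x.mean())              # direct estimate of E[X], ~ 2/3
print((1 + y.mean()) / 2)    # via the conditional mean, also ~ 2/3
```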
Conditional expectation
The expected value of a random variable $Y$ computed under the conditional distribution given $X = x$. The function $g(x) = \mathbb{E}[Y \mid X = x]$, evaluated at the random variable $X$, is itself a random variable $\mathbb{E}[Y \mid X]$.
Related: Joint probability density function
Why This Matters: Tower Property in Fading Channel Analysis
The tower property is the engine behind computing average performance metrics over fading channels. For instance, the average bit error rate of a modulation scheme over a Rayleigh fading channel is computed as $\overline{P}_b = \mathbb{E}_h[P_b(h)]$: first compute the conditional BER $P_b(h)$ given the channel gain $h$ (which is just the AWGN BER at the instantaneous SNR set by $h$), then average over the distribution of $h$. This two-step approach is used throughout Books 1 and FSI.
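A sketch of this two-step computation for BPSK over Rayleigh fading (the average SNR value is an arbitrary choice): the instantaneous SNR is then exponentially distributed, the conditional BER at SNR $\gamma$ is $Q(\sqrt{2\gamma}) = \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{\gamma})$, and the standard closed form $\overline{P}_b = \tfrac{1}{2}\bigl(1 - \sqrt{\bar\gamma/(1+\bar\gamma)}\bigr)$ provides a cross-check:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(3)
n = 200_000
gamma_bar = 5.0   # average SNR (illustrative value)

# Instantaneous SNR over Rayleigh fading is exponential with mean gamma_bar.
gamma = rng.exponential(gamma_bar, n)

# Step 1: conditional BER for BPSK at SNR g is Q(sqrt(2g)) = 0.5 * erfc(sqrt(g)).
ber_cond = np.array([0.5 * erfc(sqrt(g)) for g in gamma])

# Step 2 (tower property): average the conditional BER over the fading distribution.
ber_avg = ber_cond.mean()

# Standard closed form for BPSK over Rayleigh fading.
ber_closed = 0.5 * (1 - sqrt(gamma_bar / (1 + gamma_bar)))
print(ber_avg, ber_closed)   # the two should nearly coincide
```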
Common Mistake: Conditioning on a Zero-Probability Event
Mistake:
Writing $F_{Y \mid X}(y \mid x) = \frac{P(Y \le y,\, X = x)}{P(X = x)}$ for continuous $X$, which gives $\frac{0}{0}$.
Correction:
For continuous $X$, $P(X = x) = 0$, so the ratio is undefined. The conditional CDF is instead defined as the limit of $P(Y \le y \mid x \le X \le x + \delta)$ as $\delta \to 0$, which yields the formula involving the conditional PDF.
Quick Check
Let $\mathbb{E}[X \mid Y] = 3Y$ and $\mathbb{E}[Y] = 2$. What is $\mathbb{E}[X]$?
By the tower property, $\mathbb{E}[X] = \mathbb{E}\bigl[\mathbb{E}[X \mid Y]\bigr] = 3\,\mathbb{E}[Y]$, so $\mathbb{E}[X] = 6$.
Conditional Expectation as Optimal Estimator
The conditional expectation $\hat{Y} = \mathbb{E}[Y \mid X]$ is the minimum mean square error (MMSE) estimator of $Y$ given $X$. That is, among all functions $g$, the choice $g(X) = \mathbb{E}[Y \mid X]$ minimizes $\mathbb{E}\bigl[(Y - g(X))^2\bigr]$. This result, proved in Book FSI, is the theoretical foundation for Bayesian estimation, LMMSE filtering, and Kalman filtering.
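A quick Monte Carlo illustration of the MMSE property for a jointly Gaussian pair (parameters are illustrative): with correlation $\rho$, the conditional mean is $\mathbb{E}[Y \mid X] = \rho X$, and its empirical MSE should beat any competing estimator, such as $g(X) = X$ or ignoring $X$ altogether:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
rho = 0.7

# Jointly Gaussian pair with unit variances and correlation rho; E[Y | X] = rho * X.
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

mse_mmse = np.mean((y - rho * x)**2)   # MMSE estimator: MSE ~ 1 - rho^2
mse_other = np.mean((y - x)**2)        # competing estimator g(X) = X
mse_zero = np.mean(y**2)               # ignore X entirely: MSE ~ Var(Y) = 1
print(mse_mmse, mse_other, mse_zero)
```

The first number is the smallest of the three, consistent with $\mathbb{E}[Y \mid X]$ being the MSE-optimal function of $X$.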