Conditional Distributions

From Events to Random Variables

In Chapter 2 we defined $\mathbb{P}(A \mid B)$ for events. Now we extend this idea to random variables: given that $X$ takes a particular value $x$, what is the distribution of $Y$? This is the conditional distribution, and it is the mathematical foundation for Bayesian inference, channel estimation, and signal detection.

Definition: Conditional PMF (Discrete Case)

For discrete RVs $X$ and $Y$ with joint PMF $P_{X,Y}$, the conditional PMF of $Y$ given $X = x_i$ is

$$P_{Y|X}(y_j \mid x_i) = \frac{P_{X,Y}(x_i, y_j)}{P_X(x_i)},$$

defined for all $x_i$ with $P_X(x_i) > 0$.

For each fixed $x_i$, the function $y_j \mapsto P_{Y|X}(y_j \mid x_i)$ is a valid PMF: it is non-negative and sums to 1 over $j$. The conditional PMF is simply the $i$-th row of the joint PMF table, normalized by the row sum $P_X(x_i)$.
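
A minimal sketch of this row-normalization in Python/NumPy (the joint table below is invented purely for illustration):

```python
import numpy as np

# Hypothetical joint PMF table: rows index x_i, columns index y_j
P_XY = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

P_X = P_XY.sum(axis=1)             # marginal PMF of X (row sums)

# Conditional PMF of Y given X = x_i: divide each row by its row sum
P_Y_given_X = P_XY / P_X[:, None]

print(P_Y_given_X)
print(P_Y_given_X.sum(axis=1))     # each row sums to 1 -> valid PMFs
```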

Definition: Conditional PDF (Continuous Case)

For jointly continuous RVs $(X, Y)$ with joint PDF $f_{X,Y}$, the conditional PDF of $Y$ given $X = x$ is

$$f(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$

defined for all $x$ with $f_X(x) > 0$.

The conditional CDF is

$$F_{Y|X}(y \mid x) = \int_{-\infty}^{y} f(v \mid x)\,dv = \int_{-\infty}^{y} \frac{f_{X,Y}(x, v)}{f_X(x)}\,dv.$$

The conditioning event $\{X = x\}$ has probability zero for continuous $X$, so the definition is a limit: we condition on the thin strip $\{x < X \le x + dx\}$ and let $dx \to 0$. The resulting object is well-defined as a Radon-Nikodym derivative.

Slicing the Joint Density

Intuitively, conditioning on $X = x$ amounts to "slicing" the joint density $f_{X,Y}(x, y)$ at a fixed $x$ value and then renormalizing so that the slice integrates to 1. The shape of $f(y \mid x)$ as a function of $y$ is proportional to $f_{X,Y}(x, y)$, but the normalization constant $f_X(x)$ ensures it is a proper density.
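
A small numerical sketch of this slicing, assuming a standard bivariate Gaussian with an illustrative correlation $\rho$ (for which the conditional is known to be $\mathcal{N}(\rho x_0,\, 1 - \rho^2)$):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho, x0 = 0.7, 1.0                      # illustrative correlation and slice location
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

y = np.linspace(-5, 5, 2001)
dy = y[1] - y[0]

# Slice the joint density along x = x0 ...
joint_slice = mvn.pdf(np.column_stack([np.full_like(y, x0), y]))

# ... and renormalize so the slice integrates to 1: this is f(y | x0)
f_cond = joint_slice / (joint_slice.sum() * dy)

# Compare with the known conditional N(rho * x0, 1 - rho^2)
exact = norm.pdf(y, loc=rho * x0, scale=np.sqrt(1 - rho**2))
print(np.max(np.abs(f_cond - exact)))   # ~0 up to grid error
```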

[Interactive figure: Conditional PDF $f(y \mid x)$ as $x$ varies. A slider moves the conditioning value $x$ to show how the conditional density of $Y$ changes shape; the joint density is a bivariate Gaussian with adjustable correlation.]

Theorem: Bayes' Rule for Continuous Random Variables

For jointly continuous $(X, Y)$ with $f_X(x) > 0$ and $f_Y(y) > 0$:

$$f_{X|Y}(x \mid y) = \frac{f(y \mid x)\,f_X(x)}{f_Y(y)},$$

where $f_Y(y) = \int_{-\infty}^{\infty} f(y \mid x)\,f_X(x)\,dx$.

This is the continuous analogue of Bayes' theorem: the prior density $f_X(x)$ is updated to the posterior density $f_{X|Y}(x \mid y)$ via the likelihood $f(y \mid x)$. The denominator $f_Y(y)$ serves as the normalizing constant.
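
A grid-based sketch of this update, with a toy model chosen purely for illustration (a $\mathcal{N}(0, 1)$ prior on $X$ and likelihood $Y \mid X = x \sim \mathcal{N}(x, 0.8^2)$):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]

prior = norm.pdf(x)                              # f_X(x): N(0, 1), assumed
y_obs = 1.5                                      # a hypothetical observation of Y
likelihood = norm.pdf(y_obs, loc=x, scale=0.8)   # f(y_obs | x)

# Normalizing constant f_Y(y_obs) by numerical integration, then Bayes' rule
f_y = np.sum(likelihood * prior) * dx
posterior = likelihood * prior / f_y

print(np.sum(posterior) * dx)                    # -> 1.0: a valid density
print(x[np.argmax(posterior)])                   # posterior mode
```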

Definition: Conditional Expectation

For jointly continuous $(X, Y)$, the conditional expectation of $X$ given $Y = y$ is

$$\mathbb{E}[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid y)\,dx.$$

Viewed as a function of $y$, $g(y) = \mathbb{E}[X \mid Y = y]$ is an ordinary real-valued function. The random variable $g(Y) = \mathbb{E}[X \mid Y]$ is called the conditional expectation of $X$ given $Y$.
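
As an illustrative numerical check (standard bivariate Gaussian with a made-up $\rho$, for which $\mathbb{E}[X \mid Y = y] = \rho y$ is known), $g(y)$ can be computed by integrating $x$ against the normalized slice:

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.6                                  # illustrative correlation
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x = np.linspace(-6, 6, 3001)
for y0 in (-1.0, 0.0, 2.0):
    slice_xy = mvn.pdf(np.column_stack([x, np.full_like(x, y0)]))
    f_cond = slice_xy / slice_xy.sum()     # f(x | y0) on the grid (normalized)
    g = np.sum(x * f_cond)                 # E[X | Y = y0]
    print(f"y = {y0:+.1f}:  E[X|Y=y] = {g:+.4f}   (rho * y = {rho * y0:+.4f})")
```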

Theorem: Law of Iterated Expectation (Tower Property)

For any random variables $X$ and $Y$ with $\mathbb{E}[|X|] < \infty$:

$$\mathbb{E}[X] = \mathbb{E}\bigl[\mathbb{E}[X \mid Y]\bigr].$$

The tower property says: average the conditional averages, weighted by the distribution of what you conditioned on, and you recover the unconditional average. This identity is the workhorse behind performance analysis in wireless: whenever you want to average a rate or an error probability over a fading channel, you first compute the conditional quantity given the channel realization, then average over the channel distribution.
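
A quick Monte Carlo sanity check of the identity, using an arbitrary hierarchical toy model: $X \sim \text{Uniform}[0, 1]$ and $Y \mid X = x \sim \text{Binomial}(10, x)$, so $\mathbb{E}[Y \mid X] = 10X$ and $\mathbb{E}[Y] = 10\,\mathbb{E}[X] = 5$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Hierarchical model (illustrative): first draw X, then Y given X
x = rng.uniform(0.0, 1.0, n)
y = rng.binomial(10, x)

print(y.mean())           # direct Monte Carlo estimate of E[Y]    -> ~5
print((10 * x).mean())    # average of conditional means E[Y | X]  -> ~5
```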

Theorem: Law of Total Variance

For random variables $X$ and $Y$ with $\mathbb{E}[X^2] < \infty$:

$$\text{Var}(X) = \mathbb{E}\bigl[\text{Var}(X \mid Y)\bigr] + \text{Var}\bigl(\mathbb{E}[X \mid Y]\bigr).$$

The total variance of $X$ decomposes into two parts: the average of the conditional variances (the "within-group" variability) plus the variance of the conditional means (the "between-group" variability). This identity is used extensively in Bayesian analysis: the first term captures the variability that remains after conditioning, and the second captures the uncertainty that comes from not knowing $Y$.
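
The same toy model illustrates the decomposition; here $\text{Var}(Y \mid X) = 10X(1 - X)$ and $\mathbb{E}[Y \mid X] = 10X$ are available in closed form, so both terms can be estimated directly:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

x = rng.uniform(0.0, 1.0, n)        # X ~ Uniform[0, 1]
y = rng.binomial(10, x)             # Y | X = x ~ Binomial(10, x)

within = np.mean(10 * x * (1 - x))  # E[Var(Y | X)]: within-group variability
between = np.var(10 * x)            # Var(E[Y | X]): between-group variability

print(within + between)             # -> ~10
print(np.var(y))                    # -> ~10, matching Var(Y)
```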

Example: Conditional Expectation on the Triangle

Let $(X, Y)$ be uniform on the triangle $\{(x, y) : 0 \le y \le x \le 1\}$ (so $f_{X,Y}(x, y) = 2$ on this region). Compute $\mathbb{E}[Y \mid X = x]$ and verify the tower property.
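
Solution sketch. The marginal of $X$ is $f_X(x) = \int_0^x 2\,dy = 2x$ for $0 \le x \le 1$, so

$$f(y \mid x) = \frac{2}{2x} = \frac{1}{x}, \qquad 0 \le y \le x,$$

i.e., $Y \mid X = x \sim \text{Uniform}[0, x]$ and $\mathbb{E}[Y \mid X = x] = x/2$. The tower property then gives

$$\mathbb{E}[Y] = \mathbb{E}\!\left[\frac{X}{2}\right] = \frac{1}{2}\int_0^1 x \cdot 2x\,dx = \frac{1}{3},$$

which matches the direct computation $\mathbb{E}[Y] = \int_0^1\!\int_0^x 2y\,dy\,dx = \int_0^1 x^2\,dx = 1/3$.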


Why This Matters: Tower Property in Fading Channel Analysis

The tower property is the engine behind computing average performance metrics over fading channels. For instance, the average bit error rate of a modulation scheme over a Rayleigh fading channel is computed as $\mathbb{E}[P_e] = \mathbb{E}[\mathbb{E}[P_e \mid H]]$: first compute the conditional BER given the channel gain $H = h$ (which is just the AWGN BER at SNR $|h|^2 \cdot \text{SNR}$), then average over the distribution of $|H|^2$. This two-step approach is used throughout Books 1 and FSI.
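
As a hedged illustration of the two-step recipe, the sketch below assumes BPSK, where the conditional BER at instantaneous SNR $\gamma = |h|^2\,\overline{\gamma}$ is $Q(\sqrt{2\gamma})$ and the Rayleigh-averaged BER has the standard closed form $\tfrac{1}{2}\bigl(1 - \sqrt{\overline{\gamma}/(1 + \overline{\gamma})}\bigr)$:

```python
import numpy as np
from scipy.special import erfc

def q_func(x):
    # Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(x / np.sqrt(2))

rng = np.random.default_rng(0)
snr = 10 ** (10.0 / 10)                  # average SNR (10 dB), gamma-bar

# Rayleigh fading: h ~ CN(0, 1), so the power gain |h|^2 is Exp(1)
h2 = rng.exponential(1.0, size=1_000_000)

# Tower property: average the conditional (AWGN) BER over |h|^2
ber_mc = q_func(np.sqrt(2 * snr * h2)).mean()

# Closed-form average BER for BPSK over Rayleigh fading
ber_exact = 0.5 * (1 - np.sqrt(snr / (1 + snr)))

print(f"Monte Carlo: {ber_mc:.5f}   closed form: {ber_exact:.5f}")
```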

Common Mistake: Conditioning on a Zero-Probability Event

Mistake:

Writing $\mathbb{P}(Y \le y \mid X = x) = \mathbb{P}(Y \le y, X = x) / \mathbb{P}(X = x)$ for continuous $X$, which gives $0/0$.

Correction:

For continuous $X$, $\mathbb{P}(X = x) = 0$, so the ratio is undefined. The conditional CDF is instead defined as the limit of $\mathbb{P}(Y \le y \mid x < X \le x + dx)$ as $dx \to 0$, which yields the formula involving the conditional PDF.
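
The limiting definition can also be checked numerically. In the sketch below (bivariate Gaussian with illustrative parameters), the strip-conditioned probability approaches the exact conditional CDF as $dx$ shrinks:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho, x0, y0 = 0.6, 0.5, 0.2             # illustrative correlation and target point
n = 5_000_000

# Standard bivariate Gaussian, sampled via its conditional structure
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Exact limit: Y | X = x0 ~ N(rho * x0, 1 - rho^2)
exact = norm.cdf((y0 - rho * x0) / np.sqrt(1 - rho**2))

# Condition on the strip {x0 < X <= x0 + dx} and shrink dx
for dx in (0.5, 0.1, 0.02):
    strip = (x > x0) & (x <= x0 + dx)
    print(f"dx = {dx:4.2f}:  {np.mean(y[strip] <= y0):.4f}   (limit: {exact:.4f})")
```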

Quick Check

Let $X \sim \text{Exp}(1)$ and $Y \mid X = x \sim \text{Uniform}[0, x]$. What is $\mathbb{E}[Y]$?

(a) $1/2$

(b) $1$

(c) $1/4$

(d) $2$
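
Answer: (a) $1/2$. By the tower property, $\mathbb{E}[Y] = \mathbb{E}[\mathbb{E}[Y \mid X]] = \mathbb{E}[X/2] = \tfrac{1}{2}\,\mathbb{E}[X] = \tfrac{1}{2}$, since $\mathbb{E}[X] = 1$ for $X \sim \text{Exp}(1)$.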

🔧 Engineering Note

Conditional Expectation as Optimal Estimator

The conditional expectation $\mathbb{E}[X \mid Y = y]$ is the minimum mean square error (MMSE) estimator of $X$ given $Y = y$. That is, among all functions $g(Y)$, the choice $g(y) = \mathbb{E}[X \mid Y = y]$ minimizes $\mathbb{E}[(X - g(Y))^2]$. This result, proved in Book FSI, is the theoretical foundation for Bayesian estimation, LMMSE filtering, and Kalman filtering.
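
A numerical check of this optimality claim under a simple (assumed) Gaussian signal-plus-noise model: for $X \sim \mathcal{N}(0, \sigma_x^2)$ and $Y = X + N$ with independent $N \sim \mathcal{N}(0, \sigma_n^2)$, the conditional mean is the linear shrinkage $\mathbb{E}[X \mid Y] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_n^2}\,Y$, and its MSE should beat any competing $g(Y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_x, sigma_n = 1.0, 0.5               # illustrative signal/noise std devs
n = 1_000_000

x = sigma_x * rng.standard_normal(n)      # signal X
y = x + sigma_n * rng.standard_normal(n)  # observation Y = X + N

# MMSE estimator for this jointly Gaussian model: linear shrinkage of Y
w = sigma_x**2 / (sigma_x**2 + sigma_n**2)
estimators = {
    "E[X|Y] (MMSE)": w * y,
    "g(Y) = Y":      y,
    "g(Y) = 0.5 Y":  0.5 * y,
}
for name, xhat in estimators.items():
    print(f"{name:>14}: MSE = {np.mean((x - xhat) ** 2):.4f}")
```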