Joint Typicality

The Key Question: Are These Sequences Related?

In channel coding, the decoder receives $\mathbf{y}$ and must decide which codeword $\mathbf{x}$ was sent. The natural test is: are $\mathbf{x}$ and $\mathbf{y}$ "jointly typical" — do they look like they came from the joint distribution $P_{XY}$? If the empirical joint distribution of the pair $(\mathbf{x}, \mathbf{y})$ is close to $P_{XY}$, we declare them related. If not, the codeword probably was not the one sent. This simple test — joint typicality decoding — is the basis of all random coding achievability proofs.

Definition: Jointly Typical Set

The jointly typical set $\mathcal{T}_\epsilon^{(n)}(X,Y)$ is:

$$\mathcal{T}_\epsilon^{(n)}(X,Y) = \left\{(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n : \left|\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) - P_{XY}(a,b)\right| \leq \epsilon \cdot P_{XY}(a,b) \text{ for all } (a,b)\right\}.$$

The conditional typical set is: given a fixed $\mathbf{x}$, $\mathcal{T}_\epsilon^{(n)}(Y|\mathbf{x}) = \{\mathbf{y} : (\mathbf{x}, \mathbf{y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)\}$. Its size is approximately $2^{nH(Y|X)}$.
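The membership test in the definition is straightforward to compute from empirical pair counts. Below is a minimal sketch (the function name `is_jointly_typical` and the example distribution are illustrative, not from any library):

```python
from collections import Counter

def is_jointly_typical(x, y, P_xy, eps):
    """Robust joint typicality: |P_hat(a,b) - P(a,b)| <= eps * P(a,b) for all (a,b)."""
    n = len(x)
    counts = Counter(zip(x, y))
    for (a, b), p in P_xy.items():
        p_hat = counts[(a, b)] / n
        if abs(p_hat - p) > eps * p:
            return False
    # Pairs with P(a,b) = 0 must not occur at all (the condition forces P_hat = 0).
    for pair in counts:
        if P_xy.get(pair, 0) == 0:
            return False
    return True

# Example: X uniform on {0,1}, Y = X through a BSC(0.1)
P = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
x = [0] * 50 + [1] * 50
y = [0] * 45 + [1] * 5 + [0] * 5 + [1] * 45   # empirical distribution exactly matches P
print(is_jointly_typical(x, y, P, 0.1))
```

Here the empirical pair frequencies equal $P_{XY}$ exactly, so the pair passes the test for any $\epsilon > 0$; flipping all of $\mathbf{y}$ would make it fail.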

Theorem: Joint Typicality Lemma (JTL)

Let $(X^n, Y^n)$ be drawn i.i.d. $\sim P_{XY}$. Let $\tilde{X}^n$ be drawn independently i.i.d. $\sim P_X$ (independent of $Y^n$). Then:

  1. $\Pr((\mathbf{X}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \to 1$ as $n \to \infty$.

  2. $\Pr((\tilde{\mathbf{X}}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \doteq 2^{-nI(X;Y)}$.

More precisely, for the second statement and large $n$:

$$\Pr((\tilde{\mathbf{X}}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \leq 2^{-n(I(X;Y) - \delta(\epsilon))}$$

where $\delta(\epsilon) \to 0$ as $\epsilon \to 0$.

Part 1 says a genuine pair is jointly typical with high probability. Part 2 says a random, independent pair is jointly typical with exponentially small probability — and the exponent is the mutual information. This is the crux of channel coding: if the code rate is less than $I(X;Y)$, the probability that an incorrect codeword "looks jointly typical" with the output is exponentially small, and a union bound over all incorrect codewords works.
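Spelling out that union-bound step (a standard calculation under the lemma's assumptions): with $2^{nR}$ codewords drawn i.i.d. $\sim P_X^n$, each incorrect codeword is independent of the output, so Part 2 applies to each:

```latex
\Pr\bigl(\text{some incorrect codeword is jointly typical with } \mathbf{Y}\bigr)
  \le \sum_{m' \neq m} \Pr\bigl((\mathbf{X}(m'), \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)\bigr)
  \le 2^{nR} \cdot 2^{-n(I(X;Y) - \delta(\epsilon))}
  = 2^{-n(I(X;Y) - R - \delta(\epsilon))},
```

which vanishes as $n \to \infty$ whenever $R < I(X;Y) - \delta(\epsilon)$.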

Example: Joint Typicality for the BSC

Consider a BSC with crossover probability $p = 0.1$ (written $p$ to avoid a clash with the typicality parameter $\epsilon$) and uniform input. For $n = 100$, estimate the probability that a random independent codeword is jointly typical with the channel output.
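One way to carry out the estimate (a short computation; the numbers are derived from the standard formula $I(X;Y) = 1 - H_b(p)$ for a BSC with uniform input, not quoted from a reference):

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For a BSC with uniform input: I(X;Y) = H(Y) - H(Y|X) = 1 - Hb(p)
I = 1 - Hb(0.1)            # about 0.531 bits
prob = 2 ** (-100 * I)     # probability scale for n = 100

print(f"I(X;Y) = {I:.3f} bits")
print(f"2^(-100 I) = {prob:.2e}")
```

So a random independent codeword looks jointly typical with probability on the order of $2^{-53}$, roughly $10^{-16}$ — small enough that even millions of wrong codewords survive the union bound.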

Joint Typicality: True vs Independent Pairs

For a given joint distribution $P_{XY}$, compare the probability that a true pair $(X^n, Y^n) \sim P_{XY}^n$ is jointly typical (converges to 1) versus the probability that an independent pair $(\tilde{X}^n, Y^n) \sim P_X^n \times P_Y^n$ is jointly typical (decays as $2^{-nI(X;Y)}$).

[Interactive chart: joint typicality probability versus $n$ for true and independent pairs. Adjustable parameters: mutual information (default 0.5 bits) and maximum $n$ to plot (default 100).]
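For a binary example, both probabilities can be computed exactly by summing the multinomial law of the four pair counts over the typical set. The sketch below (all names are illustrative) uses the doubly symmetric binary source with crossover 0.1, so $I(X;Y) = 1 - H_b(0.1) \approx 0.531$ bits; a fairly coarse $\epsilon = 0.5$ keeps the typical set visible at small $n$:

```python
import math

def multinomial_pmf(counts, probs):
    """Probability that a multinomial(n, probs) draw produces exactly these counts."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

def typical_prob(n, targets, probs, eps):
    """Exact probability that the four pair counts (n00, n01, n10, n11),
    drawn multinomial(n, probs), are eps-typical with respect to `targets`."""
    total = 0.0
    for n00 in range(n + 1):
        for n01 in range(n + 1 - n00):
            for n10 in range(n + 1 - n00 - n01):
                counts = (n00, n01, n10, n - n00 - n01 - n10)
                if all(abs(c / n - t) <= eps * t for c, t in zip(counts, targets)):
                    total += multinomial_pmf(counts, probs)
    return total

P_xy  = [0.45, 0.05, 0.05, 0.45]   # X uniform, Y = X xor Bern(0.1): pairs 00, 01, 10, 11
indep = [0.25, 0.25, 0.25, 0.25]   # independent pair: product of the uniform marginals
I = 1 - (-0.1 * math.log2(0.1) - 0.9 * math.log2(0.9))   # I(X;Y), about 0.531 bits

results = {}
for n in (20, 40, 60):
    results[n] = (typical_prob(n, P_xy, P_xy, eps=0.5),
                  typical_prob(n, P_xy, indep, eps=0.5))
    print(f"n={n:2d}: true pair {results[n][0]:.3f}, "
          f"independent {results[n][1]:.2e}, 2^(-nI) = {2 ** (-n * I):.2e}")
```

Even with this coarse $\epsilon$, the independent-pair probability drops by orders of magnitude between $n = 20$ and $n = 60$ while the true-pair probability does not collapse; as $\epsilon \to 0$ and $n \to \infty$, the decay exponent approaches $I(X;Y)$, matching the lemma.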

Key Takeaway

The Joint Typicality Lemma is the engine of channel coding. True input-output pairs are jointly typical with probability $\to 1$. Random independent pairs are jointly typical with probability $\doteq 2^{-nI(X;Y)}$. This exponential gap is what makes reliable communication possible at rates below $I(X;Y)$.

Jointly typical sequences

A pair $(\mathbf{x}, \mathbf{y})$ is jointly typical if its empirical joint distribution is close to $P_{XY}$. The probability that an independent pair appears jointly typical decays as $2^{-nI(X;Y)}$.

Related: Strong typicality, Packing lemma