Joint Typicality

The Key Question: Are These Sequences Related?

In channel coding, the decoder receives $\mathbf{y}$ and must decide which codeword $\mathbf{x}$ was sent. The natural test is: are $\mathbf{x}$ and $\mathbf{y}$ "jointly typical" — do they look like they came from the joint distribution $P_{XY}$? If the empirical joint distribution of the pair $(\mathbf{x}, \mathbf{y})$ is close to $P_{XY}$, we declare them related. If not, the codeword probably was not the one sent. This simple test — joint typicality decoding — is the basis of all random coding achievability proofs.

Definition: Jointly Typical Set

The jointly typical set $\mathcal{T}_\epsilon^{(n)}(X,Y)$ is:

$$\mathcal{T}_\epsilon^{(n)}(X,Y) = \left\{(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n : \left|\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) - P_{XY}(a,b)\right| \leq \epsilon \cdot P_{XY}(a,b) \text{ for all } (a,b)\right\}.$$

The conditional typical set is: given a fixed $\mathbf{x}$, $\mathcal{T}_\epsilon^{(n)}(Y|\mathbf{x}) = \{\mathbf{y} : (\mathbf{x}, \mathbf{y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)\}$. Its size is approximately $2^{nH(Y|X)}$.
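The membership test in the definition is straightforward to compute from empirical pair counts. Below is a minimal sketch (the function name `is_jointly_typical` and the example distribution are illustrative, not from any library):

```python
from collections import Counter

def is_jointly_typical(x, y, P_xy, eps):
    """Robust joint typicality: |P_hat(a,b) - P(a,b)| <= eps * P(a,b) for all (a,b)."""
    n = len(x)
    counts = Counter(zip(x, y))
    for (a, b), p in P_xy.items():
        p_hat = counts[(a, b)] / n
        if abs(p_hat - p) > eps * p:
            return False
    # Pairs with P(a,b) = 0 must not occur at all (the condition forces P_hat = 0).
    for pair in counts:
        if P_xy.get(pair, 0) == 0:
            return False
    return True

# Example: X uniform on {0,1}, Y = X through a BSC(0.1)
P = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
x = [0] * 50 + [1] * 50
y = [0] * 45 + [1] * 5 + [0] * 5 + [1] * 45   # empirical distribution exactly matches P
print(is_jointly_typical(x, y, P, 0.1))
```

Here the empirical pair frequencies equal $P_{XY}$ exactly, so the pair passes the test for any $\epsilon > 0$; flipping all of $\mathbf{y}$ would make it fail.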

Theorem: Joint Typicality Lemma (JTL)

Let $(X^n, Y^n)$ be drawn i.i.d. $\sim P_{XY}$. Let $\tilde{X}^n$ be drawn independently i.i.d. $\sim P_X$ (independent of $Y^n$). Then:

  1. $\Pr((\mathbf{X}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \to 1$ as $n \to \infty$.

  2. $\Pr((\tilde{\mathbf{X}}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \doteq 2^{-nI(X;Y)}$.

More precisely, for the second statement and large $n$:

$$\Pr((\tilde{\mathbf{X}}, \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)) \leq 2^{-n(I(X;Y) - \delta(\epsilon))}$$

where $\delta(\epsilon) \to 0$ as $\epsilon \to 0$.

Part 1 says a genuine pair is jointly typical with high probability. Part 2 says a random, independent pair is jointly typical with exponentially small probability — and the exponent is the mutual information. This is the crux of channel coding: if the code rate is less than $I(X;Y)$, the probability that an incorrect codeword "looks jointly typical" with the output is exponentially small, and a union bound over all incorrect codewords works.
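Spelling out that union-bound step (a standard calculation under the lemma's assumptions): with $2^{nR}$ codewords drawn i.i.d. $\sim P_X^n$, each incorrect codeword is independent of the output, so Part 2 applies to each:

```latex
\Pr\bigl(\text{some incorrect codeword is jointly typical with } \mathbf{Y}\bigr)
  \le \sum_{m' \neq m} \Pr\bigl((\mathbf{X}(m'), \mathbf{Y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)\bigr)
  \le 2^{nR} \cdot 2^{-n(I(X;Y) - \delta(\epsilon))}
  = 2^{-n(I(X;Y) - R - \delta(\epsilon))},
```

which vanishes as $n \to \infty$ whenever $R < I(X;Y) - \delta(\epsilon)$.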

Example: Joint Typicality for the BSC

Consider a BSC with crossover probability $p = 0.1$ (written $p$ to avoid a clash with the typicality parameter $\epsilon$) and uniform input. For $n = 100$, estimate the probability that a random independent codeword is jointly typical with the channel output.
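One way to carry out the estimate (a short computation; the numbers are derived from the standard formula $I(X;Y) = 1 - H_b(p)$ for a BSC with uniform input, not quoted from a reference):

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For a BSC with uniform input: I(X;Y) = H(Y) - H(Y|X) = 1 - Hb(p)
I = 1 - Hb(0.1)            # about 0.531 bits
prob = 2 ** (-100 * I)     # probability scale for n = 100

print(f"I(X;Y) = {I:.3f} bits")
print(f"2^(-100 I) = {prob:.2e}")
```

So a random independent codeword looks jointly typical with probability on the order of $2^{-53}$, roughly $10^{-16}$ — small enough that even millions of wrong codewords survive the union bound.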

Joint Typicality: True vs Independent Pairs

For a given joint distribution $P_{XY}$, compare the probability that a true pair $(X^n, Y^n) \sim P_{XY}^n$ is jointly typical (converges to 1) versus the probability that an independent pair $(\tilde{X}^n, Y^n) \sim P_X^n \times P_Y^n$ is jointly typical (decays as $2^{-nI(X;Y)}$).

[Interactive chart: joint typicality probability versus $n$ for true and independent pairs. Adjustable parameters: mutual information (default 0.5 bits) and maximum $n$ to plot (default 100).]
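For a binary example, both probabilities can be computed exactly by summing the multinomial law of the four pair counts over the typical set. The sketch below (all names are illustrative) uses the doubly symmetric binary source with crossover 0.1, so $I(X;Y) = 1 - H_b(0.1) \approx 0.531$ bits; a fairly coarse $\epsilon = 0.5$ keeps the typical set visible at small $n$:

```python
import math

def multinomial_pmf(counts, probs):
    """Probability that a multinomial(n, probs) draw produces exactly these counts."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

def typical_prob(n, targets, probs, eps):
    """Exact probability that the four pair counts (n00, n01, n10, n11),
    drawn multinomial(n, probs), are eps-typical with respect to `targets`."""
    total = 0.0
    for n00 in range(n + 1):
        for n01 in range(n + 1 - n00):
            for n10 in range(n + 1 - n00 - n01):
                counts = (n00, n01, n10, n - n00 - n01 - n10)
                if all(abs(c / n - t) <= eps * t for c, t in zip(counts, targets)):
                    total += multinomial_pmf(counts, probs)
    return total

P_xy  = [0.45, 0.05, 0.05, 0.45]   # X uniform, Y = X xor Bern(0.1): pairs 00, 01, 10, 11
indep = [0.25, 0.25, 0.25, 0.25]   # independent pair: product of the uniform marginals
I = 1 - (-0.1 * math.log2(0.1) - 0.9 * math.log2(0.9))   # I(X;Y), about 0.531 bits

results = {}
for n in (20, 40, 60):
    results[n] = (typical_prob(n, P_xy, P_xy, eps=0.5),
                  typical_prob(n, P_xy, indep, eps=0.5))
    print(f"n={n:2d}: true pair {results[n][0]:.3f}, "
          f"independent {results[n][1]:.2e}, 2^(-nI) = {2 ** (-n * I):.2e}")
```

Even with this coarse $\epsilon$, the independent-pair probability drops by orders of magnitude between $n = 20$ and $n = 60$ while the true-pair probability does not collapse; as $\epsilon \to 0$ and $n \to \infty$, the decay exponent approaches $I(X;Y)$, matching the lemma.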

Key Takeaway

The Joint Typicality Lemma is the engine of channel coding. True input-output pairs are jointly typical with probability $\to 1$. Random independent pairs are jointly typical with probability $\doteq 2^{-nI(X;Y)}$. This exponential gap is what makes reliable communication possible at rates below $I(X;Y)$.

Jointly typical sequences

A pair $(\mathbf{x}, \mathbf{y})$ is jointly typical if its empirical joint distribution is close to $P_{XY}$. The probability that an independent pair appears jointly typical decays as $2^{-nI(X;Y)}$.

Related: Strong typicality, Packing lemma