Joint Typicality
The Key Question: Are These Sequences Related?
In channel coding, the decoder receives and must decide which codeword was sent. The natural test is: are and "jointly typical" — do they look like they came from the joint distribution ? If the empirical joint distribution of the pair is close to , we declare them related. If not, the codeword probably was not the one sent. This simple test — joint typicality decoding — is the basis of all random coding achievability proofs.
Definition: Jointly Typical Set
Jointly Typical Set
The jointly typical set is:
The conditional typical set is: given a fixed , . Its size is approximately .
Theorem: Joint Typicality Lemma (JTL)
Let be drawn i.i.d. . Let be drawn independently i.i.d. (independent of ). Then:
-
as .
-
.
More precisely, for the second statement and large :
where as .
Part 1 says a genuine pair is jointly typical with high probability. Part 2 says a random, independent pair is jointly typical with exponentially small probability — and the exponent is the mutual information. This is the crux of channel coding: if the code rate is less than , the probability that an incorrect codeword "looks jointly typical" with the output is exponentially small, and a union bound over all incorrect codewords works.
Part 1
Direct from the definition: the law of large numbers for the empirical joint distribution. The joint empirical distribution in probability for each .
Part 2: probability for independent pair
For independent and :
For :
(jointly typical)
The ratio gives the probability: .
Example: Joint Typicality for the BSC
Consider a BSC with crossover and uniform input. For , estimate the probability that a random independent codeword is jointly typical with the channel output.
Mutual information
bits.
Probability
.
This is fantastically small. If the codebook has codewords with , the union bound over incorrect codewords gives error probability .
Joint Typicality: True vs Independent Pairs
For a given joint distribution , compare the probability that a true pair is jointly typical (converges to 1) versus the probability that an independent pair is jointly typical (decays as ).
Parameters
Mutual information
Maximum n to plot
Key Takeaway
The Joint Typicality Lemma is the engine of channel coding. True input-output pairs are jointly typical with probability . Random independent pairs are jointly typical with probability . This exponential gap is what makes reliable communication possible at rates below .
Jointly typical sequences
A pair is jointly typical if its empirical joint distribution is close to . The probability that an independent pair appears jointly typical decays as .
Related: Strong typicality, Packing lemma