The Channel Coding Theorem — Achievability
The Random Coding Argument
We now prove that any rate $R < C$ is achievable. The proof is one of the most elegant arguments in information theory: we show that a randomly generated codebook, with high probability, achieves vanishing error probability. No clever construction is needed: randomness itself provides enough structure.
The argument has three steps: (1) generate the codebook randomly, (2) decode by joint typicality, (3) show that the error probability vanishes for rates below $C$. The packing lemma from Chapter 3 does the heavy lifting in step 3.
Random Coding and the Packing Argument
Theorem: Channel Coding Theorem — Achievability
For any $R < C$ and any $\epsilon > 0$, there exists a $(2^{nR}, n)$-code with maximal error probability $\lambda^{(n)} < \epsilon$ for all sufficiently large $n$.
Think of the codebook as $2^{nR}$ points scattered in $\mathcal{X}^n$. When codeword $x^n(m)$ is transmitted, the output $y^n$ is "near" it in the sense of joint typicality. As long as no other codeword $x^n(m')$, $m' \neq m$, is also near $y^n$ (also jointly typical), the decoder succeeds. The packing lemma guarantees this for $R < I(X;Y)$: there is enough "room" in $\mathcal{Y}^n$ for $2^{nR}$ non-confusable codewords.
Step 1: Random codebook generation
Fix an input distribution $p(x)$ and $\epsilon > 0$.
Generate a random codebook: draw $2^{nR}$ codewords $X^n(1), \dots, X^n(2^{nR})$ independently, each with i.i.d. entries $X_i \sim p(x)$.
The codebook is revealed to both the encoder and the decoder before communication begins. The message $M$ is uniform on $\{1, 2, \dots, 2^{nR}\}$.
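To make Step 1 concrete, here is a minimal Python sketch of random codebook generation over a binary alphabet with a Bernoulli($p$) input distribution. The function name and parameters are illustrative, not from the text:

```python
import random

def generate_codebook(n, R, p, seed=0):
    """Draw 2^(nR) codewords of length n, entries i.i.d. Bernoulli(p).

    Illustrative only: the codebook has 2^(nR) entries, growing
    exponentially in n, which is exactly why random coding is a proof
    device rather than a practical construction.
    """
    rng = random.Random(seed)
    num_codewords = int(round(2 ** (n * R)))
    return [[1 if rng.random() < p else 0 for _ in range(n)]
            for _ in range(num_codewords)]

# Toy parameters: block length 8 at rate 0.5 gives 2^4 = 16 codewords.
codebook = generate_codebook(n=8, R=0.5, p=0.5)
```

Even at this toy size the exponential blow-up is visible: doubling $n$ squares the number of codewords.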
Step 2: Joint typicality decoding
The decoder, upon receiving $y^n$, looks for a unique message $\hat{m}$ such that $(x^n(\hat{m}), y^n) \in \mathcal{T}_\epsilon^{(n)}$. If no such message exists, or more than one does, the decoder declares an error.
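As a sketch of Step 2, the Python code below implements a weak-typicality test (empirical rates within $\epsilon$ of the corresponding entropies) and decodes by searching for a unique jointly typical codeword. All helper names, the toy channel, and the generous $\epsilon$ are illustrative assumptions, not from the text:

```python
import math

def empirical_rate(seq, pmf):
    """-(1/n) log2 p(seq) under an i.i.d. pmf; compared to the entropy."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def jointly_typical(x, y, px, py, pxy, Hx, Hy, Hxy, eps):
    """Weak joint typicality: all three empirical rates within eps."""
    exy = -sum(math.log2(pxy[(a, b)]) for a, b in zip(x, y)) / len(x)
    return (abs(empirical_rate(x, px) - Hx) < eps
            and abs(empirical_rate(y, py) - Hy) < eps
            and abs(exy - Hxy) < eps)

def jt_decode(codebook, y, px, py, pxy, Hx, Hy, Hxy, eps):
    """Return the unique jointly typical message, or None (declared error)."""
    hits = [m for m, xm in enumerate(codebook)
            if jointly_typical(xm, y, px, py, pxy, Hx, Hy, Hxy, eps)]
    return hits[0] if len(hits) == 1 else None

# Toy BSC(0.1) with uniform input: p(x, y) = 0.45 if x == y else 0.05.
px = py = {0: 0.5, 1: 0.5}
pxy = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
Hx = Hy = 1.0
Hxy = -2 * (0.45 * math.log2(0.45) + 0.05 * math.log2(0.05))

codebook = [[0] * 10, [1] * 10]
y = [0] * 9 + [1]            # codeword 0 with one bit flipped (~10% noise)
m_hat = jt_decode(codebook, y, px, py, pxy, Hx, Hy, Hxy, eps=0.1)
```

With this tiny block length the received word sits almost exactly at the joint entropy when it carries the channel's typical 10% flip fraction, so only the transmitted codeword passes the test.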
Step 3: Error analysis (ensemble average)
By the symmetry of the random codebook (all codewords drawn from the same distribution), the ensemble average error probability satisfies:
$$\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e^{(n)}\big] = \Pr(\hat{M} \neq M) = \Pr(\hat{M} \neq 1 \mid M = 1).$$
We condition on $M = 1$ (WLOG by symmetry) and define error events:
- $\mathcal{E}_1$: the true pair $(X^n(1), Y^n)$ is not jointly typical.
- $\mathcal{E}_2$: a wrong codeword $X^n(m)$, $m \neq 1$, is jointly typical with $Y^n$.
By the union bound: $\Pr(\hat{M} \neq 1 \mid M = 1) \leq \Pr(\mathcal{E}_1) + \Pr(\mathcal{E}_2)$.
Step 4: Bounding $\Pr(\mathcal{E}_1)$
Since $X^n(1)$ was sent, $(X^n(1), Y^n)$ are jointly distributed as $\prod_{i=1}^{n} p(x_i)\,p(y_i \mid x_i)$.
By the law of large numbers (AEP property of typical sequences):
$$\Pr(\mathcal{E}_1) = \Pr\big((X^n(1), Y^n) \notin \mathcal{T}_\epsilon^{(n)}\big) \to 0 \quad \text{as } n \to \infty.$$
Step 5: Bounding $\Pr(\mathcal{E}_2)$ via the Packing Lemma
For $m \neq 1$, the codeword $X^n(m)$ is independent of $Y^n$ (since $Y^n$ depends only on $X^n(1)$, and the codewords are drawn independently). So $(X^n(m), Y^n) \sim \prod_{i=1}^{n} p(x_i)\,p(y_i)$, the product distribution.
By the Packing Lemma (Chapter 3, Lemma 18):
$$\Pr\big((X^n(m), Y^n) \in \mathcal{T}_\epsilon^{(n)}\big) \leq 2^{-n(I(X;Y) - \delta(\epsilon))}.$$
By the union bound over the $2^{nR} - 1$ wrong codewords:
$$\Pr(\mathcal{E}_2) \leq \big(2^{nR} - 1\big)\, 2^{-n(I(X;Y) - \delta(\epsilon))} \leq 2^{-n(I(X;Y) - R - \delta(\epsilon))}.$$
This vanishes as $n \to \infty$ provided $R < I(X;Y) - \delta(\epsilon)$.
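To see how fast the Step 5 bound decays, the short Python computation below evaluates $2^{-n(I(X;Y) - R)}$ for a binary symmetric channel with crossover 0.1 and uniform input. This is an illustrative numeric example (the $\delta(\epsilon)$ slack is ignored, and the function names are mine):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def e2_bound(n, R, I):
    """Union bound 2^(nR) * 2^(-n I) <= 2^(-n (I - R)) on Pr(E2)."""
    return 2.0 ** (-n * (I - R))

# BSC(0.1) with uniform input: I(X;Y) = 1 - h2(0.1), about 0.531 bits.
I = 1 - h2(0.1)
for n in (100, 500, 1000):
    print(f"n={n}: Pr(E2) <= {e2_bound(n, R=0.4, I=I):.3e}")
```

At rate $R = 0.4 < I(X;Y)$ the exponent $I - R \approx 0.131$ is positive, so the bound shrinks exponentially in the block length.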
Step 6: From ensemble average to existence
We have shown $\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e^{(n)}\big] \leq \epsilon$ for large $n$.
Since this is the average over random codebooks, there exists at least one deterministic codebook $\mathcal{C}^*$ with $\bar{P}_e^{(n)}(\mathcal{C}^*) \leq \epsilon$.
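The averaging step can be written in one line; since $\bar{P}_e$ is nonnegative, its ensemble mean bounds its minimum (a sketch in the notation above):

```latex
\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e(\mathcal{C})\big]
  = \sum_{\mathcal{C}} \Pr(\mathcal{C})\,\bar{P}_e(\mathcal{C})
  \le \epsilon
  \quad\Longrightarrow\quad
  \min_{\mathcal{C}} \bar{P}_e(\mathcal{C}) \le \epsilon .
```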
Choose $p(x) = p^*(x)$ (the capacity-achieving distribution) to get $I(X;Y) = C$, and let $\epsilon \to 0$ (which makes $\delta(\epsilon) \to 0$).
Step 7: Expurgation (average to maximal error)
The code has $\bar{P}_e^{(n)} \leq \epsilon$ (average error). Sort the codewords by their individual error probabilities: $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{2^{nR}}$.
The expurgated code keeps only the best half: the $2^{nR}/2$ codewords with the smallest $\lambda_m$.
Since the average error is at most $\epsilon$, the worst codeword in the best half satisfies $\lambda_m \leq 2\epsilon$: otherwise every codeword in the discarded half would also exceed $2\epsilon$, forcing the average above $\epsilon$.
The new rate is $R - \frac{1}{n}$, which is still arbitrarily close to $R$ (and hence below $C$) for large $n$. So the maximal error probability satisfies $\lambda^{(n)} \leq 2\epsilon$.
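The expurgation step is easy to check numerically. The sketch below uses a made-up list of per-message error probabilities (purely illustrative numbers, not derived from any real channel) and verifies that the worst kept codeword is within twice the overall average:

```python
# Hypothetical per-message error probabilities for a 16-codeword code.
lambdas = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.010,
           0.020, 0.040, 0.060, 0.080, 0.090, 0.120, 0.150, 0.180]

avg_error = sum(lambdas) / len(lambdas)   # the "eps" of the averaging step

# Keep the best half. Markov-style argument: if every kept codeword had
# error > 2*avg_error, so would every discarded (worse) codeword, and
# the average over all codewords would exceed avg_error.
best_half = sorted(lambdas)[: len(lambdas) // 2]
max_kept = max(best_half)

# Dropping from 16 to 8 codewords lowers log2(M) by exactly 1, i.e. the
# rate drops from R to R - 1/n: negligible for large block length n.
```

Here the maximal error of the kept half (0.010) is far below $2\epsilon \approx 0.097$; the bound $2\epsilon$ is what the argument guarantees in the worst case.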
The Power of Random Coding
The random coding argument is sometimes called the "greatest trick in information theory." We never construct a specific good code — we show that a randomly drawn code works with high probability. This is profoundly useful for proving existence but does not directly yield practical codes.
In the 70 years since, the central challenge of coding theory has been to find explicit codes that approach capacity: turbo codes (1993), LDPC codes (rediscovered 1996), and polar codes (2009) each represent major steps toward this goal. See Book telecom, Ch. 12, and Chapter 11 of this book for these constructions.
Common Mistake: Random Coding is Not a Code Construction
Mistake:
Treating the random coding argument as a practical code design method. In practice, one cannot generate a random codebook of size $2^{nR}$ for reasonable block lengths: the number of codewords grows exponentially in $n$.
Correction:
Random coding is a proof technique that establishes the existence of good codes. Practical codes use structure (algebraic, graphical, or recursive) to enable efficient encoding and decoding. The gap between the random coding bound and practical code performance has been nearly closed by modern codes (LDPC, polar).
Historical Note: Shannon's 1948 Proof
Shannon's original 1948 paper introduced the random coding argument, though in a somewhat different form than presented here. The cleanest version of the proof, using joint typicality decoding, was developed later by Wolfowitz (1961) and refined by Csiszár and Körner (1981). The Orlitsky-Roche strong typicality framework, which we follow in this book, provides the most versatile version of the argument.
It is worth pausing to appreciate the audacity of Shannon's approach: he proved that reliable communication is possible by showing that a code chosen at random works, without ever exhibiting a specific code. This probabilistic method, borrowed from combinatorics (Erdős), was radical in the engineering context of 1948.
Quick Check
In the achievability proof, which step relies on the independence of the wrong codewords from the channel output?
Bounding $\Pr(\mathcal{E}_1)$: the true pair must be jointly typical
Bounding $\Pr(\mathcal{E}_2)$ via the packing lemma: wrong codewords are independent of $Y^n$
The expurgation step: removing bad codewords
Correct! For $m \neq 1$, $X^n(m)$ and $Y^n$ are independent (since the codewords are drawn independently). This makes $(X^n(m), Y^n) \sim \prod_i p(x_i)\,p(y_i)$ (the product distribution), which is exactly the condition needed for the packing lemma to bound the joint typicality probability.
Key Takeaway
The achievability proof shows that rates below $C$ are achievable via random coding and joint typicality decoding. The key ingredients are: (1) the AEP guarantees the true pair is jointly typical, (2) the packing lemma bounds the probability of confusion with wrong codewords, and (3) expurgation converts average error to maximal error. This proof pattern — random coding for achievability — will be reused throughout the book.