The Channel Coding Problem

The Central Question of Channel Coding

Parts I and II answered the question: how many bits does it take to describe a source? Now we turn to the complementary question: how many bits can we reliably transmit through a noisy channel?

This is the channel coding problem, and its answer, the channel coding theorem, is one of the most profound results in all of engineering. Shannon proved in 1948 that every channel has a definite capacity $C$, measured in bits per channel use, and that reliable communication at any rate below $C$ is possible with sufficiently long codes. Above $C$, reliable communication is impossible regardless of the code used.

The proof is a masterpiece of probabilistic reasoning. The achievability part uses random coding (a randomly chosen codebook works with high probability), and the converse uses Fano's inequality, the same tool we used in converse proofs for source coding. This chapter develops both directions in full.

Definition: Discrete Memoryless Channel (DMC)

A discrete memoryless channel (DMC) $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ consists of:

  • A finite input alphabet $\mathcal{X}$
  • A finite output alphabet $\mathcal{Y}$
  • A transition probability mass function $P_{Y|X}(y|x)$ for $x \in \mathcal{X}$, $y \in \mathcal{Y}$

The memoryless property states that when the channel is used $n$ times:

$$P(Y^n = y^n \mid X^n = x^n) = \prod_{i=1}^n P_{Y|X}(y_i \mid x_i)$$

Each output $Y_i$ depends only on the current input $X_i$, not on past or future inputs, not on the message, and not on past outputs.
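To see the memoryless property in action, here is a minimal simulation sketch: each output symbol is sampled from its own row of the transition matrix, independently of every other position. The binary symmetric channel and its crossover probability 0.1 are illustrative choices, not part of the definition.

```python
import numpy as np

def dmc_transmit(x, P, rng):
    """Pass input sequence x through a DMC with transition matrix P.

    P[a, b] = P_{Y|X}(b | a). Memorylessness means each output symbol
    is drawn independently, given only its own input symbol.
    """
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

# Illustrative channel: binary symmetric channel, crossover probability 0.1.
p = 0.1
P_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10)   # arbitrary input sequence
print("input :", x)
print("output:", dmc_transmit(x, P_bsc, rng))
```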


Discrete memoryless channel (DMC)

A communication channel with finite input and output alphabets where each output symbol depends only on the corresponding input symbol, not on any other inputs or outputs. The channel is fully described by its transition probabilities $P_{Y|X}(y|x)$.

Related: Channel capacity, Block Code, Memoryless property

Definition: Block Code

A block code $\mathcal{C}$ with rate $R$ and block length $n$ (an $(R, n)$-code) consists of:

  1. A message set $\mathcal{M} = [1 : 2^{nR}] = \{1, 2, \ldots, 2^{nR}\}$
  2. A codebook $\{x^n(1), x^n(2), \ldots, x^n(2^{nR})\}$, a list of $2^{nR}$ codewords, each of length $n$ over $\mathcal{X}$
  3. An encoding function $f : \mathcal{M} \to \mathcal{X}^n$ with $f(m) = x^n(m)$
  4. A decoding function $g : \mathcal{Y}^n \to \mathcal{M}$ with $\hat{m} = g(y^n)$

The code rate $R = \frac{\log|\mathcal{M}|}{n} = \frac{\log 2^{nR}}{n}$ measures the number of information bits transmitted per channel use.
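As a concrete (if trivial) instance of the definition, the sketch below implements the binary repetition code with $n = 3$: a two-message codebook, an encoding function, and a majority-vote decoder. The message labels and the code itself are our illustrative choices.

```python
import numpy as np

# A toy instance of the definition: the binary repetition code with n = 3.
# Message set M = {0, 1}, so R = log2|M| / n = 1/3 bit per channel use.
n = 3
codebook = {0: np.zeros(n, dtype=int),
            1: np.ones(n, dtype=int)}

def encode(m):
    """Encoding function f : M -> X^n, f(m) = x^n(m)."""
    return codebook[m]

def decode(y):
    """Decoding function g : Y^n -> M (majority vote)."""
    return int(y.sum() > n / 2)

print(encode(1), decode(np.array([1, 0, 1])))   # [1 1 1] 1
```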

Definition: Error Probability

For a block code $\mathcal{C}$:

Individual message error probability: $P_{e,m}(\mathcal{C}) = \Pr(g(Y^n) \neq m \mid X^n = x^n(m))$

Maximal probability of error: $P_{e,\max}(\mathcal{C}) = \max_{m \in \mathcal{M}} P_{e,m}(\mathcal{C})$

Average probability of error (assuming uniform messages): $P_e(\mathcal{C}) = \frac{1}{|\mathcal{M}|} \sum_{m=1}^{|\mathcal{M}|} P_{e,m}(\mathcal{C})$

We typically analyze the average error probability (which is easier) and then use an expurgation argument to convert to maximal error probability. The key insight: if the average error probability is at most $\epsilon$, then by Markov's inequality at most half the codewords can have individual error probability exceeding $2\epsilon$, so we can keep the best half at a rate cost of only $1/n$.
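All three quantities are easy to estimate by simulation. The sketch below does this for the length-3 repetition code over a binary symmetric channel with crossover probability 0.1 (both the code and the channel are illustrative choices); by symmetry the two per-message error probabilities coincide here, so the average equals the maximum.

```python
import numpy as np

# Monte Carlo estimates of P_{e,m}, P_{e,max}, and P_e for the length-3
# repetition code over a BSC(0.1).
rng = np.random.default_rng(1)
p, n, trials = 0.1, 3, 100_000

def estimate_P_em(m):
    """Estimate P_{e,m}: transmit x^n(m), decode by majority vote."""
    x = np.full((trials, n), m)                  # codeword x^n(m), repeated
    y = x ^ (rng.random((trials, n)) < p)        # BSC flips each bit w.p. p
    m_hat = (y.sum(axis=1) > n / 2).astype(int)  # g(y^n)
    return np.mean(m_hat != m)

per_message = [estimate_P_em(m) for m in (0, 1)]
print("P_{e,m}  :", per_message)
print("P_{e,max}:", max(per_message))
print("P_e (avg):", np.mean(per_message))
# Exact value for comparison: 3 p^2 (1 - p) + p^3 = 0.028 at p = 0.1.
```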

Definition: Achievable Rate and Channel Capacity (Operational)

A rate $R$ is achievable if there exists a sequence of $(R, n)$-codes $\{\mathcal{C}_n\}_{n=1}^\infty$ such that:

$$P_{e,\max}(\mathcal{C}_n) \to 0 \quad \text{as } n \to \infty$$

The channel capacity $C$ is the supremum of all achievable rates.

This is an operational definition: it tells us what capacity means but does not tell us how to compute it. The channel coding theorem provides the explicit formula $C = \max_{P_X} I(X; Y)$, connecting the operational definition to a computable quantity.
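The formula is directly computable. As a sketch, the code below evaluates $I(X;Y)$ for a given input distribution and brute-forces the maximum over binary input distributions for a binary symmetric channel, then checks the result against the known closed form $C = 1 - H(p)$. The channel and the grid resolution are assumptions made for illustration.

```python
import numpy as np

def mutual_information(px, P):
    """I(X;Y) in bits for input distribution px and channel matrix P[x, y]."""
    joint = px[:, None] * P            # joint pmf P(x, y)
    py = joint.sum(axis=0)             # output marginal P(y)
    mask = joint > 0                   # skip zero-probability terms
    ratio = joint[mask] / (px[:, None] * py[None, :])[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

# Brute-force C = max_{P_X} I(X;Y) for a BSC(0.1) (illustrative channel).
p = 0.1
P = np.array([[1 - p, p],
              [p, 1 - p]])
grid = np.linspace(0.001, 0.999, 999)
C = max(mutual_information(np.array([q, 1 - q]), P) for q in grid)
H_p = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(f"search: C ~ {C:.4f}   closed form 1 - H(p) = {1 - H_p:.4f}")
```

The maximum is attained at the uniform input distribution, as expected for a symmetric channel.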

Achievable rate

A rate $R$ is achievable for a channel if there exist codes of that rate with vanishing error probability as the block length grows. The supremum of achievable rates is the channel capacity.

Related: Channel capacity, Block Code, Channel coding theorem

Channel capacity

The maximum rate at which information can be reliably transmitted over a noisy channel. For a DMC: $C = \max_{P_X} I(X; Y)$. Rates below capacity are achievable with vanishing error probability; rates above capacity are not.

Related: Achievable Rate and Channel Capacity (Operational), Channel coding theorem, Mutual Information

Example: The Noisy Typewriter Channel

The noisy typewriter has $|\mathcal{X}| = |\mathcal{Y}| = 8$ (input/output symbols labeled $0, 1, \ldots, 7$). Each input maps to itself or to the next symbol (cyclically) with equal probability $1/2$:

$$P_{Y|X}(y|x) = \begin{cases} 1/2 & \text{if } y = x \text{ or } y = x \oplus 1 \pmod{8} \\ 0 & \text{otherwise} \end{cases}$$

Find a zero-error code and compute its rate.
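One way into the exercise (a sketch of one possible solution, not the only one): pick inputs whose output sets cannot collide. The check below verifies that the four even symbols $\{0, 2, 4, 6\}$ have pairwise disjoint output sets and computes the resulting rate.

```python
import math

# Candidate zero-error code for the noisy typewriter: even inputs only.
inputs = [0, 2, 4, 6]
output_sets = {x: {x, (x + 1) % 8} for x in inputs}  # y = x or x+1 (mod 8)

# Zero-error decoding works iff the output sets are pairwise disjoint:
# every received y then points back to exactly one transmitted codeword.
all_outputs = [y for s in output_sets.values() for y in s]
assert len(all_outputs) == len(set(all_outputs)), "output sets overlap"

# Block length n = 1, |M| = 4, so R = log2(4) / 1 = 2 bits per channel use.
print("rate =", math.log2(len(inputs)) / 1)   # 2.0
```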

Quick Check

In a DMC, the memoryless property means that $Y_i$ depends only on $X_i$. Which of the following is a consequence of this property?

  • The encoder cannot use feedback from previous outputs
  • The mutual information decomposes as $I(X^n; Y^n) = \sum_i I(X_i; Y_i)$ when inputs are i.i.d.
  • The capacity does not depend on the input distribution

Key Takeaway

The channel coding problem asks for the maximum rate at which information can be reliably transmitted through a DMC. A rate $R$ is achievable if codes exist with $P_{e,\max} \to 0$ as the block length $n \to \infty$. The channel capacity is the supremum of achievable rates. The channel coding theorem gives the explicit formula $C = \max_{P_X} I(X;Y)$.