The Channel Coding Problem

The Central Question of Channel Coding

Parts I and II answered the question: how many bits does it take to describe a source? Now we turn to the complementary question: how many bits can we reliably transmit through a noisy channel?

This is the channel coding problem, and its answer, the channel coding theorem, is one of the most profound results in all of engineering. Shannon proved in 1948 that every channel has a definite capacity $C$, measured in bits per channel use, and that reliable communication at any rate below $C$ is possible with sufficiently long codes. Above $C$, reliable communication is impossible regardless of the code used.

The proof is a masterpiece of probabilistic reasoning. The achievability part uses random coding (a randomly chosen codebook works with high probability), and the converse uses Fano's inequality, the same tool we used in converse proofs for source coding. This chapter develops both directions in full.

Definition: Discrete Memoryless Channel (DMC)

A discrete memoryless channel (DMC) $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ consists of:

  • A finite input alphabet $\mathcal{X}$
  • A finite output alphabet $\mathcal{Y}$
  • A transition probability mass function $P_{Y|X}(y|x)$ for $x \in \mathcal{X}$, $y \in \mathcal{Y}$

The memoryless property states that when the channel is used $n$ times:

$$P(Y^n = y^n \mid X^n = x^n) = \prod_{i=1}^n P_{Y|X}(y_i \mid x_i)$$

Each output $Y_i$ depends only on the current input $X_i$, not on past or future inputs, not on the message, and not on past outputs.
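To see the memoryless property in action, here is a minimal simulation sketch: each output symbol is sampled from its own row of the transition matrix, independently of every other position. The binary symmetric channel and its crossover probability 0.1 are illustrative choices, not part of the definition.

```python
import numpy as np

def dmc_transmit(x, P, rng):
    """Pass input sequence x through a DMC with transition matrix P.

    P[a, b] = P_{Y|X}(b | a). Memorylessness means each output symbol
    is drawn independently, given only its own input symbol.
    """
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

# Illustrative channel: binary symmetric channel, crossover probability 0.1.
p = 0.1
P_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10)   # arbitrary input sequence
print("input :", x)
print("output:", dmc_transmit(x, P_bsc, rng))
```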


Discrete memoryless channel (DMC)

A communication channel with finite input and output alphabets where each output symbol depends only on the corresponding input symbol, not on any other inputs or outputs. The channel is fully described by its transition probabilities $P_{Y|X}(y|x)$.

Related: Channel capacity, Block Code, Memoryless property

Definition: Block Code

A block code $\mathcal{C}$ with rate $R$ and block length $n$ (an $(R, n)$-code) consists of:

  1. A message set $\mathcal{M} = [1 : 2^{nR}] = \{1, 2, \ldots, 2^{nR}\}$
  2. A codebook $\{x^n(1), x^n(2), \ldots, x^n(2^{nR})\}$, a list of $2^{nR}$ codewords, each of length $n$ over $\mathcal{X}$
  3. An encoding function $f : \mathcal{M} \to \mathcal{X}^n$ with $f(m) = x^n(m)$
  4. A decoding function $g : \mathcal{Y}^n \to \mathcal{M}$ with $\hat{m} = g(y^n)$

The code rate $R = \frac{\log|\mathcal{M}|}{n} = \frac{\log 2^{nR}}{n}$ measures the number of information bits transmitted per channel use.
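As a concrete (if trivial) instance of the definition, the sketch below implements the binary repetition code with $n = 3$: a two-message codebook, an encoding function, and a majority-vote decoder. The message labels and the code itself are our illustrative choices.

```python
import numpy as np

# A toy instance of the definition: the binary repetition code with n = 3.
# Message set M = {0, 1}, so R = log2|M| / n = 1/3 bit per channel use.
n = 3
codebook = {0: np.zeros(n, dtype=int),
            1: np.ones(n, dtype=int)}

def encode(m):
    """Encoding function f : M -> X^n, f(m) = x^n(m)."""
    return codebook[m]

def decode(y):
    """Decoding function g : Y^n -> M (majority vote)."""
    return int(y.sum() > n / 2)

print(encode(1), decode(np.array([1, 0, 1])))   # [1 1 1] 1
```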

Definition: Error Probability

For a block code $\mathcal{C}$:

Individual message error probability: $P_{e,m}(\mathcal{C}) = \Pr(g(Y^n) \neq m \mid X^n = x^n(m))$

Maximal probability of error: $P_{e,\max}(\mathcal{C}) = \max_{m \in \mathcal{M}} P_{e,m}(\mathcal{C})$

Average probability of error (assuming uniform messages): $P_e(\mathcal{C}) = \frac{1}{|\mathcal{M}|} \sum_{m=1}^{|\mathcal{M}|} P_{e,m}(\mathcal{C})$

We typically analyze the average error probability (which is easier) and then use an expurgation argument to convert to maximal error probability. The key insight: if the average error probability is at most $\epsilon$, then by Markov's inequality at most half the codewords can have individual error probability exceeding $2\epsilon$, so we can keep the best half at a rate cost of only $1/n$.
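All three quantities are easy to estimate by simulation. The sketch below does this for the length-3 repetition code over a binary symmetric channel with crossover probability 0.1 (both the code and the channel are illustrative choices); by symmetry the two per-message error probabilities coincide here, so the average equals the maximum.

```python
import numpy as np

# Monte Carlo estimates of P_{e,m}, P_{e,max}, and P_e for the length-3
# repetition code over a BSC(0.1).
rng = np.random.default_rng(1)
p, n, trials = 0.1, 3, 100_000

def estimate_P_em(m):
    """Estimate P_{e,m}: transmit x^n(m), decode by majority vote."""
    x = np.full((trials, n), m)                  # codeword x^n(m), repeated
    y = x ^ (rng.random((trials, n)) < p)        # BSC flips each bit w.p. p
    m_hat = (y.sum(axis=1) > n / 2).astype(int)  # g(y^n)
    return np.mean(m_hat != m)

per_message = [estimate_P_em(m) for m in (0, 1)]
print("P_{e,m}  :", per_message)
print("P_{e,max}:", max(per_message))
print("P_e (avg):", np.mean(per_message))
# Exact value for comparison: 3 p^2 (1 - p) + p^3 = 0.028 at p = 0.1.
```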

Definition: Achievable Rate and Channel Capacity (Operational)

A rate $R$ is achievable if there exists a sequence of $(R, n)$-codes $\{\mathcal{C}_n\}_{n=1}^\infty$ such that:

$$P_{e,\max}(\mathcal{C}_n) \to 0 \quad \text{as } n \to \infty$$

The channel capacity $C$ is the supremum of all achievable rates.

This is an operational definition: it tells us what capacity means but does not tell us how to compute it. The channel coding theorem provides the explicit formula $C = \max_{P_X} I(X; Y)$, connecting the operational definition to a computable quantity.
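The formula is directly computable. As a sketch, the code below evaluates $I(X;Y)$ for a given input distribution and brute-forces the maximum over binary input distributions for a binary symmetric channel, then checks the result against the known closed form $C = 1 - H(p)$. The channel and the grid resolution are assumptions made for illustration.

```python
import numpy as np

def mutual_information(px, P):
    """I(X;Y) in bits for input distribution px and channel matrix P[x, y]."""
    joint = px[:, None] * P            # joint pmf P(x, y)
    py = joint.sum(axis=0)             # output marginal P(y)
    mask = joint > 0                   # skip zero-probability terms
    ratio = joint[mask] / (px[:, None] * py[None, :])[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

# Brute-force C = max_{P_X} I(X;Y) for a BSC(0.1) (illustrative channel).
p = 0.1
P = np.array([[1 - p, p],
              [p, 1 - p]])
grid = np.linspace(0.001, 0.999, 999)
C = max(mutual_information(np.array([q, 1 - q]), P) for q in grid)
H_p = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(f"search: C ~ {C:.4f}   closed form 1 - H(p) = {1 - H_p:.4f}")
```

The maximum is attained at the uniform input distribution, as expected for a symmetric channel.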

Achievable rate

A rate $R$ is achievable for a channel if there exist codes of that rate with vanishing error probability as the block length grows. The supremum of achievable rates is the channel capacity.

Related: Channel capacity, Block Code, Channel coding theorem

Channel capacity

The maximum rate at which information can be reliably transmitted over a noisy channel. For a DMC: $C = \max_{P_X} I(X; Y)$. Rates below capacity are achievable with vanishing error probability; rates above capacity are not.

Related: Achievable Rate and Channel Capacity (Operational), Channel coding theorem, Mutual Information

Example: The Noisy Typewriter Channel

The noisy typewriter has $|\mathcal{X}| = |\mathcal{Y}| = 8$ (input/output symbols labeled $0, 1, \ldots, 7$). Each input maps to itself or to the next symbol (cyclically) with equal probability $1/2$:

$$P_{Y|X}(y|x) = \begin{cases} 1/2 & \text{if } y = x \text{ or } y = x \oplus 1 \pmod{8} \\ 0 & \text{otherwise} \end{cases}$$

Find a zero-error code and compute its rate.
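One way into the exercise (a sketch of one possible solution, not the only one): pick inputs whose output sets cannot collide. The check below verifies that the four even symbols $\{0, 2, 4, 6\}$ have pairwise disjoint output sets and computes the resulting rate.

```python
import math

# Candidate zero-error code for the noisy typewriter: even inputs only.
inputs = [0, 2, 4, 6]
output_sets = {x: {x, (x + 1) % 8} for x in inputs}  # y = x or x+1 (mod 8)

# Zero-error decoding works iff the output sets are pairwise disjoint:
# every received y then points back to exactly one transmitted codeword.
all_outputs = [y for s in output_sets.values() for y in s]
assert len(all_outputs) == len(set(all_outputs)), "output sets overlap"

# Block length n = 1, |M| = 4, so R = log2(4) / 1 = 2 bits per channel use.
print("rate =", math.log2(len(inputs)) / 1)   # 2.0
```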

Quick Check

In a DMC, the memoryless property means that $Y_i$ depends only on $X_i$. Which of the following is a consequence of this property?

  • The encoder cannot use feedback from previous outputs
  • The mutual information decomposes as $I(X^n; Y^n) = \sum_i I(X_i; Y_i)$ when inputs are i.i.d.
  • The capacity does not depend on the input distribution

Key Takeaway

The channel coding problem asks for the maximum rate at which information can be reliably transmitted through a DMC. A rate $R$ is achievable if codes exist with $P_{e,\max} \to 0$ as the block length $n \to \infty$. The channel capacity is the supremum of achievable rates. The channel coding theorem gives the explicit formula $C = \max_{P_X} I(X;Y)$.