The Channel Coding Problem
The Central Question of Channel Coding
Parts I and II answered the question: how many bits does it take to describe a source? Now we turn to the complementary question: how many bits can we reliably transmit through a noisy channel?
This is the channel coding problem, and its answer β the channel coding theorem β is one of the most profound results in all of engineering. Shannon proved in 1948 that every channel has a definite capacity , measured in bits per channel use, and that reliable communication at any rate below is possible with sufficiently long codes. Above , reliable communication is impossible regardless of the code used.
The proof is a masterpiece of probabilistic reasoning. The achievability uses random coding β a randomly chosen codebook works with high probability β and the converse uses Fano's inequality, the same tool we used in converse proofs for source coding. This chapter develops both directions in full.
Definition: Discrete Memoryless Channel (DMC)
Discrete Memoryless Channel (DMC)
A discrete memoryless channel (DMC) consists of:
- A finite input alphabet
- A finite output alphabet
- A transition probability mass function for ,
The memoryless property states that when the channel is used times:
Each output depends only on the current input , not on past or future inputs, not on the message, and not on past outputs.
Discrete memoryless channel (DMC)
A communication channel with finite input and output alphabets where each output symbol depends only on the corresponding input symbol, not on any other inputs or outputs. The channel is fully described by its transition probabilities .
Related: Channel capacity, Block Code, Memoryless property
Definition: Block Code
Block Code
A block code with rate and block length (an -code) consists of:
- A message set
- A codebook β an array of codewords, each of length over
- An encoding function with
- A decoding function with
The code rate measures the number of information bits transmitted per channel use.
Definition: Error Probability
Error Probability
For a block code :
Individual message error probability:
Maximal probability of error:
Average probability of error (assuming uniform messages):
We typically analyze the average error probability (which is easier) and then use an expurgation argument to convert to maximal error probability. The key insight: if the average is small, at least half the codewords have small individual error probability, so we can keep the best half.
Definition: Achievable Rate and Channel Capacity (Operational)
Achievable Rate and Channel Capacity (Operational)
A rate is achievable if there exists a sequence of -codes such that:
The channel capacity is the supremum of all achievable rates.
This is an operational definition β it tells us what capacity means but does not tell us how to compute it. The channel coding theorem provides the explicit formula , connecting the operational definition to a computable quantity.
Achievable rate
A rate is achievable for a channel if there exist codes of that rate with vanishing error probability as the block length grows. The supremum of achievable rates is the channel capacity.
Related: Channel capacity, Block Code, Channel coding theorem
Channel capacity
The maximum rate at which information can be reliably transmitted over a noisy channel. For a DMC: . Rates below capacity are achievable with vanishing error probability; rates above capacity are not.
Related: Achievable Rate and Channel Capacity (Operational), Channel coding theorem, Mutual Information
Example: The Noisy Typewriter Channel
The noisy typewriter has (input/output symbols labeled ). Each input maps to itself or the next symbol (cyclically) with equal probability :
Find a zero-error code and compute its rate.
Identify disjoint output sets
If input is used, the output is . For inputs (alternate symbols), the output sets are:
- :
- :
- :
- :
These output sets are disjoint, so the decoder can perfectly identify which of the 4 inputs was sent.
Compute rate
Using only as the input alphabet gives codewords with channel uses, for a rate of bits per use. Since , we are using only half the alphabet.
This is a zero-error code β the error probability is exactly 0, not just vanishing. Zero-error capacity is a separate (combinatorial) concept; the Shannon capacity allows for codes with .
Quick Check
In a DMC, the memoryless property means that depends only on . Which of the following is a consequence of this property?
The encoder cannot use feedback from previous outputs
The mutual information decomposes as when inputs are i.i.d.
The capacity does not depend on the input distribution
Correct! When the inputs are i.i.d. and the channel is memoryless, are i.i.d. pairs, so the chain rule gives .
Key Takeaway
The channel coding problem asks for the maximum rate at which information can be reliably transmitted through a DMC. A rate is achievable if codes exist with as the block length . The channel capacity is the supremum of achievable rates. The channel coding theorem gives the explicit formula .