Fundamental Inequalities
The Workhorses of Converse Proofs
Two inequalities dominate the converse (impossibility) arguments throughout information theory: the data processing inequality and Fano's inequality. The data processing inequality says that no processing can create information. Fano's inequality converts a small probability of error into an upper bound on conditional entropy — and from there, into a bound on rate. Together, they are the tools that prove fundamental limits are tight: you truly cannot do better than capacity.
Definition: Markov Chain
Random variables $X$, $Y$, $Z$ form a Markov chain in that order, written $X \to Y \to Z$ (or $X - Y - Z$), if the conditional distribution of $Z$ given $(X, Y)$ depends only on $Y$:
$$p(z \mid x, y) = p(z \mid y).$$
Equivalently, $X$ and $Z$ are conditionally independent given $Y$: $p(x, z \mid y) = p(x \mid y)\, p(z \mid y)$.
Markov chain
A sequence of random variables $X \to Y \to Z$ where $Z$ is conditionally independent of $X$ given $Y$. In information theory, Markov chains arise naturally in channel models: the message $W$, the channel input $X^n$, the channel output $Y^n$, and the decoder output $\hat{W}$ form a chain $W \to X^n \to Y^n \to \hat{W}$.
Related: Data processing inequality
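A minimal sketch of the definition in Python (the source distribution and channel matrices below are illustrative assumptions, not values from the text): build a joint distribution that factors as $p(x)\,p(y \mid x)\,p(z \mid y)$ and check that $p(z \mid x, y)$ depends only on $y$.

```python
import numpy as np

# Illustrative alphabets and channels (assumed for this sketch).
p_x = np.array([0.5, 0.5])                 # p(x): uniform binary source
p_y_given_x = np.array([[0.9, 0.1],        # p(y|x): a binary symmetric channel
                        [0.1, 0.9]])
p_z_given_y = np.array([[0.8, 0.2],        # p(z|y): further (noisier) processing
                        [0.3, 0.7]])

# The factorization p(x, y, z) = p(x) p(y|x) p(z|y) is exactly the chain X -> Y -> Z.
p_xyz = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]

# Verify the defining property: p(z | x, y) = p(z | y) for every (x, y).
p_xy = p_xyz.sum(axis=2)
for x in range(2):
    for y in range(2):
        assert np.allclose(p_xyz[x, y, :] / p_xy[x, y], p_z_given_y[y])
print("p(z | x, y) == p(z | y) for all x, y: Markov property holds")
```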
Theorem: Data Processing Inequality
If $X \to Y \to Z$ forms a Markov chain, then:
$$I(X; Y) \ge I(X; Z).$$
In particular, if $Z = g(Y)$ is a deterministic function of $Y$, then $I(X; Y) \ge I(X; g(Y))$.
You cannot increase information about $X$ by processing $Y$. The channel output $Y$ is the "bottleneck": any further processing can only lose information, never gain it. This is why sufficient statistics are so valuable: they are the functions of $Y$ that preserve all information about $X$.
Apply the chain rule
By the chain rule for mutual information:
$$I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z) = I(X; Y) + I(X; Z \mid Y).$$
Use the Markov property
Since $X \to Y \to Z$, we have $p(x, z \mid y) = p(x \mid y)\, p(z \mid y)$, so $I(X; Z \mid Y) = 0$. Therefore:
$$I(X; Y) = I(X; Z) + I(X; Y \mid Z).$$
Since $I(X; Y \mid Z) \ge 0$, we conclude $I(X; Y) \ge I(X; Z)$.
Equality condition
Equality holds iff $I(X; Y \mid Z) = 0$, i.e., iff $X \to Z \to Y$ also forms a Markov chain. This means $Z$ is a sufficient statistic of $Y$ for $X$.
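A quick numerical sanity check of the theorem on the same kind of toy chain (source and channel matrices are again illustrative assumptions): compute $I(X; Y)$ and $I(X; Z)$ from $p(x, y, z) = p(x)\,p(y \mid x)\,p(z \mid y)$ and confirm the inequality.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits from a 2-D joint distribution p(a, b)."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask])))

# Illustrative Markov chain X -> Y -> Z (assumed channels, not from the text).
p_x = np.array([0.5, 0.5])
p_y_given_x = np.array([[0.9, 0.1], [0.1, 0.9]])   # first channel
p_z_given_y = np.array([[0.8, 0.2], [0.3, 0.7]])   # further (lossy) processing

p_xyz = p_x[:, None, None] * p_y_given_x[:, :, None] * p_z_given_y[None, :, :]
I_xy = mutual_information(p_xyz.sum(axis=2))       # marginalize out Z -> p(x, y)
I_xz = mutual_information(p_xyz.sum(axis=1))       # marginalize out Y -> p(x, z)

print(f"I(X;Y) = {I_xy:.4f} bits, I(X;Z) = {I_xz:.4f} bits")
assert I_xy >= I_xz - 1e-12                        # data processing inequality
```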
Data processing inequality
If $X \to Y \to Z$ is a Markov chain, then $I(X; Y) \ge I(X; Z)$. Processing data cannot create information. Equality holds iff $Z$ is a sufficient statistic of $Y$ for $X$.
Related: Markov chain, Sufficient statistic
Sufficient statistic
A function $T(Y)$ is a sufficient statistic for $X$ based on $Y$ if $I(X; T(Y)) = I(X; Y)$, i.e., $T(Y)$ captures all the information that $Y$ contains about $X$. Equivalently, $X \to T(Y) \to Y$ forms a Markov chain.
Related: Data processing inequality
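A small numerical illustration of a sufficient statistic (the repetition setup is an assumption for this sketch): send one uniform bit $X$ three times over a BSC, observe $Y = (Y_1, Y_2, Y_3)$, and check that $T(Y) = $ the number of ones in $Y$ preserves all of $I(X; Y)$.

```python
import itertools
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits from a 2-D joint distribution."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask])))

# Assumed setup: one uniform bit X repeated n times over a BSC(p).
# Since p(y|x) depends on y only through the number of ones, T(Y) = sum(Y) is sufficient.
n, p = 3, 0.2
joint_xy = np.zeros((2, 2 ** n))       # p(x, y) over all 2^n output strings
joint_xt = np.zeros((2, n + 1))        # p(x, t) with t = number of ones in y

for x in (0, 1):
    for idx, y in enumerate(itertools.product((0, 1), repeat=n)):
        flips = sum(bit != x for bit in y)
        prob = 0.5 * (p ** flips) * ((1 - p) ** (n - flips))
        joint_xy[x, idx] += prob
        joint_xt[x, sum(y)] += prob

I_xy = mutual_information(joint_xy)
I_xt = mutual_information(joint_xt)
print(f"I(X;Y) = {I_xy:.6f} bits, I(X;T(Y)) = {I_xt:.6f} bits")
assert abs(I_xy - I_xt) < 1e-9         # T(Y) loses nothing about X
```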
Theorem: Fano's Inequality
Let $X$ be a discrete random variable with alphabet $\mathcal{X}$, $|\mathcal{X}| < \infty$. Let $\hat{X} = g(Y)$ be an estimate of $X$ based on an observation $Y$. Define the probability of error $P_e = \Pr(\hat{X} \ne X)$. Then:
$$H(X \mid Y) \le H_b(P_e) + P_e \log(|\mathcal{X}| - 1).$$
A useful weakened form: $H(X \mid Y) \le 1 + P_e \log|\mathcal{X}|$.
Fano's inequality provides a converse bridge: if we can decode $X$ from $Y$ with small error probability $P_e$, then the conditional entropy $H(X \mid Y)$ must be small. Equivalently, if $H(X \mid Y)$ is large, then $P_e$ must be large; reliable decoding is impossible.
This is the key tool in every converse proof. The argument always goes: assume a rate $R$ is achievable with vanishing error probability, use Fano to show $H(W \mid \hat{W})$ is bounded above, then show this leads to a contradiction (or that $R \le C$).
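For concreteness, here is the standard shape of that argument for channel coding, with a message $W$ uniform over $2^{nR}$ codewords, channel input $X^n$, channel output $Y^n$, decoder output $\hat{W}$, per-letter capacity $C$, and block error probability $P_e^{(n)}$ (the usual textbook notation, not anything specific to this page):

$$\begin{aligned}
nR = H(W) &= H(W \mid \hat{W}) + I(W; \hat{W}) \\
&\le 1 + P_e^{(n)} nR + I(W; \hat{W}) && \text{(Fano, weakened form)} \\
&\le 1 + P_e^{(n)} nR + I(X^n; Y^n) && \text{(data processing on } W \to X^n \to Y^n \to \hat{W}\text{)} \\
&\le 1 + P_e^{(n)} nR + nC && \text{(memoryless channel)}.
\end{aligned}$$

Dividing by $n$ and letting $P_e^{(n)} \to 0$ forces $R \le C$.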
Introduce the error indicator
Define the error indicator $E = \mathbf{1}\{\hat{X} \ne X\}$, so $\Pr(E = 1) = P_e$. By the chain rule:
$$H(E, X \mid Y) = H(X \mid Y) + H(E \mid X, Y) = H(E \mid Y) + H(X \mid E, Y).$$
Since $E$ is a deterministic function of $X$ and $Y$ (given the decoder $g$), $H(E \mid X, Y) = 0$. Therefore:
$$H(X \mid Y) = H(E \mid Y) + H(X \mid E, Y).$$
Bound each term
First term: $H(E \mid Y) \le H(E) = H_b(P_e)$ (the binary entropy of the error probability).
Second term: split by the value of $E$:
$$H(X \mid E, Y) = \Pr(E = 0)\, H(X \mid Y, E = 0) + \Pr(E = 1)\, H(X \mid Y, E = 1).$$
When $E = 0$: $X = \hat{X} = g(Y)$, so $X$ is known given $Y$; thus $H(X \mid Y, E = 0) = 0$.
When $E = 1$: $X \ne \hat{X}$, so $X$ takes one of at most $|\mathcal{X}| - 1$ values; thus $H(X \mid Y, E = 1) \le \log(|\mathcal{X}| - 1)$.
Combine
Substituting the bounds:
$$H(X \mid Y) \le H_b(P_e) + P_e \log(|\mathcal{X}| - 1).$$
As $P_e \to 0$, the right-hand side $\to 0$, so reliable decoding implies small conditional entropy. Conversely, large conditional entropy implies large error probability; this is the converse direction.
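A small numerical check of the bound (the joint distribution entries below are illustrative assumptions): compute $H(X \mid Y)$ exactly for a toy $p(x, y)$, take the error probability of the MAP estimator $\hat{x}(y) = \arg\max_x p(x, y)$, and compare against both forms of Fano's inequality.

```python
import numpy as np

def cond_entropy(p_joint):
    """H(X | Y) in bits from a joint p(x, y) with X indexing rows."""
    p_y = p_joint.sum(axis=0)
    h = 0.0
    for y in range(p_joint.shape[1]):
        p_x_given_y = p_joint[:, y] / p_y[y]
        nz = p_x_given_y > 0
        h -= p_y[y] * np.sum(p_x_given_y[nz] * np.log2(p_x_given_y[nz]))
    return h

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

# Illustrative joint distribution over |X| = 3 symbols (assumed values).
p_xy = np.array([[0.30, 0.03, 0.02],
                 [0.02, 0.25, 0.05],
                 [0.03, 0.02, 0.28]])
M = p_xy.shape[0]

P_e = 1.0 - sum(p_xy[:, y].max() for y in range(p_xy.shape[1]))   # MAP error probability
H_x_given_y = cond_entropy(p_xy)
fano_bound = binary_entropy(P_e) + P_e * np.log2(M - 1)
weak_bound = 1.0 + P_e * np.log2(M)

print(f"H(X|Y)        = {H_x_given_y:.4f} bits")
print(f"Fano bound    = {fano_bound:.4f} bits")
print(f"Weakened form = {weak_bound:.4f} bits")
assert H_x_given_y <= fano_bound <= weak_bound
```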
Example: Fano's Inequality for the BSC
A binary symmetric channel has crossover probability $p \le 1/2$. We send i.i.d. uniform bits over this channel without coding (rate $R = 1$ bit per channel use). Use Fano's inequality to lower-bound the bit error rate.
Setup
For each bit: $H(X \mid Y) = H_b(p)$ bits.
Fano's inequality gives: $H(X \mid Y) \le H_b(P_e) + P_e \log(2 - 1) = H_b(P_e)$.
Since $H(X \mid Y) = H_b(p)$: $H_b(P_e) \ge H_b(p)$.
Solve for $P_e$
We need $P_e$ such that $H_b(P_e) \ge H_b(p)$. Since $H_b$ is symmetric about $1/2$ and increasing on $[0, 1/2]$, we solve to get $P_e \ge p$.
This confirms that without coding, the bit error rate equals the crossover probability — exactly as expected. The power of Fano's inequality becomes apparent when we use it to prove that no coding scheme can achieve reliable communication above capacity.
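A quick numerical version of this example, assuming an illustrative crossover probability $p = 0.1$ (the exercise above is stated for a generic $p$): invert the binary entropy function by bisection to recover the Fano lower bound on $P_e$.

```python
import numpy as np

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def binary_entropy_inv(h, tol=1e-12):
    """Smallest q in [0, 1/2] with H_b(q) = h, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.1                                   # illustrative crossover probability (assumed)
h_x_given_y = binary_entropy(p)           # H(X|Y) for a uniform bit over BSC(p)
p_e_lower = binary_entropy_inv(h_x_given_y)

print(f"H(X|Y) = {h_x_given_y:.4f} bits  ->  Fano: P_e >= {p_e_lower:.4f}")
# Recovers P_e >= p: uncoded transmission cannot beat the raw crossover probability.
```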
Common Mistake: Fano's Inequality Is Not Always Tight
Mistake:
Treating the Fano bound as if it gives the exact conditional entropy. In particular, using the weakened form when a tighter bound is needed.
Correction:
Fano's inequality gives an upper bound on $H(X \mid Y)$, not an equality. For binary alphabets ($|\mathcal{X}| = 2$), Fano's inequality is tight (it becomes $H(X \mid Y) \le H_b(P_e)$, which can be achieved). For larger alphabets, the bound can be loose because the entropy of $X$ conditioned on an error may be much less than $\log(|\mathcal{X}| - 1)$. Tighter bounds exist (e.g., using the list size in list decoding) but Fano's inequality suffices for most converse proofs.
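A minimal sketch of that looseness, under an assumed 4-ary "cyclic shift" channel in which every error points to a single wrong symbol (so $H(X \mid Y, \text{error}) = 0$, far below $\log 3$):

```python
import numpy as np

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

# Assumed channel: X uniform on {0,1,2,3}; with probability eps the output is
# the single "next" symbol (X+1) mod 4, otherwise Y = X.
eps, M = 0.2, 4
p_xy = np.zeros((M, M))
for x in range(M):
    p_xy[x, x] += (1 - eps) / M
    p_xy[x, (x + 1) % M] += eps / M

# Exact H(X|Y): every column has conditional distribution (1 - eps, eps).
p_y = p_xy.sum(axis=0)
H_x_given_y = sum(
    -p_y[y] * sum(q * np.log2(q) for q in p_xy[:, y] / p_y[y] if q > 0)
    for y in range(M)
)

P_e = eps   # the MAP decoder x_hat(y) = y errs exactly when the shift occurs
fano_bound = binary_entropy(P_e) + P_e * np.log2(M - 1)
print(f"H(X|Y) = {H_x_given_y:.4f} bits, Fano bound = {fano_bound:.4f} bits "
      f"(gap = {fano_bound - H_x_given_y:.4f})")
```

Here the bound exceeds the true conditional entropy by $P_e \log 3 \approx 0.317$ bits.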
Data Processing Inequality and Receiver Architecture
The data processing inequality has direct implications for receiver design. Consider the processing chain in a typical wireless receiver: received waveform $\to$ analog front end $\to$ ADC $\to$ matched filter $\to$ demapper $\to$ decoder.
At each stage, the mutual information between the transmitted data and the processed signal can only decrease. This means:
- ADC resolution matters: If the ADC quantizes too coarsely, information is lost irreversibly. For a given SNR, there is a minimum ADC resolution below which capacity is degraded (a numerical sketch follows below).
- Matched filtering is not optional: The matched filter output is a sufficient statistic for the transmitted symbol on the AWGN channel; it preserves all the information in the received waveform. Any other front-end filter loses information.
- Early decisions are costly: Making hard decisions (e.g., hard demapping before decoding) discards soft information that the decoder could use. Soft-in/soft-out processing preserves mutual information through the decoding chain.
- ADC quantization introduces irreversible information loss
- Hard decisions before decoding lose soft information
- Each processing stage must be designed to preserve mutual information
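A minimal numerical sketch of the ADC-resolution and hard-decision points above, assuming BPSK over an AWGN channel at an illustrative 2 dB SNR with a uniform quantizer on $[-4, 4]$ (all of these choices are assumptions for the sketch): compare the mutual information delivered by a 1-bit hard decision against finer ADC resolutions.

```python
from math import erf, sqrt
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits from a 2-D joint distribution."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask])))

def quantized_mi(snr_db, n_bits):
    """I(X; Q(Y)) for BPSK over AWGN with a uniform n_bits quantizer on [-4, 4]."""
    sigma = 10 ** (-snr_db / 20)                     # noise std for unit-energy symbols
    edges = np.linspace(-4.0, 4.0, 2 ** n_bits + 1)  # quantizer cell boundaries
    edges[0], edges[-1] = -np.inf, np.inf            # extend the outer cells
    joint = np.zeros((2, 2 ** n_bits))
    for i, x in enumerate((-1.0, +1.0)):
        cdf = np.array([0.5 * (1 + erf((e - x) / (sigma * sqrt(2)))) for e in edges])
        joint[i] = 0.5 * np.diff(cdf)                # uniform prior on X
    return mutual_information(joint)

snr_db = 2.0                                         # illustrative operating point (assumed)
for bits in (1, 2, 3, 8):
    print(f"{bits}-bit ADC: I(X; Q(Y)) = {quantized_mi(snr_db, bits):.4f} bits")
```

The 1-bit (hard-decision) output carries visibly less information than finer quantization; by the data processing inequality, no later stage can recover the difference.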
Fano's Inequality Bound
Visualize the Fano bound: for a given alphabet size $|\mathcal{X}|$ and error probability $P_e$, the conditional entropy $H(X \mid Y)$ is upper bounded by $H_b(P_e) + P_e \log(|\mathcal{X}| - 1)$. See how the bound tightens as $P_e \to 0$.
Key Takeaway
Data processing and Fano's inequality are the two pillars of converse proofs. The data processing inequality ($I(X; Y) \ge I(X; Z)$ for Markov chains $X \to Y \to Z$) says processing cannot create information. Fano's inequality ($H(X \mid Y) \le H_b(P_e) + P_e \log(|\mathcal{X}| - 1)$) converts small error probability into small conditional entropy. Together, they prove that reliable communication above capacity is impossible.