Fundamental Inequalities

The Workhorses of Converse Proofs

Two inequalities dominate the converse (impossibility) arguments throughout information theory: the data processing inequality and Fano's inequality. The data processing inequality says that no processing can create information. Fano's inequality converts a small probability of error into an upper bound on conditional entropy — and from there, into a bound on rate. Together, they are the tools that prove fundamental limits are tight: you truly cannot do better than capacity.

Definition:

Markov Chain

Random variables X, Y, Z form a Markov chain in that order, written X \multimap Y \multimap Z (or X \to Y \to Z), if the conditional distribution of Z given (X, Y) depends only on Y:

P_{Z|XY}(z|x,y) = P_{Z|Y}(z|y) \quad \text{for all } x, y, z.

Equivalently, X and Z are conditionally independent given Y.

Markov chain

A sequence of random variables X \to Y \to Z where Z is conditionally independent of X given Y. In information theory, Markov chains arise naturally in channel models: the message X (the channel input), the channel output Y, and the decoder output \hat{X} form a chain X \to Y \to \hat{X}.

Related: Data processing inequality

Theorem: Data Processing Inequality

If X \multimap Y \multimap Z forms a Markov chain, then:

I(X;Z) \leq I(X;Y).

In particular, if Z = g(Y) is a deterministic function of Y, then I(X;g(Y)) \leq I(X;Y).

You cannot increase information about X by processing Y. The channel output Y is the "bottleneck" — any further processing can only lose information, never gain it. This is why sufficient statistics are so valuable: they are the functions of Y that preserve all information about X.
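As a quick numerical check, the sketch below cascades two binary symmetric channels to form a Markov chain X \to Y \to Z and verifies I(X;Z) \leq I(X;Y) for a uniform input bit; the crossover probabilities 0.1 and 0.05 are arbitrary illustrative choices.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mi(eps):
    """I(X;Y) in bits for a uniform input bit over a BSC with crossover probability eps."""
    return 1.0 - hb(eps)

# Markov chain X -> Y -> Z: a BSC(0.1) followed by a BSC(0.05).
# The cascade is again a BSC; its crossover probability is the chance of an odd number of flips.
e1, e2 = 0.1, 0.05
e_cascade = e1 * (1 - e2) + (1 - e1) * e2

I_XY = bsc_mi(e1)          # ~0.531 bits
I_XZ = bsc_mi(e_cascade)   # ~0.416 bits
print(f"I(X;Y) = {I_XY:.4f} bits, I(X;Z) = {I_XZ:.4f} bits")
assert I_XZ <= I_XY        # data processing inequality
```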

Data processing inequality

If X \to Y \to Z is a Markov chain, then I(X;Z) \leq I(X;Y). Processing data cannot create information. Equality holds iff Z is a sufficient statistic of Y for X.

Related: Markov chain, Sufficient statistic

Sufficient statistic

A function T(Y) is a sufficient statistic for X based on Y if X \multimap T(Y) \multimap Y, i.e., T(Y) captures all the information that Y contains about X. Equivalently, I(X; T(Y)) = I(X;Y).

Related: Data processing inequality

Theorem: Fano's Inequality

Let X be a discrete random variable with alphabet \mathcal{X}, |\mathcal{X}| = M. Let \hat{X} = g(Y) be an estimate of X based on observation Y. Define the probability of error P_e = \Pr(\hat{X} \neq X). Then:

H(X|Y) \leq h_b(P_e) + P_e \log(M-1) \triangleq H(P_e, M).

A useful weakened form: H(X|Y) \leq 1 + P_e \log(M-1).

Fano's inequality provides a converse bridge: if we can decode X from Y with small error probability P_e, then the conditional entropy H(X|Y) must be small. Equivalently, if H(X|Y) is large, then P_e must be large — reliable decoding is impossible.

This is the key tool in every converse proof. The argument always goes: assume a rate R > C, use Fano to bound the conditional entropy of the message in terms of P_e, then show this is incompatible with small P_e (or, in the strong-converse form, that P_e \to 1).
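As a sketch of how the two inequalities combine, here is the standard weak-converse chain, assuming a code with 2^{nR} equally likely messages W, channel input X^n, channel output Y^n, decoder output \hat{W}, and a memoryless channel of capacity C used n times (so W \to X^n \to Y^n \to \hat{W} is a Markov chain):

```latex
nR = H(W) = H(W \mid \hat{W}) + I(W; \hat{W})
   \le \underbrace{1 + P_e \, nR}_{\text{Fano, } M = 2^{nR}}
     + \underbrace{I(X^n; Y^n)}_{\text{data processing}}
   \le 1 + P_e \, nR + nC.
```

Rearranging gives R \le (C + 1/n)/(1 - P_e): if R > C, the error probability cannot vanish as n grows.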

Example: Fano's Inequality for the BSC

A binary symmetric channel has crossover probability \epsilon = 0.1. We send n i.i.d. bits over this channel without coding (rate R = 1). Use Fano's inequality to lower-bound the bit error rate.
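A sketch of one way to work the example: for uniform input bits, H(X|Y) = h_b(\epsilon) per bit, and with M = 2 Fano reduces to H(X|Y) \leq h_b(P_e); inverting h_b on [0, 1/2] then gives P_e \geq \epsilon. The snippet below carries out that inversion numerically.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.1                      # BSC crossover probability from the example
H_X_given_Y = hb(eps)          # per-bit conditional entropy for uniform inputs

# Fano with M = 2: H(X|Y) <= h_b(P_e), so P_e >= h_b^{-1}(H(X|Y)).
# Invert h_b on [0, 1/2], where it is increasing, by bisection.
lo, hi = 0.0, 0.5
for _ in range(60):
    mid = (lo + hi) / 2
    if hb(mid) < H_X_given_Y:
        lo = mid
    else:
        hi = mid

print(f"H(X|Y) = {H_X_given_Y:.4f} bits")
print(f"Fano lower bound on bit error rate: {lo:.4f}")  # ~0.1
```

The bound P_e \geq 0.1 matches the actual uncoded bit error rate, so Fano is tight here, as expected for a binary alphabet.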

Common Mistake: Fano's Inequality Is Not Always Tight

Mistake:

Treating the Fano bound P_e \log(M-1) as if it gives the exact conditional entropy. In particular, using the weakened form H(X|Y) \leq 1 + P_e \log M when a tighter bound is needed.

Correction:

Fano's inequality gives an upper bound on H(X|Y), not an equality. For binary alphabets (M = 2), Fano's inequality is tight (it becomes H(X|Y) \leq h_b(P_e), which can be achieved). For larger alphabets, the bound can be loose because the entropy of X conditioned on an error may be much less than \log(M-1). Tighter bounds exist (e.g., using the list size in list decoding), but Fano's inequality suffices for most converse proofs.
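To make the possible looseness concrete, here is a small hypothetical construction: M = 4, and whenever the estimate is wrong, X is always one specific alternative symbol rather than being spread over all M - 1 of them.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

M, Pe = 4, 0.1
# Hypothetical channel: given Y, X equals the decoder's guess with probability 0.9;
# when the guess is wrong, X is always one *specific* other symbol, never the remaining two.
actual_H = hb(Pe)                              # the error carries only a binary ambiguity
fano_bound = hb(Pe) + Pe * math.log2(M - 1)    # Fano allows errors spread over all M-1 symbols
print(f"actual H(X|Y) = {actual_H:.3f} bits, Fano bound = {fano_bound:.3f} bits")
```

Here the true conditional entropy is about 0.47 bits while the Fano bound is about 0.63 bits: an upper bound, not an equality.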

⚠️Engineering Note

Data Processing Inequality and Receiver Architecture

The data processing inequality has direct implications for receiver design. Consider the processing chain in a typical wireless receiver:

X \to Y_{\text{analog}} \to Y_{\text{digital}} \to Y_{\text{filtered}} \to \hat{X}

At each stage, I(X; \cdot) can only decrease. This means:

  • ADC resolution matters: If the ADC quantizes too coarsely, information is lost irreversibly. For a given SNR, there is a minimum ADC resolution below which capacity is degraded.
  • Matched filtering is not optional: The matched-filter output is a sufficient statistic for the AWGN channel — it preserves all the information the received waveform carries about the transmitted symbol. A mismatched front-end filter generally discards information.
  • Early decisions are costly: Making hard decisions (e.g., hard demapping before decoding) discards soft information that the decoder could use. Soft-in/soft-out processing preserves mutual information through the decoding chain (a numerical sketch follows this note).
Practical Constraints
  • ADC quantization introduces irreversible information loss

  • Hard decisions before decoding lose soft information

  • Each processing stage must be designed to preserve mutual information
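The sketch below puts a number on the cost of early decisions for one illustrative case: equiprobable BPSK (X = \pm 1) over a real AWGN channel, comparing the mutual information carried by the soft observation Y with that carried by the hard decision \mathrm{sign}(Y). The BPSK/AWGN model and the noise levels are assumptions made for this example only.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def soft_mi(sigma):
    """I(X;Y) in bits for equiprobable BPSK X = +/-1 over Y = X + N, N ~ N(0, sigma^2),
    computed as h(Y) - h(N) with h(Y) obtained by numerical integration."""
    def pdf(y, mean):
        return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    h_noise = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    h_out, dy, y = 0.0, 1e-3, -1 - 10 * sigma
    while y < 1 + 10 * sigma:
        p = 0.5 * pdf(y, +1) + 0.5 * pdf(y, -1)
        if p > 0.0:
            h_out -= p * math.log2(p) * dy
        y += dy
    return h_out - h_noise

def hard_mi(sigma):
    """I(X; sign(Y)): hard decisions turn the channel into a BSC with crossover Q(1/sigma)."""
    return 1.0 - hb(q_func(1.0 / sigma))

for sigma in (0.5, 0.7, 1.0):
    print(f"sigma = {sigma}: soft {soft_mi(sigma):.3f} bits >= hard {hard_mi(sigma):.3f} bits")
```

The hard decision is one more stage in the chain, X \to Y \to \mathrm{sign}(Y), so the gap between the two numbers is exactly the mutual information discarded by deciding early.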

Fano's Inequality Bound

Visualize the Fano bound: for a given alphabet size M and error probability P_e, the conditional entropy H(X|Y) is upper bounded by h_b(P_e) + P_e \log(M-1). See how the bound tightens as P_e \to 0.

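A minimal sketch that tabulates the bound for an assumed alphabet size of M = 8, showing how it tightens as P_e \to 0:

```python
import math

def fano_bound(pe, M):
    """Fano upper bound on H(X|Y): h_b(P_e) + P_e * log2(M - 1)."""
    hb = 0.0 if pe in (0.0, 1.0) else -pe * math.log2(pe) - (1 - pe) * math.log2(1 - pe)
    return hb + pe * math.log2(M - 1)

M = 8  # assumed number of possible values of X
for pe in (0.01, 0.05, 0.1, 0.2, 0.5):
    print(f"P_e = {pe:4.2f}: H(X|Y) <= {fano_bound(pe, M):.3f} bits (out of log2 M = {math.log2(M):.0f})")
```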

Key Takeaway

Data processing and Fano's inequality are the two pillars of converse proofs. The data processing inequality (I(X;Z) \leq I(X;Y) for Markov chains X \to Y \to Z) says processing cannot create information. Fano's inequality (H(X|Y) \leq h_b(P_e) + P_e \log(M-1)) converts small error probability into small conditional entropy. Together, they prove that reliable communication above capacity is impossible.