Mutual Information

The Central Quantity in Information Theory

If entropy measures the uncertainty of a single random variable, we now ask: how much does one random variable tell us about another? This quantity, mutual information, turns out to be the single most important concept in information theory. It determines the capacity of channels, the limits of compression with side information, and the fundamental tradeoffs in multiuser communication. Everything in this book ultimately revolves around computing, bounding, or optimizing mutual information.

Definition:

Mutual Information

The mutual information between discrete random variables $X$ and $Y$ is

$$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}.$$

Equivalently:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y).$$

The first expression shows that $I(X;Y) = D(P_{XY} \| P_X \times P_Y)$: mutual information is the KL divergence between the joint distribution and the product of the marginals, so it measures how far $(X,Y)$ is from independence. The second set of expressions connects mutual information to entropy and conditional entropy.
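To make the equivalence concrete, here is a minimal Python sketch (not from the text) that computes $I(X;Y)$ for a small, arbitrary joint distribution both as the KL divergence $D(P_{XY} \| P_X \times P_Y)$ and as $H(X) + H(Y) - H(X,Y)$; the specific joint table is just an illustrative choice.

```python
import numpy as np

# Arbitrary illustrative 2x2 joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)   # marginal of X
p_y = p_xy.sum(axis=0)   # marginal of Y

def H(p):
    """Entropy in bits, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# KL form D(P_XY || P_X x P_Y); assumes the joint has no zero entries.
kl_form = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

# Entropy form H(X) + H(Y) - H(X, Y).
entropy_form = H(p_x) + H(p_y) - H(p_xy)

print(kl_form, entropy_form)   # both ~0.278 bits for this joint
```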

Mutual information

The reduction in uncertainty about $X$ due to observing $Y$ (or equivalently, about $Y$ due to observing $X$): $I(X;Y) = H(X) - H(X|Y)$. Always non-negative. Equals zero iff $X$ and $Y$ are independent. Symmetric: $I(X;Y) = I(Y;X)$.

Related: Entropy, Conditional entropy, Kullback-Leibler divergence

Theorem: Properties of Mutual Information

For discrete random variables $X$ and $Y$:

  1. Symmetry: $I(X;Y) = I(Y;X)$.
  2. Non-negativity: $I(X;Y) \geq 0$.
  3. Self-information: $I(X;X) = H(X)$.
  4. Independence: $I(X;Y) = 0$ if and only if $X \perp Y$.
  5. Upper bounds: $I(X;Y) \leq \min\{H(X), H(Y)\}$.

Property 1 says information flow is symmetric: $Y$ tells us as much about $X$ as $X$ tells us about $Y$. Property 3 says a variable is maximally informative about itself. Property 5 says we cannot learn more about $X$ from $Y$ than the total uncertainty in $X$.
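As a quick numerical sanity check (an illustrative sketch, not part of the text), each property can be spot-checked on a small joint distribution; the joint table below is an arbitrary example.

```python
import numpy as np

def H(p):
    """Entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint table indexed [x, y]."""
    return H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy)

# Arbitrary illustrative joint distribution.
p_xy = np.array([[0.30, 0.20],
                 [0.05, 0.45]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

print(np.isclose(mutual_information(p_xy), mutual_information(p_xy.T)))  # 1. symmetry
print(mutual_information(p_xy) >= 0)                                     # 2. non-negativity
print(np.isclose(mutual_information(np.diag(p_x)), H(p_x)))              # 3. I(X;X) = H(X)
print(np.isclose(mutual_information(np.outer(p_x, p_y)), 0))             # 4. independence
print(mutual_information(p_xy) <= min(H(p_x), H(p_y)))                   # 5. upper bound
```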

Definition:

Conditional Mutual Information

The conditional mutual information of $X$ and $Y$ given $Z$ is

$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = \sum_{z} p(z)\, I(X;Y|Z=z).$$

It measures the average information that $Y$ provides about $X$ when $Z$ is already known.
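A minimal sketch of how this is computed in practice (illustrative, not from the text), using the entropy identity $I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)$, which follows from the definition above; the random joint distribution is an arbitrary example.

```python
import numpy as np

def H(p):
    """Entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary random joint p(x, y, z) on binary alphabets, indexed [x, y, z].
p_xyz = np.random.default_rng(0).dirichlet(np.ones(8)).reshape(2, 2, 2)

p_xz = p_xyz.sum(axis=1)        # p(x, z)
p_yz = p_xyz.sum(axis=0)        # p(y, z)
p_z = p_xyz.sum(axis=(0, 1))    # p(z)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
I_xy_given_z = H(p_xz) + H(p_yz) - H(p_z) - H(p_xyz)
print(I_xy_given_z)             # non-negative for any joint distribution
```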

Theorem: Chain Rule for Mutual Information

$$I(X_1, X_2, \ldots, X_n ; Y) = \sum_{i=1}^{n} I(X_i ; Y \mid X_{i-1}, \ldots, X_1).$$

The total information that the collection $(X_1, \ldots, X_n)$ provides about $Y$ equals the sum of the incremental contributions of each $X_i$, given the previous variables. This telescoping structure is the same pattern as the entropy chain rule, and it reappears in the converse proof of every channel coding theorem in this book.
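The identity can be checked numerically for $n = 2$, where it reads $I(X_1,X_2;Y) = I(X_1;Y) + I(X_2;Y|X_1)$. The sketch below (an illustration with an arbitrary random joint, not from the text) verifies this by expressing each term through joint entropies.

```python
import numpy as np

def H(p):
    """Entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary random joint p(x1, x2, y) on binary alphabets, indexed [x1, x2, y].
p = np.random.default_rng(1).dirichlet(np.ones(8)).reshape(2, 2, 2)

# Left side: I(X1,X2;Y) = H(X1,X2) + H(Y) - H(X1,X2,Y)
lhs = H(p.sum(axis=2)) + H(p.sum(axis=(0, 1))) - H(p)

# I(X1;Y) = H(X1) + H(Y) - H(X1,Y)
term1 = H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 1))) - H(p.sum(axis=1))
# I(X2;Y|X1) = H(X1,X2) + H(X1,Y) - H(X1) - H(X1,X2,Y)
term2 = H(p.sum(axis=2)) + H(p.sum(axis=1)) - H(p.sum(axis=(1, 2))) - H(p)

print(np.isclose(lhs, term1 + term2))   # True: the chain rule holds for any joint
```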

Example: Mutual Information of the Binary Symmetric Channel

For the binary symmetric channel (BSC) with crossover probability $\epsilon$ and uniform input $X \sim \text{Bernoulli}(1/2)$, compute $I(X;Y)$.
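Sketch of the solution: each input bit is flipped with probability $\epsilon$, so $H(Y|X) = h_b(\epsilon)$; with uniform input, $Y$ is also uniform, so $H(Y) = 1$ and $I(X;Y) = 1 - h_b(\epsilon)$ bits. The small Python helper below (illustrative, not from the text) evaluates this, and also accepts an arbitrary input bias $P(X=1)$.

```python
import numpy as np

def h_b(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_mutual_information(eps, p1=0.5):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with crossover eps and P(X=1) = p1."""
    q = p1 * (1 - eps) + (1 - p1) * eps   # P(Y=1)
    return h_b(q) - h_b(eps)              # H(Y|X) = h_b(eps) for every input

print(bsc_mutual_information(0.1))        # 1 - h_b(0.1), roughly 0.531 bits
print(bsc_mutual_information(0.5))        # 0: the output is pure noise
```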

BSC Mutual Information

[Interactive figure: the mutual information $I(X;Y) = H(Y) - h_b(\epsilon)$ of a binary symmetric channel as a function of the input bias $p = P(X=1)$ and the crossover probability $\epsilon$. The capacity (the maximum over $p$) is achieved at $p = 1/2$.]

Why This Matters: Mutual Information and Channel Capacity

The channel capacity is $C = \max_{p_X} I(X;Y)$, the maximum mutual information over all input distributions. This is the fundamental theorem of channel coding (Chapter 9). For the AWGN channel, this yields $C = \frac{1}{2}\log(1 + \text{SNR})$, which is the starting point for all modern wireless system design. For MIMO channels, the capacity involves optimizing the input covariance matrix, leading to waterfilling across spatial modes (see Book telecom, Ch. 15).
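As an illustrative sketch (not from the text), the BSC capacity can be found by maximizing $I(X;Y)$ over the input bias on a grid, recovering $C = 1 - h_b(\epsilon)$ at $p = 1/2$, and the AWGN formula can be evaluated directly; the numerical values below are arbitrary examples.

```python
import numpy as np

def h_b(p):
    """Binary entropy in bits, safe at the endpoints."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# BSC: maximize I(X;Y) = h_b(p(1-eps) + (1-p)eps) - h_b(eps) over the input bias p.
eps = 0.1
p_grid = np.linspace(0.0, 1.0, 1001)
I_grid = h_b(p_grid * (1 - eps) + (1 - p_grid) * eps) - h_b(eps)
print(p_grid[np.argmax(I_grid)], I_grid.max())   # maximizer 0.5, C = 1 - h_b(0.1) ~ 0.531

# AWGN: C = (1/2) log2(1 + SNR) bits per channel use (SNR in linear scale).
snr = 10.0
print(0.5 * np.log2(1 + snr))                    # ~1.73 bits per channel use
```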

Quick Check

If $I(X;Y) = 0.7$ bits, what is $I(Y;X)$?

$0.7$ bits

Cannot be determined without knowing $H(X)$ and $H(Y)$

$1 - 0.7 = 0.3$ bits

$0$ bits

Conditional mutual information

The mutual information between $X$ and $Y$ when $Z$ is known: $I(X;Y|Z) = H(X|Z) - H(X|Y,Z)$. Can be larger or smaller than $I(X;Y)$: conditioning on a third variable can increase or decrease mutual information.

Related: Mutual information
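A standard example of conditioning increasing mutual information, sketched here in Python as an illustration (not from the text): if $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$, then $I(X;Y) = 0$ but $I(X;Y|Z) = 1$ bit.

```python
import numpy as np

def H(p):
    """Entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# X, Y independent fair bits, Z = X XOR Y; joint p(x, y, z) indexed [x, y, z].
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x ^ y] = 0.25

# I(X;Y) = H(X) + H(Y) - H(X,Y): zero, since X and Y are independent.
p_xy = p.sum(axis=2)
I_xy = H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z): one full bit.
I_xy_given_z = H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.sum(axis=(0, 1))) - H(p)

print(I_xy, I_xy_given_z)   # 0.0 and 1.0
```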