Mutual Information
The Central Quantity in Information Theory
If entropy measures the uncertainty of a single random variable, we now ask: how much does one random variable tell us about another? This quantity, mutual information, turns out to be the single most important concept in information theory. It determines the capacity of channels, the limits of compression with side information, and the fundamental tradeoffs in multiuser communication. Everything in this book ultimately revolves around computing, bounding, or optimizing mutual information.
Definition: Mutual Information
The mutual information between discrete random variables $X$ and $Y$ is
$$I(X;Y) \;=\; \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \;=\; D\big(p(x,y) \,\|\, p(x)\,p(y)\big).$$
Equivalently:
$$I(X;Y) \;=\; H(X) - H(X \mid Y) \;=\; H(Y) - H(Y \mid X) \;=\; H(X) + H(Y) - H(X,Y).$$
The first expression shows that mutual information is the KL divergence between the joint distribution and the product of the marginals: it measures how far $(X,Y)$ is from independence. The second set of expressions connects mutual information to entropy and conditional entropy.
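To make the definition concrete, here is a minimal Python sketch (the joint distribution `p_xy` and the function names are invented for illustration) that computes $I(X;Y)$ directly from the KL form and checks it against the entropy identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$:

```python
import numpy as np

def entropy(p):
    """Entropy in bits; 0 log 0 = 0 by convention."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_xy):
    """I(X;Y) in bits, computed as D(p(x,y) || p(x)p(y))."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (column)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (row)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Illustrative joint distribution: X and Y agree 80% of the time
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

I = mutual_information(p_xy)
H_x = entropy(p_xy.sum(axis=1))
H_y = entropy(p_xy.sum(axis=0))
H_xy = entropy(p_xy)
assert abs(I - (H_x + H_y - H_xy)) < 1e-9   # the equivalent entropy form
print(f"I(X;Y) = {I:.4f} bits")              # about 0.2781 bits
```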
Mutual information
The reduction in uncertainty about $X$ due to observing $Y$ (or equivalently, about $Y$ due to observing $X$): $I(X;Y) = H(X) - H(X \mid Y)$. Always non-negative. Equals zero iff $X$ and $Y$ are independent. Symmetric: $I(X;Y) = I(Y;X)$.
Related: Entropy, Conditional entropy, Kullback-Leibler divergence
Theorem: Properties of Mutual Information
For discrete random variables $X$ and $Y$:
- Symmetry: $I(X;Y) = I(Y;X)$.
- Non-negativity: $I(X;Y) \ge 0$.
- Self-information: $I(X;X) = H(X)$.
- Independence: $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent.
- Upper bounds: $I(X;Y) \le \min\{H(X),\, H(Y)\}$.
Property 1 says information flow is symmetric: $Y$ tells us as much about $X$ as $X$ tells us about $Y$. Property 3 says a variable is maximally informative about itself. Property 5 says we cannot learn more about $X$ from $Y$ than the total uncertainty in $X$.
Symmetry
From the definition: $I(X;Y) = H(X) + H(Y) - H(X,Y)$, which is unchanged when $X$ and $Y$ are exchanged.
Alternatively, $\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$ is symmetric in the roles of $x$ and $y$ in the summation.
Non-negativity
We have $I(X;Y) = D\big(p(x,y) \,\|\, p(x)p(y)\big)$ (as shown in Section 1.4). Non-negativity of KL divergence, the information inequality, gives $I(X;Y) \ge 0$.
Self-information and independence
$I(X;X) = H(X) - H(X \mid X) = H(X) - 0 = H(X)$.
$I(X;Y) = 0$ iff $D\big(p(x,y) \,\|\, p(x)p(y)\big) = 0$ iff $p(x,y) = p(x)p(y)$ for all $x, y$ iff $X$ and $Y$ are independent.
Upper bounds
$I(X;Y) = H(X) - H(X \mid Y) \le H(X)$ since $H(X \mid Y) \ge 0$. By symmetry, $I(X;Y) \le H(Y)$. Together: $I(X;Y) \le \min\{H(X),\, H(Y)\}$.
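All five properties can be spot-checked numerically. A small sketch, redefining the helpers so the snippet stands alone; the random joint distribution is invented for illustration:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    p_xy = np.asarray(p_xy, dtype=float)
    return entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)

rng = np.random.default_rng(0)
p_xy = rng.random((3, 4)); p_xy /= p_xy.sum()       # random joint p(x, y)

I = mutual_information(p_xy)
assert I >= -1e-9                                     # non-negativity
assert abs(I - mutual_information(p_xy.T)) < 1e-9    # symmetry
assert I <= min(entropy(p_xy.sum(axis=1)),
                entropy(p_xy.sum(axis=0))) + 1e-9    # upper bounds

# Independence: the product of the marginals has I = 0
p_ind = np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))
assert abs(mutual_information(p_ind)) < 1e-9

# Self-information: a joint concentrated on the diagonal gives I(X;X) = H(X)
p_x = p_xy.sum(axis=1)
assert abs(mutual_information(np.diag(p_x)) - entropy(p_x)) < 1e-9
```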
Definition: Conditional Mutual Information
The conditional mutual information of $X$ and $Y$ given $Z$ is
$$I(X;Y \mid Z) \;=\; H(X \mid Z) - H(X \mid Y, Z) \;=\; \sum_{z} p(z) \sum_{x,y} p(x,y \mid z) \log \frac{p(x,y \mid z)}{p(x \mid z)\, p(y \mid z)}.$$
It measures the average information that $Y$ provides about $X$ when $Z$ is already known.
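Conditional mutual information reduces to joint entropies via $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$, which makes it easy to compute from a joint table. A minimal sketch, with an invented random joint $p(x,y,z)$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_mi(p_xyz):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), in bits. Axes: (x, y, z)."""
    p = np.asarray(p_xyz, dtype=float)
    return (entropy(p.sum(axis=1))          # H(X,Z): marginalize out Y
            + entropy(p.sum(axis=0))        # H(Y,Z): marginalize out X
            - entropy(p)                    # H(X,Y,Z)
            - entropy(p.sum(axis=(0, 1))))  # H(Z)

rng = np.random.default_rng(1)
p_xyz = rng.random((2, 2, 2)); p_xyz /= p_xyz.sum()
print(f"I(X;Y|Z) = {conditional_mi(p_xyz):.4f} bits")   # always >= 0
```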
Theorem: Chain Rule for Mutual Information
$$I(X_1, X_2, \ldots, X_n; Y) \;=\; \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1}).$$
The total information that the collection $(X_1, \ldots, X_n)$ provides about $Y$ equals the sum of the incremental contributions of each $X_i$, given the previous variables. This telescoping structure is the same pattern as the entropy chain rule, and it reappears in the converse proof of every channel coding theorem in this book.
Apply the entropy chain rule
Expanding both joint entropies with the chain rule and pairing terms:
$$I(X_1, \ldots, X_n; Y) = H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y) = \sum_{i=1}^{n} \big[ H(X_i \mid X^{i-1}) - H(X_i \mid X^{i-1}, Y) \big] = \sum_{i=1}^{n} I(X_i; Y \mid X^{i-1}),$$
where $X^{i-1} = (X_1, \ldots, X_{i-1})$.
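The telescoping can be verified numerically for $n = 3$. A sketch with an invented random joint over $(X_1, X_2, X_3, Y)$, expressing every term through joint entropies:

```python
import numpy as np

def H(p, keep):
    """Entropy in bits of the marginal of joint p over the axes in `keep`."""
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    m = p.sum(axis=drop) if drop else p
    m = m.ravel(); m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

rng = np.random.default_rng(2)
p = rng.random((2, 2, 2, 2)); p /= p.sum()   # axes: (X1, X2, X3, Y)

# Left side: I(X1,X2,X3; Y)
lhs = H(p, (0, 1, 2)) + H(p, (3,)) - H(p, (0, 1, 2, 3))

# Right side, using I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C):
rhs = (H(p, (0,)) + H(p, (3,)) - H(p, (0, 3))                                  # I(X1;Y)
     + H(p, (0, 1)) + H(p, (0, 3)) - H(p, (0, 1, 3)) - H(p, (0,))              # I(X2;Y|X1)
     + H(p, (0, 1, 2)) + H(p, (0, 1, 3)) - H(p, (0, 1, 2, 3)) - H(p, (0, 1)))  # I(X3;Y|X1,X2)

assert abs(lhs - rhs) < 1e-9
print(f"I(X1,X2,X3;Y) = {lhs:.4f} bits")
```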
Example: Mutual Information of the Binary Symmetric Channel
For the binary symmetric channel (BSC) with crossover probability $p$ and uniform input $X$, compute $I(X;Y)$.
Using the definition
From the example in Section 1.2:
$$I(X;Y) = H(Y) - H(Y \mid X) = 1 - H(p) \text{ bits},$$
where $H(p)$ is the binary entropy function; with uniform input, $Y$ is also uniform, so $H(Y) = 1$ bit.
When $p = 0$: $I(X;Y) = 1$ bit (perfect channel, $Y$ determines $X$).
When $p = 1/2$: $I(X;Y) = 0$ (useless channel, $Y$ independent of $X$).
When $p = 1$: $I(X;Y) = 1$ bit (deterministic bit-flip, $Y$ still determines $X$).
Interpretation
The mutual information $1 - H(p)$ is precisely the capacity of the BSC (as we will prove in Chapter 9). The point is that mutual information quantifies how much useful information passes through the channel per use, and for the BSC with uniform input, this equals the capacity.
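A small sketch of the curve $I(X;Y) = 1 - H(p)$ (the function names are illustrative), reproducing the three special cases above:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p: float) -> float:
    """I(X;Y) = 1 - H(p) for a BSC with uniform input."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.11, 0.5, 1.0):
    print(f"p = {p:4.2f}:  I(X;Y) = {bsc_mutual_information(p):.4f} bits")
```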
[Interactive figure: BSC Mutual Information. Plots the mutual information of a binary symmetric channel as a function of the input bias $P(X=1)$ and the crossover probability $p$. The capacity (maximum over the input distribution) is achieved at $P(X=1) = 1/2$.]
Why This Matters: Mutual Information and Channel Capacity
The channel capacity is $C = \max_{p(x)} I(X;Y)$, the maximum mutual information over all input distributions. This is the fundamental theorem of channel coding (Chapter 9). For the AWGN channel, this yields $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$, which is the starting point for all modern wireless system design. For MIMO channels, the capacity involves optimizing the input covariance matrix, leading to waterfilling across spatial modes (see Book telecom, Ch. 15).
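As a pointer to where this leads, here is a one-line evaluation of the AWGN formula, in bits per real channel use; the function name and SNR values are illustrative:

```python
import math

def awgn_capacity(snr: float) -> float:
    """C = (1/2) log2(1 + SNR) bits per real channel use."""
    return 0.5 * math.log2(1.0 + snr)

for snr_db in (0, 10, 20, 30):
    snr = 10 ** (snr_db / 10)                 # dB to linear
    print(f"SNR = {snr_db:2d} dB:  C = {awgn_capacity(snr):.3f} bits/use")
```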
Quick Check
If $I(X;Y) = 0.5$ bits, what is $I(Y;X)$?
$0.5$ bits
Cannot be determined without knowing $H(X)$ and $H(Y)$
$0$ bits
$0.25$ bits
Mutual information is symmetric: $I(X;Y) = I(Y;X)$. The information $X$ provides about $Y$ equals the information $Y$ provides about $X$.
Conditional mutual information
The mutual information between $X$ and $Y$ when $Z$ is known: $I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$. Can be larger or smaller than $I(X;Y)$; conditioning on a third variable can increase or decrease mutual information.
Related: Mutual information