Non-Causal State Information: The Gel'fand-Pinsker Theorem
What If the Encoder Knows the Future?
Suppose the encoder knows the entire state sequence before transmission begins: non-causal state information. This may seem unrealistic at first, but it arises naturally in several important settings:
- Broadcast channels: the transmitter knows the messages for all users, so the signal intended for user 2 is "state" known non-causally when encoding user 1's message.
- Precoding: in MIMO downlink, the base station generates the interference and therefore knows it perfectly in advance.
- Digital watermarking: the host signal is known to the embedder.
The Gel'fand-Pinsker theorem gives the capacity for this setting, and its Gaussian specialization (Costa's theorem) reveals the astonishing result that known interference can be canceled for free.
Theorem: The Gel'fand-Pinsker Theorem
The capacity of a DMC with state $p(y|x,s)$, where the i.i.d. state $S \sim p(s)$ is known non-causally at the encoder and not at the decoder, is
$$C = \max_{p(u|s),\; x(u,s)} \big[\, I(U;Y) - I(U;S) \,\big],$$
where the maximization is over all joint distributions $p(s)\,p(u|s)$ with $X = x(U,S)$, and the mutual informations are computed with $S \sim p(s)$, $Y \sim p(y|x,s)$.
The formula has a natural interpretation: $I(U;Y)$ is the total information the decoder gets, and $I(U;S)$ is the "cost" of correlating the auxiliary codeword with the state. The encoder must embed the message into a codeword $u^n$ that is jointly typical with $s^n$ (covering), and the decoder must recover $u^n$ from $y^n$ (packing). The net rate is the packing capacity minus the covering cost.
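To make this concrete, here is a minimal numerical sketch for a finite-alphabet channel. The helper names (`gp_rate`, `mutual_information`) and the toy XOR channel are illustrative assumptions, not from the original text: the code evaluates the objective $I(U;Y) - I(U;S)$ for one candidate design $(p(u|s), x(u,s))$; the capacity would require maximizing over all such designs.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits, from a joint distribution p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal p(a), shape (nA, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal p(b), shape (1, nB)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a * p_b)[mask])))

def gp_rate(p_s, p_u_given_s, x_of_us, p_y_given_xs):
    """Evaluate I(U;Y) - I(U;S) for one auxiliary design.

    p_s[s]                : state distribution
    p_u_given_s[u, s]     : chosen auxiliary channel p(u|s)
    x_of_us[u, s]         : deterministic input map x(u, s)
    p_y_given_xs[y, x, s] : the DMC with state
    """
    n_u, n_s = p_u_given_s.shape
    n_y = p_y_given_xs.shape[0]
    p_us = p_u_given_s * p_s[None, :]        # joint p(u, s)
    p_uy = np.zeros((n_u, n_y))
    for u in range(n_u):
        for s in range(n_s):
            p_uy[u] += p_us[u, s] * p_y_given_xs[:, x_of_us[u, s], s]
    return mutual_information(p_uy) - mutual_information(p_us)

# Toy check: Y = X xor S with S ~ Bern(1/2). Choosing U ~ Bern(1/2)
# independent of S and x(u, s) = u xor s pre-cancels the state:
# I(U;Y) = 1, I(U;S) = 0, so the rate is 1 bit per channel use.
p_s = np.array([0.5, 0.5])
p_u_given_s = np.full((2, 2), 0.5)
x_of_us = np.array([[0, 1], [1, 0]])
p_y_given_xs = np.zeros((2, 2, 2))
for x in range(2):
    for s in range(2):
        p_y_given_xs[x ^ s, x, s] = 1.0
print(gp_rate(p_s, p_u_given_s, x_of_us, p_y_given_xs))   # -> 1.0
```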
Achievability: random binning
Generate $2^{n\tilde{R}}$ codewords $u^n(\ell)$, $\ell \in \{1, \dots, 2^{n\tilde{R}}\}$, i.i.d. according to $\prod_{i=1}^n p_U(u_i)$. Assign them uniformly to $2^{nR}$ bins, each containing $2^{n(\tilde{R} - R)}$ codewords.
Encoding: Given message $m$ and state $s^n$, find a codeword $u^n$ in bin $m$ that is jointly typical with $s^n$. By the covering lemma, this succeeds if $\tilde{R} - R > I(U;S)$. Set $x_i = x(u_i, s_i)$ for each $i$.
Achievability: decoding
Decoding: The decoder observes $y^n$ and finds the unique codeword $u^n(\ell)$ that is jointly typical with $y^n$. By the packing lemma, this succeeds if $\tilde{R} < I(U;Y)$.
Combining the rate constraints
From the covering constraint: $\tilde{R} - R > I(U;S)$. From the packing constraint: $\tilde{R} < I(U;Y)$. Eliminating $\tilde{R}$: $R < I(U;Y) - I(U;S)$.
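The covering side of this tradeoff can be seen numerically. Below is a Monte Carlo sketch under assumed toy parameters (binary state, target disagreement rate $d = 0.25$, so $I(U;S) = 1 - h(0.25) \approx 0.19$ bits): bins much smaller than $2^{nI(U;S)}$ essentially never contain a codeword jointly typical with the state, while sufficiently larger bins almost always do.

```python
import numpy as np

rng = np.random.default_rng(0)

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

n, d, eps = 60, 0.25, 0.04   # blocklength, target disagreement rate, typicality slack
i_us = 1 - h2(d)             # I(U;S) when U = S xor Bern(d), S ~ Bern(1/2)

def covering_success(bin_bits, trials=100):
    """Fraction of trials in which some codeword in a random bin is jointly
    typical with s^n (empirical disagreement within eps of d)."""
    hits = 0
    for _ in range(trials):
        s = rng.integers(0, 2, n)
        bin_codewords = rng.integers(0, 2, (int(2 ** bin_bits), n))
        frac = (bin_codewords ^ s).mean(axis=1)
        hits += bool(np.any(np.abs(frac - d) <= eps))
    return hits / trials

# Below ~n*I(U;S) bits per bin the covering step fails; above it, it succeeds
# (finite-n effects shift the threshold slightly).
for scale in (0.5, 1.3):
    bits = scale * n * i_us
    print(f"bin of 2^{bits:.1f} codewords: success rate {covering_success(bits):.2f}")
```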
Converse
The converse uses Fano's inequality and the identification of the auxiliary $U_i = (M, Y^{i-1}, S_{i+1}^n)$. Standard single-letterization arguments yield $R \le I(U;Y) - I(U;S)$.
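For completeness, here is a condensed sketch of that standard chain (the key step is the Csiszár sum identity; $\epsilon_n \to 0$ by Fano's inequality, and $I(M;S^n) = 0$ because the message is independent of the state):

```latex
\begin{align*}
nR &\le I(M;Y^n) + n\epsilon_n
    = I(M;Y^n) - \underbrace{I(M;S^n)}_{=\,0} + n\epsilon_n \\
   &= \sum_{i=1}^n \Big[ I(M;Y_i \mid Y^{i-1}) - I(M;S_i \mid S_{i+1}^n) \Big] + n\epsilon_n \\
   &\le \sum_{i=1}^n \Big[ I(M,Y^{i-1},S_{i+1}^n;\,Y_i) - I(M,Y^{i-1},S_{i+1}^n;\,S_i) \Big] + n\epsilon_n
       \quad \text{(Csisz\'ar sum identity, i.i.d.\ states)} \\
   &= \sum_{i=1}^n \Big[ I(U_i;Y_i) - I(U_i;S_i) \Big] + n\epsilon_n,
       \qquad U_i := (M, Y^{i-1}, S_{i+1}^n).
\end{align*}
```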
Random binning
A coding technique where codewords are randomly partitioned into bins. Used in the Gel'fand-Pinsker theorem (and Slepian-Wolf coding) to handle side information. The encoder selects a codeword from the correct bin that is compatible with the state.
Related: Channel state information (CSI)
[Figure: Gel'fand-Pinsker random binning]
The Packing-Covering Duality
The Gel'fand-Pinsker theorem beautifully illustrates a duality that pervades information theory:
- Packing (channel coding): fit as many non-overlapping codewords as possible. Rate limited by $I(U;Y)$.
- Covering (source coding / binning): find a codeword jointly typical with the state. Rate cost $I(U;S)$.
The net rate is packing minus covering, exactly the same structure as Wyner-Ziv (lossy source coding with side information at the decoder) but with the roles of encoder and decoder reversed.
Historical Note: Gel'fand and Pinsker's 1980 Result
Sergey Gel'fand and Mark Pinsker published their coding theorem for channels with non-causal state information in 1980, in Problems of Control and Information Theory. The result was not widely known in the Western literature until it was independently rediscovered and popularized by Heegard and El Gamal (1983) and Costa (1983). The Gaussian specialization by Costa brought the result to prominence and connected it to practical precoding problems.
It is one of those results where the information-theoretic insight preceded practical application by decades: dirty paper coding (DPC) became the theoretical foundation for MIMO broadcast channel capacity only in the early 2000s, with the work of Weingarten, Steinberg, and Shamai.
Common Mistake: The Auxiliary Is Not the Input
Mistake:
Setting $U = X$ in the Gel'fand-Pinsker formula, which gives $I(X;Y) - I(X;S)$ and can be negative.
Correction:
The auxiliary $U$ is a design variable that can depend on both the message and the state. In Costa's theorem, the optimal choice is $U = X + \alpha S$ with $\alpha = P/(P+N)$, which couples the codeword with the state. Setting $U = X$ ignores the state knowledge and is generally suboptimal. The optimization over $p(u|s)$ is the heart of the Gel'fand-Pinsker theorem.
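The effect of the coupling is easy to see numerically. Below is a small sketch assuming Costa's Gaussian setup (signal power $P$, interference power $Q$, noise power $N$); the closed-form expression for $I(U;Y) - I(U;S)$ follows from $(U, S, Y)$ being jointly Gaussian.

```python
import numpy as np

def costa_rate(alpha, P, Q, N):
    """I(U;Y) - I(U;S) in bits per use for Costa's choice U = X + alpha*S,
    with X ~ N(0, P) independent of S ~ N(0, Q), and Y = X + S + Z, Z ~ N(0, N)."""
    num = P * (P + Q + N)
    den = P * Q * (1 - alpha) ** 2 + N * (P + alpha ** 2 * Q)
    return 0.5 * np.log2(num / den)

P, Q, N = 1.0, 10.0, 1.0
for alpha in (0.0, 0.3, P / (P + N), 0.7, 1.0):
    print(f"alpha = {alpha:.2f}: rate = {costa_rate(alpha, P, Q, N):.4f} bits")
# alpha = P/(P+N) = 0.50 attains 0.5*log2(1 + P/N) = 0.5000 bits, the
# interference-free capacity, regardless of Q. alpha = 0 (i.e. U = X,
# independent of S) only achieves 0.5*log2(1 + P/(Q+N)): S is treated as noise.
print("interference-free capacity:", 0.5 * np.log2(1 + P / N))
```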
Quick Check
In the Gel'fand-Pinsker formula $C = \max \big[ I(U;Y) - I(U;S) \big]$, what does the term $I(U;S)$ represent?
The rate needed for the encoder to learn the state
The rate cost of finding a codeword jointly typical with the state (covering)
The mutual information between the state and the output
The penalty for not knowing the state at the decoder
The term $I(U;S)$ is the rate overhead for the binning operation: we need $\approx 2^{nI(U;S)}$ codewords per bin so that at least one is jointly typical with $s^n$. This "covering cost" is subtracted from the "packing capacity" to give the net communication rate.
Computational Complexity of Random Binning
The Gel'fand-Pinsker achievability proof relies on random binning: the encoder searches through $\approx 2^{nI(U;S)}$ codewords per bin to find one jointly typical with the state $s^n$. In the worst case, this is an exponential-time search.
Practical implementations avoid exhaustive search:
- Nested lattice codes (Erez-Shamai-Zamir, 2005) replace random binning with algebraic structure, enabling polynomial-time encoding.
- Tomlinson-Harashima precoding uses modulo arithmetic as a practical approximation with $O(n)$ encoding complexity.
- LDPC-based DPC uses syndrome coding to approximate binning, with iterative encoding in $O(n)$ per iteration.
The gap between the information-theoretic optimum (random binning) and practical structured codes is typically 1-2 dB for DPC implementations.
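To show the modulo idea behind Tomlinson-Harashima precoding concretely, here is a scalar toy sketch; the parameters and the 4-PAM alphabet are illustrative assumptions (real THP operates per-user on QAM constellations with channel pre-equalization).

```python
import numpy as np

rng = np.random.default_rng(1)

def mod_centered(v, delta):
    """Symmetric modulo: fold v into [-delta/2, delta/2)."""
    return v - delta * np.round(v / delta)

delta = 8.0                                            # modulo interval width
d = rng.choice([-3.0, -1.0, 1.0, 3.0], size=10_000)    # 4-PAM data symbols
s = 20.0 * rng.standard_normal(10_000)                 # strong KNOWN interference
z = 0.1 * rng.standard_normal(10_000)                  # channel noise

x = mod_centered(d - s, delta)   # transmit: pre-subtract, fold to bound power
y = x + s + z                    # channel adds the interference back
d_hat = mod_centered(y, delta)   # receiver folds; interference drops out

print("tx power:", np.mean(x ** 2))                # ~ delta^2/12, independent of s
print("max |error|:", np.max(np.abs(d_hat - d)))   # ~ noise level (no aliasing here,
                                                   # since |d| + |z| stays inside the interval)
```

The transmit power stays near $\Delta^2/12$ no matter how strong the known interference is, which is the dirty-paper promise; the price is the modulo shaping loss (up to about 1.53 dB in one dimension), which accounts for much of the gap quoted above.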
- Random binning: exponential search over $\approx 2^{nI(U;S)}$ codewords
- Nested lattice codes: polynomial time for lattice reduction
- THP/linear precoding: low-order polynomial cost in the number of users and symbols