Writing on Dirty Paper (Costa's Theorem)

The Most Surprising Theorem in Multiuser Information Theory

Consider a Gaussian channel corrupted by interference SS that is known at the encoder but not at the decoder:

Y=X+S+Z,SN(0,Q),ZN(0,N).Y = X + S + Z, \quad S \sim \mathcal{N}(0, Q), \quad Z \sim \mathcal{N}(0, N).

The encoder knows SnS^n non-causally and must satisfy E[X2]P\mathbb{E}[X^2] \leq P. What is the capacity?

The naive answer would be 12log(1+P/(Q+N))\frac{1}{2}\log(1 + P/(Q + N)) (treat SS as noise) or perhaps something between this and 12log(1+P/N)\frac{1}{2}\log(1 + P/N) (the interference-free capacity). Costa's astonishing result is that the capacity equals 12log(1+P/N)\frac{1}{2}\log(1 + P/N)the interference can be completely eliminated, regardless of its power QQ, even though the decoder knows nothing about SS.

This is like writing a message on paper that already has ink stains (the "dirt" SS). If you know where the stains are, you can write around them so that the reader sees only your message — even though the reader cannot distinguish your ink from the stains.

Theorem: Costa's Dirty Paper Coding Theorem

For the Gaussian channel Y=X+S+ZY = X + S + Z with SN(0,Q)S \sim \mathcal{N}(0, Q) known non-causally at the encoder, ZN(0,N)Z \sim \mathcal{N}(0, N) unknown, and E[X2]P\mathbb{E}[X^2] \leq P, the capacity is

C=12log ⁣(1+PN),C = \frac{1}{2}\log\!\left(1 + \frac{P}{N}\right),

the same as if the interference SS were absent. The capacity-achieving auxiliary variable is U=X+αSU = X + \alpha^* S with

α=PP+N.\alpha^* = \frac{P}{P + N}.

The encoder does not subtract SS from its transmission (that would cost power). Instead, it embeds the message into a codeword UnU^n that is designed jointly with SnS^n. The optimal α\alpha^* is the MMSE coefficient for estimating XX from YY — it ensures that UU and ZZ are independent given YY, which means the state SS becomes "invisible" to the decoder.

The point is that the encoder uses its knowledge of SS not to cancel it (which would waste power) but to structure the codebook so that the interference falls in the "null space" of the decoding operation.

Key Takeaway

Costa's theorem says that interference known at the encoder is as good as no interference at all — the capacity is 12log(1+P/N)\frac{1}{2}\log(1 + P/N) regardless of the interference power QQ. The encoder does not cancel the interference (that would waste power) but rather codes around it. This is the information-theoretic foundation for dirty paper coding (DPC) and multiuser MIMO precoding.

Dirty paper coding (DPC)

An encoding technique for channels with non-causal state information, based on Costa's theorem. The encoder structures the codebook jointly with the known interference so that the decoder sees an interference-free channel. Achieves the capacity of the MIMO broadcast channel.

Related: Channel state information (CSI)

Channel capacity

The supremum of achievable rates for reliable communication over a noisy channel. For the AWGN channel: C=12log(1+SNR)C = \frac{1}{2}\log(1 + \text{SNR}) bits per channel use.

Related: AWGN channel, Signal-to-noise ratio (SNR)

Dirty Paper Coding: The Key Insight

Compares three scenarios for the channel Y=X+S+ZY = X + S + Z: no state information, DPC (non-causal at encoder), and no interference. Costa's remarkable result is that DPC achieves the interference-free capacity 12log(1+P/N)\frac{1}{2}\log(1 + P/N) regardless of how strong the interference SS is.

Example: Verifying Costa's Formula

For the dirty paper channel Y=X+S+ZY = X + S + Z with P=10P = 10, Q=100Q = 100, N=1N = 1, verify that the optimal α\alpha^* gives C=12log(1+10)C = \frac{1}{2}\log(1 + 10) regardless of the large interference power Q=100Q = 100.

Dirty Paper Coding: Capacity vs. Interference Power

Costa's theorem predicts that the DPC capacity equals the interference-free capacity 12log(1+P/N)\frac{1}{2}\log(1 + P/N) regardless of QQ. Compare with: no state information (12log(1+P/(Q+N))\frac{1}{2}\log(1 + P/(Q+N))), optimal causal CSI, and the interference-free bound.

Parameters
10
1
100

Why α=P/(P+N)\alpha^* = P/(P+N)?

The optimal Costa parameter α=P/(P+N)\alpha^* = P/(P+N) is the MMSE estimation coefficient for estimating XX from Y=X+ZY = X + Z (ignoring SS). This is not a coincidence.

Intuitively, U=X+αSU = X + \alpha^* S is constructed so that the decoder's MMSE estimate of UU from YY is also the MMSE estimate of XX from YY (up to a scaling). The state SS appears in UU precisely to the extent that it helps the decoder — no more, no less. If α\alpha were too large, UU would be "too correlated" with SS, and the covering cost I(U;S)I(U; S) would dominate. If α\alpha were too small, the residual interference (1α)S(1-\alpha)S in YY would reduce I(U;Y)I(U; Y).

The MMSE coefficient achieves the perfect balance between these two effects.

Why This Matters: DPC and the MIMO Broadcast Channel

Dirty paper coding is the information-theoretic key to the MIMO broadcast channel (BC). When a base station transmits to KK users simultaneously, each user's signal acts as interference for the others. Since the base station generates all signals, it knows this interference non-causally.

The capacity region of the Gaussian MIMO BC is achieved by DPC (Weingarten, Steinberg, Shamai, 2006): the transmitter encodes each user's message treating all previously encoded users' signals as known interference, coding around them via DPC. The result is that multiuser interference does not reduce the sum capacity — exactly Costa's insight, extended to the vector channel.

In practice, DPC is approximated by linear precoding techniques (zero-forcing, regularized ZF, Tomlinson-Harashima precoding). See Chapter 18 for the full treatment of the MIMO BC capacity and Book telecom, Ch. 17 for practical multiuser MIMO.

🎓CommIT Contribution(2023)

ISAC Capacity-Distortion Tradeoff

F. Liu, G. CaireIEEE Trans. Inform. Theory, vol. 69, no. 9

Liu and Caire studied the fundamental tradeoff between communication rate and sensing distortion when a transmitter must simultaneously communicate to a receiver and sense a target. The channel model extends the state-dependent framework of this chapter: the "state" SS represents the target parameter to be estimated, and the transmitted signal serves dual purposes.

The key result establishes the capacity-distortion region {(R,D)}\{(R, D)\} for the Gaussian ISAC channel, showing that there is an inherent tension between communication rate and sensing accuracy. For the Gaussian case, the tradeoff is characterized by a water-filling-like power allocation between communication and sensing modes.

ISACcapacity-distortiondual functionView Paper →
⚠️Engineering Note

Practical Approximations to DPC

True DPC requires non-linear encoding (structured binning) and is computationally prohibitive for real-time implementation. Practical systems use linear approximations:

  • Zero-forcing (ZF) precoding: nulls the interference exactly at each user. Simple but suboptimal at low SNR.
  • Regularized ZF (MMSE precoding): adds a regularization term that balances interference cancellation and noise enhancement. Near-optimal at moderate SNR.
  • Tomlinson-Harashima precoding (THP): a non-linear scheme based on modulo arithmetic that captures some of the DPC gain.
  • Vector perturbation: searches for a lattice shift that minimizes transmit power. Approaches DPC performance at high complexity.

The gap between linear precoding and DPC is typically 1-3 dB, depending on the number of users and the channel condition.

Practical Constraints
  • ZF precoding requires full channel knowledge at the transmitter

  • THP adds 0.5-1.0 dB over linear precoding at moderate complexity

  • Full DPC is implementable only for 2-3 users in practice

Common Mistake: DPC Does Not Subtract the Interference

Mistake:

Thinking that dirty paper coding works by having the encoder subtract the known interference SS from the transmitted signal XX.

Correction:

Subtracting SS would require transmit power proportional to QQ (the interference power), which violates the power constraint when QQ is large. Instead, DPC structures the codebook jointly with SS using the auxiliary U=X+αSU = X + \alpha^* S. The encoder does not cancel SS directly — it codes around it. The transmitted signal XX is independent of SS and satisfies E[X2]=P\mathbb{E}[X^2] = P.

Quick Check

For the dirty paper channel Y=X+S+ZY = X + S + Z with P=1P = 1, Q=1000Q = 1000, N=1N = 1, and SnS^n known non-causally at the encoder, the capacity is:

12log(1+1/1001)0\frac{1}{2}\log(1 + 1/1001) \approx 0 bits

12log(1+1)=0.5\frac{1}{2}\log(1 + 1) = 0.5 bits

12log(1+1/1)=1\frac{1}{2}\log(1 + 1/1) = 1 bit

12log(1+1001)5\frac{1}{2}\log(1 + 1001) \approx 5 bits

Historical Note: Costa's 1983 Paper

Max Costa published "Writing on Dirty Paper" in 1983, two years after Gel'fand and Pinsker's general result but independently motivated by the Gaussian problem. Costa's key insight was the choice U=X+αSU = X + \alpha S with α=P/(P+N)\alpha = P/(P+N) — a seemingly simple substitution that yields the powerful result that interference is free to cancel.

The paper initially attracted moderate attention. It was only in the early 2000s, when Caire and Shamai (2003) and Weingarten, Steinberg, and Shamai (2006) connected DPC to the MIMO broadcast channel capacity, that the result's full significance became clear. Today, DPC is recognized as one of the most important results in multiuser information theory.