Dirty-Paper Coding and Lattice Precoding
Writing on Dirty Paper: Costa's Surprise
Imagine you want to send a message across a noisy channel where the output is corrupted by a large known interference — known only to the transmitter before it encodes: If were random and unknown, the capacity would drop by — interference reduces the effective SNR in the obvious way. If were known at both ends, the transmitter could simply subtract it: yields , the clean-channel capacity, at a power cost (since now has higher second moment).
The question Costa asked in 1983 was the intermediate case: what if is known only at the transmitter, causally in advance, and the transmitter has a power constraint (independent of )? The naïve "subtract it" trick fails because then has power , blowing the power budget.
Costa proved — to the surprise of the information-theory community — that the capacity is unchanged from the clean channel: Interference can be cancelled for free, in the rate sense, if it is known non-causally at the transmitter. Costa called this "writing on dirty paper": the dirt (interference) does not affect the amount of information that can be written.
Costa's proof used Gaussian random coding with a clever auxiliary variable . The lattice implementation — due to Erez, Shamai, and Zamir (2005) — replaces Costa's Gaussian random codebook with a nested lattice scheme almost identical to the one of s02. The MMSE scalar reappears, this time absorbing the interference via a pre-modulo at the transmitter. The proof pattern ("crypto-lemma
- MMSE + mod-") is the same pattern from s02, now applied to cancel rather than to achieve capacity. One trick, three applications — a recurring motif in Zamir's theory of lattice networks.
Definition: Dirty-Paper Channel
Dirty-Paper Channel
The dirty-paper (DPC) channel is where is the transmit signal with power constraint , is the interference (known non-causally at the transmitter but not at the receiver), and is the AWGN.
The capacity is where — independent of , for every .
The i.i.d.-Gaussian version of this channel ( drawn i.i.d. ) is Costa's original model. A deterministic (adversarial or arbitrary) gives the same capacity, as shown by Erez–Shamai–Zamir via the lattice construction below. Both "known Gaussian " and "known arbitrary " are subsumed by the same rate.
Lattice DPC Encoder and Decoder (Erez–Shamai–Zamir 2005)
Complexity: Identical to mod- (s02) plus one mod- reduction to absorb the interference. Encoder (given a fast quantiser); decoder dominated by the closest-point search.The transmitter's key trick is step 2: subtract from the intended codeword before the mod- reduction. The modulo then absorbs whatever integer multiple of the interference represents, leaving a transmit signal whose second moment depends only on the Voronoi region — not on . The interference has been pre-cancelled in a way that respects the power constraint.
Theorem: Lattice DPC Achieves the Clean-Channel Capacity
For every and every , there exist nested lattices and a dither such that the lattice-DPC scheme above achieves rate on the dirty-paper channel with average error probability , for every interference (deterministic or random, i.i.d.~Gaussian or arbitrary) and every power budget for the transmit signal — independently of .
The proof is the s02 proof, with one extra line: the mod- at the transmitter kills the interference. The receiver's MMSE scaling and mod- are identical. Crucially, the shared quantity between encoder and decoder is the dither , not the interference ; the receiver does not need to know , only to perform the MMSE scaling by and the mod- — two operations that do not depend on .
Show that after step 2, is uniform on and independent of (by the crypto-lemma).
Compute the receiver's processed signal and show it equals , i.e., the same effective signal as in s02 — with absent!
Apply the Erez–Zamir achievability argument to conclude that rate is achievable.
Step 1: transmit signal is uniform
The transmit signal . By the crypto-lemma, shifting the dither (which is uniform on ) by and reducing mod- yields a uniform distribution on , independent of both and . Its per-dimension second moment is — the power constraint is satisfied with equality, for every .
Step 2: receiver processing cancels $\mathbf{s}$
The receiver computes . Expanding: Substituting : Simplifying the terms: , which does NOT cancel — but wait. Re-do more carefully: the transmit signal , so Now replace inside the mod using for some (by the mod- representation of ). Then , where in general, so we cannot drop it. But , so adding to gives , and the residual can be absorbed by mod- (since mod is one of the nested-code cosets). The careful bookkeeping (see Erez–Shamai–Zamir §III) shows the result:
Step 3: reduce to s02 effective channel
The right-hand side is exactly the effective signal of the mod- scheme (s02 Step 2 of the Erez–Zamir proof), with the same effective noise of per-dimension variance . The interference does not appear. So any rate achievable on the clean channel is also achievable on the dirty-paper channel, by the same random-lattice argument as s02.
Step 4: conclusion
The capacity of the dirty-paper channel is at least the clean-channel capacity. Since the channel output contains additional uncertainty (the interference ), the capacity cannot exceed the clean-channel capacity either. Hence equality: achievable by the lattice-DPC scheme.
DPC Capacity vs Naive vs Clean: The Costa Surprise
Three rate curves as a function of interference power (dB), with signal power and noise variance fixed: (i) the clean-channel capacity , independent of ; (ii) the DPC capacity, which by Costa's theorem EQUALS the clean capacity regardless of ; (iii) the naïve uncoded rate, which treats as additional noise and drops to as . The gap between curves (ii) and (iii) is the dB benefit of knowing the interference at the transmitter — arbitrarily large in the high-interference regime.
Parameters
Example: DPC with at dB
Consider the dirty-paper channel with signal power , noise variance (so dB). Compare the rates achievable under the three strategies — clean channel (no interference), DPC (interference known to Tx), and naïve (interference treated as extra noise) — for dB (moderate interference) and dB (weak interference).
Case $S = 10$ dB ($\|\mathbf{s}\|^2 = 1$)
Clean: bits/real dim. DPC: bits/real dim — exactly the same, despite the interference! Naïve: bits/real dim.
The interference costs bits/real dim (i.e., ) under the naïve strategy; DPC loses nothing.
Case $S = -10$ dB ($\|\mathbf{s}\|^2 = 0.1$)
Clean: bits/real dim. DPC: bits/real dim — same again. Naïve: bits/real dim.
At low interference, the naïve strategy is within bits/real dim of the DPC rate. The two strategies agree in the limit ; as , DPC's advantage grows without bound.
Operational interpretation
The key takeaway: DPC's benefit over the naïve strategy is exactly , the "interference term" that the receiver alone cannot disambiguate. At high , the benefit grows arbitrarily — explaining why multi-user MIMO schemes (which are interference-limited at high ) rely critically on DPC-like precoding.
Tomlinson–Harashima Precoding: The Scalar Cousin of DPC
The scalar version of lattice DPC is Tomlinson–Harashima precoding (THP), developed independently by Tomlinson (Australia, 1971) and Harashima–Miyakawa (Japan, 1972) for channels with known ISI. The THP encoder, for a channel with impulse response and known past symbols, computes where is the intended PAM/QAM symbol, is the constellation extent, and the mod is a per-sample mod over the D lattice . THP achieves (approximately) the same-rate cancellation of ISI as the Erez–Shamai–Zamir lattice DPC, at the cost of the dB "precoding shaping loss" that is recovered only in the true lattice scheme with a non-cubic .
THP has been fielded in every modern DSL standard (V.34, V.90, V.92, ADSL, VDSL, G.fast), in -Gigabit-Ethernet backhaul transceivers, and in MU-MIMO base-station vector precoding (see below). The scalar THP is a 1D special case of the lattice DPC of this section — history recognised THP through the THP-lattice connection only after Erez–Shamai–Zamir reframed it in 2005.
- •
THP requires perfect knowledge of the ISI channel at the transmitter — in practice acquired via channel training or feedback
- •
THP shaping loss is dB (cubic shaping); can be recovered via non-cubic trellis shaping (V.34)
- •
For MU-MIMO, vector THP must coordinate precoding across all users' data streams — higher complexity than per-user MMSE
MU-MIMO Vector Perturbation: Lattice DPC in the Base Station
A multi-user MIMO base station transmitting to users has a problem structurally identical to DPC: when serving user , the signals destined for users are known interference at the transmitter (the base station knows all users' data) but unknown at user 's receiver. Dirty-paper precoding, applied sequentially across users, achieves the full sum-capacity of the MU-MIMO broadcast channel (Weingarten– Steinberg–Shamai 2006).
The practical implementation is vector perturbation precoding (Hochwald–Peel–Swindlehurst 2005): for each user, add a lattice perturbation from to minimise the transmit power under the interference-cancellation constraint. The per-symbol search for the optimal perturbation is a CLP problem (s05), solved by a sphere decoder. 5G NR's MU-MIMO precoding in FR1 currently uses linear precoding (much cheaper but suboptimal by up to – dB at high sum-rates); the jump to vector perturbation or THP for MU-MIMO is an active area of both research and standards discussion (3GPP Rel-18+).
- •
Sphere decoder at the base station scales poorly beyond simultaneous users
- •
Channel feedback overhead grows linearly in , with quantisation error directly contributing to the shaping loss
- •
Lower-complexity approximations (Zero-Forcing + THP hybrid, lattice-reduction aided) are the standards-ready alternatives
Historical Note: Costa (1983): 'Writing on Dirty Paper'
1983Max H. M. Costa's 1983 paper is one of the shortest famous results in information theory — five pages in IEEE Trans. IT, with a single theorem and a proof that fits on one page. Costa was then a Bell Labs post-doc, working on an internal open problem posed by the founder of network information theory, Te Sun Han.
Costa's proof used a Gel'fand–Pinsker auxiliary variable and showed that the mutual information is maximised over the choice of at exactly the clean-channel capacity. The MMSE coefficient that appeared in Costa's optimisation is the same coefficient that reappears in Erez–Zamir (2004) and Erez–Shamai–Zamir (2005) for the lattice realisation — history recognising the lattice connection only 22 years later.
Costa originally titled the paper "On the capacity of the AWGN channel with side information" — the editors insisted on a more evocative title. "Writing on Dirty Paper" became the name of the result and, by extension, the entire sub-field. The paper was cited fewer than times in its first decade; in the wireless era it has become one of the most-cited information-theory results of all time (7000+ citations as of 2025).
Common Mistake: DPC Requires NON-CAUSAL Interference Knowledge
Mistake:
Assuming that causal knowledge of the interference ("I know today's before I transmit today's ") is enough for the DPC capacity. It is not: Costa's theorem requires non-causal knowledge — the transmitter must know the entire interference sequence (across all future time) before encoding the block.
Correction:
For ordinary causal feedback (" known at encoder before "), the capacity is lower than Costa's: it is the "DPC with causal side information" rate, which is strictly below when is AWGN and is characterised by the Gel'fand–Pinsker formula with a different auxiliary. In practical wireless systems, the "non-causal" assumption holds per coded block: the base station knows the complete interference vector for a block before encoding the whole block, and the DPC rate applies to that block. For streaming (symbol-by-symbol) applications, only causal DPC is available and the rate is lower.
Why This Matters: From DPC to 5G MU-MIMO
The connection between lattice DPC and modern MU-MIMO precoding runs through the broadcast-channel capacity theorem: a base station with antennas serving users achieves sum-capacity via DPC over the sequential encoding order of the users. For each user, the signals of earlier-encoded users are known interference — the DPC channel of this section. The sum-capacity was proved achievable by sequential DPC by Weingarten–Steinberg–Shamai (2006), and the practical implementation uses lattice-DPC-like vector-perturbation precoders (Hochwald–Peel–Swindlehurst 2005).
Forward-link to Ch. 17 (LAST codes, MIMO book): the LAST codes of El Gamal, Caire, and Damen (2004) use a matrix-valued MMSE- GDFE that generalises the scalar of DPC to a full matrix. The Erez–Shamai–Zamir dirty-paper construction is the scalar special case; LAST codes are the matrix version.
Dirty-paper coding (DPC)
Coding for the channel with non-causal transmitter knowledge of the interference . Costa (1983) proved the capacity equals , independent of . Practical realisation via nested lattice codes (Erez–Shamai–Zamir 2005).
Related: Tomlinson–Harashima precoding, Costa (1983): 'Writing on Dirty Paper', Interference Cancellation
Tomlinson–Harashima precoding
Scalar (D) version of lattice DPC for known ISI: pre-cancel the interference modulo a scaled integer lattice . Incurs dB shaping loss relative to the full lattice scheme. Deployed in V.34, ADSL, VDSL, G.fast, MU-MIMO.
Related: Lattice DPC Achieves the Clean-Channel Capacity, Costa (1983): 'Writing on Dirty Paper', Isi
Quick Check
The dirty-paper channel has transmitter knowledge of but not receiver knowledge. The capacity is:
, i.e., interference counted as extra noise
, independent of
, i.e., combined signal+interference power
when
Correct. Costa (1983) proved that with non-causal knowledge of at the transmitter (and power constraint on , unrelated to ), the capacity equals the clean-channel . The interference can be cancelled for free in the rate sense.
Key Takeaway
Costa (1983) proved that a channel with interference known non-causally at the transmitter has the same capacity as the clean channel — , for any interference power. The lattice realisation of Erez, Shamai, and Zamir (2005) achieves this capacity with a single extra mod- at the transmitter (to pre-cancel within the Voronoi power region), reusing the Erez–Zamir MMSE scalar and crypto-lemma of s02 verbatim. This "proof pattern" (crypto-lemma + MMSE + mod-) is the backbone of DPC, MU-MIMO vector precoding, and — in matrix form — the LAST codes of Ch. 17. The scalar realisation has been fielded as Tomlinson–Harashima precoding in every major DSL and xDSL-derived standard since 1994.