Dirty-Paper Coding and Lattice Precoding

Writing on Dirty Paper: Costa's Surprise

Imagine you want to send a message across a noisy channel where the output is corrupted by a large known interference s\mathbf{s} — known only to the transmitter before it encodes: y  =  x+s+w.\mathbf{y} \;=\; \mathbf{x} + \mathbf{s} + \mathbf{w}. If s\mathbf{s} were random and unknown, the capacity would drop by 12log2(1+SNRs)\tfrac12 \log_2(1 + \text{SNR}_\mathbf{s}) — interference reduces the effective SNR in the obvious way. If s\mathbf{s} were known at both ends, the transmitter could simply subtract it: x:=x0s\mathbf{x} := \mathbf{x}_0 - \mathbf{s} yields y=x0+w\mathbf{y} = \mathbf{x}_0 + \mathbf{w}, the clean-channel capacity, at a power cost (since x\mathbf{x} now has higher second moment).

The question Costa asked in 1983 was the intermediate case: what if s\mathbf{s} is known only at the transmitter, causally in advance, and the transmitter has a power constraint E[x2]/nP\mathbb{E}[\|\mathbf{x}\|^2]/n \le P (independent of s\mathbf{s})? The naïve "subtract it" trick fails because then x=x0s\mathbf{x} = \mathbf{x}_0 - \mathbf{s} has power P+s2/nP + \|\mathbf{s}\|^2/n, blowing the power budget.

Costa proved — to the surprise of the information-theory community — that the capacity is unchanged from the clean channel: CDPC  =  12log2(1+SNR)regardless of s2.C_{\text{DPC}} \;=\; \tfrac12 \log_2(1 + \text{SNR}) \quad \text{regardless of } \|\mathbf{s}\|^2. Interference can be cancelled for free, in the rate sense, if it is known non-causally at the transmitter. Costa called this "writing on dirty paper": the dirt (interference) does not affect the amount of information that can be written.

Costa's proof used Gaussian random coding with a clever auxiliary variable U=x+αsU = \mathbf{x} + \alpha \mathbf{s}. The lattice implementation — due to Erez, Shamai, and Zamir (2005) — replaces Costa's Gaussian random codebook with a nested lattice scheme almost identical to the one of s02. The MMSE scalar α\alpha reappears, this time absorbing the interference via a pre-modulo at the transmitter. The proof pattern ("crypto-lemma

  • MMSE + mod-Λ\Lambda") is the same pattern from s02, now applied to cancel s\mathbf{s} rather than to achieve capacity. One trick, three applications — a recurring motif in Zamir's theory of lattice networks.
,

Definition:

Dirty-Paper Channel

The dirty-paper (DPC) channel is y  =  x+s+w,\mathbf{y} \;=\; \mathbf{x} + \mathbf{s} + \mathbf{w}, where x\mathbf{x} is the transmit signal with power constraint E[x2]/nP\mathbb{E}[\|\mathbf{x}\|^2]/n \le P, sRn\mathbf{s} \in \mathbb{R}^n is the interference (known non-causally at the transmitter but not at the receiver), and wN(0,σ2I)\mathbf{w} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}) is the AWGN.

The capacity is CDPC(SNR,s2)  =  12log2(1+SNR),C_{\text{DPC}}(\text{SNR}, \|\mathbf{s}\|^2) \;=\; \tfrac12 \log_2(1 + \text{SNR}), where SNR=P/σ2\text{SNR} = P/\sigma^2independent of s2\|\mathbf{s}\|^2, for every s\mathbf{s}.

The i.i.d.-Gaussian version of this channel (s\mathbf{s} drawn i.i.d. N(0,S)\mathcal{N}(0, S)) is Costa's original model. A deterministic (adversarial or arbitrary) s\mathbf{s} gives the same capacity, as shown by Erez–Shamai–Zamir via the lattice construction below. Both "known Gaussian s\mathbf{s}" and "known arbitrary s\mathbf{s}" are subsumed by the same rate.

,

Lattice DPC Encoder and Decoder (Erez–Shamai–Zamir 2005)

Complexity: Identical to mod-Λ\Lambda (s02) plus one mod-Λs\Lambda_s reduction to absorb the interference. Encoder O(n)O(n) (given a fast Λs\Lambda_s quantiser); decoder dominated by the Λc\Lambda_c closest-point search.
Setup. Nested lattice pair ΛsΛc\Lambda_s \subset \Lambda_c with
Voronoi-shaped Λs\Lambda_s of second moment PP, MMSE coefficient
α=P/(P+σ2)=SNR/(1+SNR)\alpha = P/(P + \sigma^2) = \text{SNR}/(1 + \text{SNR}),
shared dither d\mathbf{d} uniform on V(Λs)\mathcal{V}(\Lambda_s).
Encoder (given message uu and known interference s\mathbf{s}):
1. c(u)\mathbf{c}(u) \leftarrow fine-lattice codeword of uu.
2. v[c(u)αsd]modΛs\mathbf{v} \leftarrow [\mathbf{c}(u) - \alpha \mathbf{s} - \mathbf{d}] \bmod \Lambda_s.
3. Transmit xv\mathbf{x} \leftarrow \mathbf{v}.
Channel: y=x+s+w\mathbf{y} = \mathbf{x} + \mathbf{s} + \mathbf{w}.
Decoder:
4. y[αy+d]modΛs\mathbf{y}' \leftarrow [\alpha \mathbf{y} + \mathbf{d}] \bmod \Lambda_s.
5. c^QΛc(y)\hat{\mathbf{c}} \leftarrow Q_{\Lambda_c}(\mathbf{y}');
u^c1(c^)\hat{u} \leftarrow \mathbf{c}^{-1}(\hat{\mathbf{c}}).

The transmitter's key trick is step 2: subtract αs\alpha \mathbf{s} from the intended codeword before the mod-Λs\Lambda_s reduction. The modulo then absorbs whatever integer multiple of Λs\Lambda_s the interference represents, leaving a transmit signal whose second moment depends only on the Voronoi region V(Λs)\mathcal{V}( \Lambda_s) — not on s2\|\mathbf{s}\|^2. The interference has been pre-cancelled in a way that respects the power constraint.

Theorem: Lattice DPC Achieves the Clean-Channel Capacity

For every R<12log2(1+SNR)R < \tfrac12 \log_2(1 + \text{SNR}) and every ε>0\varepsilon > 0, there exist nested lattices (Λc,Λs)(\Lambda_c, \Lambda_s) and a dither d\mathbf{d} such that the lattice-DPC scheme above achieves rate RR on the dirty-paper channel y=x+s+w\mathbf{y} = \mathbf{x} + \mathbf{s} + \mathbf{w} with average error probability Pe<εP_e < \varepsilon, for every interference s\mathbf{s} (deterministic or random, i.i.d.~Gaussian or arbitrary) and every power budget PP for the transmit signal — independently of s2\|\mathbf{s}\|^2.

The proof is the s02 proof, with one extra line: the mod-Λs\Lambda_s at the transmitter kills the interference. The receiver's MMSE scaling and mod-Λs\Lambda_s are identical. Crucially, the shared quantity between encoder and decoder is the dither d\mathbf{d}, not the interference s\mathbf{s}; the receiver does not need to know s\mathbf{s}, only to perform the MMSE scaling by α\alpha and the mod-Λs\Lambda_s — two operations that do not depend on s\mathbf{s}.

, ,

DPC Capacity vs Naive vs Clean: The Costa Surprise

Three rate curves as a function of interference power s2=S\|\mathbf{s}\|^2 = S (dB), with signal power PP and noise variance σ2\sigma^2 fixed: (i) the clean-channel capacity 12log2(1+P/σ2)\tfrac12 \log_2(1 + P/\sigma^2), independent of SS; (ii) the DPC capacity, which by Costa's theorem EQUALS the clean capacity regardless of SS; (iii) the naïve uncoded rate, which treats s\mathbf{s} as additional noise and drops to 12log2(1+P/(S+σ2))0\tfrac12 \log_2(1 + P/(S + \sigma^2)) \to 0 as SS \to \infty. The gap between curves (ii) and (iii) is the dB benefit of knowing the interference at the transmitter — arbitrarily large in the high-interference regime.

Parameters
10

Example: DPC with s\mathbf{s} at ±10\pm 10 dB

Consider the dirty-paper channel y=x+s+w\mathbf{y} = \mathbf{x} + \mathbf{s} + \mathbf{w} with signal power P=1P = 1, noise variance σ2=0.1\sigma^2 = 0.1 (so SNR=10\text{SNR} = 10 dB). Compare the rates achievable under the three strategies — clean channel (no interference), DPC (interference s\mathbf{s} known to Tx), and naïve (interference treated as extra noise) — for s2/n=S=10\|\mathbf{s}\|^2/n = S = 10 dB (moderate interference) and S=10S = -10 dB (weak interference).

🚨Critical Engineering Note

Tomlinson–Harashima Precoding: The Scalar Cousin of DPC

The scalar version of lattice DPC is Tomlinson–Harashima precoding (THP), developed independently by Tomlinson (Australia, 1971) and Harashima–Miyakawa (Japan, 1972) for channels with known ISI. The THP encoder, for a channel with impulse response h[k]h[k] and known past symbols, computes x~k  =  [ck>0h[]x~k]mod2M,\tilde{x}_k \;=\; [c_k - \textstyle\sum_{\ell > 0} h[\ell] \tilde{x}_{k - \ell}] \bmod 2M, where ckc_k is the intended PAM/QAM symbol, 2M2M is the constellation extent, and the mod is a per-sample mod over the 11D lattice 2MZ2 M \mathbb{Z}. THP achieves (approximately) the same-rate cancellation of ISI as the Erez–Shamai–Zamir lattice DPC, at the cost of the 1.53\approx 1.53 dB "precoding shaping loss" that is recovered only in the true lattice scheme with a non-cubic Λs\Lambda_s.

THP has been fielded in every modern DSL standard (V.34, V.90, V.92, ADSL, VDSL, G.fast), in 100100-Gigabit-Ethernet backhaul transceivers, and in MU-MIMO base-station vector precoding (see below). The scalar THP is a 1D special case of the lattice DPC of this section — history recognised THP through the THP-lattice connection only after Erez–Shamai–Zamir reframed it in 2005.

Practical Constraints
  • THP requires perfect knowledge of the ISI channel at the transmitter — in practice acquired via channel training or feedback

  • THP shaping loss is 1.53\approx 1.53 dB (cubic shaping); can be recovered via non-cubic trellis shaping (V.34)

  • For MU-MIMO, vector THP must coordinate precoding across all users' data streams — higher complexity than per-user MMSE

📋 Ref: ITU V.34; ANSI T1.413 (ADSL); G.992 (VDSL); G.9701 (G.fast)
, ,
⚠️Engineering Note

MU-MIMO Vector Perturbation: Lattice DPC in the Base Station

A multi-user MIMO base station transmitting to KK users has a problem structurally identical to DPC: when serving user kk, the signals destined for users 1,,k11, \ldots, k-1 are known interference at the transmitter (the base station knows all users' data) but unknown at user kk's receiver. Dirty-paper precoding, applied sequentially across users, achieves the full sum-capacity of the MU-MIMO broadcast channel (Weingarten– Steinberg–Shamai 2006).

The practical implementation is vector perturbation precoding (Hochwald–Peel–Swindlehurst 2005): for each user, add a lattice perturbation from Λs=τZ2K\Lambda_s = \tau \mathbb{Z}^{2 K} to minimise the transmit power under the interference-cancellation constraint. The per-symbol search for the optimal perturbation is a CLP problem (s05), solved by a sphere decoder. 5G NR's MU-MIMO precoding in FR1 currently uses linear precoding (much cheaper but suboptimal by up to 3355 dB at high sum-rates); the jump to vector perturbation or THP for MU-MIMO is an active area of both research and standards discussion (3GPP Rel-18+).

Practical Constraints
  • Sphere decoder at the base station scales poorly beyond K16K \ge 16 simultaneous users

  • Channel feedback overhead grows linearly in KK, with quantisation error directly contributing to the shaping loss

  • Lower-complexity approximations (Zero-Forcing + THP hybrid, lattice-reduction aided) are the standards-ready alternatives

📋 Ref: 3GPP TR 38.821 (NR MU-MIMO study); Hochwald–Peel–Swindlehurst 2005

Historical Note: Costa (1983): 'Writing on Dirty Paper'

1983

Max H. M. Costa's 1983 paper is one of the shortest famous results in information theory — five pages in IEEE Trans. IT, with a single theorem and a proof that fits on one page. Costa was then a Bell Labs post-doc, working on an internal open problem posed by the founder of network information theory, Te Sun Han.

Costa's proof used a Gel'fand–Pinsker auxiliary variable U=x+αsU = \mathbf{x} + \alpha \mathbf{s} and showed that the mutual information I(U;y)I(U;s)I(U; \mathbf{y}) - I(U; \mathbf{s}) is maximised over the choice of α\alpha at exactly the clean-channel capacity. The MMSE coefficient α=P/(P+σ2)\alpha = P/(P + \sigma^2) that appeared in Costa's optimisation is the same coefficient that reappears in Erez–Zamir (2004) and Erez–Shamai–Zamir (2005) for the lattice realisation — history recognising the lattice connection only 22 years later.

Costa originally titled the paper "On the capacity of the AWGN channel with side information" — the editors insisted on a more evocative title. "Writing on Dirty Paper" became the name of the result and, by extension, the entire sub-field. The paper was cited fewer than 5050 times in its first decade; in the wireless era it has become one of the most-cited information-theory results of all time (7000+ citations as of 2025).

Common Mistake: DPC Requires NON-CAUSAL Interference Knowledge

Mistake:

Assuming that causal knowledge of the interference ("I know today's s\mathbf{s} before I transmit today's x\mathbf{x}") is enough for the DPC capacity. It is not: Costa's theorem requires non-causal knowledge — the transmitter must know the entire interference sequence s\mathbf{s} (across all future time) before encoding the block.

Correction:

For ordinary causal feedback ("sk\mathbf{s}_k known at encoder before xk\mathbf{x}_k"), the capacity is lower than Costa's: it is the "DPC with causal side information" rate, which is strictly below 12log2(1+SNR)\tfrac12 \log_2(1+\text{SNR}) when s\mathbf{s} is AWGN and is characterised by the Gel'fand–Pinsker formula with a different auxiliary. In practical wireless systems, the "non-causal" assumption holds per coded block: the base station knows the complete interference vector for a block before encoding the whole block, and the DPC rate applies to that block. For streaming (symbol-by-symbol) applications, only causal DPC is available and the rate is lower.

Why This Matters: From DPC to 5G MU-MIMO

The connection between lattice DPC and modern MU-MIMO precoding runs through the broadcast-channel capacity theorem: a base station with MM antennas serving KK users achieves sum-capacity via DPC over the sequential encoding order of the users. For each user, the signals of earlier-encoded users are known interference — the DPC channel of this section. The sum-capacity was proved achievable by sequential DPC by Weingarten–Steinberg–Shamai (2006), and the practical implementation uses lattice-DPC-like vector-perturbation precoders (Hochwald–Peel–Swindlehurst 2005).

Forward-link to Ch. 17 (LAST codes, MIMO book): the LAST codes of El Gamal, Caire, and Damen (2004) use a matrix-valued MMSE- GDFE that generalises the scalar α\alpha of DPC to a full matrix. The Erez–Shamai–Zamir dirty-paper construction is the scalar special case; LAST codes are the matrix version.

Dirty-paper coding (DPC)

Coding for the channel y=x+s+w\mathbf{y} = \mathbf{x} + \mathbf{s} + \mathbf{w} with non-causal transmitter knowledge of the interference s\mathbf{s}. Costa (1983) proved the capacity equals 12log2(1+SNR)\tfrac12 \log_2(1 + \text{SNR}), independent of s2\|\mathbf{s}\|^2. Practical realisation via nested lattice codes (Erez–Shamai–Zamir 2005).

Related: Tomlinson–Harashima precoding, Costa (1983): 'Writing on Dirty Paper', Interference Cancellation

Tomlinson–Harashima precoding

Scalar (11D) version of lattice DPC for known ISI: pre-cancel the interference modulo a scaled integer lattice 2MZ2 M \mathbb{Z}. Incurs 1.53\approx 1.53 dB shaping loss relative to the full lattice scheme. Deployed in V.34, ADSL, VDSL, G.fast, MU-MIMO.

Related: Lattice DPC Achieves the Clean-Channel Capacity, Costa (1983): 'Writing on Dirty Paper', Isi

Quick Check

The dirty-paper channel y=x+s+w\mathbf{y} = \mathbf{x} + \mathbf{s} + \mathbf{w} has transmitter knowledge of s\mathbf{s} but not receiver knowledge. The capacity is:

12log2(1+P/(S+σ2))\tfrac12 \log_2(1 + P/(S + \sigma^2)), i.e., interference counted as extra noise

12log2(1+SNR)\tfrac12 \log_2(1 + \text{SNR}), independent of s2\|\mathbf{s}\|^2

12log2(1+(P+S)/σ2)\tfrac12 \log_2(1 + (P + S)/\sigma^2), i.e., combined signal+interference power

00 when s2\|\mathbf{s}\|^2 \to \infty

Key Takeaway

Costa (1983) proved that a channel with interference s\mathbf{s} known non-causally at the transmitter has the same capacity as the clean channel12log2(1+SNR)\tfrac12 \log_2(1 + \text{SNR}), for any interference power. The lattice realisation of Erez, Shamai, and Zamir (2005) achieves this capacity with a single extra mod-Λs\Lambda_s at the transmitter (to pre-cancel αs\alpha \mathbf{s} within the Voronoi power region), reusing the Erez–Zamir MMSE scalar α\alpha and crypto-lemma of s02 verbatim. This "proof pattern" (crypto-lemma + MMSE + mod-Λ\Lambda) is the backbone of DPC, MU-MIMO vector precoding, and — in matrix form — the LAST codes of Ch. 17. The scalar realisation has been fielded as Tomlinson–Harashima precoding in every major DSL and xDSL-derived standard since 1994.