The mod-$\Lambda$ Scheme: Erez–Zamir Capacity

Why Lattices Should Achieve AWGN Capacity

Shannon's 1948 proof that a Gaussian codebook achieves the AWGN capacity $\tfrac12 \log_2(1 + \text{SNR})$ is famously non-constructive. The codewords are drawn i.i.d. from a Gaussian distribution, so the code has no algebraic structure whatsoever — no linearity, no symmetry, no efficient encoder or decoder. For half a century, the open question was: can a structured codebook (one with algebraic linearity, so that encoder and decoder are as cheap as matrix-vector products) achieve the same capacity?

The point is that lattices are the natural structured codebook for a Gaussian channel. Every lattice operation is linear. Every lattice codeword is an integer combination of basis vectors, and the lattice "universe" $\mathbb{R}^n$ matches the Gaussian noise support. So lattices should work — but the naïve scheme (take a dense lattice, truncate to a power constraint, decode to the nearest lattice point) leaves a gap that nobody could close for 56 years.

Erez and Zamir, in their 2004 IEEE Trans. IT paper, finally closed it. Their scheme has four moving parts:

  1. A good fine lattice $\Lambda_c$ (provides the coding gain);
  2. A good coarse lattice $\Lambda_s$ (provides the shaping gain);
  3. An MMSE scalar $\alpha = \text{SNR}/(1 + \text{SNR})$ at the receiver (the single scalar that Shannon's proof doesn't need but lattices do);
  4. A random dither $\mathbf{d}$, uniform on $\mathcal{V}(\Lambda_s)$ and known at both ends, that converts the input to an unconstrained AWGN input via the crypto-lemma.

The "proof pattern" of inflating a bounded input to look like an unbounded Gaussian via dither + modulo is the trick that recurs throughout lattice coding theory — in multiple-access, broadcast, interference, and compute-and-forward. Master it here, and the rest of Part IV falls into place.


Definition: MMSE Coefficient $\alpha$

For the AWGN channel $\mathbf{y} = \mathbf{x} + \mathbf{w}$ with power-constrained input $\mathbb{E}[\|\mathbf{x}\|^2]/n \le P$ and noise variance $\sigma^2$ per dimension, the MMSE coefficient is
$$\alpha \;=\; \frac{P}{P + \sigma^2} \;=\; \frac{\text{SNR}}{1 + \text{SNR}}.$$
It is the unique scalar that minimises $\mathbb{E}[\|\mathbf{x} - \alpha \mathbf{y}\|^2]$ when $\mathbf{x}$ is zero-mean with second moment $P$ and uncorrelated with the noise.

$\alpha$ is strictly less than $1$: the MMSE estimator shrinks the received signal towards zero because the noise adds energy that the estimator can discount. The parameter $1 - \alpha = \sigma^2 / (P + \sigma^2)$ is the noise share of the total energy. The specific choice $\alpha = \text{SNR}/(1 + \text{SNR})$ is what makes the residual noise after scaling, $\mathbf{z} = \alpha \mathbf{y} - \mathbf{x}$, have the smallest possible variance $\alpha \sigma^2$ — a factor of $\alpha < 1$ smaller than the raw channel noise.
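As a sanity check, here is a small Monte-Carlo sketch (my own illustration, with arbitrary values $P = 4$, $\sigma^2 = 1$) confirming that $\alpha = P/(P+\sigma^2)$ beats nearby scalars — including the zero-forcing choice $\alpha = 1$ — and that the minimum MSE is $\alpha\sigma^2$:

```python
import random

# Monte-Carlo sketch (not from the text; arbitrary P, sigma2): for
# y = x + w with x ~ N(0, P) and w ~ N(0, sigma2), the scalar
# alpha = P/(P + sigma2) should minimise E[(x - a*y)^2] over a,
# with minimum value alpha * sigma2.
random.seed(0)
P, sigma2 = 4.0, 1.0
alpha = P / (P + sigma2)          # = SNR/(1+SNR) with SNR = P/sigma2 = 4

samples = [(random.gauss(0.0, P ** 0.5), random.gauss(0.0, sigma2 ** 0.5))
           for _ in range(200_000)]

def mse(a):
    """Empirical E[(x - a*(x+w))^2]."""
    return sum((x - a * (x + w)) ** 2 for x, w in samples) / len(samples)

assert mse(alpha) < min(mse(1.0), mse(alpha - 0.05), mse(alpha + 0.05))
assert abs(mse(alpha) - alpha * sigma2) < 0.02   # minimum MSE ≈ alpha*sigma2 = 0.8
```

The same check with any other $(P, \sigma^2)$ pair lands on the same conclusion, since the theoretical curve is $(1-a)^2 P + a^2 \sigma^2$, minimised exactly at $a = P/(P+\sigma^2)$.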


Definition: Dither Vector

A dither is a random vector $\mathbf{d} \in \mathbb{R}^n$ uniformly distributed on the Voronoi region $\mathcal{V}(\Lambda_s)$ of the shaping lattice, drawn independently of the message and the channel noise, and known to both transmitter and receiver (e.g., via a shared pseudo-random seed).

The dither is the ingredient that makes the transmitted signal $\mathbf{x}' = [\mathbf{x} - \mathbf{d}] \bmod \Lambda_s$ unconditionally uniform on $\mathcal{V}(\Lambda_s)$, regardless of which codeword $\mathbf{x}$ was sent.

The dither is not transmitted; it is a pre-agreed random seed. In practice the shared randomness is generated by a pseudo-random sequence seeded with a public frame index (as in scrambling codes in 3GPP and DVB). The dither's only purpose is analytic: it lets us apply the crypto-lemma and reduce the bounded-input AWGN problem to an unbounded-input lattice-decoding problem.


Theorem: Crypto-Lemma (Zamir–Feder)

Let $\mathbf{x} \in \mathbb{R}^n$ be an arbitrary (possibly deterministic) vector, and let $\mathbf{d}$ be uniform on $\mathcal{V}(\Lambda)$ and independent of $\mathbf{x}$. Then $[\mathbf{x} + \mathbf{d}] \bmod \Lambda$ is uniformly distributed on $\mathcal{V}(\Lambda)$ and independent of $\mathbf{x}$.

Adding an independent uniform dither on $\mathcal{V}(\Lambda)$ and then reducing mod-$\Lambda$ spreads any input $\mathbf{x}$ to the uniform distribution: a rigid translation of a uniform distribution on a fundamental region is uniform on the translated region, and the modulo reduction maps it back into the canonical Voronoi cell. This is the lattice analogue of the one-time pad.
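The lemma is easy to see numerically. A minimal 1-D sketch (my own toy example, with $\Lambda = 4\mathbb{Z}$, whose Voronoi cell is $[-2, 2)$): whatever fixed $x$ we pick, $[x + d] \bmod \Lambda$ has the mean and variance of $\mathrm{Uniform}[-2, 2)$:

```python
import math
import random

# 1-D illustration of the crypto-lemma (my toy example): Lambda = 4Z,
# Voronoi cell [-2, 2). For ANY fixed x, [x + d] mod Lambda with
# d ~ Uniform[-2, 2) is again Uniform[-2, 2), independent of x.
random.seed(1)
SPACING = 4.0

def mod_lattice(v):
    """Reduce v into the Voronoi cell [-2, 2) of the lattice 4Z."""
    return v - SPACING * math.floor(v / SPACING + 0.5)

def dithered_stats(x, n=200_000):
    vals = [mod_lattice(x + random.uniform(-2.0, 2.0)) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    return mean, var

# Uniform[-2, 2) has mean 0 and variance 4^2/12 = 4/3, whatever x is.
for x in (0.0, 1.7, -123.456):
    mean, var = dithered_stats(x)
    assert abs(mean) < 0.02 and abs(var - 4.0 / 3.0) < 0.03
```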


Erez–Zamir Achievability: mod-$\Lambda$ vs Shannon vs Uncoded QAM

Rate curves as a function of SNR (dB) per real dimension: (i) the AWGN capacity $C = \tfrac12 \log_2(1 + \text{SNR})$; (ii) the rate achieved by the Erez–Zamir mod-$\Lambda$ scheme with optimal $\Lambda_c, \Lambda_s$ — which coincides with $C$ at every SNR; (iii) the rate of uncoded $M$-QAM at a fixed target symbol error probability of $10^{-5}$, showing the gap to capacity that classical truncated-cube constellations leave on the table. For small $M$ the gap is $\approx 9$ dB at high SNR; Erez–Zamir closes it entirely. Increasing $M$ asymptotically recovers the capacity but at exponentially growing complexity, whereas the lattice scheme closes the gap with structured codebooks of growing dimension $n$.


Erez–Zamir mod-$\Lambda$ Encoder and Decoder

Complexity: Encoder: $O(n)$ for mod-$\Lambda_s$ (if $\Lambda_s$ has a fast quantiser). Decoder: dominated by step 5 — the closest $\Lambda_c$ point, which is a CLP problem (s05).
Setup. Fine lattice $\Lambda_c$, coarse lattice $\Lambda_s \subset \Lambda_c$, both in $\mathbb{R}^n$; codebook $\mathcal{C} = \Lambda_c \cap \mathcal{V}(\Lambda_s)$.
Power $P =$ second moment of $\mathcal{V}(\Lambda_s)$.
Shared dither $\mathbf{d}$ uniform on $\mathcal{V}(\Lambda_s)$.
Encoder (given message $u$):
1. Map $u \in \{0, \ldots, |\mathcal{C}| - 1\}$ to $\mathbf{c}(u) \in \mathcal{C}$ by a fixed enumeration.
2. Transmit $\mathbf{x} \leftarrow [\mathbf{c}(u) - \mathbf{d}] \bmod \Lambda_s$.
Channel: the receiver observes $\mathbf{y} = \mathbf{x} + \mathbf{w}$, with $\mathbf{w} \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$.
Decoder:
3. Compute the MMSE-scaled receive vector $\mathbf{y}' \leftarrow \alpha \mathbf{y}$, with $\alpha = P / (P + \sigma^2)$.
4. Add the dither back and reduce mod-$\Lambda_s$: $\mathbf{r} \leftarrow [\mathbf{y}' + \mathbf{d}] \bmod \Lambda_s$.
5. Decode to the nearest $\Lambda_c$ point: $\hat{\mathbf{c}} \leftarrow Q_{\Lambda_c}(\mathbf{r})$.
6. Un-enumerate: $\hat{u} \leftarrow \mathbf{c}^{-1}(\hat{\mathbf{c}})$.

Steps 3–4 together reduce the coded AWGN channel to an unconstrained lattice AWGN channel: the receiver sees a point $\mathbf{r}$ and decodes to the closest $\Lambda_c$ point without any power constraint. The Erez–Zamir theorem below shows that this reduction loses nothing — the capacity of the unconstrained lattice channel at the effective SNR equals the AWGN capacity.
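The six steps above can be sketched end to end in one dimension, using the toy nested pair $\Lambda_c = \mathbb{Z}$, $\Lambda_s = 4\mathbb{Z}$ (a scalar stand-in, not the high-dimensional lattices the theorem requires; the noise level is my own choice):

```python
import math
import random

# End-to-end sketch of steps 1-6 in one dimension, with the toy nested pair
# Lambda_c = Z, Lambda_s = 4Z (codebook {-2,-1,0,1} = Z intersected with [-2,2)).
# A scalar stand-in for illustration only.
random.seed(2)
SPACING = 4.0
P = SPACING ** 2 / 12            # second moment of the Voronoi cell [-2, 2)
sigma2 = 0.05                    # channel noise variance per dimension (my choice)
alpha = P / (P + sigma2)         # MMSE coefficient

def mod_s(v):                    # mod-Lambda_s, into [-2, 2)
    return v - SPACING * math.floor(v / SPACING + 0.5)

def encode(u, d):                # steps 1-2: enumerate, subtract dither, reduce
    codeword = float(u) - 2.0    # u in {0,1,2,3} -> codeword in {-2,-1,0,1}
    return codeword, mod_s(codeword - d)

def decode(y, d):                # steps 3-5: MMSE scale, add dither, reduce, round
    r = mod_s(alpha * y + d)
    return mod_s(math.floor(r + 0.5))   # nearest Z point, folded back into the cell

errors = 0
trials = 20_000
for _ in range(trials):
    u = random.randrange(4)
    d = random.uniform(-2.0, 2.0)                 # shared dither on V(Lambda_s)
    c, x = encode(u, d)
    y = x + random.gauss(0.0, sigma2 ** 0.5)      # AWGN channel
    errors += (decode(y, d) != c)
print("symbol error rate:", errors / trials)      # small but nonzero: Z is a weak fine lattice
```

The error rate is far from vanishing because $\mathbb{Z}$ has no coding gain; the theorem's point is that a good high-dimensional $\Lambda_c$ drives it to zero at rates up to capacity.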

Theorem: Erez–Zamir (2004): Lattice Codes Achieve AWGN Capacity

For every $R < C = \tfrac12 \log_2(1 + \text{SNR})$ (bits per real dimension) and every $\varepsilon > 0$, there exist nested lattices $\Lambda_s \subset \Lambda_c \subset \mathbb{R}^n$ (for some $n < \infty$) and a dither $\mathbf{d}$ such that the mod-$\Lambda_s$ scheme (Algorithm above) with MMSE coefficient $\alpha = \text{SNR}/(1 + \text{SNR})$ achieves rate $R$ with average error probability $P_e < \varepsilon$ under a power constraint $P$.

The proof has three moves, each earning a "good" lattice. First, the dither + mod-$\Lambda_s$ + MMSE scaling reduce the bounded-input AWGN channel to an unbounded lattice channel with effective noise $\mathbf{z} = -(1-\alpha)\mathbf{x}' + \alpha \mathbf{w}$ — a mixture of the (uniform) transmit signal and scaled Gaussian noise, whose per-dimension second moment is exactly $\alpha \sigma^2$ (the MMSE error power). Second, by Loeliger's random-lattice averaging, there exists a fine lattice $\Lambda_c$ whose decoding error probability against this effective noise vanishes at all rates up to $\tfrac12 \log_2\!\bigl(P / (G(\Lambda_s) \cdot 2 \pi e \cdot \alpha \sigma^2)\bigr)$. Third, for a Voronoi-shaping lattice $\Lambda_s$ with $G(\Lambda_s) \to 1/(2 \pi e)$ (Rogers; Poltyrev), this rate tends to $\tfrac12 \log_2\bigl(P/(\alpha\sigma^2)\bigr) = \tfrac12 \log_2(1 + \text{SNR})$. The three moves compose to close the full capacity gap.
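Both identities the proof leans on are easy to verify numerically (my own spot check, over a few arbitrary $(P, \sigma^2)$ pairs):

```python
import math

# Spot check (my own, arbitrary (P, sigma2) pairs) of the two identities the
# proof leans on: the effective-noise power (1-a)^2*P + a^2*sigma2 equals
# a*sigma2 at a = P/(P+sigma2), and (1/2)log2(P/(a*sigma2)) equals the
# AWGN capacity (1/2)log2(1 + SNR).
for P, sigma2 in [(1.0, 1.0), (10.0, 0.5), (0.3, 2.0)]:
    snr = P / sigma2
    alpha = P / (P + sigma2)
    eff = (1 - alpha) ** 2 * P + alpha ** 2 * sigma2   # mixture second moment
    assert math.isclose(eff, alpha * sigma2)           # = MMSE error power
    assert math.isclose(0.5 * math.log2(P / eff), 0.5 * math.log2(1 + snr))
```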


Erez–Zamir mod-$\Lambda$ Scheme: Achieving AWGN Capacity with Lattices

Animation of the Erez–Zamir mod-$\Lambda$ encoder and decoder, step by step: the input $u$ maps to a fine-lattice codeword $\mathbf{x}$ inside $\mathcal{V}(\Lambda_s)$; a pseudo-random dither $\mathbf{d}$ is subtracted and the result is reduced mod-$\Lambda_s$; the transmit signal passes through the AWGN channel; the receiver MMSE-scales by $\alpha$, adds back the dither, and takes a mod-$\Lambda_s$ reduction, collapsing the lattice-wrapped Gaussian noise into a single Voronoi cell; finally, a closest-$\Lambda_c$ search recovers $\hat{\mathbf{x}}$ and the message $\hat{u}$. The visualisation highlights the role of the dither (spreading the input to a uniform distribution on $\mathcal{V}(\Lambda_s)$) and the MMSE scaling (concentrating the effective noise into the smallest-variance Gaussian-like wrap-around).
Four stages of the mod-$\Lambda$ scheme: encoder (dither subtract + mod-$\Lambda_s$), AWGN channel, decoder (MMSE $\alpha$ + dither add + mod-$\Lambda_s$), and $\Lambda_c$ closest-point lookup. The scheme achieves $\tfrac12 \log_2(1 + \text{SNR})$ exactly.

Example: mod-$\Lambda$ at Rate $R = 2$ bits per Real Dimension

For a nested lattice code with $\Lambda_c = \mathbb{Z}^n$ and $\Lambda_s = 4 \mathbb{Z}^n$, the rate is $R = \tfrac{1}{n} \log_2 |\Lambda_c / \Lambda_s| = \log_2 4 = 2$ bits per real dimension. By the Erez–Zamir theorem, what is the minimum SNR at which this rate is achievable in the limit of large $n$? Compare to the SNR required by an uncoded 16-QAM (rate $2$) constellation at error probability $10^{-5}$, and to a cubic-shaping ($\Lambda_s =$ scaled cube) lattice code.
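A rough numerical companion to the first two parts of the example (my own calculation; the 16-QAM figure treats 16-QAM as two independent 4-PAM dimensions, uses the standard 4-PAM SER formula, and applies the $10^{-5}$ target per real dimension, so it is indicative rather than exact):

```python
import math

# Capacity threshold for R = 2 bits/dim: solve (1/2)log2(1+SNR) = 2.
R = 2
snr_min = 2 ** (2 * R) - 1                    # = 15
snr_min_db = 10 * math.log10(snr_min)         # about 11.76 dB

def Q(x):                                     # Gaussian tail function
    return 0.5 * math.erfc(x / math.sqrt(2))

# Uncoded 16-QAM = two 4-PAM dimensions; per-dimension SER for 4-PAM is
# approximately 1.5 * Q(sqrt(SNR/5)). Bisect for SER = 1e-5.
lo, hi = 1.0, 1000.0
for _ in range(200):
    mid = (lo + hi) / 2
    if 1.5 * Q(math.sqrt(mid / 5)) > 1e-5:
        lo = mid
    else:
        hi = mid
qam_db = 10 * math.log10(lo)

print(f"capacity threshold {snr_min_db:.2f} dB, uncoded 16-QAM {qam_db:.2f} dB, "
      f"gap {qam_db - snr_min_db:.2f} dB")
```

The gap it reports (roughly the coding-plus-shaping loss of a truncated-cube constellation) is what the Erez–Zamir scheme eliminates.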

⚠️ Engineering Note

Why the MMSE Coefficient Matters: 56 Years of Progress

From Shannon's 1948 paper until Erez–Zamir's 2004 paper, the best known rate for un-scaled lattice decoding (the so-called "inflated lattice" argument of de Buda, 1989, and Urbanke–Rimoldi, 1998) was $\tfrac12 \log_2(\text{SNR})$ — the "+1" inside the logarithm was missing, a shortfall that is negligible at high SNR but fatal at low SNR. The gap was the absence of the MMSE scalar $\alpha$: without it, the effective noise after mod-$\Lambda_s$ has variance $\sigma^2$, not $\alpha \sigma^2$, so the achievable rate is $\tfrac12 \log_2(P/\sigma^2)$ rather than $\tfrac12 \log_2(P/(\alpha\sigma^2)) = \tfrac12 \log_2(1+\text{SNR})$. On top of this, cubic shaping leaves the separate factor $2\pi e/12$ — about $1.53$ dB — as a residual loss.

The Erez–Zamir insight was subtle but decisive: scaling by $\alpha < 1$ before the modulo operation is the receiver's analogue of the MMSE-DFE, and it converts the "un-inflated" lattice channel into a capacity-achieving one. The MMSE factor is a fixed scalar determined by the SNR; in practical receivers it requires only AGC-level calibration.

Practical Constraints
  • Receiver must know the SNR to compute $\alpha = \text{SNR}/(1 + \text{SNR})$

  • Transmitter and receiver must share the dither (pseudo-random seed)

  • Sensitive to SNR mismatch at low SNR: a $1$ dB error in the $\text{SNR}$ estimate can cost $\sim 0.1$ dB
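The last bullet can be spot-checked (my own numbers: true SNR of $10$ dB, receiver estimate off by $+1$ dB). Plugging the mismatched $\alpha$ into the effective-noise expression shows a rate loss of a few thousandths of a bit per dimension at this operating point:

```python
import math

# Mismatch spot check (my own numbers): true SNR = 10 dB; the receiver's SNR
# estimate is 1 dB high when it computes alpha. Rate proxy:
# (1/2) log2(P / effective noise) with effective noise (1-a)^2*P + a^2*sigma2.
def rate(P, sigma2, a):
    return 0.5 * math.log2(P / ((1 - a) ** 2 * P + a ** 2 * sigma2))

P, sigma2 = 10.0, 1.0                       # true SNR = 10 (i.e. 10 dB)
alpha_true = P / (P + sigma2)
snr_est = 10 ** (11 / 10)                   # receiver believes 11 dB
alpha_off = snr_est / (snr_est + 1)

loss_bits = rate(P, sigma2, alpha_true) - rate(P, sigma2, alpha_off)
print(f"rate loss from +1 dB SNR mismatch: {loss_bits:.4f} bits/dim")
```

Repeating the calculation at low SNR (say $0$ dB) shows a proportionally larger loss, which is the sense in which the scheme is more sensitive there.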

📋 Ref: Erez–Zamir (2004); Zamir 2014 ch. 7

Common Mistake: MMSE Scaling $\alpha$ Is Not the Same as Zero-Forcing

Mistake:

Confusing the Erez–Zamir MMSE scalar $\alpha = \text{SNR}/(1 + \text{SNR})$ with a zero-forcing or matched-filter receiver. A zero-forcing receiver would use $\alpha = 1$ (no scaling); a matched-filter receiver would use $\alpha =$ a signal-to-total-power ratio, depending on how "total" is defined. Neither choice achieves the AWGN capacity with lattice codes.

Correction:

The MMSE scalar $\alpha < 1$ is the exact scalar that minimises the per-dimension expected squared error $\mathbb{E}[(x - \alpha y)^2]$ when $x \sim \mathcal{N}(0, P)$ and $y = x + w$ with $w \sim \mathcal{N}(0, \sigma^2)$. It is the orthogonality-principle solution. Using $\alpha = 1$ leaves the effective noise after modulo reduction at variance $\sigma^2$ rather than $\alpha \sigma^2$, so the achievable rate drops from $\tfrac12 \log_2(1 + \text{SNR})$ to $\tfrac12 \log_2(\text{SNR})$ — the "+1" inside the logarithm is lost, a gap that vanishes at high SNR but is fatal at low SNR. The MMSE $\alpha$ is precisely what restores it.

Capacity Gap = Coding Gap + Shaping Gap (Additive in dB)

Once the Erez–Zamir theorem is in hand, it sharpens an earlier intuition of Forney–Ungerboeck into a quantitative statement. For any finite-dimensional lattice code $(\Lambda_c, \Lambda_s)$, write the rate gap to capacity as
$$C - R \;=\; \underbrace{\Delta_c}_{\text{coding-gain shortfall}} + \underbrace{\Delta_s}_{\text{shaping-gain shortfall}}.$$
These two terms are additive (in dB) and independent: $\Delta_c$ depends only on $\Lambda_c$, $\Delta_s$ only on $\Lambda_s$. For the designer this means: pick any good fine lattice you like (for its algebraic niceness and decoder complexity), pick any good coarse lattice you like (for its second-moment closeness to a ball), and the overall gap is the sum of the two. This is the operational payoff of the Erez–Zamir proof — it transforms "achieving capacity" into two decoupled engineering problems, neither of which is easy but both of which are well-understood.
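The ceiling on the shaping term is the $1.53$ dB figure quoted throughout: the ratio of the cube's normalised second moment $G = 1/12$ to the sphere's $n \to \infty$ limit $1/(2\pi e)$. A one-line check (my own):

```python
import math

# The cube's normalised second moment is 1/12; the n -> infinity ball limit
# is 1/(2*pi*e). Their ratio in dB is the shaping-gap ceiling.
G_cube = 1.0 / 12.0
G_ball_limit = 1.0 / (2 * math.pi * math.e)
shaping_gap_db = 10 * math.log10(G_cube / G_ball_limit)   # = 10*log10(pi*e/6)
print(f"{shaping_gap_db:.2f} dB")
```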


Quick Check

In the Erez–Zamir mod-$\Lambda$ scheme with MMSE coefficient $\alpha = \text{SNR}/(1+\text{SNR})$, the effective noise after modulo reduction has per-dimension second moment equal to:

$\sigma^2$ (the raw channel noise variance)

$\alpha \, \sigma^2$

$(1 - \alpha) \, \sigma^2$

$P$ (the transmit power)

Key Takeaway

The Erez–Zamir mod-$\Lambda$ scheme achieves the AWGN capacity $\tfrac12 \log_2(1 + \text{SNR})$ with a structured codebook, by combining a good fine lattice $\Lambda_c$ (via Loeliger's random lattices), a good coarse lattice $\Lambda_s$ (with normalised second moment $G(\Lambda_s) \to 1/(2 \pi e)$), an MMSE scalar $\alpha = \text{SNR}/(1 + \text{SNR})$, and a shared dither $\mathbf{d}$ uniform on $\mathcal{V}(\Lambda_s)$. The crypto-lemma turns the bounded-input channel into an unconstrained lattice channel; the MMSE factor shrinks the effective noise by $\alpha$; and optimising $\Lambda_s$ kills the $\pi e / 6 \approx 1.53$ dB shaping gap. The capacity gap of any finite lattice code decomposes additively (in dB) into a coding gap and a shaping gap — two independent knobs for the designer.