The LAST Code Construction

From Algebra to Information Theory: Why We Need LAST Codes

Chapter 13 closed the DMT question for linear space-time codes algebraically: pick a cyclic division algebra of degree $n_t$, verify the non-vanishing-determinant property, and you have a code that achieves the Zheng-Tse curve $d^*(r) = (n_t - r)(n_r - r)$ for every $(n_t, n_r, r)$. The catch is that this proof rests on the minimum determinant $\delta_{\min} \ge \delta_0$ of the codebook, a number-theoretic invariant of the CDA. It works, but it does not tell us why lattice codes are DMT-optimal; it tells us only that one particular lattice (the image of the CDA's regular representation) is.

Here we take the complementary route. We ask: is there a construction that achieves the DMT using only information-theoretic ingredients, namely a random lattice (Minkowski-Hlawka, Ch. 15), an MMSE-type pre-processor (Erez-Zamir, Ch. 16), and a union bound? The answer, due to El Gamal, Caire, and Damen (2004), is yes: the Lattice Space-Time (LAST) code family. The construction is a direct composition of two prior chapters.

From Ch. 16 (Erez-Zamir) we take the recipe for achieving the AWGN capacity: a fine lattice $\Lambda_c$ inside a coarse shaping lattice $\Lambda_s$, common random dithering, and MMSE scaling to defeat the effective noise. From Ch. 10-13 (MIMO) we take the idea that a MIMO fading channel can be triangularised by a decision-feedback receiver. The LAST construction does both: it sends Erez-Zamir codewords over the MIMO channel, triangularises the channel with the MMSE-GDFE receiver (§2), and, as the main theorem of §3 shows, achieves the Zheng-Tse DMT at every rate.

Intuitively, the MMSE-GDFE is to lattice codes what MMSE-SIC is to Gaussian random codes: it is the receiver that makes the capacity-achieving codebook work on a multi-layer channel. The analogy is precise and will be the central thread of §§2-3.

The construction was immediately extended (§4) to structured LAST codes by K. R. Kumar and G. Caire (2008), who showed that replacing the random lattice by $E_8$, the Leech lattice, or any other known dense lattice preserves DMT-optimality and gives a 3-6 dB coding-gain boost at moderate SNR. This practically-oriented extension is the bridge between information theory and deployment.


Definition:

LAST Codebook

A Lattice Space-Time (LAST) codebook of rate $R$ bits per channel use, block length $T$, and transmit dimension $n_t$ is defined by a pair of nested lattices $\Lambda_c \supseteq \Lambda_s$ in $\mathbb{R}^{2 n_t T}$ (real dimensions; $\mathbb{C}^{n_t T}$ is identified with $\mathbb{R}^{2 n_t T}$ via the standard complex-to-real isomorphism). The codebook is
$$\mathcal{C} \;=\; \Lambda_c \cap \mathcal{V}(\Lambda_s) \;=\; \{\mathbf{x} \in \Lambda_c : \mathbf{x} \in \mathcal{V}(\Lambda_s)\},$$
where $\mathcal{V}(\Lambda_s)$ is the Voronoi region of the coarse (shaping) lattice $\Lambda_s$ and $\Lambda_c$ is the fine (coding) lattice. The rate is $R = T^{-1} \log_2 |\Lambda_c / \Lambda_s| = T^{-1} \log_2 \bigl(V(\Lambda_s)/V(\Lambda_c)\bigr)$ bits per channel use.

Given a message $\mathbf{u} \in \mathbb{Z}^{2 n_t T}$ and a common random dither $\mathbf{d} \sim \mathrm{Unif}(\mathcal{V}(\Lambda_s))$ known at both transmitter and receiver, the transmitted codeword is
$$\mathbf{x} \;=\; [\mathbf{G}\mathbf{u} + \mathbf{d}] \bmod \Lambda_s \;\in\; \mathcal{V}(\Lambda_s),$$
reshaped into a space-time matrix $\mathbf{X} \in \mathbb{C}^{n_t \times T}$. The per-channel-use transmit energy is $\mathbb{E}[\|\mathbf{X}\|_F^2 / T] = \sigma_s^2(\Lambda_s)$, the second moment of the shaping Voronoi region; choosing $\Lambda_s$ so that $\sigma_s^2 = E_s$ (the transmit energy budget) meets the power constraint.
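The encoding rule $\mathbf{x} = [\mathbf{G}\mathbf{u} + \mathbf{d}] \bmod \Lambda_s$ can be sketched numerically. The snippet below is a minimal illustration only: it uses a cubic shaping lattice $\Lambda_s = M\mathbb{Z}^n$ and the integer fine lattice $\Lambda_c = \mathbb{Z}^n$ (so `mod_lambda_s`, `M`, and the message alphabet are illustrative choices, not the dense lattices the theory calls for).

```python
import numpy as np

rng = np.random.default_rng(0)

n_t, T = 2, 4
n = 2 * n_t * T        # real dimension 2*n_t*T = 16
M = 4.0                # coarse lattice Lambda_s = M * Z^n (cubic shaping, illustration only)

def mod_lambda_s(x, M):
    """Reduce x into the Voronoi cell of M*Z^n, i.e. the cube [-M/2, M/2]^n."""
    return x - M * np.round(x / M)

G = np.eye(n)                                   # fine-lattice generator: Lambda_c = Z^n here
u = rng.integers(0, 4, size=n).astype(float)    # integer message vector
d = rng.uniform(-M / 2, M / 2, size=n)          # common dither, uniform on V(Lambda_s)

x = mod_lambda_s(G @ u + d, M)                  # transmitted point x = [Gu + d] mod Lambda_s
assert np.all(np.abs(x) <= M / 2)               # x lies in the shaping Voronoi region
X = x.reshape(2 * n_t, T)                       # reshape to a (real-valued) space-time block
```

With a Voronoi-rounded $\Lambda_s$ in place of the cube, the same two lines (lattice point plus dither, then mod-$\Lambda_s$ reduction) give the actual LAST encoder.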

Three design knobs define a LAST code: (i) the fine lattice $\Lambda_c$, which controls coding gain (a denser $\Lambda_c$ gives a better $d_{\min}$); (ii) the coarse lattice $\Lambda_s$, which controls shaping gain (a rounder $\mathcal{V}(\Lambda_s)$ comes closer to capacity-achieving Gaussian signalling); and (iii) the common random dither $\mathbf{d}$, which makes the transmit signal uniform on $\mathcal{V}(\Lambda_s)$, the lattice-code analog of i.i.d. random Gaussian coding. The dither is what turns a deterministic codebook into one whose error-probability analysis is tractable via averaging, directly analogous to random-coding achievability proofs.


Definition:

LAST Transmission over a MIMO Block-Fading Channel

Let the MIMO channel be
$$\mathbf{Y} \;=\; \mathbf{H}\mathbf{X} + \mathbf{w},$$
where $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix (constant over the block of length $T$), $\mathbf{w} \in \mathbb{C}^{n_r \times T}$ has i.i.d. $\mathcal{CN}(0, 1)$ entries, and the transmit constraint is $\mathrm{tr}(\mathbb{E}[\mathbf{X}\mathbf{X}^H])/T \le n_t E_s$. The SNR is $\mathrm{SNR} = E_s$ (equivalently: per-symbol energy equal to SNR times the noise variance, which is normalised to $1$).

LAST transmission. Vectorise the codeword matrix by stacking columns: $\mathrm{vec}(\mathbf{X}) \in \mathbb{C}^{n_t T}$. Using the Kronecker identity $\mathrm{vec}(\mathbf{H}\mathbf{X}) = (\mathbf{I}_T \otimes \mathbf{H})\,\mathrm{vec}(\mathbf{X})$, the block observation becomes
$$\mathrm{vec}(\mathbf{Y}) \;=\; (\mathbf{I}_T \otimes \mathbf{H})\,\mathrm{vec}(\mathbf{X}) + \mathrm{vec}(\mathbf{w}),$$
i.e., an effective flat $(n_t T) \to (n_r T)$ linear channel $\tilde{\mathbf{H}} = \mathbf{I}_T \otimes \mathbf{H}$ of block length $1$ over $\mathbb{R}^{2 n_t T}$. LAST decoding operates on this vectorised representation, treating the entire block as a single lattice-encoded super-symbol.
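The Kronecker identity is easy to check numerically; a quick numpy sanity check (dimensions chosen arbitrarily for illustration) follows.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_r, T = 2, 3, 4

H = rng.normal(size=(n_r, n_t)) + 1j * rng.normal(size=(n_r, n_t))
X = rng.normal(size=(n_t, T)) + 1j * rng.normal(size=(n_t, T))

# vec(.) stacks columns, hence Fortran (column-major) order
vec = lambda A: A.flatten(order="F")

# vec(H X) = (I_T kron H) vec(X)
lhs = vec(H @ X)
rhs = np.kron(np.eye(T), H) @ vec(X)
assert np.allclose(lhs, rhs)
```

The block-diagonal structure of $\mathbf{I}_T \otimes \mathbf{H}$ is what makes the enlarged channel "flat": the same $\mathbf{H}$ acts on each time slot independently.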

The key modelling move is the vectorisation: after stacking, the MIMO-over-time problem looks like a flat lattice-channel problem of enlarged dimension $n_t T \to n_r T$. The fine lattice $\Lambda_c$ lives in this enlarged space. This is why $T$ appears as a design parameter: a larger $T$ means a larger lattice dimension and more opportunity for shaping and coding gain, but also exponentially heavier sphere-decoding complexity.


Structure of a LAST Codebook (2D Slice)

A two-dimensional slice through the $n_t T$-dimensional LAST codebook. Fine lattice points of $\Lambda_c$ are shown as small dots; the Voronoi region of the coarse lattice $\Lambda_s$ is shaded. The codebook consists of the fine-lattice points inside the shaded region. Increasing $n_t$ or $T$ increases the dimension of $\Lambda_c$ (and therefore the number of shaped codewords inside $\mathcal{V}(\Lambda_s)$). The visualisation projects the high-dimensional lattice pair onto a 2D subspace for illustration; the actual construction lives in $\mathbb{R}^{2 n_t T}$.

CommIT Contribution (2004)

Lattice Space-Time Codes Achieving the Optimal DMT

H. El Gamal, G. Caire, M. O. Damen β€” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 968–985

Hesham El Gamal (Ohio State), Giuseppe Caire (then at Politecnico di Torino, later USC, now TU Berlin), and Mohamed Oussama Damen (École Nationale Supérieure des Télécommunications de Paris, later Waterloo) introduced the Lattice Space-Time (LAST) code family and gave the first information-theoretic proof that lattice codes are diversity-multiplexing-tradeoff (DMT) optimal on MIMO fading channels. The paper has become one of the foundational references of lattice-based space-time coding.

The construction. Before this paper, MIMO codes were either algebraic (Alamouti, OSTBC, the CDA-NVD codes of Ch. 13) or ad-hoc linear-dispersion codes (Hassibi-Hochwald). El Gamal, Caire, and Damen proposed a new family: take a random fine lattice $\Lambda_c$ (in the sense of Minkowski-Hlawka, Ch. 15), a shaping lattice $\Lambda_s \subset \Lambda_c$, and a common random dither $\mathbf{d}$; transmit shaped codewords $\mathbf{x} = [\mathbf{G}\mathbf{u} + \mathbf{d}] \bmod \Lambda_s$ over the vectorised MIMO channel. The codebook has no algebraic structure, only geometric (lattice) structure.

The receiver: MMSE-GDFE. The key insight of the paper is the receiver. The MMSE-SIC receiver of V-BLAST (Wolniansky et al. 1998) achieves the MIMO sum capacity for Gaussian random codebooks by triangularising the channel via a decision-feedback structure. El Gamal, Caire, and Damen identified the lattice analog: the MMSE-GDFE (Minimum-Mean-Square-Error Generalised Decision-Feedback Equaliser). It is derived from the QR decomposition of the augmented channel matrix $[\mathbf{H}^T, \sqrt{\alpha}\,\mathbf{I}]^T$ with $\alpha = 1/\mathrm{SNR}$, and it converts the MIMO channel into $n_t$ parallel triangular layers, each of which can be lattice-decoded in turn. The analogy MMSE-GDFE : lattice codes :: MMSE-SIC : Gaussian codes is precise and is developed in §2.
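The augmented-QR step above can be sketched in a few lines. This is a minimal sketch under the standard formulation (variable names are mine, not the paper's): the QR decomposition of $[\mathbf{H}^T, \sqrt{\alpha}\,\mathbf{I}]^T$ yields an upper-triangular $\mathbf{R}$ with $\mathbf{R}^H\mathbf{R} = \mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}$, the factorisation that defines the MMSE-GDFE feedforward/feedback split.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_r = 2, 3
snr = 100.0
alpha = 1.0 / snr                       # regularisation alpha = 1/SNR

H = rng.normal(size=(n_r, n_t)) + 1j * rng.normal(size=(n_r, n_t))

# QR of the augmented channel [H; sqrt(alpha) I]
A = np.vstack([H, np.sqrt(alpha) * np.eye(n_t)])
Q, R = np.linalg.qr(A)                  # R is upper triangular, n_t x n_t
Q1 = Q[:n_r, :]                         # rows of Q that act on the received signal

# MMSE-GDFE factorisation: R^H R = H^H H + alpha * I
assert np.allclose(R.conj().T @ R, H.conj().T @ H + alpha * np.eye(n_t))

# Feedforward filter F = Q1^H maps y = Hx + w to F y ~ R x + effective noise:
# an upper-triangular system that can be processed layer by layer.
F = Q1.conj().T
```

The triangularity of $\mathbf{R}$ is what produces the $n_t$ sequential layers: each layer sees only already-decoded layers as interference, exactly as in MMSE-SIC.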

The main theorem. The paper's main result is that the LAST codebook $\mathcal{C} = \Lambda_c \cap \mathcal{V}(\Lambda_s)$, decoded with MMSE-GDFE lattice decoding, achieves the full Zheng-Tse DMT curve $d^*(r) = (n_t - r)(n_r - r)$ for every $(n_t, n_r)$ and every $r \in [0, \min(n_t, n_r)]$. The proof has three ingredients: (i) averaging over random lattices (Minkowski-Hlawka existence); (ii) the Erez-Zamir analysis of the effective AWGN channel after the MMSE-GDFE; (iii) matching the Laplace-method Wishart outage exponent of Zheng-Tse. The theorem is reproduced and proved in full in §3.

Why it redefined the field. Before 2004, DMT optimality was known only via algebraic CDA constructions (Ch. 13). Those constructions require deep number theory (cyclic division algebras, non-norm conditions, Selmer groups) and are inflexible: the codebook dimension is fixed by the algebra. The LAST paper showed that any lattice, whether random, $E_8$, Leech, or Barnes-Wall, wrapped in the MMSE-GDFE receiver achieves the DMT. This decoupled code design from algebra: you can pick the lattice to match your finite-SNR coding-gain target (which is what Kumar and Caire 2008, below, did with $E_8$ and Leech) and still have DMT optimality for free. It also established the decoder as the load-bearing component: it is not the codebook alone but the codebook + MMSE-GDFE combination that achieves the DMT. This mindset of code-receiver co-design has been foundational to every subsequent lattice-STC paper, including IR-LAST (Ch. 14), compute-and-forward (Ch. 18), and integer-forcing linear receivers.


Theorem: LAST Codebook Realises a Nested-Lattice Code over the MIMO Channel

The LAST codebook $\mathcal{C} = \Lambda_c \cap \mathcal{V}(\Lambda_s)$ with common random dither $\mathbf{d}$, transmitted over the vectorised MIMO channel $\mathrm{vec}(\mathbf{Y}) = (\mathbf{I}_T \otimes \mathbf{H})\,\mathrm{vec}(\mathbf{X}) + \mathrm{vec}(\mathbf{w})$, is statistically equivalent to an Erez-Zamir nested-lattice code transmitted over an AWGN channel with channel matrix $\mathbf{I}_T \otimes \mathbf{H}$ and noise $\mathcal{CN}(0, \mathbf{I}_{n_r T})$, in the sense that for every realisation of the dither $\mathbf{d}$, the modulo-lattice operation $[\cdot] \bmod \Lambda_s$ converts the space-time channel into the same equivalent AWGN channel that Erez-Zamir analyse.

The point is that dithering + modulo reduction removes all codeword structure from the transmitted signal: what the channel sees is a uniform distribution on $\mathcal{V}(\Lambda_s)$, which is isotropic in $\mathbb{R}^{2 n_t T}$ when $\Lambda_s$ is Voronoi-rounded. This is the lattice analog of Gaussian-random-code achievability: by randomising the codebook across dither realisations, we get ensemble-averaged error probabilities that match the AWGN setting.
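The uniformity claim can be checked empirically in one dimension, with a cubic shaping lattice $M\mathbb{Z}$ standing in for the Voronoi-rounded $\Lambda_s$ (a simplification for illustration; the constant `c` plays the role of an arbitrary codeword coordinate):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 2.0                      # 1-D cubic shaping lattice M*Z (illustration)
c = 0.7                      # an arbitrary fixed codeword coordinate

mod_M = lambda t: t - M * np.round(t / M)

d = rng.uniform(-M / 2, M / 2, size=200_000)   # dither, uniform on the Voronoi cell
x = mod_M(c + d)                               # dithered, mod-reduced transmit signal

# x is again uniform on [-M/2, M/2), independently of c
assert abs(x.mean()) < 0.01                    # uniform mean is 0
assert abs(x.var() - M**2 / 12) < 0.01         # uniform variance is M^2/12
```

This is exactly why the error analysis can average over the dither: the channel input statistics carry no information about the codeword.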


Example: Computing Rate and Dimension for a $2 \times 2$ LAST Code

Consider a $2 \times 2$ MIMO channel ($n_t = n_r = 2$) with block length $T = 4$. Let $\Lambda_c$ be a random lattice in $\mathbb{R}^{16}$ (after complex-to-real conversion) with fundamental volume $V(\Lambda_c) = 1$, and let $\Lambda_s = 2^k \mathbb{Z}^{16}$ for some integer $k$. Compute: (a) the cardinality $|\mathcal{C}|$ of the LAST codebook in terms of $k$; (b) the per-channel-use rate $R$ in bits/channel use; (c) the number of information bits per block; (d) the multiplexing gain $r$ as a function of the SNR-scaling of $k$.
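A worked sketch of parts (a)-(c), under the stated assumptions ($V(\Lambda_c) = 1$, $\Lambda_s = 2^k\mathbb{Z}^{16}$): $|\mathcal{C}| = V(\Lambda_s)/V(\Lambda_c) = 2^{16k}$, so $R = 16k/T = 4k$ bits per channel use and $16k$ bits per block; for (d), $R \approx r \log_2 \mathrm{SNR}$ requires $k$ to grow as $(r/4)\log_2 \mathrm{SNR}$.

```python
import numpy as np

n_t = n_r = 2
T = 4
dim = 2 * n_t * T            # real lattice dimension: 16

def last_code_params(k):
    """Cardinality, rate, and bits/block for Lambda_s = 2^k Z^16, V(Lambda_c) = 1."""
    V_s = (2.0 ** k) ** dim  # fundamental volume of the shaping lattice: 2^(16k)
    V_c = 1.0                # fundamental volume of the fine lattice
    card = V_s / V_c         # (a) |C| = V(Lambda_s)/V(Lambda_c) = 2^(16k)
    R = np.log2(card) / T    # (b) rate: 16k/4 = 4k bits per channel use
    bits = np.log2(card)     # (c) 16k information bits per block
    return card, R, bits

card, R, bits = last_code_params(k=2)
assert card == 2.0 ** 32 and R == 8.0 and bits == 32.0
# (d) multiplexing gain r: R = r * log2(SNR) requires k ~ (r/4) * log2(SNR)
```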

LAST vs. CDA: Two Routes to the Same DMT Peak

It is worth pausing to place LAST codes next to the CDA-NVD codes of Ch. 13. Both constructions achieve the Zheng-Tse DMT curve, but they do so through radically different mathematical machinery.

  • CDA-NVD codes (Ch. 13). The proof uses the non-vanishing-determinant property of cyclic division algebras, a number-theoretic statement about the algebraic integers of a cyclic Galois extension. The code is explicit (a CDA generator matrix) and deterministic (no random dither needed). Coding gain is an algebraic constant depending on the choice of non-norm element $\gamma$ and the class of the ideal.

  • LAST codes (this chapter). The proof uses Minkowski-Hlawka averaging over random lattices, an information-theoretic statement about the generic dense lattice. The code is non-explicit (an existence proof) and randomised (common random dither). Coding gain is a constant depending on the second moment of the shaping Voronoi region and the density of the random fine lattice.

The two constructions are complementary, not competing. CDA codes are the best explicit codes for moderate $n_t$ (say, $n_t \le 4$), where the number-theoretic machinery is manageable and the resulting sphere decoder is tractable. LAST codes are the information-theoretic baseline that tells us why MIMO lattice codes can achieve the DMT; crucially, LAST also gives us the freedom to substitute any well-known dense lattice ($E_8$, Leech, $BW_{16}$) for the fine lattice, which is the path Kumar and Caire (2008) took in §4. For $n_t \ge 5$ or large $n_t T$ products, structured LAST is competitive with or exceeds CDA in finite-SNR performance.

The underlying lesson is the same one we have seen repeatedly: if you have two proofs of the same theorem, each proof offers a different design freedom. Use both.


Common Mistake: Random LAST Is an Existence Proof, Not a Deployable Code

Mistake:

A reading of the 2004 LAST paper might suggest that any random lattice, once wrapped in MMSE-GDFE, gives a DMT-optimal, practical space-time code. One might then proceed to sample a "random lattice from the Minkowski-Hlawka ensemble" and try to implement it.

Correction:

The Minkowski-Hlawka existence theorem (Ch. 15) is non-constructive: it averages over the space of lattices with a Haar-like measure and says that a lattice achieving the density bound exists, but it does not tell us which lattice does. Sampling from the ensemble requires high-precision arithmetic, is non-reproducible, and does not yield the best finite-SNR coding gain. For practical deployment one must use an explicit dense lattice, which is why the 2008 Kumar-Caire paper (§4) proposed the $E_8$, Leech, and Barnes-Wall lattices. The 2004 existence result is valuable because it tells us what the information-theoretic baseline is; the 2008 constructive result tells us how to deploy a practical code.
