Ferkans — Interactive Telecom Tutor

The Curve that Binds Diversity and Multiplexing

We are ready to state the central theorem of Part III: the Zheng-Tse diversity-multiplexing tradeoff. For an $n_t \times n_r$ i.i.d. Rayleigh channel with block length $L \ge n_t + n_r - 1$ , the optimal tradeoff curve $d^*(r)$ is the piecewise-linear interpolation of the points $(k,\, (n_t - k)(n_r - k)), \qquad k = 0, 1, \ldots, \min(n_t, n_r).$

The theorem has two independent parts:

Converse — an outage-probability lower bound: no code family operating at multiplexing gain $r$ can achieve diversity gain exceeding $(n_t - r)(n_r - r)$ . This is a purely information-theoretic statement, proved by computing the outage exponent from the Wishart eigenvalue distribution of $\mathbf{H}\mathbf{H}^{H}$ .
Achievability — a constructive upper bound: a Gaussian random codebook with i.i.d. $\mathcal{CN}(0, \text{SNR}/n_t)$ entries and rate $r \log_2 \text{SNR}$ achieves diversity exactly $(n_t - r)(n_r - r)$ in the high-SNR limit. This is the error-exponent counterpart of Shannon's random-coding theorem, lifted to the exponential- equality regime.

The matching of the two bounds gives a tight characterisation: the DMT curve is the exact, best-possible tradeoff. The proof pattern — outage converse + Gaussian random-coding achievability — is the same pattern as ergodic-capacity proofs (Chapter 10), adapted for the exponential-equality regime. We walk through both parts in full.

,

Definition:
Diversity-Multiplexing Tradeoff Curve $d^*(r)$

The diversity-multiplexing tradeoff (DMT) curve of an $n_t \times n_r$ MIMO channel is the function $d^*(r) \;=\; \sup\{d^* \;:\; \exists \text{ family of codes with multiplexing gain } r \text{ and diversity gain } d^*\}, \qquad r \in [0, r_{\max}].$ By definition $d^*(r)$ is the pointwise maximum diversity attainable at each multiplexing gain $r$ . The curve is non-increasing in $r$ : higher $r$ costs more diversity.

Endpoints. At $r = 0$ (fixed rate), $d^*(0)$ is the maximum classical diversity order — for an $n_t \times n_r$ i.i.d. Rayleigh channel this is $n_t n_r$ . At $r = r_{\max} = \min(n_t, n_r)$ (maximum multiplexing slope, approaching capacity), $d^*(r_{\max}) = 0$ : there is no reliability margin left when the rate tracks capacity.

Operational meaning. A code family's operating point $(r, d^*)$ must lie below the curve: $d^* \le d^*(r)$ . A code is DMT-optimal if its operating point lies on the curve for the entire range $r \in [0, r_{\max}]$ — i.e., it achieves the best diversity for every multiplexing slope. Alamouti achieves the curve only at $r = 0$ ; the Golden code (§4) achieves it for all $r \in [0, 2]$ on $2 \times 2$ .

,

Theorem: Zheng-Tse Diversity-Multiplexing Tradeoff

Consider an $n_t \times n_r$ i.i.d. Rayleigh MIMO channel with coherent detection and block length $L \ge n_t + n_r - 1$ . The DMT curve is the piecewise-linear interpolation of the integer corner points $d^*(k) \;=\; (n_t - k)(n_r - k), \qquad k = 0, 1, \ldots, \min(n_t, n_r).$ Explicitly, for $r \in [k, k+1]$ with $k \in \{0, 1, \ldots, \min(n_t, n_r) - 1\}$ , $d^*(r) \;=\; (n_t - k)(n_r - k) - \big[(n_t - k)(n_r - k) - (n_t - k - 1)(n_r - k - 1)\big] (r - k).$ The maximum multiplexing gain is $r_{\max} = \min(n_t, n_r)$ and the maximum diversity gain is $d^*(0) = n_t n_r$ .

The optimal tradeoff is achieved by a Gaussian random codebook with i.i.d. $\mathcal{CN}(0, \text{SNR}/n_t)$ entries.

Think of $r$ as "how many independent streams you are sending", and $d^*$ as "how much protection each stream gets". The channel has $n_t n_r$ independent scalar fading coefficients in $\mathbf{H}$ . If you send $k$ streams, they "consume" $k$ rows of $\mathbf{H}$ on the transmit side and $k$ columns on the receive side; what remains for diversity protection is an $(n_t - k) \times (n_r - k)$ sub-channel with $(n_t - k)(n_r - k)$ independent fading coefficients. That sub-channel supplies the diversity exponent. The tradeoff is exactly the geometric product of the "spare" transmit and receive dimensions.

Show Hint

Converse — compute the outage exponent from the joint Wishart density of $\mathbf{H}\mathbf{H}^{H}$ eigenvalues.

Achievability — use a Gaussian random codebook at rate $r \log_2 \text{SNR}$ ; bound the PEP by the outage probability plus an exponentially small correction.

Matching — both the converse and achievability computations reduce to the same large-deviations principle on the eigenvalue vector, with the rate function $\sum_i (2i - 1 + n_r - n_t)\alpha_i^+$ under $\sum_i (1 - \alpha_i)^+ < r$ .

Proof

Setup — rewrite outage as eigenvalue event

Let $\mathbf{H}$ be $n_r \times n_t$ with i.i.d. $\mathcal{CN}(0, 1)$ entries. The instantaneous capacity is $C(\mathbf{H}) = \log_2 \det\!\left(\mathbf{I}_{n_r} + \tfrac{\text{SNR}}{n_t} \mathbf{H}\mathbf{H}^{H}\right) = \sum_{i = 1}^{m} \log_2 (1 + \tfrac{\text{SNR}}{n_t} \lambda_i),$ where $m = \min(n_t, n_r)$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ are the ordered nonzero eigenvalues of $\mathbf{H}\mathbf{H}^{H}$ . Reparametrise each eigenvalue as $\lambda_i = \text{SNR}^{-\alpha_i}$ : the exponent vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_m)$ with $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ encodes the "fading depth" of each eigenvalue.

The outage event at rate $R = r \log_2 \text{SNR}$ is $\mathcal{O} = \{C(\mathbf{H}) < r \log_2 \text{SNR}\} = \Big\{\sum_{i} \log(1 + \text{SNR}^{1-\alpha_i}) < r \log \text{SNR}\Big\}.$ For large $\text{SNR}$ , $\log(1 + \text{SNR}^{1-\alpha_i}) \doteq (1 - \alpha_i)^+ \log \text{SNR}$ , so the outage event reduces to $\mathcal{O}_{\text{SNR}} = \Big\{\sum_i (1 - \alpha_i)^+ < r\Big\}.$

Eigenvalue density — Wishart asymptotics

The joint density of the ordered eigenvalues of $\mathbf{H}\mathbf{H}^{H}$ (complex Wishart, $n_r \ge n_t$ case) is $p(\boldsymbol{\lambda}) = K \prod_i \lambda_i^{n_r - n_t} \prod_{i < j} (\lambda_i - \lambda_j)^2 \exp(-\sum_i \lambda_i),$ where $K$ is a normalising constant. Substituting $\lambda_i = \text{SNR}^{-\alpha_i}$ , $d\lambda_i = -(\log\text{SNR}) \text{SNR}^{-\alpha_i} d\alpha_i$ , the density in $\boldsymbol{\alpha}$ is $p(\boldsymbol{\alpha}) \doteq \prod_i \text{SNR}^{-\alpha_i(n_r - n_t + 1)} \prod_{i < j} \text{SNR}^{-2\alpha_i} \exp(-\text{SNR}^{-\alpha_m}) \cdot (\log \text{SNR})^m.$ The $\exp(-\text{SNR}^{-\alpha_m})$ factor is $\le 1$ for $\alpha_m \ge 0$ and contributes exponent $0$ if $\alpha_m > 0$ , or suppresses the density super-polynomially if $\alpha_m < 0$ . So the relevant regime is $\boldsymbol{\alpha} \in \mathbb{R}_+^m$ (all eigenvalues decay in SNR), and $p(\boldsymbol{\alpha}) \doteq \text{SNR}^{-\sum_i (2i - 1 + n_r - n_t)\alpha_i}$ after collecting the Vandermonde and marginal factors (assuming $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ ; note the Vandermonde contributes $-2 \alpha_i (i - 1)$ from the $j > i$ pairs and $-2 \alpha_i(m - i)$ from the $j < i$ pairs, summing to $-2\alpha_i (m-1)$ ; but with ordering, only $j > i$ pairs contribute and the exponent becomes $-\sum_i (2(i - 1))\alpha_i$ , which combined with the $\prod_i \lambda_i^{n_r - n_t}$ factor gives the $(2i - 1 + n_r - n_t)\alpha_i$ coefficient; see Zheng-Tse eq. (15)).

Outage exponent — large-deviations optimisation

The outage probability at multiplexing gain $r$ is $P_{\rm out}(r \log_2 \text{SNR}) = \int_{\sum_i (1 - \alpha_i)^+ < r} p(\boldsymbol{\alpha})\,d\boldsymbol{\alpha} \doteq \text{SNR}^{-d^*_{\rm out}(r)},$ with $d^*_{\rm out}(r) = \inf_{\boldsymbol{\alpha}\in\mathcal{A}(r)} \sum_{i=1}^m (2i - 1 + n_r - n_t)\alpha_i,$ where $\mathcal{A}(r) = \{\boldsymbol{\alpha}: 0 \le \alpha_1 \le \cdots \le \alpha_m,\, \sum_i (1 - \alpha_i)^+ < r\}$ . This is a linear program in $\boldsymbol{\alpha}$ over a convex region.

For $r \in [k, k+1]$ with integer $k$ , the optimum is attained at $\alpha_i = 0$ for $i \le k$ (these eigenvalues fail to fade) and $\alpha_i = 1$ for $i > k$ (these eigenvalues fade at rate $1 + \epsilon$ ), with $\alpha_{k+1}$ tuned to exactly make $\sum_i (1-\alpha_i)^+ = r$ : $\alpha_{k+1} = 1 - (r - k)$ . Plugging in, the sum $\sum_i (2i - 1 + n_r - n_t)\alpha_i$ at the optimum evaluates to the piecewise-linear interpolation of $(n_t - k)(n_r - k)$ at $r = k$ . After the algebra (Zheng-Tse Lemma 5), one obtains exactly $d^*_{\rm out}(r) = (n_t - r)(n_r - r) \text{ at integer } r,$ with linear interpolation between integers.

Converse — diversity of any code is $\le d^*_{\rm out}(r)$

For any coding scheme at rate $R(\text{SNR}) = r \log_2 \text{SNR}$ , the block error probability is lower-bounded by the outage: $P_e(\text{SNR}) \ge P_{\rm out}(r \log_2 \text{SNR}) \doteq \text{SNR}^{-d^*_{\rm out}(r)}.$ Therefore the diversity gain satisfies $d^* \le d^*_{\rm out}(r)$ . No code can do better than the outage exponent in diversity.

Achievability — Gaussian random codebook

Consider a random codebook $\mathcal{C}$ of size $M = 2^{r L \log_2 \text{SNR}} = \text{SNR}^{r L}$ , each codeword $\ntn{X} \in \mathbb{C}^{n_t \times L}$ with i.i.d. $\mathcal{CN}(0, \text{SNR}/n_t)$ entries. The average block error probability under ML decoding is bounded by $\bar P_e \le P_{\rm out}(r \log_2 \text{SNR}) + \mathbb{P}(\mathcal{E}^c \cap \mathcal{O}^c).$ The first term is the outage — its exponent is $d^*_{\rm out}(r)$ . The second term (undetected error given not in outage) can be made exponentially smaller than the outage by a Gallager random-coding argument provided $L \ge n_t + n_r - 1$ : the ensemble average over Gaussian codebooks, combined with a union bound and the Chernoff-bound-to-PEP analysis of Chapter 11, gives $\mathbb{P}(\mathcal{E} \cap \mathcal{O}^c) \doteq \text{SNR}^{-\infty},$ i.e., faster-than-polynomial decay. The block-length assumption $L \ge n_t + n_r - 1$ is needed so that the error-matrix product $\boldsymbol{\Delta}\boldsymbol{\Delta}^H$ is full-rank with probability $1$ — §5 discusses what breaks when $L$ is shorter.

Combining, $\bar P_e \doteq \text{SNR}^{-d^*_{\rm out}(r)}$ . The matching converse and achievability give $d^*(r) = d^*_{\rm out}(r) = (n_t - r)(n_r - r)$ at integer $r$ , with linear interpolation. $\blacksquare$

,

Building the DMT Curve: Corner Points $(k, (n_t-k)(n_r-k))$

Animated construction of the DMT curve for a

2 \times 2

MIMO channel. The corner points

(0, n_t n_r) = (0, 4)

,

(1, (n_t - 1)(n_r - 1)) = (1, 1)

, and

(\min(n_t, n_r), 0) = (2, 0)

appear one at a time, and then the piecewise-linear interpolation is drawn in between. This is the geometric content of the Zheng-Tse theorem — the DMT curve as the convex hull of the integer corner points.

For a

2 \times 2

i.i.d. Rayleigh MIMO channel, the DMT curve is the piecewise-linear graph through

(0, 4)

,

(1, 1)

, and

(2, 0)

. The initial slope (

-3

) is steeper than the final slope (

-1

): the first unit of multiplexing costs three units of diversity, the second unit costs one.

The DMT Curve $d^*(r)$ for Configurable $(n_t, n_r)$

Explore the DMT curve $d^*(r) = (n_t - r)(n_r - r)$ (piecewise-linear between integer corner points) as a function of $(n_t, n_r)$ . The corner points $(k, (n_t - k)(n_r - k))$ for $k = 0, 1, \ldots, \min(n_t, n_r)$ are marked. Use the "Compare with" dropdown to overlay a second configuration and see how asymmetric $n_t \ne n_r$ (e.g., $2 \times 4$ vs $4 \times 2$ ) gives identical DMT curves — the $(r, d^*)$ pair is symmetric in $(n_t, n_r)$ .

Parameters

Transmit antennas

n_t

2

Receive antennas

n_r

2

Compare with

Example: DMT Corner Points for $4\times 4$ , $2\times 4$ , $4\times 2$ , $2\times 2$

For each of the MIMO configurations $(n_t, n_r) \in \{(4, 4), (2, 4), (4, 2), (2, 2)\}$ , list the corner points of the DMT curve. Sketch the four curves on a single graph and identify the maximum diversity and maximum multiplexing of each.

Solution

$4 \times 4$

$\min(n_t, n_r) = 4$ . Corners at $r = 0, 1, 2, 3, 4$ : $(0, 16),\, (1, 9),\, (2, 4),\, (3, 1),\, (4, 0).$ Max diversity $d^*(0) = 16$ ; max multiplexing $r_{\max} = 4$ . The curve is the piecewise-linear graph through these five points.

$2 \times 4$ and $4 \times 2$

$\min(n_t, n_r) = 2$ in both cases. Corners at $r = 0, 1, 2$ : $(0, 8),\, (1, 3),\, (2, 0).$ Max diversity $d^*(0) = 8$ (= $n_t n_r$ ); max multiplexing $r_{\max} = 2$ (= $\min(n_t, n_r)$ ).

The DMT curve is identical for $2 \times 4$ and $4 \times 2$ : the formula $(n_t - k)(n_r - k)$ is symmetric in $(n_t, n_r)$ . Even though the two physical channels are different (uplink vs downlink asymmetry), their fundamental-limit tradeoff is the same. The asymmetry re-emerges at finite SNR through coding gain and through the complexity of transmitter/receiver processing.

$2 \times 2$

$\min(n_t, n_r) = 2$ . Corners at $r = 0, 1, 2$ : $(0, 4),\, (1, 1),\, (2, 0).$ This is the canonical $2 \times 2$ DMT curve — the one that governs Alamouti, V-BLAST, and the Golden code. The steep initial slope $(0 \to 1: \Delta d^* = -3)$ vs shallow final slope $(1 \to 2: \Delta d^* = -1)$ is the tell-tale piecewise structure.

Takeaway

Larger MIMO dimensions give both more diversity (at any fixed $r$ ) and more multiplexing (higher $r_{\max}$ ). There is no tradeoff between adding antennas and any of the two resources — the tradeoff is within a fixed antenna pair, between $r$ and $d^*(r)$ . This is why 5G massive MIMO aims for $n_t \ge 64$ : even a modest rank $k \ll n_t$ retains a large $(n_t - k)(n_r - k)$ diversity buffer.

,

Pattern-Aware: Zheng-Tse = Shannon, One Level Up

The structure of the Zheng-Tse proof should feel familiar:

Converse: the outage lower bound — error probability $\ge$ outage probability at the target rate. This is the DMT-regime analogue of "error probability is lower bounded by the probability that rate exceeds capacity", the Fano-inequality backbone of classical converses.
Achievability: a Gaussian random codebook. Same distribution as Telatar's ergodic-capacity proof — and in fact, the same codebook achieves both the ergodic capacity at $r = r_{\max}$ and the full DMT curve at every other $r$ . Gaussian is doubly universal.
Matching: both computations reduce to the same large-deviations rate function on the eigenvalue exponent vector $\boldsymbol{\alpha}$ , with the Vandermonde factor $\prod_{i<j}(\lambda_i - \lambda_j)^2$ of the Wishart distribution driving the coefficients $(2i - 1 + n_r - n_t)$ . The LP optimum is the piecewise-linear DMT curve.

So Zheng-Tse is Shannon's random-coding-argument template, lifted from the rate-reliability plane $(R, E)$ to the multiplexing-diversity plane $(r, d)$ . The same proof machinery, a different scaling regime. This is why Gaussian i.i.d. codebooks "just work" in MIMO theory — they are the default answer to "what codebook achieves the exponent", at every level of the hierarchy.

,

Quick Check

On a $3 \times 3$ i.i.d. Rayleigh MIMO channel with block length $L = 5 \ge n_t + n_r - 1$ , what is $d^*(1.5)$ (linearly interpolated)?

$d^*(1.5) = 9 \cdot 0.5 + 4 \cdot 0.5 = 6.5$

$d^*(1.5) = 4 \cdot 0.5 + 1 \cdot 0.5 = 2.5$

$d^*(1.5) = (3 - 1.5)^2 = 2.25$

$d^*(1.5) = 0$ (because $r = 1.5$ exceeds the spatial degrees of freedom $m = 3$ )

Correction:

d^*(1.5) = 4 \cdot 0.5 + 1 \cdot 0.5 = 2.5

Corner points: $d^*(0) = 3 \cdot 3 = 9$ , $d^*(1) = 2 \cdot 2 = 4$ , $d^*(2) = 1 \cdot 1 = 1$ , $d^*(3) = 0$ . At $r = 1.5$ , interpolate linearly between $(1, 4)$ and $(2, 1)$ : $d^*(1.5) = 4 + (1 - 4) \cdot 0.5 = 2.5$ . Equivalently, $(3 - 1.5)(3 - 1.5) = 2.25$ is not the correct answer at non-integer $r$ — the curve is piecewise-linear, not the continuous quadratic.

Common Mistake: $d^*(r) \ne (n_t - r)(n_r - r)$ at Non-Integer $r$

Mistake:

Using the continuous formula $d^*(r) = (n_t - r)(n_r - r)$ at non-integer $r$ — e.g., writing $d^*(1.5) = (n_t - 1.5)(n_r - 1.5)$ .

Correction:

The continuous formula $(n_t - r)(n_r - r)$ is only correct at integer $r$ . Between integer corner points the DMT is the piecewise-linear interpolation of those corners. The piecewise-linear curve is above the continuous quadratic: e.g., on $3 \times 3$ at $r = 1.5$ , the piecewise-linear value is $2.5$ but the quadratic would give $2.25$ .

The quadratic is the lower envelope (the achievable exponent using a fixed-rank scheme) while the piecewise linear is the full tradeoff (achievable by time-sharing / rank-adaptive schemes). The $0.25$ diversity gap is the benefit of rank adaptation / time-sharing, which connects fractional $r$ to adjacent integer operating points.

⚠️Engineering Note

DMT Asymptotics vs Finite-SNR Practice

The DMT curve is an asymptotic statement: $d^*(r)$ is exact only as $\text{SNR} \to \infty$ . In practice, "high SNR" in cellular and Wi-Fi systems means $10$ – $35$ dB, which is where the DMT is approximate. At these operating points:

The empirical diversity exponent lags the asymptotic $d^*(r)$ by $10$ – $30$ %. The interactive plot above shows this convergence gap directly: the empirical $-\log P_{\rm out}/\log \text{SNR}$ slope at $\text{SNR} = 20$ dB is typically $0.7 d^*$ to $0.9 d^*$ , reaching $\ge 0.95 d^*$ only around $30$ dB.
Coding gain — the constant in front of $\text{SNR}^{-d^*}$ — matters as much as $d^*$ itself at these SNRs. A 3 dB coding-gain gap corresponds to a 2x SNR improvement; two codes with the same $d^*$ but different coding gains can differ substantially in practice.
Chapter 13 introduces CDA / Golden-code families that achieve both the optimal $d^*(r)$ and high coding gain, closing the finite-SNR gap.

Practical Constraints

•
DMT asymptotic SNR regime: typically $\text{SNR} \ge 20$ dB for $2 \times 2$ , $\ge 25$ dB for $4 \times 4$ .
•
Finite-SNR gap between empirical and asymptotic exponent: $10$ – $30$ % at cellular operating SNRs.
•
Coding gain and DMT are independent design targets — optimize both.

📋 Ref: 3GPP TR 38.802 (study on NR access technology), §7

,

Historical Note: Zheng and Tse 2003: The Unifying Tradeoff

2003

Lizhong Zheng and David Tse's May 2003 IEEE Trans. Information Theory paper, "Diversity and multiplexing: A fundamental tradeoff in multiple- antenna channels" (vol. 49, no. 5, pp. 1073–1096), is one of the most cited papers in wireless information theory. Zheng, then a PhD student at Berkeley, and Tse, his advisor, resolved the five-year-old Alamouti- vs-V-BLAST tension by showing that both schemes are corner points of a single piecewise-linear curve — the DMT.

Three contributions made the paper a classic:

The right question. The prior literature debated "is Alamouti good?" and "is V-BLAST good?" as if they competed; Zheng-Tse reframed the question as "which $(r, d)$ pair are you targeting, and is your code optimal at that point?"
The closed-form answer. $d^*(r) = (n_t - r)(n_r - r)$ at integer corners is a formula that even wireless practitioners (not just information theorists) could compute and use as a design target.
The proof technique. The Wishart large-deviations + Gaussian random-coding matching proof became the template for dozens of follow-up papers on DMT in other channel models: correlated fading (Telatar-Tse variants), ARQ (El Gamal-Caire-Damen), multicasting, MIMO broadcast, etc. The proof template is as influential as the theorem itself.

Subsequent CommIT-group work built directly on Zheng-Tse: Elia-Kumar- Pawar-Kumar-Lu 2006 constructed explicit DMT-optimal codes (CDA / Golden), and El Gamal-Caire-Damen 2004 proved lattice codes achieve the DMT. These are the topics of Chapters 13 and 17 respectively.

DMT Curve

The function $d^*(r)$ giving the maximum diversity gain achievable at multiplexing gain $r$ on an $n_t \times n_r$ i.i.d. Rayleigh channel. For block length $L \ge n_t + n_r - 1$ , it is the piecewise-linear interpolation of $(k, (n_t - k)(n_r - k))$ for $k = 0, 1, \ldots, \min(n_t, n_r)$ (Zheng-Tse 2003). The fundamental performance limit of any MIMO space-time code.

Wishart Distribution

The distribution of $\mathbf{H}\mathbf{H}^{H}$ when $\mathbf{H}$ is an $n_r \times n_t$ matrix with i.i.d. $\mathcal{CN}(0, 1)$ entries. The joint density of the ordered eigenvalues carries a $\prod_i \lambda_i^{n_r - n_t} \prod_{i<j}(\lambda_i - \lambda_j)^2$ factor (Vandermonde times marginal), which drives the DMT large-deviations exponent computation.

The Zheng-Tse Theorem