The Zheng-Tse Theorem

The Curve that Binds Diversity and Multiplexing

We are ready to state the central theorem of Part III: the Zheng-Tse diversity-multiplexing tradeoff. For an nt×nrn_t \times n_r i.i.d. Rayleigh channel with block length Lnt+nr1L \ge n_t + n_r - 1, the optimal tradeoff curve d(r)d^*(r) is the piecewise-linear interpolation of the points (k,(ntk)(nrk)),k=0,1,,min(nt,nr).(k,\, (n_t - k)(n_r - k)), \qquad k = 0, 1, \ldots, \min(n_t, n_r).

The theorem has two independent parts:

  1. Converse — an outage-probability lower bound: no code family operating at multiplexing gain rr can achieve diversity gain exceeding (ntr)(nrr)(n_t - r)(n_r - r). This is a purely information-theoretic statement, proved by computing the outage exponent from the Wishart eigenvalue distribution of HHH\mathbf{H}\mathbf{H}^{H}.

  2. Achievability — a constructive upper bound: a Gaussian random codebook with i.i.d. CN(0,SNR/nt)\mathcal{CN}(0, \text{SNR}/n_t) entries and rate rlog2SNRr \log_2 \text{SNR} achieves diversity exactly (ntr)(nrr)(n_t - r)(n_r - r) in the high-SNR limit. This is the error-exponent counterpart of Shannon's random-coding theorem, lifted to the exponential- equality regime.

The matching of the two bounds gives a tight characterisation: the DMT curve is the exact, best-possible tradeoff. The proof pattern — outage converse + Gaussian random-coding achievability — is the same pattern as ergodic-capacity proofs (Chapter 10), adapted for the exponential-equality regime. We walk through both parts in full.

,

Definition:

Diversity-Multiplexing Tradeoff Curve d(r)d^*(r)

The diversity-multiplexing tradeoff (DMT) curve of an nt×nrn_t \times n_r MIMO channel is the function d(r)  =  sup{d  :   family of codes with multiplexing gain r and diversity gain d},r[0,rmax].d^*(r) \;=\; \sup\{d^* \;:\; \exists \text{ family of codes with multiplexing gain } r \text{ and diversity gain } d^*\}, \qquad r \in [0, r_{\max}]. By definition d(r)d^*(r) is the pointwise maximum diversity attainable at each multiplexing gain rr. The curve is non-increasing in rr: higher rr costs more diversity.

Endpoints. At r=0r = 0 (fixed rate), d(0)d^*(0) is the maximum classical diversity order — for an nt×nrn_t \times n_r i.i.d. Rayleigh channel this is ntnrn_t n_r. At r=rmax=min(nt,nr)r = r_{\max} = \min(n_t, n_r) (maximum multiplexing slope, approaching capacity), d(rmax)=0d^*(r_{\max}) = 0: there is no reliability margin left when the rate tracks capacity.

Operational meaning. A code family's operating point (r,d)(r, d^*) must lie below the curve: dd(r)d^* \le d^*(r). A code is DMT-optimal if its operating point lies on the curve for the entire range r[0,rmax]r \in [0, r_{\max}] — i.e., it achieves the best diversity for every multiplexing slope. Alamouti achieves the curve only at r=0r = 0; the Golden code (§4) achieves it for all r[0,2]r \in [0, 2] on 2×22 \times 2.

,

Theorem: Zheng-Tse Diversity-Multiplexing Tradeoff

Consider an nt×nrn_t \times n_r i.i.d. Rayleigh MIMO channel with coherent detection and block length Lnt+nr1L \ge n_t + n_r - 1. The DMT curve is the piecewise-linear interpolation of the integer corner points d(k)  =  (ntk)(nrk),k=0,1,,min(nt,nr).d^*(k) \;=\; (n_t - k)(n_r - k), \qquad k = 0, 1, \ldots, \min(n_t, n_r). Explicitly, for r[k,k+1]r \in [k, k+1] with k{0,1,,min(nt,nr)1}k \in \{0, 1, \ldots, \min(n_t, n_r) - 1\}, d(r)  =  (ntk)(nrk)[(ntk)(nrk)(ntk1)(nrk1)](rk).d^*(r) \;=\; (n_t - k)(n_r - k) - \big[(n_t - k)(n_r - k) - (n_t - k - 1)(n_r - k - 1)\big] (r - k). The maximum multiplexing gain is rmax=min(nt,nr)r_{\max} = \min(n_t, n_r) and the maximum diversity gain is d(0)=ntnrd^*(0) = n_t n_r.

The optimal tradeoff is achieved by a Gaussian random codebook with i.i.d. CN(0,SNR/nt)\mathcal{CN}(0, \text{SNR}/n_t) entries.

Think of rr as "how many independent streams you are sending", and dd^* as "how much protection each stream gets". The channel has ntnrn_t n_r independent scalar fading coefficients in H\mathbf{H}. If you send kk streams, they "consume" kk rows of H\mathbf{H} on the transmit side and kk columns on the receive side; what remains for diversity protection is an (ntk)×(nrk)(n_t - k) \times (n_r - k) sub-channel with (ntk)(nrk)(n_t - k)(n_r - k) independent fading coefficients. That sub-channel supplies the diversity exponent. The tradeoff is exactly the geometric product of the "spare" transmit and receive dimensions.

,

Building the DMT Curve: Corner Points (k,(ntk)(nrk))(k, (n_t-k)(n_r-k))

Animated construction of the DMT curve for a 2×22 \times 2 MIMO channel. The corner points (0,ntnr)=(0,4)(0, n_t n_r) = (0, 4), (1,(nt1)(nr1))=(1,1)(1, (n_t - 1)(n_r - 1)) = (1, 1), and (min(nt,nr),0)=(2,0)(\min(n_t, n_r), 0) = (2, 0) appear one at a time, and then the piecewise-linear interpolation is drawn in between. This is the geometric content of the Zheng-Tse theorem — the DMT curve as the convex hull of the integer corner points.
For a 2×22 \times 2 i.i.d. Rayleigh MIMO channel, the DMT curve is the piecewise-linear graph through (0,4)(0, 4), (1,1)(1, 1), and (2,0)(2, 0). The initial slope (3-3) is steeper than the final slope (1-1): the first unit of multiplexing costs three units of diversity, the second unit costs one.

The DMT Curve d(r)d^*(r) for Configurable (nt,nr)(n_t, n_r)

Explore the DMT curve d(r)=(ntr)(nrr)d^*(r) = (n_t - r)(n_r - r) (piecewise-linear between integer corner points) as a function of (nt,nr)(n_t, n_r). The corner points (k,(ntk)(nrk))(k, (n_t - k)(n_r - k)) for k=0,1,,min(nt,nr)k = 0, 1, \ldots, \min(n_t, n_r) are marked. Use the "Compare with" dropdown to overlay a second configuration and see how asymmetric ntnrn_t \ne n_r (e.g., 2×42 \times 4 vs 4×24 \times 2) gives identical DMT curves — the (r,d)(r, d^*) pair is symmetric in (nt,nr)(n_t, n_r).

Parameters
2
2

Example: DMT Corner Points for 4×44\times 4, 2×42\times 4, 4×24\times 2, 2×22\times 2

For each of the MIMO configurations (nt,nr){(4,4),(2,4),(4,2),(2,2)}(n_t, n_r) \in \{(4, 4), (2, 4), (4, 2), (2, 2)\}, list the corner points of the DMT curve. Sketch the four curves on a single graph and identify the maximum diversity and maximum multiplexing of each.

,

Pattern-Aware: Zheng-Tse = Shannon, One Level Up

The structure of the Zheng-Tse proof should feel familiar:

  • Converse: the outage lower bound — error probability \ge outage probability at the target rate. This is the DMT-regime analogue of "error probability is lower bounded by the probability that rate exceeds capacity", the Fano-inequality backbone of classical converses.
  • Achievability: a Gaussian random codebook. Same distribution as Telatar's ergodic-capacity proof — and in fact, the same codebook achieves both the ergodic capacity at r=rmaxr = r_{\max} and the full DMT curve at every other rr. Gaussian is doubly universal.
  • Matching: both computations reduce to the same large-deviations rate function on the eigenvalue exponent vector α\boldsymbol{\alpha}, with the Vandermonde factor i<j(λiλj)2\prod_{i<j}(\lambda_i - \lambda_j)^2 of the Wishart distribution driving the coefficients (2i1+nrnt)(2i - 1 + n_r - n_t). The LP optimum is the piecewise-linear DMT curve.

So Zheng-Tse is Shannon's random-coding-argument template, lifted from the rate-reliability plane (R,E)(R, E) to the multiplexing-diversity plane (r,d)(r, d). The same proof machinery, a different scaling regime. This is why Gaussian i.i.d. codebooks "just work" in MIMO theory — they are the default answer to "what codebook achieves the exponent", at every level of the hierarchy.

,

Quick Check

On a 3×33 \times 3 i.i.d. Rayleigh MIMO channel with block length L=5nt+nr1L = 5 \ge n_t + n_r - 1, what is d(1.5)d^*(1.5) (linearly interpolated)?

d(1.5)=90.5+40.5=6.5d^*(1.5) = 9 \cdot 0.5 + 4 \cdot 0.5 = 6.5

d(1.5)=40.5+10.5=2.5d^*(1.5) = 4 \cdot 0.5 + 1 \cdot 0.5 = 2.5

d(1.5)=(31.5)2=2.25d^*(1.5) = (3 - 1.5)^2 = 2.25

d(1.5)=0d^*(1.5) = 0 (because r=1.5r = 1.5 exceeds the spatial degrees of freedom m=3m = 3)

Common Mistake: d(r)(ntr)(nrr)d^*(r) \ne (n_t - r)(n_r - r) at Non-Integer rr

Mistake:

Using the continuous formula d(r)=(ntr)(nrr)d^*(r) = (n_t - r)(n_r - r) at non-integer rr — e.g., writing d(1.5)=(nt1.5)(nr1.5)d^*(1.5) = (n_t - 1.5)(n_r - 1.5).

Correction:

The continuous formula (ntr)(nrr)(n_t - r)(n_r - r) is only correct at integer rr. Between integer corner points the DMT is the piecewise-linear interpolation of those corners. The piecewise-linear curve is above the continuous quadratic: e.g., on 3×33 \times 3 at r=1.5r = 1.5, the piecewise-linear value is 2.52.5 but the quadratic would give 2.252.25.

The quadratic is the lower envelope (the achievable exponent using a fixed-rank scheme) while the piecewise linear is the full tradeoff (achievable by time-sharing / rank-adaptive schemes). The 0.250.25 diversity gap is the benefit of rank adaptation / time-sharing, which connects fractional rr to adjacent integer operating points.

⚠️Engineering Note

DMT Asymptotics vs Finite-SNR Practice

The DMT curve is an asymptotic statement: d(r)d^*(r) is exact only as SNR\text{SNR} \to \infty. In practice, "high SNR" in cellular and Wi-Fi systems means 10103535 dB, which is where the DMT is approximate. At these operating points:

  • The empirical diversity exponent lags the asymptotic d(r)d^*(r) by 10103030%. The interactive plot above shows this convergence gap directly: the empirical logPout/logSNR-\log P_{\rm out}/\log \text{SNR} slope at SNR=20\text{SNR} = 20 dB is typically 0.7d0.7 d^* to 0.9d0.9 d^*, reaching 0.95d\ge 0.95 d^* only around 3030 dB.
  • Coding gain — the constant in front of SNRd\text{SNR}^{-d^*} — matters as much as dd^* itself at these SNRs. A 3 dB coding-gain gap corresponds to a 2x SNR improvement; two codes with the same dd^* but different coding gains can differ substantially in practice.
  • Chapter 13 introduces CDA / Golden-code families that achieve both the optimal d(r)d^*(r) and high coding gain, closing the finite-SNR gap.
Practical Constraints
  • DMT asymptotic SNR regime: typically SNR20\text{SNR} \ge 20 dB for 2×22 \times 2, 25\ge 25 dB for 4×44 \times 4.

  • Finite-SNR gap between empirical and asymptotic exponent: 10103030% at cellular operating SNRs.

  • Coding gain and DMT are independent design targets — optimize both.

📋 Ref: 3GPP TR 38.802 (study on NR access technology), §7
,

Historical Note: Zheng and Tse 2003: The Unifying Tradeoff

2003

Lizhong Zheng and David Tse's May 2003 IEEE Trans. Information Theory paper, "Diversity and multiplexing: A fundamental tradeoff in multiple- antenna channels" (vol. 49, no. 5, pp. 1073–1096), is one of the most cited papers in wireless information theory. Zheng, then a PhD student at Berkeley, and Tse, his advisor, resolved the five-year-old Alamouti- vs-V-BLAST tension by showing that both schemes are corner points of a single piecewise-linear curve — the DMT.

Three contributions made the paper a classic:

  1. The right question. The prior literature debated "is Alamouti good?" and "is V-BLAST good?" as if they competed; Zheng-Tse reframed the question as "which (r,d)(r, d) pair are you targeting, and is your code optimal at that point?"
  2. The closed-form answer. d(r)=(ntr)(nrr)d^*(r) = (n_t - r)(n_r - r) at integer corners is a formula that even wireless practitioners (not just information theorists) could compute and use as a design target.
  3. The proof technique. The Wishart large-deviations + Gaussian random-coding matching proof became the template for dozens of follow-up papers on DMT in other channel models: correlated fading (Telatar-Tse variants), ARQ (El Gamal-Caire-Damen), multicasting, MIMO broadcast, etc. The proof template is as influential as the theorem itself.

Subsequent CommIT-group work built directly on Zheng-Tse: Elia-Kumar- Pawar-Kumar-Lu 2006 constructed explicit DMT-optimal codes (CDA / Golden), and El Gamal-Caire-Damen 2004 proved lattice codes achieve the DMT. These are the topics of Chapters 13 and 17 respectively.

DMT Curve

The function d(r)d^*(r) giving the maximum diversity gain achievable at multiplexing gain rr on an nt×nrn_t \times n_r i.i.d. Rayleigh channel. For block length Lnt+nr1L \ge n_t + n_r - 1, it is the piecewise-linear interpolation of (k,(ntk)(nrk))(k, (n_t - k)(n_r - k)) for k=0,1,,min(nt,nr)k = 0, 1, \ldots, \min(n_t, n_r) (Zheng-Tse 2003). The fundamental performance limit of any MIMO space-time code.

Related: Diversity Gain dd^*, Multiplexing Gain rr, Wishart, Outage Probability and ϵ\epsilon-Outage Capacity

Wishart Distribution

The distribution of HHH\mathbf{H}\mathbf{H}^{H} when H\mathbf{H} is an nr×ntn_r \times n_t matrix with i.i.d. CN(0,1)\mathcal{CN}(0, 1) entries. The joint density of the ordered eigenvalues carries a iλinrnti<j(λiλj)2\prod_i \lambda_i^{n_r - n_t} \prod_{i<j}(\lambda_i - \lambda_j)^2 factor (Vandermonde times marginal), which drives the DMT large-deviations exponent computation.

Related: Wishart Eigenvalues, DMT Slope and Segment Structure, Outage Probability and ϵ\epsilon-Outage Capacity, Mimo Channel