The Marchenko-Pastur Law

The Eigenvalue Density of Random Covariance Matrices

We now answer the central question from Section 21.1: what is the limiting spectral distribution of $\frac{1}{m}\mathbf{H}\mathbf{H}^H$ when $\mathbf{H}$ has i.i.d. $\mathcal{CN}(0,1)$ entries? The answer, the Marchenko-Pastur law, is one of the most important results in random matrix theory. It specifies the limiting density of the eigenvalues, and this density determines the capacity of a Rayleigh fading MIMO channel.

Theorem: The Marchenko-Pastur Law

Let $\mathbf{H} \in \mathbb{C}^{n \times m}$ have i.i.d. $\mathcal{CN}(0,1)$ entries. As $n, m \to \infty$ with $n/m \to \beta \in (0, \infty)$, the empirical spectral distribution of $\mathbf{W} = \frac{1}{m}\mathbf{H}\mathbf{H}^H \in \mathbb{C}^{n \times n}$ converges almost surely to the Marchenko-Pastur distribution $F_\beta$ with density

$$f_{\mathrm{MP}}(\lambda; \beta) = \frac{1}{2\pi\beta\lambda}\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}\, \mathbf{1}_{\{\lambda_- \leq \lambda \leq \lambda_+\}},$$

where $\lambda_{\pm} = (1 \pm \sqrt{\beta})^2$.

If $\beta > 1$, there is additionally a point mass of weight $(1 - 1/\beta)$ at $\lambda = 0$. In this regime $n > m$, so the $n \times n$ matrix $\frac{1}{m}\mathbf{H}\mathbf{H}^H$ has rank at most $m$ and therefore at least $n - m$ zero eigenvalues, a fraction $(n - m)/n \to 1 - 1/\beta$ of the spectrum. The $m \times m$ matrix $\frac{1}{m}\mathbf{H}^H\mathbf{H}$ has the same nonzero eigenvalues and differs only in the number of zeros.

More precisely, for $\beta > 1$:

$$F_\beta(\lambda) = \left(1 - \frac{1}{\beta}\right)\mathbf{1}_{\{\lambda \geq 0\}} + \int_0^{\lambda} f_{\mathrm{MP}}(t;\beta)\, dt.$$

The density has a characteristic "bulk" shape supported on $[\lambda_-, \lambda_+]$. When $\beta = 1$ (square matrix), the support starts at $\lambda_- = 0$ and extends to $\lambda_+ = 4$. As $\beta \to 0$ (many more columns than rows), the distribution concentrates around $\lambda = 1$: the eigenvalues become nearly deterministic. The width of the support, $\lambda_+ - \lambda_- = 4\sqrt{\beta}$, measures the "spread" of eigenvalues and directly affects capacity.
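The support edges can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `empirical_spectrum` and `mp_edges` and the matrix sizes are illustrative choices, not part of the text. It works with the $n \times n$ Gram matrix $\frac{1}{m}\mathbf{H}\mathbf{H}^H$, whose nonzero eigenvalues coincide with those of $\frac{1}{m}\mathbf{H}^H\mathbf{H}$.

```python
import numpy as np

def empirical_spectrum(n, m, seed=0):
    """Eigenvalues of (1/m) H H^H for H in C^{n x m} with i.i.d. CN(0,1) entries."""
    rng = np.random.default_rng(seed)
    # CN(0,1): real and imaginary parts are independent N(0, 1/2)
    H = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
    W = H @ H.conj().T / m
    return np.linalg.eigvalsh(W)  # real eigenvalues in ascending order

def mp_edges(beta):
    """Support edges (lambda_minus, lambda_plus) of the Marchenko-Pastur bulk."""
    return (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2

n, m = 200, 800  # illustrative sizes, beta = n/m = 1/4
eigs = empirical_spectrum(n, m)
lam_minus, lam_plus = mp_edges(n / m)
print(f"MP support:      [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"empirical range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"empirical mean:  {eigs.mean():.3f}")
```

At these sizes the extreme eigenvalues should sit close to the predicted edges, with a small finite-size gap, and the empirical mean should be close to 1.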


Example: Marchenko-Pastur Density for $\beta = 1$ (Square Matrix)

Write out the Marchenko-Pastur density for $\beta = 1$ and compute its first two moments.
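One way to work this example, sketched from the density and edge formulas in the theorem: setting $\beta = 1$ gives $\lambda_- = 0$ and $\lambda_+ = 4$, and both moments reduce to integrals of $\sqrt{\lambda(4 - \lambda)} = \sqrt{4 - (\lambda - 2)^2}$, the upper half of a circle of radius 2 centered at $\lambda = 2$ (area $2\pi$, symmetric about $\lambda = 2$).

```latex
\begin{align*}
f_{\mathrm{MP}}(\lambda; 1) &= \frac{1}{2\pi\lambda}\sqrt{\lambda(4 - \lambda)}
  = \frac{1}{2\pi}\sqrt{\frac{4 - \lambda}{\lambda}}, \qquad 0 < \lambda \leq 4, \\
m_1 &= \frac{1}{2\pi}\int_0^4 \sqrt{\lambda(4 - \lambda)}\, d\lambda
  = \frac{1}{2\pi}\cdot 2\pi = 1, \\
m_2 &= \frac{1}{2\pi}\int_0^4 \lambda\sqrt{\lambda(4 - \lambda)}\, d\lambda
  = \frac{1}{2\pi}\cdot 2 \cdot 2\pi = 2.
\end{align*}
```

The variance is therefore $m_2 - m_1^2 = 1$.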

Example: Moments of the Marchenko-Pastur Distribution

Show that the $k$-th moment of the Marchenko-Pastur distribution is given by the closed-form sum

$$m_k = \sum_{j=0}^{k-1} \frac{\beta^j}{j+1} \binom{k}{j}\binom{k-1}{j},$$

whose coefficients are the Narayana numbers. Compute $m_1, m_2, m_3$ for general $\beta$.
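The moment formula is easy to evaluate exactly. The sketch below uses only the Python standard library; the function name `mp_moment` is a hypothetical helper introduced for illustration. Expanding the sum by hand gives $m_1 = 1$, $m_2 = 1 + \beta$, and $m_3 = 1 + 3\beta + \beta^2$, which the code reproduces with rational arithmetic.

```python
from fractions import Fraction
from math import comb

def mp_moment(k, beta):
    """k-th Marchenko-Pastur moment from the Narayana-number sum."""
    beta = Fraction(beta)
    return sum(
        beta**j * Fraction(comb(k, j) * comb(k - 1, j), j + 1)
        for j in range(k)
    )

beta = Fraction(1, 4)
for k in (1, 2, 3):
    print(k, mp_moment(k, beta))  # exact rational values
```

For $\beta = 1$ the same function returns the Catalan numbers 1, 2, 5, 14, which is a convenient sanity check.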

Marchenko-Pastur Density for Varying $\beta$

Interactive figure: the Marchenko-Pastur density $f_{\mathrm{MP}}(\lambda; \beta)$, showing how the support $[\lambda_-, \lambda_+]$ changes with the aspect ratio $\beta = n/m$.
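For readers without the interactive plot, the dependence on $\beta$ can be tabulated directly. This is a minimal sketch using only the standard library; `mp_edges`, `mp_density`, and `integrate` are hypothetical helper names, and the midpoint rule is just one simple choice of quadrature.

```python
import math

def mp_edges(beta):
    """Support edges (lambda_minus, lambda_plus) of the MP bulk."""
    root = math.sqrt(beta)
    return (1 - root) ** 2, (1 + root) ** 2

def mp_density(lam, beta):
    """Continuous part of the Marchenko-Pastur density at lam."""
    lam_minus, lam_plus = mp_edges(beta)
    if lam <= 0 or lam < lam_minus or lam > lam_plus:
        return 0.0
    return math.sqrt((lam_plus - lam) * (lam - lam_minus)) / (2 * math.pi * beta * lam)

def integrate(f, a, b, steps=20000):
    """Midpoint rule; it never evaluates f at the endpoints."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

for beta in (0.1, 0.5, 1.0):
    lo, hi = mp_edges(beta)
    mass = integrate(lambda x: mp_density(x, beta), lo, hi)
    mean = integrate(lambda x: x * mp_density(x, beta), lo, hi)
    print(f"beta={beta:4.2f}  support=[{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}  "
          f"mass={mass:.3f}  mean={mean:.3f}")
```

For $\beta \leq 1$ the density should integrate to approximately 1 with mean approximately 1. The $\beta = 1$ row converges more slowly because the density has an integrable singularity at $\lambda = 0$.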


Definition: Support and Condition Number of the Marchenko-Pastur Distribution

The Marchenko-Pastur distribution with parameter $\beta \in (0,1]$ is supported on $[\lambda_-, \lambda_+]$ where

$$\lambda_- = (1 - \sqrt{\beta})^2, \quad \lambda_+ = (1 + \sqrt{\beta})^2.$$

The ratio

$$\frac{\lambda_+}{\lambda_-} = \left(\frac{1 + \sqrt{\beta}}{1 - \sqrt{\beta}}\right)^2$$

is the asymptotic condition number of the Wishart matrix. It grows as $\beta \to 1$ (square matrices are ill-conditioned) and shrinks as $\beta \to 0$ (many more columns than rows concentrates the eigenvalues near 1).
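A few values, computed directly from the formulas above, illustrate how quickly the condition number changes with $\beta$ (the particular values of $\beta$ are chosen only because their square roots are simple):

```latex
\begin{align*}
\beta = \tfrac{1}{100}: &\quad \lambda_- = 0.81,\ \lambda_+ = 1.21,
  \quad \lambda_+/\lambda_- \approx 1.49, \\
\beta = \tfrac{1}{9}: &\quad \lambda_- = \tfrac{4}{9},\ \lambda_+ = \tfrac{16}{9},
  \quad \lambda_+/\lambda_- = 4, \\
\beta = \tfrac{81}{100}: &\quad \lambda_- = 0.01,\ \lambda_+ = 3.61,
  \quad \lambda_+/\lambda_- = 361.
\end{align*}
```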

Theorem: Asymptotic MIMO Capacity via Marchenko-Pastur

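For an i.i.d. Rayleigh fading MIMO channel $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ with $n_r$ receive and $n_t$ transmit antennas, $n_r/n_t \to \beta$, and equal power allocation at SNR $\rho$, the ergodic capacity is $C = \mathbb{E}\left[\log_2\det\left(\mathbf{I}_{n_r} + \frac{\rho}{n_t}\mathbf{H}\mathbf{H}^H\right)\right]$, and the per-receive-antenna capacity converges:

$$\frac{C}{n_r} \to \int \log_2(1 + \rho\lambda)\, dF_\beta(\lambda) = \int_{\lambda_-}^{\lambda_+} \log_2(1 + \rho\lambda)\, f_{\mathrm{MP}}(\lambda;\beta)\, d\lambda.$$

Here $n = n_r$ and $m = n_t$ in the notation of the Marchenko-Pastur theorem. A point mass at $\lambda = 0$ (present when $\beta > 1$) contributes nothing to the integral because $\log_2(1) = 0$. This integral can be evaluated in closed form using the Stieltjes transform (Section 21.3).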

Each eigenvalue contributes $\log_2(1 + \rho\lambda)$ to the capacity. As the matrix grows, the eigenvalue histogram hardens to the MP density, so the sum over eigenvalues converges to an integral. The capacity per antenna is a deterministic functional of the MP distribution: no randomness remains in the large-system limit.
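The convergence of the eigenvalue sum to the integral can be observed on a single channel realization. The sketch below assumes NumPy, normalizes the Gram matrix as $\frac{1}{n_t}\mathbf{H}\mathbf{H}^H$ with $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$, and takes $\beta = n_r/n_t$ (the dimension of that matrix over its normalizing dimension). The function names, the SNR, and the antenna counts are illustrative assumptions, and the integral is evaluated numerically with a midpoint rule rather than with the closed form of Section 21.3.

```python
import numpy as np

def mp_capacity_per_rx(rho, beta, steps=20000):
    """Midpoint-rule value of the integral of log2(1 + rho*lam) against the MP density."""
    lo, hi = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
    h = (hi - lo) / steps
    lam = lo + (np.arange(steps) + 0.5) * h  # midpoints, strictly inside the support
    dens = np.sqrt((hi - lam) * (lam - lo)) / (2 * np.pi * beta * lam)
    return float(np.sum(np.log2(1 + rho * lam) * dens) * h)

def simulated_capacity_per_rx(rho, n_r, n_t, seed=0):
    """(1/n_r) * sum_i log2(1 + rho * lambda_i) for one draw of H in C^{n_r x n_t}."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    eigs = np.linalg.eigvalsh(H @ H.conj().T / n_t)
    return float(np.mean(np.log2(1 + rho * eigs)))

rho, n_r, n_t = 10.0, 200, 400  # illustrative values, beta = 1/2
print("asymptotic :", mp_capacity_per_rx(rho, n_r / n_t))
print("one sample :", simulated_capacity_per_rx(rho, n_r, n_t))
```

Even a single realization at these sizes should land close to the asymptotic value, which is the hardening effect described above.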

Quick Check

For $\beta = 1/4$ (i.e., 4 times more columns than rows), what is the support $[\lambda_-, \lambda_+]$ of the Marchenko-Pastur distribution?

$[1/4, 9/4]$

$[0, 4]$

$[1/16, 9/16]$

$[0, 9/4]$

Common Mistake: Convention for $\beta$: Rows/Columns or Columns/Rows?

Mistake:

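Different references define $\beta$ differently. Some use $\beta = n/m$ (rows over columns), others $\beta = m/n$. Mixing the two conventions replaces $\beta$ by $1/\beta$, which changes the support edges $(1 \pm \sqrt{\beta})^2$ and places the point mass at zero in the wrong regime.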

Correction:

In this book (following Couillet-Debbah), $\beta = n/m$ where $\mathbf{H} \in \mathbb{C}^{n \times m}$. When $\beta < 1$, the $n \times n$ matrix $\frac{1}{m}\mathbf{H}\mathbf{H}^H$ almost surely has no zero eigenvalues. When $\beta > 1$, it has $n - m$ zero eigenvalues (mass at the origin). Always check which convention a reference uses before applying formulas.
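The zero-eigenvalue count is easy to confirm numerically. The following sketch assumes NumPy and uses an illustrative $300 \times 100$ matrix ($\beta = 3$); the tolerance `tol` is an arbitrary threshold for treating a computed eigenvalue as zero. It compares the two Gram matrices that can be formed from the same $\mathbf{H}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 100  # illustrative sizes, beta = n/m = 3
H = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)

eig_big = np.linalg.eigvalsh(H @ H.conj().T / m)    # n x n, rank at most m
eig_small = np.linalg.eigvalsh(H.conj().T @ H / m)  # m x m, full rank almost surely

tol = 1e-8  # threshold for "numerically zero"
zeros_big = int(np.sum(np.abs(eig_big) < tol))
zeros_small = int(np.sum(np.abs(eig_small) < tol))

print("zeros in n x n matrix:", zeros_big, "(n - m =", n - m, ")")
print("zeros in m x m matrix:", zeros_small)
print("fraction of zeros:", zeros_big / n, "vs 1 - 1/beta =", 1 - m / n)
# The nonzero eigenvalues of the two matrices coincide.
print("nonzero spectra match:", np.allclose(eig_big[-m:], eig_small))
```

Which of the two matrices carries the point mass depends only on which one is larger than the rank of $\mathbf{H}$, which is why the convention for $\beta$ has to be fixed before reading off the formulas.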

Common Mistake: Forgetting the $1/m$ Normalization

Mistake:

Computing the eigenvalues of $\mathbf{H}\mathbf{H}^H$ (without the $1/m$ factor) and expecting them to follow the standard Marchenko-Pastur density.

Correction:

Without the $1/m$ normalization, the eigenvalues of $\mathbf{H}\mathbf{H}^H$ grow linearly with $m$. The MP law describes the eigenvalues of $\frac{1}{m}\mathbf{H}\mathbf{H}^H$, which have $O(1)$ support. Alternatively, one can state the MP law for $\mathbf{H}\mathbf{H}^H$ with support $[m\lambda_-, m\lambda_+]$, but the standard form uses the normalized version.
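The effect of the missing normalization shows up immediately in the largest eigenvalue. This is a small sketch assuming NumPy; the helper name `largest_eig_unnormalized` and the two matrix sizes are illustrative choices.

```python
import numpy as np

def largest_eig_unnormalized(n, m, seed=0):
    """Largest eigenvalue of H H^H (no 1/m factor) for H in C^{n x m}, CN(0,1) entries."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
    return float(np.linalg.eigvalsh(H @ H.conj().T)[-1])

beta = 0.25
lam_plus = (1 + np.sqrt(beta)) ** 2
for m in (400, 1600):
    n = int(beta * m)
    top = largest_eig_unnormalized(n, m)
    print(f"m={m:5d}  max eig of H H^H = {top:9.1f}  m*lambda_+ = {m * lam_plus:9.1f}  "
          f"after 1/m scaling: {top / m:.3f}")
```

The unnormalized maximum should track $m\lambda_+$ and grow by roughly a factor of four between the two rows, while the rescaled value stays near $\lambda_+ = 2.25$.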

Historical Note: Marchenko and Pastur's 1967 Paper


Vladimir Marchenko and Leonid Pastur published their landmark result in 1967, working at the Institute for Low Temperature Physics in Kharkov (now Kharkiv), Ukraine. Their paper "Distribution of eigenvalues for some sets of random matrices" established the law that now bears their name. The result was initially a contribution to mathematical physics, describing the spectral properties of large covariance matrices. It would take over three decades before Telatar (1999) and others recognized its central importance for MIMO capacity analysis.

Engineering Note: Finite-Size Corrections for System Design

The Marchenko-Pastur law is exact only in the limit $n, m \to \infty$. For practical MIMO systems with $n_t = 64$ antennas and $K = 16$ users ($\beta = 1/4$), the empirical spectral distribution (ESD) closely matches the MP density, but edge eigenvalues exhibit Tracy-Widom fluctuations of order $n^{-2/3}$. These fluctuations affect the tails of the capacity distribution (outage rates) even when the mean capacity is well approximated by the asymptotic formula. For outage analysis with $\min(n_t, n_r) < 16$, use finite-dimensional expressions or Monte Carlo.

Practical Constraints
  • Tracy-Widom fluctuations at the edge: $O(n^{-2/3})$

  • Central eigenvalues converge faster: $O(n^{-1})$
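The $n^{-2/3}$ edge scaling can be probed with a small Monte Carlo experiment. The sketch below assumes NumPy; the helper `top_eig_std`, the trial count, and the two sizes are illustrative choices, and the sample standard deviation at such small $n$ is only a rough proxy for the asymptotic Tracy-Widom scale.

```python
import numpy as np

def top_eig_std(n, m, trials=100, seed=0):
    """Sample standard deviation of the largest eigenvalue of (1/m) H H^H."""
    rng = np.random.default_rng(seed)
    tops = np.empty(trials)
    for t in range(trials):
        H = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
        tops[t] = np.linalg.eigvalsh(H @ H.conj().T / m)[-1]
    return float(tops.std())

# Same aspect ratio beta = 1/4 at two sizes.
std_small = top_eig_std(25, 100)
std_large = top_eig_std(200, 800)
print(f"std of top eigenvalue, n=25 : {std_small:.4f}")
print(f"std of top eigenvalue, n=200: {std_large:.4f}")
print(f"observed ratio: {std_small / std_large:.2f}   "
      f"n^(-2/3) prediction: {(200 / 25) ** (2 / 3):.2f}")
```

With only 100 trials per size the observed ratio is noisy, so it should be read as an order-of-magnitude check rather than a precise estimate of the exponent.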

Marchenko-Pastur Distribution

The limiting spectral distribution of the sample covariance matrix $\frac{1}{m}\mathbf{H}\mathbf{H}^H$ when $\mathbf{H} \in \mathbb{C}^{n \times m}$ has i.i.d. entries with zero mean and unit variance, as $n, m \to \infty$ with $n/m \to \beta$. It has density $f_{\mathrm{MP}}(\lambda;\beta) = \frac{1}{2\pi\beta\lambda}\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}$ on $[\lambda_-, \lambda_+]$ with $\lambda_\pm = (1 \pm \sqrt{\beta})^2$.

Related: Wishart-Type Matrix, The Marchenko-Pastur Law

Wishart Matrix

A random matrix of the form $\mathbf{W} = \mathbf{H}^H\mathbf{H}$ where $\mathbf{H} \in \mathbb{C}^{n \times m}$ has i.i.d. Gaussian entries. Named after John Wishart, who studied its distribution in 1928 for multivariate statistics.

Related: Wishart-Type Matrix

Key Takeaway

The Marchenko-Pastur law gives the limiting eigenvalue density of sample covariance matrices with i.i.d. entries. Its support $[\lambda_-, \lambda_+] = [(1-\sqrt{\beta})^2, (1+\sqrt{\beta})^2]$ determines the eigenvalue spread, which in turn governs the capacity of i.i.d. Rayleigh MIMO channels. The result is universal: it depends only on the aspect ratio $\beta$, not on the specific distribution of the entries (as long as they have zero mean and unit variance).