The Blessing and Curse of High Dimensions

Why Classical Asymptotics Are Not Enough

Classical estimation theory, which underpins the CRLB machinery of Chapter 4, was built around an asymptotic regime in which the dimension $N$ of the parameter is held fixed and the number of observations $M$ grows without bound. In that regime the MLE is consistent, asymptotically unbiased, and efficient; the sample covariance converges to the true covariance; linear regression is well conditioned.

This regime is no longer the one we live in. A base station with $N_t=256$ antennas estimating its channel from $M=64$ pilots, a radar system forming a covariance estimate from fewer snapshots than it has array elements, a genomicist fitting a regression with more genes than patients — all operate in the proportional asymptotic regime, where $N$ and $M$ are both large but their ratio $\gamma := N/M$ is a $\Theta(1)$ constant.

The behaviour of estimators in this regime is qualitatively different. Eigenvalues of sample covariance matrices do not concentrate on the true eigenvalues. The MLE, if it exists, can have strictly larger risk than a biased alternative. Regularization, which looks like a statistical crutch from the classical viewpoint, becomes essential — and its optimal amount depends on $\gamma$.

Definition: Proportional Asymptotic Regime

Let $(\mathbf{A}_M)_{M\geq 1}$ be a sequence of measurement matrices with $\mathbf{A}_M\in\mathbb{R}^{M\times N_M}$. The proportional asymptotic regime is the joint limit

$$M\to\infty,\quad N_M\to\infty,\quad \frac{N_M}{M}\to\gamma\in(0,\infty).$$

A statistic $T_M$ is said to have a deterministic equivalent $T(\gamma)$ in this regime if $T_M\to T(\gamma)$ almost surely (or in probability) as $M\to\infty$. Results expressed in terms of deterministic equivalents are the natural high-dimensional analogue of classical large-sample limits.

The aspect ratio $\gamma$ plays the role that "number of samples" plays in classical asymptotics. Everything interesting in this chapter is a function of $\gamma$.

Definition: Gaussian Linear Observation Model

Throughout the chapter we work with the canonical model

$$\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{w},\qquad \mathbf{A}\in\mathbb{R}^{M\times N},\ \mathbf{w}\sim\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I}_M),$$

where $\mathbf{x}\in\mathbb{R}^N$ is the unknown parameter, $\mathbf{y}\in\mathbb{R}^M$ is the observation, and $\mathbf{w}$ is independent additive Gaussian noise. The entries of $\mathbf{A}$ are typically i.i.d. $\mathcal{N}(0,1)$ (so that $\mathbb{E}[\tfrac{1}{M}\mathbf{A}^{T}\mathbf{A}]=\mathbf{I}_N$), which is the standard normalisation for the Marchenko–Pastur regime.

Normalising the Gram matrix by $M$ (rather than $N$) is a convention choice. With this normalisation the largest eigenvalue of $\tfrac{1}{M}\mathbf{A}^{T}\mathbf{A}$ concentrates at $(1+\sqrt{\gamma})^2$ and its minimum eigenvalue at $(1-\sqrt{\gamma})^2$ when $\gamma<1$.
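A quick numerical sanity check of these edge predictions; the matrix size and seed below are illustrative assumptions, not values fixed by the text:

```python
# Sanity check: extreme eigenvalues of (1/M) A^T A versus the predicted
# Marchenko-Pastur edges (1 +/- sqrt(gamma))^2.  Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
M, N = 2000, 1000                         # aspect ratio gamma = 0.5
gamma = N / M

A = rng.standard_normal((M, N))           # i.i.d. N(0, 1) entries
eigs = np.linalg.eigvalsh(A.T @ A / M)    # spectrum of the normalised Gram matrix

print(f"empirical [min, max] = [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"predicted [min, max] = [{(1 - np.sqrt(gamma))**2:.3f}, "
      f"{(1 + np.sqrt(gamma))**2:.3f}]")
```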

Theorem: Breakdown of MLE in the Proportional Regime

Consider the Gaussian linear observation model above with $\mathbf{A}$ having i.i.d. $\mathcal{N}(0,1)$ entries and $\gamma<1$. The ordinary least-squares (OLS) estimator, which coincides with the MLE under Gaussian noise, is $\hat{\mathbf{x}}_{\text{OLS}}=(\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{y}$. In the proportional asymptotic regime its squared error satisfies

$$\|\hat{\mathbf{x}}_{\text{OLS}}-\mathbf{x}\|^2\;\xrightarrow{\text{a.s.}}\;\frac{\gamma}{1-\gamma}\,\sigma^2.$$

In particular, the risk blows up as $\gamma\uparrow 1$, and OLS is not defined for $\gamma\geq 1$.

Classical CRLB analysis would predict a per-coordinate variance of $\sigma^2/(M-N)$, which is $\sigma^2/(M(1-\gamma))$. Summing over the $N$ coordinates gives exactly $\gamma\sigma^2/(1-\gamma)$. The surprise is not the formula — it is that the CRLB itself blows up as $\gamma\to 1$. The MLE is optimal and disastrous.
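A minimal Monte Carlo check of this limit, assuming the i.i.d. $\mathcal{N}(0,1)$ entries of the theorem; the problem sizes and trial count are illustrative choices only:

```python
# Monte Carlo check of the OLS risk limit gamma * sigma^2 / (1 - gamma).
# Problem sizes and trial count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma2, trials = 400, 200, 0.1, 200   # gamma = 0.5
gamma = N / M

errs = []
for _ in range(trials):
    A = rng.standard_normal((M, N))                      # i.i.d. N(0, 1) design
    x = rng.standard_normal(N)                           # arbitrary ground truth
    y = A @ x + np.sqrt(sigma2) * rng.standard_normal(M)
    x_hat = np.linalg.lstsq(A, y, rcond=None)[0]         # OLS solution
    errs.append(np.sum((x_hat - x) ** 2))

print(f"Monte Carlo risk        : {np.mean(errs):.4f}")
print(f"gamma*sigma^2/(1-gamma) : {gamma * sigma2 / (1 - gamma):.4f}")
```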

Key Takeaway

In the proportional regime the MLE does not fail because the estimator is wrong — it fails because the problem itself becomes ill-conditioned as $\gamma\to 1$. The cure is not a cleverer estimator but a shift of perspective: one must give up unbiasedness.

Marchenko–Pastur Eigenvalue Density

Plot the limiting eigenvalue density of $\tfrac{1}{M}\mathbf{A}^{T}\mathbf{A}$ as a function of the aspect ratio $\gamma=N/M$, with the empirical histogram from a finite draw overlaid. Watch the support shift and the left edge approach zero as $\gamma\to 1$.

Parameters: aspect ratio $\gamma = 0.5$; number of rows (empirical sample size) $M = 400$.
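The demo can be sketched in a few lines, assuming the default parameters above ($\gamma = 0.5$, $M = 400$) and the standard form of the Marchenko–Pastur density on its bulk $[\lambda_-,\lambda_+]$ with $\lambda_\pm=(1\pm\sqrt{\gamma})^2$, valid for $\gamma\leq 1$:

```python
# Sketch of the demo: histogram of the spectrum of (1/M) A^T A overlaid on
# the Marchenko-Pastur density (gamma <= 1 case).  Defaults mirror the widget.
import numpy as np
import matplotlib.pyplot as plt

gamma, M = 0.5, 400
N = int(gamma * M)

rng = np.random.default_rng(2)
A = rng.standard_normal((M, N))
eigs = np.linalg.eigvalsh(A.T @ A / M)

lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
lam = np.linspace(lo, hi, 400)
mp_density = np.sqrt((hi - lam) * (lam - lo)) / (2 * np.pi * gamma * lam)

plt.hist(eigs, bins=40, density=True, alpha=0.5, label="empirical spectrum")
plt.plot(lam, mp_density, label="Marchenko-Pastur density")
plt.xlabel("eigenvalue")
plt.legend()
plt.show()
```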

OLS Risk as a Function of $\gamma$

Compare the theoretical OLS risk $\gamma/(1-\gamma)\cdot\sigma^2$ against Monte Carlo simulation for varying aspect ratio. The blow-up at $\gamma=1$ is the proportional-regime analogue of the classical identifiability boundary.

Parameters: noise variance $\sigma^2 = 0.1$; number of rows $M = 200$.
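A rough sketch of the comparison the demo performs, using the defaults above ($\sigma^2 = 0.1$, $M = 200$); the grid of aspect ratios and the trial count are arbitrary choices:

```python
# Sketch of the demo: Monte Carlo OLS risk versus aspect ratio gamma,
# compared with gamma * sigma^2 / (1 - gamma).  Defaults mirror the widget.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
M, sigma2, trials = 200, 0.1, 50
gammas = np.arange(0.1, 0.91, 0.1)

mc_risk = []
for gamma in gammas:
    N = int(gamma * M)
    errs = []
    for _ in range(trials):
        A = rng.standard_normal((M, N))
        x = rng.standard_normal(N)
        y = A @ x + np.sqrt(sigma2) * rng.standard_normal(M)
        x_hat = np.linalg.lstsq(A, y, rcond=None)[0]
        errs.append(np.sum((x_hat - x) ** 2))
    mc_risk.append(np.mean(errs))

plt.plot(gammas, mc_risk, "o", label="Monte Carlo")
plt.plot(gammas, gammas * sigma2 / (1 - gammas), label=r"$\gamma\sigma^2/(1-\gamma)$")
plt.xlabel(r"$\gamma = N/M$")
plt.ylabel("OLS risk")
plt.legend()
plt.show()
```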

Example: Channel Estimation with Too Few Pilots

A massive-MIMO base station with $N_t=128$ antennas estimates its downlink channel from $M=64$ orthogonal pilot symbols using OLS. The per-antenna noise variance is $\sigma^2=0.01$. What per-entry MSE does the OLS estimate attain, and what would classical CRLB thinking predict?
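Before computing anything, note that $\gamma = N_t/M = 128/64 = 2 > 1$, so by the theorem above OLS is not even defined: $\mathbf{A}^{T}\mathbf{A}$ is a $128\times 128$ matrix of rank at most $64$. A short numerical confirmation (the Gaussian pilot matrix below is an illustrative stand-in for the actual pilot design):

```python
# The example's punchline: with M = 64 pilots and N_t = 128 antennas,
# gamma = 2 > 1, so A^T A (128 x 128) has rank at most 64 and OLS is undefined.
import numpy as np

rng = np.random.default_rng(4)
M, N = 64, 128

A = rng.standard_normal((M, N))
gram = A.T @ A                                        # rank <= min(M, N) = 64

print("rank(A^T A) =", np.linalg.matrix_rank(gram))   # prints 64
print("invertible? ", np.linalg.matrix_rank(gram) == N)  # prints False
```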

⚠️ Engineering Note

Sample Covariance is Biased High and Low

A practical consequence of the Marchenko–Pastur law: a sample covariance matrix $\hat{\boldsymbol{\Sigma}}=\tfrac{1}{M}\sum_{i=1}^M\mathbf{x}_i\mathbf{x}_i^T$ computed from $M$ samples of $\mathcal{N}(\mathbf{0},\mathbf{I}_N)$ does not have eigenvalues concentrated at $1$ — they spread over $[(1-\sqrt\gamma)^2,(1+\sqrt\gamma)^2]$. Plugging the sample covariance into any whitening or beamforming routine produces systematic errors that do not go away as $M\to\infty$ if $\gamma$ is held fixed.

Practical Constraints
  • For massive-MIMO covariance estimation, calibration campaigns must either collect $M\gg N$ snapshots or use shrinkage (see Ledoit and Wolf, 2004).

  • Eigenvalue clipping / linear-shrinkage estimators ($\hat{\boldsymbol{\Sigma}}_{\text{shrink}}=\alpha\hat{\boldsymbol{\Sigma}}+(1-\alpha)\mathbf{I}$) are standard in portfolio optimisation and array processing; a minimal sketch follows this list.
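A minimal sketch of the linear-shrinkage estimator from the second bullet; the fixed weight $\alpha$ is a user-chosen assumption here, not the data-driven optimum of Ledoit and Wolf (2004):

```python
# Linear shrinkage toward the identity: Sigma_shrink = alpha*S + (1-alpha)*I.
# alpha is fixed by hand here; Ledoit-Wolf choose it from the data.
import numpy as np

def shrink_covariance(X, alpha):
    """X: (M, N) zero-mean data matrix; returns alpha*S + (1-alpha)*I."""
    M, N = X.shape
    S = X.T @ X / M                     # sample covariance
    return alpha * S + (1.0 - alpha) * np.eye(N)

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 100))     # gamma = 0.5, true covariance = I

# Shrinking pulls the Marchenko-Pastur spread [(1-sqrt(g))^2, (1+sqrt(g))^2]
# back toward the true eigenvalue 1.
for alpha in (1.0, 0.5):
    eigs = np.linalg.eigvalsh(shrink_covariance(X, alpha))
    print(f"alpha = {alpha}: eigenvalues in [{eigs.min():.2f}, {eigs.max():.2f}]")
```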

Historical Note: Marchenko and Pastur (1967)


Vladimir Marchenko and Leonid Pastur, working in Kharkov, derived their eponymous law in 1967 as a curiosity about the spectra of large random matrices. It lay outside mainstream statistics for three decades and was rediscovered by the statistics and wireless communications communities in the late 1990s, when massive antenna arrays and genome-scale regression made the proportional regime unavoidable. The law now underpins the analysis of every high-dimensional estimator we use.

Common Mistake: Plugging $N$ into the Classical CRLB

Mistake:

A common reflex is to compute the CRLB for a model with $N$ unknowns and declare the resulting number a lower bound on the MSE, regardless of $M$.

Correction:

The classical CRLB is a lower bound on the variance of unbiased estimators. When $N/M$ is not negligible the bound itself is enormous, and when $N>M$ no unbiased estimator of $\mathbf{x}$ exists at all, so the bound is vacuous. Either use the Bayesian CRLB (if a prior is available) or switch to the minimax framework of Section 22.4.

Quick Check

For $\gamma=0.25$, the Marchenko–Pastur distribution of $\tfrac{1}{M}\mathbf{A}^{T}\mathbf{A}$ is supported on which interval?

$[0.25,\,2.25]$

$[0.25,\,1.75]$

$[0,\,1]$

$[1-\gamma,\,1+\gamma]=[0.75,\,1.25]$

Proportional Asymptotic Regime

The joint limit in which both the dimension $N$ and the sample size $M$ tend to infinity with the ratio $\gamma=N/M$ held fixed. This is the natural scaling for modern statistical problems in which the parameter dimension grows with the data.

Related: Marchenko and Pastur (1967), Random Matrix Theory