The Blessing and Curse of High Dimensions
Why Classical Asymptotics Are Not Enough
Classical estimation theory, which underpins the CRLB machinery of Chapter 4, was built around an asymptotic regime in which the dimension $p$ of the parameter is held fixed and the number of observations $n$ grows without bound. In that regime the MLE is consistent, asymptotically unbiased, and efficient; the sample covariance converges to the true covariance; linear regression is well conditioned.
That regime is no longer the one we live in. A base station with $p$ antennas estimating its channel from $n$ pilot symbols, a radar system forming a covariance estimate from fewer snapshots than it has array elements, a genomicist fitting a regression with more genes than patients — all operate in the proportional asymptotic regime, where $n$ and $p$ are both large but their ratio is an $O(1)$ constant.
The behaviour of estimators in this regime is qualitatively different. Eigenvalues of sample covariance matrices do not concentrate on the true eigenvalues. The MLE, if it exists, can have strictly larger risk than a biased alternative. Regularization, which looks like a statistical crutch from the classical viewpoint, becomes essential — and its optimal amount depends on the ratio $p/n$.
Definition: Proportional Asymptotic Regime
Let $X_n \in \mathbb{R}^{n \times p_n}$ be a sequence of measurement matrices with $p_n \to \infty$ as $n \to \infty$. The proportional asymptotic regime is the joint limit
$$n \to \infty, \qquad \frac{p_n}{n} \to \gamma \in (0, \infty).$$
A statistic $T_n$ is said to have a deterministic equivalent $\bar{T}_n$ in this regime if $T_n - \bar{T}_n \to 0$ almost surely (or in probability) as $n \to \infty$. Results expressed in terms of deterministic equivalents are the natural high-dimensional analogue of classical large-sample limits.
The aspect ratio $\gamma = p/n$ plays the role that is reserved for the sample size in classical asymptotics. Everything interesting in this chapter is a function of $\gamma$.
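To see a deterministic equivalent emerge numerically, here is a minimal NumPy sketch (the statistic and the value of $\gamma$ are illustrative choices): the random trace statistic $T_n = \frac{1}{p}\operatorname{tr}\big((X^\top X)^{-1}\big)$ settles on the deterministic value $1/(1-\gamma)$, a limit derived in the theorem below.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.5  # illustrative aspect ratio p/n

for n in (100, 400, 1600):
    p = int(gamma * n)
    # Measurement matrix with i.i.d. N(0, 1/n) entries (the model defined below)
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    T_n = np.trace(np.linalg.inv(X.T @ X)) / p  # random statistic T_n
    print(f"n={n:5d}  T_n={T_n:.4f}  deterministic equivalent={1 / (1 - gamma):.4f}")
```

As $n$ grows the random fluctuations shrink and $T_n$ hugs $1/(1-\gamma) = 2$.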
Definition: Gaussian Linear Observation Model
Throughout the chapter we work with the canonical model
$$y = X\theta + w,$$
where $\theta \in \mathbb{R}^p$ is the unknown parameter, $y \in \mathbb{R}^n$ is the observation, and $w \sim \mathcal{N}(0, \sigma^2 I_n)$ is independent additive Gaussian noise. The entries of $X \in \mathbb{R}^{n \times p}$ are typically i.i.d. $\mathcal{N}(0, 1/n)$ (so that each column has expected squared norm $1$), which is the standard scaling for the Marchenko–Pastur regime.
The $1/n$ scaling (rather than $1/p$) on the columns is a convention choice. With this scaling the operator norm of $X$ concentrates at $1 + \sqrt{\gamma}$ and its smallest singular value at $1 - \sqrt{\gamma}$ when $\gamma < 1$.
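A quick numerical check of these edge locations (matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 500  # gamma = p/n = 0.25
gamma = p / n
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))  # i.i.d. N(0, 1/n) entries

s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
print(f"largest singular value : {s[0]:.4f}   theory 1+sqrt(gamma) = {1 + np.sqrt(gamma):.4f}")
print(f"smallest singular value: {s[-1]:.4f}   theory 1-sqrt(gamma) = {1 - np.sqrt(gamma):.4f}")
```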
Theorem: Breakdown of MLE in the Proportional Regime
Consider the Gaussian Linear Observation Model above with $X$ having i.i.d. $\mathcal{N}(0, 1/n)$ entries and $\gamma = p/n < 1$. The ordinary least squares estimator (which is the MLE under Gaussian noise) is $\hat{\theta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y$. In the proportional asymptotic regime its normalised MSE satisfies
$$\frac{1}{p}\,\mathbb{E}\big[\|\hat{\theta}_{\mathrm{OLS}} - \theta\|^2 \,\big|\, X\big] \;\longrightarrow\; \frac{\sigma^2}{1-\gamma} \quad \text{almost surely.}$$
In particular, the risk blows up as $\gamma \to 1^{-}$, and OLS is not defined for $\gamma > 1$ because $X^\top X$ is singular.
Classical CRLB analysis would predict a per-coordinate variance $\sigma^2\,[(X^\top X)^{-1}]_{jj}$, which OLS attains with equality. Dividing the total variance $\sigma^2 \operatorname{tr}\big((X^\top X)^{-1}\big)$ by $p$ and sending $n, p \to \infty$ gives exactly $\sigma^2/(1-\gamma)$. The surprise is not the formula — it is that the CRLB itself blows up as $\gamma \to 1$. The MLE is optimal and disastrous.
Use the rotational invariance of $X$'s distribution to diagonalise $X^\top X$.
The Marchenko–Pastur law characterises the limiting eigenvalue distribution of $X^\top X$: its density on $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$ is $f_\gamma(\lambda) = \sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}\,/\,(2\pi\gamma\lambda)$ with $\lambda_\pm = (1 \pm \sqrt{\gamma})^2$.
The trace of $(X^\top X)^{-1}$ is $\sum_{j=1}^{p} \lambda_j^{-1}$, where $\lambda_j$ are the eigenvalues of $X^\top X$; take the limit of this empirical average using the MP density.
Express the risk in terms of the eigenvalues
Since $\hat{\theta}_{\mathrm{OLS}} - \theta = (X^\top X)^{-1} X^\top w$ and $w \sim \mathcal{N}(0, \sigma^2 I_n)$, we obtain
$$\mathbb{E}\big[\|\hat{\theta}_{\mathrm{OLS}} - \theta\|^2 \,\big|\, X\big] = \sigma^2 \operatorname{tr}\big((X^\top X)^{-1}\big).$$
Let $\lambda_1, \dots, \lambda_p$ denote the eigenvalues of $X^\top X$. Then $\frac{1}{p}\,\mathbb{E}\big[\|\hat{\theta}_{\mathrm{OLS}} - \theta\|^2 \,\big|\, X\big] = \frac{\sigma^2}{p} \sum_{j=1}^{p} \lambda_j^{-1}$.
Apply the Marchenko–Pastur law
In the proportional regime the empirical spectral distribution of $X^\top X$ converges weakly (a.s.) to the Marchenko–Pastur distribution with density
$$f_\gamma(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi\gamma\lambda}, \qquad \lambda_\pm = (1 \pm \sqrt{\gamma})^2, \qquad \lambda \in [\lambda_-, \lambda_+].$$
Hence $\frac{1}{p}\sum_{j=1}^{p} \lambda_j^{-1} \to \int \lambda^{-1} f_\gamma(\lambda)\,d\lambda = \frac{1}{1-\gamma}$ (this is a standard MP integral — see Bai and Silverstein, 2010).
Combine
Therefore
$$\frac{1}{p}\,\mathbb{E}\big[\|\hat{\theta}_{\mathrm{OLS}} - \theta\|^2 \,\big|\, X\big] \;=\; \frac{\sigma^2}{p} \sum_{j=1}^{p} \frac{1}{\lambda_j} \;\longrightarrow\; \frac{\sigma^2}{1-\gamma} \quad \text{almost surely.}$$
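As a sanity check on the MP integral used in Step 2, the following sketch integrates the density numerically on a fine grid (the value of $\gamma$ and the grid size are arbitrary choices) and compares against $1/(1-\gamma)$:

```python
import numpy as np

def mp_density(lam, gamma):
    """Marchenko-Pastur density on [(1-sqrt(g))^2, (1+sqrt(g))^2] for gamma < 1."""
    lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    return np.sqrt(np.clip((hi - lam) * (lam - lo), 0, None)) / (2 * np.pi * gamma * lam)

gamma = 0.6
lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
lam = np.linspace(lo, hi, 200001)[1:-1]  # drop endpoints, where the density vanishes
dx = lam[1] - lam[0]

mass = np.sum(mp_density(lam, gamma)) * dx              # should be close to 1
inv_moment = np.sum(mp_density(lam, gamma) / lam) * dx  # should be close to 1/(1-gamma)
print(f"total mass  = {mass:.4f}   (theory: 1)")
print(f"E[1/lambda] = {inv_moment:.4f}   (theory: {1 / (1 - gamma):.4f})")
```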
Key Takeaway
In the proportional regime the MLE does not fail because the estimator is wrong — it fails because the problem itself becomes ill-conditioned as $\gamma \to 1$. The cure is not a cleverer estimator but a shift of perspective: one must give up unbiasedness.
Marchenko–Pastur Eigenvalue Density
Plot the limiting eigenvalue density of $X^\top X$ as a function of the aspect ratio $\gamma$, with the empirical histogram from a finite draw overlaid. Watch the support shift and the left edge approach zero as $\gamma \to 1$.
Parameters: aspect ratio $\gamma$; number of rows $n$ (empirical sample size).
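A static, non-interactive rendition of this figure takes only a few lines of NumPy/Matplotlib (the parameter values below stand in for the sliders):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n, gamma = 2000, 0.5  # illustrative slider values
p = int(gamma * n)
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
eigs = np.linalg.eigvalsh(X.T @ X)  # empirical eigenvalues

lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
lam = np.linspace(lo + 1e-6, hi - 1e-6, 1000)
density = np.sqrt((hi - lam) * (lam - lo)) / (2 * np.pi * gamma * lam)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical")
plt.plot(lam, density, "r", label="Marchenko-Pastur")
plt.xlabel("eigenvalue of $X^T X$")
plt.legend()
plt.show()
```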
Normalised OLS Risk as a Function of $\gamma$
Compare the theoretical OLS risk $\sigma^2/(1-\gamma)$ against Monte Carlo simulation for varying aspect ratio. The blow-up at $\gamma = 1$ is the proportional-regime analogue of the classical identifiability boundary.
Parameters: noise variance $\sigma^2$.
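A minimal Monte Carlo version of this comparison (a single draw per $\gamma$ suffices because the risk concentrates; the values of $n$ and $\sigma^2$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 500, 0.1  # illustrative sample size and noise variance

for gamma in (0.2, 0.5, 0.8, 0.95):
    p = int(gamma * n)
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    theta = rng.normal(size=p)
    y = X @ theta + rng.normal(0.0, np.sqrt(sigma2), size=n)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate
    risk = np.sum((theta_hat - theta) ** 2) / p       # normalised squared error
    print(f"gamma={gamma:.2f}  Monte Carlo={risk:.4f}  theory={sigma2 / (1 - gamma):.4f}")
```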
Example: Channel Estimation with Too Few Pilots
A massive-MIMO base station with $p$ antennas estimates its downlink channel from $n < p$ orthogonal pilot symbols using OLS; the per-antenna noise variance is $\sigma^2$. What per-entry MSE does the OLS estimate attain, and what would classical CRLB thinking predict?
Identify the aspect ratio
Here $\gamma = p/n > 1$. Since $n < p$, the matrix $X^\top X$ is rank-deficient and the OLS estimator is not defined — the pseudoinverse would return the minimum-norm solution, but its bias is unbounded.
What classical CRLB says
The classical (low-dimensional) CRLB reflex would predict a per-coordinate variance of roughly $\sigma^2$, treating $X^\top X$ as well conditioned — an encouragingly small number. But it is simply wrong: with more parameters than observations, no unbiased estimator exists.
What needs to happen
One must add prior information — sparsity, low-rank structure, or a ridge penalty. Section 22.2 quantifies how much prior information buys back, and Section 22.3 shows that even with no prior at all, shrinkage dominates the MLE whenever $p \geq 3$.
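The rank deficiency and the failure of the minimum-norm solution are easy to exhibit numerically (the sizes $n < p$ and the noise level below are hypothetical, chosen only to illustrate the effect):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma2 = 128, 256, 0.1  # hypothetical: more antennas (p) than pilots (n)

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
theta = rng.normal(size=p)  # true channel coefficients
y = X @ theta + rng.normal(0.0, np.sqrt(sigma2), size=n)

# OLS is undefined (X^T X is singular); pinv returns the minimum-norm solution.
theta_mn = np.linalg.pinv(X) @ y
print(f"rank(X^T X) = {np.linalg.matrix_rank(X.T @ X)} < p = {p}")
print(f"per-entry MSE of min-norm solution: {np.mean((theta_mn - theta) ** 2):.3f}")
print(f"noise floor sigma^2 per entry     : {sigma2:.3f}")
```

The component of $\theta$ in the null space of $X$ is simply never observed, so the per-entry error sits far above the noise floor no matter how small $\sigma^2$ is.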
Sample Covariance is Biased High and Low
A practical consequence of the Marchenko–Pastur law: a sample covariance matrix computed from $n$ samples of a $p$-dimensional vector with identity covariance does not have eigenvalues concentrated at $1$ — they spread over $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$. Plugging the sample covariance into any whitening or beamforming routine produces systematic errors that do not go away as $n \to \infty$ if $\gamma = p/n$ is held fixed.
- For massive-MIMO covariance estimation, calibration campaigns must either collect $n \gg p$ snapshots or use shrinkage (see Ledoit and Wolf, 2004).
- Eigenvalue clipping and linear-shrinkage estimators of the form $\hat{\Sigma} = (1-\rho)\,S + \rho\,\hat{\mu} I_p$ are standard in portfolio optimisation and array processing; a minimal sketch follows this list.
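Here is that sketch of the eigenvalue spread and of linear shrinkage toward the identity (the shrinkage weight $\rho$ is set by hand here rather than by the Ledoit–Wolf formula):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 200, 400              # gamma = p/n = 0.5
Z = rng.normal(size=(n, p))  # n samples of N(0, I_p): true covariance is I
S = Z.T @ Z / n              # sample covariance

eigs = np.linalg.eigvalsh(S)  # ascending order
print(f"sample eigenvalues span [{eigs[0]:.3f}, {eigs[-1]:.3f}] around the true value 1")

# Linear shrinkage toward (a multiple of) the identity
rho = 0.5  # hand-picked shrinkage weight
S_shrunk = (1 - rho) * S + rho * np.mean(eigs) * np.eye(p)
eigs_shrunk = np.linalg.eigvalsh(S_shrunk)
print(f"shrunk eigenvalues span [{eigs_shrunk[0]:.3f}, {eigs_shrunk[-1]:.3f}]")
```

Shrinkage pulls both spectral edges toward the grand mean, trading a little bias for a much better-conditioned estimate.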
Historical Note: Marchenko and Pastur (1967)
Vladimir Marchenko and Leonid Pastur, working in Kharkov, derived their eponymous law in 1967 as a curiosity about the spectra of large random matrices. It lay outside mainstream statistics for three decades and was rediscovered by the statistics and wireless communications communities in the late 1990s, when massive antenna arrays and genome-scale regression made the proportional regime unavoidable. The law now underpins the analysis of every high-dimensional estimator we use.
Common Mistake: Plugging into the Classical CRLB
Mistake:
A common reflex is to compute the CRLB for a model with $p$ unknowns and declare the resulting number a lower bound on the MSE, regardless of the ratio $p/n$.
Correction:
The classical CRLB is a lower bound on the variance of unbiased estimators. When $p/n$ is not negligible the bound itself blows up, and when $p > n$ there is no unbiased estimator of $\theta$ at all, so the bound is vacuous. Either use the Bayesian CRLB (if a prior is available) or switch to the minimax framework of Section 22.4.
Quick Check
For a given aspect ratio $\gamma < 1$, the Marchenko–Pastur distribution of $X^\top X$ is supported on which interval?
The support is $[(1-\sqrt{\gamma})^2,\,(1+\sqrt{\gamma})^2]$. With $\gamma = 1/4$, for example, the edges are $(1-\tfrac{1}{2})^2 = \tfrac{1}{4}$ and $(1+\tfrac{1}{2})^2 = \tfrac{9}{4}$.
Proportional Asymptotic Regime
The joint limit in which both the dimension $p$ and the sample size $n$ tend to infinity with the ratio $\gamma = p/n$ held fixed. This is the natural scaling for modern statistical problems in which the parameter dimension grows with the data.