Prerequisites & Notation

Before You Begin

This chapter sits at the intersection of estimation theory, random matrix theory, and empirical Bayes. The classical CRLB-centred picture that dominates Chapters 4–7 is not wrong; it is simply incomplete. In modern applications (massive MIMO, compressed sensing, covariance estimation, high-dimensional inference) the dimension $N$ of the parameter vector grows in lockstep with the number of observations $M$. Classical consistency arguments, which hold $N$ fixed and let $M \to \infty$, collapse in this regime, and the behaviour of the MLE can swing from "efficient" to "spectacularly wrong". The reader should be comfortable with the following before continuing.
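
To see the collapse concretely, consider ordinary least squares in the Gaussian linear model. A minimal simulation sketch (the model, the noise level $\sigma = 1$, and $M = 400$ are illustrative assumptions, not values from the chapter): the total squared error does not vanish as $M$ grows with $\gamma = N/M$ fixed, and it diverges as $\gamma \to 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, M = 1.0, 400

# Total squared error E||x_hat - x||^2 of least squares with i.i.d.
# Gaussian A: in the proportional regime it converges to
# sigma^2 * gamma / (1 - gamma) instead of going to zero,
# and it blows up as gamma -> 1.
for gamma in (0.1, 0.5, 0.9):
    N = int(gamma * M)
    errs = []
    for _ in range(100):
        A = rng.standard_normal((M, N))
        x = rng.standard_normal(N)
        y = A @ x + sigma * rng.standard_normal(M)
        x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
        errs.append(np.sum((x_hat - x) ** 2))
    print(f"gamma={gamma:.1f}  empirical={np.mean(errs):6.3f}  "
          f"theory={sigma**2 * gamma / (1 - gamma):6.3f}")
```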

  • Maximum likelihood estimation and the Cramér–Rao lower bound (Review ch04)

    Self-check: Can you state the CRLB for a vector parameter, write down the Fisher information matrix, and explain when the MLE achieves the bound? (A worked Gaussian-linear-model example follows this list.)

  • Linear MMSE estimation and the Wiener filter (Review ch05)

    Self-check: Can you derive the LMMSE estimator for the Gaussian linear model $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{w}$ and compute its MSE? (The closed forms are restated after this list.)

  • Compressed sensing and LASSO at a conceptual level (Review ch17)

    Self-check: Can you state the $\ell_1$-minimisation programme and recognise why an $\ell_1$ penalty promotes sparsity? (The proximal-gradient sketch after this list solves exactly this programme.)

  • Random matrix theory essentials

    Self-check: Do you know what the Marchenko–Pastur law says about the eigenvalues of $\frac{1}{M}\mathbf{A}^H\mathbf{A}$ when $\mathbf{A}$ has i.i.d. entries? (A quick simulation follows this list.)

  • Convex optimisation (unconstrained and penalised)

    Self-check: Can you recognise a convex problem, write the KKT conditions for a quadratic-plus-$\ell_1$ objective, and describe proximal-gradient iteration? (See the ISTA sketch after this list.)

  • Bayesian decision theory (Review ch06)

    Self-check: Can you compute a Bayes risk, define admissibility, and explain the relationship between a minimax estimator and a least-favourable prior? (A one-line Gaussian example closes this list.)
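
For the first self-check, the Gaussian linear model from Chapter 4 makes everything concrete. A minimal worked example, assuming $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ and $\mathbf{A}^T\mathbf{A}$ invertible:

```latex
% Fisher information and CRLB for y = A x + w,  w ~ N(0, sigma^2 I):
\[
  \mathbf{I}(\mathbf{x}) = \frac{1}{\sigma^{2}}\,\mathbf{A}^{T}\mathbf{A},
  \qquad
  \operatorname{Cov}(\hat{\mathbf{x}}) \succeq \mathbf{I}(\mathbf{x})^{-1}
    = \sigma^{2}\,(\mathbf{A}^{T}\mathbf{A})^{-1} .
\]
```

The MLE $\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$ is unbiased with covariance exactly $\sigma^2(\mathbf{A}^T\mathbf{A})^{-1}$, so it attains the bound in this model; the chapter examines when that classical guarantee stops being the right notion of optimality.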
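
For the LMMSE self-check, the closed forms are worth having at hand. They are stated here under the assumptions $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_x)$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_w)$, independent; the covariance symbols $\mathbf{C}_x, \mathbf{C}_w$ are notation chosen here for illustration.

```latex
% LMMSE estimator and its MSE for y = A x + w:
\[
  \hat{\mathbf{x}}_{\text{LMMSE}}
    = \mathbf{C}_{x}\mathbf{A}^{T}
      \bigl(\mathbf{A}\mathbf{C}_{x}\mathbf{A}^{T}+\mathbf{C}_{w}\bigr)^{-1}\mathbf{y},
  \qquad
  \mathrm{MSE}
    = \operatorname{tr}\bigl[\bigl(\mathbf{C}_{x}^{-1}
      + \mathbf{A}^{T}\mathbf{C}_{w}^{-1}\mathbf{A}\bigr)^{-1}\bigr] .
\]
```

The second expression is the trace of the posterior covariance; reconciling it with the first form is a standard matrix-inversion-lemma exercise.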
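
The LASSO and proximal-gradient self-checks meet in a single algorithm: ISTA (iterative soft-thresholding) is proximal-gradient descent applied to the quadratic-plus-$\ell_1$ objective. A minimal sketch, assuming the step size $1/\|\mathbf{A}\|_2^2$ and a fixed iteration count, both illustrative choices:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||y - A x||^2 + lam*||x||_1 by proximal gradient."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)       # gradient of 0.5*||y - A x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

A fixed point of the update satisfies $\mathbf{A}^T(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}) = \lambda\,\mathbf{g}$ with $\mathbf{g} \in \partial\|\hat{\mathbf{x}}\|_1$, which is exactly the KKT condition the self-check asks for.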
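
For the random-matrix self-check, a direct simulation is the fastest reality check. A minimal sketch (real Gaussian entries, so $\mathbf{A}^H = \mathbf{A}^T$, and $\gamma = 1/2$; all illustrative choices): the eigenvalues of $\frac{1}{M}\mathbf{A}^T\mathbf{A}$ spread over the Marchenko–Pastur support $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$ instead of concentrating at 1.

```python
import numpy as np

M, N = 2000, 1000                      # aspect ratio gamma = N / M = 0.5
gamma = N / M
A = np.random.default_rng(1).standard_normal((M, N))
eigs = np.linalg.eigvalsh(A.T @ A / M)

# Marchenko-Pastur support edges for unit-variance i.i.d. entries:
lo, hi = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(f"empirical eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"MP support:                 [{lo:.3f}, {hi:.3f}]")
```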
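
For the decision-theory self-check, the scalar Gaussian location problem is the classic worked example (standard material, not specific to this chapter): observe $y \sim \mathcal{N}(\theta, \sigma^2)$ with prior $\theta \sim \mathcal{N}(0, \tau^2)$.

```latex
% Bayes estimator and Bayes risk under squared error:
\[
  \hat\theta_{\text{Bayes}} = \frac{\tau^{2}}{\tau^{2}+\sigma^{2}}\, y,
  \qquad
  r(\pi) = \frac{\tau^{2}\sigma^{2}}{\tau^{2}+\sigma^{2}}
  \;\xrightarrow[\;\tau^{2}\to\infty\;]{}\; \sigma^{2} .
\]
```

The Bayes risk increases monotonically to $\sigma^2$, which is the minimax risk for an unrestricted mean; the flat limit plays the role of the least-favourable prior, and the limiting estimator $\hat\theta = y$ is minimax.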

Notation for This Chapter

Symbols used throughout Chapter 22. The ratio $\gamma = N/M$ is the single most important parameter; the entire chapter can be read as a study of how estimation behaves as this ratio departs from zero.

Symbol | Meaning | Introduced
------ | ------- | ----------
$N$ | Ambient dimension of the parameter vector $\mathbf{x}\in\mathbb{R}^N$ | s01
$M$ | Number of observations (rows of $\mathbf{A}$ / samples) | s01
$\gamma$ | Aspect ratio $\gamma = N/M$; the proportional-asymptotics regime fixes $\gamma\in(0,\infty)$ | s01
$\mathbf{A}$ | Design / sensing / measurement matrix in $\mathbb{R}^{M\times N}$ | s01
$\lambda$ | Regularisation parameter (ridge / LASSO penalty) | s02
$\hat{\mathbf{x}}_{\text{ridge}}(\lambda)$ | Ridge estimator $(\mathbf{A}^T\mathbf{A}+\lambda\mathbf{I})^{-1}\mathbf{A}^T\mathbf{y}$ | s02
$\hat{\mathbf{x}}_{\text{LASSO}}(\lambda)$ | LASSO estimator $\arg\min\,\tfrac12\|\mathbf{y}-\mathbf{A}\mathbf{x}\|^2+\lambda\|\mathbf{x}\|_1$ | s02
$\hat{\mathbf{x}}_{\text{JS}}$ | James–Stein estimator | s03
$R(\hat\theta,\theta)$ | Frequentist risk $\mathbb{E}_\theta[\|\hat\theta-\theta\|^2]$ | s03
$r_*^{\text{mm}}$ | Minimax risk $\inf_{\hat\theta}\sup_\theta R(\hat\theta,\theta)$ over a parameter class | s04
$s$ | Sparsity level: number of non-zero components of $\mathbf{x}$ | s04
$\pi^*$ | Least-favourable prior attaining the minimax risk | s04
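
The James–Stein row is the only estimator in the table without a formula; the standard form for $\mathbf{y}\sim\mathcal{N}(\mathbf{x},\sigma^2\mathbf{I}_N)$ with $N\ge 3$ is $\hat{\mathbf{x}}_{\text{JS}} = \bigl(1-(N-2)\sigma^2/\|\mathbf{y}\|^2\bigr)\mathbf{y}$. A minimal simulation sketch, showing the dominance over the MLE (the dimension, noise level, and random true vector below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, trials = 50, 1.0, 20000
x = rng.standard_normal(N)             # arbitrary fixed true parameter

mse_mle = mse_js = 0.0
for _ in range(trials):
    y = x + sigma * rng.standard_normal(N)
    shrink = 1.0 - (N - 2) * sigma**2 / np.sum(y**2)
    x_js = shrink * y                  # James-Stein: shrink y toward the origin
    mse_mle += np.sum((y - x) ** 2)
    mse_js += np.sum((x_js - x) ** 2)

print(f"risk of the MLE (y itself): {mse_mle / trials:.2f}")  # ~ N*sigma^2 = 50
print(f"risk of James-Stein:        {mse_js / trials:.2f}")   # strictly smaller
```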