Computation–Estimation Tradeoffs

A Puzzle That Frames a Decade of Research

Throughout this book, when we derived a rate — the Cramér–Rao bound in Chapter 17, the minimax rate for sparse regression in Chapter 19, the denoising threshold for compressed sensing in Chapter 21 — we answered the question: how much data is needed for consistent estimation? This question has a crisp information-theoretic answer, and we treated it as the end of the story.

But there is a second question, hiding in plain sight: can we compute the estimator in polynomial time? For several celebrated high-dimensional problems — sparse principal components, stochastic block model clustering, tensor PCA — the honest answer is: we do not know. What is worse, there is growing evidence of a statistical–computational gap: a regime where the minimum sample size for information-theoretic identifiability is strictly smaller than the minimum sample size at which any polynomial-time estimator succeeds.

Classical estimation theory in the style of Kay and Poor tells you what is statistically possible; the planted clique conjecture tells you what is likely not efficiently achievable. The reader who designs wireless algorithms must know both.

Definition:

Statistical Threshold

For a parametric family indexed by a signal strength $\lambda$ and a sample budget $n$, the statistical threshold $\lambda_{\text{stat}}(n)$ is the smallest signal strength at which some estimator (with no computational restriction) can recover the parameter with vanishing error as $n \to \infty$. Formally:

$$\lambda_{\text{stat}}(n) = \inf\bigl\{\lambda \geq 0 : \exists\, \hat{\theta}_n \text{ with } \Pr\{\hat{\theta}_n \neq \theta\} \to 0\bigr\}.$$

The threshold is typically identified via Fano's inequality (lower bound) and maximum-likelihood analysis (matching upper bound).
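For reference, the form of Fano's inequality typically invoked in such lower bounds, with $\theta$ uniform over a finite packing set $\Theta$ and $X$ the observed data:

$$\Pr\{\hat{\theta} \neq \theta\} \;\geq\; 1 - \frac{I(\theta; X) + \log 2}{\log |\Theta|}.$$

Recovery is impossible whenever the mutual information $I(\theta; X)$ is small relative to $\log|\Theta|$.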

Definition:

Computational Threshold

The computational threshold $\lambda_{\text{comp}}(n)$ is the smallest signal strength at which some polynomial-time estimator succeeds:

$$\lambda_{\text{comp}}(n) = \inf\bigl\{\lambda \geq 0 : \exists\, \hat{\theta}_n \in \mathrm{P} \text{ with } \Pr\{\hat{\theta}_n \neq \theta\} \to 0\bigr\}.$$

By construction $\lambda_{\text{comp}}(n) \geq \lambda_{\text{stat}}(n)$. When the inequality is strict in the asymptotic rate exponent, we say the problem exhibits a statistical–computational gap.

The computational threshold is necessarily conditional — we do not have unconditional lower bounds on polynomial-time algorithms. Evidence comes from reductions to conjecturally hard problems (planted clique), lower bounds in restricted computational models (sum-of-squares hierarchy, statistical query model), and the absence of matching algorithms after sustained effort.

Statistical–Computational Gap

A regime where the information-theoretic sample complexity for an estimation task is strictly smaller than the sample complexity required by any known polynomial-time algorithm. The existence of the gap is conjectural and typically tied to the planted clique hypothesis.

Related: Planted Clique, Sum-of-Squares

Definition:

Planted Clique Problem

Let $G(N, 1/2)$ be the Erdős–Rényi random graph on $N$ vertices with edge probability $1/2$. In the planted clique problem, one selects $k$ vertices and adds all $\binom{k}{2}$ edges among them (turning them into a clique). The task is to recover the planted clique from the resulting graph.

Information-theoretic detectability requires only $k \geq (2 + \varepsilon)\log_2 N$. In contrast, the best known polynomial-time algorithms require $k \geq C\sqrt{N}$: a gap between logarithmic and polynomial clique sizes.

The planted clique conjecture asserts that no polynomial-time algorithm solves the problem when $k = o(\sqrt{N})$. This conjecture is the central hardness hypothesis in average-case complexity for high-dimensional statistics.

Planted Clique

A random graph with a densely connected subset of vertices hidden inside it. The conjectured hardness of finding the clique when $k = o(\sqrt{N})$ is the standard assumption used to prove computational lower bounds for sparse PCA, submatrix detection, and community detection.

Historical Note: From Karp to Jerrum: The Birth of Average-Case Hardness

1976–present

Karp (1976) asked whether large cliques in random graphs can be found efficiently, a concrete setting where worst-case hardness (finding the maximum clique is NP-hard) might not transfer to random instances. Jerrum (1992) and Kučera (1995) independently introduced the planted variant; Kučera observed that vertex degrees alone reveal the clique once $k = \Omega(\sqrt{N \log N})$, and Alon, Krivelevich, and Sudakov (1998) gave a spectral algorithm that succeeds down to $k = \Theta(\sqrt{N})$. Three decades later nobody has done substantially better, and the planted clique conjecture has become the bedrock on which most statistical–computational gap proofs rest. It is a humbling example: a problem that looks easy but has stubbornly resisted all polynomial-time attacks.
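To make the spectral regime concrete, here is a minimal simulation sketch, assuming NumPy; the sizes are illustrative, and the cleanup step of the full spectral algorithm is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def planted_clique(N, k):
    """G(N, 1/2) with a clique planted on a random k-subset of vertices."""
    A = np.triu(rng.random((N, N)) < 0.5, k=1).astype(float)
    clique = rng.choice(N, size=k, replace=False)
    A[np.ix_(clique, clique)] = 1.0          # plant the clique
    A = np.triu(A, k=1)                      # keep the upper triangle only
    return A + A.T, set(clique)

def spectral_candidate(A, k):
    """Top-k coordinates of the leading eigenvector of the centered adjacency."""
    N = A.shape[0]
    M = A - 0.5 * (np.ones((N, N)) - np.eye(N))   # subtract the G(N, 1/2) mean
    vecs = np.linalg.eigh(M)[1]
    v = vecs[:, -1]                               # eigenvector of the largest eigenvalue
    return set(np.argsort(np.abs(v))[-k:].tolist())

N = 2000
for k in (int(0.5 * np.sqrt(N)), int(3.0 * np.sqrt(N))):   # below / above the sqrt(N) scale
    A, clique = planted_clique(N, k)
    found = spectral_candidate(A, k)
    print(f"N={N}, k={k}: fraction of clique recovered = {len(found & clique) / k:.2f}")
```

Below the $\sqrt{N}$ scale the clique's eigenvalue is buried in the random-matrix bulk and the recovered fraction is near chance; above it, the leading eigenvector localizes on the clique.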

Theorem: Statistical–Computational Gap for Sparse PCA

Consider the spiked covariance model $\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d + \lambda \mathbf{v}\mathbf{v}^T)$ with $\|\mathbf{v}\|_0 \leq k$ and $\|\mathbf{v}\|_2 = 1$. Given $n$ samples:

  1. (Statistical) If $\lambda \geq C_1 \sqrt{k \log(d)/n}$, the constrained MLE recovers $\mathbf{v}$ with vanishing error.

  2. (Computational — conditional) Assuming the planted clique conjecture, no polynomial-time algorithm recovers $\mathbf{v}$ unless $\lambda \geq C_2 \sqrt{k^2/n}$, which for $k \gg \log d$ is strictly worse.

Information-theoretically you need $n \gtrsim k\log d / \lambda^2$ samples. Computationally you need $n \gtrsim k^2/\lambda^2$. The ratio is $k/\log d$ — polynomial in the sparsity. This gap says: you have the data, but you cannot use it efficiently.
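A minimal simulation sketch of the two scalings, assuming NumPy. Diagonal thresholding (keep the $k$ coordinates with the largest sample variances, then take the top eigenvector of the restricted sample covariance) is the classic polynomial-time baseline of Johnstone and Lu; it needs $\lambda$ on the order of $\sqrt{k^2\log d/n}$, the computational scaling up to a log factor. The constants below are chosen purely for a clean illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def spiked_samples(n, d, k, lam):
    """n samples from N(0, I_d + lam * v v^T) with a k-sparse unit spike v."""
    v = np.zeros(d)
    v[rng.choice(d, size=k, replace=False)] = 1.0 / np.sqrt(k)
    Z = rng.standard_normal((n, d))          # isotropic part
    g = rng.standard_normal(n)               # spike coefficients
    return Z + np.sqrt(lam) * g[:, None] * v[None, :], v

def diagonal_thresholding(X, k):
    """Restrict to the k highest-variance coordinates, then take the top
    eigenvector of the restricted sample covariance."""
    n, d = X.shape
    S = np.argsort((X ** 2).mean(axis=0))[-k:]   # candidate support
    C = X[:, S].T @ X[:, S] / n                  # k x k sample covariance
    w = np.linalg.eigh(C)[1][:, -1]
    v_hat = np.zeros(d)
    v_hat[S] = w
    return v_hat

n, d, k = 500, 1000, 30
scalings = {
    "statistical sqrt(k log d / n)":      np.sqrt(k * np.log(d) / n),
    "computational sqrt(k^2 log d / n)":  3 * np.sqrt(k**2 * np.log(d) / n),
}
for name, lam in scalings.items():
    X, v = spiked_samples(n, d, k, lam)
    overlap = abs(diagonal_thresholding(X, k) @ v)
    print(f"lam at {name}: overlap |<v_hat, v>| = {overlap:.2f}")
```

At the statistical scaling the per-coordinate variance bump $\lambda/k$ drowns in sampling noise and the overlap collapses; at the computational scaling the support, and hence the spike, is recovered.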

Example: Community Detection in the Stochastic Block Model

In the symmetric two-community SBM, $N$ nodes are split into two equal-sized groups. Edges within a group appear with probability $a/N$ and across groups with $b/N$. Characterize the regimes for (i) information-theoretic detection, (ii) polynomial-time detection, and (iii) exact recovery.
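For the record: detection, whether information-theoretic or polynomial-time, is possible exactly when $(a-b)^2 > 2(a+b)$ (the Kesten–Stigum threshold), so the two-community SBM is gap-free, as the table below records; exact recovery is impossible at constant average degree and requires degrees growing like $\log N$. Below is a minimal simulation sketch of the detection regime, assuming NumPy; it uses the plain adjacency spectrum, which is adequate at the moderate degrees chosen here (at very sparse degrees one would use the non-backtracking operator):

```python
import numpy as np

rng = np.random.default_rng(0)

def sbm_two_communities(N, a, b):
    """Symmetric two-community SBM: within-group edge probability a/N,
    across-group probability b/N."""
    labels = np.repeat([1, -1], N // 2)
    P = np.where(np.equal.outer(labels, labels), a / N, b / N)
    A = np.triu(rng.random((N, N)) < P, k=1)
    return (A | A.T).astype(float), labels

def spectral_overlap(A, labels):
    """Classify by the sign of the second-largest adjacency eigenvector."""
    vecs = np.linalg.eigh(A)[1]
    guess = np.sign(vecs[:, -2])
    agree = np.mean(guess == labels)
    return max(agree, 1.0 - agree)       # labels are defined only up to a global swap

N = 2000
for a, b in [(30, 15), (25, 24)]:        # above / below the Kesten-Stigum threshold
    A, labels = sbm_two_communities(N, a, b)
    ks = (a - b) ** 2 > 2 * (a + b)
    print(f"a={a}, b={b}: KS satisfied = {ks}, "
          f"spectral overlap = {spectral_overlap(A, labels):.2f}")
```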

Where Gaps Appear, and Where They Don't

| Problem | Statistical rate | Poly-time rate | Gap? |
| --- | --- | --- | --- |
| Sparse mean estimation ($k$-sparse in $d$) | $\sigma\sqrt{k\log(d/k)/n}$ | Same | No |
| Sparse linear regression (Lasso) | $\sigma\sqrt{k\log d/n}$ | Same (Lasso) | No |
| Sparse PCA (spiked covariance) | $\sqrt{k\log d/n}$ | $\sqrt{k^2/n}$ | Yes (planted clique) |
| Tensor PCA (order 3) | $\lambda \geq \sqrt{d/n}$ | $\lambda \geq d^{3/4}/\sqrt{n}$ | Yes (conjectured) |
| Two-community SBM (detection) | $(a-b)^2 > 2(a+b)$ | Same | No |
| Planted dense subgraph | $k \geq \log N$ | $k \geq \sqrt{N}$ | Yes |
| Robust PCA (sparse + low-rank) | Below some constant | Convex relaxation | Problem-dependent |

Statistical vs Computational Rate in Sparse PCA

The two curves show the minimum signal strength $\lambda$ required as a function of sparsity $k$: the information-theoretic rate $\lambda_{\text{stat}} = C_1\sqrt{k\log d/n}$ and the conjectured polynomial-time rate $\lambda_{\text{comp}} = C_2\sqrt{k^2/n}$. Adjust $d$, $n$, and the constants to see when the gap opens up.

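A static stand-in for the interactive figure, as a sketch assuming NumPy and Matplotlib; the crossover sparsity solves $C_1\sqrt{k\log d/n} = C_2\sqrt{k^2/n}$, giving $k^{\ast} = (C_1/C_2)^2 \log d$:

```python
import numpy as np
import matplotlib.pyplot as plt

d, n = 1000, 500          # illustrative defaults
C1, C2 = 1.0, 1.0
k = np.arange(1, 201)

lam_stat = C1 * np.sqrt(k * np.log(d) / n)   # information-theoretic rate
lam_comp = C2 * np.sqrt(k**2 / n)            # conjectured poly-time rate

plt.plot(k, lam_stat, label=r"$\lambda_{\mathrm{stat}} = C_1\sqrt{k\log d/n}$")
plt.plot(k, lam_comp, label=r"$\lambda_{\mathrm{comp}} = C_2\sqrt{k^2/n}$")
plt.axvline((C1 / C2) ** 2 * np.log(d), ls="--", c="gray",
            label=r"crossover $k^{\ast} = (C_1/C_2)^2 \log d$")
plt.xlabel("sparsity $k$")
plt.ylabel(r"required signal strength $\lambda$")
plt.legend()
plt.show()
```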

The Three Regimes of the Planted Clique

Visual sweep of clique size $k$ relative to $N$: (i) $k < \log N$: information-theoretically impossible; (ii) $\log N \leq k \leq \sqrt{N}$: possible but conjecturally hard; (iii) $k > \sqrt{N}$: spectral methods succeed.
Adjacency spectra in each of the three regimes. The clique becomes visible in the second eigenvalue only beyond $k \approx \sqrt{N}$.
⚠️Engineering Note

Lasso: The Beautiful Special Case

Sparse linear regression with random Gaussian designs is gap-free: the $\ell_1$-penalized estimator (Lasso) matches the minimax rate $\sigma\sqrt{k\log d/n}$ up to constants, and it runs in polynomial time. This is why Lasso has become the workhorse for channel estimation in compressed-sensing receivers: theory and practice align. Do not confuse this with sparse PCA — changing from a linear model to a quadratic (covariance) one flips the problem from gap-free to gap-full.

Practical Constraints
  • Gap-free behavior relies on the design matrix satisfying the restricted eigenvalue condition

  • In channel estimation with structured pilots, the RE condition is not automatic

  • The correct regularization strength $\lambda_{\text{Lasso}} \asymp \sigma\sqrt{\log d/n}$ depends on the noise level (a minimal sketch follows this list)
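A minimal sketch of the gap-free pipeline, assuming NumPy; sizes, constants, and the ISTA solver are illustrative choices, not the only ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem sizes, chosen only for illustration
n, d, k, sigma = 200, 1000, 10, 0.5

X = rng.standard_normal((n, d))              # random Gaussian design
beta = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
beta[support] = 1.0
y = X @ beta + sigma * rng.standard_normal(n)

# Theory-driven choice: lambda on the order of sigma * sqrt(log d / n)
lam = 2 * sigma * np.sqrt(np.log(d) / n)

def lasso_ista(X, y, lam, iters=3000):
    """Minimize (1/(2n))||y - Xb||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the smooth part
    b = np.zeros(d)
    for _ in range(iters):
        z = b - X.T @ (X @ b - y) / (n * L)  # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding prox
    return b

beta_hat = lasso_ista(X, y, lam)
recovered = np.flatnonzero(np.abs(beta_hat) > 1e-3)
print("true support:     ", np.sort(support))
print("recovered support:", recovered)
print("l2 error:", np.linalg.norm(beta_hat - beta).round(3))
```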

Common Mistake: Reporting a Minimax Rate Is Not Enough

Mistake:

A paper claims a "minimax-optimal" sample complexity $n^{\star}$ without specifying whether the matching estimator runs in polynomial time. A reader implementing the method discovers that the "optimal" estimator is a combinatorial search over $\binom{d}{k}$ supports, infeasible for $d > 30$.

Correction:

Always state the computational class alongside the statistical rate. If the matching estimator is exponential, label it so — and clearly identify whether a polynomial-time gap is known, conjectured, or open. The reader should ask: minimax over what class of estimators?
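A quick sanity check of the combinatorial blow-up, with hypothetical sizes:

```python
from math import comb

# Number of candidate supports an exhaustive search must visit
for d, k in [(30, 10), (100, 10), (1000, 10)]:
    print(f"d={d:5d}, k={k}: C(d,k) = {comb(d, k):.3e}")
```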

Why This Matters: Massive MIMO Channel Estimation

High-dimensional channel estimation in massive MIMO is often reduced to a sparse or low-rank recovery problem — angular-domain channels are sparse, spatial correlation matrices are low-rank. When the receiver uses a quadratic functional of the data (covariance eigendecomposition, subspace estimation), we are in sparse-PCA territory: statistical–computational gaps may apply. When the receiver uses a linear functional (pilot-based LS / Lasso), we are safe. This distinction informs the choice between subspace-based and least-squares-based channel estimators in practice.

Quick Check

You have derived a minimax rate of $n \geq C k \log d$ for recovering a $k$-sparse vector. A colleague proposes an algorithm that searches over all $\binom{d}{k}$ supports. Is this evidence that the problem has no statistical–computational gap?

  • Yes — an algorithm exists, so the gap is closed.

  • No — the algorithm is exponential in $k$, so it does not rule out a gap.

  • The question is ill-posed — minimax rates don't involve computation.

  • It depends on the constant $C$.

Key Takeaway

A rate is not a complete answer to an estimation question. Always specify: (i) is this rate information-theoretic or polynomial-time? (ii) is the matching estimator convex, combinatorial, or heuristic? (iii) if there is a gap, is it proven, conjectural, or open?