Computation–Estimation Tradeoffs
A Puzzle That Frames a Decade of Research
Throughout this book, when we derived a rate — the Cramér–Rao bound in Chapter 17, the minimax rate for sparse regression in Chapter 19, the denoising threshold for compressed sensing in Chapter 21 — we answered the question: how much data is needed for consistent estimation? This question has a crisp information-theoretic answer, and we treated it as the end of the story.
But there is a second question, hiding in plain sight: can we compute the estimator in polynomial time? For several celebrated high-dimensional problems — sparse principal components, stochastic block model clustering, tensor PCA — the honest answer is: we do not know. What is worse, there is growing evidence of a statistical–computational gap: a regime where the minimum sample size for information-theoretic identifiability is strictly smaller than the minimum sample size at which any polynomial-time estimator succeeds.
The point is that classical estimation theory in the tradition of Kay and Poor tells you what is statistically possible; the planted clique conjecture tells you what is plausibly achievable in polynomial time. The reader who designs wireless algorithms must know both.
Definition: Statistical Threshold
Statistical Threshold
For a parametric family indexed by a signal strength $\beta$ and a sample budget $n$, the statistical threshold $\beta^*_{\mathrm{stat}}(n)$ is the smallest signal strength at which some estimator (with no computational restriction) can recover the parameter with vanishing error as $n \to \infty$. Formally:

$$\beta^*_{\mathrm{stat}}(n) = \inf\Big\{\beta > 0 : \exists\,\hat\theta_n \text{ with } \sup_{\theta \in \Theta(\beta)} \mathbb{E}_\theta\big[\ell(\hat\theta_n, \theta)\big] \to 0\Big\},$$

where the estimator $\hat\theta_n$ ranges over all measurable functions of the data.
The threshold is typically identified via Fano's inequality (lower bound) and maximum-likelihood analysis (matching upper bound).
Definition: Computational Threshold
Computational Threshold
The computational threshold $\beta^*_{\mathrm{comp}}(n)$ is the smallest signal strength at which some polynomial-time estimator succeeds:

$$\beta^*_{\mathrm{comp}}(n) = \inf\Big\{\beta > 0 : \exists\,\hat\theta_n \text{ computable in polynomial time with } \sup_{\theta \in \Theta(\beta)} \mathbb{E}_\theta\big[\ell(\hat\theta_n, \theta)\big] \to 0\Big\}.$$
By construction $\beta^*_{\mathrm{comp}}(n) \ge \beta^*_{\mathrm{stat}}(n)$. When the inequality is strict in the asymptotic rate exponent, we say the problem exhibits a statistical–computational gap.
The computational threshold is necessarily conditional — we do not have unconditional lower bounds on polynomial-time algorithms. Evidence comes from reductions to conjecturally hard problems (planted clique), lower bounds in restricted computational models (sum-of-squares hierarchy, statistical query model), and the absence of matching algorithms after sustained effort.
Statistical–Computational Gap
A regime where the information-theoretic sample complexity for an estimation task is strictly smaller than the sample complexity required by any known polynomial-time algorithm. The existence of the gap is conjectural and typically tied to the planted clique hypothesis.
Related: Planted Clique, Sum Of Squares
Definition: Planted Clique Problem
Planted Clique Problem
Let $G(n, 1/2)$ be the Erdős–Rényi random graph on $n$ vertices with edge probability $1/2$. In the planted clique problem, one selects $k$ vertices uniformly at random and adds all edges among them (turning them into a clique). The task is to recover the planted $k$-clique from the resulting graph.
Information-theoretic detectability requires only $k \ge 2\log_2 n$: the largest clique occurring by chance in $G(n,1/2)$ has size about $2\log_2 n$, so any planted clique larger than that is statistically identifiable. In contrast, the best known polynomial-time algorithms require $k \gtrsim \sqrt{n}$ — an exponential gap in the clique size.
The planted clique conjecture asserts that no polynomial-time algorithm solves the problem when $k = o(\sqrt{n})$. This conjecture is the central hardness hypothesis in average-case complexity for high-dimensional statistics.
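As a concrete illustration of the easy regime, the following sketch (hypothetical parameters; `numpy` assumed) plants a clique well above the $\sqrt{n \log n}$ degree threshold and recovers most of it by simply ranking degrees:

```python
import numpy as np

def planted_clique(n, k, rng):
    """Sample G(n, 1/2) and plant a k-clique on the first k vertices."""
    A = rng.random((n, n)) < 0.5
    A = np.triu(A, 1)
    A = A | A.T                                  # symmetric adjacency, no self-loops
    A[np.ix_(range(k), range(k))] = True         # plant the clique
    np.fill_diagonal(A, False)
    return A

def degree_recovery(A, k):
    """Kucera-style heuristic: return the k highest-degree vertices."""
    degrees = A.sum(axis=1)
    return set(np.argsort(degrees)[-k:])

rng = np.random.default_rng(0)
n, k = 2000, 250                                 # k well above sqrt(n log n) ~ 123
A = planted_clique(n, k, rng)
recovered = degree_recovery(A, k)
overlap = len(recovered & set(range(k))) / k     # fraction of the clique recovered
```

Below $\sqrt{n \log n}$ the degree fluctuations of ordinary vertices swamp the clique's degree boost, and this heuristic (like every known polynomial-time method below $\sqrt{n}$) fails.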
Planted Clique
A random graph with a densely connected subset of vertices hidden inside it. The conjectured hardness of finding the clique when $k \ll \sqrt{n}$ is the standard assumption used to prove computational lower bounds for sparse PCA, submatrix detection, and community detection.
Historical Note: From Karp to Jerrum: The Birth of Average-Case Hardness
1976–present. Karp (1976) asked whether large cliques in random graphs can be found efficiently — a concrete setting where worst-case hardness (finding the maximum clique is NP-hard) might not transfer to random instances. Jerrum (1992) and Kučera (1995) independently formalized the planted variant; Kučera's degree-based algorithm succeeds down to $k \gtrsim \sqrt{n \log n}$, and spectral methods later reached $k \gtrsim \sqrt{n}$. Three decades later nobody has done substantially better, and the planted clique conjecture has become the bedrock on which most statistical–computational gap proofs rest. It is a humbling example: a problem that looks computationally easy but has stubbornly resisted all polynomial-time attacks.
Theorem: Statistical–Computational Gap for Sparse PCA
Consider the spiked covariance model $\Sigma = I_p + \beta\, vv^\top$ with $\|v\|_2 = 1$ and $\|v\|_0 = k$. Given $n$ i.i.d. samples $X_1, \dots, X_n \sim \mathcal{N}(0, \Sigma)$:
- (Statistical) If $n \gtrsim k \log p / \beta^2$, the constrained MLE recovers $v$ with vanishing error.
- (Computational — conditional) Assuming the planted clique conjecture, no polynomial-time algorithm recovers $v$ unless $n \gtrsim k^2 / \beta^2$, which for $k \gg \log p$ is strictly worse.
Information-theoretically you need $n \asymp k \log p / \beta^2$ samples. Computationally you need $n \asymp k^2 / \beta^2$. The ratio is $k / \log p$ — polynomial in the sparsity. This gap says: you have the data, but you cannot use it efficiently.
For the statistical bound, use a Le Cam two-point argument combined with the KL divergence between two Gaussian spikes whose supports differ by a single coordinate swap.
For the computational bound, reduce an instance of planted clique with clique size $k$ in $G(N, 1/2)$ to a sparse PCA instance with dimension $p \asymp N$ and sparsity $k$, carefully designing the covariance to inherit the clique structure.
The gap between the two rates becomes visible only when $k$ grows faster than $\log p$ — i.e., in the genuinely high-dimensional regime.
Statistical upper bound
The MLE over $k$-sparse unit vectors achieves sin-angle error $\lesssim \sqrt{k \log p / (n \beta^2)}$ by standard empirical process theory: each fixed support contributes a $k$-dimensional sub-Gaussian supremum of order $\sqrt{k/n}$, and the union bound over the $\binom{p}{k} \le p^k$ supports supplies the extra $\log p$ factor.
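The combinatorial nature of this estimator can be made explicit. The sketch below (hypothetical toy parameters) runs the exhaustive search over all $\binom{p}{k}$ supports, scoring each by the top eigenvalue of the restricted empirical covariance — feasible only because $p$ is tiny:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
p, k, n, beta = 12, 2, 2000, 1.0
v = np.zeros(p); v[:k] = 1 / np.sqrt(k)          # planted k-sparse spike
cov = np.eye(p) + beta * np.outer(v, v)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
S = X.T @ X / n                                   # empirical covariance (zero mean)

# Exhaustive MLE: maximize the top eigenvalue of S restricted to a k-support.
best_val, best_support = -np.inf, None
for support in combinations(range(p), k):
    idx = list(support)
    lam = np.linalg.eigvalsh(S[np.ix_(idx, idx)])[-1]
    if lam > best_val:
        best_val, best_support = lam, support
# best_support should coincide with the planted support {0, 1}
```

With $p = 12$ the loop visits $\binom{12}{2} = 66$ supports; at $p = 1000$, $k = 20$ the same search is astronomically infeasible, which is exactly the point of the computational threshold.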
Statistical lower bound (Le Cam)
Construct two hypotheses whose supports differ in a pair of coordinates (one coordinate swapped in, one swapped out). The KL divergence between the $n$-sample distributions is $\asymp n\beta^2/k$. By Le Cam's method, testing is impossible when this stays $O(1)$, giving $n \gtrsim k/\beta^2$; a Fano argument over a packing of supports upgrades this to $n \gtrsim k \log(p/k)/\beta^2$, matching the upper bound.
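The claimed $\asymp \beta^2/k$ scaling of the single-swap KL divergence can be checked directly. This sketch (hypothetical dimensions) evaluates the closed-form Gaussian KL for spiked covariances and verifies that $k \cdot \mathrm{KL}$ stays within a constant factor as $k$ grows:

```python
import numpy as np

def kl_gauss(S1, S2):
    """KL( N(0,S1) || N(0,S2) ) in nats, via the standard closed form."""
    p = S1.shape[0]
    S2inv = np.linalg.inv(S2)
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    return 0.5 * (np.trace(S2inv @ S1) - p + ld2 - ld1)

def spike(p, support, beta):
    """Spiked covariance I + beta * v v^T with v uniform on the given support."""
    v = np.zeros(p)
    v[list(support)] = 1 / np.sqrt(len(support))
    return np.eye(p) + beta * np.outer(v, v)

p, beta = 64, 0.5
scaled_kls = []
for k in (4, 8, 16, 32):
    S1 = spike(p, range(k), beta)                      # support {0, ..., k-1}
    S2 = spike(p, list(range(k - 1)) + [k], beta)      # swap one coordinate
    scaled_kls.append(k * kl_gauss(S1, S2))
# k * KL is roughly constant in k, confirming KL ~ beta^2 / k
```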
Computational reduction
Given a planted-clique instance with clique size $k$ on $N$ vertices, construct $n \asymp N$ samples whose empirical covariance encodes the adjacency structure. A sparse-PCA solver that succeeded with $n \ll k^2/\beta^2$ samples would recover cliques of size $k \ll \sqrt{N}$, contradicting the planted clique hypothesis. The reduction preserves polynomial running time.
Example: Community Detection in the Stochastic Block Model
In the symmetric two-community SBM, $n$ nodes are split into two equal-sized groups. Edges appear independently, with probability $p$ within a group and $q < p$ across groups. Characterize the regimes for (i) information-theoretic detection, (ii) polynomial-time detection, and (iii) exact recovery.
Detection threshold (Decelle–Krzakala–Moore–Zdeborová 2011)
In the sparse regime $p = a/n$, $q = b/n$, detection (distinguishing the SBM from an Erdős–Rényi null with the same average degree) is information-theoretically possible if and only if $(a-b)^2 > 2(a+b)$. Remarkably, this same condition is achievable by a polynomial-time spectral method (the non-backtracking walk), so there is no gap in this regime.
Exact recovery threshold (Abbe–Bandeira–Hall 2016)
With $p = a \log n / n$ and $q = b \log n / n$, exact recovery is possible iff $\sqrt{a} - \sqrt{b} > \sqrt{2}$. Semidefinite relaxation achieves this threshold exactly — again no gap. The SBM is a minor miracle: the algorithmic frontier tracks the information-theoretic frontier.
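A minimal spectral-detection sketch, with hypothetical parameters in a dense regime far above the threshold (plain adjacency eigenvectors stand in for the non-backtracking operator needed near the sparse threshold):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
labels = np.array([1] * (n // 2) + [-1] * (n // 2))   # planted communities
p_in, p_out = 0.10, 0.05                               # within / across edge probs

# Sample a symmetric adjacency matrix with no self-loops.
P = np.where(np.outer(labels, labels) > 0, p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Spectral detection: the eigenvector of the second-largest eigenvalue
# of A correlates with the community labels when the signal eigenvalue
# n(p_in - p_out)/2 exceeds the noise bulk ~ 2*sqrt(n * p_bar).
vals, vecs = np.linalg.eigh(A)                         # ascending eigenvalues
guess = np.sign(vecs[:, -2])
overlap = abs(guess @ labels) / n                      # agreement up to label flip
```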
Sparse mixture version has a gap
For the sparse SBM with more than two communities and unbalanced sizes, or for the Gaussian mixture clustering problem with sparse means, statistical–computational gaps reappear. The reader should consult Banks–Moore–Neeman (2016) for the precise conditions.
Where Gaps Appear, and Where They Don't
| Problem | Statistical rate | Poly-time rate | Gap? |
|---|---|---|---|
| Sparse mean estimation ($k$-sparse in $\mathbb{R}^p$) | $n \asymp k \log(p/k)$ | Same (thresholding) | No |
| Sparse linear regression | $n \asymp k \log(p/k)$ | Same (Lasso) | No |
| Sparse PCA (spiked covariance) | $n \asymp k \log p / \beta^2$ | $n \asymp k^2 / \beta^2$ | Yes (planted clique) |
| Tensor PCA (order 3, dimension $d$) | $\lambda \gtrsim \sqrt{d}$ | $\lambda \gtrsim d^{3/4}$ | Yes (conjectured) |
| Two-community SBM (detection) | $(a-b)^2 > 2(a+b)$ | Same (non-backtracking) | No |
| Planted dense subgraph | $k \gtrsim \log n$ | $k \gtrsim \sqrt{n}$ (conjectured) | Yes |
| Robust PCA (sparse + low-rank) | Corruption fraction below a constant | Convex relaxation (near-matching) | Problem-dependent |
Statistical vs Computational Rate in Sparse PCA

[Interactive figure: two curves showing the minimum signal strength $\beta$ required as a function of sparsity $k$ — the information-theoretic rate $\beta \asymp \sqrt{k \log p / n}$ and the conjectured polynomial-time rate $\beta \asymp \sqrt{k^2 / n}$. The gap opens once $k \gg \log p$.]
The Three Regimes of the Planted Clique

[Figure: $k \lesssim 2\log_2 n$ — the clique is statistically invisible; $2\log_2 n \lesssim k \lesssim \sqrt{n}$ — detectable but conjecturally hard; $k \gtrsim \sqrt{n}$ — recoverable in polynomial time.]
Lasso: The Beautiful Special Case
Sparse linear regression with random Gaussian designs is gap-free: the $\ell_1$-penalized estimator (Lasso) matches the minimax rate up to constants, and it runs in polynomial time. This is why Lasso has become the workhorse for channel estimation in compressed-sensing receivers: theory and practice align. Do not confuse this with sparse PCA — changing from a linear model to a quadratic (covariance) one flips the problem from gap-free to gap-full.
- Gap-free behavior relies on the design matrix satisfying the restricted eigenvalue (RE) condition.
- In channel estimation with structured pilots, the RE condition is not automatic and must be verified.
- The correct regularization strength $\lambda \asymp \sigma \sqrt{\log p / n}$ depends on the noise level $\sigma$.
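A self-contained Lasso sketch via proximal gradient descent (ISTA), using the theory-driven choice $\lambda \asymp \sigma\sqrt{\log p / n}$; all parameters are hypothetical:

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by iterative soft thresholding."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - grad / L                          # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

rng = np.random.default_rng(3)
n, p, k, sigma = 200, 500, 5, 0.5
X = rng.standard_normal((n, p))                   # Gaussian design (RE holds w.h.p.)
w_true = np.zeros(p); w_true[:k] = 2.0            # k-sparse ground truth
y = X @ w_true + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(np.log(p) / n)          # noise-calibrated regularization
w_hat = ista_lasso(X, y, lam)
rel_err = np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true)
```

The whole computation is a few hundred matrix-vector products — polynomial time — yet the error matches the minimax rate up to constants, which is exactly the gap-free behavior described above.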
Common Mistake: Reporting a Minimax Rate Is Not Enough
Mistake:
A paper claims a "minimax-optimal" sample complexity without specifying whether the matching estimator runs in polynomial time. A reader implementing the method discovers that the "optimal" estimator is a combinatorial search over supports, infeasible once the dimension reaches even moderate size.
Correction:
Always state the computational class alongside the statistical rate. If the matching estimator is exponential, label it so — and clearly identify whether a polynomial-time gap is known, conjectured, or open. The reader should ask: minimax over what class of estimators?
Why This Matters: Why This Matters for Massive MIMO Channel Estimation
High-dimensional channel estimation in massive MIMO is often reduced to a sparse or low-rank recovery problem — angular-domain channels are sparse, spatial correlation matrices are low-rank. When the receiver uses a quadratic functional of the data (covariance eigendecomposition, subspace estimation), we are in sparse-PCA territory: statistical–computational gaps may apply. When the receiver uses a linear functional (pilot-based LS / Lasso), we are safe. This distinction informs the choice between subspace-based and least-squares-based channel estimators in practice.
Quick Check
You have derived a minimax rate of $k \log(p/k)/n$ for recovering a $k$-sparse vector. A colleague proposes an algorithm that searches over all $\binom{p}{k}$ supports. Is this evidence that the problem has no statistical–computational gap?
Yes — an algorithm exists, so the gap is closed.
No — the algorithm is exponential in $k$, so it does not rule out a gap.
The question is ill-posed — minimax rates don't involve computation.
It depends on the constants in the rate.
Exactly. Gap questions ask whether polynomial-time algorithms match the statistical rate. An exhaustive search over supports costs $\binom{p}{k} = p^{\Theta(k)}$ operations, not polynomial.
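To see the scale concretely (hypothetical $p$ and $k$):

```python
from math import comb

# Number of candidate supports an exhaustive search must examine.
p, k = 1000, 20
n_supports = comb(p, k)
# comb(1000, 20) is on the order of 10^41 — far beyond any polynomial budget,
# while a Lasso-style iteration touches only ~ p entries per coordinate pass.
```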
Key Takeaway
A rate is not a complete answer to an estimation question. Always specify: (i) is this rate information-theoretic or polynomial-time? (ii) is the matching estimator convex, combinatorial, or heuristic? (iii) if there is a gap, is it proven, conjectural, or open?