Exercises
ex-ch25-01
Easy: Sparse PCA minimax rate. Consider the spiked covariance model with a $k$-sparse spike, ambient dimension $p$, and $n$ samples. Write down the information-theoretic rate $\lambda^*_{\mathrm{stat}}$ and the conjectured polynomial-time rate $\lambda^*_{\mathrm{comp}}$, and compute their ratio as a function of $k$.
The statistical rate scales like $\sqrt{k \log p / n}$.
The computational rate scales like $\sqrt{k^2 \log p / n}$.
The ratio is $\sqrt{k}$.
Write down the rates
Up to constants, $\lambda^*_{\mathrm{stat}} \asymp \sqrt{k \log p / n}$ and $\lambda^*_{\mathrm{comp}} \asymp \sqrt{k^2 \log p / n}$.
Compute the ratio
$\lambda^*_{\mathrm{comp}} / \lambda^*_{\mathrm{stat}} = \sqrt{k}$, independent of $p$ and $n$.
Interpretation
A polynomial-time algorithm needs roughly $\sqrt{k}$ times more signal strength than the information-theoretic minimum in this regime.
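A quick numeric sanity check of the two rates; the values of $k$, $p$, $n$ below are illustrative, not taken from the exercise.

```python
import math

# Illustrative problem sizes (assumed for this check, not from the exercise).
k, p, n = 10, 10_000, 2_000

rate_stat = math.sqrt(k * math.log(p) / n)       # information-theoretic rate
rate_comp = math.sqrt(k**2 * math.log(p) / n)    # conjectured poly-time rate
ratio = rate_comp / rate_stat                    # equals sqrt(k) identically

print(f"stat ~ {rate_stat:.3f}  comp ~ {rate_comp:.3f}  ratio = {ratio:.3f}")
```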
ex-ch25-02
Easy: Regret of the best fixed expert. In a prediction-with-experts game with $N$ experts and $T$ rounds, expert $i$ has cumulative loss $L_i(T)$, and the learner achieved cumulative loss $\hat L_T$. Compute the regret against the best expert.
Regret is learner loss minus best-expert loss.
The best expert is the one with smallest $L_i(T)$.
Find the best expert
$i^* = \arg\min_i L_i(T)$.
Regret
$R_T = \hat L_T - L_{i^*}(T) = \hat L_T - \min_i L_i(T)$.
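In code the computation is a one-liner; the loss values below are invented for illustration.

```python
# Hypothetical cumulative losses after T rounds for N = 4 experts, plus the
# learner's cumulative loss (all values invented for this example).
expert_losses = [37.0, 42.5, 31.2, 40.1]
learner_loss = 35.9

best_loss = min(expert_losses)      # cumulative loss of the best fixed expert
regret = learner_loss - best_loss
print(f"best expert loss = {best_loss}, regret = {regret:.1f}")
```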
ex-ch25-03
Easy: Consensus matrix check. For the ring on $n$ nodes with self-weight $1/2$ and neighbour weight $1/4$, verify that $W$ is symmetric, doubly stochastic, and has $\mathbf{1}$ as an eigenvector.
Write $W$ as a circulant matrix.
Sum across any row.
Build matrix
$W = \tfrac{1}{2} I + \tfrac{1}{4}(P + P^\top)$, where $P$ is the cyclic shift; equivalently, the circulant matrix with first row $(\tfrac{1}{2}, \tfrac{1}{4}, 0, \dots, 0, \tfrac{1}{4})$.
Verify properties
Symmetric by inspection. Each row sums to $\tfrac{1}{4} + \tfrac{1}{2} + \tfrac{1}{4} = 1$; by symmetry so does each column. Then $W\mathbf{1} = \mathbf{1}$ because every row-sum is $1$, so $\mathbf{1}$ is an eigenvector with eigenvalue $1$.
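The verification is mechanical; here is a minimal numpy check for a small ring, with self-weight $1/2$ and neighbour weight $1/4$ (the ring size is arbitrary).

```python
import numpy as np

n = 6  # ring size (any n >= 3 works)
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5                  # self-weight
    W[i, (i - 1) % n] = 0.25       # left neighbour
    W[i, (i + 1) % n] = 0.25       # right neighbour

assert np.allclose(W, W.T)                        # symmetric
assert np.allclose(W.sum(axis=1), 1.0)            # row sums: stochastic
assert np.allclose(W.sum(axis=0), 1.0)            # column sums: doubly stochastic
assert np.allclose(W @ np.ones(n), np.ones(n))    # 1 is an eigenvector, eigenvalue 1
print("all consensus-matrix properties hold")
```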
ex-ch25-04
Easy: UCB exploration bonus. The UCB1 index at time $t$ for arm $i$ that has been pulled $n_i$ times is $\hat\mu_i + \sqrt{2 \ln t / n_i}$. Given values of $\hat\mu_i$, $t$, and $n_i$, compute the index.
Plug into the formula.
Compute
Bonus $= \sqrt{2 \ln t / n_i}$. Index $= \hat\mu_i + \sqrt{2 \ln t / n_i}$; the bonus shrinks as the arm accumulates pulls and grows only logarithmically in $t$.
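With illustrative values for $\hat\mu_i$, $t$, and $n_i$ (chosen here for the example, not prescribed by the exercise):

```python
import math

# Illustrative values, chosen for this example.
mu_hat, t, n_i = 0.5, 100, 10

bonus = math.sqrt(2 * math.log(t) / n_i)   # exploration bonus
index = mu_hat + bonus                     # UCB1 index
print(f"bonus = {bonus:.3f}, index = {index:.3f}")
```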
ex-ch25-05
Easy: Differential privacy mechanism. Let $\hat\mu$ be the sample mean of $n$ iid $[0,1]$-valued observations. To release an $\varepsilon$-differentially private estimate, add Laplace noise with scale $b = \Delta/\varepsilon$, where $\Delta$ is the sensitivity. Compute $\Delta$ and $b$ as functions of $n$ and $\varepsilon$.
The sensitivity of the mean is $\Delta = 1/n$ (changing one sample moves the mean by at most $1/n$).
Sensitivity
$\Delta = 1/n$.
Noise scale
$b = \Delta/\varepsilon = 1/(n\varepsilon)$. The MSE added is $2b^2 = 2/(n\varepsilon)^2$, which is of lower order than the sampling variance $O(1/n)$ once $n \gtrsim 1/\varepsilon^2$.
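A concrete comparison of the noise scale against the sampling variance; the values of $n$ and $\varepsilon$ below are illustrative.

```python
# Illustrative parameters for the Laplace mechanism on a [0,1]-bounded mean.
n, eps = 1000, 1.0

sensitivity = 1.0 / n             # one sample moves the mean by at most 1/n
b = sensitivity / eps             # Laplace scale
added_mse = 2 * b**2              # variance of Laplace(b) noise
sampling_var = 1.0 / (4 * n)      # worst-case variance of the mean on [0,1]

print(f"scale b = {b:.1e}, added MSE = {added_mse:.1e}, "
      f"sampling variance <= {sampling_var:.1e}")
```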
ex-ch25-06
Medium: Multiplicative weights regret bound. Show that against any sequence of losses $\ell_t \in [0,1]^N$, the MWU algorithm with step size $\eta = \sqrt{\ln N / T}$ has regret at most $2\sqrt{T \ln N}$.
Track the potential $\Phi_t = \sum_{i=1}^{N} w_{t,i}$ and bound $\Phi_{T+1}$.
Use $e^{-x} \le 1 - x + x^2$ for $x \ge 0$.
Lower bound $\Phi_{T+1}$ by the weight of the best expert.
Potential method
Let $\Phi_t = \sum_i w_{t,i}$ with $w_{1,i} = 1$. The update gives $w_{t+1,i} = w_{t,i}\, e^{-\eta \ell_{t,i}}$.
Bound per-round change
Applying $e^{-x} \le 1 - x + x^2$ and then $1 + x \le e^x$, with $p_t = w_t / \Phi_t$ the played distribution: $\Phi_{t+1} \le \Phi_t \big(1 - \eta \langle p_t, \ell_t \rangle + \eta^2 \langle p_t, \ell_t^2 \rangle\big) \le \Phi_t \exp\!\big(-\eta \langle p_t, \ell_t \rangle + \eta^2\big)$.
Telescope and lower bound
Summing: $\Phi_{T+1} \le N \exp(-\eta \hat L_T + \eta^2 T)$ with $\hat L_T = \sum_t \langle p_t, \ell_t \rangle$. Also $\Phi_{T+1} \ge w_{T+1,i^*} = e^{-\eta L_{i^*}(T)}$. Combine, take logarithms, and set $\eta = \sqrt{\ln N / T}$ to get $\hat L_T - L_{i^*}(T) \le \frac{\ln N}{\eta} + \eta T = 2\sqrt{T \ln N}$.
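The bound holds deterministically for any loss sequence in $[0,1]$, so it can be checked empirically. A minimal simulation (the random losses are purely illustrative):

```python
import math
import random

def mwu_loss(losses, eta):
    """Cumulative expected loss of multiplicative weights on losses[t][i] in [0,1]."""
    n_experts = len(losses[0])
    w = [1.0] * n_experts
    total = 0.0
    for loss_t in losses:
        s = sum(w)
        p = [wi / s for wi in w]                        # played distribution
        total += sum(pi * li for pi, li in zip(p, loss_t))
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss_t)]
    return total

random.seed(0)
T, N = 500, 8
losses = [[random.random() for _ in range(N)] for _ in range(T)]

alg_loss = mwu_loss(losses, eta=math.sqrt(math.log(N) / T))
best_loss = min(sum(row[i] for row in losses) for i in range(N))
bound = 2 * math.sqrt(T * math.log(N))
print(f"regret = {alg_loss - best_loss:.2f}  <=  bound = {bound:.2f}")
```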
ex-ch25-07
Medium: Online-to-batch conversion. Suppose an online algorithm achieves regret $R_T = O(\sqrt{T})$ for convex losses. Show that averaging its iterates gives a statistical estimator with excess risk $O(1/\sqrt{T})$.
Use convexity of the population risk $F(w) = \mathbb{E}_z[\ell(w; z)]$.
Take expectation over iid data.
Regret inequality
$\sum_{t=1}^{T} f_t(w_t) - \min_w \sum_{t=1}^{T} f_t(w) \le R_T$, where $f_t(w) = \ell(w; z_t)$ is the loss on sample $z_t$.
Take expectations
For iid data, $w_t$ depends only on $z_1, \dots, z_{t-1}$, so $\mathbb{E}[f_t(w_t)] = \mathbb{E}[F(w_t)]$; and the fixed comparator $w^* = \arg\min_w F(w)$ dominates the empirical minimizer, so $\mathbb{E}\big[\min_w \sum_t f_t(w)\big] \le \sum_t \mathbb{E}[f_t(w^*)] = T\, F(w^*)$.
Jensen
By convexity, $F(\bar w_T) \le \frac{1}{T}\sum_{t=1}^{T} F(w_t)$ for $\bar w_T = \frac{1}{T}\sum_t w_t$. Taking expectations and dividing by $T$: $\mathbb{E}[F(\bar w_T)] - F(w^*) \le R_T / T = O(1/\sqrt{T})$.
ex-ch25-08
Medium: Spectral gap controls consensus. For a symmetric doubly stochastic $W$ with second-largest eigenvalue modulus $\lambda_2 < 1$, show that the averaging error satisfies $\|W^t x_0 - \bar x \mathbf{1}\|_2 \le \lambda_2^{\,t}\, \|x_0 - \bar x \mathbf{1}\|_2$, where $\bar x = \frac{1}{n}\mathbf{1}^\top x_0$.
Project onto the subspace orthogonal to $\mathbf{1}$.
Use the spectral decomposition of $W$.
Decompose initial state
Write $x_0 = \bar x \mathbf{1} + e_0$ where $e_0 \perp \mathbf{1}$.
Apply iteration
$W^t x_0 = \bar x \mathbf{1} + W^t e_0$ since $W\mathbf{1} = \mathbf{1}$.
Bound with spectral norm
$W$ is symmetric, so it preserves $\mathbf{1}^\perp$, and restricted to $\mathbf{1}^\perp$ its spectral radius is $\lambda_2$. Thus $\|W^t e_0\|_2 \le \lambda_2^{\,t} \|e_0\|_2$.
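A numerical check of the contraction on the lazy ring matrix from ex-ch25-03 (the size and number of steps are illustrative):

```python
import numpy as np

n = 8
I = np.eye(n)
W = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))

lam = np.sort(np.abs(np.linalg.eigvalsh(W)))
lam2 = lam[-2]                                  # second-largest |eigenvalue|

rng = np.random.default_rng(0)
x0 = rng.normal(size=n)
xbar = x0.mean()
err0 = np.linalg.norm(x0 - xbar)

x = x0.copy()
for t in range(1, 21):
    x = W @ x
    # geometric contraction toward the average, at rate lambda_2
    assert np.linalg.norm(x - xbar) <= lam2**t * err0 + 1e-12
print(f"lambda_2 = {lam2:.4f}; contraction bound verified for 20 steps")
```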
ex-ch25-09
Medium: UCB regret via confidence bounds. Sketch the argument that UCB1 achieves regret $O\big(\sum_{i: \Delta_i > 0} \ln T / \Delta_i\big)$ in stochastic bandits with (sub-)Gaussian rewards and gaps $\Delta_i = \mu_* - \mu_i$.
Bound the expected number of pulls of a suboptimal arm.
Use Hoeffding to show the optimal arm is not under-explored.
Show that if arm $i$ is pulled, either its empirical mean is too high, the optimal arm's is too low, or arm $i$ is still under-sampled.
Event decomposition
Arm $i$ is pulled at round $t$ only if $\hat\mu_i + \sqrt{2\ln t / n_i} \ge \hat\mu_* + \sqrt{2\ln t / n_*}$.
Bad events
This requires at least one of: (i) $\hat\mu_* \le \mu_* - \sqrt{2\ln t / n_*}$, (ii) $\hat\mu_i \ge \mu_i + \sqrt{2\ln t / n_i}$, or (iii) $n_i < 8\ln t / \Delta_i^2$.
Hoeffding and union bound
Each of (i), (ii) has probability at most $t^{-4}$ by Hoeffding; summed over $t$ this gives an $O(1)$ contribution. Event (iii) caps the pulls at $n_i(T) = O(\ln T / \Delta_i^2)$, so the regret contribution of arm $i$ is $\Delta_i \cdot O(\ln T / \Delta_i^2) = O(\ln T / \Delta_i)$.
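A short simulation makes the logarithmic pull count visible; the two-armed Gaussian instance and horizon below are illustrative.

```python
import math
import random

def ucb1_counts(means, T, seed=0):
    """Run UCB1 on unit-variance Gaussian arms; return pull counts per arm."""
    rng = random.Random(seed)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    for i in range(K):                        # initialize: pull each arm once
        counts[i], sums[i] = 1, rng.gauss(means[i], 1.0)
    for t in range(K + 1, T + 1):
        i = max(range(K), key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]))
        sums[i] += rng.gauss(means[i], 1.0)
        counts[i] += 1
    return counts

counts = ucb1_counts([0.0, 1.0], T=5000)      # gap Delta = 1
print(f"suboptimal arm pulled {counts[0]} times out of 5000")
```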
ex-ch25-10
Medium: Stochastic approximation. Consider Robbins-Monro updates $\theta_{t+1} = \theta_t + \gamma_t (X_t - \theta_t)$, where $X_t = \mu + \xi_t$ with $\xi_t$ iid, zero-mean, variance $\sigma^2$. With $\gamma_t = 1/t$, show that $\mathbb{E}[(\theta_{t+1} - \mu)^2] \to 0$.
Let $e_t = \theta_t - \mu$, write the recursion for $e_t$.
Unroll the recursion.
Error recursion
$e_{t+1} = (1 - \gamma_t)\, e_t + \gamma_t\, \xi_t$.
Unroll with $\gamma_t = 1/t$
$e_{t+1} = e_1 \prod_{s=1}^{t}\big(1 - \tfrac{1}{s}\big) + \sum_{s=1}^{t} \tfrac{1}{s}\,\xi_s \prod_{r=s+1}^{t}\big(1 - \tfrac{1}{r}\big)$. The product $\prod_{r=s+1}^{t}(1 - 1/r) = s/t$, and $\tfrac{1}{s}\cdot\tfrac{s}{t} = \tfrac{1}{t}$, so $e_{t+1} = \tfrac{1}{t}\sum_{s=1}^{t}\xi_s$ (the $e_1$ term vanishes because its first factor is $0$).
Variance
$\mathbb{E}[e_{t+1}^2] = \frac{1}{t^2}\sum_{s=1}^{t}\mathbb{E}[\xi_s^2] = \sigma^2/t \to 0$ by independence.
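With $\gamma_t = 1/t$ the recursion is exactly a running average, so the empirical MSE should track $\sigma^2/t$; a small Monte Carlo check (all parameters illustrative):

```python
import random

random.seed(1)
mu, sigma, t_max = 2.0, 1.0, 2000

def robbins_monro():
    theta = 0.0
    for t in range(1, t_max + 1):
        x = random.gauss(mu, sigma)
        theta += (1.0 / t) * (x - theta)   # theta_{t+1} = theta_t + gamma_t (X_t - theta_t)
    return theta

estimates = [robbins_monro() for _ in range(200)]
mse = sum((th - mu) ** 2 for th in estimates) / len(estimates)
print(f"empirical MSE = {mse:.2e}, predicted sigma^2/t = {sigma**2 / t_max:.2e}")
```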
ex-ch25-11
Medium: Gossip vs synchronous consensus. On a ring of $n$ nodes, synchronous averaging takes $\Theta(n^2 \log \epsilon^{-1})$ rounds, for total messages $\Theta(n^3 \log \epsilon^{-1})$. Pairwise randomized gossip converges in $\Theta(n^3 \log \epsilon^{-1})$ pair exchanges in expectation. Which uses fewer messages? How does the answer change on a complete graph?
Each synchronous round uses $\Theta(m)$ message transmissions ($m$ = number of edges).
On a ring $1 - \lambda_2 = \Theta(1/n^2)$; on the complete graph $1 - \lambda_2 = \Theta(1)$.
Ring messages
Synchronous: $\Theta(n^2 \log \epsilon^{-1})$ rounds $\times$ $\Theta(n)$ messages per round $= \Theta(n^3 \log \epsilon^{-1})$. Gossip: $\Theta(n^3 \log \epsilon^{-1})$ pair exchanges $\times$ $2$ messages $= \Theta(n^3 \log \epsilon^{-1})$. The totals match up to constants; gossip's advantage on the ring is that it needs no global synchronization.
Complete graph
Synchronous converges in $\Theta(\log \epsilon^{-1})$ rounds (the spectral gap is $\Theta(1)$), but each round costs $\Theta(n^2)$ messages, for $\Theta(n^2 \log \epsilon^{-1})$ in total. Gossip takes $\Theta(n \log \epsilon^{-1})$ pair exchanges, i.e. $\Theta(n \log \epsilon^{-1})$ messages, and wins by a factor of $n$.
Lesson
Gossip is never worse in message count and wins outright on dense graphs; synchronous averaging wins in wall-clock rounds when the spectral gap is large, because each round is globally useful.
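The message counts can be tabulated directly from the spectral gaps; constants are dropped, so this is order-of-magnitude only.

```python
import math

def total_messages(gap_per_step, msgs_per_step, eps=1e-3):
    """Steps to contract by eps scale as log(1/eps)/gap; multiply by cost per step."""
    return (math.log(1 / eps) / gap_per_step) * msgs_per_step

n = 100
ring_sync       = total_messages(1 / n**2, 2 * n)       # rounds x messages/round
ring_gossip     = total_messages(1 / n**3, 2)           # exchanges x 2 messages
complete_sync   = total_messages(1.0, n * (n - 1))
complete_gossip = total_messages(1 / n, 2)

print(f"ring:     sync ~ {ring_sync:.2e}   gossip ~ {ring_gossip:.2e}")
print(f"complete: sync ~ {complete_sync:.2e}   gossip ~ {complete_gossip:.2e}")
```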
ex-ch25-12
Medium: Paper critique. A submitted paper claims a new estimator achieves MSE scaling as $O(1/n)$ for estimating a Lipschitz density at a point. Explain why this claim is extraordinary and what checks you would perform as a reviewer.
What is the minimax rate for nonparametric density estimation at a point?
The claim should either explain how a known lower bound is evaded or restrict the assumptions.
Known lower bound
The minimax rate for pointwise estimation of a Lipschitz density is $n^{-2/3}$. A rate of $O(1/n)$ matches the parametric rate, which no estimator can attain uniformly over a nonparametric class, so the claim is suspicious.
Reviewer checks
(i) Is the claim in MSE, with squared bias and variance correctly accounted? (ii) Does the proof implicitly restrict to a parametric sub-family (smoother than Lipschitz)? (iii) Are constants allowed to depend on the unknown density (pointwise superefficiency)? (iv) Are boundary and tuning effects absorbed?
Conclusion
Either the function class is smaller than claimed, the estimator uses side information, or the analysis has a bug. Request clarified assumptions and a simulation against the Le Cam two-point lower bound.
ex-ch25-13
Medium: Metropolis-Hastings weights. Design symmetric gossip weights for a graph where each node only knows its own degree and its neighbours' degrees. Use the Metropolis rule $W_{ij} = 1/(1 + \max\{d_i, d_j\})$ for edges and verify double stochasticity.
The diagonal must be chosen to make each row sum to $1$.
Off-diagonal
Set $W_{ij} = 1/(1 + \max\{d_i, d_j\})$ for edges $\{i, j\}$, $W_{ij} = 0$ for non-adjacent $i \ne j$. Symmetric by construction.
Diagonal
$W_{ii} = 1 - \sum_{j \in N(i)} W_{ij}$. Since $W_{ij} \le 1/(1 + d_i)$ and there are $d_i$ neighbours, $\sum_{j \in N(i)} W_{ij} \le d_i/(1 + d_i) < 1$, so $W_{ii} > 0$.
Double stochasticity
Rows sum to $1$ by construction; symmetry gives the column sums.
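A check on a small assumed graph (the edge list is invented for illustration):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # illustrative graph
n = 5
deg = [0] * n
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))   # Metropolis rule
for i in range(n):
    W[i, i] = 1.0 - W[i].sum()     # diagonal fixes each row sum at 1

assert np.allclose(W, W.T)                 # symmetric
assert np.allclose(W.sum(axis=1), 1.0)     # rows
assert np.allclose(W.sum(axis=0), 1.0)     # columns: doubly stochastic
assert (np.diag(W) > 0).all()              # diagonal stays positive
print("Metropolis weights are symmetric and doubly stochastic")
```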
ex-ch25-14
Hard: Planted clique reduction. Sketch how a polynomial-time sparse-PCA algorithm achieving a rate below $\sqrt{k^2 \log p / n}$ would refute the planted clique hardness conjecture.
Encode a graph on $N$ vertices as a sample matrix.
A planted clique of size $\kappa$ induces a rank-one spike.
Detecting the spike lets you recover the clique.
Reduction construction
Given a graph with adjacency matrix $A$, form samples whose covariance is $I + \theta\, vv^\top$, where $v$ is the normalized indicator of the clique (sparsity $k = \kappa$). Under the null (no clique) there is no spike.
What the algorithm gives
If a poly-time sparse-PCA estimator achieves detection below the $\sqrt{k^2 \log p / n}$ threshold, applying it to the reduction detects planted cliques of size $\kappa = o(\sqrt{N})$, which contradicts the planted clique conjecture.
Takeaway
Beating the computational rate is therefore at least as hard as refuting a widely believed average-case complexity assumption.
ex-ch25-15
Hard: Follow-the-regularized-leader. For OCO with convex losses $f_t$ whose gradients are bounded by $G$, and a $1$-strongly convex regularizer $R$ with $\max_u R(u) \le R_{\max}$, show that FTRL, $x_{t+1} = \arg\min_x \big(\sum_{s \le t} \langle g_s, x \rangle + \tfrac{1}{\eta} R(x)\big)$ with $g_s = \nabla f_s(x_s)$, attains regret $O(G\sqrt{R_{\max} T})$ for a suitable $\eta$.
Use the FTL-BTL lemma: follow-the-leader on the regularized losses has telescoping regret.
Bound the stability term via strong convexity of $R$.
FTL-BTL lemma
For the iterates above, telescoping gives, for any comparator $u$: $\sum_t \langle g_t, x_t - u \rangle \le \frac{R(u) - R(x_1)}{\eta} + \sum_t \langle g_t, x_t - x_{t+1} \rangle$.
Stability from strong convexity
Strong convexity of $R$ implies $\|x_t - x_{t+1}\| \le \eta \|g_t\|$, so $\langle g_t, x_t - x_{t+1} \rangle \le \eta \|g_t\|^2 \le \eta G^2$; convexity of $f_t$ gives $f_t(x_t) - f_t(u) \le \langle g_t, x_t - u \rangle$.
Combine
Plugging in, $\mathrm{Regret}_T \le \frac{R_{\max}}{\eta} + \eta G^2 T$, which is minimized at $\eta = \sqrt{R_{\max}/(G^2 T)}$, giving $2G\sqrt{R_{\max} T}$. A factor-$\tfrac{1}{2}$ improvement comes from a tighter version of the stability step.
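For linear losses and $R(x) = \|x\|_2^2/2$, FTRL has the closed form $x_{t+1} = -\eta \sum_{s \le t} g_s$, and the $\frac{R(u)}{\eta} + \eta G^2 T$ bound can be checked numerically. The random loss sequence below is illustrative; comparators range over the unit ball, so $G = D = 1$.

```python
import math
import random

random.seed(0)
T, d = 1000, 3
grads = [[random.uniform(-1, 1) / math.sqrt(d) for _ in range(d)]
         for _ in range(T)]                     # each ||g_t|| <= 1, so G = 1

eta = 1.0 / math.sqrt(T)                        # tuned for G = D = 1
x = [0.0] * d                                   # x_1 = argmin R = 0
g_sum = [0.0] * d
alg_loss = 0.0
for g in grads:
    alg_loss += sum(xi * gi for xi, gi in zip(x, g))
    g_sum = [s + gi for s, gi in zip(g_sum, g)]
    x = [-eta * s for s in g_sum]               # FTRL closed form

# Best fixed comparator in the unit ball: u = -g_sum / ||g_sum||.
best_loss = -math.sqrt(sum(s * s for s in g_sum))
regret = alg_loss - best_loss
bound = 1.5 * math.sqrt(T)                      # R(u)/eta + eta G^2 T for these choices
print(f"regret = {regret:.2f} <= bound = {bound:.2f}")
```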
ex-ch25-16
Hard: Distributed Kalman filter consistency. Each of $n$ sensors observes $y_i(t) = H_i \theta + v_i(t)$ with $v_i(t)$ iid zero-mean noise. Show that the consensus+innovations update $x_i(t+1) = x_i(t) - \beta_t \sum_{j \in N(i)} \big(x_i(t) - x_j(t)\big) + \alpha_t H_i^\top \big(y_i(t) - H_i x_i(t)\big)$ converges to the centralized ML estimator under appropriate step sizes $\alpha_t$, $\beta_t$.
Split the analysis into the disagreement $x_i(t) - \bar x(t)$ and the average $\bar x(t) = \frac{1}{n}\sum_i x_i(t)$.
Require $\sum_t \alpha_t = \infty$, $\sum_t \alpha_t^2 < \infty$, and $\alpha_t / \beta_t \to 0$.
Average dynamics
Summing over $i$, the consensus term cancels by symmetry of the graph Laplacian: $\bar x(t+1) = \bar x(t) + \frac{\alpha_t}{n} \sum_i H_i^\top \big(y_i(t) - H_i x_i(t)\big)$. Up to the vanishing disagreement, this is a stochastic approximation driving $\bar x(t)$ toward $\theta$.
Disagreement vanishes
Consensus contracts the disagreement at rate $1 - \beta_t \lambda_2(L)$ per step, while the innovation term injects perturbations of order $\alpha_t$. With $\alpha_t / \beta_t \to 0$, the disagreement shrinks to $0$ in mean square.
Robbins-Monro limit
Standard stochastic approximation arguments (under $\sum_t \alpha_t = \infty$, $\sum_t \alpha_t^2 < \infty$) yield $x_i(t) \to \theta$ almost surely for every $i$.
ex-ch25-17
Hard: Cost of privacy. Under $\varepsilon$-differential privacy, the minimax risk for estimating a scalar mean over $n$ iid $[0,1]$-valued samples is $\Theta\big(\frac{1}{n} + \frac{1}{(n\varepsilon)^2}\big)$. Derive both regimes and identify the sample size at which privacy becomes free.
Laplace noise of scale $b = 1/(n\varepsilon)$ contributes MSE $2b^2$.
Without privacy the rate is $\Theta(1/n)$.
Non-private part
The empirical mean has variance $O(1/n)$.
Private part
Adding Laplace noise with scale $b = 1/(n\varepsilon)$ adds MSE $2/(n\varepsilon)^2$.
Total and regime
Total risk $\asymp \frac{1}{n} + \frac{1}{(n\varepsilon)^2}$. Privacy is free when $\frac{1}{(n\varepsilon)^2} \lesssim \frac{1}{n}$, i.e. once $n \gtrsim 1/\varepsilon^2$; below that threshold the privacy term dominates.
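Evaluating the two-term risk makes the crossover visible; the value of $\varepsilon$ below is illustrative.

```python
def private_mse(n, eps):
    """Sampling variance (worst case 1/4n on [0,1]) plus Laplace-mechanism MSE."""
    return 1.0 / (4 * n) + 2.0 / (n * eps) ** 2

eps = 0.1
crossover = 8.0 / eps**2     # solve 2/(n eps)^2 = 1/(4n)  =>  n = 8/eps^2
for n in (10, 100, 1000, 10_000):
    dominated = "privacy" if 2.0 / (n * eps) ** 2 > 1.0 / (4 * n) else "sampling"
    print(f"n = {n:>6}: MSE = {private_mse(n, eps):.2e} ({dominated}-dominated)")
print(f"the two terms cross at n = {crossover:.0f}")
```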
ex-ch25-18
Challenge: Open-ended: statistical-computational trade-offs in your field. Pick a telecom estimation problem (one of: channel estimation in massive MIMO with sparse angular support, user-activity detection, symbol decoding with low-precision ADCs). (a) Identify a candidate statistical-computational gap in that problem. (b) Propose a polynomial-time estimator and its rate. (c) Propose a lower-bound strategy (reduction, sum-of-squares lower bound, or average-case hardness).
Sparse recovery with structured (non-iid) measurements often has gaps like sparse PCA's.
Low-rank tensor estimation has very wide gaps.
Average-case hardness from planted clique transfers via black-box reductions.
Example: massive MIMO sparse channel
The angular-domain channel is $k$-sparse over $p$ grid directions, observed through $n$ pilots. The information-theoretic minimax rate is $\asymp \sigma^2 k \log(p/k)/n$; the LASSO achieves $\asymp \sigma^2 k \log p / n$; simple thresholding after matched filtering is cruder still.
Polynomial estimator
Orthogonal matching pursuit or the LASSO with an $\ell_1$ penalty; the analysis in Ch. 14-15 gives rates under mutual incoherence.
Lower bound candidate
A reduction from planted dense submatrix would give conditional lower bounds at $\sqrt{k^2 \log p / n}$, the widely believed barrier for poly-time sparse PCA/CCA. Write it up as a research note.
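A toy version of part (b): ISTA (proximal gradient for the LASSO) recovering a sparse angular-domain channel from random pilots. All dimensions, the measurement model, and the penalty level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 256, 64, 4                       # angular grid, pilots, sparsity

A = rng.normal(size=(n, p)) / np.sqrt(n)   # pilot measurement matrix
h = np.zeros(p)
h[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)
y = A @ h + 0.01 * rng.normal(size=n)      # noisy pilot observations

lam = 0.02                                 # l1 penalty weight
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth part
x = np.zeros(p)
for _ in range(500):                       # ISTA: gradient step + soft threshold
    z = x - (A.T @ (A @ x - y)) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

rel_err = np.linalg.norm(x - h) / np.linalg.norm(h)
print(f"relative estimation error = {rel_err:.3f}")
```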
References
- N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006. The canonical reference for adversarial online learning; Theorem 2.2 is the multiplicative-weights regret bound cited in Section 25.2.
- E. Hazan, Introduction to Online Convex Optimization, 2016. Modern treatment of OCO and regret minimization; used for the online-to-batch conversion and the projected-gradient regret analysis.
- S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, 2012. Comprehensive survey of stochastic and adversarial bandits; covers UCB, EXP3, and the lower-bound techniques used in Section 25.2.
- P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-Time Analysis of the Multiarmed Bandit Problem, 2002. Original paper introducing UCB1 with the logarithmic regret bound cited in Theorem 25.2.2.
- H. Robbins and S. Monro, A Stochastic Approximation Method, 1951. Origin of stochastic approximation, the mathematical parent of modern online estimation and SGD.
- L. Xiao and S. Boyd, Fast Linear Iterations for Distributed Averaging, 2004. The reference for optimal symmetric gossip weights and the spectral-gap convergence rate used in Section 25.3.
- S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, Randomized Gossip Algorithms, 2006. Rigorous analysis of averaging and pairwise gossip, including the message-complexity lower bounds compared in the simulation.
- R. Olfati-Saber, J. A. Fax, and R. M. Murray, Consensus and Cooperation in Networked Multi-Agent Systems, 2007. Broad survey of consensus dynamics, Laplacian spectra, and applications to distributed estimation and control.
- S. Kar, J. M. F. Moura, and K. Ramanan, Distributed Parameter Estimation in Sensor Networks: Nonlinear Observation Models and Imperfect Communication, 2012. Distributed consensus+innovations estimation with convergence guarantees under random link failures.
- Q. Berthet and P. Rigollet, Computational Lower Bounds for Sparse PCA, 2013. Landmark result establishing a statistical-computational gap for sparse PCA under the planted clique hypothesis; the basis for Section 25.1.
- F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborova, and P. Zhang, Spectral Redemption in Clustering Sparse Networks, 2013. Non-backtracking spectral methods closing part of the statistical-computational gap for stochastic block models.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith, Calibrating Noise to Sensitivity in Private Data Analysis, 2006. Introduces differential privacy and the Laplace mechanism; the starting point for privacy-constrained estimation.
- M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Cambridge University Press, 2019. Comprehensive reference for minimax rates, concentration, and the non-asymptotic analysis underlying the tradeoffs in this chapter.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, 2009. Modern convergence analysis for online / stochastic gradient estimation used throughout Section 25.2.