Exercises

ex-ch25-01

Easy

Sparse PCA minimax rate. Consider the spiked model with sparsity $k$, ambient dimension $d$, and $n$ samples. Write down the information-theoretic rate $\lambda_{\text{stat}}^2$ and the conjectured polynomial-time rate $\lambda_{\text{comp}}^2$. For $d = 1000$, $n = 500$, $k = 10$, compute both and the ratio.
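A quick numeric sketch, assuming the common convention that, up to constants and log factors, $\lambda_{\text{stat}}^2 \asymp k \log d / n$ and $\lambda_{\text{comp}}^2 \asymp k^2 / n$; the exact thresholds depend on the formulation of the spiked model:

```python
import math

d, n, k = 1000, 500, 10

# Assumed scalings (constants suppressed): information-theoretic
# threshold ~ k*log(d)/n, conjectured polynomial-time threshold ~ k^2/n.
lam_stat_sq = k * math.log(d) / n
lam_comp_sq = k**2 / n

print(f"lambda_stat^2 ~ {lam_stat_sq:.4f}")   # ~ 0.14
print(f"lambda_comp^2 ~ {lam_comp_sq:.4f}")   # = 0.20
print(f"ratio comp/stat ~ {lam_comp_sq / lam_stat_sq:.2f}")
```

The ratio quantifies the multiplicative signal-strength price of polynomial time at these parameters.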


ex-ch25-02

Easy

Regret of the best fixed expert. In a prediction-with-experts game with $N = 4$ experts, $T = 100$ rounds, and cumulative losses $L_T^1 = 72$, $L_T^2 = 45$, $L_T^3 = 61$, $L_T^4 = 58$, the learner achieved cumulative loss $64$. Compute the regret against the best expert.
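A one-line check of the definition $R_T = \hat L_T - \min_i L_T^i$:

```python
L = {1: 72, 2: 45, 3: 61, 4: 58}   # cumulative expert losses L_T^i
learner_loss = 64

best = min(L.values())             # best fixed expert in hindsight
regret = learner_loss - best
print(regret)                      # -> 19
```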

ex-ch25-03

Easy

Consensus matrix check. For the ring on $N = 4$ nodes with weights $w_{ii} = 1/2$ and $w_{i,i\pm 1} = 1/4$, verify that $\mathbf{W}$ is symmetric, doubly stochastic, and has $\mathbf{1}$ as an eigenvector.
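A minimal verification in pure Python; the three checks mirror the three properties asked for:

```python
N = 4
W = [[0.0] * N for _ in range(N)]
for i in range(N):
    W[i][i] = 0.5
    W[i][(i + 1) % N] = 0.25
    W[i][(i - 1) % N] = 0.25

# symmetric: W[i][j] == W[j][i]
assert all(W[i][j] == W[j][i] for i in range(N) for j in range(N))
# doubly stochastic: every row and every column sums to 1
assert all(abs(sum(W[i]) - 1) < 1e-12 for i in range(N))
assert all(abs(sum(W[i][j] for i in range(N)) - 1) < 1e-12 for j in range(N))
# W @ 1 = 1: each row sums to 1, so the all-ones vector is a fixed point
print("all checks pass")
```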

ex-ch25-04

Easy

UCB exploration bonus. The UCB1 index at time $t$ for an arm $a$ that has been pulled $n_a$ times is $\hat\mu_a + \sqrt{2 \ln t / n_a}$. For $t = 100$, $n_a = 4$, $\hat\mu_a = 0.6$, compute the index.
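Plugging the numbers into the index formula:

```python
import math

t, n_a, mu_hat = 100, 4, 0.6
bonus = math.sqrt(2 * math.log(t) / n_a)   # exploration bonus
index = mu_hat + bonus
print(f"UCB1 index = {index:.4f}")         # -> about 2.1174
```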

ex-ch25-05

Easy

Differential privacy mechanism. Let $\mu_n$ be the sample mean of $n$ iid $[0,1]$-valued observations. To release an $\varepsilon$-differentially private estimate, add Laplace noise with scale $b = \Delta/\varepsilon$, where $\Delta$ is the sensitivity of $\mu_n$. Compute $\Delta$ and $b$ for $n = 200$, $\varepsilon = 0.5$.
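Changing one observation in $[0,1]$ moves the mean by at most $1/n$, so the sensitivity is $\Delta = 1/n$:

```python
n, eps = 200, 0.5
delta = 1.0 / n      # sensitivity of the sample mean of [0,1] data
b = delta / eps      # Laplace scale
print(delta, b)      # -> 0.005 0.01
```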

ex-ch25-06

Medium

Multiplicative weights regret bound. Show that against any sequence of losses $\ell_t^i \in [0,1]$, the MWU algorithm with step size $\eta = \sqrt{2 \ln N / T}$ has regret $R_T \le \sqrt{2 T \ln N}$.
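The bound can be sanity-checked empirically. A minimal exponential-weights simulation, assuming losses drawn uniformly from $[0,1]$ and a learner that plays the weight distribution (so its per-round loss is the expected loss):

```python
import math
import random

random.seed(0)
N, T = 4, 100
eta = math.sqrt(2 * math.log(N) / T)

weights = [1.0] * N
learner_loss = 0.0
cum = [0.0] * N
for _ in range(T):
    losses = [random.random() for _ in range(N)]
    total = sum(weights)
    probs = [w / total for w in weights]
    learner_loss += sum(p * l for p, l in zip(probs, losses))
    for i in range(N):
        cum[i] += losses[i]
        weights[i] *= math.exp(-eta * losses[i])  # multiplicative update

regret = learner_loss - min(cum)
bound = math.sqrt(2 * T * math.log(N))
print(f"regret = {regret:.2f}, bound = {bound:.2f}")
```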


ex-ch25-07

Medium

Online-to-batch conversion. Suppose an online algorithm achieves regret $R_T \le C\sqrt{T}$ for convex losses. Show that averaging its iterates gives a statistical estimator $\bar\theta_T$ with excess risk $\mathbb{E}[f(\bar\theta_T)] - f(\theta^\star) \le C/\sqrt{T}$.


ex-ch25-08

Medium

Spectral gap controls consensus. For a symmetric doubly stochastic $\mathbf{W}$, show that the averaging error satisfies $\|\mathbf{x}^{(t)} - \bar x \mathbf{1}\|_2 \le \lambda_2(\mathbf{W})^t \|\mathbf{x}^{(0)} - \bar x \mathbf{1}\|_2$.
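A minimal simulation on the 4-node ring of exercise ex-ch25-03, taking $\lambda_2(\mathbf{W}) = 1/2$ for that circulant matrix (an assumption you can verify by hand from its eigenvalues):

```python
import math

N = 4
W = [[0.0] * N for _ in range(N)]
for i in range(N):
    W[i][i] = 0.5
    W[i][(i + 1) % N] = 0.25
    W[i][(i - 1) % N] = 0.25

x = [4.0, 0.0, 2.0, -2.0]
xbar = sum(x) / N          # the average, preserved by doubly stochastic W
lam2 = 0.5                 # assumed second-largest eigenvalue modulus

def err(v):
    return math.sqrt(sum((vi - xbar) ** 2 for vi in v))

e0 = err(x)
for t in range(1, 6):
    x = [sum(W[i][j] * x[j] for j in range(N)) for i in range(N)]
    assert err(x) <= lam2 ** t * e0 + 1e-12   # geometric contraction
print("contraction verified")
```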


ex-ch25-09

Medium

UCB regret via confidence bounds. Sketch the argument that UCB1 achieves regret $\mathcal{O}\bigl(\sum_{a : \Delta_a > 0} \log(T)/\Delta_a\bigr)$ in stochastic bandits with Gaussian rewards and gaps $\Delta_a$.


ex-ch25-10

Medium

Stochastic approximation. Consider Robbins-Monro updates $\theta_{t+1} = \theta_t - \gamma_t(\theta_t - y_t)$ where $y_t = \theta^\star + w_t$ and $w_t \sim \mathcal{N}(0, \sigma^2)$ iid. With $\gamma_t = 1/t$, show that $\mathbb{E}[(\theta_t - \theta^\star)^2] = \sigma^2/t$ in the limit.
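A Monte Carlo check: with $\gamma_t = 1/t$ the first update overwrites the initializer ($\gamma_1 = 1$), and the iterate becomes the running sample mean of the $y_t$, so the MSE after $T$ steps should match $\sigma^2/T$:

```python
import random
import statistics

random.seed(1)
theta_star, sigma = 2.0, 1.0
T, reps = 50, 20000

errs = []
for _ in range(reps):
    theta = 0.0                                  # initializer is irrelevant
    for t in range(1, T + 1):
        y = theta_star + random.gauss(0, sigma)
        theta = theta - (1 / t) * (theta - y)    # gamma_t = 1/t
    errs.append((theta - theta_star) ** 2)

mse = statistics.fmean(errs)
print(f"empirical MSE = {mse:.4f}, sigma^2/T = {sigma**2 / T:.4f}")
```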


ex-ch25-11

Medium

Gossip vs. synchronous consensus. On a ring of $N$ nodes, synchronous averaging takes $\Theta(N^2)$ rounds, for $N \cdot \Theta(N^2)$ total messages. Pairwise randomized gossip converges in $\Theta(N^2)$ pair exchanges in expectation. Which uses fewer messages? How does the answer change on a complete graph?
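A back-of-envelope message count for the ring, with constants suppressed as in the exercise; the per-round and per-exchange message costs assumed here (two neighbour messages per node per synchronous round, two messages per gossip exchange) are illustrative conventions:

```python
def ring_counts(N):
    # Synchronous: ~N^2 rounds, each node sends to 2 neighbours per round.
    sync = 2 * N * N**2
    # Randomized gossip: ~N^2 pair exchanges, 2 messages per exchange.
    gossip = 2 * N**2
    return sync, gossip

for N in [10, 100]:
    s, g = ring_counts(N)
    print(N, s, g, "gossip cheaper" if g < s else "sync cheaper")
```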


ex-ch25-12

Medium

Paper critique. A submitted paper claims a new estimator achieves MSE scaling of $1/n^2$ for estimating a Lipschitz density at a point. Explain why this claim is extraordinary and what checks you would perform as a reviewer.

ex-ch25-13

Medium

Metropolis-Hastings weights. Design symmetric gossip weights $\mathbf{W}$ for a graph where each node only knows its own degree and its neighbours' degrees. Use the Metropolis rule $w_{ij} = 1/(1 + \max(d_i, d_j))$ for $(i,j) \in E$, with $w_{ii} = 1 - \sum_{j \neq i} w_{ij}$, and verify double stochasticity.
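A sketch on a small assumed example graph; the Metropolis rule needs only local degrees, and the self-weight absorbs whatever mass is left in each row:

```python
# Undirected graph as an adjacency list (assumed example).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
N = len(adj)
deg = {i: len(adj[i]) for i in adj}

W = [[0.0] * N for _ in range(N)]
for i in adj:
    for j in adj[i]:
        W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))  # Metropolis rule
    W[i][i] = 1.0 - sum(W[i])                      # self-weight remainder

# Symmetric and doubly stochastic:
assert all(abs(W[i][j] - W[j][i]) < 1e-12 for i in range(N) for j in range(N))
assert all(abs(sum(W[i]) - 1) < 1e-12 for i in range(N))
assert all(abs(sum(W[i][j] for i in range(N)) - 1) < 1e-12 for j in range(N))
print("Metropolis weights are symmetric and doubly stochastic")
```

Symmetry of the off-diagonal entries plus row sums equal to one is exactly what forces the column sums to one as well.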


ex-ch25-14

Hard

Planted clique reduction. Sketch how a polynomial-time sparse-PCA algorithm achieving rate $\lambda < C\sqrt{k^2/n}$ would refute the planted clique hardness conjecture.


ex-ch25-15

Hard

Follow-the-regularized-leader. For OCO with convex losses $f_t$ and a strongly convex regularizer $R$, show that $\theta_t = \arg\min_\theta \{\sum_{s<t} f_s(\theta) + R(\theta)/\eta\}$ attains regret $R_T \le R(\theta^\star)/\eta + \eta \sum_t \|\nabla f_t(\theta_t)\|_\star^2 / 2$.


ex-ch25-16

Hard

Distributed Kalman filter consistency. Each of $N$ sensors observes $y_i = h_i \theta + v_i$ with $v_i \sim \mathcal{N}(0, 1)$ iid. Show that the consensus-plus-innovations update $\theta_i^{(t+1)} = \sum_j w_{ij}\theta_j^{(t)} + \alpha_t h_i(y_i - h_i \theta_i^{(t)})$ converges to the centralized ML estimator under an appropriate step-size sequence $\alpha_t$.


ex-ch25-17

Hard

Cost of privacy. Under $\varepsilon$-differential privacy, the minimax risk for estimating a scalar mean over iid $[0,1]$ samples is $R_n \asymp \max(1/n, 1/(n\varepsilon)^2)$. Derive both regimes and identify the sample size at which privacy becomes free.
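A numeric sketch of the two regimes (constants suppressed): the crossover is where $1/n = 1/(n\varepsilon)^2$, i.e. $n = 1/\varepsilon^2$, beyond which the non-private term dominates and privacy is free at the level of rates:

```python
eps = 0.5
n_free = 1 / eps**2    # crossover sample size: 1/n = 1/(n*eps)^2
print(f"privacy becomes free around n = {n_free:.0f}")   # -> 4

for n in [2, 4, 8, 100]:
    risk = max(1 / n, 1 / (n * eps) ** 2)
    regime = "private term dominates" if 1 / (n * eps) ** 2 > 1 / n \
             else "statistical term dominates"
    print(n, f"{risk:.4f}", regime)
```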


ex-ch25-18

Challenge

Open-ended: statistical-computational trade-offs in your field. Pick one telecom estimation problem: channel estimation in massive MIMO with sparse angular support, user-activity detection, or symbol decoding with low-precision ADCs. (a) Identify a candidate statistical-computational gap in that problem. (b) Propose a polynomial-time estimator and its rate. (c) Propose a lower-bound strategy (reduction, sum-of-squares lower bound, or average-case hardness).
