Prerequisites & Notation

Before You Begin

This capstone chapter synthesizes material from across the book and points toward active research frontiers. Brush up on the following before starting.

  • Convex optimization and duality (used throughout, especially in online learning) (Review ch01)

    Self-check: Can you state the KKT conditions and explain why strong duality (zero duality gap) holds for convex problems under Slater's condition?

  • Bayesian MMSE, LMMSE, and posterior computation (Review ch16)

    Self-check: Can you write the LMMSE estimator $\hat{\theta} = \boldsymbol{\Sigma}_{\theta y}\boldsymbol{\Sigma}_y^{-1}\mathbf{y}$ from memory and state when it is optimal?

  • Kalman filtering: state-space model and recursive update (Review ch23)

    Self-check: Can you write the Kalman gain $\mathbf{K}_k$ and state the innovation equation?

  • Cramér–Rao bound, Bayesian CRB, and the threshold effect (Review ch24)

    Self-check: Can you explain why the CRB is loose in the low-SNR regime?

  • Eigenvalues of graph Laplacians, spectral graph theory basics

    Self-check: Do you know why the second-smallest Laplacian eigenvalue (algebraic connectivity) controls mixing rates?

  • PAC learning and concentration inequalities (Hoeffding, Azuma)

    Self-check: Can you state Hoeffding's inequality and explain what a sub-Gaussian tail is?
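The LMMSE self-check can be verified numerically. Below is a minimal NumPy sketch for a toy linear-Gaussian model (the matrices, dimensions, and variable names are illustrative assumptions, not from the chapter): in the jointly Gaussian case, $\hat{\theta} = \boldsymbol{\Sigma}_{\theta y}\boldsymbol{\Sigma}_y^{-1}\mathbf{y}$ coincides with the posterior mean in information form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model (not from the chapter): y = H @ theta + v,
# theta ~ N(0, Sigma_theta), v ~ N(0, R)
d, m = 3, 5
H = rng.standard_normal((m, d))
Sigma_theta = np.eye(d)              # prior covariance of theta
R = 0.5 * np.eye(m)                  # observation noise covariance

Sigma_theta_y = Sigma_theta @ H.T              # cross-covariance Sigma_{theta,y}
Sigma_y = H @ Sigma_theta @ H.T + R            # observation covariance Sigma_y

y = rng.standard_normal(m)
theta_lmmse = Sigma_theta_y @ np.linalg.solve(Sigma_y, y)

# Information-form posterior mean for the same Gaussian model:
# (Sigma_theta^{-1} + H' R^{-1} H)^{-1} H' R^{-1} y
posterior_mean = np.linalg.solve(
    np.linalg.inv(Sigma_theta) + H.T @ np.linalg.inv(R) @ H,
    H.T @ np.linalg.inv(R) @ y,
)
assert np.allclose(theta_lmmse, posterior_mean)
```

The agreement of the two expressions is the standard matrix identity behind "LMMSE = posterior mean for jointly Gaussian variables."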
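The Kalman self-check can likewise be made concrete with one predict/update step. All matrices below are placeholder choices for illustration; the two commented lines are the quantities the self-check asks for.

```python
import numpy as np

# Illustrative constant-velocity model (placeholders, not from the chapter):
# x_{k+1} = F x_k + w_k,  y_k = H x_k + v_k
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.25]])                   # measurement noise covariance

x = np.zeros(2)                          # prior state estimate
P = np.eye(2)                            # prior error covariance

# Predict
x_pred = F @ x
P_pred = F @ P @ F.T + Q

# Update
y = np.array([1.2])                      # an (illustrative) measurement
S = H @ P_pred @ H.T + R                 # innovation covariance
K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain K_k
innovation = y - H @ x_pred              # innovation (measurement residual)
x_new = x_pred + K @ innovation          # innovation-form state update
P_new = (np.eye(2) - K @ H) @ P_pred     # covariance update
```

A quick sanity check on any such step: the updated covariance `P_new` should have a smaller trace than the predicted covariance `P_pred`, since the measurement reduces uncertainty.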
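For the Hoeffding self-check, a quick Monte Carlo sanity check is easy to run. The sample size, deviation threshold, and trial count below are arbitrary illustrative choices; for i.i.d. variables in $[0,1]$ the bound is $\mathbb{P}(|\bar{X}_n - \mu| \ge t) \le 2e^{-2nt^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 100, 0.1, 20_000          # illustrative parameters
mu = 0.5                                 # mean of Uniform(0, 1)

# Empirical probability that the sample mean deviates from mu by at least t
samples = rng.uniform(0.0, 1.0, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - mu)
empirical = np.mean(deviations >= t)

# Hoeffding bound for variables bounded in [0, 1]
bound = 2 * np.exp(-2 * n * t**2)

assert empirical <= bound                # the bound holds (here it is quite loose)
```

The gap between `empirical` and `bound` also illustrates the sub-Gaussian-tail point: Hoeffding is distribution-free over all $[0,1]$-bounded variables, so it is conservative for any particular one.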

Notation for This Chapter

The table below lists the symbols introduced in this chapter. The notation bridges statistics, online learning, and distributed computation.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $R_T$ | Cumulative regret over horizon $T$: difference between the learner's cumulative loss and the best fixed expert's loss | s02 |
| $\ell_t(\mathbf{w})$ | Convex loss at round $t$ evaluated at iterate $\mathbf{w}$ | s02 |
| $\eta$ | Learning rate (step size) in online gradient descent / MWU | s02 |
| $w_i^{(t)}$ | Weight on expert $i$ at round $t$ in MWU | s02 |
| $N$ | Number of experts / arms (online learning) or nodes (graphs) | s02 |
| $K$ | Number of bandit arms | s02 |
| $\mathbf{W}$ | Doubly stochastic gossip/consensus matrix | s03 |
| $\lambda_2(\mathbf{W})$ | Second-largest eigenvalue of $\mathbf{W}$ in magnitude; governs the consensus rate | s03 |
| $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ | Undirected graph with node set $\mathcal{V}$ and edge set $\mathcal{E}$ | s03 |
| $\mathbf{L}$ | Graph Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$ | s03 |
| $\lambda_{\text{stat}}, \lambda_{\text{comp}}$ | Statistical and computational thresholds in a high-dimensional estimation problem | s01 |
| $n, d, k$ | Sample size, ambient dimension, sparsity level | s01 |
| $\hat{\theta}_i^{(t)}$ | Estimate at node $i$, iteration $t$ | s03 |
| $\varepsilon$ | Differential privacy parameter (small $\varepsilon$ = strong privacy) | s03 |
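Several of the s03 symbols ($\mathbf{W}$, $\mathbf{L}$, $\lambda_2(\mathbf{W})$) fit together in a few lines of code. Below is a minimal consensus sketch in which $\mathbf{W} = \mathbf{I} - \alpha\mathbf{L}$; the path graph, step size, and round count are illustrative assumptions.

```python
import numpy as np

# Illustrative path graph on n nodes (an assumption, not from the chapter)
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0      # adjacency of the path graph
D = np.diag(A.sum(axis=1))
L = D - A                                # graph Laplacian L = D - A

# W = I - alpha * L is symmetric and doubly stochastic; choosing
# alpha in (0, 2 / lambda_max(L)) keeps all eigenvalues except the
# consensus eigenvalue 1 strictly inside the unit circle.
alpha = 0.3
W = np.eye(n) - alpha * L

x = np.arange(n, dtype=float)            # initial values held at the nodes
avg = x.mean()
for _ in range(200):
    x = W @ x                            # one synchronous gossip round

# The error decays like lambda_2(W)^t, where lambda_2 is the
# second-largest eigenvalue of W in magnitude.
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))
lam2 = eigs[-2]
```

After enough rounds every node holds (approximately) the network average, and `lam2 < 1` quantifies how fast that happens, matching the table entry for $\lambda_2(\mathbf{W})$.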