Prerequisites & Notation

Before You Begin

This capstone chapter synthesizes material from across the book and points toward active research frontiers. Brush up on the following before starting.

  • Convex optimization and duality (used throughout, especially in online learning) (Review ch01)

    Self-check: Can you state the KKT conditions and explain why strong duality (zero duality gap) holds for convex problems under Slater's condition?

  • Bayesian MMSE, LMMSE, and posterior computation (Review ch16)

    Self-check: Can you write the LMMSE estimator $\hat{\theta} = \boldsymbol{\Sigma}_{\theta y}\boldsymbol{\Sigma}_y^{-1}\mathbf{y}$ from memory and state when it is optimal?

  • Kalman filtering: state-space model and recursive update (Review ch23)

    Self-check: Can you write the Kalman gain $\mathbf{K}_k$ and state the innovation equation?

  • Cramér–Rao bound, Bayesian CRB, and the threshold effect (Review ch24)

    Self-check: Can you explain why the CRB is loose in the low-SNR regime?

  • Eigenvalues of graph Laplacians, spectral graph theory basics

    Self-check: Do you know why the second-smallest Laplacian eigenvalue (algebraic connectivity) controls mixing rates?

  • PAC learning and concentration inequalities (Hoeffding, Azuma)

    Self-check: Can you state Hoeffding's inequality and explain what a sub-Gaussian tail is?
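The LMMSE self-check can be verified numerically. Below is a minimal NumPy sketch for a toy linear-Gaussian model (the matrices, dimensions, and variable names are illustrative assumptions, not from the chapter): in the jointly Gaussian case, $\hat{\theta} = \boldsymbol{\Sigma}_{\theta y}\boldsymbol{\Sigma}_y^{-1}\mathbf{y}$ coincides with the posterior mean in information form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model (not from the chapter): y = H @ theta + v,
# theta ~ N(0, Sigma_theta), v ~ N(0, R)
d, m = 3, 5
H = rng.standard_normal((m, d))
Sigma_theta = np.eye(d)              # prior covariance of theta
R = 0.5 * np.eye(m)                  # observation noise covariance

Sigma_theta_y = Sigma_theta @ H.T              # cross-covariance Sigma_{theta,y}
Sigma_y = H @ Sigma_theta @ H.T + R            # observation covariance Sigma_y

y = rng.standard_normal(m)
theta_lmmse = Sigma_theta_y @ np.linalg.solve(Sigma_y, y)

# Information-form posterior mean for the same Gaussian model:
# (Sigma_theta^{-1} + H' R^{-1} H)^{-1} H' R^{-1} y
posterior_mean = np.linalg.solve(
    np.linalg.inv(Sigma_theta) + H.T @ np.linalg.inv(R) @ H,
    H.T @ np.linalg.inv(R) @ y,
)
assert np.allclose(theta_lmmse, posterior_mean)
```

The agreement of the two expressions is the standard matrix identity behind "LMMSE = posterior mean for jointly Gaussian variables."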
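The Kalman self-check can likewise be made concrete with one predict/update step. All matrices below are placeholder choices for illustration; the two commented lines are the quantities the self-check asks for.

```python
import numpy as np

# Illustrative constant-velocity model (placeholders, not from the chapter):
# x_{k+1} = F x_k + w_k,  y_k = H x_k + v_k
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.25]])                   # measurement noise covariance

x = np.zeros(2)                          # prior state estimate
P = np.eye(2)                            # prior error covariance

# Predict
x_pred = F @ x
P_pred = F @ P @ F.T + Q

# Update
y = np.array([1.2])                      # an (illustrative) measurement
S = H @ P_pred @ H.T + R                 # innovation covariance
K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain K_k
innovation = y - H @ x_pred              # innovation (measurement residual)
x_new = x_pred + K @ innovation          # innovation-form state update
P_new = (np.eye(2) - K @ H) @ P_pred     # covariance update
```

A quick sanity check on any such step: the updated covariance `P_new` should have a smaller trace than the predicted covariance `P_pred`, since the measurement reduces uncertainty.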
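For the Hoeffding self-check, a quick Monte Carlo sanity check is easy to run. The sample size, deviation threshold, and trial count below are arbitrary illustrative choices; for i.i.d. variables in $[0,1]$ the bound is $\mathbb{P}(|\bar{X}_n - \mu| \ge t) \le 2e^{-2nt^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 100, 0.1, 20_000          # illustrative parameters
mu = 0.5                                 # mean of Uniform(0, 1)

# Empirical probability that the sample mean deviates from mu by at least t
samples = rng.uniform(0.0, 1.0, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - mu)
empirical = np.mean(deviations >= t)

# Hoeffding bound for variables bounded in [0, 1]
bound = 2 * np.exp(-2 * n * t**2)

assert empirical <= bound                # the bound holds (here it is quite loose)
```

The gap between `empirical` and `bound` also illustrates the sub-Gaussian-tail point: Hoeffding is distribution-free over all $[0,1]$-bounded variables, so it is conservative for any particular one.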

Notation for This Chapter

The table below lists the symbols introduced in this chapter. The notation bridges statistics, online learning, and distributed computation.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $R_T$ | Cumulative regret over horizon $T$: difference between the learner's cumulative loss and the best fixed expert's loss | s02 |
| $\ell_t(\mathbf{w})$ | Convex loss at round $t$ evaluated at iterate $\mathbf{w}$ | s02 |
| $\eta$ | Learning rate (step size) in online gradient descent / MWU | s02 |
| $w_i^{(t)}$ | Weight on expert $i$ at round $t$ in MWU | s02 |
| $N$ | Number of experts / arms (online learning) or nodes (graphs) | s02 |
| $K$ | Number of bandit arms | s02 |
| $\mathbf{W}$ | Doubly stochastic gossip/consensus matrix | s03 |
| $\lambda_2(\mathbf{W})$ | Second-largest eigenvalue of $\mathbf{W}$ in magnitude; governs the consensus rate | s03 |
| $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ | Undirected graph with node set $\mathcal{V}$ and edge set $\mathcal{E}$ | s03 |
| $\mathbf{L}$ | Graph Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$ | s03 |
| $\lambda_{\text{stat}}, \lambda_{\text{comp}}$ | Statistical and computational thresholds in a high-dimensional estimation problem | s01 |
| $n, d, k$ | Sample size, ambient dimension, sparsity level | s01 |
| $\hat{\theta}_i^{(t)}$ | Estimate at node $i$, iteration $t$ | s03 |
| $\varepsilon$ | Differential privacy parameter (small $\varepsilon$ = strong privacy) | s03 |
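Several of the s03 symbols ($\mathbf{W}$, $\mathbf{L}$, $\lambda_2(\mathbf{W})$) fit together in a few lines of code. Below is a minimal consensus sketch in which $\mathbf{W} = \mathbf{I} - \alpha\mathbf{L}$; the path graph, step size, and round count are illustrative assumptions.

```python
import numpy as np

# Illustrative path graph on n nodes (an assumption, not from the chapter)
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0      # adjacency of the path graph
D = np.diag(A.sum(axis=1))
L = D - A                                # graph Laplacian L = D - A

# W = I - alpha * L is symmetric and doubly stochastic; choosing
# alpha in (0, 2 / lambda_max(L)) keeps all eigenvalues except the
# consensus eigenvalue 1 strictly inside the unit circle.
alpha = 0.3
W = np.eye(n) - alpha * L

x = np.arange(n, dtype=float)            # initial values held at the nodes
avg = x.mean()
for _ in range(200):
    x = W @ x                            # one synchronous gossip round

# The error decays like lambda_2(W)^t, where lambda_2 is the
# second-largest eigenvalue of W in magnitude.
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))
lam2 = eigs[-2]
```

After enough rounds every node holds (approximately) the network average, and `lam2 < 1` quantifies how fast that happens, matching the table entry for $\lambda_2(\mathbf{W})$.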