Prerequisites & Notation

Before You Begin

This chapter introduces distributed computing as the practical setting in which all the information-theoretic problems of this book live. The mathematical demands are modest — the goal is to motivate the formal framework that Chapter 2 will build. If anything below feels unfamiliar, spend a few minutes refreshing it before proceeding.

  • Basic probability: random variables, expectation, independence

    Self-check: Given $N$ i.i.d. exponential service times $T_1, \ldots, T_N$, can you write $\mathbb{E}[\max_k T_k]$?

  • Asymptotic notation: $\mathcal{O}(\cdot)$, $\Theta(\cdot)$, $o(\cdot)$

    Self-check: Is $n \log n = \mathcal{O}(n^{1.01})$?

  • Vector / matrix notation, gradients of multivariate functions

    Self-check: For $f(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|_2^2$, what is $\nabla f(\mathbf{w})$?

  • Stochastic gradient descent at a high level

    Self-check: Can you write the update rule $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \nabla \ell(\mathbf{w}_t)$ and say in one sentence why we use a sample-based gradient instead of the full one? (A short numerical sketch of this update appears after this list.)

  • Comfort with order statistics — at least the maximum of i.i.d. samples

    Self-check: For $T_i \sim \mathrm{Exp}(\lambda)$ i.i.d., do you remember that $\mathbb{E}[\max_{i \leq N} T_i] = \frac{1}{\lambda}\sum_{i=1}^{N} \frac{1}{i}$? (A quick numerical check of this identity appears after this list.)
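
As a quick sanity check of the order-statistics identity above, the following sketch compares the empirical mean of the maximum of $N$ i.i.d. exponential service times against $\frac{1}{\lambda}\sum_{i=1}^{N}\frac{1}{i}$. The particular values of $N$, $\lambda$, and the number of trials are arbitrary illustrative choices.

    import numpy as np

    # Arbitrary illustrative parameters: 10 workers, rate lambda = 2, 200k trials.
    N, lam, trials = 10, 2.0, 200_000
    rng = np.random.default_rng(1)

    # Each row holds the N i.i.d. Exp(lambda) service times of one trial;
    # the slowest worker determines that trial's completion time.
    samples = rng.exponential(scale=1.0 / lam, size=(trials, N))
    empirical = samples.max(axis=1).mean()

    # Closed form: (1/lambda) * (1 + 1/2 + ... + 1/N), i.e. H_N / lambda.
    exact = sum(1.0 / i for i in range(1, N + 1)) / lam

    print(empirical, exact)  # the two values agree to roughly two decimal places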
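
For the gradient and SGD self-checks: with $f(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|_2^2$ we have $\nabla f(\mathbf{w}) = \mathbf{w}$, so each application of the update rule shrinks $\mathbf{w}$ toward the origin. The dimension, starting point, learning rate, and iteration count below are illustrative assumptions only.

    import numpy as np

    def grad_f(w):
        # Gradient of f(w) = 0.5 * ||w||_2^2 is simply w itself.
        return w

    rng = np.random.default_rng(0)
    w = rng.normal(size=5)       # w_0: an arbitrary starting point in R^5
    eta = 0.1                    # learning rate (step size)

    for t in range(100):
        w = w - eta * grad_f(w)  # w_{t+1} = w_t - eta * grad f(w_t)

    print(np.linalg.norm(w))     # essentially zero: the iterates contract toward the origin

In stochastic gradient descent the exact gradient is replaced by an estimate computed from a small random sample of the data, because computing the full gradient over a large dataset at every step is too expensive; the form of the update rule stays the same.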

Notation for This Chapter

The table below lists the symbols introduced in this chapter. We use $N$ for the number of workers and $n$ for the number of users in a federated setting, distinguishing the two compute architectures we will see in Parts II and III. Throughout the SC book, $K$ is reserved for the recovery threshold (Part II) and the number of files (Part IV); we point out the context whenever it matters.

Symbol | Meaning | Introduced
$N$ | Number of workers (compute nodes) coordinated by a master | s01
$M$ | Number of distinct intermediate-value chunks (MapReduce shuffle) | s01
$\mu \in [0, 1]$ | Computation (storage) load: fraction of the dataset stored per worker | s01
$\Delta$ | Communication load (bits exchanged, normalized by file size) | s01
$T_i$ | Random task-completion time of worker $i$ | s02
$T_{(N)}$ | Maximum (slowest) of the $N$ task-completion times | s02
$\mathbf{w}_t \in \mathbb{R}^d$ | Model parameter vector at iteration $t$ | s03
$\mathbf{g}_k$ | Local gradient computed by worker (or user) $k$ | s03
$\mathbf{G} = \sum_{k=1}^{n} \mathbf{g}_k$ | Aggregated gradient at the master / parameter server | s03
$d$ | Model dimensionality (parameters per gradient) | s03
$\eta$ | Learning rate (step size) | s03
$B$ | Number of Byzantine (malicious) workers | s04
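
To see how the symbols above fit together, here is a minimal sketch of one way a master and $N$ workers could run a synchronous distributed gradient step: worker $k$ computes its local gradient $\mathbf{g}_k$ on its own data shard, and the master aggregates $\mathbf{G} = \sum_k \mathbf{g}_k$ and applies $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\mathbf{G}$. The least-squares objective, data sizes, and hyperparameters are illustrative assumptions, not part of the chapter's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, eta = 4, 3, 0.1             # N workers, model dimension d, learning rate eta

    # Illustrative least-squares problem; the dataset is split evenly across workers.
    X = rng.normal(size=(200, d))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    X_shards = np.array_split(X, N)
    y_shards = np.array_split(y, N)

    w = np.zeros(d)                   # w_0
    for t in range(200):
        # Each worker k computes its local gradient g_k on its own shard.
        g = [Xk.T @ (Xk @ w - yk) / len(y) for Xk, yk in zip(X_shards, y_shards)]
        # The master aggregates G = sum_k g_k and takes a gradient step.
        G = np.sum(g, axis=0)
        w = w - eta * G

    print(w)   # close to the coefficients [1, -2, 0.5] used to generate y

In a real cluster each $\mathbf{g}_k$ arrives only after worker $k$'s random completion time $T_k$, so a synchronous step like this one implicitly waits for the slowest worker, i.e. for $T_{(N)}$.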