Prerequisites & Notation

Before You Begin

This chapter introduces distributed computing as the practical setting in which all the information-theoretic problems of this book live. The mathematical demands are modest — the goal is to motivate the formal framework that Chapter 2 will build. If anything below feels unfamiliar, spend a few minutes refreshing it before proceeding.

  • Basic probability: random variables, expectation, independence

    Self-check: Given $N$ i.i.d. exponential service times $T_1, \ldots, T_N$, can you write $\mathbb{E}[\max_k T_k]$?

  • Asymptotic notation: $\mathcal{O}(\cdot)$, $\Theta(\cdot)$, $o(\cdot)$

    Self-check: Is $n \log n = \mathcal{O}(n^{1.01})$?

  • Vector / matrix notation, gradients of multivariate functions

    Self-check: For $f(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|_2^2$, what is $\nabla f(\mathbf{w})$?

  • Stochastic gradient descent at a high level

    Self-check: Can you write the update rule $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \nabla \ell(\mathbf{w}_t)$ and say in one sentence why we use a sample-based gradient instead of the full one? (A short numerical sketch of this update appears after this list.)

  • Comfort with order statistics — at least the maximum of i.i.d. samples

    Self-check: For $T_i \sim \mathrm{Exp}(\lambda)$ i.i.d., do you remember that $\mathbb{E}[\max_{i \leq N} T_i] = \frac{1}{\lambda}\sum_{i=1}^{N} \frac{1}{i}$? (A quick numerical check of this identity appears after this list.)
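
As a quick sanity check of the order-statistics identity above, the following sketch compares the empirical mean of the maximum of $N$ i.i.d. exponential service times against $\frac{1}{\lambda}\sum_{i=1}^{N}\frac{1}{i}$. The particular values of $N$, $\lambda$, and the number of trials are arbitrary illustrative choices.

    import numpy as np

    # Arbitrary illustrative parameters: 10 workers, rate lambda = 2, 200k trials.
    N, lam, trials = 10, 2.0, 200_000
    rng = np.random.default_rng(1)

    # Each row holds the N i.i.d. Exp(lambda) service times of one trial;
    # the slowest worker determines that trial's completion time.
    samples = rng.exponential(scale=1.0 / lam, size=(trials, N))
    empirical = samples.max(axis=1).mean()

    # Closed form: (1/lambda) * (1 + 1/2 + ... + 1/N), i.e. H_N / lambda.
    exact = sum(1.0 / i for i in range(1, N + 1)) / lam

    print(empirical, exact)  # the two values agree to roughly two decimal places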
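
For the gradient and SGD self-checks: with $f(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|_2^2$ we have $\nabla f(\mathbf{w}) = \mathbf{w}$, so each application of the update rule shrinks $\mathbf{w}$ toward the origin. The dimension, starting point, learning rate, and iteration count below are illustrative assumptions only.

    import numpy as np

    def grad_f(w):
        # Gradient of f(w) = 0.5 * ||w||_2^2 is simply w itself.
        return w

    rng = np.random.default_rng(0)
    w = rng.normal(size=5)       # w_0: an arbitrary starting point in R^5
    eta = 0.1                    # learning rate (step size)

    for t in range(100):
        w = w - eta * grad_f(w)  # w_{t+1} = w_t - eta * grad f(w_t)

    print(np.linalg.norm(w))     # essentially zero: the iterates contract toward the origin

In stochastic gradient descent the exact gradient is replaced by an estimate computed from a small random sample of the data, because computing the full gradient over a large dataset at every step is too expensive; the form of the update rule stays the same.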

Notation for This Chapter

The table below lists the symbols introduced in this chapter. We use $N$ for the number of workers and $n$ for the number of users in a federated setting, distinguishing the two compute architectures we will see in Parts II and III. Throughout the SC book, $K$ is reserved for the recovery threshold (Part II) and the number of files (Part IV); we point out the context whenever it matters.

Symbol | Meaning | Introduced
$N$ | Number of workers (compute nodes) coordinated by a master | s01
$M$ | Number of distinct intermediate-value chunks (MapReduce shuffle) | s01
$\mu \in [0, 1]$ | Computation (storage) load: fraction of the dataset stored per worker | s01
$\Delta$ | Communication load (bits exchanged, normalized by file size) | s01
$T_i$ | Random task-completion time of worker $i$ | s02
$T_{(N)}$ | Maximum (slowest) of the $N$ task-completion times | s02
$\mathbf{w}_t \in \mathbb{R}^d$ | Model parameter vector at iteration $t$ | s03
$\mathbf{g}_k$ | Local gradient computed by worker (or user) $k$ | s03
$\mathbf{G} = \sum_{k=1}^{n} \mathbf{g}_k$ | Aggregated gradient at the master / parameter server | s03
$d$ | Model dimensionality (parameters per gradient) | s03
$\eta$ | Learning rate (step size) | s03
$B$ | Number of Byzantine (malicious) workers | s04
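
To see how the symbols above fit together, here is a minimal sketch of one way a master and $N$ workers could run a synchronous distributed gradient step: worker $k$ computes its local gradient $\mathbf{g}_k$ on its own data shard, and the master aggregates $\mathbf{G} = \sum_k \mathbf{g}_k$ and applies $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\mathbf{G}$. The least-squares objective, data sizes, and hyperparameters are illustrative assumptions, not part of the chapter's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, eta = 4, 3, 0.1             # N workers, model dimension d, learning rate eta

    # Illustrative least-squares problem; the dataset is split evenly across workers.
    X = rng.normal(size=(200, d))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    X_shards = np.array_split(X, N)
    y_shards = np.array_split(y, N)

    w = np.zeros(d)                   # w_0
    for t in range(200):
        # Each worker k computes its local gradient g_k on its own shard.
        g = [Xk.T @ (Xk @ w - yk) / len(y) for Xk, yk in zip(X_shards, y_shards)]
        # The master aggregates G = sum_k g_k and takes a gradient step.
        G = np.sum(g, axis=0)
        w = w - eta * G

    print(w)   # close to the coefficients [1, -2, 0.5] used to generate y

In a real cluster each $\mathbf{g}_k$ arrives only after worker $k$'s random completion time $T_k$, so a synchronous step like this one implicitly waits for the slowest worker, i.e. for $T_{(N)}$.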