Prerequisites & Notation

Before You Begin

Chapter 7 applies the coded-computing framework of Part II to a specific operational need in distributed machine learning: the re-shuffling of data across workers at every training epoch. The prerequisites are the coded-caching intuition of Chapter 4, the polynomial-code construction of Chapter 5, and enough SGD background to know why shuffling matters at all.

  • Coded caching delivery (Chapter 4 §4.3)

    Self-check: State the Maddah-Ali / Niesen global caching gain $1 + KM/F$ and identify where the alignment trick enters.

  • Polynomial codes for matrix multiplication (Chapter 5)

    Self-check: For $p = q$ partitions, what is the recovery threshold $K$ of the polynomial code?

  • Distributed SGD architecture (Chapter 1 §1.3)

    Self-check: Why does stochastic gradient descent require data shuffling across epochs?

  • The $(\mu, \Delta)$ tradeoff framework (Chapter 2)

    Self-check: State the optimal coded-shuffling load $\Delta^*(\mu) = (1 - \mu)/(N\mu)$. (A quick numerical check appears in the sketch after this list.)

  • Standard SGD convergence intuition (Chapter 5)

    Self-check: Why is random-order sampling (without replacement) related to cyclic re-ordering of mini-batches?
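
If you want to sanity-check the formulas quoted in these self-checks before reading on, the short sketch below plugs toy numbers into each of them. It is only an illustration: the function names and example values are choices made here, not notation from the chapters, and the $pq$ recovery threshold is the standard polynomial-code answer that the Chapter 5 self-check is pointing at.

```python
# Minimal numerical sketch of the three formulas quoted in the self-checks.
# Function names and toy values are illustrative, not the chapters' notation.

def caching_gain(K_users: int, M_cache: float, F_files: int) -> float:
    """Maddah-Ali / Niesen global caching gain, 1 + K*M/F (Chapter 4)."""
    return 1 + K_users * M_cache / F_files

def recovery_threshold(p: int, q: int) -> int:
    """Polynomial-code recovery threshold p*q; with p = q this is p**2 (Chapter 5)."""
    return p * q

def coded_shuffling_load(mu: float, N_workers: int) -> float:
    """Optimal coded-shuffling load Delta*(mu) = (1 - mu) / (N * mu) (Chapter 2)."""
    return (1 - mu) / (N_workers * mu)

if __name__ == "__main__":
    print(caching_gain(K_users=10, M_cache=2, F_files=20))   # -> 2.0
    print(recovery_threshold(p=3, q=3))                      # -> 9
    print(coded_shuffling_load(mu=0.25, N_workers=4))        # -> 0.75
```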

Notation for This Chapter

We follow the coded-caching convention: $N$ workers, $M$ is the per-worker memory (measured in number of data points), and $K$ is reserved for the number of distinct data points per "epoch" when the chapter uses the caching-style framing. Where the chapter inherits Chapter 2's convention, $K$ denotes the recovery threshold and the shift is noted explicitly.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $N$ | Number of workers | s01 |
| $D$ | Total number of data points in the training set | s01 |
| $M$ | Per-worker memory (number of data points each worker can hold) | s01 |
| $\mu = M/D$ | Per-worker storage fraction (consistent with Chapter 2) | s01 |
| $\Delta$ | Per-epoch shuffling communication load (normalized) | s02 |
| $\pi_t$ | Permutation of the dataset used at epoch $t$ | s02 |
| $W_d$ | The $d$-th data point, $d = 1, \ldots, D$ | s02 |
| $\mathcal{T}_k^{(t)}$ | Set of data points stored at worker $k$ during epoch $t$ | s02 |
| $R^*(M)$ | Optimal per-epoch shuffling rate (the CommIT result) | s03 |
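
To see the notation in action, the toy simulation below (with sizes chosen arbitrarily for this sketch) draws a fresh permutation $\pi_t$ each epoch, forms the storage sets $\mathcal{T}_k^{(t)}$ by splitting the permuted dataset evenly across the $N$ workers, and measures the fraction of the $W_d$ that must be communicated because their new worker did not already hold them. This is the uncoded baseline that the chapter's coded shuffling schemes improve on; the simulation itself is an illustration, not a construction from the chapter.

```python
import random

# Toy illustration of the notation above (sizes are arbitrary choices here):
# D data points W_1..W_D, N workers, each storing M = D/N points, so mu = M/D.
# pi_t is a fresh permutation per epoch and T_k^(t) is worker k's storage set.
# We count the normalized uncoded shuffling load: the fraction of the dataset
# that moves because a worker's new points were not already in its storage.

D, N, EPOCHS = 24, 4, 5
M = D // N
mu = M / D
data_ids = list(range(D))  # indices d of the data points W_d

# Epoch 0: an arbitrary initial placement T_k^(0).
storage = [set(data_ids[k * M:(k + 1) * M]) for k in range(N)]

for t in range(1, EPOCHS + 1):
    pi_t = random.sample(data_ids, D)  # random permutation of the dataset
    new_storage = [set(pi_t[k * M:(k + 1) * M]) for k in range(N)]
    # Points worker k needs this epoch but did not store in the previous one.
    moved = sum(len(new_storage[k] - storage[k]) for k in range(N))
    print(f"epoch {t}: mu = {mu:.2f}, uncoded load = {moved / D:.2f}")
    storage = new_storage
```

With the minimal storage $\mu = 1/N$ used here, the measured load hovers around $1 - 1/N$; the gains promised by $\Delta^*(\mu)$ appear once each worker can hold more than $D/N$ points, i.e. $\mu > 1/N$.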