Prerequisites & Notation
Before You Begin
Chapter 7 applies the coded-computing framework of Part II to a specific operational need in distributed machine learning: the re-shuffling of data across workers at every training epoch. The prerequisites are the coded-caching intuition of Chapter 4, the polynomial-code construction of Chapter 5, and enough SGD background to know why shuffling matters at all.
- Coded caching delivery (Chapter 4 §4.3)
  Self-check: State the Maddah-Ali and Niesen global caching gain and identify where the alignment trick enters.
- Polynomial codes for matrix multiplication (Chapter 5)
  Self-check: For a given partitioning of the two input matrices, what is the recovery threshold of the polynomial code?
- Distributed SGD architecture (Chapter 1 §1.3)
  Self-check: Why does stochastic gradient descent require data shuffling across epochs? (See the sketch after this list.)
- Standard SGD convergence intuition
  Self-check: Why is random-order sampling (with replacement) related to cyclic re-ordering of mini-batches?
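
The last two self-checks are easiest to reason about with a concrete epoch loop in front of you. The following is a minimal sketch (not taken from the chapter; all function and variable names are illustrative) contrasting three ways of ordering the data for mini-batch SGD: a fixed order reused every epoch, independent sampling with replacement, and a fresh permutation drawn at every epoch, which is the re-shuffling operation this chapter studies.

```python
import numpy as np

def minibatches(order, batch_size):
    """Split an index ordering into consecutive mini-batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def epoch_orderings(num_points, num_epochs, batch_size, mode, seed=0):
    """Yield the mini-batch index sequence used in each epoch.

    mode = "fixed"      : same cyclic order every epoch
    mode = "replacement": every index drawn i.i.d. with replacement
    mode = "reshuffle"  : a fresh permutation pi_t drawn at every epoch
    """
    rng = np.random.default_rng(seed)
    base = np.arange(num_points)
    for _ in range(num_epochs):
        if mode == "fixed":
            order = base
        elif mode == "replacement":
            order = rng.integers(0, num_points, size=num_points)
        elif mode == "reshuffle":
            order = rng.permutation(num_points)   # this plays the role of pi_t
        else:
            raise ValueError(mode)
        yield minibatches(order, batch_size)

# Example: with 12 points and batches of 4, "fixed" repeats the same batch
# composition every epoch, while "reshuffle" changes which points share a
# batch from one epoch to the next.
for mode in ("fixed", "reshuffle"):
    first_two_epochs = list(epoch_orderings(12, 2, 4, mode))
    print(mode, [[list(b) for b in ep] for ep in first_two_epochs])
```

In a distributed run each worker holds only part of the data, so a fresh permutation every epoch means a worker's next batches may contain points it does not currently store; that is the communication problem the notation below formalizes.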
Notation for This Chapter
We follow the coded-caching convention: $K$ workers, $M$ is the per-worker memory (measured in number of data points), and $N$ is reserved for the number of distinct data points per "epoch" when the chapter uses the caching-style framing. Where the chapter instead inherits Chapter 2's convention for the recovery threshold, the shift in notation is noted explicitly.
| Symbol | Meaning | Introduced |
|---|---|---|
| $K$ | Number of workers | §7.1 |
| $N$ | Total number of data points in the training set | §7.1 |
| $M$ | Per-worker memory (number of data points each worker can hold) | §7.1 |
| $\mu = M/N$ | Per-worker storage fraction (consistent with Chapter 2) | §7.1 |
| $R$ | Per-epoch shuffling communication load (normalized) | §7.2 |
| $\pi_t$ | Permutation of the dataset used at epoch $t$ | §7.2 |
| $x_n$ | The $n$-th data point, $n \in \{1, \dots, N\}$ | §7.2 |
| $\mathcal{Z}_k^t$ | Set of data points stored at worker $k$ during epoch $t$ | §7.2 |
| $R^*$ | Optimal per-epoch shuffling rate (the CommIT result) | §7.3 |
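
As a sanity check on the notation, the following sketch instantiates a toy shuffling instance. The placement rule is an assumption made purely for illustration (each worker processes a contiguous block of $N/K$ points of the current permutation, and its storage holds at most $M$ points, favoring the ones it just processed); the chapter's actual storage and delivery scheme differs, but the quantity counted here, points moved per epoch normalized by $N$, is exactly what the load $R$ measures for an uncoded scheme.

```python
import numpy as np

def storage_and_need(N, K, M, num_epochs=3, seed=0):
    """Toy uncoded shuffling: workers keep the points they just processed.

    Assumptions (illustrative only, not the chapter's scheme):
      * worker k processes the k-th block of N/K points of permutation pi_t,
      * its storage Z_k holds at most M points, favoring the newest block,
      * any needed point not in storage is sent to it individually (uncoded).
    """
    assert N % K == 0 and M >= N // K
    rng = np.random.default_rng(seed)
    block = N // K
    # Start from the identity ordering: worker k stores its own block.
    storage = [set(range(k * block, (k + 1) * block)) for k in range(K)]
    for t in range(num_epochs):
        pi = rng.permutation(N)                           # pi_t: new epoch ordering
        needed = [set(pi[k * block:(k + 1) * block]) for k in range(K)]
        moved = sum(len(needed[k] - storage[k]) for k in range(K))
        print(f"epoch {t}: uncoded load R = {moved / N:.2f} "
              f"(moved {moved} of {N} points)")
        # Update storage: keep the newly processed block, then fill the
        # remaining M - N/K slots with previously stored points.
        for k in range(K):
            keep = list(storage[k] - needed[k])[: M - block]
            storage[k] = needed[k] | set(keep)

K, N, M = 4, 16, 8          # K workers, N data points, M points of memory each
print(f"storage fraction mu = M/N = {M / N:.2f}")
storage_and_need(N, K, M)
```

Increasing $M$ toward $N$ drives this uncoded load toward zero; the gap between the uncoded load and the optimal rate $R^*$ at intermediate storage fractions is what the rest of the chapter quantifies.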