Prerequisites & Notation
Before You Begin
This chapter applies coded caching principles to distributed machine learning. Prerequisites: MAN basics and familiarity with distributed ML (stochastic gradient descent, parameter server architectures).
- Stochastic gradient descent / distributed ML
Self-check: What is data shuffling between epochs, and why is it needed? (A refresher sketch follows this list.)
- Parameter server architecture (Review ch26)
Self-check: Can you describe the all-reduce operation in PyTorch distributed training? (Covered in the same sketch after this list.)
- Basic combinatorics / XOR operations (Review ch02)
Self-check: Why do XOR-coded messages work in MAN delivery? (A toy XOR example follows this list.)
- Index coding perspective (Review ch04)
Self-check: What is the role of the conflict graph in coded delivery? (A small coloring sketch follows this list.)
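If the first two self-checks feel rusty, the minimal sketch below illustrates both ideas at once. It assumes PyTorch and a `torchrun` launch; the toy dataset, linear model, and learning rate are placeholders, not this chapter's setup. `DistributedSampler.set_epoch` re-shuffles the data assignment every epoch, and `dist.all_reduce` sums gradients across workers before each SGD step.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend="gloo")  # one process per worker (env vars set by torchrun)

# Placeholder data and model, used only to exercise the two mechanisms.
dataset = TensorDataset(torch.randn(1024, 8), torch.randn(1024, 1))
sampler = DistributedSampler(dataset, shuffle=True)   # each worker sees one shard
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
model = torch.nn.Linear(8, 1)

for epoch in range(3):
    sampler.set_epoch(epoch)                 # re-shuffle the shard assignment every epoch
    for x, y in loader:
        loss = torch.nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum gradients over workers
            p.grad /= dist.get_world_size()                # ... then average them
        with torch.no_grad():
            for p in model.parameters():
                p -= 0.01 * p.grad           # plain SGD step with the averaged gradient

dist.destroy_process_group()
```

Run it with, for example, `torchrun --nproc_per_node=4 script.py`; each process plays the role of one worker.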
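For the XOR self-check, here is a two-worker toy with byte strings standing in for subfiles (the contents are arbitrary): a single XOR broadcast serves both workers because each one can cancel the piece it already caches.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

piece_for_1 = b"\x01\x02\x03\x04"  # missing at worker 1, cached at worker 2
piece_for_2 = b"\x05\x06\x07\x08"  # missing at worker 2, cached at worker 1

broadcast = xor(piece_for_1, piece_for_2)           # one coded transmission serves both

assert xor(broadcast, piece_for_2) == piece_for_1   # worker 1 cancels its cached piece
assert xor(broadcast, piece_for_1) == piece_for_2   # worker 2 does the same
```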
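For the conflict-graph self-check, the standard-library sketch below builds a small example (3 workers, 3 files, MAN placement with one cached subfile index per worker; the scenario is illustrative, not taken from this chapter). Vertices are the subfiles each worker still misses, edges join pairs that cannot share an XOR, and a greedy coloring groups the rest into coded transmissions.

```python
from itertools import combinations

# MAN placement with 3 workers and 3 files, t = 1: subfile X_i is cached at worker i.
# Worker w demands file "ABC"[w-1]; the vertices are the subfiles it still misses.
vertices = [(w, "ABC"[w - 1], i) for w in (1, 2, 3) for i in (1, 2, 3) if i != w]

def compatible(u, v):
    """Two missing subfiles can share one XOR iff each requester caches the other's piece."""
    (wu, _, iu), (wv, _, iv) = u, v
    return iv == wu and iu == wv

# Conflict graph: an edge joins every pair that cannot share a transmission.
conflicts = {v: set() for v in vertices}
for u, v in combinations(vertices, 2):
    if not compatible(u, v):
        conflicts[u].add(v)
        conflicts[v].add(u)

# Greedy coloring of the conflict graph: each color class becomes one XOR broadcast.
color = {}
for v in vertices:
    taken = {color[n] for n in conflicts[v] if n in color}
    color[v] = next(c for c in range(len(vertices)) if c not in taken)

transmissions = {}
for v, c in color.items():
    transmissions.setdefault(c, []).append(v)
print(len(transmissions), "transmissions:", list(transmissions.values()))
# -> 3 transmissions: [(1,'A',2),(2,'B',1)], [(1,'A',3),(3,'C',1)], [(2,'B',3),(3,'C',2)]
```

The number of colors upper-bounds the number of broadcasts; here the coloring recovers the three pairwise XORs of the MAN scheme for this demand.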
Notation for This Chapter
Symbols for distributed ML data shuffling.
| Symbol | Meaning | Introduced |
|---|---|---|
|  | Number of workers in distributed ML | s01 |
|  | Total dataset (analogous to the library in MAN) | s01 |
|  | Fraction of the dataset stored at each worker (the cache fraction, in MAN terms) | s01 |
|  | Communication cost per shuffling epoch (data units) | s01 |
|  | Number of epochs (rounds of shuffling + training) | s01 |
|  | Straggler tolerance in gradient coding | s03 |
|  | Caching-gain-like parameter for shuffling | s02 |
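As a quick sanity check on the notation, the hedged sketch below plugs placeholder values into the classic centralized MAN rate. The variable names stand in for the symbols in the table (number of workers, per-worker cache fraction); the formula is used purely for intuition about how coded shuffling could scale, not as this chapter's result.

```python
def uncoded_cost(num_workers: int, cache_fraction: float) -> float:
    # Every worker fetches the part of its new shard it does not already store.
    return num_workers * (1.0 - cache_fraction)

def man_style_cost(num_workers: int, cache_fraction: float) -> float:
    # Classic centralized MAN rate: the uncoded cost divided by the multicast gain.
    return num_workers * (1.0 - cache_fraction) / (1.0 + num_workers * cache_fraction)

for frac in (0.1, 0.25, 0.5):
    print(f"cache fraction {frac:.2f}: "
          f"uncoded {uncoded_cost(10, frac):.2f} vs coded {man_style_cost(10, frac):.2f}")
```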