Prerequisites & Notation

Before You Begin

This chapter applies coded caching principles to distributed machine learning. Prerequisites: MAN coded caching basics (Ch 2) and familiarity with distributed ML (stochastic gradient descent and parameter server architectures).

  • MAN coded caching gain (Ch 2)

    Self-check: Can you state the coded multicasting gain 1 + KM/N?

  • Stochastic gradient descent / distributed ML

    Self-check: What is data shuffling between epochs and why is it needed? (See the PyTorch sketch after this list.)

  • Parameter server architecture (Ch 26)

    Self-check: Can you describe the all-reduce operation in PyTorch distributed training? (See the sketch after this list.)

  • Basic combinatorics / XOR operations (Ch 2)

    Self-check: Why do XOR-coded messages work in MAN delivery? (See the XOR example after this list.)

  • Index coding perspective (Ch 4)

    Self-check: What is the role of the conflict graph in coded delivery?
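
If the shuffling and all-reduce self-checks feel rusty, the following minimal PyTorch sketch may help. It is illustrative only: it assumes a torch.distributed process group has already been initialized with one process per worker, and the tiny linear model and random data are placeholders rather than this chapter's setup. DistributedSampler.set_epoch re-shuffles the data partition every epoch, and all_reduce averages gradients across the K workers.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Sketch only: assumes dist.init_process_group(...) was called elsewhere,
# with one process per worker (K workers in this chapter's notation).
dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
sampler = DistributedSampler(dataset)      # partitions the dataset across workers
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(3):                     # T epochs of shuffling + training
    sampler.set_epoch(epoch)               # different shuffle (and partition) each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        for p in model.parameters():
            # Sum gradients over all workers, then average: every worker
            # ends up applying the same global gradient (all-reduce).
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= dist.get_world_size()
        optimizer.step()
```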
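
As a refresher on the XOR self-check, here is a toy version of the K = 2, M/N = 1/2 MAN example (the file names and sizes are made up for illustration): each worker caches the subfile the other one is missing, so a single XOR multicast serves both demands.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two files split into halves (subfiles), as in the K = 2, M/N = 1/2 MAN example.
A1, A2 = os.urandom(4), os.urandom(4)
B1, B2 = os.urandom(4), os.urandom(4)

# Placement: worker 1 caches {A1, B1}, worker 2 caches {A2, B2}.
# Demands: worker 1 wants file A (missing A2), worker 2 wants file B (missing B1).
coded = xor(A2, B1)            # one multicast transmission serves both workers

assert xor(coded, B1) == A2    # worker 1 cancels its cached B1 to recover A2
assert xor(coded, A2) == B1    # worker 2 cancels its cached A2 to recover B1
# Uncoded delivery needs 2 subfile transmissions; coded delivery needs 1,
# matching the multicast gain 1 + KM/N = 2 in this example.
```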

Notation for This Chapter

Symbols for distributed ML data shuffling.

Symbol       Meaning                                                              Introduced
K            Number of workers in distributed ML                                  s01
D            Total dataset (analogous to the library N in MAN)                    s01
s            Fraction of dataset stored at each worker; s = M/N in MAN terms      s01
R_shuffle    Communication cost per shuffling epoch (data units)                  s01
T            Number of epochs (rounds of shuffling + training)                    s01
r            Straggler tolerance in gradient coding                               s03
t            Caching-gain-like parameter t = Ks for shuffling                     s02
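
As a quick sanity check on this notation (illustrative numbers only): with K = 20 workers each storing a fraction s = 1/4 of the dataset, the parameter t = Ks = 5 plays the role of t = KM/N from MAN caching, so each data point is stored at roughly t workers, and the coded multicasting gain from the first self-check, 1 + KM/N, evaluates to 1 + t = 6.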