Part 2: Coded Computing
Chapter 7: Coded Data Shuffling
Advanced · ~200 min
Learning Objectives
- Motivate why modern ML training pipelines re-shuffle data every epoch and quantify the communication cost of doing so
- Formulate the data-shuffling problem in the information-theoretic framework
- State and prove the Wan/Tuninetti/Caire memory-communication tradeoff, a CommIT contribution
- Construct the coded-shuffling achievability scheme via XOR combinations over worker subsets
- Derive the matching converse via a cut-set argument specialized to the shuffling problem
- Connect coded data shuffling with the coded-caching framework of Chapter 4, and compare with Chapter 2's coded-MapReduce result
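The XOR idea behind the achievability scheme in the objectives above can be previewed with a minimal two-worker example (my own illustrative sketch, not the chapter's formal scheme): when each worker already caches the data point the other one needs, a single coded broadcast replaces two uncoded unicasts.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Epoch t: worker 0 holds data point A, worker 1 holds data point B.
A, B = b"datapoint-A", b"datapoint-B"

# The epoch t+1 shuffle requires worker 0 to receive B and worker 1 to
# receive A. Uncoded shuffling sends A and B separately (2 transmissions);
# coded shuffling broadcasts a single XOR (1 transmission).
broadcast = xor_bytes(A, B)

# Each worker cancels its cached point to recover the one it needs.
worker0_recovers = xor_bytes(broadcast, A)  # worker 0 decodes B
worker1_recovers = xor_bytes(broadcast, B)  # worker 1 decodes A

assert worker0_recovers == B and worker1_recovers == A
```

In the general scheme, the same cancellation trick is applied to XOR combinations formed over larger worker subsets, which is what produces the memory-communication tradeoff studied in this chapter.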
Sections