Exercises
ex-ch16-e01
Easy: Verify the LMYA communication load formula L(r) = (1/r)(1 - r/K) at the endpoints: r = 1 (uncoded) and r = K (full replication).
Plug in r = 1: you should get the uncoded load 1 - 1/K.
Plug in r = K: you should get 0; no shuffle is needed (every worker has all data).
r = 1
L(1) = 1 · (1 - 1/K) = (K - 1)/K. Matches uncoded: each worker broadcasts to its K - 1 peers per reduce key.
r = K
L(K) = (1/K)(1 - K/K) = 0. Full replication eliminates the shuffle; each worker has everything.
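A two-line numeric check of both endpoints (the worker count K = 10 below is an arbitrary choice for the check):
K = 10                                   # arbitrary worker count for the check
L = lambda r: (1 / r) * (1 - r / K)      # LMYA communication load
assert abs(L(1) - (K - 1) / K) < 1e-12   # r = 1 reproduces the uncoded load
assert L(K) == 0                         # r = K: full replication, no shuffle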
ex-ch16-e02
Easy: State the analog of MAN's subpacketization constraint for coded MapReduce. Why is this a practical limitation?
LMYA partitions the N input files into (K choose r) batches.
For your choice of K and r: how many batches?
Number of batches
There are (K choose r) file batches, each requiring a separate map-function invocation, and this count grows combinatorially; for example, K = 20 and r = 3 already give (20 choose 3) = 1140 batches.
Practical impact
Partition overhead dominates. LMYA works well for small r (roughly 2-5); larger values quickly become infeasible. PDA-like reductions (Ch 14) can mitigate this for coded MapReduce but remain an active research area.
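To see how fast the batch count grows, a quick check over a few illustrative cluster sizes (the K values below are arbitrary):
from math import comb
for K in (10, 20, 50):                          # arbitrary cluster sizes
    print(K, [comb(K, r) for r in (2, 3, 5)])   # batch counts for r = 2, 3, 5
# e.g. K = 50, r = 5 already needs 2,118,760 file batches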
ex-ch16-e03
Medium: Derive the condition under which coded MapReduce yields wallclock speedup over uncoded.
Total time = map + shuffle + reduce.
Coded: map time scales by r, shuffle time scales by roughly 1/r.
Set up the total-time inequality and solve for r.
Wallclock times
Uncoded: T_uncoded = T_map + T_shuffle. Coded: T_coded ≈ r·T_map + T_shuffle/r, using L(r)/L(1) ≈ 1/r for K much larger than r (the reduce phase is identical in both and drops out).
Condition
Speedup requires r·T_map + T_shuffle/r < T_map + T_shuffle.
Simplified
Factor out (r - 1): the inequality reduces to T_map < T_shuffle/r, i.e. r < T_shuffle/T_map. Coded MapReduce wins when replication doesn't exceed the shuffle-to-map time ratio.
Optimal r
Differentiate T_coded(r) = r·T_map + T_shuffle/r: setting T_map - T_shuffle/r² = 0 gives r* = sqrt(T_shuffle/T_map). At this r* the map and shuffle terms are exactly balanced (both equal sqrt(T_map·T_shuffle)).
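A small numerical check of the optimum (the per-phase times below are hypothetical, chosen so that T_shuffle/T_map = 25):
import numpy as np
T_map, T_shuffle = 1.0, 25.0                 # hypothetical uncoded phase times
T = lambda r: r * T_map + T_shuffle / r      # approximate coded wallclock
rs = np.arange(1, 26)
print(rs[np.argmin(T(rs))], np.sqrt(T_shuffle / T_map))   # both give r* = 5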
ex-ch16-e04
Medium: For gradient coding with n = 20 workers and s = 4 straggler tolerance, derive the storage overhead factor.
Redundancy = s + 1 copies per data partition.
Storage factor
s + 1 = 5: each data partition is stored at 5 of the 20 workers.
Aggregate storage
Total storage = (s + 1) = 5 times the uncoded total: a 5× overhead for 4-straggler tolerance.
ex-ch16-e05
Medium: Show that coded matrix multiplication with recovery threshold k over n workers is equivalent to an (n, k)-MDS code over the row blocks of the product AB.
An (n, k) MDS code tolerates any n - k erasures.
Lagrange interpolation reconstructs a degree-(k - 1) polynomial from any k evaluations.
MDS property
(n, k)-MDS: any k of the n coded symbols determine all k information symbols. Equivalently: tolerate any n - k erasures.
Coded matmul correspondence
Polynomial encoding: split A row-wise into k blocks A_0, ..., A_{k-1}, let p(x) = Σ_j A_j x^j, and have worker i compute p(α_i)·B at a distinct evaluation point α_i. Any k evaluations uniquely determine the degree-(k - 1) matrix polynomial p(x)·B, hence all blocks A_j·B; this is exactly the (n, k)-MDS property.
Straggler correspondence
Stragglers = erasures. The MDS code tolerates n - k erasures, so coded matmul tolerates n - k stragglers.
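A minimal numpy sketch of this equivalence using a polynomial code (the matrix sizes, seed, and evaluation points below are arbitrary illustrative choices):
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k = 5, 3                                    # n workers, recovery threshold k
m, d, c = 6, 4, 3                              # A is m x d, B is d x c, with k dividing m
A, B = rng.standard_normal((m, d)), rng.standard_normal((d, c))
A_blocks = np.split(A, k)                      # row blocks A_0 .. A_{k-1}
alphas = np.arange(1.0, n + 1)                 # distinct evaluation points

# Worker i computes p(alpha_i) @ B, where p(x) = sum_j A_j x^j has degree k - 1
worker_out = [sum(a ** j * A_blocks[j] for j in range(k)) @ B for a in alphas]

# MDS property: any k of the n worker outputs determine A @ B (solve a Vandermonde system)
for survivors in combinations(range(n), k):
    V = np.vander(alphas[list(survivors)], k, increasing=True)
    Y = np.stack([worker_out[i].ravel() for i in survivors])
    C = np.linalg.solve(V, Y)                  # C[j] is (A_j @ B) flattened
    AB_rec = np.vstack([C[j].reshape(m // k, c) for j in range(k)])
    assert np.allclose(AB_rec, A @ B)
print("A @ B recovered from every", k, "of", n, "workers")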
ex-ch16-e06
Medium: For Netflix-scale analytics (a large cluster of K workers running with computation load r = 5), compute the expected shuffle reduction and map overhead.
Shuffle reduction
For K much larger than 5, L(1)/L(5) = (1 - 1/K) / ((1/5)(1 - 5/K)) ≈ 5: roughly a 5× (80%) shuffle-traffic reduction.
Map overhead
Each file is mapped at 5 workers, so roughly 5× map CPU time.
Wallclock
Net benefit only if T_shuffle > 5·T_map, i.e. if the shuffle-to-map time ratio exceeds r = 5 (the condition from ex-ch16-e03).
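Plugging in numbers (K = 1000 below is a hypothetical cluster size; only r = 5 is fixed by the exercise):
K, r = 1000, 5                            # hypothetical cluster size, given replication
L = lambda r: (1 / r) * (1 - r / K)       # LMYA communication load
print(L(1) / L(r))                        # shuffle reduction factor, about 5.02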
ex-ch16-e07
Hard: Prove the converse: the LMYA tradeoff L*(r) = (1/r)(1 - r/K) is optimal. (Sketch.)
Use a cut-set bound on the shuffle phase.
Consider the set of values needed by any subset of workers.
Cut-set setup
For a subset S of workers, the workers in S still need the intermediate values produced only from files mapped outside S. With computation load r, counting how many files are mapped at exactly j workers bounds the fraction of values that are "new" to S (not already available within S).
Bounding expected transmissions
After averaging over all such subsets S, the total shuffle load is at least (1/r)(1 - r/K). Matches the LMYA achievable load.
Matching
LMYA's scheme exactly achieves the lower bound for integer r. For non-integer r, time-sharing between the two neighboring integer points closes the gap.
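Up to notation, the key intermediate bound in the LMYA converse can be written as follows, where a_j denotes the number of files mapped at exactly j workers (so the a_j sum to N and the j·a_j sum to rN):

L^*(r) \;\ge\; \sum_{j=1}^{K} \frac{a_j}{N}\cdot\frac{1}{j}\left(1 - \frac{j}{K}\right)

Since (1/j)(1 - j/K) is convex in j, Jensen's inequality with the average replication (1/N)·Σ_j j·a_j = r then gives L*(r) ≥ (1/r)(1 - r/K).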
ex-ch16-e08
Hard: Design a gradient coding scheme for n = 5 workers and s = 2 (tolerate 2 stragglers). Specify the encoding matrix.
Cyclic assignment: D_1 is stored at workers 1, 2, 3.
The encoding matrix rows must span the all-ones vector whenever any 3 of them are selected.
Cyclic assignment
Data partition D_i is stored at workers i, i + 1, i + 2 (mod 5).
Encoding matrix (one valid choice)
B is a 5 × 5 matrix (rows = workers, columns = data partitions). Each column has 3 nonzero entries (its partition is stored at 3 workers), with a cyclic support pattern. The nonzero values must be chosen so that any 3 rows span the all-ones vector; a plain 0/1 pattern does not suffice, so the entries are computed, e.g. by the construction sketched below.
Decodability check
Any 3 rows spanning the all-ones vector means the full gradient sum g_1 + ... + g_5 is recoverable from any 3 non-straggling workers. Verified by checking all (5 choose 3) = 10 selections.
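One way to produce concrete entries follows the null-space idea behind the Tandon et al. cyclic scheme: draw a random H whose rows sum to zero (so H·1 = 0) and pick each row of B inside null(H) on its cyclic support. The sketch below, with an arbitrary random seed, is one valid choice and checks all 10 row selections:
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, s = 5, 2                                    # 5 workers, tolerate 2 stragglers

H = rng.standard_normal((s, n))
H[:, -1] = -H[:, :-1].sum(axis=1)              # rows of H sum to 0, so H @ ones = 0

B = np.zeros((n, n))                           # rows = workers, columns = partitions
for i in range(n):
    support = [(i - j) % n for j in range(s + 1)]   # worker i holds partitions i, i-1, i-2
    B[i, support[0]] = 1.0
    # choose the other s entries so that H @ B[i] = 0, i.e. B[i] lies in null(H)
    B[i, support[1:]] = np.linalg.solve(H[:, support[1:]], -H[:, support[0]])

ones = np.ones(n)
for rows in combinations(range(n), n - s):     # all 10 ways to pick 3 surviving workers
    coeffs, *_ = np.linalg.lstsq(B[list(rows)].T, ones, rcond=None)
    assert np.allclose(B[list(rows)].T @ coeffs, ones)
print("any", n - s, "rows of B span the all-ones vector")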
ex-ch16-e09
Hard: Compare coded data shuffling (Ch 15) and coded MapReduce (Ch 16). What are their respective domains? Can they be combined?
Shuffling: ML inter-epoch data movement.
MapReduce: analytics query execution.
Different systems
Coded shuffling: ML training pipeline, regular epochs. Coded MapReduce: batch analytics jobs, one-shot processing.
Same formula
Both yield a multiplicative communication reduction on the order of the storage/computation redundancy (the cache-size factor for shuffling, the factor r for MapReduce) via MAN-style coding.
Composition
In hybrid pipelines (feature engineering + training): apply coded MapReduce for feature computation, coded shuffling for training data epoch shuffling. Distinct layers, both using coded caching principles.
ex-ch16-e10
Medium: Implement a toy K = 6, r = 2 coded MapReduce simulation in Python (pseudocode). Compare uncoded and coded shuffle message counts.
6 workers, 15 file batches ((6 choose 2) = 15, one per pair of workers).
Each XOR message serves an (r + 1) = 3-subset of workers.
Placement
Batches B_S are indexed by the 2-subsets S of workers; 15 in total. Worker k holds B_S iff k ∈ S.
Shuffle messages
Number of 3-subsets: (6 choose 3) = 20. Each message is an XOR of 3 intermediate-value chunks.
Code sketch
from itertools import combinations

K, r = 6, 2                                   # 6 workers, computation load 2
batches = list(combinations(range(K), r))     # C(6,2) = 15 file batches
# toy intermediate values: V[S][k] = chunk of batch S destined for reduce key k
V = {S: {k: hash((S, k)) & 0xFF for k in range(K)} for S in batches}
shuffle_msgs = []
for subset in combinations(range(K), r + 1):  # C(6,3) = 20 coded multicasts
    msg = 0
    for k in subset:                          # worker k is missing the chunk mapped by subset \ {k}
        msg ^= V[tuple(sorted(set(subset) - {k}))][k]
    shuffle_msgs.append(msg)
print(len(shuffle_msgs), "coded multicasts vs", K * (K - 1), "uncoded unicasts")
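Running the sketch with K = 6, r = 2 prints 20 coded multicasts against 30 uncoded unicasts under this toy accounting; each coded message is simultaneously useful to all 3 workers in its subset, which is where the shuffle saving comes from.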
ex-ch16-e11
Medium: Why does Reed-Solomon suffice for coded matrix multiplication (rather than needing more complex codes)?
Think about the MDS property.
Straggler = erasure.
MDS suffices
Any k-of-n MDS code tolerates the worst case of n - k erasures. Reed-Solomon is the canonical MDS code over a large enough field.
Alternative codes
LDPC/Polar codes offer better computation at high redundancy but complicate decoding. For matmul, the encoding overhead and decoder complexity of RS are typically acceptable; fancier codes are used for bandwidth-critical regimes.
ex-ch16-e12
Hard: The Joudeh-Caire 2024 result extends gradient coding to heterogeneous FL. What goes wrong when the original Tandon et al. scheme is applied directly to non-IID clients?
Encoding matrix assumes equal batch sizes per worker.
FL: clients have variable data distributions and sizes.
Original assumption
Tandon et al.: n workers, each with IID data of equal size. The encoding matrix weights all data partitions symmetrically.
FL reality
Non-IID: client i's data distribution and size differ. The target is the weighted gradient g = Σ_i w_i·g_i, with w_i reflecting client i's data size.
Joudeh-Caire fix
Extend the encoding matrix to handle weighted sums: any n - s rows must span the weight vector w, not the all-ones vector. This requires a modified RS-like construction.
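A minimal numerical illustration of the modified requirement, reusing the null-space idea from ex-ch16-e08 but choosing H with H·w = 0 so that any 3 rows span w. This only sketches the "span w instead of the all-ones vector" condition, not necessarily the construction in the cited work; the weights below are arbitrary:
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, s = 5, 2
w = rng.random(n) + 0.5                         # hypothetical per-client weights (data sizes)
H = rng.standard_normal((s, n))
H -= np.outer(H @ w, w) / (w @ w)               # project rows so that H @ w = 0

B = np.zeros((n, n))
for i in range(n):
    support = [(i - j) % n for j in range(s + 1)]
    B[i, support[0]] = 1.0
    B[i, support[1:]] = np.linalg.solve(H[:, support[1:]], -H[:, support[0]])

for rows in combinations(range(n), n - s):      # any 3 rows must now span w, not the all-ones vector
    coeffs, *_ = np.linalg.lstsq(B[list(rows)].T, w, rcond=None)
    assert np.allclose(B[list(rows)].T @ coeffs, w)
print("weighted gradient recoverable from any", n - s, "workers")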
ex-ch16-e13
Medium: Explain why coded MapReduce's combinatorial structure is exactly the MAN structure, and why this is so important intellectually.
Both use t-subset placement and (t + 1)-subset XORing (t = KM/N in MAN caching, t = r in LMYA).
What's the "cache" in LMYA?
Structural isomorphism
An LMYA file batch indexed by an r-subset of workers corresponds to a MAN subfile indexed by a t-subset of users; an LMYA (r + 1)-subset XOR corresponds to a MAN (t + 1)-subset XOR. Identical combinatorics.
Cache analog
"Cache" in LMYA = redundant computation capacity. A worker computing at higher redundancy has "more cache". This insight made coded caching portable to new domains.
Why important
Unified theory: one combinatorial toolkit serves many problems. Advances in one translate to others; e.g., PDA subpacketization (Ch 14) informs coded MapReduce design.
ex-ch16-e14
Hard: Design an experiment: suppose you're deploying coded MapReduce in a real datacenter. What metrics would you measure to validate the theoretical predictions?
Shuffle volume, wallclock, CPU utilization, network utilization.
Key metrics
- Shuffle bytes per job. Direct verification of L(r) = (1/r)(1 - r/K).
- Wallclock per job. End-to-end validation.
- CPU-hours for the map phase. Verify the r× overhead.
- Network utilization during shuffle. Saturation behavior.
- p99 latency tail. Straggler effects.
Baselines
Compare coded MapReduce (research prototype) vs uncoded (production Spark) on identical workloads. Control for data locality and network topology.
Expected outcomes
Shuffle volume matches within 5%. Wallclock reduction depends on shuffle/map ratio. Network utilization peaks reduced (less burstiness).
ex-ch16-e15
Hard: Coded computing "unifies" caching, shuffling, MapReduce, and gradient coding. Identify one open problem that cuts across all four.
Think about subpacketization, finite-blocklength, privacy, or heterogeneity.
Subpacketization
All four need on the order of (K choose t) subpackets/batches or (K choose t + 1) coded messages, which scales exponentially in K for a fixed per-node gain. PDA-style reduction works for caching; it remains open for LMYA, shuffling, and gradient coding.
Or: finite-blocklength
All four assume large file/block sizes for the coding to work. Finite-blocklength effects (small files) undercut the gains. Open research.
Or: privacy
All four have privacy variants (Wan-Caire privacy, private FL, etc.). Open: a unified privacy-rate tradeoff across all four primitives.