Exercises
ex-ch16-e01
Easy: Verify the LMYA communication load formula L(r) = (1/r)(1 - r/K) at the endpoints: r = 1 (uncoded) and r = K (full replication).
Plug in r = 1: you should get the uncoded load 1 - 1/K.
Plug in r = K: you should get 0; no shuffle is needed (every worker has all data).
r = 1
L(1) = 1 · (1 - 1/K) = (K - 1)/K. Matches uncoded: each worker broadcasts to its K - 1 peers per reduce key.
r = K
L(K) = (1/K)(1 - K/K) = 0. Full replication eliminates the shuffle; each worker has everything.
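A two-line numeric check of both endpoints (the worker count K = 10 below is an arbitrary choice for the check):
K = 10                                   # arbitrary worker count for the check
L = lambda r: (1 / r) * (1 - r / K)      # LMYA communication load
assert abs(L(1) - (K - 1) / K) < 1e-12   # r = 1 reproduces the uncoded load
assert L(K) == 0                         # r = K: full replication, no shuffle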
ex-ch16-e02
Easy: State the analog of MAN's subpacketization constraint for coded MapReduce. Why is this a practical limitation?
LMYA partitions the N input files into (K choose r) batches.
For your choice of K and r: how many batches?
Number of batches
There are (K choose r) file batches, each requiring a separate map-function invocation, and this count grows combinatorially; for example, K = 20 and r = 3 already give (20 choose 3) = 1140 batches.
Practical impact
Partition overhead dominates. LMYA works well for small r (roughly 2-5); larger values quickly become infeasible. PDA-like reductions (Ch 14) can mitigate this for coded MapReduce but remain an active research area.
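To see how fast the batch count grows, a quick check over a few illustrative cluster sizes (the K values below are arbitrary):
from math import comb
for K in (10, 20, 50):                          # arbitrary cluster sizes
    print(K, [comb(K, r) for r in (2, 3, 5)])   # batch counts for r = 2, 3, 5
# e.g. K = 50, r = 5 already needs 2,118,760 file batches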
ex-ch16-e03
Medium: Derive the condition under which coded MapReduce yields wallclock speedup over uncoded.
Total time = map + shuffle + reduce.
Coded: map time scales by r, shuffle time scales by roughly 1/r.
Set up the total-time inequality and solve for r.
Wallclock times
Uncoded: T_uncoded = T_map + T_shuffle. Coded: T_coded ≈ r·T_map + T_shuffle/r, using L(r)/L(1) ≈ 1/r for K much larger than r (the reduce phase is identical in both and drops out).
Condition
Speedup requires r·T_map + T_shuffle/r < T_map + T_shuffle.
Simplified
Factor out (r - 1): the inequality reduces to T_map < T_shuffle/r, i.e. r < T_shuffle/T_map. Coded MapReduce wins when replication doesn't exceed the shuffle-to-map time ratio.
Optimal r
Differentiate T_coded(r) = r·T_map + T_shuffle/r: setting T_map - T_shuffle/r² = 0 gives r* = sqrt(T_shuffle/T_map). At this r* the map and shuffle terms are exactly balanced (both equal sqrt(T_map·T_shuffle)).
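A small numerical check of the optimum (the per-phase times below are hypothetical, chosen so that T_shuffle/T_map = 25):
import numpy as np
T_map, T_shuffle = 1.0, 25.0                 # hypothetical uncoded phase times
T = lambda r: r * T_map + T_shuffle / r      # approximate coded wallclock
rs = np.arange(1, 26)
print(rs[np.argmin(T(rs))], np.sqrt(T_shuffle / T_map))   # both give r* = 5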
ex-ch16-e04
Medium: For gradient coding with n = 20 workers and s = 4 straggler tolerance, derive the storage overhead factor.
Redundancy = s + 1 copies per data partition.
Storage factor
s + 1 = 5: each data partition is stored at 5 of the 20 workers.
Aggregate storage
Total storage = (s + 1) = 5 times the uncoded total: a 5× overhead for 4-straggler tolerance.
ex-ch16-e05
Medium: Show that coded matrix multiplication with recovery threshold k over n workers is equivalent to an (n, k)-MDS code over the row blocks of the product AB.
An (n, k) MDS code tolerates any n - k erasures.
Lagrange interpolation reconstructs a degree-(k - 1) polynomial from any k evaluations.
MDS property
(n, k)-MDS: any k of the n coded symbols determine all k information symbols. Equivalently: tolerate any n - k erasures.
Coded matmul correspondence
Polynomial encoding: split A row-wise into k blocks A_0, ..., A_{k-1}, let p(x) = Σ_j A_j x^j, and have worker i compute p(α_i)·B at a distinct evaluation point α_i. Any k evaluations uniquely determine the degree-(k - 1) matrix polynomial p(x)·B, hence all blocks A_j·B; this is exactly the (n, k)-MDS property.
Straggler correspondence
Stragglers = erasures. The MDS code tolerates n - k erasures, so coded matmul tolerates n - k stragglers.
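A minimal numpy sketch of this equivalence using a polynomial code (the matrix sizes, seed, and evaluation points below are arbitrary illustrative choices):
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k = 5, 3                                    # n workers, recovery threshold k
m, d, c = 6, 4, 3                              # A is m x d, B is d x c, with k dividing m
A, B = rng.standard_normal((m, d)), rng.standard_normal((d, c))
A_blocks = np.split(A, k)                      # row blocks A_0 .. A_{k-1}
alphas = np.arange(1.0, n + 1)                 # distinct evaluation points

# Worker i computes p(alpha_i) @ B, where p(x) = sum_j A_j x^j has degree k - 1
worker_out = [sum(a ** j * A_blocks[j] for j in range(k)) @ B for a in alphas]

# MDS property: any k of the n worker outputs determine A @ B (solve a Vandermonde system)
for survivors in combinations(range(n), k):
    V = np.vander(alphas[list(survivors)], k, increasing=True)
    Y = np.stack([worker_out[i].ravel() for i in survivors])
    C = np.linalg.solve(V, Y)                  # C[j] is (A_j @ B) flattened
    AB_rec = np.vstack([C[j].reshape(m // k, c) for j in range(k)])
    assert np.allclose(AB_rec, A @ B)
print("A @ B recovered from every", k, "of", n, "workers")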
ex-ch16-e06
Medium: For Netflix-scale analytics (a large cluster of K workers running with computation load r = 5), compute the expected shuffle reduction and map overhead.
Shuffle reduction
For K much larger than 5, L(1)/L(5) = (1 - 1/K) / ((1/5)(1 - 5/K)) ≈ 5: roughly a 5× (80%) shuffle-traffic reduction.
Map overhead
Each file is mapped at 5 workers, so roughly 5× map CPU time.
Wallclock
Net benefit only if T_shuffle > 5·T_map, i.e. if the shuffle-to-map time ratio exceeds r = 5 (the condition from ex-ch16-e03).
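Plugging in numbers (K = 1000 below is a hypothetical cluster size; only r = 5 is fixed by the exercise):
K, r = 1000, 5                            # hypothetical cluster size, given replication
L = lambda r: (1 / r) * (1 - r / K)       # LMYA communication load
print(L(1) / L(r))                        # shuffle reduction factor, about 5.02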
ex-ch16-e07
Hard: Prove the converse: the LMYA tradeoff L*(r) = (1/r)(1 - r/K) is optimal. (Sketch.)
Use a cut-set bound on the shuffle phase.
Consider the set of values needed by any subset of workers.
Cut-set setup
For a subset S of workers, the workers in S still need the intermediate values produced only from files mapped outside S. With computation load r, counting how many files are mapped at exactly j workers bounds the fraction of values that are "new" to S (not already available within S).
Bounding expected transmissions
After averaging over all such subsets S, the total shuffle load is at least (1/r)(1 - r/K). Matches the LMYA achievable load.
Matching
LMYA's scheme exactly achieves the lower bound for integer r. For non-integer r, time-sharing between the two neighboring integer points closes the gap.
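Up to notation, the key intermediate bound in the LMYA converse can be written as follows, where a_j denotes the number of files mapped at exactly j workers (so the a_j sum to N and the j·a_j sum to rN):

L^*(r) \;\ge\; \sum_{j=1}^{K} \frac{a_j}{N}\cdot\frac{1}{j}\left(1 - \frac{j}{K}\right)

Since (1/j)(1 - j/K) is convex in j, Jensen's inequality with the average replication (1/N)·Σ_j j·a_j = r then gives L*(r) ≥ (1/r)(1 - r/K).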
ex-ch16-e08
Hard: Design a gradient coding scheme for n = 5 workers and s = 2 (tolerate 2 stragglers). Specify the encoding matrix.
Cyclic assignment: D_1 is stored at workers 1, 2, 3.
The encoding matrix rows must span the all-ones vector whenever any 3 of them are selected.
Cyclic assignment
Data partition D_i is stored at workers i, i + 1, i + 2 (mod 5).
Encoding matrix (one valid choice)
B is a 5 × 5 matrix (rows = workers, columns = data partitions). Each column has 3 nonzero entries (its partition is stored at 3 workers), with a cyclic support pattern. The nonzero values must be chosen so that any 3 rows span the all-ones vector; a plain 0/1 pattern does not suffice, so the entries are computed, e.g. by the construction sketched below.
Decodability check
Any 3 rows spanning the all-ones vector means the full gradient sum g_1 + ... + g_5 is recoverable from any 3 non-straggling workers. Verified by checking all (5 choose 3) = 10 selections.
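One way to produce concrete entries follows the null-space idea behind the Tandon et al. cyclic scheme: draw a random H whose rows sum to zero (so H·1 = 0) and pick each row of B inside null(H) on its cyclic support. The sketch below, with an arbitrary random seed, is one valid choice and checks all 10 row selections:
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, s = 5, 2                                    # 5 workers, tolerate 2 stragglers

H = rng.standard_normal((s, n))
H[:, -1] = -H[:, :-1].sum(axis=1)              # rows of H sum to 0, so H @ ones = 0

B = np.zeros((n, n))                           # rows = workers, columns = partitions
for i in range(n):
    support = [(i - j) % n for j in range(s + 1)]   # worker i holds partitions i, i-1, i-2
    B[i, support[0]] = 1.0
    # choose the other s entries so that H @ B[i] = 0, i.e. B[i] lies in null(H)
    B[i, support[1:]] = np.linalg.solve(H[:, support[1:]], -H[:, support[0]])

ones = np.ones(n)
for rows in combinations(range(n), n - s):     # all 10 ways to pick 3 surviving workers
    coeffs, *_ = np.linalg.lstsq(B[list(rows)].T, ones, rcond=None)
    assert np.allclose(B[list(rows)].T @ coeffs, ones)
print("any", n - s, "rows of B span the all-ones vector")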
ex-ch16-e09
Hard: Compare coded data shuffling (Ch 15) and coded MapReduce (Ch 16). What are their respective domains? Can they be combined?
Shuffling: ML inter-epoch data movement.
MapReduce: analytics query execution.
Different systems
Coded shuffling: ML training pipeline, regular epochs. Coded MapReduce: batch analytics jobs, one-shot processing.
Same formula
Both yield a multiplicative communication reduction on the order of the storage/computation redundancy (the cache-size factor for shuffling, the factor r for MapReduce) via MAN-style coding.
Composition
In hybrid pipelines (feature engineering + training): apply coded MapReduce for feature computation, coded shuffling for training data epoch shuffling. Distinct layers, both using coded caching principles.
ex-ch16-e10
Medium: Implement a toy K = 6, r = 2 coded MapReduce simulation in Python (pseudocode). Compare uncoded and coded shuffle message counts.
6 workers, 15 file batches ((6 choose 2) = 15, one per pair of workers).
Each XOR message serves an (r + 1) = 3-subset of workers.
Placement
Batches B_S are indexed by the 2-subsets S of workers; 15 in total. Worker k holds B_S iff k ∈ S.
Shuffle messages
Number of 3-subsets: (6 choose 3) = 20. Each message is an XOR of 3 intermediate-value chunks.
Code sketch
from itertools import combinations

K, r = 6, 2                                   # 6 workers, computation load 2
batches = list(combinations(range(K), r))     # C(6,2) = 15 file batches
# toy intermediate values: V[S][k] = chunk of batch S destined for reduce key k
V = {S: {k: hash((S, k)) & 0xFF for k in range(K)} for S in batches}
shuffle_msgs = []
for subset in combinations(range(K), r + 1):  # C(6,3) = 20 coded multicasts
    msg = 0
    for k in subset:                          # worker k is missing the chunk mapped by subset \ {k}
        msg ^= V[tuple(sorted(set(subset) - {k}))][k]
    shuffle_msgs.append(msg)
print(len(shuffle_msgs), "coded multicasts vs", K * (K - 1), "uncoded unicasts")
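Running the sketch with K = 6, r = 2 prints 20 coded multicasts against 30 uncoded unicasts under this toy accounting; each coded message is simultaneously useful to all 3 workers in its subset, which is where the shuffle saving comes from.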
ex-ch16-e11
Medium: Why does Reed-Solomon suffice for coded matrix multiplication (rather than needing more complex codes)?
Think about the MDS property.
Straggler = erasure.
MDS suffices
Any k-of-n MDS code tolerates the worst case of n - k erasures. Reed-Solomon is the canonical MDS code over a large enough field.
Alternative codes
LDPC/Polar codes offer better computation at high redundancy but complicate decoding. For matmul, the encoding overhead and decoder complexity of RS are typically acceptable; fancier codes are used for bandwidth-critical regimes.
ex-ch16-e12
Hard: The Joudeh-Caire 2024 result extends gradient coding to heterogeneous FL. What goes wrong when the original Tandon et al. scheme is applied directly to non-IID clients?
Encoding matrix assumes equal batch sizes per worker.
FL: clients have variable data distributions and sizes.
Original assumption
Tandon et al.: n workers, each with IID data of equal size. The encoding matrix weights all data partitions symmetrically.
FL reality
Non-IID: client i's data distribution and size differ. The target is the weighted gradient g = Σ_i w_i·g_i, with w_i reflecting client i's data size.
Joudeh-Caire fix
Extend the encoding matrix to handle weighted sums: any n - s rows must span the weight vector w, not the all-ones vector. This requires a modified RS-like construction.
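A minimal numerical illustration of the modified requirement, reusing the null-space idea from ex-ch16-e08 but choosing H with H·w = 0 so that any 3 rows span w. This only sketches the "span w instead of the all-ones vector" condition, not necessarily the construction in the cited work; the weights below are arbitrary:
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, s = 5, 2
w = rng.random(n) + 0.5                         # hypothetical per-client weights (data sizes)
H = rng.standard_normal((s, n))
H -= np.outer(H @ w, w) / (w @ w)               # project rows so that H @ w = 0

B = np.zeros((n, n))
for i in range(n):
    support = [(i - j) % n for j in range(s + 1)]
    B[i, support[0]] = 1.0
    B[i, support[1:]] = np.linalg.solve(H[:, support[1:]], -H[:, support[0]])

for rows in combinations(range(n), n - s):      # any 3 rows must now span w, not the all-ones vector
    coeffs, *_ = np.linalg.lstsq(B[list(rows)].T, w, rcond=None)
    assert np.allclose(B[list(rows)].T @ coeffs, w)
print("weighted gradient recoverable from any", n - s, "workers")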
ex-ch16-e13
Medium: Explain why coded MapReduce's combinatorial structure is exactly the MAN structure, and why this is so important intellectually.
Both use t-subset placement and (t + 1)-subset XORing (t = KM/N in MAN caching, t = r in LMYA).
What's the "cache" in LMYA?
Structural isomorphism
An LMYA file batch indexed by an r-subset of workers corresponds to a MAN subfile indexed by a t-subset of users; an LMYA (r + 1)-subset XOR corresponds to a MAN (t + 1)-subset XOR. Identical combinatorics.
Cache analog
"Cache" in LMYA = redundant computation capacity. A worker computing at higher redundancy has "more cache". This insight made coded caching portable to new domains.
Why important
Unified theory: one combinatorial toolkit serves many problems. Advances in one translate to others; e.g., PDA subpacketization (Ch 14) informs coded MapReduce design.
ex-ch16-e14
Hard: Design an experiment: suppose you're deploying coded MapReduce in a real datacenter. What metrics would you measure to validate the theoretical predictions?
Shuffle volume, wallclock, CPU utilization, network utilization.
Key metrics
- Shuffle bytes per job. Direct verification of L(r) = (1/r)(1 - r/K).
- Wallclock per job. End-to-end validation.
- CPU-hours for the map phase. Verify the r× overhead.
- Network utilization during shuffle. Saturation behavior.
- p99 latency tail. Straggler effects.
Baselines
Compare coded MapReduce (research prototype) vs uncoded (production Spark) on identical workloads. Control for data locality and network topology.
Expected outcomes
Shuffle volume matches within 5%. Wallclock reduction depends on shuffle/map ratio. Network utilization peaks reduced (less burstiness).
ex-ch16-e15
Hard: Coded computing "unifies" caching, shuffling, MapReduce, and gradient coding. Identify one open problem that cuts across all four.
Think about subpacketization, finite-blocklength, privacy, or heterogeneity.
Subpacketization
All four need on the order of (K choose t) subpackets/batches or (K choose t + 1) coded messages, which scales exponentially in K for a fixed per-node gain. PDA-style reduction works for caching; it remains open for LMYA, shuffling, and gradient coding.
Or: finite-blocklength
All four assume large file/block sizes for the coding to work. Finite-blocklength effects (small files) undercut the gains. Open research.
Or: privacy
All four have privacy variants (Wan-Caire privacy, private FL, etc.). Open: a unified privacy-rate tradeoff across all four primitives.