Exercises

ex-ch06-01

Easy

For an $(s, N) = (5, 50)$ gradient-coding scheme, what are the per-worker storage, the recovery threshold, and the straggler tolerance?
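A quick numeric check (a sketch, assuming the cyclic scheme in which each worker stores $s + 1$ of the $N$ data partitions):

```python
# Quick check for (s, N) = (5, 50), assuming the cyclic scheme in which
# each worker stores s + 1 of the N data partitions.
s, N = 5, 50

storage_fraction = (s + 1) / N   # fraction of the dataset per worker
recovery_threshold = N - s       # responses needed to decode exactly
straggler_tolerance = s          # workers that may be ignored per round

print(f"per-worker storage : {s + 1} partitions ({storage_fraction:.0%} of data)")
print(f"recovery threshold : {recovery_threshold} workers")
print(f"straggler tolerance: {straggler_tolerance} workers")
```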

ex-ch06-02

Easy

The aggregate gradient $\mathbf{G} = \sum_k \mathbf{g}_k$ is a matrix-vector product. Identify the matrix and the vector.
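A sanity check in code (the dimension $d$ and partition count here are arbitrary illustration values):

```python
import numpy as np

# Stack the N partial gradients (dimension d) as columns of a matrix;
# the aggregate G = sum_k g_k is that matrix times the all-ones vector.
rng = np.random.default_rng(0)
d, N = 4, 6
g = rng.standard_normal((d, N))   # column k is worker k's gradient g_k

G_sum = g.sum(axis=1)             # direct summation
G_matvec = g @ np.ones(N)         # matrix-vector product form

assert np.allclose(G_sum, G_matvec)
```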

ex-ch06-03

Easy

For exact gradient coding with $N = 200$ workers and i.i.d. exponential local computation times with rate $\lambda = 1$, find the expected per-round latency speedup when $s = 40$ stragglers are tolerated versus $s = 0$.
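A Monte Carlo harness for checking your answer against the closed form from exponential order statistics, $\mathbb{E}[X_{(k:N)}] = H_N - H_{N-k}$ for rate-1 exponentials:

```python
import numpy as np

# Per-round latency is the (N - s)-th order statistic of N i.i.d.
# Exp(1) worker times. For rate-1 exponentials E[X_(k:N)] = H_N - H_{N-k},
# so the speedup of s = 40 over s = 0 is H_N / (H_N - H_s).
rng = np.random.default_rng(0)
N, s, trials = 200, 40, 20_000

times = np.sort(rng.exponential(1.0, size=(trials, N)), axis=1)
lat_uncoded = times[:, N - 1].mean()        # wait for all N workers
lat_coded = times[:, N - s - 1].mean()      # wait for the fastest N - s

H = lambda n: sum(1.0 / i for i in range(1, n + 1))
print(f"simulated speedup  : {lat_uncoded / lat_coded:.3f}")
print(f"closed-form speedup: {H(N) / (H(N) - H(s)):.3f}")
```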

ex-ch06-04

Easy

State the difference between exact gradient coding and approximate gradient coding, in one sentence each.

ex-ch06-05

Medium

Design a $(2, 6)$-gradient-coding scheme. Specify the storage assignment $\mathcal{S}_k$ for each worker and write out an encoding matrix $\mathbf{B}$ that satisfies the cyclic construction.
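One way to generate such a $\mathbf{B}$ (a sketch of the randomized cyclic recipe; the chapter's own coefficient choice may differ, but the support pattern is the same):

```python
import numpy as np

# Cyclic construction for (s, N) = (2, 6): draw a random H in R^{s x N}
# whose rows sum to zero (so H @ ones = 0), then pick each row b_k of B
# with cyclic support S_k = {k, k+1, ..., k+s} (mod N) such that
# H @ b_k = 0. Every row of B then lies in null(H), an (N - s)-dim
# space containing the all-ones vector -- the key to decodability.
rng = np.random.default_rng(0)
s, N = 2, 6

H = rng.standard_normal((s, N))
H[:, -1] = -H[:, :-1].sum(axis=1)            # force H @ ones(N) = 0

B = np.zeros((N, N))
for k in range(N):
    supp = [(k + j) % N for j in range(s + 1)]   # S_k = {k, k+1, k+2} mod 6
    B[k, supp[0]] = 1.0
    # remaining s coefficients solve H[:, supp[1:]] x = -H[:, supp[0]]
    B[k, supp[1:]] = np.linalg.solve(H[:, supp[1:]], -H[:, supp[0]])

assert np.allclose(H @ B.T, 0)               # every row of B is in null(H)
print(np.round(B, 3))
```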

ex-ch06-06

Medium

Prove that the uncoded SGD scheme (no redundancy) corresponds to $s = 0$ in gradient coding, and verify that the recovery threshold $K = N - 0 = N$ is correct.

ex-ch06-07

Medium

Compute the per-worker storage for approximate gradient coding with $\epsilon = 0.05$, $N = 100$ workers, and straggler tolerance $s = 30$.
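The arithmetic, under the assumption that the storage fraction $\rho$ is set by the ex-ch06-12 bound $\mathbb{E}\|\widehat{\mathbf{G}} - \mathbf{G}\|^2 / \|\mathbf{G}\|^2 \leq 1/[(N - s)\rho]$ with target $\epsilon$; if the chapter's storage formula differs, substitute it here:

```python
import math

# Back-of-envelope arithmetic under the ex-ch06-12 bound
# E||G_hat - G||^2 / ||G||^2 <= 1/((N - s) * rho), with rho the
# per-worker storage fraction: set the bound equal to the target
# epsilon and solve for rho. (Assumed formula, not from this section.)
eps, N, s = 0.05, 100, 30

rho = 1.0 / ((N - s) * eps)          # required storage fraction
partitions = math.ceil(rho * N)      # out of N data partitions

print(f"rho = {rho:.3f}  ->  about {partitions} of {N} partitions per worker")
```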

ex-ch06-08

Medium

Derive (informally) the convergence-rate result for approximate gradient coding on a smooth, strongly convex objective. Identify the asymptotic floor.
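A toy experiment (not the derivation) that makes the floor visible; the error model here, $\epsilon$ times the norm of the stacked per-partition gradients, is one common regime and may differ from the chapter's exact model:

```python
import numpy as np

# Minimize the strongly convex average f(x) = (1/N) sum_k 0.5||x - c_k||^2.
# At the optimum the aggregate gradient vanishes but the per-partition
# gradients do not, so an epsilon-relative error on the stacked partial
# gradients leaves a floor that grows with epsilon instead of vanishing.
rng = np.random.default_rng(0)
d, N, steps, lr = 5, 20, 3000, 0.1
c = rng.standard_normal((N, d))
x_star = c.mean(axis=0)                           # unique minimizer

for eps in [0.0, 0.01, 0.1]:
    x = np.zeros(d)
    for _ in range(steps):
        gk = x - c                                # per-partition gradients
        noise = rng.standard_normal(d)
        noise *= eps * np.linalg.norm(gk) / np.linalg.norm(noise)
        x -= lr * (gk.mean(axis=0) + noise)       # approximate aggregate
    print(f"eps = {eps:4.2f}: ||x - x*|| = {np.linalg.norm(x - x_star):.2e}")
```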

ex-ch06-09

Medium

Explain why coded gradient computation does not provide privacy against the master, and how it can be combined with Bonawitz-style secure aggregation to provide both straggler tolerance and privacy.
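A toy illustration of the masking half of the argument (real Bonawitz secure aggregation adds key agreement and dropout recovery via secret sharing; this only shows why the aggregate survives pairwise masking):

```python
import numpy as np

# Pairwise masks cancel in the sum, so the master learns sum_k g_k
# without seeing any individual g_k.
rng = np.random.default_rng(0)
n, d, MOD = 5, 8, 2**32

grads = rng.integers(0, MOD, size=(n, d), dtype=np.uint64)  # quantized gradients
masked = grads.copy()
for i in range(n):
    for j in range(i + 1, n):
        pair_rng = np.random.default_rng(1000 + i * n + j)  # seed shared by (i, j)
        mask = pair_rng.integers(0, MOD, size=d, dtype=np.uint64)
        masked[i] = (masked[i] + mask) % MOD                # i adds the mask
        masked[j] = (masked[j] - mask) % MOD                # j subtracts it

assert np.array_equal(masked.sum(axis=0) % MOD, grads.sum(axis=0) % MOD)
```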

ex-ch06-10

Medium

Compare gradient sparsification (top-$K$) and approximate gradient coding in terms of (i) what is approximated, (ii) where in the pipeline the approximation happens, and (iii) when each pays off.
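For reference when answering (i), a minimal top-$K$ sparsifier; note that it approximates the gradient values at each worker before communication, whereas approximate gradient coding keeps local gradients exact and approximates the aggregate:

```python
import numpy as np

# Keep only the k largest-magnitude entries of a gradient vector.
def top_k(g: np.ndarray, k: int) -> np.ndarray:
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]   # indices of k largest |g_i|
    out[idx] = g[idx]
    return out

g = np.array([0.1, -3.0, 0.02, 2.5, -0.4])
print(top_k(g, k=2))                            # keeps -3.0 and 2.5
```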

ex-ch06-11

Hard

Prove that the cyclic gradient-coding construction's encoding matrix $\mathbf{B}$ has the following property: for every responder set $\mathcal{T}$ of size $N - s$, the linear system $\mathbf{D} \mathbf{B}_{\mathcal{T}, :} = \mathbf{1}_N^T$ has a solution $\mathbf{D}$.
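For small instances the property can be brute-forced before proving it; this harness reuses the nullspace recipe sketched under ex-ch06-05 (an assumption, not necessarily the chapter's construction verbatim):

```python
import itertools
import math
import numpy as np

# Brute-force decodability check for (s, N) = (2, 6): build B by the
# nullspace recipe, then verify that for *every* responder set T of
# size N - s the system D @ B[T, :] = ones(N) has a solution
# (least-squares residual ~ 0).
rng = np.random.default_rng(0)
s, N = 2, 6

H = rng.standard_normal((s, N))
H[:, -1] = -H[:, :-1].sum(axis=1)               # H @ ones(N) = 0
B = np.zeros((N, N))
for k in range(N):
    supp = [(k + j) % N for j in range(s + 1)]
    B[k, supp[0]] = 1.0
    B[k, supp[1:]] = np.linalg.solve(H[:, supp[1:]], -H[:, supp[0]])

ones = np.ones(N)
for T in itertools.combinations(range(N), N - s):
    D, *_ = np.linalg.lstsq(B[list(T), :].T, ones, rcond=None)
    assert np.allclose(B[list(T), :].T @ D, ones), f"undecodable: {T}"
print(f"all {math.comb(N, N - s)} responder sets of size {N - s} are decodable")
```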

ex-ch06-12

Hard

Derive the random-sparse approximate gradient-coding error bound $\mathbb{E}\|\widehat{\mathbf{G}} - \mathbf{G}\|^2 / \|\mathbf{G}\|^2 \leq \mathcal{O}(1/[(N - s)\rho])$ informally. What role does random-matrix concentration play?
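A Monte Carlo sanity check of the scaling (the Bernoulli($\rho$) storage model below is an assumption chosen for simplicity):

```python
import numpy as np

# Each responder stores an i.i.d. Bernoulli(rho) subset of the N
# partitions and reports (1/rho) * sum of its stored partial gradients
# (unbiased for G); the master averages the N - s responses. The
# relative squared error then tracks the 1/((N - s) * rho) scaling.
rng = np.random.default_rng(0)
N, d, trials = 100, 50, 500

for s, rho in [(0, 0.2), (30, 0.2), (30, 0.4)]:
    errs = []
    for _ in range(trials):
        g = rng.standard_normal((N, d))                      # partials g_k
        G = g.sum(axis=0)
        mask = (rng.random((N - s, N)) < rho).astype(float)  # responders' S_k
        G_hat = (mask @ g / rho).mean(axis=0)
        errs.append(np.sum((G_hat - G) ** 2) / np.sum(G ** 2))
    print(f"s={s:2d}, rho={rho}: mean rel. error {np.mean(errs):.3f}"
          f"  vs 1/((N-s)rho) = {1 / ((N - s) * rho):.3f}")
```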

ex-ch06-13

Hard

Sketch a heterogeneous gradient-coding scheme where workers have different storage budgets $b_k$. Conjecture the optimal recovery threshold and discuss what makes the problem harder than the symmetric case.
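One plausible starting point (a conjecture, not a known-optimal scheme): since every partition must be replicated $s + 1$ times to survive $s$ stragglers, the budgets can support at most $s^\star = \lfloor \sum_k b_k / N \rfloor - 1$:

```python
# Conjectured tolerance s* = floor(sum(b_k) / N) - 1, with a greedy
# placement that replicates each partition s* + 1 times on the workers
# with the most remaining budget (never twice on the same worker).
# The greedy pass can fail for adversarial budgets -- characterizing
# when is part of the exercise.
def assign(budgets, N, s):
    remaining = list(budgets)
    S = [set() for _ in budgets]
    for p in range(N):
        for _ in range(s + 1):
            k = max((k for k in range(len(budgets)) if p not in S[k]),
                    key=lambda k: remaining[k])
            assert remaining[k] > 0, "budgets insufficient"
            S[k].add(p)
            remaining[k] -= 1
    return S

budgets = [2, 2, 3, 3, 4, 6]          # heterogeneous storage, N = 6 workers
N = len(budgets)
s_star = sum(budgets) // N - 1        # conjectured tolerance: 2 stragglers
for k, Sk in enumerate(assign(budgets, N, s_star)):
    print(f"worker {k} (budget {budgets[k]}): partitions {sorted(Sk)}")
```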

ex-ch06-14

Hard

Compute the joint storage / communication / latency trade-off for the composition of (i) $(s, N)$ gradient coding and (ii) Bonawitz secure aggregation, on a federated-learning round with $n$ users, $d$-dimensional gradients, and $b$-bit precision. Identify the dominant cost.
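A toy cost model for the communication leg only (the key-share size $\kappa$ and the omission of signatures and dropout handling are illustrative assumptions, not from the chapter):

```python
# Per-user upload for the coded gradient is d * b bits; Bonawitz-style
# secure aggregation adds O(n) small messages per user (pairwise key
# material and secret shares, kappa bits per peer), so the per-user
# upload is roughly d * b + (n - 1) * kappa.
n, d, b = 1_000, 10**6, 32
kappa = 256                              # bits per key share (assumption)

coded_upload = d * b                     # bits: coded gradient vector
secagg_overhead = (n - 1) * kappa        # bits: pairwise masks/shares

print(f"gradient upload : {coded_upload / 8e6:8.2f} MB")
print(f"secagg overhead : {secagg_overhead / 8e6:8.3f} MB")
print("dominant cost   :",
      "gradient" if coded_upload > secagg_overhead else "secagg")
```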

ex-ch06-15

Challenge

Open problem. For non-convex deep-learning models, derive (or empirically validate) a bound on the convergence-rate degradation of approximate gradient coding with relative error $\epsilon$. The convex-case bound (§6.3, Thm. 6.3.2) gives an asymptotic floor of $\mathcal{O}(\epsilon)$; characterize whether the same holds for non-convex losses.
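A minimal empirical harness one could start from (all architecture and hyperparameter choices below are illustrative assumptions):

```python
import numpy as np

# Full-batch gradient descent on a tiny two-layer tanh network, with
# each gradient perturbed by relative error epsilon. Sweeping epsilon
# and fitting the loss plateau probes whether the O(epsilon) floor from
# the convex case persists for non-convex losses.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 8))
y = np.sin(X.sum(axis=1, keepdims=True))        # non-convex regression task
W1 = rng.standard_normal((8, 16)) * 0.3
W2 = rng.standard_normal((16, 1)) * 0.3

def loss_and_grads(A, B):
    H = np.tanh(X @ A)                          # hidden activations
    err = H @ B - y
    dB = 2 * H.T @ err / len(X)
    dA = 2 * X.T @ ((err @ B.T) * (1 - H**2)) / len(X)
    return float((err**2).mean()), dA, dB

for eps in [0.0, 0.01, 0.1]:
    A, B = W1.copy(), W2.copy()
    for _ in range(2000):
        _, dA, dB = loss_and_grads(A, B)
        scale = np.sqrt((dA**2).sum() + (dB**2).sum())
        nA = rng.standard_normal(dA.shape)
        nB = rng.standard_normal(dB.shape)
        nrm = np.sqrt((nA**2).sum() + (nB**2).sum())
        A -= 0.1 * (dA + eps * scale * nA / nrm)   # epsilon-relative error
        B -= 0.1 * (dB + eps * scale * nB / nrm)
    print(f"eps = {eps:4.2f}: final loss = {loss_and_grads(A, B)[0]:.4f}")
```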