Exercises
ex-ch01-01
Easy. A MapReduce job has $K$ workers storing equal disjoint partitions of an $F$-GB intermediate file. How much aggregate network traffic does an uncoded shuffle generate?
Use $L_{\text{uncoded}} = \left(1 - \frac{1}{K}\right)F$.
Plug in the given $F$ (in GB) and $K$.
Apply the formula
Each reducer is responsible for $F/K$ GB of intermediate data, of which a fraction $(K-1)/K$ resides on other workers. Summing over all $K$ reducers, the shuffle moves $K \cdot \frac{F}{K} \cdot \frac{K-1}{K} = \left(1 - \frac{1}{K}\right)F$ GB.
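A quick numerical check of the formula (the values of $K$ and $F$ below are illustrative, not the exercise's):

```python
def uncoded_shuffle_traffic(K, F):
    """Aggregate uncoded-shuffle traffic in GB: each of the K reducers
    fetches the (K-1)/K fraction of its F/K-GB slice held remotely."""
    return (1 - 1 / K) * F

# illustrative: 10 workers, a 10-GB intermediate file
print(uncoded_shuffle_traffic(10, 10.0))  # 9.0 GB
```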
ex-ch01-02
Easy. Ten workers have i.i.d. exponential task-completion times with rate $\mu$. Find $\mathbb{E}[X_{(10)}]$, the expected wait for the slowest worker.
Recall $\mathbb{E}[X_{(n)}] = H_n/\mu$, where $H_n = \sum_{i=1}^{n} 1/i$.
Compute $H_{10}$ numerically.
Harmonic number
$H_{10} = 1 + \frac{1}{2} + \cdots + \frac{1}{10} \approx 2.93$.
Expected latency
$\mathbb{E}[X_{(10)}] = H_{10}/\mu \approx 2.93/\mu$ time units, almost three times the per-worker mean of $1/\mu$.
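The harmonic-number computation is easy to verify numerically (a minimal sketch; $\mu = 1$ is an illustrative rate):

```python
from math import fsum

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return fsum(1 / i for i in range(1, n + 1))

mu = 1.0  # illustrative rate
print(harmonic(10) / mu)  # expected wait for the slowest of 10 workers, ~2.93
```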
ex-ch01-03
Easy. A federated-learning round uses $N$ users, model size $d$ parameters, and $b$ bits per scalar. Compute the total per-round uplink traffic.
Use Total $= N \cdot d \cdot b$ bits.
Plug in
Each user uploads $d \cdot b$ bits, so the round costs $N \cdot d \cdot b$ bits in total; dividing by $8 \times 10^9$ expresses this in GB per round.
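The count is a straight product; a sketch with illustrative values (the exercise's own $N$, $d$, $b$ are not assumed here):

```python
def uplink_bits(N, d, b):
    """Total per-round uplink: N users x d parameters x b bits per scalar."""
    return N * d * b

# illustrative: 1000 users, a 10^9-parameter model, 32-bit scalars
total = uplink_bits(1000, 10**9, 32)
print(total / 8e9, "GB per round")  # 4000.0 GB per round
```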
ex-ch01-04
Easy. True or false: adding more workers to a synchronous distributed iteration always decreases the per-iteration wall-clock time.
Think about the formula $\mathbb{E}[X_{(n)}] = H_n/\mu$.
Answer
False. The per-iteration latency is $\mathbb{E}[X_{(n)}] = H_n/\mu \approx (\ln n)/\mu$, which increases with $n$. Without redundancy, adding workers only slows things down: each extra worker adds one more exponential tail that can delay the iteration.
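The monotone growth of the barrier cost $H_n/\mu$ can be checked directly:

```python
from math import fsum

def harmonic(n):
    return fsum(1 / i for i in range(1, n + 1))

# Synchronous barrier cost (in units of 1/mu) for growing clusters:
costs = [harmonic(n) for n in (1, 2, 5, 10, 100)]
print(costs)
print(all(a < b for a, b in zip(costs, costs[1:])))  # True: strictly increasing
```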
ex-ch01-05
Medium. Derive the coded communication load $L_{\text{coded}}(r) = \frac{1}{r}\left(1 - \frac{r}{K}\right)$ for MapReduce with computation load $r$, starting from the pair endpoints $\left(1,\, 1 - \frac{1}{K}\right)$ and $(K,\, 0)$. Verify that the curve is convex and interpolates the two endpoints.
Check the function's value at $r = 1$ and $r = K$.
Compute the second derivative with respect to $r$.
Endpoint checks
At $r = 1$: $L_{\text{coded}}(1) = 1 - \frac{1}{K}$. ✓ At $r = K$: $L_{\text{coded}}(K) = \frac{1}{K}\left(1 - \frac{K}{K}\right) = 0$. ✓
Convexity
Let $f(r) = \frac{1}{r}\left(1 - \frac{r}{K}\right) = \frac{1}{r} - \frac{1}{K}$. Then $f'(r) = -\frac{1}{r^2}$ and $f''(r) = \frac{2}{r^3} > 0$ for $r > 0$. Hence $f$ is convex in $r$, as required.
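A numerical sanity check of the endpoints and of discrete convexity (illustrative $K = 10$):

```python
def coded_load(r, K):
    """Coded MapReduce communication load L(r) = (1/r) * (1 - r/K)."""
    return (1 / r) * (1 - r / K)

K = 10
print(coded_load(1, K), coded_load(K, K))  # endpoints: 0.9 and 0.0
# discrete convexity: all second differences are positive
vals = [coded_load(r, K) for r in range(1, K + 1)]
print(all(vals[i - 1] - 2 * vals[i] + vals[i + 1] > 0 for i in range(1, K - 1)))
```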
ex-ch01-06
Medium. For a redundant scheme with recovery threshold $k$ out of $n$ workers and i.i.d. exponential task times (rate $\mu$), derive $\mathbb{E}[X_{(k)}]$ and compute its limit as $n \to \infty$ with $k = \alpha n$.
Use the memoryless decomposition $X_{(i)} - X_{(i-1)} \sim \mathrm{Exp}\big((n - i + 1)\mu\big)$.
Recognize $\mathbb{E}[X_{(k)}]$ as a partial sum of the harmonic series.
Order-statistic gaps
$X_{(k)} = \sum_{i=1}^{k} \big(X_{(i)} - X_{(i-1)}\big)$ with independent gaps $X_{(i)} - X_{(i-1)} \sim \mathrm{Exp}\big((n-i+1)\mu\big)$, so $\mathbb{E}[X_{(k)}] = \frac{1}{\mu}\sum_{i=1}^{k}\frac{1}{n-i+1} = \frac{H_n - H_{n-k}}{\mu}$.
Asymptotic
With $k = \alpha n$, $H_n - H_{n-k} \to \ln\frac{n}{n-k} = \ln\frac{1}{1-\alpha}$. Hence $\mathbb{E}[X_{(\alpha n)}] \to \frac{1}{\mu}\ln\frac{1}{1-\alpha}$.
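The convergence to $\frac{1}{\mu}\ln\frac{1}{1-\alpha}$ can be observed numerically (illustrative $\alpha = 0.8$, $\mu = 1$):

```python
from math import fsum, log

def expected_kth(n, k, mu=1.0):
    """E[X_(k)] = (H_n - H_{n-k}) / mu for i.i.d. Exp(mu) task times."""
    return fsum(1 / i for i in range(n - k + 1, n + 1)) / mu

alpha = 0.8
for n in (100, 10_000, 1_000_000):
    print(n, expected_kth(n, int(alpha * n)))
print(log(1 / (1 - alpha)))  # limit ln(1/(1-alpha)) ~ 1.6094
```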
ex-ch01-07
Medium. A parameter server has $B$ Gbps of ingress bandwidth and a round must complete in under $1$ second. The model has $d$ parameters at $b$ bits per scalar. What is the maximum number of users that can be aggregated per round in the plaintext (no-privacy) baseline?
Compute $d \cdot b$ bits per user, then divide the available bandwidth by that.
Per-user uplink
Each user uploads the full gradient: $d \cdot b$ bits per user.
Users per second
At $B \times 10^9$ bits/s of ingress and $d \cdot b$ bits per user, the server can ingest $\frac{B \times 10^9}{d b} \approx 0.3$ users per second, or, stated in the opposite direction, just 0.3 users fit in the $1$-second budget. To aggregate many users per round we either need compression (e.g., 1-bit quantization cuts the per-user upload by a factor of $b$) or a parallel-aggregation architecture.
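The arithmetic, with illustrative constants chosen to reproduce the $\approx 0.3$ users/s figure (a 10 Gbps link and a $10^9$-parameter model at 32 bits are assumptions, not the exercise's stated values):

```python
B = 10e9   # ingress bandwidth in bits/s (assumed: 10 Gbps)
d = 10**9  # model parameters (assumed)
b = 32     # bits per scalar (assumed)

bits_per_user = d * b                 # full-gradient upload per user
users_per_second = B / bits_per_user  # ~0.31: under one user per second
print(bits_per_user, users_per_second)
```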
ex-ch01-08
Medium. Suppose a federated-learning service claims "privacy by design" because "raw data never leaves the device". Identify at least three ways the service can still leak information to the cloud provider, and map each to the chapter of this book that addresses it.
Gradient inversion.
Traffic analysis of activity patterns.
Server-side inference on aggregate gradients.
Gradient inversion (DLG)
A plaintext gradient can be inverted to reconstruct the training samples. Addressed in Chapter 10 (secure aggregation).
Activity side-channels
Timing and existence of uploads reveal that a particular user holds data matching the current model. Addressed by differential privacy and cover-traffic protocols in Chapter 18.
Model memorization
Even the aggregated model weights can memorize individual training samples. Addressed by DP-SGD (Chapter 9) and the PIR-style mechanisms of Part IV when applied to model serving.
ex-ch01-09
Medium. Consider a coded scheme with recovery threshold $k = 8$ for an iteration that uses $n = 10$ workers, each with i.i.d. exponential task times of rate $\mu$. Compute $\mathbb{E}[X_{(8)}]$ and compare with $\mathbb{E}[X_{(10)}]$.
Use $\mathbb{E}[X_{(k)}] = \frac{H_n - H_{n-k}}{\mu}$.
With recovery
$\mathbb{E}[X_{(8)}] = \frac{H_{10} - H_2}{\mu} = \frac{2.93 - 1.5}{\mu} \approx \frac{1.43}{\mu}$.
Without recovery
$\mathbb{E}[X_{(10)}] = \frac{H_{10}}{\mu} \approx \frac{2.93}{\mu}$.
Ratio
$\frac{\mathbb{E}[X_{(10)}]}{\mathbb{E}[X_{(8)}]} \approx \frac{2.93}{1.43} \approx 2.05$. Tolerating just 2 stragglers cuts the iteration latency by roughly a factor of $2$. The cost, of course, is the storage/computation redundancy that Chapter 5 will quantify.
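The comparison can be reproduced numerically (illustrative $\mu = 1$):

```python
from math import fsum

def expected_kth(n, k, mu=1.0):
    """E[X_(k)] = (H_n - H_{n-k}) / mu for i.i.d. Exp(mu) task times."""
    return fsum(1 / i for i in range(n - k + 1, n + 1)) / mu

with_recovery = expected_kth(10, 8)      # wait for the fastest 8 of 10, ~1.43
without_recovery = expected_kth(10, 10)  # wait for all 10, ~2.93
print(with_recovery, without_recovery, without_recovery / with_recovery)
```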
ex-ch01-10
Medium. Describe a threat model where an honest-but-curious parameter server can be implemented with no privacy-preserving protocol at all, and explain what has to be true of the deployment for this to be defensible.
Think about who the server is and what legal / contractual guarantees exist.
When does an attacker not have access to the server?
Within a single organization
If all users are employees of one organization, the parameter server is run by that organization's SRE team, and the data is not regulated externally (HIPAA, GDPR, export controls), then the server is not an adversary — it is part of the trusted computing base. In this setting no privacy-preserving protocol is required, because there is no adversary to defend against.
When the defense breaks
The defense fails the moment any of the following becomes true: (i) users include external parties, (ii) the server is subject to a court order or subpoena, (iii) the server can be compromised by an attacker, (iv) the server's personnel are not contractually bound to the data owner. Each of these is the threat model addressed in Chapters 10–12.
ex-ch01-11
Medium. Plot (analytically) the ratio $\mathbb{E}[X_{(n)}]/\mathbb{E}[X_1]$ as a function of $n$ for i.i.d. exponential task times. What does this tell you about the "price of synchrony"?
$\mathbb{E}[X_{(n)}] = H_n/\mu$, $\mathbb{E}[X_1] = 1/\mu$.
Compute ratio
$\frac{\mathbb{E}[X_{(n)}]}{\mathbb{E}[X_1]} = H_n \approx \ln n + \gamma$. For $n = 10$: $H_{10} \approx 2.93$. For $n = 100$: $H_{100} \approx 5.19$. For $n = 10\,000$: $H_{10\,000} \approx 9.79$.
Takeaway
The price of synchrony is sub-linear ($H_n = \Theta(\log n)$), but still substantial: a cluster of 10 000 workers pays almost ten times its per-worker mean time on every synchronous barrier. This is one of the central empirical reasons coded computing exists.
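The three values, with the $\ln n + \gamma$ approximation alongside:

```python
from math import fsum, log

GAMMA = 0.5772156649  # Euler-Mascheroni constant

def harmonic(n):
    return fsum(1 / i for i in range(1, n + 1))

for n in (10, 100, 10_000):
    # exact harmonic number vs. its ln(n) + gamma approximation
    print(n, round(harmonic(n), 4), round(log(n) + GAMMA, 4))
```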
ex-ch01-12
Hard. Show that for arbitrary i.i.d. task-time distributions (not necessarily exponential) with finite variance, the expected latency $\mathbb{E}[X_{(n)}]$ is monotone non-decreasing in $n$. Find a counterexample to the claim "$\mathbb{E}[X_{(n+1)}] - \mathbb{E}[X_{(n)}]$ is decreasing in $n$".
Use a monotone coupling.
Consider heavy-tailed distributions.
Monotone coupling
Couple the $n$-sample system to the $(n+1)$-sample system by letting the first $n$ samples coincide. Then $\max(X_1,\dots,X_n) \le \max(X_1,\dots,X_{n+1})$ pointwise, hence $\mathbb{E}[X_{(n:n)}] \le \mathbb{E}[X_{(n+1:n+1)}]$.
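The coupling argument is mechanical to demonstrate: feed both systems one shared stream of draws and watch the running maximum (Pareto draws here are illustrative; the argument is distribution-free):

```python
import random

random.seed(0)
# Shared randomness couples every n-worker system to the (n+1)-worker one:
# the larger system sees the same draws plus one extra.
times = [random.paretovariate(2.5) for _ in range(1000)]
running_max = [max(times[: n + 1]) for n in range(len(times))]
# Pointwise monotone, hence monotone in expectation.
print(all(a <= b for a, b in zip(running_max, running_max[1:])))  # True
```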
Counter-example for the increment claim
For a Pareto distribution with shape parameter $\alpha$ just above $2$ (finite variance but a heavy tail), the increments $\mathbb{E}[X_{(n+1)}] - \mathbb{E}[X_{(n)}]$ can be non-monotone: they may grow in $n$ before shrinking, reflecting the heavy-tailed nature of the distribution. The intuition is that adding a new worker has a non-trivial chance of producing a very large outlier that dominates the order statistic.
ex-ch01-13
Hard. A distributed-training protocol needs to tolerate $t$ Byzantine workers (who send adversarially crafted outputs). Argue informally that any such protocol must have communication complexity at least $\Omega\big((N + t)\,d\big)$ in the large-$N$ regime. (A precise proof requires tools from Chapter 11.)
The plaintext baseline costs $N d$ scalars.
Each Byzantine worker's claim must be checkable, which requires additional scalars of redundancy.
Plaintext baseline
Section 1.3's aggregation-cost bound gives $N d$ scalars of uplink even without privacy or robustness.
Byzantine overhead
To detect (or at least bound) the corruption introduced by $t$ Byzantine workers, the protocol must receive enough redundant information to localize the corruption. Any linear-algebraic argument (e.g., Reed–Solomon error correction over gradient-sized vectors) requires at least $2t$ redundant scalars per coordinate, hence $\Omega(t d)$ in total.
Sum and interpret
Adding baseline and overhead gives $\Omega(N d + t d) = \Omega\big((N + t) d\big)$. Chapter 11 makes this precise and characterizes the achievable rate region.
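The resulting lower bound is a two-term sum; a sketch of the count (the $2t$ redundancy factor follows the Reed-Solomon-style argument, and the concrete values are illustrative):

```python
def min_uplink_scalars(N, t, d):
    """Informal lower bound on uplink scalars: N*d for the plaintext
    aggregate plus 2*t*d of redundancy to localize t corruptions."""
    return N * d + 2 * t * d

# illustrative: 100 workers, 5 Byzantine, a 10^6-parameter model
print(min_uplink_scalars(100, 5, 10**6))  # 110000000
```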
ex-ch01-14
Hard. Suppose you deploy a federated-learning service with $N = 10^6$ users, model size $d = 10^9$ parameters, 32-bit gradients, and a training budget of 100 GB of total per-round uplink (aggregate across users). What is the largest fraction of the model gradient that each user can upload? Propose a sparsification scheme consistent with that budget and discuss the convergence implications.
Per-user upload budget = total / $N$.
Fraction = budget / ($d \cdot b$).
Per-user budget
Total $= 100$ GB $= 8 \times 10^{11}$ bits. Divide by $N = 10^6$ users: $8 \times 10^5$ bits/user/round.
Fraction
Model $= d \cdot b = 10^9 \times 32 = 3.2 \times 10^{10}$ bits. Fraction $= \frac{8 \times 10^5}{3.2 \times 10^{10}} = 2.5 \times 10^{-5}$, about 25 000 out of $10^9$ coordinates.
Sparsification
Top-$k$ with $k = 25\,000$ keeps the largest-magnitude 0.0025% of gradient entries each round. Convergence remains provable under standard error-feedback techniques (Stich et al., 2018), at the cost of a slowdown in round count that grows as the kept fraction $k/d$ shrinks. The combined wall-clock cost (uplink × rounds) is actually smaller than plaintext at these parameters, illustrating why compression is standard in production FL.
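The budget arithmetic, assuming $10^6$ users and a $10^9$-parameter model at 32 bits (values consistent with the surviving 100-GB, 0.0025%, and 25 000-coordinate figures):

```python
N = 10**6               # users (assumed)
d = 10**9               # model parameters (assumed)
b = 32                  # bits per gradient entry
total_bits = 100 * 8e9  # 100 GB aggregate uplink budget, in bits

per_user_bits = total_bits / N      # 800000.0 bits per user per round
fraction = per_user_bits / (d * b)  # 2.5e-05 of the gradient
print(per_user_bits, fraction, round(fraction * d))  # k = 25000 coordinates
```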
ex-ch01-15
Challenge. Consider a hybrid scheme: $n$ workers, recovery threshold $k$, i.i.d. exponential task times with rate $\mu$, plus a limited asynchronous buffer that can absorb one worker's response one round early (i.e., the first-to-finish worker of the current round pre-starts its next round's work immediately). Derive the expected iteration latency and show where the scheme sits relative to the synchronous $k$-of-$n$ latency.
Write the iteration time as two nested order statistics.
Use independence across rounds, which follows from memorylessness.
Decompose
Let $T_{\text{sync}}$ be the latency of a pure synchronous $k$-of-$n$ round: $\mathbb{E}[T_{\text{sync}}] = \frac{H_n - H_{n-k}}{\mu}$. The hybrid scheme saves the head start that the buffer grants the first finisher of the previous round, which begins its next task at time $X_{(1)}$ instead of at the round barrier. By memorylessness this saving can be computed independently across rounds.
Expected savings
The first finisher's head start is $T_{\text{sync}} - X_{(1)}$, with $\mathbb{E}[X_{(1)}] = \frac{1}{n\mu}$. Amortized over many rounds this gives a per-round saving of at most $\mathbb{E}[T_{\text{sync}}] - \frac{1}{n\mu}$, realized only when the pre-started worker would otherwise have been among the $k$ responses the next round waits for.
Position
For moderate $k$ the hybrid scheme sits between the fully synchronous $k$-of-$n$ latency (an upper bound) and fully asynchronous operation (which eliminates waiting at the price of staleness). It is roughly equivalent to using a slightly smaller recovery threshold $k' < k$: buying latency at the cost of coded-storage overhead.
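The two quantities in the derivation can be sketched numerically (illustrative $n = 10$, $k = 8$, $\mu = 1$; the full hybrid dynamics are not simulated here):

```python
from math import fsum

def sync_latency(n, k, mu=1.0):
    """E[T_sync] = (H_n - H_{n-k}) / mu: wait for the fastest k of n."""
    return fsum(1 / i for i in range(n - k + 1, n + 1)) / mu

n, k, mu = 10, 8, 1.0
t_sync = sync_latency(n, k, mu)
head_start = t_sync - 1 / (n * mu)  # mean head start of the first finisher
print(t_sync, head_start)           # per-round saving is at most head_start
```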