Ferkans — Interactive Telecom Tutor

ex-ch11-01

Easy

Distinguish the Byzantine adversary from the honest-but-curious adversary of Chapter 10. Give one concrete example of each.

Solution

Honest-but-curious

Follows the protocol; passively analyzes messages to extract information. Example: Google's server in Gboard FL, running the protocol faithfully but inferring user typing patterns from aggregated gradients.

Byzantine

Actively deviates from the protocol — sends corrupted messages, refuses to participate, or colludes adaptively. Example: A compromised device in a cross-silo FL deployment uploading adversarial gradients to bias the trained model.

Defense implications

The two adversaries call for different protocols. Bonawitz (Ch. 10) handles the first; ByzSecAgg (this chapter) adds protection against the second.

ex-ch11-02

Easy

Why does Bonawitz's secure aggregation provide no defense against a Byzantine user?

Solution

Attack

Byzantine user $k$ uploads $\tilde{\mathbf{g}}_k = \mathbf{g}_k^{\text{adv}} + \text{masks}$ with $\mathbf{g}_k^{\text{adv}}$ arbitrary. Masks cancel in the aggregate; the Byzantine contribution remains.

Result

The server's aggregate is the honest sum plus $\sum_{k \in \mathcal{B}} \mathbf{g}_k^{\text{adv}}$, so Byzantine users can shift the aggregate arbitrarily. No mechanism in Bonawitz detects or filters this.
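A minimal sketch of this attack, assuming a simplified Bonawitz-style scheme with pairwise additive masks over the integers mod $2^{32}$ and no dropouts (real Bonawitz derives masks via key agreement and secret-shares seeds for dropout recovery; neither is modeled here). The helpers `masked_uploads` and `aggregate` are hypothetical. The fourth user is Byzantine: its masks still cancel, so its adversarial vector survives in the sum.

```python
import random

M = 2**32  # modulus for masked arithmetic (assumption for this sketch)

def masked_uploads(grads, rng):
    """Each pair (i, j) shares a random mask: user i adds it, user j subtracts it."""
    n, d = len(grads), len(grads[0])
    uploads = [[v % M for v in g] for g in grads]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.randrange(M) for _ in range(d)]
            for c in range(d):
                uploads[i][c] = (uploads[i][c] + mask[c]) % M
                uploads[j][c] = (uploads[j][c] - mask[c]) % M
    return uploads

def aggregate(uploads):
    """Server-side sum: the pairwise masks cancel coordinate by coordinate."""
    return [sum(col) % M for col in zip(*uploads)]

honest = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
g_adv = [1000, 0, 500]  # the Byzantine user's arbitrary vector
uploads = masked_uploads(honest + [g_adv], random.Random(0))
result = aggregate(uploads)
print(result)  # [1012, 15, 518]: honest sum [12, 15, 18] shifted by g_adv
```

Nothing in the sum distinguishes the masked adversarial upload from an honest one, which is the gap the filtering step has to close.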

Fix

ByzSecAgg adds coded distance computation + outlier filtering + vector commitments to handle Byzantine users.

ex-ch11-03

Easy

Name the three primitives composed in ByzSecAgg and briefly describe each role.

Solution

1. Ramp secret sharing

Chapter 3 §3.4. Produces shares $g$ times smaller than Shamir shares, plus the structure needed for Byzantine error correction. Sets the privacy threshold $T$ and the Byzantine tolerance $B$.

2. Coded distance computation

Chapter 8 §8.3. Computes $\|\mathbf{g}_i - \mathbf{g}_j\|^2$ on ramp shares via Lagrange Coded Computing, enabling outlier detection without learning individual gradients.

3. Vector commitments

Cryptographic primitive. Each user commits to its gradient before sharing, preventing Byzantine users from modifying shares after the protocol has begun.

Together

Privacy (ramp) + detection (coded) + integrity (commitments) = ByzSecAgg.

ex-ch11-04

Easy

For a ByzSecAgg deployment with $n = 100$ users, $T = 20$ colluders, and $B = 15$ Byzantine users, compute the ramp width $g$ and check feasibility.

Hint

$g = 2B + 1$; feasibility: $B \leq (n - T - 1)/3$.

Solution

Ramp width

$g = 2B + 1 = 2 \cdot 15 + 1 = 31$.

Feasibility

$B \leq (n - T - 1)/3 = (100 - 20 - 1)/3 = 79/3 \approx 26.3$. Since $B = 15 < 26.3$, the protocol is feasible.

Share size

Each user's share has $d / g = d / 31$ entries, i.e. $1/31$ of the full gradient size: a $31\times$ saving over Shamir sharing, whose shares are as large as the gradient itself.
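The arithmetic above is easy to script. A minimal sketch (the function name `ramp_params` is hypothetical) that takes this chapter's formulas $g = 2B + 1$ and $n \geq T + 3B + 1$ at face value:

```python
def ramp_params(n, T, B):
    """Ramp width, feasibility, and largest tolerable B, using this chapter's formulas."""
    g = 2 * B + 1                    # ramp width
    feasible = n >= T + 3 * B + 1    # equivalent to B <= (n - T - 1) / 3
    b_max = (n - T - 1) // 3         # largest integer B that still fits
    return g, feasible, b_max

g, feasible, b_max = ramp_params(n=100, T=20, B=15)
print(g, feasible, b_max)            # 31 True 26
print(f"share size: 1/{g} of the gradient")
```

Sweeping `B` upward from 15 shows the deployment stays feasible up to `B = 26` and fails at `B = 27`.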

ex-ch11-05

Medium

Explain the role of the vector commitment in ByzSecAgg. What goes wrong without it?

Solution

Role

Each user commits to its gradient $\mathbf{g}_k$ before sharing. The commitment $\mathcal{C}_k$ is a cryptographic digest that binds the user to a specific $\mathbf{g}_k$ and, because it is randomized, does not reveal it.

Without commitment — attack

A Byzantine user could send two inconsistent shares: a "mild" share to the distance-computation phase (which looks honest), then a "malicious" share to the reconstruction phase. The server would reconstruct a corrupted aggregate without the filter catching it.

With commitment

Shares must be openings of $\mathcal{C}_k$. Any deviation between distance and reconstruction shares is detectable via commitment verification, so Byzantine users cannot change their gradient between phases.
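A minimal sketch of the binding property, assuming a salted SHA-256 hash commitment over a gradient encoded as signed 64-bit integers (`commit` and `verify` are hypothetical helpers). ByzSecAgg needs a commitment against which each individual share can be checked without opening the whole gradient, which a plain hash does not provide; the sketch only shows why a committed user cannot later open to a different vector.

```python
import hashlib
import secrets
import struct

def encode(vec):
    """Serialize an integer vector as big-endian signed 64-bit values."""
    return b"".join(struct.pack(">q", v) for v in vec)

def commit(vec):
    """Return (commitment, opening nonce); the random nonce hides vec."""
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(nonce + encode(vec)).hexdigest(), nonce

def verify(commitment, nonce, vec):
    """Check that (nonce, vec) opens the commitment."""
    return hashlib.sha256(nonce + encode(vec)).hexdigest() == commitment

g_k = [3, -1, 4]
c_k, r_k = commit(g_k)
print(verify(c_k, r_k, g_k))         # True: honest opening
print(verify(c_k, r_k, [3, -1, 5]))  # False: a switched gradient is caught
```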

ex-ch11-06

Medium

Describe how ByzSecAgg computes pairwise distances $d_{ij} = \|\mathbf{g}_i - \mathbf{g}_j\|^2$ on ramp shares, referencing Chapter 8 §8.3's LCC framework.

Solution

Function degree

$d_{ij}$ is quadratic in the gradients ($d_f = 2$). If each ramp share is an evaluation of a polynomial of degree $t_2 - 1$ (Chapter 3 §3.4), the LCC recovery threshold is $K_{\text{rec}} = d_f(t_2 - 1) + 1 = 2t_2 - 1$ responses.

Each user's local computation

User $\ell$ holds ramp shares $s_{i, \ell}$ and $s_{j, \ell}$ and locally computes $\hat d^{(\ell)} = \|s_{i, \ell} - s_{j, \ell}\|^2$. This is one evaluation of a polynomial (quadratic in the inputs) whose value at $x = 0$ is $d_{ij}$.

Server reconstruction

The server collects $\hat d^{(\ell)}$ from at least $2t_2 - 1$ users. Lagrange interpolation on these values gives the polynomial's value at $x = 0$, which is the desired pairwise distance $d_{ij}$.
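A toy sketch of this pipeline, assuming plain Shamir sharing (one secret per polynomial, stored at $x = 0$) over $\mathbb{F}_p$ with $p = 2^{61} - 1$ instead of ramp sharing, and small integer gradients so that $d_{ij} < p$. With degree-$t$ shares, the local squared distance is an evaluation of a polynomial of degree $2t$, so any $2t + 1$ responses suffice. All function names are hypothetical.

```python
import random

P = 2**61 - 1  # prime field modulus (assumption for the sketch)

def share_vector(vec, t, xs, rng):
    """Shamir-share each coordinate with a random degree-t polynomial whose constant term is the secret."""
    polys = [[v % P] + [rng.randrange(P) for _ in range(t)] for v in vec]
    return {x: [sum(c * pow(x, k, P) for k, c in enumerate(poly)) % P for poly in polys] for x in xs}

def local_sq_dist(share_i, share_j):
    """User-side step: squared distance between two share vectors; no gradient is revealed."""
    return sum((a - b) ** 2 for a, b in zip(share_i, share_j)) % P

def interpolate_at_zero(points):
    """Server-side step: Lagrange interpolation of the response polynomial at x = 0."""
    total = 0
    for xk, yk in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xk:
                num = num * (-xm) % P
                den = den * (xk - xm) % P
        total = (total + yk * num * pow(den, P - 2, P)) % P
    return total

rng = random.Random(1)
n, t = 7, 2                      # 7 users, degree-2 sharing, so 2t + 1 = 5 responses needed
xs = list(range(1, n + 1))       # user l evaluates at x = l
g_i, g_j = [3, -1, 4], [1, 5, 9]
sh_i, sh_j = share_vector(g_i, t, xs, rng), share_vector(g_j, t, xs, rng)
responses = [(x, local_sq_dist(sh_i[x], sh_j[x])) for x in xs]
d_ij = interpolate_at_zero(responses[: 2 * t + 1])
print(d_ij)  # 65 = 2^2 + 6^2 + 5^2
```

Any other set of $2t + 1$ responses (for example `responses[2:]`) interpolates to the same value.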

Privacy

The server sees only the scalar $d_{ij}$ for each pair; no individual $\mathbf{g}_k$ is revealed. The distance matrix is exactly what Krum filtering needs, and beyond these distances the server learns nothing about individual gradients.

ex-ch11-07

Medium

Walk through Krum filtering for $n = 6$ users with $B = 1$ Byzantine. Suppose the pairwise distances (in arbitrary units) are as follows: users 1–5 have mutual distances $\sim 2$, and user 6 has distances $\sim 100$ to all others. Compute the Krum scores and identify the Byzantine user.

Solution

Setup

$n = 6, B = 1, n - B - 1 = 4$ . Each user's Krum score is the sum of 4 smallest distances to other users.

Krum scores

Users 1–5: each user's 4 nearest neighbors are the other honest users, each at distance $\sim 2$. Score: $\sim 4 \cdot 2 = 8$.

User 6: its 4 nearest neighbors are four of users 1–5 (all honest), each at distance $\sim 100$. Score: $\sim 4 \cdot 100 = 400$.

Filtering

Sorted scores: 5 honest users at $\sim 8$, 1 user (user 6) at $\sim 400$. Removing the $B = 1$ largest score eliminates user 6, which is correctly identified as Byzantine.

Aggregation

Surviving set $\mathcal{H} = \{1, \ldots, 5\}$. The server reconstructs $\mathbf{G}^* = \sum_{k \in \mathcal{H}} \mathbf{g}_k$.
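The walkthrough can be checked with a short sketch of the scoring rule used in this exercise: sum of the $n - B - 1$ smallest distances, then drop the $B$ largest scores. This is a variant; the original Krum rule scores each vector over its $n - B - 2$ nearest neighbors and keeps the single lowest score. Users are indexed from 0, so user 6 is index 5, and the helper names are hypothetical.

```python
def krum_scores(dist, B):
    """Sum of each user's n - B - 1 smallest distances to the other users (this exercise's variant)."""
    n = len(dist)
    k = n - B - 1  # original Krum would use n - B - 2 neighbors
    return [sum(sorted(dist[i][j] for j in range(n) if j != i)[:k]) for i in range(n)]

def drop_largest(scores, B):
    """Remove the B users with the largest scores; return the surviving indices."""
    worst = set(sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:B])
    return [i for i in range(len(scores)) if i not in worst]

n, B = 6, 1
# users 0..4 are honest (mutual distance 2); user 5 is at distance 100 from everyone
dist = [[0 if i == j else (100 if 5 in (i, j) else 2) for j in range(n)] for i in range(n)]
scores = krum_scores(dist, B)
print(scores)                   # [8, 8, 8, 8, 8, 400]
print(drop_largest(scores, B))  # [0, 1, 2, 3, 4]
```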

ex-ch11-08

Medium

Compute the per-round communication for ByzSecAgg at $n = 200$, $B = 20$, $d = 10^8$ and compare it with Bonawitz alone and with a naive Bonawitz + replicated Krum.

Solution

ByzSecAgg

Ramp width $g = 2B+1 = 41$. Per-user upload: $d/g \approx 2.4 \cdot 10^6$ scalars × 32 bits $\approx 7.8 \cdot 10^7$ bits $\approx 10$ MB. Aggregate: $n \cdot 10$ MB $+\ Bd/8$ bytes $\approx 2000$ MB $+\ 250$ MB. Total $\approx 2.25$ GB per round.

Bonawitz alone

Per-user: $d \cdot 32 = 3.2 \cdot 10^9$ bits = 400 MB. Aggregate: $n \cdot 400$ MB = 80 GB. No Byzantine defense.

Bonawitz + replicated Krum (naive)

Each gradient is replicated $B + 1 = 21$ times for Byzantine voting, which multiplies the 400 MB to 8.4 GB per user. Aggregate: $\approx 1.7$ TB. It also loses privacy, because Krum requires plaintext gradients.

Conclusion

ByzSecAgg is $\sim 750\times$ smaller than the naive composition AND preserves privacy. Compared to Bonawitz alone, ByzSecAgg uses $\sim 35\times$ less aggregate traffic, because its ramp shares are $41\times$ smaller per user. At $n = 200$, ByzSecAgg is a clear winner.
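A small calculator for these figures, assuming 32-bit scalars, 1 MB = $10^6$ bytes, and the exercise's accounting in which the $Bd$ overhead term is counted in bits (hence $Bd/8$ bytes). The function name `round_costs_mb` is hypothetical. It uses the unrounded share size ($\approx 9.76$ MB rather than 10 MB).

```python
def round_costs_mb(n, B, d, bits_per_scalar=32):
    """Per-round traffic in MB, following the accounting used in this exercise."""
    to_mb = lambda bits: bits / 8 / 1e6
    g = 2 * B + 1
    byz_user = to_mb(d / g * bits_per_scalar)   # one ramp share per user
    byz_total = n * byz_user + to_mb(B * d)     # plus the B*d-bit overhead term
    bon_user = to_mb(d * bits_per_scalar)       # full-size masked gradient
    bon_total = n * bon_user
    naive_total = n * bon_user * (B + 1)        # (B + 1)-fold replication
    return byz_user, byz_total, bon_total, naive_total

byz_user, byz_total, bon_total, naive_total = round_costs_mb(n=200, B=20, d=10**8)
print(f"{byz_user:.1f} MB/user, {byz_total:.0f} MB total")  # 9.8 MB/user, 2201 MB total
print(f"{bon_total:.0f} MB, {naive_total:.0f} MB")          # 80000 MB, 1680000 MB
print(f"{naive_total / byz_total:.0f}x, {bon_total / byz_total:.0f}x")  # 763x, 36x
```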

ex-ch11-09

Medium

Why does ByzSecAgg's feasibility constraint $B \leq (n - T - 1)/3$ have a factor of 3?

Hint

Think about the Reed-Solomon decoding requirement.

Solution

Ramp threshold

The ramp scheme has $t_2 = T + 2B + 1$, which allows reconstruction with up to $B$ errors. The number of surviving honest users must be at least $t_2$: $n - B \geq t_2$, i.e., $n - B \geq T + 2B + 1$.

Solve

$n \geq T + 3B + 1$ , or $B \leq (n - T - 1)/3$ .

Interpretation

Each Byzantine user "costs" 3 units of the feasibility budget: one for their direct contribution, two for Reed-Solomon error correction. The factor of 3 is structural, not arbitrary.

Compare with non-private Byzantine

Non-private Byzantine schemes (Krum without privacy) tolerate far more: Krum requires only $n > 2B + 2$, i.e. roughly $B < n/2$. Combining privacy with Byzantine tolerance is strictly harder, and the factor of 3 reflects that.

ex-ch11-10

Medium

Why does ByzSecAgg use ramp secret sharing instead of standard Shamir?

Solution

Share-size savings

Shamir $(t, n)$-sharing has a per-share size equal to the gradient size. Ramp $(t_1, t_2, n)$-sharing has a per-share size equal to the gradient size divided by $g$, where $g = t_2 - t_1$. For ByzSecAgg with $g = 2B+1$, this is a $(2B+1)$-fold saving.
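A minimal sketch of ramp (packed Shamir) sharing over $\mathbb{F}_p$ with $p = 2^{31} - 1$, assuming the $g$ secrets of one block sit at $x = -1, \ldots, -g$ and $T$ uniformly random values sit at $x = -(g+1), \ldots, -(g+T)$. The polynomial through these $T + g$ points has degree $T + g - 1$, so any $t_2 = T + g$ shares recover the whole block, while any $T$ shares are uniformly distributed. Each share is one field element standing in for $g$ secrets, which is the $g$-fold saving; a $d$-dimensional gradient would be split into $d/g$ such blocks. The point layout and function names are illustrative assumptions, not the construction of Chapter 3 §3.4.

```python
import random

P = 2**31 - 1  # prime field modulus (assumption for the sketch)

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for xk, yk in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xk:
                num = num * (x - xm) % P
                den = den * (xk - xm) % P
        total = (total + yk * num * pow(den, P - 2, P)) % P
    return total

def ramp_share(block, T, n, rng):
    """Pack the g secrets of `block` into one degree-(T + g - 1) polynomial; user l gets its value at x = l."""
    g = len(block)
    anchors = [(-(k + 1), v % P) for k, v in enumerate(block)]        # the g secrets
    anchors += [(-(g + k + 1), rng.randrange(P)) for k in range(T)]  # T random values
    return [(x, lagrange_eval(anchors, x)) for x in range(1, n + 1)]

def ramp_reconstruct(shares, g):
    """Recover the g secrets from any T + g shares."""
    return [lagrange_eval(shares, -(k + 1)) for k in range(g)]

T, g, n = 2, 3, 8
shares = ramp_share([10, 20, 30], T, n, random.Random(2))
print(ramp_reconstruct(shares[:T + g], g))     # [10, 20, 30]
print(ramp_reconstruct(shares[-(T + g):], g))  # [10, 20, 30]
```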

Byzantine error correction

The ramp gap $g = 2B + 1$ matches the Reed-Solomon minimum-distance requirement for correcting $B$ errors. The width therefore serves both purposes: share-size savings AND error-correction structure.

Privacy preservation

Ramp still guarantees perfect secrecy against $T = t_1$ colluders. The relaxation vs. Shamir (which has $g = 1$) is that coalitions of size strictly between $t_1$ and $t_2$ learn partial information about the shared gradient. The threat model caps the number of colluders at $T = t_1$, so this regime never arises and there is no extra privacy loss.

Summary

Ramp is a strictly better primitive for ByzSecAgg than Shamir. Chapter 3 §3.4 develops the ramp construction in detail.

ex-ch11-11

Hard

Sketch the proof that ByzSecAgg preserves information-theoretic privacy against $T$ colluders, even in the presence of $B$ Byzantine users.

Solution

Ramp privacy

Ramp $(T, T + 2B + 1, n-1)$-sharing (Chapter 3 §3.4) guarantees that any $T$ shares of a gradient are uniformly distributed (perfect secrecy). Hence any $T$ colluders see uniform shares and learn nothing about individual gradients.

Coded distance leakage

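The server sees the pairwise distances $d_{ij}$. These are unchanged by any orthogonal transformation (rotation or reflection) applied to all gradients, and the transformations that fix $\mathbf{G}^*$ leave the aggregate unchanged too, so the distances and $\mathbf{G}^*$ together do not determine any individual gradient. The distances are the only leakage the protocol accepts beyond $\mathbf{G}^*$, and they are exactly what Krum needs.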
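A small numerical check of the invariance, assuming made-up 2-D toy gradients: rotating every gradient by the same angle changes each vector but leaves the whole distance matrix unchanged, so the distances alone cannot tell the two gradient sets apart.

```python
import math

def pairwise_sq_dists(vecs):
    """Matrix of squared Euclidean distances between all pairs of vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in vecs] for u in vecs]

def rotate(vec, theta):
    """Rotate a 2-D vector by angle theta (radians)."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta), x * math.sin(theta) + y * math.cos(theta))

grads = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]
rotated = [rotate(v, 0.7) for v in grads]
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row1, row2 in zip(pairwise_sq_dists(grads), pairwise_sq_dists(rotated))
    for a, b in zip(row1, row2)
)
print(same)  # True: identical distances, different gradients
```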

Byzantine gradients are filtered

After Krum filtering, only the surviving set $\mathcal{H}$ contributes to the aggregate; Byzantine gradients are filtered out. The server learns $\mathbf{G}^* = \sum_{k \in \mathcal{H}} \mathbf{g}_k$ and, for each honest $k$, nothing beyond what $\mathbf{G}^*$ and the pairwise distances imply.

Combining

Colluders learn their own gradients (trivial), ramp shares of others (uniform over the ramp structure), and the pairwise distances. By composition, their total information about non-colluding honest gradients is bounded by what $\mathbf{G}^*$ and the pairwise distances imply, which is the target privacy guarantee. $\blacksquare$

ex-ch11-12

Hard

Derive the communication complexity of ByzSecAgg from first principles. Specifically, what is the per-user uplink and the aggregate per-round cost?

Solution

Per-user upload

Ramp share of gradient: $d / g = d / (2B+1)$ scalars.
Vector commitment: $O(\log d)$ hashes (a Merkle root plus an authentication path).
Distance computation contribution per pair: 1 scalar ( $\hat d^{(k)}$ for one pair). For all $\binom{n}{2}$ pairs, this is $O(n^2)$ scalars total across the protocol, or $O(n)$ per user.

Aggregate

$n$ users × $(d/(2B+1) + \log d + n)$ = $O(nd/(2B+1) + n \log d + n^2)$. For $B = \Theta(n)$: $O(d + n \log d + n^2)$. For $B = o(n)$: $O(nd/B + n \log d + n^2)$.

Simplified bound

Dominant terms: $O(n \log n + Bd)$ for the regime where $B$ is proportional to $n$ . Specifically, per-round bits $\sim n \log n + Bd$ — the theorem statement.

Compared to lower bound

Information-theoretic lower bound: $\Omega(nd + Bd)$ . ByzSecAgg's $O(n\log n + Bd)$ matches up to a $\log n$ factor.

ex-ch11-13

Hard

Describe an adaptive Byzantine attack that defeats Krum filtering in ByzSecAgg, and suggest a countermeasure.

Solution

Adaptive attack

Byzantine users coordinate to send gradients that are close to the honest cluster's centroid but shifted in a coordinated direction. Their pairwise distances to each other are small; distances to honest users are comparable to honest-honest distances. Krum scores are similar to honest users'.

Why it defeats Krum

Krum's decision rests on Byzantine users being outliers. If they cluster near honest gradients, they are not outliers — Krum picks one of them (or mixes their contribution into the aggregate) without filtering.

Countermeasure: Bulyan

Bulyan (El Mhamdi et al. 2018) applies trimmed-mean post-processing. Even if Krum picks a Byzantine user, the trimmed-mean step removes extremes per coordinate, limiting the Byzantine influence. Within ByzSecAgg, Bulyan's Krum stage can reuse the same pairwise distances, but its coordinate-wise trimming needs order comparisons on individual coordinates, which the distance computation alone does not provide.
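The coordinate-wise trimming that Bulyan relies on can be sketched in plaintext as follows. This shows only the trimmed-mean stage, not Bulyan's preceding Krum-based selection, and it ignores how the step would be evaluated on secret shares; `trimmed_mean` is a hypothetical helper.

```python
def trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean: per coordinate, drop the b smallest and b largest values, average the rest."""
    n = len(vectors)
    if n <= 2 * b:
        raise ValueError("need more than 2b vectors")
    out = []
    for column in zip(*vectors):
        kept = sorted(column)[b:n - b]
        out.append(sum(kept) / len(kept))
    return out

honest = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
byzantine = [[100.0, -50.0]]
print(trimmed_mean(honest + byzantine, b=1))  # [2.5, 10.5]
```

An attacker that keeps every coordinate inside the honest range is not trimmed, but it can then move each coordinate of the result only within the spread of the honest values.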

Other countermeasures

Segmented filtering: FedSeg (Sun et al. 2021) partitions the gradient into segments and filters per-segment.
Reputation tracking: Byzantine users flagged across rounds via reputation accumulation.
DP noise: Add post-aggregate DP noise to bound any surviving Byzantine contribution.

None is a universal fix — adaptive Byzantine robustness remains an open problem.

ex-ch11-14

Hard

Discuss the composition of ByzSecAgg with differential privacy. Is the composition sound? What is the resulting privacy / utility tradeoff?

Solution

Composition

Each user adds Gaussian noise $\mathcal{N}(0, \sigma^2)$ to its gradient before ramp-sharing. Noise flows through ByzSecAgg's pipeline and appears in the final aggregate.

Guarantees

Information-theoretic (from ByzSecAgg): the server learns only $\mathbf{G}^* + \text{noise}$, nothing about individual gradients.
Differential (from DP noise): provided each gradient is clipped to a bounded norm before noising, the noisy aggregate satisfies $(\epsilon, \delta)$-DP with respect to any single honest user's contribution.
Byzantine resilience: still $B$ Byzantine tolerance, though DP noise may degrade the Krum filter's effectiveness.
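A minimal sketch of the per-user noising step, assuming $\ell_2$ clipping to norm $C$ (treated as the sensitivity) and the classic Gaussian-mechanism calibration $\sigma = C\sqrt{2\ln(1.25/\delta)}/\epsilon$ from Dwork and Roth, which holds for $0 < \epsilon < 1$. Each user calibrates as if its own noise alone must protect it, which is conservative. The helper names and the clipping step are assumptions not stated in the exercise.

```python
import math
import random

def clip(vec, C):
    """Scale vec down so that its L2 norm is at most C."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v * min(1.0, C / norm) for v in vec] if norm > 0 else list(vec)

def gaussian_sigma(C, eps, delta):
    """Classic Gaussian-mechanism noise scale for L2 sensitivity C (valid for 0 < eps < 1)."""
    return C * math.sqrt(2 * math.log(1.25 / delta)) / eps

def privatize(vec, C, sigma, rng):
    """Clip, then add i.i.d. N(0, sigma^2) noise to every coordinate before ramp-sharing."""
    return [v + rng.gauss(0.0, sigma) for v in clip(vec, C)]

C, eps, delta = 1.0, 0.5, 1e-5
sigma = gaussian_sigma(C, eps, delta)
noisy = privatize([3.0, 4.0], C, sigma, random.Random(3))
print(round(sigma, 2))  # 9.69
```

Halving $\epsilon$ doubles $\sigma$, which is the utility cost discussed next.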

Utility tradeoff

DP noise requires larger $\sigma$ for strong guarantees (small $\epsilon$ ). Larger noise slows SGD convergence. The tradeoff: stronger DP → slower convergence → more rounds → more total communication. Balance depends on privacy requirements.

Status

Sound and useful in practice. Chapter 18 lists a full characterization of the joint tradeoff among IT privacy, DP, Byzantine robustness, and convergence as an open problem.

ex-ch11-15

Challenge

Open problem. ByzSecAgg achieves $O(n \log n + Bd)$ communication, matching the information-theoretic lower bound up to a $\log n$ factor. Can the $\log n$ factor be removed? What would be required?

Solution

Source of the $\log n$

The $\log n$ comes from the Lagrange interpolation used to reconstruct pairwise distances from coded contributions. Each pairwise reconstruction involves Lagrange coefficients, each costing $O(\log n)$ bits of precision in the ramp-share coding.

Potential paths

Specialized interpolation: for the quadratic distance function specifically, a reconstruction that doesn't use general Lagrange may avoid the $\log n$ factor.
Batched reconstruction: amortizing across many pairs could reduce per-pair cost.
Different aggregator: replacing Krum with a filter that doesn't need pairwise distances (e.g., per-coordinate trimmed mean) may avoid the $\log n$.

Status

Open. The $\log n$ factor is not believed to be essential but no known construction removes it. Interested researchers can explore in the direction of specialized LCC-for-quadratic-functions or aggregator-agnostic reductions. Part of the open problems in Chapter 18.
