Exercises
ex-ch11-01
EasyDistinguish the Byzantine adversary from the honest-but-curious adversary of Chapter 10. Give one concrete example of each.
Honest-but-curious
Follows the protocol; passively analyzes messages to extract information. Example: Google's server in Gboard FL, running the protocol faithfully but inferring user typing patterns from aggregated gradients.
Byzantine
Actively deviates from the protocol β sends corrupted messages, refuses to participate, or colludes adaptively. Example: A compromised device in a cross-silo FL deployment uploading adversarial gradients to bias the trained model.
Defense implications
Different protocols needed. Bonawitz (Ch. 10) handles the first; ByzSecAgg (this chapter) adds protection against the second.
ex-ch11-02
EasyWhy does Bonawitz's secure aggregation provide no defense against a Byzantine user?
Attack
Byzantine user uploads with arbitrary. Masks cancel in the aggregate; the Byzantine contribution remains.
Result
Server's aggregate = honest sum + . Byzantine users can arbitrarily shift the aggregate. No mechanism in Bonawitz detects or filters this.
Fix
ByzSecAgg adds coded distance computation + outlier filtering + vector commitments to handle Byzantine users.
ex-ch11-03
EasyName the three primitives composed in ByzSecAgg and briefly describe each role.
1. Ramp secret sharing
Chapter 3 Β§3.4. Gives -fold smaller shares than Shamir + Byzantine error-correction structure. Sets privacy threshold and Byzantine tolerance .
2. Coded distance computation
Chapter 8 Β§8.3. Computes on ramp shares via Lagrange Coded Computing, enabling outlier detection without learning individual gradients.
3. Vector commitments
Cryptographic primitive. Each user commits to its gradient before sharing, preventing Byzantine users from modifying shares after the protocol has begun.
Together
Privacy (ramp) + detection (coded) + integrity (commitments) = ByzSecAgg.
ex-ch11-04
EasyFor a ByzSecAgg deployment with users, colluders, and Byzantine users, compute the ramp width and check feasibility.
; feasibility: .
Ramp width
.
Feasibility
. Since , the protocol is feasible.
Share size
Each user's upload is of the full gradient size β a savings over Shamir sharing.
ex-ch11-05
MediumExplain the role of the vector commitment in ByzSecAgg. What goes wrong without it?
Role
Each user commits to its gradient before sharing. The commitment is a hash/cryptographic digest that binds the user to a specific without revealing it.
Without commitment β attack
A Byzantine user could send two inconsistent shares: a "mild" share to the distance- computation phase (looks honest), then a "malicious" share to the reconstruction phase. The server would reconstruct a corrupted aggregate without the filter catching it.
With commitment
Shares must be openings of . Any deviation between distance and reconstruction shares is detectable via commitment verification. Byzantine users cannot change their gradient between phases.
ex-ch11-06
MediumDescribe how ByzSecAgg computes pairwise distances on ramp shares, referencing Chapter 8 Β§8.3's LCC framework.
Function degree
is quadratic in the gradients (). LCC recovery threshold: responses.
Each user's local computation
User holds ramp shares and . Locally computes . This is one evaluation of a polynomial of degree in the inputs, interpolating to at .
Server reconstruction
Server collects from (at least) users. Lagrange interpolation on these values gives at , which is the desired pairwise distance.
Privacy
The server sees only the scalar for each pair β no individual is revealed. Distance matrix is enough for Krum filtering but gives no additional information.
ex-ch11-07
MediumWalk through Krum filtering for users with Byzantine. Suppose pairwise distances (in arbitrary units) are: users 1β5 have mutual distances , user 6 has distances to all others. Compute Krum scores and identify the Byzantine.
Setup
. Each user's Krum score is the sum of 4 smallest distances to other users.
Krum scores
Users 1β5: 4 nearest are all honest β each distance . Score: .
User 6: 4 nearest are users 1β5 (all honest). Distances each. Score: .
Filtering
Sort scores: 5 honest at , 1 user (user 6) at . Largest : user 6. Correctly identified as Byzantine.
Aggregation
Surviving set . Server reconstructs .
ex-ch11-08
MediumCompute the per-round communication for ByzSecAgg at , , and compare with Bonawitz alone and a naive Bonawitz + replicated Krum.
ByzSecAgg
Ramp width . Per-user upload: scalars Γ 32 bits = bits = 10 MB. Aggregate: MB. Total GB per round.
Bonawitz alone
Per-user: bits = 400 MB. Aggregate: GB. No Byzantine defense.
Bonawitz + replicated Krum (naive)
Each gradient replicated times for Byzantine voting, multiplying the 400 MB by 21 to 8.4 GB per user. Aggregate: 1.7 TB. Loses privacy because Krum requires plaintext.
Conclusion
ByzSecAgg is smaller than naive composition AND preserves privacy. Compared to Bonawitz alone: ByzSecAgg costs less aggregate traffic because ramp shares are smaller per user. At , ByzSecAgg is a clear winner.
ex-ch11-09
MediumWhy does ByzSecAgg's feasibility constraint have a factor of 3? ()
Think about the Reed-Solomon decoding requirement.
Ramp threshold
The ramp scheme has (allows reconstruction with up to errors). The surviving honest-user count must exceed : , i.e., .
Solve
, or .
Interpretation
Each Byzantine user "costs" 3 units of the feasibility budget: one for their direct contribution, two for Reed-Solomon error correction. The factor of 3 is structural, not arbitrary.
Compare with non-private Byzantine
Non-private Byzantine schemes (Krum without privacy) tolerate up to β much higher. The combination of privacy and Byzantine tolerance is strictly harder, reflected in the 3-factor.
ex-ch11-10
MediumWhy does ByzSecAgg use ramp secret sharing instead of standard Shamir?
Share-size savings
Shamir -sharing has per-share size = gradient size. Ramp -sharing has per-share size = gradient size / where . For ByzSecAgg with , this is a -fold savings.
Byzantine error correction
The ramp gap matches the Reed- Solomon minimum-distance requirement for correcting errors. The width serves both purposes: share-size savings AND error- correction structure.
Privacy preservation
Ramp still guarantees perfect secrecy against colluders. The relaxation vs. Shamir (which has ) is that coalitions of size between and learn a partial aggregate β but this is already the desired output, so no extra privacy loss.
Summary
Ramp is a strictly better primitive for ByzSecAgg than Shamir. Chapter 3 Β§3.4 develops the ramp construction in detail.
ex-ch11-11
HardSketch the proof that ByzSecAgg preserves information-theoretic privacy against colluders, even in the presence of Byzantine users.
Ramp privacy
Ramp -sharing (Chapter 3 Β§3.4) guarantees that any shares of a gradient are uniform (perfect secrecy). Hence any colluders see uniform shares β no information about individual gradients.
Coded distance leakage
The server sees pairwise distances . These are symmetric functions β they reveal no information about which specific gradient is which. Formally: depends on the two gradients only through their squared Euclidean distance, not on their individual values.
Byzantine gradients are filtered
After Krum filtering, the honest set contributes to the aggregate. Byzantine gradients are filtered out. The server learns and (for each honest ) nothing beyond what's implied by .
Combining
Colluders learn: their own gradients (trivial), ramp shares of others (uniform over ramp structure), and pairwise distances (symmetric functions). By composition, their total information about non-colluding honest gradients is bounded by what's implied by β the target privacy guarantee.
ex-ch11-12
HardDerive the communication complexity of ByzSecAgg from first principles. Specifically, what is the per-user uplink and the aggregate per-round cost?
Per-user upload
- Ramp share of gradient: scalars.
- Vector commitment: bits (Merkle tree root hash).
- Distance computation contribution per pair: 1 scalar ( for one pair). For all pairs, this is scalars total across the protocol, or per user.
Aggregate
users Γ = . For : = . For : .
Simplified bound
Dominant terms: for the regime where is proportional to . Specifically, per-round bits β the theorem statement.
Compared to lower bound
Information-theoretic lower bound: . ByzSecAgg's matches up to a factor.
ex-ch11-13
HardDescribe an adaptive Byzantine attack that defeats Krum filtering in ByzSecAgg, and suggest a countermeasure.
Adaptive attack
Byzantine users coordinate to send gradients that are close to the honest cluster's centroid but shifted in a coordinated direction. Their pairwise distances to each other are small; distances to honest users are comparable to honest-honest distances. Krum scores are similar to honest users'.
Why it defeats Krum
Krum's decision rests on Byzantine users being outliers. If they cluster near honest gradients, they are not outliers β Krum picks one of them (or mixes their contribution into the aggregate) without filtering.
Countermeasure: Bulyan
Bulyan (El Mhamdi et al. 2018) applies trimmed-mean post-processing. Even if Krum picks a Byzantine user, the trimmed-mean step removes extremes per-coordinate, limiting the Byzantine influence. Within ByzSecAgg, Bulyan can replace Krum as the filtering step without breaking the framework.
Other countermeasures
- Segmented filtering: FedSeg (Sun et al. 2021) partitions the gradient into segments and filters per-segment.
- Reputation tracking: Byzantine users flagged across rounds via reputation accumulation.
- DP noise: Add post-aggregate DP noise to bound any surviving Byzantine contribution.
None is a universal fix β adaptive Byzantine robustness remains an open problem.
ex-ch11-14
HardDiscuss the composition of ByzSecAgg with differential privacy. Is the composition sound? What is the resulting privacy / utility tradeoff?
Composition
Each user adds Gaussian noise to its gradient before ramp-sharing. Noise flows through ByzSecAgg's pipeline and appears in the final aggregate.
Guarantees
- Information-theoretic (from ByzSecAgg): server learns only , nothing about individual gradients.
- Differential (from DP noise): noisy aggregate satisfies -DP with respect to any single honest user's contribution.
- Byzantine resilience: still Byzantine tolerance, though DP noise may degrade the Krum filter's effectiveness.
Utility tradeoff
DP noise requires larger for strong guarantees (small ). Larger noise slows SGD convergence. The tradeoff: stronger DP β slower convergence β more rounds β more total communication. Balance depends on privacy requirements.
Status
Sound and useful in practice. Chapter 18 lists full characterization of the three-way tradeoff (IT privacy Γ DP privacy Γ Byzantine Γ convergence) as an open problem.
ex-ch11-15
ChallengeOpen problem. ByzSecAgg achieves communication β matching the information- theoretic lower bound up to a factor. Can the factor be removed? What would be required?
Source of the $\log n$
The comes from the Lagrange interpolation used to reconstruct pairwise distances from coded contributions. Each pairwise reconstruction involves Lagrange coefficients, each costing bits of precision in the ramp-share coding.
Potential paths
- Specialized interpolation: for the quadratic distance function specifically, a reconstruction that doesn't use general Lagrange may avoid the factor.
- Batched reconstruction: amortizing across many pairs could reduce per-pair cost.
- Different aggregator: replacing Krum with a filter that doesn't need pairwise distances (e.g., per-coordinate trimmed mean) may avoid the .
Status
Open. The factor is not believed to be essential but no known construction removes it. Interested researchers can explore in the direction of specialized LCC-for-quadratic-functions or aggregator-agnostic reductions. Part of the open problems in Chapter 18.