Handling User Dropouts
Dropouts Break the Naive Protocol
The Bonawitz protocol of Β§10.2 assumes all users successfully complete the round. In production FL, dropouts are frequent: mobile devices go offline, networks time out, users close apps. A β dropout rate per round is typical.
The naive protocol breaks catastrophically under dropouts. If user drops out, the server never receives . Summing only the surviving uploads gives . The mask contributions between surviving and dropped users no longer cancel β the server gets a noisy aggregate at best, and privacy may also break.
The point of Section 10.3 is the fix: Shamir-shared seeds with per-user threshold reconstruction. Each user's seeds are Shamir-shared among the other users before the round; if a user drops, any surviving users can reconstruct its seeds and subtract the leftover masks. The threshold is chosen to tolerate the maximum number of colluders plus the expected dropout rate.
This fix preserves both correctness (aggregation succeeds under dropouts) and privacy (colluders cannot recover secrets of honest users). It is the machinery that makes Bonawitz's protocol production-viable.
Definition: Dropout-Resilient Secure Aggregation
Dropout-Resilient Secure Aggregation
A dropout-resilient secure-aggregation protocol on users, privacy threshold , and tolerable dropout rate satisfies:
-
Correctness under dropouts. For any set of dropouts with , and the surviving set , the server correctly computes .
-
Privacy. The adversary (server + up to colluders) learns only and nothing else about individual for .
The design challenge is to preserve privacy even when the masks of dropped users must be reconstructed. Naive reconstruction (asking surviving users to reveal their seeds with dropped ones) would expose the masks to the server; the Shamir secret sharing of seeds fixes this by requiring a threshold of surviving users to cooperate.
The privacy threshold and dropout rate are related: the Shamir threshold for seed sharing must satisfy (ensuring colluders cannot reconstruct a seed alone) and (ensuring enough honest survivors to reconstruct dropped-user seeds). This gives the feasibility constraint , or equivalently .
Dropout Resilience
A property of a secure-aggregation protocol that allows some users to drop out mid-round without breaking correctness or privacy. Achieved via Shamir-shared seeds with per-user threshold reconstruction.
Bonawitz Protocol with Dropout Handling (Full)
Complexity: Per-round: DH exchanges, share distribution, gradient upload per user. Server side: reconstruction work.The self-mask is a subtle but important addition: if a user only has pairwise masks and never drops, the reconstruction of its seeds (triggered by the server to cancel masks of dropped users) would reveal its mask contribution. The self-mask adds an extra layer so that this reconstruction still preserves the user's individual gradient's privacy.
Theorem: Bonawitz with Dropouts: Correctness + Privacy Under Collusion
Let be the Shamir threshold, the privacy threshold (colluders), and the tolerable dropout fraction. The Bonawitz protocol with dropout handling achieves:
-
Correctness. For dropouts of size (enough surviving users to reconstruct seeds), the server computes exactly.
-
Privacy. Any coalition of size users + server cannot learn individual gradients beyond , provided .
The feasibility constraint is , i.e., , equivalently . Production settings: , , .
The Shamir threshold sits between the privacy threshold (colluders cannot reconstruct) and the dropout tolerance (surviving users can reconstruct). Each seed is a secret in a -Shamir scheme. The server, with access to shares from surviving users, reconstructs dropped users' seeds; a coalition of users cannot.
The point is that the same algebraic primitive (Shamir) from Chapter 3 provides both robustness (dropout handling) and privacy (threshold). This is the standard modular-design pattern of modern MPC: build complex protocols from simple information- theoretic primitives.
Correctness
After dropouts, the server has: (i) masked uploads from surviving users, (ii) shares of each dropped user's pairwise seeds (from surviving users). With shares, the server reconstructs each dropped user's seeds via Lagrange interpolation (Chapter 3 Β§3.2), derives the pairwise masks, and subtracts them from the surviving sum.
Self-masks
For each surviving user , the self-mask is present in the sum. The server reconstructs from surviving-user shares (using threshold ) and subtracts from the sum.
Privacy against $T < t$ colluders
A coalition of users sees: (i) user gradients , (ii) shares of each seed. By Shamir's perfect- secrecy property, shares reveal nothing about any seed (which has -threshold privacy). Hence the pairwise masks cannot be reconstructed by the coalition alone, and the masked uploads of honest users are uniform modulo the aggregate.
Composition via hybrid argument
The full protocol composes these guarantees via a standard simulation-based proof (Bonawitz et al. 2017 Β§4.2). The reader is referred to the paper for the cryptographic details. The key observation is that dropouts and privacy are handled by the same Shamir-share threshold, with .
Example: Feasibility Regions for Dropout + Privacy
For users, compute the feasibility region over for the Bonawitz dropout-resilient protocol. What are the feasible pairs?
Constraint
, or equivalently , or .
Example points
- : (tolerate up to dropouts, no collusion).
- : .
- : .
- : .
- : (no dropouts, max collusion).
Typical production setting
(tolerate up to 20 colluders), (tolerate 20 dropouts): feasibility check: . Feasible. Shamir threshold can be chosen in β typically .
Tight regime
At high and high simultaneously (e.g., ), the constraint becomes β feasible but close to the boundary. Margins should be designed conservatively.
Dropout-Resilience vs. Privacy-Collusion Tradeoff
Plot the feasibility region for the Bonawitz protocol in the plane, for various . The feasibility boundary is ; above this boundary, no combination of Shamir threshold and pairwise masks achieves both privacy and dropout resilience. Higher shifts the boundary outward β larger user populations tolerate more collusion and dropouts simultaneously.
Parameters
Total users
Common Mistake: Shamir Threshold Must Exceed Collusion Threshold
Mistake:
Choose the Shamir threshold (= privacy threshold).
Correction:
Setting allows a coalition of exactly colluders to reconstruct the Shamir-shared seeds, then compute the pairwise masks and extract individual gradients. The correct choice is (strictly more than the collusion threshold), so any colluders have shares and cannot reconstruct.
Production deployments use or similar large fractions, giving large safety margins over typical values. The cost of larger is that more surviving users are needed for reconstruction β but Shamir's per-share cost is small (Β§3.2), so this is affordable.
Dropout Handling in Production
Production Bonawitz deployments:
- Typical dropout rate: β (15β30% of users drop per round on mobile FL).
- Typical privacy threshold: to .
- Shamir threshold: (comfortable margin over , enough for typical dropouts).
- Reconstruction latency: second per reconstruction; reconstructions per round at , so seconds added to each round.
The Shamir-reconstruction step dominates server-side CPU time in production. CCESA (Chapter 12) reduces the number of seeds to reconstruct (sparser graph), but within the Bonawitz framework, the overhead is intrinsic.
- β’
Dropout rate: typical
- β’
Shamir threshold: standard
- β’
Reconstruction: server CPU per round
Historical Note: Shamir Secret Sharing in Modern Cryptography
1979β2017 (Shamir's generalization); 2017 (Bonawitz application)Shamir's secret sharing, introduced in 1979 (Chapter 3), found a natural home in secure aggregation nearly four decades later. The connection is apt: both problems require splitting a secret (in Β§10.3, the mask seed) among parties such that a threshold of them can reconstruct while fewer cannot. Bonawitz et al. (2017) brought Shamir to federated learning via the seed-sharing mechanism described in this section.
The pattern β secret-share a cryptographic primitive, then reconstruct on demand β is now standard in threshold cryptography (Chapter 3 Β§3.5), threshold signatures (Byzantine agreement), and verifiable secret sharing (Chapter 11's ByzSecAgg). The abstraction is clean, and the Shamir implementation is robust. Modern MPC libraries (SPDZ, MP-SPDZ, FRESCO) expose Shamir- shared computation as a primitive, with Bonawitz-style secure aggregation as one of the simplest applications.
Key Takeaway
Shamir secret sharing of the mask seeds bridges dropouts and privacy. The threshold is chosen so that colluders (fewer than ) cannot reconstruct but the server, with shares from surviving users, can reconstruct dropped-user seeds and finalize the aggregate. This is the production-standard protocol; Β§10.4 proves its communication cost is information-theoretically optimal.
Quick Check
For a Bonawitz-protocol deployment with users, privacy threshold colluders, and dropout rate , what is an appropriate Shamir threshold ?
(equal to )
()
()
(all but one user)
Standard production choice. Satisfies (privacy holds against 20 colluders) and (enough survivors to reconstruct). Comfortable margin.