Joint Privacy, Robustness, and Differential Privacy

The Three-Way Pareto Frontier

The book's five CommIT contributions each advance the frontier on a specific axis:

  • Coded shuffling (Ch 7): communication efficiency for data shuffling.
  • Uncoded groupwise keys (Ch 10): SecAgg privacy-fault tolerance.
  • ByzSecAgg (Ch 11): joint privacy + Byzantine robustness.
  • CCESA (Ch 12): sparse-graph communication-privacy trade-off.
  • IT-secure FRL (Ch 17): privacy of learned representations.

These address two axes at a time (privacy + robustness; privacy + efficiency). The three-way joint problem — privacy + robustness + differential privacy simultaneously — remains largely open.

The three axes of the design space are:

  • Information-theoretic privacy against honest-but-curious server.
  • Byzantine robustness against adversarial workers.
  • Differential privacy against correlation attacks across rounds.

Each pair has known achievability schemes. The three-way combination has only partial characterization. This section maps what is known and what is open.


Known 2-Axis Privacy-Robustness-DP Results

| Axis pair | Known scheme | Privacy type | Rate cost vs. no constraints |
|---|---|---|---|
| Privacy + robustness | ByzSecAgg (Jahani-Nezhad-Maddah-Ali-Caire 2022, Ch. 11) | IT + Byzantine | $\sim 2\times$ communication |
| Privacy + DP | Bonawitz + Gaussian dither (Ch. 10 + DP) | IT + $(\varepsilon, \delta)$-DP | $\sqrt{n}$-amplification via AirComp |
| Robustness + DP | Chen-Su-Xu 2017 (robust aggregators + DP) | Byzantine + $(\varepsilon, \delta)$-DP | Loss in statistical efficiency |
| Privacy + Robustness + DP | Open (no fully characterized scheme) | All three | Upper bound known; converse unknown |

Why Three-Way Is Harder Than Three Two-Ways

The natural approach — compose the three schemes in series (privacy layer + Byzantine filter + DP noise) — typically gives suboptimal results. The reasons:

  1. Masks interfere with Byzantine detection. Pairwise masking (Ch 10) makes every per-user contribution look random to the server. A malicious gradient hidden under a mask is therefore indistinguishable from an honest masked contribution, so it evades detection by statistical filters (Krum, trimmed mean).

  2. DP noise masks malicious gradients. Larger DP noise obscures individual deviations, making Byzantine detection less reliable. DP and robustness are fundamentally in tension.

  3. Masks reduce DP amplification. Pairwise masks are shared within each pair, so they are not independent across users; DP amplification for AirComp (Theorem 16.4.2) requires independent per-user randomness. The composition therefore loses part of the amplification.

The ByzSecAgg framework (Ch 11) handles privacy + robustness via integrated coded computing and vector commitments — it does not straightforwardly extend to DP. Joint protocols need new primitives.
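Reason 1 can be seen in a small simulation. The sketch below is a toy under stated assumptions, not the Ch 10 protocol: masks are real-valued uniform draws rather than finite-field elements, a simple distance-to-median rule stands in for Krum or trimmed mean, and all names (`pairwise_masks`, `farthest_from_median`, `column_sums`) and parameter values are hypothetical.

```python
import math
import random

def pairwise_masks(n, d, rng, scale=1000.0):
    """Per-user masks that cancel in the sum (toy, real-valued)."""
    masks = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(d):
                m = rng.uniform(-scale, scale)
                masks[i][k] += m  # user i adds the shared pairwise mask
                masks[j][k] -= m  # user j subtracts it, so the pair cancels
    return masks

def farthest_from_median(updates):
    """Index of the update farthest (L2) from the coordinate-wise median."""
    d = len(updates[0])
    med = [sorted(u[k] for u in updates)[len(updates) // 2] for k in range(d)]
    dists = [math.dist(u, med) for u in updates]
    return max(range(len(updates)), key=dists.__getitem__)

def column_sums(updates):
    return [sum(col) for col in zip(*updates)]

rng = random.Random(0)
n, d, byz = 10, 20, 3                      # illustrative sizes; user 3 is Byzantine
plain = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
plain[byz] = [50.0] * d                    # an obviously malicious update

masks = pairwise_masks(n, d, rng)
masked = [[p + m for p, m in zip(pu, mu)] for pu, mu in zip(plain, masks)]

flag_plain = farthest_from_median(plain)    # detector sees raw updates
flag_masked = farthest_from_median(masked)  # detector sees only masked updates
agg_error = max(abs(a - b)
                for a, b in zip(column_sums(plain), column_sums(masked)))

print("flagged in the clear:", flag_plain)
print("flagged under masking:", flag_masked)
print("max aggregate error:", agg_error)
```

On the raw updates the rule flags the Byzantine index. On the masked updates the distances are dominated by the masks rather than the gradients, so the flagged index carries no information about who is malicious, even though the aggregate is unchanged up to floating-point error.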

Theorem: A Three-Axis Lower Bound

Consider an FL protocol over $n$ users that simultaneously guarantees:

  • Information-theoretic privacy against any subset of $T$ colluding users + server.
  • Tolerance to $B$ Byzantine workers.
  • $(\varepsilon, \delta)$-differential privacy against any user.

Then the per-round communication cost satisfies
$$\text{bits per round} \;\geq\; n \cdot \left(T + B + \frac{\log(1/\delta)}{\varepsilon^2}\right) \cdot d,$$
where $d$ is the gradient dimension.

Interpretation. Each axis contributes an additive communication cost. The three terms add rather than share overhead, so a joint protocol cannot pay less than their sum: the privacy parameter $T$, the robustness parameter $B$, and the DP budget $(\varepsilon, \delta)$ each impose their own overhead.
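A worked instance may help make the additive structure concrete. The values used here ($n = 50$, $T = 5$, $B = 1$, $\varepsilon = 1$, $\delta = 10^{-5}$, $d = 10^6$) are illustrative assumptions, not taken from the text, and $\log$ is assumed to be the natural logarithm:

$$
\begin{aligned}
\frac{\log(1/\delta)}{\varepsilon^2} &= \ln(10^5) \approx 11.51, \\
T + B + \frac{\log(1/\delta)}{\varepsilon^2} &\approx 5 + 1 + 11.51 = 17.51, \\
\text{bits per round} &\geq 50 \cdot 17.51 \cdot 10^6 \approx 8.76 \times 10^8.
\end{aligned}
$$

Under these assumptions the DP term accounts for roughly 66% of the bound, the privacy term for about 29%, and the robustness term for about 6%. Halving $\varepsilon$ to $0.5$ quadruples the DP term to about $46.05$ and raises the bound to roughly $2.60 \times 10^9$ bits, while the $T$ and $B$ contributions are unchanged.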

The Achievability Gap

The best known achievable joint protocols use composition in series of the three individual schemes, giving communication cost $\sim n \cdot (T + B + \log(1/\delta)/\varepsilon^2) \cdot d$, matching the lower bound up to constants. The constant factor is roughly $2$ to $3$ in practice, i.e., a $2$-$3\times$ bandwidth cost for the joint guarantee.

Joint schemes that share structure across axes (e.g., using the same noise for both DP and privacy amplification) could in principle reduce the constant factor. No such scheme is currently known.

Three-Axis Privacy-Robustness-DP Frontier

Visualize the known lower bound on communication cost across the three axes (privacy $T$, robustness $B$, DP $(\varepsilon, \delta)$). Sweep each axis independently and observe how the communication cost accumulates. The plot reveals the additive structure of the joint cost.
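The sweep can be approximated without a plotting library. The following is a minimal sketch, assuming a natural logarithm in the bound; the baseline values in `BASE` and the names `lower_bound_bits` and `sweep` are hypothetical choices for illustration.

```python
import math

def lower_bound_bits(n, T, B, eps, delta, d):
    """Lower bound n * (T + B + log(1/delta) / eps^2) * d (natural log assumed)."""
    return n * (T + B + math.log(1.0 / delta) / eps ** 2) * d

# Hypothetical baseline; one axis is varied at a time around it.
BASE = {"n": 50, "T": 5, "B": 1, "eps": 1.0, "delta": 1e-5, "d": 1000}

def sweep(axis, values):
    """Evaluate the bound while varying a single parameter."""
    return [lower_bound_bits(**{**BASE, axis: v}) for v in values]

t_costs = sweep("T", [0, 5, 10, 15])
b_costs = sweep("B", [0, 1, 2, 3])
eps_costs = sweep("eps", [0.5, 1.0, 2.0])

for name, costs in [("T", t_costs), ("B", b_costs), ("eps", eps_costs)]:
    print(name, [f"{c:.3e}" for c in costs])
```

Because the bound is additive, each step in $T$ or $B$ changes the cost by the same amount ($n \cdot d$ per unit) regardless of the other two axes, whereas the DP term scales as $1/\varepsilon^2$.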


Active Research Directions

Current research on the three-axis frontier:

  1. Coded-sharing integration with DP: integrate Shamir sharing (Ch 3) with DP-mechanism outputs to reduce the combined overhead. Preliminary results (e.g., Dutta-Caire 2021) show a $\sqrt{n}$-factor improvement in some regimes.

  2. Verifiable coded computing: the ByzSecAgg framework (Ch 11) uses vector commitments for integrity verification. Extending to include DP is ongoing — the commitment scheme must be privacy-preserving under DP randomness.

  3. Decentralized joint protocols (see §18.3): serverless architectures may make joint protocols easier to compose, via gossip-and-verify, than they are in the centralized setting.

  4. Quantum primitives: quantum PIR and quantum SecAgg (see §18.4) may offer different composition properties.

Closing the three-axis gap is one of the most active directions in the CommIT research program and in the broader FL security community.

⚠️Engineering Note

Deploying on the Three-Axis Frontier

Production FL with combined guarantees:

  • Identify the binding axis. Usually one axis dominates: medical FL is DP-binding; cross-cloud is privacy-binding; edge/IoT is robustness-binding. Optimize the binding axis and accept suboptimality on the others.

  • Serial composition is almost always acceptable. The $2$-$3\times$ overhead from naive composition is usually acceptable for modern bandwidth. Shared-mechanism design is advanced research.

  • Monitor each axis separately. Privacy budget, Byzantine detection rate, DP accounting — each should have independent observability.

  • Plan for axis upgrades. New scheme developments (e.g., improved ByzSecAgg, decentralized FL) may shift the binding axis. Modular design allows upgrades.

  • Don't over-design. If the application doesn't need all three axes, omit one. The cost of adding axes is real.
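Independent per-axis observability could look like the sketch below. It is an illustration under assumptions rather than a recommended design: the class `ThreeAxisMonitor` and its fields are hypothetical, the privacy signal is a simple proxy (fewest surviving users in any round minus the provisioned collusion threshold $T$), and DP spend is tracked with basic sequential composition, where per-round $\varepsilon$ values add. Tighter accountants exist and would replace that line.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ThreeAxisMonitor:
    collusion_threshold: int             # T the secure-aggregation layer assumes
    eps_budget: float                    # total DP epsilon the deployment may spend
    eps_spent: float = 0.0
    updates_seen: int = 0
    updates_rejected: int = 0
    min_survivors: Optional[int] = None  # fewest surviving users in any round

    def record_round(self, survivors: int, rejected: int, eps_round: float) -> None:
        self.updates_seen += survivors
        self.updates_rejected += rejected
        self.eps_spent += eps_round      # basic composition: epsilons add up
        if self.min_survivors is None or survivors < self.min_survivors:
            self.min_survivors = survivors

    def report(self) -> dict:
        margin = (None if self.min_survivors is None
                  else self.min_survivors - self.collusion_threshold)
        rate = (self.updates_rejected / self.updates_seen
                if self.updates_seen else 0.0)
        return {
            "privacy_margin": margin,                        # privacy axis
            "rejection_rate": rate,                          # robustness axis
            "eps_remaining": self.eps_budget - self.eps_spent,  # DP axis
        }

monitor = ThreeAxisMonitor(collusion_threshold=5, eps_budget=8.0)
monitor.record_round(survivors=48, rejected=2, eps_round=0.5)
monitor.record_round(survivors=45, rejected=0, eps_round=0.5)
status = monitor.report()
print(status)
```

Keeping the three signals in separate fields means an alert on one axis (for example, a shrinking privacy margin after dropouts) is not hidden by healthy readings on the other two.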

Practical Constraints
  • Identify binding axis first

  • Serial composition: $2$-$3\times$ overhead, usually acceptable

  • Monitor each axis independently

  • Plan for modular upgrades

  • Don't over-engineer — omit unneeded axes

📋 Ref: Kairouz et al. 2021; Chen-Avestimehr 2023

Common Mistake: Assuming Composition Is Transparent

Mistake:

Assume that composing a privacy scheme, a robustness scheme, and a DP scheme in series preserves all three guarantees.

Correction:

Composition often breaks guarantees. For example, pairwise masking (Ch 10) + Krum (Ch 11) + Gaussian DP: the Krum filter can preserve some of the mask structure, leaking information to the server. The DP noise can mask the masks, weakening privacy. The interactions are subtle.

Rule: when composing privacy primitives, carefully analyze the threat model of the composition. It is usually not the logical AND of the individual threat models — it can be weaker in ways that invalidate the guarantees.

Reference frameworks (TPTPS, Dutta et al. 2022) provide composition-safe primitives with rigorous analysis. Use these, or analyze composition carefully yourself before claiming joint guarantees.

Key Takeaway

The three-axis frontier is open. The joint privacy-robustness-DP problem has a known lower bound of $\Omega(n(T + B + \log(1/\delta)/\varepsilon^2) d)$ communication per round. Achievable protocols via serial composition match it up to constants; jointly optimal schemes that share structure across axes are open. The CommIT research addresses this frontier by extending ByzSecAgg with DP and developing decentralized variants.

Quick Check

The best known lower bound on communication cost for joint privacy ($T$-collusion), robustness ($B$-Byzantine), and DP ($(\varepsilon, \delta)$) FL is:

$O(1)$, independent of all three.

$\Omega(n \cdot (T + B + \log(1/\delta)/\varepsilon^2) \cdot d)$, additive across axes.

$\Omega(n \cdot T \cdot B \cdot \log(1/\delta)/\varepsilon^2 \cdot d)$, multiplicative.

$\Theta(n^2)$, quadratic in users.