Scheduling, Power, and Resource Allocation
The Joint Optimization
Sections §17.1–§17.2 defined the wireless-FL problem and its convergence behavior as a function of per-round aggregation MSE and user participation. Two levers shape the per-round MSE in practice:
- Device scheduling — which users upload each round.
- Power allocation — how much transmit budget each user spends.
These are coupled. Selecting more users increases statistical representativeness but, for AirComp, worsens the aggregation MSE (which is set by the weakest scheduled channel) if weak-channel users are included. Allocating more power to weak users strengthens their gradients' contribution but drains their batteries.
The point is that the wireless-FL design problem is a bi-level optimization: the outer layer minimizes the convergence loss over rounds; the inner layer chooses the schedule $\mathcal{S}_t$ and powers $\{p_{k,t}\}$ each round to meet an MSE target at minimum cost. This section derives optimal (and practical heuristic) scheduling and power-control rules and quantifies their effect on end-to-end convergence.
Definition: Wireless-FL Joint Optimization
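Given a total-round budget $T$, per-user energy budgets $\{E_k\}_{k=1}^{K}$, and channel process $\{\gamma_{k,t}\}$, the wireless-FL joint optimization is

$$\min_{\{\mathcal{S}_t,\, p_{k,t}\}_{t=1}^{T}} \; \mathbb{E}\big[F(\mathbf{w}_T)\big] - F^\star$$

subject to

$$\sum_{t=1}^{T} p_{k,t}\,\mathbb{1}[k \in \mathcal{S}_t] \le E_k, \qquad 0 \le p_{k,t} \le P_{\max}, \qquad k = 1, \dots, K.$$

The expectation is over randomness in gradients and channel noise. By Theorem 17.2.1, this reduces to minimizing the noise floor: an online per-round optimization over $(\mathcal{S}_t, \{p_{k,t}\})$.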
Theorem: Optimal Per-Round AirComp Scheduling
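Given per-round CSI $\{\gamma_{k,t}\}_{k=1}^{K}$ and MSE tolerance $\epsilon_t$, the round-$t$ scheduling and power rules

$$\mathcal{S}_t = \{k : \gamma_{k,t} \ge \tau_t\}, \qquad p_{k,t} = \frac{\eta_t}{\gamma_{k,t}} \quad (k \in \mathcal{S}_t),$$

with threshold $\tau_t = \sigma^2/(\epsilon_t P_{\max})$ and common amplitude $\eta_t = \tau_t P_{\max} = \sigma^2/\epsilon_t$, are Pareto-optimal: no other rule achieves a lower MSE with the same number of scheduled users under the per-user power limit $P_{\max}$, and the rules satisfy the feasibility constraints of Theorem 16.2.1.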
Feasibility
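Users with $\gamma_{k,t} \ge \tau_t$ can satisfy $p_{k,t} = \eta_t/\gamma_{k,t} \le P_{\max}$ with $\eta_t = \tau_t P_{\max}$. Equality holds when $\gamma_{k,t} = \tau_t$.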
MSE
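Channel inversion aligns all scheduled signals, so only receiver noise remains: $\mathrm{MSE}_t = \sigma^2/\eta_t = \epsilon_t$, exactly meeting the target.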
Pareto optimality
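From the threshold-scheduling theorem (§16.2, Thm 16.2.2), the threshold set is Pareto-optimal: since feasibility forces $\eta_t \le P_{\max}\min_{k \in \mathcal{S}_t}\gamma_{k,t}$, for any fixed $|\mathcal{S}_t|$ the MSE is minimized by scheduling the strongest channels. Increasing $\tau_t$ reduces MSE but shrinks $\mathcal{S}_t$; decreasing $\tau_t$ expands $\mathcal{S}_t$ but raises MSE. The choice $\tau_t = \sigma^2/(\epsilon_t P_{\max})$ is the corner of the frontier where the MSE target is met with the largest schedule.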
Operational
The designer picks the MSE target $\epsilon_t$ (from the convergence analysis); the scheduling and power rules then follow analytically, with no iterative optimization per round. For deployment, pre-compute $\tau_t$ offline based on channel statistics.
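Because the rules are closed-form, one round of scheduling fits in a few lines. The sketch below assumes a common per-user power limit $P_{\max}$ and the MSE model $\mathrm{MSE}_t = \sigma^2/\eta_t$ from the theorem; `threshold_schedule` is a hypothetical helper name, not an API from the text.

```python
import numpy as np

def threshold_schedule(gains, sigma2, eps, p_max):
    """One round of threshold scheduling with truncated channel inversion (sketch).

    gains  : effective channel gains gamma_k for this round
    sigma2 : receiver noise variance
    eps    : per-round MSE target
    p_max  : per-user transmit power limit (assumed identical for all users)
    """
    gains = np.asarray(gains, dtype=float)
    tau = sigma2 / (eps * p_max)                 # threshold tau_t
    eta = tau * p_max                            # common amplitude eta_t = sigma2 / eps
    scheduled = np.flatnonzero(gains >= tau)     # S_t = {k : gamma_k >= tau_t}
    powers = np.zeros_like(gains)
    powers[scheduled] = eta / gains[scheduled]   # channel inversion p_k = eta / gamma_k
    mse = sigma2 / eta                           # equals eps by construction
    return scheduled, powers, tau, eta, mse

scheduled, powers, tau, eta, mse = threshold_schedule(
    [0.2, 0.5, 1.0, 2.0], sigma2=0.1, eps=0.2, p_max=1.0)
print(scheduled, powers, tau, mse)
```

The user whose gain sits exactly at the threshold spends its full budget; stronger users need less power, which is the slack discussed in the example that follows.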
Example: Worked Scheduling at Round 5
Round $t = 5$: $K$ users with effective channel gains $\gamma_1, \dots, \gamma_K$, noise variance $\sigma^2$, and MSE target $\epsilon$. Apply Theorem 17.3.1 to compute $(\tau_5, \mathcal{S}_5, \eta_5, \{p_{k,5}\})$.
Compute $\tau_5$
$\tau_5 = \sigma^2/(\epsilon P_{\max}) = 0.5$.
Identify users satisfying $\gamma_k \geq 0.5$
Users with $\gamma_k \ge 0.5$ form $\mathcal{S}_5$; $|\mathcal{S}_5|$ of the $K$ users are scheduled.
Compute $\eta_5$
$\eta_5 = \tau_5 P_{\max} = 0.5\,P_{\max}$.
Per-user power
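$p_{k,5} = \eta_5/\gamma_k = 0.5\,P_{\max}/\gamma_k$ for $k \in \mathcal{S}_5$ (from $p_{k,t} = \eta_t/\gamma_{k,t}$).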
Verify feasibility
Users with $\gamma_k = 0.5$ use the full budget $P_{\max}$. Users with $\gamma_k > 0.5$ use less than the full budget: they have slack.
Operational
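The $K - |\mathcal{S}_5|$ users with $\gamma_k < 0.5$ are excluded this round. If the same users are excluded regularly, their data contributes less and the global model becomes biased toward strong-channel users. Mitigate via rotation or fairness-aware scheduling (below).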
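A numeric run of the five steps, using hypothetical gains, noise variance, and MSE target (illustrative values chosen so that $\tau_5 = 0.5$; they are not taken from the text):

```python
import numpy as np

# Hypothetical round-5 inputs (illustrative only), chosen so that tau_5 = 0.5.
gains = np.array([1.2, 0.3, 0.8, 0.5, 0.1, 2.0])   # effective gains gamma_k
sigma2, eps, p_max = 0.1, 0.2, 1.0                  # noise variance, MSE target, power limit

tau5 = sigma2 / (eps * p_max)          # Step 1: threshold
sched = gains >= tau5                  # Step 2: users with gamma_k >= tau_5
eta5 = tau5 * p_max                    # Step 3: common amplitude
powers = np.zeros_like(gains)          # Step 4: channel-inversion powers
powers[sched] = eta5 / gains[sched]

# Step 5: every scheduled user stays within budget; slack is the unused power.
assert np.all(powers[sched] <= p_max + 1e-12)
slack = p_max - powers[sched]
excluded = np.flatnonzero(~sched)

print(f"tau_5 = {tau5:.2f}, scheduled {sched.sum()} of {gains.size} users")
print("powers:", np.round(powers, 3))
print("slack of scheduled users:", np.round(slack, 3))
print("excluded users:", excluded)
```

With these values, users 1 and 4 (0-indexed) fall below the threshold and are excluded, while user 3, whose gain equals $\tau_5$, spends its full budget.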
Definition: Fairness-Aware Scheduling
A scheduling rule is $\alpha$-proportionally fair if, over $T$ rounds, each user $k$ has participated in $\mathcal{S}_t$ for at least $\alpha T$ rounds (i.e., each user carries at least an $\alpha$-fraction of its "proportional share" of participations).
Threshold scheduling (Theorem 17.3.1) is not fair: persistently weak-channel users are systematically excluded. To enforce $\alpha$-fairness, add a lower-bound constraint:

$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{1}[k \in \mathcal{S}_t] \ge \alpha, \qquad k = 1, \dots, K.$$
Under this constraint, the per-round MSE increases (the scheduler must accept weaker users in some rounds), but all users' gradients contribute.
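One simple way to enforce the constraint online is a pacing rule layered on threshold scheduling: any user whose participation count after round $t$ would fall below $\lceil \alpha t \rceil$ is forced into that round regardless of its gain. This is a heuristic sketch, not the optimal $\alpha$-fair scheduler analyzed in the theorem that follows; `fair_threshold_schedule` is a hypothetical name.

```python
import numpy as np

def fair_threshold_schedule(gains, tau, alpha):
    """Threshold scheduling plus a pacing rule for alpha-fairness (heuristic sketch).

    gains : array of shape (T, K) holding gamma_{k,t}
    tau   : fixed scheduling threshold
    alpha : required participation fraction per user, 0 <= alpha <= 1
    Returns a boolean (T, K) schedule; every user ends with >= ceil(alpha * T) rounds.
    """
    T, K = gains.shape
    counts = np.zeros(K, dtype=int)
    schedule = np.zeros((T, K), dtype=bool)
    for t in range(T):
        chosen = gains[t] >= tau                 # users that clear the threshold
        quota = np.ceil(alpha * (t + 1))         # participations required after round t
        chosen |= counts < quota                 # force in users that are behind pace
        schedule[t] = chosen
        counts += chosen
    return schedule

rng = np.random.default_rng(1)
g = rng.exponential(1.0, size=(20, 5))           # Rayleigh fading: |h|^2 ~ Exp(1)
s = fair_threshold_schedule(g, tau=1.5, alpha=0.5)
print("participations per user:", s.sum(axis=0))
```

Because $\lceil \alpha(t+1) \rceil - \lceil \alpha t \rceil \le 1$ for $\alpha \le 1$, forcing at most one extra participation per round is always enough to stay on pace; with $\alpha = 0$ the rule reduces to plain threshold scheduling.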
Theorem: Fairness-MSE Trade-off
For a given $\alpha \in [0, 1]$, the optimal $\alpha$-fair scheduler has per-round MSE that is, on average, at most a factor $C(\alpha)$ larger than the unconstrained optimal MSE, where $C(\alpha) \ge 1$ is a channel-distribution-dependent constant expressed through percentiles of the channel-gain distribution ($\gamma_{(q)}$ is the $q$-th percentile).
Interpretation. Tighter fairness ($\alpha \to 1$, every user included every round) approaches the worst-user bottleneck; looser fairness ($\alpha \to 0$, threshold scheduling) approaches the unconstrained optimum, $C(0) = 1$.
Unconstrained MSE
The minimum MSE is $\sigma^2/(\tau_t P_{\max})$, attained by threshold scheduling, which selects the best-channel users.
$\alpha$-fair constraint
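Each user must be scheduled in at least an $\alpha$-fraction of rounds, so the weakest users enter the schedule some of the time, including rounds in which their gains are low. Since feasibility caps the common amplitude at $\eta_t \le P_{\max}\min_{k \in \mathcal{S}_t}\gamma_{k,t}$, these rounds drag down the schedule-weighted average gain.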
Bound factor
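MSE scales as $1/\gamma_{\mathrm{eff}}$, where the effective gain $\gamma_{\mathrm{eff}}$ is a schedule-weighted average of the weakest scheduled gain per round. Taking the ratio of the $\alpha$-fair and unconstrained values of $1/\gamma_{\mathrm{eff}}$ gives the stated factor $C(\alpha)$.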
Operational
For typical channels (Rayleigh), $C(\alpha)$ is a moderate constant, so $\alpha$-fairness costs a bounded multiplicative increase in MSE. For FL convergence, this means more rounds to reach the same loss floor: a manageable trade-off in exchange for unbiased participation.
Fairness vs. MSE Trade-off
Interactive simulation: average per-round aggregation MSE as the fairness parameter $\alpha$ forces more weak users into the schedule, compared with the unconstrained threshold-scheduling baseline ($\alpha = 0$), under Rayleigh-distributed channels.
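A Monte Carlo sketch of the same trade-off. It follows the proof's proxy $\mathrm{MSE} \propto 1/\gamma_{\mathrm{eff}}$, with $\gamma_{\mathrm{eff}}$ the average weakest scheduled gain, because under Rayleigh fading the per-round mean of $\sigma^2/\eta_t$ has infinite expectation once arbitrary-gain users are forced in. The fairness rule is an assumed pacing heuristic (force any user that falls behind $\lceil \alpha t \rceil$ participations), not the optimal scheduler, so the printed ratio is an empirical $C(\alpha)$ for this heuristic only; all parameter values are illustrative.

```python
import numpy as np

def mse_proxy_vs_alpha(alphas, K=20, T=2000, tau=1.0, sigma2=0.1, p_max=1.0, seed=0):
    """MSE proxy sigma2 / (p_max * gamma_eff) for several fairness levels (sketch).

    gamma_eff averages, over rounds with a non-empty schedule, the weakest scheduled
    gain, since the common amplitude is capped at p_max * (min scheduled gain).
    """
    rng = np.random.default_rng(seed)
    gains = rng.exponential(1.0, size=(T, K))   # |h|^2 with h ~ CN(0, 1) is Exp(1)
    proxy = {}
    for alpha in alphas:
        counts = np.zeros(K, dtype=int)
        weakest = []
        for t in range(T):
            chosen = gains[t] >= tau                         # threshold scheduling
            chosen |= counts < np.ceil(alpha * (t + 1))      # pacing rule for fairness
            counts += chosen
            if chosen.any():
                weakest.append(gains[t][chosen].min())
        gamma_eff = float(np.mean(weakest))
        proxy[alpha] = sigma2 / (p_max * gamma_eff)
    return proxy

mse = mse_proxy_vs_alpha([0.0, 0.25, 0.5, 1.0])
baseline = mse[0.0]
for alpha, value in mse.items():
    print(f"alpha={alpha:.2f}  MSE proxy={value:.3f}  ratio to alpha=0: {value / baseline:.2f}")
```

At $\alpha = 0$ every scheduled gain is at least $\tau$, so the proxy never exceeds $\sigma^2/(P_{\max}\tau)$; at $\alpha = 1$ every user transmits every round and the proxy is governed by the minimum over all $K$ gains, the worst-user bottleneck.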
Theorem: Energy-Constrained Power Allocation
Given a per-user energy budget $E_k$ and $T$ rounds, the optimal power allocation across the rounds user $k$ participates in is water-filling on the channel gains:

$$p_{k,t}^\star = \left[\frac{1}{\nu_k} - \frac{1}{\gamma_{k,t}}\right]^+,$$

where $\nu_k > 0$ is the dual variable for the energy constraint $\sum_t p_{k,t} \le E_k$ and $[x]^+ = \max(x, 0)$.
Interpretation. User $k$ spends more power in high-channel-gain rounds and less (or zero) in bad rounds. Water-filling is the standard Lagrangian solution for a sum-concave objective under a linear constraint.
Dual Lagrangian
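Take the canonical sum-concave per-round objective $\sum_t \log(1 + \gamma_{k,t}p_{k,t})$ with the energy constraint, which is linear in $p_{k,t}$. The Lagrangian is $\mathcal{L} = \sum_t \log(1 + \gamma_{k,t}p_{k,t}) - \nu_k\big(\sum_t p_{k,t} - E_k\big)$, the textbook setting for water-filling.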
Closed form
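Setting $\partial\mathcal{L}/\partial p_{k,t} = 0$ gives $\gamma_{k,t}/(1 + \gamma_{k,t}p_{k,t}) = \nu_k$, i.e., $p_{k,t} = 1/\nu_k - 1/\gamma_{k,t}$. Projecting onto $p_{k,t} \ge 0$ yields the bracketed-positive form above, with $\nu_k$ chosen so that the energy budget is met with equality.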
Operational
Each user needs local knowledge of its channel gains $\{\gamma_{k,t}\}_t$ and the dual variable $\nu_k$, which can be computed by sorting $\{1/\gamma_{k,t}\}$ across rounds $t$. The total energy budget is respected across rounds, while individual rounds see widely varying per-user powers.
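A sketch of the per-user water-filling. It finds the water level $1/\nu_k$ by bisection instead of the sorting procedure mentioned above (both reach the same allocation), and it assumes the user knows its gains for all rounds it participates in, i.e., an offline allocation. `water_fill` is a hypothetical helper name.

```python
import numpy as np

def water_fill(gains, energy, tol=1e-12, max_iter=200):
    """Return powers p_t = max(level - 1/gamma_t, 0) with sum_t p_t = energy, and nu = 1/level."""
    gains = np.asarray(gains, dtype=float)
    inv = 1.0 / gains
    # At level = min(inv) no power is used; at level = max(inv) + energy at least
    # `energy` is used, so the water level is bracketed by [lo, hi].
    lo, hi = inv.min(), inv.max() + energy
    for _ in range(max_iter):
        level = 0.5 * (lo + hi)
        used = np.maximum(level - inv, 0.0).sum()
        if used > energy:
            hi = level
        else:
            lo = level
        if hi - lo < tol:
            break
    level = 0.5 * (lo + hi)
    powers = np.maximum(level - inv, 0.0)
    return powers, 1.0 / level          # per-round powers and dual variable nu_k

p, nu = water_fill([1.0, 2.0, 4.0], energy=1.0)
print("powers:", p, "nu:", nu)
```

Here the weakest round ($\gamma = 1$) lies above the water level and receives no power, while the strongest round receives the most, matching the interpretation of the theorem.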
Joint Optimization — What's Tractable?
The full joint problem (scheduling, power allocation, learning rate, convergence) is non-convex. Practical decomposition:
- Per-round decomposition. Fix per-round (average) MSE targets; each round applies Theorem 17.3.1.
- Per-user decomposition. Each user solves its energy water-filling (the energy-constrained power-allocation theorem) independently of others, given a target MSE.
- Meta-level. The target MSE is picked from the convergence analysis (§17.2).
The decomposition is suboptimal in general but tractable. Empirically, the loss from decomposition is typically a small fraction of the optimal objective. Ongoing research closes the gap.
Deploying Wireless-FL Scheduling
Production wireless-FL scheduling guidelines:
- Estimate channel statistics offline. Before deployment, collect a few hours of channel measurements from each user. These statistics drive the target-MSE selection.
- Choose $\alpha$ from heterogeneity. If user gradients are i.i.d. across devices (uniform datasets), a low $\alpha$ is fine. If they are heterogeneous (different demographics per device), enforce a higher $\alpha$.
- Pre-compute the threshold schedule. For a fixed target MSE, the threshold $\tau$ is predictable given channel statistics. Compute it once and apply it online.
- Adaptive scheduling. In non-stationary environments (e.g., moving devices), adapt $\tau_t$ based on tracked channel variability.
- Monitor convergence in real time. The FL loss (or its estimate) indicates whether the current target $\epsilon$ leaves training in the MSE-dominated regime. If it does, tighten $\epsilon$; if not, loosen $\epsilon$ to reduce power cost.
- Integrate with other layers. MAC-layer scheduling interacts with physical-layer power control, which interacts with the FL learning rate. A hierarchical design, cross-layer control of FL over the wireless stack, is where deployments are heading.
- Offline channel statistics estimation before deployment
- $\alpha$ from data heterogeneity
- Adaptive $\tau_t$ for non-stationary environments
- Cross-layer design: MAC + PHY + FL
Common Mistake: CSI Acquisition Is Not Free
Mistake:
Assume perfect CSIT is available at zero cost — and design the FL system around ideal scheduling decisions.
Correction:
CSIT requires uplink pilots from each user in each round, adding a non-trivial overhead (typically 10-20% of round bandwidth). Poor CSIT degrades scheduling (wrong thresholds) and power control (misaligned received amplitudes $\eta_t$). Budget for CSIT acquisition through either (i) pilot-based estimation at each round (overhead paid every round), or (ii) reciprocity-based estimation in TDD systems (lower overhead, but requires TDD). Production FL should include the CSIT overhead in the total energy/bandwidth budget; underestimating it makes reported performance look better than what a deployment achieves.
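A sketch of how imperfect CSIT inflates the aggregation MSE when the perfect-CSI design is applied unchanged. It assumes an additive estimation-error model $\hat h_k = h_k + e_k$ with $e_k \sim \mathcal{CN}(0, \sigma_e^2)$, scheduling on $|\hat h_k|^2 \ge \tau$, and transmit scaling $b_k = \sqrt{\eta}/\hat h_k$; after receive scaling by $1/\sqrt{\eta}$ the error is $\sum_k (h_k/\hat h_k - 1)x_k + n/\sqrt{\eta}$ for unit-power independent symbols $x_k$. The function name and all parameter values are hypothetical.

```python
import numpy as np

def aircomp_mse_with_csi_error(K, sigma2, eps, p_max, csi_err_var, trials, seed=0):
    """Average AirComp MSE when scheduling and inversion use estimated channels (sketch)."""
    rng = np.random.default_rng(seed)
    tau = sigma2 / (eps * p_max)      # threshold from the perfect-CSI design
    eta = tau * p_max                 # common amplitude; MSE = eps under perfect CSI
    results = []
    for _ in range(trials):
        h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)   # Rayleigh
        e = np.sqrt(csi_err_var / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
        h_hat = h + e
        sched = np.abs(h_hat) ** 2 >= tau            # decisions made on the estimate
        if not sched.any():
            continue
        # Residual misalignment of each scheduled user after imperfect inversion.
        misalignment = np.sum(np.abs(h[sched] / h_hat[sched] - 1.0) ** 2)
        results.append(misalignment + sigma2 / eta)
    return float(np.mean(results))

for var in [0.0, 0.01, 0.05, 0.1]:
    mse = aircomp_mse_with_csi_error(K=10, sigma2=0.1, eps=0.2, p_max=1.0,
                                     csi_err_var=var, trials=2000)
    print(f"CSI error variance {var:.2f}: average MSE {mse:.3f} (design target 0.200)")
```

With zero estimation error the design target is met exactly; any error adds a misalignment term on top of the noise floor, which is the gap that idealized-CSIT evaluations hide.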
Key Takeaway
Wireless-FL scheduling is a Pareto optimization: a tight MSE target (threshold scheduling) favors best-channel users; fair participation requires $\alpha$-fairness at an MSE cost factor $C(\alpha)$. Energy budgets reduce to per-user water-filling (the energy-constrained power-allocation theorem). The full joint problem is non-convex; practical decomposition (per-round + per-user) typically loses little relative to the optimum. The golden thread (privacy, robustness, efficiency) reappears here as fairness vs. MSE vs. energy: three axes, no perfect corner.
Quick Check
A wireless-FL system enforces $\alpha = 0.5$ fairness (each user participates in at least half its proportional-share rounds). Relative to unconstrained threshold scheduling, the average per-round MSE is:
larger
larger (Rayleigh channels)
Equal to unconstrained
larger
Per the fairness-MSE trade-off theorem, the cost is the factor $C(0.5)$, which for typical Rayleigh fading is a moderate constant: a manageable convergence-rate cost for unbiased participation.