Scheduling, Power, and Resource Allocation

The Joint Optimization

Sections §17.1–§17.2 defined the wireless-FL problem and its convergence behavior as a function of per-round aggregation MSE and user participation. Two levers shape the per-round MSE in practice:

  • Device scheduling St\mathcal{S}_t — which users upload each round.
  • Power allocation Pk(t)P_k^{(t)} — how much transmit budget each user spends.

These are coupled. Selecting more users increases statistical representativeness but, for AirComp, worsens the worst-user MSE if weak-channel users are included. Allocating more power to weak users accelerates their gradients' contribution but drains batteries.

The point is that the wireless-FL design problem is a bi-level optimization: the outer layer minimizes convergence loss over TT rounds; the inner layer chooses (St,Pk(t))(\mathcal{S}_t, P_k^{(t)}) each round to satisfy MSE target at minimum cost. This section derives optimal (and practical heuristic) scheduling and power-control rules and quantifies their effect on end-to-end convergence.

,

Definition:

Wireless-FL Joint Optimization

Given a total-round budget TT, per-user energy budgets {Ek}\{E_k\}, and channel process {hk(t)}\{h_k^{(t)}\}, the wireless-FL joint optimization is min{(St,Pk(t))}t=0T1E[F(θT)]\min_{\{(\mathcal{S}_t, P_k^{(t)})\}_{t=0}^{T-1}} \mathbb{E}[F(\boldsymbol{\theta}_T)] subject to t:kStPk(t)    Ekk,\sum_{t: k \in \mathcal{S}_t} P_k^{(t)} \;\leq\; E_k \qquad \forall k, Stn (all users),Pk(t)Pmax(k,t).|\mathcal{S}_t| \leq n \text{ (all users)}, \qquad P_k^{(t)} \leq P_{\max} \qquad \forall (k, t).

The expectation is over randomness in gradients and channel noise. By Theorem 17.2.1, this reduces to minimizing the noise floor — an online per-round optimization over \ntnmseagg(t)\ntn{mseagg}(t).

Theorem: Optimal Per-Round AirComp Scheduling

Given per-round CSI {γk(t)}\{\gamma_k^{(t)}\} and MSE tolerance MSEtol\mathsf{MSE}^{\text{tol}}, the round-tt scheduling and power rules St={k:γk(t)τt},Pk(t)=ηt2σs2/hk(t)2\mathcal{S}_t = \{k : \gamma_k^{(t)} \geq \tau_t\}, \qquad P_k^{(t)} = \eta_t^2 \sigma_s^2 / |h_k^{(t)}|^2 (for kStk \in \mathcal{S}_t) with threshold τt=σ2/MSEtol\tau_t = \sigma^2/\mathsf{MSE}^{\text{tol}} and common amplitude ηt=τt\eta_t = \sqrt{\tau_t} are Pareto-optimal: they minimize MSE subject to user count, per-user power, and jointly satisfy the feasibility constraints of Theorem 16.2.1.

,

Example: Worked Scheduling at Round 5

Round t=5t = 5: n=20n = 20 users with effective channel gains (units Pσs2P \sigma_s^{-2}): γk(5)={1.2,0.3,0.8,0.1,2.0,0.5,0.9,1.5,0.7,0.4,1.1,0.6,0.2,0.3,0.8,1.8,0.5,0.9,1.3,0.7}\gamma_k^{(5)} = \{1.2, 0.3, 0.8, 0.1, 2.0, 0.5, 0.9, 1.5, 0.7, 0.4, 1.1, 0.6, 0.2, 0.3, 0.8, 1.8, 0.5, 0.9, 1.3, 0.7\}. Noise variance σ2=0.01\sigma^2 = 0.01. MSE target MSEtol=0.02\mathsf{MSE}^{\text{tol}} = 0.02. Apply Theorem 17.3.1 to compute S5\mathcal{S}_5.

Definition:

Fairness-Aware Scheduling

A scheduling rule is α\alpha-proportionally fair if, over TT rounds, each user kk has participated in St\mathcal{S}_t for at least αT/n\alpha T / n rounds (i.e., each user carries at least a α\alpha-fraction of its "proportional share" of participations).

Threshold scheduling (Theorem 17.3.1) is not fair: persistently weak-channel users are systematically excluded. To enforce α\alpha-fairness, add a lower-bound constraint: t=0T11[kSt]    αT/n,k.\sum_{t=0}^{T-1} \mathbb{1}[k \in \mathcal{S}_t] \;\geq\; \alpha T/n, \quad \forall k.

Under this constraint, the per-round MSE increases (the scheduler must accept weaker users in some rounds), but all users' gradients contribute.

Theorem: Fairness-MSE Trade-off

For a given α[0,1]\alpha \in [0, 1], the optimal α\alpha-fair scheduler has per-round MSE that is, on average, at most a factor ηα  =  11ακ\eta_{\alpha} \;=\; \frac{1}{1 - \alpha \cdot \kappa} larger than the unconstrained optimal MSE, where κ=(E[γmin]/E[γ(α)])1\kappa = (\mathbb{E}[\gamma_{\min}]/\mathbb{E}[\gamma_{(\alpha)}]) - 1 is a channel-distribution-dependent constant (γ(α)\gamma_{(\alpha)} is the α\alpha-th percentile).

Interpretation. Tighter fairness (α1\alpha \to 1, every user included every round) approaches the worst-user bottleneck; looser fairness (α0\alpha \to 0, threshold scheduling) approaches the unconstrained optimum.

Fairness vs. MSE Trade-off

Vary the fairness parameter α\alpha and observe how the average per-round aggregation MSE increases as more weak users are forced into the schedule. Compare this to the unconstrained threshold-scheduling baseline (α=0\alpha = 0). The simulation draws Rayleigh-distributed channels.

Parameters
50
100
1

Theorem: Energy-Constrained Power Allocation

Given a per-user energy budget EkE_k and TT rounds, the optimal power allocation across the rounds user kk participates in is water-filling on the channel gains: Pk(t)=[1λk1γk(t)]+subject to t:kStPk(t)=Ek,P_k^{(t)\star} = \left[\frac{1}{\lambda_k} - \frac{1}{\gamma_k^{(t)}}\right]^+ \text{subject to } \sum_{t: k \in \mathcal{S}_t} P_k^{(t)} = E_k, where λk>0\lambda_k > 0 is the dual variable for the energy constraint.

Interpretation. User kk spends more power in high-channel-gain rounds, less (or zero) in bad rounds. Water-filling is the standard Lagrangian answer to a sum-concave objective with linear constraint.

Joint Optimization — What's Tractable?

The full joint problem — scheduling, power allocation, learning rate, convergence — is non-convex. Practical decomposition:

  1. Per-round decomposition. Assume average MSE targets; each round applies Theorem 17.3.1.

  2. Per-user decomposition. Each user solves its energy water-filling (Theorem 17.3.2) independently of others, given a target MSE.

  3. Meta-level: the target MSE is picked from the convergence analysis (§17.2).

The decomposition is provably suboptimal but provably tractable. Empirically, the loss from decomposition is typically <5%< 5\% of the optimal. Ongoing research closes the gap.

⚠️Engineering Note

Deploying Wireless-FL Scheduling

Production wireless-FL scheduling guidelines:

  • Estimate channel statistics offline. Before deployment, collect a few hours of channel measurements from each user. This drives the target MSE selection.

  • Choose α\alpha from heterogeneity. If user gradients are i.i.d. across devices (uniform datasets), low α\alpha is fine. If heterogeneous (different demographics per device), enforce higher α\alpha.

  • Pre-compute the threshold schedule. For a fixed target MSE, the threshold τt\tau_t is predictable given channel statistics. Compute once, apply online.

  • Adaptive α\alpha scheduling. In non-stationary environments (e.g., moving devices), adapt α\alpha based on tracked channel variability.

  • Monitor convergence in real time. The FL loss (or its estimate) provides feedback on whether the current (τt,α)(\tau_t, \alpha) is in the MSE- dominated regime. If yes, tighten τt\tau_t; if no, loosen to reduce power cost.

  • Integrate with other layers. MAC- layer scheduling interacts with physical-layer power control, which interacts with the FL learning rate. A hierarchical design — cross-layer control of FL over the wireless stack — is where deployments are heading.

Practical Constraints
  • Offline channel statistics estimation before deployment

  • α\alpha from data heterogeneity

  • Adaptive α\alpha for non-stationary environments

  • Cross-layer design: MAC + PHY + FL

📋 Ref: Yang et al. 2020; Chen et al. 2021
,

Common Mistake: CSI Acquisition Is Not Free

Mistake:

Assume perfect CSIT is available at zero cost — and design the FL system around ideal scheduling decisions.

Correction:

CSIT requires uplink pilots from each user in each round — adding a non-trivial overhead (typically 10-20% of round bandwidth). Poor CSIT degrades scheduling (wrong thresholds) and power control (misaligned bkhkb_k h_k). Budget for CSIT acquisition: either (i) pilot-based estimation at each round (overhead per round), or (ii) reciprocity-based estimation in TDD systems (lower overhead but requires TDD). Production FL should include the CSIT overhead in the total energy/bandwidth budget. Under-estimated CSIT cost inflates paper performance vs. reality.

Key Takeaway

Wireless-FL scheduling is a Pareto optimization: tight MSE (threshold scheduling) favors best-channel users; fair participation requires α\alpha-fairness at an MSE cost factor 1/(1ακ)\sim 1/(1 - \alpha\kappa). Energy budgets reduce to per-user water-filling (Theorem 17.3.2). The full joint problem is non-convex; practical decomposition (per-round + per-user) is within 5%5\% of optimal. The golden thread — privacy, robustness, efficiency — reappears here as fairness vs. MSE vs. energy: three axes, no perfect corner.

Quick Check

A wireless-FL system enforces α=0.5\alpha = 0.5 fairness (each user participates in at least half their proportional-share rounds). Relative to unconstrained threshold scheduling, the average per-round MSE is:

2×2\times larger

1.5×\sim 1.5\times larger (Rayleigh channels)

Equal to unconstrained

10×10\times larger