Alternating Optimization Framework
The Simplest Thing That Works
Alternating optimization (AO), also known as block coordinate descent (BCD), is the Swiss Army knife of non-convex problems with separable structure. When the joint variable splits cleanly into groups and each conditional sub-problem is tractable (even if the joint problem is not), AO gives a principled, convergent procedure. For the RIS problem we iterate: fix the passive phases $\boldsymbol{\theta}$, optimize the active beamformer $\mathbf{w}$; fix $\mathbf{w}$, optimize $\boldsymbol{\theta}$; repeat. Under mild conditions AO converges to a stationary point (a local optimum).
Definition: Alternating Optimization (Block Coordinate Descent)
Alternating Optimization (Block Coordinate Descent)
For a joint optimization problem $\min_{\mathbf{x}\in\mathcal{X},\,\mathbf{y}\in\mathcal{Y}} f(\mathbf{x},\mathbf{y})$ with separable feasibility, alternating optimization starts from an initial $(\mathbf{x}^{(0)},\mathbf{y}^{(0)})$ and iterates

$$\mathbf{x}^{(t+1)} = \arg\min_{\mathbf{x}\in\mathcal{X}} f(\mathbf{x},\mathbf{y}^{(t)}), \qquad \mathbf{y}^{(t+1)} = \arg\min_{\mathbf{y}\in\mathcal{Y}} f(\mathbf{x}^{(t+1)},\mathbf{y}).$$
Each sub-update is a conditional optimization that may be convex even when the joint problem is not. The objective is monotonically non-increasing across iterations; under regularity conditions (compact feasible sets, continuously differentiable $f$), the iterates converge to a stationary point of $f$ on $\mathcal{X}\times\mathcal{Y}$.
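As a concrete rendering of the definition, here is a minimal generic two-block loop in Python; the sub-solvers `argmin_x` and `argmin_y` are placeholder callables supplied by the user, not part of any library API.

```python
import numpy as np

def alternating_optimization(f, argmin_x, argmin_y, x0, y0,
                             max_iters=100, tol=1e-6):
    """Generic two-block AO/BCD: returns the final iterates and the
    monotone objective trace."""
    x, y = x0, y0
    trace = [f(x, y)]
    for _ in range(max_iters):
        x = argmin_x(y)                   # conditional x-update, y fixed
        y = argmin_y(x)                   # conditional y-update, x fixed
        trace.append(f(x, y))             # non-increasing by construction
        if trace[-2] - trace[-1] < tol:   # stop when improvement stalls
            break
    return x, y, np.array(trace)
```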
Alternating Optimization for Joint Active-Passive Beamforming
Complexity: $\mathcal{O}\!\big(T\,(C_{\mathrm{act}} + C_{\mathrm{pas}})\big)$, where $T$ is the number of AO iterations and $C_{\mathrm{act}}, C_{\mathrm{pas}}$ are the per-subproblem costs. Typical $T$ is a few tens of iterations for convergence within a small tolerance $\epsilon$. Each active step is a WMMSE iteration ($\mathcal{O}(M^3)$ per step for $M$ transmit antennas); each passive step uses one of the algorithms in Chapter 6 ($\mathcal{O}(N^6)$ for SDR, $\mathcal{O}(N)$ per element-wise sweep).
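As a back-of-envelope illustration with hypothetical sizes $M = 8$, $N = 128$, $T = 30$ (values chosen here for illustration, not taken from the text):

$$T \cdot M^3 = 30 \cdot 512 \approx 1.5 \times 10^4 \ \text{(order of WMMSE inversion work)}, \qquad T \cdot N = 3840 \ \text{(element-wise phase updates)},$$

while a single SDR solve scales like $N^6 = 128^6 \approx 4.4 \times 10^{12}$, which is why the element-wise sweep is the workhorse inside the loop.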
Theorem: Monotone Convergence of AO
Let $f$ be continuously differentiable, with $\mathcal{X}\times\mathcal{Y}$ compact. The AO iterates satisfy

$$f(\mathbf{x}^{(t+1)}, \mathbf{y}^{(t+1)}) \;\le\; f(\mathbf{x}^{(t)}, \mathbf{y}^{(t)}) \quad \text{for all } t \ge 0.$$
If each sub-update is exact, then every limit point of the iterates is a stationary point of $f$ (i.e., it satisfies the KKT conditions of the joint problem).
For the RIS joint problem, the limit point is a local optimum, not necessarily global. Multiple random initializations improve the chance of finding a good local optimum.
Each conditional update produces a value no worse than the previous iterate (otherwise we would keep the previous iterate), so the objective sequence is monotone. Combined with compactness of the feasible set and continuity of $f$, Bolzano-Weierstrass gives convergence of a subsequence to a limit point, and the limit point is stationary via the KKT conditions, as the following steps detail.
Monotone decrease
By the definition of the $\mathbf{x}$-update, $f(\mathbf{x}^{(t+1)}, \mathbf{y}^{(t)}) \le f(\mathbf{x}, \mathbf{y}^{(t)})$ for all $\mathbf{x}\in\mathcal{X}$, in particular for $\mathbf{x} = \mathbf{x}^{(t)}$. So $f(\mathbf{x}^{(t+1)}, \mathbf{y}^{(t)}) \le f(\mathbf{x}^{(t)}, \mathbf{y}^{(t)})$. Similarly for the $\mathbf{y}$ update.
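Chaining the two sub-update inequalities yields the per-iteration guarantee of the theorem:

$$f(\mathbf{x}^{(t+1)}, \mathbf{y}^{(t+1)}) \;\le\; f(\mathbf{x}^{(t+1)}, \mathbf{y}^{(t)}) \;\le\; f(\mathbf{x}^{(t)}, \mathbf{y}^{(t)}).$$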
Boundedness and limit point
Compactness of $\mathcal{X}\times\mathcal{Y}$ means every sequence has a convergent subsequence (Bolzano-Weierstrass). The bounded monotone sequence $\{f(\mathbf{x}^{(t)}, \mathbf{y}^{(t)})\}$ converges to a finite limit $f^\star$.
Stationarity
At a limit point $(\bar{\mathbf{x}}, \bar{\mathbf{y}})$, $\bar{\mathbf{x}}$ is optimal given $\bar{\mathbf{y}}$ and vice versa (by continuity of the arg-min operation). This implies $\nabla_{\mathbf{x}} f(\bar{\mathbf{x}}, \bar{\mathbf{y}}) = \mathbf{0}$ (modulo constraint multipliers) and similarly for $\nabla_{\mathbf{y}} f$, which together constitute the KKT conditions of the joint problem.
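In symbols, blockwise optimality at the limit point reads

$$\bar{\mathbf{x}} \in \arg\min_{\mathbf{x}\in\mathcal{X}} f(\mathbf{x}, \bar{\mathbf{y}}), \qquad \bar{\mathbf{y}} \in \arg\min_{\mathbf{y}\in\mathcal{Y}} f(\bar{\mathbf{x}}, \mathbf{y}),$$

and stacking the two blockwise first-order conditions recovers the KKT system of the joint problem.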
Caveats: Local Minima, Saddle Points, Speed
AO convergence to a stationary point is guaranteed; convergence to the global optimum is not. Three practical concerns:
- Local optima: The non-convexity of the unit-modulus constraint means multiple local optima can exist, and AO halts at the first one it encounters. Mitigation: several random initializations in practice, keeping the best.
- Saddle points: In rare cases AO can stall at a saddle point. A stochastic perturbation of the initial point typically escapes it.
- Convergence speed: AO converges linearly at best; each iteration may halve the optimality gap, so reaching a gap of $\epsilon$ takes on the order of $\log_2(1/\epsilon)$ iterations (about 10 for $\epsilon = 10^{-3}$). Compare with Newton-type methods (quadratic convergence) used within each sub-problem.
Alternating Optimization Convergence Trace
Run AO on a single-user MISO-RIS problem and plot the objective (sum rate) as a function of the iteration index. Compare with the ideal coherent-beamforming upper bound. Change $N$ to see how a larger RIS takes more iterations (more variables, more local structure); change the initialization to see the sensitivity to the starting point. A minimal Python sketch follows the figure.
AO Staircase on the Rate Landscape
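A minimal end-to-end sketch in Python/NumPy, assuming the channel model $\mathbf{h}^H(\boldsymbol{\theta}) = \mathbf{h}_d^H + \mathbf{h}_r^H\,\mathrm{diag}(\boldsymbol{\theta})\,\mathbf{G}$ of the worked example below; the sizes, powers, and the loose triangle-inequality upper bound are illustrative choices, not the text's exact demo configuration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
M, N = 4, 32              # transmit antennas, RIS elements (assumed sizes)
P, sigma2 = 1.0, 1e-2     # transmit power, noise power (assumed)

def crandn(shape):
    """Complex Gaussian CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h_d, G, h_r = crandn(M), crandn((N, M)), crandn(N)   # Rayleigh channels

def eff_channel(theta):
    """h(theta), where h(theta)^H = h_d^H + h_r^H diag(theta) G."""
    return h_d + G.conj().T @ (np.conj(theta) * h_r)

def mrt(h):
    """Maximum-ratio transmission at power P."""
    return np.sqrt(P) * h / np.linalg.norm(h)

def rate(theta, w):
    """Rate log2(1 + |h(theta)^H w|^2 / sigma^2)."""
    h = eff_channel(theta)
    return np.log2(1 + np.abs(h.conj() @ w) ** 2 / sigma2)

def passive_step(w):
    """Element-wise alignment of each cascaded path with the direct path."""
    a0 = h_d.conj() @ w               # direct-path term
    a = np.conj(h_r) * (G @ w)        # cascaded-path terms a_n
    return np.exp(1j * (np.angle(a0) - np.angle(a)))

theta = np.exp(1j * 2 * np.pi * rng.random(N))   # random unit-modulus init
w = mrt(eff_channel(theta))
trace = [rate(theta, w)]
for _ in range(20):
    theta = passive_step(w)           # passive update, w fixed
    w = mrt(eff_channel(theta))       # active (MRT) update, theta fixed
    trace.append(rate(theta, w))      # non-decreasing by monotonicity

# Loose coherent upper bound via the triangle inequality on ||h(theta)||
bound = np.linalg.norm(h_d) + np.abs(h_r) @ np.linalg.norm(G, axis=1)
ub = np.log2(1 + P * bound ** 2 / sigma2)

plt.plot(trace, marker="o", label="AO rate")
plt.axhline(ub, ls="--", label="coherent upper bound (loose)")
plt.xlabel("AO iteration"); plt.ylabel("rate (bits/s/Hz)"); plt.legend()
plt.show()
```

Increasing `N` (say to 128) lengthens the staircase; swapping the random initialization for all-zero phases shows the sensitivity discussed in the mistake box below.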
Example: AO on a Single-User MISO-RIS Problem
A single-user MISO system with $M$ transmit antennas, an $N$-element RIS, and random Rayleigh channels. Run two iterations of AO and verify the monotone-rate property.
Iteration 0: initialization
Set $\boldsymbol{\theta}^{(0)}$ to an initial unit-modulus vector (random phases, say). Compute the effective channel $\mathbf{h}(\boldsymbol{\theta}^{(0)})$, where $\mathbf{h}^H(\boldsymbol{\theta}) = \mathbf{h}_d^H + \mathbf{h}_r^H\,\mathrm{diag}(\boldsymbol{\theta})\,\mathbf{G}$ ($\mathbf{h}_d$: direct channel, $\mathbf{G}$: BS-RIS channel, $\mathbf{h}_r$: RIS-user channel). MRT: $\mathbf{w}^{(0)} = \sqrt{P}\,\mathbf{h}(\boldsymbol{\theta}^{(0)})/\|\mathbf{h}(\boldsymbol{\theta}^{(0)})\|$. Rate $R^{(0)} = \log_2\!\big(1 + P\,\|\mathbf{h}(\boldsymbol{\theta}^{(0)})\|^2/\sigma^2\big)$.
Iteration 1: passive update
Solve $\max_{\boldsymbol{\theta}:\,|\theta_n| = 1}\big|\mathbf{h}^H(\boldsymbol{\theta})\,\mathbf{w}^{(0)}\big|^2$. For single-user with matched-filter $\mathbf{w}^{(0)}$, the solution is element-wise: $\theta_n^{(1)} = e^{\,j\left(\arg(\mathbf{h}_d^H\mathbf{w}^{(0)}) \,-\, \arg([\mathbf{h}_r^*]_n [\mathbf{G}\mathbf{w}^{(0)}]_n)\right)}$, which phase-aligns every cascaded path with the direct path. Update $\boldsymbol{\theta}^{(1)}$; compute the new MRT beamformer $\mathbf{w}^{(1)}$; compute $R^{(1)}$. Then $R^{(1)} \ge R^{(0)}$ by monotonicity.
Iteration 2, 3, ...
Continue. For the single-user case the scheme converges in a handful of iterations, and the final rate is typically within a fraction of a bit/s/Hz of the coherent-optimum bound.
Common Mistake: Don't Always Initialize with Identity
Mistake:
"Start with (all phases zero) β simple and always works."
Correction:
Identity initialization is cheap but often leads to shallow local optima, especially in multi-user scenarios. Random unit-modulus initialization (i.i.d. uniform phases) tends to find better local optima. Better still is matched-filter initialization: point the RIS at the dominant eigenvector of the single-user coherent combiner. For serious deployments, run AO from several random starts and keep the best; the extra compute is negligible relative to the AO iterations themselves. A sketch of the three initializers follows.
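A sketch of the three initialization strategies, under the same channel model and helpers as the demo above; the matched-filter variant here is one reasonable reading of the dominant-direction idea, not the only one.

```python
def init_identity(N):
    """All phases zero (Theta = I): cheap, but often a shallow basin."""
    return np.ones(N, dtype=complex)

def init_random(N, rng):
    """i.i.d. uniform phases: the default draw for multi-start AO."""
    return np.exp(1j * 2 * np.pi * rng.random(N))

def init_matched_filter(h_d, h_r, G):
    """Phase-align the RIS to the direct-channel MRT beam."""
    w0 = h_d / np.linalg.norm(h_d)    # single-user coherent combiner
    a0 = h_d.conj() @ w0
    a = np.conj(h_r) * (G @ w0)
    return np.exp(1j * (np.angle(a0) - np.angle(a)))
```

Multi-start then amounts to running AO from several `init_random` draws and keeping the highest final rate.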
AO in a Deployed Controller
Implementing AO in an RIS controller requires attention to:
- Timing. Each AO iteration takes a few milliseconds; running to convergence can take tens of milliseconds, comparable to the channel coherence time in mobile scenarios. Run fewer iterations (e.g., 5-10) with warm-starting from the previous channel realization, rather than restarting from scratch each time.
- Warm-starting. Channel realizations at consecutive time steps are correlated. Keep $(\mathbf{w}, \boldsymbol{\theta})$ from the previous coherence block as the initial point for the next; the required adjustment is typically small, so a few iterations suffice (see the sketch after this list).
- Numerical stability. WMMSE iterations involve matrix inverses that can be ill-conditioned at low SNR; add a small diagonal regularizer $\epsilon \mathbf{I}$ to avoid blow-up.
- Fallback. If AO stalls or diverges, fall back to the matched-filter passive beamformer (the element-wise phase-alignment rule), which is optimal for a single user and near-optimal for a few users.
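Continuing the demo script above, a sketch of warm-starting across correlated coherence blocks; the Gauss-Markov drift model, the correlation $\rho$, and the 5-iteration budget are illustrative assumptions.

```python
rho = 0.99                            # block-to-block channel correlation
for block in range(5):
    # Channels drift slightly between coherence blocks (Gauss-Markov)
    h_d = rho * h_d + np.sqrt(1 - rho**2) * crandn(M)
    h_r = rho * h_r + np.sqrt(1 - rho**2) * crandn(N)
    G   = rho * G   + np.sqrt(1 - rho**2) * crandn((N, M))
    for _ in range(5):                # short budget; (w, theta) carried over
        theta = passive_step(w)
        w = mrt(eff_channel(theta))
    # Note: passive_step is itself the matched-filter fallback rule, so a
    # stalled run can simply keep this element-wise solution.
    print(f"block {block}: rate = {rate(theta, w):.3f} bits/s/Hz")
```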
- Typical AO runtime for moderate $M$, $N$, and $T$ is tens of milliseconds on a modern CPU (a few milliseconds per iteration).
- Warm-starting across coherence blocks substantially reduces the iteration count.
- A small diagonal regularization suffices to stabilize WMMSE at low SNR.