Convergence and Local-Minimum Analysis
Why Worry About Local Minima?
We proved in Section 5.2 that AO converges to a stationary point. But the unit-modulus constraint makes the feasible set non-convex, and non-convex problems generically have multiple stationary points. This section quantifies the local-minimum question: how bad can an AO local optimum be compared with the global optimum, and what does that tell us about the initialization strategy?
Theorem: Worst-Case Gap to Global Optimum
Let $R^\star$ denote the global-optimum sum rate, $R_{\mathrm{AO}}$ the AO converged rate (from any feasible initial point), and $R_{\mathrm{SDR}}$ the SDR-relaxed upper bound (Chapter 6). Then
$$R_{\mathrm{AO}} \le R^\star \le R_{\mathrm{SDR}}.$$
Empirical evidence over random channel realizations shows the AO-to-SDR sum-rate gap is typically small, both in the single-user case and with multiple users. With 5 random AO initializations and taking the best, the worst-case optimality gap shrinks further in most scenarios.
The SDR relaxation gives an upper bound on the global optimum: the relaxed optimum is at least as large as the rate of any feasible solution. Together with the AO lower bound $R_{\mathrm{AO}}$, this sandwiches the global optimum $R^\star$: $R_{\mathrm{AO}} \le R^\star \le R_{\mathrm{SDR}}$. Empirically, the gap $R_{\mathrm{SDR}} - R_{\mathrm{AO}}$ is small.
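The sandwich turns the SDR bound into a computable optimality certificate. A minimal sketch, with assumed example rate values standing in for a real AO run and SDR solve:

```python
# Hypothetical example values (bits/s/Hz) for an AO run and the SDR relaxation.
r_ao, r_sdr = 11.8, 12.1

# Since r_ao <= R* <= r_sdr, the unknown gap R* - r_ao is certified to be
# at most r_sdr - r_ao, without ever computing the global optimum.
gap_bound = r_sdr - r_ao
print(f"AO is provably within {gap_bound:.2f} b/s/Hz of the global optimum")
```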
Initialization Strategies That Work
Empirical best practices for AO initialization:
1. Identity: $\boldsymbol{\theta} = \mathbf{1}$ (all phase shifts zero). Cheap; works for single-user, mediocre for multi-user.
2. Random unit-modulus: $\theta_n = e^{j\phi_n}$ with $\phi_n \sim \mathcal{U}[0, 2\pi)$. Best single-trial performance among cheap initializations.
3. Strongest-user matched filter: Initialize as a matched filter for the single strongest user. The RIS then helps that user first; the subsequent iterations fold in the weaker users. Especially effective when user strengths are unbalanced.
4. Multi-start with best: Run AO from 5–20 random starts, keep the best. Adds a 5–20× compute cost but almost closes the gap to the global optimum.
Strategy (4) is the recommendation for simulation studies and benchmarks; (3) or (2) is the recommendation for real-time operation.
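The multi-start strategy is a thin wrapper around the AO routine itself. A minimal sketch, assuming a user-supplied `run_ao(channel, theta0)` (hypothetical name) that maps a channel and an initial phase vector to a converged rate and solution:

```python
import numpy as np

def random_unit_modulus(n, rng):
    """Random unit-modulus initialization: theta_n = exp(j*phi_n)."""
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=n))

def multistart_ao(run_ao, channel, n_ris, n_starts=10, seed=0):
    """Run AO from several random starts and keep the best converged rate.

    `run_ao` is a placeholder for the chapter's AO routine; it should map
    (channel, theta0) -> (rate, solution).
    """
    rng = np.random.default_rng(seed)
    best_rate, best_sol = -np.inf, None
    for _ in range(n_starts):
        theta0 = random_unit_modulus(n_ris, rng)
        rate, sol = run_ao(channel, theta0)
        if rate > best_rate:
            best_rate, best_sol = rate, sol
    return best_rate, best_sol
```

Warm-starting later blocks from `best_sol` combines naturally with this wrapper in a real-time setting.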
Theorem: Linear Convergence Rate of AO
Suppose $(\mathbf{W}^\star, \boldsymbol{\theta}^\star)$ is an isolated stationary point with positive-definite block Hessians. Then in a neighborhood of the stationary point,
$$\|x^{(t+1)} - x^\star\| \le \rho\,\|x^{(t)} - x^\star\|, \qquad 0 < \rho < 1,$$
where $x^{(t)} = (\mathbf{W}^{(t)}, \boldsymbol{\theta}^{(t)})$ and $\rho$ depends on the off-block-diagonal terms of the Hessian. For well-conditioned RIS problems $\rho$ is around $0.5$, i.e., each AO iteration roughly halves the distance to the optimum.
Near a non-degenerate local minimum, each AO iteration reduces the optimality gap by a constant factor less than 1. The rate depends on the "coupling strength" between blocks; weakly coupled problems converge fast.
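The coupling-strength effect is easy to see on a toy problem (not the RIS problem): alternating exact minimization of a coupled quadratic contracts the error by a constant factor per sweep, and the factor grows with the cross-term:

```python
# Toy illustration: alternating minimization of f(x, y) = x^2 + y^2 + 2*c*x*y
# with coupling c in (-1, 1). Exact block updates: x <- -c*y, then y <- -c*x,
# so each full sweep multiplies the error by rho = c^2.
def alternating_min(c, y0=1.0, iters=10):
    errs = [abs(y0)]
    x, y = 0.0, y0
    for _ in range(iters):
        x = -c * y           # minimize over x with y fixed
        y = -c * x           # minimize over y with x fixed
        errs.append(abs(y))  # distance to the optimum (0, 0)
    return errs

errs = alternating_min(c=0.7)
ratios = [errs[t + 1] / errs[t] for t in range(5)]
print(ratios)  # every sweep contracts by rho = c^2 = 0.49
```

With weak coupling (small `c`) the contraction factor is tiny and AO converges in a handful of sweeps; as `c` approaches 1 the blocks fight each other and convergence slows.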
AO Starting From Multiple Initializations
Run AO from 10 random initializations on the same channel realization and plot each convergence trajectory. Most runs converge to the same local optimum; a few converge to strictly worse local optima. Best-of-10 typically reaches the global optimum to within a small percentage gap.
Example: Characterizing an AO Fixed Point
At an AO fixed point $(\mathbf{W}^\star, \boldsymbol{\theta}^\star)$, what are the first-order optimality conditions that both blocks simultaneously satisfy?
Active-block condition
$\mathbf{W}^\star$ is optimal for fixed $\boldsymbol{\theta}^\star$ subject to the power constraint. KKT: there is a multiplier $\mu \ge 0$ with $\nabla_{\mathbf{W}} R(\mathbf{W}^\star, \boldsymbol{\theta}^\star) = \mu \mathbf{W}^\star$ and $\mu\,(\|\mathbf{W}^\star\|_F^2 - P) = 0$.
Passive-block condition
$\boldsymbol{\theta}^\star$ is optimal for fixed $\mathbf{W}^\star$ on the unit-modulus torus. The Riemannian gradient vanishes, i.e., the Euclidean gradient projected onto the tangent space of the torus is zero. Equivalently, for each $n$: $\operatorname{Im}\{(\theta_n^\star)^{\ast}\,[\nabla_{\boldsymbol{\theta}} R]_n\} = 0$.
Interpretation
The active-block condition is standard (water-filling / KKT). The passive-block condition is a phase-matching condition: each $\theta_n^\star$ is aligned with the Euclidean gradient component on that element, modulo sign. Jointly these describe a coordinate-wise (block-wise) stationary point.
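The per-element condition can be verified numerically. A small sketch on an assumed toy objective $f(\boldsymbol{\theta}) = \sum_n \operatorname{Re}\{a_n^\ast \theta_n\}$ (a stand-in for the sum rate), whose optimum is the phase-matched point $\theta_n = a_n/|a_n|$:

```python
import numpy as np

# Toy check of the stationarity condition Im{ conj(theta_n) * g_n } = 0.
# Objective: f(theta) = sum_n Re{ conj(a_n) * theta_n }, maximized over
# unit-modulus theta at theta_n = a_n / |a_n| (phase matching).
rng = np.random.default_rng(0)
a = rng.normal(size=8) + 1j * rng.normal(size=8)  # assumed gradient direction
theta_opt = a / np.abs(a)                         # phase-matched optimum
g = a                                             # Euclidean gradient of f
tangent = np.imag(np.conj(theta_opt) * g)         # tangent-space component
print(np.max(np.abs(tangent)))  # ~0: Riemannian gradient vanishes
```

At any non-aligned point the tangent component is nonzero, which is exactly the direction a Riemannian passive-beamforming update would move along.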
Common Mistake: Monotone Rate Is Not a Substitute for Good Rate
Mistake:
"My AO converged monotonically, so the result is trustworthy."
Correction:
Monotone convergence only tells you AO reached a stationary point, not a good one. Check the gap to an upper bound: the SDR-relaxed sum rate, or the single-user coherent-optimum bound (all RIS elements phase-aligned with the direct path). If the AO output is far below these bounds, try more initializations or a better passive algorithm.
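The single-user coherent bound is one line of code, which makes it a cheap sanity check. A sketch under an assumed toy setup (direct channel `h_d`, cascaded per-element channels `c`, transmit SNR `rho`):

```python
import numpy as np

# Assumed toy single-user setup.
rng = np.random.default_rng(1)
h_d = rng.normal() + 1j * rng.normal()              # direct BS-user channel
c = rng.normal(size=16) + 1j * rng.normal(size=16)  # cascaded per-element gains
rho = 1.0                                           # transmit SNR

# Coherent optimum: every RIS element phase-aligned with the direct path,
# so all magnitudes add (triangle inequality makes this an upper bound).
bound = np.log2(1 + rho * (np.abs(h_d) + np.abs(c).sum()) ** 2)

# Any feasible phase choice (here: random) must stay at or below the bound.
theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=16))
rate = np.log2(1 + rho * np.abs(h_d + np.sum(c * theta)) ** 2)
print(rate <= bound)  # True
```

If an AO run's single-user rate sits far below `bound`, that is a red flag worth investigating before trusting the monotone trajectory.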
Key Takeaway
AO is the practical workhorse for RIS joint optimization. Each iteration consists of a WMMSE inner loop (active) and a passive update from Chapter 6. 10–30 outer iterations typically suffice for practical convergence. With multiple random initializations, AO delivers rates close to the SDR upper bound across a wide range of scenarios. The combination of monotone convergence, tractable sub-problems, and empirical near-optimality makes AO the default algorithm, even when more sophisticated global methods exist.
Quick Check
Which of the following is not guaranteed by AO on the joint RIS beamforming problem?
- Monotone non-decreasing sum rate
- Convergence to a stationary point
- Convergence to the global optimum
- Finite-iteration termination at any tolerance
AO converges only to a local stationary point. The global optimum can be missed due to the non-convex unit-modulus constraint. Multiple random initializations help but do not guarantee the global optimum.
Alternating Optimization (AO)
An iterative algorithm for solving joint optimization problems by alternately optimizing over one block while fixing the other. For RIS: $\mathbf{W}$ (BS precoder) and $\boldsymbol{\theta}$ (RIS phases). Monotone convergence is guaranteed under mild regularity conditions.
Related: WMMSE Reformulation of Sum-Rate Maximization, The Passive Beamforming Subproblem
WMMSE Algorithm
The Weighted MMSE reformulation of sum-rate maximization, used as the inner loop of RIS AO for the active-beamforming subproblem. Introduces auxiliary combiner and weight variables; each block has a closed-form coordinate update. Converges to a local optimum of the original sum-rate problem.
Related: Monotone Convergence of AO, Precoding
Historical Note: Block Coordinate Descent: From Gauss-Seidel to AO
1950s–2020s. Alternating optimization is a descendant of the Gauss–Seidel method (1823, for linear systems). Its generalization to non-convex optimization was studied by Powell in the 1970s, and its convergence theory was matured by Bertsekas and Tsitsiklis in the 1980s–1990s. Modern applications (matrix completion, tensor decomposition, dictionary learning) all rely on the same monotone-descent principle.
The RIS community adopted BCD (under the name "alternating optimization" or "AltOpt") in 2019 with the Wu and Zhang paper, which used it as the natural framework for the newly formulated joint beamforming problem. Since then, essentially every RIS-optimization paper uses AO in some form, often with innovations in one or the other sub-problem, but keeping the outer alternating structure.
AO in a Real-Time Controller
A real-time RIS controller running AO between coherence blocks must make time/quality tradeoffs:
- Partial AO: Run 3–5 iterations per coherence block rather than to full convergence. The sub-optimality is small when warm-starting.
- Warm-start: Initialize the current block's $\mathbf{W}$ and $\boldsymbol{\theta}$ from the previous block's solution. Block-to-block channel correlation makes this very effective.
- Amortized computation: Pre-compute channel-dependent matrices once per coherence block, reuse across inner AO steps.
- Graceful degradation: If AO time budget exceeded, fall back to the current best iterate.
- Typical compute budgets in a 5G-like system are on the millisecond scale per AO iteration.
- Total optimization time per coherence block is likewise millisecond-scale.
- Warm-start substantially reduces the needed iteration count vs. cold-start.
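These tradeoffs combine into a small control loop. A minimal sketch, assuming a user-supplied `run_ao_iteration(state)` (hypothetical name) that performs one outer AO iteration and returns the updated state plus its rate:

```python
import time

def realtime_ao_step(run_ao_iteration, state, budget_s=0.002, max_iters=5):
    """One coherence-block update: partial AO with warm-start and a time budget.

    `state` holds the previous block's (W, theta) solution (warm-start).
    `run_ao_iteration` is a placeholder for one outer AO iteration, mapping
    state -> (new_state, rate).
    """
    best_state, best_rate = state, float("-inf")
    start = time.perf_counter()
    for _ in range(max_iters):                      # partial AO: few iterations
        state, rate = run_ao_iteration(state)       # warm-started from `state`
        if rate > best_rate:
            best_state, best_rate = state, rate     # keep current best iterate
        if time.perf_counter() - start > budget_s:  # graceful degradation
            break
    return best_state, best_rate
```

Returning the best iterate (not the last) implements the graceful-degradation fallback: even if the budget cuts the loop short, the controller never applies a configuration worse than what it already had in hand.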