Convergence and Local-Minimum Analysis

Why Worry About Local Minima?

We proved in Section 5.2 that AO converges to a stationary point. But the unit-modulus constraint makes the feasible set non-convex, and non-convex problems generically have multiple stationary points. This section quantifies the local-minimum question: how bad can an AO local optimum be compared with the global optimum, and what does that tell us about initialization strategy?

Theorem: Worst-Case Gap to Global Optimum

Let $R^\star$ denote the global-optimum sum rate, $R_{\text{AO}}$ the rate AO converges to (from any feasible initial point), and $R_{\text{SDR}}$ the SDR-relaxed upper bound (Chapter 6). Then

$$R_{\text{AO}} \leq R^\star \leq R_{\text{SDR}}.$$

Empirical evidence over random channel realizations shows the AO-to-SDR gap is typically $< 5\%$ in sum rate for a single user and $< 10\%$ for $K \leq 4$ users. With 5 random AO initializations, taking the best run drives the worst-case optimality gap below $2\%$ in most scenarios.

The SDR relaxation gives an upper bound $R_{\text{SDR}}$ on the global optimum: relaxing the constraints enlarges the feasible set, so the relaxed optimum can only be at least as good as the best feasible solution. Together with the AO lower bound $R_{\text{AO}}$, this sandwiches the global optimum $R^\star$: $R_{\text{AO}} \leq R^\star \leq R_{\text{SDR}}$. Empirically, the sandwich is tight, with a gap of usually $< 1$ dB.
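As a quick illustration, the sketch below turns the two bounds into a certified optimality interval. The numeric rates are hypothetical placeholders, not results from this chapter.

```python
import numpy as np

def optimality_gap_certificate(r_ao, r_sdr):
    """Certify how far the AO solution can be from the unknown global
    optimum R*: since R_AO <= R* <= R_SDR, the true gap R* - R_AO is
    at most r_sdr - r_ao."""
    abs_gap = r_sdr - r_ao        # worst-case gap in bits/s/Hz
    rel_gap = abs_gap / r_sdr     # worst-case gap as a fraction of the bound
    return abs_gap, rel_gap

# Hypothetical values: AO converged to 11.8 bits/s/Hz, SDR bound is 12.3.
abs_gap, rel_gap = optimality_gap_certificate(11.8, 12.3)
print(f"R* lies within {abs_gap:.2f} bits/s/Hz ({100 * rel_gap:.1f}%) of the AO rate")
```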

Initialization Strategies That Work

Empirical best practices for AO initialization:

  1. Identity: $\boldsymbol{\Phi}^{(0)} = \mathbf{I}$. Cheap; works for single-user, mediocre for multi-user.
  2. Random unit-modulus: $\theta_n^{(0)} \sim \text{Unif}[0, 2\pi)$. Best single-trial performance among cheap initializations.
  3. Strongest-user matched filter: Initialize $\boldsymbol{\phi}^{(0)}$ as a matched filter for the single strongest user. The RIS then helps that user first; subsequent iterations fold in the weaker users. Especially effective when user strengths are unbalanced.
  4. Multi-start with best: Run AO from 5-20 random starts and keep the best. Adds a $5$-$20\times$ compute cost but almost closes the gap to the global optimum.

Strategy (4) is the recommendation for simulation studies and benchmarks; (3) or (2) for real-time operation. A sketch of all four strategies follows.
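A minimal sketch of the four initializations, assuming a single-antenna BS for the matched-filter case so the cascaded channel of element $n$ is the scalar product $h_{r,n} g_n$; `run_ao` is an assumed callable returning a converged rate and phase vector, not an implementation from this section.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_identity(n):
    """Strategy 1: all RIS phases zero, i.e. Phi = I."""
    return np.zeros(n)

def init_random(n):
    """Strategy 2: i.i.d. uniform phases on [0, 2*pi)."""
    return rng.uniform(0.0, 2 * np.pi, size=n)

def init_matched_filter(h_r, g):
    """Strategy 3: phase-align the cascaded channel of the strongest user.

    h_r: (N,) RIS-to-user channel of the strongest user.
    g:   (N,) BS-to-RIS channel (single BS antenna assumed).
    Each theta_n cancels the phase of the n-th cascaded coefficient,
    so all N reflected paths add coherently for that user.
    """
    return -np.angle(h_r * g)

def multi_start_ao(run_ao, inits):
    """Strategy 4: run AO from each initialization, keep the best rate.

    `run_ao(theta0)` is assumed to return (rate, theta_opt)."""
    results = [run_ao(theta0) for theta0 in inits]
    return max(results, key=lambda r: r[0])
```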

Theorem: Linear Convergence Rate of AO

Suppose $(\mathbf{W}^\star, \boldsymbol{\Phi}^\star)$ is an isolated stationary point with positive-definite block Hessians. Then, in a neighborhood of the stationary point,

$$\|(\mathbf{W}^{(i+1)}, \boldsymbol{\Phi}^{(i+1)}) - (\mathbf{W}^\star, \boldsymbol{\Phi}^\star)\| \leq \rho\, \|(\mathbf{W}^{(i)}, \boldsymbol{\Phi}^{(i)}) - (\mathbf{W}^\star, \boldsymbol{\Phi}^\star)\|,$$

where $0 < \rho < 1$ depends on the off-block-diagonal terms of the Hessian. For well-conditioned RIS problems, $\rho \sim 0.1$-$0.3$; i.e., each AO iteration shrinks the distance to the optimum to $10$-$30\%$ of its previous value.

Near a non-degenerate local minimum, each AO iteration reduces the optimality gap by a constant factor less than one. The rate depends on the "coupling strength" between the blocks; weakly coupled problems converge quickly. The sketch below shows one way to estimate $\rho$ from an AO run.
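The contraction factor can be estimated empirically: stack each iterate $(\mathbf{W}^{(i)}, \boldsymbol{\Phi}^{(i)})$ into a vector, use the final iterate as a proxy for the stationary point, and inspect successive distance ratios. The helper below is an illustrative sketch, not this chapter's code.

```python
import numpy as np

def estimate_contraction_factor(iterates):
    """Estimate the linear rate rho from a recorded AO trajectory.

    `iterates` is a list of stacked parameter vectors (W and Phi
    flattened and concatenated), at least three entries long. The last
    entry is used as a proxy for (W*, Phi*). Returns the ratios
    ||x^(i+1) - x*|| / ||x^i - x*||; near the fixed point these should
    settle around a constant rho in (0, 1).
    """
    x_star = iterates[-1]
    dists = [np.linalg.norm(x - x_star) for x in iterates[:-1]]
    return [dists[i + 1] / dists[i] for i in range(len(dists) - 1)]
```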

AO Starting From Multiple Initializations

Run AO from 10 random initializations on the same channel realization and show each convergence trajectory. Most converge to the same local optimum; a few converge to strictly worse local optima. Best-of-10 typically achieves the global optimum within a $< 1\%$ gap.

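The same qualitative behavior appears on any nonconvex two-block problem. The sketch below uses a deliberately simple surrogate objective (not the RIS sum rate) with two local minima of strictly different values, and runs exact two-block alternating minimization from 10 random starts; the runs split between the two minima, and best-of-10 reliably recovers the better one.

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-2.0, 2.0, 2001)   # exact 1-D minimization by grid search

def f(x, y):
    """Toy nonconvex two-block objective (NOT the RIS sum rate): local
    minima near (-1, -1) and (+1, +1); the 0.25*(x + y) tilt makes
    (-1, -1) strictly better."""
    return (x**2 - 1)**2 + (y**2 - 1)**2 + 2.0 * (x - y)**2 + 0.25 * (x + y)

def ao_from(x, y, iters=30):
    """Exact alternating minimization over the two scalar blocks."""
    for _ in range(iters):
        x = grid[np.argmin(f(grid, y))]   # block 1: minimize over x, y fixed
        y = grid[np.argmin(f(x, grid))]   # block 2: minimize over y, x fixed
    return x, y, f(x, y)

runs = [ao_from(*rng.uniform(-2.0, 2.0, size=2)) for _ in range(10)]
print(sorted(round(r[2], 3) for r in runs))        # two distinct final values
print("best:", min(runs, key=lambda r: r[2]))      # best-of-10
```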

Example: Characterizing an AO Fixed Point

At an AO fixed point $(\mathbf{W}^\star, \boldsymbol{\Phi}^\star)$, what are the first-order optimality conditions that both blocks simultaneously satisfy?
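One way to write the answer, assuming the sum rate $R$ is maximized under the power constraint $\operatorname{tr}(\mathbf{W}^H\mathbf{W}) \leq P_t$ with multiplier $\mu \geq 0$, and the unit-modulus constraint is absorbed by the parametrization $\boldsymbol{\Phi} = \operatorname{diag}(e^{j\theta_1}, \dots, e^{j\theta_N})$ (the notation matches this chapter, but the exact Lagrangian bookkeeping here is a sketch):

```latex
% Block 1: W* is a KKT point of the active subproblem with Phi* fixed
\nabla_{\mathbf{W}} R(\mathbf{W}^\star, \boldsymbol{\Phi}^\star) = 2\mu \mathbf{W}^\star,
\qquad \mu \ge 0, \qquad
\mu \left( \operatorname{tr}(\mathbf{W}^{\star H} \mathbf{W}^\star) - P_t \right) = 0.

% Block 2: each phase is stationary for the passive subproblem with W* fixed
\frac{\partial R(\mathbf{W}^\star, \boldsymbol{\Phi}^\star)}{\partial \theta_n} = 0,
\qquad n = 1, \dots, N.
```

Because the feasible set is a Cartesian product of the two blocks, satisfying both blockwise conditions simultaneously is exactly stationarity of the joint problem, which is what the Section 5.2 convergence result delivers.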

Common Mistake: Monotone Rate Is Not a Substitute for Good Rate

Mistake:

"My AO converged monotonically, so the result is trustworthy."

Correction:

Monotone convergence only tells you AO reached a stationary point, not a good one. Check the gap to an upper bound: the SDR-relaxed sum rate, or the single-user coherent-optimum bound $\log_2(1 + P_t N^2 / (K \sigma^2))$. If the AO output is $> 3$ dB below these bounds, try more initializations or a better passive algorithm.
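A sketch of this sanity check follows. Converting the rate gap to dB via the effective per-user SINR implied by each rate is one plausible reading of the "3 dB" rule of thumb (an interpretation, not fixed by the text above).

```python
import numpy as np

def check_ao_against_bound(r_ao_sum, p_t, n, k, sigma2):
    """Flag an AO result that sits suspiciously far below the
    single-user coherent-optimum bound log2(1 + P_t N^2 / (K sigma^2)),
    applied per user with the sum rate split evenly across K users
    (an assumption for this sketch)."""
    r_bound = np.log2(1 + p_t * n**2 / (k * sigma2))   # per-user bound

    def to_sinr_db(r):
        return 10 * np.log10(2**r - 1)                 # rate -> effective SINR (dB)

    gap_db = to_sinr_db(r_bound) - to_sinr_db(r_ao_sum / k)
    if gap_db > 3.0:
        print(f"AO output is {gap_db:.1f} dB below the coherent bound; "
              "try more initializations or a better passive algorithm.")
    return gap_db
```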

Key Takeaway

AO is the practical workhorse for RIS joint optimization. Each iteration consists of a WMMSE inner loop (active) and a passive update from Chapter 6. Typically 10–30 outer iterations suffice for convergence within $\epsilon = 10^{-3}$. With multiple random initializations, AO delivers rates within $1$-$5\%$ of the SDR upper bound across a wide range of scenarios. The combination of monotone convergence, tractable sub-problems, and empirical near-optimality makes AO the default algorithm, even when more sophisticated global methods exist.
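A skeleton of this outer loop, with the two sub-problem solvers left as assumed callables (`update_active` standing in for the WMMSE inner loop, `update_passive` for the Chapter 6 phase update):

```python
def ao_outer_loop(update_active, update_passive, sum_rate,
                  W0, Phi0, eps=1e-3, max_iters=30):
    """Sketch of the AO outer loop: alternate the two block updates and
    stop when the (monotone, hence non-negative) rate improvement falls
    below eps or the iteration budget runs out."""
    W, Phi = W0, Phi0
    r_prev = sum_rate(W, Phi)
    for i in range(max_iters):
        W = update_active(Phi)        # WMMSE inner loop, Phi fixed
        Phi = update_passive(W)       # passive update, W fixed
        r = sum_rate(W, Phi)
        if r - r_prev < eps:          # monotone convergence criterion
            return W, Phi, r, i + 1
        r_prev = r
    return W, Phi, r_prev, max_iters
```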

Quick Check

Which of the following is not guaranteed by AO on the joint RIS beamforming problem?

  • Monotone non-decreasing sum rate
  • Convergence to a stationary point
  • Convergence to the global optimum
  • Finite-iteration termination at any tolerance

Alternating Optimization (AO)

An iterative algorithm for solving joint optimization problems $\min_{(\mathbf{x}, \mathbf{y})} f(\mathbf{x}, \mathbf{y})$ by alternately minimizing over one block while fixing the other. For RIS, $\mathbf{x} = \mathbf{W}$ (BS precoder) and $\mathbf{y} = \boldsymbol{\Phi}$ (RIS phases). Monotone convergence is guaranteed under mild regularity conditions.

Related: WMMSE Reformulation of Sum-Rate Maximization, The Passive Beamforming Subproblem

WMMSE Algorithm

The Weighted MMSE reformulation of sum-rate maximization, used as the inner loop of RIS AO for the active-beamforming subproblem. Introduces auxiliary combiner and weight variables; each block has a closed-form coordinate update. Converges to a local optimum of the original sum-rate problem.

Related: Monotone Convergence of AO, Precoding
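The closed-form coordinate updates mentioned here can be sketched for a single-cell MISO downlink. This follows the standard Shi et al. (2011) WMMSE form, but the array conventions and the bisection helper are illustrative assumptions, not this chapter's notation.

```python
import numpy as np

def wmmse_step(H, W, sigma2, p_t):
    """One round of the three closed-form WMMSE block updates.

    H: (K, M) complex, rows are h_k^H.   W: (M, K), columns are w_k.
    """
    K, M = H.shape
    HW = H @ W                                          # HW[k, j] = h_k^H w_j
    total = sigma2 + np.sum(np.abs(HW) ** 2, axis=1)    # per-user rx power + noise
    d = np.diag(HW)                                     # desired terms h_k^H w_k
    u = d / total                                       # 1) MMSE receive scalars
    e = 1.0 - np.abs(d) ** 2 / total                    # resulting MMSE values
    lam = 1.0 / e                                       # 2) rate-optimal weights

    # 3) precoder update: w_k = lam_k u_k (A + mu I)^{-1} h_k, where
    #    A = sum_j lam_j |u_j|^2 h_j h_j^H and mu >= 0 enforces the
    #    power budget (found by bisection; tiny ridge guards rank deficiency).
    A = H.conj().T @ (H * (lam * np.abs(u) ** 2)[:, None])
    rhs = H.conj().T * (lam * u)[None, :]               # columns lam_k u_k h_k

    def solve_for(mu):
        Wn = np.linalg.solve(A + (mu + 1e-12) * np.eye(M), rhs)
        return Wn, np.sum(np.abs(Wn) ** 2)

    Wn, p = solve_for(0.0)
    if p > p_t:                                         # constraint active: bisect mu
        lo, hi = 0.0, 1.0
        while solve_for(hi)[1] > p_t:
            hi *= 2.0
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            if solve_for(mid)[1] > p_t:
                lo = mid
            else:
                hi = mid
        Wn, p = solve_for(hi)                           # hi is always feasible
    return Wn, u, lam
```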

Historical Note: Block Coordinate Descent: From Gauss-Seidel to AO

1950s–2020s

Alternating optimization is a descendant of the Gauss–Seidel method (1823, for linear systems). Its generalization to non-convex optimization was studied by Powell in the 1970s, and its convergence theory was matured by Bertsekas and Tsitsiklis in the 1980s–1990s. Modern applications, such as matrix completion, tensor decomposition, and dictionary learning, all rely on the same monotone-descent principle.

The RIS community adopted BCD (under the name "alternating optimization" or "AltOpt") in 2019 with the Wu and Zhang paper, which used it as the natural framework for the newly formulated joint beamforming problem. Since then, essentially every RIS-optimization paper has used AO in some form, often with innovations in one sub-problem or the other, but keeping the outer alternating structure.

⚠️ Engineering Note

AO in a Real-Time Controller

A real-time RIS controller running AO between coherence blocks must make time/quality tradeoffs:

  • Partial AO: Run 3-5 iterations per coherence block, not to full convergence. The sub-optimality is small when warm-starting.
  • Warm-start: Initialize the current block's $\boldsymbol{\Phi}$ and $\mathbf{W}$ from the previous block's solution. Block-to-block channel correlation makes this very effective.
  • Amortized computation: Pre-compute channel-dependent matrices once per coherence block and reuse them across inner AO steps.
  • Graceful degradation: If the AO time budget is exceeded, fall back to the current best iterate.
Practical Constraints
  • Typical compute budget in a 5G-like system: $\sim 1$ ms per AO iteration at $N = 256$, $K = 4$.
  • Total optimization time per coherence block: $< 20$ ms.
  • Warm-start reduces the needed iteration count by $\sim 3\times$ vs. cold-start.
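Putting these tradeoffs together, a per-coherence-block controller tick might look like the sketch below; `ao_iteration` (one outer AO iteration) and `sum_rate` are assumed callables, not implementations from this chapter.

```python
import time

def controller_tick(channel_est, prev_solution, ao_iteration, sum_rate,
                    budget_s=0.020, max_iters=5):
    """One coherence-block update combining the four tradeoffs above:
    warm-start, partial AO, a hard time budget, and graceful fallback
    to the best iterate seen so far."""
    W, Phi = prev_solution                 # warm-start from previous block
    best = (sum_rate(W, Phi, channel_est), W, Phi)
    deadline = time.monotonic() + budget_s
    for _ in range(max_iters):             # partial AO: a few iterations only
        if time.monotonic() >= deadline:   # graceful degradation on overrun
            break
        W, Phi = ao_iteration(W, Phi, channel_est)
        r = sum_rate(W, Phi, channel_est)
        if r > best[0]:
            best = (r, W, Phi)
    return best[1], best[2]                # current best (W, Phi)
```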