Alternating Optimization Framework

The Simplest Thing That Works

Alternating optimization (AO), also known as block coordinate descent (BCD), is the Swiss-army knife of non-convex problems with separable structure. When the joint variable splits cleanly into groups $(\mathbf{W}, \boldsymbol{\Phi})$ and each conditional sub-problem is tractable — even if the joint problem is not — AO gives a principled, convergent procedure. We iterate: fix $\boldsymbol{\Phi}$, optimize $\mathbf{W}$; fix $\mathbf{W}$, optimize $\boldsymbol{\Phi}$; repeat. Under mild conditions AO converges to a stationary point (at best a local optimum, not necessarily a global one).

Definition:

Alternating Optimization (Block Coordinate Descent)

For a joint optimization problem $\min_{(\mathbf{x}, \mathbf{y}) \in \mathcal{X} \times \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$ with separable feasibility, alternating optimization starts from an initial $(\mathbf{x}^{(0)}, \mathbf{y}^{(0)})$ and iterates

$$\mathbf{x}^{(i+1)} = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}, \mathbf{y}^{(i)}), \qquad \mathbf{y}^{(i+1)} = \arg\min_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}^{(i+1)}, \mathbf{y}).$$

Each sub-update is a conditional optimization that may be convex even when the joint problem is not. The objective is monotonically non-increasing across iterations; under regularity conditions (compact feasible sets, continuously differentiable $f$), the iterates converge to a stationary point of $f$ on $\mathcal{X} \times \mathcal{Y}$.
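To make the definition concrete, here is a minimal sketch of AO on low-rank matrix factorization, $\min_{\mathbf{U},\mathbf{V}} \|\mathbf{M} - \mathbf{U}\mathbf{V}^T\|_F^2$: the joint problem is non-convex in $(\mathbf{U}, \mathbf{V})$, yet each block sub-problem is an ordinary least-squares problem with a closed-form solution. The problem instance and names below are illustrative, not from this chapter's RIS problem.

```python
# AO (block coordinate descent) on matrix factorization: each block
# update is exact least squares, so the objective is monotone.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target

U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))

f = lambda U, V: np.linalg.norm(M - U @ V.T, "fro") ** 2
trace = [f(U, V)]
for _ in range(50):
    # x-update: fix V, solve min_U ||M - U V^T||_F^2 exactly
    U = np.linalg.lstsq(V, M.T, rcond=None)[0].T
    # y-update: fix U, solve min_V ||M - U V^T||_F^2 exactly
    V = np.linalg.lstsq(U, M, rcond=None)[0].T
    trace.append(f(U, V))

assert all(a >= b - 1e-9 for a, b in zip(trace, trace[1:]))  # monotone
print(f"objective: initial {trace[0]:.3e} -> final {trace[-1]:.3e}")
```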

Alternating Optimization for Joint Active-Passive Beamforming

Complexity: $O(I \cdot [C_{\text{active}} + C_{\text{passive}}])$, where $I$ is the number of AO iterations and $C_{(\cdot)}$ are the per-subproblem costs.
Input: channels $\mathbf{h}_{k,d}, \mathbf{h}_{k,2}, \mathbf{H}_1$ for $k = 1, \ldots, K$;
transmit power $P_t$; tolerance $\epsilon$; max iterations $I_{\max}$.
Output: beamformer $\mathbf{W}^\star$, phase shifts $\boldsymbol{\Phi}^\star$.
1. Initialize: $\boldsymbol{\Phi}^{(0)}$ (e.g., $\phi_n^{(0)} = 1$ for all $n$).
Compute initial $\mathbf{h}_{k,\text{eff}}^{(0)} = \mathbf{h}_{k,d} + \mathbf{H}_1^H \boldsymbol{\Phi}^{(0)} \mathbf{h}_{k,2}$.
2. Repeat for $i = 0, 1, 2, \ldots$:
3. $\quad$ Active update. Given $\boldsymbol{\Phi}^{(i)}$, solve
$$\mathbf{W}^{(i+1)} = \arg\max_{\mathbf{W}:\, \mathrm{tr}(\mathbf{W}^H \mathbf{W}) \leq P_t} \sum_k \log_2(1 + \mathrm{SINR}_k).$$
(WMMSE iteration, Section 5.3.)
4. $\quad$ Passive update. Given $\mathbf{W}^{(i+1)}$, solve
$$\boldsymbol{\Phi}^{(i+1)} = \arg\max_{|\phi_n| = 1} \sum_k \log_2(1 + \mathrm{SINR}_k).$$
(SDR, manifold, or element-wise; Chapter 6.)
5. $\quad$ If $|f^{(i+1)} - f^{(i)}| < \epsilon$ or $i \geq I_{\max}$: break.
6. Return $(\mathbf{W}^{(i+1)}, \boldsymbol{\Phi}^{(i+1)})$.

Typical $I$ is 10–30 iterations for convergence within $\epsilon = 10^{-3}$. Each active step is a WMMSE iteration ($O(N_t^3 K)$ per step); each passive step uses one of the algorithms in Chapter 6 ($O(N^3)$ for SDR, $O(N)$ per element-wise sweep).
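As a concrete instance of the algorithm above, here is a minimal sketch for the single-user case ($K = 1$): the active update then has a closed form (MRT), and the passive update is the element-wise sweep of Chapter 6. The function name, the i.i.d. Rayleigh channel model, and $\sigma^2 = 1$ are illustrative assumptions rather than this chapter's reference implementation; later snippets in this section reuse `ao_single_user`.

```python
# A minimal single-user AO sketch (hypothetical helper, reused below).
import numpy as np

def ao_single_user(Nt=4, N=16, snr_db=20.0, n_iter=20, rng=None,
                   phi0=None, channels=None):
    """Return (rate per AO iterate, coherent upper bound, final phi)."""
    rng = np.random.default_rng(rng)
    Pt = 10.0 ** (snr_db / 10.0)                       # Pt / sigma^2
    if channels is None:                               # i.i.d. Rayleigh draw
        cn = lambda *s: (rng.standard_normal(s)
                         + 1j * rng.standard_normal(s)) / np.sqrt(2)
        channels = (cn(Nt), cn(N, Nt), cn(N))
    h_d, H1, h2 = channels                             # direct, BS-RIS, RIS-user

    # Cascaded per-element channels: h_eff = h_d + sum_n phi_n * B[:, n]
    B = H1.conj().T * h2[None, :]
    phi = np.ones(N, dtype=complex) if phi0 is None else phi0.astype(complex)

    # With K = 1 the optimal active beamformer is MRT, so the rate is a
    # closed-form function of the effective channel norm.
    rate = lambda h: np.log2(1.0 + Pt * np.linalg.norm(h) ** 2)
    rates = [rate(h_d + B @ phi)]
    for _ in range(n_iter):
        # Passive update: element-wise sweep; each phi_n is the exact
        # conditional optimum given all other elements.
        for n in range(N):
            c = h_d + B @ phi - phi[n] * B[:, n]       # channel without element n
            phi[n] = np.exp(1j * np.angle(B[:, n].conj() @ c))
        rates.append(rate(h_d + B @ phi))              # active update is implicit

    # Ideal coherent upper bound: direct and all cascaded paths add in phase.
    ub = np.log2(1.0 + Pt * (np.linalg.norm(h_d)
                             + np.linalg.norm(B, axis=0).sum()) ** 2)
    return np.array(rates), ub, phi
```

Because the MRT step and every element update are exact conditional optima, the returned rate sequence is non-decreasing, in line with the theorem below.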

Theorem: Monotone Convergence of AO

Let $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be continuously differentiable, with $\mathcal{X}, \mathcal{Y}$ compact. The AO iterates $(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})$ satisfy

$$f(\mathbf{x}^{(i+1)}, \mathbf{y}^{(i+1)}) \leq f(\mathbf{x}^{(i+1)}, \mathbf{y}^{(i)}) \leq f(\mathbf{x}^{(i)}, \mathbf{y}^{(i)}).$$

If each sub-update is exact, then every limit point of the iterates is a stationary point of $f$ (i.e., satisfies the KKT conditions of the joint problem).

For the RIS joint problem, the limit point is a local optimum, not necessarily global. Multiple random initializations improve the chance of finding a good local optimum.

Proof sketch: Each conditional update produces a value no worse than the previous iterate (otherwise we could keep the previous iterate), so the objective sequence is monotone and, being bounded, converges. Compactness of the feasible set and continuity of $f$ then give, via Bolzano-Weierstrass, a subsequence of iterates converging to a limit point. Because each block update is an exact conditional minimizer, the limit point satisfies the first-order (KKT) conditions in each block, and hence is a stationary point of the joint problem.

Caveats: Local Minima, Saddle Points, Speed

AO convergence to a stationary point is guaranteed; convergence to the global optimum is not. Three practical concerns:

  1. Local optima: The non-convexity of the unit-modulus constraint means multiple local optima can exist. AO halts at the first one it encounters. Mitigation: multiple random initializations ($\sim$5–20 in practice), keeping the best (see the multi-start sketch after this list).
  2. Saddle points: AO can stall at saddle points in rare cases. Stochastic perturbations of the initial point typically escape them.
  3. Convergence speed: AO converges at best linearly: each iteration shrinks the optimality gap by a roughly constant factor (e.g., halving it). For $\epsilon = 10^{-3}$, expect $\sim$10–20 iterations. Compare with Newton-type methods (quadratic convergence) used within each sub-problem.
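A hedged sketch of that multi-start mitigation, reusing the hypothetical `ao_single_user` helper from earlier: run AO from several random unit-modulus starts on the same channel realization (fixed `channel_seed`) and keep the best local optimum.

```python
# Multi-start AO: same channels, different random phase initializations.
import numpy as np

def ao_multistart(n_starts=10, N=16, channel_seed=0, **kw):
    best, ub = None, None
    for s in range(n_starts):
        phi0 = np.exp(2j * np.pi * np.random.default_rng(s).random(N))
        rates, ub, _ = ao_single_user(N=N, phi0=phi0, rng=channel_seed, **kw)
        if best is None or rates[-1] > best[-1]:       # keep best local optimum
            best = rates
    return best, ub
```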

Alternating Optimization Convergence Trace

Run AO on a single-user MISO-RIS problem and plot the objective (sum rate) as a function of the iteration index. Compare with the ideal coherent-beamforming upper bound. Change $N$ to see how a larger RIS takes more iterations to converge (more variables, more local structure); change the initialization to see the sensitivity to the starting point.

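One way to produce this trace, again reusing the hypothetical `ao_single_user` sketch; the parameter values here are illustrative.

```python
# Convergence trace: AO sum rate vs. iteration, against the coherent bound.
import matplotlib.pyplot as plt

rates, ub, _ = ao_single_user(Nt=8, N=64, snr_db=10.0, n_iter=20, rng=1)
plt.plot(rates, marker="o", label="AO (MRT + element-wise passive)")
plt.axhline(ub, linestyle="--", color="gray", label="coherent upper bound")
plt.xlabel("AO iteration")
plt.ylabel("rate (bit/s/Hz)")
plt.legend()
plt.show()
```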

AO Staircase on the Rate Landscape

Animation of AO taking alternating horizontal (active update) and vertical (passive update) steps on the 2D rate landscape. Each step is a conditional optimum; the sequence converges to a local maximum. Different initial points converge to different local optima β€” the hallmark of non-convex optimization.
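The same behavior can be reproduced without any RIS machinery: exact coordinate ascent on a toy multi-modal 2D surface (the objective and grid below are illustrative) settles at different local maxima from different starting points.

```python
# Toy staircase: alternating exact 1D maximizations on a 2D landscape.
import numpy as np

f = lambda x, y: np.sin(x) * np.sin(y) + 0.3 * np.sin(2 * x + 1) * np.sin(3 * y)
grid = np.linspace(0.0, 2 * np.pi, 2001)               # fine 1D search grid

def coordinate_ascent(x, y, n_iter=30):
    path = [(x, y, f(x, y))]
    for _ in range(n_iter):
        x = grid[np.argmax(f(grid, y))]   # horizontal step: conditional opt in x
        path.append((x, y, f(x, y)))
        y = grid[np.argmax(f(x, grid))]   # vertical step: conditional opt in y
        path.append((x, y, f(x, y)))
    return path

for x0, y0 in [(0.5, 0.5), (4.0, 1.0), (2.0, 5.0)]:
    p = coordinate_ascent(x0, y0)
    print(f"start ({x0:.1f}, {y0:.1f}) -> local max f = {p[-1][2]:.4f}")
```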

Example: AO on a Single-User MISO-RIS Problem

A single-user MISO system has $N_t = 4$, $N = 16$, $P_t/\sigma^2 = 20\text{ dB}$, and random Rayleigh channels. Run two iterations of AO and verify the monotone-rate property.
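A quick check of this example with the hypothetical `ao_single_user` sketch from earlier (the seed is arbitrary):

```python
# Two AO iterations at Pt/sigma^2 = 20 dB; verify the rate never decreases.
import numpy as np

rates, _, _ = ao_single_user(Nt=4, N=16, snr_db=20.0, n_iter=2, rng=7)
print(np.round(rates, 4))                 # one entry per AO iterate
assert np.all(np.diff(rates) >= -1e-9), "rate must be non-decreasing"
```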

Common Mistake: Don't Always Initialize with Identity

Mistake:

"Start with Ξ¦(0)=I\boldsymbol{\Phi}^{(0)} = \mathbf{I} (all phases zero) β€” simple and always works."

Correction:

Identity initialization is cheap but often leads to shallow local optima, especially in multi-user scenarios. Random unit-modulus initialization (i.i.d. uniform phases) tends to find better local optima. Even better is matched-filter initialization: set $\boldsymbol{\phi}^{(0)}$ to the phases that coherently combine the dominant user's cascaded channel with its direct channel. For serious deployments, run AO from $\geq 5$ random starts and keep the best — the extra compute is negligible relative to the AO iterations themselves.
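The three initializations in code, in the notation of the `ao_single_user` sketch ($\mathbf{B}[:, n]$ is element $n$'s cascaded channel, $\mathbf{h}_d$ the direct channel). The matched-filter rule shown is the per-element phase-alignment version, one common reading of the heuristic above; it is exactly the first element-wise sweep applied against $\mathbf{h}_d$.

```python
# Candidate initializations for phi^(0) (illustrative helpers).
import numpy as np

def init_identity(N):
    return np.ones(N, dtype=complex)                  # all phases zero

def init_random(N, rng):
    return np.exp(2j * np.pi * rng.random(N))         # i.i.d. uniform phases

def init_matched_filter(B, h_d):
    # phi_n = exp(j * arg(b_n^H h_d)) maximizes Re(phi_n^* b_n^H h_d),
    # aligning each cascaded term with the direct channel.
    return np.exp(1j * np.angle(B.conj().T @ h_d))
```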

⚠️ Engineering Note

AO in a Deployed Controller

Implementing AO in an RIS controller requires attention to:

  1. Timing. Each AO iteration takes a few ms; running to convergence takes $\sim$20–100 ms — comparable to the coherence time in mobile scenarios. Run fewer iterations (e.g., 5–10) with warm-starting from the previous channel realization, rather than restarting from scratch each time.
  2. Warm-starting. Channel realizations at consecutive time steps are correlated. Keep $\boldsymbol{\Phi}^\star$ from the previous coherence block as the initial point for the next; starting near the previous optimum, only a few refinement iterations are typically needed (see the sketch after this list).
  3. Numerical stability. WMMSE iterations involve matrix inverses that can be ill-conditioned at low SNR; add a small regularizer to avoid blow-up.
  4. Fallback. If AO stalls or diverges, fall back to the matched-filter passive beamformer (element-wise rule), which is optimal for a single user and near-optimal for few users.
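A miniature warm-starting experiment with the hypothetical `ao_single_user` sketch: blend the channels with a correlation factor $\rho$ to mimic consecutive coherence blocks, then count the iterations needed to converge from a cold versus a warm start. The correlation model and $\rho = 0.99$ are illustrative assumptions.

```python
# Warm start vs. cold start across two correlated "coherence blocks".
import numpy as np

rng = np.random.default_rng(3)
draw = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
h_d, H1, h2 = draw(8), draw(64, 8), draw(64)
_, _, phi_prev = ao_single_user(Nt=8, N=64, channels=(h_d, H1, h2))

rho = 0.99                                            # temporal correlation
blend = lambda h: rho * h + np.sqrt(1 - rho**2) * draw(*h.shape)
chans2 = (blend(h_d), blend(H1), blend(h2))           # next coherence block

def iters_to_converge(phi0, eps=1e-3):
    rates, _, _ = ao_single_user(Nt=8, N=64, channels=chans2,
                                 phi0=phi0, n_iter=30)
    return int(np.argmax(rates >= rates[-1] - eps))   # first near-final iterate

print("cold start iterations:", iters_to_converge(None))      # identity init
print("warm start iterations:", iters_to_converge(phi_prev))  # previous optimum
```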
Practical Constraints

  • Typical AO runtime for $N = 256$, $K = 4$, $N_t = 16$: $\sim 20\,\text{ms}$ on a modern CPU.
  • Warm-starting across coherence blocks reduces the iteration count by $\sim$2–4$\times$.
  • Regularization $\lambda \sim 10^{-6}$ suffices to stabilize WMMSE at $\text{SNR} > -10\text{ dB}$.