Alternating Optimization Framework

The Simplest Thing That Works

Alternating optimization (AO), also known as block coordinate descent (BCD), is the Swiss-army knife of non-convex problems with separable structure. When the joint variable splits cleanly into groups $(\mathbf{W}, \boldsymbol{\Phi})$ and each conditional sub-problem is tractable — even if the joint problem is not — AO gives a principled, convergent procedure. We iterate: fix $\boldsymbol{\Phi}$, optimize $\mathbf{W}$; fix $\mathbf{W}$, optimize $\boldsymbol{\Phi}$; repeat. Under mild conditions AO converges to a stationary point (at best a local optimum, not necessarily a global one).

Definition:

Alternating Optimization (Block Coordinate Descent)

For a joint optimization problem $\min_{(\mathbf{x}, \mathbf{y}) \in \mathcal{X} \times \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$ with separable feasibility, alternating optimization starts from an initial $(\mathbf{x}^{(0)}, \mathbf{y}^{(0)})$ and iterates

$$\mathbf{x}^{(i+1)} = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}, \mathbf{y}^{(i)}), \qquad \mathbf{y}^{(i+1)} = \arg\min_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}^{(i+1)}, \mathbf{y}).$$

Each sub-update is a conditional optimization that may be convex even when the joint problem is not. The objective is monotonically non-increasing across iterations; under regularity conditions (compact feasible sets, continuously differentiable $f$), the iterates converge to a stationary point of $f$ on $\mathcal{X} \times \mathcal{Y}$.
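To make the definition concrete, here is a minimal sketch of AO on low-rank matrix factorization, $\min_{\mathbf{U},\mathbf{V}} \|\mathbf{M} - \mathbf{U}\mathbf{V}^T\|_F^2$: the joint problem is non-convex in $(\mathbf{U}, \mathbf{V})$, yet each block sub-problem is an ordinary least-squares problem with a closed-form solution. The problem instance and names below are illustrative, not from this chapter's RIS problem.

```python
# AO (block coordinate descent) on matrix factorization: each block
# update is exact least squares, so the objective is monotone.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target

U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))

f = lambda U, V: np.linalg.norm(M - U @ V.T, "fro") ** 2
trace = [f(U, V)]
for _ in range(50):
    # x-update: fix V, solve min_U ||M - U V^T||_F^2 exactly
    U = np.linalg.lstsq(V, M.T, rcond=None)[0].T
    # y-update: fix U, solve min_V ||M - U V^T||_F^2 exactly
    V = np.linalg.lstsq(U, M, rcond=None)[0].T
    trace.append(f(U, V))

assert all(a >= b - 1e-9 for a, b in zip(trace, trace[1:]))  # monotone
print(f"objective: initial {trace[0]:.3e} -> final {trace[-1]:.3e}")
```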

Alternating Optimization for Joint Active-Passive Beamforming

Complexity: $O(I \cdot [C_{\text{active}} + C_{\text{passive}}])$, where $I$ is the number of AO iterations and $C_{(\cdot)}$ are the per-subproblem costs.
Input: channels $\mathbf{h}_{k,d}, \mathbf{h}_{k,2}, \mathbf{H}_1$ for $k = 1, \ldots, K$;
transmit power $P_t$; tolerance $\epsilon$; max iterations $I_{\max}$.
Output: beamformer $\mathbf{W}^\star$, phase shifts $\boldsymbol{\Phi}^\star$.
1. Initialize: $\boldsymbol{\Phi}^{(0)}$ (e.g., $\phi_n^{(0)} = 1$ for all $n$).
Compute initial $\mathbf{h}_{k,\text{eff}}^{(0)} = \mathbf{h}_{k,d} + \mathbf{H}_1^H \boldsymbol{\Phi}^{(0)} \mathbf{h}_{k,2}$.
2. Repeat for $i = 0, 1, 2, \ldots$:
3. $\quad$ Active update. Given $\boldsymbol{\Phi}^{(i)}$, solve
$$\mathbf{W}^{(i+1)} = \arg\max_{\mathbf{W}:\, \mathrm{tr}(\mathbf{W}^H \mathbf{W}) \leq P_t} \sum_k \log_2(1 + \mathrm{SINR}_k).$$
(WMMSE iteration, Section 5.3.)
4. $\quad$ Passive update. Given $\mathbf{W}^{(i+1)}$, solve
$$\boldsymbol{\Phi}^{(i+1)} = \arg\max_{|\phi_n| = 1} \sum_k \log_2(1 + \mathrm{SINR}_k).$$
(SDR, manifold, or element-wise; Chapter 6.)
5. $\quad$ If $|f^{(i+1)} - f^{(i)}| < \epsilon$ or $i \geq I_{\max}$: break.
6. Return $(\mathbf{W}^{(i+1)}, \boldsymbol{\Phi}^{(i+1)})$.

Typical $I$ is 10–30 iterations for convergence within $\epsilon = 10^{-3}$. Each active step is a WMMSE iteration ($O(N_t^3 K)$ per step); each passive step uses one of the algorithms in Chapter 6 ($O(N^3)$ for SDR, $O(N)$ per element-wise sweep).
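As a concrete instance of the algorithm above, here is a minimal sketch for the single-user case ($K = 1$): the active update then has a closed form (MRT), and the passive update is the element-wise sweep of Chapter 6. The function name, the i.i.d. Rayleigh channel model, and $\sigma^2 = 1$ are illustrative assumptions rather than this chapter's reference implementation; later snippets in this section reuse `ao_single_user`.

```python
# A minimal single-user AO sketch (hypothetical helper, reused below).
import numpy as np

def ao_single_user(Nt=4, N=16, snr_db=20.0, n_iter=20, rng=None,
                   phi0=None, channels=None):
    """Return (rate per AO iterate, coherent upper bound, final phi)."""
    rng = np.random.default_rng(rng)
    Pt = 10.0 ** (snr_db / 10.0)                       # Pt / sigma^2
    if channels is None:                               # i.i.d. Rayleigh draw
        cn = lambda *s: (rng.standard_normal(s)
                         + 1j * rng.standard_normal(s)) / np.sqrt(2)
        channels = (cn(Nt), cn(N, Nt), cn(N))
    h_d, H1, h2 = channels                             # direct, BS-RIS, RIS-user

    # Cascaded per-element channels: h_eff = h_d + sum_n phi_n * B[:, n]
    B = H1.conj().T * h2[None, :]
    phi = np.ones(N, dtype=complex) if phi0 is None else phi0.astype(complex)

    # With K = 1 the optimal active beamformer is MRT, so the rate is a
    # closed-form function of the effective channel norm.
    rate = lambda h: np.log2(1.0 + Pt * np.linalg.norm(h) ** 2)
    rates = [rate(h_d + B @ phi)]
    for _ in range(n_iter):
        # Passive update: element-wise sweep; each phi_n is the exact
        # conditional optimum given all other elements.
        for n in range(N):
            c = h_d + B @ phi - phi[n] * B[:, n]       # channel without element n
            phi[n] = np.exp(1j * np.angle(B[:, n].conj() @ c))
        rates.append(rate(h_d + B @ phi))              # active update is implicit

    # Ideal coherent upper bound: direct and all cascaded paths add in phase.
    ub = np.log2(1.0 + Pt * (np.linalg.norm(h_d)
                             + np.linalg.norm(B, axis=0).sum()) ** 2)
    return np.array(rates), ub, phi
```

Because the MRT step and every element update are exact conditional optima, the returned rate sequence is non-decreasing, in line with the theorem below.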

Theorem: Monotone Convergence of AO

Let $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be continuously differentiable, with $\mathcal{X}, \mathcal{Y}$ compact. The AO iterates $(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})$ satisfy

$$f(\mathbf{x}^{(i+1)}, \mathbf{y}^{(i+1)}) \leq f(\mathbf{x}^{(i+1)}, \mathbf{y}^{(i)}) \leq f(\mathbf{x}^{(i)}, \mathbf{y}^{(i)}).$$

If each sub-update is exact, then every limit point of the iterates is a stationary point of $f$ (i.e., satisfies the KKT conditions of the joint problem).

For the RIS joint problem, the limit point is a local optimum, not necessarily global. Multiple random initializations improve the chance of finding a good local optimum.

Proof sketch: Each conditional update produces a value no worse than the previous iterate (otherwise we could keep the previous iterate), so the objective sequence is monotone and, being bounded, converges. Compactness of the feasible set and continuity of $f$ then give, via Bolzano-Weierstrass, a subsequence of iterates converging to a limit point. Because each block update is an exact conditional minimizer, the limit point satisfies the first-order (KKT) conditions in each block, and hence is a stationary point of the joint problem.

Caveats: Local Minima, Saddle Points, Speed

AO convergence to a stationary point is guaranteed; convergence to the global optimum is not. Three practical concerns:

  1. Local optima: The non-convexity of the unit-modulus constraint means multiple local optima can exist. AO halts at the first one it encounters. Mitigation: multiple random initializations ($\sim$5–20 in practice), keeping the best (see the multi-start sketch after this list).
  2. Saddle points: AO can stall at saddle points in rare cases. Stochastic perturbations of the initial point typically escape them.
  3. Convergence speed: AO converges at best linearly: each iteration shrinks the optimality gap by a roughly constant factor (e.g., halving it). For $\epsilon = 10^{-3}$, expect $\sim$10–20 iterations. Compare with Newton-type methods (quadratic convergence) used within each sub-problem.
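A hedged sketch of that multi-start mitigation, reusing the hypothetical `ao_single_user` helper from earlier: run AO from several random unit-modulus starts on the same channel realization (fixed `channel_seed`) and keep the best local optimum.

```python
# Multi-start AO: same channels, different random phase initializations.
import numpy as np

def ao_multistart(n_starts=10, N=16, channel_seed=0, **kw):
    best, ub = None, None
    for s in range(n_starts):
        phi0 = np.exp(2j * np.pi * np.random.default_rng(s).random(N))
        rates, ub, _ = ao_single_user(N=N, phi0=phi0, rng=channel_seed, **kw)
        if best is None or rates[-1] > best[-1]:       # keep best local optimum
            best = rates
    return best, ub
```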

Alternating Optimization Convergence Trace

Run AO on a single-user MISO-RIS problem and plot the objective (sum rate) as a function of the iteration index. Compare with the ideal coherent-beamforming upper bound. Change $N$ to see how a larger RIS takes more iterations to converge (more variables, more local structure); change the initialization to see the sensitivity to the starting point.

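One way to produce this trace, again reusing the hypothetical `ao_single_user` sketch; the parameter values here are illustrative.

```python
# Convergence trace: AO sum rate vs. iteration, against the coherent bound.
import matplotlib.pyplot as plt

rates, ub, _ = ao_single_user(Nt=8, N=64, snr_db=10.0, n_iter=20, rng=1)
plt.plot(rates, marker="o", label="AO (MRT + element-wise passive)")
plt.axhline(ub, linestyle="--", color="gray", label="coherent upper bound")
plt.xlabel("AO iteration")
plt.ylabel("rate (bit/s/Hz)")
plt.legend()
plt.show()
```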

AO Staircase on the Rate Landscape

Animation of AO taking alternating horizontal (active update) and vertical (passive update) steps on the 2D rate landscape. Each step is a conditional optimum; the sequence converges to a local maximum. Different initial points converge to different local optima β€” the hallmark of non-convex optimization.
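The same behavior can be reproduced without any RIS machinery: exact coordinate ascent on a toy multi-modal 2D surface (the objective and grid below are illustrative) settles at different local maxima from different starting points.

```python
# Toy staircase: alternating exact 1D maximizations on a 2D landscape.
import numpy as np

f = lambda x, y: np.sin(x) * np.sin(y) + 0.3 * np.sin(2 * x + 1) * np.sin(3 * y)
grid = np.linspace(0.0, 2 * np.pi, 2001)               # fine 1D search grid

def coordinate_ascent(x, y, n_iter=30):
    path = [(x, y, f(x, y))]
    for _ in range(n_iter):
        x = grid[np.argmax(f(grid, y))]   # horizontal step: conditional opt in x
        path.append((x, y, f(x, y)))
        y = grid[np.argmax(f(x, grid))]   # vertical step: conditional opt in y
        path.append((x, y, f(x, y)))
    return path

for x0, y0 in [(0.5, 0.5), (4.0, 1.0), (2.0, 5.0)]:
    p = coordinate_ascent(x0, y0)
    print(f"start ({x0:.1f}, {y0:.1f}) -> local max f = {p[-1][2]:.4f}")
```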

Example: AO on a Single-User MISO-RIS Problem

A single-user MISO system has $N_t = 4$, $N = 16$, $P_t/\sigma^2 = 20\text{ dB}$, and random Rayleigh channels. Run two iterations of AO and verify the monotone-rate property.
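A quick check of this example with the hypothetical `ao_single_user` sketch from earlier (the seed is arbitrary):

```python
# Two AO iterations at Pt/sigma^2 = 20 dB; verify the rate never decreases.
import numpy as np

rates, _, _ = ao_single_user(Nt=4, N=16, snr_db=20.0, n_iter=2, rng=7)
print(np.round(rates, 4))                 # one entry per AO iterate
assert np.all(np.diff(rates) >= -1e-9), "rate must be non-decreasing"
```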

Common Mistake: Don't Always Initialize with Identity

Mistake:

"Start with Ξ¦(0)=I\boldsymbol{\Phi}^{(0)} = \mathbf{I} (all phases zero) β€” simple and always works."

Correction:

Identity initialization is cheap but often leads to shallow local optima, especially in multi-user scenarios. Random unit-modulus initialization (i.i.d. uniform phases) tends to find better local optima. Even better is matched-filter initialization: set $\boldsymbol{\phi}^{(0)}$ to the phases that coherently combine the dominant user's cascaded channel with its direct channel. For serious deployments, run AO from $\geq 5$ random starts and keep the best — the extra compute is negligible relative to the AO iterations themselves.
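The three initializations in code, in the notation of the `ao_single_user` sketch ($\mathbf{B}[:, n]$ is element $n$'s cascaded channel, $\mathbf{h}_d$ the direct channel). The matched-filter rule shown is the per-element phase-alignment version, one common reading of the heuristic above; it is exactly the first element-wise sweep applied against $\mathbf{h}_d$.

```python
# Candidate initializations for phi^(0) (illustrative helpers).
import numpy as np

def init_identity(N):
    return np.ones(N, dtype=complex)                  # all phases zero

def init_random(N, rng):
    return np.exp(2j * np.pi * rng.random(N))         # i.i.d. uniform phases

def init_matched_filter(B, h_d):
    # phi_n = exp(j * arg(b_n^H h_d)) maximizes Re(phi_n^* b_n^H h_d),
    # aligning each cascaded term with the direct channel.
    return np.exp(1j * np.angle(B.conj().T @ h_d))
```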

⚠️ Engineering Note

AO in a Deployed Controller

Implementing AO in an RIS controller requires attention to:

  1. Timing. Each AO iteration takes a few ms; running to convergence takes $\sim$20–100 ms — comparable to the coherence time in mobile scenarios. Run fewer iterations (e.g., 5–10) with warm-starting from the previous channel realization, rather than restarting from scratch each time.
  2. Warm-starting. Channel realizations at consecutive time steps are correlated. Keep $\boldsymbol{\Phi}^\star$ from the previous coherence block as the initial point for the next; starting near the previous optimum, only a few refinement iterations are typically needed (see the sketch after this list).
  3. Numerical stability. WMMSE iterations involve matrix inverses that can be ill-conditioned at low SNR; add a small regularizer to avoid blow-up.
  4. Fallback. If AO stalls or diverges, fall back to the matched-filter passive beamformer (element-wise rule), which is optimal for a single user and near-optimal for few users.
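A miniature warm-starting experiment with the hypothetical `ao_single_user` sketch: blend the channels with a correlation factor $\rho$ to mimic consecutive coherence blocks, then count the iterations needed to converge from a cold versus a warm start. The correlation model and $\rho = 0.99$ are illustrative assumptions.

```python
# Warm start vs. cold start across two correlated "coherence blocks".
import numpy as np

rng = np.random.default_rng(3)
draw = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
h_d, H1, h2 = draw(8), draw(64, 8), draw(64)
_, _, phi_prev = ao_single_user(Nt=8, N=64, channels=(h_d, H1, h2))

rho = 0.99                                            # temporal correlation
blend = lambda h: rho * h + np.sqrt(1 - rho**2) * draw(*h.shape)
chans2 = (blend(h_d), blend(H1), blend(h2))           # next coherence block

def iters_to_converge(phi0, eps=1e-3):
    rates, _, _ = ao_single_user(Nt=8, N=64, channels=chans2,
                                 phi0=phi0, n_iter=30)
    return int(np.argmax(rates >= rates[-1] - eps))   # first near-final iterate

print("cold start iterations:", iters_to_converge(None))      # identity init
print("warm start iterations:", iters_to_converge(phi_prev))  # previous optimum
```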
Practical Constraints

  • Typical AO runtime for $N = 256$, $K = 4$, $N_t = 16$: $\sim 20\,\text{ms}$ on a modern CPU.
  • Warm-starting across coherence blocks reduces the iteration count by $\sim$2–4$\times$.
  • Regularization $\lambda \sim 10^{-6}$ suffices to stabilize WMMSE at $\text{SNR} > -10\text{ dB}$.