Ferkans — Interactive Telecom Tutor

The Simplest Algorithm That Works

SDR scales as $O(N^{6.5})$; manifold optimization as $O(N^2)$ per iteration. Element-wise block coordinate descent (BCD), which updates one $\phi_n$ at a time in closed form, costs at most $O(N)$ per coordinate update: $O(N^2)$ per sweep for a dense multi-user objective, and $O(N)$ per sweep in the single-user (rank-1) case. For single-user problems, element-wise BCD reaches the global optimum. For multi-user problems, it plateaus at a local optimum that is sometimes $1$ to $3$ dB below SDR, but its speed makes it the default for real-time operation. This section derives the update rule, proves monotone convergence, and characterizes its local-optimum structure.

Definition: The Element-wise Update Rule

Consider the single-user QCQP objective $f(\boldsymbol{\phi}) = |a + \boldsymbol{\phi}^T \mathbf{b}|^2$ for fixed $a \in \mathbb{C}$ and $\mathbf{b} \in \mathbb{C}^N$ . Updating only $\phi_n$ while holding the others fixed, define

$c_n = a + \sum_{m \neq n} \phi_m b_m \in \mathbb{C}\ (\text{known constant}).$

The one-coordinate problem is $\max_{|\phi_n| = 1} |c_n + \phi_n b_n|^2$ , which has the closed-form optimizer

$\phi_n^\star = e^{j(\arg c_n - \arg b_n)}.$
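As a quick numerical sanity check of the closed-form optimizer, here is a minimal NumPy sketch. The helper name `single_user_update`, the random test data, and the grid resolution are illustrative assumptions, not part of the derivation itself.

```python
import numpy as np

def single_user_update(a, b, phi, n):
    """Phase-matched value of phi[n] with all other entries held fixed."""
    # c_n = a + sum_{m != n} phi_m b_m: the part of the sum phi[n] cannot change
    c_n = a + phi @ b - phi[n] * b[n]
    # Rotate phi_n * b_n onto the direction of c_n
    return np.exp(1j * (np.angle(c_n) - np.angle(b[n])))

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
N = 8
a = rng.standard_normal() + 1j * rng.standard_normal()
b = rng.standard_normal(N) + 1j * rng.standard_normal(N)
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))

n = 3
c_n = a + phi @ b - phi[n] * b[n]
closed_form = abs(c_n + single_user_update(a, b, phi, n) * b[n]) ** 2

# Brute-force comparison over a fine grid of candidate phases
grid = np.exp(1j * np.linspace(0, 2 * np.pi, 3600, endpoint=False))
brute_force = np.max(np.abs(c_n + grid * b[n]) ** 2)

print("closed form:", closed_form, "| grid search:", brute_force)
```

The closed-form value should equal $(|c_n| + |b_n|)^2$, the triangle-inequality upper bound, and should never fall below the grid-search value.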

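For the multi-user quadratic $f(\boldsymbol{\phi}) = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi}$ (with Hermitian $\mathbf{A}$ of rank greater than 1), isolating the terms that contain $\phi_n$ gives

$f(\boldsymbol{\phi}) = 2\,\mathrm{Re}\big(\phi_n^* \sum_{m \neq n} A_{nm} \phi_m\big) + A_{nn} + (\text{terms independent of } \phi_n).$

The single-coordinate problem is therefore the maximization of a real sinusoid in $\arg \phi_n$ plus a real constant, which also admits a closed form:

$\phi_n^\star = e^{j\arg\big(\sum_{m \neq n} A_{nm} \phi_m\big)}.$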

The key algorithmic fact: for each coordinate, the conditional optimum is phase-matching — align $\phi_n$ with the sum of influences from the other elements.
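The phase-matching rule for the quadratic objective can be checked the same way. This sketch assumes $f = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi}$ with Hermitian $\mathbf{A}$ and aligns $\phi_n$ with $\sum_{m \neq n} A_{nm}\phi_m$, the factor that multiplies $\phi_n^*$ in $f$. The function names and the random channel `H` are hypothetical.

```python
import numpy as np

def objective(A, phi):
    # f = phi^H A phi (np.vdot conjugates its first argument)
    return np.real(np.vdot(phi, A @ phi))

def coordinate_update(A, phi, n):
    # Sum of influences from the other elements: sum_{m != n} A[n, m] * phi[m]
    alpha_n = A[n] @ phi - A[n, n] * phi[n]
    return np.exp(1j * np.angle(alpha_n))

rng = np.random.default_rng(1)
N, K = 6, 3
H = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
A = H @ H.conj().T  # Hermitian PSD, rank K
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))

n = 2
updated = phi.copy()
updated[n] = coordinate_update(A, phi, n)
f_closed = objective(A, updated)

# Grid search over the single free phase, everything else fixed
f_grid = -np.inf
for theta in np.linspace(0, 2 * np.pi, 3600, endpoint=False):
    trial = phi.copy()
    trial[n] = np.exp(1j * theta)
    f_grid = max(f_grid, objective(A, trial))

print("closed form:", f_closed, "| grid search:", f_grid)
```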

Element-wise BCD for Unit-Modulus QCQP

Complexity: $O(S \cdot N^2)$ total: $S \sim 5$ to $10$ sweeps, each $O(N^2)$ for the coordinate updates.

Input: PSD $\mathbf{A} \in \mathbb{C}^{N \times N}$; initial $\boldsymbol{\phi}^{(0)}$; tolerance $\epsilon$; max sweeps $S_{\max}$.

Output: local optimum $\boldsymbol{\phi}^\star$.

1. Initialize $\boldsymbol{\phi}^{(0)}$ (e.g., all ones or random).
2. For $s = 0, 1, 2, \ldots, S_{\max} - 1$: (sweep over all coordinates)
3.     For $n = 1, 2, \ldots, N$:
4.         Compute $\alpha_n = \sum_{m \neq n} A_{nm} \phi_m$.
5.         Update $\phi_n \leftarrow e^{j\arg(\alpha_n)}$.
6.     If $|f(\boldsymbol{\phi}^{(s+1)}) - f(\boldsymbol{\phi}^{(s)})| < \epsilon$: break.
7. Return $\boldsymbol{\phi}$.
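A minimal NumPy sketch of the algorithm follows, assuming the objective $f = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi}$ and the coordinate rule $\phi_n \leftarrow e^{j\arg \alpha_n}$ with $\alpha_n = \sum_{m \neq n} A_{nm}\phi_m$. The function name `elementwise_bcd`, the default tolerance, the guard for $\alpha_n = 0$, and the random test channel are assumptions made for illustration.

```python
import numpy as np

def elementwise_bcd(A, phi0, eps=1e-9, max_sweeps=50):
    """Element-wise BCD for max phi^H A phi subject to |phi_n| = 1."""
    phi = np.array(phi0, dtype=complex)
    f_prev = np.real(np.vdot(phi, A @ phi))
    history = [f_prev]                     # objective after each sweep
    for _ in range(max_sweeps):
        for n in range(len(phi)):
            # Influence of all other elements on coordinate n
            alpha_n = A[n] @ phi - A[n, n] * phi[n]
            if alpha_n != 0:               # phase is undefined at exactly zero
                phi[n] = alpha_n / abs(alpha_n)
        f_new = np.real(np.vdot(phi, A @ phi))
        history.append(f_new)
        if abs(f_new - f_prev) < eps:
            break
        f_prev = f_new
    return phi, history

rng = np.random.default_rng(2)
N, K = 16, 3
H = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
A = H @ H.conj().T                         # PSD, rank K
phi_star, history = elementwise_bcd(A, np.ones(N))
print(len(history) - 1, "sweeps; f went from", history[0], "to", history[-1])
```

Updating `phi` in place means each coordinate sees the newest values of the coordinates before it (Gauss-Seidel order), which is what makes every step a true conditional maximization.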

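Each sweep is $O(N^2)$ for dense $\mathbf{A}$: every $\alpha_n$ is an $O(N)$ inner product involving row $n$ of $\mathbf{A}$, and a sweep computes $N$ of them. For low-rank $\mathbf{A} = \mathbf{U}\mathbf{U}^H$ with $\mathbf{U} \in \mathbb{C}^{N \times r}$, the sweep cost drops to $O(Nr)$: keep the $r$-vector $\mathbf{v} = \mathbf{U}^H \boldsymbol{\phi}$ up to date, so that computing each $\alpha_n$ and refreshing $\mathbf{v}$ after each update both cost $O(r)$.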
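One way to realize the $O(Nr)$ sweep is sketched below, under the assumption that $\mathbf{A} = \mathbf{U}\mathbf{U}^H$ exactly and that $\alpha_n = \sum_{m \neq n} A_{nm}\phi_m$. The names `lowrank_sweep` and `dense_sweep` are hypothetical; the dense version is included only as a reference to compare against.

```python
import numpy as np

def dense_sweep(A, phi):
    """Reference O(N^2) sweep using the full matrix."""
    phi = phi.copy()
    for n in range(len(phi)):
        alpha_n = A[n] @ phi - A[n, n] * phi[n]
        if alpha_n != 0:
            phi[n] = alpha_n / abs(alpha_n)
    return phi

def lowrank_sweep(U, phi):
    """O(N r) sweep for A = U U^H, never forming A."""
    phi = phi.copy()
    v = U.conj().T @ phi                          # r-vector U^H phi
    row_energy = np.sum(np.abs(U) ** 2, axis=1)   # diagonal entries A[n, n]
    for n in range(len(phi)):
        # (A phi)_n = U[n] . v, then remove the m = n term
        alpha_n = U[n] @ v - row_energy[n] * phi[n]
        if alpha_n != 0:
            new_value = alpha_n / abs(alpha_n)
            v += U[n].conj() * (new_value - phi[n])   # O(r) refresh of U^H phi
            phi[n] = new_value
    return phi

rng = np.random.default_rng(6)
N, r = 32, 2
U = rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r))
phi0 = np.exp(1j * rng.uniform(0, 2 * np.pi, N))

phi_fast = lowrank_sweep(U, phi0)
phi_ref = dense_sweep(U @ U.conj().T, phi0)
print("max deviation from dense sweep:", np.max(np.abs(phi_fast - phi_ref)))
```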

Theorem: Monotone Convergence of Element-wise BCD

The element-wise BCD iterates produce a monotone non-decreasing sequence of objective values: $f(\boldsymbol{\phi}^{(s+1)}) \geq f(\boldsymbol{\phi}^{(s)})$ . Under compactness of $\mathcal{M}$ (automatic for the torus), the iterates have a convergent subsequence. Any limit point $\boldsymbol{\phi}^\star$ satisfies the coordinate-stationarity condition

$\phi_n^\star \in \arg\max_{|\phi_n| = 1} f(\phi_n, \boldsymbol{\phi}_{-n}^\star) \quad \forall n,$

which is a necessary condition for local optimality on $\mathcal{M}$ . For the unit-modulus QCQP, the limit is a local maximum (not saddle) under generic channels.

Each single-coordinate update is a constrained maximization, so it produces a value no worse than the previous one. A monotone sequence of objective values on a bounded domain converges to a limit; any limit point must be coordinate-stationary (zero gradient along every coordinate), which makes it a candidate local optimum.

Proof

Monotonicity

By the $\arg\max$ definition of the coordinate update, no single-coordinate update can decrease $f$. Chaining the $N$ updates of a sweep gives $f(\boldsymbol{\phi}^{(s+1)}) \geq f(\boldsymbol{\phi}^{(s)})$, with equality iff every $\phi_n$ was already optimal conditional on the other coordinates when it was visited.

Boundedness and limit

$f$ is bounded on compact $\mathcal{M}$ by continuity. The monotone sequence $f(\boldsymbol{\phi}^{(s)})$ converges to some $f^\star$ .

Coordinate stationarity

At any limit point $\boldsymbol{\phi}^\star$ , each coordinate satisfies the local optimality condition (the update doesn't change it). For smooth $f$ , this implies all partial derivatives (restricted to the tangent circle at each coordinate) are zero — i.e., the Riemannian gradient is zero. $\blacksquare$
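Both claims of the theorem can be probed numerically. The sketch below assumes $f = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi}$ and parametrizes $\phi_n = e^{j\theta_n}$, for which $\partial f / \partial \theta_n = 2\,\mathrm{Im}\big(\phi_n^* (\mathbf{A}\boldsymbol{\phi})_n\big)$. The problem size, seed, and sweep count are arbitrary illustrative choices.

```python
import numpy as np

def objective(A, phi):
    return np.real(np.vdot(phi, A @ phi))

rng = np.random.default_rng(3)
N, K = 8, 2
H = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
A = H @ H.conj().T
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))

values = [objective(A, phi)]
for _ in range(2000):                      # many sweeps, to get close to a limit point
    for n in range(N):
        alpha_n = A[n] @ phi - A[n, n] * phi[n]
        if alpha_n != 0:
            phi[n] = alpha_n / abs(alpha_n)
    values.append(objective(A, phi))

# Gradient along each phase coordinate theta_n (tangent direction on the torus)
grad = 2 * np.imag(np.conj(phi) * (A @ phi))

print("smallest sweep-to-sweep change:", np.min(np.diff(values)))
print("largest |df/dtheta_n| at the end:", np.max(np.abs(grad)))
```

A vanishing phase gradient confirms stationarity only; telling a local maximum from a saddle would additionally require the second-order behavior.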

Single-user: Optimal; Multi-user: Local

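For rank-1 $\mathbf{A} = \mathbf{c}\mathbf{c}^H$, element-wise BCD converges to the global optimum, typically within a few sweeps: each coordinate update is matched-filter alignment with the sum of the others, and in the rank-1 case the only points left unchanged by every such update are, generically, common-phase rotations of the matched-filter solution $\phi_n = e^{j\arg c_n}$.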

For rank-$r$ $\mathbf{A}$ with $r > 1$, element-wise BCD converges to a local optimum that can be $1$ to $3$ dB below SDR. The higher the rank (more users or streams), the larger the typical gap. For $K = 2$ to $4$ users, the gap is tolerable; for $K = 16$, SDR or manifold optimization is noticeably better.
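For the rank-1 case the global optimum is known in closed form, $f_{\max} = \big(\sum_n |c_n|\big)^2$, so the claim can be checked directly. The sketch below is illustrative only: `run_bcd`, the random vector `c`, and the sweep count are assumptions, and the update uses $\alpha_n = \sum_{m \neq n} A_{nm}\phi_m$.

```python
import numpy as np

def run_bcd(A, phi0, sweeps):
    """Fixed number of element-wise BCD sweeps; returns phases and objective trace."""
    phi = np.array(phi0, dtype=complex)
    trace = [np.real(np.vdot(phi, A @ phi))]
    for _ in range(sweeps):
        for n in range(len(phi)):
            alpha_n = A[n] @ phi - A[n, n] * phi[n]
            if alpha_n != 0:
                phi[n] = alpha_n / abs(alpha_n)
        trace.append(np.real(np.vdot(phi, A @ phi)))
    return phi, trace

rng = np.random.default_rng(4)
N = 16
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A_rank1 = np.outer(c, c.conj())            # A = c c^H
phi0 = np.exp(1j * rng.uniform(0, 2 * np.pi, N))

_, trace = run_bcd(A_rank1, phi0, sweeps=50)
f_max = np.sum(np.abs(c)) ** 2             # all terms of c^H phi phase-aligned

print("gap to optimum after sweeps 0..5:", [f_max - f for f in trace[:6]])
```

For $r > 1$ no such closed-form ceiling exists, which is why an SDR bound is used as the reference there.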

Example: Element-wise Sweep: $N = 3$ Toy Problem

$\mathbf{A} = \mathbf{c}\mathbf{c}^H$ with $\mathbf{c} = [1, j, -1]^T$ . Starting from $\boldsymbol{\phi}^{(0)} = [1, 1, 1]^T$ , perform one sweep of element-wise BCD.

Solution

Update $\phi_1$

$\alpha_1 = A_{12} \phi_2 + A_{13} \phi_3$. $A_{12} = c_1 c_2^* = 1 \cdot (-j) = -j$ and $A_{13} = c_1 c_3^* = 1 \cdot (-1) = -1$, so $\alpha_1 = -j \cdot 1 + (-1) \cdot 1 = -1 - j$ and $\arg \alpha_1 = -3\pi/4$. Update: $\phi_1 \leftarrow e^{-j3\pi/4}$.

Update $\phi_2$

With the updated $\phi_1 = e^{-j3\pi/4}$, compute $\alpha_2 = A_{21} \phi_1 + A_{23} \phi_3$. $A_{21} = c_2 c_1^* = j$ and $A_{23} = c_2 c_3^* = j \cdot (-1) = -j$, so $\alpha_2 = j e^{-j3\pi/4} - j = e^{-j\pi/4} - j \approx 0.71 - 0.71j - j = 0.71 - 1.71j$. $\arg \alpha_2 = -3\pi/8 \approx -1.18$ rad. Update: $\phi_2 \leftarrow e^{-j3\pi/8}$.

Update $\phi_3$

$\alpha_3 = A_{31} \phi_1 + A_{32} \phi_2$ with $A_{31} = c_3 c_1^* = -1$ and $A_{32} = c_3 c_2^* = (-1)(-j) = j$, so $\alpha_3 = -e^{-j3\pi/4} + j e^{-j3\pi/8} = e^{j\pi/4} + e^{j\pi/8}$ and $\arg \alpha_3 = 3\pi/16$. Update: $\phi_3 \leftarrow e^{j3\pi/16}$. After one sweep, $\boldsymbol{\phi}^{(1)} = [e^{-j3\pi/4}, e^{-j3\pi/8}, e^{j3\pi/16}]^T$ is close to a common-phase rotation of the matched-filter solution $\phi_n^\star = e^{j\arg c_n}$, i.e. $[1, e^{j\pi/2}, e^{j\pi}] = [1, j, -1]$, for which $\mathbf{c}^H \boldsymbol{\phi}^\star = |1|^2 + |j|^2 + |-1|^2 = 3$ and $f = 9$. Check: $|\mathbf{c}^H \boldsymbol{\phi}^{(1)}| = 1 + 2\cos(\pi/16) \approx 2.96$, so $f(\boldsymbol{\phi}^{(1)}) \approx 8.77$, up from $f(\boldsymbol{\phi}^{(0)}) = 1$. One sweep recovers about $97\%$ of the optimum; further sweeps close the remaining gap monotonically. $\blacksquare$
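The sweep can also be replayed numerically. This short sketch assumes the objective $f = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi} = |\mathbf{c}^H \boldsymbol{\phi}|^2$ and the update $\phi_n \leftarrow e^{j\arg \alpha_n}$ with $\alpha_n = \sum_{m \neq n} A_{nm}\phi_m$; it prints the phase chosen for each coordinate and the objective after one sweep.

```python
import numpy as np

c = np.array([1, 1j, -1])
A = np.outer(c, c.conj())                  # A[n, m] = c_n * conj(c_m)
phi = np.ones(3, dtype=complex)            # phi^(0) = [1, 1, 1]

f_start = abs(np.vdot(c, phi)) ** 2        # |c^H phi|^2 before the sweep

for n in range(3):
    alpha_n = A[n] @ phi - A[n, n] * phi[n]
    phi[n] = alpha_n / abs(alpha_n)        # same as exp(j * arg(alpha_n))
    print(f"phi_{n + 1} <- exp(j * {np.angle(phi[n]):+.4f} rad)")

f_one_sweep = abs(np.vdot(c, phi)) ** 2
f_optimum = np.sum(np.abs(c)) ** 2         # matched-filter value

print("f before:", f_start, "| after one sweep:", f_one_sweep, "| optimum:", f_optimum)
```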

Element-wise BCD Convergence Trace

Plot $f(\boldsymbol{\phi}^{(s)})$ per sweep. Observe monotone convergence; plateau at 3-5 sweeps for rank-1 problems; slower convergence and higher gap for higher-rank problems.

Parameters (values shown): RIS elements $N$: 64; rank of $\mathbf{A}$: 2; number of sweeps: 10; initialization.

Common Mistake: Element-wise Gets Stuck in Rank $> 1$ Multi-User Problems

Mistake:

"Element-wise BCD worked great for single-user; I'll use it for my 8-user sum-rate problem too."

Correction:

For higher-rank objectives, element-wise BCD can plateau at strictly suboptimal local maxima that differ significantly from the global optimum. Empirical benchmarks (Yu et al. 2020) show element-wise is typically within $1$ dB of SDR for $K = 2$ , but the gap widens to $3$ dB at $K = 8$ . If the application cares about the last dB, use manifold or SDR. If $\sim 1$ dB gap is tolerable, element-wise's simplicity and speed are hard to beat.

⚠️Engineering Note

Element-wise for Real-Time Operation

Element-wise BCD is typically the only algorithm feasible for real-time RIS operation at large $N$ :

• $N = 256$, $K = 4$: element-wise sweep $\sim 0.5$ ms; manifold trust-region $\sim 100$ ms; SDR $\sim 5$ s.
• Warm start across coherence blocks with the previous $\boldsymbol{\phi}$: element-wise converges in 1 to 2 sweeps.
• Approximate low-rank $\mathbf{A}$: exploit rank structure for $O(Nr)$ sweeps.

For offline analysis or simulation benchmarking, use SDR as a quality ceiling. In deployment, element-wise at $< 1$ dB gap is the pragmatic choice.
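The warm-start idea can be tried out with a small experiment. Everything specific in this sketch is an assumption for illustration: the Gaussian channel model, the size of the drift between coherence blocks, the relative stopping rule, and the helper name `bcd_sweeps`. It reports how many sweeps a cold start and a warm start need on the new block; the counts depend on these choices and are not a benchmark.

```python
import numpy as np

def bcd_sweeps(A, phi0, eps=1e-6, max_sweeps=100):
    """Element-wise BCD; returns final phases and the number of sweeps used."""
    phi = np.array(phi0, dtype=complex)
    f_prev = np.real(np.vdot(phi, A @ phi))
    for sweep in range(1, max_sweeps + 1):
        for n in range(len(phi)):
            alpha_n = A[n] @ phi - A[n, n] * phi[n]
            if alpha_n != 0:
                phi[n] = alpha_n / abs(alpha_n)
        f_new = np.real(np.vdot(phi, A @ phi))
        if abs(f_new - f_prev) < eps * abs(f_new):   # relative stopping rule (assumed)
            return phi, sweep
        f_prev = f_new
    return phi, max_sweeps

rng = np.random.default_rng(5)
N, K = 64, 4
H_old = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
# Hypothetical small channel drift from one coherence block to the next
drift = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
H_new = H_old + 0.05 * drift

A_old = H_old @ H_old.conj().T
A_new = H_new @ H_new.conj().T

phi_prev, _ = bcd_sweeps(A_old, np.ones(N))    # solution from the previous block
_, cold = bcd_sweeps(A_new, np.ones(N))        # restart from all ones
_, warm = bcd_sweeps(A_new, phi_prev)          # reuse the previous phases

print("sweeps, cold start:", cold, "| sweeps, warm start:", warm)
```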

Practical Constraints

• Per-sweep cost for dense $\mathbf{A}$ at $N = 256$: $\sim 0.3$ ms on a modern CPU in NumPy.
• Warm-start across coherence blocks reduces sweep count from $\sim 5$ to $\sim 2$.
• For 1-bit quantized phase (Ch. 8), element-wise BCD extends directly with projection to the discrete phase grid.
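A minimal sketch of the quantized variant follows. Ch. 8 is not part of this section, so the reading of "projection to the discrete phase grid" used here is an assumption: each coordinate is set to the grid phase nearest to $\arg \alpha_n$, with $\alpha_n = \sum_{m \neq n} A_{nm}\phi_m$. Because the coordinate objective is $2|\alpha_n|\cos(\arg\phi_n - \arg\alpha_n)$ plus a constant, the nearest grid phase is also the best grid phase, so the objective stays non-decreasing when the starting point lies on the grid. The names `quantize_phase` and `quantized_bcd` are hypothetical.

```python
import numpy as np

def quantize_phase(alpha_n, bits):
    """Unit-modulus value on the 2^bits-point phase grid nearest to arg(alpha_n)."""
    step = 2 * np.pi / 2 ** bits
    return np.exp(1j * step * np.round(np.angle(alpha_n) / step))

def quantized_bcd(A, phi0, bits=1, sweeps=20):
    phi = np.array(phi0, dtype=complex)    # phi0 must already lie on the grid
    trace = [np.real(np.vdot(phi, A @ phi))]
    for _ in range(sweeps):
        for n in range(len(phi)):
            alpha_n = A[n] @ phi - A[n, n] * phi[n]
            if alpha_n != 0:
                phi[n] = quantize_phase(alpha_n, bits)
        trace.append(np.real(np.vdot(phi, A @ phi)))
    return phi, trace

rng = np.random.default_rng(7)
N, K = 32, 2
H = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
A = H @ H.conj().T

phi_1bit, trace = quantized_bcd(A, np.ones(N), bits=1)   # phases in {0, pi}
print("1-bit objective per sweep:", trace[:5])
```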
