Element-wise Block Coordinate Descent

The Simplest Algorithm That Works

SDR scales as O(N6.5)O(N^{6.5}); manifold as O(N2)O(N^2) per iteration. Element-wise BCD β€” updating one Ο•n\phi_n at a time in closed form β€” scales as O(N)O(N) per coordinate sweep. For single-user problems, element-wise is optimal (gives the global optimum). For multi-user, it plateaus at a local optimum that is sometimes 11-33 dB below SDR, but its speed makes it the default for real-time operation. This section derives the update rule, proves monotone convergence, and characterizes its local-optimum structure.

Definition:

The Element-wise Update Rule

Consider the single-user QCQP objective f(Ο•)=∣a+Ο•Tb∣2f(\boldsymbol{\phi}) = |a + \boldsymbol{\phi}^T \mathbf{b}|^2 for fixed a∈Ca \in \mathbb{C} and b∈CN\mathbf{b} \in \mathbb{C}^N. Updating only Ο•n\phi_n while holding the others fixed, define

cn=a+βˆ‘mβ‰ nΟ•mbm∈CΒ (knownΒ constant).c_n = a + \sum_{m \neq n} \phi_m b_m \in \mathbb{C}\ (\text{known constant}).

The one-coordinate problem is maxβ‘βˆ£Ο•n∣=1∣cn+Ο•nbn∣2\max_{|\phi_n| = 1} |c_n + \phi_n b_n|^2, which has the closed-form optimizer

Ο•n⋆=ej(arg⁑cnβˆ’arg⁑bn).\phi_n^\star = e^{j(\arg c_n - \arg b_n)}.

For the multi-user quadratic f(Ο•)=Ο•HAΟ•f(\boldsymbol{\phi}) = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi} (with A\mathbf{A} not rank 1), the single-coordinate problem becomes maximizing a cosine (real sinusoid plus real constant terms), also admitting a closed form:

Ο•n⋆=ejarg⁑(βˆ‘mβ‰ nAnmβˆ—Ο•mβˆ—).\phi_n^\star = e^{j\arg\big(\sum_{m \neq n} A_{nm}^* \phi_m^*\big)}.

The key algorithmic fact: for each coordinate, the conditional optimum is phase-matching β€” align Ο•n\phi_n with the sum of influences from the other elements.

Element-wise BCD for Unit-Modulus QCQP

Complexity: O(Sβ‹…N2)O(S \cdot N^2) total: S∼5S \sim 5-1010 sweeps, each O(N2)O(N^2) for the coordinate updates
Input: PSD A∈CNΓ—N\mathbf{A} \in \mathbb{C}^{N \times N}; initial Ο•(0)\boldsymbol{\phi}^{(0)};
tolerance ϡ\epsilon; max sweeps Smax⁑S_{\max}.
Output: local optimum ϕ⋆\boldsymbol{\phi}^\star.
1. Initialize Ο•(0)\boldsymbol{\phi}^{(0)} (e.g., all ones or random).
2. For s=0,1,2,…,Smaxβ‘βˆ’1s = 0, 1, 2, \ldots, S_{\max} - 1: (sweep over all coordinates)
3. \quad For n=1,2,…,Nn = 1, 2, \ldots, N:
4. \quad\quad Compute Ξ±n=βˆ‘mβ‰ nAnmβˆ—Ο•mβˆ—\alpha_n = \sum_{m \neq n} A_{nm}^* \phi_m^*.
5. \quad\quad Update Ο•n←ejarg⁑(Ξ±n)\phi_n \leftarrow e^{j\arg(\alpha_n)}.
6. \quad If ∣f(Ο•(s+1))βˆ’f(Ο•(s))∣<Ο΅|f(\boldsymbol{\phi}^{(s+1)}) - f(\boldsymbol{\phi}^{(s)})| < \epsilon: break.
7. return Ο•\boldsymbol{\phi}.

Each sweep is O(N2)O(N^2) because computing Ξ±n\alpha_n for all nn involves the matrix A\mathbf{A}. For low-rank A=UUH\mathbf{A} = \mathbf{U}\mathbf{U}^H with U∈CNΓ—r\mathbf{U} \in \mathbb{C}^{N \times r}, the sweep cost drops to O(Nr)O(Nr).

Theorem: Monotone Convergence of Element-wise BCD

The element-wise BCD iterates produce a monotone non-decreasing sequence of objective values: f(Ο•(s+1))β‰₯f(Ο•(s))f(\boldsymbol{\phi}^{(s+1)}) \geq f(\boldsymbol{\phi}^{(s)}). Under compactness of M\mathcal{M} (automatic for the torus), the iterates have a convergent subsequence. Any limit point ϕ⋆\boldsymbol{\phi}^\star satisfies the coordinate-stationarity condition

Ο•nβ‹†βˆˆarg⁑maxβ‘βˆ£Ο•n∣=1f(Ο•n,Ο•βˆ’n⋆)βˆ€n,\phi_n^\star \in \arg\max_{|\phi_n| = 1} f(\phi_n, \boldsymbol{\phi}_{-n}^\star) \quad \forall n,

which is a necessary condition for local optimality on M\mathcal{M}. For the unit-modulus QCQP, the limit is a local maximum (not saddle) under generic channels.

Each single-coordinate update is a constrained maximization, producing a value no worse than the previous. Monotone sequence on a bounded domain converges to a limit; the limit must be a stationary point (zero coordinate gradient in every direction), i.e., a local optimum.

Single-user: Optimal; Multi-user: Local

For rank-1 A=ccH\mathbf{A} = \mathbf{c}\mathbf{c}^H, element-wise BCD converges to the global optimum in one sweep: each coordinate update is matched-filter alignment with the others, which is the global solution for the rank-1 case.

For rank-rr A\mathbf{A} with r>1r > 1, element-wise converges to a local optimum that can be 11-33 dB below SDR. The bigger the rank (more users / streams), the larger the typical gap. For K=2K = 2-44 users, the gap is tolerable; for K=16K = 16, SDR or manifold is noticeably better.

Example: Element-wise Sweep: N=3N = 3 Toy Problem

A=ccH\mathbf{A} = \mathbf{c}\mathbf{c}^H with c=[1,j,βˆ’1]T\mathbf{c} = [1, j, -1]^T. Starting from Ο•(0)=[1,1,1]T\boldsymbol{\phi}^{(0)} = [1, 1, 1]^T, perform one sweep of element-wise BCD.

Element-wise BCD Convergence Trace

Plot f(Ο•(s))f(\boldsymbol{\phi}^{(s)}) per sweep. Observe monotone convergence; plateau at 3-5 sweeps for rank-1 problems; slower convergence and higher gap for higher-rank problems.

Parameters
64
2
10

Common Mistake: Element-wise Gets Stuck in Rank->1> 1 Multi-User Problems

Mistake:

"Element-wise BCD worked great for single-user; I'll use it for my 8-user sum-rate problem too."

Correction:

For higher-rank objectives, element-wise BCD can plateau at strictly suboptimal local maxima that differ significantly from the global optimum. Empirical benchmarks (Yu et al. 2020) show element-wise is typically within 11 dB of SDR for K=2K = 2, but the gap widens to 33 dB at K=8K = 8. If the application cares about the last dB, use manifold or SDR. If ∼1\sim 1 dB gap is tolerable, element-wise's simplicity and speed are hard to beat.

⚠️Engineering Note

Element-wise for Real-Time Operation

Element-wise BCD is typically the only algorithm feasible for real-time RIS operation at large NN:

  • N=256N = 256, K=4K = 4: Element-wise sweep: ∼0.5\sim 0.5 ms. Manifold trust-region: ∼100\sim 100 ms. SDR: ∼5\sim 5 s.
  • Warm start across coherence blocks with previous Ο•\boldsymbol{\phi}: element-wise converges in 1-2 sweeps.
  • Approximate low-rank A\mathbf{A}: exploit rank structure for O(Nr)O(Nr) sweeps.

For offline analysis or simulation benchmarking, use SDR as a quality ceiling. In deployment, element-wise at <1< 1 dB gap is the pragmatic choice.

Practical Constraints
  • β€’

    Per-sweep cost for dense A\mathbf{A} at N=256N = 256: ∼0.3\sim 0.3 ms on a modern CPU in NumPy.

  • β€’

    Warm-start across coherence blocks reduces sweep count from ∼5\sim 5 to ∼2\sim 2.

  • β€’

    For 1-bit quantized phase (Ch. 8), element-wise BCD extends directly with projection to the discrete phase grid.