Multiuser Multibeam Beamforming

Joint Design of W\mathbf{W} and Ο•\boldsymbol{\phi}

With the cascaded channel model of Section 21.3 in hand, the multiuser problem is now clear: we must design a digital precoder W∈CNaΓ—K\mathbf{W} \in \mathbb{C}^{N_a \times K} on the NaN_a-element active feed together with a passive phase profile Ο•βˆˆ[0,2Ο€)NRIS\boldsymbol{\phi} \in [0,2\pi)^{N_{\text{RIS}}} on the RIS. The digital precoder picks the right combination of the NaN_a spatial modes; the RIS phase profile picks which NaN_a modes the feed has access to.

A clean analytical derivation is not available β€” the problem is bilinear in W\mathbf{W} and Ο•\boldsymbol{\phi} and has unit-modulus phase constraints, which are non-convex. The standard recipe is alternating optimization: (i) fix Ο•\boldsymbol{\phi}, compute the optimal linear precoder via ZF or MMSE on the resulting effective channel; (ii) fix W\mathbf{W}, update Ο•\boldsymbol{\phi} by solving the per-element sub-problem or by an SDR/majorization relaxation. The procedure converges quickly in practice even on non-convex landscapes, and we report numerical evidence that 5–15 iterations reach within 0.50.5 dB of a much more expensive branch-and-bound baseline.

,

Theorem: Optimal Linear Precoder for Fixed RIS Phase Profile

Fix Ο•\boldsymbol{\phi} and define the effective multiuser channel matrix

H(Ο•)=[heff,1(Ο•),…,heff,K(Ο•)]H∈CKΓ—Na.\mathbf{H}(\boldsymbol{\phi}) = [h_{\text{eff},1}(\boldsymbol{\phi}), \ldots, h_{\text{eff},K}(\boldsymbol{\phi})]^H \in \mathbb{C}^{K \times N_a}.

If Naβ‰₯KN_a \geq K and H(Ο•)\mathbf{H}(\boldsymbol{\phi}) has full row rank, the zero-forcing precoder

WZF(Ο•)=H(Ο•)H(H(Ο•)H(Ο•)H)βˆ’1Ξ›1/2\mathbf{W}_{\text{ZF}}(\boldsymbol{\phi}) = \mathbf{H}(\boldsymbol{\phi})^H \left(\mathbf{H}(\boldsymbol{\phi})\mathbf{H}(\boldsymbol{\phi})^H\right)^{-1} \mathbf{\Lambda}^{1/2}

nulls all interference, where Ξ›=diag(p1,…,pK)\mathbf{\Lambda} = \text{diag}(p_1, \ldots, p_{K}) sets the per-user powers. The regularized (MMSE) variant replaces the inverse by (H(Ο•)H(Ο•)H+Ξ±I)βˆ’1(\mathbf{H}(\boldsymbol{\phi})\mathbf{H}(\boldsymbol{\phi})^H + \alpha \mathbf{I})^{-1} with Ξ±=KΟƒ2/Pt\alpha = K\sigma^2/P_t, yielding a higher sum rate at finite SNR.

ZF reduces the problem to KK parallel scalar channels at the price of a power penalty set by the condition of HHH\mathbf{H} \mathbf{H}^H. A well-chosen Ο•\boldsymbol{\phi} shapes H(Ο•)\mathbf{H}(\boldsymbol{\phi}) to be well-conditioned β€” essentially picking a set of KK near-orthogonal directions on the NaN_a-dimensional active-feed manifold β€” which is the design objective for the outer Ο•\boldsymbol{\phi} loop.

,

Alternating Optimization for Array-Fed RIS Sum-Rate

Complexity: O(T(KNa2+NRIS KNa))\mathcal{O}(T (K N_a^2 + N_{\text{RIS}}\, K N_a)) with T∼5T \sim 5–2020
Input: Effective channel factors HRIS-Rx,k\mathbf{H}_{\text{RIS-Rx},k} for
k=1,…,Kk = 1,\ldots,K, feed coupling Gf\mathbf{G}_f, sum-power
budget PtP_t, tolerance Ο΅\epsilon.
Output: (W⋆,ϕ⋆)(\mathbf{W}^{\star}, \boldsymbol{\phi}^{\star})
maximizing the sum rate.
1. Initialize Ο•(0)\boldsymbol{\phi}^{(0)} uniformly at random.
2. for t=0,1,2,…t = 0, 1, 2, \ldots do
3. \quad Form H(t)←\mathbf{H}^{(t)} \leftarrow rows
[HRIS-Rx,kHdiag(Ο•(t))Gf][\mathbf{H}_{\text{RIS-Rx},k}^H \text{diag}(\boldsymbol{\phi}^{(t)}) \mathbf{G}_f].
4. W(t)←\quad \mathbf{W}^{(t)} \leftarrow MMSE precoder for
H(t)\mathbf{H}^{(t)} with budget PtP_t.
5. \quad for n=1,…,NRISn = 1, \ldots, N_{\text{RIS}} do
6. \quad\quad Hold all ϕm≠n(t)\phi_{m \neq n}^{(t)} fixed. The sum-rate
as a function of Ο•n\phi_n is of the form
f(Ο•n)=βˆ‘kak∣bk+ckejΟ•n∣2/noisef(\phi_n) = \sum_k a_k |b_k + c_k e^{j\phi_n}|^2 / \text{noise},
which is a sinusoid in Ο•n\phi_n.
7. \quad\quad Solve max⁑ϕn∈[0,2Ο€)f(Ο•n)\max_{\phi_n \in [0,2\pi)} f(\phi_n) in closed
form (derivative is a single sinusoid).
8. \quad\quad Ο•n(t+1)←\phi_n^{(t+1)} \leftarrow optimizer.
9. \quad end for
10. \quad if ∣R(t+1)βˆ’R(t)∣<Ο΅|R^{(t+1)} - R^{(t)}| < \epsilon then break
11. end for
12. return (W(t+1),Ο•(t+1))(\mathbf{W}^{(t+1)}, \boldsymbol{\phi}^{(t+1)})

Each per-element update in lines 6–7 is exact because the dependence of the sum rate on a single Ο•n\phi_n (with all others fixed) is a single sinusoid β€” Caire and collaborators' key structural observation. This makes the outer loop monotonically non-decreasing and allows certification of local optimality at convergence.

Example: Two-User Array-Fed RIS: Numerical Walk-Through

Consider an array-fed RIS with Na=4N_a = 4, NRIS=128N_{\text{RIS}} = 128, K=2K = 2 single-antenna users at angles (Ο•1u,Ο•2u)=(βˆ’10∘,+25∘)(\phi_1^u, \phi_2^u) = (-10^\circ, +25^\circ) in the far field of the RIS. Assume LOS reflected channels HRIS-Rx,k=GRISβ‹…aRIS(Ο•ku)/d2\mathbf{H}_{\text{RIS-Rx},k} = \sqrt{G_{\text{RIS}}} \cdot \mathbf{a}_{\text{RIS}}(\phi_k^u) / d_2 with d2=15d_2 = 15 m. The sum-power budget is Pt=0.1P_t = 0.1 W. Compute (a) the ZF precoder given the ideal RIS profile aligning both user directions, (b) the SINR of each user, and (c) the sum rate at f0=28f_0 = 28 GHz.

Multiuser Sum Rate vs NRISN_{\text{RIS}}

Compute the multiuser sum rate of an array-fed RIS with NaN_a active elements serving KK users as NRISN_{\text{RIS}} grows. The curves use the analytical alt-opt upper bound; compare with a passive RIS baseline at the same NRISN_{\text{RIS}}.

Parameters
8
4
10

Estimating the Cascaded Channel

All the algorithms in this section assume perfect knowledge of H(Ο•)\mathbf{H}(\boldsymbol{\phi}), which is itself a function of the RIS phase profile. In practice, the BS must probe the cascaded channel using a sequence of pilot training patterns {Ο•(t)}\{\boldsymbol{\phi}^{(t)}\}, each producing a measurement of a different linear combination of the NRISN_{\text{RIS}} rank-one terms in the sum decomposition of Heff\mathbf{H}_{\text{eff}}. The number of training phases required scales as O(NRIS/Na)\mathcal{O}(N_{\text{RIS}}/N_a) to resolve all cascaded entries, and the pilot overhead can be reduced further by exploiting angular sparsity of the reflected channels β€” a direct generalization of the compressed channel estimation techniques of Chapter 8 and FSI Chapter 12. The bottom line is that the channel estimation problem is non-trivial but tractable; we leave its full treatment to Chapter 22 and the RIS book.

⚠️Engineering Note

Convergence and Real-Time Operation

The alternating optimization of Algorithm AAlternating Optimization for Array-Fed RIS Sum-Rate converges in T∈[5,20]T \in [5, 20] outer iterations for typical geometries. Each iteration requires one MMSE precoder update (O(Na3+KNa2)\mathcal{O}(N_a^3 + K N_a^2)) and NRISN_{\text{RIS}} per-element phase updates (O(NRISKNa)\mathcal{O}(N_{\text{RIS}} K N_a)). At Na=8N_a = 8, NRIS=1024N_{\text{RIS}} = 1024, K=8K = 8, T=10T = 10, this is roughly 10610^6 flops per coherence block β€” milliseconds on a commodity DSP. The bottleneck in real deployments is channel estimation, not precoding.

Three practical observations from the CommIT prototype:

  1. Warm-starting Ο•\boldsymbol{\phi} from the previous coherence block reduces TT to 2–3.
  2. Per-element updates can be parallelized, because (after a Taylor expansion) the cross-effects are weak when NaN_a is small.
  3. Phase quantization is applied only at the final step; quantizing inside the loop causes instability.
Practical Constraints
  • β€’

    Coherence block of ~ 1 ms at mmWave (large Doppler)

  • β€’

    Phase resolution in hardware: 1–3 bits (Section 21.1 engineering note)

  • β€’

    Channel estimation pilot overhead scales as O(NRIS/Na)\mathcal{O}(N_{\text{RIS}}/N_a)

πŸ“‹ Ref: 3GPP TR 38.843 (smart radio environments), IEEE 802.11bf (WLAN sensing)

Common Mistake: Alt-Opt Is Not Globally Optimal

Mistake:

Because each step of the alternating procedure monotonically increases the sum rate, it is tempting to claim that Algorithm AAlternating Optimization for Array-Fed RIS Sum-Rate converges to the global optimum.

Correction:

Monotone convergence only guarantees a local stationary point. The joint problem is non-convex in Ο•\boldsymbol{\phi} (unit-modulus constraints) and multi-modal. Empirically, alt-opt reaches a good local optimum within 0.5 dB of the branch-and-bound global solution, but rare initializations converge to inferior critical points. In production, it is common to run a small number of random restarts (5–10) and pick the best. We will see similar caveats when we generalize to SDR and majorization-based algorithms in Chapter 22.

Quick Check

An array-fed RIS with Na=8N_a = 8, NRIS=512N_{\text{RIS}} = 512 serves K=12K = 12 users. Using the ZF precoder alone is infeasible. Which of the following is the most reasonable remedy?

Increase NRISN_{\text{RIS}} until NRISβ‰₯KN_{\text{RIS}} \geq K

Apply MMSE + user scheduling so that at most NaN_a users are served per resource block

Use random RIS phases and ZF

Drop 4 users permanently