Ferkans — Interactive Telecom Tutor

The Optimizer Needs to Know the Channel

Every optimization result in Chapters 5–15 (the $N^2$ coherent gain, the SDR solution, alternating-optimization convergence) is stated conditional on known channel state information (CSI). The controller chooses $\boldsymbol{\Phi}$ based on $\mathbf{H}_1$ and $\mathbf{h}_2$; if the channels are unknown, the only defensible choice is random phases, which gives $\mathcal{O}(N)$ SNR scaling instead of $\mathcal{O}(N^2)$, a factor-of-$N$ loss. The entire value proposition of RIS hinges on CSI.
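The $\mathcal{O}(N)$ versus $\mathcal{O}(N^2)$ gap can be checked with a small simulation. The sketch below is an illustration under simplifying assumptions that are not part of the chapter's model: a single-antenna link, no direct path, and unit-magnitude cascaded coefficients with random phases. The function name `ris_gains` and the trial count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ris_gains(n_elements, n_trials=2000):
    """Return (coherent gain, mean random-phase gain) for one channel draw."""
    # Assumed per-element cascaded coefficients: unit magnitude, random phase.
    c = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n_elements))
    # Known CSI: each RIS phase cancels the coefficient phase, so terms add in phase.
    coherent = np.abs(np.sum(c * np.exp(-1j * np.angle(c)))) ** 2
    # Unknown CSI: random RIS phases, averaged over many independent draws.
    phi = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(n_trials, n_elements)))
    random_mean = np.mean(np.abs(phi @ c) ** 2)
    return coherent, random_mean

for n in (16, 64, 256):
    coh, rnd = ris_gains(n)
    print(f"N={n:4d}  coherent={coh:9.1f}  random(avg)={rnd:7.1f}  ratio={coh / rnd:6.1f}")
```

Under these assumptions the coherent gain equals $N^2$ and the averaged random-phase gain stays close to $N$, so the printed ratio grows roughly in proportion to $N$.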

But the RIS is passive. It cannot transmit pilots, it cannot correlate received waveforms, it cannot even know whether it is currently being illuminated. The controller must estimate $\mathbf{H}_1$ and $\mathbf{h}_2$ using only signals observed at the BS or the UE, with the RIS itself a passive multiplier. This chapter develops the three workhorse protocols (ON/OFF switching, DFT codebooks, compressed sensing) and derives their overhead-accuracy tradeoffs.

Definition:
The Observable Cascaded Channel

Consider an uplink pilot scheme where the UE transmits pilots $x_t$ ($t = 1, \ldots, \tau_p$) and the BS receives

$\mathbf{y}_t = \big(\mathbf{h}_d + \mathbf{H}_1^H \boldsymbol{\Phi}^{(t)} \mathbf{h}_2\big) x_t + \mathbf{w}_t,$

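where $\boldsymbol{\Phi}^{(t)} = \text{diag}(\boldsymbol{\phi}^{(t)})$ is the RIS configuration during pilot slot $t$, $\mathbf{h}_d$ is the direct UE-BS channel, and $\mathbf{w}_t$ is receiver noise. Expanding the RIS term with the diagonal identity $\text{diag}(\boldsymbol{\phi})\,\mathbf{h}_2 = \text{diag}(\mathbf{h}_2)\,\boldsymbol{\phi}$,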

$\mathbf{H}_1^H \boldsymbol{\Phi}^{(t)} \mathbf{h}_2 = \big(\text{diag}(\mathbf{h}_2^*)\mathbf{H}_1\big)^H \boldsymbol{\phi}^{(t)} = \mathbf{G}^H \boldsymbol{\phi}^{(t)},$

where $\mathbf{G} = \text{diag}(\mathbf{h}_2^*)\mathbf{H}_1 \in \mathbb{C}^{N \times N_t}$ is the cascaded channel matrix. The received signal becomes

$\mathbf{y}_t = \big(\mathbf{h}_d + \mathbf{G}^H \boldsymbol{\phi}^{(t)}\big) x_t + \mathbf{w}_t.$

The cascaded matrix $\mathbf{G}$ is what the BS can estimate, not $\mathbf{H}_1$ and $\mathbf{h}_2$ separately. This is the central observation: the RIS introduces an inherent scaling ambiguity. Scaling $\mathbf{h}_2$ by any nonzero $\alpha$ and $\mathbf{H}_1$ by $1/\alpha^*$ leaves $\mathbf{G}$ unchanged, and the same holds element by element with a different $\alpha_n$ for each RIS element. Fortunately, the BS optimization only needs $\mathbf{G}$, so the ambiguity does not matter for beamforming.
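The diagonal identity behind the cascaded channel is easy to verify numerically. The following sketch draws random channels (the sizes `N = 8`, `Nt = 4` and the Gaussian channel model are illustrative assumptions, and `crandn` is a hypothetical helper) and compares $\mathbf{H}_1^H \boldsymbol{\Phi} \mathbf{h}_2$ against $\mathbf{G}^H \boldsymbol{\phi}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Nt = 8, 4  # illustrative sizes: RIS elements, BS antennas

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H1 = crandn(N, Nt)                              # first hop, N x Nt
h2 = crandn(N)                                  # second hop, length N
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N)) # unit-modulus RIS phases

# Left-hand side: H1^H * diag(phi) * h2
lhs = H1.conj().T @ np.diag(phi) @ h2

# Right-hand side: G^H * phi with G = diag(conj(h2)) * H1
G = np.diag(h2.conj()) @ H1
rhs = G.conj().T @ phi

identity_error = np.max(np.abs(lhs - rhs))
print("G shape:", G.shape, " max |lhs - rhs| =", identity_error)
```

The two sides agree to floating-point precision, and `G` has shape `(N, Nt)`, matching $\mathbf{G} \in \mathbb{C}^{N \times N_t}$.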

Key Takeaway

The estimable object is $\mathbf{G} = \text{diag}(\mathbf{h}_2^*)\mathbf{H}_1$, not $\mathbf{H}_1$ and $\mathbf{h}_2$ separately. The RIS optimization depends only on $\mathbf{G}$; the fact that we cannot separate the two hops is harmless. $\mathbf{G}$ has $N N_t$ complex unknowns, which sets the scale of the estimation problem.

Theorem: Minimum Pilot Length for Cascaded-Channel Estimation

Let $\mathbf{Y} \in \mathbb{C}^{N_t \times \tau_p}$ collect the $\tau_p$ pilot-slot observations as columns. Assuming $\mathbf{h}_d$ is either known or separately estimated, the cascaded channel $\mathbf{G} \in \mathbb{C}^{N \times N_t}$ can be uniquely recovered from $\mathbf{Y}$ by least squares if and only if the matrix of stacked RIS configurations $\boldsymbol{\Phi}^{\text{stack}} = [\boldsymbol{\phi}^{(1)}, \ldots, \boldsymbol{\phi}^{(\tau_p)}] \in \mathbb{C}^{N \times \tau_p}$ has full row rank $N$. In particular, $\tau_p \geq N$ is necessary, and equality is achievable with an orthogonal configuration set.

Estimating $\mathbf{G}$ with $N N_t$ complex unknowns requires at least that many linearly independent measurements. If the BS has $N_t$ antennas and each pilot slot generates $N_t$ scalar observations (one per receive antenna), then we need at least $N N_t / N_t = N$ pilot slots, each with its own RIS configuration.

Proof

Stack observations

Ignoring noise and the direct path, the $t$-th pilot observation is $\mathbf{y}_t = \mathbf{G}^H \boldsymbol{\phi}^{(t)} x_t$. With unit-power pilots ($x_t = 1$), stacking the $\tau_p$ observations as columns gives $\mathbf{Y} = \mathbf{G}^H \boldsymbol{\Phi}^{\text{stack}}$, where $\mathbf{Y} \in \mathbb{C}^{N_t \times \tau_p}$.

Identifiability

Recovery of $\mathbf{G}^H$ from $\mathbf{Y}$ requires $\boldsymbol{\Phi}^{\text{stack}}$ to have a right-inverse, i.e., row rank $N$. Because an $N \times \tau_p$ matrix has rank at most $\min(N, \tau_p)$, this forces $\tau_p \geq N$.

Orthogonal design

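If $\tau_p = N$ and $\boldsymbol{\Phi}^{\text{stack}}$ is an $N \times N$ matrix with unit-modulus entries and orthogonal rows, so that $\boldsymbol{\Phi}^{\text{stack}} (\boldsymbol{\Phi}^{\text{stack}})^H = N \mathbf{I}_N$ (e.g., the unnormalized DFT matrix, which has precisely this structure), then $(\boldsymbol{\Phi}^{\text{stack}})^{-1} = \tfrac{1}{N} (\boldsymbol{\Phi}^{\text{stack}})^H$, and the estimate $\hat{\mathbf{G}}^H = \mathbf{Y} (\boldsymbol{\Phi}^{\text{stack}})^{-1}$ is unbiased with per-entry variance achieving the CRB. $\blacksquare$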
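The orthogonal design in the proof can be sketched in a few lines. The example below uses the unnormalized DFT matrix as the configuration set, for which $\boldsymbol{\Phi}^{\text{stack}} (\boldsymbol{\Phi}^{\text{stack}})^H = N \mathbf{I}$ and the inverse is therefore $(\boldsymbol{\Phi}^{\text{stack}})^H / N$. The sizes, the Gaussian channel and noise models, the value of `noise_var`, and the helper `crandn` are assumptions made for illustration; the direct path is taken as already removed.

```python
import numpy as np

rng = np.random.default_rng(3)
N, Nt = 16, 4        # illustrative sizes: RIS elements, BS antennas
noise_var = 1e-2     # assumed noise variance per received sample

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

G = crandn(N, Nt)    # "true" cascaded channel for this experiment

# Column t of Phi is the RIS configuration in pilot slot t (unit-modulus DFT entries).
idx = np.arange(N)
Phi = np.exp(-2j * np.pi * np.outer(idx, idx) / N)

# Pilot observations Y = G^H Phi (+ noise), with unit pilots x_t = 1.
Y_clean = G.conj().T @ Phi
Y = Y_clean + np.sqrt(noise_var) * crandn(Nt, N)

# Least-squares estimate: G_hat^H = Y Phi^{-1} = Y Phi^H / N.
G_hat = (Y @ Phi.conj().T / N).conj().T
G_hat_clean = (Y_clean @ Phi.conj().T / N).conj().T
mse = np.mean(np.abs(G_hat - G) ** 2)

# With fewer than N slots the configuration matrix cannot have row rank N.
rank_full = np.linalg.matrix_rank(Phi)
rank_short = np.linalg.matrix_rank(Phi[:, : N - 1])

print("noiseless recovery exact:", np.allclose(G_hat_clean, G))
print("empirical MSE:", mse, " noise_var / N:", noise_var / N)
print("rank with N slots:", rank_full, " rank with N-1 slots:", rank_short)
```

In this setup the noiseless estimate reproduces `G`, and the noisy per-entry error is on the order of `noise_var / N`, since each entry averages $N$ pilot slots.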

The Central Tension

The $\tau_p \geq N$ lower bound is the source of the most important practical concern in RIS: channel-estimation overhead scales linearly with the number of RIS elements, while the coherent SNR gain scales as $N^2$. Going from $N$ to $N+1$ elements costs one more pilot slot and raises the coherent gain from $N^2$ to $(N+1)^2$, a relative increase of $(2N+1)/N^2 \to 0$. The return on investment in element count is diminishing once we account for estimation.
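A quick calculation makes the diminishing return concrete. This is only arithmetic on the $N^2$ scaling stated above; the function name `marginal_relative_gain` is hypothetical.

```python
def marginal_relative_gain(n):
    """Relative increase in coherent gain when going from n to n + 1 elements."""
    # ((n + 1)^2 - n^2) / n^2 simplifies to (2n + 1) / n^2.
    return ((n + 1) ** 2 - n ** 2) / n ** 2

for n in (8, 64, 512):
    print(f"N={n:4d}: one extra pilot slot buys a {100 * marginal_relative_gain(n):.2f}% gain increase")
```

Each added element always costs one full pilot slot, while the relative gain it buys shrinks roughly as $2/N$.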

Two escape routes are developed below:

Structured sparsity (compressed sensing, Section 4.4): if the channel has only $L \ll N$ dominant paths, we need only $\mathcal{O}(L \log N)$ pilots.
Multi-user pilot reuse (Chapter 7): when multiple UEs share a BS-RIS link, the cost is amortized.

Common Mistake: Don't Try to Separate $\mathbf{H}_1$ and $\mathbf{h}_2$

Mistake:

A newcomer sets up the problem with $N N_t + N$ unknowns ($\mathbf{H}_1$'s $N N_t$ entries plus $\mathbf{h}_2$'s $N$ entries) and concludes that $\tau_p \geq N_t + 1$ pilots suffice.

Correction:

The separation is impossible from the observable data alone: the product $\mathbf{h}_2^{(n)*} (\mathbf{H}_1)_{n,:}$ is identifiable, but multiplying the $n$-th entry of $\mathbf{h}_2$ by any nonzero $\alpha_n$ and dividing the $n$-th row of $\mathbf{H}_1$ by $\alpha_n^*$ leaves all observations unchanged. Only $\mathbf{G}$ is identifiable from passive-RIS pilots; the BS beamforming optimization needs only $\mathbf{G}$, so the ambiguity is not an obstacle. Do not waste pilot resources trying to recover a separation that doesn't exist.
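The per-element ambiguity can be demonstrated directly. In the sketch below, two different channel pairs, related by arbitrary nonzero scalings $\alpha_n$, produce the same cascaded matrix and the same noiseless observation for any RIS configuration. The sizes, the random channel draws, and the range chosen for $|\alpha_n|$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Nt = 8, 4  # illustrative sizes

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H1 = crandn(N, Nt)
h2 = crandn(N)

# Arbitrary nonzero per-element scalings alpha_n (magnitudes kept away from zero).
alpha = rng.uniform(0.5, 2.0, N) * np.exp(1j * rng.uniform(0, 2 * np.pi, N))
h2_alt = alpha * h2                      # n-th entry of h2 multiplied by alpha_n
H1_alt = H1 / alpha.conj()[:, None]      # n-th row of H1 divided by conj(alpha_n)

G = np.diag(h2.conj()) @ H1
G_alt = np.diag(h2_alt.conj()) @ H1_alt

# Noiseless observation for a random RIS configuration, without the direct path.
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
y = H1.conj().T @ (phi * h2)
y_alt = H1_alt.conj().T @ (phi * h2_alt)

print("same cascaded matrix:", np.allclose(G, G_alt))
print("same observation:    ", np.allclose(y, y_alt))
print("same h2:             ", np.allclose(h2, h2_alt))
```

The first two checks come out true while the last one does not, which is the ambiguity in miniature: the observations pin down `G` but not the individual hops.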

Pilot Timeline for a Single Coherence Block — Within one coherence block of length $T$ , $\tau_p$ slots are spent on pilot transmission (with varying RIS configurations $\boldsymbol{\Phi}^{(1)}, \ldots, \boldsymbol{\Phi}^{(\tau_p)}$ ); the remaining $T - \tau_p$ slots are used for data with the optimized $\boldsymbol{\Phi}^\star$ . The effective data rate is scaled by $(1 - \tau_p/T)$ .
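The $(1 - \tau_p/T)$ pre-log factor can be combined with the $N^2$ gain to see where adding elements stops paying off. The sketch below is a toy model, not a result from the chapter: it assumes $\tau_p = N$, perfect estimation after the pilot phase, a rate of $\log_2(1 + \text{SNR})$, and a hypothetical single-element SNR `snr_per_element`.

```python
import numpy as np

T = 500                  # coherence block length in symbols (illustrative)
snr_per_element = 1e-4   # hypothetical receive SNR with a single RIS element

def effective_rate(n, tau_p):
    """Rate in bit/s/Hz after discounting the pilot fraction tau_p / T."""
    prelog = max(0.0, 1.0 - tau_p / T)
    return prelog * np.log2(1.0 + snr_per_element * n ** 2)

# Sweep N with the minimum pilot length tau_p = N.
n_values = np.arange(1, T)
rates = np.array([effective_rate(n, tau_p=n) for n in n_values])
best_n = int(n_values[np.argmax(rates)])

print("rate-maximizing N under these assumptions:", best_n)
print("effective rate at that N:", rates.max())
```

Under these assumptions the effective rate peaks at an interior value of $N$: beyond it, the extra pilot slots cost more than the additional coherent gain returns.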

Pilot Overhead $\tau_p / T$ vs. $N$

Show how the pilot fraction grows with RIS size for three strategies: naive element-by-element ( $\tau_p = 2N$ ), ON/OFF or DFT codebook ( $\tau_p = N$ ), and compressed sensing ( $\tau_p = \mathcal{O}(L \log N)$ ). Change the sparsity $L$ to see the CS overhead curve shift.

Parameters: coherence length $T$ = 500 symbols; maximum $N$ swept = 512; sparsity $L$ (for CS) = 6.
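The overhead comparison can be reproduced offline with the parameter values listed above. This is a rough sketch: the constant hidden in $\mathcal{O}(L \log N)$ is not given in the text, so `cs_constant = 1.0` and the base-2 logarithm are assumptions, and the pilot fraction is capped at 1 once $\tau_p$ exceeds $T$.

```python
import numpy as np

T = 500            # coherence length in symbols
N_max = 512        # largest RIS size swept
L = 6              # sparsity (number of dominant paths) for compressed sensing
cs_constant = 1.0  # assumption: the O(.) notation hides an unspecified constant

N = np.arange(2, N_max + 1)
tau_p = {
    "naive element-by-element": 2 * N,
    "ON/OFF or DFT codebook": N,
    "compressed sensing": np.ceil(cs_constant * L * np.log2(N)),
}
# Pilot fraction tau_p / T, capped at 1 (the whole block spent on pilots).
overhead = {name: np.minimum(t / T, 1.0) for name, t in tau_p.items()}

for name, frac in overhead.items():
    samples = ", ".join(f"N={n}: {frac[N == n][0]:.3f}" for n in (64, 256, 512))
    print(f"{name:26s} {samples}")
```

With these values the two linear strategies consume the whole coherence block well before $N = 512$, while the compressed-sensing curve grows only logarithmically in $N$ and shifts in proportion to $L$.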

The Passive-RIS Estimation Challenge