Beam Prediction in Mobility

Why Beam Prediction, Not Beam Search

The mmWave and sub-THz bands promised in 5G FR2 and 6G live and die by beam alignment. With hundreds of narrow beams in the downlink codebook and a beam-switching budget of tens of microseconds, every handoff between beams is a potential outage. The classical approach, exhaustive beam search through all $|\mathcal{B}|$ codebook entries, scales terribly: at 64 beams the overhead already eats into the data budget, and at 256 beams (plausible for sub-THz) it starves the link.

The data-driven alternative is beam prediction: instead of searching for the current best beam, use the history of recent beam indices, received signal strengths (RSRPs), and side information (UE position, velocity, map data) to predict which beam will be best $\Delta t$ seconds in the future. A sequence model (LSTM, Transformer, or state-space) is trained on measurement traces to output a probability distribution over the next-slot beam index. The training objective is either cross-entropy against the ground-truth best beam or top-$k$ accuracy (the true beam appears in the top $k$ predictions).

This is one of the few physical-layer tasks where data-driven learning decisively wins over a rule-based baseline: beam dynamics in mobility have too much structure for any single physical model to capture, and too little structure for a blind search to keep up.

Definition: Beam Prediction Task

Let $\mathcal{B} = \{\mathbf{f}_1, \ldots, \mathbf{f}_{|\mathcal{B}|}\}$ be a pre-defined beam codebook on the BS side. At time slot $t$, the best beam for a given UE is
$$i^{\star}_t = \arg\max_{i} \, |\mathbf{f}_i^H \mathbf{H}_{t}|^2.$$
The beam prediction problem is: given a history window $\mathcal{H}_t = \{(i^{\star}_{t-H}, r_{t-H}), \ldots, (i^{\star}_{t-1}, r_{t-1})\}$ of length $H$ (past best-beam indices and observed RSRPs, optionally augmented with UE position and velocity), produce a distribution $p_\phi(i \mid \mathcal{H}_t)$ over beam indices such that $i^{\star}_t$ lies in the top-$k$ of $p_\phi$ with high probability.
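
For concreteness, the best-beam selection in the definition can be sketched in a few lines. This is a sketch, not the chapter's reference implementation: it treats the channel as a vector $\mathbf{h}_t$ for a single-antenna UE, and the oversampled-DFT codebook below is an illustrative assumption.

```python
import numpy as np

def best_beam(F, h):
    """i_t* = argmax_i |f_i^H h_t|^2 for a codebook F of shape (|B|, N_tx)."""
    gains = np.abs(F.conj() @ h) ** 2   # beamforming gain of every codebook entry
    return int(np.argmax(gains))

# Illustrative oversampled-DFT codebook: 16 beams over 8 antennas
N, B = 8, 16
F = np.exp(-2j * np.pi * np.outer(np.arange(B), np.arange(N)) / B) / np.sqrt(N)
print(best_beam(F, F[5]))  # a channel perfectly aligned with beam 5 -> index 5
```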

The prediction horizon $\Delta t$ is usually one slot (for low-latency tracking) or up to several slots (to eliminate the beam search overhead entirely). The relevant metric is top-$k$ accuracy: the fraction of slots on which the true $i^{\star}_t$ is among the $k$ highest-scoring predictions.
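
The metric itself is straightforward to compute; a minimal NumPy sketch (the array shapes are assumptions: `probs` is slots x beams, `true_idx` holds the ground-truth best-beam index per slot):

```python
import numpy as np

def top_k_accuracy(probs, true_idx, k):
    """Fraction of slots where the true best beam is among the
    k highest-scoring beam indices. probs: (T, |B|), true_idx: (T,)."""
    top_k = np.argsort(probs, axis=1)[:, -k:]   # k largest probabilities per slot
    hits = [true_idx[t] in top_k[t] for t in range(len(true_idx))]
    return float(np.mean(hits))

probs = np.array([[0.1, 0.6, 0.2, 0.1],   # true beam 1: a top-1 hit
                  [0.4, 0.1, 0.3, 0.2]])  # true beam 2: only a top-2 hit
true_idx = np.array([1, 2])
print(top_k_accuracy(probs, true_idx, 1))  # 0.5
print(top_k_accuracy(probs, true_idx, 2))  # 1.0
```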

LSTM Beam Predictor

Complexity: $\mathcal{O}(H \cdot d^2)$ per prediction, where $d$ is the LSTM hidden dimension. For $H = 8$ and $d = 128$ this is roughly 130 K MACs per prediction, easily real-time on a base-station DSP.
Offline training:
Input: Dataset of beam-index + RSRP traces from a measurement campaign
Output: Trained LSTM parameters $\phi$
1. For each trace, slide a window of length $H$ over time and form $(x_{1:H}, y_{H+1})$ pairs, where $x_t = (i^{\star}_t, \text{one-hot}(i^{\star}_t), r_t)$
2. Feed $x_{1:H}$ through a two-layer LSTM producing hidden state $h_H$
3. Project $h_H$ through a dense layer + softmax to get the predicted distribution over the $|\mathcal{B}|$ beam indices
4. Compute the cross-entropy loss against $y_{H+1}$ and backpropagate
5. Repeat until convergence
Online prediction:
Input: Current history $\mathcal{H}_t$
Output: Predicted top-$k$ beam indices
6. Compute $p_\phi(i \mid \mathcal{H}_t)$
7. Return the $k$ indices with highest probability as the measurement set
8. On the next slot, measure RSRP only on those $k$ beams (instead of all $|\mathcal{B}|$)
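
Step 1 of the offline procedure (window extraction) might be implemented as below. This is a sketch under the assumption that each $x_t$ is the one-hot best-beam index concatenated with the scalar RSRP; the raw index is dropped here since the one-hot already encodes it.

```python
import numpy as np

def make_training_pairs(beam_idx, rsrp, H, num_beams):
    """Slide a length-H window over one trace and emit (x_{1:H}, y_{H+1}) pairs.
    beam_idx: (T,) best-beam indices; rsrp: (T,) observed RSRPs."""
    X, y = [], []
    for t in range(len(beam_idx) - H):
        window = []
        for s in range(t, t + H):
            one_hot = np.zeros(num_beams)
            one_hot[beam_idx[s]] = 1.0
            window.append(np.concatenate([one_hot, [rsrp[s]]]))
        X.append(np.stack(window))       # shape (H, num_beams + 1)
        y.append(beam_idx[t + H])        # label: next-slot best beam
    return np.array(X), np.array(y)

beam_idx = np.array([3, 3, 4, 4, 5, 5, 6])
rsrp = np.linspace(-80.0, -74.0, 7)
X, y = make_training_pairs(beam_idx, rsrp, H=4, num_beams=8)
print(X.shape, y.shape)  # (3, 4, 9) (3,)
```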

The switch from "exhaustive beam search" to "top-$k$ measurement after LSTM prediction" cuts the beam search overhead from $|\mathcal{B}|$ to $k$ (typically $k = 4$ or $k = 8$). For $|\mathcal{B}| = 64$ this is an 8-16x overhead reduction at essentially no cost in accuracy, provided the LSTM has been trained on data from the same deployment. Same caveat as Section 25.2: distribution shift ruins it.
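
Steps 6-8 of the online loop reduce to sorting the predicted distribution and measuring only the top candidates. A minimal sketch, where `measure_rsrp` is a stub standing in for the real air-interface measurement:

```python
import numpy as np

def refine_top_k(probs, k, measure_rsrp):
    """Measure RSRP only on the k most probable beams and pick the best."""
    candidates = np.argsort(probs)[-k:]          # k highest-scoring beam indices
    rsrps = {int(i): measure_rsrp(int(i)) for i in candidates}
    return max(rsrps, key=rsrps.get)             # beam with best measured RSRP

# Stub scenario: beam 10 is truly best; the predictor puts it in its top-4
# but not at rank 1, so the refinement measurement still recovers it.
probs = np.full(64, 1.0 / 64)
probs[[9, 10, 11, 12]] = [0.20, 0.15, 0.12, 0.10]
probs /= probs.sum()
best = refine_top_k(probs, k=4,
                    measure_rsrp=lambda i: -90.0 + (5.0 if i == 10 else 0.0))
print(best)  # 10
```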

Theorem: Markovian Approximation for Beam Dynamics

Assume the UE follows a constant-velocity trajectory with uniform angular speed $\omega$ relative to the BS and traverses a beam grid of angular width $\Delta\phi$ per beam. Then the optimal-beam sequence $\{i^{\star}_t\}$ is approximately a first-order Markov chain with transition probabilities concentrated on $\{i^{\star}_{t-1} - 1,\; i^{\star}_{t-1},\; i^{\star}_{t-1} + 1\}$, and the higher-order memory disappears as the slot duration $T_s \to 0$.

For slow enough mobility (slot duration much smaller than the time to traverse one beam) the best-beam index changes by at most one per slot. This is a local-transition prior that a simple rule-based tracker can exploit. Neural methods beat it only when the UE trajectory is non-constant-velocity (turning, acceleration, multi-path bouncing off clutter), i.e. when the Markov assumption breaks.
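
The local-transition prior yields an almost trivially simple rule-based tracker; a sketch (the RSRP stub is hypothetical): measure only the previous best beam and its two angular neighbours.

```python
def markov_tracker_step(prev_best, num_beams, measure_rsrp):
    """Exploit the local-transition prior: the best beam moves by at most
    one index per slot, so measure only {i-1, i, i+1}."""
    candidates = [i for i in (prev_best - 1, prev_best, prev_best + 1)
                  if 0 <= i < num_beams]
    return max(candidates, key=measure_rsrp)

# Stub channel: the true best beam has drifted from 7 to 8
best = markov_tracker_step(7, 64,
                           lambda i: -90.0 + (3.0 if i == 8 else 0.0))
print(best)  # 8
```

Three measurements per slot regardless of codebook size is hard to beat as long as the Markov assumption holds; the tracker fails precisely when the beam index jumps by more than one per slot.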

Example: Beam Search Overhead for a 64-Beam Codebook

A mmWave BS uses a 64-beam codebook and measures RSRP on each beam once every 5 ms. Each measurement takes one OFDM symbol of 8.33 microseconds. Compare the beam search overhead for (i) exhaustive search every slot, (ii) an LSTM predictor with top-4 refinement.
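
Working through the arithmetic of the example (ignoring the predictor's own inference latency):

```python
symbol_us = 8.33        # one RSRP measurement = one OFDM symbol
slot_us = 5_000         # measurement period: 5 ms

exhaustive_us = 64 * symbol_us      # sweep all 64 beams every slot
top4_us = 4 * symbol_us             # measure only the LSTM's top-4 candidates

print(f"exhaustive: {exhaustive_us:.1f} us "
      f"({100 * exhaustive_us / slot_us:.1f}% of the slot)")
print(f"top-4:      {top4_us:.1f} us "
      f"({100 * top4_us / slot_us:.2f}% of the slot)")
# exhaustive: 533.1 us (10.7% of the slot)
# top-4:      33.3 us (0.67% of the slot)
```

The exhaustive sweep burns about a tenth of every slot on measurement alone; top-4 refinement brings that below one percent, the 16x reduction quoted earlier.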

Beam Prediction Accuracy vs UE Velocity

Top-1 and top-5 beam prediction accuracy as a function of UE velocity, comparing a rule-based Markov tracker and an LSTM predictor. At low velocity both methods saturate; as velocity grows, the LSTM retains more of its accuracy because it handles non-Markovian trajectory patterns.

Key Takeaway

Beam prediction is where pure data-driven DL pays off. A sequence model over the history of (best beam, RSRP, optional side info) cuts mmWave beam-search overhead by 8-16x at the cost of a small fallback fraction. This works because beam dynamics in mobility have a rich but non-stationary structure that no single analytical model captures. It is also the physical-layer task where 3GPP has moved fastest on AI/ML: the Rel-18 beam management use case is the leading candidate for actual standardization.

Why This Matters: Beam Management in 5G NR

5G NR already contains a rich beam management framework (CSI-RS for measurement, SSBs for broad beams, beam failure detection and recovery), but all of it is reactive and fully standardized. What Rel-18 AI/ML adds is the option for predictive beam selection using sequence models fed by the same measurement streams. The key design question is whether the trained model lives on the BS (one model per cell, retrained as the environment drifts) or on the UE (one model per handset, generalizing across all cells it visits). The 3GPP study item is currently favoring the BS-side approach because it avoids the model-distribution nightmare of deploying per-cell weights to every handset chipset.

Common Mistake: Training on Stationary UEs

Mistake:

A beam predictor trained on traces from a stationary UE (or even a slow-walking UE) will quietly fail when deployed on a vehicle. The temporal correlation structure is completely different: stationary UEs have near-constant best beams dominated by measurement noise, while vehicular UEs have deterministic drift dominated by geometry.

Correction:

Include a velocity-stratified training set: equal numbers of samples from stationary, pedestrian (1-5 km/h), and vehicular (30-60 km/h) regimes. Alternatively, condition the network explicitly on UE velocity, which is a standard measurement reportable by the UE. The ablation ($\pm$ velocity conditioning) should always appear in the paper.
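
One way to build the velocity-stratified set; a sketch where the regime boundaries are the ones quoted above and the `(velocity, trace)` tuple format is a hypothetical assumption:

```python
import random

def stratified_sample(traces, per_regime, seed=0):
    """Balance the training set across mobility regimes.
    traces: list of (velocity_kmh, trace) tuples."""
    def regime(v):
        if v < 1:  return "stationary"
        if v <= 5: return "pedestrian"   # 1-5 km/h
        return "vehicular"               # e.g. 30-60 km/h
    buckets = {"stationary": [], "pedestrian": [], "vehicular": []}
    for v, trace in traces:
        buckets[regime(v)].append(trace)
    rng = random.Random(seed)            # fixed seed for reproducible splits
    return [t for b in buckets.values() for t in rng.sample(b, per_regime)]

traces = [(0.0, "s1"), (0.2, "s2"), (3.0, "p1"),
          (4.5, "p2"), (45.0, "v1"), (50.0, "v2")]
balanced = stratified_sample(traces, per_regime=1)
print(len(balanced))  # 3, one trace per regime
```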

Top-$k$ Accuracy

The fraction of predictions in which the ground-truth label is among the $k$ highest-scoring outputs of the model. In beam prediction, top-$k$ accuracy at small $k$ (typically 1, 3, or 5) matters because the BS can only afford to measure RSRP on a handful of candidate beams.

Quick Check

A 64-beam mmWave BS must decide between an LSTM beam predictor and an exhaustive beam sweep every slot. Under which deployment is the LSTM most likely to lose?

• Dense urban mobility with vehicular UEs.
• Indoor stationary office with highly variable multipath and no training data from the office.
• Highway vehicular UEs with clean LoS.
• Pedestrian mobility at walking speed.

⚠️ Engineering Note

Latency Budget for Beam Prediction

The mmWave latency budget is tight: once a beam decision is taken, the actual data transmission must begin within a few tens of microseconds to match the 5G URLLC and 6G latency targets. A beam-prediction network must produce its top-$k$ candidates within this window. An LSTM with $d = 128$ hidden dimension and $H = 8$ history typically takes around 100 microseconds on a generic DSP, which leaves very little headroom. Transformer-based predictors are 2-4 times slower per prediction, which rules them out for tight budgets unless they run on a dedicated accelerator. The Rel-18 use case document allocates a 200 microsecond end-to-end budget for the full prediction + refinement loop: tight but feasible.

Practical Constraints
• Maximum inference latency: 100-200 microseconds
• Accuracy target (top-5): $\geq 95\%$
• Fallback to exhaustive sweep: once per $\approx 100$ slots
• Model update cadence: weekly for vehicular, monthly for pedestrian

📋 Ref: 3GPP TR 38.843, Section 6.3