Beam Prediction in Mobility

Why Beam Prediction, Not Beam Search

The mmWave and sub-THz bands promised in 5G FR2 and 6G live and die by beam alignment. With hundreds of narrow beams in the downlink codebook and a beam-switching budget of tens of microseconds, every handoff between beams is a potential outage. The classical approach, exhaustive beam search through all $|\mathcal{B}|$ codebook entries, scales terribly: at 64 beams the overhead already eats into the data budget, and at 256 beams (plausible for sub-THz) it starves the link.

The data-driven alternative is beam prediction: instead of searching for the current best beam, use the history of recent beam indices, received signal strengths (RSRPs), and side information (UE position, velocity, map data) to predict which beam will be best $\Delta t$ seconds in the future. A sequence model (LSTM, Transformer, or state-space) is trained on measurement traces to output a probability distribution over the next-slot beam index. The training objective is either cross-entropy against the ground-truth best beam or top-$k$ accuracy (the true beam appears in the top $k$ predictions).

This is one of the few physical-layer tasks where data-driven learning decisively wins over a rule-based baseline: beam dynamics in mobility have too much structure for any single physical model to capture, and too little structure for a blind search to keep up.

Definition: Beam Prediction Task

Let $\mathcal{B} = \{\mathbf{f}_1, \ldots, \mathbf{f}_{|\mathcal{B}|}\}$ be a pre-defined beam codebook on the BS side. At time slot $t$, the best beam for a given UE is
$$i^{\star}_t = \arg\max_{i} \, |\mathbf{f}_i^H \mathbf{H}_{t}|^2.$$
The beam prediction problem is: given a history window $\mathcal{H}_t = \{(i^{\star}_{t-H}, r_{t-H}), \ldots, (i^{\star}_{t-1}, r_{t-1})\}$ of length $H$ (past best-beam indices and observed RSRPs, optionally augmented with UE position and velocity), produce a distribution $p_\phi(i \mid \mathcal{H}_t)$ over beam indices such that $i^{\star}_t$ lies in the top-$k$ of $p_\phi$ with high probability.
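
For concreteness, the best-beam selection in the definition can be sketched in a few lines. This is a sketch, not the chapter's reference implementation: it treats the channel as a vector $\mathbf{h}_t$ for a single-antenna UE, and the oversampled-DFT codebook below is an illustrative assumption.

```python
import numpy as np

def best_beam(F, h):
    """i_t* = argmax_i |f_i^H h_t|^2 for a codebook F of shape (|B|, N_tx)."""
    gains = np.abs(F.conj() @ h) ** 2   # beamforming gain of every codebook entry
    return int(np.argmax(gains))

# Illustrative oversampled-DFT codebook: 16 beams over 8 antennas
N, B = 8, 16
F = np.exp(-2j * np.pi * np.outer(np.arange(B), np.arange(N)) / B) / np.sqrt(N)
print(best_beam(F, F[5]))  # a channel perfectly aligned with beam 5 -> index 5
```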

The prediction horizon $\Delta t$ is usually one slot (for low-latency tracking) or up to several slots (to eliminate the beam search overhead entirely). The relevant metric is top-$k$ accuracy: the fraction of slots on which the true $i^{\star}_t$ is among the $k$ highest-scoring predictions.
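
The metric itself is straightforward to compute; a minimal NumPy sketch (the array shapes are assumptions: `probs` is slots x beams, `true_idx` holds the ground-truth best-beam index per slot):

```python
import numpy as np

def top_k_accuracy(probs, true_idx, k):
    """Fraction of slots where the true best beam is among the
    k highest-scoring beam indices. probs: (T, |B|), true_idx: (T,)."""
    top_k = np.argsort(probs, axis=1)[:, -k:]   # k largest probabilities per slot
    hits = [true_idx[t] in top_k[t] for t in range(len(true_idx))]
    return float(np.mean(hits))

probs = np.array([[0.1, 0.6, 0.2, 0.1],   # true beam 1: a top-1 hit
                  [0.4, 0.1, 0.3, 0.2]])  # true beam 2: only a top-2 hit
true_idx = np.array([1, 2])
print(top_k_accuracy(probs, true_idx, 1))  # 0.5
print(top_k_accuracy(probs, true_idx, 2))  # 1.0
```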

LSTM Beam Predictor

Complexity: $\mathcal{O}(H \cdot d^2)$ per prediction, where $d$ is the LSTM hidden dimension. For $H = 8$ and $d = 128$ this is roughly 130 K MACs per prediction, easily real-time on a base-station DSP.
Offline training:
Input: Dataset of beam-index + RSRP traces from a measurement campaign
Output: Trained LSTM parameters $\phi$
1. For each trace, slide a window of length $H$ over time and form $(x_{1:H}, y_{H+1})$ pairs, where $x_t = (i^{\star}_t, \text{one-hot}(i^{\star}_t), r_t)$
2. Feed $x_{1:H}$ through a two-layer LSTM producing hidden state $h_H$
3. Project $h_H$ through a dense layer + softmax to get the predicted distribution over the $|\mathcal{B}|$ beam indices
4. Compute the cross-entropy loss against $y_{H+1}$ and backpropagate
5. Repeat until convergence
Online prediction:
Input: Current history $\mathcal{H}_t$
Output: Predicted top-$k$ beam indices
6. Compute $p_\phi(i \mid \mathcal{H}_t)$
7. Return the $k$ indices with highest probability as the measurement set
8. On the next slot, measure RSRP only on those $k$ beams (instead of all $|\mathcal{B}|$)
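
Step 1 of the offline procedure (window extraction) might be implemented as below. This is a sketch under the assumption that each $x_t$ is the one-hot best-beam index concatenated with the scalar RSRP; the raw index is dropped here since the one-hot already encodes it.

```python
import numpy as np

def make_training_pairs(beam_idx, rsrp, H, num_beams):
    """Slide a length-H window over one trace and emit (x_{1:H}, y_{H+1}) pairs.
    beam_idx: (T,) best-beam indices; rsrp: (T,) observed RSRPs."""
    X, y = [], []
    for t in range(len(beam_idx) - H):
        window = []
        for s in range(t, t + H):
            one_hot = np.zeros(num_beams)
            one_hot[beam_idx[s]] = 1.0
            window.append(np.concatenate([one_hot, [rsrp[s]]]))
        X.append(np.stack(window))       # shape (H, num_beams + 1)
        y.append(beam_idx[t + H])        # label: next-slot best beam
    return np.array(X), np.array(y)

beam_idx = np.array([3, 3, 4, 4, 5, 5, 6])
rsrp = np.linspace(-80.0, -74.0, 7)
X, y = make_training_pairs(beam_idx, rsrp, H=4, num_beams=8)
print(X.shape, y.shape)  # (3, 4, 9) (3,)
```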

The switch from "exhaustive beam search" to "top-$k$ measurement after LSTM prediction" cuts the beam search overhead from $|\mathcal{B}|$ to $k$ (typically $k = 4$ or $k = 8$). For $|\mathcal{B}| = 64$ this is an 8-16x overhead reduction at essentially no cost in accuracy, provided the LSTM has been trained on data from the same deployment. Same caveat as Section 25.2: distribution shift ruins it.
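
Steps 6-8 of the online loop reduce to sorting the predicted distribution and measuring only the top candidates. A minimal sketch, where `measure_rsrp` is a stub standing in for the real air-interface measurement:

```python
import numpy as np

def refine_top_k(probs, k, measure_rsrp):
    """Measure RSRP only on the k most probable beams and pick the best."""
    candidates = np.argsort(probs)[-k:]          # k highest-scoring beam indices
    rsrps = {int(i): measure_rsrp(int(i)) for i in candidates}
    return max(rsrps, key=rsrps.get)             # beam with best measured RSRP

# Stub scenario: beam 10 is truly best; the predictor puts it in its top-4
# but not at rank 1, so the refinement measurement still recovers it.
probs = np.full(64, 1.0 / 64)
probs[[9, 10, 11, 12]] = [0.20, 0.15, 0.12, 0.10]
probs /= probs.sum()
best = refine_top_k(probs, k=4,
                    measure_rsrp=lambda i: -90.0 + (5.0 if i == 10 else 0.0))
print(best)  # 10
```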

Theorem: Markovian Approximation for Beam Dynamics

Assume the UE follows a constant-velocity trajectory with uniform angular speed $\omega$ relative to the BS and traverses a beam grid of angular width $\Delta\phi$ per beam. Then the optimal-beam sequence $\{i^{\star}_t\}$ is approximately a first-order Markov chain with transition probabilities concentrated on $\{i^{\star}_{t-1} - 1,\; i^{\star}_{t-1},\; i^{\star}_{t-1} + 1\}$, and the higher-order memory disappears as the slot duration $T_s \to 0$.

For slow enough mobility (slot duration much smaller than the time to traverse one beam) the best-beam index changes by at most one per slot. This is a local-transition prior that a simple rule-based tracker can exploit. Neural methods beat it only when the UE trajectory is non-constant-velocity (turning, acceleration, multi-path bouncing off clutter), i.e. when the Markov assumption breaks.
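
The local-transition prior yields an almost trivially simple rule-based tracker; a sketch (the RSRP stub is hypothetical): measure only the previous best beam and its two angular neighbours.

```python
def markov_tracker_step(prev_best, num_beams, measure_rsrp):
    """Exploit the local-transition prior: the best beam moves by at most
    one index per slot, so measure only {i-1, i, i+1}."""
    candidates = [i for i in (prev_best - 1, prev_best, prev_best + 1)
                  if 0 <= i < num_beams]
    return max(candidates, key=measure_rsrp)

# Stub channel: the true best beam has drifted from 7 to 8
best = markov_tracker_step(7, 64,
                           lambda i: -90.0 + (3.0 if i == 8 else 0.0))
print(best)  # 8
```

Three measurements per slot regardless of codebook size is hard to beat as long as the Markov assumption holds; the tracker fails precisely when the beam index jumps by more than one per slot.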

Example: Beam Search Overhead for a 64-Beam Codebook

A mmWave BS uses a 64-beam codebook and measures RSRP on each beam once every 5 ms. Each measurement takes one OFDM symbol of 8.33 microseconds. Compare the beam search overhead for (i) exhaustive search every slot, (ii) an LSTM predictor with top-4 refinement.
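
Working through the arithmetic of the example (ignoring the predictor's own inference latency):

```python
symbol_us = 8.33        # one RSRP measurement = one OFDM symbol
slot_us = 5_000         # measurement period: 5 ms

exhaustive_us = 64 * symbol_us      # sweep all 64 beams every slot
top4_us = 4 * symbol_us             # measure only the LSTM's top-4 candidates

print(f"exhaustive: {exhaustive_us:.1f} us "
      f"({100 * exhaustive_us / slot_us:.1f}% of the slot)")
print(f"top-4:      {top4_us:.1f} us "
      f"({100 * top4_us / slot_us:.2f}% of the slot)")
# exhaustive: 533.1 us (10.7% of the slot)
# top-4:      33.3 us (0.67% of the slot)
```

The exhaustive sweep burns about a tenth of every slot on measurement alone; top-4 refinement brings that below one percent, the 16x reduction quoted earlier.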

Beam Prediction Accuracy vs UE Velocity

Top-1 and top-5 beam prediction accuracy as a function of UE velocity, comparing a rule-based Markov tracker and an LSTM predictor. At low velocity both methods saturate; as velocity grows, the LSTM retains more of its accuracy because it handles non-Markovian trajectory patterns.

Key Takeaway

Beam prediction is where pure data-driven DL pays off. A sequence model over the history of (best beam, RSRP, optional side info) cuts mmWave beam-search overhead by 8-16x at the cost of a small fallback fraction. This works because beam dynamics in mobility have a rich but non-stationary structure that no single analytical model captures. It is also the physical-layer task where 3GPP has moved fastest on AI/ML: the Rel-18 beam management use case is the leading candidate for actual standardization.

Why This Matters: Beam Management in 5G NR

5G NR already contains a rich beam management framework (CSI-RS for measurement, SSBs for broad beams, beam failure detection and recovery), but all of it is reactive and fully standardized. What Rel-18 AI/ML adds is the option for predictive beam selection using sequence models fed by the same measurement streams. The key design question is whether the trained model lives on the BS (one model per cell, retrained as the environment drifts) or on the UE (one model per handset, generalizing across all cells it visits). The 3GPP study item is currently favoring the BS-side approach because it avoids the model-distribution nightmare of deploying per-cell weights to every handset chipset.

Common Mistake: Training on Stationary UEs

Mistake:

A beam predictor trained on traces from a stationary UE (or even a slow-walking UE) will quietly fail when deployed on a vehicle. The temporal correlation structure is completely different: stationary UEs have near-constant best beams dominated by measurement noise, while vehicular UEs have deterministic drift dominated by geometry.

Correction:

Include a velocity-stratified training set: equal numbers of samples from stationary, pedestrian (1-5 km/h), and vehicular (30-60 km/h) regimes. Alternatively, condition the network explicitly on UE velocity, which is a standard measurement reportable by the UE. The ablation ($\pm$ velocity conditioning) should always appear in the paper.
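
One way to build the velocity-stratified set; a sketch where the regime boundaries are the ones quoted above and the `(velocity, trace)` tuple format is a hypothetical assumption:

```python
import random

def stratified_sample(traces, per_regime, seed=0):
    """Balance the training set across mobility regimes.
    traces: list of (velocity_kmh, trace) tuples."""
    def regime(v):
        if v < 1:  return "stationary"
        if v <= 5: return "pedestrian"   # 1-5 km/h
        return "vehicular"               # e.g. 30-60 km/h
    buckets = {"stationary": [], "pedestrian": [], "vehicular": []}
    for v, trace in traces:
        buckets[regime(v)].append(trace)
    rng = random.Random(seed)            # fixed seed for reproducible splits
    return [t for b in buckets.values() for t in rng.sample(b, per_regime)]

traces = [(0.0, "s1"), (0.2, "s2"), (3.0, "p1"),
          (4.5, "p2"), (45.0, "v1"), (50.0, "v2")]
balanced = stratified_sample(traces, per_regime=1)
print(len(balanced))  # 3, one trace per regime
```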

Top-$k$ Accuracy

The fraction of predictions in which the ground-truth label is among the $k$ highest-scoring outputs of the model. In beam prediction, top-$k$ accuracy at small $k$ (typically 1, 3, or 5) matters because the BS can only afford to measure RSRP on a handful of candidate beams.

Quick Check

A 64-beam mmWave BS must decide between an LSTM beam predictor and an exhaustive beam sweep every slot. Under which deployment is the LSTM most likely to lose?

• Dense urban mobility with vehicular UEs.
• Indoor stationary office with highly variable multipath and no training data from the office.
• Highway vehicular UEs with clean LoS.
• Pedestrian mobility at walking speed.

⚠️ Engineering Note

Latency Budget for Beam Prediction

The mmWave latency budget is tight: once a beam decision is taken, the actual data transmission must begin within a few tens of microseconds to match the 5G URLLC and 6G latency targets. A beam-prediction network must produce its top-$k$ candidates within this window. An LSTM with $d = 128$ hidden dimension and $H = 8$ history typically takes around 100 microseconds on a generic DSP, which leaves very little headroom. Transformer-based predictors are 2-4 times slower per prediction, which rules them out for tight budgets unless they run on a dedicated accelerator. The Rel-18 use case document allocates a 200 microsecond end-to-end budget for the full prediction + refinement loop: tight but feasible.

Practical Constraints
• Maximum inference latency: 100-200 microseconds
• Accuracy target (top-5): $\geq 95\%$
• Fallback to exhaustive sweep: once per $\approx 100$ slots
• Model update cadence: weekly for vehicular, monthly for pedestrian

📋 Ref: 3GPP TR 38.843, Section 6.3