Beam Prediction in Mobility
Why Beam Prediction, Not Beam Search
The mmWave and sub-THz bands promised in 5G FR2 and 6G live and die by beam alignment. With hundreds of narrow beams in the downlink codebook and a beam-switching budget of tens of microseconds, every handoff between beams is a potential outage. The classical approach, exhaustive beam search through all codebook entries, scales terribly: at 64 beams the overhead already eats into the data budget, and at 256 beams (plausible for sub-THz) it starves the link.
The data-driven alternative is beam prediction: instead of searching for the current best beam, use the history of recent beam indices, received signal strengths (RSRPs), and side information (UE position, velocity, map data) to predict which beam will be best at a future slot. A sequence model (LSTM, Transformer, or state-space) is trained on measurement traces to output a probability distribution over the next-slot beam index. The training objective is either cross-entropy against the ground-truth best beam or top-$K$ accuracy (the true beam appears in the top $K$ predictions).
This is one of the few physical-layer tasks where data-driven learning decisively wins over a rule-based baseline: beam dynamics in mobility have too much structure for any single physical model to capture, and too little structure for a blind search to keep up.
Definition: Beam Prediction Task
Beam Prediction Task
Let $\mathcal{B} = \{\mathbf{f}_1, \dots, \mathbf{f}_N\}$ be a pre-defined beam codebook of size $N$ on the BS side. At time slot $t$, the best beam for a given UE is
$$b^*(t) = \arg\max_{n \in \{1, \dots, N\}} \mathrm{RSRP}_n(t).$$
The beam prediction problem is: given a history window of length $T$ (past best-beam indices and observed RSRPs, and optionally UE position and velocity), produce a distribution $\hat{p}(t+1)$ over beam indices such that $b^*(t+1)$ lies in the top-$K$ of $\hat{p}(t+1)$ with high probability.
The prediction horizon is usually one slot (for low-latency tracking) or up to several slots (to eliminate the beam search overhead entirely). The relevant metric is top-$K$ accuracy: the fraction of slots on which the true $b^*(t)$ is among the $K$ highest-scoring predictions.
LSTM Beam Predictor
Complexity: $O(H^2 + HN)$ MACs per prediction, where $H$ is the LSTM hidden dimension and $N$ the codebook size; for a moderate hidden dimension and a 64-beam codebook this is roughly 130 K MACs per prediction, easily real-time on a base-station DSP.

The switch from exhaustive beam search to top-$K$ measurement after LSTM prediction cuts the beam-search overhead from $N$ to $K$ measurements per slot (typically $K = 4$ or $8$). For $N = 64$ this is an 8-16x overhead reduction at essentially no cost in accuracy, provided the LSTM has been trained on data from the same deployment. Same caveat as Section 25.2: distribution shift ruins it.
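To make the pipeline concrete, here is a minimal numpy sketch of the predictor's forward pass and the top-$K$ candidate selection. The class name, the hidden size, and the two-feature input (beam index, RSRP) are illustrative assumptions, not a reference implementation; a real system would train the weights rather than draw them at random.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMBeamPredictor:
    """Single-layer LSTM over (beam index, RSRP) features, softmax over N beams.
    Weights are random placeholders; in practice they come from training."""
    def __init__(self, n_beams=64, hidden=128, feat=2, seed=0):
        rng = np.random.default_rng(seed)
        # Gate weights for input, forget, cell, and output gates (stacked).
        self.Wx = rng.normal(0, 0.1, (4 * hidden, feat))
        self.Wh = rng.normal(0, 0.1, (4 * hidden, hidden))
        self.b = np.zeros(4 * hidden)
        self.Wo = rng.normal(0, 0.1, (n_beams, hidden))  # output projection
        self.hidden = hidden
        self.n_beams = n_beams

    def forward(self, seq):
        """seq: (T, feat) history window -> softmax distribution over beams."""
        H = self.hidden
        h = np.zeros(H)
        c = np.zeros(H)
        for x in seq:
            z = self.Wx @ x + self.Wh @ h + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2*H])       # forget gate
            g = np.tanh(z[2*H:3*H])     # candidate cell state
            o = sigmoid(z[3*H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        logits = self.Wo @ h
        e = np.exp(logits - logits.max())
        return e / e.sum()

def top_k_beams(probs, k=4):
    """Indices of the k most likely beams, to be refined by RSRP measurement."""
    return np.argsort(probs)[::-1][:k]
```

The BS then measures RSRP only on the beams returned by `top_k_beams` instead of sweeping the full codebook.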
Theorem: Markovian Approximation for Beam Dynamics
Assume the UE follows a constant-velocity trajectory with uniform angular speed $\omega$ relative to the BS and traverses a beam grid of angular width $\Delta\theta$ per beam. Then the optimal-beam sequence $\{b^*(t)\}$ is approximately a first-order Markov chain with transition probabilities concentrated on $\{b^*(t) - 1,\, b^*(t),\, b^*(t) + 1\}$, and the higher-order memory disappears as the slot duration $T_s \to 0$.
For slow enough mobility (slot duration much smaller than the time to traverse one beam) the best-beam index changes by at most one per slot. This is a local-transition prior that a simple rule-based tracker can exploit. Neural methods beat it only when the UE trajectory is non-constant-velocity (turning, acceleration, multi-path bouncing off clutter), i.e. when the Markov assumption breaks.
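The local-transition prior suggests a trivially cheap rule-based tracker: measure only the current beam and its two neighbors each slot. A sketch under that assumption (the function name and measurement callback are hypothetical):

```python
def markov_tracker_step(prev_beam, measure_rsrp, n_beams=64):
    """Rule-based tracker exploiting the local-transition prior: under the
    Markov assumption the best beam moves by at most one index per slot,
    so only {prev-1, prev, prev+1} need to be measured.

    measure_rsrp: callback returning the RSRP of a given beam index.
    """
    candidates = [b for b in (prev_beam - 1, prev_beam, prev_beam + 1)
                  if 0 <= b < n_beams]
    rsrps = {b: measure_rsrp(b) for b in candidates}
    return max(rsrps, key=rsrps.get)
```

Three measurements per slot instead of 64, with no training at all; this is the baseline a neural predictor must beat, and it only loses when the UE's motion violates the constant-velocity assumption.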
Relate the time between beam boundary crossings to $\Delta\theta / \omega$.
Show that for slot duration smaller than one beam-crossing time, the probability of moving more than one index per slot vanishes.
Higher-order correlations vanish because, for a constant-velocity trajectory, the next increment depends only on the current position inside the beam, not on earlier history.
Beam boundary crossing rate
The UE traverses one beam in time $T_b = \Delta\theta / \omega$. In a slot of duration $T_s$ the number of boundary crossings is $T_s / T_b = \omega T_s / \Delta\theta$. For slow mobility $\omega T_s \ll \Delta\theta$, so crossings are rare.
Transition probabilities
Over one slot the index changes by at most $\lceil \omega T_s / \Delta\theta \rceil$; for $T_s < T_b$ this is at most one, so the probability of changing index by more than one is zero. Hence transitions are concentrated on $\{b^*(t) - 1,\, b^*(t),\, b^*(t) + 1\}$, with probabilities depending on the fractional position inside the current beam.
Markovian reduction
Given the current index, the next index depends only on the current position inside the beam, which is encoded in the current and previous best-beam indices (for constant-velocity trajectories). Higher-order history is redundant.
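A quick numeric sanity check of the crossing-time argument, under assumed geometry (a 25 km/h UE passing the BS at 50 m, a 64-beam codebook covering a 120-degree sector; all of these numbers are illustrative, not from the theorem):

```python
import math

v = 25 / 3.6                       # UE speed, m/s (25 km/h)
r = 50.0                           # distance to BS, m
omega = v / r                      # angular speed seen by the BS, rad/s
dtheta = math.radians(120) / 64    # beam width, rad

T_b = dtheta / omega               # time to cross one beam: ~0.24 s
T_s = 5e-3                         # slot duration, s
crossings_per_slot = T_s / T_b     # omega*T_s/dtheta: ~0.02, far below 1
```

Even at vehicular speed the slot is two orders of magnitude shorter than one beam-crossing time, so the at-most-one-index-per-slot regime of the theorem holds comfortably.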
Example: Beam Search Overhead for a 64-Beam Codebook
A mmWave BS uses a 64-beam codebook and measures RSRP on each beam once every 5 ms. Each measurement takes one OFDM symbol of 8.33 microseconds. Compare the beam search overhead for (i) exhaustive search every slot, (ii) an LSTM predictor with top-4 refinement.
Exhaustive search
64 measurements $\times$ 8.33 microseconds $\approx$ 533 microseconds of overhead per 5 ms slot, or roughly 10.7 % of the slot. At mmWave, where every microsecond of data time is expensive, this is a serious tax.
LSTM top-4 refinement
The LSTM predicts the top-4 candidates. The BS measures only those 4, taking $4 \times 8.33 \approx 33.3$ microseconds, or 0.67 % of the slot. Overhead drops by a factor of 16.
Accuracy cost
Top-4 accuracy on the Alkhateeb deep-beam dataset is around 95 % for reasonable LSTM architectures at 25 km/h vehicular speed. The 5 % of slots where the true beam is not in the top-4 can be caught by a fallback to exhaustive search, adding back roughly 5 % of the original overhead ($0.05 \times 10.7\,\% \approx 0.5\,\%$). Net savings: overhead drops from approximately 10.7 % to about 1.2 %, a 9x reduction.
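The bookkeeping of this example is easy to reproduce; a short script using the numbers above:

```python
n_beams, sym_us, slot_us = 64, 8.33, 5000.0   # codebook, symbol time, slot time

# (i) Exhaustive sweep every slot.
exhaustive_us = n_beams * sym_us              # ~533 us per slot
exhaustive_frac = exhaustive_us / slot_us     # ~10.7 % of the slot

# (ii) LSTM prediction + top-4 refinement.
top_k = 4
refine_us = top_k * sym_us                    # ~33.3 us per slot
refine_frac = refine_us / slot_us             # ~0.67 % of the slot

# Fallback: exhaustive sweep on the ~5 % of slots the top-4 misses.
miss_rate = 0.05
fallback_frac = miss_rate * exhaustive_frac   # ~0.53 %
net_frac = refine_frac + fallback_frac        # ~1.2 %

reduction = exhaustive_frac / net_frac        # ~9x net overhead reduction
```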
Figure: Beam Prediction Accuracy vs UE Velocity
Top-1 and top-5 beam prediction accuracy as a function of UE velocity, comparing a rule-based Markov tracker and an LSTM predictor. At low velocity both methods saturate; as velocity grows, the LSTM retains more of its accuracy because it handles non-Markovian trajectory patterns.
Key Takeaway
Beam prediction is where pure data-driven DL pays off. A sequence model over the history of (best beam, RSRP, optional side info) cuts mmWave beam-search overhead by 8-16x at the cost of a small fallback fraction. This works because beam dynamics in mobility have a rich but non-stationary structure that no single analytical model captures. It is also the physical-layer task where 3GPP has moved fastest on AI/ML: the Rel-18 beam management use case is the leading candidate for actual standardization.
Why This Matters: Beam Management in 5G NR
5G NR already contains a rich beam management framework (CSI-RS for measurement, SSBs for broad beams, beam failure detection and recovery), but all of it is reactive and fully standardized. What Rel-18 AI/ML adds is the option of predictive beam selection using sequence models fed by the same measurement streams. The key design question is whether the trained model lives on the BS (one model per cell, retrained as the environment drifts) or on the UE (one model per handset, generalizing across all cells it visits). The 3GPP study item currently favors the BS-side approach because it avoids the model-distribution nightmare of deploying per-cell weights to every handset chipset.
Common Mistake: Training on Stationary UEs
Mistake:
A beam predictor trained on traces from a stationary UE (or even a slow-walking UE) will quietly fail when deployed on a vehicle. The temporal correlation structure is completely different: stationary UEs have near-constant best beams dominated by measurement noise, while vehicular UEs have deterministic drift dominated by geometry.
Correction:
Include a velocity-stratified training set: equal numbers of samples from stationary, pedestrian (1-5 km/h), and vehicular (30-60 km/h) regimes. Alternatively, condition the network explicitly on UE velocity, which is a standard measurement reportable by the UE. The ablation (with vs. without velocity conditioning) should always appear in the paper.
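A minimal sketch of the stratified-sampling step, assuming the velocity bins named above (the function name and bin edges are illustrative):

```python
import numpy as np

def stratify_by_velocity(traces, velocities,
                         bins=((0, 1), (1, 5), (30, 60)), seed=0):
    """Build a velocity-balanced training set: equal numbers of samples from
    the stationary, pedestrian, and vehicular regimes (bin edges in km/h).
    Traces whose velocity falls outside all bins are dropped."""
    rng = np.random.default_rng(seed)
    velocities = np.asarray(velocities)
    groups = [np.where((velocities >= lo) & (velocities < hi))[0]
              for lo, hi in bins]
    groups = [g for g in groups if len(g)]
    n = min(len(g) for g in groups)   # equalize regime sizes at the smallest
    idx = np.concatenate([rng.choice(g, n, replace=False) for g in groups])
    return [traces[i] for i in idx]
```

Equalizing at the smallest regime throws data away; oversampling the minority regime (or velocity conditioning) is the alternative when vehicular traces are scarce.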
Top-$K$ Accuracy
The fraction of predictions in which the ground-truth label is among the $K$ highest-scoring outputs of the model. In beam prediction, top-$K$ accuracy at small $K$ (typically 1, 3, or 5) matters because the BS can only afford to measure RSRP on a handful of candidate beams.
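The metric is a few lines of numpy; a sketch with assumed array shapes (`scores` as one row of beam scores per slot):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of slots where the true beam is among the k highest-scoring
    outputs. scores: (n_slots, n_beams); labels: (n_slots,) true beam indices."""
    topk = np.argsort(scores, axis=1)[:, ::-1][:, :k]   # k best beams per slot
    hits = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return hits.mean()
```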
Quick Check
A 64-beam mmWave BS must decide between an LSTM beam predictor and an exhaustive beam sweep every slot. Under which deployment is the LSTM most likely to lose?
- Dense urban mobility with vehicular UEs.
- Indoor stationary office with highly variable multipath and no training data from the office.
- Highway vehicular UEs with clean LoS.
- Pedestrian mobility at walking speed.
Answer: the indoor stationary office. Stationary UEs have no mobility signal to exploit, and the unseen multipath means the LSTM has no relevant temporal structure to exploit either. An exhaustive sweep at low measurement rate is robust and nearly as cheap in this regime.
Latency Budget for Beam Prediction
The mmWave latency budget is tight: once a beam decision is taken, the actual data transmission must begin within a few tens of microseconds to meet the 5G URLLC and 6G latency targets. A beam-prediction network must produce its top-$K$ candidates within this window. An LSTM of the size discussed above typically takes around 100 microseconds per prediction on a generic DSP, which leaves very little headroom. Transformer-based predictors are 2-4 times slower per prediction, which rules them out for tight budgets unless they run on a dedicated accelerator. The Rel-18 use case document allocates a 200 microsecond end-to-end budget for the full prediction-plus-refinement loop: tight but feasible.
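A back-of-the-envelope latency check under assumed numbers (the hidden size, history length, and DSP throughput below are illustrative assumptions, not figures from the Rel-18 documents):

```python
H, d, N, T = 128, 2, 64, 10        # hidden dim, input features, beams, history

# Stateless inference: unroll the LSTM over the full history each slot,
# then apply the output layer. One cell update costs ~4*H*(H+d) MACs.
macs_per_step = 4 * H * (H + d)
macs_total = T * macs_per_step + H * N   # ~0.67 MMAC per prediction

dsp_macs_per_us = 8_000                   # assumed ~8 GMAC/s effective DSP rate
latency_us = macs_total / dsp_macs_per_us # ~85 us: inside the 100-200 us budget
```

Carrying the hidden state across slots (one cell update per new measurement instead of a full unroll) cuts this by roughly the history length, which is why stateful inference is the usual deployment choice when the budget is tight.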
- Maximum inference latency: 100-200 microseconds
- Accuracy target (top-5): $\geq$ 95 %
- Fallback to exhaustive sweep: roughly once per 20 slots (at a ~5 % top-$K$ miss rate)
- Model update cadence: weekly for vehicular, monthly for pedestrian