Hierarchical Soft-Thresholding
Beyond Scalar Thresholds: Structured Sparsity in OTFS
LISTA uses a single scalar threshold per layer. But in OTFS channel estimation and RF sensing, the sparsity structure is hierarchical: the channel is sparse in the angular-delay-Doppler domain, and different dimensions require different regularisation strengths.
Hierarchical soft-thresholding (HST) replaces the scalar with a spatially-varying threshold map predicted by an auxiliary network. When combined with the Kronecker structure of OFDM/OTFS sensing matrices, HST provides a principled approach to exploiting structured sparsity while preserving the interpretability of the proximal framework.
Definition: Hierarchical Soft-Thresholding
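In hierarchical soft-thresholding (HST), the proximal step at layer $k$ uses spatially-varying thresholds:
$$x_i^{(k+1)} = \operatorname{sign}\big(r_i^{(k)}\big)\,\max\big(|r_i^{(k)}| - \theta_i^{(k)},\, 0\big),$$
where $\mathbf{r}^{(k)}$ is the input to the proximal step (the estimate after the gradient step) and the threshold map $\boldsymbol{\theta}^{(k)}$ is generated by an auxiliary network $g_{\psi}$ from the current estimate:
$$\boldsymbol{\theta}^{(k)} = g_{\psi}\big(\mathbf{x}^{(k)}\big).$$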
The auxiliary network is typically a small CNN (3--5 layers) with a final softplus activation to ensure non-negative thresholds.
HST occupies a middle ground between LISTA (scalar threshold, full interpretability) and ProxNet (arbitrary learned denoiser, less interpretability). The soft-thresholding structure is preserved, but the threshold adapts to local content.
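The proximal step described in this definition can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and `raw_map` simply stands in for the unconstrained output of the auxiliary network before its softplus activation.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus, log(1 + exp(z)); the result is non-negative.
    return np.logaddexp(0.0, z)

def soft_threshold(r, theta):
    # Element-wise soft-thresholding with one threshold per entry.
    # Works for real or complex r: the magnitude shrinks, the sign/phase is kept.
    mag = np.abs(r)
    scale = np.maximum(mag - theta, 0.0) / np.maximum(mag, 1e-12)
    return r * scale

def hst_step(r, raw_map):
    # raw_map is an assumed placeholder for the auxiliary network's raw output.
    theta = softplus(raw_map)
    return soft_threshold(r, theta)
```

A scalar-threshold layer is recovered as the special case where every entry of `theta` is the same number.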
Definition: Structured Sparsity in the Angular-Delay-Doppler Domain
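In OTFS channel estimation, the channel response admits a three-dimensional sparse representation in the angular-delay-Doppler domain:
$$h(\phi, \tau, \nu) = \sum_{p=1}^{P} \alpha_p\, \delta(\phi - \phi_p)\, \delta(\tau - \tau_p)\, \delta(\nu - \nu_p),$$
where $P$ is the number of propagation paths, $\alpha_p$ is the complex gain of path $p$, and $(\phi_p, \tau_p, \nu_p)$ are the angle, delay, and Doppler of path $p$.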
The sparsity is hierarchical: a few dominant angles contain most of the energy, each angle has a few delay taps, and each delay tap has a few Doppler components. This tree structure motivates group-sparse and hierarchical regularisation:
$$R(\mathbf{x}) = \sum_{g} \lambda_g\, \|\mathbf{x}_{\mathcal{G}_g}\|_2,$$
where $\mathcal{G}_g$ is a group (e.g., all delay-Doppler components at angle $\phi_g$) and $\lambda_g$ is a per-group regularisation weight.
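A group-sparse penalty of this kind, with one group per angle, has a closed-form proximal operator (block soft-thresholding). The sketch below assumes the channel is stored as an array of shape `(n_angle, n_delay, n_doppler)` and that `lam` holds one weight per angle; both the layout and the function names are illustrative assumptions.

```python
import numpy as np

def group_penalty(X, lam):
    # Sum over angular groups of lam[g] times the Euclidean norm of slice g.
    norms = np.sqrt(np.sum(np.abs(X) ** 2, axis=(1, 2)))
    return float(np.sum(lam * norms))

def group_soft_threshold(X, lam):
    # Proximal operator of the group penalty: each angular slice is shrunk
    # towards zero by lam[g] in Euclidean norm; weak slices become exactly zero.
    norms = np.sqrt(np.sum(np.abs(X) ** 2, axis=(1, 2), keepdims=True))
    scale = np.maximum(1.0 - lam[:, None, None] / np.maximum(norms, 1e-12), 0.0)
    return X * scale
```

Unlike element-wise thresholding, a slice whose total energy falls below its weight is removed as a whole, which is what allows an entire noise-only angle to be discarded.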
Theorem: HST Preserves Proximal Properties
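If the threshold map $\boldsymbol{\theta}$ is non-negative, then the spatially-varying soft-thresholding operator $\mathcal{S}_{\boldsymbol{\theta}}$, defined by $[\mathcal{S}_{\boldsymbol{\theta}}(\mathbf{r})]_i = \operatorname{sign}(r_i)\max(|r_i| - \theta_i, 0)$, is:
- Component-wise nonexpansive: $\big|[\mathcal{S}_{\boldsymbol{\theta}}(\mathbf{r})]_i - [\mathcal{S}_{\boldsymbol{\theta}}(\mathbf{r}')]_i\big| \le |r_i - r'_i|$ for every $i$
- A valid proximal operator of $R(\mathbf{x}) = \sum_i \theta_i |x_i|$ (weighted $\ell_1$)
Hence HST inherits the convergence guarantees of ISTA with weighted $\ell_1$ regularisation.
Soft-thresholding with different thresholds per component is equivalent to the proximal operator of a weighted $\ell_1$ norm. Since weighted $\ell_1$ is still convex, the convergence results for proximal gradient methods carry over for a fixed threshold map; when the map is re-predicted from the current estimate, the statement applies to each step with its weights treated as given.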
Nonexpansiveness
Soft-thresholding is nonexpansive for any $\theta \ge 0$ (it is 1-Lipschitz). This holds component-wise, so the spatially-varying version inherits the property.
Proximal interpretation
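For $R(\mathbf{x}) = \sum_i \theta_i |x_i|$ with $\theta_i \ge 0$,
$$\operatorname{prox}_{R}(\mathbf{r}) = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{x} - \mathbf{r}\|_2^2 + \sum_i \theta_i |x_i|,$$
and because the objective separates across components, each scalar problem is solved by soft-thresholding $r_i$ with threshold $\theta_i$.
This identifies HST as the proximal operator of the weighted $\ell_1$ norm, which is convex and proper.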
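Both parts of the argument can be checked numerically. The sketch below uses real-valued random data and a fixed, randomly drawn threshold map (an assumption made purely for the check): it verifies that no random perturbation of the soft-thresholded point achieves a lower value of the weighted proximal objective, and that the operator is component-wise nonexpansive.

```python
import numpy as np

def soft_threshold(r, theta):
    # Component-wise soft-thresholding with per-entry thresholds (real-valued).
    return np.sign(r) * np.maximum(np.abs(r) - theta, 0.0)

def prox_objective(x, r, theta):
    # 0.5 * ||x - r||^2 + sum_i theta_i * |x_i|
    return 0.5 * np.sum((x - r) ** 2) + np.sum(theta * np.abs(x))

rng = np.random.default_rng(0)
r = rng.normal(size=50)
theta = rng.uniform(0.0, 1.0, size=50)   # fixed non-negative threshold map
x_star = soft_threshold(r, theta)

# Proximal property: perturbed candidates never beat x_star.
trials = x_star + 0.1 * rng.normal(size=(1000, 50))
best_trial = min(prox_objective(t, r, theta) for t in trials)
is_minimiser = prox_objective(x_star, r, theta) <= best_trial

# Nonexpansiveness: output gaps never exceed input gaps, entry by entry.
r2 = rng.normal(size=50)
gap_out = np.abs(soft_threshold(r, theta) - soft_threshold(r2, theta))
is_nonexpansive = bool(np.all(gap_out <= np.abs(r - r2) + 1e-12))
```

A random search is of course not a proof; it is only a sanity check that the closed form and the objective agree.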
Hierarchical Soft-Thresholding for OTFS Channel Estimation
Complexity: each layer applies the OTFS-structured sensing matrix (via FFT), predicts the threshold map, and performs element-wise soft-thresholding.
The group structure is defined by the angular-delay-Doppler grid. The auxiliary CNN learns to predict large thresholds for noise-dominated groups and small thresholds for signal-bearing groups, achieving automatic support detection.
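The overall loop (gradient step, threshold prediction, soft-thresholding) can be sketched as follows. This is an illustration under strong assumptions: the problem is real-valued, the sensing matrix is applied as a dense array rather than via FFT, and `predict_thresholds` is a hand-written, hypothetical stand-in for the trained auxiliary CNN.

```python
import numpy as np

def soft_threshold(r, theta):
    return np.sign(r) * np.maximum(np.abs(r) - theta, 0.0)

def predict_thresholds(x, base):
    # Hypothetical stand-in for the auxiliary network: entries that are
    # already large relative to the peak receive a smaller threshold.
    peak = np.max(np.abs(x))
    if peak == 0.0:
        return np.full_like(x, base)
    return base / (1.0 + 10.0 * np.abs(x) / peak)

def hst_ista(A, y, n_layers=10, base=0.1):
    # Step size 1/L, with L the squared spectral norm of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        r = x + step * A.T @ (y - A @ x)            # gradient step
        theta = step * predict_thresholds(x, base)  # per-entry thresholds
        x = soft_threshold(r, theta)                # HST proximal step
    return x
```

In a trained model the heuristic would be replaced by the learned network, and the matrix products by the structured (FFT-based) operators.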
Example: HST for OTFS Channel Estimation in Vehicular Scenarios
A vehicular OTFS system operates at GHz with MHz bandwidth and subcarriers across OFDM symbols. The channel has dominant paths with angles , delays , and Dopplers .
Compare scalar ISTA, LISTA, and HST for channel estimation at SNR = 15 dB.
Problem size
The angular-delay-Doppler grid has voxels. The channel is -sparse in this domain, giving a sparsity ratio of .
Results
| Method | NMSE (dB) | Layers / iterations | Params |
|---|---|---|---|
| ISTA (100 iter) | | 100 | 0 |
| LISTA (10 layers) | | 10 | N/A (dense) |
| OAMP + soft-thresh (10 iter) | | 10 | 0 |
| HST (10 layers) | | 10 | 12K |
| Unrolled OAMP + HST | | 10 | 67K |
HST gains 3.4 dB over scalar OAMP by adapting thresholds to the hierarchical sparsity structure. Combining HST with the OAMP LMMSE step gives a further 1.8 dB improvement.
Threshold map analysis
The learned threshold maps show that:
- Angular groups with no path receive large thresholds (aggressive suppression).
- Angular groups with a path receive small thresholds in the delay-Doppler cells near the path's $(\tau_p, \nu_p)$.
- The thresholds decrease across layers (coarse-to-fine support refinement), consistent with the LISTA threshold schedule.
Hierarchical Threshold Map in Angular-Delay-Doppler Domain
Visualise the spatially-varying threshold map learned by the HST auxiliary network in the angular-delay-Doppler domain. The plot shows the threshold magnitude at each voxel, with dark regions (low threshold) indicating detected signal support and bright regions (high threshold) indicating noise suppression.
As the layer index increases, the threshold map evolves from coarse angular-group selection (early layers) to fine delay-Doppler refinement (late layers).
Hierarchical Soft-Thresholding for OTFS Channel Estimation
Dehkordi, Jung, and Caire developed hierarchical soft-thresholding specifically for the structured sparsity patterns arising in OTFS channel estimation. The key contributions are:
- Angular-delay-Doppler group structure: Exploiting the tree-like sparsity where a few angular clusters contain most energy, each with a few delay-Doppler taps.
- Learned threshold prediction: An auxiliary CNN predicts per-voxel thresholds conditioned on the current estimate, replacing the scalar threshold of ISTA/LISTA.
- Joint OFDM/OTFS sensing: The method applies to both OFDM (range-angle estimation) and OTFS (range-angle-Doppler estimation) by adjusting the group structure.
- Convergence guarantees: HST preserves the proximal operator interpretation, ensuring that convergence results for weighted $\ell_1$ minimisation apply.
The approach achieves 3--5 dB NMSE improvement over scalar thresholding on vehicular OTFS channels at moderate SNR.
Why This Matters: OFDM/OTFS Sensing via Hierarchical Sparse Recovery
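OFDM and OTFS modulations are the workhorses of modern wireless communication, and they also serve as excellent sensing waveforms. The received signal from a radar-like scene can be modelled as $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$, with sensing matrix $\mathbf{A}$, sparse scene $\mathbf{x}$, and noise $\mathbf{n}$, where:
- OFDM sensing: $\mathbf{A}$ has Kronecker structure $\mathbf{A}_{\text{angle}} \otimes \mathbf{A}_{\text{delay}}$, providing range-angle estimation.
- OTFS sensing: $\mathbf{A}$ has Kronecker structure $\mathbf{A}_{\text{angle}} \otimes \mathbf{A}_{\text{delay}} \otimes \mathbf{A}_{\text{Doppler}}$, adding velocity estimation.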
Hierarchical soft-thresholding with learned threshold maps exploits the angular-delay(-Doppler) sparsity for state-of-the-art ISAC (Integrated Sensing and Communication) performance.
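A practical consequence of the Kronecker structure is that the sensing matrix never has to be formed explicitly: it can be applied one factor at a time. The sketch below assumes three factor matrices ordered as angle, delay, Doppler and a scene stored as a 3-D array flattened in row-major order; the factor order and the names are assumptions for illustration.

```python
import numpy as np

def kron3_apply(A_ang, A_del, A_dop, X):
    # Computes (A_ang ⊗ A_del ⊗ A_dop) @ X.ravel(), returned as a 3-D array,
    # by contracting each factor with its own axis of X.
    # X has shape (n_angle, n_delay, n_doppler).
    return np.einsum('ia,jb,kc,abc->ijk', A_ang, A_del, A_dop, X)

rng = np.random.default_rng(0)
A_ang = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
A_del = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
A_dop = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
X = rng.normal(size=(2, 3, 2))

fast = kron3_apply(A_ang, A_del, A_dop, X).ravel()
explicit = np.kron(A_ang, np.kron(A_del, A_dop)) @ X.ravel()
```

For the OFDM (two-factor) case the same idea reduces to `A_ang @ X @ A_del.T` on a 2-D array.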
Example: Edge-Aware Thresholding for RF Image Reconstruction
Explain how hierarchical soft-thresholding produces edge-aware reconstruction in an RF image, and compare with scalar thresholding.
Scalar threshold limitation
With a single $\theta$, all pixels receive the same regularisation. A threshold that preserves weak scatterers ($\theta$ small) also allows noise in smooth regions. A threshold that suppresses noise ($\theta$ large) also attenuates weak targets.
This is the fundamental sparsity-noise tradeoff of scalar soft-thresholding.
Spatially-varying thresholds
The auxiliary network learns to predict:
- Large $\theta_i$ in smooth/background regions (aggressive noise suppression)
- Small $\theta_i$ near edges and point targets (preserve fine details)
- Moderate $\theta_i$ in textured regions
This adapts the regularisation strength to the local structure, breaking the sparsity-noise tradeoff.
Implementation
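The auxiliary network takes the current estimate $\mathbf{x}^{(k)}$ as input and outputs a threshold map of the same spatial dimensions. A 3-layer CNN with a small number of channels and a softplus output suffices:
$$\boldsymbol{\theta}^{(k)} = \operatorname{softplus}\big(\mathrm{CNN}_{\psi}(\mathbf{x}^{(k)})\big).$$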
Common Mistake: Threshold Prediction Networks Can Overfit
Mistake:
Using a large auxiliary CNN for threshold prediction without regularisation, leading to overfitting on the training distribution and poor generalisation to unseen scenes.
Correction:
Keep the auxiliary CNN small (3--5 layers, 16--32 channels). Add weight decay and enforce a minimum threshold to prevent the network from turning off regularisation entirely. Validate on held-out scenes with different sparsity patterns.
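One simple way to enforce a minimum threshold is to add a fixed floor to the softplus output, so the map can never collapse to zero. The sketch below is one possible choice; the name `floored_thresholds` and the default value of `theta_min` are hypothetical and would need tuning.

```python
import numpy as np

def floored_thresholds(raw_map, theta_min=1e-3):
    # raw_map: unconstrained output of the auxiliary network (assumed input).
    # softplus keeps the learned part non-negative; theta_min is the floor.
    return theta_min + np.logaddexp(0.0, raw_map)
```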
Quick Check
What advantage does hierarchical soft-thresholding have over replacing the proximal step with a generic CNN denoiser (ProxNet)?
It is faster to compute
It maintains the proximal operator interpretation and convergence guarantees
It uses fewer parameters
It produces sharper images in all cases
HST is the proximal operator of a (data-dependent) weighted $\ell_1$ norm, so convergence guarantees from convex optimisation apply. A generic CNN denoiser may not be a proximal operator of any function, so convergence is not guaranteed.
Quick Check
In the angular-delay-Doppler domain for OTFS sensing, why is hierarchical (group-sparse) regularisation better than element-wise $\ell_1$?
It is computationally cheaper
It exploits the tree structure where a few angles contain most energy
It avoids the need for a sensing matrix
It guarantees exact recovery with fewer measurements
The channel has a few dominant angles, each with a few delay taps, each with a few Doppler components. Group-sparse regularisation can zero out entire angular groups that contain only noise, while element-wise $\ell_1$ treats each voxel independently and may fail to suppress weak noise patterns that are coherent within a group.
Practical OFDM/OTFS Sensing with Hierarchical Recovery
For practical ISAC systems using OFDM/OTFS:
- Pilot design: Allocate pilot subcarriers to minimise the mutual coherence of the sensing matrix in the angular-delay-Doppler domain.
- Group definition: Define groups from the known array geometry (angular groups) and waveform parameters (delay resolution, Doppler resolution).
- Online adaptation: The threshold prediction network can be fine-tuned online as the channel statistics change (e.g., an urban-to-rural transition).
- Latency constraint: HST inference takes ms for on a mobile GPU, compatible with 5G NR slot timing.
- 5G NR slot duration: 1 ms, 0.5 ms, and 0.25 ms for numerologies 0, 1, and 2
- Pilot overhead: 5--15% of OFDM resources
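The mutual coherence mentioned under pilot design is the largest normalised inner product between two distinct columns of the sensing matrix. A minimal sketch for evaluating it on a candidate matrix is given below; it assumes the matrix is small enough to hold densely and has no all-zero columns.

```python
import numpy as np

def mutual_coherence(A):
    # Normalise columns, form the Gram matrix, and take the largest
    # off-diagonal magnitude. Values near 0 mean nearly orthogonal columns.
    cols = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(cols.conj().T @ cols)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())
```

Comparing this number across candidate pilot allocations gives a quick, if coarse, way to rank them before running a full recovery experiment.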
Hierarchical Soft-Thresholding (HST)
A variant of soft-thresholding where the threshold is spatially varying, predicted by an auxiliary network. Equivalent to the proximal operator of a weighted $\ell_1$ norm, preserving convergence guarantees.
Related: LISTA (Learned ISTA)
Structured Sparsity
A signal model where nonzero entries are organised in groups, trees, or other patterns, rather than being uniformly distributed. Exploited by group LASSO and hierarchical thresholding.
Key Takeaway
Hierarchical soft-thresholding replaces scalar thresholds with spatially-varying maps predicted by an auxiliary CNN, enabling edge-aware and group-sparse regularisation. Applied to OTFS channel estimation, HST exploits the angular-delay-Doppler tree structure for 3--5 dB gains over scalar methods. HST preserves the proximal operator interpretation, so convergence guarantees carry over from weighted $\ell_1$ theory.