Hierarchical Soft-Thresholding

Beyond Scalar Thresholds: Structured Sparsity in OTFS

LISTA uses a single scalar threshold $\tau_k$ per layer. But in OTFS channel estimation and RF sensing, the sparsity structure is hierarchical: the channel is sparse in the angular-delay-Doppler domain, and different dimensions require different regularisation strengths.

Hierarchical soft-thresholding (HST) replaces the scalar $\tau_k$ with a spatially-varying threshold map $\boldsymbol{\tau}_k(\mathbf{r})$ predicted by an auxiliary network. When combined with the Kronecker structure of OFDM/OTFS sensing matrices, HST provides a principled approach to exploiting structured sparsity while preserving the interpretability of the proximal framework.

Definition:

Hierarchical Soft-Thresholding

In hierarchical soft-thresholding (HST), the proximal step at layer $k$ uses spatially-varying thresholds:

$$[\hat{\mathbf{c}}^{(k+1)}]_i = \mathcal{S}_{\tau_{k,i}}\!\bigl([\mathbf{r}^{(k)}]_i\bigr)$$

where the threshold map $\boldsymbol{\tau}_k = (\tau_{k,1}, \ldots, \tau_{k,N})$ is generated by an auxiliary network $h_{\phi_k}$:

$$\boldsymbol{\tau}_k = h_{\phi_k}\!\bigl(\mathbf{r}^{(k)}\bigr), \qquad \tau_{k,i} \geq 0 \;\forall i.$$

The auxiliary network $h_{\phi_k}$ is typically a small CNN (3--5 layers) with a final softplus activation to ensure non-negative thresholds.

HST occupies a middle ground between LISTA (scalar threshold, full interpretability) and ProxNet (arbitrary learned denoiser, less interpretability). The soft-thresholding structure is preserved, but the threshold adapts to local content.
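The per-element proximal step defined above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the complex-valued form of soft-thresholding (shrink the magnitude, keep the phase); the `scores` array is a hypothetical stand-in for the pre-activation output of the auxiliary network, not the output of a trained model.

```python
import numpy as np

def softplus(z):
    """Numerically stable softplus log(1 + exp(z)); the output is always >= 0."""
    return np.logaddexp(0.0, z)

def soft_threshold(r, tau):
    """Element-wise soft-thresholding of (possibly complex) r with thresholds tau >= 0.

    Each magnitude |r_i| is shrunk by tau_i and the phase of r_i is kept.
    """
    mag = np.abs(r)
    scale = np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-12)
    return scale * r

# Hypothetical pre-activation scores standing in for the auxiliary network output.
scores = np.array([-4.0, 0.0, 2.0])
tau = softplus(scores)                  # non-negative threshold map
r = np.array([3.0 + 4.0j, 0.5, -1.0])   # entries of r^(k)
c_next = soft_threshold(r, tau)         # entries of the next estimate
```

The first entry has a large magnitude and a small threshold, so it is only slightly shrunk; the other two fall below their thresholds and are set to zero.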

Definition:

Structured Sparsity in the Angular-Delay-Doppler Domain

In OTFS channel estimation, the channel response admits a three-dimensional sparse representation in the angular-delay-Doppler domain:

$$h(\theta, \tau, \nu) = \sum_{p=1}^{P} \alpha_p \,\delta(\theta - \theta_p)\,\delta(\tau - \tau_p)\,\delta(\nu - \nu_p)$$

where $P$ is the number of propagation paths and $(\theta_p, \tau_p, \nu_p)$ are the angle, delay, and Doppler of path $p$.

The sparsity is hierarchical: a few dominant angles contain most of the energy, each angle has a few delay taps, and each delay tap has a few Doppler components. This tree structure motivates group-sparse and hierarchical regularisation:

$$\mathcal{R}(\mathbf{c}) = \sum_{\ell=1}^{L} \lambda_\ell \|\mathbf{c}_{\mathcal{G}_\ell}\|_2$$

where $\mathcal{G}_\ell$ is a group (e.g., all delay-Doppler components at angle $\ell$) and $\lambda_\ell$ is a per-group regularisation weight.
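To see why the group penalty favours clustered support, the sketch below evaluates $\mathcal{R}(\mathbf{c})$ for two coefficient arrays with the same number of unit-magnitude taps. It assumes one group per angle bin (one row of the array) and uses hypothetical unit weights; neither choice is prescribed by the text.

```python
import numpy as np

def group_penalty(c, lambdas):
    """R(c) = sum_l lambda_l * ||c[l, :]||_2, with one group per angle bin (row)."""
    group_norms = np.linalg.norm(c, axis=1)
    return float(np.dot(lambdas, group_norms))

lambdas = np.ones(4)  # hypothetical per-group weights

# Same four unit-magnitude taps: clustered at one angle vs. spread over four angles.
clustered = np.zeros((4, 4), dtype=complex)
clustered[1, :] = 1.0
spread = np.eye(4, dtype=complex)

penalty_clustered = group_penalty(clustered, lambdas)  # one group with norm 2
penalty_spread = group_penalty(spread, lambdas)        # four groups with norm 1 each
```

Both arrays have the same element-wise $\ell^1$ norm, yet the clustered one incurs half the group penalty, which is the preference for few active angles that the regulariser encodes.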

Theorem: HST Preserves Proximal Properties

If the threshold map $\boldsymbol{\tau}_k$ is non-negative, then the spatially-varying soft-thresholding operator $\mathcal{S}_{\boldsymbol{\tau}_k}$ is:

  1. Component-wise nonexpansive: $|[\mathcal{S}_{\boldsymbol{\tau}}(\mathbf{a})]_i - [\mathcal{S}_{\boldsymbol{\tau}}(\mathbf{b})]_i| \leq |a_i - b_i|$
  2. A valid proximal operator of $R(\mathbf{c}) = \sum_i \tau_i |c_i|$ (weighted $\ell^1$)

Hence, for a given threshold map, HST inherits the convergence guarantees of ISTA with weighted $\ell^1$ regularisation.

Soft-thresholding with a different threshold per component is equivalent to the proximal operator of a weighted $\ell^1$ norm. Since the weighted $\ell^1$ norm is still convex, all convergence results for proximal gradient methods carry over.
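The second claim can be checked directly from the definition of the proximal operator. A short derivation, assuming the threshold map $\boldsymbol{\tau}$ is held fixed (independent of the argument) and using the complex-valued form of the scalar soft-threshold with the convention $\mathcal{S}_{\tau_i}(0) = 0$:

$$
\operatorname{prox}_{R}(\mathbf{r})
= \arg\min_{\mathbf{c}} \; \tfrac{1}{2}\|\mathbf{c} - \mathbf{r}\|_2^2 + \sum_i \tau_i |c_i|
= \Bigl( \arg\min_{c_i} \; \tfrac{1}{2}|c_i - r_i|^2 + \tau_i |c_i| \Bigr)_{i=1}^{N}
$$

$$
\arg\min_{c_i} \; \tfrac{1}{2}|c_i - r_i|^2 + \tau_i |c_i|
= \frac{r_i}{|r_i|}\,\max\bigl(|r_i| - \tau_i,\, 0\bigr)
= \mathcal{S}_{\tau_i}(r_i)
$$

The objective separates across components, so each coordinate is an ordinary scalar soft-threshold with its own $\tau_i$. Component-wise nonexpansiveness then follows because the proximal operator of a convex function is nonexpansive. When $\boldsymbol{\tau}$ is itself predicted from $\mathbf{r}$, the argument applies to each layer with its threshold map held fixed.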

Hierarchical Soft-Thresholding for OTFS Channel Estimation

Complexity: Per layer: $O(N\log N)$ for OTFS-structured $\mathbf{A}^{H}$ (via FFT) + $O(N)$ for threshold prediction + $O(N)$ for soft-thresholding. Total: $O(KN\log N)$.
Input: OTFS measurements $\mathbf{y}$, sensing matrix $\mathbf{A}$ (delay-Doppler structure), group structure $\{\mathcal{G}_\ell\}$, trained parameters $\{\phi_k, \theta_k\}_{k=1}^K$
Initialise: $\hat{\mathbf{c}}^{(0)} = \mathbf{A}^{H} \mathbf{y}$
For $k = 1, \ldots, K$:
  1. Gradient step: $\mathbf{r}^{(k)} = \hat{\mathbf{c}}^{(k-1)} + \alpha_k \mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(k-1)})$
  2. Predict threshold map (auxiliary CNN with softplus output): $\boldsymbol{\tau}_k = h_{\phi_k}(\mathbf{r}^{(k)})$
  3. Hierarchical soft-thresholding: for each group $\mathcal{G}_\ell$: $[\hat{\mathbf{c}}^{(k)}]_{\mathcal{G}_\ell} = \mathcal{S}_{\boldsymbol{\tau}_{k,\mathcal{G}_\ell}}(\mathbf{r}^{(k)}_{\mathcal{G}_\ell})$
Output: $\hat{\mathbf{c}}^{(K)}$

The group structure $\{\mathcal{G}_\ell\}$ is defined by the angular-delay-Doppler grid. The auxiliary CNN learns to predict large thresholds for noise-dominated groups and small thresholds for signal-bearing groups, achieving automatic support detection.
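The three steps of the loop can be sketched as follows. This is a toy illustration under several assumptions that are not from the text: the sensing operator is a subsampled unitary DFT (so that it and its adjoint run through the FFT), the step size is fixed at 1, and `predict_thresholds` is a hand-written stand-in for the trained auxiliary CNN. Because the thresholds act per element, looping over groups is equivalent to thresholding the whole vector at once.

```python
import numpy as np

def soft_threshold(r, tau):
    """Element-wise complex soft-thresholding with thresholds tau >= 0."""
    mag = np.abs(r)
    return np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-12) * r

def predict_thresholds(r, tau_max):
    """Hypothetical stand-in for h_phi_k (NOT a trained CNN): entries that are
    large relative to the median magnitude receive small thresholds."""
    mag = np.abs(r)
    scale = np.median(mag) + 1e-12
    return tau_max / (1.0 + (mag / scale) ** 2)

def hst_unrolled(y, forward, adjoint, num_layers, alpha=1.0, tau_max=0.2):
    """Run K unrolled layers: gradient step, threshold prediction, thresholding."""
    c = adjoint(y)                                   # initialise with A^H y
    history = []
    for _ in range(num_layers):
        r = c + alpha * adjoint(y - forward(c))      # 1. gradient step
        tau = predict_thresholds(r, tau_max)         # 2. predict threshold map
        c = soft_threshold(r, tau)                   # 3. per-element soft-thresholding
        history.append((r, tau, c))
    return c, history

# Toy problem: subsampled unitary DFT as a stand-in for an FFT-structured A.
rng = np.random.default_rng(0)
N = 64
keep = np.sort(rng.choice(N, size=48, replace=False))  # indices of observed rows

def forward(c):
    """A c: unitary DFT followed by row selection."""
    return np.fft.fft(c, norm="ortho")[keep]

def adjoint(v):
    """A^H v: zero-fill the unobserved rows, then apply the inverse unitary DFT."""
    full = np.zeros(N, dtype=complex)
    full[keep] = v
    return np.fft.ifft(full, norm="ortho")

c_true = np.zeros(N, dtype=complex)
c_true[[5, 20, 41]] = [1.0, -0.8j, 0.6]
noise = 0.01 * (rng.standard_normal(48) + 1j * rng.standard_normal(48))
y = forward(c_true) + noise

c_hat, history = hst_unrolled(y, forward, adjoint, num_layers=10)
nmse = np.linalg.norm(c_hat - c_true) ** 2 / np.linalg.norm(c_true) ** 2
print(f"NMSE after 10 layers: {nmse:.3e}")
```

In a trained system, `predict_thresholds` would be replaced by the learned network and `alpha` by the learned per-layer step sizes; the loop structure stays the same.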

Example: HST for OTFS Channel Estimation in Vehicular Scenarios

A vehicular OTFS system operates at $f_c = 28$ GHz with $B = 100$ MHz bandwidth and $N = 128$ subcarriers across $M = 64$ OFDM symbols. The channel has $P = 5$ dominant paths with angles $\theta_p \in \{-30^\circ, -10^\circ, 5^\circ, 20^\circ, 45^\circ\}$, delays $\tau_p \in [0, 1\,\mu\text{s}]$, and Dopplers $\nu_p \in [-2\,\text{kHz}, 2\,\text{kHz}]$.

Compare scalar ISTA, LISTA, and HST for channel estimation at SNR = 15 dB.
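Before running the comparison it helps to translate the system parameters into grid resolutions. The arithmetic below is a sketch under assumptions not stated in the example: a critically sampled grid with subcarrier spacing $B/N$, no cyclic prefix, a one-way link (Doppler $\nu = v f_c / c$), and unit signal power for the noise variance.

```python
C_LIGHT = 299_792_458.0      # speed of light, m/s

f_c = 28e9                   # carrier frequency, Hz
bandwidth = 100e6            # B, Hz
n_subcarriers = 128          # N
n_symbols = 64               # M

# Assumption: critically sampled grid, subcarrier spacing B/N, no cyclic prefix.
subcarrier_spacing = bandwidth / n_subcarriers        # Hz
delay_resolution = 1.0 / bandwidth                    # s
doppler_resolution = subcarrier_spacing / n_symbols   # Hz, equals 1 / (M * T)

max_delay_bins = 1e-6 / delay_resolution              # delay spread in grid bins
max_doppler_bins = 2e3 / doppler_resolution           # Doppler spread in grid bins
max_speed = 2e3 * C_LIGHT / f_c                       # m/s, from nu = v * f_c / c

snr_linear = 10 ** (15 / 10)                          # 15 dB as a power ratio
noise_variance = 1.0 / snr_linear                     # assuming unit signal power
```

Under these assumptions the 1 µs delay spread covers about 100 of the 128 delay bins, while 2 kHz is roughly 0.16 of a Doppler bin, so the Doppler shifts are fractional (off-grid). A 2 kHz shift corresponds to about 21 m/s (77 km/h) of radial speed.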

Hierarchical Threshold Map in Angular-Delay-Doppler Domain

Visualise the spatially-varying threshold map learned by the HST auxiliary network in the angular-delay-Doppler domain. The plot shows the threshold magnitude at each voxel, with dark regions (low threshold) indicating detected signal support and bright regions (high threshold) indicating noise suppression.

Adjust the layer index to see how the threshold map evolves from coarse angular-group selection (early layers) to fine delay-Doppler refinement (late layers).

CommIT Contribution (2023)

Hierarchical Soft-Thresholding for OTFS Channel Estimation

S. Dehkordi, P. Jung, G. Caire, IEEE Transactions on Signal Processing

Dehkordi, Jung, and Caire developed hierarchical soft-thresholding specifically for the structured sparsity patterns arising in OTFS channel estimation. The key contributions are:

  1. Angular-delay-Doppler group structure: Exploiting the tree-like sparsity where a few angular clusters contain most energy, each with a few delay-Doppler taps.
  2. Learned threshold prediction: An auxiliary CNN predicts per-voxel thresholds conditioned on the current estimate, replacing the scalar threshold of ISTA/LISTA.
  3. Joint OFDM/OTFS sensing: The method applies to both OFDM (range-angle estimation) and OTFS (range-angle-Doppler estimation) by adjusting the group structure.
  4. Convergence guarantees: HST preserves the proximal operator interpretation, ensuring that convergence results for weighted $\ell^1$ minimisation apply.

The approach achieves 3--5 dB NMSE improvement over scalar thresholding on vehicular OTFS channels at moderate SNR.

Keywords: hierarchical soft-thresholding, OTFS, structured sparsity, channel estimation, OFDM sensing

Why This Matters: OFDM/OTFS Sensing via Hierarchical Sparse Recovery

OFDM and OTFS modulations are the workhorses of modern wireless communication, and they also serve as excellent sensing waveforms. The received signal from a radar-like scene can be modelled as $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$, where:

  • OFDM sensing: $\mathbf{A}$ has structure $\mathbf{A}_{\text{freq}} \otimes \mathbf{A}_{\text{array}}$, providing range-angle estimation.
  • OTFS sensing: $\mathbf{A}$ has structure $\mathbf{A}_{\text{delay}} \otimes \mathbf{A}_{\text{Doppler}} \otimes \mathbf{A}_{\text{angle}}$, adding velocity estimation.

Hierarchical soft-thresholding with learned threshold maps exploits the angular-delay(-Doppler) sparsity for state-of-the-art ISAC (Integrated Sensing and Communication) performance.
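The Kronecker structure matters computationally because $\mathbf{A}$ never has to be formed explicitly: each factor can be applied along its own axis of the coefficient tensor. The sketch below checks this on tiny random factors; the matrix sizes and the random entries are hypothetical and chosen only so that the dense reference fits in memory.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_complex(shape):
    """Complex Gaussian array used as a placeholder for real steering/DFT factors."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Hypothetical small factor matrices (rows = measurements, cols = grid points).
A_delay = random_complex((6, 8))
A_doppler = random_complex((5, 4))
A_angle = random_complex((3, 4))
C = random_complex((8, 4, 4))   # coefficients on the delay x Doppler x angle grid

# Dense reference: form the Kronecker product explicitly (feasible only for tiny sizes).
A_full = np.kron(np.kron(A_delay, A_doppler), A_angle)
y_dense = A_full @ C.ravel()

# Structured version: apply each factor along its own tensor axis.
y_fast = np.einsum("ia,jb,kc,abc->ijk", A_delay, A_doppler, A_angle, C,
                   optimize=True).ravel()
```

Both results agree because NumPy's row-major `ravel` matches the index ordering of `np.kron`. When the factors are DFT or steering matrices, each per-axis product can in turn be replaced by an FFT.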

Example: Edge-Aware Thresholding for RF Image Reconstruction

Explain how hierarchical soft-thresholding produces edge-aware reconstruction in an RF image, and compare with scalar thresholding.

Common Mistake: Threshold Prediction Networks Can Overfit

Mistake:

Using a large auxiliary CNN for threshold prediction ($h_{\phi_k}$) without regularisation, leading to overfitting on the training distribution and poor generalisation to unseen scenes.

Correction:

Keep the auxiliary CNN small (3--5 layers, 16--32 channels). Add $\ell^2$ weight decay and enforce a minimum threshold $\tau_{\min} > 0$ to prevent the network from turning off regularisation entirely. Validate on held-out scenes with different sparsity patterns.
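One simple way to realise the threshold floor and the weight-decay term is shown below. It is a sketch rather than a prescribed recipe: the floor is implemented by adding $\tau_{\min}$ to a softplus output, and the values of `tau_min` and `strength` are hypothetical.

```python
import numpy as np

def floored_thresholds(scores, tau_min=0.01):
    """Map raw network outputs to thresholds that never drop below tau_min."""
    return tau_min + np.logaddexp(0.0, scores)   # tau_min + softplus(scores)

def l2_weight_decay(weights, strength=1e-4):
    """Penalty strength * (sum of squared weights), added to the training loss."""
    return strength * sum(float(np.sum(np.abs(w) ** 2)) for w in weights)

scores = np.array([-50.0, 0.0, 3.0])   # hypothetical pre-activation outputs
tau = floored_thresholds(scores)
penalty = l2_weight_decay([np.ones((3, 3)), np.ones(4)])
```

Even for a strongly negative score the threshold stays at `tau_min`, so every layer keeps applying some shrinkage.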

Quick Check

What advantage does hierarchical soft-thresholding have over replacing the proximal step with a generic CNN denoiser (ProxNet)?

It is faster to compute

It maintains the proximal operator interpretation and convergence guarantees

It uses fewer parameters

It produces sharper images in all cases

Quick Check

In the angular-delay-Doppler domain for OTFS sensing, why is hierarchical (group-sparse) regularisation better than element-wise $\ell^1$?

It is computationally cheaper

It exploits the tree structure where a few angles contain most energy

It avoids the need for a sensing matrix

It guarantees exact recovery with fewer measurements

Engineering Note

Practical OFDM/OTFS Sensing with Hierarchical Recovery

For practical ISAC systems using OFDM/OTFS:

  • Pilot design: Allocate pilot subcarriers to minimise the mutual coherence of the sensing matrix in the angular-delay-Doppler domain.
  • Group definition: Define groups from the known array geometry (angular groups) and waveform parameters (delay resolution, Doppler resolution).
  • Online adaptation: The threshold prediction network can be fine-tuned online as the channel statistics change (e.g., urban $\to$ rural transition).
  • Latency constraint: HST inference takes $\sim 1$ ms for $N = 10^5$ on a mobile GPU, compatible with 5G NR slot timing.
Practical Constraints
  • 5G NR slot duration: 1 ms (numerology 0) to 0.5 ms (numerology 1)
  • Pilot overhead: 5--15% of OFDM resources
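The pilot-design guideline can be made concrete by computing the mutual coherence of a candidate sensing matrix. The sketch below is a one-dimensional toy case, assuming pilots select rows of a unitary DFT matrix (subcarriers) and columns index delay bins; the sizes and the two pilot patterns are hypothetical.

```python
import numpy as np

def mutual_coherence(A):
    """Largest normalised inner product between two distinct columns of A."""
    cols = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(cols.conj().T @ cols)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())

N = 64
F = np.fft.fft(np.eye(N), norm="ortho")   # unitary DFT matrix

rng = np.random.default_rng(0)
random_pilots = np.sort(rng.choice(N, size=16, replace=False))
uniform_pilots = np.arange(0, N, 4)       # every 4th subcarrier

mu_random = mutual_coherence(F[random_pilots, :])
mu_uniform = mutual_coherence(F[uniform_pilots, :])
```

With pilots on every fourth subcarrier, delay bins 16 apart produce identical columns, so the coherence reaches 1 and those delays cannot be distinguished. An irregular pattern with the same overhead breaks this aliasing and gives a lower coherence.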

Hierarchical Soft-Thresholding (HST)

A variant of soft-thresholding where the threshold is spatially varying, predicted by an auxiliary network. Equivalent to the proximal operator of a weighted $\ell^1$ norm, preserving convergence guarantees.

Related: LISTA (Learned ISTA)

Structured Sparsity

A signal model where nonzero entries are organised in groups, trees, or other patterns, rather than being uniformly distributed. Exploited by group LASSO and hierarchical thresholding.

Key Takeaway

Hierarchical soft-thresholding replaces scalar thresholds with spatially-varying maps predicted by an auxiliary CNN, enabling edge-aware and group-sparse regularisation. Applied to OTFS channel estimation, HST exploits the angular-delay-Doppler tree structure for 3--5 dB gains over scalar methods. HST preserves the proximal operator interpretation, so convergence guarantees carry over from weighted $\ell^1$ theory.