Ferkans — Interactive Telecom Tutor

Beyond Scalar Thresholds: Structured Sparsity in OTFS

LISTA uses a single scalar threshold $\tau_k$ per layer. But in OTFS channel estimation and RF sensing, the sparsity structure is hierarchical: the channel is sparse in the angular-delay-Doppler domain, and different dimensions require different regularisation strengths.

Hierarchical soft-thresholding (HST) replaces the scalar $\tau_k$ with a spatially-varying threshold map $\boldsymbol{\tau}_k(\mathbf{r})$ predicted by an auxiliary network. When combined with the Kronecker structure of OFDM/OTFS sensing matrices, HST provides a principled approach to exploiting structured sparsity while preserving the interpretability of the proximal framework.

Definition: Hierarchical Soft-Thresholding

In hierarchical soft-thresholding (HST), the proximal step at layer $k$ uses spatially-varying thresholds:

$[\hat{\mathbf{c}}^{(k)}]_i = \mathcal{S}_{\tau_{k,i}}\!\bigl([\mathbf{r}^{(k)}]_i\bigr)$

where $\mathbf{r}^{(k)}$ is the output of the gradient step at layer $k$ and the threshold map $\boldsymbol{\tau}_k = (\tau_{k,1}, \ldots, \tau_{k,N})$ is generated by an auxiliary network $h_{\phi_k}$:

$\boldsymbol{\tau}_k = h_{\phi_k}\!\bigl(\mathbf{r}^{(k)}\bigr), \qquad \tau_{k,i} \geq 0 \;\forall i.$

The auxiliary network $h_{\phi_k}$ is typically a small CNN (3--5 layers) with a final softplus activation to ensure non-negative thresholds.

HST occupies a middle ground between LISTA (scalar threshold, full interpretability) and ProxNet (arbitrary learned denoiser, less interpretability). The soft-thresholding structure is preserved, but the threshold adapts to local content.
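The thresholding step in this definition can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the function names (`softplus`, `soft_threshold`, `hst_step`) are hypothetical, the `raw` values stand in for the pre-activation output of the auxiliary network $h_{\phi_k}$, and complex entries are shrunk in magnitude with the phase kept (the usual complex extension of soft-thresholding, assumed here).

```python
import math

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); the result is non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def soft_threshold(r, tau):
    """Shrink the magnitude of r by tau (tau >= 0); works for real or complex r."""
    mag = abs(r)
    if mag <= tau:
        return 0.0
    return r * (1.0 - tau / mag)

def hst_step(r, raw):
    """Element-wise thresholding with thresholds tau_i = softplus(raw_i).

    `raw` is a placeholder for the pre-activation output of the auxiliary network.
    """
    tau = [softplus(v) for v in raw]
    return [soft_threshold(ri, ti) for ri, ti in zip(r, tau)], tau

r = [2.0, -0.4, 0.1 + 0.1j, -3.0j]
raw = [-5.0, 0.0, 0.0, -5.0]   # hypothetical network outputs
c_next, tau = hst_step(r, raw)
print(tau)      # small thresholds at positions 0 and 3, about 0.69 elsewhere
print(c_next)   # entries 1 and 2 are set to zero, entries 0 and 3 barely change
```

A strongly negative raw value gives a threshold close to zero (the entry passes almost unchanged), while a raw value near zero gives a threshold of about $\ln 2$.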

Definition: Structured Sparsity in the Angular-Delay-Doppler Domain

In OTFS channel estimation, the channel response admits a three-dimensional sparse representation in the angular-delay-Doppler domain:

$h(\theta, \tau, \nu) = \sum_{p=1}^{P} \alpha_p \,\delta(\theta - \theta_p)\,\delta(\tau - \tau_p)\,\delta(\nu - \nu_p)$

where $P$ is the number of propagation paths and $(\theta_p, \tau_p, \nu_p)$ are the angle, delay, and Doppler of path $p$ .

The sparsity is hierarchical: a few dominant angles contain most of the energy, each angle has a few delay taps, and each delay tap has a few Doppler components. This tree structure motivates group-sparse and hierarchical regularisation:

$\mathcal{R}(\mathbf{c}) = \sum_{\ell=1}^{L} \lambda_\ell \|\mathbf{c}_{\mathcal{G}_\ell}\|_2$

where $\mathcal{G}_\ell$ is a group (e.g., all delay-Doppler components at angle $\ell$ ) and $\lambda_\ell$ is a per-group regularisation weight.
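To make the group regulariser concrete, the sketch below evaluates $\mathcal{R}(\mathbf{c})$ for a toy vector and applies block soft-thresholding. The latter is the standard proximal operator of the group penalty for non-overlapping groups; it is not stated in the text above and is included only for illustration. The group layout, weights, and function names are assumptions.

```python
import math

def group_penalty(c, groups, lam):
    """R(c) = sum_l lam[l] * ||c restricted to group l||_2."""
    return sum(
        lam_l * math.sqrt(sum(abs(c[i]) ** 2 for i in g))
        for g, lam_l in zip(groups, lam)
    )

def block_soft_threshold(r, groups, lam):
    """Prox of the group penalty, valid for non-overlapping groups."""
    out = [0.0] * len(r)
    for g, lam_l in zip(groups, lam):
        norm = math.sqrt(sum(abs(r[i]) ** 2 for i in g))
        # Scale the whole group towards zero; a weak group is removed entirely.
        scale = max(0.0, 1.0 - lam_l / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * r[i]
    return out

# Two hypothetical angular groups with three delay-Doppler cells each.
groups = [[0, 1, 2], [3, 4, 5]]
lam = [0.5, 0.5]
r = [3.0, 0.0, 4.0, 0.1, -0.2, 0.2]   # group 0 holds a path, group 1 only noise

penalty = group_penalty(r, groups, lam)
out = block_soft_threshold(r, groups, lam)
print(penalty)
print(out)
```

The noise-only group has norm 0.3, below its weight of 0.5, so it is zeroed as a unit, while the signal-bearing group is only scaled by 0.9.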

Theorem: HST Preserves Proximal Properties

If the threshold map $\boldsymbol{\tau}_k$ is non-negative, then the spatially-varying soft-thresholding operator $\mathcal{S}_{\boldsymbol{\tau}_k}$ is:

Component-wise nonexpansive: $|[\mathcal{S}_{\boldsymbol{\tau}}(\mathbf{a})]_i - [\mathcal{S}_{\boldsymbol{\tau}}(\mathbf{b})]_i| \leq |a_i - b_i|$
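A valid proximal operator of $R(\mathbf{c}) = \sum_i \tau_i |c_i|$ (weighted $\ell^1$)

Hence, for a fixed threshold map, an HST layer coincides with an ISTA step for weighted $\ell^1$ regularisation and inherits its convergence guarantees. When $\boldsymbol{\tau}_k$ is predicted from $\mathbf{r}^{(k)}$, both properties hold for each layer's operator with the predicted map held fixed.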

Soft-thresholding with different thresholds per component is equivalent to the proximal operator of a weighted $\ell^1$ norm. Since weighted $\ell^1$ is still convex, all convergence results for proximal gradient methods carry over.

Proof

Nonexpansiveness

Soft-thresholding $\mathcal{S}_\tau$ is nonexpansive for any $\tau \geq 0$ (it is 1-Lipschitz). This holds component-wise, so the spatially-varying version inherits the property.

Proximal interpretation

$\mathcal{S}_{\boldsymbol{\tau}}(\mathbf{r}) = \arg\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{c} - \mathbf{r}\|_2^2 + \sum_i \tau_i |c_i| = \operatorname{prox}_{\sum_i \tau_i|\cdot|}(\mathbf{r})$.

This identifies HST as the proximal operator of the weighted $\ell^1$ norm, which is convex and proper. $\blacksquare$
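Both parts of the proof can be checked numerically for the scalar case, which is enough because the weighted $\ell^1$ problem separates across components. The sketch below is a sanity check rather than a proof: it compares the closed-form soft-threshold against a brute-force grid search of $\tfrac{1}{2}(c - r)^2 + \tau|c|$ and measures the worst-case ratio $|\mathcal{S}_\tau(a) - \mathcal{S}_\tau(b)| / |a - b|$. The grid, sample ranges, and seed are arbitrary choices.

```python
import random

def soft_threshold(r, tau):
    """Real-valued soft-thresholding with threshold tau >= 0."""
    if abs(r) <= tau:
        return 0.0
    return r - tau if r > 0 else r + tau

def prox_objective(c, r, tau):
    """Scalar objective 0.5 * (c - r)^2 + tau * |c|."""
    return 0.5 * (c - r) ** 2 + tau * abs(c)

random.seed(0)
grid = [i * 0.005 for i in range(-800, 801)]   # brute-force search over [-4, 4]
max_gap = 0.0     # how much a grid point beats the closed form (should stay 0)
max_ratio = 0.0   # empirical Lipschitz ratio (should not exceed 1)
for _ in range(100):
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    tau = random.uniform(0, 2)
    best_on_grid = min(prox_objective(c, a, tau) for c in grid)
    closed_form = prox_objective(soft_threshold(a, tau), a, tau)
    max_gap = max(max_gap, closed_form - best_on_grid)
    if a != b:
        ratio = abs(soft_threshold(a, tau) - soft_threshold(b, tau)) / abs(a - b)
        max_ratio = max(max_ratio, ratio)

print(max_gap, max_ratio)
```

Each component may use its own `tau`, so the same check covers the spatially-varying operator for any fixed non-negative threshold map.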

Hierarchical Soft-Thresholding for OTFS Channel Estimation

Complexity: Per layer: $O(N\log N)$ for OTFS-structured $\mathbf{A}^{H}$ (via FFT) + $O(N)$ for threshold prediction + $O(N)$ for soft-thresholding. Total: $O(KN\log N)$.

Input: OTFS measurements $\mathbf{y}$, sensing matrix $\mathbf{A}$ (delay-Doppler structure), group structure $\{\mathcal{G}_\ell\}$, trained parameters $\{\phi_k, \theta_k\}_{k=1}^K$

Initialise: $\hat{\mathbf{c}}^{(0)} = \mathbf{A}^{H} \mathbf{y}$

For $k = 1, \ldots, K$:

1. Gradient step: $\mathbf{r}^{(k)} = \hat{\mathbf{c}}^{(k-1)} + \alpha_k \mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(k-1)})$
2. Predict threshold map: $\boldsymbol{\tau}_k = h_{\phi_k}(\mathbf{r}^{(k)})$ (auxiliary CNN with softplus output)
3. Hierarchical soft-thresholding: for each group $\mathcal{G}_\ell$: $[\hat{\mathbf{c}}^{(k)}]_{\mathcal{G}_\ell} = \mathcal{S}_{\boldsymbol{\tau}_{k,\mathcal{G}_\ell}}(\mathbf{r}^{(k)}_{\mathcal{G}_\ell})$

Output: $\hat{\mathbf{c}}^{(K)}$

The group structure $\{\mathcal{G}_\ell\}$ is defined by the angular-delay-Doppler grid. The auxiliary CNN learns to predict large thresholds for noise-dominated groups and small thresholds for signal-bearing groups, achieving automatic support detection.
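The three steps of the algorithm can be sketched end to end on a toy problem. Several simplifications are assumed: a dense matrix-vector product replaces the FFT-based $\mathbf{A}^{H}$, and a hand-written rule `predict_thresholds` (softplus of a linear function of $|r_i|$) stands in for the trained auxiliary CNN. The parameter tuple `(alpha, w0, w1)` and all function names are hypothetical.

```python
import math

def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def soft_threshold(r, tau):
    mag = abs(r)
    return 0.0 if mag <= tau else r * (1.0 - tau / mag)

def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_h(A, y):
    """Compute A^H y (conjugate transpose times vector)."""
    n = len(A[0])
    return [sum(A[m][i].conjugate() * y[m] for m in range(len(A))) for i in range(n)]

def predict_thresholds(r, w0, w1):
    """Stand-in for the auxiliary network: tau_i = softplus(w0 - w1 * |r_i|)."""
    return [softplus(w0 - w1 * abs(ri)) for ri in r]

def hst_unrolled(y, A, groups, params):
    """Run one layer per entry of params, each an (alpha, w0, w1) tuple."""
    c = matvec_h(A, y)                                        # initialise with A^H y
    for alpha, w0, w1 in params:
        resid = [yi - vi for yi, vi in zip(y, matvec(A, c))]
        grad = matvec_h(A, resid)
        r = [ci + alpha * gi for ci, gi in zip(c, grad)]      # 1. gradient step
        tau = predict_thresholds(r, w0, w1)                   # 2. threshold map
        c_new = [0.0] * len(r)
        for g in groups:                                      # 3. per-group thresholding
            for i in g:
                c_new[i] = soft_threshold(r[i], tau[i])
        c = c_new
    return c

# Toy problem: orthonormal 4x4 Hadamard sensing matrix, one path, small noise.
A = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]
c_true = [1.0, 0.0, 0.0, 0.0]
noise = [0.02, -0.02, 0.02, 0.02]
y = [a + n for a, n in zip(matvec(A, c_true), noise)]
groups = [[0, 1], [2, 3]]
params = [(1.0, 0.3, 5.0)] * 3

c_hat = hst_unrolled(y, A, groups, params)
print(c_hat)   # first entry close to 1, remaining entries exactly zero
```

With this stand-in rule, entries with large magnitude receive a threshold near zero and small entries receive a threshold of roughly 0.7 to 0.8, which mimics the behaviour described above for signal-bearing and noise-dominated groups.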

Example: HST for OTFS Channel Estimation in Vehicular Scenarios

A vehicular OTFS system operates at $f_c = 28$ GHz with $B = 100$ MHz bandwidth and $N = 128$ subcarriers across $M = 64$ OFDM symbols. The channel has $P = 5$ dominant paths with angles $\theta_p \in \{-30^\circ, -10^\circ, 5^\circ, 20^\circ, 45^\circ\}$, delays $\tau_p \in [0, 1\,\mu\text{s}]$, and Dopplers $\nu_p \in [-2\,\text{kHz}, 2\,\text{kHz}]$.

Compare scalar ISTA, LISTA, and HST for channel estimation at SNR = 15 dB.

Solution

Problem size

The angular-delay-Doppler grid has $N_\theta \times N_\tau \times N_\nu = 32 \times 128 \times 64 = 262{,}144$ voxels. The channel is $P = 5$-sparse in this domain, giving a sparsity ratio of $5/262{,}144 \approx 0.002\%$.
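The problem-size arithmetic can be reproduced directly; the variable names below are illustrative.

```python
n_theta, n_tau, n_nu = 32, 128, 64   # angular, delay, and Doppler grid sizes
n_paths = 5

n_voxels = n_theta * n_tau * n_nu
sparsity_ratio = n_paths / n_voxels

print(n_voxels)                          # 262144
print(f"{100 * sparsity_ratio:.4f} %")   # 0.0019 %
```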

Results

Method	NMSE (dB)	Layers	Params
ISTA (100 iter)	$-14.2$	100	0
LISTA (10 layers)	$-17.8$	10	N/A (dense)
OAMP + soft-thresh (10 iter)	$-19.1$	10	0
HST (10 layers)	$-22.5$	10	12K
Unrolled OAMP + HST	$-24.3$	10	67K

HST gains 3.4 dB over scalar OAMP by adapting thresholds to the hierarchical sparsity structure. Combining HST with the OAMP LMMSE step gives a further 1.8 dB improvement.
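The quoted gains follow from differences of the table entries. The sketch below recomputes them and converts each dB difference into a linear error ratio, assuming the usual convention that NMSE in dB is $10\log_{10}$ of the linear NMSE. The dictionary simply copies the table.

```python
nmse_db = {
    "ISTA (100 iter)": -14.2,
    "LISTA (10 layers)": -17.8,
    "OAMP + soft-thresh (10 iter)": -19.1,
    "HST (10 layers)": -22.5,
    "Unrolled OAMP + HST": -24.3,
}

def gain_db(baseline, method):
    """NMSE improvement of `method` over `baseline` in dB (positive is better)."""
    return nmse_db[baseline] - nmse_db[method]

def db_to_ratio(db):
    """Convert a dB difference to a linear power ratio."""
    return 10.0 ** (db / 10.0)

g_hst = gain_db("OAMP + soft-thresh (10 iter)", "HST (10 layers)")
g_oamp_hst = gain_db("HST (10 layers)", "Unrolled OAMP + HST")
print(round(g_hst, 1), round(db_to_ratio(g_hst), 2))
print(round(g_oamp_hst, 1), round(db_to_ratio(g_oamp_hst), 2))
```

A 3.4 dB gain corresponds to a normalised squared error that is roughly 2.2 times smaller, and the further 1.8 dB to a factor of about 1.5.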

Threshold map analysis

The learned threshold maps show that:

Angular groups with no path receive $\tau_{k,\ell} \gg 0$ (aggressive suppression).
Angular groups with a path receive $\tau_{k,\ell} \approx 0$ in the delay-Doppler cells near $(\tau_p, \nu_p)$ .
The thresholds decrease across layers (coarse $\to$ fine support refinement), consistent with the LISTA threshold schedule.

Hierarchical Threshold Map in Angular-Delay-Doppler Domain

Visualise the spatially-varying threshold map learned by the HST auxiliary network in the angular-delay-Doppler domain. The plot shows the threshold magnitude at each voxel, with dark regions (low threshold) indicating detected signal support and bright regions (high threshold) indicating noise suppression.

Adjust the layer index to see how the threshold map evolves from coarse angular-group selection (early layers) to fine delay-Doppler refinement (late layers).

Parameters (default values): layer index 5, number of paths 5, SNR 15 dB.

🎓 CommIT Contribution (2023)

Hierarchical Soft-Thresholding for OTFS Channel Estimation

S. Dehkordi, P. Jung, G. Caire — IEEE Transactions on Signal Processing

Dehkordi, Jung, and Caire developed hierarchical soft-thresholding specifically for the structured sparsity patterns arising in OTFS channel estimation. The key contributions are:

Angular-delay-Doppler group structure: Exploiting the tree-like sparsity where a few angular clusters contain most energy, each with a few delay-Doppler taps.
Learned threshold prediction: An auxiliary CNN predicts per-voxel thresholds conditioned on the current estimate, replacing the scalar threshold of ISTA/LISTA.
Joint OFDM/OTFS sensing: The method applies to both OFDM (range-angle estimation) and OTFS (range-angle-Doppler estimation) by adjusting the group structure.
Convergence guarantees: HST preserves the proximal operator interpretation, ensuring that convergence results for weighted $\ell^1$ minimisation apply.

The approach achieves 3--5 dB NMSE improvement over scalar thresholding on vehicular OTFS channels at moderate SNR.

Keywords: hierarchical soft-thresholding, OTFS, structured sparsity, channel estimation, OFDM sensing

Why This Matters: OFDM/OTFS Sensing via Hierarchical Sparse Recovery

OFDM is the workhorse of modern wireless communication, OTFS is a candidate waveform for high-mobility links, and both also serve as sensing waveforms. The received signal from a radar-like scene can be modelled as $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$, where:

OFDM sensing: $\mathbf{A}$ has structure $\mathbf{A}_{\text{freq}} \otimes \mathbf{A}_{\text{array}}$, providing range-angle estimation.
OTFS sensing: $\mathbf{A}$ has structure $\mathbf{A}_{\text{delay}} \otimes \mathbf{A}_{\text{Doppler}} \otimes \mathbf{A}_{\text{angle}}$, adding velocity estimation.

Hierarchical soft-thresholding with learned threshold maps exploits the angular-delay(-Doppler) sparsity for state-of-the-art ISAC (Integrated Sensing and Communication) performance.
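One practical consequence of the Kronecker structure is that $\mathbf{A}\mathbf{c}$ can be computed factor by factor, without ever forming $\mathbf{A}$. The sketch below shows this for two factors using the identity $(\mathbf{A} \otimes \mathbf{B})\,\mathrm{vec}(\mathbf{X}) = \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}^{T})$ with row-major flattening. The small integer matrices are arbitrary test data, not sensing matrices from the text, and the explicit `kron` is included only to check the result.

```python
def matmul(A, B):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Explicit Kronecker product (for checking only; memory grows quickly)."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def kron_matvec(A, B, x):
    """Compute (A kron B) x without forming the Kronecker product.

    x is read as a row-major matrix X with cols(A) rows and cols(B) columns;
    the result is the row-major flattening of A X B^T.
    """
    n_b = len(B[0])
    X = [x[j * n_b:(j + 1) * n_b] for j in range(len(A[0]))]
    Y = matmul(matmul(A, X), transpose(B))
    return [v for row in Y for v in row]

A = [[1, 2], [3, 4]]          # 2 x 2 factor
B = [[0, 1, 2], [1, 0, 1]]    # 2 x 3 factor
x = [1, 2, 3, 4, 5, 6]

fast = kron_matvec(A, B, x)
slow = [sum(a * b for a, b in zip(row, x)) for row in kron(A, B)]
print(fast, slow)   # [42, 24, 92, 52] [42, 24, 92, 52]
```

The three-factor OTFS case follows by applying the same idea along each axis of a three-dimensional array. When a factor is a DFT matrix, its product can be replaced by an FFT.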

Example: Edge-Aware Thresholding for RF Image Reconstruction

Explain how hierarchical soft-thresholding produces edge-aware reconstruction in an RF image, and compare with scalar thresholding.

Solution

Scalar threshold limitation

With a single $\tau_k$, all pixels receive the same regularisation. A threshold that preserves weak scatterers ($\tau_k$ small) also lets noise through in smooth regions. A threshold that suppresses noise ($\tau_k$ large) also attenuates weak targets.

This is the fundamental sparsity-noise tradeoff of scalar soft-thresholding.

Spatially-varying thresholds

The auxiliary network learns to predict:

Large $\tau_{k,i}$ in smooth/background regions (aggressive noise suppression)
Small $\tau_{k,i}$ near edges and point targets (preserve fine details)
Moderate $\tau_{k,i}$ in textured regions

This adapts the regularisation strength to the local structure, easing the sparsity-noise tradeoff that a single scalar threshold imposes.
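A five-pixel toy example makes the tradeoff visible. The values, the target mask, and the hand-picked threshold map are all hypothetical. The map is what a well-trained auxiliary network would ideally output, not something learned here.

```python
def soft_threshold(r, tau):
    """Real-valued soft-thresholding."""
    if abs(r) <= tau:
        return 0.0
    return r - tau if r > 0 else r + tau

# Pixel 0: strong target, pixel 1: weak target, pixels 2-4: noise.
r = [1.00, 0.30, 0.25, -0.20, 0.05]

small = [soft_threshold(v, 0.10) for v in r]   # scalar threshold, too small
large = [soft_threshold(v, 0.30) for v in r]   # scalar threshold, too large

tau_map = [0.02, 0.02, 0.40, 0.40, 0.40]       # hypothetical per-pixel thresholds
adaptive = [soft_threshold(v, t) for v, t in zip(r, tau_map)]

print(small)      # noise pixels 2 and 3 survive
print(large)      # weak target at pixel 1 is removed
print(adaptive)   # both targets kept, all noise pixels zero
```

No single scalar works here because the weak target (0.30) and the largest noise sample (0.25) are too close in magnitude. The per-pixel map separates them by location instead of amplitude.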

Implementation

The auxiliary network $h_{\phi_k}$ takes $\mathbf{r}^{(k)}$ as input and outputs a threshold map of the same spatial dimensions. A 3-layer CNN with $[32, 32, 1]$ channels and softplus output suffices:

$h_{\phi_k}(\mathbf{r}) = \text{softplus}(\text{Conv}_3(\text{ReLU}(\text{Conv}_2(\text{ReLU}(\text{Conv}_1(\mathbf{r}))))))$ .
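The forward pass of such a network can be written out in plain Python. This is a sketch under several assumptions: it uses 1-D convolutions with kernel size 3 and zero padding (the text fixes only the channel widths), random untrained weights, and the magnitude $|r_i|$ as the single input channel so that complex inputs are handled. The helper names are hypothetical, and a real implementation would use a deep-learning framework with 2-D or 3-D convolutions.

```python
import math
import random

def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def relu(x):
    return max(x, 0.0)

def conv1d_same(x, weight, bias):
    """1-D convolution (cross-correlation) with zero padding, output length = input length.

    x:      list of input channels, each a list of length N
    weight: weight[out_ch][in_ch][k], odd kernel size
    bias:   bias[out_ch]
    """
    n = len(x[0])
    half = len(weight[0][0]) // 2
    out = []
    for w_out, b in zip(weight, bias):
        row = []
        for pos in range(n):
            acc = b
            for ch, w_in in zip(x, w_out):
                for k, w in enumerate(w_in):
                    idx = pos + k - half
                    if 0 <= idx < n:
                        acc += w * ch[idx]
            row.append(acc)
        out.append(row)
    return out

def init_layer(rng, c_out, c_in, ksize=3, scale=0.1):
    """Random (untrained) weights and zero biases for one layer."""
    weight = [[[rng.gauss(0.0, scale) for _ in range(ksize)]
               for _ in range(c_in)] for _ in range(c_out)]
    return weight, [0.0] * c_out

def threshold_net(r, layers):
    """softplus(Conv3(ReLU(Conv2(ReLU(Conv1(|r|)))))), one threshold per element."""
    h = [[abs(v) for v in r]]
    for i, (weight, bias) in enumerate(layers):
        h = conv1d_same(h, weight, bias)
        act = softplus if i == len(layers) - 1 else relu
        h = [[act(v) for v in row] for row in h]
    return h[0]

rng = random.Random(0)
layers = [init_layer(rng, 32, 1), init_layer(rng, 32, 32), init_layer(rng, 1, 32)]
r = [0.0] * 16
r[5] = 1.0                       # a single spike as test input
tau = threshold_net(r, layers)
print(len(tau), min(tau) > 0.0)  # 16 True
```

The softplus on the last layer is what guarantees a non-negative threshold map of the same length as the input, whatever the weights are.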

Common Mistake: Threshold Prediction Networks Can Overfit

Mistake:

Using a large auxiliary CNN for threshold prediction ( $h_{\phi_k}$ ) without regularisation, leading to overfitting on the training distribution and poor generalisation to unseen scenes.

Correction:

Keep the auxiliary CNN small (3--5 layers, 16--32 channels). Add $\ell^2$ weight decay and enforce a minimum threshold $\tau_{\min} > 0$ to prevent the network from turning off regularisation entirely. Validate on held-out scenes with different sparsity patterns.
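The two safeguards in the correction can be sketched as follows. How the floor is enforced is an assumption: here $\tau_{\min}$ is added after the softplus, which is one simple option. The numerical values of `tau_min` and `weight_decay` are placeholders to be tuned on validation data.

```python
import math

def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def floored_threshold(raw, tau_min):
    """Map a raw network output to a threshold that never drops below tau_min."""
    return tau_min + softplus(raw)

def l2_penalty(weights, weight_decay):
    """Weight-decay term weight_decay * sum(w^2), to be added to the training loss."""
    return weight_decay * sum(w * w for w in weights)

tau_min = 0.01   # hypothetical floor
tau_low = floored_threshold(-50.0, tau_min)    # network tries to switch thresholding off
penalty = l2_penalty([0.5, -1.0, 2.0], weight_decay=1e-4)
print(tau_low >= tau_min, penalty)
```

Even when the network outputs a strongly negative raw value, the resulting threshold stays at or above `tau_min`, so some regularisation always remains active.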

Quick Check

What advantage does hierarchical soft-thresholding have over replacing the proximal step with a generic CNN denoiser (ProxNet)?

It is faster to compute

It maintains the proximal operator interpretation and convergence guarantees

It uses fewer parameters

It produces sharper images in all cases

Correction:

It maintains the proximal operator interpretation and convergence guarantees

HST is the proximal operator of a (data-dependent) weighted $\ell^1$ norm, so convergence guarantees from convex optimisation apply. A generic CNN denoiser may not be a proximal operator of any function, so convergence is not guaranteed.

Quick Check

In the angular-delay-Doppler domain for OTFS sensing, why is hierarchical (group-sparse) regularisation better than element-wise $\ell^1$ ?

It is computationally cheaper

It exploits the tree structure where a few angles contain most energy

It avoids the need for a sensing matrix

It guarantees exact recovery with fewer measurements

Correction:

It exploits the tree structure where a few angles contain most energy

The channel has a few dominant angles, each with a few delay taps, each with a few Doppler components. Group-sparse regularisation can zero out entire angular groups that contain only noise, while $\ell^1$ treats each voxel independently and may fail to suppress weak noise patterns that are coherent within a group.

🔧 Engineering Note

Practical OFDM/OTFS Sensing with Hierarchical Recovery

For practical ISAC systems using OFDM/OTFS:

Pilot design: Allocate pilot subcarriers to minimise the mutual coherence of the sensing matrix in the angular-delay-Doppler domain, since lower coherence improves sparse-recovery guarantees.
Group definition: Define groups from the known array geometry (angular groups) and waveform parameters (delay resolution, Doppler resolution).
Online adaptation: The threshold prediction network can be fine-tuned online as the channel statistics change (e.g., urban $\to$ rural transition).
Latency constraint: HST inference takes $\sim 1$ ms for $N = 10^5$ on a mobile GPU, compatible with 5G NR slot timing.
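A latency budget of this kind can be checked against the slot duration, assuming the standard NR relation that a slot lasts $1/2^{\mu}$ ms for numerology $\mu$ (subcarrier spacing $15 \cdot 2^{\mu}$ kHz, normal cyclic prefix). The function names are illustrative and the 1 ms figure is the approximate inference time quoted above.

```python
def nr_slot_duration_ms(mu):
    """5G NR slot duration in ms for numerology mu (15 * 2**mu kHz spacing)."""
    return 1.0 / (2 ** mu)

def fits_in_slot(latency_ms, mu):
    """True if the given processing latency does not exceed one slot."""
    return latency_ms <= nr_slot_duration_ms(mu)

inference_ms = 1.0   # approximate HST inference time from the text
for mu in range(3):
    print(mu, nr_slot_duration_ms(mu), fits_in_slot(inference_ms, mu))
```

The loop reports, for each numerology, whether one inference pass fits inside a single slot.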

Practical Constraints

• 5G NR slot duration: 0.5--1 ms for numerology 0--1
• Pilot overhead: 5--15% of OFDM resources

Hierarchical Soft-Thresholding (HST)

A variant of soft-thresholding where the threshold is spatially varying, predicted by an auxiliary network. Equivalent to the proximal operator of a weighted $\ell^1$ norm, preserving convergence guarantees.

Related: LISTA (Learned ISTA)

Structured Sparsity

A signal model where nonzero entries are organised in groups, trees, or other patterns, rather than being uniformly distributed. Exploited by group LASSO and hierarchical thresholding.

Key Takeaway

Hierarchical soft-thresholding replaces scalar thresholds with spatially-varying maps predicted by an auxiliary CNN, enabling edge-aware and group-sparse regularisation. Applied to OTFS channel estimation, HST exploits the angular-delay-Doppler tree structure for 3--5 dB gains over scalar methods. HST preserves the proximal operator interpretation, so convergence guarantees carry over from weighted $\ell^1$ theory.
