Exercises
ex-ch22-01
Easy: Compute the support of the Marchenko–Pastur distribution for the given values of the aspect ratio $c = p/n$.
Just substitute into the support formula $\big[\sigma^2(1-\sqrt{c})^2,\ \sigma^2(1+\sqrt{c})^2\big]$.
Compute
With $\sigma^2 = 1$, the edges are $\lambda_{\pm} = (1 \pm \sqrt{c})^2$, so for each given value of $c$ the support is $\big[(1-\sqrt{c})^2,\ (1+\sqrt{c})^2\big]$; substituting the specific values of $c$ gives the numerical edges (a numerical check appears below).
Observation
As $c \to 1$, the left edge $(1-\sqrt{c})^2$ approaches zero, signalling the onset of rank deficiency.
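As a quick numerical sanity check (not part of the original exercise), the sketch below compares the theoretical edges $(1\mp\sqrt{c})^2$ with the extreme eigenvalues of a simulated sample covariance matrix; the aspect ratios and matrix size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for c in [0.1, 0.25, 0.5, 1.0]:                        # illustrative aspect ratios
    p = int(round(c * n))
    left, right = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    X = rng.standard_normal((n, p))
    eig = np.linalg.eigvalsh(X.T @ X / n)              # eigenvalues of the sample covariance
    print(f"c={c:4.2f}  MP support=[{left:.3f}, {right:.3f}]  "
          f"empirical=[{eig.min():.3f}, {eig.max():.3f}]")
```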
ex-ch22-02
Easy: Derive the ridge estimator by differentiating the ridge objective and setting the gradient to zero.
The objective is $J(\boldsymbol{\beta}) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|^2$.
Use $\nabla_{\boldsymbol{\beta}}\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 = -2\mathbf{X}^\top(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$ and $\nabla_{\boldsymbol{\beta}}\|\boldsymbol{\beta}\|^2 = 2\boldsymbol{\beta}$.
Differentiate
Let $J(\boldsymbol{\beta}) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|^2$. Then $\nabla J(\boldsymbol{\beta}) = -2\mathbf{X}^\top(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) + 2\lambda\boldsymbol{\beta}$.
Set to zero
$\mathbf{X}^\top\mathbf{X}\boldsymbol{\beta} + \lambda\boldsymbol{\beta} = \mathbf{X}^\top\mathbf{y}$, hence $\hat{\boldsymbol{\beta}}_{\text{ridge}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$. The Hessian $2(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I}) \succ \mathbf{0}$ confirms strict convexity.
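A minimal numerical sketch of the result, with illustrative sizes and penalty: it solves the ridge normal equations and checks that the gradient of the objective vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 10, 2.0                                # illustrative sizes and penalty
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # ridge estimator
grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta            # gradient of the ridge objective
print(np.linalg.norm(grad))                                  # ~1e-13: stationarity holds
```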
ex-ch22-03
Easy: Compute the soft-thresholding operator $S_\lambda(x)$ for the given values of $x$ and $\lambda$.
$S_\lambda(x) = \operatorname{sign}(x)\,\max(|x| - \lambda,\ 0)$.
Compute
For each given pair $(x, \lambda)$: if $|x| \le \lambda$ then $S_\lambda(x) = 0$; otherwise the magnitude is reduced by $\lambda$ and the sign of $x$ is kept, i.e. $S_\lambda(x) = x - \lambda\,\operatorname{sign}(x)$. (A short implementation follows.)
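A short implementation of the operator, with illustrative $(x, \lambda)$ pairs (the specific values used in the exercise may differ):

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * max(|x| - lambda, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

for x, lam in [(3.0, 1.0), (0.5, 1.0), (-2.0, 1.0), (-0.3, 0.5), (4.0, 2.5)]:
    print(f"S_{lam}({x:+.1f}) = {soft_threshold(x, lam):+.2f}")
```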
ex-ch22-04
Easy: For the given dimension $d$ and noise variance $\sigma^2$, compute the shrinkage factor $1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}$ of the James–Stein estimator when $\|\mathbf{y}\|^2 = 10$. What happens when $\|\mathbf{y}\|^2 = 2$?
Apply the formula directly; note that when $\|\mathbf{y}\|^2 < (d-2)\sigma^2$ the factor goes negative.
Case $\|\mathbf{y}\|^2=10$
Factor $1 - \frac{(d-2)\sigma^2}{10}$, which is positive for the given $d$ and $\sigma^2$. Each coordinate of $\mathbf{y}$ is shrunk by this factor.
Case $\|\mathbf{y}\|^2=2$
Factor $1 - \frac{(d-2)\sigma^2}{2}$, which is negative for the given $d$ and $\sigma^2$. The classical JS estimator would flip the sign of $\mathbf{y}$. The positive-part variant clamps at zero instead, yielding $\hat{\boldsymbol{\theta}} = \mathbf{0}$.
ex-ch22-05
Easy: State what "inadmissible" means in two sentences, without using formulas.
Focus on the comparison to another estimator.
Answer
An estimator is inadmissible if another estimator has risk no larger for every parameter value and strictly smaller for at least one. Inadmissibility says there is a better alternative, but does not construct it.
ex-ch22-06
Medium: Show that the LMMSE estimator for the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{n}$ with $\boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, \sigma_\beta^2\mathbf{I})$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2\mathbf{I})$ coincides with the ridge estimator at $\lambda = \sigma_n^2/\sigma_\beta^2$.
Write the LMMSE estimator as $\hat{\boldsymbol{\beta}} = \mathbf{C}_{\beta y}\mathbf{C}_{yy}^{-1}\mathbf{y}$.
Use the push-through identity $\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top$.
Compute covariances
$\mathbf{C}_{\beta y} = \sigma_\beta^2\mathbf{X}^\top$, $\quad \mathbf{C}_{yy} = \sigma_\beta^2\mathbf{X}\mathbf{X}^\top + \sigma_n^2\mathbf{I}$.
LMMSE
$\hat{\boldsymbol{\beta}} = \sigma_\beta^2\mathbf{X}^\top\big(\sigma_\beta^2\mathbf{X}\mathbf{X}^\top + \sigma_n^2\mathbf{I}\big)^{-1}\mathbf{y} = \mathbf{X}^\top\big(\mathbf{X}\mathbf{X}^\top + \tfrac{\sigma_n^2}{\sigma_\beta^2}\mathbf{I}\big)^{-1}\mathbf{y}$.
Push-through
Apply the push-through identity $\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top$ with $\lambda = \sigma_n^2/\sigma_\beta^2$. Hence $\hat{\boldsymbol{\beta}}_{\text{LMMSE}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} = \hat{\boldsymbol{\beta}}_{\text{ridge}}$.
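The identity is easy to confirm numerically. The sketch below uses illustrative dimensions and variances and checks that the LMMSE and ridge formulas agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 8
sigma_b2, sigma_n2 = 2.0, 0.5                          # illustrative prior and noise variances
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

lmmse = sigma_b2 * X.T @ np.linalg.solve(sigma_b2 * X @ X.T + sigma_n2 * np.eye(n), y)
lam = sigma_n2 / sigma_b2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.max(np.abs(lmmse - ridge)))                   # agreement to machine precision
```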
ex-ch22-07
Medium: Derive the KKT conditions for the LASSO and interpret them coordinate by coordinate.
The LASSO objective $\tfrac12\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1$ is convex but non-smooth.
The sub-differential of $|\beta_j|$ is $\{\operatorname{sign}(\beta_j)\}$ if $\beta_j \neq 0$ and $[-1, 1]$ if $\beta_j = 0$.
Stationarity
$\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \lambda\,\mathbf{s}$ for some $\mathbf{s} \in \partial\|\hat{\boldsymbol{\beta}}\|_1$. Componentwise: $\mathbf{x}_j^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \lambda\operatorname{sign}(\hat{\beta}_j)$ when $\hat{\beta}_j \neq 0$, and $|\mathbf{x}_j^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})| \le \lambda$ when $\hat{\beta}_j = 0$.
Interpretation
A coordinate is active ($\hat{\beta}_j \neq 0$) iff the residual correlates with the $j$-th column at level exactly $\lambda$; inactive coordinates have residual correlation at most $\lambda$ in absolute value. This is the "equicorrelation" condition that underlies the LARS algorithm.
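The KKT conditions can be checked numerically on a LASSO solution. The sketch below uses a simple proximal-gradient (ISTA) solver written for this purpose (not taken from the chapter) and illustrative problem sizes; after convergence the active-set correlations sit at $\pm\lambda$ and the inactive ones stay at or below $\lambda$ in magnitude.

```python
import numpy as np

def ista(X, y, lam, n_iter=5000):
    """Proximal-gradient solver for 0.5*||y - X b||^2 + lam*||b||_1."""
    L = np.linalg.norm(X, 2) ** 2                      # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / L                  # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold (prox) step
    return b

rng = np.random.default_rng(0)
n, p, lam = 100, 20, 2.0                               # illustrative problem size and penalty
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
b = ista(X, y, lam)

corr = X.T @ (y - X @ b)                               # residual correlations
active = np.abs(b) > 1e-8
print(np.max(np.abs(corr[active] - lam * np.sign(b[active]))))   # ~0 on the active set
print(np.max(np.abs(corr[~active])) <= lam + 1e-6)                # True off the active set
```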
ex-ch22-08
Medium: Compute the divergence $\nabla \cdot \mathbf{g}$ of $\mathbf{g}(\mathbf{y}) = \mathbf{y}/\|\mathbf{y}\|^2$ and verify that it equals $(d-2)/\|\mathbf{y}\|^2$ for $\mathbf{y} \in \mathbb{R}^d$.
Compute $\partial g_i/\partial y_i$ using the quotient rule.
Single partial
$\dfrac{\partial}{\partial y_i}\,\dfrac{y_i}{\|\mathbf{y}\|^2} = \dfrac{1}{\|\mathbf{y}\|^2} - \dfrac{2y_i^2}{\|\mathbf{y}\|^4}$.
Sum
Summing over $i$: $\nabla \cdot \mathbf{g} = \dfrac{d}{\|\mathbf{y}\|^2} - \dfrac{2}{\|\mathbf{y}\|^2} = \dfrac{d-2}{\|\mathbf{y}\|^2}$.
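A finite-difference check of the same identity, with an illustrative dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 7
y = rng.standard_normal(d)
g = lambda v: v / np.dot(v, v)                         # g(y) = y / ||y||^2

eps, div = 1e-6, 0.0
for i in range(d):
    e = np.zeros(d); e[i] = eps
    div += (g(y + e)[i] - g(y - e)[i]) / (2 * eps)     # central difference of dg_i/dy_i

print(div, (d - 2) / np.dot(y, y))                     # the two values agree
```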
ex-ch22-09
Medium: Given the ordinary James–Stein estimator $\hat{\boldsymbol{\theta}}_{\mathrm{JS}} = \big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}\big)\mathbf{y}$ and the positive-part variant $\hat{\boldsymbol{\theta}}_{\mathrm{JS}+} = \big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}\big)_{+}\mathbf{y}$, argue (informally) why the latter should dominate the former.
When $\|\mathbf{y}\|^2 < (d-2)\sigma^2$ the JS shrinkage factor is negative, and the estimator flips the sign of $\mathbf{y}$.
Sign flip is wasteful
When $\|\mathbf{y}\|^2 < (d-2)\sigma^2$, the JS shrinkage factor is negative, producing an estimator that points opposite to the observation. Setting this factor to zero (positive-part) always improves risk, because the zero estimate has squared error $\|\boldsymbol{\theta}\|^2$, which is typically smaller than the squared error of a sign-flipped estimate $-a\,\mathbf{y}$ for positive $a$.
Dominance
A careful Stein-identity calculation (see Baranchik 1964) formalises this intuition. $\hat{\boldsymbol{\theta}}_{\mathrm{JS}+}$ dominates $\hat{\boldsymbol{\theta}}_{\mathrm{JS}}$, confirming that JS itself is inadmissible (though minimax).
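A Monte Carlo comparison makes the dominance visible. The dimension, true mean, and replication count below are illustrative choices; $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 10, 200_000
theta = np.full(d, 0.5)                                # illustrative true mean
y = theta + rng.standard_normal((reps, d))             # sigma^2 = 1

s = 1 - (d - 2) / np.sum(y ** 2, axis=1, keepdims=True)   # JS shrinkage factor
for name, est in [("MLE", y), ("JS", s * y), ("JS+", np.maximum(s, 0.0) * y)]:
    print(name, np.mean(np.sum((est - theta) ** 2, axis=1)))
# Empirical risks typically order as JS+ < JS < MLE = d.
```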
ex-ch22-10
Medium: Prove that the Bayes estimator under squared-error loss is the posterior mean $\hat{\theta}(\mathbf{y}) = \mathbb{E}[\theta \mid \mathbf{y}]$.
Minimise $\mathbb{E}\big[(\theta - c)^2 \mid \mathbf{y}\big]$ over the constant $c$.
Pointwise minimisation
The Bayes risk is $\mathbb{E}\big[\,\mathbb{E}[(\theta - \hat{\theta}(\mathbf{y}))^2 \mid \mathbf{y}]\,\big]$. It suffices to minimise the inner conditional expectation pointwise in $\mathbf{y}$.
Quadratic in $c$
Differentiating in $c$: $\frac{d}{dc}\,\mathbb{E}[(\theta - c)^2 \mid \mathbf{y}] = 2c - 2\,\mathbb{E}[\theta \mid \mathbf{y}]$. Setting to zero yields $c^* = \mathbb{E}[\theta \mid \mathbf{y}]$.
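For a concrete check, the sketch below samples a Gaussian posterior (an illustrative choice) and confirms that, over a grid of constants $c$, the conditional squared error is minimised at the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)
tau2, sigma2, y_obs = 4.0, 1.0, 2.5                    # illustrative prior variance, noise, datum
post_mean = tau2 / (tau2 + sigma2) * y_obs
post_var = tau2 * sigma2 / (tau2 + sigma2)
theta = post_mean + np.sqrt(post_var) * rng.standard_normal(1_000_000)   # posterior samples

cs = np.linspace(0.0, 3.0, 61)
losses = [np.mean((theta - c) ** 2) for c in cs]
print("argmin over the grid:", cs[int(np.argmin(losses))], "  posterior mean:", post_mean)
```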
ex-ch22-11
Medium: Verify the minimax = maximin duality for the scalar Gaussian mean problem $y \sim \mathcal{N}(\theta, \sigma^2)$ with $\theta \in \mathbb{R}$. Specifically, compute both sides and check equality.
The MLE $\hat{\theta} = y$ has constant risk $\sigma^2$.
A Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2)$ gives Bayes risk $\frac{\tau^2\sigma^2}{\tau^2 + \sigma^2}$; let $\tau^2 \to \infty$.
Minimax side
$\inf_{\hat{\theta}}\sup_{\theta} R(\theta, \hat{\theta}) \le \sup_{\theta} R(\theta, y) = \sigma^2$. A lower bound of $\sigma^2$ follows from taking the prior to be a Gaussian of growing variance (next step).
Maximin side
Under the prior $\theta \sim \mathcal{N}(0, \tau^2)$, the Bayes estimator is $\hat{\theta}_\tau = \frac{\tau^2}{\tau^2 + \sigma^2}\,y$ with Bayes risk $\frac{\tau^2\sigma^2}{\tau^2 + \sigma^2} \to \sigma^2$ as $\tau^2 \to \infty$. Hence $\sup_{\pi}\inf_{\hat{\theta}} r(\pi, \hat{\theta}) \ge \sigma^2$.
Conclude
Both sides equal $\sigma^2$, so duality holds and the MLE is minimax.
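A few lines confirm the maximin side numerically: the Bayes risk of the conjugate-prior estimator approaches $\sigma^2$ as the prior variance grows.

```python
sigma2 = 1.0
for tau2 in [1.0, 10.0, 100.0, 1e4]:
    print(tau2, tau2 * sigma2 / (tau2 + sigma2))       # Bayes risk -> sigma^2 = 1
```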
ex-ch22-12
Medium: Derive the optimal linear shrinkage for the sample covariance matrix, $\hat{\boldsymbol{\Sigma}}_\rho = (1-\rho)\mathbf{S} + \rho\,\mathbf{I}$. Find the $\rho \in [0, 1]$ that minimises $\mathbb{E}\big\|\hat{\boldsymbol{\Sigma}}_\rho - \boldsymbol{\Sigma}\big\|_F^2$.
Expand the squared-Frobenius error and minimise over .
Expand
$\mathbb{E}\|\hat{\boldsymbol{\Sigma}}_\rho - \boldsymbol{\Sigma}\|_F^2 = (1-\rho)^2\,\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2 + \rho^2\,\|\mathbf{I} - \boldsymbol{\Sigma}\|_F^2$. Since $\mathbf{S}$ is unbiased, the cross term vanishes in expectation.
Optimise
$\rho^* = \dfrac{\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2}{\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2 + \|\mathbf{I} - \boldsymbol{\Sigma}\|_F^2}$. Interpret: shrink more toward $\mathbf{I}$ when the sample covariance is noisy and the true covariance is close to the identity.
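A Monte Carlo sketch of the trade-off, with an illustrative true covariance and sample size; $\rho^*$ is computed from the oracle quantities above, so this is a demonstration rather than a practical estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 20, 40, 500                               # illustrative dimensions
Sigma = np.diag(np.linspace(0.5, 2.0, p))              # true covariance (oracle knowledge)

Ss = []
for _ in range(reps):
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Ss.append(Z.T @ Z / n)                             # sample covariance (known zero mean)

a = np.mean([np.linalg.norm(S - Sigma, "fro") ** 2 for S in Ss])   # E||S - Sigma||_F^2
b = np.linalg.norm(np.eye(p) - Sigma, "fro") ** 2                  # ||I - Sigma||_F^2
rho = a / (a + b)

risk_shrunk = np.mean([np.linalg.norm((1 - rho) * S + rho * np.eye(p) - Sigma, "fro") ** 2
                       for S in Ss])
print(f"rho* = {rho:.2f}   risk(S) = {a:.2f}   risk(shrunk) = {risk_shrunk:.2f}")
```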
ex-ch22-13
Hard: Using the Marchenko–Pastur law, show that $\frac{1}{p}\operatorname{tr}\big[(\tfrac{1}{n}\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\big]$ converges to the Stieltjes transform $m(-\lambda)$, and derive a closed-form expression for $m(-\lambda)$ as a function of $\lambda$ and $c = p/n$.
The Stieltjes transform of the MP law satisfies a quadratic equation.
Specifically, $m(z)$ solves $c\,z\,m^2(z) - (1 - c - z)\,m(z) + 1 = 0$.
Quadratic equation
From the MP density one derives the Stieltjes-transform identity $c\,z\,m^2(z) - (1 - c - z)\,m(z) + 1 = 0$. Substituting $z = -\lambda$: $c\lambda\,m^2(-\lambda) + (1 - c + \lambda)\,m(-\lambda) - 1 = 0$.
Solve
Solving the quadratic and taking the branch that is positive on the negative real axis: $m(-\lambda) = \dfrac{-(1 - c + \lambda) + \sqrt{(1 - c + \lambda)^2 + 4c\lambda}}{2c\lambda}$.
Consistency check
At $\lambda \to 0$ with $c < 1$, $m(-\lambda) \to \frac{1}{1-c}$, matching the OLS risk. For $c \ge 1$ the limit is finite and positive for every $\lambda > 0$; ridge remains well-defined.
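A quick random-matrix check of the closed form, with illustrative $n$, $p$, and $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 2000, 1000, 0.3                            # illustrative sizes; c = p/n = 0.5
c = p / n
X = rng.standard_normal((n, p))
eig = np.linalg.eigvalsh(X.T @ X / n)
empirical = np.mean(1.0 / (eig + lam))                 # (1/p) tr[(X^T X / n + lam I)^{-1}]

m = (-(1 - c + lam) + np.sqrt((1 - c + lam) ** 2 + 4 * c * lam)) / (2 * c * lam)
print(empirical, m)                                    # agreement to a few decimal places
```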
ex-ch22-14
Hard: Prove the upper bound of the theorem "Minimax Rate for Sparse Estimation": best-subset selection achieves MSE of order $\sigma^2 k \log(ep/k)/n$.
Use a union bound over supports.
For each fixed support, OLS risk is $\sigma^2 k/n$; the union bound contributes a multiplicative factor of order $\log(ep/k)$.
Oracle risk per support
For the true support $S^*$, restricted OLS has risk $\sigma^2 k/n$. The estimator selects the support minimising the residual sum of squares.
Union bound
For each candidate support, Gaussian concentration bounds the probability that the noise alone makes it outperform the true support. Summing over the $\binom{p}{k}$ supports inflates the risk by a multiplicative factor of order $\log(ep/k)$.
Combine
$\mathbb{E}\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|^2 \lesssim \dfrac{\sigma^2 k \log(ep/k)}{n}$. The constant can be sharpened via Slepian-type Gaussian comparison arguments.
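A tiny brute-force demonstration (illustrative sizes, a single random draw, and hypothetical parameter choices) comparing best-subset selection against full OLS:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p, k, sigma = 50, 12, 2, 1.0                        # illustrative problem sizes
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:k] = 3.0                     # k-sparse ground truth
y = X @ beta + sigma * rng.standard_normal(n)

best_rss, best_beta = np.inf, None
for S in combinations(range(p), k):                    # all C(p, k) candidate supports
    idx = list(S)
    bS, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    rss = np.sum((y - X[:, idx] @ bS) ** 2)
    if rss < best_rss:
        best_rss = rss
        best_beta = np.zeros(p); best_beta[idx] = bS

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("best-subset squared error:", np.sum((best_beta - beta) ** 2))
print("full-OLS squared error:   ", np.sum((ols - beta) ** 2))
```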
ex-ch22-15
Hard: Derive the risk of the James–Stein estimator when shrinking toward an arbitrary fixed vector $\mathbf{v}$ rather than zero. Conclude that the JS phenomenon is independent of the anchor.
Apply the Stein-lemma argument to the translated estimator $\hat{\boldsymbol{\theta}} = \mathbf{v} + \Big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y} - \mathbf{v}\|^2}\Big)(\mathbf{y} - \mathbf{v})$.
Translation
Let $\mathbf{z} = \mathbf{y} - \mathbf{v}$ and $\boldsymbol{\eta} = \boldsymbol{\theta} - \mathbf{v}$. Then $\mathbf{z} \sim \mathcal{N}(\boldsymbol{\eta}, \sigma^2\mathbf{I})$ and applying classical JS to $\mathbf{z}$ gives an estimator of $\boldsymbol{\eta}$ that dominates $\mathbf{z}$ for $d \ge 3$.
Translate back
Adding $\mathbf{v}$ back, the anchored-JS estimator has risk $d\sigma^2 - (d-2)^2\sigma^4\,\mathbb{E}\big[\|\mathbf{y} - \mathbf{v}\|^{-2}\big]$. Dominance over the MLE holds for every $\boldsymbol{\theta}$.
Consequence
The anchor is a free parameter. Shrinking toward the grand mean (as Efron–Morris did) or toward a physics-informed prior (e.g., zero steering in beamforming) both improve upon the MLE.
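The translated risk formula can be verified by simulation. The dimension, true mean, and anchor below are illustrative; $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 8, 400_000
theta = np.linspace(-1.0, 1.0, d)                      # illustrative true mean
v = np.full(d, 0.5)                                    # arbitrary fixed anchor
y = theta + rng.standard_normal((reps, d))             # sigma^2 = 1

r2 = np.sum((y - v) ** 2, axis=1, keepdims=True)
est = v + (1 - (d - 2) / r2) * (y - v)                 # JS shrunk toward the anchor v
emp_risk = np.mean(np.sum((est - theta) ** 2, axis=1))
formula = d - (d - 2) ** 2 * np.mean(1.0 / r2)         # d - (d-2)^2 E[1/||y - v||^2]
print(emp_risk, formula, "MLE risk:", d)
```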
ex-ch22-16
Hard: Show that the LASSO with penalty $\lambda\|\boldsymbol{\beta}\|_1$ satisfies the minimax rate for sparse recovery. (Upper bound only.)
Use the 'basic inequality' $\tfrac12\|\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)\|^2 \le \boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*) + \lambda\big(\|\boldsymbol{\beta}^*\|_1 - \|\hat{\boldsymbol{\beta}}\|_1\big)$.
Then bound the linear term via Hölder and use $\|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty \lesssim \sigma\sqrt{n\log p}$ w.h.p.
Basic inequality
From optimality of $\hat{\boldsymbol{\beta}}$ and a convex-analysis expansion, $\tfrac12\|\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)\|^2 \le \boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*) + \lambda\big(\|\boldsymbol{\beta}^*\|_1 - \|\hat{\boldsymbol{\beta}}\|_1\big)$.
Dual-norm bound
Bound the noise term: $|\boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)| \le \|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty\,\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|_1$. For i.i.d. Gaussian noise and columns normalised so that $\|\mathbf{x}_j\|^2 = n$, with high probability $\|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty \lesssim \sigma\sqrt{n\log p}$; choose $\lambda$ of this order.
Restricted eigenvalue + rate
Under a restricted-eigenvalue condition on $\mathbf{X}$ (satisfied w.h.p. for i.i.d. Gaussian matrices with $n \gtrsim k\log p$), one derives $\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|^2 \lesssim \frac{\sigma^2 k\log p}{n}$, matching the minimax rate up to constants.
ex-ch22-17
Challenge: Derive the fixed-point equation for the asymptotic MSE of the LASSO in the proportional regime $n/p \to \delta \in (0, \infty)$ (Bayati–Montanari state evolution). Reproduce the statement: there exist $\tau_*$ and $\alpha_*$ such that the LASSO MSE equals $\delta(\tau_*^2 - \sigma^2)$ at the fixed point.
State evolution iterates $\tau_{t+1}^2 = \sigma^2 + \tfrac{1}{\delta}\,\mathbb{E}\big[\big(\eta(X_0 + \tau_t Z;\ \alpha\tau_t) - X_0\big)^2\big]$, where $X_0$ is drawn from the empirical distribution of the true signal and $Z \sim \mathcal{N}(0, 1)$ independently.
At the fixed point, $\lambda$ and $\alpha$ are linked by the LASSO threshold calibration.
State evolution recursion
AMP with the soft-threshold denoiser $\eta(\,\cdot\,;\theta)$ has scalar state evolution $\tau_{t+1}^2 = \sigma^2 + \tfrac{1}{\delta}\,\mathbb{E}\big[\big(\eta(X_0 + \tau_t Z;\ \alpha\tau_t) - X_0\big)^2\big]$, with $X_0$ drawn from the empirical distribution of the true signal and $Z \sim \mathcal{N}(0, 1)$.
Calibration to LASSO
The connection to LASSO is via the "Onsager" correction: the LASSO solution coincides with the AMP fixed point when $\lambda$ and $\alpha$ satisfy the scalar calibration $\lambda = \alpha\tau_*\big(1 - \tfrac{1}{\delta}\,\mathbb{E}\big[\eta'(X_0 + \tau_* Z;\ \alpha\tau_*)\big]\big)$.
Fixed point MSE
At the joint fixed point $(\tau_*, \alpha_*)$, the LASSO per-coordinate MSE is $\mathbb{E}\big[\big(\eta(X_0 + \tau_* Z;\ \alpha_*\tau_*) - X_0\big)^2\big]$, which equals $\delta(\tau_*^2 - \sigma^2)$ (for $\sigma^2 > 0$) or $\delta\tau_*^2$ (for $\sigma^2 = 0$). The formal proof is Bayati–Montanari (2012).
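The sketch below iterates the state-evolution recursion by Monte Carlo for a Bernoulli-Gaussian signal; the sparsity, undersampling ratio $\delta$, noise level, and threshold parameter $\alpha$ are illustrative choices, and the $\lambda$-$\alpha$ calibration step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, sigma, alpha, eps = 0.5, 0.2, 1.5, 0.1          # n/p, noise std, threshold, sparsity
N = 500_000
X0 = (rng.random(N) < eps) * rng.standard_normal(N)    # Bernoulli-Gaussian signal samples
Z = rng.standard_normal(N)

def eta(x, t):                                         # soft-threshold denoiser
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

tau2 = 1.0
for _ in range(50):                                    # tau_{t+1}^2 = sigma^2 + E[(eta - X0)^2]/delta
    tau = np.sqrt(tau2)
    mse = np.mean((eta(X0 + tau * Z, alpha * tau) - X0) ** 2)
    tau2 = sigma ** 2 + mse / delta

print("tau_*^2 =", tau2, "  per-coordinate MSE =", delta * (tau2 - sigma ** 2))
```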
ex-ch22-18
Challenge: For a massive-MIMO uplink with $M$ antennas, $K$ single-antenna users, and $\tau_p$ pilot symbols, design a shrinkage-based channel estimator that minimises the worst-case MSE over users with bounded channel norm $\|\mathbf{h}_k\|^2 \le P$. Compare with LMMSE assuming a mismatched prior.
Minimax over a ball: the problem has known structure (Pinsker-type).
Derive the optimal shrinkage factor as a function of $M$, the pilot length $\tau_p$, the effective noise variance, and the norm bound $P$.
Problem setup
For each user $k$, the pilot estimate after matched filtering is $\tilde{\mathbf{y}}_k = \mathbf{h}_k + \mathbf{e}_k$ with $\mathbf{e}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_e^2\mathbf{I}_M)$, where $\sigma_e^2$ is the effective noise variance after pilot averaging. Estimate $\mathbf{h}_k$ under the constraint $\|\mathbf{h}_k\|^2 \le P$.
Minimax shrinkage
The minimax estimator on a ball is a linear shrinker $\hat{\mathbf{h}}_k = c^*\,\tilde{\mathbf{y}}_k$ with $c^* = \dfrac{P}{P + M\sigma_e^2}$. The worst-case MSE is $\dfrac{P\,M\sigma_e^2}{P + M\sigma_e^2}$.
Comparison
LMMSE with the true channel power $P$ (isotropic prior) uses $\hat{\mathbf{h}}_k = \frac{P}{P + M\sigma_e^2}\,\tilde{\mathbf{y}}_k$, which is optimal only for that specific power. If the assumed norm is mismatched (e.g., the prior was trained on a different environment), LMMSE risk can exceed the minimax risk by a large multiplicative factor. The minimax estimator is safer; LMMSE is better when the prior is accurate. In production, one often uses empirical Bayes: estimate the channel power online from recent pilots.
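A Monte Carlo sketch of the comparison, using real-valued noise for simplicity; $M$, $P$, $\sigma_e^2$, and the degree of prior mismatch are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, sigma_e2, reps = 64, 1.0, 0.05, 20_000           # antennas, norm bound, noise, trials
h = rng.standard_normal(M)
h *= np.sqrt(P) / np.linalg.norm(h)                    # channel placed on the worst-case sphere
Y = h + np.sqrt(sigma_e2) * rng.standard_normal((reps, M))   # noisy pilot estimates

c_minimax = P / (P + M * sigma_e2)
P_assumed = 10.0 * P                                   # hypothetical mismatched prior power
c_lmmse = P_assumed / (P_assumed + M * sigma_e2)

for name, c in [("minimax", c_minimax), ("mismatched LMMSE", c_lmmse)]:
    mse = np.mean(np.sum((c * Y - h) ** 2, axis=1))
    print(f"{name:17s} shrinkage={c:.3f}  MSE={mse:.3f}")
# Worst-case bound for the minimax shrinker: P*M*sigma_e2 / (P + M*sigma_e2).
```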