Exercises
ex-ch20-01
Easy. Compute the cumulant generating function and rate function for $X \sim \mathrm{Exp}(\theta)$, i.e., for the density $f(x) = \theta e^{-\theta x}$, $x \ge 0$.
$\mathbb{E}[e^{\lambda X}] = \frac{\theta}{\theta - \lambda}$ for $\lambda < \theta$.
Take the log and compute the Legendre transform.
CGF
$\Lambda(\lambda) = -\log(1 - \lambda/\theta)$ for $\lambda < \theta$.
Rate function
$I(x) = \sup_{\lambda < \theta}[\lambda x - \Lambda(\lambda)]$. Setting the derivative to zero: $x = \frac{1}{\theta - \lambda}$, so $\lambda^* = \theta - 1/x$. Substituting: $I(x) = \theta x - 1 - \log(\theta x)$ for $x > 0$. Note $I(1/\theta) = 0$, confirming the rate function vanishes at the mean.
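A quick numerical cross-check of this Legendre transform; a minimal sketch assuming the exponential model above, with the arbitrary illustration value $\theta = 2$ (not part of the exercise):

```python
import numpy as np
from scipy.optimize import minimize_scalar

theta = 2.0  # assumed rate parameter, chosen only for illustration

def cgf(lam):
    # CGF of Exp(theta), valid for lam < theta
    return -np.log(1.0 - lam / theta)

def rate_numeric(x):
    # I(x) = sup_{lam < theta} [lam * x - CGF(lam)], computed numerically
    res = minimize_scalar(lambda lam: -(lam * x - cgf(lam)),
                          bounds=(-50.0, theta - 1e-9), method="bounded")
    return -res.fun

def rate_closed_form(x):
    return theta * x - 1.0 - np.log(theta * x)

for x in [0.1, 0.5, 1.0, 3.0]:
    print(x, rate_numeric(x), rate_closed_form(x))
# Both columns agree, and both vanish at the mean x = 1/theta = 0.5.
```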
ex-ch20-02
Easy. Compute the rate function for $X \sim \mathcal{N}(\mu, \sigma^2)$.
$\Lambda(\lambda) = \mu\lambda + \frac{\sigma^2\lambda^2}{2}$.
CGF
$\Lambda(\lambda) = \mu\lambda + \frac{\sigma^2\lambda^2}{2}$, defined for all $\lambda \in \mathbb{R}$.
Legendre transform
$I(x) = \sup_\lambda[\lambda x - \mu\lambda - \sigma^2\lambda^2/2]$. Setting derivative to zero: $x - \mu - \sigma^2\lambda = 0$, so $\lambda^* = \frac{x - \mu}{\sigma^2}$. $I(x) = \frac{(x - \mu)^2}{2\sigma^2}$ for all $x \in \mathbb{R}$, with $I(\mu) = 0$.
ex-ch20-03
Medium. Let $X_1, \dots, X_n$ be i.i.d. $\mathrm{Bernoulli}(p)$. Using Cramér's theorem, compute the exact rate function and determine the exponential rate of $P(\bar{X}_n \ge a)$ for a threshold $a > p$.
The rate function for Bernoulli($p$) is $I(a) = a\log\frac{a}{p} + (1 - a)\log\frac{1 - a}{1 - p}$.
Apply the Bernoulli rate function
With $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$, Cramér's theorem gives $P(\bar{X}_n \ge a) \doteq e^{-nI(a)}$ with $I(a) = a\log\frac{a}{p} + (1 - a)\log\frac{1 - a}{1 - p}$ for $a > p$.
Compute
Evaluating $I(a)$ at the given values of $p$ and $a$ (using the natural log, so the answer is in nats) gives the exponential rate: $P(\bar{X}_n \ge a) \approx e^{-nI(a)}$.
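The exponential rate can be sanity-checked against the exact binomial tail; a minimal sketch with placeholder values $p = 0.5$ and $a = 0.75$ (the exercise's own numbers should be substituted):

```python
import numpy as np
from scipy.stats import binom

p, a = 0.5, 0.75  # placeholder parameters; substitute the exercise's values

def rate(a, p):
    # Bernoulli rate function I(a) = D(Bern(a) || Bern(p)), in nats
    return a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))

I = rate(a, p)
for n in [50, 200, 800]:
    # P(mean >= a) = P(sum >= ceil(n a)); compare the empirical exponent with I(a)
    tail = binom.sf(np.ceil(n * a) - 1, n, p)
    print(n, -np.log(tail) / n, I)
# The empirical exponent -(1/n) log P approaches I(a) up to a polynomial correction.
```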
ex-ch20-04
Medium. Show that the rate function $I(\mu) = \sup_\lambda[\lambda\mu - \Lambda(\lambda)]$ is convex and satisfies $I(\mu) \ge 0$, with $I(\bar{\mu}) = 0$ where $\bar{\mu} = \mathbb{E}[X]$.
The Legendre transform of a convex function is convex.
Use $\Lambda(0) = 0$ to show non-negativity.
Convexity
$I(\mu) = \sup_\lambda[\lambda\mu - \Lambda(\lambda)]$ is a supremum of affine (hence convex) functions of $\mu$, so $I$ is convex.
Non-negativity
$I(\mu) \ge 0\cdot\mu - \Lambda(0) = 0$.
$I(\bar{\mu}) = 0$
At $\mu = \bar{\mu}$: by Jensen's inequality, $\Lambda(\lambda) = \log\mathbb{E}[e^{\lambda X}] \ge \lambda\bar{\mu}$, so $\lambda\bar{\mu} - \Lambda(\lambda) \le 0$ for all $\lambda$. Since $\Lambda$ is convex and $\Lambda'(0) = \bar{\mu}$, the supremum is attained at $\lambda = 0$: $I(\bar{\mu}) = 0$.
ex-ch20-05
Medium. Prove that if $X$ is $\sigma$-sub-Gaussian, then $P(X \ge t) \le \exp\!\left(-\frac{t^2}{2\sigma^2}\right)$ for all $t > 0$.
Apply the Chernoff technique to the sub-Gaussian MGF bound.
Chernoff bound
For any $\lambda > 0$: $P(X \ge t) \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}] \le \exp\!\left(-\lambda t + \frac{\lambda^2\sigma^2}{2}\right)$.
Optimize
Minimize over $\lambda$: $\lambda^* = \frac{t}{\sigma^2}$. Substituting: $P(X \ge t) \le \exp\!\left(-\frac{t^2}{2\sigma^2}\right)$.
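As a numerical illustration of the bound; a minimal check using $X \sim \mathcal{N}(0, 1)$, which is 1-sub-Gaussian:

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0
for t in [0.5, 1.0, 2.0, 3.0]:
    exact = norm.sf(t)                       # true Gaussian tail P(X >= t)
    bound = np.exp(-t**2 / (2 * sigma**2))   # sub-Gaussian tail bound
    print(t, exact, bound)
# The bound dominates the exact tail at every t, as the result guarantees.
```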
ex-ch20-06
Medium. Show that if $X$ is $\sigma_1$-sub-Gaussian, $Y$ is $\sigma_2$-sub-Gaussian, and $X$ and $Y$ are independent, then $X + Y$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-sub-Gaussian.
Use independence to factor the MGF.
Factor MGFs
$\mathbb{E}[e^{\lambda(X+Y)}] = \mathbb{E}[e^{\lambda X}]\,\mathbb{E}[e^{\lambda Y}] \le e^{\lambda^2\sigma_1^2/2}\,e^{\lambda^2\sigma_2^2/2} = e^{\lambda^2(\sigma_1^2 + \sigma_2^2)/2}$.
Conclude
$X + Y$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-sub-Gaussian. Sub-Gaussian parameters add in quadrature (like variances): the effective parameter is the square root of the sum of the squared parameters.
ex-ch20-07
Medium. Let $X \sim \mathcal{N}(0, 1)$. Show that $X^2$ is sub-exponential but not sub-Gaussian.
$\mathbb{E}[e^{\lambda X^2}] = \frac{1}{\sqrt{1 - 2\lambda}}$ for $\lambda < \frac{1}{2}$. This blows up at $\lambda = \frac{1}{2}$.
MGF of $X^2$
$\mathbb{E}[e^{\lambda X^2}] = \frac{1}{\sqrt{1 - 2\lambda}}$ for $\lambda < \frac{1}{2}$, and $\mathbb{E}[e^{\lambda X^2}] = \infty$ for $\lambda \ge \frac{1}{2}$.
Not sub-Gaussian
A sub-Gaussian MGF bound is finite for all $\lambda$, but $\mathbb{E}[e^{\lambda X^2}] = \infty$ for $\lambda \ge \frac{1}{2}$. So $X^2$ is not sub-Gaussian.
Sub-exponential
For $|\lambda| \le \frac{1}{4}$: $\mathbb{E}[e^{\lambda(X^2 - 1)}] = \frac{e^{-\lambda}}{\sqrt{1 - 2\lambda}} \le e^{2\lambda^2}$ (by $-\log(1 - u) \le u + u^2$ for $|u| \le \frac{1}{2}$, applied with $u = 2\lambda$). So $X^2$ is sub-exponential with parameters $(\nu, b) = (2, 4)$ in the convention $\mathbb{E}[e^{\lambda(X^2 - 1)}] \le e^{\lambda^2\nu^2/2}$ for $|\lambda| < \frac{1}{b}$.
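A numerical check of the MGF formula; a minimal sketch that integrates $e^{\lambda x^2}$ against the standard normal density for a few $\lambda < \frac{1}{2}$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def mgf_numeric(lam):
    # E[exp(lam * X^2)] for X ~ N(0, 1), by numerical integration
    val, _ = quad(lambda x: np.exp(lam * x**2) * norm.pdf(x), -np.inf, np.inf)
    return val

for lam in [0.1, 0.25, 0.4, 0.45]:
    print(lam, mgf_numeric(lam), 1.0 / np.sqrt(1.0 - 2.0 * lam))
# The numerical integral matches 1/sqrt(1 - 2 lam); for lam >= 1/2 the integral diverges.
```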
ex-ch20-08
Hard. Prove Hoeffding's lemma: if $X \in [a, b]$ and $\mathbb{E}[X] = 0$, then $\mathbb{E}[e^{\lambda X}] \le \exp\!\left(\frac{\lambda^2(b - a)^2}{8}\right)$.
Use convexity of $x \mapsto e^{\lambda x}$: bound $e^{\lambda X}$ by the chord between $a$ and $b$.
After taking expectation and using $\mathbb{E}[X] = 0$, show the bound reduces to $e^{g(u)}$ where $u = \lambda(b - a)$.
Convexity bound
For $x \in [a, b]$: $e^{\lambda x} \le \frac{b - x}{b - a}e^{\lambda a} + \frac{x - a}{b - a}e^{\lambda b}$. Taking expectation with $\mathbb{E}[X] = 0$: $\mathbb{E}[e^{\lambda X}] \le \frac{b}{b - a}e^{\lambda a} - \frac{a}{b - a}e^{\lambda b}$.
Reparametrize
Let $\theta = -\frac{a}{b - a}$ and $u = \lambda(b - a)$. Then $\mathbb{E}[e^{\lambda X}] \le e^{g(u)}$ where $g(u) = -\theta u + \log(1 - \theta + \theta e^u)$.
Bound $g(u)$
$g(0) = 0$, $g'(0) = 0$ (using the reparametrization), and $g''(u) \le \frac{1}{4}$ for all $u$. The last step uses $g''(u) = s(1 - s) \le \frac{1}{4}$ with $s = \frac{\theta e^u}{1 - \theta + \theta e^u} \in [0, 1]$. By Taylor: $g(u) \le \frac{u^2}{8} = \frac{\lambda^2(b - a)^2}{8}$.
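A brute-force check of the lemma on zero-mean two-point distributions (the extreme case for the chord argument); a minimal sketch over an arbitrary grid of supports and $\lambda$ values:

```python
import numpy as np

def mgf_two_point(lam, a, b):
    # Zero-mean two-point law on {a, b}: P(X=a) = b/(b-a), P(X=b) = -a/(b-a)
    return (b * np.exp(lam * a) - a * np.exp(lam * b)) / (b - a)

worst = 0.0
for a in np.linspace(-2.0, -0.1, 20):
    for b in np.linspace(0.1, 2.0, 20):
        for lam in np.linspace(-4.0, 4.0, 81):
            ratio = mgf_two_point(lam, a, b) / np.exp(lam**2 * (b - a)**2 / 8.0)
            worst = max(worst, ratio)
print(worst)  # never exceeds 1, consistent with Hoeffding's lemma
```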
ex-ch20-09
Hard. Let $X_1, \dots, X_n$ be i.i.d. with a given bounded distribution (mean $\mu$, variance $\mathrm{Var}(X_1)$, support in $[a, b]$). Compare the upper bounds on $P(\bar{X}_n \ge t)$ from: (a) Markov's inequality, (b) Chebyshev's inequality, (c) Hoeffding's inequality, (d) the exact Cramér rate function. Evaluate numerically for the given parameters.
Markov: $P(\bar{X}_n \ge t) \le \frac{\mu}{t}$.
Chebyshev: $P(\bar{X}_n \ge t) \le \frac{\mathrm{Var}(X_1)}{n(t - \mu)^2}$.
Markov
$P(\bar{X}_n \ge t) \le \frac{\mu}{t}$, which does not decay with $n$. (Useless.)
Chebyshev
$P(\bar{X}_n \ge t) \le \frac{\mathrm{Var}(X_1)}{n(t - \mu)^2}$, which decays only polynomially in $n$.
Hoeffding
$P(\bar{X}_n \ge t) \le \exp\!\left(-\frac{2n(t - \mu)^2}{(b - a)^2}\right)$ for observations supported on $[a, b]$.
Exact Cramér
$P(\bar{X}_n \ge t) \le e^{-nI(t)}$, where $I$ is the Cramér rate function. Evaluating both exponents at the given parameters (in nats), the Cramér bound is about 3.7x tighter than Hoeffding in the exponent.
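The four bounds can be compared with a few lines of code; a minimal sketch using placeholder parameters ($\mathrm{Bernoulli}(0.1)$, threshold $t = 0.2$, $n = 500$), since the exercise's own values are not reproduced here:

```python
import numpy as np

p, t, n = 0.1, 0.2, 500          # placeholder parameters; substitute the exercise's values
mu, var = p, p * (1 - p)

markov    = mu / t
chebyshev = var / (n * (t - mu)**2)
hoeffding = np.exp(-2 * n * (t - mu)**2)                        # support [0, 1]
I = t * np.log(t / p) + (1 - t) * np.log((1 - t) / (1 - p))     # Bernoulli rate function
cramer    = np.exp(-n * I)

print(markov, chebyshev, hoeffding, cramer)
print("exponent ratio I(t) / 2(t - mu)^2 =", I / (2 * (t - mu)**2))
```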
ex-ch20-10
Hard. Prove Sanov's theorem for the binary alphabet: with $X_1, \dots, X_n$ i.i.d. $\mathrm{Bernoulli}(q)$ and $E = \{p : p \ge a\}$ where $a > q$, show that $\frac{1}{n}\log P(\hat{p}_n \in E) \to -\min_{p \in E} D(p\|q)$, with $\hat{p}_n$ the empirical frequency of ones and $D(p\|q) = p\log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q}$.
Use the type probability bound: $P(\hat{p}_n = p) \le e^{-nD(p\|q)}$ for every type $p \in \{0, \frac{1}{n}, \dots, 1\}$.
Sum over all types in $E$ and use the polynomial-in-$n$ bound on the number of types.
Upper bound
$P(\hat{p}_n \in E) = \sum_{p \in E \cap \{0, \frac{1}{n}, \dots, 1\}} P(\hat{p}_n = p) \le (n + 1)\,e^{-n\min_{p \in E} D(p\|q)}$, since there are at most $n + 1$ binary types.
Lower bound
$P(\hat{p}_n \in E) \ge P(\hat{p}_n = p_n)$ where $p_n = \lceil na\rceil/n$ is the smallest type in $E$. Using the lower type probability bound $P(\hat{p}_n = p_n) \ge \frac{1}{n + 1}e^{-nD(p_n\|q)}$, and $p_n \to a$: the lower bound matches $-\min_{p \in E} D(p\|q)$ in the exponent.
Conclude
Both bounds give rate $\min_{p \in E} D(p\|q) = D(a\|q)$: $\lim_{n\to\infty}\frac{1}{n}\log P(\hat{p}_n \in E) = -\min_{p \in E} D(p\|q)$.
ex-ch20-11
Medium. Verify Stein's lemma for the specific case $P_0 = \mathcal{N}(0, 1)$ and $P_1 = \mathcal{N}(\mu, 1)$ with $\mu > 0$. Show the optimal Type II error exponent (with Type I error constrained to $\alpha$) is $D(P_0\|P_1) = \frac{\mu^2}{2}$.
The LLR is $\log\frac{p_0(x)}{p_1(x)} = -\mu x + \frac{\mu^2}{2}$.
Compute KL divergence
$D(P_0\|P_1) = \mathbb{E}_{P_0}\!\left[\log\frac{p_0(X)}{p_1(X)}\right] = \mathbb{E}_{P_0}\!\left[-\mu X + \frac{\mu^2}{2}\right]$. Since $\mathbb{E}_{P_0}[X] = 0$, this gives $D(P_0\|P_1) = \frac{\mu^2}{2}$.
Verify with NP test
The NP test rejects $H_0$ when $\bar{X}_n > \gamma$. Under $P_0$, $\bar{X}_n \sim \mathcal{N}(0, \frac{1}{n})$; under $P_1$, $\bar{X}_n \sim \mathcal{N}(\mu, \frac{1}{n})$. Choosing $\gamma = \frac{z_{1-\alpha}}{\sqrt{n}}$ (to fix the Type I error at $\alpha$): the Type II error is $\beta_n = P_1(\bar{X}_n \le \gamma) = \Phi\!\left(\sqrt{n}(\gamma - \mu)\right) \approx e^{-n\mu^2/2}$ for large $n$, confirming the exponent $D(P_0\|P_1) = \frac{\mu^2}{2}$.
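Because the Type II error has a closed form under the Gaussian model, the exponent can be checked directly; a minimal sketch with arbitrary placeholder values of $\mu$ and $\alpha$:

```python
import numpy as np
from scipy.stats import norm

mu, alpha = 1.0, 0.05   # placeholder values
for n in [10, 100, 1000]:
    gamma = norm.ppf(1 - alpha) / np.sqrt(n)           # threshold fixing Type I error at alpha
    log_beta = norm.logcdf(np.sqrt(n) * (gamma - mu))  # log Type II error P1(mean <= gamma)
    print(n, -log_beta / n, mu**2 / 2)
# -(1/n) log beta_n approaches D(P0||P1) = mu^2/2 as n grows.
```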
ex-ch20-12
Hard. Compute the Chernoff information for the two given distributions $P_0$ and $P_1$.
$C(P_0, P_1) = -\min_{0 \le s \le 1}\log\left(\sum_x P_0(x)^{1-s} P_1(x)^{s}\right)$.
Chernoff exponent
The optimal Bayesian error exponent is $C(P_0, P_1) = -\min_{0 \le s \le 1}\log\left(\sum_x P_0(x)^{1-s} P_1(x)^{s}\right)$.
Optimize
Take the derivative with respect to $s$ and set it to zero. This yields a transcendental equation that generally requires numerical solution. For the given $P_0$ and $P_1$, numerical optimization returns the optimal $s^*$ and the value $C(P_0, P_1)$ in nats.
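Since the optimizer generally has no closed form, a one-dimensional numerical search is the practical route; a minimal sketch for two distributions on a finite alphabet, with arbitrary placeholder probabilities (substitute the exercise's $P_0$ and $P_1$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

p0 = np.array([0.9, 0.1])   # placeholder P0
p1 = np.array([0.4, 0.6])   # placeholder P1

def log_chernoff_coeff(s):
    # log of sum_x P0(x)^(1-s) * P1(x)^s
    return np.log(np.sum(p0**(1 - s) * p1**s))

res = minimize_scalar(log_chernoff_coeff, bounds=(0.0, 1.0), method="bounded")
print("s* =", res.x, " C(P0, P1) =", -res.fun, "nats")
```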
ex-ch20-13
Challenge. (Gärtner-Ellis) Let $(X_i)$ be a stationary ergodic Markov chain on $\{0, 1\}$ with transition matrix $P$. Show that the limiting CGF $\Lambda(\lambda) = \lim_{n\to\infty}\frac{1}{n}\log\mathbb{E}[e^{\lambda S_n}]$ exists (where $S_n = \sum_{i=1}^n X_i$) and compute it.
The MGF of $S_n$ can be expressed using matrix products involving the tilted transition matrix.
The tilted matrix is $P_\lambda$ with $[P_\lambda]_{ij} = P_{ij}\,e^{\lambda j}$.
Tilted transition matrix
Define $P_\lambda$ with entries $[P_\lambda]_{ij} = P_{ij}\,e^{\lambda j}$: then $\mathbb{E}[e^{\lambda S_n}] = \pi_\lambda^\top P_\lambda^{\,n-1}\mathbf{1}$, where $\pi_\lambda(i) = \pi(i)e^{\lambda i}$ and $\pi$ is the stationary distribution.
Spectral radius
$\frac{1}{n}\log\mathbb{E}[e^{\lambda S_n}] \to \log\rho(P_\lambda)$, where $\rho(P_\lambda)$ is the spectral radius (largest eigenvalue) of $P_\lambda$. Therefore: $\Lambda(\lambda) = \log\rho(P_\lambda)$.
Compute eigenvalue
The characteristic polynomial of $P_\lambda$ is $t^2 - (P_{00} + P_{11}e^{\lambda})\,t + (P_{00}P_{11} - P_{01}P_{10})\,e^{\lambda} = 0$. The largest root gives $\rho(P_\lambda)$, and hence $\Lambda(\lambda) = \log\rho(P_\lambda)$, in closed form.
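A numerical check that $\frac{1}{n}\log\mathbb{E}[e^{\lambda S_n}]$ approaches $\log\rho(P_\lambda)$; a minimal sketch for an assumed two-state chain (the transition matrix and $\lambda$ below are arbitrary examples, not the exercise's):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])       # assumed transition matrix on states {0, 1}
pi = np.array([0.75, 0.25])      # its stationary distribution
lam = 0.5

# Tilted matrix: P_lam[i, j] = P[i, j] * exp(lam * j), since f(x) = x on {0, 1}
tilt = np.exp(lam * np.array([0.0, 1.0]))
P_lam = P * tilt[None, :]
rho = max(np.abs(np.linalg.eigvals(P_lam)))
print("log spectral radius:", np.log(rho))

# Finite-n CGF (1/n) log E[exp(lam * S_n)] via normalized matrix products
n = 500
v = pi * tilt                    # accounts for the first state X_1
log_mgf = np.log(v.sum()); v = v / v.sum()
for _ in range(n - 1):
    v = v @ P_lam
    log_mgf += np.log(v.sum()); v = v / v.sum()
print("finite-n CGF:       ", log_mgf / n)
# The two values agree up to an O(1/n) correction.
```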
ex-ch20-14
Medium. Using the matrix Bernstein inequality, determine how many pilot transmissions $n$ are needed so that $\|\hat{R} - R\| \le \epsilon$ with probability at least $1 - \delta$, where $\hat{R} = \frac{1}{n}\sum_{i=1}^n h_i h_i^*$ is the sample covariance of the pilots $h_i$, $R = \mathbb{E}[h_i h_i^*]$, and the antenna dimension is $d$.
The summands $h_i h_i^* - R$ are zero-mean with $\|h_i h_i^* - R\| \le L$ (this requires truncation or a sub-exponential argument).
The matrix variance is $\sigma^2 = \left\|\mathbb{E}[(h_1 h_1^* - R)^2]\right\|$.
Setup
Let $Z_i = h_i h_i^* - R$. For Gaussian vectors, $\|h_i\|^2$ is sub-exponential, so with truncation at level $L$ the bounded matrix Bernstein inequality applies.
Apply bound
$P\!\left(\left\|\frac{1}{n}\sum_i Z_i\right\| \ge \epsilon\right) \le 2d\exp\!\left(-\frac{n\epsilon^2/2}{\sigma^2 + L\epsilon/3}\right)$. Setting the deviation to $\epsilon$ and requiring the RHS $\le \delta$: $n \ge \frac{2(\sigma^2 + L\epsilon/3)\log(2d/\delta)}{\epsilon^2}$ suffices (with $\sigma^2$ and $L$ depending on the truncation level and $\|R\|$).
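A small simulation of the covariance estimation error; a minimal sketch with arbitrary placeholder dimensions and an identity covariance, meant only to illustrate the concentration behaviour behind the bound, not the exercise's specific numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                  # placeholder antenna dimension
R = np.eye(d)           # placeholder true covariance
for n in [64, 256, 1024, 4096]:
    errs = []
    for _ in range(50):
        # complex Gaussian pilots h_i ~ CN(0, R) (here R = I)
        H = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2)
        R_hat = H.conj().T @ H / n
        errs.append(np.linalg.norm(R_hat - R, 2))   # spectral-norm error
    print(n, np.mean(errs))
# The error decays roughly like sqrt(d/n) (up to log factors), consistent with the
# n >~ log(d/delta) / eps^2 sample-complexity scaling from matrix Bernstein.
```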
ex-ch20-15
Easy. Show that a Rademacher random variable ($P(X = +1) = P(X = -1) = \frac{1}{2}$) is 1-sub-Gaussian.
Compute $\mathbb{E}[e^{\lambda X}]$ directly.
Compute MGF
$\mathbb{E}[e^{\lambda X}] = \frac{e^{\lambda} + e^{-\lambda}}{2} = \cosh\lambda$.
Bound by Gaussian MGF
$\cosh\lambda = \sum_{k\ge 0}\frac{\lambda^{2k}}{(2k)!} \le \sum_{k\ge 0}\frac{\lambda^{2k}}{2^k k!} = e^{\lambda^2/2}$ (since $(2k)! \ge 2^k k!$). Therefore $\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2/2}$, confirming $X$ is 1-sub-Gaussian.
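A one-line numerical confirmation of $\cosh\lambda \le e^{\lambda^2/2}$ on an arbitrary grid:

```python
import numpy as np

lam = np.linspace(-10, 10, 2001)
print(np.all(np.cosh(lam) <= np.exp(lam**2 / 2)))   # True: the Rademacher MGF is dominated
```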
ex-ch20-16
Hard. (Cramér's theorem - lower bound via tilting) Let $X_1, \dots, X_n$ be i.i.d. with finite CGF $\Lambda$ on a neighborhood of the optimizer, and let $a > \mathbb{E}[X_1]$. Using the exponential tilting measure $d\tilde{P}(x) = e^{\lambda^* x - \Lambda(\lambda^*)}\,dP(x)$ where $\lambda^*$ attains $I(a) = \lambda^* a - \Lambda(\lambda^*)$, prove the lower bound $\liminf_{n\to\infty}\frac{1}{n}\log P(\bar{X}_n \ge a) \ge -I(a)$.
Under $\tilde{P}$, $X$ has mean $a$ and finite variance $\Lambda''(\lambda^*)$.
Use the CLT under the tilted measure to show $\tilde{P}(\bar{X}_n \in [a, a + \epsilon]) \to \frac{1}{2}$.
Change of measure
$P(\bar{X}_n \ge a) = \mathbb{E}_{\tilde{P}}\!\left[e^{-\lambda^* n\bar{X}_n + n\Lambda(\lambda^*)}\,\mathbf{1}\{\bar{X}_n \ge a\}\right]$.
Restrict to interval
$P(\bar{X}_n \ge a) \ge e^{-\lambda^* n(a + \epsilon) + n\Lambda(\lambda^*)}\,\tilde{P}(\bar{X}_n \in [a, a + \epsilon])$.
CLT under tilted measure
Under $\tilde{P}$, $X$ has mean $\Lambda'(\lambda^*) = a$ and variance $\Lambda''(\lambda^*)$. So $\tilde{P}(\bar{X}_n \in [a, a + \epsilon]) \to \frac{1}{2}$. Taking $\frac{1}{n}\log$ of the lower bound, the CLT term is $o(1)$, and letting $\epsilon \downarrow 0$: $\liminf_{n\to\infty}\frac{1}{n}\log P(\bar{X}_n \ge a) \ge -\lambda^* a + \Lambda(\lambda^*) = -I(a)$.
ex-ch20-17
Challenge. (Connection to hypothesis testing) Using Cramér's theorem, derive Stein's lemma for the special case of testing $H_0: \mathrm{Bernoulli}(p_0)$ vs $H_1: \mathrm{Bernoulli}(p_1)$ with $p_0 < p_1$. Show that the optimal Type II error exponent is $D(P_0\|P_1)$.
The NP test thresholds the log-likelihood ratio at some level.
Under $P_1$, the event that the test accepts $H_0$ is a large deviation.
NP test structure
The LLR is $\log\frac{P_0(X^n)}{P_1(X^n)} = \left(\sum_i X_i\right)\log\frac{p_0}{p_1} + \left(n - \sum_i X_i\right)\log\frac{1 - p_0}{1 - p_1}$. This is a monotone function of $\sum_i X_i$ (decreasing, since $p_0 < p_1$), so the NP test reduces to rejecting $H_0$ when $\bar{X}_n \ge \gamma$ for some threshold $\gamma$.
Type I constraint determines $\gamma$
The Type I error is $P_0(\bar{X}_n \ge \gamma)$. For $\gamma > p_0$, this goes to zero, so any constraint $\alpha$ can be met. The exact value of $\alpha$ affects only the sub-exponential term.
Type II error via Cramér
The Type II error is $\beta_n = P_1(\bar{X}_n < \gamma)$. Under $P_1$, $\bar{X}_n$ has mean $p_1 > \gamma$, and we need the probability that it falls below $\gamma$. By Cramér's theorem with the Bernoulli rate function: $\beta_n \doteq e^{-nI_1(\gamma)}$ where $I_1(\gamma) = \gamma\log\frac{\gamma}{p_1} + (1 - \gamma)\log\frac{1 - \gamma}{1 - p_1} = D(\mathrm{Bern}(\gamma)\|\mathrm{Bern}(p_1))$. As $\gamma \downarrow p_0$ (the optimal threshold): $I_1(\gamma) \to D(P_0\|P_1)$.
ex-ch20-18
Easy. Show that $D(P\|Q) \ge 0$ with equality iff $P = Q$, using Jensen's inequality.
Write $-D(P\|Q) = \mathbb{E}_P\!\left[\log\frac{Q(X)}{P(X)}\right]$ and apply Jensen's inequality to the concave $\log$.
Apply Jensen's
$-D(P\|Q) = \mathbb{E}_P\!\left[\log\frac{Q(X)}{P(X)}\right] \le \log\mathbb{E}_P\!\left[\frac{Q(X)}{P(X)}\right] = \log\sum_x Q(x) = \log 1 = 0$, so $D(P\|Q) \ge 0$.
Equality condition
Equality in Jensen's inequality holds iff $\frac{Q(X)}{P(X)}$ is constant a.s. under $P$, i.e., $P = Q$.
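A quick numerical check on random distributions; a minimal sketch using scipy's rel_entr, which computes the elementwise terms $p\log(p/q)$:

```python
import numpy as np
from scipy.special import rel_entr

rng = np.random.default_rng(0)
for _ in range(5):
    p = rng.dirichlet(np.ones(8))
    q = rng.dirichlet(np.ones(8))
    print(rel_entr(p, q).sum(), rel_entr(p, p).sum())   # D(P||Q) >= 0 and D(P||P) = 0
```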