Exercises
ex-ch02-01
Easy: Compute the differential entropy of $X \sim \mathrm{Unif}(a, b)$ for $b > a$.
The PDF is $f(x) = \frac{1}{b-a}$ on $[a, b]$.
Direct computation
$h(X) = -\int_a^b \frac{1}{b-a}\log_2\frac{1}{b-a}\,dx = \log_2(b-a)$ bits.
For $b - a > 1$: $h(X) > 0$. For $b - a < 1$: $h(X) < 0$, so differential entropy can be negative.
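A minimal numerical sketch of this result (not part of the original exercise): estimate $h(X) = \mathbb{E}[-\log_2 f(X)]$ by Monte Carlo, assuming NumPy; the interval widths below are arbitrary illustrative choices that straddle width 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_entropy_mc(a, b, n=200_000):
    """Monte Carlo estimate of h(X) = E[-log2 f(X)] for X ~ Unif(a, b)."""
    x = rng.uniform(a, b, size=n)
    f = np.full_like(x, 1.0 / (b - a))           # uniform density is constant on [a, b]
    return np.mean(-np.log2(f))

for a, b in [(0.0, 4.0), (0.0, 0.5)]:            # widths 4 (> 1) and 0.5 (< 1)
    print(f"b-a = {b - a}: MC = {uniform_entropy_mc(a, b):.4f}, "
          f"exact log2(b-a) = {np.log2(b - a):.4f}")
```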
ex-ch02-02
Easy: Show that $h(X + c) = h(X)$ for any constant $c$ (translation invariance).
The PDF of $Y = X + c$ is $f_Y(y) = f_X(y - c)$.
Proof
$h(Y) = -\int f_X(y - c)\log f_X(y - c)\,dy$. Substituting $x = y - c$: $h(Y) = -\int f_X(x)\log f_X(x)\,dx = h(X)$.
ex-ch02-03
Easy: Compute $h(X)$ for $X \sim \ntn{gauss}(\mu, \sigma^2)$.
Differential entropy depends on the variance, not the mean.
Apply the formula
$h(X) = \frac{1}{2}\log_2(2\pi e\sigma^2)$ bits.
Note: $h(X)$ is the same for any $\mu$ (translation invariance).
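A quick sanity check, again a sketch assuming NumPy: the Monte Carlo estimate of $\mathbb{E}[-\log_2 f(X)]$ matches $\frac{1}{2}\log_2(2\pi e\sigma^2)$ and is unchanged when the mean shifts; $\sigma = 1.5$ and the means are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_entropy_mc(mu, sigma, n=300_000):
    """Estimate h(X) = E[-log2 f(X)] for X ~ N(mu, sigma^2) by sampling."""
    x = rng.normal(mu, sigma, size=n)
    log_pdf = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return np.mean(-log_pdf) / np.log(2)           # convert nats -> bits

sigma = 1.5
print("exact :", 0.5 * np.log2(2 * np.pi * np.e * sigma**2))
print("mu = 0:", gaussian_entropy_mc(0.0, sigma))
print("mu = 7:", gaussian_entropy_mc(7.0, sigma))  # same up to MC noise: h ignores the mean
```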
ex-ch02-04
Easy: Let $X \sim \ntn{gauss}(0, \sigma^2)$ and $Y = aX + b$. Compute $h(Y)$ using the scaling property.
$h(aX + b) = h(X) + \log_2|a|$.
Apply scaling and translation
$h(Y) = h(aX + b) = h(X) + \log_2|a| = \frac{1}{2}\log_2(2\pi e\sigma^2) + \log_2|a|$.
Alternatively: $Y \sim \ntn{gauss}(b, a^2\sigma^2)$, so $h(Y) = \frac{1}{2}\log_2(2\pi e\,a^2\sigma^2) = \frac{1}{2}\log_2(2\pi e\sigma^2) + \log_2|a|$. Consistent.
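A hedged numerical check of the scaling property, assuming NumPy; the values of $\sigma$, $a$, $b$ below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, a, b = 1.3, -2.5, 4.0                       # arbitrary illustrative values

def h_gauss_mc(mu, s, n=300_000):
    """h in bits of N(mu, s^2), estimated as E[-log2 f(X)]."""
    x = rng.normal(mu, s, size=n)
    log_pdf = -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
    return np.mean(-log_pdf) / np.log(2)

h_X = h_gauss_mc(0.0, sigma)
h_Y = h_gauss_mc(b, abs(a) * sigma)                # Y = aX + b ~ N(b, a^2 sigma^2)
print("h(Y)          :", h_Y)
print("h(X) + log2|a|:", h_X + np.log2(abs(a)))    # should agree up to MC noise
```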
ex-ch02-05
Medium: Prove that $I(X;Y)$ for continuous random variables is invariant under invertible transformations: if $U = \phi(X)$ and $V = \psi(Y)$ where $\phi, \psi$ are invertible (and differentiable), then $I(U;V) = I(X;Y)$.
Write $I(X;Y)$ as a KL divergence.
KL divergence between continuous distributions is invariant under invertible maps.
Via KL divergence
$I(X;Y) = D\!\left(f_{XY}\,\|\,f_X f_Y\right) = \int f_{XY}(x,y)\log\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\,dx\,dy$.
Under the invertible map $(x,y)\mapsto(u,v)=(\phi(x),\psi(y))$, both the joint density and the product of marginals transform with the same Jacobian factor $|\phi'(x)\,\psi'(y)|^{-1}$, which cancels in the ratio:
$\frac{f_{UV}(u,v)}{f_U(u)f_V(v)} = \frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}$.
Therefore $I(U;V) = D\!\left(f_{UV}\,\|\,f_U f_V\right) = I(X;Y)$.
ex-ch02-06
Medium: Show that for the exponential distribution $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$, with parameter $\lambda$: (a) $h(X) = \log_2\frac{e}{\lambda}$ bits, and (b) among all non-negative distributions with mean $1/\lambda$, the exponential uniquely maximizes $h$.
For (b), use the same KL divergence technique as the Gaussian case.
The cross-entropy term $-\int g\log f$ depends on $g$ only through $\mathbb{E}_g[X]$.
(a) Compute
$h(X) = -\int_0^\infty \lambda e^{-\lambda x}\left(\log_2\lambda - \lambda x\log_2 e\right)dx = -\log_2\lambda + \log_2 e = \log_2\frac{e}{\lambda}$ bits.
(b) Maximum entropy proof
Let $g$ be any PDF on $[0,\infty)$ with $\mathbb{E}_g[X] = 1/\lambda$, and let $f(x) = \lambda e^{-\lambda x}$.
$0 \le D(g\|f) = \int g\log_2\frac{g}{f}\,dx = -h_g(X) - \int g(x)\log_2 f(x)\,dx$.
$-\int g(x)\log_2 f(x)\,dx = -\log_2\lambda + \lambda\,\mathbb{E}_g[X]\log_2 e = -\log_2\lambda + \log_2 e = h_f(X)$.
Therefore $h_g(X) \le h_f(X) = \log_2\frac{e}{\lambda}$, with equality iff $g = f$ almost everywhere.
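A short numerical sketch for part (a), assuming NumPy; the rate $\lambda = 2$ is an arbitrary illustrative choice, and the last line compares against a uniform distribution with the same mean (consistent with part (b), its entropy is smaller).

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0                                           # arbitrary illustrative rate

x = rng.exponential(scale=1.0 / lam, size=300_000)
h_exp_mc = np.mean(-(np.log(lam) - lam * x)) / np.log(2)   # E[-ln f(X)] in bits
print("exponential, MC   :", h_exp_mc)
print("exponential, exact:", np.log2(np.e / lam))
# A competitor with the same mean 1/lam, e.g. Unif(0, 2/lam), has lower entropy:
print("uniform, same mean:", np.log2(2.0 / lam))
```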
ex-ch02-07
Medium: Compute the mutual information $I(X;Y)$ for the AWGN channel $Y = X + Z$, where $X \sim \ntn{gauss}(0, P)$ and $Z \sim \ntn{gauss}(0, N)$ are independent.
$I(X;Y) = h(Y) - h(Y|X)$.
$h(Y|X) = h(Z)$.
Compute
$Y = X + Z$ is Gaussian with variance $P + N$, so $h(Y) = \frac{1}{2}\log_2\!\left(2\pi e(P+N)\right)$.
$h(Y|X) = h(X + Z \mid X) = h(Z) = \frac{1}{2}\log_2(2\pi e N)$.
$I(X;Y) = h(Y) - h(Y|X) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N}\right)$ bits.
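A Monte Carlo sketch of this mutual information, assuming NumPy: it averages $\log_2\frac{f_{Y|X}(Y|X)}{f_Y(Y)}$ over joint samples. The powers $P$ and $N$ are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)
P, N = 3.0, 1.0                                    # illustrative signal and noise powers
n = 400_000

X = rng.normal(0, np.sqrt(P), n)
Z = rng.normal(0, np.sqrt(N), n)
Y = X + Z

def log2_gauss_pdf(y, var):
    """log2 of the N(0, var) density evaluated at y."""
    return (-0.5 * y**2 / var - 0.5 * np.log(2 * np.pi * var)) / np.log(2)

# I(X;Y) = E[ log2 f(Y|X) - log2 f(Y) ]
i_mc = np.mean(log2_gauss_pdf(Y - X, N) - log2_gauss_pdf(Y, P + N))
print("MC   :", i_mc)
print("exact:", 0.5 * np.log2(1 + P / N))
```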
ex-ch02-08
Medium: Prove that for a random vector $\mathbf{X} \in \mathbb{R}^n$: $h(A\mathbf{X}) = h(\mathbf{X}) + \log_2|\det A|$ for any invertible matrix $A$.
Use the change-of-variables formula for densities.
Change of variables
Let $\mathbf{Y} = A\mathbf{X}$. The PDF transforms as $f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{|\det A|}f_{\mathbf{X}}(A^{-1}\mathbf{y})$.
$h(\mathbf{Y}) = -\int f_{\mathbf{Y}}(\mathbf{y})\log_2 f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y} = -\int f_{\mathbf{X}}(\mathbf{x})\left[\log_2 f_{\mathbf{X}}(\mathbf{x}) - \log_2|\det A|\right]d\mathbf{x} = h(\mathbf{X}) + \log_2|\det A|$, substituting $\mathbf{y} = A\mathbf{x}$.
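A numerical illustration using closed-form Gaussian entropies (a sketch assuming NumPy; the matrix $A$ and covariance $K$ below are arbitrary illustrative choices): $h(A\mathbf{X})$ differs from $h(\mathbf{X})$ by exactly $\log_2|\det A|$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n))                        # generic matrix (almost surely invertible)
K = np.eye(n) + 0.3 * np.ones((n, n))              # an arbitrary valid covariance

def h_gauss(cov):
    """Differential entropy (bits) of N(0, cov)."""
    return 0.5 * np.log2((2 * np.pi * np.e) ** cov.shape[0] * np.linalg.det(cov))

lhs = h_gauss(A @ K @ A.T)                         # h(AX) for X ~ N(0, K)
rhs = h_gauss(K) + np.log2(abs(np.linalg.det(A)))
print(lhs, rhs)                                    # equal up to floating point
```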
ex-ch02-09
Medium: Show that for the Gaussian vector $\mathbf{X} \sim \ntn{gauss}(\boldsymbol{\mu}, K)$, the differential entropy can be written as $h(\mathbf{X}) = \sum_{i=1}^{n}\frac{1}{2}\log_2(2\pi e\lambda_i)$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $K$.
$\det K = \prod_i \lambda_i$.
Apply the formula $h(\mathbf{X}) = \frac{1}{2}\log_2\!\left((2\pi e)^n\det K\right)$.
Eigenvalue decomposition
$h(\mathbf{X}) = \frac{1}{2}\log_2\!\left((2\pi e)^n\det K\right) = \frac{1}{2}\log_2\!\left((2\pi e)^n\prod_i\lambda_i\right) = \sum_i\frac{1}{2}\log_2(2\pi e\lambda_i) = \sum_i h(\tilde{X}_i)$, where $\tilde{X}_i \sim \ntn{gauss}(0, \lambda_i)$ are the components in the eigenbasis. The total entropy is the sum of entropies along independent principal components.
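A one-off numerical check that the determinant form and the eigenvalue sum agree, assuming NumPy; the covariance below is a random illustrative positive-definite matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
B = rng.normal(size=(n, n))
K = B @ B.T + np.eye(n)                            # random positive-definite covariance

eigvals = np.linalg.eigvalsh(K)
h_det = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))
h_eig = np.sum(0.5 * np.log2(2 * np.pi * np.e * eigvals))
print(h_det, h_eig)                                # identical up to rounding
```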
ex-ch02-10
Hard: Derive the capacity of the parallel Gaussian channel: $Y_i = X_i + Z_i$ for $i = 1, \dots, n$, where $Z_i \sim \ntn{gauss}(0, N_i)$ are independent, and the total power constraint is $\sum_i P_i \le P$.
The capacity is the sum of individual capacities, optimized over power allocation.
Use Lagrange multipliers; the constraint is linear in the powers $P_i$.
The resulting power allocation is waterfilling.
Sum of capacities
Since the channels are independent: $C = \max_{\sum_i P_i \le P}\ \sum_{i=1}^{n}\frac{1}{2}\log_2\!\left(1 + \frac{P_i}{N_i}\right)$.
Waterfilling
Maximize $\sum_i\frac{1}{2}\log_2\!\left(1 + \frac{P_i}{N_i}\right)$ subject to $\sum_i P_i = P$ and $P_i \ge 0$.
The Lagrangian is $L = \sum_i\frac{1}{2}\log_2\!\left(1 + \frac{P_i}{N_i}\right) - \nu\left(\sum_i P_i - P\right)$.
KKT conditions give: $P_i = (\nu - N_i)^+$, where $\nu$ is chosen so that $\sum_i(\nu - N_i)^+ = P$.
Capacity
$C = \sum_i\frac{1}{2}\log_2\!\left(1 + \frac{(\nu - N_i)^+}{N_i}\right)$ bits per channel use.
The waterfilling interpretation: fill water to level $\nu$ over a "landscape" with heights $N_i$. Channels with noise above the water level get zero power; they are too noisy to be useful.
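A small waterfilling sketch, assuming NumPy: it finds the water level $\nu$ by bisection and reports the allocation and capacity. The noise levels and power budget are arbitrary illustrative values, and the helper name `waterfill` is ours, not from the text.

```python
import numpy as np

def waterfill(noise, total_power, iters=100):
    """Waterfilling allocation P_i = max(nu - N_i, 0) with sum P_i = total_power."""
    lo, hi = 0.0, noise.max() + total_power        # the water level nu lies in this bracket
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        used = np.maximum(nu - noise, 0.0).sum()
        lo, hi = (nu, hi) if used < total_power else (lo, nu)
    return np.maximum(nu - noise, 0.0), nu

noise = np.array([1.0, 2.0, 5.0, 10.0])            # illustrative noise levels
powers, nu = waterfill(noise, total_power=4.0)
capacity = 0.5 * np.log2(1 + powers / noise).sum()
print("water level:", nu)
print("powers     :", powers)                      # the noisiest channels may get zero power
print("capacity   :", capacity, "bits/use")
```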
ex-ch02-11
Hard: Prove that for independent $X$ and $Z$ with $Z \sim \ntn{gauss}(0, N)$: $h(X + Z) \ge h(Z)$.
This follows from $I(X; X+Z) \ge 0$.
Alternatively, use the EPI.
Via mutual information
$I(X; X+Z) = h(X+Z) - h(X+Z \mid X) = h(X+Z) - h(Z) \ge 0$.
Therefore $h(X+Z) \ge h(Z)$.
Interpretation
Adding an independent signal to Gaussian noise can only increase the total entropy — the signal "spreads" the distribution further.
ex-ch02-12
Hard: The de Bruijn identity states that if $Z \sim \ntn{gauss}(0, 1)$ is independent of $X$:
$\frac{d}{dt}\,h\!\left(X + \sqrt{t}\,Z\right) = \frac{1}{2}\,J\!\left(X + \sqrt{t}\,Z\right)$ (with $h$ in nats), where $J(\cdot)$ is the Fisher information. Verify this for $X = 0$ (deterministic) and for $X \sim \ntn{gauss}(0, \sigma^2)$.
For $X = 0$: $X + \sqrt{t}\,Z \sim \ntn{gauss}(0, t)$.
Fisher information of $\ntn{gauss}(\mu, \sigma^2)$ is $J = 1/\sigma^2$.
Case $X = 0$
$X + \sqrt{t}\,Z \sim \ntn{gauss}(0, t)$, so $h = \frac{1}{2}\ln(2\pi e t)$ nats and $J = 1/t$.
$\frac{d}{dt}\,\frac{1}{2}\ln(2\pi e t) = \frac{1}{2t}$, so $\frac{d}{dt}h = \frac{1}{2}J$. In nats: $\frac{1}{2t} = \frac{1}{2}\cdot\frac{1}{t}$. Verified.
Case $X \sim \ntn{gauss}(0, \sigma^2)$
$X + \sqrt{t}\,Z \sim \ntn{gauss}(0, \sigma^2 + t)$.
$h\!\left(X + \sqrt{t}\,Z\right) = \frac{1}{2}\ln\!\left(2\pi e(\sigma^2 + t)\right)$ nats.
$J\!\left(X + \sqrt{t}\,Z\right) = \frac{1}{\sigma^2 + t}$.
$\frac{d}{dt}\,h = \frac{1}{2(\sigma^2 + t)} = \frac{1}{2}J$. Converting to bits multiplies both sides by $\log_2 e$, so the identity still matches. Verified.
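A numerical verification of the identity for a genuinely non-Gaussian input (a sketch assuming NumPy): $X$ is a two-component Gaussian mixture, so the density of $X + \sqrt{t}\,Z$ has a closed form, and a finite difference of $h$ can be compared against $\frac{1}{2}J$. The mixture parameters are illustrative.

```python
import numpy as np

# Check d/dt h(X + sqrt(t) Z) = J(X + sqrt(t) Z) / 2 for a Gaussian-mixture X.
# Entropies are in nats; integrals are Riemann sums on a fine grid.
m, s2 = 1.5, 0.25
y = np.linspace(-12, 12, 40_001)
dy = y[1] - y[0]

def phi(y, mu, v):
    return np.exp(-0.5 * (y - mu) ** 2 / v) / np.sqrt(2 * np.pi * v)

def density_and_grad(t):
    v = s2 + t                                     # adding sqrt(t) Z inflates each variance
    f = 0.5 * (phi(y, -m, v) + phi(y, m, v))
    df = 0.5 * (-(y + m) / v * phi(y, -m, v) - (y - m) / v * phi(y, m, v))
    return f, df

def h(t):                                          # differential entropy (nats)
    f, _ = density_and_grad(t)
    return -np.sum(f * np.log(f)) * dy

def J(t):                                          # Fisher information
    f, df = density_and_grad(t)
    return np.sum(df ** 2 / f) * dy

t, eps = 0.8, 1e-3
print("dh/dt (finite diff):", (h(t + eps) - h(t - eps)) / (2 * eps))
print("J/2                :", 0.5 * J(t))
```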
ex-ch02-13
Hard: Prove the maximum entropy under covariance constraint for complex random vectors: among all distributions on $\mathbb{C}^n$ with covariance matrix $K$, the circularly symmetric complex Gaussian uniquely maximizes differential entropy, achieving $h(\mathbf{X}) = \log_2\det(\pi e K)$.
Follow the same KL divergence proof as the real case.
Complex Gaussian PDF: $f(\mathbf{x}) = \frac{1}{\pi^n\det K}\exp\!\left(-\mathbf{x}^\dagger K^{-1}\mathbf{x}\right)$.
KL divergence approach
Let $f$ be the PDF of the circularly symmetric complex Gaussian with covariance $K$. For any density $g$ with the same covariance:
$0 \le D(g\|f) = -h_g(\mathbf{X}) - \int g(\mathbf{x})\log_2 f(\mathbf{x})\,d\mathbf{x}$.
The cross-entropy uses $\mathbb{E}_g\!\left[\mathbf{x}^\dagger K^{-1}\mathbf{x}\right] = \operatorname{tr}(K^{-1}K) = n$, so $-\int g\log_2 f\,d\mathbf{x} = \log_2(\pi^n\det K) + n\log_2 e = \log_2\det(\pi e K) = h_f(\mathbf{X})$.
Therefore $h_g(\mathbf{X}) \le \log_2\det(\pi e K)$, with equality iff $g = f$.
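A Monte Carlo sketch of the achieving distribution, assuming NumPy: sample a circularly symmetric complex Gaussian with a random illustrative covariance $K$ and check that the average of $-\log_2 f(\mathbf{X})$ reproduces $\log_2\det(\pi e K)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
K = B @ B.conj().T + np.eye(n)                     # Hermitian positive-definite covariance

# Sample X = L w with w ~ i.i.d. CN(0, 1), so Cov(X) = L L^H = K.
L = np.linalg.cholesky(K)
W = (rng.normal(size=(200_000, n)) + 1j * rng.normal(size=(200_000, n))) / np.sqrt(2)
X = W @ L.T                                        # each row is one sample L w

Kinv = np.linalg.inv(K)
quad = np.einsum('ij,jk,ik->i', X.conj(), Kinv, X).real
neg_log_f = n * np.log(np.pi) + np.log(np.linalg.det(K).real) + quad
print("MC   :", np.mean(neg_log_f) / np.log(2))    # E[-log2 f(X)] in bits
print("exact:", np.log2(np.linalg.det(np.pi * np.e * K).real))
```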
ex-ch02-14
Medium: Let $X \sim \ntn{gauss}(0, P)$ and $Z \sim \ntn{gauss}(0, N)$ be independent, and let $Y = X + Z$. Compute $h(X \mid Y)$ (the conditional differential entropy of the input given the noisy output).
$X \mid Y = y$ is Gaussian with known mean and variance (MMSE estimation).
The MMSE estimate is $\hat{X} = \frac{P}{P+N}\,Y$ with error variance $\frac{PN}{P+N}$.
Conditional distribution
$X \mid Y = y \sim \ntn{gauss}\!\left(\frac{P}{P+N}\,y,\ \frac{PN}{P+N}\right)$.
The conditional variance $\frac{PN}{P+N}$ does not depend on $y$.
Conditional entropy
$h(X \mid Y) = \frac{1}{2}\log_2\!\left(2\pi e\,\frac{PN}{P+N}\right)$ bits.
Check: $I(X;Y) = h(X) - h(X \mid Y) = \frac{1}{2}\log_2(2\pi e P) - \frac{1}{2}\log_2\!\left(2\pi e\,\frac{PN}{P+N}\right) = \frac{1}{2}\log_2\!\left(1 + \frac{P}{N}\right)$. Consistent.
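A numerical check of the conditional entropy, assuming NumPy: average $-\log_2 f_{X|Y}(X \mid Y)$ over joint samples and compare with the closed form. $P$ and $N$ are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(8)
P, N = 4.0, 1.0
n = 300_000

X = rng.normal(0, np.sqrt(P), n)
Y = X + rng.normal(0, np.sqrt(N), n)

# Conditional law of X given Y: N(P/(P+N) * Y, PN/(P+N))
mu_c = P / (P + N) * Y
var_c = P * N / (P + N)
neg_log2_f = (0.5 * (X - mu_c) ** 2 / var_c + 0.5 * np.log(2 * np.pi * var_c)) / np.log(2)

print("MC   :", np.mean(neg_log2_f))               # estimate of h(X|Y)
print("exact:", 0.5 * np.log2(2 * np.pi * np.e * var_c))
```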
ex-ch02-15
Challenge: (Costa's EPI strengthening) Show that for independent $X$ and $Z \sim \ntn{gauss}(0, 1)$, the entropy power $N(t) = \frac{1}{2\pi e}\,e^{2h(X + \sqrt{t}\,Z)}$ is concave in $t$. Deduce the EPI as a special case.
Use the de Bruijn identity: $\frac{d}{dt}\,h(X + \sqrt{t}\,Z) = \frac{1}{2}\,J(X + \sqrt{t}\,Z)$.
Show $N''(t) \le 0$ using the Fisher information inequality.
Sketch
Define $N(t) = \frac{1}{2\pi e}\,e^{2h(X + \sqrt{t}\,Z)}$. Using de Bruijn's identity:
$N'(t) = N(t)\,J(X + \sqrt{t}\,Z)$, where $h$ is in nats.
Differentiating again, $N''(t) = N(t)\left[J(X + \sqrt{t}\,Z)^2 + \frac{d}{dt}J(X + \sqrt{t}\,Z)\right]$, so concavity is equivalent to
$\frac{d}{dt}\,J(X + \sqrt{t}\,Z) \le -J(X + \sqrt{t}\,Z)^2$.
One shows this using the Cramér-Rao-type bound $N(Y)\,J(Y) \ge 1$, where $N(Y)$ is the entropy power of $Y = X + \sqrt{t}\,Z$. This gives $N''(t) \le 0$, proving concavity.
EPI as corollary
Concavity of $N(t)$ gives $N(t) \ge (1-t)\,N(0) + t\,N(1)$ for $t \in [0,1]$, which is not the right bound by itself. Instead, the EPI follows from concavity along with a scaling argument.
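A rough numerical illustration of the concavity claim (a sketch assuming NumPy, not a proof): for a Gaussian-mixture $X$ with illustrative parameters, compute $N(t)$ on a grid by numerical integration and check that its second differences are non-positive.

```python
import numpy as np

# Numerical check of Costa's concavity for a two-component Gaussian-mixture X,
# using the closed-form density of X + sqrt(t) Z. Entropies in nats.
m, s2 = 1.5, 0.25
y = np.linspace(-15, 15, 60_001)
dy = y[1] - y[0]

def h_nats(t):
    v = s2 + t
    f = 0.5 * (np.exp(-0.5 * (y + m) ** 2 / v)
               + np.exp(-0.5 * (y - m) ** 2 / v)) / np.sqrt(2 * np.pi * v)
    return -np.sum(f * np.log(f)) * dy

ts = np.linspace(0.1, 3.0, 30)
Npow = np.exp(2 * np.array([h_nats(t) for t in ts])) / (2 * np.pi * np.e)
second_diff = Npow[:-2] - 2 * Npow[1:-1] + Npow[2:]
print("max second difference:", second_diff.max())  # should be <= 0 (up to numerics)
```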