Exercises
ex-ch01-01
Easy: Compute the entropy of a random variable $X$ uniformly distributed over a finite set $\mathcal{X}$.
For a uniform distribution, $p(x) = 1/|\mathcal{X}|$ for every $x \in \mathcal{X}$.
Recall that $H(X) = \log_2 |\mathcal{X}|$ for the uniform distribution.
Apply the formula
$H(X) = -\sum_{x \in \mathcal{X}} \frac{1}{|\mathcal{X}|} \log_2 \frac{1}{|\mathcal{X}|} = \log_2 |\mathcal{X}|$ bits.
ex-ch01-02
Easy: Let $X$ take values in a finite set $\mathcal{X}$ with probabilities $p_1, \ldots, p_{|\mathcal{X}|}$. Compute $H(X)$ and compare with $\log_2 |\mathcal{X}|$.
Compute $-p_i \log_2 p_i$ for each outcome.
The difference $\log_2 |\mathcal{X}| - H(X)$ equals $D(p \,\|\, u)$, where $u$ is the uniform distribution.
Compute
$H(X) = -\sum_i p_i \log_2 p_i$ bits.
$\log_2 |\mathcal{X}|$ bits is the maximum possible entropy. The gap $\log_2 |\mathcal{X}| - H(X)$ bits equals $D(p \,\|\, u)$, the divergence from $p$ to the uniform distribution.
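As a quick numerical sanity check, a sketch with an assumed example distribution (illustrative only, not the exercise's original numbers):

```python
import math

# Assumed example distribution (illustrative only).
p = [0.5, 0.25, 0.125, 0.125]

H = -sum(pi * math.log2(pi) for pi in p)                 # entropy H(X) in bits
log_alphabet = math.log2(len(p))                          # log2 |X|
u = 1 / len(p)                                            # uniform pmf value
kl_to_uniform = sum(pi * math.log2(pi / u) for pi in p)   # D(p || uniform)

print(f"H(X)            = {H:.4f} bits")
print(f"log2|X|         = {log_alphabet:.4f} bits")
print(f"gap             = {log_alphabet - H:.4f} bits")
print(f"D(p || uniform) = {kl_to_uniform:.4f} bits")      # equals the gap
```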
ex-ch01-03
Easy: Prove that $H(X \mid Y) \le H(X)$, with equality iff $X$ and $Y$ are independent.
Write the difference $H(X) - H(X \mid Y)$ as a mutual information.
Use the non-negativity of $I(X;Y)$.
Proof
$H(X) - H(X \mid Y) = I(X;Y) \ge 0$.
Equality holds iff $I(X;Y) = 0$, i.e., iff $X$ and $Y$ are independent.
ex-ch01-04
Easy: Let $X$ have a given distribution on a finite set $\mathcal{X}$ and let $Y = f(X)$ for a deterministic function $f$. Compute $H(X)$, $H(Y)$, and $H(X, Y)$.
$Y$ is a deterministic function of $X$.
What is $H(Y \mid X)$ when $X$ determines $Y$?
Compute entropies
$H(X) = -\sum_x p(x) \log_2 p(x)$ bits.
$Y$ takes values in $f(\mathcal{X})$ with $p_Y(y) = \sum_{x : f(x) = y} p(x)$, so $H(Y) = -\sum_y p_Y(y) \log_2 p_Y(y)$ bits.
Since the map $x \mapsto (x, f(x))$ is invertible, $(X, Y)$ carries the same information as $X$ alone, so $H(X, Y) = H(X)$ bits.
ex-ch01-05
Medium: Prove that for any function $g$: $H(g(X)) \le H(X)$, with equality iff $g$ is one-to-one on the support of $X$.
Consider the Markov chain $X \to X \to g(X)$.
Apply the data processing inequality with $Y = X$ and $Z = g(X)$.
Alternatively, use the grouping property of entropy.
Via data processing
Since $g(X)$ is a function of $X$, we have the Markov chain $X \to X \to g(X)$, so the data processing inequality gives $I(X; X) \ge I(X; g(X))$. The self-information identity gives $I(X; X) = H(X)$.
But $I(X; g(X)) = H(g(X)) - H(g(X) \mid X) = H(g(X))$ since $g(X)$ is determined by $X$.
Therefore $H(X) \ge H(g(X))$.
Equality condition
Equality holds iff $X$ is a function of $g(X)$, i.e., iff $g$ is invertible on the support of $X$. If $g$ maps two distinct values of positive probability to the same output, the entropy of $g(X)$ is strictly less than $H(X)$.
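A small numerical check of the inequality; the distribution of $X$ and the many-to-one function $g$ below are arbitrary illustrative choices:

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def g(x):
    """A many-to-one function (merges {0, 2} and {1, 3})."""
    return x % 2

# Illustrative pmf for X (assumed for this check).
p_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

# Push the distribution of X through g to get the distribution of g(X).
p_gx = {}
for x, p in p_x.items():
    p_gx[g(x)] = p_gx.get(g(x), 0.0) + p

print(f"H(X)    = {entropy(p_x):.4f} bits")
print(f"H(g(X)) = {entropy(p_gx):.4f} bits")   # strictly smaller: g is not one-to-one
```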
ex-ch01-06
Medium: Let $(X, Y)$ be binary random variables with joint distribution $p(x, y) = 1/4$ for every $(x, y) \in \{0, 1\}^2$.
Compute $H(X)$, $H(Y)$, $H(X, Y)$, $H(X \mid Y)$, $H(Y \mid X)$, and $I(X;Y)$.
First find the marginal distributions.
Check if and are independent.
Marginals
$p_X(x) = 1/2$ for $x \in \{0, 1\}$ and $p_Y(y) = 1/2$ for $y \in \{0, 1\}$. Both are uniform.
Independence check
$p(x, y) = p_X(x)\, p_Y(y)$ for all $(x, y)$, so $X$ and $Y$ are independent.
Compute all quantities
$H(X) = 1$ bit, $H(Y) = 1$ bit.
$H(X, Y) = 2$ bits.
$H(X \mid Y) = H(X) = 1$ bit (by independence).
$H(Y \mid X) = 1$ bit.
$I(X;Y) = 0$ (independent).
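These values can be verified mechanically from the joint table; the sketch below assumes the uniform joint on $\{0,1\}^2$ used above:

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf p(x, y) = 1/4 on {0,1} x {0,1}.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(joint.values())
print(f"H(X)   = {H_X:.3f} bit")
print(f"H(Y)   = {H_Y:.3f} bit")
print(f"H(X,Y) = {H_XY:.3f} bits")
print(f"H(X|Y) = {H_XY - H_Y:.3f} bit")
print(f"H(Y|X) = {H_XY - H_X:.3f} bit")
print(f"I(X;Y) = {H_X + H_Y - H_XY:.3f} bits")
```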
ex-ch01-07
Medium: Prove that $I(X; Y, Z) \ge I(X; Y)$, i.e., more observations cannot decrease mutual information.
Use the chain rule for mutual information.
Non-negativity of conditional mutual information.
Chain rule argument
By the chain rule:
$I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y)$.
Since $I(X; Z \mid Y) \ge 0$:
$I(X; Y, Z) \ge I(X; Y)$.
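The inequality can be spot-checked on a randomly generated joint distribution; the binary alphabets and the random seed below are arbitrary illustrative choices:

```python
import itertools
import math
import random

random.seed(0)

# Random joint pmf over (X, Y, Z), each binary (an arbitrary test case).
triples = list(itertools.product((0, 1), repeat=3))
weights = [random.random() for _ in triples]
total = sum(weights)
p = {t: w / total for t, w in zip(triples, weights)}

def marginal(pmf, keep):
    """Marginalize a joint pmf (dict keyed by tuples) onto the given index positions."""
    out = {}
    for key, prob in pmf.items():
        reduced = tuple(key[i] for i in keep)
        out[reduced] = out.get(reduced, 0.0) + prob
    return out

def entropy(pmf):
    return -sum(v * math.log2(v) for v in pmf.values() if v > 0)

def mutual_info(pmf, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for index groups a and b of the joint pmf."""
    return entropy(marginal(pmf, a)) + entropy(marginal(pmf, b)) - entropy(marginal(pmf, a + b))

I_X_Y  = mutual_info(p, (0,), (1,))
I_X_YZ = mutual_info(p, (0,), (1, 2))
print(f"I(X;Y)   = {I_X_Y:.4f} bits")
print(f"I(X;Y,Z) = {I_X_YZ:.4f} bits  (never smaller)")
```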
ex-ch01-08
Medium: Show that $D(p \,\|\, q) \ge \dfrac{1}{2 \ln 2}\, \|p - q\|_1^2$ (Pinsker's inequality), where $\|p - q\|_1 = \sum_x |p(x) - q(x)|$.
Start with the inequality for a binary (two-point) alphabet.
Alternatively, use the stronger pointwise form: $t \ln t - t + 1 \ge \dfrac{3(t-1)^2}{2(t+2)}$ for all $t \ge 0$.
Via the convexity approach
Define $\varphi(t) = t \ln t - t + 1$, which is convex and non-negative. By a Taylor expansion argument (or direct calculus), $\varphi(t) \ge \dfrac{3(t-1)^2}{2(t+2)}$ for all $t \ge 0$.
Setting $t = p(x)/q(x)$ and summing over $x$ with weights $q(x)$: $(\ln 2)\, D(p \,\|\, q) = \sum_x q(x)\, \varphi\!\big(p(x)/q(x)\big) \ge \sum_x \dfrac{3\,\big(p(x) - q(x)\big)^2}{2\,\big(p(x) + 2q(x)\big)}$.
Apply Cauchy-Schwarz
By Cauchy-Schwarz: $\|p - q\|_1^2 = \Big(\sum_x \dfrac{|p(x) - q(x)|}{\sqrt{p(x) + 2q(x)}} \cdot \sqrt{p(x) + 2q(x)}\Big)^{\!2} \le \Big(\sum_x \dfrac{\big(p(x) - q(x)\big)^2}{p(x) + 2q(x)}\Big) \Big(\sum_x \big(p(x) + 2q(x)\big)\Big) = 3 \sum_x \dfrac{\big(p(x) - q(x)\big)^2}{p(x) + 2q(x)}$.
Therefore $(\ln 2)\, D(p \,\|\, q) \ge \tfrac{1}{2} \|p - q\|_1^2$, i.e., $D(p \,\|\, q) \ge \dfrac{1}{2 \ln 2} \|p - q\|_1^2$.
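A numerical sanity check of the constant $1/(2\ln 2)$ on randomly generated distribution pairs (illustrative only, not part of the proof):

```python
import math
import random

random.seed(1)

def random_pmf(k):
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def kl_bits(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

worst_slack = float("inf")
for _ in range(10_000):
    p, q = random_pmf(5), random_pmf(5)
    l1 = sum(abs(pi - qi) for pi, qi in zip(p, q))
    slack = kl_bits(p, q) - l1 ** 2 / (2 * math.log(2))
    worst_slack = min(worst_slack, slack)

# Non-negative in every trial, consistent with Pinsker's inequality.
print(f"minimum of D(p||q) - ||p-q||_1^2 / (2 ln 2) over trials: {worst_slack:.6f}")
```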
ex-ch01-09
Medium: Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables. Show that $H(X_1, \ldots, X_n) = n\, H(X_1)$.
Use the chain rule for entropy.
What does independence imply about conditional entropies?
Chain rule + independence
$H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1})$.
By independence: $H(X_i \mid X_1, \ldots, X_{i-1}) = H(X_i) = H(X_1)$.
Therefore $H(X_1, \ldots, X_n) = n\, H(X_1)$.
ex-ch01-10
Medium: A binary erasure channel (BEC) has input $X \in \{0, 1\}$ and output $Y \in \{0, 1, e\}$. With probability $\epsilon$, the output is an erasure, $Y = e$; otherwise $Y = X$. For uniform input, compute $I(X;Y)$.
Compute $H(Y)$ and $H(Y \mid X)$ separately.
$H(Y \mid X = x) = H_b(\epsilon)$ for both values of $x$.
Conditional entropy
Given $X = 0$: $Y = 0$ w.p. $1 - \epsilon$, $Y = e$ w.p. $\epsilon$. So $H(Y \mid X = 0) = H_b(\epsilon)$. By symmetry, $H(Y \mid X = 1) = H_b(\epsilon)$.
Therefore $H(Y \mid X) = H_b(\epsilon)$.
Output entropy
$P(Y = 0) = (1 - \epsilon)/2$, $P(Y = 1) = (1 - \epsilon)/2$, $P(Y = e) = \epsilon$.
$H(Y) = H_b(\epsilon) + (1 - \epsilon)$.
Mutual information
$I(X;Y) = H(Y) - H(Y \mid X) = H_b(\epsilon) + (1 - \epsilon) - H_b(\epsilon) = 1 - \epsilon$ bits.
This is the capacity of the BEC: each non-erased symbol carries exactly one bit of information, and a fraction $\epsilon$ of symbols are erased.
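A direct computation from the joint distribution confirms $I(X;Y) = 1 - \epsilon$; the erasure probability below is an arbitrary illustrative choice:

```python
import math

eps = 0.25   # assumed erasure probability for illustration

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf of (X, Y) for uniform input over {0, 1}; 'e' denotes erasure.
joint = {(0, 0): 0.5 * (1 - eps), (0, 'e'): 0.5 * eps,
         (1, 1): 0.5 * (1 - eps), (1, 'e'): 0.5 * eps}

p_y = {}
for (_, y), v in joint.items():
    p_y[y] = p_y.get(y, 0.0) + v

H_Y = H(p_y.values())
H_Y_given_X = H(joint.values()) - 1.0   # H(Y|X) = H(X,Y) - H(X), with H(X) = 1 bit
print(f"H(Y)    = {H_Y:.4f} bits")
print(f"H(Y|X)  = {H_Y_given_X:.4f} bits")
print(f"I(X;Y)  = {H_Y - H_Y_given_X:.4f} bits")
print(f"1 - eps = {1 - eps:.4f} bits")
```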
ex-ch01-11
Hard: Prove that $I(X; Y \mid Z) \le I(X; Y)$ when $X \to Y \to Z$ (i.e., the data processing inequality for conditional MI).
Start from $I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y)$ and also $I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)$.
Use the Markov property to simplify $I(X; Z \mid Y)$.
Two chain rule expansions
$I(X; Y) + I(X; Z \mid Y) = I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)$.
Since $X \to Y \to Z$: $I(X; Z \mid Y) = 0$.
Conclude
$I(X; Y) = I(X; Z) + I(X; Y \mid Z)$.
Since $I(X; Z) \ge 0$: $I(X; Y \mid Z) \le I(X; Y)$.
ex-ch01-12
Hard: Let $X^n = (X_1, \ldots, X_n)$ and $\hat{X}^n$ be random variables taking values in $\mathcal{X}^n$. Prove that if $P_e^{(n)} \to 0$, where $P_e^{(n)} = P(\hat{X}^n \neq X^n)$, then $\tfrac{1}{n} H(X^n \mid \hat{X}^n) \to 0$ (Fano's inequality applied to sequences).
Apply Fano to $X^n$ with alphabet size $|\mathcal{X}|^n$.
The bound becomes $H(X^n \mid \hat{X}^n) \le 1 + P_e^{(n)}\, n \log_2 |\mathcal{X}|$.
Apply Fano
Fano's inequality with alphabet size $|\mathcal{X}|^n$ gives:
$H(X^n \mid \hat{X}^n) \le H_b(P_e^{(n)}) + P_e^{(n)} \log_2\big(|\mathcal{X}|^n - 1\big) \le 1 + P_e^{(n)}\, n \log_2 |\mathcal{X}|$.
Normalize by $n$
$\tfrac{1}{n} H(X^n \mid \hat{X}^n) \le \tfrac{1}{n} + P_e^{(n)} \log_2 |\mathcal{X}| \to 0$ as $n \to \infty$ and $P_e^{(n)} \to 0$.
ex-ch01-13
Hard: Prove Mrs. Gerber's Lemma (scalar form): If $Y = X \oplus Z$ with $Z \sim \mathrm{Bern}(p)$ independent of $X$, then $H(Y) \ge H_b(\alpha \star p)$, where $\alpha = H_b^{-1}(H(X))$, $a \star b = a(1-b) + b(1-a)$, and $H_b^{-1}$ is the inverse of $H_b$ on $[0, 1/2]$.
$P(Y = 1) = P(X = 1) \star p$.
Show that $H(Y) = H_b(\alpha \star p)$ for $\alpha = P(X = 1)$ when $X$ is binary.
Use the concavity of $H_b$ and the convexity of $u \mapsto H_b\big(H_b^{-1}(u) \star p\big)$.
Setup
Since $X$ is binary with $P(X = 1) = \alpha$ (take $\alpha \le 1/2$ without loss of generality): $P(Y = 1) = \alpha(1-p) + (1-\alpha)p = \alpha \star p$, so $H(Y) = H_b(\alpha \star p)$.
We need: $H(Y) \ge H_b\big(H_b^{-1}(H(X)) \star p\big)$, i.e., $H_b(\alpha \star p) \ge H_b(\alpha \star p)$, which is trivially true.
The non-trivial version applies when only an entropy constraint on $X$ is given, for example conditionally on side information $U$ (i.e., $X$ is not necessarily Bernoulli with a known bias, just has bounded entropy).
The general statement
For a general side variable $U$ with $H(X \mid U) = h$ (so $\alpha = H_b^{-1}(h)$):
$H(Y \mid U) \ge H_b\big(H_b^{-1}(H(X \mid U)) \star p\big)$.
This follows from the concavity of $H_b$ and the fact that the BSC with parameter $p$ "smooths" the distribution of $X$. The proof uses the data processing inequality and properties of the binary convolution, via the convexity of $u \mapsto H_b\big(H_b^{-1}(u) \star p\big)$ and Jensen's inequality.
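A numerical spot check of the conditional form; the side-information distribution, the biases, and the noise parameter below are arbitrary illustrative choices, and $H_b^{-1}$ is computed by bisection:

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hb_inv(h):
    """Inverse of the binary entropy on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if hb(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def star(a, b):
    """Binary convolution a*b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

# Assumed example: U in {0,1}, X|U=u ~ Bern(alpha_u), Z ~ Bern(p), Y = X xor Z.
p_u = {0: 0.6, 1: 0.4}
alpha = {0: 0.1, 1: 0.35}
p = 0.2

H_X_given_U = sum(p_u[u] * hb(alpha[u]) for u in p_u)
H_Y_given_U = sum(p_u[u] * hb(star(alpha[u], p)) for u in p_u)
mgl_bound = hb(star(hb_inv(H_X_given_U), p))

print(f"H(Y|U)                  = {H_Y_given_U:.4f} bits")
print(f"H_b(H_b^-1(H(X|U)) * p) = {mgl_bound:.4f} bits (lower bound)")
```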
ex-ch01-14
Hard: Prove that for any discrete random variables $X, Y, Z$: $I(X; Y \mid Z) \le I(X; Y) + H(Z)$.
Interpret this bound in terms of the information that $Z$ reveals about the relationship between $X$ and $Y$.
Start from the two chain-rule expansions $I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y) = I(X; Z) + I(X; Y \mid Z)$.
Use $I(X; Z \mid Y) \le H(Z \mid Y) \le H(Z)$.
Actually, try a simpler approach: expand definitions directly.
Direct expansion
$I(X; Y \mid Z) - I(X; Y) = \big[I(X; Y, Z) - I(X; Z)\big] - \big[I(X; Y, Z) - I(X; Z \mid Y)\big] = I(X; Z \mid Y) - I(X; Z)$.
Bound the difference
We need $I(X; Y \mid Z) - I(X; Y) \le H(Z)$.
$I(X; Y \mid Z) - I(X; Y) = I(X; Z \mid Y) - I(X; Z) \le I(X; Z \mid Y) \le H(Z \mid Y) \le H(Z)$.
Therefore $I(X; Y \mid Z) \le I(X; Y) + H(Z)$: conditioning on $Z$ can increase the dependence between $X$ and $Y$ by at most the information carried by $Z$ itself.
ex-ch01-15
Hard: (Entropy of a function) Let $Y = g(X)$ for some deterministic function $g$. Show that $H(X) - H(Y) = H(X \mid Y) \ge 0$ and interpret this as the "entropy lost" by applying $g$.
$H(Y \mid X) = 0$ since $Y$ is a function of $X$.
Use the chain rule.
Chain rule
$H(X, Y) = H(X) + H(Y \mid X) = H(X)$.
Also: $H(X, Y) = H(Y) + H(X \mid Y)$.
Therefore: $H(X) - H(Y) = H(X \mid Y) \ge 0$.
Interpretation
$H(X \mid Y)$ is the residual uncertainty about $X$ when we know $Y = g(X)$. If $g$ is one-to-one, $H(X \mid Y) = 0$ and no entropy is lost. If $g$ is many-to-one (e.g., it merges two values of positive probability), then $H(X \mid Y) > 0$ and the residual measures the information destroyed by $g$.
ex-ch01-16
Challenge: (Non-Shannon inequality) Show that for four random variables $X_1, X_2, X_3, X_4$, every "Shannon-type" inequality is a non-negative linear combination of the basic inequalities $I(X_A; X_B \mid X_C) \ge 0$. Then state (without proof) the Zhang-Yeung inequality, the first known non-Shannon inequality: $2I(X_3; X_4) \le I(X_1; X_2) + I(X_1; X_3, X_4) + 3I(X_3; X_4 \mid X_1) + I(X_3; X_4 \mid X_2)$.
Verify that this cannot be derived as a non-negative combination of conditional MI non-negativity constraints.
Shannon-type inequalities are those derivable from the basic inequalities, and hence valid for ALL joint distributions.
The entropy function region is a convex cone.
Zhang and Yeung (1998) found the first inequality constraining beyond Shannon's basic inequalities.
Shannon inequalities
For any collection of random variables, the basic Shannon inequalities are $I(X_A; X_B \mid X_C) \ge 0$ for all subsets $A, B, C$ of the index set (non-negativity of conditional entropies is the special case $A = B$). All inequalities derivable from these (by taking non-negative linear combinations) are called "Shannon-type."
Zhang-Yeung (1998)
The Zhang-Yeung inequality cannot be derived from the basic inequalities $I(X_A; X_B \mid X_C) \ge 0$ alone. This was verified by showing that the inequality defines a half-space that is strictly tighter than the Shannon cone $\Gamma_4$. Its discovery showed that the (closure of the) entropy function region $\overline{\Gamma}^*_n$ is strictly smaller than the Shannon cone $\Gamma_n$ for $n \ge 4$ variables.
Significance
This exercise illustrates a deep open problem: characterizing the entropy function region $\Gamma^*_n$ for $n \ge 4$ variables. Non-Shannon inequalities have implications for network coding, secret sharing, and the capacity regions of certain multi-source multi-sink networks.
ex-ch01-17
Challenge: (Strong Fano) Prove the following strengthening of Fano's inequality: if $\hat{X} = g(Y)$ is an estimate of $X$ and $P_e = P(\hat{X} \neq X)$, then $H(X \mid Y) \le H_b(P_e) + P_e\, H(X \mid \hat{X}, E = 1)$,
where $E = \mathbf{1}\{\hat{X} \neq X\}$. Show that this is always at least as tight as the standard Fano bound.
Follow the standard Fano proof but do not upper-bound $H(X \mid \hat{X}, E = 1)$ by $\log_2(|\mathcal{X}| - 1)$.
Note that $H(E \mid Y) \le H(E) = H_b(P_e)$ since conditioning cannot increase entropy.
Tighter bound
From the standard proof:
$H(X \mid Y) \le H(X, E \mid Y) = H(E \mid Y) + H(X \mid Y, E) \le H_b(P_e) + P_e\, H(X \mid Y, E = 1)$, using $H(X \mid Y, E = 0) = 0$.
Now $H(X \mid Y, E = 1) \le H(X \mid \hat{X}, E = 1)$ because $Y$ provides at least as much information as $\hat{X} = g(Y)$.
Comparison with standard Fano
The standard bound replaces $H(X \mid \hat{X}, E = 1)$ with $\log_2(|\mathcal{X}| - 1)$, which is the maximum entropy over the $|\mathcal{X}| - 1$ outcomes still possible after an error. The strong form is tighter whenever the conditional distribution of $X$ given an error is not uniform over the remaining possibilities.
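For a concrete comparison, take $Y = \hat{X}$ (so the estimator is the identity map on the observation) and an assumed joint distribution; the sketch below checks that $H(X \mid \hat{X}) \le$ strong bound $\le$ standard bound:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint pmf of (X, Xhat) over {0,1,2}^2 (illustrative only).
joint = {(0, 0): 0.30, (0, 1): 0.05, (0, 2): 0.02,
         (1, 0): 0.03, (1, 1): 0.25, (1, 2): 0.05,
         (2, 0): 0.04, (2, 1): 0.06, (2, 2): 0.20}

alphabet = {x for x, _ in joint}
p_e = sum(v for (x, xh), v in joint.items() if x != xh)

# H(X | Xhat) = H(X, Xhat) - H(Xhat)
p_xh = {}
for (_, xh), v in joint.items():
    p_xh[xh] = p_xh.get(xh, 0.0) + v
H_X_given_Xhat = H(joint.values()) - H(p_xh.values())

# H(X | Xhat, E=1): restrict to the error event and renormalize.
err = {k: v / p_e for k, v in joint.items() if k[0] != k[1]}
p_xh_err = {}
for (_, xh), v in err.items():
    p_xh_err[xh] = p_xh_err.get(xh, 0.0) + v
H_X_given_Xhat_err = H(err.values()) - H(p_xh_err.values())

hb = H([p_e, 1 - p_e])
strong = hb + p_e * H_X_given_Xhat_err
standard = hb + p_e * math.log2(len(alphabet) - 1)

print(f"H(X|Xhat)     = {H_X_given_Xhat:.4f} bits")
print(f"strong Fano   = {strong:.4f} bits")
print(f"standard Fano = {standard:.4f} bits")
```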
ex-ch01-18
Medium: Compute the capacity of the Z-channel: input $X \in \{0, 1\}$, output $Y \in \{0, 1\}$, with $P(Y = 0 \mid X = 0) = 1$ (no error for input 0) and $P(Y = 0 \mid X = 1) = \epsilon$ (input 1 received as 0 with probability $\epsilon$). Find the capacity $C = \max_{p(x)} I(X;Y)$.
$H(Y \mid X = 0) = 0$ since input 0 is never corrupted.
Optimize over $q = P(X = 1)$.
The optimal input is NOT necessarily uniform.
Compute $I(X;Y)$
Let $q = P(X = 1)$. Then $P(Y = 1) = q(1 - \epsilon)$ and $P(Y = 0) = 1 - q(1 - \epsilon)$.
$H(Y) = H_b\big(q(1 - \epsilon)\big)$.
$H(Y \mid X) = q\, H_b(\epsilon)$, since $H(Y \mid X = 0) = 0$ and $H(Y \mid X = 1) = H_b(\epsilon)$.
$I(X;Y) = H(Y) - H(Y \mid X) = H_b\big(q(1 - \epsilon)\big) - q\, H_b(\epsilon)$.
Optimize over $q$
Taking the derivative and setting it to zero:
$\dfrac{d}{dq}\, I(X;Y) = (1 - \epsilon) \log_2 \dfrac{1 - q(1 - \epsilon)}{q(1 - \epsilon)} - H_b(\epsilon) = 0$.
This gives $q^* = \dfrac{\epsilon^{\epsilon/(1-\epsilon)}}{1 + (1 - \epsilon)\, \epsilon^{\epsilon/(1-\epsilon)}}$, and the capacity is
$C = \log_2\!\big(1 + (1 - \epsilon)\, \epsilon^{\epsilon/(1-\epsilon)}\big)$ bits per channel use.
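The closed form can be checked against a brute-force maximization over the input distribution; the crossover probability below is an arbitrary illustrative choice:

```python
import math

eps = 0.3   # assumed crossover probability for illustration

def hb(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi(q):
    """I(X;Y) for the Z-channel with P(X=1) = q."""
    return hb(q * (1 - eps)) - q * hb(eps)

# Brute-force search over the input probability q.
qs = [i / 100000 for i in range(100001)]
q_star = max(qs, key=mi)
capacity_numeric = mi(q_star)

# Closed-form expressions derived above.
q_closed = eps ** (eps / (1 - eps)) / (1 + (1 - eps) * eps ** (eps / (1 - eps)))
capacity_closed = math.log2(1 + (1 - eps) * eps ** (eps / (1 - eps)))

print(f"numeric:     q* = {q_star:.5f}, C = {capacity_numeric:.6f} bits")
print(f"closed form: q* = {q_closed:.5f}, C = {capacity_closed:.6f} bits")
```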
ex-ch01-19
Medium: Prove that for stationary processes, the entropy rate satisfies $\lim_{n\to\infty} \tfrac{1}{n} H(X_1, \ldots, X_n) = \lim_{n\to\infty} H(X_n \mid X_1, \ldots, X_{n-1})$, and that both limits exist.
Show that $H(X_n \mid X_1, \ldots, X_{n-1})$ is non-increasing in $n$ for stationary processes.
A bounded non-increasing sequence converges.
Use Cesàro's lemma to connect the two limits.
Monotonicity
By stationarity: $H(X_{n+1} \mid X_1, \ldots, X_n) \le H(X_{n+1} \mid X_2, \ldots, X_n) = H(X_n \mid X_1, \ldots, X_{n-1})$.
The first inequality uses the fact that conditioning reduces entropy. The equality uses stationarity (shifting indices). So $H(X_n \mid X_1, \ldots, X_{n-1})$ is non-increasing and bounded below by 0.
Convergence and Cesàro
A bounded monotone sequence converges, so $\lim_{n\to\infty} H(X_n \mid X_1, \ldots, X_{n-1})$ exists.
By the chain rule, $\tfrac{1}{n} H(X_1, \ldots, X_n) = \tfrac{1}{n} \sum_{k=1}^{n} H(X_k \mid X_1, \ldots, X_{k-1})$ is a Cesàro average of a convergent sequence, so it converges to the same limit.
ex-ch01-20
Easy: Verify that the binary entropy function $H_b(p) = -p \log_2 p - (1-p) \log_2(1-p)$ satisfies: (a) $H_b(0) = H_b(1) = 0$, (b) $H_b(1/2) = 1$, (c) $H_b(p) = H_b(1-p)$ for all $p \in [0, 1]$.
Direct substitution into the definition.
Verify all three
(a) $H_b(0) = -0 \log_2 0 - 1 \log_2 1 = 0$, using the convention $0 \log_2 0 = 0$. Similarly $H_b(1) = 0$.
(b) $H_b(1/2) = -\tfrac{1}{2} \log_2 \tfrac{1}{2} - \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$ bit.
(c) $H_b(1-p) = -(1-p) \log_2(1-p) - p \log_2 p = H_b(p)$. The symmetry reflects the fact that labeling the outcomes 0 and 1 is arbitrary.
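A direct numerical check of (a)-(c):

```python
import math

def hb(p):
    """Binary entropy in bits, with the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(hb(0.0), hb(1.0))            # (a) both 0
print(hb(0.5))                     # (b) exactly 1 bit
print(all(abs(hb(p) - hb(1 - p)) < 1e-12
          for p in [i / 100 for i in range(101)]))   # (c) symmetry
```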