Exercises
ex-ch04-01
Easy. Let $\mathcal{X} = \{a, b, c\}$ and $n = 3$. List all types in $\mathcal{P}_3(\mathcal{X})$, compute $|T(P)|$ for each type $P$, and verify that $\sum_{P \in \mathcal{P}_3(\mathcal{X})} |T(P)| = 3^3 = 27$.
A type on a ternary alphabet with $n = 3$ is a triple $(n_a/3, n_b/3, n_c/3)$ with $n_a + n_b + n_c = 3$.
The number of types is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} = \binom{5}{2} = 10$.
Use the multinomial coefficient $|T(P)| = \frac{3!}{n_a!\, n_b!\, n_c!}$.
Enumerate types
The 10 types correspond to the count vectors $(n_a, n_b, n_c)$ with $n_a + n_b + n_c = 3$: the permutations of $(3, 0, 0)$: 3 types with all mass on one symbol; the permutations of $(2, 1, 0)$: 6 types with two symbols; $(1, 1, 1)$: 1 type, the uniform distribution.
Compute type class sizes
- For counts $(3, 0, 0)$: $|T(P)| = \frac{3!}{3!\,0!\,0!} = 1$ (and similarly for the other degenerate types), total: $3 \times 1 = 3$.
- For counts $(2, 1, 0)$: $|T(P)| = \frac{3!}{2!\,1!\,0!} = 3$ (and similarly for the other two-symbol types), total: $6 \times 3 = 18$.
- For counts $(1, 1, 1)$: $|T(P)| = \frac{3!}{1!\,1!\,1!} = 6$, total: $1 \times 6 = 6$.
- Sum: $3 + 18 + 6 = 27 = 3^3$. ✓
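The count is easy to check by machine. A minimal Python sketch (an illustration added here, not part of the original solution):

```python
from math import factorial

n = 3

# All count vectors (n_a, n_b, n_c) with n_a + n_b + n_c = 3.
types = [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]
assert len(types) == 10  # binomial(5, 2) = 10 types

def type_class_size(counts):
    """Multinomial coefficient 3! / (n_a! n_b! n_c!)."""
    size = factorial(n)
    for c in counts:
        size //= factorial(c)
    return size

assert sum(type_class_size(t) for t in types) == 3**n  # 27
```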
ex-ch04-02
Easy. Show that for a binary alphabet and the uniform distribution $Q = (1/2, 1/2)$, the type class $T(P)$ with $P = (k/n, 1 - k/n)$ has $Q^n(T(P)) = \binom{n}{k} 2^{-n}$. Verify that $\sum_{k=0}^{n} \binom{n}{k} 2^{-n} = 1$.
Under the uniform distribution, every binary sequence of length $n$ has probability $2^{-n}$.
The type class $T(P)$ with $P = (k/n, 1 - k/n)$ contains exactly $\binom{n}{k}$ sequences.
Direct computation
Under $Q$, $Q^n(\mathbf{x}) = 2^{-n}$ for every $\mathbf{x} \in \{0, 1\}^n$. The type class has $\binom{n}{k}$ sequences (choosing which $k$ positions have symbol 0). Therefore: $Q^n(T(P)) = \binom{n}{k} 2^{-n}$.
Verification
$\sum_{k=0}^{n} Q^n(T(P_k)) = 2^{-n} \sum_{k=0}^{n} \binom{n}{k} = 2^{-n} \cdot 2^n = 1.$ ✓
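A quick numeric check of the same identity (a sketch; $n = 12$ is an arbitrary choice):

```python
from math import comb

n = 12
total = sum(comb(n, k) * 2**(-n) for k in range(n + 1))
assert abs(total - 1.0) < 1e-12  # sum_k C(n,k) 2^{-n} = 1
```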
ex-ch04-03
Easy. Let $Q = (0.7, 0.3)$ on $\mathcal{X} = \{0, 1\}$. Compute $D(P \| Q)$ for $P = (0.5, 0.5)$ and interpret the result using the type class probability formula.
$D(P \| Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$.
The result tells you the exponential rate at which $Q^n(T(P))$ decays.
Compute KL divergence
$D(P \| Q) = 0.5 \log_2 \frac{0.5}{0.7} + 0.5 \log_2 \frac{0.5}{0.3} = 0.5(-0.485) + 0.5(0.737) \approx 0.126 \text{ bits}.$
Interpret
This means $Q^n(T(P)) \doteq 2^{-0.126 n}$. For $n = 100$, the probability of seeing exactly 50 zeros and 50 ones when the true probability of 0 is 0.7 is approximately $2^{-12.6} \approx 1.6 \times 10^{-4}$ to first exponential order. The KL divergence of 0.126 bits is small because $P$ is not too far from $Q$.
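A short Python check comparing the exponential approximation $2^{-n D(P\|Q)}$ with the exact type class probability (illustration only; the gap is the polynomial Stirling factor):

```python
from math import comb, log2

# Q = (0.7, 0.3), P = (0.5, 0.5)
D = 0.5 * log2(0.5 / 0.7) + 0.5 * log2(0.5 / 0.3)  # ~0.1258 bits

n, k = 100, 50
exact = comb(n, k) * 0.7**k * 0.3**(n - k)  # Q^n(T(P)) exactly
approx = 2 ** (-n * D)                      # first-order exponential estimate
print(D, exact, approx)  # same exponential order; constants differ
```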
ex-ch04-04
Easy. For the set $\mathcal{E} = \{Q : Q(0) \geq 0.8\}$ and true distribution $P = (1/2, 1/2)$, find $Q^* = \arg\min_{Q \in \mathcal{E}} D(Q \| P)$ and state the asymptotic probability $P^n(\hat{P}_\mathbf{X} \in \mathcal{E})$.
The I-projection onto $\mathcal{E}$ is the boundary point $Q^* = (0.8, 0.2)$.
Use the formula for binary KL divergence, $d(q \| p) = q \log_2 \frac{q}{p} + (1-q) \log_2 \frac{1-q}{1-p}$.
Find the I-projection
Since $P$ is uniform, $D(Q \| P) = 1 - H(Q)$ for any binary $Q$. The minimum over $\mathcal{E}$ is achieved at the boundary point $Q^* = (0.8, 0.2)$, which has the highest entropy in $\mathcal{E}$.
Compute the exponent
$D(Q^* \| P) = 1 - H(0.8) \approx 1 - 0.722 = 0.278$ bits, so by Sanov's theorem $P^n(\hat{P}_\mathbf{X} \in \mathcal{E}) \doteq 2^{-0.278n}$.
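A two-line check of the exponent (illustration):

```python
from math import log2

def H(q):  # binary entropy in bits
    return -q * log2(q) - (1 - q) * log2(1 - q)

print(1 - H(0.8))  # D(Q* || P) ~ 0.278 bits
```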
ex-ch04-05
Medium. Prove that $Q^n(T(P)) \doteq 1$ (i.e., the type class probability decays subexponentially) if and only if $P = Q$, using only the type class probability formula $Q^n(T(P)) \doteq 2^{-n D(P \| Q)}$ and the fact that $Q^n(T(Q))$ is the maximum over all type class probabilities.
If $P \neq Q$, then $Q^n(T(P))$ decays exponentially while $Q^n(T(Q))$ stays polynomially bounded below.
Use the fact that $Q^n(T(Q)) \geq (n+1)^{-|\mathcal{X}|}$ (since the type equal to $Q$ has KL divergence 0, and the at most $(n+1)^{|\mathcal{X}|}$ type classes cover everything).
Forward direction
Suppose $P \neq Q$. Then $D(P \| Q) > 0$, so $Q^n(T(P)) \leq 2^{-n D(P \| Q)} \to 0$ exponentially. But $Q^n(T(Q)) \geq (n+1)^{-|\mathcal{X}|}$, so for large $n$, $Q^n(T(P)) < Q^n(T(Q))$, confirming $Q^n(T(P)) \not\doteq 1$. (This is the contrapositive of the forward direction.)
Backward direction
If $P = Q$, then every sequence in $T(P)$ has the same probability $2^{-n H(P)}$, and $|T(P)| \doteq 2^{n H(P)}$. The exponent is $D(P \| Q) = 0$, so $Q^n(T(P)) \doteq 1$.
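A numeric illustration of the two behaviors (a sketch; the choices $Q = (1/2, 1/2)$ and $P = (0.3, 0.7)$ are arbitrary):

```python
from math import comb

for n in (10, 100, 1000):
    t_Q = comb(n, n // 2) * 2.0**(-n)        # Q^n(T(Q)): stays above (n+1)^{-2}
    t_P = comb(n, int(0.3 * n)) * 2.0**(-n)  # Q^n(T(P)), P != Q: decays exponentially
    print(n, t_Q, (n + 1) ** -2, t_P)
```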
ex-ch04-06
Medium. Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli($p$) with $p < 1/2$. Use Sanov's theorem to find the exponential rate at which $\Pr\big(\tfrac{1}{n}\sum_{i=1}^n X_i \geq \tfrac{1}{2}\big)$ decays. Compare with the Chernoff bound.
The set is $\mathcal{E} = \{Q : Q(1) \geq 1/2\}$, and the I-projection is the boundary point $Q^* = (1/2, 1/2)$.
The Chernoff bound for $\Pr\big(\sum_i X_i \geq n/2\big)$ uses the moment generating function $M(\lambda) = \mathbb{E}[e^{\lambda X_1}] = 1 - p + p e^{\lambda}$.
Sanov exponent
By Sanov's theorem, the exponent is $D(Q^* \| P) = d\big(\tfrac{1}{2} \,\big\|\, p\big) = \tfrac{1}{2}\log_2\tfrac{1/2}{p} + \tfrac{1}{2}\log_2\tfrac{1/2}{1-p} = -1 - \tfrac{1}{2}\log_2\big(p(1-p)\big)$ bits.
So $\Pr\big(\tfrac{1}{n}\sum_i X_i \geq \tfrac{1}{2}\big) \doteq 2^{-n\, d(1/2 \| p)}$.
Chernoff bound
The Chernoff exponent (in nats) is $\sup_{\lambda \geq 0}\big[\tfrac{\lambda}{2} - \ln M(\lambda)\big]$, where $M(\lambda) = 1 - p + p e^{\lambda}$. Optimizing: $\tfrac{1}{2} = \frac{p e^{\lambda^*}}{1 - p + p e^{\lambda^*}}$, giving $e^{\lambda^*} = \frac{1-p}{p}$ and $M(\lambda^*) = 2(1-p)$, so the exponent is $\tfrac{1}{2}\ln\tfrac{1-p}{p} - \ln\big(2(1-p)\big) = -\ln 2 - \tfrac{1}{2}\ln\big(p(1-p)\big)$. Converting to bits: $-1 - \tfrac{1}{2}\log_2\big(p(1-p)\big)$ bits. The exponents match!
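A numeric check that the two exponents coincide (illustration; the values of $p$ are arbitrary):

```python
from math import log, log2

def sanov_bits(p):       # d(1/2 || p) in bits
    return 0.5 * log2(0.5 / p) + 0.5 * log2(0.5 / (1 - p))

def chernoff_bits(p):    # sup_lambda [lambda/2 - ln M(lambda)], converted to bits
    lam = log((1 - p) / p)               # optimizer: e^lambda = (1 - p) / p
    return (lam / 2 - log(2 * (1 - p))) / log(2)

for p in (0.1, 0.3, 0.45):
    print(p, sanov_bits(p), chernoff_bits(p))  # the two columns agree
```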
Connection
This is not a coincidence: for i.i.d. binary random variables, Sanov's theorem and the Chernoff bound give the same exponent. Sanov is more general (it works for any set of distributions on any finite alphabet), while Chernoff is specific to tail probabilities. Both are manifestations of the Cramér large deviation principle.
ex-ch04-07
Medium. Show that the source coding error exponent $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$ is a convex function of $R$ for $R \geq H(P)$, and that $E_s(R) \to D(U \| P)$ as $R \to \log|\mathcal{X}|$, where $U$ is the uniform distribution on $\mathcal{X}$.
Convexity of $E_s$ follows from convexity of $D(\cdot \| P)$ together with concavity of $H$: mixtures of feasible minimizers remain feasible at the mixed rate.
At $R = \log|\mathcal{X}|$, the constraint $H(Q) \geq R$ forces $Q = U$.
Convexity
The constraint $H(Q) \geq R$ defines a feasible set that shrinks as $R$ increases, so $E_s(R)$, the minimum of $D(Q \| P)$ over a shrinking feasible set, is non-decreasing in $R$. For convexity, let $Q_1, Q_2$ achieve $E_s(R_1), E_s(R_2)$ and set $Q_\lambda = \lambda Q_1 + (1-\lambda) Q_2$. Concavity of entropy gives $H(Q_\lambda) \geq \lambda R_1 + (1-\lambda) R_2$, so $Q_\lambda$ is feasible at rate $R_\lambda = \lambda R_1 + (1-\lambda) R_2$; convexity of $D(\cdot \| P)$ then gives $E_s(R_\lambda) \leq D(Q_\lambda \| P) \leq \lambda E_s(R_1) + (1-\lambda) E_s(R_2)$. This instantiates the general principle that the value function of a convex program, with the parameter entering the constraint linearly, is convex.
Limiting behavior
At $R = \log|\mathcal{X}|$, the constraint forces $Q = U$ (the uniform distribution), since $H(Q) \leq \log|\mathcal{X}|$ with equality iff $Q = U$. Therefore $E_s(\log|\mathcal{X}|) = D(U \| P)$.
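A brute-force scan of $E_s(R)$ for a binary source, confirming monotonicity, convexity, and the limit $D(U \| P)$ (a sketch; the source $P = (0.2, 0.8)$ is an assumed example):

```python
from math import log2

p = 0.2  # source P = (p, 1 - p); H(P) ~ 0.722 bits

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

def d(q, r):  # binary KL divergence in bits
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

def Es(R, N=10000):
    return min(d(k / N, p) for k in range(1, N) if H(k / N) >= R)

for R in (0.75, 0.85, 0.95, 1.0):
    print(R, Es(R))  # non-decreasing and convex; Es(1.0) = d(0.5, p) = D(U || P)
```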
ex-ch04-08
Medium. For a BSC with crossover probability $\delta$, the random coding exponent with the optimal (uniform) input distribution is $E_r(R) = \max_{0 \leq \rho \leq 1}\big[E_0(\rho) - \rho R\big]$, where $E_0(\rho) = \rho - (1+\rho)\log_2\big(\delta^{1/(1+\rho)} + (1-\delta)^{1/(1+\rho)}\big)$ (in bits). Show that $E_r(C) = 0$ and $E_r(0) = E_0(1)$.
At $R = C$, the maximum over $\rho$ is achieved at $\rho = 0$.
At $R = 0$, the maximum is achieved at $\rho = 1$.
$E_r(C) = 0$
$E_r(C) = \max_{0 \leq \rho \leq 1}\big[E_0(\rho) - \rho C\big]$. Since $E_0(0) = 0$ and $E_0'(0) = C$ (for uniform input on a symmetric channel, the slope of $E_0$ at $\rho = 0$ is the mutual information, i.e., the capacity), the function $f(\rho) = E_0(\rho) - \rho C$ has value 0 at $\rho = 0$ and derivative 0 at $\rho = 0$. Since $E_0$ is concave in $\rho$, $f$ is concave with maximum at $\rho = 0$, giving $E_r(C) = f(0) = 0$.
$E_r(0) = E_0(1)$
$E_r(0) = \max_{0 \leq \rho \leq 1} E_0(\rho)$. Since $E_0$ is concave and increasing (its derivative at any $\rho$ is positive, being a Rényi-type mutual information), the maximum on $[0, 1]$ is at $\rho = 1$: $E_r(0) = E_0(1)$. This is called the zero-rate exponent and represents the best reliability achievable when we allow only a polynomial number of codewords.
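A numeric verification of both endpoint claims for an assumed crossover probability $\delta = 0.1$ (illustration only):

```python
from math import log2

delta = 0.1

def E0(rho):
    s = 1.0 / (1.0 + rho)
    return rho - (1 + rho) * log2(delta**s + (1 - delta)**s)

def Er(R, steps=2000):
    return max(E0(k / steps) - (k / steps) * R for k in range(steps + 1))

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

C = 1 - H(delta)          # BSC capacity with uniform input
print(Er(C))              # ~0 up to grid resolution: E_r(C) = 0
print(Er(0.0), E0(1))     # equal: E_r(0) = E_0(1)
```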
ex-ch04-09
Medium. Consider a ternary symmetric channel with $\mathcal{X} = \mathcal{Y} = \{0, 1, 2\}$ and transition probabilities $W(y|x) = 1 - 2\epsilon$ if $y = x$ and $W(y|x) = \epsilon$ otherwise. Find the channel capacity $C$ and the zero-rate exponent $E_0(1)$.
By symmetry, the optimal input distribution is uniform.
$E_0(1) = -\log_2 \sum_y \big[\sum_x Q(x)\sqrt{W(y|x)}\big]^2$.
Channel capacity
By symmetry, the uniform input achieves capacity: $C = \log_2 3 - H(1 - 2\epsilon, \epsilon, \epsilon)$ bits.
Zero-rate exponent
$E_0(1) = -\log_2 \sum_y \big[\sum_x \tfrac{1}{3}\sqrt{W(y|x)}\big]^2$. Computing: for each output $y$, the inner sum is $\tfrac{1}{3}\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)$, so the total is $3 \cdot \tfrac{1}{9}\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)^2$. Then $E_0(1) = \log_2 3 - 2\log_2\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)$ bits.
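A quick evaluation of both formulas (a sketch; $\epsilon = 0.1$ is an assumed example, and any $0 < \epsilon < 1/3$ behaves similarly):

```python
from math import log2, sqrt

eps = 0.1

# Capacity with uniform input: C = log2(3) - H(1 - 2*eps, eps, eps)
H_row = -((1 - 2 * eps) * log2(1 - 2 * eps) + 2 * eps * log2(eps))
C = log2(3) - H_row

# Zero-rate exponent: E_0(1) = log2(3) - 2*log2(sqrt(1 - 2*eps) + 2*sqrt(eps))
E0_1 = log2(3) - 2 * log2(sqrt(1 - 2 * eps) + 2 * sqrt(eps))
print(C, E0_1)  # ~0.663 and ~0.363 bits for eps = 0.1
```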
ex-ch04-10
Medium. Prove that for any DMC, $E_r(R) > E_{sp}(R)$ cannot hold — i.e., the sphere-packing exponent is always an upper bound: $E_r(R) \leq E_{sp}(R)$. (You do not need to prove the sphere-packing bound; just explain why the achievability exponent cannot exceed the converse exponent.)
This is a fundamental consistency argument, not a calculation.
If there existed a rate where $E_r(R) > E_{sp}(R)$, what would it mean for the error probability?
Consistency argument
The random coding exponent provides a lower bound on the reliability function $E(R)$ (using the best code of each blocklength). The sphere-packing exponent provides an upper bound: $E_r(R) \leq E(R) \leq E_{sp}(R)$.
Since both bound the same function $E(R)$, we always have $E_r(R) \leq E_{sp}(R)$. If $E_r(R) > E_{sp}(R)$ held at some rate, random coding would guarantee codes whose error probability decays faster than $2^{-n E_{sp}(R)}$ — contradicting the sphere-packing converse, which says no code can decay faster. This would be a logical impossibility.
ex-ch04-11
Hard. Derive the source coding error exponent $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$ for a binary source $P = (p, 1-p)$ in closed form. Show that: $E_s(R) = d(q_R \| p)$, where $q_R$ satisfies $H(q_R) = R$ with $q_R$ between $p$ and $1/2$ (i.e., $q_R$ is the root of $H(q) = R$ on the same side of $1/2$ as $p$).
For a binary source, the optimization reduces to a one-dimensional problem.
The binary entropy function $H(q)$ is symmetric about $q = 1/2$ and concave.
There are two solutions to $H(q) = R$ when $R < 1$; the one closer to $p$ gives the smaller KL divergence.
Reduce to one dimension
For binary $\mathcal{X}$ and $Q = (q, 1-q)$: $E_s(R) = \min_{q : H(q) \geq R} d(q \| p)$, where $d(q \| p) = q \log_2 \frac{q}{p} + (1-q)\log_2\frac{1-q}{1-p}$.
Since $d(q \| p)$ is strictly convex in $q$ with minimum at $q = p$, and the constraint set is an interval $[q_R, 1 - q_R]$ (with $q_R \leq 1/2$ when $R \leq 1$), the minimum is at the boundary point closest to $p$.
Identify the minimizer
If $p < 1/2$ (low-entropy source), then for $H(p) < R < 1$ the constraint boundary $\{q : H(q) = R\}$ consists of two points, $q_R$ and $1 - q_R$, with $p < q_R \leq 1/2$. The minimizer is $q_R$, since $d(q_R \| p) < d(1 - q_R \| p)$: the root closer to $p$ has the smaller divergence. The case $p > 1/2$ is symmetric.
Closed form
The exponent is $E_s(R) = d(q_R \| p)$, where $q_R$ solves $H(q_R) = R$ on the same side of 0.5 as $p$. No simpler closed form exists, but for small redundancy $\Delta = R - H(p)$ we can expand: $E_s(H(p) + \Delta) \approx \frac{\Delta^2}{2 \ln 2 \; p(1-p)\big(\log_2\frac{1-p}{p}\big)^2}$, which shows quadratic growth of the exponent in the redundancy.
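A bisection sketch that computes $q_R$ and the exponent, and compares against the quadratic approximation for small $\Delta$ (the value $p = 0.2$ is an assumed example):

```python
from math import log, log2

p = 0.2

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

def d(q, r):
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

def q_R(R, tol=1e-12):
    """Root of H(q) = R on [p, 1/2]; assumes p < 1/2 and H(p) <= R <= 1."""
    lo, hi = p, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if H(mid) < R else (lo, mid)
    return (lo + hi) / 2

delta = 0.01
R = H(p) + delta
exact = d(q_R(R), p)
quad = delta**2 / (2 * log(2) * p * (1 - p) * log2((1 - p) / p) ** 2)
print(exact, quad)  # close for small delta
```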
ex-ch04-12
Hard. (Conditional type class size.) Let $n = 6$, $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, and $\mathbf{x} = (0, 0, 0, 1, 1, 1)$ (type $P = (1/2, 1/2)$). A conditional type $V$ is defined by $V(0|0) = 2/3$, $V(1|0) = 1/3$, $V(0|1) = 1/3$, $V(1|1) = 2/3$. Enumerate all $\mathbf{y} \in T_V(\mathbf{x})$ and verify $|T_V(\mathbf{x})| \leq 2^{n H(V|P)}$.
The first 3 positions (where $x_i = 0$) must have exactly 2 zeros and 1 one in $\mathbf{y}$.
The last 3 positions (where $x_i = 1$) must have exactly 1 zero and 2 ones in $\mathbf{y}$.
Enumerate
For positions 1–3 (where $x_i = 0$): $(y_1, y_2, y_3)$ must have 2 zeros, 1 one. Choices: $(0,0,1), (0,1,0), (1,0,0)$ — 3 options.
For positions 4–6 (where $x_i = 1$): $(y_4, y_5, y_6)$ must have 1 zero, 2 ones. Choices: $(0,1,1), (1,0,1), (1,1,0)$ — 3 options.
Total: $|T_V(\mathbf{x})| = 3 \times 3 = 9$.
Verify the bound
$H(V|P) = \tfrac{1}{2} H\big(\tfrac{2}{3}\big) + \tfrac{1}{2} H\big(\tfrac{1}{3}\big) = H\big(\tfrac{1}{3}\big) \approx 0.918$ bits.
$2^{n H(V|P)} = 2^{6 \times 0.918} \approx 2^{5.51} \approx 45.6$. We have $9 \leq 45.6$. ✓
The bound is loose because $n$ is very small — the polynomial correction factor matters.
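A brute-force enumeration confirming the count (illustration):

```python
from itertools import product

x = (0, 0, 0, 1, 1, 1)
# Target joint counts N(a, b) for the conditional type V
target = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}

members = []
for y in product((0, 1), repeat=6):
    counts = {}
    for a, b in zip(x, y):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    if counts == target:
        members.append(y)

print(len(members))  # 9 = C(3,2) * C(3,1), and 9 <= 2^(6 * 0.918) ~ 45.6
```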
ex-ch04-13
Hard. (Sanov for hypothesis testing.) Consider testing $H_0 : P_0$ vs $H_1 : P_1$ on binary sequences of length $n$. A type-based test rejects $H_0$ if $D(\hat{P}_\mathbf{X} \| P_0) > \tau$.
(a) Find the type-I error exponent as a function of $\tau$. (b) Find the type-II error exponent as a function of $\tau$. (c) Find the optimal $\tau$ that maximizes the type-II error exponent subject to the type-I error exponent being at least $E_1$.
The type-I error is $P_0^n\big(D(\hat{P}_\mathbf{X} \| P_0) > \tau\big)$, which by Sanov decays as $2^{-n\tau}$.
The type-II error is $P_1^n\big(D(\hat{P}_\mathbf{X} \| P_0) \leq \tau\big)$.
Type-I error exponent
The rejection region is $\mathcal{E}_\tau = \{Q : D(Q \| P_0) > \tau\}$. The closest point in its closure to $P_0$ has $D(Q \| P_0) = \tau$ (the boundary). So the type-I error exponent is $\tau$.
Type-II error exponent
The acceptance region is $\mathcal{A}_\tau = \{Q : D(Q \| P_0) \leq \tau\}$. The closest point in $\mathcal{A}_\tau$ to $P_1$ determines the exponent. If $D(P_1 \| P_0) > \tau$ (i.e., $P_1$ is outside the acceptance ball), the type-II exponent $\min_{Q \in \mathcal{A}_\tau} D(Q \| P_1)$ is strictly positive.
Optimize
For the constraint that the type-I exponent be at least $E_1$, we need $\tau \geq E_1$. The acceptance ball $\mathcal{A}_\tau$ grows with $\tau$, so the type-II exponent is non-increasing in $\tau$. Provided $E_1 < D(P_1 \| P_0)$, the point $P_1$ is still outside the acceptance ball at $\tau = E_1$, giving a positive type-II exponent. Setting $\tau = E_1$ keeps the acceptance region as small as the constraint allows, and thus maximizes the type-II exponent.
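A numeric sketch of the trade-off (the hypotheses $P_0 = \mathrm{Bern}(0.5)$ and $P_1 = \mathrm{Bern}(0.8)$ are assumed examples):

```python
from math import log2

def d(q, r):
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

p0, p1 = 0.5, 0.8
qs = [k / 10000 for k in range(1, 10000)]

def type2_exponent(tau):
    # I-projection of P1 onto the acceptance ball {q : d(q || p0) <= tau}
    return min(d(q, p1) for q in qs if d(q, p0) <= tau)

for tau in (0.05, 0.10, 0.20):          # the type-I exponent is tau itself
    print(tau, type2_exponent(tau))     # type-II exponent shrinks as tau grows
```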
ex-ch04-14
Hard. (Method of types proof of the channel coding theorem.) Use the method of types to prove that for a DMC $W$ with capacity $C$, any rate $R < C$ is achievable. Specifically, show that a random codebook with $2^{nR}$ codewords drawn i.i.d. from the capacity-achieving distribution $Q^*$ has average error probability $\to 0$ as $n \to \infty$.
Fix codeword $\mathbf{x}(1)$. The error event is that there exists $m \neq 1$ such that $(\mathbf{x}(m), \mathbf{y})$ is jointly typical.
Use the packing lemma from Chapter 3, or derive the bound directly using conditional types.
The probability that a random $\mathbf{x}(m)$ forms a joint type with $\mathbf{y}$ having mutual information near $I(Q^*, W)$ can be bounded using type class sizes.
Setup
Generate $2^{nR}$ codewords i.i.d. $\sim (Q^*)^n$. Transmit $\mathbf{x}(1)$, receive $\mathbf{y} \sim W^n(\cdot \mid \mathbf{x}(1))$. Decode to $\hat{m}$ if $\mathbf{x}(\hat{m})$ is the unique codeword jointly typical with $\mathbf{y}$.
Error analysis using types
Error occurs if: (a) $(\mathbf{x}(1), \mathbf{y})$ is not jointly typical (probability $\to 0$ by the law of large numbers), or (b) there exists $m \neq 1$ with $(\mathbf{x}(m), \mathbf{y})$ jointly typical.
For a fixed $\mathbf{y}$, the probability that an independent random $\mathbf{x}(m)$ is jointly typical with it is at most $(n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(I(Q^*, W) - \delta)}$ (summing over compatible joint types). By the union bound over the $2^{nR} - 1$ incorrect codewords: $\Pr(\text{b}) \leq 2^{nR} (n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(I(Q^*, W) - \delta)}$.
Conclusion
Since $I(Q^*, W) = C$ and $R < C$: $\Pr(\text{error}) \leq \Pr(\text{a}) + (n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(C - R - \delta)} \to 0$ once $\delta < C - R$. This proves achievability. The method of types gives a clean proof with an explicit (though not tight) error exponent of roughly $C - R$.
ex-ch04-15
Hard. (I-projection onto an exponential family.) Let $P$ be a distribution on $\mathcal{X}$ and $\mathcal{E} = \{Q : \mathbb{E}_Q[f(X)] \geq t\}$ for some $f : \mathcal{X} \to \mathbb{R}$. Show that the I-projection $Q^* = \arg\min_{Q \in \mathcal{E}} D(Q \| P)$ is an exponential tilting of $P$: $Q^*(x) = \frac{P(x) e^{\lambda f(x)}}{\sum_{x'} P(x') e^{\lambda f(x')}}$ for some $\lambda \geq 0$.
Write the Lagrangian for the constrained optimization: minimize $D(Q \| P)$ subject to $\sum_x Q(x) f(x) \geq t$ and $\sum_x Q(x) = 1$.
Use the KKT conditions.
Lagrangian
$\mathcal{L}(Q, \lambda, \nu) = \sum_x Q(x) \log \frac{Q(x)}{P(x)} - \lambda \Big(\sum_x Q(x) f(x) - t\Big) + \nu \Big(\sum_x Q(x) - 1\Big).$
KKT conditions
Setting $\partial \mathcal{L} / \partial Q(x) = 0$: $\log \frac{Q(x)}{P(x)} + 1 - \lambda f(x) + \nu = 0$, so $Q^*(x) = \frac{P(x) e^{\lambda f(x)}}{Z(\lambda)}$, where $Z(\lambda) = \sum_x P(x) e^{\lambda f(x)}$ ensures normalization.
Complementary slackness
By complementary slackness, $\lambda \geq 0$ with $\lambda\big(\mathbb{E}_{Q^*}[f] - t\big) = 0$. If $\mathbb{E}_P[f] < t$, the constraint is active and $\mathbb{E}_{Q^*}[f] = t$. The tilting parameter $\lambda$ is determined by solving $\mathbb{E}_{Q_\lambda}[f] = t$.
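A small sketch that finds the tilting parameter numerically by bisection on $\lambda$ (the base distribution $P$, the function $f$, and the threshold $t$ are assumed examples):

```python
from math import exp

P = [0.5, 0.3, 0.2]   # base distribution on {0, 1, 2}
f = [0.0, 1.0, 2.0]   # f(x) = x; constraint E_Q[f] >= t
t = 1.2               # E_P[f] = 0.7 < t, so the constraint is active

def tilt(lam):
    w = [p * exp(lam * fx) for p, fx in zip(P, f)]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean(Q):
    return sum(q * fx for q, fx in zip(Q, f))

lo, hi = 0.0, 1.0
while mean(tilt(hi)) < t:      # grow the bracket until it contains lambda*
    hi *= 2
for _ in range(100):           # bisection: E_{Q_lambda}[f] is increasing in lambda
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean(tilt(mid)) < t else (lo, mid)

Q_star = tilt((lo + hi) / 2)
print(Q_star, mean(Q_star))    # tilted distribution with E[f] = t
```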
ex-ch04-16
Challenge. (Gap between $E_r$ and $E_{sp}$ below the critical rate.) For the BEC with erasure probability $\epsilon$, compute $E_0(\rho)$ and $E_r(R)$ in closed form. Then examine the claims: (a) the critical rate is $R_{\text{crit}} = 0$ (i.e., the random coding exponent is tight for all positive rates); (b) therefore $E(R) = E_r(R)$ for all $R > 0$, which would make the BEC one of the few channels where the reliability function is known exactly. Do these claims survive the computation?
For the BEC, the channel output is either the input or an erasure symbol $e$. The transition probabilities are $W(x|x) = 1 - \epsilon$ and $W(e|x) = \epsilon$.
Gallager's function simplifies dramatically for the BEC because the non-erased outputs perfectly identify the input.
Compute $E_0(\rho)$ explicitly and find $R_{\text{crit}} = E_0'(1)$.
Gallager's function for the BEC
With uniform input $Q = (1/2, 1/2)$ and $E_0(\rho) = -\log_2 \sum_y \big[\sum_x Q(x) W(y|x)^{1/(1+\rho)}\big]^{1+\rho}$: For $y = 0$: only $x = 0$ contributes, giving $\big[\tfrac{1}{2}(1-\epsilon)^{1/(1+\rho)}\big]^{1+\rho} = 2^{-(1+\rho)}(1-\epsilon)$. Similarly for $y = 1$. For $y = e$: both inputs contribute $\tfrac{1}{2}\epsilon^{1/(1+\rho)}$, giving $\epsilon$. Total: $2^{-\rho}(1-\epsilon) + \epsilon$, so $E_0(\rho) = -\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big)$.
Critical rate
$R_{\text{crit}} = E_0'(1)$. Differentiating $E_0(\rho) = -\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big)$ gives $E_0'(\rho) = \frac{(1-\epsilon)2^{-\rho}}{\epsilon + (1-\epsilon)2^{-\rho}}$ (in bits), so $R_{\text{crit}} = \frac{(1-\epsilon)/2}{\epsilon + (1-\epsilon)/2} = \frac{1-\epsilon}{1+\epsilon}$, which is positive for every $\epsilon < 1$.
For the BEC, then, $R_{\text{crit}} > 0$, so the random coding exponent is not tight at all rates and claim (a) fails. The BEC does have a gap below $R_{\text{crit}}$, but this gap is smaller than for most channels.
Reliability function
The random coding exponent for the BEC is $E_r(R) = \max_{0 \leq \rho \leq 1}\big[-\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big) - \rho R\big]$. For $R \geq R_{\text{crit}}$, the optimizer is the interior point $\rho^* < 1$ solving $E_0'(\rho^*) = R$. For $R < R_{\text{crit}}$, $\rho^* = 1$ and $E_r(R) = E_0(1) - R$, which is a straight line. The sphere-packing exponent may be strictly larger in this regime. The BEC is a useful example because the calculations are tractable even though the full reliability function has the same three-regime structure as general DMCs.
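A numeric check of $E_0$ and the critical rate (a sketch; $\epsilon = 0.3$ is an assumed example):

```python
from math import log2

eps = 0.3

def E0(rho):
    return -log2(eps + (1 - eps) * 2**(-rho))

h = 1e-6
R_crit_numeric = (E0(1 + h) - E0(1 - h)) / (2 * h)  # central difference for E_0'(1)
R_crit_closed = (1 - eps) / (1 + eps)

print(R_crit_numeric, R_crit_closed)  # agree and are > 0, so claim (a) fails
print(E0(0))                          # 0, as required at rho = 0
```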
ex-ch04-17
Challenge. (Csiszár–Körner proof of the converse for channel coding.) Using the method of types, prove that for a DMC $W$, any sequence of codes with rate $R > C$ has $P_e \to 1$. Specifically, show that for any codebook with $M = 2^{nR}$ codewords: $P_e \geq 1 - 2^{-n(R - C - \delta_n)}$, where $\delta_n \to 0$.
Hint: This does not require Fano's inequality — only the type counting arguments from this chapter.
Partition the codewords by their types. For each type $P$, bound the number of distinguishable codewords using the mutual information $I(P, W)$.
The total number of distinguishable codewords across all types is at most $(n+1)^{|\mathcal{X}|}\, 2^{nC}$.
Distinguishable codewords per type
Fix a type $P$. Codewords of type $P$ must be "distinguishable" — their output conditional type classes must not overlap too much. The number of conditionally typical output sequences for a codeword of type $P$ is $\doteq 2^{n H(W|P)}$ (under the true channel $W$). The total output space for type-$P$ inputs has $\doteq 2^{n H(PW)}$ typical outputs, where $PW$ is the output distribution. Therefore at most $2^{n(H(PW) - H(W|P))} = 2^{n I(P, W)}$ codewords of type $P$ can be distinguished.
Sum over types
Total distinguishable codewords: $\sum_{P \in \mathcal{P}_n(\mathcal{X})} 2^{n I(P, W)} \leq (n+1)^{|\mathcal{X}|}\, 2^{n \max_P I(P, W)} = (n+1)^{|\mathcal{X}|}\, 2^{nC}.$
Error probability bound
If $M = 2^{nR} > (n+1)^{|\mathcal{X}|}\, 2^{nC}$, the excess codewords cannot all be decoded correctly. The fraction of decodable codewords is at most $(n+1)^{|\mathcal{X}|}\, 2^{nC} / 2^{nR} = 2^{-n(R - C - \delta_n)}$ with $\delta_n = |\mathcal{X}| \log_2(n+1)/n \to 0$, so: $P_e \geq 1 - 2^{-n(R - C - \delta_n)}$. For $R > C$, this tends to 1 as $n \to \infty$.
ex-ch04-18
Challenge. (Duality between source and channel exponents.) The source coding exponent at rate $R$ is $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$, and the simple channel coding exponent (at zero rate, for maximum likelihood decoding with two codewords) is likewise a divergence minimization, $\min_V D(V \| W \mid P)$, subject to a confusion constraint.
Show that both exponents can be expressed as I-projections on the probability simplex, and explain the geometric duality: source coding projects onto a superlevel set of entropy, while channel coding projects onto a sublevel set of mutual information.
Draw both problems on the probability simplex.
For source coding, $\mathcal{E} = \{Q : H(Q) \geq R\}$ is the region above an entropy contour.
For channel coding, the analogous set involves distributions where two inputs are 'confusable.'
Source coding as I-projection
$E_s(R) = \min_{Q \in \mathcal{E}} D(Q \| P)$, where $\mathcal{E} = \{Q : H(Q) \geq R\}$. Geometrically, $\mathcal{E}$ is the region of the simplex with entropy at least $R$ — a convex set (since entropy is concave, superlevel sets are convex). The I-projection $Q^*$ is the point on the boundary $\{Q : H(Q) = R\}$ closest to $P$ in KL divergence.
Channel coding as I-projection
For the channel coding converse, the "bad" event is that the output type is consistent with a different codeword. The set of "confusing" conditional types is $\{V : I(P, V) \leq R\}$ — the sublevel set of mutual information. The converse exponent is $\min_{V : I(P, V) \leq R} D(V \| W \mid P)$.
Duality
Both exponents are I-projections, but on dual objects:
- Source coding: project the true source $P$ onto $\{Q : H(Q) \geq R\}$, asking "how far is $P$ from the bad set?"
- Channel coding: project the true channel $W$ onto $\{V : I(P, V) \leq R\}$, asking "how far is $W$ from the confusing set?"
The source asks "how unlikely is it that the type has too much entropy?" while the channel asks "how unlikely is it that the output looks consistent with the wrong input?" Both are measured by the KL divergence from truth to the nearest "bad" distribution.