Exercises
ex-ch05-01
Easy
Verify Kraft's inequality for the given code and determine whether there is room to add another codeword.
Compute $\sum_i 2^{-\ell_i}$ and check whether it is $\le 1$.
Compute Kraft sum
With the given codeword lengths $\ell_i$, the Kraft sum is $\sum_i 2^{-\ell_i} = 1$. Kraft holds with equality, so there is no room for another codeword: the code tree is complete (every leaf is used).
ex-ch05-02
Easy
A source has four symbols with probabilities $(1/2, 1/4, 1/8, 1/8)$. Construct the Huffman code and verify that it achieves $\bar{L} = H(X)$.
When probabilities are dyadic (powers of 1/2), Huffman achieves entropy exactly.
Huffman construction
Merge 0.125 + 0.125 = 0.25. Remaining: (0.5, 0.25, 0.25). Merge 0.25 + 0.25 = 0.5. Remaining: (0.5, 0.5). Merge 0.5 + 0.5 = 1.0.
Codewords: 0 (length 1), 10 (length 2), 110 (length 3), 111 (length 3).
Verify optimality
$\bar{L} = 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75$ bits. $H(X) = 0.5\log_2 2 + 0.25\log_2 4 + 2 \cdot 0.125\log_2 8 = 1.75$ bits. $\bar{L} = H(X)$ exactly.
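One way to double-check the construction is to rerun the merges mechanically; the sketch below uses a heap of subtree weights and reproduces the lengths (1, 2, 3, 3) and the 1.75-bit expected length.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    # Heap items: (subtree weight, unique tiebreak id, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    uid = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol under a merge gains one bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, uid, s1 + s2))
        uid += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
L_bar = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * log2(p) for p in probs)
print(lengths, L_bar, H)   # [1, 2, 3, 3] 1.75 1.75
```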
ex-ch05-03
Easy
Compute the Shannon code (codeword lengths and expected length) for a given distribution $P = (p_1, \ldots, p_m)$. Compare with the entropy.
Shannon code lengths: $\ell_i = \lceil \log_2(1/p_i) \rceil$.
Compute lengths
For each symbol, round $\log_2(1/p_i)$ up to the nearest integer: $\ell_i = \lceil \log_2(1/p_i) \rceil$, so $\log_2(1/p_i) \le \ell_i < \log_2(1/p_i) + 1$.
Expected length and comparison
$\bar{L} = \sum_i p_i \ell_i < \sum_i p_i\big(\log_2(1/p_i) + 1\big) = H(P) + 1$ bits, and $\bar{L} \ge H(P)$ bits. The gap $\bar{L} - H(P)$ is within the guaranteed 1-bit bound.
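The exercise's specific distribution is not repeated above, so the sketch below uses $P = (0.4, 0.3, 0.2, 0.1)$ purely as an illustrative stand-in; substitute the intended $P$ to reproduce the exercise's numbers.

```python
from math import ceil, log2

def shannon_lengths(probs):
    return [ceil(log2(1 / p)) for p in probs]

P = [0.4, 0.3, 0.2, 0.1]          # illustrative stand-in distribution
lengths = shannon_lengths(P)
L_bar = sum(p * l for p, l in zip(P, lengths))
H = -sum(p * log2(p) for p in P)
print(lengths, L_bar, H, L_bar - H)   # lengths [2, 2, 3, 4]; the gap is below 1 bit
```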
ex-ch05-04
Easy
A binary Markov chain has given transition probabilities $p(1\,|\,0)$ and $p(0\,|\,1)$ (and their complements). Find the stationary distribution and the entropy rate.
Stationary: solve $\pi P = \pi$ with $\pi_0 + \pi_1 = 1$.
Stationary distribution
Balance requires $\pi_0\, p(1|0) = \pi_1\, p(0|1)$ and $\pi_0 + \pi_1 = 1$. Hence $\pi_0 = \frac{p(0|1)}{p(0|1) + p(1|0)}$ and $\pi_1 = \frac{p(1|0)}{p(0|1) + p(1|0)}$.
Entropy rate
$\mathcal{H}(\mathcal{X}) = \pi_0\, H_b\big(p(1|0)\big) + \pi_1\, H_b\big(p(0|1)\big)$ bits/symbol, where $H_b$ is the binary entropy function.
Marginal entropy: $H(X_1) = H_b(\pi_1)$ bits. Memory saves $H(X_1) - \mathcal{H}(\mathcal{X})$ bits/symbol (a 28% improvement for the given transition probabilities).
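As an illustration, the sketch below evaluates these formulas for placeholder transition probabilities $p(1|0) = 0.1$ and $p(0|1) = 0.3$ (chosen only for the example, not taken from the exercise).

```python
from math import log2

def h_b(p):
    # Binary entropy in bits, with h_b(0) = h_b(1) = 0.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

p10 = 0.1   # placeholder P(X_{n+1} = 1 | X_n = 0)
p01 = 0.3   # placeholder P(X_{n+1} = 0 | X_n = 1)

pi0 = p01 / (p01 + p10)
pi1 = p10 / (p01 + p10)
rate = pi0 * h_b(p10) + pi1 * h_b(p01)
marginal = h_b(pi1)
print(pi0, pi1)                 # stationary distribution
print(rate, marginal)           # entropy rate vs. marginal entropy
print(1 - rate / marginal)      # fractional saving due to memory
```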
ex-ch05-05
Medium
Prove McMillan's inequality: if a code (not necessarily prefix) is uniquely decodable, then $\sum_i 2^{-\ell_i} \le 1$. Use the "power trick": expand $S^k = \big(\sum_i 2^{-\ell_i}\big)^k$ and bound the result.
Consider $S^k = \big(\sum_i 2^{-\ell_i}\big)^k$.
Group terms by total length $L = \ell_{i_1} + \cdots + \ell_{i_k}$.
The number of terms with total length $L$ is the number of $k$-symbol source sequences mapping to code strings of length $L$.
Expand the power
$S^k = \sum_{i_1} \cdots \sum_{i_k} 2^{-(\ell_{i_1} + \cdots + \ell_{i_k})} = \sum_{L=1}^{k\ell_{\max}} A_L\, 2^{-L}$, where $A_L$ is the number of $k$-tuples $(i_1, \ldots, i_k)$ with $\ell_{i_1} + \cdots + \ell_{i_k} = L$.
Bound $A_L$
$A_L$ counts the number of source sequences whose concatenated codeword has total length $L$. Since the code is uniquely decodable, each binary string of length $L$ can decode to at most one source sequence. There are $2^L$ binary strings of length $L$, so $A_L \le 2^L$.
Conclude
$S^k = \sum_{L=1}^{k\ell_{\max}} A_L\, 2^{-L} \le \sum_{L=1}^{k\ell_{\max}} 1 = k\ell_{\max}$, so $S \le (k\ell_{\max})^{1/k} \to 1$ as $k \to \infty$. Since $S$ does not depend on $k$, it follows that $S \le 1$.
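A small numerical illustration of the power trick: for lengths whose Kraft sum exceeds 1, $S^k$ eventually overtakes $k\ell_{\max}$, so no uniquely decodable code can use them.

```python
# Lengths whose Kraft sum exceeds 1: S = 1/2 + 3/4 = 1.25.
lengths = [1, 2, 2, 2]
S = sum(2.0 ** -l for l in lengths)
l_max = max(lengths)
for k in (1, 5, 10, 20, 50):
    # A uniquely decodable code would force S**k <= k * l_max for every k,
    # but here S**k overtakes k * l_max (around k = 16) and then explodes.
    print(k, S ** k, k * l_max)
```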
ex-ch05-06
Medium
A source has 8 equiprobable symbols. Compare the expected codeword length of: (a) a fixed-length code, (b) the Huffman code, (c) the Shannon code. What do you observe?
For equiprobable symbols, all three codes should give the same result.
All codes are equivalent
With $p_i = 1/8$ for all $i$:
- Fixed-length: 3 bits each. $\bar{L} = 3$.
- Huffman: all lengths equal to 3 (no asymmetry to exploit). $\bar{L} = 3$.
- Shannon: $\ell_i = \lceil \log_2 8 \rceil = 3$ for all $i$. $\bar{L} = 3$.
- Entropy: $H(X) = \log_2 8 = 3$ bits.
For uniform distributions, no compression is possible: the entropy equals the logarithm of the alphabet size, and all three codes achieve it exactly.
ex-ch05-07
Medium
A Shannon code is designed for distribution $Q$, but the true source distribution is $P$. Compute: (a) the expected codeword length under $P$, (b) the entropy $H(P)$, and (c) the mismatch penalty. Verify that $\bar{L} = H(P) + D(P\|Q)$, where $D(P\|Q) = \sum_x p(x)\log_2\frac{p(x)}{q(x)}$ (exactly when $Q$ is dyadic, and to within 1 bit in general).
The Shannon code for $Q$ has lengths $\ell_i = \lceil \log_2(1/q_i) \rceil$.
Compute Shannon code for $Q$
For each symbol, $\ell_i = \lceil \log_2(1/q_i) \rceil$; when $Q$ is dyadic these equal $\log_2(1/q_i)$ exactly.
Expected length under $P$
$\bar{L} = \sum_i p_i \ell_i = \sum_i p_i \log_2(1/q_i)$ bits (exactly, for dyadic $Q$).
Entropy and KL divergence
$H(P) = \sum_i p_i \log_2(1/p_i)$ bits. $D(P\|Q) = \sum_i p_i \log_2(p_i/q_i)$ bits. Adding them: $H(P) + D(P\|Q) = \sum_i p_i \log_2(1/q_i) = \bar{L}$ bits, as claimed.
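To see the identity numerically, the sketch below uses placeholder distributions with a dyadic $Q$ (these particular $P$ and $Q$ are illustrative assumptions, not the exercise's values).

```python
from math import ceil, log2

P = [0.25, 0.25, 0.5]   # placeholder true distribution
Q = [0.5, 0.25, 0.25]   # placeholder design distribution (dyadic)

lengths = [ceil(log2(1 / q)) for q in Q]          # Shannon code for Q
L_bar = sum(p * l for p, l in zip(P, lengths))    # expected length under P
H_P = -sum(p * log2(p) for p in P)
D_PQ = sum(p * log2(p / q) for p, q in zip(P, Q))
print(L_bar, H_P + D_PQ, D_PQ)   # 1.75 = 1.5 + 0.25: the penalty is D(P||Q)
```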
ex-ch05-08
Medium
Show that the entropy rate of a second-order binary Markov chain (where $X_{n+1}$ depends on $X_n$ and $X_{n-1}$) can be written as $\mathcal{H} = \sum_{x_1, x_2} \mu(x_1, x_2)\, H(X_{n+1} \mid X_n = x_2, X_{n-1} = x_1)$, where $\mu$ is the stationary distribution of the pair $(X_{n-1}, X_n)$.
A second-order Markov chain on $\{0, 1\}$ can be viewed as a first-order Markov chain on pairs $\{0, 1\}^2$.
Reduce to first-order
Define $Z_n = (X_{n-1}, X_n)$. Then $(Z_n)$ is a first-order Markov chain on a 4-symbol alphabet, with transition probabilities determined by the second-order conditional probabilities: from $(x_{n-1}, x_n)$ the chain can only move to states of the form $(x_n, x_{n+1})$, with probability $p(x_{n+1} \mid x_n, x_{n-1})$.
Apply first-order formula
The entropy rate of $Z$ is $H(Z_{n+1} \mid Z_n)$, but since $Z_{n+1} = (X_n, X_{n+1})$ shares one component with $Z_n$, $H(Z_{n+1} \mid Z_n) = H(X_{n+1} \mid X_n, X_{n-1}) = \sum_{x_1, x_2} \mu(x_1, x_2)\, H(X_{n+1} \mid X_n = x_2, X_{n-1} = x_1)$. The entropy rate of $X$ equals the entropy rate of $Z$ (they encode the same information).
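A short sketch of the reduction, using made-up second-order conditionals $q(a, b) = P(X_{n+1} = 1 \mid X_{n-1} = a, X_n = b)$ as placeholders:

```python
import numpy as np
from math import log2

def h_b(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Placeholder second-order conditionals q[(a, b)] = P(X_{n+1}=1 | X_{n-1}=a, X_n=b).
q = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.8}

# First-order chain on pairs: from (a, b) the only reachable states are (b, 0) and (b, 1).
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = np.zeros((4, 4))
for i, (a, b) in enumerate(states):
    for j, (c, d) in enumerate(states):
        if c == b:
            T[i, j] = q[(a, b)] if d == 1 else 1 - q[(a, b)]

# Stationary distribution of the pair chain (left eigenvector for eigenvalue 1).
w, v = np.linalg.eig(T.T)
mu = np.real(v[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

rate = sum(mu[i] * h_b(q[s]) for i, s in enumerate(states))
print(dict(zip(states, mu.round(4))), round(rate, 4))
```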
ex-ch05-09
Medium
(Block Huffman.) For a binary memoryless source with $P(0) = 0.9$, $P(1) = 0.1$: (a) Compute $H(X)$. (b) What is the expected length of the symbol-by-symbol Huffman code? (c) Compute the expected length per symbol of the block-2 Huffman code (Huffman on pairs). (d) How does blocking improve the compression rate?
Block-2: the source has 4 "super-symbols" (00, 01, 10, 11) with probabilities $(0.81, 0.09, 0.09, 0.01)$.
Entropy
$H(X) = H_b(0.1) = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.469$ bits.
Symbol-by-symbol Huffman
Only 2 symbols: Huffman assigns 0 and 1. $\bar{L} = 1$ bit/symbol. Gap: $1 - 0.469 = 0.531$ bits!
Block-2 Huffman
Probabilities: $(0.81, 0.09, 0.09, 0.01)$. Huffman: merge 0.09 + 0.01 = 0.10, then 0.10 + 0.09 = 0.19, then 0.81 + 0.19 = 1.0. Lengths: (1, 2, 3, 3). $\bar{L}_2 = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3) = 1.29$ bits per pair. Per symbol: $1.29 / 2 = 0.645$ bits/symbol. Gap: $0.645 - 0.469 = 0.176$ bits.
Improvement
Blocking from $n = 1$ to $n = 2$ reduced the gap from 0.531 to 0.176 bits/symbol, a 67% reduction. The theoretical bound for block-$n$ Huffman is $H(X) \le \bar{L}_n / n < H(X) + 1/n$. For low-entropy sources like this one, blocking (or arithmetic coding) is essential.
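The per-symbol figures (1 bit and 0.645 bits) can be reproduced, and extended to block length 3, with a short sketch that scores Huffman codes by summing merge weights.

```python
import heapq
from itertools import product
from math import log2

def huffman_expected_length(probs):
    # Expected codeword length equals the total weight of all merge nodes.
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

p1 = 0.1                    # P(X = 1); P(X = 0) = 0.9
H = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
for n in (1, 2, 3):
    block_probs = [
        (1 - p1) ** block.count(0) * p1 ** block.count(1)
        for block in product((0, 1), repeat=n)
    ]
    per_symbol = huffman_expected_length(block_probs) / n
    print(n, round(per_symbol, 3), round(per_symbol - H, 3))
# n=1: 1.0 (gap 0.531), n=2: 0.645 (gap 0.176), n=3: the gap shrinks further
```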
ex-ch05-10
Medium
Prove that arithmetic coding uses at most $\lceil \log_2(1/p(x^n)) \rceil + 1$ bits to encode any sequence $x^n$. (Hint: show that a binary fraction with that many bits can uniquely identify a point in the interval assigned to $x^n$.)
A $k$-bit binary fraction has precision $2^{-k}$.
Show that among the fractions $j \cdot 2^{-k}$ for $j = 0, 1, \ldots, 2^k - 1$, at least one falls in the interval.
Interval width and precision
The interval for $x^n$ has width $p(x^n)$. The $k$-bit fractions have spacing $2^{-k}$. If $2^{-k} < p(x^n)$, then at least one fraction $j \cdot 2^{-k}$ falls in the interval.
Verify the bound
With $k = \lceil \log_2(1/p(x^n)) \rceil + 1$: $2^{-k} \le 2^{-\log_2(1/p(x^n)) - 1} = p(x^n)/2 < p(x^n)$.
So the spacing is strictly less than the interval width, guaranteeing at least one fraction in the interval. We use at most $\lceil \log_2(1/p(x^n)) \rceil + 1$ bits.
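A quick empirical check of the spacing argument, drawing random interval widths and positions (the specific ranges below are arbitrary test choices):

```python
import random
from math import ceil, floor, log2

random.seed(0)
for _ in range(100_000):
    p = random.uniform(1e-6, 1.0)        # interval width = sequence probability
    lo = random.uniform(0.0, 1.0 - p)    # interval [lo, lo + p) inside [0, 1)
    k = ceil(log2(1 / p)) + 1            # proposed number of bits
    j = floor(lo * 2 ** k) + 1           # smallest multiple of 2^-k strictly above lo
    assert lo < j * 2 ** -k < lo + p     # a k-bit fraction always lands in the interval
print("ok")
```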
ex-ch05-11
Hard
(Competitive optimality.) Prove that for the Shannon code with lengths $\ell(x) = \lceil \log_2(1/p(x)) \rceil$: $P\big(\ell(X) \ge \log_2(1/p(X)) + c\big) \le 2^{-(c-1)}$ for any $c > 0$. Interpret: the probability of using more than $c$ extra bits decays exponentially in $c$.
Note that $\ell(x) < \log_2(1/p(x)) + 1$ always, so $\ell(X) \ge \log_2(1/p(X)) + c$ is possible only for $c < 1$... unless we consider block codes.
For block coding of $n$ symbols, $\ell(x^n) = \lceil \log_2(1/p(x^n)) \rceil$, and the interesting event is that the per-symbol cost exceeds the entropy.
Per-symbol case
For a single symbol: $\ell(x) - \log_2(1/p(x)) < 1$, so the event is impossible for $c \ge 1$. For $0 < c < 1$, the bound is trivially true since $2^{-(c-1)} > 1$.
Block coding (more interesting)
For block-$n$: $\ell(x^n) = \lceil \log_2(1/p(x^n)) \rceil$. The event $\ell(X^n) \ge \log_2(1/p(X^n)) + c$ means $\lceil y \rceil - y \ge c$ with $y = \log_2(1/p(X^n))$; this requires the fractional part of $y$ to be at most $1 - c$, since $\lceil y \rceil - y = 1 - \{y\}$ for non-integer $y$, where $\{y\}$ denotes the fractional part. For the general bound, apply Markov's inequality to the random variable $2^{\ell(X^n) - \log_2(1/p(X^n))}$, whose mean is at most 2 because the exponent is always below 1: $P\big(\ell(X^n) \ge \log_2(1/p(X^n)) + c\big) \le 2 \cdot 2^{-c} = 2^{-(c-1)}$. For the statement that the per-symbol cost $\ell(X^n)/n$ rarely exceeds the entropy, a cleaner approach uses the AEP directly.
ex-ch05-12
Hard
(Huffman code for a geometric distribution.) Let $P(X = i) = (1-p)p^i$ for $i = 0, 1, 2, \ldots$. Show that the Huffman code assigns lengths $\ell_i = i + 1$ (unary coding) when $p \le 1/2$, and compute the expected length.
For $p = 1/2$: $P(X = i) = 2^{-(i+1)}$, which is dyadic.
Show that merging the two least probable symbols at each step yields a new geometric-like distribution.
Structure of Huffman for geometric
Since $P(X = i) = (1-p)p^i$ is decreasing in $i$ and $1 - p \ge p$ for $p \le 1/2$, symbol 0 is always at least as probable as all other symbols combined. Huffman pairs symbol 0 with the merged tail at the root, and the same structure repeats inside the tail.
Inductive argument
After merging all symbols $j > i$, the merged probability is $\sum_{j > i} (1-p)p^j = p^{i+1}$. At each step, we compare $P(X = i) = (1-p)p^i$ with the merged tail $p^{i+1}$. Since $(1-p)p^i \ge p^{i+1}$ iff $1 - p \ge p$ iff $p \le 1/2$, symbol $i$ is always merged with the tail, giving it one more bit than symbol $i-1$. This produces lengths $\ell_i = i + 1$.
Expected length
$\bar{L} = \sum_{i=0}^{\infty} (1-p)p^i\, (i+1) = \frac{1}{1-p}$. Entropy: $H(X) = \frac{H_b(p)}{1-p}$. The gap depends on $p$ and approaches 0 as $p \to 1/2$.
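A short numerical comparison of the unary expected length $\frac{1}{1-p}$ with the entropy $\frac{H_b(p)}{1-p}$, over a few illustrative values of $p$:

```python
from math import log2

def h_b(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.5, 0.4, 0.3, 0.2):
    L_bar = 1 / (1 - p)                                          # E[X + 1], unary code
    series = sum((1 - p) * p**i * (i + 1) for i in range(200))   # direct check of the sum
    H = h_b(p) / (1 - p)                                         # entropy of the geometric law
    print(p, round(L_bar, 4), round(series, 4), round(L_bar - H, 4))
# the gap is 0 at p = 0.5 and grows as p moves away from 1/2
```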
ex-ch05-13
Hard
(Entropy rate of a hidden Markov model.) Consider a binary Markov chain $(X_n)$ with states $\{0, 1\}$ and given transition probabilities. The observation $Y_n$ is a noisy version of $X_n$: $P(Y_n = X_n) = 1 - \epsilon$ and $P(Y_n \ne X_n) = \epsilon$. Explain why the entropy rate $\mathcal{H}(\mathcal{Y})$ does not have a simple closed form, and describe how to compute it numerically using the forward algorithm.
The observations form a hidden Markov model (HMM).
For HMMs, the entropy rate does not decompose as a simple weighted sum.
Why no closed form
For a first-order Markov chain, $\mathcal{H}(\mathcal{X}) = H(X_2 \mid X_1)$ because $H(X_n \mid X_{n-1}, \ldots, X_1) = H(X_n \mid X_{n-1})$. For an HMM, $(Y_n)$ is not Markov: $Y_n$ depends on all past observations through the hidden state $X_n$. The conditional entropy $H(Y_n \mid Y_{n-1}, \ldots, Y_1)$ does not simplify, because the posterior distribution of $X_n$ given $Y^{n-1}$ depends on the entire observation history.
Numerical computation via forward algorithm
The forward algorithm computes $\alpha_n(x) = P(y_1, \ldots, y_n, X_n = x)$ recursively: $\alpha_n(x) = \sum_{x'} \alpha_{n-1}(x')\, P(x \mid x')\, P(y_n \mid x)$. The entropy rate is $\mathcal{H}(\mathcal{Y}) = \lim_{n \to \infty} H(Y_n \mid Y^{n-1})$, which can be estimated by averaging $-\log_2 P(y_n \mid y^{n-1})$ over a long simulated sequence, where $P(y_n \mid y^{n-1})$ is computed from the normalized forward variables.
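A minimal simulation sketch of this procedure; the transition probabilities and noise level below are placeholder assumptions chosen only for illustration.

```python
import random
from math import log2

random.seed(1)
a, b, eps = 0.1, 0.2, 0.15        # placeholder P(1|0), P(0|1), flip probability
n = 200_000

# Simulate the hidden chain and its noisy observations.
x, ys = 0, []
for _ in range(n):
    if x == 0:
        x = 1 if random.random() < a else 0
    else:
        x = 0 if random.random() < b else 1
    ys.append(x ^ (random.random() < eps))

# Normalized forward recursion; accumulate -log2 P(y_n | y^{n-1}).
T = [[1 - a, a], [b, 1 - b]]                 # T[x'][x] = P(x | x')
alpha = [b / (a + b), a / (a + b)]           # start at the stationary distribution
total = 0.0
for y in ys:
    unnorm = [sum(alpha[xp] * T[xp][x] for xp in (0, 1)) * ((1 - eps) if x == y else eps)
              for x in (0, 1)]
    py = unnorm[0] + unnorm[1]               # P(y_n | y^{n-1})
    total += -log2(py)
    alpha = [u / py for u in unnorm]
print(total / n)                             # Monte Carlo estimate of the entropy rate of Y
```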
ex-ch05-14
Hard
(Redundancy of Huffman vs. arithmetic coding.) For a binary source with $P(1) = p$ and $p$ small (very low entropy), compute the Huffman redundancy and the arithmetic coding redundancy for block length $n$. Show that the Huffman redundancy can approach 1 bit per symbol, while the arithmetic coding redundancy vanishes.
For very low entropy, $H(X) = H_b(p) \to 0$ as $p \to 0$.
Huffman must use at least 1 bit per symbol.
Huffman redundancy
Huffman assigns lengths (1, 1) for a binary source, so $\bar{L} = 1$ bit/symbol. $H(X) = H_b(p) \approx p\log_2(1/p)$ for small $p$. Redundancy: $1 - H_b(p) \to 1$ bit as $p \to 0$. Almost the entire bit is wasted!
Arithmetic coding redundancy
Arithmetic coding on blocks of $n$: $\ell(x^n) < \log_2(1/p(x^n)) + 2$. Per symbol: $E[\ell(X^n)]/n < H(X) + 2/n$. Redundancy: at most $2/n \to 0$ as $n \to \infty$, regardless of how small $H(X)$ is.
For example, when $p$ is small enough that $H(X) \approx 0.01$ bits/symbol, Huffman still spends 1 bit/symbol (100x the entropy), while arithmetic coding with a moderately large block length spends about $0.01 + 2/n$ bits/symbol.
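A few illustrative numbers (the specific $p$ and $n$ values below are arbitrary choices, not the exercise's):

```python
from math import log2

def h_b(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.05, 0.01, 0.001):                  # illustrative values of P(1)
    H = h_b(p)
    huffman_redundancy = 1.0 - H               # 1 bit/symbol minus the entropy
    for n in (100, 1000):
        arithmetic_redundancy = 2.0 / n        # worst-case bits/symbol above entropy
        print(p, round(H, 4), round(huffman_redundancy, 4), n, arithmetic_redundancy)
```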
ex-ch05-15
Challenge
(Minimax redundancy.) Consider the class of all memoryless sources on a binary alphabet, parameterized by $\theta = P(X = 1) \in [0, 1]$. A universal code for blocks of length $n$ is designed without knowing $\theta$. The redundancy at parameter $\theta$ is $R_n(\theta) = E_\theta[\ell(X^n)] - nH(\theta)$. The minimax redundancy is $R_n^* = \min_{\text{codes}} \max_\theta R_n(\theta)$.
Show that $R_n^* = \frac{1}{2}\log_2 n + O(1)$, and that the optimal code uses a mixture (Jeffreys prior) over the source parameter.
This connects to the MDL (minimum description length) principle.
The optimal mixture assigns weight $w(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}}$ (the Jeffreys prior) to each source.
The redundancy of a mixture code at $\theta$ equals (up to rounding) $D\big(P_\theta^n \,\|\, Q_n\big)$, where $Q_n$ is the mixture distribution.
Mixture coding approach
A mixture code assigns length $\ell(x^n) = \lceil \log_2(1/Q_n(x^n)) \rceil$ to sequence $x^n$, where $Q_n(x^n) = \int_0^1 w(\theta)\, P_\theta(x^n)\, d\theta$ is the Bayesian predictive distribution. The redundancy at $\theta$ is (within 1 bit): $R_n(\theta) = E_\theta\!\left[\log_2 \frac{P_\theta(X^n)}{Q_n(X^n)}\right] = D\big(P_\theta^n \,\|\, Q_n\big)$.
Jeffreys prior is minimax optimal
By Laplace's method, the KL divergence is approximately $\frac{1}{2}\log_2\frac{n}{2\pi e} + \log_2\frac{\sqrt{I(\theta)}}{w(\theta)}$, where $I(\theta) = \frac{1}{\theta(1-\theta)}$ is the Fisher information. The maximum over $\theta$ is minimized when $w(\theta) \propto \sqrt{I(\theta)}$, i.e., $w(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}}$. This is the Jeffreys prior, and the resulting minimax redundancy is $\frac{1}{2}\log_2 n + O(1)$.
Interpretation
The $\frac{1}{2}\log_2 n$ term is the cost of "learning" one real parameter from $n$ samples. More generally, for a $k$-parameter source family, the minimax redundancy is $\frac{k}{2}\log_2 n + O(1)$, and the Jeffreys prior is asymptotically minimax optimal. This connects information theory to the MDL (minimum description length) principle in statistics.
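The $\frac{1}{2}\log_2 n$ behavior can be seen numerically with the Krichevsky–Trofimov estimator, the sequential form of the Jeffreys mixture for a Bernoulli source; the sketch below computes its expected redundancy exactly by summing over the number of ones.

```python
from math import comb, log2

def log2_kt_prob(k, n):
    # log2 of the KT (Jeffreys-mixture) probability of one sequence with k ones in n bits.
    lp = 0.0
    for i in range(k):                 # feed the k ones first (the order does not matter)
        lp += log2((i + 0.5) / (i + 1))
    for j in range(n - k):             # then the n - k zeros
        lp += log2((j + 0.5) / (k + j + 1))
    return lp

def kt_redundancy(theta, n):
    # E_theta[log2 P_theta(X^n) - log2 Q_n(X^n)] = expected extra bits over the entropy.
    r = 0.0
    for k in range(n + 1):
        pk = comb(n, k) * theta**k * (1 - theta) ** (n - k)
        if pk > 0.0:
            log2_p_seq = k * log2(theta) + (n - k) * log2(1 - theta)
            r += pk * (log2_p_seq - log2_kt_prob(k, n))
    return r

for n in (10, 100, 1000):
    worst = max(kt_redundancy(t, n) for t in (0.1, 0.3, 0.5, 0.7, 0.9))
    print(n, round(worst, 3), round(0.5 * log2(n), 3))   # tracks (1/2) log2 n + O(1)
```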
ex-ch05-16
Challenge
(LZ78 lower bound on phrase count.) Prove that for a binary i.i.d. source with entropy $H > 0$, the LZ78 phrase count $c(n)$ after parsing $n$ symbols satisfies $c(n) \ge (1-\epsilon)\frac{nH}{\log_2 n}$ almost surely for large $n$, for any $\epsilon > 0$. Deduce that the average phrase length is at most $\frac{\log_2 n}{(1-\epsilon)H}$.
Each of the $c(n)$ phrases is distinct, and their total length is $n$.
If the average phrase length is $\bar{\ell} = n/c(n)$, compare $c(n)$ with the number of distinct binary strings of length at most $\bar{\ell}$ (fewer than $2^{\bar{\ell}+1}$).
Phrase length constraint
The phrases are all distinct and have total length $n$. The number of distinct nonempty binary strings of length at most $l$ is $2^{l+1} - 2 < 2^{l+1}$. If the longest phrase has length at most $l$, then $c(n) < 2^{l+1}$.
Average phrase length
The average phrase length is $\bar{\ell} = n/c(n)$. Since all phrases are distinct and each phrase extends a previously seen phrase by one symbol, the phrase lengths grow slowly. By the counting above, if $c(n)$ distinct phrases fill $n$ positions, then $n \ge \sum_l l\, c_l$ with at most $2^l$ phrases of each length $l$, which gives $\bar{\ell} \ge (1 - o(1)) \log_2 c(n)$.
Lower bound on $c(n)$
The LZ78 description of $x^n$ spends about $\log_2 c(n) + 1$ bits per phrase (a pointer to an earlier phrase plus one new symbol), so its total length is about $c(n)\big(\log_2 c(n) + 1\big)$ bits, and the map from sequences to descriptions is one-to-one. Fewer than $2^{m+1}$ sequences can receive descriptions of at most $m$ bits. By the AEP, almost surely $p(X^n) \le 2^{-n(H - \delta)}$ for all large $n$, so the probability that a typical $X^n$ is describable in at most $(1-\epsilon)nH$ bits is at most $2^{(1-\epsilon)nH + 1}\, 2^{-n(H-\delta)}$, which is summable in $n$ for small enough $\delta$. By Borel–Cantelli, almost surely $c(n)\big(\log_2 c(n) + 1\big) \ge (1-\epsilon)nH$ for all large $n$. Since $\log_2 c(n) \le \log_2 n$, rearranging gives $c(n) \ge (1-\epsilon)\frac{nH}{\log_2 n + 1}$, and hence the average phrase length satisfies $n/c(n) \le \frac{\log_2 n + 1}{(1-\epsilon)H}$ almost surely.
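An empirical sanity check: parse a simulated Bernoulli($p$) sequence with LZ78 and compare $c(n)$ against $nH/\log_2 n$ and $n/\log_2 n$ (the value $p = 0.2$ is an arbitrary illustration).

```python
import random
from math import log2

def lz78_phrase_count(bits):
    # Number of phrases in the LZ78 incremental parsing of a bit sequence.
    seen = {(): 0}
    phrase = ()
    count = 0
    for bit in bits:
        phrase = phrase + (bit,)
        if phrase not in seen:
            seen[phrase] = len(seen)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)   # count a trailing partial phrase, if any

random.seed(0)
p = 0.2                                   # illustrative Bernoulli parameter
H = -p * log2(p) - (1 - p) * log2(1 - p)
for n in (10_000, 100_000, 1_000_000):
    bits = [1 if random.random() < p else 0 for _ in range(n)]
    c = lz78_phrase_count(bits)
    print(n, c, round(n * H / log2(n)), round(n / log2(n)))
# c(n) falls roughly between nH/log2(n) and n/log2(n), consistent with the bounds above
```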