Exercises

ex-ch05-01

Easy

Verify Kraft's inequality for the code $\{0, 10, 110, 111\}$ and determine if there is room to add another codeword.
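A one-line numerical check (a sketch, not part of the exercise) makes the arithmetic concrete:

```python
# Kraft sum for the code {0, 10, 110, 111} over a binary alphabet (D = 2).
code = ["0", "10", "110", "111"]
kraft_sum = sum(2 ** -len(w) for w in code)
print(kraft_sum)  # 0.5 + 0.25 + 0.125 + 0.125 = 1.0: the inequality is tight
```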

ex-ch05-02

Easy

A source has four symbols with probabilities $(0.5, 0.25, 0.125, 0.125)$. Construct the Huffman code and verify that it achieves $L^* = H(X)$.
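For readers who want to check their tree by machine, here is a minimal sketch in Python; `huffman_lengths` is an illustrative helper, not a reference implementation:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tie-break, symbols)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every symbol under the merge gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

P = [0.5, 0.25, 0.125, 0.125]
L = huffman_lengths(P)                      # expect [1, 2, 3, 3]
avg = sum(p * l for p, l in zip(P, L))
H = -sum(p * log2(p) for p in P)
print(L, avg, H)                            # avg = H = 1.75 (dyadic source)
```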

ex-ch05-03

Easy

Compute the Shannon code (codeword lengths and expected length) for $P = (0.4, 0.3, 0.2, 0.1)$. Compare with the entropy.
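The Shannon lengths $\lceil -\log_2 p_i \rceil$ are easy to tabulate by hand or by machine; a small sketch, assuming base-2 logarithms:

```python
from math import ceil, log2

P = [0.4, 0.3, 0.2, 0.1]
lengths = [ceil(-log2(p)) for p in P]   # Shannon lengths: ceil(-log2 p_i)
L = sum(p * l for p, l in zip(P, lengths))
H = -sum(p * log2(p) for p in P)
print(lengths, L, H)                    # [2, 2, 3, 4], L = 2.4, H ~ 1.846
```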

ex-ch05-04

Easy

A binary Markov chain has $P(0|0) = 0.95$, $P(1|0) = 0.05$, $P(0|1) = 0.4$, $P(1|1) = 0.6$. Find the stationary distribution and the entropy rate.
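A sketch of the computation, assuming the two-state balance equation $\pi_0 P(1|0) = \pi_1 P(0|1)$:

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

a = 0.05            # a = P(1|0)
b = 0.4             # b = P(0|1)
pi0 = b / (a + b)   # balance across the cut: pi0 * a = pi1 * b
pi1 = a / (a + b)
rate = pi0 * h2(a) + pi1 * h2(1 - b)  # row entropies; h2(1 - b) = h2(P(1|1))
print((pi0, pi1), rate)               # (8/9, 1/9), rate ~ 0.363 bits/symbol
```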

ex-ch05-05

Medium

Prove McMillan's inequality: if a code (not necessarily a prefix code) is uniquely decodable, then $\sum_{i=1}^m D^{-\ell_i} \leq 1$. Use the "power trick": expand $\left(\sum_i D^{-\ell_i}\right)^k$ and bound the result.

ex-ch05-06

Medium

A source has 8 equiprobable symbols. Compare the expected codeword lengths of (a) a fixed-length code, (b) the Huffman code, and (c) the Shannon code. What do you observe?
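A compact check, using the standard fact that the expected Huffman length equals the sum of the merged-node probabilities (each merge adds one bit to every symbol beneath it); `huffman_avg_len` is an illustrative helper:

```python
import heapq
from math import ceil, log2

def huffman_avg_len(probs):
    # Expected Huffman length = sum of all merged-node probabilities.
    heap = sorted(probs)   # a sorted list is a valid min-heap
    total = 0.0
    while len(heap) > 1:
        p = heapq.heappop(heap) + heapq.heappop(heap)
        total += p
        heapq.heappush(heap, p)
    return total

P = [1 / 8] * 8
print(log2(len(P)),                        # (a) fixed-length: 3 bits
      huffman_avg_len(P),                  # (b) Huffman: 3.0 bits
      sum(p * ceil(-log2(p)) for p in P))  # (c) Shannon: 3.0 bits
```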

ex-ch05-07

Medium

A code is designed for distribution $Q = (0.5, 0.25, 0.25)$, but the true source distribution is $P = (0.7, 0.2, 0.1)$. Compute: (a) the expected codeword length under $P$, (b) the entropy $H(P)$, and (c) the mismatch penalty $D(P \| Q)$. Verify that $L = H(P) + D(P \| Q) + \delta$ where $0 \leq \delta < 1$.
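Since $Q$ is dyadic, the designed lengths are exactly $-\log_2 q_i$, which makes the identity easy to verify numerically; a sketch:

```python
from math import log2

P = [0.7, 0.2, 0.1]
Q = [0.5, 0.25, 0.25]
lengths = [-log2(q) for q in Q]   # Q is dyadic, so lengths are integers
L = sum(p * l for p, l in zip(P, lengths))
H = -sum(p * log2(p) for p in P)
D = sum(p * log2(p / q) for p, q in zip(P, Q))
print(L, H, D, H + D)  # L = 1.3, H ~ 1.157, D ~ 0.143; here delta = 0
```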

ex-ch05-08

Medium

Show that the entropy rate of a second-order binary Markov chain (where $X_n$ depends on $X_{n-1}$ and $X_{n-2}$) can be written as $H_\infty = \sum_{a,b \in \{0,1\}} \pi(a,b)\, H(X_n \mid X_{n-1} = b, X_{n-2} = a)$, where $\pi(a,b)$ is the stationary distribution of the pair $(X_{n-2}, X_{n-1})$.

ex-ch05-09

Medium

(Block Huffman.) For a binary memoryless source with $P(0) = 0.9$, $P(1) = 0.1$: (a) Compute $H(X)$. (b) What is the expected length of the symbol-by-symbol Huffman code? (c) Compute the expected length per symbol of the block-2 Huffman code (Huffman on pairs). (d) How does blocking improve the compression rate?
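A sketch for parts (a)–(d), reusing the merged-node identity for the expected Huffman length (`huffman_avg_len` is an illustrative helper):

```python
import heapq
from math import log2
from itertools import product

def huffman_avg_len(probs):
    # Expected Huffman length = sum of all merged-node probabilities.
    heap = sorted(probs)
    total = 0.0
    while len(heap) > 1:
        p = heapq.heappop(heap) + heapq.heappop(heap)
        total += p
        heapq.heappush(heap, p)
    return total

p = 0.9
H = -(p * log2(p) + (1 - p) * log2(1 - p))             # (a) H(X) ~ 0.469 bits
L1 = huffman_avg_len([p, 1 - p])                       # (b) 1 bit/symbol
pairs = [q1 * q2 for q1, q2 in product([p, 1 - p], repeat=2)]
L2 = huffman_avg_len(pairs) / 2                        # (c) 0.645 bits/symbol
print(H, L1, L2)                                       # (d) blocking closes the gap
```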

ex-ch05-10

Medium

Prove that arithmetic coding uses at most $\lceil -\log_2 P(\mathbf{x}) \rceil + 1$ bits to encode any sequence $\mathbf{x}$. (Hint: show that a binary fraction with $k = \lceil -\log_2 P(\mathbf{x}) \rceil + 1$ bits can uniquely identify a point in the interval $[F(\mathbf{x}), F(\mathbf{x}) + P(\mathbf{x}))$.)
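The hint can be illustrated numerically: truncating the midpoint $F + P/2$ to $k$ bits loses less than $2^{-k} \leq P/2$, so the truncation stays inside the interval. A sketch with made-up values for $F$ and $P(\mathbf{x})$:

```python
from math import ceil, log2

def truncate_to_k_bits(z, k):
    """Keep only the first k binary-fraction digits of z in [0, 1)."""
    return int(z * 2 ** k) / 2 ** k

F, Px = 0.3141, 0.0225            # arbitrary interval [F, F + Px)
k = ceil(-log2(Px)) + 1           # here k = 7, so 2**-k <= Px / 2
z = truncate_to_k_bits(F + Px / 2, k)
# Truncation loses < 2**-k <= Px/2, so z cannot drop below F.
print(k, F <= z < F + Px)         # 7 True
```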

ex-ch05-11

Hard

(Competitive optimality.) Let the Shannon code have lengths $\ell(x) = \lceil -\log P(x) \rceil$, and let $\ell'(x)$ be the lengths of any other uniquely decodable code. Prove that $P(\ell(X) \geq \ell'(X) + c) \leq 2^{-c+1}$ for any $c > 0$. Interpret: the probability that the Shannon code uses $c$ or more bits beyond any competitor decays exponentially in $c$.

ex-ch05-12

Hard

(Huffman code for geometric distribution.) Let $X \in \{0, 1, 2, \ldots\}$ with $P(k) = (1-p)p^k$ for $0 < p < 1$. Show that the Huffman code assigns lengths $\ell_k = k + 1$ (unary coding) when $p \leq 1/2$, and compute the expected length.
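Assuming the claimed unary lengths, the expected length sums to $1/(1-p)$; a quick series check:

```python
# E[l] = sum over k of (k + 1)(1 - p) p^k = 1 / (1 - p).
p = 0.4
E = sum((k + 1) * (1 - p) * p ** k for k in range(200))  # truncated series
print(E, 1 / (1 - p))                                    # both ~ 1.6667
```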

ex-ch05-13

Hard

(Entropy rate of a hidden Markov model.) Consider a binary Markov chain $(S_n)$ with states $\{0, 1\}$ and transition probabilities $P(0|0) = 0.8$, $P(1|1) = 0.9$. The observation $X_n$ is a noisy version of $S_n$: $P(X_n = S_n) = 0.95$, $P(X_n \neq S_n) = 0.05$. Explain why $H_\infty(X)$ does not have a simple closed form, and describe how to compute it numerically using the forward algorithm.
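A minimal numerical sketch of the suggested approach: simulate a long observation sequence and evaluate $-\frac{1}{n}\log_2 P(x_1^n)$ with the normalized forward recursion (by Shannon–McMillan–Breiman this converges to $H_\infty(X)$ for this stationary ergodic process). All variable names are illustrative:

```python
import random
from math import log2

T = [[0.8, 0.2],                 # hidden-chain transitions: P(0|0) = 0.8,
     [0.1, 0.9]]                 #                           P(1|1) = 0.9
EPS = 0.05                       # observation flip probability

random.seed(0)
n = 200_000
s, xs = 0, []
for _ in range(n):               # simulate hidden states and noisy observations
    s = 0 if random.random() < T[s][0] else 1
    xs.append(s if random.random() < 1 - EPS else 1 - s)

alpha = [0.5, 0.5]               # filtered distribution of the hidden state
loglik = 0.0
for x in xs:                     # normalized forward recursion
    pred = [sum(alpha[j] * T[j][t] for j in (0, 1)) for t in (0, 1)]
    unnorm = [pred[t] * (1 - EPS if t == x else EPS) for t in (0, 1)]
    px = sum(unnorm)             # P(x_t | x_1 ... x_{t-1})
    loglik += log2(px)
    alpha = [u / px for u in unnorm]

print(-loglik / n)               # ~ H_inf(X) in bits per symbol
```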

ex-ch05-14

Hard

(Redundancy of Huffman vs. arithmetic coding.) For a source with $P(0) = 1 - 2^{-k}$, $P(1) = 2^{-k}$ (very low entropy), compute the Huffman redundancy $L_{\text{Huffman}} - H(X)$ and the arithmetic-coding redundancy $L_{\text{arith}}/n - H(X)$ for block length $n$. Show that the Huffman redundancy can approach 1 bit per symbol, while the arithmetic-coding redundancy vanishes as $n \to \infty$.
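A sketch of the comparison (the $2/n$ per-symbol arithmetic-coding bound is taken from ex-ch05-10):

```python
from math import log2

# A two-symbol Huffman code always spends exactly 1 bit per symbol, so its
# per-symbol redundancy is 1 - H(X); arithmetic coding spends at most
# ceil(-log2 P(x)) + 1 bits per block of n symbols, i.e. <= 2/n extra bits
# per symbol.
n = 1000
for k in (2, 4, 8):
    p1 = 2.0 ** -k
    H = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))
    print(k, 1 - H, 2 / n)   # Huffman redundancy -> 1 bit; arithmetic -> 0
```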

ex-ch05-15

Challenge

(Minimax redundancy.) Consider the class of all memoryless sources on a binary alphabet, parameterized by $p \in (0, 1)$. A universal code $c$ is designed without knowing $p$. The redundancy at parameter $p$ is $r(p) = \frac{1}{n} L_c(p) - H(p)$, and the minimax redundancy is $R^*_n = \min_c \max_p r(p)$.

Show that $R^*_n = \frac{1}{2n}\log\bigl(n\pi/(2e)\bigr) + o(1/n)$, and that the optimal code uses a mixture (Jeffreys prior) over the source parameter.

ex-ch05-16

Challenge

(LZ78 lower bound on phrase count.) Prove that for a binary i.i.d. source with entropy $H(p)$, the LZ78 phrase count after parsing $n$ symbols satisfies $c(n) \geq \frac{n H(p)(1-\epsilon)}{\log n}$ almost surely for large $n$ and any $\epsilon > 0$. Deduce that the average phrase length is at most $\frac{\log n}{H(p)} + o(\log n)$.
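For intuition, one can parse simulated fair-coin data ($H = 1$, so the bound reads $c(n) \gtrsim n/\log n$) and compare; convergence is slow, so expect only rough agreement. The parser below is an illustrative sketch of incremental (LZ78) parsing:

```python
import random
from math import log2

def lz78_phrase_count(bits):
    """Number of phrases in the incremental (LZ78) parsing of `bits`."""
    seen = {""}
    phrase = ""
    count = 0
    for b in bits:
        phrase += b
        if phrase not in seen:      # shortest new phrase: record and restart
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)   # possible partial final phrase

random.seed(1)
n = 1_000_000
bits = "".join(random.choice("01") for _ in range(n))  # fair-coin source, H = 1
print(lz78_phrase_count(bits), n / log2(n))            # same order of magnitude
```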