Exercises
ex-ch04-01
Easy. Let $\mathcal{X} = \{a, b, c\}$ and $n = 3$. List all types in $\mathcal{P}_3(\mathcal{X})$, compute $|T(P)|$ for each type $P$, and verify that $\sum_{P \in \mathcal{P}_3(\mathcal{X})} |T(P)| = 3^3 = 27$.
A type on a ternary alphabet with $n = 3$ is a triple $(n_a/3, n_b/3, n_c/3)$ with $n_a + n_b + n_c = 3$.
The number of types is $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} = \binom{5}{2} = 10$.
Use the multinomial coefficient $|T(P)| = \frac{3!}{n_a!\, n_b!\, n_c!}$.
Enumerate types
The 10 types correspond to the count vectors $(n_a, n_b, n_c)$ with $n_a + n_b + n_c = 3$: the permutations of $(3, 0, 0)$: 3 types with all mass on one symbol; the permutations of $(2, 1, 0)$: 6 types with two symbols; $(1, 1, 1)$: 1 type, the uniform distribution.
Compute type class sizes
- For counts $(3, 0, 0)$: $|T(P)| = \frac{3!}{3!\,0!\,0!} = 1$ (and similarly for the other degenerate types), total: $3 \times 1 = 3$.
- For counts $(2, 1, 0)$: $|T(P)| = \frac{3!}{2!\,1!\,0!} = 3$ (and similarly for the other two-symbol types), total: $6 \times 3 = 18$.
- For counts $(1, 1, 1)$: $|T(P)| = \frac{3!}{1!\,1!\,1!} = 6$, total: $1 \times 6 = 6$.
- Sum: $3 + 18 + 6 = 27 = 3^3$. ✓
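The count is easy to check by machine. A minimal Python sketch (an illustration added here, not part of the original solution):

```python
from math import factorial

n = 3

# All count vectors (n_a, n_b, n_c) with n_a + n_b + n_c = 3.
types = [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]
assert len(types) == 10  # binomial(5, 2) = 10 types

def type_class_size(counts):
    """Multinomial coefficient 3! / (n_a! n_b! n_c!)."""
    size = factorial(n)
    for c in counts:
        size //= factorial(c)
    return size

assert sum(type_class_size(t) for t in types) == 3**n  # 27
```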
ex-ch04-02
Easy. Show that for a binary alphabet and the uniform distribution $Q = (1/2, 1/2)$, the type class $T(P)$ with $P = (k/n, 1 - k/n)$ has $Q^n(T(P)) = \binom{n}{k} 2^{-n}$. Verify that $\sum_{k=0}^{n} \binom{n}{k} 2^{-n} = 1$.
Under the uniform distribution, every binary sequence of length $n$ has probability $2^{-n}$.
The type class $T(P)$ with $P = (k/n, 1 - k/n)$ contains exactly $\binom{n}{k}$ sequences.
Direct computation
Under $Q$, $Q^n(\mathbf{x}) = 2^{-n}$ for every $\mathbf{x} \in \{0, 1\}^n$. The type class has $\binom{n}{k}$ sequences (choosing which $k$ positions have symbol 0). Therefore: $Q^n(T(P)) = \binom{n}{k} 2^{-n}$.
Verification
$\sum_{k=0}^{n} Q^n(T(P_k)) = 2^{-n} \sum_{k=0}^{n} \binom{n}{k} = 2^{-n} \cdot 2^n = 1.$ ✓
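A quick numeric check of the same identity (a sketch; $n = 12$ is an arbitrary choice):

```python
from math import comb

n = 12
total = sum(comb(n, k) * 2**(-n) for k in range(n + 1))
assert abs(total - 1.0) < 1e-12  # sum_k C(n,k) 2^{-n} = 1
```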
ex-ch04-03
Easy. Let $Q = (0.7, 0.3)$ on $\mathcal{X} = \{0, 1\}$. Compute $D(P \| Q)$ for $P = (0.5, 0.5)$ and interpret the result using the type class probability formula.
$D(P \| Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$.
The result tells you the exponential rate at which $Q^n(T(P))$ decays.
Compute KL divergence
$D(P \| Q) = 0.5 \log_2 \frac{0.5}{0.7} + 0.5 \log_2 \frac{0.5}{0.3} = 0.5(-0.485) + 0.5(0.737) \approx 0.126 \text{ bits}.$
Interpret
This means $Q^n(T(P)) \doteq 2^{-0.126 n}$. For $n = 100$, the probability of seeing exactly 50 zeros and 50 ones when the true probability of 0 is 0.7 is approximately $2^{-12.6} \approx 1.6 \times 10^{-4}$ to first exponential order. The KL divergence of 0.126 bits is small because $P$ is not too far from $Q$.
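A short Python check comparing the exponential approximation $2^{-n D(P\|Q)}$ with the exact type class probability (illustration only; the gap is the polynomial Stirling factor):

```python
from math import comb, log2

# Q = (0.7, 0.3), P = (0.5, 0.5)
D = 0.5 * log2(0.5 / 0.7) + 0.5 * log2(0.5 / 0.3)  # ~0.1258 bits

n, k = 100, 50
exact = comb(n, k) * 0.7**k * 0.3**(n - k)  # Q^n(T(P)) exactly
approx = 2 ** (-n * D)                      # first-order exponential estimate
print(D, exact, approx)  # same exponential order; constants differ
```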
ex-ch04-04
Easy. For the set $\mathcal{E} = \{Q : Q(0) \geq 0.8\}$ and true distribution $P = (1/2, 1/2)$, find $Q^* = \arg\min_{Q \in \mathcal{E}} D(Q \| P)$ and state the asymptotic probability $P^n(\hat{P}_\mathbf{X} \in \mathcal{E})$.
The I-projection onto $\mathcal{E}$ is the boundary point $Q^* = (0.8, 0.2)$.
Use the formula for binary KL divergence, $d(q \| p) = q \log_2 \frac{q}{p} + (1-q) \log_2 \frac{1-q}{1-p}$.
Find the I-projection
Since $P$ is uniform, $D(Q \| P) = 1 - H(Q)$ for any binary $Q$. The minimum over $\mathcal{E}$ is achieved at the boundary point $Q^* = (0.8, 0.2)$, which has the highest entropy in $\mathcal{E}$.
Compute the exponent
$D(Q^* \| P) = 1 - H(0.8) \approx 1 - 0.722 = 0.278$ bits, so by Sanov's theorem $P^n(\hat{P}_\mathbf{X} \in \mathcal{E}) \doteq 2^{-0.278n}$.
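A two-line check of the exponent (illustration):

```python
from math import log2

def H(q):  # binary entropy in bits
    return -q * log2(q) - (1 - q) * log2(1 - q)

print(1 - H(0.8))  # D(Q* || P) ~ 0.278 bits
```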
ex-ch04-05
Medium. Prove that $Q^n(T(P)) \doteq 1$ (i.e., the type class probability decays subexponentially) if and only if $P = Q$, using only the type class probability formula $Q^n(T(P)) \doteq 2^{-n D(P \| Q)}$ and the fact that $Q^n(T(Q))$ is the maximum over all type class probabilities.
If $P \neq Q$, then $Q^n(T(P))$ decays exponentially while $Q^n(T(Q))$ stays polynomially bounded below.
Use the fact that $Q^n(T(Q)) \geq (n+1)^{-|\mathcal{X}|}$ (since the type equal to $Q$ has KL divergence 0, and the at most $(n+1)^{|\mathcal{X}|}$ type classes cover everything).
Forward direction
Suppose $P \neq Q$. Then $D(P \| Q) > 0$, so $Q^n(T(P)) \leq 2^{-n D(P \| Q)} \to 0$ exponentially. But $Q^n(T(Q)) \geq (n+1)^{-|\mathcal{X}|}$, so for large $n$, $Q^n(T(P)) < Q^n(T(Q))$, confirming $Q^n(T(P)) \not\doteq 1$. (This is the contrapositive of the forward direction.)
Backward direction
If $P = Q$, then every sequence in $T(P)$ has the same probability $2^{-n H(P)}$, and $|T(P)| \doteq 2^{n H(P)}$. The exponent is $D(P \| Q) = 0$, so $Q^n(T(P)) \doteq 1$.
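A numeric illustration of the two behaviors (a sketch; the choices $Q = (1/2, 1/2)$ and $P = (0.3, 0.7)$ are arbitrary):

```python
from math import comb

for n in (10, 100, 1000):
    t_Q = comb(n, n // 2) * 2.0**(-n)        # Q^n(T(Q)): stays above (n+1)^{-2}
    t_P = comb(n, int(0.3 * n)) * 2.0**(-n)  # Q^n(T(P)), P != Q: decays exponentially
    print(n, t_Q, (n + 1) ** -2, t_P)
```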
ex-ch04-06
Medium. Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli($p$) with $p < 1/2$. Use Sanov's theorem to find the exponential rate at which $\Pr\big(\tfrac{1}{n}\sum_{i=1}^n X_i \geq \tfrac{1}{2}\big)$ decays. Compare with the Chernoff bound.
The set is $\mathcal{E} = \{Q : Q(1) \geq 1/2\}$, and the I-projection is the boundary point $Q^* = (1/2, 1/2)$.
The Chernoff bound for $\Pr\big(\sum_i X_i \geq n/2\big)$ uses the moment generating function $M(\lambda) = \mathbb{E}[e^{\lambda X_1}] = 1 - p + p e^{\lambda}$.
Sanov exponent
By Sanov's theorem, the exponent is $D(Q^* \| P) = d\big(\tfrac{1}{2} \,\big\|\, p\big) = \tfrac{1}{2}\log_2\tfrac{1/2}{p} + \tfrac{1}{2}\log_2\tfrac{1/2}{1-p} = -1 - \tfrac{1}{2}\log_2\big(p(1-p)\big)$ bits.
So $\Pr\big(\tfrac{1}{n}\sum_i X_i \geq \tfrac{1}{2}\big) \doteq 2^{-n\, d(1/2 \| p)}$.
Chernoff bound
The Chernoff exponent (in nats) is $\sup_{\lambda \geq 0}\big[\tfrac{\lambda}{2} - \ln M(\lambda)\big]$, where $M(\lambda) = 1 - p + p e^{\lambda}$. Optimizing: $\tfrac{1}{2} = \frac{p e^{\lambda^*}}{1 - p + p e^{\lambda^*}}$, giving $e^{\lambda^*} = \frac{1-p}{p}$ and $M(\lambda^*) = 2(1-p)$, so the exponent is $\tfrac{1}{2}\ln\tfrac{1-p}{p} - \ln\big(2(1-p)\big) = -\ln 2 - \tfrac{1}{2}\ln\big(p(1-p)\big)$. Converting to bits: $-1 - \tfrac{1}{2}\log_2\big(p(1-p)\big)$ bits. The exponents match!
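A numeric check that the two exponents coincide (illustration; the values of $p$ are arbitrary):

```python
from math import log, log2

def sanov_bits(p):       # d(1/2 || p) in bits
    return 0.5 * log2(0.5 / p) + 0.5 * log2(0.5 / (1 - p))

def chernoff_bits(p):    # sup_lambda [lambda/2 - ln M(lambda)], converted to bits
    lam = log((1 - p) / p)               # optimizer: e^lambda = (1 - p) / p
    return (lam / 2 - log(2 * (1 - p))) / log(2)

for p in (0.1, 0.3, 0.45):
    print(p, sanov_bits(p), chernoff_bits(p))  # the two columns agree
```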
Connection
This is not a coincidence: for i.i.d. binary random variables, Sanov's theorem and the Chernoff bound give the same exponent. Sanov is more general (it works for any set of distributions on any finite alphabet), while Chernoff is specific to tail probabilities. Both are manifestations of the Cramér large deviation principle.
ex-ch04-07
Medium. Show that the source coding error exponent $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$ is a convex function of $R$ for $R \geq H(P)$, and that $E_s(R) \to D(U \| P)$ as $R \to \log|\mathcal{X}|$, where $U$ is the uniform distribution on $\mathcal{X}$.
Convexity of $E_s$ follows from convexity of $D(\cdot \| P)$ together with concavity of $H$: mixtures of feasible minimizers remain feasible at the mixed rate.
At $R = \log|\mathcal{X}|$, the constraint $H(Q) \geq R$ forces $Q = U$.
Convexity
The constraint $H(Q) \geq R$ defines a feasible set that shrinks as $R$ increases, so $E_s(R)$, the minimum of $D(Q \| P)$ over a shrinking feasible set, is non-decreasing in $R$. For convexity, let $Q_1, Q_2$ achieve $E_s(R_1), E_s(R_2)$ and set $Q_\lambda = \lambda Q_1 + (1-\lambda) Q_2$. Concavity of entropy gives $H(Q_\lambda) \geq \lambda R_1 + (1-\lambda) R_2$, so $Q_\lambda$ is feasible at rate $R_\lambda = \lambda R_1 + (1-\lambda) R_2$; convexity of $D(\cdot \| P)$ then gives $E_s(R_\lambda) \leq D(Q_\lambda \| P) \leq \lambda E_s(R_1) + (1-\lambda) E_s(R_2)$. This instantiates the general principle that the value function of a convex program, with the parameter entering the constraint linearly, is convex.
Limiting behavior
At $R = \log|\mathcal{X}|$, the constraint forces $Q = U$ (the uniform distribution), since $H(Q) \leq \log|\mathcal{X}|$ with equality iff $Q = U$. Therefore $E_s(\log|\mathcal{X}|) = D(U \| P)$.
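A brute-force scan of $E_s(R)$ for a binary source, confirming monotonicity, convexity, and the limit $D(U \| P)$ (a sketch; the source $P = (0.2, 0.8)$ is an assumed example):

```python
from math import log2

p = 0.2  # source P = (p, 1 - p); H(P) ~ 0.722 bits

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

def d(q, r):  # binary KL divergence in bits
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

def Es(R, N=10000):
    return min(d(k / N, p) for k in range(1, N) if H(k / N) >= R)

for R in (0.75, 0.85, 0.95, 1.0):
    print(R, Es(R))  # non-decreasing and convex; Es(1.0) = d(0.5, p) = D(U || P)
```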
ex-ch04-08
Medium. For a BSC with crossover probability $\delta$, the random coding exponent with the optimal (uniform) input distribution is $E_r(R) = \max_{0 \leq \rho \leq 1}\big[E_0(\rho) - \rho R\big]$, where $E_0(\rho) = \rho - (1+\rho)\log_2\big(\delta^{1/(1+\rho)} + (1-\delta)^{1/(1+\rho)}\big)$ (in bits). Show that $E_r(C) = 0$ and $E_r(0) = E_0(1)$.
At $R = C$, the maximum over $\rho$ is achieved at $\rho = 0$.
At $R = 0$, the maximum is achieved at $\rho = 1$.
$E_r(C) = 0$
$E_r(C) = \max_{0 \leq \rho \leq 1}\big[E_0(\rho) - \rho C\big]$. Since $E_0(0) = 0$ and $E_0'(0) = C$ (for uniform input on a symmetric channel, the slope of $E_0$ at $\rho = 0$ is the mutual information, i.e., the capacity), the function $f(\rho) = E_0(\rho) - \rho C$ has value 0 at $\rho = 0$ and derivative 0 at $\rho = 0$. Since $E_0$ is concave in $\rho$, $f$ is concave with maximum at $\rho = 0$, giving $E_r(C) = f(0) = 0$.
$E_r(0) = E_0(1)$
$E_r(0) = \max_{0 \leq \rho \leq 1} E_0(\rho)$. Since $E_0$ is concave and increasing (its derivative at any $\rho$ is positive, being a Rényi-type mutual information), the maximum on $[0, 1]$ is at $\rho = 1$: $E_r(0) = E_0(1)$. This is called the zero-rate exponent and represents the best reliability achievable when we allow only a polynomial number of codewords.
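A numeric verification of both endpoint claims for an assumed crossover probability $\delta = 0.1$ (illustration only):

```python
from math import log2

delta = 0.1

def E0(rho):
    s = 1.0 / (1.0 + rho)
    return rho - (1 + rho) * log2(delta**s + (1 - delta)**s)

def Er(R, steps=2000):
    return max(E0(k / steps) - (k / steps) * R for k in range(steps + 1))

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

C = 1 - H(delta)          # BSC capacity with uniform input
print(Er(C))              # ~0 up to grid resolution: E_r(C) = 0
print(Er(0.0), E0(1))     # equal: E_r(0) = E_0(1)
```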
ex-ch04-09
Medium. Consider a ternary symmetric channel with $\mathcal{X} = \mathcal{Y} = \{0, 1, 2\}$ and transition probabilities $W(y|x) = 1 - 2\epsilon$ if $y = x$ and $W(y|x) = \epsilon$ otherwise. Find the channel capacity $C$ and the zero-rate exponent $E_0(1)$.
By symmetry, the optimal input distribution is uniform.
$E_0(1) = -\log_2 \sum_y \big[\sum_x Q(x)\sqrt{W(y|x)}\big]^2$.
Channel capacity
By symmetry, the uniform input achieves capacity: $C = \log_2 3 - H(1 - 2\epsilon, \epsilon, \epsilon)$ bits.
Zero-rate exponent
$E_0(1) = -\log_2 \sum_y \big[\sum_x \tfrac{1}{3}\sqrt{W(y|x)}\big]^2$. Computing: for each output $y$, the inner sum is $\tfrac{1}{3}\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)$, so the total is $3 \cdot \tfrac{1}{9}\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)^2$. Then $E_0(1) = \log_2 3 - 2\log_2\big(\sqrt{1-2\epsilon} + 2\sqrt{\epsilon}\big)$ bits.
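A quick evaluation of both formulas (a sketch; $\epsilon = 0.1$ is an assumed example, and any $0 < \epsilon < 1/3$ behaves similarly):

```python
from math import log2, sqrt

eps = 0.1

# Capacity with uniform input: C = log2(3) - H(1 - 2*eps, eps, eps)
H_row = -((1 - 2 * eps) * log2(1 - 2 * eps) + 2 * eps * log2(eps))
C = log2(3) - H_row

# Zero-rate exponent: E_0(1) = log2(3) - 2*log2(sqrt(1 - 2*eps) + 2*sqrt(eps))
E0_1 = log2(3) - 2 * log2(sqrt(1 - 2 * eps) + 2 * sqrt(eps))
print(C, E0_1)  # ~0.663 and ~0.363 bits for eps = 0.1
```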
ex-ch04-10
Medium. Prove that for any DMC, $E_r(R) > E_{sp}(R)$ cannot hold — i.e., the sphere-packing exponent is always an upper bound: $E_r(R) \leq E_{sp}(R)$. (You do not need to prove the sphere-packing bound; just explain why the achievability exponent cannot exceed the converse exponent.)
This is a fundamental consistency argument, not a calculation.
If there existed a rate where $E_r(R) > E_{sp}(R)$, what would it mean for the error probability?
Consistency argument
The random coding exponent provides a lower bound on the reliability function $E(R)$ (using the best code of each blocklength). The sphere-packing exponent provides an upper bound: $E_r(R) \leq E(R) \leq E_{sp}(R)$.
Since both bound the same function $E(R)$, we always have $E_r(R) \leq E_{sp}(R)$. If $E_r(R) > E_{sp}(R)$ held at some rate, random coding would guarantee codes whose error probability decays faster than $2^{-n E_{sp}(R)}$ — contradicting the sphere-packing converse, which says no code can decay faster. This would be a logical impossibility.
ex-ch04-11
Hard. Derive the source coding error exponent $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$ for a binary source $P = (p, 1-p)$ in closed form. Show that: $E_s(R) = d(q_R \| p)$, where $q_R$ satisfies $H(q_R) = R$ with $q_R$ between $p$ and $1/2$ (i.e., $q_R$ is the root of $H(q) = R$ on the same side of $1/2$ as $p$).
For a binary source, the optimization reduces to a one-dimensional problem.
The binary entropy function $H(q)$ is symmetric about $q = 1/2$ and concave.
There are two solutions to $H(q) = R$ when $R < 1$; the one closer to $p$ gives the smaller KL divergence.
Reduce to one dimension
For binary $\mathcal{X}$ and $Q = (q, 1-q)$: $E_s(R) = \min_{q : H(q) \geq R} d(q \| p)$, where $d(q \| p) = q \log_2 \frac{q}{p} + (1-q)\log_2\frac{1-q}{1-p}$.
Since $d(q \| p)$ is strictly convex in $q$ with minimum at $q = p$, and the constraint set is an interval $[q_R, 1 - q_R]$ (with $q_R \leq 1/2$ when $R \leq 1$), the minimum is at the boundary point closest to $p$.
Identify the minimizer
If $p < 1/2$ (low-entropy source), then for $H(p) < R < 1$ the constraint boundary $\{q : H(q) = R\}$ consists of two points, $q_R$ and $1 - q_R$, with $p < q_R \leq 1/2$. The minimizer is $q_R$, since $d(q_R \| p) < d(1 - q_R \| p)$: the root closer to $p$ has the smaller divergence. The case $p > 1/2$ is symmetric.
Closed form
The exponent is $E_s(R) = d(q_R \| p)$, where $q_R$ solves $H(q_R) = R$ on the same side of 0.5 as $p$. No simpler closed form exists, but for small redundancy $\Delta = R - H(p)$ we can expand: $E_s(H(p) + \Delta) \approx \frac{\Delta^2}{2 \ln 2 \; p(1-p)\big(\log_2\frac{1-p}{p}\big)^2}$, which shows quadratic growth of the exponent in the redundancy.
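A bisection sketch that computes $q_R$ and the exponent, and compares against the quadratic approximation for small $\Delta$ (the value $p = 0.2$ is an assumed example):

```python
from math import log, log2

p = 0.2

def H(q):
    return -q * log2(q) - (1 - q) * log2(1 - q)

def d(q, r):
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

def q_R(R, tol=1e-12):
    """Root of H(q) = R on [p, 1/2]; assumes p < 1/2 and H(p) <= R <= 1."""
    lo, hi = p, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if H(mid) < R else (lo, mid)
    return (lo + hi) / 2

delta = 0.01
R = H(p) + delta
exact = d(q_R(R), p)
quad = delta**2 / (2 * log(2) * p * (1 - p) * log2((1 - p) / p) ** 2)
print(exact, quad)  # close for small delta
```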
ex-ch04-12
Hard. (Conditional type class size.) Let $n = 6$, $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, and $\mathbf{x} = (0, 0, 0, 1, 1, 1)$ (type $P = (1/2, 1/2)$). A conditional type $V$ is defined by $V(0|0) = 2/3$, $V(1|0) = 1/3$, $V(0|1) = 1/3$, $V(1|1) = 2/3$. Enumerate all $\mathbf{y} \in T_V(\mathbf{x})$ and verify $|T_V(\mathbf{x})| \leq 2^{n H(V|P)}$.
The first 3 positions (where $x_i = 0$) must have exactly 2 zeros and 1 one in $\mathbf{y}$.
The last 3 positions (where $x_i = 1$) must have exactly 1 zero and 2 ones in $\mathbf{y}$.
Enumerate
For positions 1–3 (where $x_i = 0$): $(y_1, y_2, y_3)$ must have 2 zeros, 1 one. Choices: $(0,0,1), (0,1,0), (1,0,0)$ — 3 options.
For positions 4–6 (where $x_i = 1$): $(y_4, y_5, y_6)$ must have 1 zero, 2 ones. Choices: $(0,1,1), (1,0,1), (1,1,0)$ — 3 options.
Total: $|T_V(\mathbf{x})| = 3 \times 3 = 9$.
Verify the bound
$H(V|P) = \tfrac{1}{2} H\big(\tfrac{2}{3}\big) + \tfrac{1}{2} H\big(\tfrac{1}{3}\big) = H\big(\tfrac{1}{3}\big) \approx 0.918$ bits.
$2^{n H(V|P)} = 2^{6 \times 0.918} \approx 2^{5.51} \approx 45.6$. We have $9 \leq 45.6$. ✓
The bound is loose because $n$ is very small — the polynomial correction factor matters.
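A brute-force enumeration confirming the count (illustration):

```python
from itertools import product

x = (0, 0, 0, 1, 1, 1)
# Target joint counts N(a, b) for the conditional type V
target = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}

members = []
for y in product((0, 1), repeat=6):
    counts = {}
    for a, b in zip(x, y):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    if counts == target:
        members.append(y)

print(len(members))  # 9 = C(3,2) * C(3,1), and 9 <= 2^(6 * 0.918) ~ 45.6
```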
ex-ch04-13
Hard. (Sanov for hypothesis testing.) Consider testing $H_0 : P_0$ vs $H_1 : P_1$ on binary sequences of length $n$. A type-based test rejects $H_0$ if $D(\hat{P}_\mathbf{X} \| P_0) > \tau$.
(a) Find the type-I error exponent as a function of $\tau$. (b) Find the type-II error exponent as a function of $\tau$. (c) Find the optimal $\tau$ that maximizes the type-II error exponent subject to the type-I error exponent being at least $E_1$.
The type-I error is $P_0^n\big(D(\hat{P}_\mathbf{X} \| P_0) > \tau\big)$, which by Sanov decays as $2^{-n\tau}$.
The type-II error is $P_1^n\big(D(\hat{P}_\mathbf{X} \| P_0) \leq \tau\big)$.
Type-I error exponent
The rejection region is $\mathcal{E}_\tau = \{Q : D(Q \| P_0) > \tau\}$. The closest point in its closure to $P_0$ has $D(Q \| P_0) = \tau$ (the boundary). So the type-I error exponent is $\tau$.
Type-II error exponent
The acceptance region is $\mathcal{A}_\tau = \{Q : D(Q \| P_0) \leq \tau\}$. The closest point in $\mathcal{A}_\tau$ to $P_1$ determines the exponent. If $D(P_1 \| P_0) > \tau$ (i.e., $P_1$ is outside the acceptance ball), the type-II exponent $\min_{Q \in \mathcal{A}_\tau} D(Q \| P_1)$ is strictly positive.
Optimize
For the constraint that the type-I exponent be at least $E_1$, we need $\tau \geq E_1$. The acceptance ball $\mathcal{A}_\tau$ grows with $\tau$, so the type-II exponent is non-increasing in $\tau$. Provided $E_1 < D(P_1 \| P_0)$, the point $P_1$ is still outside the acceptance ball at $\tau = E_1$, giving a positive type-II exponent. Setting $\tau = E_1$ keeps the acceptance region as small as the constraint allows, and thus maximizes the type-II exponent.
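A numeric sketch of the trade-off (the hypotheses $P_0 = \mathrm{Bern}(0.5)$ and $P_1 = \mathrm{Bern}(0.8)$ are assumed examples):

```python
from math import log2

def d(q, r):
    return q * log2(q / r) + (1 - q) * log2((1 - q) / (1 - r))

p0, p1 = 0.5, 0.8
qs = [k / 10000 for k in range(1, 10000)]

def type2_exponent(tau):
    # I-projection of P1 onto the acceptance ball {q : d(q || p0) <= tau}
    return min(d(q, p1) for q in qs if d(q, p0) <= tau)

for tau in (0.05, 0.10, 0.20):          # the type-I exponent is tau itself
    print(tau, type2_exponent(tau))     # type-II exponent shrinks as tau grows
```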
ex-ch04-14
Hard. (Method of types proof of the channel coding theorem.) Use the method of types to prove that for a DMC $W$ with capacity $C$, any rate $R < C$ is achievable. Specifically, show that a random codebook with $2^{nR}$ codewords drawn i.i.d. from the capacity-achieving distribution $Q^*$ has average error probability $\to 0$ as $n \to \infty$.
Fix codeword $\mathbf{x}(1)$. The error event is that there exists $m \neq 1$ such that $(\mathbf{x}(m), \mathbf{y})$ is jointly typical.
Use the packing lemma from Chapter 3, or derive the bound directly using conditional types.
The probability that a random $\mathbf{x}(m)$ forms a joint type with $\mathbf{y}$ having mutual information near $I(Q^*, W)$ can be bounded using type class sizes.
Setup
Generate $2^{nR}$ codewords i.i.d. $\sim (Q^*)^n$. Transmit $\mathbf{x}(1)$, receive $\mathbf{y} \sim W^n(\cdot \mid \mathbf{x}(1))$. Decode to $\hat{m}$ if $\mathbf{x}(\hat{m})$ is the unique codeword jointly typical with $\mathbf{y}$.
Error analysis using types
Error occurs if: (a) $(\mathbf{x}(1), \mathbf{y})$ is not jointly typical (probability $\to 0$ by the law of large numbers), or (b) there exists $m \neq 1$ with $(\mathbf{x}(m), \mathbf{y})$ jointly typical.
For a fixed $\mathbf{y}$, the probability that an independent random $\mathbf{x}(m)$ is jointly typical with it is at most $(n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(I(Q^*, W) - \delta)}$ (summing over compatible joint types). By the union bound over the $2^{nR} - 1$ incorrect codewords: $\Pr(\text{b}) \leq 2^{nR} (n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(I(Q^*, W) - \delta)}$.
Conclusion
Since $I(Q^*, W) = C$ and $R < C$: $\Pr(\text{error}) \leq \Pr(\text{a}) + (n+1)^{|\mathcal{X}||\mathcal{Y}|}\, 2^{-n(C - R - \delta)} \to 0$ once $\delta < C - R$. This proves achievability. The method of types gives a clean proof with an explicit (though not tight) error exponent of roughly $C - R$.
ex-ch04-15
Hard. (I-projection onto an exponential family.) Let $P$ be a distribution on $\mathcal{X}$ and $\mathcal{E} = \{Q : \mathbb{E}_Q[f(X)] \geq t\}$ for some $f : \mathcal{X} \to \mathbb{R}$. Show that the I-projection $Q^* = \arg\min_{Q \in \mathcal{E}} D(Q \| P)$ is an exponential tilting of $P$: $Q^*(x) = \frac{P(x) e^{\lambda f(x)}}{\sum_{x'} P(x') e^{\lambda f(x')}}$ for some $\lambda \geq 0$.
Write the Lagrangian for the constrained optimization: minimize $D(Q \| P)$ subject to $\sum_x Q(x) f(x) \geq t$ and $\sum_x Q(x) = 1$.
Use the KKT conditions.
Lagrangian
$\mathcal{L}(Q, \lambda, \nu) = \sum_x Q(x) \log \frac{Q(x)}{P(x)} - \lambda \Big(\sum_x Q(x) f(x) - t\Big) + \nu \Big(\sum_x Q(x) - 1\Big).$
KKT conditions
Setting $\partial \mathcal{L} / \partial Q(x) = 0$: $\log \frac{Q(x)}{P(x)} + 1 - \lambda f(x) + \nu = 0$, so $Q^*(x) = \frac{P(x) e^{\lambda f(x)}}{Z(\lambda)}$, where $Z(\lambda) = \sum_x P(x) e^{\lambda f(x)}$ ensures normalization.
Complementary slackness
By complementary slackness, $\lambda \geq 0$ with $\lambda\big(\mathbb{E}_{Q^*}[f] - t\big) = 0$. If $\mathbb{E}_P[f] < t$, the constraint is active and $\mathbb{E}_{Q^*}[f] = t$. The tilting parameter $\lambda$ is determined by solving $\mathbb{E}_{Q_\lambda}[f] = t$.
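A small sketch that finds the tilting parameter numerically by bisection on $\lambda$ (the base distribution $P$, the function $f$, and the threshold $t$ are assumed examples):

```python
from math import exp

P = [0.5, 0.3, 0.2]   # base distribution on {0, 1, 2}
f = [0.0, 1.0, 2.0]   # f(x) = x; constraint E_Q[f] >= t
t = 1.2               # E_P[f] = 0.7 < t, so the constraint is active

def tilt(lam):
    w = [p * exp(lam * fx) for p, fx in zip(P, f)]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean(Q):
    return sum(q * fx for q, fx in zip(Q, f))

lo, hi = 0.0, 1.0
while mean(tilt(hi)) < t:      # grow the bracket until it contains lambda*
    hi *= 2
for _ in range(100):           # bisection: E_{Q_lambda}[f] is increasing in lambda
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean(tilt(mid)) < t else (lo, mid)

Q_star = tilt((lo + hi) / 2)
print(Q_star, mean(Q_star))    # tilted distribution with E[f] = t
```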
ex-ch04-16
Challenge. (Gap between $E_r$ and $E_{sp}$ below the critical rate.) For the BEC with erasure probability $\epsilon$, compute $E_0(\rho)$ and $E_r(R)$ in closed form. Then examine the claims: (a) the critical rate is $R_{\text{crit}} = 0$ (i.e., the random coding exponent is tight for all positive rates); (b) therefore $E(R) = E_r(R)$ for all $R > 0$, which would make the BEC one of the few channels where the reliability function is known exactly. Do these claims survive the computation?
For the BEC, the channel output is either the input or an erasure symbol $e$. The transition probabilities are $W(x|x) = 1 - \epsilon$ and $W(e|x) = \epsilon$.
Gallager's function simplifies dramatically for the BEC because the non-erased outputs perfectly identify the input.
Compute $E_0(\rho)$ explicitly and find $R_{\text{crit}} = E_0'(1)$.
Gallager's function for the BEC
With uniform input $Q = (1/2, 1/2)$ and $E_0(\rho) = -\log_2 \sum_y \big[\sum_x Q(x) W(y|x)^{1/(1+\rho)}\big]^{1+\rho}$: For $y = 0$: only $x = 0$ contributes, giving $\big[\tfrac{1}{2}(1-\epsilon)^{1/(1+\rho)}\big]^{1+\rho} = 2^{-(1+\rho)}(1-\epsilon)$. Similarly for $y = 1$. For $y = e$: both inputs contribute $\tfrac{1}{2}\epsilon^{1/(1+\rho)}$, giving $\epsilon$. Total: $2^{-\rho}(1-\epsilon) + \epsilon$, so $E_0(\rho) = -\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big)$.
Critical rate
$R_{\text{crit}} = E_0'(1)$. Differentiating $E_0(\rho) = -\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big)$ gives $E_0'(\rho) = \frac{(1-\epsilon)2^{-\rho}}{\epsilon + (1-\epsilon)2^{-\rho}}$ (in bits), so $R_{\text{crit}} = \frac{(1-\epsilon)/2}{\epsilon + (1-\epsilon)/2} = \frac{1-\epsilon}{1+\epsilon}$, which is positive for every $\epsilon < 1$.
For the BEC, then, $R_{\text{crit}} > 0$, so the random coding exponent is not tight at all rates and claim (a) fails. The BEC does have a gap below $R_{\text{crit}}$, but this gap is smaller than for most channels.
Reliability function
The random coding exponent for the BEC is $E_r(R) = \max_{0 \leq \rho \leq 1}\big[-\log_2\big(\epsilon + (1-\epsilon)2^{-\rho}\big) - \rho R\big]$. For $R \geq R_{\text{crit}}$, the optimizer is the interior point $\rho^* < 1$ solving $E_0'(\rho^*) = R$. For $R < R_{\text{crit}}$, $\rho^* = 1$ and $E_r(R) = E_0(1) - R$, which is a straight line. The sphere-packing exponent may be strictly larger in this regime. The BEC is a useful example because the calculations are tractable even though the full reliability function has the same three-regime structure as general DMCs.
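A numeric check of $E_0$ and the critical rate (a sketch; $\epsilon = 0.3$ is an assumed example):

```python
from math import log2

eps = 0.3

def E0(rho):
    return -log2(eps + (1 - eps) * 2**(-rho))

h = 1e-6
R_crit_numeric = (E0(1 + h) - E0(1 - h)) / (2 * h)  # central difference for E_0'(1)
R_crit_closed = (1 - eps) / (1 + eps)

print(R_crit_numeric, R_crit_closed)  # agree and are > 0, so claim (a) fails
print(E0(0))                          # 0, as required at rho = 0
```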
ex-ch04-17
Challenge. (Csiszár–Körner proof of the converse for channel coding.) Using the method of types, prove that for a DMC $W$, any sequence of codes with rate $R > C$ has $P_e \to 1$. Specifically, show that for any codebook with $M = 2^{nR}$ codewords: $P_e \geq 1 - 2^{-n(R - C - \delta_n)}$, where $\delta_n \to 0$.
Hint: This does not require Fano's inequality — only the type counting arguments from this chapter.
Partition the codewords by their types. For each type $P$, bound the number of distinguishable codewords using the mutual information $I(P, W)$.
The total number of distinguishable codewords across all types is at most $(n+1)^{|\mathcal{X}|}\, 2^{nC}$.
Distinguishable codewords per type
Fix a type $P$. Codewords of type $P$ must be "distinguishable" — their output conditional type classes must not overlap too much. The number of conditionally typical output sequences for a codeword of type $P$ is $\doteq 2^{n H(W|P)}$ (under the true channel $W$). The total output space for type-$P$ inputs has $\doteq 2^{n H(PW)}$ typical outputs, where $PW$ is the output distribution. Therefore at most $2^{n(H(PW) - H(W|P))} = 2^{n I(P, W)}$ codewords of type $P$ can be distinguished.
Sum over types
Total distinguishable codewords: $\sum_{P \in \mathcal{P}_n(\mathcal{X})} 2^{n I(P, W)} \leq (n+1)^{|\mathcal{X}|}\, 2^{n \max_P I(P, W)} = (n+1)^{|\mathcal{X}|}\, 2^{nC}.$
Error probability bound
If $M = 2^{nR} > (n+1)^{|\mathcal{X}|}\, 2^{nC}$, the excess codewords cannot all be decoded correctly. The fraction of decodable codewords is at most $(n+1)^{|\mathcal{X}|}\, 2^{nC} / 2^{nR} = 2^{-n(R - C - \delta_n)}$ with $\delta_n = |\mathcal{X}| \log_2(n+1)/n \to 0$, so: $P_e \geq 1 - 2^{-n(R - C - \delta_n)}$. For $R > C$, this tends to 1 as $n \to \infty$.
ex-ch04-18
Challenge. (Duality between source and channel exponents.) The source coding exponent at rate $R$ is $E_s(R) = \min_{Q : H(Q) \geq R} D(Q \| P)$, and the simple channel coding exponent (at zero rate, for maximum likelihood decoding with two codewords) is likewise a divergence minimization, $\min_V D(V \| W \mid P)$, subject to a confusion constraint.
Show that both exponents can be expressed as I-projections on the probability simplex, and explain the geometric duality: source coding projects onto a superlevel set of entropy, while channel coding projects onto a sublevel set of mutual information.
Draw both problems on the probability simplex.
For source coding, $\mathcal{E} = \{Q : H(Q) \geq R\}$ is the region above an entropy contour.
For channel coding, the analogous set involves distributions where two inputs are 'confusable.'
Source coding as I-projection
$E_s(R) = \min_{Q \in \mathcal{E}} D(Q \| P)$, where $\mathcal{E} = \{Q : H(Q) \geq R\}$. Geometrically, $\mathcal{E}$ is the region of the simplex with entropy at least $R$ — a convex set (since entropy is concave, superlevel sets are convex). The I-projection $Q^*$ is the point on the boundary $\{Q : H(Q) = R\}$ closest to $P$ in KL divergence.
Channel coding as I-projection
For the channel coding converse, the "bad" event is that the output type is consistent with a different codeword. The set of "confusing" conditional types is $\{V : I(P, V) \leq R\}$ — the sublevel set of mutual information. The converse exponent is $\min_{V : I(P, V) \leq R} D(V \| W \mid P)$.
Duality
Both exponents are I-projections, but on dual objects:
- Source coding: project the true source $P$ onto $\{Q : H(Q) \geq R\}$, asking "how far is $P$ from the bad set?"
- Channel coding: project the true channel $W$ onto $\{V : I(P, V) \leq R\}$, asking "how far is $W$ from the confusing set?"
The source asks "how unlikely is it that the type has too much entropy?" while the channel asks "how unlikely is it that the output looks consistent with the wrong input?" Both are measured by the KL divergence from truth to the nearest "bad" distribution.