Exercises
ex-ch01-01
Easy: Compute the entropy of a random variable $X$ uniformly distributed over a finite set $\mathcal{X}$.
For a uniform distribution, $p(x) = 1/|\mathcal{X}|$ for every $x \in \mathcal{X}$.
Recall that $H(X) = \log_2 |\mathcal{X}|$ for the uniform distribution.
Apply the formula
$H(X) = -\sum_{x \in \mathcal{X}} \frac{1}{|\mathcal{X}|} \log_2 \frac{1}{|\mathcal{X}|} = \log_2 |\mathcal{X}|$ bits.
ex-ch01-02
Easy: Let $X$ take values in a finite set $\mathcal{X}$ with probabilities $p_1, \ldots, p_{|\mathcal{X}|}$. Compute $H(X)$ and compare with $\log_2 |\mathcal{X}|$.
Compute $-p_i \log_2 p_i$ for each outcome.
The difference $\log_2 |\mathcal{X}| - H(X)$ equals $D(p \,\|\, u)$, where $u$ is the uniform distribution.
Compute
$H(X) = -\sum_i p_i \log_2 p_i$ bits.
$\log_2 |\mathcal{X}|$ bits is the maximum possible entropy. The gap $\log_2 |\mathcal{X}| - H(X)$ bits equals $D(p \,\|\, u)$, the divergence from $p$ to the uniform distribution.
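As a quick numerical sanity check, a sketch with an assumed example distribution (illustrative only, not the exercise's original numbers):

```python
import math

# Assumed example distribution (illustrative only).
p = [0.5, 0.25, 0.125, 0.125]

H = -sum(pi * math.log2(pi) for pi in p)                 # entropy H(X) in bits
log_alphabet = math.log2(len(p))                          # log2 |X|
u = 1 / len(p)                                            # uniform pmf value
kl_to_uniform = sum(pi * math.log2(pi / u) for pi in p)   # D(p || uniform)

print(f"H(X)            = {H:.4f} bits")
print(f"log2|X|         = {log_alphabet:.4f} bits")
print(f"gap             = {log_alphabet - H:.4f} bits")
print(f"D(p || uniform) = {kl_to_uniform:.4f} bits")      # equals the gap
```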
ex-ch01-03
Easy: Prove that $H(X \mid Y) \le H(X)$, with equality iff $X$ and $Y$ are independent.
Write the difference $H(X) - H(X \mid Y)$ as a mutual information.
Use the non-negativity of $I(X;Y)$.
Proof
$H(X) - H(X \mid Y) = I(X;Y) \ge 0$.
Equality holds iff $I(X;Y) = 0$, i.e., iff $X$ and $Y$ are independent.
ex-ch01-04
Easy: Let $X$ have a given distribution on a finite set $\mathcal{X}$ and let $Y = f(X)$ for a deterministic function $f$. Compute $H(X)$, $H(Y)$, and $H(X, Y)$.
$Y$ is a deterministic function of $X$.
What is $H(Y \mid X)$ when $X$ determines $Y$?
Compute entropies
$H(X) = -\sum_x p(x) \log_2 p(x)$ bits.
$Y$ takes values in $f(\mathcal{X})$ with $p_Y(y) = \sum_{x : f(x) = y} p(x)$, so $H(Y) = -\sum_y p_Y(y) \log_2 p_Y(y)$ bits.
Since the map $x \mapsto (x, f(x))$ is invertible, $(X, Y)$ carries the same information as $X$ alone, so $H(X, Y) = H(X)$ bits.
ex-ch01-05
Medium: Prove that for any function $g$: $H(g(X)) \le H(X)$, with equality iff $g$ is one-to-one on the support of $X$.
Consider the Markov chain $X \to X \to g(X)$.
Apply the data processing inequality with $Y = X$ and $Z = g(X)$.
Alternatively, use the grouping property of entropy.
Via data processing
Since $g(X)$ is a function of $X$, we have the Markov chain $X \to X \to g(X)$, so the data processing inequality gives $I(X; X) \ge I(X; g(X))$. The self-information identity gives $I(X; X) = H(X)$.
But $I(X; g(X)) = H(g(X)) - H(g(X) \mid X) = H(g(X))$ since $g(X)$ is determined by $X$.
Therefore $H(X) \ge H(g(X))$.
Equality condition
Equality holds iff $X$ is a function of $g(X)$, i.e., iff $g$ is invertible on the support of $X$. If $g$ maps two distinct values of positive probability to the same output, the entropy of $g(X)$ is strictly less than $H(X)$.
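A small numerical check of the inequality; the distribution of $X$ and the many-to-one function $g$ below are arbitrary illustrative choices:

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def g(x):
    """A many-to-one function (merges {0, 2} and {1, 3})."""
    return x % 2

# Illustrative pmf for X (assumed for this check).
p_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

# Push the distribution of X through g to get the distribution of g(X).
p_gx = {}
for x, p in p_x.items():
    p_gx[g(x)] = p_gx.get(g(x), 0.0) + p

print(f"H(X)    = {entropy(p_x):.4f} bits")
print(f"H(g(X)) = {entropy(p_gx):.4f} bits")   # strictly smaller: g is not one-to-one
```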
ex-ch01-06
Medium: Let $(X, Y)$ be binary random variables with joint distribution $p(x, y) = 1/4$ for every $(x, y) \in \{0, 1\}^2$.
Compute $H(X)$, $H(Y)$, $H(X, Y)$, $H(X \mid Y)$, $H(Y \mid X)$, and $I(X;Y)$.
First find the marginal distributions.
Check if and are independent.
Marginals
$p_X(x) = 1/2$ for $x \in \{0, 1\}$ and $p_Y(y) = 1/2$ for $y \in \{0, 1\}$. Both are uniform.
Independence check
$p(x, y) = p_X(x)\, p_Y(y)$ for all $(x, y)$, so $X$ and $Y$ are independent.
Compute all quantities
$H(X) = 1$ bit, $H(Y) = 1$ bit.
$H(X, Y) = 2$ bits.
$H(X \mid Y) = H(X) = 1$ bit (by independence).
$H(Y \mid X) = 1$ bit.
$I(X;Y) = 0$ (independent).
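These values can be verified mechanically from the joint table; the sketch below assumes the uniform joint on $\{0,1\}^2$ used above:

```python
import math

def H(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf p(x, y) = 1/4 on {0,1} x {0,1}.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(joint.values())
print(f"H(X)   = {H_X:.3f} bit")
print(f"H(Y)   = {H_Y:.3f} bit")
print(f"H(X,Y) = {H_XY:.3f} bits")
print(f"H(X|Y) = {H_XY - H_Y:.3f} bit")
print(f"H(Y|X) = {H_XY - H_X:.3f} bit")
print(f"I(X;Y) = {H_X + H_Y - H_XY:.3f} bits")
```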
ex-ch01-07
Medium: Prove that $I(X; Y, Z) \ge I(X; Y)$, i.e., more observations cannot decrease mutual information.
Use the chain rule for mutual information.
Non-negativity of conditional mutual information.
Chain rule argument
By the chain rule:
$I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y)$.
Since $I(X; Z \mid Y) \ge 0$:
$I(X; Y, Z) \ge I(X; Y)$.
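The inequality can be spot-checked on a randomly generated joint distribution; the binary alphabets and the random seed below are arbitrary illustrative choices:

```python
import itertools
import math
import random

random.seed(0)

# Random joint pmf over (X, Y, Z), each binary (an arbitrary test case).
triples = list(itertools.product((0, 1), repeat=3))
weights = [random.random() for _ in triples]
total = sum(weights)
p = {t: w / total for t, w in zip(triples, weights)}

def marginal(pmf, keep):
    """Marginalize a joint pmf (dict keyed by tuples) onto the given index positions."""
    out = {}
    for key, prob in pmf.items():
        reduced = tuple(key[i] for i in keep)
        out[reduced] = out.get(reduced, 0.0) + prob
    return out

def entropy(pmf):
    return -sum(v * math.log2(v) for v in pmf.values() if v > 0)

def mutual_info(pmf, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for index groups a and b of the joint pmf."""
    return entropy(marginal(pmf, a)) + entropy(marginal(pmf, b)) - entropy(marginal(pmf, a + b))

I_X_Y  = mutual_info(p, (0,), (1,))
I_X_YZ = mutual_info(p, (0,), (1, 2))
print(f"I(X;Y)   = {I_X_Y:.4f} bits")
print(f"I(X;Y,Z) = {I_X_YZ:.4f} bits  (never smaller)")
```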
ex-ch01-08
Medium: Show that $D(p \,\|\, q) \ge \dfrac{1}{2 \ln 2}\, \|p - q\|_1^2$ (Pinsker's inequality), where $\|p - q\|_1 = \sum_x |p(x) - q(x)|$.
Start with the inequality for a binary (two-point) alphabet.
Alternatively, use the stronger pointwise form: $t \ln t - t + 1 \ge \dfrac{3(t-1)^2}{2(t+2)}$ for all $t \ge 0$.
Via the convexity approach
Define $\varphi(t) = t \ln t - t + 1$, which is convex and non-negative. By a Taylor expansion argument (or direct calculus), $\varphi(t) \ge \dfrac{3(t-1)^2}{2(t+2)}$ for all $t \ge 0$.
Setting $t = p(x)/q(x)$ and summing over $x$ with weights $q(x)$: $(\ln 2)\, D(p \,\|\, q) = \sum_x q(x)\, \varphi\!\big(p(x)/q(x)\big) \ge \sum_x \dfrac{3\,\big(p(x) - q(x)\big)^2}{2\,\big(p(x) + 2q(x)\big)}$.
Apply Cauchy-Schwarz
By Cauchy-Schwarz: $\|p - q\|_1^2 = \Big(\sum_x \dfrac{|p(x) - q(x)|}{\sqrt{p(x) + 2q(x)}} \cdot \sqrt{p(x) + 2q(x)}\Big)^{\!2} \le \Big(\sum_x \dfrac{\big(p(x) - q(x)\big)^2}{p(x) + 2q(x)}\Big) \Big(\sum_x \big(p(x) + 2q(x)\big)\Big) = 3 \sum_x \dfrac{\big(p(x) - q(x)\big)^2}{p(x) + 2q(x)}$.
Therefore $(\ln 2)\, D(p \,\|\, q) \ge \tfrac{1}{2} \|p - q\|_1^2$, i.e., $D(p \,\|\, q) \ge \dfrac{1}{2 \ln 2} \|p - q\|_1^2$.
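A numerical sanity check of the constant $1/(2\ln 2)$ on randomly generated distribution pairs (illustrative only, not part of the proof):

```python
import math
import random

random.seed(1)

def random_pmf(k):
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def kl_bits(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

worst_slack = float("inf")
for _ in range(10_000):
    p, q = random_pmf(5), random_pmf(5)
    l1 = sum(abs(pi - qi) for pi, qi in zip(p, q))
    slack = kl_bits(p, q) - l1 ** 2 / (2 * math.log(2))
    worst_slack = min(worst_slack, slack)

# Non-negative in every trial, consistent with Pinsker's inequality.
print(f"minimum of D(p||q) - ||p-q||_1^2 / (2 ln 2) over trials: {worst_slack:.6f}")
```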
ex-ch01-09
Medium: Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables. Show that $H(X_1, \ldots, X_n) = n\, H(X_1)$.
Use the chain rule for entropy.
What does independence imply about conditional entropies?
Chain rule + independence
$H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1})$.
By independence: $H(X_i \mid X_1, \ldots, X_{i-1}) = H(X_i) = H(X_1)$.
Therefore $H(X_1, \ldots, X_n) = n\, H(X_1)$.
ex-ch01-10
Medium: A binary erasure channel (BEC) has input $X \in \{0, 1\}$ and output $Y \in \{0, 1, e\}$. With probability $\epsilon$, the output is an erasure, $Y = e$; otherwise $Y = X$. For uniform input, compute $I(X;Y)$.
Compute $H(Y)$ and $H(Y \mid X)$ separately.
$H(Y \mid X = x) = H_b(\epsilon)$ for both values of $x$.
Conditional entropy
Given $X = 0$: $Y = 0$ w.p. $1 - \epsilon$, $Y = e$ w.p. $\epsilon$. So $H(Y \mid X = 0) = H_b(\epsilon)$. By symmetry, $H(Y \mid X = 1) = H_b(\epsilon)$.
Therefore $H(Y \mid X) = H_b(\epsilon)$.
Output entropy
$P(Y = 0) = (1 - \epsilon)/2$, $P(Y = 1) = (1 - \epsilon)/2$, $P(Y = e) = \epsilon$.
$H(Y) = H_b(\epsilon) + (1 - \epsilon)$.
Mutual information
$I(X;Y) = H(Y) - H(Y \mid X) = H_b(\epsilon) + (1 - \epsilon) - H_b(\epsilon) = 1 - \epsilon$ bits.
This is the capacity of the BEC: each non-erased symbol carries exactly one bit of information, and a fraction $\epsilon$ of symbols are erased.
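A direct computation from the joint distribution confirms $I(X;Y) = 1 - \epsilon$; the erasure probability below is an arbitrary illustrative choice:

```python
import math

eps = 0.25   # assumed erasure probability for illustration

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf of (X, Y) for uniform input over {0, 1}; 'e' denotes erasure.
joint = {(0, 0): 0.5 * (1 - eps), (0, 'e'): 0.5 * eps,
         (1, 1): 0.5 * (1 - eps), (1, 'e'): 0.5 * eps}

p_y = {}
for (_, y), v in joint.items():
    p_y[y] = p_y.get(y, 0.0) + v

H_Y = H(p_y.values())
H_Y_given_X = H(joint.values()) - 1.0   # H(Y|X) = H(X,Y) - H(X), with H(X) = 1 bit
print(f"H(Y)    = {H_Y:.4f} bits")
print(f"H(Y|X)  = {H_Y_given_X:.4f} bits")
print(f"I(X;Y)  = {H_Y - H_Y_given_X:.4f} bits")
print(f"1 - eps = {1 - eps:.4f} bits")
```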
ex-ch01-11
Hard: Prove that $I(X; Y \mid Z) \le I(X; Y)$ when $X \to Y \to Z$ (i.e., the data processing inequality for conditional MI).
Start from $I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y)$ and also $I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)$.
Use the Markov property to simplify $I(X; Z \mid Y)$.
Two chain rule expansions
$I(X; Y) + I(X; Z \mid Y) = I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)$.
Since $X \to Y \to Z$: $I(X; Z \mid Y) = 0$.
Conclude
$I(X; Y) = I(X; Z) + I(X; Y \mid Z)$.
Since $I(X; Z) \ge 0$: $I(X; Y \mid Z) \le I(X; Y)$.
ex-ch01-12
Hard: Let $X^n = (X_1, \ldots, X_n)$ and $\hat{X}^n$ be random variables taking values in $\mathcal{X}^n$. Prove that if $P_e^{(n)} \to 0$, where $P_e^{(n)} = P(\hat{X}^n \neq X^n)$, then $\tfrac{1}{n} H(X^n \mid \hat{X}^n) \to 0$ (Fano's inequality applied to sequences).
Apply Fano to $X^n$ with alphabet size $|\mathcal{X}|^n$.
The bound becomes $H(X^n \mid \hat{X}^n) \le 1 + P_e^{(n)}\, n \log_2 |\mathcal{X}|$.
Apply Fano
Fano's inequality with alphabet size $|\mathcal{X}|^n$ gives:
$H(X^n \mid \hat{X}^n) \le H_b(P_e^{(n)}) + P_e^{(n)} \log_2\big(|\mathcal{X}|^n - 1\big) \le 1 + P_e^{(n)}\, n \log_2 |\mathcal{X}|$.
Normalize by $n$
$\tfrac{1}{n} H(X^n \mid \hat{X}^n) \le \tfrac{1}{n} + P_e^{(n)} \log_2 |\mathcal{X}| \to 0$ as $n \to \infty$ and $P_e^{(n)} \to 0$.
ex-ch01-13
Hard: Prove Mrs. Gerber's Lemma (scalar form): If $Y = X \oplus Z$ with $Z \sim \mathrm{Bern}(p)$ independent of $X$, then $H(Y) \ge H_b(\alpha \star p)$, where $\alpha = H_b^{-1}(H(X))$, $a \star b = a(1-b) + b(1-a)$, and $H_b^{-1}$ is the inverse of $H_b$ on $[0, 1/2]$.
$P(Y = 1) = P(X = 1) \star p$.
Show that $H(Y) = H_b(\alpha \star p)$ for $\alpha = P(X = 1)$ when $X$ is binary.
Use the concavity of $H_b$ and the convexity of $u \mapsto H_b\big(H_b^{-1}(u) \star p\big)$.
Setup
Since $X$ is binary with $P(X = 1) = \alpha$ (take $\alpha \le 1/2$ without loss of generality): $P(Y = 1) = \alpha(1-p) + (1-\alpha)p = \alpha \star p$, so $H(Y) = H_b(\alpha \star p)$.
We need: $H(Y) \ge H_b\big(H_b^{-1}(H(X)) \star p\big)$, i.e., $H_b(\alpha \star p) \ge H_b(\alpha \star p)$, which is trivially true.
The non-trivial version applies when only an entropy constraint on $X$ is given, for example conditionally on side information $U$ (i.e., $X$ is not necessarily Bernoulli with a known bias, just has bounded entropy).
The general statement
For a general side variable $U$ with $H(X \mid U) = h$ (so $\alpha = H_b^{-1}(h)$):
$H(Y \mid U) \ge H_b\big(H_b^{-1}(H(X \mid U)) \star p\big)$.
This follows from the concavity of $H_b$ and the fact that the BSC with parameter $p$ "smooths" the distribution of $X$. The proof uses the data processing inequality and properties of the binary convolution, via the convexity of $u \mapsto H_b\big(H_b^{-1}(u) \star p\big)$ and Jensen's inequality.
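A numerical spot check of the conditional form; the side-information distribution, the biases, and the noise parameter below are arbitrary illustrative choices, and $H_b^{-1}$ is computed by bisection:

```python
import math

def hb(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hb_inv(h):
    """Inverse of the binary entropy on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if hb(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def star(a, b):
    """Binary convolution a*b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

# Assumed example: U in {0,1}, X|U=u ~ Bern(alpha_u), Z ~ Bern(p), Y = X xor Z.
p_u = {0: 0.6, 1: 0.4}
alpha = {0: 0.1, 1: 0.35}
p = 0.2

H_X_given_U = sum(p_u[u] * hb(alpha[u]) for u in p_u)
H_Y_given_U = sum(p_u[u] * hb(star(alpha[u], p)) for u in p_u)
mgl_bound = hb(star(hb_inv(H_X_given_U), p))

print(f"H(Y|U)                  = {H_Y_given_U:.4f} bits")
print(f"H_b(H_b^-1(H(X|U)) * p) = {mgl_bound:.4f} bits (lower bound)")
```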
ex-ch01-14
Hard: Prove that for any discrete random variables $X, Y, Z$: $I(X; Y \mid Z) \le I(X; Y) + H(Z)$.
Interpret this bound in terms of the information that $Z$ reveals about the relationship between $X$ and $Y$.
Start from the two chain-rule expansions $I(X; Y, Z) = I(X; Y) + I(X; Z \mid Y) = I(X; Z) + I(X; Y \mid Z)$.
Use $I(X; Z \mid Y) \le H(Z \mid Y) \le H(Z)$.
Actually, try a simpler approach: expand definitions directly.
Direct expansion
$I(X; Y \mid Z) - I(X; Y) = \big[I(X; Y, Z) - I(X; Z)\big] - \big[I(X; Y, Z) - I(X; Z \mid Y)\big] = I(X; Z \mid Y) - I(X; Z)$.
Bound the difference
We need $I(X; Y \mid Z) - I(X; Y) \le H(Z)$.
$I(X; Y \mid Z) - I(X; Y) = I(X; Z \mid Y) - I(X; Z) \le I(X; Z \mid Y) \le H(Z \mid Y) \le H(Z)$.
Therefore $I(X; Y \mid Z) \le I(X; Y) + H(Z)$: conditioning on $Z$ can increase the dependence between $X$ and $Y$ by at most the information carried by $Z$ itself.
ex-ch01-15
Hard: (Entropy of a function) Let $Y = g(X)$ for some deterministic function $g$. Show that $H(X) - H(Y) = H(X \mid Y) \ge 0$ and interpret this as the "entropy lost" by applying $g$.
$H(Y \mid X) = 0$ since $Y$ is a function of $X$.
Use the chain rule.
Chain rule
$H(X, Y) = H(X) + H(Y \mid X) = H(X)$.
Also: $H(X, Y) = H(Y) + H(X \mid Y)$.
Therefore: $H(X) - H(Y) = H(X \mid Y) \ge 0$.
Interpretation
$H(X \mid Y)$ is the residual uncertainty about $X$ when we know $Y = g(X)$. If $g$ is one-to-one, $H(X \mid Y) = 0$ and no entropy is lost. If $g$ is many-to-one (e.g., it merges two values of positive probability), then $H(X \mid Y) > 0$ and the residual measures the information destroyed by $g$.
ex-ch01-16
Challenge: (Non-Shannon inequality) Show that for four random variables $X_1, X_2, X_3, X_4$, every "Shannon-type" inequality is a non-negative linear combination of the basic inequalities $I(X_A; X_B \mid X_C) \ge 0$. Then state (without proof) the Zhang-Yeung inequality, the first known non-Shannon inequality: $2I(X_3; X_4) \le I(X_1; X_2) + I(X_1; X_3, X_4) + 3I(X_3; X_4 \mid X_1) + I(X_3; X_4 \mid X_2)$.
Verify that this cannot be derived as a non-negative combination of conditional MI non-negativity constraints.
Shannon-type inequalities are those derivable from the basic inequalities, and hence valid for ALL joint distributions.
The entropy function region is a convex cone.
Zhang and Yeung (1998) found the first inequality constraining beyond Shannon's basic inequalities.
Shannon inequalities
For any collection of random variables, the basic Shannon inequalities are $I(X_A; X_B \mid X_C) \ge 0$ for all subsets $A, B, C$ of the index set (non-negativity of conditional entropies is the special case $A = B$). All inequalities derivable from these (by taking non-negative linear combinations) are called "Shannon-type."
Zhang-Yeung (1998)
The Zhang-Yeung inequality cannot be derived from the basic inequalities $I(X_A; X_B \mid X_C) \ge 0$ alone. This was verified by showing that the inequality defines a half-space that is strictly tighter than the Shannon cone $\Gamma_4$. Its discovery showed that the (closure of the) entropy function region $\overline{\Gamma}^*_n$ is strictly smaller than the Shannon cone $\Gamma_n$ for $n \ge 4$ variables.
Significance
This exercise illustrates a deep open problem: characterizing the entropy function region $\Gamma^*_n$ for $n \ge 4$ variables. Non-Shannon inequalities have implications for network coding, secret sharing, and the capacity regions of certain multi-source multi-sink networks.
ex-ch01-17
Challenge: (Strong Fano) Prove the following strengthening of Fano's inequality: if $\hat{X} = g(Y)$ is an estimate of $X$ and $P_e = P(\hat{X} \neq X)$, then $H(X \mid Y) \le H_b(P_e) + P_e\, H(X \mid \hat{X}, E = 1)$,
where $E = \mathbf{1}\{\hat{X} \neq X\}$. Show that this is always at least as tight as the standard Fano bound.
Follow the standard Fano proof but do not upper-bound $H(X \mid \hat{X}, E = 1)$ by $\log_2(|\mathcal{X}| - 1)$.
Note that $H(E \mid Y) \le H(E) = H_b(P_e)$ since conditioning cannot increase entropy.
Tighter bound
From the standard proof:
$H(X \mid Y) \le H(X, E \mid Y) = H(E \mid Y) + H(X \mid Y, E) \le H_b(P_e) + P_e\, H(X \mid Y, E = 1)$, using $H(X \mid Y, E = 0) = 0$.
Now $H(X \mid Y, E = 1) \le H(X \mid \hat{X}, E = 1)$ because $Y$ provides at least as much information as $\hat{X} = g(Y)$.
Comparison with standard Fano
The standard bound replaces $H(X \mid \hat{X}, E = 1)$ with $\log_2(|\mathcal{X}| - 1)$, which is the maximum entropy over the $|\mathcal{X}| - 1$ outcomes still possible after an error. The strong form is tighter whenever the conditional distribution of $X$ given an error is not uniform over the remaining possibilities.
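For a concrete comparison, take $Y = \hat{X}$ (so the estimator is the identity map on the observation) and an assumed joint distribution; the sketch below checks that $H(X \mid \hat{X}) \le$ strong bound $\le$ standard bound:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint pmf of (X, Xhat) over {0,1,2}^2 (illustrative only).
joint = {(0, 0): 0.30, (0, 1): 0.05, (0, 2): 0.02,
         (1, 0): 0.03, (1, 1): 0.25, (1, 2): 0.05,
         (2, 0): 0.04, (2, 1): 0.06, (2, 2): 0.20}

alphabet = {x for x, _ in joint}
p_e = sum(v for (x, xh), v in joint.items() if x != xh)

# H(X | Xhat) = H(X, Xhat) - H(Xhat)
p_xh = {}
for (_, xh), v in joint.items():
    p_xh[xh] = p_xh.get(xh, 0.0) + v
H_X_given_Xhat = H(joint.values()) - H(p_xh.values())

# H(X | Xhat, E=1): restrict to the error event and renormalize.
err = {k: v / p_e for k, v in joint.items() if k[0] != k[1]}
p_xh_err = {}
for (_, xh), v in err.items():
    p_xh_err[xh] = p_xh_err.get(xh, 0.0) + v
H_X_given_Xhat_err = H(err.values()) - H(p_xh_err.values())

hb = H([p_e, 1 - p_e])
strong = hb + p_e * H_X_given_Xhat_err
standard = hb + p_e * math.log2(len(alphabet) - 1)

print(f"H(X|Xhat)     = {H_X_given_Xhat:.4f} bits")
print(f"strong Fano   = {strong:.4f} bits")
print(f"standard Fano = {standard:.4f} bits")
```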
ex-ch01-18
Medium: Compute the capacity of the Z-channel: input $X \in \{0, 1\}$, output $Y \in \{0, 1\}$, with $P(Y = 0 \mid X = 0) = 1$ (no error for input 0) and $P(Y = 0 \mid X = 1) = \epsilon$ (input 1 received as 0 with probability $\epsilon$). Find the capacity $C = \max_{p(x)} I(X;Y)$.
$H(Y \mid X = 0) = 0$ since input 0 is never corrupted.
Optimize over $q = P(X = 1)$.
The optimal input is NOT necessarily uniform.
Compute $I(X;Y)$
Let $q = P(X = 1)$. Then $P(Y = 1) = q(1 - \epsilon)$ and $P(Y = 0) = 1 - q(1 - \epsilon)$.
$H(Y) = H_b\big(q(1 - \epsilon)\big)$.
$H(Y \mid X) = q\, H_b(\epsilon)$, since $H(Y \mid X = 0) = 0$ and $H(Y \mid X = 1) = H_b(\epsilon)$.
$I(X;Y) = H(Y) - H(Y \mid X) = H_b\big(q(1 - \epsilon)\big) - q\, H_b(\epsilon)$.
Optimize over $q$
Taking the derivative and setting it to zero:
$\dfrac{d}{dq}\, I(X;Y) = (1 - \epsilon) \log_2 \dfrac{1 - q(1 - \epsilon)}{q(1 - \epsilon)} - H_b(\epsilon) = 0$.
This gives $q^* = \dfrac{\epsilon^{\epsilon/(1-\epsilon)}}{1 + (1 - \epsilon)\, \epsilon^{\epsilon/(1-\epsilon)}}$, and the capacity is
$C = \log_2\!\big(1 + (1 - \epsilon)\, \epsilon^{\epsilon/(1-\epsilon)}\big)$ bits per channel use.
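The closed form can be checked against a brute-force maximization over the input distribution; the crossover probability below is an arbitrary illustrative choice:

```python
import math

eps = 0.3   # assumed crossover probability for illustration

def hb(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi(q):
    """I(X;Y) for the Z-channel with P(X=1) = q."""
    return hb(q * (1 - eps)) - q * hb(eps)

# Brute-force search over the input probability q.
qs = [i / 100000 for i in range(100001)]
q_star = max(qs, key=mi)
capacity_numeric = mi(q_star)

# Closed-form expressions derived above.
q_closed = eps ** (eps / (1 - eps)) / (1 + (1 - eps) * eps ** (eps / (1 - eps)))
capacity_closed = math.log2(1 + (1 - eps) * eps ** (eps / (1 - eps)))

print(f"numeric:     q* = {q_star:.5f}, C = {capacity_numeric:.6f} bits")
print(f"closed form: q* = {q_closed:.5f}, C = {capacity_closed:.6f} bits")
```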
ex-ch01-19
Medium: Prove that for stationary processes, the entropy rate satisfies $\lim_{n\to\infty} \tfrac{1}{n} H(X_1, \ldots, X_n) = \lim_{n\to\infty} H(X_n \mid X_1, \ldots, X_{n-1})$, and that both limits exist.
Show that $H(X_n \mid X_1, \ldots, X_{n-1})$ is non-increasing in $n$ for stationary processes.
A bounded non-increasing sequence converges.
Use Cesàro's lemma to connect the two limits.
Monotonicity
By stationarity: $H(X_{n+1} \mid X_1, \ldots, X_n) \le H(X_{n+1} \mid X_2, \ldots, X_n) = H(X_n \mid X_1, \ldots, X_{n-1})$.
The first inequality uses the fact that conditioning reduces entropy. The equality uses stationarity (shifting indices). So $H(X_n \mid X_1, \ldots, X_{n-1})$ is non-increasing and bounded below by 0.
Convergence and Cesàro
A bounded monotone sequence converges, so $\lim_{n\to\infty} H(X_n \mid X_1, \ldots, X_{n-1})$ exists.
By the chain rule, $\tfrac{1}{n} H(X_1, \ldots, X_n) = \tfrac{1}{n} \sum_{k=1}^{n} H(X_k \mid X_1, \ldots, X_{k-1})$ is a Cesàro average of a convergent sequence, so it converges to the same limit.
ex-ch01-20
Easy: Verify that the binary entropy function $H_b(p) = -p \log_2 p - (1-p) \log_2(1-p)$ satisfies: (a) $H_b(0) = H_b(1) = 0$, (b) $H_b(1/2) = 1$, (c) $H_b(p) = H_b(1-p)$ for all $p \in [0, 1]$.
Direct substitution into the definition.
Verify all three
(a) $H_b(0) = -0 \log_2 0 - 1 \log_2 1 = 0$, using the convention $0 \log_2 0 = 0$. Similarly $H_b(1) = 0$.
(b) $H_b(1/2) = -\tfrac{1}{2} \log_2 \tfrac{1}{2} - \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1$ bit.
(c) $H_b(1-p) = -(1-p) \log_2(1-p) - p \log_2 p = H_b(p)$. The symmetry reflects the fact that labeling the outcomes 0 and 1 is arbitrary.
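A direct numerical check of (a)-(c):

```python
import math

def hb(p):
    """Binary entropy in bits, with the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(hb(0.0), hb(1.0))            # (a) both 0
print(hb(0.5))                     # (b) exactly 1 bit
print(all(abs(hb(p) - hb(1 - p)) < 1e-12
          for p in [i / 100 for i in range(101)]))   # (c) symmetry
```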