Exercises
ex-ch09-01
Easy. Compute the capacity of the BSC with crossover probability $p = 0.01$. How many bits per channel use can be reliably transmitted?
$C = 1 - \mathcal{H}_2(p)$.
$\mathcal{H}_2(p) = -p\log_2 p - (1-p)\log_2(1-p)$.
Compute binary entropy
$\mathcal{H}_2(0.01) = -0.01\log_2 0.01 - 0.99\log_2 0.99 \approx 0.0808$ bits.
Compute capacity
$C = 1 - \mathcal{H}_2(0.01) \approx 0.9192$ bits per channel use.
So with a well-designed code, about 92% of the channel's potential is usable despite the 1% error rate. This illustrates why modern codes (LDPC, polar) that approach capacity are so valuable.
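The two steps above can be checked numerically; a minimal sketch (the helper names `h2` and `bsc_capacity` are ours):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """C = 1 - H2(p) for the binary symmetric channel."""
    return 1.0 - h2(p)

print(round(h2(0.01), 4))            # 0.0808
print(round(bsc_capacity(0.01), 4))  # 0.9192
```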
ex-ch09-02
Easy. Compute the capacity of the BEC with erasure probability $\epsilon = 0.3$. Compare with the BSC capacity at $p = 0.3$.
BEC: $C = 1 - \epsilon$. BSC: $C = 1 - \mathcal{H}_2(p)$.
BEC capacity
$C_{\mathrm{BEC}} = 1 - 0.3 = 0.7$ bits per channel use.
BSC capacity
$C_{\mathrm{BSC}} = 1 - \mathcal{H}_2(0.3) \approx 1 - 0.8813 = 0.1187$ bits per channel use.
Comparison
$C_{\mathrm{BEC}} / C_{\mathrm{BSC}} \approx 0.7 / 0.1187 \approx 5.9$. The BEC has nearly 6 times the capacity of the BSC at the same parameter value! This is because erasures are much less harmful than errors: the decoder knows which bits were lost and can focus its effort there.
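The comparison is easy to reproduce numerically (assuming the parameter value 0.3 used above):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.3
c_bec = 1 - eps      # erasures: capacity 0.7 bits/use
c_bsc = 1 - h2(eps)  # errors: capacity ~0.1187 bits/use
print(round(c_bec / c_bsc, 2))  # ~5.9
```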
ex-ch09-03
Medium. Show that for any DMC, $C = 0$ if and only if $X$ and $Y$ are independent for all input distributions (i.e., the rows of the transition matrix $W$ are identical).
$C = 0$ iff $I(X;Y) = 0$ for all $P_X$.
$I(X;Y) = 0$ iff $P_{XY} = P_X P_Y$.
Forward direction
If all rows of $W$ are identical (say, equal to $r$), then $P_Y(y) = \sum_x P_X(x) W(y|x) = r(y)$ for any $P_X$. So $Y$ is independent of $X$, and $I(X;Y) = 0$ for all $P_X$. Therefore $C = 0$.
Reverse direction
If $C = 0$, then $I(X;Y) = 0$ for all $P_X$. In particular, for a $P_X$ that puts positive mass on every $x$, independence requires $W(y|x) = P_Y(y)$ for all $x$ with $P_X(x) > 0$, which here means for all $x$. So all rows of $W$ equal $P_Y$.
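The forward direction can be sanity-checked numerically: for a matrix with identical rows, $I(X;Y)$ vanishes for every input distribution (the helper `mutual_information` is our own):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

W = [[0.2, 0.5, 0.3],
     [0.2, 0.5, 0.3]]  # identical rows
for px in [(0.5, 0.5), (0.9, 0.1), (0.25, 0.75)]:
    assert abs(mutual_information(px, W)) < 1e-9
print("I(X;Y) = 0 for every input distribution")
```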
ex-ch09-04
Medium. A channel has input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ with the given transition matrix $W$. Show this is a strongly symmetric channel and compute its capacity.
Check that each row is a permutation of the first row AND each column is a permutation of the first column.
For strongly symmetric channels: $C = \log_2|\mathcal{Y}| - \mathcal{H}(\text{row})$, where $\mathcal{H}(\text{row})$ is the entropy of any row of $W$ (all rows have the same entropy).
Verify strong symmetry
Each row is a permutation of the first row (verified by inspection). Each column is a permutation of the first column (also verified by inspection). So the channel is strongly symmetric.
Compute capacity
$C = \log_2|\mathcal{Y}| - \mathcal{H}(\text{row})$ bits per channel use.
The uniform input distribution is optimal.
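Since the exercise's matrix is not reproduced here, the computation can be illustrated with a hypothetical strongly symmetric (circulant) channel; the matrix below is our own example:

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def mutual_information(px, W):
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    return entropy(py) - sum(px[x] * entropy(W[x]) for x in range(len(px)))

# Hypothetical circulant matrix: rows AND columns are permutations of each other.
W = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
c_formula = math.log2(3) - entropy(W[0])           # C = log|Y| - H(row)
c_uniform = mutual_information([1/3, 1/3, 1/3], W)  # I under uniform input
print(round(c_formula, 4), round(c_uniform, 4))     # the two agree
```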
ex-ch09-05
Medium. Prove that $I(X;Y)$ is concave in $P_X$ for a fixed DMC $W$. (Hint: write $I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X)$ and analyze each term separately.)
$\ntn{entropy}(Y|X)$ is linear in $P_X$.
$\ntn{entropy}(Y)$ is concave in $P_Y$, and $P_Y$ is a linear function of $P_X$.
Concave composed with affine is concave.
Analyze $\ntn{entropy}(Y|X)$
$\ntn{entropy}(Y|X) = \sum_x P_X(x)\,\ntn{entropy}(W(\cdot|x))$ is a linear function of $P_X$ (it is a weighted sum with fixed coefficients $\ntn{entropy}(W(\cdot|x))$).
Analyze $\ntn{entropy}(Y)$
The output distribution is $P_Y(y) = \sum_x P_X(x) W(y|x)$, which is linear in $P_X$. The entropy is a concave function of $P_Y$ (a standard property). The composition of a concave function with a linear (affine) map is concave. So $\ntn{entropy}(Y)$ is concave in $P_X$.
Conclude
$I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X)$ is concave minus linear, which is concave. Therefore $\max_{P_X} I(X;Y)$ is a concave maximization problem: every local maximum is global, and the maximum value is unique (though the maximizing distribution need not be).
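The concavity claim can be spot-checked numerically: for any fixed channel, the midpoint input dominates the average of the endpoint values (the helper `mutual_information` is ours):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

W = [[0.9, 0.1], [0.2, 0.8]]  # an arbitrary fixed DMC
p1, p2 = (0.1, 0.9), (0.8, 0.2)
mid = (0.45, 0.55)            # (p1 + p2) / 2
lhs = mutual_information(mid, W)
rhs = 0.5 * mutual_information(p1, W) + 0.5 * mutual_information(p2, W)
print(lhs >= rhs)  # True: midpoint dominates, as concavity requires
```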
ex-ch09-06
Hard. Prove the achievability part of the channel coding theorem in detail for the BEC($\epsilon$). Show that random linear codes achieve the BEC capacity using a simpler argument than joint typicality decoding.
For the BEC, the decoder knows which bits were erased. It needs to recover the $k$ information bits from the unerased bits.
A random linear code over $\mathbb{F}_2$ maps $k$ information bits to $n$ coded bits via a generator matrix $G$.
The decoder sees $\approx n(1-\epsilon)$ unerased equations in $k$ unknowns.
Setup
Use a random linear code with a $k \times n$ generator matrix $G$ drawn uniformly at random over $\mathbb{F}_2$. The encoder computes $x = mG$ for message $m \in \mathbb{F}_2^k$.
After transmission through BEC($\epsilon$), approximately $n(1-\epsilon)$ positions are received correctly. This gives a system of $\approx n(1-\epsilon)$ linear equations in $k$ unknowns (the message bits).
Decoding condition
The decoder can recover $m$ if and only if the unerased columns of $G$ span $\mathbb{F}_2^k$ (the system has full rank $k$).
For a random $G$, the probability that $n(1-\epsilon)$ uniformly random columns span $\mathbb{F}_2^k$ approaches 1 as $n \to \infty$, provided $k < n(1-\epsilon)$, i.e., $R = k/n < 1-\epsilon$.
Error probability bound
The probability of failure (rank deficiency) is bounded by a union bound over the $2^k - 1$ nonzero messages:
$P_{\mathrm{fail}} \le (2^k - 1)\,2^{-n(1-\epsilon)} \le 2^{-n(1-\epsilon-R)}$, which vanishes exponentially for $R < 1 - \epsilon$. (A more careful analysis using concentration of the number of erasures around $n\epsilon$ is needed for the precise statement, but the intuition is clear: with enough unerased observations, the linear system is solvable.)
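The rank argument can be simulated directly over $\mathbb{F}_2$; a sketch with bitmask Gaussian elimination, at rate $R = 0.5$ below the capacity $1 - \epsilon = 0.7$:

```python
import random

random.seed(0)

def gf2_rank(vectors, width):
    """Rank over GF(2) of a list of bitmask-encoded vectors."""
    basis = [0] * width  # basis[i]: stored vector with leading bit i, or 0
    rank = 0
    for v in vectors:
        for i in reversed(range(width)):
            if not (v >> i) & 1:
                continue
            if basis[i]:
                v ^= basis[i]  # reduce by existing pivot
            else:
                basis[i] = v
                rank += 1
                break
    return rank

def decodable(k, n, eps):
    """Do the unerased columns of a random k x n generator matrix span F_2^k?"""
    cols = [random.getrandbits(k) for _ in range(n) if random.random() > eps]
    return gf2_rank(cols, k) == k

k, n, eps = 50, 100, 0.3  # R = k/n = 0.5 < 1 - eps = 0.7
trials = 200
successes = sum(decodable(k, n, eps) for _ in range(trials))
print(successes / trials)  # very close to 1
```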
ex-ch09-07
Hard. Prove the strong converse for the BSC: if $R > C$, then $P_e^{(n)} \to 1$ as $n \to \infty$ (not just $P_e^{(n)} \not\to 0$).
Use the method of types. The probability of decoding correctly is bounded by the probability of the output being in a ball of the right type around the correct codeword.
The number of type classes is polynomial in $n$, while the number of codewords is exponential.
Type class approach
For message $m$, the decoder declares $m$ only if $y^n$ lands in the decision region $D_m$. The probability of correct decoding for message $m$ is $P(Y^n \in D_m \mid X^n = x^n(m))$.
Bound using types
The output distribution given input $x^n(m)$ concentrates on sequences at Hamming distance $\approx np$ from $x^n(m)$. The number of "distinguishable" output sequences (those in the typical set around $x^n(m)$) is approximately $2^{n\mathcal{H}_2(p)}$.
For correct decoding, the decision region $D_m$ must include most of these sequences. But the total number of output sequences is $2^n$, and there are $2^{nR}$ codewords.
If $R > C = 1 - \mathcal{H}_2(p)$, then $2^{nR} \cdot 2^{n\mathcal{H}_2(p)} \gg 2^n$, so the decision regions must overlap significantly, forcing the probability of correct decoding to 0 for most messages. More precisely, $P_e^{(n)} \ge 1 - 2^{-n(R-C)+o(n)} \to 1$.
ex-ch09-08
Medium. Implement one iteration of the Blahut-Arimoto algorithm for the BSC($p$) starting from the uniform input distribution $p^{(0)} = (1/2, 1/2)$. Compute $Q^{(0)}$, $p^{(1)}$, and the mutual information after one iteration.
The BSC transition matrix is $W = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$.
E-step: $Q(x|y) = \dfrac{p(x)\,W(y|x)}{\sum_{x'} p(x')\,W(y|x')}$.
E-step
With $p^{(0)} = (1/2, 1/2)$:
$P_Y = (1/2, 1/2)$, so $Q(0|0) = Q(1|1) = 1-p$ and $Q(1|0) = Q(0|1) = p$.
M-step (no cost constraint)
By symmetry: $p^{(1)} = (1/2, 1/2)$.
Mutual information
$I = 1 - \mathcal{H}_2(p)$ bits.
The BSC is symmetric, so the algorithm converges in one iteration to the uniform distribution. For asymmetric channels, multiple iterations are needed.
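One iteration can be coded directly; a sketch with an illustrative crossover value $p = 0.1$ (the exercise's own value is not shown here):

```python
import math

def ba_iteration(p_x, W):
    """One Blahut-Arimoto update: E-step (backward channel), then M-step."""
    nx, ny = len(W), len(W[0])
    p_y = [sum(p_x[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    # E-step: Bayes posterior Q(x|y)
    Q = [[p_x[x] * W[x][y] / p_y[y] for y in range(ny)] for x in range(nx)]
    # M-step: p(x) proportional to exp( sum_y W(y|x) log Q(x|y) )
    logits = [sum(W[x][y] * math.log(Q[x][y]) for y in range(ny)) for x in range(nx)]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]

p = 0.1  # illustrative crossover probability
W = [[1 - p, p], [p, 1 - p]]
p1 = ba_iteration([0.5, 0.5], W)
print(p1)  # stays uniform: [0.5, 0.5]
I = 1 - (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
print(round(I, 4))  # 0.531 bits
```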
ex-ch09-09
Easy. A source with entropy $\mathcal{H}$ bits per symbol must be transmitted over a BEC($\epsilon$) using $\rho$ channel uses per source symbol. Is the source transmissible?
Use the source-channel coding theorem: transmissible iff $\mathcal{H} < \rho\, C$.
Check the condition
$C = 1 - \epsilon$ bits per channel use, so the channel delivers $\rho(1-\epsilon)$ bits per source symbol, against $\mathcal{H}$ bits per source symbol produced.
If $\mathcal{H} < \rho(1-\epsilon)$, the source is transmissible. Separate source coding at rate $\approx \mathcal{H}$ bits/symbol followed by channel coding at rate below $C$ bits/use achieves reliable transmission.
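The check is a single comparison; since the exercise's numbers are not reproduced here, the values below are illustrative only:

```python
def transmissible(H_source, eps, uses_per_symbol):
    """Source-channel separation check for a BEC(eps)."""
    capacity = 1 - eps                   # bits per channel use
    budget = uses_per_symbol * capacity  # bits deliverable per source symbol
    return H_source < budget

# Illustrative numbers (the exercise's own values are assumed here):
print(transmissible(1.0, 0.4, 2))  # True: 1.0 < 2 * 0.6 = 1.2
```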
ex-ch09-10
Medium. Show that for the BSC, the capacity-achieving input distribution must be $P_X = (1/2, 1/2)$ by verifying the KKT conditions for the optimization $\max_{P_X} I(X;Y)$.
Write $I(\alpha) = \mathcal{H}_2(P_Y(1)) - \mathcal{H}_2(p)$ where $P_Y(1)$ depends on $\alpha = P_X(1)$.
The KKT conditions for the concave maximization require $dI/d\alpha = 0$ at an interior optimum.
Express MI in terms of $\alpha$
Let $\alpha = P_X(1)$. Then $P_Y(1) = \alpha(1-p) + (1-\alpha)p = p + \alpha(1-2p)$.
$I(\alpha) = \mathcal{H}_2(p + \alpha(1-2p)) - \mathcal{H}_2(p)$.
Take derivative
$\mathcal{H}_2'(q) = -\log q + \log(1-q) = \log\frac{1-q}{q}$, so $\frac{dI}{d\alpha} = (1-2p)\,\mathcal{H}_2'(P_Y(1))$. For $p \neq 1/2$, setting $\frac{dI}{d\alpha} = 0$ requires $\mathcal{H}_2'(P_Y(1)) = 0$, i.e., $P_Y(1) = 1/2$. And $P_Y(1) = 1/2 \implies p + \alpha(1-2p) = 1/2 \implies \alpha = 1/2$.
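A grid search confirms the KKT solution $\alpha = 1/2$ (the crossover $p = 0.2$ is an arbitrary illustrative choice):

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_bsc(alpha, p):
    """Mutual information of BSC(p) with P_X(1) = alpha."""
    py1 = p + alpha * (1 - 2 * p)
    return h2(py1) - h2(p)

p = 0.2  # illustrative crossover probability
best = max((I_bsc(a / 1000, p), a / 1000) for a in range(1001))
print(best[1])  # maximizer: 0.5
```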
ex-ch09-11
Hard. Consider the channel with $\mathcal{X} = \{0, 1, 2\}$, $\mathcal{Y} = \{0, 1\}$, and transition matrix $W$: input 0 maps to output 0 deterministically, input 2 maps to output 1 deterministically, and input 1 produces each output with probability $1/2$. Find the capacity and the capacity-achieving input distribution.
This is not a symmetric channel.
Let $P_X = (\alpha_0, \alpha_1, \alpha_2)$ and maximize $I(X;Y)$ over the simplex.
By symmetry of the channel structure (0 $\to$ deterministic 0, 2 $\to$ deterministic 1), try $P_X(0) = P_X(2)$.
Parameterize
Let $\beta = P_X(1)$ and $P_X(0) = P_X(2) = (1-\beta)/2$.
$P_Y(0) = (1-\beta)/2 + \beta/2 = 1/2$, and likewise $P_Y(1) = 1/2$.
Interesting: $\ntn{entropy}(Y)$ is always $1$ for this symmetric parameterization! So $\ntn{entropy}(Y) = 1$ bit.
Compute conditional entropy
$\ntn{entropy}(Y|X) = \frac{1-\beta}{2} \cdot 0 + \beta\,\mathcal{H}_2(1/2) + \frac{1-\beta}{2} \cdot 0 = \beta$.
$I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X) = 1 - \beta$.
Maximized at $\beta = 0$, which requires $P_X(1) = 0$.
Capacity
At $\beta = 0$: $P_X = (1/2, 0, 1/2)$ and $C = 1$ bit.
The optimal strategy never uses input 1 (the noisy input). By using only inputs 0 and 2, the channel becomes noiseless (0 maps deterministically to output 0, 2 maps deterministically to output 1).
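The sweep over $\beta$ can be reproduced numerically; the middle row of $W$ is taken as $(1/2, 1/2)$, matching the analysis above:

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x, row in enumerate(W):
        for y in range(ny):
            if px[x] > 0 and row[y] > 0:
                I += px[x] * row[y] * math.log2(row[y] / py[y])
    return I

# Rows: input 0 -> output 0, input 2 -> output 1, input 1 assumed uniform.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Sweep beta = P(X=1) with P(X=0) = P(X=2) = (1 - beta)/2.
for beta in (0.0, 0.25, 0.5):
    px = ((1 - beta) / 2, beta, (1 - beta) / 2)
    print(beta, round(mutual_information(px, W), 4))  # I = 1 - beta
```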
ex-ch09-12
Hard. Prove that the capacity-cost function $C(B)$ is concave in $B$.
Show that $C(B)$ is the maximum of a concave function of $P_X$ over a convex set, or use a time-sharing argument.
If $P_1$ achieves $C(B_1)$ and $P_2$ achieves $C(B_2)$, what does the mixture $\lambda P_1 + (1-\lambda) P_2$ achieve?
Time-sharing argument
Let $P_1$ achieve $C(B_1)$ and $P_2$ achieve $C(B_2)$. Consider $P_\lambda = \lambda P_1 + (1-\lambda) P_2$ for $\lambda \in [0,1]$.
Cost: $\mathbb{E}_{P_\lambda}[b(X)] = \lambda\,\mathbb{E}_{P_1}[b(X)] + (1-\lambda)\,\mathbb{E}_{P_2}[b(X)] \le \lambda B_1 + (1-\lambda) B_2$.
Use concavity of MI
By concavity of $I(X;Y)$ in $P_X$: $I(P_\lambda) \ge \lambda I(P_1) + (1-\lambda) I(P_2) = \lambda C(B_1) + (1-\lambda) C(B_2)$.
Since $P_\lambda$ is feasible for cost $\lambda B_1 + (1-\lambda) B_2$: $C(\lambda B_1 + (1-\lambda) B_2) \ge I(P_\lambda) \ge \lambda C(B_1) + (1-\lambda) C(B_2)$.
This is exactly the definition of concavity.
ex-ch09-13
Medium. A binary-input channel has $\mathcal{X} = \{0, 1\}$, $\mathcal{Y} = \{0, 1, 2\}$, with transition probabilities given by Row 0 $= (a, b, c)$ and Row 1 $= (c, b, a)$. Is this channel symmetric? Compute or bound its capacity.
Check if the rows are permutations of each other.
The rows are $(a, b, c)$ and $(c, b, a)$: are these permutations?
Check symmetry
Row 0: $(a, b, c)$. Row 1: $(c, b, a)$. Row 1 is a permutation of Row 0 (swap the first and third elements). So the channel is weakly symmetric.
Check columns: Column 0 is $(a, c)$, Column 1 is $(b, b)$, Column 2 is $(c, a)$. Column 0 and Column 2 are permutations of each other; Column 1 is constant. Not all columns are permutations of each other, so the channel is not strongly symmetric.
Compute capacity
Because the rows are permutations of a common row, $\ntn{entropy}(Y|X=x)$ is the same for all $x$: $\ntn{entropy}(Y|X) = \mathcal{H}(a, b, c)$.
$I(X;Y) = \ntn{entropy}(Y) - \mathcal{H}(a, b, c)$.
With the uniform input $P_X = (1/2, 1/2)$: $P_Y = \left(\frac{a+c}{2}, b, \frac{a+c}{2}\right)$, so $I = \mathcal{H}\!\left(\frac{a+c}{2}, b, \frac{a+c}{2}\right) - \mathcal{H}(a, b, c)$ bits.
The channel is invariant under simultaneously swapping the two inputs and the outer two outputs, and $I$ is concave in $P_X$, so the uniform input is optimal, giving $C = \mathcal{H}\!\left(\frac{a+c}{2}, b, \frac{a+c}{2}\right) - \mathcal{H}(a, b, c)$ bits.
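Since the exercise's row values are not reproduced here, the computation can be illustrated with hypothetical rows of the structure found above (our own numbers, chosen so the column sums are equal):

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def mutual_information(px, W):
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    return entropy(py) - sum(px[x] * entropy(W[x]) for x in range(len(px)))

# Hypothetical rows: Row 1 swaps the outer entries of Row 0.
W = [[1/2, 1/3, 1/6],
     [1/6, 1/3, 1/2]]
c_formula = math.log2(3) - entropy(W[0])       # weakly symmetric formula
c_uniform = mutual_information([0.5, 0.5], W)  # I under uniform input
print(round(c_formula, 4), round(c_uniform, 4))  # both ~0.1258
```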
ex-ch09-14
Challenge. Prove that the Blahut-Arimoto algorithm converges to $C$ from any initial distribution $p^{(0)}$ with full support. Specifically, show that $J(p^{(t)}, Q^{(t)})$ is non-decreasing in $t$ and converges to $C$.
Show that $J(p^{(t)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t-1)})$ (the E-step maximizes $J$ over $Q$).
Show that $J(p^{(t+1)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t)})$ (the M-step maximizes $J$ over $p$).
Use the fact that $J$ is bounded above by $C$.
E-step increases J
For fixed $p^{(t)}$, the E-step sets $Q^{(t)}(x|y)$ to the Bayes-optimal backward channel, which maximizes $J(p^{(t)}, Q)$ over $Q$. So $J(p^{(t)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t-1)})$.
Moreover, $J(p^{(t)}, Q^{(t)}) = I(p^{(t)})$: at the maximizing $Q$, the functional equals the mutual information.
M-step increases J
For fixed $Q^{(t)}$, the M-step maximizes $J(p, Q^{(t)})$ over $p$ (the cost-constrained simplex). Since $J$ is concave in $p$ for fixed $Q$, the M-step finds the global max: $J(p^{(t+1)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t)})$.
Convergence
Combining: $J(p^{(t+1)}, Q^{(t+1)}) \ge J(p^{(t)}, Q^{(t)})$, so the sequence is non-decreasing. Since $J \le C$ (bounded above), the sequence converges. At convergence, both the E-step and M-step are at fixed points, which means the limit point satisfies the KKT conditions for $\max_p I(p)$. By concavity, the KKT point is the global maximum, so $J \to C$.
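The monotone improvement can be watched numerically on an asymmetric channel (the matrix is an arbitrary example of ours):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

def blahut_arimoto(W, iters=100):
    """Unconstrained Blahut-Arimoto; returns final p and the I(p^(t)) history."""
    nx, ny = len(W), len(W[0])
    px = [1.0 / nx] * nx
    history = []
    for _ in range(iters):
        history.append(mutual_information(px, W))
        py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # E- and M-step combined: p(x) <- p(x) * exp(D(W(.|x) || py)), renormalized
        w = [px[x] * math.exp(sum(W[x][y] * math.log(W[x][y] / py[y])
                                  for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(w)
        px = [v / z for v in w]
    return px, history

W = [[0.9, 0.1], [0.3, 0.7]]  # asymmetric: needs more than one iteration
px, hist = blahut_arimoto(W)
print(all(b >= a - 1e-12 for a, b in zip(hist, hist[1:])))  # True: non-decreasing
print(round(hist[-1], 4))  # converged mutual information (the capacity)
```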
ex-ch09-15
Medium. The binary channel with $P_{Y|X}(1|0) = p_0$ and $P_{Y|X}(0|1) = p_1$ is the general binary channel. When $p_0 = p_1$, it reduces to the BSC. Show that the capacity of the general binary channel with $p_0 \neq p_1$ is achieved by a non-uniform input distribution, and give the optimality condition.
Write $I(\alpha)$ as a function of $\alpha = P_X(1)$ and take the derivative.
Set $dI/d\alpha = 0$ and solve for $\alpha^*$.
Express MI
$P_Y(1) = (1-\alpha)\,p_0 + \alpha\,(1-p_1)$, $\ntn{entropy}(Y|X) = (1-\alpha)\,\mathcal{H}_2(p_0) + \alpha\,\mathcal{H}_2(p_1)$.
$I(\alpha) = \mathcal{H}_2(P_Y(1)) - (1-\alpha)\,\mathcal{H}_2(p_0) - \alpha\,\mathcal{H}_2(p_1)$.
$\frac{dI}{d\alpha} = (1 - p_0 - p_1)\,\mathcal{H}_2'(P_Y(1)) + \mathcal{H}_2(p_0) - \mathcal{H}_2(p_1)$.
Optimality condition
Setting $dI/d\alpha = 0$ gives $\mathcal{H}_2'(P_Y(1)) = \frac{\mathcal{H}_2(p_1) - \mathcal{H}_2(p_0)}{1 - p_0 - p_1}$. When $p_0 = p_1$, $\mathcal{H}_2(p_0) = \mathcal{H}_2(p_1)$ and the condition reduces to $\mathcal{H}_2'(P_Y(1)) = 0$, i.e., $P_Y(1) = 1/2$, recovering $\alpha = 1/2$. When $p_0 \neq p_1$, the right-hand side is nonzero, so $P_Y(1) \neq 1/2$ and $\alpha^* \neq 1/2$. The channel asymmetry pushes the optimal input towards the "better" direction.
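The non-uniform optimum can be located by a grid search; $p_0 = 0.1$, $p_1 = 0.3$ are illustrative asymmetric values of ours:

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_general(alpha, p0, p1):
    """General binary channel with P(1|0) = p0, P(0|1) = p1; alpha = P_X(1)."""
    py1 = (1 - alpha) * p0 + alpha * (1 - p1)
    return h2(py1) - (1 - alpha) * h2(p0) - alpha * h2(p1)

p0, p1 = 0.1, 0.3  # illustrative asymmetric crossover probabilities
best_alpha = max(range(1001), key=lambda a: I_general(a / 1000, p0, p1)) / 1000
print(best_alpha)  # not 0.5: the asymmetry shifts the optimal input
```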