Exercises
ex-ch09-01
Easy. Compute the capacity of the BSC with crossover probability $p = 0.01$. How many bits per channel use can be reliably transmitted?
$C = 1 - \mathcal{H}_2(p)$.
$\mathcal{H}_2(p) = -p\log_2 p - (1-p)\log_2(1-p)$.
Compute binary entropy
$\mathcal{H}_2(0.01) = -0.01\log_2 0.01 - 0.99\log_2 0.99 \approx 0.0808$ bits.
Compute capacity
$C = 1 - \mathcal{H}_2(0.01) \approx 0.9192$ bits per channel use.
So with a well-designed code, about 92% of the channel's potential is usable despite the 1% error rate. This illustrates why modern codes (LDPC, polar) that approach capacity are so valuable.
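The two steps above can be checked numerically; a minimal sketch (the helper names `h2` and `bsc_capacity` are ours):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """C = 1 - H2(p) for the binary symmetric channel."""
    return 1.0 - h2(p)

print(round(h2(0.01), 4))            # 0.0808
print(round(bsc_capacity(0.01), 4))  # 0.9192
```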
ex-ch09-02
Easy. Compute the capacity of the BEC with erasure probability $\epsilon = 0.3$. Compare with the BSC capacity at $p = 0.3$.
BEC: $C = 1 - \epsilon$. BSC: $C = 1 - \mathcal{H}_2(p)$.
BEC capacity
$C_{\mathrm{BEC}} = 1 - 0.3 = 0.7$ bits per channel use.
BSC capacity
$C_{\mathrm{BSC}} = 1 - \mathcal{H}_2(0.3) \approx 1 - 0.8813 = 0.1187$ bits per channel use.
Comparison
$C_{\mathrm{BEC}} / C_{\mathrm{BSC}} \approx 0.7 / 0.1187 \approx 5.9$. The BEC has nearly 6 times the capacity of the BSC at the same parameter value! This is because erasures are much less harmful than errors: the decoder knows which bits were lost and can focus its effort there.
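The comparison is easy to reproduce numerically (assuming the parameter value 0.3 used above):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.3
c_bec = 1 - eps      # erasures: capacity 0.7 bits/use
c_bsc = 1 - h2(eps)  # errors: capacity ~0.1187 bits/use
print(round(c_bec / c_bsc, 2))  # ~5.9
```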
ex-ch09-03
Medium. Show that for any DMC, $C = 0$ if and only if $X$ and $Y$ are independent for all input distributions (i.e., the rows of the transition matrix $W$ are identical).
$C = 0$ iff $I(X;Y) = 0$ for all $P_X$.
$I(X;Y) = 0$ iff $P_{XY} = P_X P_Y$.
Forward direction
If all rows of $W$ are identical (say, equal to $r$), then $P_Y(y) = \sum_x P_X(x) W(y|x) = r(y)$ for any $P_X$. So $Y$ is independent of $X$, and $I(X;Y) = 0$ for all $P_X$. Therefore $C = 0$.
Reverse direction
If $C = 0$, then $I(X;Y) = 0$ for all $P_X$. In particular, for a $P_X$ that puts positive mass on every $x$, independence requires $W(y|x) = P_Y(y)$ for all $x$ with $P_X(x) > 0$, which here means for all $x$. So all rows of $W$ equal $P_Y$.
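The forward direction can be sanity-checked numerically: for a matrix with identical rows, $I(X;Y)$ vanishes for every input distribution (the helper `mutual_information` is our own):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

W = [[0.2, 0.5, 0.3],
     [0.2, 0.5, 0.3]]  # identical rows
for px in [(0.5, 0.5), (0.9, 0.1), (0.25, 0.75)]:
    assert abs(mutual_information(px, W)) < 1e-9
print("I(X;Y) = 0 for every input distribution")
```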
ex-ch09-04
Medium. A channel has input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ with the given transition matrix $W$. Show this is a strongly symmetric channel and compute its capacity.
Check that each row is a permutation of the first row AND each column is a permutation of the first column.
For strongly symmetric channels: $C = \log_2|\mathcal{Y}| - \mathcal{H}(\text{row})$, where $\mathcal{H}(\text{row})$ is the entropy of any row of $W$ (all rows have the same entropy).
Verify strong symmetry
Each row is a permutation of the first row (verified by inspection). Each column is a permutation of the first column (also verified by inspection). So the channel is strongly symmetric.
Compute capacity
$C = \log_2|\mathcal{Y}| - \mathcal{H}(\text{row})$ bits per channel use.
The uniform input distribution is optimal.
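Since the exercise's matrix is not reproduced here, the computation can be illustrated with a hypothetical strongly symmetric (circulant) channel; the matrix below is our own example:

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def mutual_information(px, W):
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    return entropy(py) - sum(px[x] * entropy(W[x]) for x in range(len(px)))

# Hypothetical circulant matrix: rows AND columns are permutations of each other.
W = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
c_formula = math.log2(3) - entropy(W[0])           # C = log|Y| - H(row)
c_uniform = mutual_information([1/3, 1/3, 1/3], W)  # I under uniform input
print(round(c_formula, 4), round(c_uniform, 4))     # the two agree
```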
ex-ch09-05
Medium. Prove that $I(X;Y)$ is concave in $P_X$ for a fixed DMC $W$. (Hint: write $I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X)$ and analyze each term separately.)
$\ntn{entropy}(Y|X)$ is linear in $P_X$.
$\ntn{entropy}(Y)$ is concave in $P_Y$, and $P_Y$ is a linear function of $P_X$.
Concave composed with affine is concave.
Analyze $\ntn{entropy}(Y|X)$
$\ntn{entropy}(Y|X) = \sum_x P_X(x)\,\ntn{entropy}(W(\cdot|x))$ is a linear function of $P_X$ (it is a weighted sum with fixed coefficients $\ntn{entropy}(W(\cdot|x))$).
Analyze $\ntn{entropy}(Y)$
The output distribution is $P_Y(y) = \sum_x P_X(x) W(y|x)$, which is linear in $P_X$. The entropy is a concave function of $P_Y$ (a standard property). The composition of a concave function with a linear (affine) map is concave. So $\ntn{entropy}(Y)$ is concave in $P_X$.
Conclude
$I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X)$ is concave minus linear, which is concave. Therefore $\max_{P_X} I(X;Y)$ is a concave maximization problem: every local maximum is global, and the maximum value is unique (though the maximizing distribution need not be).
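The concavity claim can be spot-checked numerically: for any fixed channel, the midpoint input dominates the average of the endpoint values (the helper `mutual_information` is ours):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

W = [[0.9, 0.1], [0.2, 0.8]]  # an arbitrary fixed DMC
p1, p2 = (0.1, 0.9), (0.8, 0.2)
mid = (0.45, 0.55)            # (p1 + p2) / 2
lhs = mutual_information(mid, W)
rhs = 0.5 * mutual_information(p1, W) + 0.5 * mutual_information(p2, W)
print(lhs >= rhs)  # True: midpoint dominates, as concavity requires
```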
ex-ch09-06
Hard. Prove the achievability part of the channel coding theorem in detail for the BEC($\epsilon$). Show that random linear codes achieve the BEC capacity using a simpler argument than joint typicality decoding.
For the BEC, the decoder knows which bits were erased. It needs to recover the $k$ information bits from the unerased bits.
A random linear code over $\mathbb{F}_2$ maps $k$ information bits to $n$ coded bits via a generator matrix $G$.
The decoder sees $\approx n(1-\epsilon)$ unerased equations in $k$ unknowns.
Setup
Use a random linear code with a $k \times n$ generator matrix $G$ drawn uniformly at random over $\mathbb{F}_2$. The encoder computes $x = mG$ for message $m \in \mathbb{F}_2^k$.
After transmission through BEC($\epsilon$), approximately $n(1-\epsilon)$ positions are received correctly. This gives a system of $\approx n(1-\epsilon)$ linear equations in $k$ unknowns (the message bits).
Decoding condition
The decoder can recover $m$ if and only if the unerased columns of $G$ span $\mathbb{F}_2^k$ (the system has full rank $k$).
For a random $G$, the probability that $n(1-\epsilon)$ uniformly random columns span $\mathbb{F}_2^k$ approaches 1 as $n \to \infty$, provided $k < n(1-\epsilon)$, i.e., $R = k/n < 1-\epsilon$.
Error probability bound
The probability of failure (rank deficiency) is bounded by a union bound over the $2^k - 1$ nonzero messages:
$P_{\mathrm{fail}} \le (2^k - 1)\,2^{-n(1-\epsilon)} \le 2^{-n(1-\epsilon-R)}$, which vanishes exponentially for $R < 1 - \epsilon$. (A more careful analysis using concentration of the number of erasures around $n\epsilon$ is needed for the precise statement, but the intuition is clear: with enough unerased observations, the linear system is solvable.)
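The rank argument can be simulated directly over $\mathbb{F}_2$; a sketch with bitmask Gaussian elimination, at rate $R = 0.5$ below the capacity $1 - \epsilon = 0.7$:

```python
import random

random.seed(0)

def gf2_rank(vectors, width):
    """Rank over GF(2) of a list of bitmask-encoded vectors."""
    basis = [0] * width  # basis[i]: stored vector with leading bit i, or 0
    rank = 0
    for v in vectors:
        for i in reversed(range(width)):
            if not (v >> i) & 1:
                continue
            if basis[i]:
                v ^= basis[i]  # reduce by existing pivot
            else:
                basis[i] = v
                rank += 1
                break
    return rank

def decodable(k, n, eps):
    """Do the unerased columns of a random k x n generator matrix span F_2^k?"""
    cols = [random.getrandbits(k) for _ in range(n) if random.random() > eps]
    return gf2_rank(cols, k) == k

k, n, eps = 50, 100, 0.3  # R = k/n = 0.5 < 1 - eps = 0.7
trials = 200
successes = sum(decodable(k, n, eps) for _ in range(trials))
print(successes / trials)  # very close to 1
```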
ex-ch09-07
Hard. Prove the strong converse for the BSC: if $R > C$, then $P_e^{(n)} \to 1$ as $n \to \infty$ (not just $P_e^{(n)} \not\to 0$).
Use the method of types. The probability of decoding correctly is bounded by the probability of the output being in a ball of the right type around the correct codeword.
The number of type classes is polynomial in $n$, while the number of codewords is exponential.
Type class approach
For message $m$, the decoder declares $m$ only if $y^n$ lands in the decision region $D_m$. The probability of correct decoding for message $m$ is $P(Y^n \in D_m \mid X^n = x^n(m))$.
Bound using types
The output distribution given input $x^n(m)$ concentrates on sequences at Hamming distance $\approx np$ from $x^n(m)$. The number of "distinguishable" output sequences (those in the typical set around $x^n(m)$) is approximately $2^{n\mathcal{H}_2(p)}$.
For correct decoding, the decision region $D_m$ must include most of these sequences. But the total number of output sequences is $2^n$, and there are $2^{nR}$ codewords.
If $R > C = 1 - \mathcal{H}_2(p)$, then $2^{nR} \cdot 2^{n\mathcal{H}_2(p)} \gg 2^n$, so the decision regions must overlap significantly, forcing the probability of correct decoding to 0 for most messages. More precisely, $P_e^{(n)} \ge 1 - 2^{-n(R-C)+o(n)} \to 1$.
ex-ch09-08
Medium. Implement one iteration of the Blahut-Arimoto algorithm for the BSC($p$) starting from the uniform input distribution $p^{(0)} = (1/2, 1/2)$. Compute $Q^{(0)}$, $p^{(1)}$, and the mutual information after one iteration.
The BSC transition matrix is $W = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$.
E-step: $Q(x|y) = \dfrac{p(x)\,W(y|x)}{\sum_{x'} p(x')\,W(y|x')}$.
E-step
With $p^{(0)} = (1/2, 1/2)$:
$P_Y = (1/2, 1/2)$, so $Q(0|0) = Q(1|1) = 1-p$ and $Q(1|0) = Q(0|1) = p$.
M-step (no cost constraint)
By symmetry: $p^{(1)} = (1/2, 1/2)$.
Mutual information
$I = 1 - \mathcal{H}_2(p)$ bits.
The BSC is symmetric, so the algorithm converges in one iteration to the uniform distribution. For asymmetric channels, multiple iterations are needed.
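One iteration can be coded directly; a sketch with an illustrative crossover value $p = 0.1$ (the exercise's own value is not shown here):

```python
import math

def ba_iteration(p_x, W):
    """One Blahut-Arimoto update: E-step (backward channel), then M-step."""
    nx, ny = len(W), len(W[0])
    p_y = [sum(p_x[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    # E-step: Bayes posterior Q(x|y)
    Q = [[p_x[x] * W[x][y] / p_y[y] for y in range(ny)] for x in range(nx)]
    # M-step: p(x) proportional to exp( sum_y W(y|x) log Q(x|y) )
    logits = [sum(W[x][y] * math.log(Q[x][y]) for y in range(ny)) for x in range(nx)]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]

p = 0.1  # illustrative crossover probability
W = [[1 - p, p], [p, 1 - p]]
p1 = ba_iteration([0.5, 0.5], W)
print(p1)  # stays uniform: [0.5, 0.5]
I = 1 - (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
print(round(I, 4))  # 0.531 bits
```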
ex-ch09-09
Easy. A source with entropy $\mathcal{H}$ bits per symbol must be transmitted over a BEC($\epsilon$) using $\rho$ channel uses per source symbol. Is the source transmissible?
Use the source-channel coding theorem: transmissible iff $\mathcal{H} < \rho\, C$.
Check the condition
$C = 1 - \epsilon$ bits per channel use, so the channel delivers $\rho(1-\epsilon)$ bits per source symbol, against $\mathcal{H}$ bits per source symbol produced.
If $\mathcal{H} < \rho(1-\epsilon)$, the source is transmissible. Separate source coding at rate $\approx \mathcal{H}$ bits/symbol followed by channel coding at rate below $C$ bits/use achieves reliable transmission.
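The check is a single comparison; since the exercise's numbers are not reproduced here, the values below are illustrative only:

```python
def transmissible(H_source, eps, uses_per_symbol):
    """Source-channel separation check for a BEC(eps)."""
    capacity = 1 - eps                   # bits per channel use
    budget = uses_per_symbol * capacity  # bits deliverable per source symbol
    return H_source < budget

# Illustrative numbers (the exercise's own values are assumed here):
print(transmissible(1.0, 0.4, 2))  # True: 1.0 < 2 * 0.6 = 1.2
```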
ex-ch09-10
Medium. Show that for the BSC, the capacity-achieving input distribution must be $P_X = (1/2, 1/2)$ by verifying the KKT conditions for the optimization $\max_{P_X} I(X;Y)$.
Write $I(\alpha) = \mathcal{H}_2(P_Y(1)) - \mathcal{H}_2(p)$ where $P_Y(1)$ depends on $\alpha = P_X(1)$.
The KKT conditions for the concave maximization require $dI/d\alpha = 0$ at an interior optimum.
Express MI in terms of $\alpha$
Let $\alpha = P_X(1)$. Then $P_Y(1) = \alpha(1-p) + (1-\alpha)p = p + \alpha(1-2p)$.
$I(\alpha) = \mathcal{H}_2(p + \alpha(1-2p)) - \mathcal{H}_2(p)$.
Take derivative
$\mathcal{H}_2'(q) = -\log q + \log(1-q) = \log\frac{1-q}{q}$, so $\frac{dI}{d\alpha} = (1-2p)\,\mathcal{H}_2'(P_Y(1))$. For $p \neq 1/2$, setting $\frac{dI}{d\alpha} = 0$ requires $\mathcal{H}_2'(P_Y(1)) = 0$, i.e., $P_Y(1) = 1/2$. And $P_Y(1) = 1/2 \implies p + \alpha(1-2p) = 1/2 \implies \alpha = 1/2$.
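A grid search confirms the KKT solution $\alpha = 1/2$ (the crossover $p = 0.2$ is an arbitrary illustrative choice):

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_bsc(alpha, p):
    """Mutual information of BSC(p) with P_X(1) = alpha."""
    py1 = p + alpha * (1 - 2 * p)
    return h2(py1) - h2(p)

p = 0.2  # illustrative crossover probability
best = max((I_bsc(a / 1000, p), a / 1000) for a in range(1001))
print(best[1])  # maximizer: 0.5
```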
ex-ch09-11
Hard. Consider the channel with $\mathcal{X} = \{0, 1, 2\}$, $\mathcal{Y} = \{0, 1\}$, and transition matrix $W$: input 0 maps to output 0 deterministically, input 2 maps to output 1 deterministically, and input 1 produces each output with probability $1/2$. Find the capacity and the capacity-achieving input distribution.
This is not a symmetric channel.
Let $P_X = (\alpha_0, \alpha_1, \alpha_2)$ and maximize $I(X;Y)$ over the simplex.
By symmetry of the channel structure (0 $\to$ deterministic 0, 2 $\to$ deterministic 1), try $P_X(0) = P_X(2)$.
Parameterize
Let $\beta = P_X(1)$ and $P_X(0) = P_X(2) = (1-\beta)/2$.
$P_Y(0) = (1-\beta)/2 + \beta/2 = 1/2$, and likewise $P_Y(1) = 1/2$.
Interesting: $\ntn{entropy}(Y)$ is always $1$ for this symmetric parameterization! So $\ntn{entropy}(Y) = 1$ bit.
Compute conditional entropy
$\ntn{entropy}(Y|X) = \frac{1-\beta}{2} \cdot 0 + \beta\,\mathcal{H}_2(1/2) + \frac{1-\beta}{2} \cdot 0 = \beta$.
$I(X;Y) = \ntn{entropy}(Y) - \ntn{entropy}(Y|X) = 1 - \beta$.
Maximized at $\beta = 0$, which requires $P_X(1) = 0$.
Capacity
At $\beta = 0$: $P_X = (1/2, 0, 1/2)$ and $C = 1$ bit.
The optimal strategy never uses input 1 (the noisy input). By using only inputs 0 and 2, the channel becomes noiseless (0 maps deterministically to output 0, 2 maps deterministically to output 1).
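The sweep over $\beta$ can be reproduced numerically; the middle row of $W$ is taken as $(1/2, 1/2)$, matching the analysis above:

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x, row in enumerate(W):
        for y in range(ny):
            if px[x] > 0 and row[y] > 0:
                I += px[x] * row[y] * math.log2(row[y] / py[y])
    return I

# Rows: input 0 -> output 0, input 2 -> output 1, input 1 assumed uniform.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Sweep beta = P(X=1) with P(X=0) = P(X=2) = (1 - beta)/2.
for beta in (0.0, 0.25, 0.5):
    px = ((1 - beta) / 2, beta, (1 - beta) / 2)
    print(beta, round(mutual_information(px, W), 4))  # I = 1 - beta
```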
ex-ch09-12
Hard. Prove that the capacity-cost function $C(B)$ is concave in $B$.
Show that $C(B)$ is the maximum of a concave function of $P_X$ over a convex set, or use a time-sharing argument.
If $P_1$ achieves $C(B_1)$ and $P_2$ achieves $C(B_2)$, what does the mixture $\lambda P_1 + (1-\lambda) P_2$ achieve?
Time-sharing argument
Let $P_1$ achieve $C(B_1)$ and $P_2$ achieve $C(B_2)$. Consider $P_\lambda = \lambda P_1 + (1-\lambda) P_2$ for $\lambda \in [0,1]$.
Cost: $\mathbb{E}_{P_\lambda}[b(X)] = \lambda\,\mathbb{E}_{P_1}[b(X)] + (1-\lambda)\,\mathbb{E}_{P_2}[b(X)] \le \lambda B_1 + (1-\lambda) B_2$.
Use concavity of MI
By concavity of $I(X;Y)$ in $P_X$: $I(P_\lambda) \ge \lambda I(P_1) + (1-\lambda) I(P_2) = \lambda C(B_1) + (1-\lambda) C(B_2)$.
Since $P_\lambda$ is feasible for cost $\lambda B_1 + (1-\lambda) B_2$: $C(\lambda B_1 + (1-\lambda) B_2) \ge I(P_\lambda) \ge \lambda C(B_1) + (1-\lambda) C(B_2)$.
This is exactly the definition of concavity.
ex-ch09-13
Medium. A binary-input channel has $\mathcal{X} = \{0, 1\}$, $\mathcal{Y} = \{0, 1, 2\}$, with transition probabilities given by Row 0 $= (a, b, c)$ and Row 1 $= (c, b, a)$. Is this channel symmetric? Compute or bound its capacity.
Check if the rows are permutations of each other.
The rows are $(a, b, c)$ and $(c, b, a)$: are these permutations?
Check symmetry
Row 0: $(a, b, c)$. Row 1: $(c, b, a)$. Row 1 is a permutation of Row 0 (swap the first and third elements). So the channel is weakly symmetric.
Check columns: Column 0 is $(a, c)$, Column 1 is $(b, b)$, Column 2 is $(c, a)$. Column 0 and Column 2 are permutations of each other; Column 1 is constant. Not all columns are permutations of each other, so the channel is not strongly symmetric.
Compute capacity
Because the rows are permutations of a common row, $\ntn{entropy}(Y|X=x)$ is the same for all $x$: $\ntn{entropy}(Y|X) = \mathcal{H}(a, b, c)$.
$I(X;Y) = \ntn{entropy}(Y) - \mathcal{H}(a, b, c)$.
With the uniform input $P_X = (1/2, 1/2)$: $P_Y = \left(\frac{a+c}{2}, b, \frac{a+c}{2}\right)$, so $I = \mathcal{H}\!\left(\frac{a+c}{2}, b, \frac{a+c}{2}\right) - \mathcal{H}(a, b, c)$ bits.
The channel is invariant under simultaneously swapping the two inputs and the outer two outputs, and $I$ is concave in $P_X$, so the uniform input is optimal, giving $C = \mathcal{H}\!\left(\frac{a+c}{2}, b, \frac{a+c}{2}\right) - \mathcal{H}(a, b, c)$ bits.
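Since the exercise's row values are not reproduced here, the computation can be illustrated with hypothetical rows of the structure found above (our own numbers, chosen so the column sums are equal):

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

def mutual_information(px, W):
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    return entropy(py) - sum(px[x] * entropy(W[x]) for x in range(len(px)))

# Hypothetical rows: Row 1 swaps the outer entries of Row 0.
W = [[1/2, 1/3, 1/6],
     [1/6, 1/3, 1/2]]
c_formula = math.log2(3) - entropy(W[0])       # weakly symmetric formula
c_uniform = mutual_information([0.5, 0.5], W)  # I under uniform input
print(round(c_formula, 4), round(c_uniform, 4))  # both ~0.1258
```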
ex-ch09-14
Challenge. Prove that the Blahut-Arimoto algorithm converges to $C$ from any initial distribution $p^{(0)}$ with full support. Specifically, show that $J(p^{(t)}, Q^{(t)})$ is non-decreasing in $t$ and converges to $C$.
Show that $J(p^{(t)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t-1)})$ (the E-step maximizes $J$ over $Q$).
Show that $J(p^{(t+1)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t)})$ (the M-step maximizes $J$ over $p$).
Use the fact that $J$ is bounded above by $C$.
E-step increases J
For fixed $p^{(t)}$, the E-step sets $Q^{(t)}(x|y)$ to the Bayes-optimal backward channel, which maximizes $J(p^{(t)}, Q)$ over $Q$. So $J(p^{(t)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t-1)})$.
Moreover, $J(p^{(t)}, Q^{(t)}) = I(p^{(t)})$: at the maximizing $Q$, the functional equals the mutual information.
M-step increases J
For fixed $Q^{(t)}$, the M-step maximizes $J(p, Q^{(t)})$ over $p$ (the cost-constrained simplex). Since $J$ is concave in $p$ for fixed $Q$, the M-step finds the global max: $J(p^{(t+1)}, Q^{(t)}) \ge J(p^{(t)}, Q^{(t)})$.
Convergence
Combining: $J(p^{(t+1)}, Q^{(t+1)}) \ge J(p^{(t)}, Q^{(t)})$, so the sequence is non-decreasing. Since $J \le C$ (bounded above), the sequence converges. At convergence, both the E-step and M-step are at fixed points, which means the limit point satisfies the KKT conditions for $\max_p I(p)$. By concavity, the KKT point is the global maximum, so $J \to C$.
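The monotone improvement can be watched numerically on an asymmetric channel (the matrix is an arbitrary example of ours):

```python
import math

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and row-stochastic matrix W."""
    ny = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(ny)]
    I = 0.0
    for x in range(len(px)):
        for y in range(ny):
            if px[x] > 0 and W[x][y] > 0:
                I += px[x] * W[x][y] * math.log2(W[x][y] / py[y])
    return I

def blahut_arimoto(W, iters=100):
    """Unconstrained Blahut-Arimoto; returns final p and the I(p^(t)) history."""
    nx, ny = len(W), len(W[0])
    px = [1.0 / nx] * nx
    history = []
    for _ in range(iters):
        history.append(mutual_information(px, W))
        py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # E- and M-step combined: p(x) <- p(x) * exp(D(W(.|x) || py)), renormalized
        w = [px[x] * math.exp(sum(W[x][y] * math.log(W[x][y] / py[y])
                                  for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(w)
        px = [v / z for v in w]
    return px, history

W = [[0.9, 0.1], [0.3, 0.7]]  # asymmetric: needs more than one iteration
px, hist = blahut_arimoto(W)
print(all(b >= a - 1e-12 for a, b in zip(hist, hist[1:])))  # True: non-decreasing
print(round(hist[-1], 4))  # converged mutual information (the capacity)
```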
ex-ch09-15
Medium. The binary channel with $P_{Y|X}(1|0) = p_0$ and $P_{Y|X}(0|1) = p_1$ is the general binary channel. When $p_0 = p_1$, it reduces to the BSC. Show that the capacity of the general binary channel with $p_0 \neq p_1$ is achieved by a non-uniform input distribution, and give the optimality condition.
Write $I(\alpha)$ as a function of $\alpha = P_X(1)$ and take the derivative.
Set $dI/d\alpha = 0$ and solve for $\alpha^*$.
Express MI
$P_Y(1) = (1-\alpha)\,p_0 + \alpha\,(1-p_1)$, $\ntn{entropy}(Y|X) = (1-\alpha)\,\mathcal{H}_2(p_0) + \alpha\,\mathcal{H}_2(p_1)$.
$I(\alpha) = \mathcal{H}_2(P_Y(1)) - (1-\alpha)\,\mathcal{H}_2(p_0) - \alpha\,\mathcal{H}_2(p_1)$.
$\frac{dI}{d\alpha} = (1 - p_0 - p_1)\,\mathcal{H}_2'(P_Y(1)) + \mathcal{H}_2(p_0) - \mathcal{H}_2(p_1)$.
Optimality condition
Setting $dI/d\alpha = 0$ gives $\mathcal{H}_2'(P_Y(1)) = \frac{\mathcal{H}_2(p_1) - \mathcal{H}_2(p_0)}{1 - p_0 - p_1}$. When $p_0 = p_1$, $\mathcal{H}_2(p_0) = \mathcal{H}_2(p_1)$ and the condition reduces to $\mathcal{H}_2'(P_Y(1)) = 0$, i.e., $P_Y(1) = 1/2$, recovering $\alpha = 1/2$. When $p_0 \neq p_1$, the right-hand side is nonzero, so $P_Y(1) \neq 1/2$ and $\alpha^* \neq 1/2$. The channel asymmetry pushes the optimal input towards the "better" direction.
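The non-uniform optimum can be located by a grid search; $p_0 = 0.1$, $p_1 = 0.3$ are illustrative asymmetric values of ours:

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_general(alpha, p0, p1):
    """General binary channel with P(1|0) = p0, P(0|1) = p1; alpha = P_X(1)."""
    py1 = (1 - alpha) * p0 + alpha * (1 - p1)
    return h2(py1) - (1 - alpha) * h2(p0) - alpha * h2(p1)

p0, p1 = 0.1, 0.3  # illustrative asymmetric crossover probabilities
best_alpha = max(range(1001), key=lambda a: I_general(a / 1000, p0, p1)) / 1000
print(best_alpha)  # not 0.5: the asymmetry shifts the optimal input
```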