Examples of DMC Capacity

Computing Capacity for Important Channels

The channel coding theorem tells us $C = \max_{P_X} I(X;Y)$, but actually computing the maximum requires solving an optimization problem. For channels with symmetry, the solution simplifies dramatically. We now compute the capacity of the most important DMC examples, each illustrating a different technique.

BSC vs. BEC Capacity Comparison

Side-by-side animation of BSC and BEC capacity curves. The BEC capacity $C = 1 - \epsilon$ decreases linearly, while the BSC capacity $C = 1 - \mathcal{H}_2(p)$ drops much faster. The gap between them illustrates why erasures are less harmful than errors.

Definition:

Binary Symmetric Channel (BSC)

The binary symmetric channel BSC($p$) has $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ with transition law:

$$Y = X \oplus Z, \qquad Z \sim \text{Bernoulli}(p)$$

where $\oplus$ is modulo-2 addition and $Z$ is independent of $X$. The parameter $p \in [0, 1/2]$ is the crossover probability: the probability that a bit is flipped.

Theorem: Capacity of the BSC

The capacity of the BSC($p$) is:

$$C = 1 - \mathcal{H}_2(p)$$

where $\mathcal{H}_2(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function. The capacity-achieving input distribution is $X \sim \text{Bernoulli}(1/2)$ (uniform).

The maximum output entropy is $H(Y) = 1$ bit, achieved when $Y$ is uniform. The noise entropy is $H(Y|X) = H(Z) = \mathcal{H}_2(p)$, regardless of the input distribution. So $I(X;Y) = H(Y) - \mathcal{H}_2(p)$ is maximized by making $H(Y) = 1$, which requires $X \sim \text{Bernoulli}(1/2)$: a uniform input XORed with independent noise yields a uniform output.
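This argument can be checked numerically with a minimal sketch (the function names and the grid resolution below are illustrative choices, not part of the theorem): sweeping the input bias $\alpha$ confirms that $I(X;Y)$ peaks at $\alpha = 1/2$ with value $1 - \mathcal{H}_2(p)$.

```python
import math

def binary_entropy(p):
    """H2(p) in bits, with the convention H2(0) = H2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(alpha, p):
    """I(X;Y) for BSC(p) with input X ~ Bernoulli(alpha)."""
    # P(Y=1) = alpha(1-p) + (1-alpha)p, and H(Y|X) = H2(p) for any input.
    p_y1 = alpha * (1 - p) + (1 - alpha) * p
    return binary_entropy(p_y1) - binary_entropy(p)

p = 0.1
# Sweep the input distribution; the maximum sits at alpha = 1/2.
best = max(bsc_mutual_information(a / 1000, p) for a in range(1001))
print(best, 1 - binary_entropy(p))  # both ~0.531 bits
```

The same sweep with any other $p \in (0, 1/2)$ shows the same pattern: only the uniform input makes the output uniform.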


Definition:

Binary Erasure Channel (BEC)

The binary erasure channel BEC($\epsilon$) has $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, ?, 1\}$:

$$Y = \begin{cases} X & \text{with probability } 1-\epsilon \\ ? & \text{with probability } \epsilon \end{cases}$$

With probability $1-\epsilon$, the input passes through perfectly; with probability $\epsilon$, it is erased (the decoder sees "?", so it knows which position was lost but not the value that was sent).

Theorem: Capacity of the BEC

The capacity of the BEC($\epsilon$) is:

$$C = 1 - \epsilon$$

The capacity-achieving input distribution is $X \sim \text{Bernoulli}(1/2)$ (uniform).

A fraction $1-\epsilon$ of the bits arrive perfectly, and a fraction $\epsilon$ are lost, so the effective throughput is $1-\epsilon$ bits per channel use. The beauty of the BEC is that it separates "noise" from "loss": when a bit arrives, it arrives perfectly. This makes the BEC the ideal channel for understanding erasure-based coding (LDPC codes, fountain codes).
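The throughput heuristic can be made rigorous via $I(X;Y) = H(X) - H(X|Y) = (1-\epsilon)\,\mathcal{H}_2(\alpha)$ for input bias $\alpha$: the erasure flag reveals $X$ exactly unless the symbol is lost. A short sketch (the names and the choice $\epsilon = 0.3$ are illustrative):

```python
import math

def binary_entropy(a):
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def bec_mutual_information(alpha, eps):
    """I(X;Y) for BEC(eps) with X ~ Bernoulli(alpha).
    H(X) = H2(alpha) and H(X|Y) = eps * H2(alpha), since Y reveals X
    exactly unless it is erased.  Hence I = (1 - eps) * H2(alpha)."""
    return (1 - eps) * binary_entropy(alpha)

eps = 0.3
# Sweep the input bias; the maximum is (1 - eps) at alpha = 1/2.
best = max(bec_mutual_information(a / 1000, eps) for a in range(1001))
print(best)  # ~0.7 bits = 1 - eps
```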

Definition:

Symmetric and Strongly Symmetric Channels

A DMC with transition matrix $\mathbf{P}$ (where $P_{r,s} = P_{Y|X}(s \mid r)$) is:

  • Weakly symmetric: every row of $\mathbf{P}$ is a permutation of the first row. (Each input "sees" the same noise pattern, just shuffled.)

  • Strongly symmetric: additionally, every column of $\mathbf{P}$ is a permutation of the first column.

For a strongly symmetric channel:

$$C = \log|\mathcal{Y}| - \mathcal{H}(P_{1,1}, P_{1,2}, \ldots, P_{1,|\mathcal{Y}|})$$

achieved by the uniform input distribution $P_X = \text{Uniform}(\mathcal{X})$.

The BSC is strongly symmetric, but the BEC is only weakly symmetric: the "?" column of its transition matrix is not a permutation of the others, so the closed-form shortcut above does not apply to it, and its capacity $1-\epsilon$ must be computed directly, as above. The additive noise channel over $\mathbb{F}_q$ is strongly symmetric, with capacity $\log q - H(Z)$.
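A small sketch of the shortcut, using a ternary symmetric channel as an assumed example (the matrix entries and function names are illustrative): the closed form $\log|\mathcal{Y}| - \mathcal{H}(\text{first row})$ matches a direct computation of $I(X;Y)$ under the uniform input.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a sequence."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(px, W):
    """I(X;Y) for input PMF px and transition matrix W (rows = inputs)."""
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(len(W[0]))]
    h_y_given_x = sum(px[x] * entropy(W[x]) for x in range(len(px)))
    return entropy(py) - h_y_given_x

# Ternary symmetric channel: each symbol flips to each other symbol w.p. 0.1.
W = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
uniform = [1 / 3, 1 / 3, 1 / 3]
c_formula = math.log2(3) - entropy(W[0])   # log|Y| - H(first row)
c_uniform = mutual_information(uniform, W)
print(c_formula, c_uniform)  # both ~0.663 bits
```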

Theorem: Capacity of Additive Noise Channels

For a discrete additive noise channel $Y = X + Z$ over $\mathbb{F}_q$ (addition in the finite field of order $q$), where $Z$ has PMF $P_Z$ and is independent of $X$:

$$C = \log q - H(Z)$$

The capacity-achieving input distribution is $X \sim \text{Uniform}(\mathbb{F}_q)$.

The maximum output entropy is $\log q$, achieved when $Y$ is uniform, which happens when $X$ is uniform: the distribution of $Y = X + Z$ is the convolution of $P_X$ with $P_Z$, and convolving a uniform distribution with anything stays uniform. The noise consumes $H(Z)$ bits of the output entropy. The BSC is the special case $q = 2$, $Z \sim \text{Bernoulli}(p)$: $C = 1 - \mathcal{H}_2(p)$.
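The symmetric structure is easy to inspect in code. In this sketch the noise PMF over $\mathbb{F}_5$ is an arbitrary illustrative choice ($q = 5$ is prime, so field addition is just addition mod 5): each row of the transition matrix is a cyclic shift of $P_Z$, so every row and column is a permutation of the first.

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def additive_channel_matrix(pz):
    """Transition matrix of Y = X + Z (mod q): row x is P_Z shifted by x."""
    q = len(pz)
    return [[pz[(y - x) % q] for y in range(q)] for x in range(q)]

# Illustrative noise PMF over F_5 (q = 5 prime, so "+" is addition mod 5).
pz = [0.7, 0.1, 0.1, 0.05, 0.05]
q = len(pz)
W = additive_channel_matrix(pz)
capacity = math.log2(q) - entropy(pz)  # log q - H(Z), uniform input optimal
print(capacity)
```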

Example: Capacity of the Z-Channel

The Z-channel has $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ with:

  • $P_{Y|X}(0 \mid 0) = 1$ (0 always passes through)
  • $P_{Y|X}(0 \mid 1) = p$, $P_{Y|X}(1 \mid 1) = 1-p$ (1 may be flipped to 0)

Compute the capacity and the capacity-achieving input distribution.
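A numerical sketch of the solution (grid resolution and names are arbitrary choices): with $P(X{=}1) = \alpha$ we have $I(X;Y) = \mathcal{H}_2(\alpha(1-p)) - \alpha\,\mathcal{H}_2(p)$, since only input 1 is noisy. A grid search over $\alpha$ recovers the closed form $C = \log_2\!\left(1 + 2^{-\mathcal{H}_2(p)/(1-p)}\right)$, obtained by setting $dI/d\alpha = 0$, with optimizer $\alpha^* < 1/2$.

```python
import math

def binary_entropy(a):
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def z_channel_mi(alpha, p):
    """I(X;Y) for the Z-channel with P(X=1) = alpha.
    P(Y=1) = alpha(1-p); only input 1 is noisy, so H(Y|X) = alpha*H2(p)."""
    return binary_entropy(alpha * (1 - p)) - alpha * binary_entropy(p)

p = 0.5
alphas = [a / 10000 for a in range(10001)]
best_alpha = max(alphas, key=lambda a: z_channel_mi(a, p))
closed_form = math.log2(1 + 2 ** (-binary_entropy(p) / (1 - p)))
print(best_alpha, z_channel_mi(best_alpha, p), closed_form)
# optimizer near alpha = 0.4; capacity ~0.322 bits, matching the closed form
```

Note that even at $p = 1/2$ the Z-channel retains about 0.32 bits of capacity, whereas the BSC(1/2) has none: the one-sided noise never corrupts a transmitted 0.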

DMC Capacity Comparison

Compare the capacities of the BSC, BEC, and Z-channel as a function of the noise/erasure parameter. The BSC has the lowest capacity at any given $p$: symmetric bit errors are more damaging than erasures or the Z-channel's one-sided errors.


BSC Capacity vs. Crossover Probability

The capacity of the BSC as a function of the crossover probability $p$. At $p = 0$: perfect channel ($C = 1$). At $p = 1/2$: completely noisy ($C = 0$).


Comparison of Important DMC Channels

| Channel | Capacity $C$ | Optimal $P_X$ | Symmetric? |
| --- | --- | --- | --- |
| BSC($p$) | $1 - \mathcal{H}_2(p)$ | Bernoulli(1/2) | Strongly symmetric |
| BEC($\epsilon$) | $1 - \epsilon$ | Bernoulli(1/2) | Weakly symmetric |
| Z-channel($p$) | $\log\!\left(1 + 2^{-\mathcal{H}_2(p)/(1-p)}\right)$ | Bernoulli($\alpha^*$), $\alpha^* < 1/2$ | Not symmetric |
| Additive $\mathbb{F}_q$ | $\log q - H(Z)$ | Uniform on $\mathbb{F}_q$ | Strongly symmetric |

Historical Note: The BEC: A Pedagogical Powerhouse

1955

The binary erasure channel was introduced by Elias in 1955 as a simplified model for packet-based communication. Its clean structure (no bit errors, only losses) makes it the ideal setting for understanding LDPC codes (density evolution is exact on the BEC), polar codes (polarization is easiest to analyze on the BEC), and fountain codes (rateless codes for the BEC).

In modern systems, the BEC models packet erasures in internet communication, where packets either arrive perfectly (via checksums) or are declared lost. This is the operational regime of HTTP, TCP/IP, and streaming protocols.

Common Mistake: BEC Capacity is Always Higher than BSC Capacity

Mistake:

Assuming the BSC and BEC have similar capacities for the same parameter value (e.g., BSC(0.1) and BEC(0.1)). Students sometimes treat $p$ and $\epsilon$ as equivalent.

Correction:

For the same parameter: $C_{\text{BEC}}(\epsilon) = 1 - \epsilon$, while $C_{\text{BSC}}(p) = 1 - \mathcal{H}_2(p) < 1 - p$ for $0 < p < 1/2$. The BEC always has the higher capacity because erasures are "kinder" than errors: the decoder knows which bits were lost (and can request retransmission), while with errors it does not even know which bits are wrong.
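A quick tabulation makes the gap concrete (the parameter values below are arbitrary illustrative choices):

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Same parameter value for both channels; BEC wins every time.
for x in (0.01, 0.05, 0.1, 0.2, 0.3):
    c_bec = 1 - x
    c_bsc = 1 - binary_entropy(x)
    print(f"param={x:.2f}  BEC: {c_bec:.3f}  BSC: {c_bsc:.3f}")
# e.g. at 0.1: BEC gives 0.900 bits, the BSC only ~0.531 bits
```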

Key Takeaway

The BSC capacity is $1 - \mathcal{H}_2(p)$, the BEC capacity is $1 - \epsilon$, and additive noise channels over $\mathbb{F}_q$ have capacity $\log q - H(Z)$. For symmetric channels, the uniform input is optimal; for asymmetric channels (like the Z-channel), the optimal input must be found by optimization. Erasures are less harmful than errors: $C_{\text{BEC}} > C_{\text{BSC}}$ at the same parameter value.