Examples of DMC Capacity

Computing Capacity for Important Channels

The channel coding theorem tells us $C = \max_{P_X} I(X;Y)$, but actually computing the maximum requires solving an optimization problem. For channels with symmetry, the solution simplifies dramatically. We now compute the capacity of the most important DMC examples, each illustrating a different technique.

BSC vs. BEC Capacity Comparison

Side-by-side animation of BSC and BEC capacity curves. The BEC capacity $C = 1 - \epsilon$ decreases linearly, while the BSC capacity $C = 1 - \mathcal{H}_2(p)$ drops much faster. The gap between them illustrates why erasures are less harmful than errors.

Definition:

Binary Symmetric Channel (BSC)

The binary symmetric channel BSC($p$) has $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ with transition law:

$$Y = X \oplus Z, \qquad Z \sim \text{Bernoulli}(p)$$

where $\oplus$ is modulo-2 addition and $Z$ is independent of $X$. The parameter $p \in [0, 1/2]$ is the crossover probability: the probability that a bit is flipped.

Theorem: Capacity of the BSC

The capacity of the BSC($p$) is:

$$C = 1 - \mathcal{H}_2(p)$$

where $\mathcal{H}_2(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function. The capacity-achieving input distribution is $X \sim \text{Bernoulli}(1/2)$ (uniform).

The maximum output entropy is $H(Y) = 1$ bit, achieved when $Y$ is uniform. The noise entropy is $H(Y|X) = H(Z) = \mathcal{H}_2(p)$, regardless of the input distribution. So $I(X;Y) = H(Y) - \mathcal{H}_2(p)$ is maximized by making $H(Y) = 1$, which requires $X \sim \text{Bernoulli}(1/2)$: a uniform input XORed with independent noise yields a uniform output.
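This argument can be checked numerically with a minimal sketch (the function names and the grid resolution below are illustrative choices, not part of the theorem): sweeping the input bias $\alpha$ confirms that $I(X;Y)$ peaks at $\alpha = 1/2$ with value $1 - \mathcal{H}_2(p)$.

```python
import math

def binary_entropy(p):
    """H2(p) in bits, with the convention H2(0) = H2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(alpha, p):
    """I(X;Y) for BSC(p) with input X ~ Bernoulli(alpha)."""
    # P(Y=1) = alpha(1-p) + (1-alpha)p, and H(Y|X) = H2(p) for any input.
    p_y1 = alpha * (1 - p) + (1 - alpha) * p
    return binary_entropy(p_y1) - binary_entropy(p)

p = 0.1
# Sweep the input distribution; the maximum sits at alpha = 1/2.
best = max(bsc_mutual_information(a / 1000, p) for a in range(1001))
print(best, 1 - binary_entropy(p))  # both ~0.531 bits
```

The same sweep with any other $p \in (0, 1/2)$ shows the same pattern: only the uniform input makes the output uniform.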


Definition:

Binary Erasure Channel (BEC)

The binary erasure channel BEC($\epsilon$) has $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, ?, 1\}$:

$$Y = \begin{cases} X & \text{with probability } 1-\epsilon \\ ? & \text{with probability } \epsilon \end{cases}$$

With probability $1-\epsilon$, the input passes through perfectly; with probability $\epsilon$, it is erased (the decoder sees "?", so it knows which position was lost but not the value that was sent).

Theorem: Capacity of the BEC

The capacity of the BEC($\epsilon$) is:

$$C = 1 - \epsilon$$

The capacity-achieving input distribution is $X \sim \text{Bernoulli}(1/2)$ (uniform).

A fraction $1-\epsilon$ of the bits arrive perfectly, and a fraction $\epsilon$ are lost, so the effective throughput is $1-\epsilon$ bits per channel use. The beauty of the BEC is that it separates "noise" from "loss": when a bit arrives, it arrives perfectly. This makes the BEC the ideal channel for understanding erasure-based coding (LDPC codes, fountain codes).
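The throughput heuristic can be made rigorous via $I(X;Y) = H(X) - H(X|Y) = (1-\epsilon)\,\mathcal{H}_2(\alpha)$ for input bias $\alpha$: the erasure flag reveals $X$ exactly unless the symbol is lost. A short sketch (the names and the choice $\epsilon = 0.3$ are illustrative):

```python
import math

def binary_entropy(a):
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def bec_mutual_information(alpha, eps):
    """I(X;Y) for BEC(eps) with X ~ Bernoulli(alpha).
    H(X) = H2(alpha) and H(X|Y) = eps * H2(alpha), since Y reveals X
    exactly unless it is erased.  Hence I = (1 - eps) * H2(alpha)."""
    return (1 - eps) * binary_entropy(alpha)

eps = 0.3
# Sweep the input bias; the maximum is (1 - eps) at alpha = 1/2.
best = max(bec_mutual_information(a / 1000, eps) for a in range(1001))
print(best)  # ~0.7 bits = 1 - eps
```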

Definition:

Symmetric and Strongly Symmetric Channels

A DMC with transition matrix $\mathbf{P}$ (where $P_{r,s} = P_{Y|X}(s \mid r)$) is:

  • Weakly symmetric: every row of $\mathbf{P}$ is a permutation of the first row. (Each input "sees" the same noise pattern, just shuffled.)

  • Strongly symmetric: additionally, every column of $\mathbf{P}$ is a permutation of the first column.

For a strongly symmetric channel:

$$C = \log|\mathcal{Y}| - \mathcal{H}(P_{1,1}, P_{1,2}, \ldots, P_{1,|\mathcal{Y}|})$$

achieved by the uniform input distribution $P_X = \text{Uniform}(\mathcal{X})$.

The BSC is strongly symmetric, but the BEC is only weakly symmetric: the "?" column of its transition matrix is not a permutation of the others, so the closed-form shortcut above does not apply to it, and its capacity $1-\epsilon$ must be computed directly, as above. The additive noise channel over $\mathbb{F}_q$ is strongly symmetric, with capacity $\log q - H(Z)$.
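A small sketch of the shortcut, using a ternary symmetric channel as an assumed example (the matrix entries and function names are illustrative): the closed form $\log|\mathcal{Y}| - \mathcal{H}(\text{first row})$ matches a direct computation of $I(X;Y)$ under the uniform input.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a sequence."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(px, W):
    """I(X;Y) for input PMF px and transition matrix W (rows = inputs)."""
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(len(W[0]))]
    h_y_given_x = sum(px[x] * entropy(W[x]) for x in range(len(px)))
    return entropy(py) - h_y_given_x

# Ternary symmetric channel: each symbol flips to each other symbol w.p. 0.1.
W = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
uniform = [1 / 3, 1 / 3, 1 / 3]
c_formula = math.log2(3) - entropy(W[0])   # log|Y| - H(first row)
c_uniform = mutual_information(uniform, W)
print(c_formula, c_uniform)  # both ~0.663 bits
```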

Theorem: Capacity of Additive Noise Channels

For a discrete additive noise channel $Y = X + Z$ over $\mathbb{F}_q$ (addition in the finite field of order $q$), where $Z$ has PMF $P_Z$ and is independent of $X$:

$$C = \log q - H(Z)$$

The capacity-achieving input distribution is $X \sim \text{Uniform}(\mathbb{F}_q)$.

The maximum output entropy is $\log q$, achieved when $Y$ is uniform, which happens when $X$ is uniform: the distribution of $Y = X + Z$ is the convolution of $P_X$ with $P_Z$, and convolving a uniform distribution with anything stays uniform. The noise consumes $H(Z)$ bits of the output entropy. The BSC is the special case $q = 2$, $Z \sim \text{Bernoulli}(p)$: $C = 1 - \mathcal{H}_2(p)$.
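The symmetric structure is easy to inspect in code. In this sketch the noise PMF over $\mathbb{F}_5$ is an arbitrary illustrative choice ($q = 5$ is prime, so field addition is just addition mod 5): each row of the transition matrix is a cyclic shift of $P_Z$, so every row and column is a permutation of the first.

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def additive_channel_matrix(pz):
    """Transition matrix of Y = X + Z (mod q): row x is P_Z shifted by x."""
    q = len(pz)
    return [[pz[(y - x) % q] for y in range(q)] for x in range(q)]

# Illustrative noise PMF over F_5 (q = 5 prime, so "+" is addition mod 5).
pz = [0.7, 0.1, 0.1, 0.05, 0.05]
q = len(pz)
W = additive_channel_matrix(pz)
capacity = math.log2(q) - entropy(pz)  # log q - H(Z), uniform input optimal
print(capacity)
```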

Example: Capacity of the Z-Channel

The Z-channel has $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ with:

  • $P_{Y|X}(0 \mid 0) = 1$ (0 always passes through)
  • $P_{Y|X}(0 \mid 1) = p$, $P_{Y|X}(1 \mid 1) = 1-p$ (1 may be flipped to 0)

Compute the capacity and the capacity-achieving input distribution.
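A numerical sketch of the solution (grid resolution and names are arbitrary choices): with $P(X{=}1) = \alpha$ we have $I(X;Y) = \mathcal{H}_2(\alpha(1-p)) - \alpha\,\mathcal{H}_2(p)$, since only input 1 is noisy. A grid search over $\alpha$ recovers the closed form $C = \log_2\!\left(1 + 2^{-\mathcal{H}_2(p)/(1-p)}\right)$, obtained by setting $dI/d\alpha = 0$, with optimizer $\alpha^* < 1/2$.

```python
import math

def binary_entropy(a):
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def z_channel_mi(alpha, p):
    """I(X;Y) for the Z-channel with P(X=1) = alpha.
    P(Y=1) = alpha(1-p); only input 1 is noisy, so H(Y|X) = alpha*H2(p)."""
    return binary_entropy(alpha * (1 - p)) - alpha * binary_entropy(p)

p = 0.5
alphas = [a / 10000 for a in range(10001)]
best_alpha = max(alphas, key=lambda a: z_channel_mi(a, p))
closed_form = math.log2(1 + 2 ** (-binary_entropy(p) / (1 - p)))
print(best_alpha, z_channel_mi(best_alpha, p), closed_form)
# optimizer near alpha = 0.4; capacity ~0.322 bits, matching the closed form
```

Note that even at $p = 1/2$ the Z-channel retains about 0.32 bits of capacity, whereas the BSC(1/2) has none: the one-sided noise never corrupts a transmitted 0.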

DMC Capacity Comparison

Compare the capacities of the BSC, BEC, and Z-channel as a function of the noise/erasure parameter. The BSC has the lowest capacity at any given $p$: symmetric bit errors are more damaging than erasures or the Z-channel's one-sided errors.


BSC Capacity vs. Crossover Probability

The capacity of the BSC as a function of the crossover probability $p$. At $p = 0$: perfect channel ($C = 1$). At $p = 1/2$: completely noisy ($C = 0$).


Comparison of Important DMC Channels

| Channel | Capacity $C$ | Optimal $P_X$ | Symmetric? |
| --- | --- | --- | --- |
| BSC($p$) | $1 - \mathcal{H}_2(p)$ | Bernoulli(1/2) | Strongly symmetric |
| BEC($\epsilon$) | $1 - \epsilon$ | Bernoulli(1/2) | Weakly symmetric |
| Z-channel($p$) | $\log\!\left(1 + 2^{-\mathcal{H}_2(p)/(1-p)}\right)$ | Bernoulli($\alpha^*$), $\alpha^* < 1/2$ | Not symmetric |
| Additive $\mathbb{F}_q$ | $\log q - H(Z)$ | Uniform on $\mathbb{F}_q$ | Strongly symmetric |

Historical Note: The BEC: A Pedagogical Powerhouse

1955

The binary erasure channel was introduced by Elias in 1955 as a simplified model for packet-based communication. Its clean structure (no bit errors, only losses) makes it the ideal setting for understanding LDPC codes (density evolution is exact on the BEC), polar codes (polarization is easiest to analyze on the BEC), and fountain codes (rateless codes for the BEC).

In modern systems, the BEC models packet erasures in internet communication, where packets either arrive perfectly (via checksums) or are declared lost. This is the operational regime of HTTP, TCP/IP, and streaming protocols.

Common Mistake: BEC Capacity is Always Higher than BSC Capacity

Mistake:

Assuming the BSC and BEC have similar capacities for the same parameter value (e.g., BSC(0.1) and BEC(0.1)). Students sometimes treat $p$ and $\epsilon$ as equivalent.

Correction:

For the same parameter: $C_{\text{BEC}}(\epsilon) = 1 - \epsilon$, while $C_{\text{BSC}}(p) = 1 - \mathcal{H}_2(p) < 1 - p$ for $0 < p < 1/2$. The BEC always has the higher capacity because erasures are "kinder" than errors: the decoder knows which bits were lost (and can request retransmission), while with errors it does not even know which bits are wrong.
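A quick tabulation makes the gap concrete (the parameter values below are arbitrary illustrative choices):

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Same parameter value for both channels; BEC wins every time.
for x in (0.01, 0.05, 0.1, 0.2, 0.3):
    c_bec = 1 - x
    c_bsc = 1 - binary_entropy(x)
    print(f"param={x:.2f}  BEC: {c_bec:.3f}  BSC: {c_bsc:.3f}")
# e.g. at 0.1: BEC gives 0.900 bits, the BSC only ~0.531 bits
```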

Key Takeaway

The BSC capacity is $1 - \mathcal{H}_2(p)$, the BEC capacity is $1 - \epsilon$, and additive noise channels over $\mathbb{F}_q$ have capacity $\log q - H(Z)$. For symmetric channels, the uniform input is optimal; for asymmetric channels (like the Z-channel), the optimal input must be found by optimization. Erasures are less harmful than errors: $C_{\text{BEC}} > C_{\text{BSC}}$ at the same parameter value.