Three Extensions: Feedback, Source-Channel Coding, and Capacity Computation

This section covers three important extensions of the channel coding theorem. First, we prove the perhaps surprising result that feedback does not increase the capacity of a DMC, even though the encoder can adapt its transmission based on what the decoder has already received. Second, we state the source-channel coding theorem, connecting source coding and channel coding via the separation principle (already proved in Chapter 7, Section 4, for the general case). Third, we introduce the capacity-cost function and the Blahut-Arimoto algorithm for computing capacity numerically.

Definition: Channel Coding with Feedback

An $(R, n)$-code with feedback for a DMC consists of:

  • A message $M \sim \text{Uniform}([1 : 2^{nR}])$
  • A sequence of encoding functions $f_i : \mathcal{M} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, $i = 1, \ldots, n$, so that $X_i = f_i(M, Y_1, \ldots, Y_{i-1})$
  • A decoding function $g : \mathcal{Y}^n \to \mathcal{M}$

The key difference from the no-feedback case: the encoder can use the past channel outputs $Y^{i-1}$ to choose the current input $X_i$. The channel output is fed back to the encoder with one time step of delay.
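To make the definition concrete, here is a minimal illustrative sketch (the strategy and function names are ours, not from the text): on the BEC, the encoding function $f_i$ simply retransmits the current message bit until the fed-back output shows it was not erased.

import random

def simulate_bec_feedback(bits, eps=0.3, seed=1):
    """Feedback code on the BEC(eps): X_i = f_i(M, Y^{i-1}) resends the
    current message bit until the fed-back output is not an erasure."""
    rng = random.Random(seed)
    decoded, k, uses = [], 0, 0
    while k < len(bits):
        x = bits[k]                             # f_i: depends on M and past outputs
        y = x if rng.random() > eps else None   # channel: erasure with probability eps
        uses += 1
        if y is not None:                       # feedback says the bit got through
            decoded.append(y)
            k += 1
    return decoded, uses

msg = [1, 0, 1, 1, 0, 0, 1, 0] * 125            # 1000 message bits
out, n = simulate_bec_feedback(msg)
assert out == msg                               # zero error, by construction
print(f"rate = {len(msg) / n:.3f}, capacity 1 - eps = 0.7")

For long messages the achieved rate concentrates near $1 - \epsilon$, the BEC capacity; this is the ARQ idea revisited in item 3 below and in the engineering note at the end of the section.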

Theorem: Feedback Does Not Increase DMC Capacity

The feedback capacity of a discrete memoryless channel equals the non-feedback capacity:

$$C_{\text{fb}} = C = \max_{P_X} I(X; Y)$$

The converse proof from Section 4 already covers the feedback case! The key step $I(M, Y^{i-1}; Y_i \mid X_i) = 0$ holds because the channel is memoryless: $Y_i$ depends on $(M, Y^{i-1})$ only through $X_i$, regardless of whether $X_i$ was chosen using feedback or not.
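Spelled out, the converse chain (a standard derivation, using Fano's inequality with $\epsilon_n \to 0$ as the error probability vanishes) reads:

$$\begin{aligned}
nR = H(M) &\le I(M; Y^n) + n\epsilon_n = \sum_{i=1}^n I(M; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M, Y^{i-1}; Y_i) + n\epsilon_n = \sum_{i=1}^n I(X_i, M, Y^{i-1}; Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(X_i; Y_i) + n\epsilon_n \le nC + n\epsilon_n,
\end{aligned}$$

where the second line adds $X_i = f_i(M, Y^{i-1})$ at no cost, and the last line drops $(M, Y^{i-1})$ using the key step above.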

The achievability is trivial: any code without feedback is also valid with feedback (just ignore the feedback). So the achievable rates are at least as large as without feedback.

The point is that feedback cannot help because the channel is memoryless. The output $Y_i$ carries at most $C$ bits of new information per use, and no encoding trick can change this. What feedback can do is simplify coding and improve error exponents, but it cannot increase capacity.


When Feedback Does Help

Although feedback does not increase DMC capacity, it provides significant practical benefits:

  1. Simplified coding: The Schalkwijk-Kailath scheme for the Gaussian channel with feedback achieves capacity using a simple linear strategy: no codebook search, no joint typicality decoding.

  2. Improved error exponents: With feedback, the error probability can decay doubly exponentially in the block length ($P_e \sim 2^{-2^{nE}}$), compared to the single-exponential decay ($P_e \sim 2^{-nE}$) without feedback. Burnashev's exponent gives the optimal error exponent with feedback.

  3. ARQ protocols: On the BEC, feedback enables simple retransmission of erased packets, achieving capacity with minimal coding complexity.

  4. Channels with memory: For channels with memory (ISI channels, fading channels), feedback can increase capacity. The DMC result does not generalize.

  5. Multiuser channels: For MACs, broadcast channels, and interference channels, feedback can strictly increase the capacity region.

Historical Note: The Schalkwijk-Kailath Scheme


In 1966, Schalkwijk and Kailath devised a remarkably simple feedback scheme for the Gaussian channel. The encoder sends a refinement of its estimate of the decoder's error at each step, using the feedback to track the decoder's state. After $n$ steps, the error probability decays as $2^{-2^{n\alpha}}$, doubly exponential in the block length. This is astronomically faster than the single-exponential decay without feedback.

The scheme works because feedback allows the encoder to "zoom in" on the message: after the first transmission, the encoder knows what the decoder received and can send a correction. After the second transmission, it corrects the correction, and so on. Each step multiplies the effective SNR by another constant factor, so the decoder's error variance shrinks geometrically and the error probability collapses doubly exponentially.

Definition: The Capacity-Cost Function

Many channels have input cost constraints. A cost function $b : \mathcal{X} \to \mathbb{R}_+$ assigns a cost to each input symbol, with the per-codeword cost $b(x^n) = \frac{1}{n}\sum_{i=1}^n b(x_i)$.

The capacity-cost function $C(B)$ is the supremum of achievable rates for codes satisfying $b(x^n(m)) \leq B$ for all messages $m$:

$$C(B) = \max_{P_X : \mathbb{E}[b(X)] \leq B} I(X; Y)$$

Properties:

  • $C(B)$ is non-decreasing in $B$ (more budget = more capacity)
  • $C(B)$ is concave in $B$ (diminishing returns)
  • $C(B_{\max}) = C$, where $B_{\max}$ is any budget large enough to make the constraint inactive

The capacity-cost function is the channel coding analogue of the rate-distortion function: both are optimizations over input/test-channel distributions subject to an expected-value constraint. The water-filling solution for parallel Gaussian channels (Chapter 10) is a special case where the cost is power.

Example: BSC with Hamming Weight Constraint

Consider the BSC($p$) with a Hamming weight constraint: $b(x) = \mathbf{1}\{x = 1\}$ and budget $B = \delta$ (at most a fraction $\delta$ of the input bits can be 1). Find $C(\delta)$.
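As a numerical check on an answer, the sketch below evaluates the standard closed form $C(\delta) = H(\delta \star p) - H(p)$ for $\delta \le 1/2$, where $\delta \star p = \delta(1-p) + (1-\delta)p$ and the optimum is achieved by $P(X = 1) = \delta$ (the function names are ours):

import numpy as np

def h2(q):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def bsc_capacity_cost(p, delta):
    """C(delta) for the BSC(p) under an input-weight budget delta."""
    d = min(delta, 0.5)                    # budget beyond 1/2 buys nothing
    return h2(d * (1 - p) + (1 - d) * p) - h2(p)

print(bsc_capacity_cost(0.1, 0.2))         # ~0.358 bits; unconstrained 1 - h2(0.1) ~ 0.531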

Definition: The Blahut-Arimoto Algorithm

The Blahut-Arimoto algorithm computes $C(B) = \max_{P_X \in \mathcal{B}} I(X; Y)$ by alternating maximization. Define:

$$J(\mathbf{p}, \mathbf{P}, \mathbf{Q}) = \sum_r \sum_s p_r P_{r,s} \log \frac{Q_{s,r}}{p_r}$$

where Q\mathbf{Q} is an auxiliary backward channel matrix.

Iteration (starting from an initial $\mathbf{p}^{(0)}$, with a fixed Lagrange multiplier $\lambda$):

  1. E-step: Update the backward channel: $Q_{s,r}^{(\ell)} = \frac{p_r^{(\ell-1)} P_{r,s}}{\sum_{r'} p_{r'}^{(\ell-1)} P_{r',s}}$

  2. M-step: Update the input distribution: $p_r^{(\ell)} = \frac{\exp\left(\sum_s P_{r,s} \log Q_{s,r}^{(\ell)} - \lambda b_r\right)}{\sum_{r'} \exp\left(\sum_s P_{r',s} \log Q_{s,r'}^{(\ell)} - \lambda b_{r'}\right)}$

Convergence: $J(\mathbf{p}^{(\ell)}, \mathbf{P}, \mathbf{Q}^{(\ell)}) \to C(B)$ as $\ell \to \infty$, where $B = \sum_r b_r p_r^{(\infty)}$.

The Blahut-Arimoto algorithm converges to the global maximum because: (1) JJ is concave in p\mathbf{p} for fixed Q\mathbf{Q}, (2) JJ is concave in Q\mathbf{Q} for fixed p\mathbf{p}, and (3) the constraint sets are convex and compact. Alternating maximization under these conditions provably converges.
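The identity behind this (a standard step, spelled out here) is that for fixed $\mathbf{p}$, the maximizing backward channel is exactly the E-step posterior, and substituting it recovers the mutual information:

$$\max_{\mathbf{Q}} J(\mathbf{p}, \mathbf{P}, \mathbf{Q}) = \sum_{r,s} p_r P_{r,s} \log \frac{P_{r,s}}{\sum_{r'} p_{r'} P_{r',s}} = I(X; Y)$$

Hence maximizing $J$ jointly over $\mathbf{p}$ and $\mathbf{Q}$ is the same as maximizing $I(X; Y)$ over $\mathbf{p}$ (the cost constraint enters through the $-\lambda b_r$ term in the M-step), and every E-step and M-step can only increase the objective.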

The same algorithmic structure computes the rate-distortion function $R(D)$: simply swap the roles of encoder and decoder. The Blahut-Arimoto algorithm is a special case of the EM algorithm, predating it by several years.


Blahut-Arimoto algorithm

An iterative algorithm for computing channel capacity (or the rate-distortion function) by alternating between optimizing the input distribution and the backward channel. Converges to the global optimum due to the concavity of mutual information.

Related: Channel capacity, Rate-Distortion Function, EM algorithm

Blahut-Arimoto Algorithm for Channel Capacity

Complexity: O(|X|·|Y|) per iteration; typically converges in O(1/eps) iterations.

A runnable NumPy version of the pseudocode (natural logs throughout, so the capacity comes out in nats; divide by ln 2 for bits):

import numpy as np

def blahut_arimoto(P, b=None, lam=0.0, eps=1e-9, max_iter=10_000):
    """Capacity(-cost) of a DMC.  P[r, s] = Pr(Y = s | X = r), rows sum to 1
    and every output column has a positive entry; b[r] = cost of input r;
    lam = Lagrange multiplier (0 = unconstrained).  Returns (p, C), C in nats."""
    nx = P.shape[0]
    b = np.zeros(nx) if b is None else np.asarray(b, float)
    p = np.full(nx, 1.0 / nx)                       # 1. uniform initialization
    for _ in range(max_iter):
        # 2a. E-step: Q[s, r] = p[r] P[r, s] / sum_r' p[r'] P[r', s]
        Q = (p[:, None] * P).T
        Q /= Q.sum(axis=1, keepdims=True)
        # 2b. M-step: p_new[r] ∝ exp( sum_s P[r, s] log Q[s, r] - lam * b[r] )
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(P > 0, P * np.log(Q.T), 0.0).sum(axis=1) - lam * b
        p_new = np.exp(t - t.max())                 # subtract max(t) for stability
        p_new /= p_new.sum()
        # 2c. C = J(p_new, P, Q) = sum_{r,s} p_new[r] P[r,s] log(Q[s,r] / p_new[r])
        mask = (P > 0) & (p_new[:, None] > 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            C = float(np.where(mask, p_new[:, None] * P
                               * np.log(Q.T / p_new[:, None]), 0.0).sum())
        diff = np.abs(p_new - p).sum()              # 2d. convergence check
        p = p_new                                   # 2e. update and repeat
        if diff < eps:
            break
    return p, C

For the unconstrained case (no cost), set lam = 0 and the M-step simplifies. For the constrained case, lam is a parameter that traces out the capacity-cost curve $C(B)$ parametrically: each value of lam gives one $(B, C(B))$ pair.
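For instance, sweeping the multiplier with the blahut_arimoto function above traces the capacity-cost curve of the weight-constrained BSC (a usage sketch; the parameter grid is arbitrary):

import numpy as np

p = 0.1
P = np.array([[1 - p, p],
              [p, 1 - p]])                # BSC(p) transition matrix
b = np.array([0.0, 1.0])                  # Hamming-weight cost b(x) = 1{x = 1}

for lam in (0.0, 0.5, 1.0, 2.0, 4.0):     # lam = 0 recovers the unconstrained C
    px, C = blahut_arimoto(P, b, lam=lam)
    B = float(px @ b)                     # achieved budget B = E[b(X)]
    print(f"lam={lam:3.1f}  B={B:.3f}  C(B)={C / np.log(2):.4f} bits")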

Blahut-Arimoto Algorithm Convergence

Watch the Blahut-Arimoto algorithm converge to the capacity-achieving input distribution for different DMCs. The plot shows the input distribution and the mutual information at each iteration.

Parameters: channel type (0=BSC, 1=BEC, 2=Z-channel, 3=Custom 4x4), crossover/erasure probability, and the maximum number of Blahut-Arimoto iterations.

Capacity-Cost Function

Explore the capacity-cost function $C(B)$ for the BSC with Hamming weight constraint. Observe the concavity and the saturation at the unconstrained capacity.

Parameters: channel noise parameter $p$.

Quick Check

Why does feedback not increase the capacity of a DMC?

  • Because the encoder already knows the message and does not need the channel output
  • Because the memoryless property means $Y_i$ carries at most $C$ bits of new information regardless of how $X_i$ was chosen
  • Because feedback introduces a cycle that reduces the effective rate

🔧 Engineering Note

ARQ: Feedback in Practice

Automatic Repeat reQuest (ARQ) is the most common use of feedback in practical communication systems. On the BEC($\epsilon$), the encoder simply retransmits erased packets; each packet takes $1/(1-\epsilon)$ transmissions on average, so the throughput is $1-\epsilon$, exactly the BEC capacity. On general channels, Hybrid ARQ (HARQ) combines error correction with retransmission: the decoder attempts to decode and requests retransmission only if decoding fails.

While ARQ does not increase the capacity of a DMC, it achieves capacity with much simpler codes. Without ARQ, achieving near-capacity performance requires long block lengths and powerful codes (LDPC, polar). With ARQ, shorter codes and simpler decoders suffice because errors can be corrected by retransmission.

In 5G NR, HARQ with incremental redundancy is used on the downlink and uplink, with the feedback channel carrying ACK/NACK signals.

🔧 Engineering Note

Computing Capacity in Practice

The Blahut-Arimoto algorithm is the standard tool for computing DMC capacity numerically. It converges geometrically (linear convergence rate) and typically requires 20-50 iterations for 6 digits of accuracy.

For continuous-alphabet channels (Gaussian, fading), the optimization is infinite-dimensional and Blahut-Arimoto does not directly apply. Instead:

  • For AWGN: the capacity has a closed-form expression (Chapter 10).
  • For fading: capacity involves an expectation over the fading distribution.
  • For MIMO: the water-filling solution provides an efficient computation.

Modern extensions include accelerated Blahut-Arimoto (using Nesterov momentum) and neural-network-based capacity estimation for channels without closed-form expressions.

Key Takeaway

Feedback does not increase DMC capacity ($C_{\text{fb}} = C$) because the memoryless property limits each channel use to at most $C$ bits, regardless of how the input was chosen. The capacity-cost function $C(B) = \max_{P_X : \mathbb{E}[b(X)] \leq B} I(X;Y)$ handles input constraints and is concave in $B$. The Blahut-Arimoto algorithm computes capacity by alternating optimization, a precursor to the EM algorithm.