Three Extensions: Feedback, Source-Channel Coding, and Capacity Computation

This section covers three important extensions of the channel coding theorem. First, we prove the perhaps surprising result that feedback does not increase the capacity of a DMC, even though the encoder can adapt its transmission based on what the decoder has already received. Second, we state the source-channel coding theorem, connecting source coding and channel coding via the separation principle (already proved in Chapter 7, Section 4, for the general case). Third, we introduce the capacity-cost function and the Blahut-Arimoto algorithm for computing capacity numerically.

Definition: Channel Coding with Feedback

An $(R, n)$-code with feedback for a DMC consists of:

  • A message $M \sim \text{Uniform}([1 : 2^{nR}])$
  • A sequence of encoding functions $f_i : \mathcal{M} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, $i = 1, \ldots, n$, so that $X_i = f_i(M, Y_1, \ldots, Y_{i-1})$
  • A decoding function $g : \mathcal{Y}^n \to \mathcal{M}$

The key difference from the no-feedback case: the encoder can use the past channel outputs $Y^{i-1}$ to choose the current input $X_i$. The channel output is fed back to the encoder with one time step of delay.
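To make the definition concrete, here is a minimal illustrative sketch (the strategy and function names are ours, not from the text): on the BEC, the encoding function $f_i$ simply retransmits the current message bit until the fed-back output shows it was not erased.

import random

def simulate_bec_feedback(bits, eps=0.3, seed=1):
    """Feedback code on the BEC(eps): X_i = f_i(M, Y^{i-1}) resends the
    current message bit until the fed-back output is not an erasure."""
    rng = random.Random(seed)
    decoded, k, uses = [], 0, 0
    while k < len(bits):
        x = bits[k]                             # f_i: depends on M and past outputs
        y = x if rng.random() > eps else None   # channel: erasure with probability eps
        uses += 1
        if y is not None:                       # feedback says the bit got through
            decoded.append(y)
            k += 1
    return decoded, uses

msg = [1, 0, 1, 1, 0, 0, 1, 0] * 125            # 1000 message bits
out, n = simulate_bec_feedback(msg)
assert out == msg                               # zero error, by construction
print(f"rate = {len(msg) / n:.3f}, capacity 1 - eps = 0.7")

For long messages the achieved rate concentrates near $1 - \epsilon$, the BEC capacity; this is the ARQ idea revisited in item 3 below and in the engineering note at the end of the section.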

Theorem: Feedback Does Not Increase DMC Capacity

The feedback capacity of a discrete memoryless channel equals the non-feedback capacity:

$$C_{\text{fb}} = C = \max_{P_X} I(X; Y)$$

The converse proof from Section 4 already covers the feedback case! The key step $I(M, Y^{i-1}; Y_i \mid X_i) = 0$ holds because the channel is memoryless: $Y_i$ depends on $(M, Y^{i-1})$ only through $X_i$, regardless of whether $X_i$ was chosen using feedback or not.
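Spelled out, the converse chain (a standard derivation, using Fano's inequality with $\epsilon_n \to 0$ as the error probability vanishes) reads:

$$\begin{aligned}
nR = H(M) &\le I(M; Y^n) + n\epsilon_n = \sum_{i=1}^n I(M; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M, Y^{i-1}; Y_i) + n\epsilon_n = \sum_{i=1}^n I(X_i, M, Y^{i-1}; Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(X_i; Y_i) + n\epsilon_n \le nC + n\epsilon_n,
\end{aligned}$$

where the second line adds $X_i = f_i(M, Y^{i-1})$ at no cost, and the last line drops $(M, Y^{i-1})$ using the key step above.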

The achievability is trivial: any code without feedback is also valid with feedback (just ignore the feedback). So the achievable rates are at least as large as without feedback.

The point is that feedback cannot help because the channel is memoryless. The output $Y_i$ carries at most $C$ bits of new information per use, and no encoding trick can change this. What feedback can do is simplify coding and improve error exponents, but it cannot increase capacity.


When Feedback Does Help

Although feedback does not increase DMC capacity, it provides significant practical benefits:

  1. Simplified coding: The Schalkwijk-Kailath scheme for the Gaussian channel with feedback achieves capacity using a simple linear strategy: no codebook search, no joint typicality decoding.

  2. Improved error exponents: With feedback, the error probability can decay doubly exponentially in the block length ($P_e \sim 2^{-2^{nE}}$), compared to the single-exponential decay ($P_e \sim 2^{-nE}$) without feedback. Burnashev's exponent gives the optimal error exponent with feedback.

  3. ARQ protocols: On the BEC, feedback enables simple retransmission of erased packets, achieving capacity with minimal coding complexity.

  4. Channels with memory: For channels with memory (ISI channels, fading channels), feedback can increase capacity. The DMC result does not generalize.

  5. Multiuser channels: For MACs, broadcast channels, and interference channels, feedback can strictly increase the capacity region.

Historical Note: The Schalkwijk-Kailath Scheme


In 1966, Schalkwijk and Kailath devised a remarkably simple feedback scheme for the Gaussian channel. The encoder sends a refinement of its estimate of the decoder's error at each step, using the feedback to track the decoder's state. After $n$ steps, the error probability decays as $2^{-2^{n\alpha}}$, doubly exponential in the block length. This is astronomically faster than the single-exponential decay without feedback.

The scheme works because feedback allows the encoder to "zoom in" on the message: after the first transmission, the encoder knows what the decoder received and can send a correction. After the second transmission, it corrects the correction, and so on. Each step multiplies the effective SNR by another constant factor, so the decoder's error variance shrinks geometrically and the error probability collapses doubly exponentially.

Definition: The Capacity-Cost Function

Many channels have input cost constraints. A cost function $b : \mathcal{X} \to \mathbb{R}_+$ assigns a cost to each input symbol, with the per-codeword cost $b(x^n) = \frac{1}{n}\sum_{i=1}^n b(x_i)$.

The capacity-cost function $C(B)$ is the supremum of achievable rates for codes satisfying $b(x^n(m)) \leq B$ for all messages $m$:

$$C(B) = \max_{P_X : \mathbb{E}[b(X)] \leq B} I(X; Y)$$

Properties:

  • $C(B)$ is non-decreasing in $B$ (more budget = more capacity)
  • $C(B)$ is concave in $B$ (diminishing returns)
  • $C(B_{\max}) = C$, where $B_{\max}$ is any budget large enough to make the constraint inactive

The capacity-cost function is the channel coding analogue of the rate-distortion function: both are optimizations over input/test-channel distributions subject to an expected-value constraint. The water-filling solution for parallel Gaussian channels (Chapter 10) is a special case where the cost is power.

Example: BSC with Hamming Weight Constraint

Consider the BSC($p$) with a Hamming weight constraint: $b(x) = \mathbf{1}\{x = 1\}$ and budget $B = \delta$ (at most a fraction $\delta$ of the input bits can be 1). Find $C(\delta)$.
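As a numerical check on an answer, the sketch below evaluates the standard closed form $C(\delta) = H(\delta \star p) - H(p)$ for $\delta \le 1/2$, where $\delta \star p = \delta(1-p) + (1-\delta)p$ and the optimum is achieved by $P(X = 1) = \delta$ (the function names are ours):

import numpy as np

def h2(q):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def bsc_capacity_cost(p, delta):
    """C(delta) for the BSC(p) under an input-weight budget delta."""
    d = min(delta, 0.5)                    # budget beyond 1/2 buys nothing
    return h2(d * (1 - p) + (1 - d) * p) - h2(p)

print(bsc_capacity_cost(0.1, 0.2))         # ~0.358 bits; unconstrained 1 - h2(0.1) ~ 0.531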

Definition: The Blahut-Arimoto Algorithm

The Blahut-Arimoto algorithm computes $C(B) = \max_{P_X \in \mathcal{B}} I(X; Y)$ by alternating maximization. Define:

$$J(\mathbf{p}, \mathbf{P}, \mathbf{Q}) = \sum_r \sum_s p_r P_{r,s} \log \frac{Q_{s,r}}{p_r}$$

where Q\mathbf{Q} is an auxiliary backward channel matrix.

Iteration (starting from an initial $\mathbf{p}^{(0)}$, with a fixed Lagrange multiplier $\lambda$):

  1. E-step: Update the backward channel: $Q_{s,r}^{(\ell)} = \frac{p_r^{(\ell-1)} P_{r,s}}{\sum_{r'} p_{r'}^{(\ell-1)} P_{r',s}}$

  2. M-step: Update the input distribution: $p_r^{(\ell)} = \frac{\exp\left(\sum_s P_{r,s} \log Q_{s,r}^{(\ell)} - \lambda b_r\right)}{\sum_{r'} \exp\left(\sum_s P_{r',s} \log Q_{s,r'}^{(\ell)} - \lambda b_{r'}\right)}$

Convergence: $J(\mathbf{p}^{(\ell)}, \mathbf{P}, \mathbf{Q}^{(\ell)}) \to C(B)$ as $\ell \to \infty$, where $B = \sum_r b_r p_r^{(\infty)}$.

The Blahut-Arimoto algorithm converges to the global maximum because: (1) JJ is concave in p\mathbf{p} for fixed Q\mathbf{Q}, (2) JJ is concave in Q\mathbf{Q} for fixed p\mathbf{p}, and (3) the constraint sets are convex and compact. Alternating maximization under these conditions provably converges.
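The identity behind this (a standard step, spelled out here) is that for fixed $\mathbf{p}$, the maximizing backward channel is exactly the E-step posterior, and substituting it recovers the mutual information:

$$\max_{\mathbf{Q}} J(\mathbf{p}, \mathbf{P}, \mathbf{Q}) = \sum_{r,s} p_r P_{r,s} \log \frac{P_{r,s}}{\sum_{r'} p_{r'} P_{r',s}} = I(X; Y)$$

Hence maximizing $J$ jointly over $\mathbf{p}$ and $\mathbf{Q}$ is the same as maximizing $I(X; Y)$ over $\mathbf{p}$ (the cost constraint enters through the $-\lambda b_r$ term in the M-step), and every E-step and M-step can only increase the objective.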

The same algorithmic structure computes the rate-distortion function $R(D)$: simply swap the roles of encoder and decoder. The Blahut-Arimoto algorithm is a special case of the EM algorithm, predating it by several years.


Blahut-Arimoto algorithm

An iterative algorithm for computing channel capacity (or the rate-distortion function) by alternating between optimizing the input distribution and the backward channel. Converges to the global optimum due to the concavity of mutual information.

Related: Channel capacity, Rate-Distortion Function, EM algorithm

Blahut-Arimoto Algorithm for Channel Capacity

Complexity: O(|X|·|Y|) per iteration; typically converges in O(1/eps) iterations.

A runnable NumPy version of the pseudocode (natural logs throughout, so the capacity comes out in nats; divide by ln 2 for bits):

import numpy as np

def blahut_arimoto(P, b=None, lam=0.0, eps=1e-9, max_iter=10_000):
    """Capacity(-cost) of a DMC.  P[r, s] = Pr(Y = s | X = r), rows sum to 1
    and every output column has a positive entry; b[r] = cost of input r;
    lam = Lagrange multiplier (0 = unconstrained).  Returns (p, C), C in nats."""
    nx = P.shape[0]
    b = np.zeros(nx) if b is None else np.asarray(b, float)
    p = np.full(nx, 1.0 / nx)                       # 1. uniform initialization
    for _ in range(max_iter):
        # 2a. E-step: Q[s, r] = p[r] P[r, s] / sum_r' p[r'] P[r', s]
        Q = (p[:, None] * P).T
        Q /= Q.sum(axis=1, keepdims=True)
        # 2b. M-step: p_new[r] ∝ exp( sum_s P[r, s] log Q[s, r] - lam * b[r] )
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(P > 0, P * np.log(Q.T), 0.0).sum(axis=1) - lam * b
        p_new = np.exp(t - t.max())                 # subtract max(t) for stability
        p_new /= p_new.sum()
        # 2c. C = J(p_new, P, Q) = sum_{r,s} p_new[r] P[r,s] log(Q[s,r] / p_new[r])
        mask = (P > 0) & (p_new[:, None] > 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            C = float(np.where(mask, p_new[:, None] * P
                               * np.log(Q.T / p_new[:, None]), 0.0).sum())
        diff = np.abs(p_new - p).sum()              # 2d. convergence check
        p = p_new                                   # 2e. update and repeat
        if diff < eps:
            break
    return p, C

For the unconstrained case (no cost), set lam = 0 and the M-step simplifies. For the constrained case, lam is a parameter that traces out the capacity-cost curve $C(B)$ parametrically: each value of lam gives one $(B, C(B))$ pair.
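For instance, sweeping the multiplier with the blahut_arimoto function above traces the capacity-cost curve of the weight-constrained BSC (a usage sketch; the parameter grid is arbitrary):

import numpy as np

p = 0.1
P = np.array([[1 - p, p],
              [p, 1 - p]])                # BSC(p) transition matrix
b = np.array([0.0, 1.0])                  # Hamming-weight cost b(x) = 1{x = 1}

for lam in (0.0, 0.5, 1.0, 2.0, 4.0):     # lam = 0 recovers the unconstrained C
    px, C = blahut_arimoto(P, b, lam=lam)
    B = float(px @ b)                     # achieved budget B = E[b(X)]
    print(f"lam={lam:3.1f}  B={B:.3f}  C(B)={C / np.log(2):.4f} bits")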

Blahut-Arimoto Algorithm Convergence

Watch the Blahut-Arimoto algorithm converge to the capacity-achieving input distribution for different DMCs. The plot shows the input distribution and the mutual information at each iteration.

Parameters: channel type (0=BSC, 1=BEC, 2=Z-channel, 3=Custom 4x4), crossover/erasure probability, and the maximum number of Blahut-Arimoto iterations.

Capacity-Cost Function

Explore the capacity-cost function $C(B)$ for the BSC with Hamming weight constraint. Observe the concavity and the saturation at the unconstrained capacity.

Parameters: channel noise parameter $p$.

Quick Check

Why does feedback not increase the capacity of a DMC?

  • Because the encoder already knows the message and does not need the channel output
  • Because the memoryless property means $Y_i$ carries at most $C$ bits of new information regardless of how $X_i$ was chosen
  • Because feedback introduces a cycle that reduces the effective rate

🔧 Engineering Note

ARQ: Feedback in Practice

Automatic Repeat reQuest (ARQ) is the most common use of feedback in practical communication systems. On the BEC($\epsilon$), the encoder simply retransmits erased packets; each packet takes $1/(1-\epsilon)$ transmissions on average, so the throughput is $1-\epsilon$, exactly the BEC capacity. On general channels, Hybrid ARQ (HARQ) combines error correction with retransmission: the decoder attempts to decode and requests retransmission only if decoding fails.

While ARQ does not increase the capacity of a DMC, it achieves capacity with much simpler codes. Without ARQ, achieving near-capacity performance requires long block lengths and powerful codes (LDPC, polar). With ARQ, shorter codes and simpler decoders suffice because errors can be corrected by retransmission.

In 5G NR, HARQ with incremental redundancy is used on the downlink and uplink, with the feedback channel carrying ACK/NACK signals.

🔧 Engineering Note

Computing Capacity in Practice

The Blahut-Arimoto algorithm is the standard tool for computing DMC capacity numerically. It converges geometrically (linear convergence rate) and typically requires 20-50 iterations for 6 digits of accuracy.

For continuous-alphabet channels (Gaussian, fading), the optimization is infinite-dimensional and Blahut-Arimoto does not directly apply. Instead:

  • For AWGN: the capacity has a closed-form expression (Chapter 10).
  • For fading: capacity involves an expectation over the fading distribution.
  • For MIMO: the water-filling solution provides an efficient computation.

Modern extensions include accelerated Blahut-Arimoto (using Nesterov momentum) and neural-network-based capacity estimation for channels without closed-form expressions.

Key Takeaway

Feedback does not increase DMC capacity ($C_{\text{fb}} = C$) because the memoryless property limits each channel use to at most $C$ bits, regardless of how the input was chosen. The capacity-cost function $C(B) = \max_{P_X : \mathbb{E}[b(X)] \leq B} I(X;Y)$ handles input constraints and is concave in $B$. The Blahut-Arimoto algorithm computes capacity by alternating optimization, a precursor to the EM algorithm.