Channels with Causal State Information at the Encoder

Why Channels with State?

In many communication scenarios, the channel is influenced by a random parameter, the state, that is partially or fully known to one or both parties:

  • Fading: the channel gain $H$ is the state, known at the receiver (from pilot estimation) and sometimes at the transmitter (via feedback).
  • Interference: in a multiuser system, the signal intended for other users acts as state known at the transmitter (which generated it).
  • Memory and storage: writing to a storage medium with defects, where the locations of the defects are known.

The central question is: how does knowing the state help, and how much does it help? The answer depends critically on whether the state is known causally (up to the current time) or non-causally (the entire state sequence is known in advance), and whether it is known at the encoder, decoder, or both.

Definition:

Channel with State

A discrete memoryless channel with state is defined by:

  • Input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, state alphabet $\mathcal{S}$,
  • State distribution $P_S(s)$ (i.i.d. across time),
  • Transition law $P_{Y|X,S}(y|x,s)$.

The channel is memoryless: $P(Y^n \mid X^n, S^n) = \prod_{i=1}^n P_{Y|X,S}(Y_i \mid X_i, S_i)$.

The state sequence $S^n$ is drawn i.i.d. $\sim P_S$, independent of the message. The encoder may have access to the state sequence in one of three modes:

  • No state information: the encoder sees only the message.
  • Causal: at time $i$, the encoder knows $S^i = (S_1, \ldots, S_i)$.
  • Non-causal (acausal): the encoder knows the entire $S^n$ before transmission begins.
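The three access modes can be made concrete with a small simulation. The sketch below is illustrative (the function names, the callback signatures, and the binary example channel are my own choices, not from the text): a causal encoder is handed $S_i$ just before it must choose $X_i$, and nothing more.

```python
import random

def simulate(n, encoder, P_S, channel, seed=0):
    """Send n symbols through a DMC with i.i.d. state S ~ P_S.

    encoder(i, states) sees only the causally available states S_1..S_i;
    channel(x, s, rng) samples Y from P_{Y|X,S}(. | x, s).
    """
    rng = random.Random(seed)
    states, outputs = [], []
    for i in range(n):
        s = rng.choices(list(P_S), weights=list(P_S.values()))[0]
        states.append(s)                      # S_i is revealed to the encoder now
        x = encoder(i, states)                # causal: may use S_1, ..., S_i only
        outputs.append(channel(x, s, rng))
    return states, outputs

# Noiseless binary additive-state example: Y = X XOR S, S ~ Bernoulli(1/2).
P_S = {0: 0.5, 1: 0.5}
channel = lambda x, s, rng: x ^ s
bit = 1                                       # one message bit, sent repeatedly
encoder = lambda i, states: bit ^ states[-1]  # pre-cancel the current state
_, ys = simulate(8, encoder, P_S, channel)
print(ys)  # every output equals the message bit: [1, 1, 1, 1, 1, 1, 1, 1]
```

Because the encoder knows $S_i$ before transmitting, it can cancel the current state exactly; a non-causal encoder would additionally see future states before time 1.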

Channel state information (CSI)

Knowledge of the random state $S$ that affects the channel transition law. Causal CSI means knowing $S_1, \ldots, S_i$ at time $i$; non-causal CSI means knowing the entire $S^n$ before transmission.

Related: Shannon strategy, Dirty paper coding (DPC)

Theorem: Capacity with Causal State Information at the Encoder

The capacity of a DMC with state, where the state is known causally at the encoder and not at the decoder, is

C = \max_{P_T} I(T; Y),

where the maximization is over Shannon strategies: $T$ is a random variable, independent of the state, taking values in the set of maps $t: \mathcal{S} \to \mathcal{X}$. The input is $X = T(S)$ (the encoder may use the full history $S^i$, but adapting only to the current state $S_i$ is optimal), and the mutual information is computed with $S \sim P_S$ and $Y$ generated by $P_{Y|X,S}$.

Equivalently, defining $U$ with $|\mathcal{U}| \leq |\mathcal{X}| \cdot |\mathcal{S}|$:

C = \max_{P_U,\; f: \mathcal{U} \times \mathcal{S} \to \mathcal{X}} I(U; Y),

where $X = f(U, S)$ and $U \perp S$.

The encoder uses a Shannon strategy: it chooses its input $X_i$ as a function of the message and the state observed so far. The key insight is that causal state information allows the encoder to adapt its strategy to the state, but it cannot "pre-cancel" future interference because it does not know future states. The auxiliary variable $U$ captures the encoder's "intention" (the action it would take before seeing the state), while $f(U, S)$ is the actual transmission adapted to the current state.
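For small alphabets, the strategy formulation can be evaluated numerically by brute force: enumerate every map $t: \mathcal{S} \to \mathcal{X}$ and grid-search over distributions $P_T$ on them. The sketch below (function name and grid granularity are my own choices) recovers the intuitive answer for a noiseless binary additive-state channel $Y = X \oplus S$ with $S \sim$ Bernoulli(1/2): without CSI the state jams the channel (capacity 0), while causal CSI achieves 1 bit via the strategies $t_u(s) = u \oplus s$.

```python
import itertools
import math

def causal_capacity(P_S, P_Y_given_XS, X, S, Y, grid=20):
    """Brute-force max_{P_T} I(T; Y) over Shannon strategies t: S -> X.

    Exhaustive and coarse (grid search on the simplex over strategies),
    so only suitable for tiny alphabets -- an illustration, not a tool.
    """
    strategies = list(itertools.product(X, repeat=len(S)))
    # Induced channel from strategy to output: P(y|t) = sum_s P_S(s) P(y | t(s), s).
    PY_T = [[sum(P_S[si] * P_Y_given_XS[(y, t[si], s)] for si, s in enumerate(S))
             for y in Y] for t in strategies]
    best = 0.0
    for w in itertools.product(range(grid + 1), repeat=len(strategies)):
        if sum(w) != grid:
            continue  # keep only grid points on the probability simplex
        p = [wi / grid for wi in w]
        py = [sum(p[k] * PY_T[k][yi] for k in range(len(strategies)))
              for yi in range(len(Y))]
        mi = sum(p[k] * PY_T[k][yi] * math.log2(PY_T[k][yi] / py[yi])
                 for k in range(len(strategies)) for yi in range(len(Y))
                 if p[k] > 0 and PY_T[k][yi] > 0)
        best = max(best, mi)
    return best

# Noiseless binary additive-state channel: Y = X XOR S, S ~ Bernoulli(1/2).
X = S = Y = [0, 1]
P_S = [0.5, 0.5]
chan = {(y, x, s): 1.0 if y == x ^ s else 0.0 for y in Y for x in X for s in S}
print(causal_capacity(P_S, chan, X, S, Y))  # -> 1.0
```

The maximizer places probability 1/2 on each of the two strategies $t(s) = s$ and $t(s) = 1 \oplus s$, so the output deterministically reveals which strategy, i.e. which value of $U$, was used.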

Shannon strategy

An encoding scheme for channels with causal state information where the encoder chooses $X_i = f(U_i, S_i)$, adapting the auxiliary codeword symbol to the current state. Named for Shannon's 1958 analysis of channels with side information.

Related: Channel state information (CSI)

Example: Additive State Channel with Causal CSI

Consider $Y = X + S + Z$ where $S \sim \mathcal{N}(0, Q)$, $Z \sim \mathcal{N}(0, N)$, and $\mathbb{E}[X^2] \leq P$. The state $S$ is known causally at the encoder. What is the capacity?
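Without answering the question here, two benchmarks bracket the causal capacity and are easy to compute (the values of $P$, $Q$, $N$ below are illustrative): ignoring the state and treating $S + Z$ as noise gives $\frac{1}{2}\log\bigl(1 + P/(Q+N)\bigr)$, while non-causal knowledge achieves the dirty-paper rate $\frac{1}{2}\log(1 + P/N)$.

```python
import math

def awgn_rate(snr):
    """Rate of an AWGN channel at a given SNR, in bits per channel use."""
    return 0.5 * math.log2(1 + snr)

P, Q, N = 10.0, 5.0, 1.0  # illustrative power, state, and noise variances

# Lower benchmark: ignore the state, treating S + Z as noise of power Q + N.
no_csi = awgn_rate(P / (Q + N))
# Upper benchmark: non-causal CSI (dirty paper coding) removes S entirely.
dpc = awgn_rate(P / N)

print(f"treat-state-as-noise rate:    {no_csi:.3f} bits/use")
print(f"dirty-paper (non-causal) rate: {dpc:.3f} bits/use")
```

The causal capacity lies between these two rates: the encoder can always ignore its state knowledge, and non-causal knowledge is strictly more informative than causal knowledge.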

Causal vs. Non-Causal: A Fundamental Gap

The example above reveals a key distinction: with causal state information, the encoder can only partially compensate for the interference because it must allocate power for both the information signal and the state cancellation. With non-causal state information (Section 12.2), something remarkable happens: the interference can be completely canceled at no cost in power. This is Costa's dirty paper coding result, arguably the most surprising theorem in multiuser information theory.

Quick Check

For the channel $Y = X + S + Z$ with $S$ known causally at the encoder, does causal state knowledge help compared to no state knowledge?

Yes, the capacity is strictly larger than $\frac{1}{2}\log\bigl(1 + P/(Q+N)\bigr)$

No, causal knowledge provides no benefit

Yes, and it achieves the same capacity as non-causal knowledge: $\frac{1}{2}\log(1 + P/N)$

It depends on the relative values of $P$, $Q$, and $N$