Channels with Causal State Information at the Encoder

Why Channels with State?

In many communication scenarios, the channel is influenced by a random parameter, the state, that is partially or fully known to one or both parties:

  • Fading: the channel gain $H$ is the state, known at the receiver (from pilot estimation) and sometimes at the transmitter (via feedback).
  • Interference: in a multiuser system, the signal intended for other users acts as state known at the transmitter (which generated it).
  • Memory and storage: writing to a storage medium with defects, where the locations of the defects are known.

The central question is: how does knowing the state help, and how much does it help? The answer depends critically on whether the state is known causally (up to the current time) or non-causally (the entire state sequence is known in advance), and whether it is known at the encoder, decoder, or both.

Definition:

Channel with State

A discrete memoryless channel with state is defined by:

  • Input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, state alphabet $\mathcal{S}$,
  • State distribution $P_S(s)$ (i.i.d. across time),
  • Transition law $P_{Y|X,S}(y|x,s)$.

The channel is memoryless: $P(Y^n \mid X^n, S^n) = \prod_{i=1}^n P_{Y|X,S}(Y_i \mid X_i, S_i)$.

The state sequence $S^n$ is drawn i.i.d. $\sim P_S$, independent of the message. The encoder may have access to the state sequence in one of three modes:

  • No state information: the encoder sees only the message.
  • Causal: at time $i$, the encoder knows $S^i = (S_1, \ldots, S_i)$.
  • Non-causal (acausal): the encoder knows the entire $S^n$ before transmission begins.
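The three access modes can be made concrete with a small simulation. The sketch below is illustrative (the function names, the callback signatures, and the binary example channel are my own choices, not from the text): a causal encoder is handed $S_i$ just before it must choose $X_i$, and nothing more.

```python
import random

def simulate(n, encoder, P_S, channel, seed=0):
    """Send n symbols through a DMC with i.i.d. state S ~ P_S.

    encoder(i, states) sees only the causally available states S_1..S_i;
    channel(x, s, rng) samples Y from P_{Y|X,S}(. | x, s).
    """
    rng = random.Random(seed)
    states, outputs = [], []
    for i in range(n):
        s = rng.choices(list(P_S), weights=list(P_S.values()))[0]
        states.append(s)                      # S_i is revealed to the encoder now
        x = encoder(i, states)                # causal: may use S_1, ..., S_i only
        outputs.append(channel(x, s, rng))
    return states, outputs

# Noiseless binary additive-state example: Y = X XOR S, S ~ Bernoulli(1/2).
P_S = {0: 0.5, 1: 0.5}
channel = lambda x, s, rng: x ^ s
bit = 1                                       # one message bit, sent repeatedly
encoder = lambda i, states: bit ^ states[-1]  # pre-cancel the current state
_, ys = simulate(8, encoder, P_S, channel)
print(ys)  # every output equals the message bit: [1, 1, 1, 1, 1, 1, 1, 1]
```

Because the encoder knows $S_i$ before transmitting, it can cancel the current state exactly; a non-causal encoder would additionally see future states before time 1.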

Channel state information (CSI)

Knowledge of the random state $S$ that affects the channel transition law. Causal CSI means knowing $S_1, \ldots, S_i$ at time $i$; non-causal CSI means knowing the entire $S^n$ before transmission.

Related: Shannon strategy, Dirty paper coding (DPC)

Theorem: Capacity with Causal State Information at the Encoder

The capacity of a DMC with state, where the state is known causally at the encoder and not at the decoder, is

C = \max_{P_T} I(T; Y),

where the maximization is over Shannon strategies: $T$ is a random variable, independent of the state, taking values in the set of maps $t: \mathcal{S} \to \mathcal{X}$. The input is $X = T(S)$ (the encoder may use the full history $S^i$, but adapting only to the current state $S_i$ is optimal), and the mutual information is computed with $S \sim P_S$ and $Y$ generated by $P_{Y|X,S}$.

Equivalently, defining $U$ with $|\mathcal{U}| \leq |\mathcal{X}| \cdot |\mathcal{S}|$:

C = \max_{P_U,\; f: \mathcal{U} \times \mathcal{S} \to \mathcal{X}} I(U; Y),

where $X = f(U, S)$ and $U \perp S$.

The encoder uses a Shannon strategy: it chooses its input $X_i$ as a function of the message and the state observed so far. The key insight is that causal state information allows the encoder to adapt its strategy to the state, but it cannot "pre-cancel" future interference because it does not know future states. The auxiliary variable $U$ captures the encoder's "intention" (the action it would take before seeing the state), while $f(U, S)$ is the actual transmission adapted to the current state.
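For small alphabets, the strategy formulation can be evaluated numerically by brute force: enumerate every map $t: \mathcal{S} \to \mathcal{X}$ and grid-search over distributions $P_T$ on them. The sketch below (function name and grid granularity are my own choices) recovers the intuitive answer for a noiseless binary additive-state channel $Y = X \oplus S$ with $S \sim$ Bernoulli(1/2): without CSI the state jams the channel (capacity 0), while causal CSI achieves 1 bit via the strategies $t_u(s) = u \oplus s$.

```python
import itertools
import math

def causal_capacity(P_S, P_Y_given_XS, X, S, Y, grid=20):
    """Brute-force max_{P_T} I(T; Y) over Shannon strategies t: S -> X.

    Exhaustive and coarse (grid search on the simplex over strategies),
    so only suitable for tiny alphabets -- an illustration, not a tool.
    """
    strategies = list(itertools.product(X, repeat=len(S)))
    # Induced channel from strategy to output: P(y|t) = sum_s P_S(s) P(y | t(s), s).
    PY_T = [[sum(P_S[si] * P_Y_given_XS[(y, t[si], s)] for si, s in enumerate(S))
             for y in Y] for t in strategies]
    best = 0.0
    for w in itertools.product(range(grid + 1), repeat=len(strategies)):
        if sum(w) != grid:
            continue  # keep only grid points on the probability simplex
        p = [wi / grid for wi in w]
        py = [sum(p[k] * PY_T[k][yi] for k in range(len(strategies)))
              for yi in range(len(Y))]
        mi = sum(p[k] * PY_T[k][yi] * math.log2(PY_T[k][yi] / py[yi])
                 for k in range(len(strategies)) for yi in range(len(Y))
                 if p[k] > 0 and PY_T[k][yi] > 0)
        best = max(best, mi)
    return best

# Noiseless binary additive-state channel: Y = X XOR S, S ~ Bernoulli(1/2).
X = S = Y = [0, 1]
P_S = [0.5, 0.5]
chan = {(y, x, s): 1.0 if y == x ^ s else 0.0 for y in Y for x in X for s in S}
print(causal_capacity(P_S, chan, X, S, Y))  # -> 1.0
```

The maximizer places probability 1/2 on each of the two strategies $t(s) = s$ and $t(s) = 1 \oplus s$, so the output deterministically reveals which strategy, i.e. which value of $U$, was used.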

Shannon strategy

An encoding scheme for channels with causal state information where the encoder chooses $X_i = f(U_i, S_i)$, adapting the auxiliary codeword symbol to the current state. Named for Shannon's 1958 analysis of channels with side information.

Related: Channel state information (CSI)

Example: Additive State Channel with Causal CSI

Consider $Y = X + S + Z$ where $S \sim \mathcal{N}(0, Q)$, $Z \sim \mathcal{N}(0, N)$, and $\mathbb{E}[X^2] \leq P$. The state $S$ is known causally at the encoder. What is the capacity?
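Without answering the question here, two benchmarks bracket the causal capacity and are easy to compute (the values of $P$, $Q$, $N$ below are illustrative): ignoring the state and treating $S + Z$ as noise gives $\frac{1}{2}\log\bigl(1 + P/(Q+N)\bigr)$, while non-causal knowledge achieves the dirty-paper rate $\frac{1}{2}\log(1 + P/N)$.

```python
import math

def awgn_rate(snr):
    """Rate of an AWGN channel at a given SNR, in bits per channel use."""
    return 0.5 * math.log2(1 + snr)

P, Q, N = 10.0, 5.0, 1.0  # illustrative power, state, and noise variances

# Lower benchmark: ignore the state, treating S + Z as noise of power Q + N.
no_csi = awgn_rate(P / (Q + N))
# Upper benchmark: non-causal CSI (dirty paper coding) removes S entirely.
dpc = awgn_rate(P / N)

print(f"treat-state-as-noise rate:    {no_csi:.3f} bits/use")
print(f"dirty-paper (non-causal) rate: {dpc:.3f} bits/use")
```

The causal capacity lies between these two rates: the encoder can always ignore its state knowledge, and non-causal knowledge is strictly more informative than causal knowledge.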

Causal vs. Non-Causal: A Fundamental Gap

The example above reveals a key distinction: with causal state information, the encoder can only partially compensate for the interference because it must allocate power for both the information signal and the state cancellation. With non-causal state information (Section 12.2), something remarkable happens: the interference can be completely canceled at no cost in power. This is Costa's dirty paper coding result, arguably the most surprising theorem in multiuser information theory.

Quick Check

For the channel $Y = X + S + Z$ with $S$ known causally at the encoder, does causal state knowledge help compared to no state knowledge?

Yes, the capacity is strictly larger than $\frac{1}{2}\log\bigl(1 + P/(Q+N)\bigr)$

No, causal knowledge provides no benefit

Yes, and it achieves the same capacity as non-causal knowledge: $\frac{1}{2}\log(1 + P/N)$

It depends on the relative values of $P$, $Q$, and $N$