Ferkans — Interactive Telecom Tutor

Why We Need a Stronger Notion

The weak typical set defined via the AEP is sufficient for many purposes, but it has a limitation: it controls only the total log-probability, not the empirical frequencies of individual symbols. For multiuser coding theorems — where we need to reason about joint typicality of independently generated sequences — we need a notion of typicality that controls the empirical distribution symbol by symbol. This is strong typicality, and it is the version we will use throughout the rest of this book.

Definition:
Strongly Typical Set

Let $P_X$ be a distribution on a finite alphabet $\mathcal{X}$ . For a sequence $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^n$ , define the empirical distribution (type):

$\hat{P}_{\mathbf{x}}(a) = \frac{|\{i : x_i = a\}|}{n}, \quad a \in \mathcal{X}.$

The strongly typical set $\mathcal{T}_\epsilon^{(n)}(X)$ is:

$\mathcal{T}_\epsilon^{(n)}(X) = \left\{\mathbf{x} \in \mathcal{X}^n : |\hat{P}_{\mathbf{x}}(a) - P_X(a)| \leq \epsilon \cdot P_X(a) \text{ for all } a \in \mathcal{X}\right\}.$

That is, the empirical frequency of every symbol is within a relative $\epsilon$ -factor of its true probability.
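
The definition translates directly into a membership test. The following is a minimal sketch, not a reference implementation: the function names `empirical_distribution` and `is_strongly_typical` and the dictionary representation of $P_X$ are assumptions made for illustration.

```python
from collections import Counter

def empirical_distribution(x, alphabet):
    """Return the type of x: the fraction of positions holding each symbol."""
    counts = Counter(x)
    n = len(x)
    return {a: counts[a] / n for a in alphabet}

def is_strongly_typical(x, p, eps):
    """Check |P_hat(a) - P(a)| <= eps * P(a) for every symbol a in p.

    p maps each alphabet symbol to its probability (hypothetical layout).
    Symbols with P(a) = 0 must not appear in x at all.
    """
    p_hat = empirical_distribution(x, p.keys())
    return all(abs(p_hat[a] - p[a]) <= eps * p[a] for a in p)

p = {0: 0.7, 1: 0.3}
x_good = [0] * 68 + [1] * 32   # type (0.68, 0.32)
x_bad = [0] * 60 + [1] * 40    # type (0.60, 0.40)
print(is_strongly_typical(x_good, p, eps=0.1))  # True
print(is_strongly_typical(x_bad, p, eps=0.1))   # False
```

The sketch uses floating-point arithmetic, so sequences sitting exactly on the boundary of the tolerance may be misclassified; exact rationals avoid this.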

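This is the Orlitsky-Roche definition of strong typicality. Some authors use an absolute rather than relative tolerance: $|\hat{P}(a) - P(a)| \leq \epsilon/|\mathcal{X}|$. The relative version is more natural for multiuser settings; in particular, it forces $\hat{P}_{\mathbf{x}}(a) = 0$ whenever $P_X(a) = 0$, so zero-probability symbols never appear in a typical sequence. We write $\mathcal{T}_\epsilon^{(n)}(X)$ throughout, and drop $n$ and $\epsilon$ from the notation when they are clear from context.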

Strong typicality

A sequence is strongly typical if its empirical distribution is close to the true distribution in every symbol. The strongly typical set $\mathcal{T}_\epsilon^{(n)}(X)$ requires $|\hat{P}_{\mathbf{x}}(a) - P(a)| \leq \epsilon P(a)$ for all $a$ . Stronger than weak typicality; needed for joint typicality arguments.

Theorem: Properties of the Strongly Typical Set

For $\epsilon > 0$ and sufficiently large $n$ :

1. High probability: $\Pr(\mathbf{X} \in \mathcal{T}_\epsilon^{(n)}(X)) > 1 - \epsilon$.
2. Equiprobability: Every $\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X)$ satisfies

$P_{X^n}(\mathbf{x}) \doteq 2^{-nH(X)},$

meaning $2^{-n(H(X)+\delta(\epsilon))} \leq P_{X^n}(\mathbf{x}) \leq 2^{-n(H(X)-\delta(\epsilon))}$ where $\delta(\epsilon) \to 0$ as $\epsilon \to 0$.
3. Size: $|\mathcal{T}_\epsilon^{(n)}(X)| \doteq 2^{nH(X)}$.
4. Marginal consistency: If $(\mathbf{x}, \mathbf{y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y)$, then $\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X)$ and $\mathbf{y} \in \mathcal{T}_\epsilon^{(n)}(Y)$.

Strong typicality inherits all the properties of weak typicality, but adds marginal consistency (Property 4). This is crucial for multiuser coding: if a pair of sequences is jointly typical, each must be individually typical. Weak typicality does not guarantee this, because a bound on the joint log-probability alone says nothing about the marginals.

Proof

High probability (Property 1)

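For each $a \in \mathcal{X}$ with $P_X(a) > 0$, $\hat{P}_{\mathbf{X}}(a) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{X_i = a\}$ is an average of i.i.d. indicators with mean $P_X(a)$. By the WLLN, $\hat{P}_{\mathbf{X}}(a) \to P_X(a)$ in probability, so for sufficiently large $n$ we have $\Pr(|\hat{P}_{\mathbf{X}}(a) - P_X(a)| > \epsilon P_X(a)) < \epsilon/|\mathcal{X}|$. Symbols with $P_X(a) = 0$ appear with probability zero and so never violate the constraint. A union bound over the finitely many $a \in \mathcal{X}$ gives $\Pr(\mathbf{X} \notin \mathcal{T}_\epsilon^{(n)}(X)) < \epsilon$.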
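
For a binary source the probability in Property 1 can be computed exactly, since membership depends only on the number of ones. The sketch below assumes an i.i.d. source on $\{0,1\}$ with $P_X(1) = $ `p1`; the helper name `prob_typical` is hypothetical, and exact rationals are used to avoid rounding at the boundary of the tolerance.

```python
from fractions import Fraction
from math import ceil, comb, floor

def prob_typical(n, p1, eps):
    """Exact Pr(X^n in T_eps^(n)) for an i.i.d. binary source with P(1) = p1."""
    q1 = 1 - p1
    # Both symbols must meet their relative tolerance; since
    # |P_hat(0) - P(0)| = |P_hat(1) - P(1)|, the smaller one binds.
    tol = eps * min(p1, q1)
    lo = ceil(n * (p1 - tol))    # fewest ones allowed
    hi = floor(n * (p1 + tol))   # most ones allowed
    total = sum(comb(n, k) * p1**k * q1**(n - k) for k in range(lo, hi + 1))
    return float(total)

p1, eps = Fraction(3, 10), Fraction(1, 10)
for n in (100, 1000, 5000):
    print(n, prob_typical(n, p1, eps))
```

The printed probabilities increase toward $1$ with $n$, which is the behaviour the WLLN argument predicts; how large $n$ must be depends on both $\epsilon$ and the smallest symbol probability.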

Equiprobability (Property 2)

For $\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X)$ :

$\log P(\mathbf{x}) = \sum_a n\hat{P}(a) \log P(a) = n\sum_a \hat{P}(a)\log P(a)$ .

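Since $|\hat{P}(a) - P(a)| \leq \epsilon P(a)$, we have $(1-\epsilon)P(a) \leq \hat{P}(a) \leq (1+\epsilon)P(a)$ for every $a$ (symbols with $P(a) = 0$ do not occur in $\mathbf{x}$ and contribute nothing). Multiplying by $-\log P(a) \geq 0$ and summing over $a$ gives $(1-\epsilon)H(X) \leq -\frac{1}{n}\log P(\mathbf{x}) \leq (1+\epsilon)H(X)$, that is,

$\frac{1}{n}\log P(\mathbf{x}) \in [-H(X) - \delta(\epsilon), -H(X) + \delta(\epsilon)]$ with $\delta(\epsilon) = \epsilon H(X)$.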
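
A quick numerical check of the equiprobability bound, taking $\delta(\epsilon) = \epsilon H(X)$ (one valid choice: multiply $(1-\epsilon)P(a) \leq \hat{P}(a) \leq (1+\epsilon)P(a)$ by $-\log P(a) \geq 0$ and sum over $a$). The source distribution and the example sequence are assumptions chosen for illustration, as are the helper names.

```python
from math import log2

def entropy(p):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(pa * log2(pa) for pa in p.values() if pa > 0)

def neg_log_prob_rate(x, p):
    """-(1/n) log2 P(x) for an i.i.d. source with pmf p."""
    return -sum(log2(p[xi]) for xi in x) / len(x)

p = {0: 0.7, 1: 0.3}
eps = 0.1
H = entropy(p)

# A strongly typical sequence: type (0.68, 0.32), within 10% of (0.7, 0.3).
x = [0] * 680 + [1] * 320
rate = neg_log_prob_rate(x, p)

# The rate should lie between (1 - eps) * H and (1 + eps) * H.
print((1 - eps) * H, rate, (1 + eps) * H)
```

Only the type of the sequence matters here: any reordering of `x` gives the same value of `rate`.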

Size (Property 3)

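Follows from Properties 1 and 2, exactly as in the weak typical set case. For the upper bound, $1 \geq \Pr(\mathbf{X} \in \mathcal{T}_\epsilon^{(n)}(X)) \geq |\mathcal{T}_\epsilon^{(n)}(X)| \cdot 2^{-n(H(X)+\delta(\epsilon))}$, so $|\mathcal{T}_\epsilon^{(n)}(X)| \leq 2^{n(H(X)+\delta(\epsilon))}$. For the lower bound, $1 - \epsilon < \Pr(\mathbf{X} \in \mathcal{T}_\epsilon^{(n)}(X)) \leq |\mathcal{T}_\epsilon^{(n)}(X)| \cdot 2^{-n(H(X)-\delta(\epsilon))}$, so $|\mathcal{T}_\epsilon^{(n)}(X)| > (1-\epsilon) 2^{n(H(X)-\delta(\epsilon))}$.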
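
The size of the typical set can also be counted exactly for a binary alphabet, by summing binomial coefficients over the admissible numbers of ones. This is a sketch under that binary assumption; `typical_set_rate` and `binary_entropy` are hypothetical helper names.

```python
from fractions import Fraction
from math import ceil, comb, floor, log2

def typical_set_rate(n, p1, eps):
    """(1/n) log2 |T_eps^(n)| for a binary alphabet with P(1) = p1."""
    tol = eps * min(p1, 1 - p1)  # tighter of the two per-symbol tolerances
    lo, hi = ceil(n * (p1 - tol)), floor(n * (p1 + tol))
    size = sum(comb(n, k) for k in range(lo, hi + 1))  # exact integer count
    return log2(size) / n

def binary_entropy(p1):
    """Entropy in bits of a Bernoulli(p1) source."""
    p1 = float(p1)
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)

p1, eps = Fraction(3, 10), Fraction(1, 10)
H = binary_entropy(p1)
for n in (100, 1000, 10000):
    print(n, typical_set_rate(n, p1, eps), H)
```

For fixed $\epsilon$ the exponent does not converge to $H(X)$ itself; it stays within $\delta(\epsilon)$ of it, which is what $\doteq$ means in Property 3.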

Marginal consistency (Property 4)

If $|\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) - P_{XY}(a,b)| \leq \epsilon P_{XY}(a,b)$ for all $(a,b)$ , then summing over $b$ :

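$|\hat{P}_{\mathbf{x}}(a) - P_X(a)| = \left|\sum_b \left(\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) - P_{XY}(a,b)\right)\right| \leq \sum_b |\hat{P}_{\mathbf{x},\mathbf{y}}(a,b) - P_{XY}(a,b)| \leq \epsilon \sum_b P_{XY}(a,b) = \epsilon P_X(a)$,

where the first inequality is the triangle inequality. The same argument, summing over $a$ instead of $b$, shows $\mathbf{y} \in \mathcal{T}_\epsilon^{(n)}(Y)$.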
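
Marginal consistency can be seen on a small example by computing the joint type, summing out one coordinate, and testing each marginal. The joint distribution, the pair counts, and all function names below are assumptions chosen for illustration.

```python
from collections import Counter
from fractions import Fraction

def joint_type(x, y):
    """Empirical joint distribution of the pair sequence (x, y)."""
    n = len(x)
    return {ab: Fraction(c, n) for ab, c in Counter(zip(x, y)).items()}

def marginal(dist, axis):
    """Sum a joint distribution over the other coordinate (axis 0 = X, 1 = Y)."""
    out = {}
    for pair, mass in dist.items():
        out[pair[axis]] = out.get(pair[axis], 0) + mass
    return out

def is_typical(p_hat, p, eps):
    """Relative-tolerance test; symbols outside the support of p are rejected."""
    if not set(p_hat) <= set(p):
        return False
    return all(abs(p_hat.get(s, 0) - p[s]) <= eps * p[s] for s in p)

F = Fraction
p_xy = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(2, 10), (1, 1): F(3, 10)}
eps = F(1, 10)

# A pair sequence of length 100 with joint counts 41, 10, 19, 30.
pairs = [(0, 0)] * 41 + [(0, 1)] * 10 + [(1, 0)] * 19 + [(1, 1)] * 30
x = [a for a, _ in pairs]
y = [b for _, b in pairs]

t_xy = joint_type(x, y)
print(is_typical(t_xy, p_xy, eps))                            # jointly typical
print(is_typical(marginal(t_xy, 0), marginal(p_xy, 0), eps))  # x typical
print(is_typical(marginal(t_xy, 1), marginal(p_xy, 1), eps))  # y typical
```

All three checks print `True` for this example; by Property 4 the second and third can never fail when the first succeeds.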

Weak vs Strong Typicality

Property	Weak typicality	Strong typicality
Definition	$|\frac{1}{n}\log P(\mathbf{x}) + H| \leq \epsilon$	$|\hat{P}(a) - P(a)| \leq \epsilon P(a)$ for all $a$
Controls	Total log-probability	Per-symbol empirical frequencies
Implies weak typicality?	Yes (by definition)	Yes (Property 2)
Marginal consistency	Not guaranteed	Guaranteed (Property 4)
Extends to continuous?	Yes (using densities and differential entropy)	No (finite alphabets only)
Typical use	Point-to-point source/channel coding	Multiuser coding theorems

Quick Check

If $\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X)$ with $P_X = (0.7, 0.3)$ on $\{0,1\}$ and $\epsilon = 0.1$, how many $1$s can the sequence $\mathbf{x}$ of length $n = 1000$ contain?

Between $270$ and $330$

Exactly $300$

Between $200$ and $400$

Between $290$ and $310$

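Correct answer: Between $270$ and $330$

Strong typicality requires $|\hat{P}(1) - 0.3| \leq 0.1 \times 0.3 = 0.03$, so $\hat{P}(1) \in [0.27, 0.33]$. For $n = 1000$: between $270$ and $330$ ones. The constraint on symbol $0$, $|\hat{P}(0) - 0.7| \leq 0.07$, is looser and adds nothing, because $|\hat{P}(0) - 0.7| = |\hat{P}(1) - 0.3|$.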
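
The arithmetic in this answer generalizes to any symbol: the admissible counts are the integers between $n(1-\epsilon)P(a)$ and $n(1+\epsilon)P(a)$. A minimal sketch, with the hypothetical helper `count_range` and exact rationals so that boundary values such as $270$ are not lost to rounding:

```python
from fractions import Fraction
from math import ceil, floor

def count_range(n, p_a, eps):
    """Smallest and largest number of occurrences of a symbol with
    probability p_a allowed by |P_hat(a) - P(a)| <= eps * P(a)."""
    lo = ceil(n * (1 - eps) * p_a)
    hi = floor(n * (1 + eps) * p_a)
    return lo, hi

eps = Fraction(1, 10)
print(count_range(1000, Fraction(3, 10), eps))  # (270, 330): number of ones
print(count_range(1000, Fraction(7, 10), eps))  # (630, 770): number of zeros
```

The range for zeros corresponds to between $230$ and $370$ ones, which is wider than the range for ones, so the constraint on symbol $1$ is the binding one.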

Strong Typicality