Strong Typicality

Why We Need a Stronger Notion

The weak typical set defined via the AEP is sufficient for many purposes, but it has a limitation: it controls only the total log-probability, not the empirical frequencies of individual symbols. For multiuser coding theorems — where we need to reason about joint typicality of independently generated sequences — we need a notion of typicality that controls the empirical distribution symbol by symbol. This is strong typicality, and it is the version we will use throughout the rest of this book.

Definition:

Strongly Typical Set

Let PXP_X be a distribution on a finite alphabet X\mathcal{X}. For a sequence x=(x1,,xn)Xn\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^n, define the empirical distribution (type):

P^x(a)={i:xi=a}n,aX.\hat{P}_{\mathbf{x}}(a) = \frac{|\{i : x_i = a\}|}{n}, \quad a \in \mathcal{X}.

The strongly typical set Tϵ(n)(X)\mathcal{T}_\epsilon^{(n)}(X) is:

Tϵ(n)(X)={xXn:P^x(a)PX(a)ϵPX(a) for all aX}.\mathcal{T}_\epsilon^{(n)}(X) = \left\{\mathbf{x} \in \mathcal{X}^n : |\hat{P}_{\mathbf{x}}(a) - P_X(a)| \leq \epsilon \cdot P_X(a) \text{ for all } a \in \mathcal{X}\right\}.

That is, the empirical frequency of every symbol is within a relative ϵ\epsilon-factor of its true probability.

This is the Orlitsky-Roche definition of strong typicality. Some authors use an absolute rather than relative tolerance: P^(a)P(a)ϵ/X|\hat{P}(a) - P(a)| \leq \epsilon/|\mathcal{X}|. The relative version is more natural for multiuser settings. The notation Tϵ(n)(X)\mathcal{T}_\epsilon^{(n)}(X) suppresses nn and ϵ\epsilon when clear from context.

Strong typicality

A sequence is strongly typical if its empirical distribution is close to the true distribution in every symbol. The strongly typical set Tϵ(n)(X)\mathcal{T}_\epsilon^{(n)}(X) requires P^x(a)P(a)ϵP(a)|\hat{P}_{\mathbf{x}}(a) - P(a)| \leq \epsilon P(a) for all aa. Stronger than weak typicality; needed for joint typicality arguments.

Related: Typical set, Asymptotic equipartition property (AEP)

Theorem: Properties of the Strongly Typical Set

For ϵ>0\epsilon > 0 and sufficiently large nn:

  1. High probability: Pr(XTϵ(n)(X))>1ϵ\Pr(\mathbf{X} \in \mathcal{T}_\epsilon^{(n)}(X)) > 1 - \epsilon.

  2. Equiprobability: Every xTϵ(n)(X)\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X) satisfies

    PXn(x)2nH(X),P_{X^n}(\mathbf{x}) \doteq 2^{-nH(X)},

    meaning 2n(H(X)+δ(ϵ))PXn(x)2n(H(X)δ(ϵ))2^{-n(H(X)+\delta(\epsilon))} \leq P_{X^n}(\mathbf{x}) \leq 2^{-n(H(X)-\delta(\epsilon))} where δ(ϵ)0\delta(\epsilon) \to 0 as ϵ0\epsilon \to 0.

  3. Size: Tϵ(n)(X)2nH(X)|\mathcal{T}_\epsilon^{(n)}(X)| \doteq 2^{nH(X)}.

  4. Marginal consistency: If (x,y)Tϵ(n)(X,Y)(\mathbf{x}, \mathbf{y}) \in \mathcal{T}_\epsilon^{(n)}(X,Y), then xTϵ(n)(X)\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X) and yTϵ(n)(Y)\mathbf{y} \in \mathcal{T}_\epsilon^{(n)}(Y).

Strong typicality inherits all the properties of weak typicality, but adds marginal consistency (Property 4). This is crucial for multiuser coding: if a pair of sequences is jointly typical, each must be individually typical. This property fails for weak typicality.

Weak vs Strong Typicality

PropertyWeak typicalityStrong typicality
Definition1nlogP(x)+Hϵ|\frac{1}{n}\log P(\mathbf{x}) + H| \leq \epsilonP^(a)P(a)ϵP(a)|\hat{P}(a) - P(a)| \leq \epsilon P(a) for all aa
ControlsTotal log-probabilityPer-symbol empirical frequencies
Implies weak typicality?Yes (by definition)Yes (Property 2)
Marginal consistencyNot guaranteedGuaranteed (Property 4)
Extends to continuous?Yes (via quantization)Only for finite alphabets
Typical usePoint-to-point source/channel codingMultiuser coding theorems

Quick Check

If xTϵ(n)(X)\mathbf{x} \in \mathcal{T}_\epsilon^{(n)}(X) with PX=(0.7,0.3)P_X = (0.7, 0.3) on {0,1}\{0,1\} and ϵ=0.1\epsilon = 0.1, approximately how many 11's does the sequence x\mathbf{x} of length n=1000n = 1000 contain?

Between 270270 and 330330

Exactly 300300

Between 200200 and 400400

Between 290290 and 310310