Ferkans — Interactive Telecom Tutor

The Polarization Phenomenon

Polar codes, introduced by Erdal Arıkan in 2009, were the first family of codes with an explicit construction that provably achieves the capacity of any binary-input symmetric memoryless channel, with encoding and decoding complexity $O(N \log N)$.

The idea is beautifully simple: apply a specific linear transformation to $N$ copies of a channel $W$ . The resulting $N$ "virtual" channels polarize — as $N$ grows, each virtual channel becomes either nearly perfect (capacity $\approx 1$ ) or nearly useless (capacity $\approx 0$ ). The fraction of good channels approaches the capacity $I(W)$ of the original channel. Send information on the good channels, send known (frozen) bits on the bad ones.

Definition:
The Polar Transform

Start with $N = 2^m$ independent copies of a binary-input DMC $W: \{0,1\} \to \mathcal{Y}$ with capacity $I(W)$ .

The polar transform applies the $N \times N$ matrix $\mathbf{G}_N = \mathbf{B}_N \mathbf{F}^{\otimes m}$ , where $\mathbf{F} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ is the basic $2 \times 2$ kernel, $\mathbf{F}^{\otimes m}$ is the $m$ -fold Kronecker product, and $\mathbf{B}_N$ is a bit-reversal permutation.

The input bits $U_1, \ldots, U_N$ are mapped to channel inputs $X^N = U^N \mathbf{G}_N$, with all arithmetic modulo 2. For a Gaussian channel, the resulting coded bits are then mapped to real-valued symbols (for example by BPSK) before transmission.
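As a minimal sketch of the definition above, the following pure-Python snippet builds $\mathbf{G}_N = \mathbf{B}_N \mathbf{F}^{\otimes m}$ explicitly and multiplies by it modulo 2. The function names (`kron`, `bit_reverse`, `polar_generator`, `encode`) are illustrative choices, not part of the original text, and indices are 0-based in the code while the text counts from 1.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def bit_reverse(i, m):
    """Reverse the m-bit binary representation of the integer i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_generator(m):
    """Return G_N = B_N F^{(x)m} for N = 2**m as a list of 0/1 rows."""
    F = [[1, 0], [1, 1]]
    Fm = [[1]]
    for _ in range(m):
        Fm = kron(Fm, F)
    # Left-multiplying by the bit-reversal permutation reorders the rows:
    # row i of G_N is row bit_reverse(i) of F^{(x)m}.
    return [Fm[bit_reverse(i, m)] for i in range(1 << m)]


def encode(u, G):
    """Compute x = u G with arithmetic modulo 2."""
    n = len(u)
    return [sum(u[i] * G[i][j] for i in range(n)) % 2 for j in range(n)]


if __name__ == "__main__":
    G4 = polar_generator(2)
    for row in G4:
        print(row)
    print(encode([0, 0, 0, 1], G4))
```

This dense construction costs $O(N^2)$ and is meant only to make the matrix visible for small $N$; practical encoders exploit the Kronecker structure instead.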

Definition:
Bit-Channels (Virtual Channels)

After the polar transform, define the $i$ -th bit-channel as

$W_N^{(i)}: U_i \to (Y^N, U^{i-1}),$

i.e., the channel from $U_i$ to the outputs $Y^N$ and all previous inputs $U_1, \ldots, U_{i-1}$ (which are assumed known at the decoder for successive cancellation).

The symmetric capacity of the $i$ -th bit-channel is $I(W_N^{(i)}) = I(U_i; Y^N, U^{i-1})$ .

Theorem: Channel Polarization Theorem

For any binary-input symmetric DMC $W$ with capacity $I(W)$ , the bit-channels $\{W_N^{(i)}\}_{i=1}^N$ polarize as $N \to \infty$ : for any $\delta > 0$ ,

$\frac{1}{N}\left|\left\{i : I(W_N^{(i)}) \in (1-\delta, 1]\right\}\right| \to I(W),$
$\frac{1}{N}\left|\left\{i : I(W_N^{(i)}) \in [0, \delta)\right\}\right| \to 1 - I(W).$

That is, the fraction of "good" bit-channels approaches the capacity $I(W)$ , and the fraction of "bad" bit-channels approaches $1 - I(W)$ .
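The theorem can be checked numerically on the binary erasure channel, where the bit-channels are again erasure channels and the recursion $\epsilon^- = 2\epsilon - \epsilon^2$, $\epsilon^+ = \epsilon^2$ is exact. The sketch below is an illustration under that BEC assumption only; the helper names and the choice of $\delta$ are hypothetical.

```python
def bec_bit_channel_capacities(eps, m):
    """Capacities of the N = 2**m bit-channels of a BEC(eps), in index order."""
    z = [eps]
    for _ in range(m):
        # Each channel splits into a worse one (first) and a better one (second).
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return [1 - e for e in z]


def polarization_fractions(eps, m, delta):
    """Fractions of bit-channels with capacity above 1 - delta and below delta."""
    caps = bec_bit_channel_capacities(eps, m)
    n = len(caps)
    good = sum(c > 1 - delta for c in caps) / n
    bad = sum(c < delta for c in caps) / n
    return good, bad


if __name__ == "__main__":
    for m in (4, 8, 12, 16):
        good, bad = polarization_fractions(0.5, m, delta=0.01)
        print(f"N = 2^{m}: good = {good:.3f}, bad = {bad:.3f}")
```

For a BEC(0.5) both fractions should drift toward $I(W) = 0.5$ as $m$ increases, while the average capacity stays at $I(W)$ for every $m$.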

The polar transform is a recursive process. At each level, two copies of a channel $W$ are combined into a "better" channel $W^+$ and a "worse" channel $W^-$ . The total capacity is preserved: $I(W^+) + I(W^-) = 2I(W)$ . But the capacity splits unevenly, and repeated application drives all channels to the extremes. This is the polarization phenomenon — and it is the information-theoretic equivalent of distillation.

Proof

The basic polarization step

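Given two independent copies of $W$, let $U_1, U_2$ be independent uniform bits, set $X_1 = U_1 \oplus U_2$ and $X_2 = U_2$ (that is, $(X_1, X_2) = (U_1, U_2)\mathbf{F}$), send $X_1, X_2$ over the two copies to obtain $Y_1, Y_2$, and define: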

$W^-: U_1 \to (Y_1, Y_2)$ (the "worse" channel)
$W^+: U_2 \to (Y_1, Y_2, U_1)$ (the "better" channel)

By the chain rule: $I(W^-) + I(W^+) = I(U_1, U_2; Y_1, Y_2) = 2I(W)$ .

Moreover, $I(W^-) \leq I(W) \leq I(W^+)$ (data processing inequality for $W^-$ , side information for $W^+$ ).
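The two relations above can be verified by brute force for a concrete channel. The sketch below assumes a binary symmetric channel with crossover probability `p` (a choice made here for illustration, not taken from the text), enumerates the joint distribution of $(U_1, U_2, Y_1, Y_2)$ with $X_1 = U_1 \oplus U_2$, $X_2 = U_2$, and computes both mutual informations directly.

```python
from itertools import product
from math import log2


def mutual_information(joint):
    """I(A;B) in bits from a dict mapping (a, b) to its probability."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def bsc_capacity(p):
    """Capacity 1 - h(p) of a BSC with crossover probability 0 < p < 1."""
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)


def split_bsc(p):
    """Return (I(W^-), I(W^+)) for two copies of a BSC(p)."""
    def w(y, x):
        return 1 - p if y == x else p

    minus, plus = {}, {}
    for u1, u2, y1, y2 in product((0, 1), repeat=4):
        prob = 0.25 * w(y1, u1 ^ u2) * w(y2, u2)
        key_minus = (u1, (y1, y2))      # W^-: U1 -> (Y1, Y2)
        key_plus = (u2, (y1, y2, u1))   # W^+: U2 -> (Y1, Y2, U1)
        minus[key_minus] = minus.get(key_minus, 0.0) + prob
        plus[key_plus] = plus.get(key_plus, 0.0) + prob
    return mutual_information(minus), mutual_information(plus)


if __name__ == "__main__":
    p = 0.11
    i_minus, i_plus = split_bsc(p)
    print(i_minus, bsc_capacity(p), i_plus)
    print(i_minus + i_plus, 2 * bsc_capacity(p))
```

The printed sum should match $2I(W)$ up to floating-point error, with $I(W)$ lying strictly between the two split capacities.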

Recursive application

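Apply the same splitting recursively: $W_{2N}^{(2i-1)} = (W_N^{(i)})^-$ and $W_{2N}^{(2i)} = (W_N^{(i)})^+$. After $m$ levels, we have $N = 2^m$ bit-channels. Now follow a uniformly random path through this recursion, choosing $-$ or $+$ with a fair coin at each level. The capacities $I_0 = I(W), I_1, I_2, \ldots$ seen along the path form a bounded martingale: they lie in $[0, 1]$, and the two children of any channel have average capacity equal to that of the parent.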

Convergence

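By the martingale convergence theorem, the capacity seen along a uniformly random path through the recursion converges almost surely, so the change from one level to the next must vanish. In outline, the inequalities $I(W^-) \leq I(W) \leq I(W^+)$ are strict unless $I(W) \in \{0, 1\}$, so the limit can only take the values 0 and 1. Because a martingale preserves its mean, the limit equals 1 with probability $I(W)$ and 0 with probability $1 - I(W)$. Equivalently, a fraction $I(W)$ of the bit-channels tends to capacity 1 and a fraction $1 - I(W)$ tends to capacity 0.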

Polar code

A capacity-achieving code based on channel polarization. Information bits are sent on the "good" virtual channels (high mutual information), while "frozen" known bits are sent on the bad channels. Encoding is $O(N \log N)$ via the Kronecker structure; decoding is $O(N \log N)$ via successive cancellation.

Related: Turbo code, LDPC code
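The $O(N \log N)$ encoding cost mentioned in this entry comes from replacing the matrix product by $\log_2 N$ butterfly stages. A minimal sketch, assuming the convention $\mathbf{G}_N = \mathbf{B}_N \mathbf{F}^{\otimes m}$ with the kernel $\mathbf{F}$ having rows $(1, 0)$ and $(1, 1)$; the names `polar_encode` and `bit_reverse` are illustrative, and indices are 0-based.

```python
def bit_reverse(i, m):
    """Reverse the m-bit binary representation of the integer i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_encode(u):
    """Compute x = u B_N F^{(x)m} over GF(2) with N/2 XORs per stage."""
    n = len(u)
    m = n.bit_length() - 1
    if n == 0 or (1 << m) != n:
        raise ValueError("block length must be a power of two")
    # Apply the bit-reversal permutation to the input vector first.
    x = [u[bit_reverse(j, m)] for j in range(n)]
    # Then apply F^{(x)m} as m butterfly stages with strides 1, 2, 4, ...
    stride = 1
    while stride < n:
        for start in range(0, n, 2 * stride):
            for j in range(start, start + stride):
                x[j] ^= x[j + stride]
        stride *= 2
    return x


if __name__ == "__main__":
    print(polar_encode([0, 0, 0, 1, 0, 0, 0, 0]))
```

Each stage touches every position once, so the total work is $N \log_2 N$ elementary operations up to a constant factor.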

Frozen bits

In a polar code, the bits assigned to unreliable (low-capacity) bit-channels. These are set to predetermined values (usually 0) known to both encoder and decoder.

Related: Polar code

Successive Cancellation (SC) Decoding

Input: Received $\mathbf{y}$, frozen set $\mathcal{F}$, frozen values

Output: Decoded bits $\hat{U}_1, \ldots, \hat{U}_N$

1. for $i = 1$ to $N$:
   a. Compute $L_i = \log \frac{P(Y^N, \hat{U}^{i-1} | U_i = 0)}{P(Y^N, \hat{U}^{i-1} | U_i = 1)}$ (LLR)
   b. if $i \in \mathcal{F}$: set $\hat{U}_i = 0$ (frozen bit)
   c. else: set $\hat{U}_i = \begin{cases} 0 & \text{if } L_i \geq 0 \\ 1 & \text{otherwise} \end{cases}$
2. return $\hat{U}_1, \ldots, \hat{U}_N$

Complexity: $O(N \log N)$ using the recursive structure of the polar transform. The LLR computation at step 1a decomposes into a butterfly network of $\log_2 N$ stages.
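The algorithm above can be sketched recursively. This is an illustrative implementation under stated assumptions, not the tutorial's own code: all frozen values are 0, the frozen set uses the 1-based indices of the text, the LLR combination for the worse channel uses the min-sum approximation rather than the exact rule, and the helper names (`f`, `g`, `_sc`, `sc_decode`) are hypothetical. It relies on the identity $\mathbf{B}_N \mathbf{F}^{\otimes m} = \mathbf{F}^{\otimes m} \mathbf{B}_N$, so the bit-reversal is undone on the channel LLRs before decoding.

```python
def bit_reverse(i, m):
    """Reverse the m-bit binary representation of the integer i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def f(a, b):
    """Min-sum approximation of the LLR of the XOR of two bits (W^- update)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, u):
    """LLR of the second bit once the XOR partner u is known (W^+ update)."""
    return b + (1 - 2 * u) * a


def _sc(llr, frozen, offset, u_hat):
    """Decode u[offset : offset + len(llr)] and return its re-encoded bits."""
    n = len(llr)
    if n == 1:
        # Frozen positions are forced to 0; otherwise threshold the LLR.
        bit = 0 if (offset + 1) in frozen or llr[0] >= 0 else 1
        u_hat[offset] = bit
        return [bit]
    h = n // 2
    left = _sc([f(llr[j], llr[j + h]) for j in range(h)],
               frozen, offset, u_hat)
    right = _sc([g(llr[j], llr[j + h], left[j]) for j in range(h)],
                frozen, offset + h, u_hat)
    return [l ^ r for l, r in zip(left, right)] + right


def sc_decode(channel_llr, frozen):
    """Successive cancellation decoding for x = u B_N F^{(x)m}."""
    n = len(channel_llr)
    m = n.bit_length() - 1
    # Undo the bit-reversal on the received positions.
    llr = [channel_llr[bit_reverse(j, m)] for j in range(n)]
    u_hat = [0] * n
    _sc(llr, frozen, 0, u_hat)
    return u_hat


if __name__ == "__main__":
    # Codeword of u = (0,0,0,1,0,0,0,0) sent without noise: LLR > 0 means bit 0.
    llrs = [-10.0, 10.0, -10.0, 10.0, -10.0, 10.0, -10.0, 10.0]
    print(sc_decode(llrs, frozen={1, 2, 3, 5}))
```

The recursion decodes the first half of $U^N$ before the second, which reproduces the order $i = 1, \ldots, N$ of the loop above while sharing intermediate LLRs between steps.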

SC decoding is sequential (bit-by-bit), which limits throughput. CRC-aided SC list (CA-SCL) decoding maintains $L$ candidate paths in parallel and selects the one passing a CRC check, achieving near-ML performance at the cost of $O(LN\log N)$ complexity.

Channel Polarization Tree

Watch the recursive polarization of a BEC($\epsilon = 0.5$): at each level, each channel splits into a worse ($\epsilon^- = 2\epsilon - \epsilon^2$) and a better ($\epsilon^+ = \epsilon^2$) channel. After 4 levels ($N = 16$), the erasure probabilities have clearly polarized toward 0 (good) and 1 (bad).

Channel Polarization Visualization

Observe how the bit-channel capacities $I(W_N^{(i)})$ evolve as $N$ grows. For a BEC with erasure probability $\epsilon$ , the capacities split recursively toward 0 and 1. Adjust $N$ and the channel quality to see the polarization effect.

Parameters: $\log_2 N = 10$, erasure probability $\epsilon = 0.5$ (BEC)

Example: Polar Code for the BEC

Design a polar code of length $N = 8$ for the BEC with erasure probability $\epsilon = 0.5$ . Which bits are frozen?

Solution

Compute bit-channel erasure probabilities

For the BEC, the polarization recursion is exact: $\epsilon^- = 2\epsilon - \epsilon^2$ (worse channel), $\epsilon^+ = \epsilon^2$ (better channel).

Starting from $\epsilon = 0.5$: Level 1: $\epsilon^- = 0.75$, $\epsilon^+ = 0.25$. Level 2: $(0.75)^- = 0.9375$, $(0.75)^+ = 0.5625$, $(0.25)^- = 0.4375$, $(0.25)^+ = 0.0625$. Level 3 (8 channels, bits 1 to 8 in order): $0.996, 0.879, 0.809, 0.316, 0.684, 0.191, 0.121, 0.004$.

Select frozen and information bits

The channel capacity is $I(W) = 1 - 0.5 = 0.5$, so we should have $N \times 0.5 = 4$ information bits.

Rank by reliability (lowest erasure probability): Bit 8: $0.004$, Bit 7: $0.121$, Bit 6: $0.191$, Bit 4: $0.316$.

Frozen set: $\mathcal{F} = \{1, 2, 3, 5\}$ (worst 4 channels). Information set: $\{4, 6, 7, 8\}$ (best 4 channels).
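The design procedure of this example is short enough to script. The sketch below applies the exact BEC recursion and then picks the most reliable positions; the function names are illustrative, and bit positions are reported 1-based to match the text.

```python
def bec_erasure_probs(eps, m):
    """Erasure probabilities of the N = 2**m bit-channels, bits 1..N in order."""
    z = [eps]
    for _ in range(m):
        # Worse channel first (2e - e^2), better channel second (e^2).
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z


def design_polar_code(eps, m, k):
    """Return (information set, frozen set, erasure probabilities)."""
    z = bec_erasure_probs(eps, m)
    # Sort 1-based positions from most reliable to least reliable.
    order = sorted(range(1, len(z) + 1), key=lambda i: z[i - 1])
    return sorted(order[:k]), sorted(order[k:]), z


if __name__ == "__main__":
    info, frozen, z = design_polar_code(0.5, 3, 4)
    print("erasure probabilities:", [round(v, 3) for v in z])
    print("information set:", info)
    print("frozen set:", frozen)
```

Running it for $\epsilon = 0.5$, $m = 3$, $k = 4$ yields the frozen set $\{1, 2, 3, 5\}$ and the information set $\{4, 6, 7, 8\}$.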

Why This Matters: Polar Codes in 5G NR

Polar codes were adopted for the 5G NR control channels (PDCCH, PUCCH, PBCH) in 2016, making them the first provably capacity-achieving code family deployed in a commercial standard. The standard specifies only the encoder; receivers typically use CRC-aided SC list (CA-SCL) decoding, with list size $L = 8$ as a common design point, which provides near-ML performance at short to moderate block lengths (up to 1024 bits).

The choice of polar codes for control channels reflects their excellent performance at short block lengths and low rates — exactly the regime where control information operates. For long blocks (data), LDPC codes remain preferred due to their higher throughput and easier parallelization.

See Book telecom, Ch. 24 for the full treatment of 5G NR channel coding.

Historical Note: Arıkan's Breakthrough

Erdal Arıkan published the polar coding paper in 2009, and it was immediately recognized as a breakthrough: the first explicit, provably capacity-achieving construction with polynomial complexity. The elegance of the proof (martingale convergence!) and the simplicity of the construction (just a Kronecker product of a $2 \times 2$ matrix) were remarkable.

Arıkan received the IEEE Richard W. Hamming Medal in 2018 and the Claude E. Shannon Award in 2019 for this work. The rapid adoption of polar codes in 5G NR (roughly a decade from publication to commercial deployment) is exceptional in the history of coding theory.

Quick Check

In a polar code of length $N = 16$ for a channel with capacity $I(W) = 0.75$ , approximately how many information bits are transmitted?

4

8

12

16

Correct answer: 12

The number of information bits is approximately $N \times I(W) = 16 \times 0.75 = 12$. The remaining 4 bit-channels are frozen (set to known values). As $N$ grows, the fraction of information bits can approach $I(W)$.
