Arithmetic Coding

Beyond One Bit Per Symbol

Huffman coding has a fundamental limitation: it assigns an integer number of bits to each symbol. When the entropy is, say, 0.1 bits per symbol, Huffman must still use at least 1 bit per symbol, a 10x overhead. Arithmetic coding sidesteps this by encoding an entire sequence as a single number in $[0, 1)$, so the number of bits per symbol can effectively be fractional. The idea is beautifully simple: represent the sequence by its probability interval, and the code length is at most $-\log P(\mathbf{x})$ plus 2 bits of overhead for the entire sequence.

Definition: Arithmetic Code

An arithmetic code for a source $X$ with distribution $P$ on $\mathcal{X} = \{a_1, \ldots, a_m\}$ represents a sequence $\mathbf{x} = (x_1, \ldots, x_n)$ by a binary fraction in the sub-interval of $[0, 1)$ corresponding to $\mathbf{x}$. The encoding proceeds iteratively:

  1. Start with the interval $[0, 1)$.
  2. For each symbol $x_i$, subdivide the current interval into $m$ sub-intervals proportional to $P(a_1), P(a_2), \ldots, P(a_m)$.
  3. Select the sub-interval corresponding to $x_i$ and continue.
  4. The final interval $[L, L + P(\mathbf{x}))$ has width $P(\mathbf{x}) = \prod_{i=1}^n P(x_i)$.
  5. Output any binary fraction in this interval, using $\lceil -\log P(\mathbf{x}) \rceil + 1$ bits.
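
A minimal Python sketch of this procedure, using floating-point intervals purely for illustration (a real coder uses the fixed-precision integer arithmetic described in the Engineering Note below); the function name and its interface are illustrative, not taken from any particular library:

```python
import math

def arithmetic_encode(sequence, probs):
    """Encode `sequence` (iterable of symbols) given `probs` (dict: symbol -> probability).
    Returns the codeword as a bit string. Floating point is used only for illustration;
    rounding error makes this unsuitable for long sequences."""
    symbols = list(probs)                       # fixed symbol order for the subdivision
    low, width = 0.0, 1.0                       # current interval is [low, low + width)
    for s in sequence:
        # Subdivide [low, low + width) in proportion to the symbol probabilities
        # and keep the sub-interval that corresponds to s.
        for sym in symbols:
            if sym == s:
                width *= probs[sym]
                break
            low += width * probs[sym]
    # Any binary fraction inside [low, low + width) identifies the sequence;
    # ceil(-log2 width) + 1 bits always suffice.
    n_bits = math.ceil(-math.log2(width)) + 1
    code = math.ceil(low * 2 ** n_bits)         # smallest n_bits-bit fraction >= low
    return format(code, f"0{n_bits}b")

print(arithmetic_encode("aba", {"a": 0.7, "b": 0.3}))   # prints '1000'
```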

The genius of arithmetic coding is that it amortizes the rounding loss over the entire sequence. While Huffman loses up to 1 bit per symbol, arithmetic coding loses at most 2 bits per sequence, regardless of sequence length.

Example: Arithmetic Coding of a Binary Sequence

Encode the sequence $\mathbf{x} = (a, b, a)$ using arithmetic coding with $P(a) = 0.7$, $P(b) = 0.3$.
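
One way to carry out the encoding, following the steps in the definition:

  1. Start with $[0, 1)$.
  2. Symbol $a$: select the first sub-interval, $[0, 0.7)$.
  3. Symbol $b$: subdivide $[0, 0.7)$ into $a: [0, 0.49)$ and $b: [0.49, 0.7)$; select $[0.49, 0.7)$.
  4. Symbol $a$: subdivide $[0.49, 0.7)$ into $a: [0.49, 0.637)$ and $b: [0.637, 0.7)$; select $[0.49, 0.637)$.

The final width is $P(\mathbf{x}) = 0.7 \cdot 0.3 \cdot 0.7 = 0.147$, so the code uses $\lceil -\log_2 0.147 \rceil + 1 = 3 + 1 = 4$ bits. The binary fraction $0.1000_2 = 0.5$ lies in $[0.49, 0.637)$, so one valid codeword is $1000$.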

Theorem: Rate of Arithmetic Coding

For a DMS with distribution $P$, the arithmetic code for a length-$n$ sequence uses at most $\lceil -\log P(\mathbf{x}) \rceil + 1$ bits. The expected rate satisfies
$$H(X) \leq \frac{1}{n}\mathbb{E}[\text{code length}] < H(X) + \frac{2}{n}.$$
As $n \to \infty$, the rate approaches $H(X)$.
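
The upper bound follows in one line (the lower bound is the standard converse for uniquely decodable codes): since the code uses $\lceil -\log P(\mathbf{x}) \rceil + 1 < -\log P(\mathbf{x}) + 2$ bits,
$$\mathbb{E}[\text{code length}] < \mathbb{E}[-\log P(X_1, \ldots, X_n)] + 2 = nH(X) + 2,$$
and dividing by $n$ gives the stated bound.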

Arithmetic coding achieves essentially the ideal code length $-\log P(\mathbf{x})$ for each sequence, with only a constant (2-bit) overhead independent of $n$. This is why arithmetic coding has essentially replaced Huffman in modern compression systems: it approaches entropy without block coding, and it adapts naturally to non-stationary and context-dependent sources.

Adaptive Arithmetic Coding

In practice, arithmetic coding is used with adaptive probability models: the distribution $P(x_i \mid x_1, \ldots, x_{i-1})$ is updated after each symbol based on the context. This allows the coder to exploit dependencies in the source without knowing the true distribution in advance. The resulting code length is approximately $-\log P(x_1, \ldots, x_n) = \sum_{i=1}^n [-\log P(x_i \mid x_1, \ldots, x_{i-1})]$, which approaches the entropy rate $H_\infty$ for any stationary ergodic source if the model is sufficiently expressive. Modern compression systems build on this idea: LZMA uses an adaptive binary range coder (an arithmetic-coding variant), CABAC in H.264/HEVC is an adaptive binary arithmetic coder, and newer designs such as Zstandard use the closely related ANS family of entropy coders.
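
As a concrete sketch of such a model, the snippet below implements a simple Laplace (add-one) estimator for a memoryless adaptive coder; the class name and interface are illustrative assumptions, and practical coders condition on much richer contexts:

```python
import math
from collections import Counter

class AdaptiveModel:
    """Laplace (add-one) estimator: P(s) = (count(s) + 1) / (total + |alphabet|).
    Probabilities are re-estimated after every symbol, so an encoder and decoder
    that update in lockstep never need to know the source statistics in advance."""
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = Counter()
        self.total = 0

    def prob(self, symbol):
        return (self.counts[symbol] + 1) / (self.total + len(self.alphabet))

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

# The adaptive code length is the sum of -log2 P(x_i | past), one term per symbol.
model = AdaptiveModel("ab")
bits = 0.0
for s in "aaabaaabaaab":
    bits += -math.log2(model.prob(s))
    model.update(s)
print(f"adaptive code length ~ {bits:.1f} bits for 12 symbols")
```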

Huffman vs. Arithmetic Coding

Property | Huffman | Arithmetic
Optimality | Optimal among symbol-by-symbol prefix codes | Approaches entropy for any blocklength
Overhead | Up to 1 bit/symbol | Up to 2 bits/sequence ($2/n$ per symbol)
Low-entropy sources | Poor (minimum 1 bit/symbol) | Excellent (fractional bits/symbol)
Adaptivity | Requires rebuilding tree | Naturally adaptive with context models
Complexity | $O(m \log m)$ construction | $O(n)$ encoding with finite precision
Patent status | Unencumbered | Historical patents (now expired)
🔧 Engineering Note

Finite-Precision Arithmetic Coding

The theoretical description of arithmetic coding uses arbitrary-precision real numbers, which is impractical. Real implementations use fixed-precision integers (typically 32 or 64 bits) and maintain the interval using integer arithmetic. The key technique is renormalization: when the interval becomes narrow enough that the leading bits are determined, those bits are emitted and the interval is rescaled. This allows streaming encoding and decoding with constant memory, regardless of sequence length. The finite-precision version adds at most 1 extra bit to the ideal code length.
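
A minimal sketch of the renormalization step, assuming a 32-bit integer interval [low, high]; it emits whichever leading bits are already determined and rescales, and it deliberately omits the "straddle" (underflow) case that a complete coder must also handle:

```python
PRECISION = 32
HALF = 1 << (PRECISION - 1)          # the fixed-point representation of 1/2

def renormalize(low, high, out_bits):
    """Emit leading bits that are already determined and rescale [low, high].
    The underflow case (interval straddling 1/2) is omitted for brevity."""
    while True:
        if high < HALF:              # interval entirely in [0, 1/2): next bit is 0
            out_bits.append(0)
        elif low >= HALF:            # interval entirely in [1/2, 1): next bit is 1
            out_bits.append(1)
            low -= HALF
            high -= HALF
        else:
            break                    # straddles 1/2: stop (a full coder would
                                     # track pending bits here instead)
        low = low << 1               # double the half we kept back to full scale
        high = (high << 1) | 1
    return low, high
```

In a streaming encoder, a routine like this runs after every symbol's interval update, so the interval never collapses below the precision of the integer arithmetic and the output bits are produced incrementally rather than all at the end.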

Quick Check

For a source with $H(X) = 0.01$ bits/symbol (very low entropy), encoding $n = 1000$ symbols, approximately how many bits does arithmetic coding use?

1000 bits (at least 1 bit per symbol)

12 bits

100 bits

0.01 bits

Why This Matters: Arithmetic Coding in Wireless Video

Context-Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding engine in H.264/AVC, HEVC, and VVC, the standards used for video transmission over wireless channels. CABAC adapts its probability models to the local context of transform coefficients, achieving 5-15% better compression than Huffman-based alternatives (CAVLC). In 5G NR, HEVC with CABAC is the baseline for video services. The arithmetic coding framework from this section is exactly what CABAC implements, with context-dependent probability updates. See Book telecom, Ch. 11 for information-theoretic foundations of source coding in communications.

Key Takeaway

Arithmetic coding encodes an entire sequence as a single number in $[0, 1)$, using approximately $-\log P(\mathbf{x})$ bits, the information-theoretic ideal. The per-symbol overhead is $2/n$, vanishing with blocklength. Unlike Huffman, arithmetic coding handles low-entropy sources efficiently and adapts naturally to context-dependent models. It underlies most modern practical compression systems.