Arithmetic Coding

Beyond One Bit Per Symbol

Huffman coding has a fundamental limitation: it assigns an integer number of bits to each symbol. When the entropy is, say, 0.1 bits per symbol, Huffman must still use at least 1 bit per symbol, a 10x overhead. Arithmetic coding sidesteps this by encoding an entire sequence as a single number in $[0, 1)$, so the number of bits per symbol can effectively be fractional. The idea is beautifully simple: represent the sequence by its probability interval, and the code length is at most $-\log P(\mathbf{x})$ plus 2 bits of overhead for the entire sequence.

Definition: Arithmetic Code

An arithmetic code for a source $X$ with distribution $P$ on $\mathcal{X} = \{a_1, \ldots, a_m\}$ represents a sequence $\mathbf{x} = (x_1, \ldots, x_n)$ by a binary fraction in the sub-interval of $[0, 1)$ corresponding to $\mathbf{x}$. The encoding proceeds iteratively:

  1. Start with the interval $[0, 1)$.
  2. For each symbol $x_i$, subdivide the current interval into $m$ sub-intervals proportional to $P(a_1), P(a_2), \ldots, P(a_m)$.
  3. Select the sub-interval corresponding to $x_i$ and continue.
  4. The final interval $[L, L + P(\mathbf{x}))$ has width $P(\mathbf{x}) = \prod_{i=1}^n P(x_i)$.
  5. Output any binary fraction in this interval, using $\lceil -\log P(\mathbf{x}) \rceil + 1$ bits.
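
A minimal Python sketch of this procedure, using floating-point intervals purely for illustration (a real coder uses the fixed-precision integer arithmetic described in the Engineering Note below); the function name and its interface are illustrative, not taken from any particular library:

```python
import math

def arithmetic_encode(sequence, probs):
    """Encode `sequence` (iterable of symbols) given `probs` (dict: symbol -> probability).
    Returns the codeword as a bit string. Floating point is used only for illustration;
    rounding error makes this unsuitable for long sequences."""
    symbols = list(probs)                       # fixed symbol order for the subdivision
    low, width = 0.0, 1.0                       # current interval is [low, low + width)
    for s in sequence:
        # Subdivide [low, low + width) in proportion to the symbol probabilities
        # and keep the sub-interval that corresponds to s.
        for sym in symbols:
            if sym == s:
                width *= probs[sym]
                break
            low += width * probs[sym]
    # Any binary fraction inside [low, low + width) identifies the sequence;
    # ceil(-log2 width) + 1 bits always suffice.
    n_bits = math.ceil(-math.log2(width)) + 1
    code = math.ceil(low * 2 ** n_bits)         # smallest n_bits-bit fraction >= low
    return format(code, f"0{n_bits}b")

print(arithmetic_encode("aba", {"a": 0.7, "b": 0.3}))   # prints '1000'
```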

The genius of arithmetic coding is that it amortizes the rounding loss over the entire sequence. While Huffman loses up to 1 bit per symbol, arithmetic coding loses at most 2 bits per sequence, regardless of sequence length.

Example: Arithmetic Coding of a Binary Sequence

Encode the sequence $\mathbf{x} = (a, b, a)$ using arithmetic coding with $P(a) = 0.7$, $P(b) = 0.3$.
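
One way to carry out the encoding, following the steps in the definition:

  1. Start with $[0, 1)$.
  2. Symbol $a$: select the first sub-interval, $[0, 0.7)$.
  3. Symbol $b$: subdivide $[0, 0.7)$ into $a: [0, 0.49)$ and $b: [0.49, 0.7)$; select $[0.49, 0.7)$.
  4. Symbol $a$: subdivide $[0.49, 0.7)$ into $a: [0.49, 0.637)$ and $b: [0.637, 0.7)$; select $[0.49, 0.637)$.

The final width is $P(\mathbf{x}) = 0.7 \cdot 0.3 \cdot 0.7 = 0.147$, so the code uses $\lceil -\log_2 0.147 \rceil + 1 = 3 + 1 = 4$ bits. The binary fraction $0.1000_2 = 0.5$ lies in $[0.49, 0.637)$, so one valid codeword is $1000$.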

Theorem: Rate of Arithmetic Coding

For a DMS with distribution $P$, the arithmetic code for a length-$n$ sequence uses at most $\lceil -\log P(\mathbf{x}) \rceil + 1$ bits. The expected rate satisfies
$$H(X) \leq \frac{1}{n}\mathbb{E}[\text{code length}] < H(X) + \frac{2}{n}.$$
As $n \to \infty$, the rate approaches $H(X)$.
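
The upper bound follows in one line (the lower bound is the standard converse for uniquely decodable codes): since the code uses $\lceil -\log P(\mathbf{x}) \rceil + 1 < -\log P(\mathbf{x}) + 2$ bits,
$$\mathbb{E}[\text{code length}] < \mathbb{E}[-\log P(X_1, \ldots, X_n)] + 2 = nH(X) + 2,$$
and dividing by $n$ gives the stated bound.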

Arithmetic coding achieves essentially the ideal code length $-\log P(\mathbf{x})$ for each sequence, with only a constant (2-bit) overhead independent of $n$. This is why arithmetic coding has essentially replaced Huffman in modern compression systems: it approaches entropy without block coding, and it adapts naturally to non-stationary and context-dependent sources.

Adaptive Arithmetic Coding

In practice, arithmetic coding is used with adaptive probability models: the distribution $P(x_i \mid x_1, \ldots, x_{i-1})$ is updated after each symbol based on the context. This allows the coder to exploit dependencies in the source without knowing the true distribution in advance. The resulting code length is approximately $-\log P(x_1, \ldots, x_n) = \sum_{i=1}^n [-\log P(x_i \mid x_1, \ldots, x_{i-1})]$, which approaches the entropy rate $H_\infty$ for any stationary ergodic source if the model is sufficiently expressive. Modern compression systems build on this idea: LZMA uses an adaptive binary range coder (an arithmetic-coding variant), CABAC in H.264/HEVC is an adaptive binary arithmetic coder, and newer designs such as Zstandard use the closely related ANS family of entropy coders.
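
As a concrete sketch of such a model, the snippet below implements a simple Laplace (add-one) estimator for a memoryless adaptive coder; the class name and interface are illustrative assumptions, and practical coders condition on much richer contexts:

```python
import math
from collections import Counter

class AdaptiveModel:
    """Laplace (add-one) estimator: P(s) = (count(s) + 1) / (total + |alphabet|).
    Probabilities are re-estimated after every symbol, so an encoder and decoder
    that update in lockstep never need to know the source statistics in advance."""
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = Counter()
        self.total = 0

    def prob(self, symbol):
        return (self.counts[symbol] + 1) / (self.total + len(self.alphabet))

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

# The adaptive code length is the sum of -log2 P(x_i | past), one term per symbol.
model = AdaptiveModel("ab")
bits = 0.0
for s in "aaabaaabaaab":
    bits += -math.log2(model.prob(s))
    model.update(s)
print(f"adaptive code length ~ {bits:.1f} bits for 12 symbols")
```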

Huffman vs. Arithmetic Coding

Property | Huffman | Arithmetic
Optimality | Optimal among symbol-by-symbol prefix codes | Approaches entropy for any blocklength
Overhead | Up to 1 bit/symbol | Up to 2 bits/sequence ($2/n$ per symbol)
Low-entropy sources | Poor (minimum 1 bit/symbol) | Excellent (fractional bits/symbol)
Adaptivity | Requires rebuilding tree | Naturally adaptive with context models
Complexity | $O(m \log m)$ construction | $O(n)$ encoding with finite precision
Patent status | Unencumbered | Historical patents (now expired)
🔧 Engineering Note

Finite-Precision Arithmetic Coding

The theoretical description of arithmetic coding uses arbitrary-precision real numbers, which is impractical. Real implementations use fixed-precision integers (typically 32 or 64 bits) and maintain the interval using integer arithmetic. The key technique is renormalization: when the interval becomes narrow enough that the leading bits are determined, those bits are emitted and the interval is rescaled. This allows streaming encoding and decoding with constant memory, regardless of sequence length. The finite-precision version adds at most 1 extra bit to the ideal code length.
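
A minimal sketch of the renormalization step, assuming a 32-bit integer interval [low, high]; it emits whichever leading bits are already determined and rescales, and it deliberately omits the "straddle" (underflow) case that a complete coder must also handle:

```python
PRECISION = 32
HALF = 1 << (PRECISION - 1)          # the fixed-point representation of 1/2

def renormalize(low, high, out_bits):
    """Emit leading bits that are already determined and rescale [low, high].
    The underflow case (interval straddling 1/2) is omitted for brevity."""
    while True:
        if high < HALF:              # interval entirely in [0, 1/2): next bit is 0
            out_bits.append(0)
        elif low >= HALF:            # interval entirely in [1/2, 1): next bit is 1
            out_bits.append(1)
            low -= HALF
            high -= HALF
        else:
            break                    # straddles 1/2: stop (a full coder would
                                     # track pending bits here instead)
        low = low << 1               # double the half we kept back to full scale
        high = (high << 1) | 1
    return low, high
```

In a streaming encoder, a routine like this runs after every symbol's interval update, so the interval never collapses below the precision of the integer arithmetic and the output bits are produced incrementally rather than all at the end.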

Quick Check

For a source with $H(X) = 0.01$ bits/symbol (very low entropy), encoding $n = 1000$ symbols, approximately how many bits does arithmetic coding use?

1000 bits (at least 1 bit per symbol)

12 bits

100 bits

0.01 bits

Why This Matters: Arithmetic Coding in Wireless Video

Context-Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding engine in H.264/AVC, HEVC, and VVC, the standards used for video transmission over wireless channels. CABAC adapts its probability models to the local context of transform coefficients, achieving 5-15% better compression than Huffman-based alternatives (CAVLC). In 5G NR, HEVC with CABAC is the baseline for video services. The arithmetic coding framework from this section is exactly what CABAC implements, with context-dependent probability updates. See Book telecom, Ch. 11 for information-theoretic foundations of source coding in communications.

Key Takeaway

Arithmetic coding encodes an entire sequence as a single number in $[0, 1)$, using approximately $-\log P(\mathbf{x})$ bits, the information-theoretic ideal. The per-symbol overhead is $2/n$, vanishing with blocklength. Unlike Huffman, arithmetic coding handles low-entropy sources efficiently and adapts naturally to context-dependent models. It underlies most modern practical compression systems.