References & Further Reading
References
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 2nd ed., 2006
Chapters 5 (data compression) and 13 (universal source coding) cover the material in this chapter. Our notation follows Cover & Thomas closely.
- D. A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, 1952
The original paper describing the Huffman algorithm. Written as a term paper for Robert Fano's information theory course at MIT: Fano had posed the problem of finding an optimal code, and Huffman solved it as a student.
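The construction itself is short enough to sketch. A minimal Python illustration (our own code, not Huffman's notation): keep the subtrees in a min-heap and repeatedly merge the two least frequent, then read codewords off the final tree.

```python
import heapq

def huffman_code(freqs):
    """Build a binary Huffman code for {symbol: frequency}.
    Returns {symbol: bitstring}."""
    # Heap entries are (weight, tiebreak, tree); a tree is either a symbol
    # or a (left, right) pair. The tiebreak keeps tuple comparisons away
    # from the trees themselves.
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-symbol source
        return {heap[0][2]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:                     # merge the two least frequent
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
        tiebreak += 1
    code = {}
    def walk(tree, prefix):                  # assign 0/1 along each branch
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix
    walk(heap[0][2], "")
    return code
```

For frequencies {'a': 5, 'b': 2, 'c': 1, 'd': 1} this yields codeword lengths 1, 2, 3, 3; the resulting lengths always satisfy the Kraft inequality with equality.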
- J. Rissanen, Generalized Kraft Inequality and Arithmetic Coding, IBM Journal of Research and Development, 1976
The foundational paper on arithmetic coding as a practical compression method. Builds on earlier, unpublished ideas of Elias and on Shannon's work.
- J. Ziv and A. Lempel, A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, 1977
The LZ77 (sliding-window) algorithm. The basis for gzip, DEFLATE, and most practical compression tools in use today.
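To make the sliding-window idea concrete, here is a toy parser (our own illustration; the names and fixed window size are ours, and real implementations emit entropy-coded bits, not Python tuples). Each step finds the longest match within the recent window and emits an (offset, length, next-char) triple:

```python
def lz77_parse(data, window=16):
    """Greedy LZ77 parse of a string into (offset, length, next_char)
    triples; offset counts backwards from the current position."""
    i, triples = 0, []
    while i < len(data):
        best_off = best_len = 0
        for j in range(max(0, i - window), i):   # candidate match starts
            l = 0
            # matches may run past i (self-referential overlap is allowed)
            while i + l < len(data) - 1 and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = i - j, l
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples

def lz77_unparse(triples):
    """Invert lz77_parse by copying from the already-decoded output."""
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])     # byte-wise copy handles overlap
        out.append(nxt)
    return "".join(out)
```

On "abababab" the parse is [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]: the third triple copies five characters starting two positions back, overlapping itself as it goes.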
- J. Ziv and A. Lempel, Compression of Individual Sequences via Variable-Rate Coding, IEEE Transactions on Information Theory, 1978
The LZ78 (dictionary-based) algorithm. Proves universality for individual sequences under the finite-state compressibility framework.
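The dictionary scheme is also easy to sketch (again our own toy code, working on strings rather than bits). Each output pair names the longest previously seen phrase plus one new character, and that concatenation becomes a new dictionary entry:

```python
def lz78_parse(data):
    """LZ78 parse: emit (index, char) pairs, where index refers to the
    longest dictionary phrase matching a prefix of the remaining input."""
    dictionary = {"": 0}              # phrase -> index; 0 is the empty phrase
    pairs, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the current match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                        # flush a trailing complete match
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs

def lz78_unparse(pairs):
    """Rebuild the string by growing the same phrase list in step."""
    phrases, out = [""], []
    for idx, ch in pairs:
        phrases.append(phrases[idx] + ch)
        out.append(phrases[-1])
    return "".join(out)
```

On "ababab" this produces [(0, 'a'), (0, 'b'), (1, 'b'), (1, 'b')]: phrase 1 is "a", so the last two pairs each decode to "ab".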
- F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, The Context-Tree Weighting Method: Basic Properties, IEEE Transactions on Information Theory, 1995
Introduces context tree weighting, which achieves near-optimal redundancy for tree sources and converges faster than the Lempel-Ziv methods on Markov-like sources.
Further Reading
For readers who want to go deeper into source coding theory and practice.
Arithmetic coding implementation details
I. H. Witten, R. M. Neal, and J. G. Cleary, 'Arithmetic Coding for Data Compression,' Communications of the ACM, vol. 30, no. 6, pp. 520-540, Jun. 1987
The definitive practical guide to implementing arithmetic coding with finite precision. Explains the renormalization trick that makes the algorithm work in constant memory.
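The gist of the trick can be sketched as follows (a hedged illustration; the constants and function name are ours, not the paper's). Once the coder's interval lies entirely in one half of the range, its leading output bit is settled, so the bit can be emitted and the interval doubled, keeping low and high in fixed-width integers; intervals straddling the midpoint defer "pending" bits until the ambiguity resolves:

```python
PRECISION = 16                    # illustrative register width
FULL = 1 << PRECISION             # coder range is [0, FULL)
HALF, QUARTER = FULL >> 1, FULL >> 2

def renormalize(low, high, bits, pending=0):
    """Expand [low, high) until it spans more than QUARTER, appending
    decided output bits to `bits`; `pending` counts deferred bits from
    intervals that straddled the midpoint."""
    while True:
        if high <= HALF:                       # lower half: leading bit is 0
            bits.append(0); bits.extend([1] * pending); pending = 0
        elif low >= HALF:                      # upper half: leading bit is 1
            bits.append(1); bits.extend([0] * pending); pending = 0
            low -= HALF; high -= HALF
        elif low >= QUARTER and high <= 3 * QUARTER:
            pending += 1                       # middle straddle: defer a bit
            low -= QUARTER; high -= QUARTER
        else:
            return low, high, pending
        low <<= 1; high <<= 1                  # double the interval
```

A full coder alternates this step with subdividing the interval according to the model's symbol probabilities; see the paper for the complete encoder, decoder, and carry handling.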
Modern compression: ANS and Zstandard
J. Duda, 'Asymmetric Numeral Systems: Entropy Coding Combining Speed of Huffman Coding with Compression Rate of Arithmetic Coding,' arXiv:1311.2540, 2013
Asymmetric numeral systems (ANS) achieve arithmetic-coding compression rates with Huffman-like speed. Used in Zstandard (Facebook), Apple LZFSE, and JPEG XL.
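The core of the ANS idea fits in a few lines. The sketch below (our own, not Duda's notation) is a "streamless" rANS that packs the whole message into one unbounded Python integer; production coders additionally renormalize the state into a fixed-width word and emit bytes as they go:

```python
def _cumulative(freqs):
    """Start of each symbol's slot range in [0, M)."""
    cum, c = {}, 0
    for s in sorted(freqs):
        cum[s] = c
        c += freqs[s]
    return cum, c                  # c == M, the total frequency

def rans_encode(symbols, freqs):
    cum, M = _cumulative(freqs)
    x = 1                          # initial state
    for s in reversed(symbols):    # rANS consumes symbols in reverse
        f = freqs[s]
        x = (x // f) * M + cum[s] + (x % f)
    return x

def rans_decode(x, n, freqs):
    cum, M = _cumulative(freqs)
    out = []
    for _ in range(n):
        slot = x % M
        s = max(t for t in cum if cum[t] <= slot)  # symbol owning this slot
        x = freqs[s] * (x // M) + slot - cum[s]
        out.append(s)
    return out
```

Encode and decode steps are exact inverses, and a symbol of frequency f grows the state by a factor of roughly M/f, so frequent symbols cost fewer bits; that is where the arithmetic-coding-grade compression comes from.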
Redundancy of universal codes
N. Merhav and M. Feder, 'Universal Prediction,' IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2124-2147, Oct. 1998
Detailed analysis of the redundancy (excess rate above entropy) of universal codes including minimax and Bayesian approaches. Connects to MDL and statistical learning.
Source coding with side information
D. Slepian and J. K. Wolf, 'Noiseless Coding of Correlated Information Sources,' IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471-480, Jul. 1973
The foundational paper on distributed lossless compression, covered in Chapter 7 of this book. Shows that two correlated sources can be compressed separately at a total rate approaching their joint entropy, just as if a single encoder saw both.
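Their result is often summarized as a rate region: separate encoders at rates $R_X$ and $R_Y$ (bits per source symbol) suffice for lossless joint decoding exactly when

```latex
R_X \ge H(X \mid Y), \qquad
R_Y \ge H(Y \mid X), \qquad
R_X + R_Y \ge H(X, Y),
```

so the sum rate can be driven down to the joint entropy $H(X, Y)$, matching what a single joint encoder would need.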