References & Further Reading
References
- A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, Springer, 2nd ed., 1998
The standard graduate reference on large deviations theory. Chapters 1--2 cover everything in Sections 20.1--20.2.
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 2nd ed., 2006
Chapter 11 (method of types, Sanov's theorem, Stein's lemma) is the most accessible treatment for engineers.
- R. Vershynin, High-Dimensional Probability: An Introduction with Applications to Data Science, Cambridge University Press, 2018
Chapters 2--4 cover sub-Gaussian/sub-exponential theory and matrix concentration.
- S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013
Definitive modern reference on concentration inequalities. Chapters 2--3 cover Hoeffding and Bernstein.
- J. A. Tropp, An Introduction to Matrix Concentration Inequalities, Foundations and Trends in Machine Learning, 2015
Survey paper on matrix Bernstein and related inequalities.
- W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, 1963
The original paper establishing Hoeffding's inequality.
- H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités ("On a new limit theorem in probability theory"), Actualités Scientifiques et Industrielles, 1938
Cramér's original paper on large deviations for sums of i.i.d. random variables.
- I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2nd ed., 2011
The definitive treatment of the method of types and its applications to coding theory.
- J. A. Bucklew, Introduction to Rare Event Simulation, Springer, 2004
Connects large deviations to importance sampling and rare-event Monte Carlo.
- J. Östman, G. Durisi, and G. Caire, Finite-Blocklength Bounds on the Maximum Coding Rate of Rician Fading Channels with Applications to Pilot-Assisted Transmission, 2021
Applies concentration inequalities to derive finite-blocklength bounds for MIMO channels.
- G. Caire, Fundamentals of Stochastic Processes: Lecture Notes, TU Berlin, 2024
Course material, Ch. 5: LLN, CLT, and large deviations.
- J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineers, Cambridge University Press, 2006
Ch. 8 covers MGFs and the Chernoff bound, which are prerequisites for this chapter.
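Hoeffding's inequality from the entries above lends itself to a quick numerical sanity check. The following is a minimal sketch (our own illustration, not taken from any of the referenced texts), using Bernoulli(1/2) samples, which are bounded in [0, 1]:

```python
# Sanity check of Hoeffding's inequality (Hoeffding, 1963):
# for i.i.d. X_i in [0, 1], P(|S_n/n - mu| >= t) <= 2 exp(-2 n t^2).
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.1, 20000

# Bernoulli(1/2) samples: bounded in [0, 1], mean 1/2.
samples = rng.integers(0, 2, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical = np.mean(deviations >= t)   # empirical tail probability
bound = 2 * np.exp(-2 * n * t**2)      # Hoeffding upper bound

print(f"empirical tail: {empirical:.4f}")
print(f"Hoeffding bound: {bound:.4f}")
```

The empirical tail typically comes out well below the bound, consistent with the fact that Hoeffding's inequality is not tight for Bernoulli(1/2) at moderate deviations.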
Further Reading
These resources extend the large deviations and concentration machinery to more advanced settings encountered in research.
Large deviations for Markov chains and dependent sequences
Dembo and Zeitouni (1998), Ch. 3
The Gärtner-Ellis theorem extends Cramér's theorem to non-i.i.d. settings. Essential for analyzing queueing systems and Markov chain Monte Carlo.
Sanov's theorem for continuous distributions
Dembo and Zeitouni (1998), Section 6.2
The finite-alphabet treatment here is a preview. The full theory requires weak convergence and topology on the space of probability measures.
Talagrand's concentration inequality
Boucheron, Lugosi, and Massart (2013), Ch. 6
Goes beyond sub-Gaussian theory to concentration of functions of independent variables that are not sums, e.g., suprema of empirical processes.
Error exponents for channel coding
Gallager, *Information Theory and Reliable Communication* (1968), Ch. 5
Develops the reliability function $E(R)$ for discrete memoryless channels (DMCs), showing how the error exponent varies with rate. The sphere-packing and random-coding exponents are the pillars of finite-blocklength analysis.
Matrix concentration for covariance estimation
Tropp (2015), Section 6.3
Directly applicable to pilot design in massive MIMO: determines how many pilots are needed to estimate the spatial covariance to a given accuracy.
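The sample-complexity claim above can be illustrated numerically. The sketch below (our own illustration, not from Tropp's text; the dimension, covariance model, and sample sizes are arbitrary choices) estimates a covariance from i.i.d. Gaussian snapshots and shows the spectral-norm error shrinking with the sample size, in line with the $O(\sqrt{d/n})$ behavior that matrix Bernstein-type bounds predict:

```python
# Illustration: sample-covariance error vs. number of snapshots.
import numpy as np

rng = np.random.default_rng(1)
d = 8  # dimension (e.g., number of antennas)

# A ground-truth covariance with decaying eigenvalues.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = U @ np.diag(1.0 / (1 + np.arange(d))) @ U.T

errors = {}
for n in (50, 500, 5000):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    Sigma_hat = X.T @ X / n  # sample covariance (zero-mean model)
    errors[n] = np.linalg.norm(Sigma_hat - Sigma, ord=2)  # spectral norm

for n, e in errors.items():
    print(f"n={n:5d}  spectral-norm error = {e:.4f}")
```

Each tenfold increase in snapshots should reduce the error by roughly $\sqrt{10}$, which is the practical content of the covariance-estimation bounds referenced above.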