Prerequisites & Notation
Before You Begin
This chapter builds the mathematical foundations of information theory from scratch. We assume only basic probability and some mathematical maturity. If any item below feels unfamiliar, revisit the linked material before proceeding.
- Discrete probability: sample spaces, events, probability axioms
Self-check: Can you compute $P(A \cup B)$ for non-disjoint events $A$ and $B$?
- Random variables, PMFs, expectation, and variance
Self-check: Given a PMF $p(x)$, can you compute $\mathbb{E}[g(X)]$ for an arbitrary function $g$? (See the sketch after this list.)
- Joint and conditional distributions, Bayes' rule
Self-check: Can you derive $p(x \mid y)$ from a joint PMF table?
- Jensen's inequality for convex and concave functions
Self-check: Can you state Jensen's inequality and identify when equality holds?
- Basic calculus: derivatives, Lagrange multipliers
Self-check: Can you find the maximum of $-\sum_i p_i \log_2 p_i$ on the probability simplex $\{p : \sum_i p_i = 1,\, p_i \ge 0\}$? (A worked answer follows this list.)
- Logarithms: change of base, $\log_b x = \frac{\log_a x}{\log_a b}$
Self-check: Can you convert between $\log_2 x$ and $\ln x$ fluently?
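As a quick numerical run-through of several of these self-checks, here is a minimal Python sketch; the PMF, the joint table, and the function `g` are made-up examples, not data from the chapter:

```python
import math

# E[g(X)] from a PMF: sum of g(x) weighted by p(x).
p = {0: 0.5, 1: 0.3, 2: 0.2}               # example PMF (made up)
def g(x):                                   # example function g (made up)
    return (x + 1) ** 2
print(sum(prob * g(x) for x, prob in p.items()))   # 0.5*1 + 0.3*4 + 0.2*9 = 3.5

# Conditional PMF p(x | y) from a joint PMF table: divide by the marginal.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.10, (1, 1): 0.40}
p_y1 = sum(prob for (x, y), prob in joint.items() if y == 1)       # P(Y = 1)
print({x: prob / p_y1 for (x, y), prob in joint.items() if y == 1})

# Change of base: log_2 x = ln x / ln 2.
x = 10.0
assert abs(math.log2(x) - math.log(x) / math.log(2)) < 1e-12
```

For the Lagrange-multiplier self-check, assuming the intended objective is the entropy $-\sum_i p_i \log_2 p_i$ on the probability simplex, the worked answer is

$$
\mathcal{L}(p, \lambda) = -\sum_{i=1}^{n} p_i \log_2 p_i + \lambda \Bigl( \sum_{i=1}^{n} p_i - 1 \Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i} = -\log_2 p_i - \frac{1}{\ln 2} + \lambda = 0,
$$

so $\log_2 p_i$ is the same for every $i$, forcing $p_i = 1/n$ and a maximum value of $\log_2 n$ bits.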
Notation for This Chapter
Symbols introduced in this chapter. All logarithms are base 2 unless stated otherwise, so entropy is measured in bits.
| Symbol | Meaning | Introduced |
|---|---|---|
| $\mathcal{X}, \mathcal{Y}$ | Finite alphabets (source and output) | s01 |
| $p(x)$ | Probability mass function of the discrete random variable $X$ | s01 |
| $H(X)$ | Shannon entropy of $X$: $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$ | s01 |
| $H(X, Y)$ | Joint entropy of $X$ and $Y$ | s02 |
| $H(X \mid Y)$ | Conditional entropy of $X$ given $Y$ | s02 |
| $I(X; Y)$ | Mutual information between $X$ and $Y$ | s03 |
| $D(p \Vert q)$ | Kullback-Leibler divergence from $p$ to $q$ | s04 |
| $X \to Y \to Z$ | Markov chain relation: $p(z \mid x, y) = p(z \mid y)$ | s06 |
| $P_e$ | Probability of error | s06 |
| $\log$ | Logarithm base 2 (bits) unless noted otherwise | s01 |
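To tie the notation together, here is a minimal Python sketch that computes these quantities from a small joint PMF and checks two identities, $H(X \mid Y) = H(X, Y) - H(Y)$ and $I(X; Y) = H(X) - H(X \mid Y) = D(p(x, y) \Vert p(x)\,p(y))$; the joint table is a made-up example, not one from the chapter:

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Made-up joint PMF p(x, y) over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.10, (1, 1): 0.40}

# Marginals p(x) and p(y).
px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p
    py[y] += p

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(joint)
H_X_given_Y = H_XY - H_Y            # chain rule: H(X, Y) = H(Y) + H(X | Y)
I_XY = H_X - H_X_given_Y            # mutual information

# I(X; Y) also equals the KL divergence between the joint and the
# product of the marginals; confirm the two computations agree.
D = sum(p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items() if p > 0)
assert abs(I_XY - D) < 1e-12

print(f"H(X) = {H_X:.4f} bits, H(X,Y) = {H_XY:.4f} bits, I(X;Y) = {I_XY:.4f} bits")
```

Running it on this table gives $I(X; Y) \approx 0.073$ bits, a small but nonzero dependence between $X$ and $Y$.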