Prerequisites & Notation

Before You Begin

This chapter builds the mathematical foundations of information theory from scratch. We assume only basic probability and some mathematical maturity. If any item below feels unfamiliar, revisit the linked material before proceeding; a short numerical sketch at the end of the list works through several of the self-checks.

  • Discrete probability: sample spaces, events, probability axioms

    Self-check: Can you compute $P(A \cup B)$ for non-disjoint events $A$ and $B$?

  • Random variables, PMFs, expectation, and variance

    Self-check: Given a PMF $p(x)$, can you compute $\mathbb{E}[g(X)]$ for an arbitrary function $g$?

  • Joint and conditional distributions, Bayes' rule

    Self-check: Can you derive $P_{X|Y}(x|y)$ from a joint PMF table?

  • Jensen's inequality for convex and concave functions

    Self-check: Can you state Jensen's inequality and identify when equality holds?

  • Basic calculus: derivatives, Lagrange multipliers

    Self-check: Can you find the maximum of $f(x) = -x \log x$ on $(0,1)$?

  • Logarithms: change of base, $\log(ab) = \log a + \log b$

    Self-check: Can you convert between $\log_2$ and $\ln$ fluently?
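
To make these concrete, here is a minimal Python sketch that runs through several of the self-checks numerically. The PMFs and the joint table it uses are invented for illustration and are not taken from the chapter.

```python
import math

# Illustrative PMF for X on {0, 1, 2} (numbers invented for this sketch).
p = {0: 0.5, 1: 0.25, 2: 0.25}

# Expectation of an arbitrary function of X: E[g(X)] = sum_x p(x) g(x).
def g(x):
    return x * x
E_g = sum(p[x] * g(x) for x in p)                               # = 1.25

# Conditional PMF from a joint table: P_{X|Y}(x|y) = p(x, y) / p_Y(y).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}    # invented
p_y1 = sum(v for (x, y), v in joint.items() if y == 1)          # p_Y(1) = 0.6
p_x0_given_y1 = joint[(0, 1)] / p_y1                            # = 1/3

# Jensen's inequality for the concave log: E[log2 X] <= log2 E[X].
q = {1: 0.5, 2: 0.25, 4: 0.25}
assert sum(q[x] * math.log2(x) for x in q) <= math.log2(sum(q[x] * x for x in q))

# Maximum of f(x) = -x log2 x on (0, 1): calculus puts it at x* = 1/e.
def f(x):
    return -x * math.log2(x)
x_star = 1 / math.e
assert all(f(x_star) >= f(x) for x in (0.1, 0.25, 0.5, 0.75, 0.9))

# Change of base: log2(x) = ln(x) / ln(2).
assert abs(math.log2(3.0) - math.log(3.0) / math.log(2)) < 1e-12

print(E_g, p_x0_given_y1, f(x_star))
```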

Notation for This Chapter

The table below lists the symbols introduced in this chapter. All logarithms are base 2 unless stated otherwise, so entropy is measured in bits. A short numerical sketch after the table computes the entropy and information quantities for a small example.

Symbol | Meaning | Introduced
$\mathcal{X}, \mathcal{Y}$ | Finite alphabets (source and output) | s01
$p(x), p_X(x)$ | Probability mass function of the discrete random variable $X$ | s01
$H(X)$ | Shannon entropy of $X$: $-\sum_x p(x) \log p(x)$ | s01
$H(X,Y)$ | Joint entropy of $(X,Y)$ | s02
$H(Y|X)$ | Conditional entropy of $Y$ given $X$ | s02
$I(X;Y)$ | Mutual information between $X$ and $Y$ | s03
$D(P \| Q)$ | Kullback-Leibler divergence from $P$ to $Q$ | s04
$X \multimap Y \multimap Z$ | Markov chain relation: $X - Y - Z$ | s06
$P_e$ | Probability of error | s06
$\log$ | Logarithm base 2 (bits) unless noted otherwise | s01
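
As a concrete companion to the table, here is a minimal Python sketch that computes these quantities in bits for a toy joint PMF. The joint distribution is invented for illustration, and the conditional entropy and mutual information are obtained from the standard identities $H(Y|X) = H(X,Y) - H(X)$ and $I(X;Y) = H(X) + H(Y) - H(X,Y)$.

```python
import math
from collections import defaultdict

# Illustrative joint PMF p(x, y) on small alphabets (numbers invented for this sketch).
joint = {("a", 0): 0.30, ("a", 1): 0.20, ("b", 0): 0.10, ("b", 1): 0.40}

def H(pmf):
    """Shannon entropy in bits: -sum_x p(x) log2 p(x), with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Marginals p_X and p_Y obtained by summing out the joint table.
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

H_X, H_Y, H_XY = H(p_x), H(p_y), H(joint)
H_Y_given_X = H_XY - H_X            # chain rule: H(Y|X) = H(X,Y) - H(X)
I_XY = H_X + H_Y - H_XY             # I(X;Y) = H(X) + H(Y) - H(X,Y)

def kl(p, q):
    """D(P || Q) in bits; assumes q(k) > 0 wherever p(k) > 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

# Mutual information equals the KL divergence between the joint PMF
# and the product of its marginals.
prod = {(x, y): p_x[x] * p_y[y] for (x, y) in joint}
assert abs(I_XY - kl(joint, prod)) < 1e-12

print(f"H(X)={H_X:.4f}  H(Y|X)={H_Y_given_X:.4f}  I(X;Y)={I_XY:.4f} bits")
```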