Prerequisites & Notation

Before You Begin

This chapter builds the mathematical foundations of information theory from scratch. We assume only basic probability and some mathematical maturity. If any item below feels unfamiliar, revisit the linked material before proceeding; a short numerical sketch at the end of the list works through several of the self-checks.

  • Discrete probability: sample spaces, events, probability axioms

    Self-check: Can you compute $P(A \cup B)$ for non-disjoint events $A$ and $B$?

  • Random variables, PMFs, expectation, and variance

    Self-check: Given a PMF $p(x)$, can you compute $\mathbb{E}[g(X)]$ for an arbitrary function $g$?

  • Joint and conditional distributions, Bayes' rule

    Self-check: Can you derive $P_{X|Y}(x|y)$ from a joint PMF table?

  • Jensen's inequality for convex and concave functions

    Self-check: Can you state Jensen's inequality and identify when equality holds?

  • Basic calculus: derivatives, Lagrange multipliers

    Self-check: Can you find the maximum of $f(x) = -x \log x$ on $(0,1)$?

  • Logarithms: change of base, $\log(ab) = \log a + \log b$

    Self-check: Can you convert between $\log_2$ and $\ln$ fluently?
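
To make these concrete, here is a minimal Python sketch that runs through several of the self-checks numerically. The PMFs and the joint table it uses are invented for illustration and are not taken from the chapter.

```python
import math

# Illustrative PMF for X on {0, 1, 2} (numbers invented for this sketch).
p = {0: 0.5, 1: 0.25, 2: 0.25}

# Expectation of an arbitrary function of X: E[g(X)] = sum_x p(x) g(x).
def g(x):
    return x * x
E_g = sum(p[x] * g(x) for x in p)                               # = 1.25

# Conditional PMF from a joint table: P_{X|Y}(x|y) = p(x, y) / p_Y(y).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}    # invented
p_y1 = sum(v for (x, y), v in joint.items() if y == 1)          # p_Y(1) = 0.6
p_x0_given_y1 = joint[(0, 1)] / p_y1                            # = 1/3

# Jensen's inequality for the concave log: E[log2 X] <= log2 E[X].
q = {1: 0.5, 2: 0.25, 4: 0.25}
assert sum(q[x] * math.log2(x) for x in q) <= math.log2(sum(q[x] * x for x in q))

# Maximum of f(x) = -x log2 x on (0, 1): calculus puts it at x* = 1/e.
def f(x):
    return -x * math.log2(x)
x_star = 1 / math.e
assert all(f(x_star) >= f(x) for x in (0.1, 0.25, 0.5, 0.75, 0.9))

# Change of base: log2(x) = ln(x) / ln(2).
assert abs(math.log2(3.0) - math.log(3.0) / math.log(2)) < 1e-12

print(E_g, p_x0_given_y1, f(x_star))
```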

Notation for This Chapter

The table below lists the symbols introduced in this chapter. All logarithms are base 2 unless stated otherwise, so entropy is measured in bits. A short numerical sketch after the table computes the entropy and information quantities for a small example.

Symbol | Meaning | Introduced
$\mathcal{X}, \mathcal{Y}$ | Finite alphabets (source and output) | s01
$p(x), p_X(x)$ | Probability mass function of the discrete random variable $X$ | s01
$H(X)$ | Shannon entropy of $X$: $-\sum_x p(x) \log p(x)$ | s01
$H(X,Y)$ | Joint entropy of $(X,Y)$ | s02
$H(Y|X)$ | Conditional entropy of $Y$ given $X$ | s02
$I(X;Y)$ | Mutual information between $X$ and $Y$ | s03
$D(P \| Q)$ | Kullback-Leibler divergence from $P$ to $Q$ | s04
$X \multimap Y \multimap Z$ | Markov chain relation: $X - Y - Z$ | s06
$P_e$ | Probability of error | s06
$\log$ | Logarithm base 2 (bits) unless noted otherwise | s01
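
As a concrete companion to the table, here is a minimal Python sketch that computes these quantities in bits for a toy joint PMF. The joint distribution is invented for illustration, and the conditional entropy and mutual information are obtained from the standard identities $H(Y|X) = H(X,Y) - H(X)$ and $I(X;Y) = H(X) + H(Y) - H(X,Y)$.

```python
import math
from collections import defaultdict

# Illustrative joint PMF p(x, y) on small alphabets (numbers invented for this sketch).
joint = {("a", 0): 0.30, ("a", 1): 0.20, ("b", 0): 0.10, ("b", 1): 0.40}

def H(pmf):
    """Shannon entropy in bits: -sum_x p(x) log2 p(x), with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Marginals p_X and p_Y obtained by summing out the joint table.
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

H_X, H_Y, H_XY = H(p_x), H(p_y), H(joint)
H_Y_given_X = H_XY - H_X            # chain rule: H(Y|X) = H(X,Y) - H(X)
I_XY = H_X + H_Y - H_XY             # I(X;Y) = H(X) + H(Y) - H(X,Y)

def kl(p, q):
    """D(P || Q) in bits; assumes q(k) > 0 wherever p(k) > 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

# Mutual information equals the KL divergence between the joint PMF
# and the product of its marginals.
prod = {(x, y): p_x[x] * p_y[y] for (x, y) in joint}
assert abs(I_XY - kl(joint, prod)) < 1e-12

print(f"H(X)={H_X:.4f}  H(Y|X)={H_Y_given_X:.4f}  I(X;Y)={I_XY:.4f} bits")
```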