Connections Between Discrete and Continuous Entropy

Bridging the Discrete and Continuous Worlds

We have treated discrete and continuous entropy as separate concepts. But in practice, every continuous signal is eventually quantized — by an ADC, by a digital representation, by finite-precision arithmetic. The natural question is: what happens to entropy when we quantize? This section establishes the precise relationship between discrete entropy of a quantized variable and the differential entropy of the underlying continuous variable. The connection resolves the puzzle of why differential entropy can be negative — and it has practical implications for quantization theory and analog-to-digital conversion.

Definition:

Quantized Random Variable

Let $X$ be a continuous random variable with PDF $f_X$. The $\Delta$-quantized version of $X$ is the discrete random variable $X^\Delta = \lfloor X / \Delta \rfloor \cdot \Delta$, which rounds $X$ down to the nearest multiple of $\Delta$. The PMF of $X^\Delta$ is:

$$\Pr(X^\Delta = k\Delta) = \int_{k\Delta}^{(k+1)\Delta} f_X(x)\,dx \approx f_X(k\Delta) \cdot \Delta$$

for small $\Delta$.
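As a quick numerical sanity check (an illustration added here, not from the text, using an Exponential(1) source purely as an assumed example), the exact bin probability can be compared against the approximation $f_X(k\Delta)\,\Delta$:

```python
import math

# Compare Pr(X^Δ = kΔ) = ∫_{kΔ}^{(k+1)Δ} f_X(x) dx with the approximation f_X(kΔ)·Δ,
# assuming X ~ Exponential(1), i.e. f_X(x) = e^{-x} for x >= 0 (an example choice).
delta = 0.01
for k in [0, 50, 200]:
    exact = math.exp(-k * delta) - math.exp(-(k + 1) * delta)  # exact integral over the bin
    approx = math.exp(-k * delta) * delta                      # f_X(kΔ) · Δ
    print(f"k={k:4d}  exact={exact:.6f}  approx={approx:.6f}")
```

For this step size the two agree to about three significant figures, as the small-$\Delta$ approximation suggests.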

Theorem: Quantization and Differential Entropy

For a continuous random variable $X$ with PDF $f_X$:

$$\lim_{\Delta \to 0} \bigl[H(X^\Delta) + \log \Delta\bigr] = h(X).$$

Equivalently: $H(X^\Delta) \approx h(X) - \log \Delta$ for small $\Delta$.

The discrete entropy $H(X^\Delta)$ diverges as $\Delta \to 0$ because finer quantization creates more possible values. But the excess entropy beyond $\log(1/\Delta)$ bits (the cost of specifying which bin) converges to the differential entropy.

The point is that differential entropy is "entropy up to an additive constant" that depends on the quantization resolution. This explains why $h(X)$ can be negative: it means the quantized entropy $H(X^\Delta)$ grows slower than $\log(1/\Delta)$.
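To make the negative case concrete, here is a minimal sketch (added as an illustration, assuming $X \sim \mathrm{Uniform}(0, 1/2)$, for which $h(X) = \log_2(1/2) = -1$ bit) that evaluates $H(X^\Delta) + \log_2\Delta$ at a few step sizes:

```python
import math

# Numerical check of H(X^Δ) + log2(Δ) → h(X), assuming X ~ Uniform(0, 1/2),
# whose differential entropy is h(X) = log2(1/2) = -1 bit (negative).
for log2_delta in [-3, -6, -10]:
    delta = 2.0 ** log2_delta
    n_bins = int(round(0.5 / delta))   # bins of width Δ covering (0, 1/2)
    p = 2 * delta                      # each bin has probability Δ / (1/2) = 2Δ
    H = -n_bins * p * math.log2(p)     # discrete entropy of X^Δ in bits
    print(f"Δ = 2^{log2_delta}:  H(X^Δ) = {H:.3f} bits,  H(X^Δ) + log2Δ = {H + log2_delta:.3f}")
```

Because the bin edges align exactly with the support here, $H(X^\Delta) + \log_2\Delta$ equals $-1$ bit at every listed $\Delta$; for general densities the agreement holds only in the limit.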

Example: Quantizing a Gaussian

A Gaussian source $X \sim \mathcal{N}(0, 1)$ is uniformly quantized with step size $\Delta$. Approximately how many bits per sample does the quantized representation require? By the theorem above, $H(X^\Delta) \approx h(X) - \log_2\Delta = \tfrac{1}{2}\log_2(2\pi e) - \log_2\Delta \approx 2.05 - \log_2\Delta$ bits.

Entropy of a Quantized Gaussian

Observe how the discrete entropy $H(X^\Delta)$ of a quantized Gaussian grows as the quantization step $\Delta$ decreases. The curve $H(X^\Delta) + \log\Delta$ converges to the differential entropy $h(X) = \frac{1}{2}\log(2\pi e)$.

Parameters: standard deviation of the Gaussian source (default 1); logarithm of the quantization step size (default -1).
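The following sketch reproduces the quantity the demo plots, assuming $\sigma = 1$ and a handful of step sizes; the truncation of the sum at $\pm 10\sigma$ is an implementation choice, not part of the theorem:

```python
import math
from scipy.stats import norm

# Discrete entropy of a Δ-quantized N(0, σ²) versus the approximation
# h(X) - log2(Δ), where h(X) = 0.5 * log2(2πe σ²) bits.
sigma = 1.0
h_X = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)   # ≈ 2.05 bits for σ = 1

for log2_delta in [0, -1, -3, -5]:
    delta = 2.0 ** log2_delta
    k_max = int(10 * sigma / delta)    # truncate the sum at ±10σ (negligible tail mass)
    H = 0.0
    for k in range(-k_max, k_max + 1):
        # Bin probability Pr(X^Δ = kΔ) = Φ((k+1)Δ) − Φ(kΔ)
        p = norm.cdf((k + 1) * delta, scale=sigma) - norm.cdf(k * delta, scale=sigma)
        if p > 0:
            H -= p * math.log2(p)
    print(f"log2Δ = {log2_delta:3d}:  H(X^Δ) = {H:6.3f} bits,  h(X) - log2Δ = {h_X - log2_delta:6.3f}")
```

As $\Delta$ shrinks, the two columns agree more and more closely, matching the theorem.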

Definition:

Rényi Entropy

The Rényi entropy of order $\alpha > 0$ ($\alpha \neq 1$) of a discrete RV $X$ is:

$$H_\alpha(X) = \frac{1}{1-\alpha}\log\!\left(\sum_{x} p(x)^\alpha\right).$$

As $\alpha \to 1$: $H_\alpha(X) \to H(X)$ (Shannon entropy).

As $\alpha \to 0$: $H_0(X) = \log|\text{supp}(X)|$ (Hartley entropy).

As $\alpha \to \infty$: $H_\infty(X) = -\log\max_x p(x)$ (min-entropy).

Rényi entropy plays a role in source coding with different error criteria, guessing problems, and one-shot information theory (Chapter 26). The parameter $\alpha$ controls which events are weighted more heavily: $\alpha > 1$ emphasizes high-probability events, $\alpha < 1$ emphasizes low-probability events.
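A short sketch (using an assumed example PMF) makes the limiting cases concrete:

```python
import numpy as np

# Rényi entropies of an example PMF, in bits, illustrating the α → 1, α → 0,
# and α → ∞ limits (Shannon, Hartley, and min-entropy respectively).
p = np.array([0.5, 0.25, 0.125, 0.125])

def renyi(p, alpha):
    """H_α(X) = 1/(1-α) * log2( Σ_x p(x)^α ), valid for α > 0, α ≠ 1."""
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

shannon = -np.sum(p * np.log2(p))        # α → 1 limit: 1.75 bits for this PMF
hartley = np.log2(np.count_nonzero(p))   # α → 0 limit: log2|supp(X)| = 2 bits
min_ent = -np.log2(p.max())              # α → ∞ limit: -log2 max p(x) = 1 bit

for alpha in [0.01, 0.5, 0.999, 1.001, 2, 100]:
    print(f"α = {alpha:7.3f}:  H_α = {renyi(p, alpha):.4f} bits")
print(f"Shannon = {shannon}, Hartley = {hartley}, min-entropy = {min_ent}")
```

Note how $H_\alpha$ decreases as $\alpha$ grows, moving from the support-counting Hartley entropy down toward the min-entropy of the single most likely outcome.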

Rényi entropy

A one-parameter family of entropy measures generalizing Shannon entropy. $H_\alpha(X) = \frac{1}{1-\alpha}\log\left(\sum_x p(x)^\alpha\right)$. Converges to Shannon entropy as $\alpha \to 1$. Used in one-shot information theory and security analysis.

Related: Entropy

Key Takeaway

Differential entropy is the entropy of the quantized variable minus the quantization cost, in the fine-quantization limit: $h(X) = \lim_{\Delta \to 0}[H(X^\Delta) + \log\Delta]$. This explains why $h(X)$ can be negative (the quantized entropy grows slower than $\log(1/\Delta)$) and why mutual information, which is a difference of entropies, is always well-defined and non-negative for continuous variables.

Quick Check

If $X \sim \text{Uniform}(0, 1/2)$ and we quantize with step $\Delta = 1/8$, approximately how many bits does $H(X^\Delta)$ require?

$h(X) - \log \Delta = \log(1/2) - \log(1/8) = -1 + 3 = 2$ bits

3 bits

1 bit

-1 bit

🎓 CommIT Contribution (2006)

Capacity of the Gaussian MIMO Broadcast Channel

H. Weingarten, Y. Steinberg, S. Shamai
IEEE Transactions on Information Theory

The maximum entropy property of the Gaussian distribution is central to characterizing the capacity region of the Gaussian MIMO broadcast channel. Weingarten, Steinberg, and Shamai proved that the capacity region is achieved by dirty-paper coding (DPC) with Gaussian signaling, establishing the connection between the continuous entropy results of this chapter and the multiuser capacity theory of Chapter 15. The proof relies heavily on the entropy power inequality and the vector version of the "Gaussian is worst noise" result.

Tags: MIMO, broadcast channel, Gaussian, capacity