Connections Between Discrete and Continuous Entropy

Bridging the Discrete and Continuous Worlds

We have treated discrete and continuous entropy as separate concepts. But in practice, every continuous signal is eventually quantized — by an ADC, by a digital representation, by finite-precision arithmetic. The natural question is: what happens to entropy when we quantize? This section establishes the precise relationship between discrete entropy of a quantized variable and the differential entropy of the underlying continuous variable. The connection resolves the puzzle of why differential entropy can be negative — and it has practical implications for quantization theory and analog-to-digital conversion.

Definition:

Quantized Random Variable

Let $X$ be a continuous random variable with PDF $f_X$. The $\Delta$-quantized version of $X$ is the discrete random variable $X^\Delta = \lfloor X / \Delta \rfloor \cdot \Delta$, which rounds $X$ down to the nearest multiple of $\Delta$. The PMF of $X^\Delta$ is:

$$\Pr(X^\Delta = k\Delta) = \int_{k\Delta}^{(k+1)\Delta} f_X(x)\,dx \approx f_X(k\Delta) \cdot \Delta$$

for small $\Delta$.
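As a quick numerical sanity check (an illustration added here, not from the text, using an Exponential(1) source purely as an assumed example), the exact bin probability can be compared against the approximation $f_X(k\Delta)\,\Delta$:

```python
import math

# Compare Pr(X^Δ = kΔ) = ∫_{kΔ}^{(k+1)Δ} f_X(x) dx with the approximation f_X(kΔ)·Δ,
# assuming X ~ Exponential(1), i.e. f_X(x) = e^{-x} for x >= 0 (an example choice).
delta = 0.01
for k in [0, 50, 200]:
    exact = math.exp(-k * delta) - math.exp(-(k + 1) * delta)  # exact integral over the bin
    approx = math.exp(-k * delta) * delta                      # f_X(kΔ) · Δ
    print(f"k={k:4d}  exact={exact:.6f}  approx={approx:.6f}")
```

For this step size the two agree to about three significant figures, as the small-$\Delta$ approximation suggests.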

Theorem: Quantization and Differential Entropy

For a continuous random variable $X$ with PDF $f_X$:

$$\lim_{\Delta \to 0} \bigl[H(X^\Delta) + \log \Delta\bigr] = h(X).$$

Equivalently: $H(X^\Delta) \approx h(X) - \log \Delta$ for small $\Delta$.

The discrete entropy $H(X^\Delta)$ diverges as $\Delta \to 0$ because finer quantization creates more possible values. But the excess entropy beyond $\log(1/\Delta)$ bits (the cost of specifying which bin) converges to the differential entropy.

The point is that differential entropy is "entropy up to an additive constant" that depends on the quantization resolution. This explains why $h(X)$ can be negative: it means the quantized entropy $H(X^\Delta)$ grows slower than $\log(1/\Delta)$.
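To make the negative case concrete, here is a minimal sketch (added as an illustration, assuming $X \sim \mathrm{Uniform}(0, 1/2)$, for which $h(X) = \log_2(1/2) = -1$ bit) that evaluates $H(X^\Delta) + \log_2\Delta$ at a few step sizes:

```python
import math

# Numerical check of H(X^Δ) + log2(Δ) → h(X), assuming X ~ Uniform(0, 1/2),
# whose differential entropy is h(X) = log2(1/2) = -1 bit (negative).
for log2_delta in [-3, -6, -10]:
    delta = 2.0 ** log2_delta
    n_bins = int(round(0.5 / delta))   # bins of width Δ covering (0, 1/2)
    p = 2 * delta                      # each bin has probability Δ / (1/2) = 2Δ
    H = -n_bins * p * math.log2(p)     # discrete entropy of X^Δ in bits
    print(f"Δ = 2^{log2_delta}:  H(X^Δ) = {H:.3f} bits,  H(X^Δ) + log2Δ = {H + log2_delta:.3f}")
```

Because the bin edges align exactly with the support here, $H(X^\Delta) + \log_2\Delta$ equals $-1$ bit at every listed $\Delta$; for general densities the agreement holds only in the limit.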

Example: Quantizing a Gaussian

A Gaussian source $X \sim \mathcal{N}(0, 1)$ is uniformly quantized with step size $\Delta$. Approximately how many bits per sample does the quantized representation require? By the theorem above, $H(X^\Delta) \approx h(X) - \log_2\Delta = \tfrac{1}{2}\log_2(2\pi e) - \log_2\Delta \approx 2.05 - \log_2\Delta$ bits.

Entropy of a Quantized Gaussian

Observe how the discrete entropy $H(X^\Delta)$ of a quantized Gaussian grows as the quantization step $\Delta$ decreases. The curve $H(X^\Delta) + \log\Delta$ converges to the differential entropy $h(X) = \frac{1}{2}\log(2\pi e)$.

Parameters: standard deviation of the Gaussian source (default 1); logarithm of the quantization step size (default -1).
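The following sketch reproduces the quantity the demo plots, assuming $\sigma = 1$ and a handful of step sizes; the truncation of the sum at $\pm 10\sigma$ is an implementation choice, not part of the theorem:

```python
import math
from scipy.stats import norm

# Discrete entropy of a Δ-quantized N(0, σ²) versus the approximation
# h(X) - log2(Δ), where h(X) = 0.5 * log2(2πe σ²) bits.
sigma = 1.0
h_X = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)   # ≈ 2.05 bits for σ = 1

for log2_delta in [0, -1, -3, -5]:
    delta = 2.0 ** log2_delta
    k_max = int(10 * sigma / delta)    # truncate the sum at ±10σ (negligible tail mass)
    H = 0.0
    for k in range(-k_max, k_max + 1):
        # Bin probability Pr(X^Δ = kΔ) = Φ((k+1)Δ) − Φ(kΔ)
        p = norm.cdf((k + 1) * delta, scale=sigma) - norm.cdf(k * delta, scale=sigma)
        if p > 0:
            H -= p * math.log2(p)
    print(f"log2Δ = {log2_delta:3d}:  H(X^Δ) = {H:6.3f} bits,  h(X) - log2Δ = {h_X - log2_delta:6.3f}")
```

As $\Delta$ shrinks, the two columns agree more and more closely, matching the theorem.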

Definition:

Rényi Entropy

The Rényi entropy of order $\alpha > 0$ ($\alpha \neq 1$) of a discrete RV $X$ is:

$$H_\alpha(X) = \frac{1}{1-\alpha}\log\!\left(\sum_{x} p(x)^\alpha\right).$$

As $\alpha \to 1$: $H_\alpha(X) \to H(X)$ (Shannon entropy).

As $\alpha \to 0$: $H_0(X) = \log|\text{supp}(X)|$ (Hartley entropy).

As $\alpha \to \infty$: $H_\infty(X) = -\log\max_x p(x)$ (min-entropy).

Rényi entropy plays a role in source coding with different error criteria, guessing problems, and one-shot information theory (Chapter 26). The parameter $\alpha$ controls which events are weighted more heavily: $\alpha > 1$ emphasizes high-probability events, $\alpha < 1$ emphasizes low-probability events.
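A short sketch (using an assumed example PMF) makes the limiting cases concrete:

```python
import numpy as np

# Rényi entropies of an example PMF, in bits, illustrating the α → 1, α → 0,
# and α → ∞ limits (Shannon, Hartley, and min-entropy respectively).
p = np.array([0.5, 0.25, 0.125, 0.125])

def renyi(p, alpha):
    """H_α(X) = 1/(1-α) * log2( Σ_x p(x)^α ), valid for α > 0, α ≠ 1."""
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

shannon = -np.sum(p * np.log2(p))        # α → 1 limit: 1.75 bits for this PMF
hartley = np.log2(np.count_nonzero(p))   # α → 0 limit: log2|supp(X)| = 2 bits
min_ent = -np.log2(p.max())              # α → ∞ limit: -log2 max p(x) = 1 bit

for alpha in [0.01, 0.5, 0.999, 1.001, 2, 100]:
    print(f"α = {alpha:7.3f}:  H_α = {renyi(p, alpha):.4f} bits")
print(f"Shannon = {shannon}, Hartley = {hartley}, min-entropy = {min_ent}")
```

Note how $H_\alpha$ decreases as $\alpha$ grows, moving from the support-counting Hartley entropy down toward the min-entropy of the single most likely outcome.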

Rényi entropy

A one-parameter family of entropy measures generalizing Shannon entropy. $H_\alpha(X) = \frac{1}{1-\alpha}\log\left(\sum_x p(x)^\alpha\right)$. Converges to Shannon entropy as $\alpha \to 1$. Used in one-shot information theory and security analysis.

Related: Entropy

Key Takeaway

Differential entropy is the entropy of the quantized variable minus the quantization cost, in the fine-quantization limit: $h(X) = \lim_{\Delta \to 0}[H(X^\Delta) + \log\Delta]$. This explains why $h(X)$ can be negative (the quantized entropy grows slower than $\log(1/\Delta)$) and why mutual information, which is a difference of entropies, is always well-defined and non-negative for continuous variables.

Quick Check

If $X \sim \text{Uniform}(0, 1/2)$ and we quantize with step $\Delta = 1/8$, approximately how many bits does $H(X^\Delta)$ require?

$h(X) - \log \Delta = \log(1/2) - \log(1/8) = -1 + 3 = 2$ bits

3 bits

1 bit

-1 bit

🎓 CommIT Contribution (2006)

Capacity of the Gaussian MIMO Broadcast Channel

H. Weingarten, Y. Steinberg, S. Shamai
IEEE Transactions on Information Theory

The maximum entropy property of the Gaussian distribution is central to characterizing the capacity region of the Gaussian MIMO broadcast channel. Weingarten, Steinberg, and Shamai proved that the capacity region is achieved by dirty-paper coding (DPC) with Gaussian signaling, establishing the connection between the continuous entropy results of this chapter and the multiuser capacity theory of Chapter 15. The proof relies heavily on the entropy power inequality and the vector version of the "Gaussian is worst noise" result.

Tags: MIMO, broadcast channel, Gaussian, capacity