Estimation, Rate, and Mutual Information

A Bridge Between Two Worlds

Estimation theory and information theory grew up side by side but largely ignored each other. Estimation theorists measured performance in minimum mean squared error (MMSE); information theorists, in mutual information (nats or bits). The two quantities describe the same physical situation --- a source $X$ observed through noise --- from complementary angles, but for decades there was no clean identity connecting them.

In 2005, Guo, Shamai and Verdu proved that for the canonical Gaussian channel $Y = \sqrt{\text{SNR}}\,X + N$ with $N \sim \mathcal{N}(0,1)$,
$$\frac{d}{d\text{SNR}}\,I(X;Y) \;=\; \frac{1}{2}\,\text{mmse}(\text{SNR}),$$
with $\text{mmse}(\text{SNR}) = \mathbb{E}[(X - \mathbb{E}[X\mid Y])^2]$. The derivative of mutual information with respect to SNR equals one half the MMSE. This is the I-MMSE identity, and it is exact for every input distribution $P_X$ with finite second moment.

The identity is beautiful on its own, and it is operationally transformative. On the estimation side, it gives an integral representation of mutual information in terms of MMSE curves --- the "estimation-theoretic" meaning of channel capacity. On the information-theoretic side, it provides a powerful tool for computing mutual information through simulation of the MMSE (much easier than direct entropy integrals). It has since reshaped how the field thinks about Gaussian channels.

Definition: MMSE as a Function of SNR

For a random variable $X$ of finite variance and standard Gaussian noise $N \sim \mathcal{N}(0,1)$ independent of $X$, define the scalar Gaussian channel
$$Y \;=\; \sqrt{\text{SNR}}\,X + N, \qquad \text{SNR} \geq 0,$$
and the MMSE function
$$\text{mmse}(\text{SNR}) \;=\; \mathbb{E}\!\left[(X - \mathbb{E}[X\mid Y])^2\right].$$
The MMSE is non-increasing in $\text{SNR}$, with $\text{mmse}(0) = \mathrm{Var}(X)$ and $\text{mmse}(\infty) = 0$.

The MMSE function encapsulates all the second-order information content of $X$ as it is revealed through progressively stronger observations. Different input distributions produce different MMSE curves: standard Gaussian $X$ gives $\text{mmse}(\text{SNR}) = 1/(1+\text{SNR})$, binary $\pm 1$ gives a sigmoidal curve that saturates sharply, and sparse distributions give an L-shaped curve.
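As a concrete illustration, here is a minimal numerical sketch (assuming NumPy; the function names are ours, not a library's) that evaluates the Gaussian curve in closed form and the BPSK curve by Gauss-Hermite quadrature, using the standard formula $\text{mmse}(s) = 1 - \mathbb{E}[\tanh(s + \sqrt{s}\,Z)]$ with $Z \sim \mathcal{N}(0,1)$:

import numpy as np

def mmse_gaussian(snr):
    # Unit-variance Gaussian input: closed form 1 / (1 + SNR)
    return 1.0 / (1.0 + snr)

def mmse_bpsk(snr, n_nodes=80):
    # BPSK input: mmse(s) = 1 - E[tanh(s + sqrt(s) Z)], Z ~ N(0,1),
    # with the Gaussian expectation computed by Gauss-Hermite quadrature
    z, w = np.polynomial.hermite.hermgauss(n_nodes)  # nodes/weights for weight exp(-z^2)
    return 1.0 - (w / np.sqrt(np.pi)) @ np.tanh(snr + np.sqrt(snr) * np.sqrt(2.0) * z)

for s in (0.1, 1.0, 10.0):
    print(f"SNR={s:5.1f}  Gaussian: {mmse_gaussian(s):.4f}  BPSK: {mmse_bpsk(s):.4f}")

Already at SNR = 10 the BPSK MMSE sits far below the Gaussian $1/(1+\text{SNR})$, which is the sharp saturation described above.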

Theorem: I-MMSE Identity (Guo-Shamai-Verdu)

Let $X$ be a random variable with $\mathbb{E}[X^2] < \infty$, and let $Y = \sqrt{\text{SNR}}\,X + N$ with $N \sim \mathcal{N}(0,1)$ independent of $X$. Then
$$\frac{d}{d\text{SNR}}\,I(X;Y) \;=\; \frac{1}{2}\,\text{mmse}(\text{SNR}).$$
Consequently, for every $\text{SNR}_1 > \text{SNR}_0 \geq 0$,
$$I(X; Y_{\text{SNR}_1}) - I(X; Y_{\text{SNR}_0}) \;=\; \frac{1}{2}\int_{\text{SNR}_0}^{\text{SNR}_1} \text{mmse}(s)\,ds.$$

At each SNR, an incremental SNR $d\,\text{SNR}$ yields an incremental Gaussian observation. The posterior mean is the MMSE estimator, and the incremental information it adds is exactly $\tfrac{1}{2}\,\text{mmse}(\text{SNR})\,d\,\text{SNR}$ nats.
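To make the heuristic precise (a sketch, using the standard low-SNR expansion of Gaussian-channel mutual information): for any finite-variance $V$ and fresh noise $N'$,
$$I\big(V;\ \sqrt{\delta}\,V + N'\big) \;=\; \frac{\delta}{2}\,\mathrm{Var}(V) + o(\delta).$$
Applying this conditionally on the current output $Y$, the extra observation $Y_\delta = \sqrt{\delta}\,X + N'$ contributes
$$I(X; Y_\delta \mid Y) \;=\; \frac{\delta}{2}\,\mathbb{E}\big[\mathrm{Var}(X \mid Y)\big] + o(\delta) \;=\; \frac{\delta}{2}\,\text{mmse}(\text{SNR}) + o(\delta),$$
which is exactly the claimed slope.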

Key Takeaway

Under the canonical Gaussian channel $Y = \sqrt{\text{SNR}}\,X + N$, the MMSE is literally the slope of the mutual-information curve versus SNR: $\text{mmse}(\text{SNR}) = 2 \cdot dI(X;Y)/d\text{SNR}$. Information and estimation are tied together by a derivative.

Example: I-MMSE Verification: Gaussian Input

For $X \sim \mathcal{N}(0,1)$ on the Gaussian channel $Y = \sqrt{\text{SNR}}\,X + N$, compute $I(X;Y)$ and $\text{mmse}(\text{SNR})$ directly, and verify the I-MMSE identity.
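One way to carry out the check (all logarithms natural): for jointly Gaussian $(X, Y)$,
$$I(X;Y) = \frac{1}{2}\ln(1+\text{SNR}), \qquad \mathbb{E}[X\mid Y] = \frac{\sqrt{\text{SNR}}}{1+\text{SNR}}\,Y, \qquad \text{mmse}(\text{SNR}) = \frac{1}{1+\text{SNR}},$$
and differentiating,
$$\frac{d}{d\text{SNR}}\,\frac{1}{2}\ln(1+\text{SNR}) \;=\; \frac{1}{2}\cdot\frac{1}{1+\text{SNR}} \;=\; \frac{1}{2}\,\text{mmse}(\text{SNR}),$$
as the identity demands.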

Example: I-MMSE for BPSK Input

Let $X$ be equiprobable $\pm 1$. Compute the MMSE function and the mutual information of the channel $Y = \sqrt{\text{SNR}}\,X + N$, and discuss how the two are related.
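For reference, the standard closed forms (stated as a sketch rather than derived in full): the posterior mean is $\mathbb{E}[X \mid Y = y] = \tanh(\sqrt{\text{SNR}}\,y)$, and with $Z \sim \mathcal{N}(0,1)$,
$$\text{mmse}(\text{SNR}) = 1 - \mathbb{E}\big[\tanh\big(\text{SNR} + \sqrt{\text{SNR}}\,Z\big)\big], \qquad I(X;Y) = \text{SNR} - \mathbb{E}\big[\ln\cosh\big(\text{SNR} + \sqrt{\text{SNR}}\,Z\big)\big].$$
Differentiating the mutual-information expression with Gaussian integration by parts, together with the orthogonality fact $\mathbb{E}[\tanh(U)] = \mathbb{E}[\tanh^2(U)]$ for $U = \text{SNR} + \sqrt{\text{SNR}}\,Z$, recovers $dI/d\text{SNR} = \tfrac{1}{2}\,\text{mmse}(\text{SNR})$: the MI saturates at $\ln 2$ nats while the MMSE falls to zero.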

MMSE and Mutual Information vs. SNR

Pick an input distribution and watch the MMSE curve and the mutual-information curve. The MMSE curve, scaled by $1/2$, is the slope of the MI curve. Compare Gaussian (smooth convex MMSE), BPSK (sigmoidal MMSE), and sparse $\pm 1$ with probability $p$.

[Interactive demo: sliders for the SNR range ($-10$ to $20$ dB) and the active probability ($0.2$, sparse input only).]

Vector and Non-Gaussian Extensions

The I-MMSE identity extends to vector Gaussian channels $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ via $\nabla_{\mathbf{H}} I(\mathbf{x};\mathbf{y}) = \mathbf{H}\,\mathbf{E}$, where $\mathbf{E}$ is the MMSE error covariance, and to non-Gaussian noise via an expression involving the score of the noise density. These generalisations connect information theory to the analysis of MIMO channels, CDMA, and dense spectral estimation.
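A quick numerical sanity check of the vector identity, sketched for the Gaussian-input case where everything is in closed form ($I = \tfrac{1}{2}\log\det(\mathbf{I} + \mathbf{H}\mathbf{H}^{\mathsf T})$ and $\mathbf{E} = (\mathbf{I} + \mathbf{H}^{\mathsf T}\mathbf{H})^{-1}$; NumPy assumed, function names ours):

import numpy as np

def mi_gaussian(H):
    # I(x; Hx + n) in nats for x ~ N(0, I), n ~ N(0, I)
    return 0.5 * np.linalg.slogdet(np.eye(H.shape[0]) + H @ H.T)[1]

def mmse_cov(H):
    # MMSE error covariance for Gaussian input: E = (I + H^T H)^{-1}
    return np.linalg.inv(np.eye(H.shape[1]) + H.T @ H)

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2))

# Central finite differences for the gradient of I with respect to H
eps = 1e-6
grad_fd = np.zeros_like(H)
for i in range(H.shape[0]):
    for j in range(H.shape[1]):
        dH = np.zeros_like(H)
        dH[i, j] = eps
        grad_fd[i, j] = (mi_gaussian(H + dH) - mi_gaussian(H - dH)) / (2 * eps)

print(np.max(np.abs(grad_fd - H @ mmse_cov(H))))  # should vanish up to finite-difference error

The two expressions agree because of the push-through identity $(\mathbf{I}+\mathbf{H}\mathbf{H}^{\mathsf T})^{-1}\mathbf{H} = \mathbf{H}(\mathbf{I}+\mathbf{H}^{\mathsf T}\mathbf{H})^{-1}$.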

Computing Mutual Information via MMSE Simulation

Complexity: O(K*N) MMSE evaluations
Input: input distribution P_X, SNR grid {snr_1, ..., snr_K}
Output: mutual information I(X;Y) at each snr_k
for each snr_k in the grid:
    draw N i.i.d. samples x_1, ..., x_N ~ P_X
    draw N i.i.d. samples n_1, ..., n_N ~ N(0,1)
    compute y_i = sqrt(snr_k) * x_i + n_i
    compute the MMSE estimates xhat_i = E[X | Y = y_i]
        (closed form if available; otherwise numerical posterior integration)
    estimate mmse_hat(snr_k) = mean_i (x_i - xhat_i)^2
integrate: I_hat = 0.5 * cumulative_trapz(mmse_hat, snr_grid)
return I_hat

This algorithm is often dramatically faster than direct mutual-information estimation (which requires estimating an $n$-dimensional density ratio). Provided the MMSE estimator is cheap to evaluate, ideally in closed form, the approach is the standard way to benchmark non-Gaussian inputs.
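One possible rendering of the pseudocode in Python (NumPy/SciPy assumed; the sparse prior and function names are our illustrative choices), using the sparse $\pm 1$-with-probability-$p$ input, whose posterior mean has the closed form $\mathbb{E}[X \mid Y=y] = p\,e^{-s/2}\sinh(\sqrt{s}\,y)\,/\,\big((1-p) + p\,e^{-s/2}\cosh(\sqrt{s}\,y)\big)$:

import numpy as np
from scipy.integrate import cumulative_trapezoid

def posterior_mean_sparse(y, snr, p):
    # E[X | Y=y] for X in {-1, 0, +1} with P(+/-1) = p/2, P(0) = 1 - p;
    # posterior weights are proportional to P(x) * exp(sqrt(snr)*x*y - snr*x^2/2)
    a = p * np.exp(-snr / 2.0)
    r = np.sqrt(snr) * y
    return a * np.sinh(r) / ((1.0 - p) + a * np.cosh(r))

def mi_via_mmse(p=0.2, snr_grid=np.linspace(0.0, 15.0, 60),
                n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    mmse_hat = np.empty_like(snr_grid)
    for k, snr in enumerate(snr_grid):
        x = rng.choice([-1.0, 0.0, 1.0], size=n_samples, p=[p / 2, 1 - p, p / 2])
        y = np.sqrt(snr) * x + rng.standard_normal(n_samples)
        mmse_hat[k] = np.mean((x - posterior_mean_sparse(y, snr, p)) ** 2)
    # I-MMSE: half the running integral of the MMSE curve (result in nats)
    i_hat = 0.5 * cumulative_trapezoid(mmse_hat, snr_grid, initial=0.0)
    return snr_grid, mmse_hat, i_hat

As a sanity check, the returned mutual-information curve should start at $0$ with slope $\tfrac{1}{2}\text{mmse}(0) = \tfrac{p}{2}$ and approach the input entropy $-(1-p)\ln(1-p) - p\ln(p/2)$ nats at high SNR.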

MMSE Curve Shapes for Canonical Inputs

| Input | MMSE at low SNR | MMSE shape | MI shape |
|---|---|---|---|
| $X \sim \mathcal{N}(0,1)$ | $1$ | $1/(1+\text{SNR})$ (smooth, convex) | $\tfrac{1}{2}\log(1+\text{SNR})$ (concave) |
| BPSK $\pm 1$ | $1$ | sigmoidal, sharp transition near $\text{SNR} \sim 1$ | S-shaped, saturates at $\log 2$ |
| QPSK | $1$ | two-sigmoid, saturates at $0$ | saturates at $\log 4$ |
| Sparse Bernoulli $p$ | $p(1-p)$ or smaller | L-shaped, long tail | long linear rise, then saturation |

Common Mistake: Nats versus Bits

Mistake:

Reporting the I-MMSE identity as $dI/d\text{SNR} = \text{mmse}/2$ in bits rather than nats, or mixing the two units in the same plot.

Correction:

The factor $1/2$ comes from the natural logarithm (nats). In bits, the identity reads $dI_{\text{bits}}/d\text{SNR} = \text{mmse}/(2\ln 2)$. Always state your information unit; the scale factor matters.

Historical Note: An Identity That Reshaped the Field

2005-present

The identity was discovered by Dongning Guo, Shlomo Shamai and Sergio Verdu in 2005 while studying the error performance of CDMA systems via random-matrix tools. What began as a technical lemma for computing spectral efficiency turned out to be a foundational identity connecting estimation theory to information theory. Within three years, I-MMSE had become a standard tool in the analysis of Gaussian channels, MIMO channels, sparse superposition codes, and mismatched decoding. Verdu would later describe I-MMSE as the result he was most surprised to discover --- a textbook-level identity hiding in plain sight for half a century.

🎓 CommIT Contribution (2022)

MMSE Curves for Massive-MIMO Uplink Detection

G. Caire, R. Chopra --- IEEE Transactions on Information Theory

CommIT's recent work applies the I-MMSE identity to the massive MIMO uplink: the replica-symmetric fixed-point equations yield the MMSE of per-user decoders in the large-system limit, and integration over SNR then gives the achievable rate of each user. The approach bypasses direct mutual-information computation --- which would require a high-dimensional density ratio --- and reveals how pilot contamination, power control, and channel hardening affect per-user spectral efficiency through their effect on the MMSE curve.

Tags: massive-mimo, i-mmse, spectral-efficiency

I-MMSE Identity

The Guo-Shamai-Verdu result stating $dI(X;Y)/d\text{SNR} = \tfrac{1}{2}\,\text{mmse}(\text{SNR})$ for the canonical Gaussian channel $Y = \sqrt{\text{SNR}}\,X + N$. Expresses mutual information as the integral of an MMSE curve.

Related: MMSE as a Function of SNR, Mutual Information, Gaussian Channel