Expectation Propagation

When BP Is Not Enough

Belief propagation is tractable when the messages have closed forms β€” discrete distributions, Gaussians, or mixtures with small support. When variables are continuous and the factors are non-conjugate (for example, a discrete-constellation prior combined with Gaussian likelihood factors, as in MIMO detection with 64-QAM), sum-product messages become intractable mixtures that grow with each iteration.

Expectation Propagation (EP), introduced by Minka in 2001, fixes this by forcing every message to lie in a tractable exponential family (usually Gaussian) via moment matching. EP is to Gaussian BP what the extended Kalman filter is to nonlinear filtering: it projects non-Gaussian quantities onto their closest Gaussian by matching the first two moments. In MIMO detection, EP delivers near-ML performance at LMMSE complexity and has become the state-of-the-art iterative detector for large-MIMO with high-order constellations.

Definition:

KL-Projection onto the Gaussian Family

Let $\mathcal{F} = \{ q(\cdot; \boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$ be the exponential family of Gaussian distributions (natural parameters: precision-weighted mean and precision). For any probability density $p$, define
$$\text{Proj}_{\mathcal{F}}[p] \;\triangleq\; \arg\min_{q \in \mathcal{F}} D_{\text{KL}}\!\big(p \,\|\, q\big).$$
The minimizer is the Gaussian whose mean and variance match those of $p$:
$$\mu_q = \mathbb{E}_p[x], \qquad \sigma_q^2 = \text{Var}_p[x].$$
This is the unique moment-matching Gaussian approximation.

The KL direction matters: $D_{\text{KL}}(p\|q)$ (moment matching) gives a Gaussian that covers the support of $p$, while the reverse $D_{\text{KL}}(q\|p)$ (variational/mean-field) tends to under-cover. EP uses the moment-matching direction.
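To make the projection concrete, here is a small NumPy sketch; the function name `moment_match_mixture` is ours, and the bimodal mixture is an illustrative target, not from the text. It computes the moment-matching Gaussian for a two-component mixture and shows how the matched variance spreads to cover both modes.

```python
import numpy as np

def moment_match_mixture(weights, means, variances):
    """Mean/variance of a 1-D Gaussian mixture via the law of total variance.
    The KL-projection onto the Gaussian family is N(mu, var)."""
    w, m, v = (np.asarray(a, float) for a in (weights, means, variances))
    mu = np.sum(w * m)
    var = np.sum(w * (v + m**2)) - mu**2
    return mu, var

# Bimodal target: modes at -1 and +1, each with variance 0.1.
mu_q, var_q = moment_match_mixture([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
print(mu_q, var_q)  # variance ~1.1, much wider than either mode alone
```

The projected variance (about 1.1) is an order of magnitude larger than each component's (0.1): the forward KL direction pays a large penalty wherever $p$ has mass and $q$ does not, so the projection "covers" both modes.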

Theorem: KL Minimization Equals Moment Matching

Let $p$ be any density with finite first two moments and let $q$ belong to the Gaussian family with mean $\mu$ and variance $\sigma^2$. Then $D_{\text{KL}}(p\|q)$ is minimized uniquely at $\mu = \mathbb{E}_p[x]$, $\sigma^2 = \text{Var}_p[x]$.

For any exponential family, minimizing $D_{\text{KL}}(p\|q_\theta)$ over $\theta$ reduces to matching sufficient-statistic expectations β€” this is the standard exponential-family duality.

Definition:

EP Messages and the Cavity Distribution

Suppose the posterior factorizes as $p(\mathbf{x}|\mathbf{y}) \propto \prod_a f_a(\mathbf{x}_{\partial a})$ and we approximate it by $q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x}_{\partial a})$ with each $\tilde{f}_a$ chosen from a tractable (Gaussian) family. For each factor $a$ define the cavity
$$q^{\backslash a}(\mathbf{x}) = q(\mathbf{x}) / \tilde{f}_a(\mathbf{x}_{\partial a}),$$
i.e., the current approximation with factor $a$ removed. The EP update for $\tilde{f}_a$ is
$$\tilde{f}_a^{\text{new}}(\mathbf{x}_{\partial a}) = \frac{\text{Proj}_{\mathcal{F}}\!\big[q^{\backslash a}(\mathbf{x}) \, f_a(\mathbf{x}_{\partial a})\big]}{q^{\backslash a}(\mathbf{x})}.$$
In words: replace the approximate factor by the true factor, project the product back onto the Gaussian family by moment matching, then divide out the cavity to get the new approximate factor.

Division of Gaussians is well defined in natural-parameter (precision and precision-weighted mean) form; the result is a Gaussian-shaped function with possibly negative precision. EP becomes unstable when precisions turn negative β€” see the pitfalls below.
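A quick sketch of Gaussian division in natural parameters (precision $\eta = 1/v$, precision-weighted mean $\theta = \mu/v$); `gaussian_divide` is our name for illustration. Note how an over-tight denominator produces a negative precision, the instability mode just described.

```python
def gaussian_divide(mu1, v1, mu2, v2):
    """N(mu1, v1) / N(mu2, v2), computed in natural parameters.
    Returns (mu, v) of the (possibly improper) Gaussian quotient."""
    eta = 1.0 / v1 - 1.0 / v2        # quotient precision (can be negative!)
    theta = mu1 / v1 - mu2 / v2      # quotient precision-weighted mean
    return theta / eta, 1.0 / eta

mu, v = gaussian_divide(0.5, 0.2, 0.0, 1.0)
print(mu, v)                 # 0.625 0.25 -- a proper Gaussian
_, v_bad = gaussian_divide(0.0, 1.0, 0.0, 0.2)
print(v_bad)                 # -0.25 -- negative precision: not a valid density
```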

Expectation Propagation for MIMO Detection

Complexity: $O(T_{\max} N_t^3)$, dominated by one matrix inversion per iteration.
Input: $\mathbf{y}, \mathbf{H}, \sigma^2$; constellation prior $p_s$; damping $\beta \in (0,1]$; iterations $T_{\max}$.
Output: posterior means $\hat{x}_i$ and variances $v_i$.
1. Initialize cavity parameters $\gamma_i = 0$, $\lambda_i = \epsilon$ (small) for all $i$.
2. for $t = 1, \ldots, T_{\max}$ do
3. $\quad$ Form $\boldsymbol{\Sigma} = (\mathbf{H}^H\mathbf{H}/\sigma^2 + \boldsymbol{\Lambda})^{-1}$ and $\boldsymbol{\mu} = \boldsymbol{\Sigma}(\mathbf{H}^H\mathbf{y}/\sigma^2 + \boldsymbol{\gamma})$, where $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_i)$.
4. $\quad$ for each stream $i$ do
5. $\qquad$ Cavity: $v_i^{\backslash} = (1/\Sigma_{ii} - \lambda_i)^{-1}$, $\mu_i^{\backslash} = v_i^{\backslash}(\mu_i/\Sigma_{ii} - \gamma_i)$.
6. $\qquad$ Tilted mean/variance: moment-match $\mathcal{N}(\mu_i^{\backslash}, v_i^{\backslash}) \cdot p_s(x_i)$ to get $\hat{\mu}_i, \hat{v}_i$.
7. $\qquad$ Update: $\lambda_i^{\text{new}} = 1/\hat{v}_i - 1/v_i^{\backslash}$, $\gamma_i^{\text{new}} = \hat{\mu}_i/\hat{v}_i - \mu_i^{\backslash}/v_i^{\backslash}$.
8. $\qquad$ Damp: $\lambda_i \leftarrow \beta\lambda_i^{\text{new}} + (1-\beta)\lambda_i$, and similarly for $\gamma_i$.
9. $\quad$ end for
10. end for
11. return $(\mu_i, \Sigma_{ii})$ as the approximate posterior of $x_i$.

The moment matching in step 6 is a closed-form sum over constellation points for discrete priors β€” the same Gaussian-weighted constellation average that appears in every symbol demapper. Damping $\beta \in (0.5, 0.9)$ is standard for MIMO detection.
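The full loop can be sketched in a few lines of NumPy for real-valued BPSK, where step 6's moment matching is the closed-form tanh demapper. This is a minimal illustration under our own naming (`ep_detect`, `beta`, `lam_min`), using a parallel per-stream schedule and an LMMSE-style start; it is not a reference implementation.

```python
import numpy as np

def ep_detect(y, H, sigma2, beta=0.7, T=8, lam_min=1e-8):
    """EP detection for real BPSK symbols x_i in {-1, +1} (illustrative sketch)."""
    Nt = H.shape[1]
    lam = np.ones(Nt)                  # 1/E_s with E_s = 1: LMMSE-style start
    gam = np.zeros(Nt)
    HtH = H.T @ H / sigma2
    Hty = H.T @ y / sigma2
    for _ in range(T):
        Sigma = np.linalg.inv(HtH + np.diag(lam))   # step 3: Gaussian pass
        mu = Sigma @ (Hty + gam)
        for i in range(Nt):
            # Step 5: cavity (remove stream i's approximate factor)
            prec_cav = 1.0 / Sigma[i, i] - lam[i]
            if prec_cav <= 0:
                continue               # invalid cavity: skip this stream
            v_cav = 1.0 / prec_cav
            m_cav = v_cav * (mu[i] / Sigma[i, i] - gam[i])
            # Step 6: tilted moments -- BPSK demapper in closed form
            m_hat = np.tanh(m_cav / v_cav)
            v_hat = max(1.0 - m_hat**2, 1e-12)
            # Steps 7-8: natural-parameter update with damping and a floor
            lam_new = 1.0 / v_hat - 1.0 / v_cav
            gam_new = m_hat / v_hat - m_cav / v_cav
            lam[i] = max(beta * lam_new + (1.0 - beta) * lam[i], lam_min)
            gam[i] = beta * gam_new + (1.0 - beta) * gam[i]
    Sigma = np.linalg.inv(HtH + np.diag(lam))       # final Gaussian pass
    mu = Sigma @ (Hty + gam)
    return mu, np.diag(Sigma).copy()
```

On the $2 \times 2$ channel of the hand-computation example below, this sketch returns positive posterior means for both streams, matching the LMMSE decision that EP then sharpens.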

Theorem: EP Fixed Points Are Stationary Points of the Bethe Free Energy

Any fixed point of the EP iteration is a stationary point of a specific "expectation-constrained" free-energy functional,
$$F_{\text{EP}}[q] = -\log Z + \sum_a \text{KL}\big(q_a \,\|\, f_a\, q^{\backslash a}\big),$$
subject to moment-matching constraints between the factor approximations and the global approximation. Under these moment constraints, $F_{\text{EP}}$ reduces (for tree-structured graphs) to the exact log-partition function. For loopy graphs, EP fixed points coincide with the stationary points of this Bethe-like free energy.

Just as loopy BP is a fixed-point method for the Bethe free energy, EP is a fixed-point method for an exponential-family generalization. This is why EP produces meaningful marginals even on loopy graphs β€” it is not just heuristic projection.

EP vs LMMSE vs ML: BER for Large MIMO Detection

Compare the BER of EP-based detection to LMMSE detection, MMSE-SIC, and the ML lower bound for an $N_t \times N_t$ MIMO system with 64-QAM. Observe the EP gain at high SNR, where LMMSE suffers from ill-conditioning.


Example: Hand Computation: EP for $2 \times 2$ BPSK

Let $\mathbf{H} = \begin{bmatrix}1 & 0.8 \\ 0.8 & 1\end{bmatrix}$, $\sigma^2 = 0.5$, and BPSK symbols $x_i \in \{+1, -1\}$ with uniform prior. Given $\mathbf{y} = [1.3,\ 0.9]^{\mathsf{T}}$, perform one EP update starting from flat cavities ($\gamma_i = 0$, $\lambda_i = 0$).
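The arithmetic of this one update can be checked mechanically; the short NumPy script below follows algorithm steps 3 and 5–7 for stream 0. With $\lambda_i = \gamma_i = 0$ the Gaussian pass reduces to zero forcing, and the cavity equals the pass itself.

```python
import numpy as np

H = np.array([[1.0, 0.8], [0.8, 1.0]])
y = np.array([1.3, 0.9])
sigma2 = 0.5

# Step 3 with Lambda = 0: the Gaussian pass is the zero-forcing solution
Sigma = np.linalg.inv(H.T @ H / sigma2)
mu = Sigma @ (H.T @ y / sigma2)
print(mu)                              # approx [ 1.611 -0.389 ], i.e. H^{-1} y

# Steps 5-7 for stream 0: flat cavity => cavity = Gaussian pass
v_cav, m_cav = Sigma[0, 0], mu[0]
m_hat = np.tanh(m_cav / v_cav)         # tilted (BPSK) mean
v_hat = 1.0 - m_hat**2                 # tilted variance
lam_new = 1.0 / v_hat - 1.0 / v_cav    # new factor precision
gam_new = m_hat / v_hat - m_cav / v_cav
print(m_hat, lam_new, gam_new)
```

The large cavity variance ($\Sigma_{00} \approx 6.33$, a consequence of the near-singular channel) keeps the tilted mean modest, so one update only nudges the belief toward $x_0 = +1$; later iterations sharpen it.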

Common Mistake: Negative Precisions Break EP

Mistake:

An over-confident cavity (very small $v^{\backslash}$) combined with a weakly informative factor can produce $\hat{v} > v^{\backslash}$, which yields $\lambda^{\text{new}} < 0$: the approximate factor's precision is negative, so it is not a valid density.

Correction:

Use damping $\beta < 1$ and, if necessary, clip precisions to a small positive floor $\lambda_{\min} > 0$. Heavy damping ($\beta \approx 0.5$) typically fixes divergence but slows convergence. Many production EP detectors use $\beta \in [0.6, 0.8]$.
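As a concrete guard, the damped update with a precision floor fits in a three-line helper (the names `damped_update` and `lam_min` are ours):

```python
def damped_update(lam_old, gam_old, lam_new, gam_new, beta=0.7, lam_min=1e-8):
    """Damp the natural-parameter update and clip the precision to stay valid."""
    lam = beta * lam_new + (1.0 - beta) * lam_old
    gam = beta * gam_new + (1.0 - beta) * gam_old
    return max(lam, lam_min), gam

# A raw update that would go negative gets clipped to the floor instead:
print(damped_update(1.0, 0.0, -0.5, 2.0))   # precision clipped to 1e-08
```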

Common Mistake: Initializing EP Cavities Matters

Mistake:

Initializing all $\lambda_i$ and $\gamma_i$ to zero can lead to a badly conditioned Gaussian pass and slow early convergence.

Correction:

A standard initialization is $\lambda_i = 1/E_s$ (the reciprocal of the prior second moment) and $\gamma_i = 0$. This means "start from the LMMSE solution," after which EP refines the marginals. It also guarantees $\boldsymbol{\Lambda} \succ 0$ at iteration 1.

EP as Refined Belief Propagation

EP reduces to loopy belief propagation when the approximating family is the fully factorized distribution over the variables (the projection is then exact). It reduces to Gaussian BP (GaBP) when all factors are already Gaussian. The distinctive contribution of EP appears in the discrete-prior, Gaussian-factor case, where it produces Gaussian approximate messages that are strictly better than the ones used by GaBP applied to the relaxed continuous problem.

πŸ”§ Engineering Note

EP Detectors in Advanced Massive MIMO Receivers

EP-based detection has emerged as the favored algorithm for large-MIMO uplink with high-order constellations (64-QAM, 256-QAM) in 5G-Advanced and 6G prototypes. Unlike ML-based sphere decoding, EP has polynomial per-iteration cost and fixed runtime; unlike LMMSE, it exploits the discrete constellation prior. Real-time FPGA implementations at $N_t = 16$, 64-QAM, and 4–8 EP iterations have been reported with throughput matching LMMSE-SIC, while attaining within 0.5 dB of ML.

Practical Constraints

  β€’ Iterations fixed at design time (4–8 typical)

  β€’ Damping factor $\beta$ often frozen at 0.7 on silicon

πŸŽ“ CommIT Contribution (2023)

EP Receivers for RIS-Assisted Massive MIMO

G. Caire and colleagues at TU Berlin β€” IEEE Trans. Signal Processing (preprint)

The CommIT group has applied EP-based iterative detection to reconfigurable intelligent surface (RIS) assisted MIMO uplink, where the effective channel is the sum of a direct-path term and an RIS-reflected term with phase-configurable columns. EP naturally handles the joint uncertainty in the RIS phases and the data streams, matching the performance of joint ML estimation at a tiny fraction of the cost. This line of work connects the iterative receivers of this chapter to the RIS book.


Expectation Propagation

An iterative message-passing algorithm that approximates a complex posterior by a product of exponential-family factors. Each factor is updated by projecting the product of the true factor and the cavity distribution onto the tractable family via moment matching.

Cavity Distribution

The distribution obtained from a factorized approximation by removing one of its factors. In EP, the cavity represents "everything the model knows about a variable except what factor $a$ says."

Related: Expectation Propagation

Moment Matching

The procedure of choosing an exponential-family distribution whose first few moments (typically mean and variance) agree with those of a target distribution. This is the KL-projection of the target onto the family.

Iterative MIMO Detectors β€” Key Properties

| Detector | Complexity per iter | Prior used | Typical gap to ML |
|---|---|---|---|
| LMMSE (non-iterative) | $O(N_t^3)$ | Gaussian relaxation | 3–6 dB (high-SNR, high-order QAM) |
| MMSE-SIC (soft) | $O(N_t^3)$ | Gaussian soft symbol | 1–2 dB |
| Gaussian BP | $O(N_t^2)$ | Gaussian relaxation | 2–4 dB |
| Expectation Propagation | $O(N_t^3)$ | Discrete constellation prior | 0.3–0.7 dB |
| Sphere Decoder (ML) | Exponential (worst case) | Discrete constellation | 0 dB (reference) |

Why This Matters: From EP to AMP and OAMP

When the MIMO channel matrix is i.i.d. Gaussian and dimensions grow, EP's expensive per-iteration matrix inversion becomes unnecessary: the Onsager-corrected AMP algorithm (Chapter 20) achieves the same state-evolution fixed point at $O(N_t N_r)$ cost. For structured channels (sparse, Kronecker, Fourier), OAMP/VAMP (Chapter 21) are the finite-dimensional generalizations. EP, AMP, and VAMP form a family of message-passing detectors that unify the iterative receiver design space.

See full treatment in Chapter 20