Expectation Propagation

When BP Is Not Enough

Belief propagation is tractable when the messages have closed forms β€” discrete distributions, Gaussians, or mixtures with small support. When variables are continuous and the factors are non-conjugate (for example, a discrete-constellation prior combined with Gaussian likelihood factors, as in MIMO detection with 64-QAM), sum-product messages become intractable mixtures that grow with each iteration.

Expectation Propagation (EP), introduced by Minka in 2001, fixes this by forcing every message to lie in a tractable exponential family (usually Gaussian) via moment matching. EP is to Gaussian BP what the extended Kalman filter is to nonlinear filtering: it projects non-Gaussian quantities onto their closest Gaussian by matching the first two moments. In MIMO detection, EP delivers near-ML performance at LMMSE complexity and has become the state-of-the-art iterative detector for large-MIMO with high-order constellations.

Definition:

KL-Projection onto the Gaussian Family

Let $\mathcal{F} = \{ q(\cdot; \boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$ be the exponential family of Gaussian distributions (natural parameters: precision-weighted mean and precision). For any probability density $p$, define
$$\text{Proj}_{\mathcal{F}}[p] \;\triangleq\; \arg\min_{q \in \mathcal{F}} D_{\text{KL}}\!\big(p \,\|\, q\big).$$
The minimizer is the Gaussian whose mean and variance match those of $p$:
$$\mu_q = \mathbb{E}_p[x], \qquad \sigma_q^2 = \text{Var}_p[x].$$
This is the unique moment-matching Gaussian approximation.

The KL direction matters: $D_{\text{KL}}(p\|q)$ (moment matching) gives a Gaussian that covers the support of $p$, while the reverse $D_{\text{KL}}(q\|p)$ (variational/mean-field) tends to under-cover. EP uses the moment-matching direction.
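To make the projection concrete, here is a small NumPy sketch; the function name `moment_match_mixture` is ours, and the bimodal mixture is an illustrative target, not from the text. It computes the moment-matching Gaussian for a two-component mixture and shows how the matched variance spreads to cover both modes.

```python
import numpy as np

def moment_match_mixture(weights, means, variances):
    """Mean/variance of a 1-D Gaussian mixture via the law of total variance.
    The KL-projection onto the Gaussian family is N(mu, var)."""
    w, m, v = (np.asarray(a, float) for a in (weights, means, variances))
    mu = np.sum(w * m)
    var = np.sum(w * (v + m**2)) - mu**2
    return mu, var

# Bimodal target: modes at -1 and +1, each with variance 0.1.
mu_q, var_q = moment_match_mixture([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
print(mu_q, var_q)  # variance ~1.1, much wider than either mode alone
```

The projected variance (about 1.1) is an order of magnitude larger than each component's (0.1): the forward KL direction pays a large penalty wherever $p$ has mass and $q$ does not, so the projection "covers" both modes.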

Theorem: KL Minimization Equals Moment Matching

Let $p$ be any density with finite first two moments and let $q$ belong to the Gaussian family with mean $\mu$ and variance $\sigma^2$. Then $D_{\text{KL}}(p\|q)$ is minimized uniquely at $\mu = \mathbb{E}_p[x]$, $\sigma^2 = \text{Var}_p[x]$.

For any exponential family, minimizing $D_{\text{KL}}(p\|q_\theta)$ over $\theta$ reduces to matching sufficient-statistic expectations β€” this is the standard exponential-family duality.

Definition:

EP Messages and the Cavity Distribution

Suppose the posterior factorizes as $p(\mathbf{x}|\mathbf{y}) \propto \prod_a f_a(\mathbf{x}_{\partial a})$ and we approximate it by $q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x}_{\partial a})$ with each $\tilde{f}_a$ chosen from a tractable (Gaussian) family. For each factor $a$ define the cavity
$$q^{\backslash a}(\mathbf{x}) = q(\mathbf{x}) / \tilde{f}_a(\mathbf{x}_{\partial a}),$$
i.e., the current approximation with factor $a$ removed. The EP update for $\tilde{f}_a$ is
$$\tilde{f}_a^{\text{new}}(\mathbf{x}_{\partial a}) = \frac{\text{Proj}_{\mathcal{F}}\!\big[q^{\backslash a}(\mathbf{x}) \, f_a(\mathbf{x}_{\partial a})\big]}{q^{\backslash a}(\mathbf{x})}.$$
In words: replace the approximate factor by the true factor, project the product back onto the Gaussian family by moment matching, then divide out the cavity to get the new approximate factor.

Division of Gaussians is well defined in natural-parameter (precision and precision-weighted mean) form; the result is a Gaussian-shaped function with possibly negative precision. EP becomes unstable when precisions turn negative β€” see the pitfalls below.
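A quick sketch of Gaussian division in natural parameters (precision $\eta = 1/v$, precision-weighted mean $\theta = \mu/v$); `gaussian_divide` is our name for illustration. Note how an over-tight denominator produces a negative precision, the instability mode just described.

```python
def gaussian_divide(mu1, v1, mu2, v2):
    """N(mu1, v1) / N(mu2, v2), computed in natural parameters.
    Returns (mu, v) of the (possibly improper) Gaussian quotient."""
    eta = 1.0 / v1 - 1.0 / v2        # quotient precision (can be negative!)
    theta = mu1 / v1 - mu2 / v2      # quotient precision-weighted mean
    return theta / eta, 1.0 / eta

mu, v = gaussian_divide(0.5, 0.2, 0.0, 1.0)
print(mu, v)                 # 0.625 0.25 -- a proper Gaussian
_, v_bad = gaussian_divide(0.0, 1.0, 0.0, 0.2)
print(v_bad)                 # -0.25 -- negative precision: not a valid density
```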

Expectation Propagation for MIMO Detection

Complexity: $O(T_{\max} N_t^3)$, dominated by one matrix inversion per iteration.
Input: $\mathbf{y}, \mathbf{H}, \sigma^2$; constellation prior $p_s$; damping $\beta \in (0,1]$; iterations $T_{\max}$.
Output: posterior means $\hat{x}_i$ and variances $v_i$.
1. Initialize cavity parameters $\gamma_i = 0$, $\lambda_i = \epsilon$ (small) for all $i$.
2. for $t = 1, \ldots, T_{\max}$ do
3. $\quad$ Form $\boldsymbol{\Sigma} = (\mathbf{H}^H\mathbf{H}/\sigma^2 + \boldsymbol{\Lambda})^{-1}$ and $\boldsymbol{\mu} = \boldsymbol{\Sigma}(\mathbf{H}^H\mathbf{y}/\sigma^2 + \boldsymbol{\gamma})$, where $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_i)$.
4. $\quad$ for each stream $i$ do
5. $\qquad$ Cavity: $v_i^{\backslash} = (1/\Sigma_{ii} - \lambda_i)^{-1}$, $\mu_i^{\backslash} = v_i^{\backslash}(\mu_i/\Sigma_{ii} - \gamma_i)$.
6. $\qquad$ Tilted mean/variance: moment-match $\mathcal{N}(\mu_i^{\backslash}, v_i^{\backslash}) \cdot p_s(x_i)$ to get $\hat{\mu}_i, \hat{v}_i$.
7. $\qquad$ Update: $\lambda_i^{\text{new}} = 1/\hat{v}_i - 1/v_i^{\backslash}$, $\gamma_i^{\text{new}} = \hat{\mu}_i/\hat{v}_i - \mu_i^{\backslash}/v_i^{\backslash}$.
8. $\qquad$ Damp: $\lambda_i \leftarrow \beta\lambda_i^{\text{new}} + (1-\beta)\lambda_i$, and similarly for $\gamma_i$.
9. $\quad$ end for
10. end for
11. return $(\mu_i, \Sigma_{ii})$ as the approximate posterior of $x_i$.

The moment matching in step 6 is a closed-form sum over constellation points for discrete priors β€” the same Gaussian-weighted constellation average that appears in every symbol demapper. Damping $\beta \in (0.5, 0.9)$ is standard for MIMO detection.
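The full loop can be sketched in a few lines of NumPy for real-valued BPSK, where step 6's moment matching is the closed-form tanh demapper. This is a minimal illustration under our own naming (`ep_detect`, `beta`, `lam_min`), using a parallel per-stream schedule and an LMMSE-style start; it is not a reference implementation.

```python
import numpy as np

def ep_detect(y, H, sigma2, beta=0.7, T=8, lam_min=1e-8):
    """EP detection for real BPSK symbols x_i in {-1, +1} (illustrative sketch)."""
    Nt = H.shape[1]
    lam = np.ones(Nt)                  # 1/E_s with E_s = 1: LMMSE-style start
    gam = np.zeros(Nt)
    HtH = H.T @ H / sigma2
    Hty = H.T @ y / sigma2
    for _ in range(T):
        Sigma = np.linalg.inv(HtH + np.diag(lam))   # step 3: Gaussian pass
        mu = Sigma @ (Hty + gam)
        for i in range(Nt):
            # Step 5: cavity (remove stream i's approximate factor)
            prec_cav = 1.0 / Sigma[i, i] - lam[i]
            if prec_cav <= 0:
                continue               # invalid cavity: skip this stream
            v_cav = 1.0 / prec_cav
            m_cav = v_cav * (mu[i] / Sigma[i, i] - gam[i])
            # Step 6: tilted moments -- BPSK demapper in closed form
            m_hat = np.tanh(m_cav / v_cav)
            v_hat = max(1.0 - m_hat**2, 1e-12)
            # Steps 7-8: natural-parameter update with damping and a floor
            lam_new = 1.0 / v_hat - 1.0 / v_cav
            gam_new = m_hat / v_hat - m_cav / v_cav
            lam[i] = max(beta * lam_new + (1.0 - beta) * lam[i], lam_min)
            gam[i] = beta * gam_new + (1.0 - beta) * gam[i]
    Sigma = np.linalg.inv(HtH + np.diag(lam))       # final Gaussian pass
    mu = Sigma @ (Hty + gam)
    return mu, np.diag(Sigma).copy()
```

On the $2 \times 2$ channel of the hand-computation example below, this sketch returns positive posterior means for both streams, matching the LMMSE decision that EP then sharpens.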

Theorem: EP Fixed Points Are Stationary Points of the Bethe Free Energy

Any fixed point of the EP iteration is a stationary point of a specific "expectation-constrained" free-energy functional,
$$F_{\text{EP}}[q] = -\log Z + \sum_a \text{KL}\big(q_a \,\|\, f_a\, q^{\backslash a}\big),$$
subject to moment-matching constraints between the factor approximations and the global approximation. Under these moment constraints, $F_{\text{EP}}$ reduces (for tree-structured graphs) to the exact log-partition function. For loopy graphs, EP fixed points coincide with the stationary points of this Bethe-like free energy.

Just as loopy BP is a fixed-point method for the Bethe free energy, EP is a fixed-point method for an exponential-family generalization. This is why EP produces meaningful marginals even on loopy graphs β€” it is not just heuristic projection.

EP vs LMMSE vs ML: BER for Large MIMO Detection

Compare the BER of EP-based detection to LMMSE detection, MMSE-SIC, and the ML lower bound for an $N_t \times N_t$ MIMO system with 64-QAM. Observe the EP gain at high SNR, where LMMSE suffers from ill-conditioning.


Example: Hand Computation: EP for $2 \times 2$ BPSK

Let $\mathbf{H} = \begin{bmatrix}1 & 0.8 \\ 0.8 & 1\end{bmatrix}$, $\sigma^2 = 0.5$, and BPSK symbols $x_i \in \{+1, -1\}$ with uniform prior. Given $\mathbf{y} = [1.3,\ 0.9]^{\mathsf{T}}$, perform one EP update starting from flat cavities ($\gamma_i = 0$, $\lambda_i = 0$).
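The arithmetic of this one update can be checked mechanically; the short NumPy script below follows algorithm steps 3 and 5–7 for stream 0. With $\lambda_i = \gamma_i = 0$ the Gaussian pass reduces to zero forcing, and the cavity equals the pass itself.

```python
import numpy as np

H = np.array([[1.0, 0.8], [0.8, 1.0]])
y = np.array([1.3, 0.9])
sigma2 = 0.5

# Step 3 with Lambda = 0: the Gaussian pass is the zero-forcing solution
Sigma = np.linalg.inv(H.T @ H / sigma2)
mu = Sigma @ (H.T @ y / sigma2)
print(mu)                              # approx [ 1.611 -0.389 ], i.e. H^{-1} y

# Steps 5-7 for stream 0: flat cavity => cavity = Gaussian pass
v_cav, m_cav = Sigma[0, 0], mu[0]
m_hat = np.tanh(m_cav / v_cav)         # tilted (BPSK) mean
v_hat = 1.0 - m_hat**2                 # tilted variance
lam_new = 1.0 / v_hat - 1.0 / v_cav    # new factor precision
gam_new = m_hat / v_hat - m_cav / v_cav
print(m_hat, lam_new, gam_new)
```

The large cavity variance ($\Sigma_{00} \approx 6.33$, a consequence of the near-singular channel) keeps the tilted mean modest, so one update only nudges the belief toward $x_0 = +1$; later iterations sharpen it.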

Common Mistake: Negative Precisions Break EP

Mistake:

An over-confident cavity (very small $v^{\backslash}$) combined with a weakly informative factor can produce $\hat{v} > v^{\backslash}$, which yields $\lambda^{\text{new}} < 0$: the approximate factor's precision is negative, so it is not a valid density.

Correction:

Use damping $\beta < 1$ and, if necessary, clip precisions to a small positive floor $\lambda_{\min} > 0$. Heavy damping ($\beta \approx 0.5$) typically fixes divergence but slows convergence. Many production EP detectors use $\beta \in [0.6, 0.8]$.
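As a concrete guard, the damped update with a precision floor fits in a three-line helper (the names `damped_update` and `lam_min` are ours):

```python
def damped_update(lam_old, gam_old, lam_new, gam_new, beta=0.7, lam_min=1e-8):
    """Damp the natural-parameter update and clip the precision to stay valid."""
    lam = beta * lam_new + (1.0 - beta) * lam_old
    gam = beta * gam_new + (1.0 - beta) * gam_old
    return max(lam, lam_min), gam

# A raw update that would go negative gets clipped to the floor instead:
print(damped_update(1.0, 0.0, -0.5, 2.0))   # precision clipped to 1e-08
```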

Common Mistake: Initializing EP Cavities Matters

Mistake:

Initializing all $\lambda_i$ and $\gamma_i$ to zero can lead to a badly conditioned Gaussian pass and slow early convergence.

Correction:

A standard initialization is $\lambda_i = 1/E_s$ (the reciprocal of the prior second moment) and $\gamma_i = 0$. This means "start from the LMMSE solution," after which EP refines the marginals. It also guarantees $\boldsymbol{\Lambda} \succ 0$ at iteration 1.

EP as Refined Belief Propagation

EP reduces to loopy belief propagation when the approximating family is the fully factorized distribution over the variables (the projection is then exact). It reduces to Gaussian BP (GaBP) when all factors are already Gaussian. The distinctive contribution of EP appears in the discrete-prior, Gaussian-factor case, where it produces Gaussian approximate messages that are strictly better than the ones used by GaBP applied to the relaxed continuous problem.

πŸ”§ Engineering Note

EP Detectors in Advanced Massive MIMO Receivers

EP-based detection has emerged as the favored algorithm for large-MIMO uplink with high-order constellations (64-QAM, 256-QAM) in 5G-Advanced and 6G prototypes. Unlike ML-based sphere decoding, EP has polynomial per-iteration cost and fixed runtime; unlike LMMSE, it exploits the discrete constellation prior. Real-time FPGA implementations at $N_t = 16$, 64-QAM, and 4–8 EP iterations have been reported with throughput matching LMMSE-SIC, while attaining within 0.5 dB of ML.

Practical Constraints

  β€’ Iterations fixed at design time (4–8 typical)

  β€’ Damping factor $\beta$ often frozen at 0.7 on silicon

πŸŽ“ CommIT Contribution (2023)

EP Receivers for RIS-Assisted Massive MIMO

G. Caire and colleagues at TU Berlin β€” IEEE Trans. Signal Processing (preprint)

The CommIT group has applied EP-based iterative detection to reconfigurable intelligent surface (RIS) assisted MIMO uplink, where the effective channel is the sum of a direct-path term and an RIS-reflected term with phase-configurable columns. EP naturally handles the joint uncertainty in the RIS phases and the data streams, matching the performance of joint ML estimation at a tiny fraction of the cost. This line of work connects the iterative receivers of this chapter to the RIS book.


Expectation Propagation

An iterative message-passing algorithm that approximates a complex posterior by a product of exponential-family factors. Each factor is updated by projecting the product of the true factor and the cavity distribution onto the tractable family via moment matching.

Cavity Distribution

The distribution obtained from a factorized approximation by removing one of its factors. In EP, the cavity represents "everything the model knows about a variable except what factor $a$ says."

Related: Expectation Propagation

Moment Matching

The procedure of choosing an exponential-family distribution whose first few moments (typically mean and variance) agree with those of a target distribution. This is the KL-projection of the target onto the family.

Iterative MIMO Detectors β€” Key Properties

| Detector | Complexity per iter | Prior used | Typical gap to ML |
|---|---|---|---|
| LMMSE (non-iterative) | $O(N_t^3)$ | Gaussian relaxation | 3–6 dB (high-SNR, high-order QAM) |
| MMSE-SIC (soft) | $O(N_t^3)$ | Gaussian soft symbol | 1–2 dB |
| Gaussian BP | $O(N_t^2)$ | Gaussian relaxation | 2–4 dB |
| Expectation Propagation | $O(N_t^3)$ | Discrete constellation prior | 0.3–0.7 dB |
| Sphere Decoder (ML) | Exponential (worst case) | Discrete constellation | 0 dB (reference) |

Why This Matters: From EP to AMP and OAMP

When the MIMO channel matrix is i.i.d. Gaussian and dimensions grow, EP's expensive per-iteration matrix inversion becomes unnecessary: the Onsager-corrected AMP algorithm (Chapter 20) achieves the same state-evolution fixed point at $O(N_t N_r)$ cost. For structured channels (sparse, Kronecker, Fourier), OAMP/VAMP (Chapter 21) are the finite-dimensional generalizations. EP, AMP, and VAMP form a family of message-passing detectors that unify the iterative receiver design space.

See full treatment in Chapter 20