Machine Learning and Autoencoder-Based Code Design
Can Neural Networks Design Codes?
Every code, mapper, and receiver in this book was hand-designed for an idealised channel — AWGN, Rayleigh, or MIMO block fading. Real channels are messier: nonlinear HPAs, phase noise, hardware impairments, and inter-cell interference distort both signal and noise. O'Shea and Hoydis (2017) asked: can we LEARN better codes directly from the channel, end-to-end, using neural networks? The autoencoder framework produces constellations and receivers that beat hand-designed ones on nonlinear channels — but it leaves wide-open theoretical questions about generalisation.
Definition: Autoencoder for End-to-End Physical Layer
An end-to-end communication autoencoder consists of three components trained jointly:
- Encoder (neural network): maps $k$-bit messages $\mathbf{m}$ to $n$-dimensional complex transmit vectors $\mathbf{x}$ subject to an average-power constraint.
- Channel layer (non-trainable, differentiable): applies the channel $p(\mathbf{y} \mid \mathbf{x})$. Common choices: AWGN, Rayleigh, memoryless nonlinear HPA, phase-noise model.
- Decoder (neural network): maps the received vector $\mathbf{y}$ to a soft estimate $\hat{\mathbf{m}}$ of the transmitted message. Training minimises the cross-entropy loss end-to-end via gradient descent; the encoder and decoder jointly adapt to the channel. A minimal training sketch follows this definition.
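As a concrete illustration, here is a minimal end-to-end training loop in PyTorch, assuming $k = 4$ bits ($M = 16$ messages), $n = 1$ complex channel use, and an AWGN channel layer. The layer widths, learning rate, batch size, and SNR are illustrative choices, not values from the literature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

k, n = 4, 1            # bits per message, complex channel uses
M = 2 ** k             # 16 messages -> one-hot encoder input of width M

# Encoder: one-hot message -> 2n real numbers (I/Q per channel use).
encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, 2 * n))
# Decoder: noisy I/Q -> logits over the M messages (the soft estimate).
decoder = nn.Sequential(nn.Linear(2 * n, 32), nn.ReLU(), nn.Linear(32, M))

def normalize(x):
    # Enforce the average-power constraint E[||x||^2] = n over the batch.
    return x * (n / x.pow(2).sum(dim=1).mean()).sqrt()

def awgn(x, snr_db):
    # Non-trainable but differentiable channel layer.
    sigma = (10 ** (-snr_db / 10) / 2) ** 0.5   # noise std per real dimension
    return x + sigma * torch.randn_like(x)

opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

for step in range(5000):
    m = torch.randint(0, M, (256,))              # uniform random messages
    x = normalize(encoder(F.one_hot(m, M).float()))
    y = awgn(x, snr_db=10.0)                     # channel sits between the networks
    loss = F.cross_entropy(decoder(y), m)        # end-to-end cross-entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, passing the $M$ one-hot messages through encoder and normalize reads out the learned constellation directly.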
Theorem: Autoencoder Learns a Valid Constellation
For the AWGN channel with $k = 4$ bits and $n = 1$ complex channel use, an autoencoder with sufficient capacity trained with the cross-entropy loss converges to a constellation equivalent (up to unitary rotation) to 16-QAM with Gray labelling, achieving the same BER as hand-designed 16-QAM in the Shannon random-coding regime.
Loss minimisation is MI maximisation
Minimising the cross-entropy loss is equivalent to maximising the mutual information $I(\mathbf{m}; \mathbf{y})$ — Shannon's capacity objective — subject to the encoder's power constraint.
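The standard expansion behind this equivalence, assuming equiprobable messages and a decoder that outputs a posterior estimate $q(\mathbf{m} \mid \mathbf{y})$:

```latex
\mathcal{L}_{\mathrm{CE}}
  = -\,\mathbb{E}\big[\log q(\mathbf{m} \mid \mathbf{y})\big]
  = H(\mathbf{m} \mid \mathbf{y})
    + \mathbb{E}_{\mathbf{y}}\!\left[ D_{\mathrm{KL}}\big( p(\mathbf{m} \mid \mathbf{y}) \,\big\|\, q(\mathbf{m} \mid \mathbf{y}) \big) \right]
  \;\ge\; H(\mathbf{m}) - I(\mathbf{m}; \mathbf{y})
```

with equality when the decoder is Bayes-optimal ($q = p$). Since $H(\mathbf{m}) = k$ bits is fixed, driving $\mathcal{L}_{\mathrm{CE}}$ down pushes $I(\mathbf{m}; \mathbf{y})$ up.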
Capacity-achieving distribution
For the AWGN channel with 16 equi-likely messages and a single complex channel use, the MI-maximising INPUT distribution is 16-QAM up to unitary rotation. The encoder converges to this distribution in the limit of infinite training data.
Labeling by differentiation
The cross-entropy loss penalises probability mass on INCORRECT bit labels — it drives the encoder to use Gray labelling (or equivalent) to minimise bit errors.
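To make "probability mass on INCORRECT bit labels" concrete, here is a small helper that marginalises the decoder's message posteriors into per-bit posteriors; the function name and natural-binary labelling convention are illustrative assumptions.

```python
import torch

def bit_posteriors(logits, k=4):
    # Marginalise P(message | y) into P(bit_i = 1 | y), labelling the
    # messages 0 .. 2^k - 1 by their natural binary expansion.
    probs = logits.softmax(dim=1)                               # (batch, 2**k)
    labels = torch.arange(2 ** k)
    bits = ((labels[:, None] >> torch.arange(k)) & 1).float()   # (2**k, k)
    return probs @ bits                                         # (batch, k)
```

Under Gray labelling, adjacent constellation points differ in a single bit, so the mass the decoder spreads over near neighbours lands mostly on correct bit labels — exactly what the loss rewards.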
Autoencoder-Learned Constellations
Compare baseline 16-QAM with constellations learned by a shallow autoencoder on four channels: AWGN (trivial), nonlinear HPA (compressed outer points), phase noise (radial/angular mismatch), Rayleigh (non-uniform density). Arrows show the learned displacement of each QAM point.
Example: Autoencoder vs Hand-Designed 16-QAM on Nonlinear HPA
Consider a Rapp-model HPA with smoothness $p = 2$ operating at 2 dB input back-off (IBO). Hand-designed 16-QAM suffers a distortion-limited BER at 14 dB. An autoencoder trained on this channel model learns a constellation with compressed outer points. What BER gain can it achieve?
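For reference, a minimal differentiable Rapp AM/AM layer matching the smoothness and back-off quoted above (AM/PM conversion omitted; the function name and unit-power normalisation are illustrative):

```python
import torch

def rapp_hpa(x, ibo_db=2.0, p=2.0):
    # Rapp AM/AM model: g(A) = A / (1 + (A / A_sat)^(2p))^(1 / (2p)).
    # x: (batch, 2) real I/Q view of unit-average-power complex symbols.
    a = x.pow(2).sum(dim=1, keepdim=True).sqrt()   # instantaneous amplitude
    a_sat = 10 ** (ibo_db / 20)                    # saturation amplitude from IBO
    gain = (1 + (a / a_sat) ** (2 * p)) ** (-1 / (2 * p))
    return x * gain                                # outer points compress the most
```

Because the layer is differentiable, it can replace (or compose with) the awgn layer in the training loop above, and gradients flow through the nonlinearity back to the encoder.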
Hand-designed baseline
Uniform 16-QAM experiences clipping of its four outer corner points at 2 dB IBO, raising the effective noise floor by ~1.5 dB and degrading the BER at 14 dB accordingly.
Autoencoder adaptation
The autoencoder learns to pull the corner points inward (trading minimum distance for reduced HPA distortion). O'Shea and Hoydis report roughly a 1 dB gain at matched BER — equivalent to a 20% shorter blocklength or about 25% less transmit power in linear terms.
Caveat
The gain is channel-specific: the learned constellation is near-optimal for the TRAINED HPA model but may be suboptimal for a different (real) HPA with different nonlinearity parameters.
The Generalisation Problem
The central open question: autoencoder codes are trained for a SPECIFIC channel model. When deployed, they encounter channels that are slightly different (e.g., a different HPA IBO, a different phase-noise PSD). There is currently NO guarantee that they maintain their BER advantage — the gain can even invert on out-of-distribution channels. This is the same "distribution-shift" problem that haunts deep learning in general. Theoretical bounds on autoencoder robustness are an active research area (PAC-Bayes bounds, stability theory, etc.) with no definitive answer yet.
Historical Note: A Decade of Physical-Layer Deep Learning
Key milestones in neural physical layer:
- 2016: Dörner-Cammerer-Hoydis-ten Brink — first autoencoder for binary-input AWGN; learns Hamming-like codes.
- 2017: O'Shea-Hoydis — "An Introduction to Deep Learning for the Physical Layer", the paper that defined the field.
- 2018-2020: extensions to MIMO detection (DetNet), channel estimation (ChannelNet), OFDM equalisation, and optical fibre.
- 2021-2025: end-to-end learned codes for 5G NR short-block scenarios; adversarial training for robustness.
Deployment reality as of 2026: research prototypes in Ericsson, Nokia, Huawei, and Mitsubishi labs; no production 3GPP standard uses learned physical-layer codes yet. The gap between theory and deployment remains large.
Industry Perspective on Learned Codes
Industrial adoption of learned codes faces three barriers:
- Certification: safety-critical communications (URLLC, V2X) demand analytical performance guarantees that neural models cannot provide.
- Interoperability: standards-based interworking requires deterministic, specification-based behaviour — not a black-box NN.
- Generalisation: BER guarantees hold only for the training channel distribution; real deployments span orders of magnitude in channel conditions.
Current research focus: HYBRID approaches where NNs replace only the receiver (leaving the encoder standards-compliant), with a safety-net fallback to classical detection, as sketched below.
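A minimal sketch of such a fallback wrapper, assuming a trained NN decoder that outputs message logits and a reference 16-QAM constellation; the confidence threshold is an illustrative assumption, not a standardised value:

```python
import torch

def hybrid_detect(y, nn_decoder, qam_points, conf_threshold=0.9):
    # NN detection with a classical safety net: fall back to minimum-distance
    # detection against the standard constellation when NN confidence is low.
    probs = nn_decoder(y).softmax(dim=1)            # (batch, M) message posteriors
    conf, m_hat = probs.max(dim=1)                  # NN decision and its confidence
    fallback = conf < conf_threshold                # flag low-confidence symbols
    d = torch.cdist(y, qam_points)                  # (batch, M) distances to 16-QAM
    m_hat[fallback] = d.argmin(dim=1)[fallback]     # classical decision instead
    return m_hat
```

The encoder side stays standards-compliant, so the fallback path is just the classical detector that existing certification already covers.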
Common Mistake: Autoencoder BER Does Not Generalise
Mistake:
"Our autoencoder beats 16-QAM on the trained HPA by 1 dB. Ship it!"
Correction:
The autoencoder was optimised for a specific HPA model. On a real HPA with 10% different IBO or a different smoothness parameter, the 1 dB advantage can shrink to 0 or even INVERT. Robust deployment requires training over an ensemble of HPA parameter variations — which brings the autoencoder closer to a hand-designed code in the MIXED-channel setting. Domain randomisation + adversarial training is the current best-practice mitigation, with open theoretical questions.
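As an illustration of the ensemble-training idea, here is a per-batch parameter-randomisation wrapper around the Rapp layer sketched earlier; the ±1 dB IBO spread and smoothness range are assumptions chosen purely for illustration:

```python
import torch

def randomized_hpa(x):
    # Draw HPA parameters afresh for each batch so training sees an ensemble
    # of channels rather than one operating point (ranges are illustrative).
    ibo_db = 2.0 + float(torch.empty(1).uniform_(-1.0, 1.0))   # +/- 1 dB around nominal
    p = float(torch.empty(1).uniform_(1.5, 2.5))               # smoothness spread
    return rapp_hpa(x, ibo_db=ibo_db, p=p)   # reuses the Rapp layer defined earlier
```

Substituting randomized_hpa for the fixed channel layer in the training loop implements domain randomisation; adversarial training instead seeks the worst-case parameters within those ranges rather than sampling them.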
Key Takeaway
Autoencoder-based end-to-end learning can produce codes that beat hand-design on specific nonlinear channels. The open questions are THEORETICAL (generalisation guarantees) and PRACTICAL (certification for safety-critical systems). Learned codes are a promising frontier, but not yet a replacement for the classical theory of Chapters 1-21.