MMSE-GDFE Lattice Decoding

Triangularising the MIMO Channel: The MMSE-GDFE Idea

Having set up the LAST codebook in §1, the question is: how do we decode it on a MIMO fading channel? Direct maximum-likelihood decoding means searching the lattice $\Lambda_c$ for the point closest to the (filtered) received vector – the closest-lattice-point problem, which is NP-hard in general and intractable for the dimensions $n_t T$ we care about.

The V-BLAST story (Wolniansky-Foschini-Golden-Valenzuela 1998) suggests a different path. V-BLAST receives a MIMO signal and triangularises the channel using a decision-feedback structure: zero-force or MMSE-equalise the first layer, decide on it, subtract its contribution, and recurse. For Gaussian random codes, MMSE-SIC (the Tse-Viswanath analysis) achieves the sum capacity. For lattice codes we need the lattice analog. The point is that the MMSE-GDFE (Minimum-Mean-Square-Error Generalised Decision Feedback Equaliser) plays this role – it is the receiver that lets lattice codes achieve MIMO capacity, and hence the DMT.

The derivation we are about to perform is compact: augment the channel, QR-decompose the augmented matrix, strip off the MMSE bias. What results is $n_t$ parallel triangular lattice channels, each of which can be lattice-decoded layer by layer. Intuitively, the augmentation by $\sqrt{\alpha}\,\mathbf{I}$ regularises the channel inverse (this is the MMSE essence) and the QR decomposition arranges the $n_t$ transmit dimensions in a cascade of one-dimensional noisy lattice channels – the same trick that makes Gaussian MMSE-SIC work, but now keeping the lattice integer structure intact.


Definition: MMSE-GDFE (Augmented-Channel Form)

Consider the vectorised MIMO channel $\mathbf{y} = \tilde{\mathbf{H}} \mathbf{x} + \mathbf{w}$ with $\tilde{\mathbf{H}} = \mathbf{I}_T \otimes \mathbf{H} \in \mathbb{C}^{n_r T \times n_t T}$ and noise variance $\sigma^2 = 1$. Let $\alpha = 1/\mathrm{SNR}$.

The MMSE-GDFE receiver is defined by the following three-step procedure.

  1. Augment the channel. Form the $(n_r + n_t) T \times n_t T$ augmented matrix

$$\bar{\mathbf{H}} \;=\; \begin{pmatrix} \tilde{\mathbf{H}} \\ \sqrt{\alpha}\, \mathbf{I}_{n_t T} \end{pmatrix}.$$

  2. QR-decompose $\bar{\mathbf{H}} = \mathbf{Q} \mathbf{R}$ with $\mathbf{Q} \in \mathbb{C}^{(n_r + n_t) T \times n_t T}$ having orthonormal columns and $\mathbf{R} \in \mathbb{C}^{n_t T \times n_t T}$ upper-triangular with positive diagonal entries. Partition $\mathbf{Q} = (\mathbf{Q}_1^T, \mathbf{Q}_2^T)^T$ by rows, where $\mathbf{Q}_1 \in \mathbb{C}^{n_r T \times n_t T}$ and $\mathbf{Q}_2 \in \mathbb{C}^{n_t T \times n_t T}$. The MMSE-GDFE feed-forward filter is $\mathbf{F} = \mathbf{Q}_1^H$.

  3. Filter and lattice-decode. The filtered observation is

$$\mathbf{z} \;=\; \mathbf{F} \mathbf{y} \;=\; \mathbf{R} \mathbf{x} - \sqrt{\alpha}\, \mathbf{Q}_2^H \mathbf{x} + \mathbf{F} \mathbf{w} \;=\; \mathbf{R} \mathbf{x} + \mathbf{w}_{\text{eff}},$$

where the effective noise $\mathbf{w}_{\text{eff}} = \mathbf{F} \mathbf{w} - \sqrt{\alpha}\, \mathbf{Q}_2^H \mathbf{x}$ has covariance $\mathbf{I}_{n_t T}$ (for input power $\mathbb{E}[\mathbf{x}\mathbf{x}^H] = \mathrm{SNR}\,\mathbf{I}$, by the column-orthonormality $\mathbf{Q}_1^H \mathbf{Q}_1 + \mathbf{Q}_2^H \mathbf{Q}_2 = \mathbf{I}$). The decoder then performs layer-by-layer lattice decoding on the triangular system $\mathbf{z} = \mathbf{R} \mathbf{x} + \mathbf{w}_{\text{eff}}$, starting from the last row (where $\mathbf{R}$ has a single non-zero entry) and substituting recovered symbols into earlier rows.
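The three steps can be checked numerically. The sketch below (a random channel realisation, assumed dimensions, numpy's thin QR) verifies the filtered-observation identity $\mathbf{z} = \mathbf{R}\mathbf{x} - \sqrt{\alpha}\,\mathbf{Q}_2^H\mathbf{x} + \mathbf{F}\mathbf{w}$:

```python
import numpy as np

# Sketch of the three-step MMSE-GDFE front end (assumed shapes and a
# random 2x2 channel standing in for a fading realisation).
rng = np.random.default_rng(0)
n_t, n_r, T, snr = 2, 2, 2, 10.0
alpha = 1.0 / snr

H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
H_tilde = np.kron(np.eye(T), H)                                  # vectorised block channel
H_bar = np.vstack([H_tilde, np.sqrt(alpha) * np.eye(n_t * T)])   # step 1: augment

Q, R = np.linalg.qr(H_bar)                                       # step 2: thin QR
# enforce a positive real diagonal on R (QR is unique up to phases)
ph = np.diag(np.exp(-1j * np.angle(np.diag(R))))
Q, R = Q @ ph.conj(), ph @ R
Q1, Q2 = Q[: n_r * T], Q[n_r * T:]
F = Q1.conj().T                                                  # feed-forward filter

x = rng.standard_normal(n_t * T) + 1j * rng.standard_normal(n_t * T)
w = rng.standard_normal(n_r * T) + 1j * rng.standard_normal(n_r * T)
y = H_tilde @ x + w
z = F @ y                                                        # step 3: filter

# identity: z = R x - sqrt(alpha) Q2^H x + F w
assert np.allclose(z, R @ x - np.sqrt(alpha) * Q2.conj().T @ x + F @ w)
```

The assertion holds for any realisation because $\mathbf{Q}_1^H \tilde{\mathbf{H}} = \mathbf{R} - \sqrt{\alpha}\,\mathbf{Q}_2^H$ is an exact algebraic consequence of the augmented QR.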

Three comments on this definition.

First, the augmentation by $\sqrt{\alpha}\,\mathbf{I}$ is the classical trick turning the zero-forcing pseudoinverse $\mathbf{H}^{+} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H$ (which amplifies noise when $\mathbf{H}$ is ill-conditioned) into the MMSE receiver $(\mathbf{H}^H \mathbf{H} + \alpha \mathbf{I})^{-1} \mathbf{H}^H$ (which does not). It is the same trick used in ridge regression.
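The ridge-regression identity behind the augmentation can be stated in a few lines: the regularised inverse is exactly least squares on the augmented system (a minimal numpy sketch, with an illustrative ill-conditioned matrix):

```python
import numpy as np

# The regularised (MMSE/ridge) solution equals plain least squares on the
# augmented system [H; sqrt(alpha) I] x ~ [y; 0] -- sketch, real-valued case.
rng = np.random.default_rng(1)
n, alpha = 3, 0.1
H = rng.standard_normal((n, n)) * [1.0, 1.0, 1e-3]   # near-singular last column
y = rng.standard_normal(n)

# direct regularised inverse: (H^T H + alpha I)^{-1} H^T y
mmse = np.linalg.solve(H.T @ H + alpha * np.eye(n), H.T @ y)

# least squares on the augmented system
H_aug = np.vstack([H, np.sqrt(alpha) * np.eye(n)])
y_aug = np.concatenate([y, np.zeros(n)])
ls, *_ = np.linalg.lstsq(H_aug, y_aug, rcond=None)

assert np.allclose(mmse, ls)
```

This is why QR-decomposing the augmented matrix, rather than inverting anything, implements the MMSE front end.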

Second, the "effective noise" $\mathbf{w}_{\text{eff}}$ is not white Gaussian – it has a non-Gaussian term $\sqrt{\alpha}\,\mathbf{Q}_2^H \mathbf{x}$ coming from the augmentation. This is the MMSE bias. It is harmless for lattice decoders (which care about the covariance, not the distribution), but it is why a naive "pretend the effective channel is AWGN" analysis would be incorrect for Gaussian random codes. For lattice codes the Erez-Zamir crypto-lemma argument (§1) lets us treat it effectively as white Gaussian.

Third, the triangular structure of $\mathbf{R}$ means we can decode one layer at a time with a one-dimensional lattice decoder – specifically, with a scalar decoder for $\Lambda_c^{(i)}$, the one-dimensional slice of $\Lambda_c$ at row $i$. For structured LAST (§4), each layer is a single coordinate of the inner lattice; the per-layer decoder is just nearest-neighbour in one dimension.


Theorem: MMSE-GDFE Preserves Mutual Information

For any input distribution $p(\mathbf{x})$ on the MIMO channel $\mathbf{y} = \tilde{\mathbf{H}} \mathbf{x} + \mathbf{w}$, the mutual information between $\mathbf{x}$ and the MMSE-GDFE filtered output $\mathbf{z} = \mathbf{R} \mathbf{x} + \mathbf{w}_{\text{eff}}$ equals the mutual information between $\mathbf{x}$ and the raw output $\mathbf{y}$:

$$I(\mathbf{x}; \mathbf{z}) \;=\; I(\mathbf{x}; \mathbf{y}).$$

Equivalently, $\mathbf{z}$ is a sufficient statistic for decoding $\mathbf{x}$ from $\mathbf{y}$.

The point is that $\mathbf{F} = \mathbf{Q}_1^H = \mathbf{R}^{-H}\tilde{\mathbf{H}}^H$, so $\mathbf{z}$ is an invertible linear transformation of the matched-filter output $\tilde{\mathbf{H}}^H \mathbf{y}$, which is itself a sufficient statistic in Gaussian noise – no information is lost. The QR decomposition just rearranges the bases; the augmentation adds a deterministic linear function of $\mathbf{x}$ (not dependent on $\mathbf{y}$), which also does not affect mutual information. Thus the MMSE-GDFE is a lossless receiver – the same statement that is true of MMSE-SIC in the Gaussian-code setting.


Theorem: MMSE-GDFE Triangularises the MIMO Channel

Let $\tilde{\mathbf{H}} = \mathbf{I}_T \otimes \mathbf{H}$ be the vectorised MIMO channel of block length $T$, and let $\mathbf{R}$ be the upper-triangular factor from the QR decomposition of the augmented matrix $[\tilde{\mathbf{H}}^T, \sqrt{\alpha}\,\mathbf{I}]^T$ with $\alpha = 1/\mathrm{SNR}$. Then the diagonal entries of $\mathbf{R}$ satisfy

$$\prod_{i=1}^{n_t T} R_{ii}^2 \;=\; \det(\tilde{\mathbf{H}}^H \tilde{\mathbf{H}} + \alpha \mathbf{I}_{n_t T}) \;=\; \prod_{j=1}^{T} \det(\mathbf{H}^{H} \mathbf{H} + \alpha \mathbf{I}_{n_t}),$$

and the aggregate log-gain across the $n_t T$ triangular layers satisfies

$$\sum_{i=1}^{n_t T} \log_2(R_{ii}^2) \;=\; T \log_2 \det\bigl( \mathbf{H}^{H} \mathbf{H} + \alpha \mathbf{I}_{n_t}\bigr).$$

This is the heart of the MMSE-GDFE. Since the effective SNR of layer $i$ is $\mathrm{SNR}\cdot R_{ii}^2$ and $\mathrm{SNR}^{n_t}\det(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}) = \det(\mathbf{I} + \mathrm{SNR}\,\mathbf{H}^H\mathbf{H})$, the sum of per-layer log-SNRs equals the full MIMO mutual information (with uniform power allocation) – no information is lost by the triangularisation. Each layer individually sees a noisy one-dimensional channel, but collectively the layers preserve the entire MIMO capacity. This is the same conservation law that drives MMSE-SIC in the Gaussian-random-code setting, now lifted to lattices.
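Both identities are easy to verify numerically. The sketch below (random channel, assumed dimensions) also checks the mutual-information accounting with the per-layer SNRs $\mathrm{SNR}\cdot R_{ii}^2$:

```python
import numpy as np

# Numerical check of the determinant identity and the rate conservation law
# (sketch; a random 3x2 channel realisation is assumed).
rng = np.random.default_rng(2)
n_t, n_r, T, snr = 2, 3, 2, 10.0
alpha = 1.0 / snr

H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
H_tilde = np.kron(np.eye(T), H)
H_bar = np.vstack([H_tilde, np.sqrt(alpha) * np.eye(n_t * T)])
R = np.linalg.qr(H_bar)[1]

# prod_i R_ii^2 = det(H~^H H~ + alpha I) = det(H^H H + alpha I)^T
prod_Rii2 = np.prod(np.abs(np.diag(R)) ** 2)
lhs = np.linalg.det(H_tilde.conj().T @ H_tilde + alpha * np.eye(n_t * T)).real
rhs = np.linalg.det(H.conj().T @ H + alpha * np.eye(n_t)).real ** T
assert np.allclose([prod_Rii2, lhs], rhs)

# sum of per-layer log-SNRs = MIMO mutual information with uniform power
sum_rate = np.sum(np.log2(snr * np.abs(np.diag(R)) ** 2))
capacity = T * np.log2(np.linalg.det(np.eye(n_t) + snr * H.conj().T @ H).real)
assert np.isclose(sum_rate, capacity)
```

The last assertion is the conservation law in the comment above: multiplying each $R_{ii}^2$ by $\mathrm{SNR}$ converts the determinant identity into $T \log_2 \det(\mathbf{I} + \mathrm{SNR}\,\mathbf{H}^H\mathbf{H})$.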


MMSE-GDFE + Layer-by-Layer Lattice Decoder

Complexity: feed-forward filter application $O((n_r T)(n_t T))$. QR decomposition of the augmented matrix: $O((n_t T)^2 (n_r + n_t) T)$. Backsubstitution lattice decoding: $O(n_t T \cdot K_\Lambda)$, where $K_\Lambda$ is the kissing number / per-layer candidate count. Aggregate average complexity: $O((n_t T)^3)$ for the linear algebra plus the per-layer cost. Sphere decoding (for full ML) can raise the per-layer cost exponentially in low-SNR regimes but matches polynomial complexity at high SNR.
Input: received block Y in C^{n_r x T}, channel matrix H in C^{n_r x n_t},
       lattice pair Lambda_c ⊇ Lambda_s, dither d, SNR = 1/alpha.
Output: estimated information vector u_hat in Z^{n_t T}.
1. Vectorise: y <- vec(Y)
2. Build augmented channel:
       H_aug <- [ (I_T ⊗ H) ; sqrt(alpha) * I_{n_t T} ]
3. QR decompose: [Q, R] <- QR(H_aug)
       F <- Q_1^H    # Hermitian transpose of the upper (n_r T) x (n_t T) block of Q
4. Apply feed-forward filter: z <- F * y
5. Undo dither: z <- z - F * (I_T ⊗ H) * d
6. Layer-by-layer lattice decode (backsubstitution):
       for i = n_t T down to 1:
           # all x_j with j > i are already decoded
           z_i_tilde <- z_i - sum_{j > i} R_{ij} * x_hat_j
           # nearest-neighbour lattice-decode at this layer
           x_hat_i <- argmin_{x_i in Lambda_c^(i)} | z_i_tilde / R_{ii} - x_i |^2
       end for
7. Strip shaping: u_hat <- G^{-1} ( (x_hat + d) mod Lambda_s )
8. Return u_hat.
Complexity notes (step 6 dominates):
- Per-layer decoding: O(M) for scalar QAM, O(K(Lambda)) for structured LAST
  with inner lattice Lambda (e.g., K(E_8) = 240).
- Total: O(n_t T * |Lambda_c^(i)|) if using nearest-neighbour at each layer.
  For ML sphere decoding, worst case O(M^{n_t T}).
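A minimal runnable version of steps 2-6, assuming the simplest case $\Lambda_c = \mathbb{Z}^{n_t T}$ with no dither or shaping (so the per-layer decoder is plain rounding) and a real-valued channel for brevity:

```python
import numpy as np

# Sketch of steps 2-6 with Lambda_c = Z^{n_t T}, no dither/shaping, and a
# real well-conditioned channel; the per-layer lattice decoder is rounding.
def mmse_gdfe_decode(Y, H, snr):
    n_r, T = Y.shape
    n_t = H.shape[1]
    alpha = 1.0 / snr
    y = Y.reshape(-1, order="F")                     # vec(Y): stack columns
    H_tilde = np.kron(np.eye(T), H)
    H_aug = np.vstack([H_tilde, np.sqrt(alpha) * np.eye(n_t * T)])
    Q, R = np.linalg.qr(H_aug)
    F = Q[: n_r * T].conj().T                        # feed-forward filter Q_1^H
    z = F @ y
    x_hat = np.zeros(n_t * T)
    for i in reversed(range(n_t * T)):               # backsubstitution
        z_i = z[i] - R[i, i + 1:] @ x_hat[i + 1:]    # cancel decided layers
        x_hat[i] = np.round((z_i / R[i, i]).real)    # nearest point of Z
    return x_hat

rng = np.random.default_rng(3)
n_t, n_r, T, snr = 2, 2, 4, 1e4                      # high SNR, tiny noise
H = np.array([[2.0, 1.0], [1.0, 2.0]])               # fixed well-conditioned channel
x = rng.integers(-2, 3, size=n_t * T).astype(float)  # integer-lattice codeword
X = x.reshape(n_t, T, order="F")
Y = H @ X + rng.standard_normal((n_r, T)) / np.sqrt(snr)
assert np.array_equal(mmse_gdfe_decode(Y, H, snr), x)
```

At this SNR the effective noise (filtered noise plus MMSE bias) is far below half a lattice spacing on every layer, so the backsubstitution recovers every symbol exactly.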

The MMSE-GDFE algorithm is the LAST decoder, and its polynomial complexity $O((n_t T)^3)$ at high SNR is precisely what makes LAST codes practical where full ML on a CDA codebook would be exponentially complex. This polynomial scaling is also what distinguishes LAST codes from CDA codes in the 5G-era discussion: CDA+ML is $O(M^{n_t^2})$ (intractable for $n_t \ge 5$), while LAST+MMSE-GDFE is polynomial (scalable to larger MIMO).


DMT with and without MMSE-GDFE

Compares the diversity-multiplexing tradeoff curves achieved by three receivers on an $n_t \times n_r$ i.i.d. Rayleigh block-fading MIMO channel with a LAST codebook: (i) naive zero-forcing lattice decoding (which fails to achieve full DMT), (ii) MMSE-GDFE lattice decoding (which achieves the full Zheng-Tse curve), (iii) the Zheng-Tse upper bound. The gap between ZF and MMSE-GDFE – the MMSE dividend – is the whole point: without the augmentation by $\sqrt{\alpha}\,\mathbf{I}$, the diversity order is strictly smaller and the tradeoff is strictly below $d^*(r)$.

Parameters: $n_t = 2$, $n_r = 2$.

MMSE-GDFE Pipeline for LAST Decoding

An animated walk-through of the MMSE-GDFE receiver. The received observation $\mathbf{y}$ is filtered through the feed-forward matrix $\mathbf{F}$; the augmented-channel QR decomposition produces the upper-triangular $\mathbf{R}$; layers are then decoded in reverse order, with each decision feeding back into earlier rows. At the end, the lattice decoder outputs $\hat{\mathbf{x}}$, which the dither-unwrap and modulo-shaping step converts back into the information index $\hat{\mathbf{u}}$.
MMSE-GDFE pipeline: $\mathbf{y} \to$ QR$(\bar{\mathbf{H}}) \to \mathbf{F}\mathbf{y} \to$ layer-by-layer lattice decode $\to \hat{\mathbf{x}}$. The triangular structure of $\mathbf{R}$ is what makes lattice decoding tractable.

Example: Computing the MMSE Coefficient $\alpha$ for a 2x2 Channel at SNR = 10 dB

Consider a $2 \times 2$ MIMO channel with SNR $= 10$ dB $= 10$ (in linear scale), block length $T = 2$, and channel matrix (a random realisation)

$$\mathbf{H} = \begin{pmatrix} 0.8 + 0.3j & 0.2 - 0.5j \\ -0.4 + 0.6j & 0.7 + 0.1j \end{pmatrix}.$$

Compute: (a) the MMSE coefficient $\alpha$; (b) the product $\prod_i R_{ii}^2$ of the squared diagonal entries of the MMSE-GDFE triangular factor; (c) the sum rate $\sum_i \log_2 R_{ii}^2$, and compare it to the MIMO capacity $T \log_2 \det(\mathbf{I}_{n_t} + \mathrm{SNR}\,\mathbf{H}^H \mathbf{H})$.
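A numerical sketch of (a)-(c), using numpy's QR on the augmented matrix (the printed values are realisation-specific, so they are computed rather than quoted):

```python
import numpy as np

# Worked numbers for the 2x2 example at SNR = 10 dB (linear SNR = 10).
snr, T = 10.0, 2
alpha = 1.0 / snr                                     # (a) alpha = 1/SNR = 0.1
H = np.array([[0.8 + 0.3j, 0.2 - 0.5j],
              [-0.4 + 0.6j, 0.7 + 0.1j]])
n_t = H.shape[1]

H_tilde = np.kron(np.eye(T), H)
H_aug = np.vstack([H_tilde, np.sqrt(alpha) * np.eye(n_t * T)])
R = np.linalg.qr(H_aug)[1]

prod_Rii2 = np.prod(np.abs(np.diag(R)) ** 2)          # (b)
sum_rate = np.sum(np.log2(np.abs(np.diag(R)) ** 2))   # (c)
capacity = T * np.log2(np.linalg.det(np.eye(n_t) + snr * H.conj().T @ H).real)
print(alpha, prod_Rii2, sum_rate, capacity)

# sum_rate differs from capacity by exactly n_t * T * log2(SNR),
# the per-layer SNR normalisation discussed in the theorem above
assert np.isclose(sum_rate + n_t * T * np.log2(snr), capacity)
```

The final assertion makes the comparison in (c) precise: the raw $\sum_i \log_2 R_{ii}^2$ sits below capacity by the constant $n_t T \log_2 \mathrm{SNR}$, and adding back the per-layer SNR scaling recovers the capacity exactly.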

Historical Note: V-BLAST (1998) β€” The Precursor of MMSE-SIC and MMSE-GDFE

1996-1998

The MMSE-GDFE idea has a clear antecedent in the V-BLAST (Vertical Bell Labs Layered Space-Time) receiver of Wolniansky, Foschini, Golden, and Valenzuela (1998), which was itself inspired by Foschini's 1996 diagonal-BLAST paper. V-BLAST decodes layers sequentially: zero-force (or MMSE-equalise) the strongest layer, decide on it, subtract its contribution, and recurse on the remaining $n_t - 1$ layers. For Gaussian random codes, MMSE-SIC (the MMSE-filtered version of V-BLAST) achieves the MIMO sum capacity – a fact proved by Tse-Viswanath (2005 textbook, §8).

El Gamal, Caire, and Damen (2004) recognised that the lattice analog of V-BLAST is not literal V-BLAST (which assumes a Gaussian codebook) but rather MMSE-GDFE: the QR decomposition of the augmented matrix, applied globally rather than iteratively. The triangularisation is the same idea; the key difference is that MMSE-GDFE is non-iterative (one linear filter, one QR, one backsubstitution) while V-BLAST iterates a detection-subtraction cycle. For lattice codes the non-iterative form is natural, because lattice decoding makes hard decisions per layer without the soft feedback that V-BLAST's SIC would require.

This historical evolution – from 1996 diagonal-BLAST through 1998 V-BLAST to 2004 MMSE-GDFE – traces the maturation of MIMO layered receivers from ad-hoc engineering to the principled DMT-optimal receivers of modern textbooks.


Historical Note: Erez-Zamir (2004) β€” Lattice Codes Achieve AWGN Capacity

2004

In the same year as the LAST paper, Uri Erez and Ram Zamir established the lattice counterpart of Shannon's AWGN capacity theorem: nested lattice codes with MMSE scaling and common random dithering achieve the AWGN capacity $\tfrac12 \log_2 (1 + \mathrm{SNR})$. Their proof uses Minkowski-Hlawka averaging (Ch. 15) over random lattices plus a crypto-lemma argument: modulo the shaping lattice's Voronoi region, the dithered codeword is uniform – so averaging the error probability over the dither is tractable.

El Gamal, Caire, and Damen recognised that Erez-Zamir's proof architecture – random lattice, MMSE scaling, dithering – is the AWGN half of the LAST argument, and that V-BLAST's MMSE-SIC triangularisation is the MIMO half. Gluing the two halves together gives the LAST theorem: Erez-Zamir's lattice machinery applied to the MMSE-GDFE-triangularised MIMO channel. This composition is why the LAST proof is compact: both pieces were already established in the 2004 literature, and the LAST paper's contribution is to recognise that they compose into a DMT-optimal MIMO coding theorem.

Common Mistake: MMSE-GDFE Is Not Plain MMSE – The Feedback Structure Is Essential

Mistake:

A reader newly introduced to MMSE-GDFE might conflate it with the plain MMSE linear receiver $\hat{\mathbf{x}}_{\text{MMSE}} = (\mathbf{H}^H \mathbf{H} + \alpha \mathbf{I})^{-1} \mathbf{H}^H \mathbf{y}$ and conclude that LAST codes can be decoded with any off-the-shelf linear MMSE equaliser.

Correction:

MMSE-GDFE is not plain MMSE. The critical extra step is the QR decomposition followed by backsubstitution – without the triangularisation, one does not get the per-layer lattice channels that are essential for DMT optimality. Plain MMSE linear decoding of a lattice code gives only diversity $n_r - n_t + 1$ (zero-forcing level), not the full $n_r n_t$ diversity at $r = 0$. The feedback structure – substituting recovered symbols into earlier rows – is what converts the MIMO channel into $n_t$ independent triangular layers and recovers the diversity. In practice, a "plain MMSE" receiver is $O(n_t^3)$ but loses the DMT; MMSE-GDFE is the same $O(n_t^3)$ plus a triangular backsubstitution and keeps the DMT. The extra cost is trivial, the DMT gain is huge.
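The distinction can be made concrete: with the same augmented QR, the plain MMSE estimate is exactly the GDFE triangular system solved with no per-layer quantisation (a numerical sketch under assumed real-valued shapes):

```python
import numpy as np

# Plain MMSE vs MMSE-GDFE: same filter, the only difference is whether the
# triangular system R x = z is solved with or without per-layer decisions.
rng = np.random.default_rng(4)
m, n, alpha = 6, 4, 0.1
H = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# plain linear MMSE estimate
x_mmse = np.linalg.solve(H.T @ H + alpha * np.eye(n), H.T @ y)

# GDFE front end: augmented QR, feed-forward filter Q_1^T
H_aug = np.vstack([H, np.sqrt(alpha) * np.eye(n)])
Q, R = np.linalg.qr(H_aug)
z = Q[:m].T @ y

# identity: plain MMSE == solving R x = z with NO per-layer rounding
assert np.allclose(x_mmse, np.linalg.solve(R, z))
```

The DMT gain therefore comes entirely from quantising each layer to the lattice during the backsubstitution, not from the filtering itself: skip the per-layer decisions and the receiver collapses to plain MMSE.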

Key Takeaway

The MMSE-GDFE is the lattice-code analog of MMSE-SIC for Gaussian codes: QR-decompose the augmented matrix $[\mathbf{H}^T, \sqrt{\alpha}\,\mathbf{I}]^T$, filter by $\mathbf{Q}_1^H$, and lattice-decode the triangular system layer by layer. This transformation preserves mutual information (hence loses no capacity) and produces $n_t$ parallel lattice-AWGN channels whose aggregate mutual information equals the full MIMO mutual information. It is the receiver that lets lattice codes achieve the DMT – the engine of the El Gamal-Caire-Damen 2004 theorem of §3.

Quick Check

What is the role of the $\sqrt{\alpha}\,\mathbf{I}$ block in the augmented channel matrix?

It scales the received signal to match the transmit constellation energy.

It regularises the MMSE inverse, preventing noise amplification when $\mathbf{H}$ is ill-conditioned.

It adds a decoy channel to confuse eavesdroppers.

It forces $\mathbf{R}$ to be unitary so that QR decomposition is unique.