Inner Products and Norms

Why Inner Products Are Central to Wireless Communications

At its core, wireless communication is the art of extracting a desired signal from a noisy, fading environment. The inner product is the mathematical tool that quantifies similarity between two signals or vectors, and it appears almost everywhere in the physical layer:

  • Matched filtering. The optimal detector for a known waveform $s(t)$ in additive white Gaussian noise computes the inner product $\langle r, s \rangle = \int r(t)\,\overline{s(t)}\,dt$, projecting the received signal onto the transmitted waveform.

  • Beamforming. A base station with $n_t$ antennas steers energy toward a user by choosing a weight vector $\mathbf{w}$ that maximises $|\mathbf{h}^H \mathbf{w}|$, an inner product between the channel vector and the beamformer.

  • Orthogonal waveforms. OFDM, CDMA, and spatial multiplexing all rely on orthogonality ($\langle \mathbf{x}, \mathbf{y} \rangle = 0$) to separate co-existing signals without mutual interference.

  • Projections and subspace methods. Minimum-mean-square-error (MMSE) estimation, interference cancellation, and subspace-based channel estimation each reduce to an orthogonal projection, the geometric consequence of the inner product.

This section builds the precise machinery: inner products, norms, the Cauchy--Schwarz inequality, orthogonal projections, and the Gram--Schmidt procedure. Every concept will reappear throughout the book.

Definition:

Inner Product on $\mathbb{C}^n$

An inner product on $\mathbb{C}^n$ is a function $\langle \cdot, \cdot \rangle : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}$ satisfying, for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{C}^n$ and all $\alpha, \beta \in \mathbb{C}$:

  1. Conjugate symmetry. $\langle \mathbf{x}, \mathbf{y} \rangle = \overline{\langle \mathbf{y}, \mathbf{x} \rangle}$.

  2. Linearity in the first argument. $\langle \alpha \mathbf{x} + \beta \mathbf{y}, \mathbf{z} \rangle = \alpha \langle \mathbf{x}, \mathbf{z} \rangle + \beta \langle \mathbf{y}, \mathbf{z} \rangle$.

  3. Positive definiteness. $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$, with equality if and only if $\mathbf{x} = \mathbf{0}$.

The standard (Euclidean) inner product on $\mathbb{C}^n$ is
$$\langle \mathbf{x}, \mathbf{y} \rangle \;=\; \mathbf{y}^H \mathbf{x} \;=\; \sum_{k=1}^{n} x_k\,\overline{y_k}.$$

Convention alert. Axiom 2 makes the inner product linear in the first slot and, by Axiom 1, conjugate-linear (antilinear) in the second: $\langle \mathbf{x}, \alpha \mathbf{y} \rangle = \overline{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle$. Some references (especially in mathematics) adopt the opposite convention, linear in the second argument. Throughout this book we follow the physics/engineering convention stated above, so the standard inner product reads $\mathbf{y}^H \mathbf{x}$, not $\mathbf{x}^H \mathbf{y}$. See also the common-mistake box "Which Argument Is Conjugate-Linear?" below.
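As a quick illustration of the convention, here is a minimal NumPy sketch (NumPy assumed; the vectors are arbitrary). Note that `np.vdot(a, b)` conjugates its first argument and computes $\mathbf{a}^H \mathbf{b}$, so our $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{y}^H \mathbf{x}$ corresponds to `np.vdot(y, x)`:

```python
import numpy as np

x = np.array([1 + 1j, 2 - 1j])
y = np.array([3j, 1 + 2j])

# Our convention: <x, y> = y^H x = sum_k x_k * conj(y_k).
# np.vdot(a, b) computes a^H b, hence <x, y> = np.vdot(y, x).
ip = np.vdot(y, x)

# Conjugate symmetry: <y, x> = conj(<x, y>)
assert np.isclose(np.vdot(x, y), np.conj(ip))

# Conjugate-linearity in the second slot: <x, a*y> = conj(a) * <x, y>
a = 2 - 3j
assert np.isclose(np.vdot(a * y, x), np.conj(a) * ip)
```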

Definition:

Norm Induced by an Inner Product

Given an inner product space $(\mathbb{C}^n, \langle\cdot,\cdot\rangle)$, the induced norm (or Euclidean norm) of $\mathbf{x}$ is
$$\|\mathbf{x}\| \;=\; \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} \;=\; \sqrt{\sum_{k=1}^{n} |x_k|^2}\,.$$
It satisfies the three norm axioms for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$:

  1. Positive definiteness. $\|\mathbf{x}\| \geq 0$, with equality iff $\mathbf{x} = \mathbf{0}$.
  2. Absolute homogeneity. $\|\alpha \mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|$.
  3. Triangle inequality. $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$.

The triangle inequality is a consequence of the Cauchy--Schwarz inequality (stated below); we prove this implication after establishing Cauchy--Schwarz.

Definition:

$\ell_p$ Norms

For $1 \leq p < \infty$ the $\ell_p$ norm of $\mathbf{x} \in \mathbb{C}^n$ is
$$\|\mathbf{x}\|_p \;=\; \Bigl(\sum_{k=1}^{n} |x_k|^p\Bigr)^{1/p}.$$
The limiting case $p \to \infty$ gives the $\ell_\infty$ (max) norm: $\|\mathbf{x}\|_\infty = \max_{1 \leq k \leq n} |x_k|$.

Important special cases:

  $p$        Name                        Formula
  $1$        Manhattan / taxicab norm    $\sum_k \lvert x_k \rvert$
  $2$        Euclidean norm              $\bigl(\sum_k \lvert x_k \rvert^2\bigr)^{1/2}$
  $\infty$   Chebyshev / max norm        $\max_k \lvert x_k \rvert$

For $0 < p < 1$ the expression above is still well-defined but is not a norm (it violates the triangle inequality); it is sometimes called a quasi-norm and appears in the sparse-signal-recovery literature.

Only $p = 2$ yields a norm that is induced by an inner product. The $\ell_1$ norm is heavily used in compressed sensing and LASSO-type regularisation for sparse channel estimation, while $\ell_\infty$ appears in per-antenna power constraints for massive MIMO precoding.
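A small sketch (NumPy assumed; the vector is arbitrary) computing the three special cases with `np.linalg.norm`, whose `ord` parameter selects $p$:

```python
import numpy as np

x = np.array([3 - 4j, 1j, -2])

l1 = np.linalg.norm(x, ord=1)         # sum of |x_k|        -> 5 + 1 + 2 = 8
l2 = np.linalg.norm(x)                # Euclidean (default) -> sqrt(25 + 1 + 4)
linf = np.linalg.norm(x, ord=np.inf)  # max |x_k|           -> 5

# Norm ordering on C^n: ||x||_inf <= ||x||_2 <= ||x||_1
assert linf <= l2 <= l1
```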

$\ell_p$ Norm Unit Balls in $\mathbb{R}^2$ and $\mathbb{R}^3$

[Interactive figure: the unit ball $\{\mathbf{x} : \|\mathbf{x}\|_p \leq 1\}$ in $\mathbb{R}^2$, with the norm order $p$ adjustable from below $1$ to large values approximating $\infty$.]

Watch how the shape of the unit ball morphs as $p$ varies. For $p < 1$ the "ball" is non-convex (star-shaped); at $p = 1$ it is a diamond; at $p = 2$ the familiar circle; and as $p \to \infty$ it approaches the square $[-1,1]^2$.

[Animated figure: the same sweep in $\mathbb{R}^3$ with a rotating camera. The unit ball transitions from a spiky star ($p < 1$) through the octahedron ($p = 1$) and the sphere ($p = 2$) to the cube ($p \to \infty$).]

Definition:

Orthogonality

Two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$ are orthogonal, written $\mathbf{x} \perp \mathbf{y}$, if $\langle \mathbf{x}, \mathbf{y} \rangle = 0$. A set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is called an orthogonal set if $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0$ for all $i \neq j$, and an orthonormal set if additionally $\|\mathbf{v}_i\| = 1$ for every $i$. Compactly, an orthonormal set satisfies $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \delta_{ij}$ (Kronecker delta).

An orthogonal set of nonzero vectors is automatically linearly independent. Proof: suppose $\sum_i \alpha_i \mathbf{v}_i = \mathbf{0}$. Taking the inner product with $\mathbf{v}_k$ gives $\alpha_k \|\mathbf{v}_k\|^2 = 0$, so $\alpha_k = 0$ for every $k$.

Definition:

Orthogonal Complement

Let $\mathcal{S}$ be a subspace of $\mathbb{C}^n$. The orthogonal complement of $\mathcal{S}$ is
$$\mathcal{S}^\perp \;=\; \bigl\{\mathbf{x} \in \mathbb{C}^n : \langle \mathbf{x}, \mathbf{s} \rangle = 0 \;\text{for all } \mathbf{s} \in \mathcal{S}\bigr\}.$$
$\mathcal{S}^\perp$ is itself a subspace, and $\mathbb{C}^n = \mathcal{S} \oplus \mathcal{S}^\perp$ (direct sum), meaning every $\mathbf{x} \in \mathbb{C}^n$ can be written uniquely as $\mathbf{x} = \mathbf{x}_{\mathcal{S}} + \mathbf{x}_{\mathcal{S}^\perp}$ with $\mathbf{x}_{\mathcal{S}} \in \mathcal{S}$ and $\mathbf{x}_{\mathcal{S}^\perp} \in \mathcal{S}^\perp$.

Moreover, $\dim(\mathcal{S}) + \dim(\mathcal{S}^\perp) = n$ and $(\mathcal{S}^\perp)^\perp = \mathcal{S}$.

In MIMO communications the column space $\mathcal{R}(\mathbf{H})$ of the channel matrix carries the signal, and its orthogonal complement $\mathcal{R}(\mathbf{H})^\perp = \mathcal{N}(\mathbf{H}^H)$ is the "interference-free" subspace used by zero-forcing receivers.
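A short sketch of the direct-sum decomposition $\mathbf{x} = \mathbf{x}_{\mathcal{S}} + \mathbf{x}_{\mathcal{S}^\perp}$ for $\mathcal{S} = \mathcal{R}(\mathbf{H})$ (NumPy assumed; `H` is a randomly drawn illustrative channel matrix), using an orthonormal basis obtained from QR:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))  # 4x2 channel, rank 2

Q, _ = np.linalg.qr(H)        # columns of Q: orthonormal basis of R(H)
P = Q @ Q.conj().T            # projector onto R(H)
P_perp = np.eye(4) - P        # projector onto R(H)^perp = N(H^H)

x = rng.normal(size=4) + 1j * rng.normal(size=4)
x_S, x_Sperp = P @ x, P_perp @ x

assert np.allclose(x, x_S + x_Sperp)         # unique decomposition
assert np.allclose(H.conj().T @ x_Sperp, 0)  # x_Sperp lies in N(H^H)
```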

Theorem: Cauchy--Schwarz Inequality

For any $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$:
$$|\langle \mathbf{x}, \mathbf{y} \rangle|^2 \;\leq\; \|\mathbf{x}\|^2\,\|\mathbf{y}\|^2,$$
with equality if and only if $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent, i.e. $\mathbf{x} = \alpha \mathbf{y}$ for some $\alpha \in \mathbb{C}$, or $\mathbf{y} = \mathbf{0}$.

The inner product measures the "component" of $\mathbf{x}$ along $\mathbf{y}$. Cauchy--Schwarz says this component can never exceed the full length of $\mathbf{x}$: you cannot project more of a vector onto a direction than the vector itself has. Equality holds exactly when $\mathbf{x}$ already lies entirely along $\mathbf{y}$ (or one of them is zero).

In signal-processing language: the output of a correlator, $|\langle \mathbf{r}, \mathbf{s} \rangle|$, is bounded by the product of the signal norms $\|\mathbf{r}\|\,\|\mathbf{s}\|$, and the bound is achieved when the received signal is a scaled copy of the template: the matched-filter condition.
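A quick numerical check of the bound and its equality condition (NumPy assumed; the template and received vector are randomly drawn for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=8) + 1j * rng.normal(size=8)   # template waveform
r = rng.normal(size=8) + 1j * rng.normal(size=8)   # generic received vector

lhs = abs(np.vdot(s, r))                 # |<r, s>| = |s^H r|
bound = np.linalg.norm(r) * np.linalg.norm(s)
assert lhs <= bound + 1e-12              # Cauchy--Schwarz bound

r_matched = (0.7 - 0.2j) * s             # scaled copy of the template
lhs_eq = abs(np.vdot(s, r_matched))      # equality: matched-filter condition
assert np.isclose(lhs_eq, np.linalg.norm(r_matched) * np.linalg.norm(s))
```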

Theorem: Triangle Inequality for the Euclidean Norm

For any $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$,
$$\|\mathbf{x} + \mathbf{y}\| \;\leq\; \|\mathbf{x}\| + \|\mathbf{y}\|.$$
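The promised implication follows by expanding the squared norm and applying Cauchy--Schwarz:
$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2\,\operatorname{Re}\langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{y}\|^2 \;\leq\; \|\mathbf{x}\|^2 + 2\,|\langle \mathbf{x}, \mathbf{y} \rangle| + \|\mathbf{y}\|^2 \;\leq\; \bigl(\|\mathbf{x}\| + \|\mathbf{y}\|\bigr)^2,$$
where the last step uses $|\langle \mathbf{x}, \mathbf{y} \rangle| \leq \|\mathbf{x}\|\,\|\mathbf{y}\|$; taking square roots gives the claim.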

Theorem: Orthogonal Projection Theorem

Let $\mathcal{S}$ be a closed subspace of $\mathbb{C}^n$ (every subspace of a finite-dimensional space is closed). For any $\mathbf{x} \in \mathbb{C}^n$, there exists a unique $\hat{\mathbf{x}} \in \mathcal{S}$ that minimises the distance from $\mathbf{x}$ to $\mathcal{S}$:
$$\hat{\mathbf{x}} \;=\; \arg\min_{\mathbf{s} \in \mathcal{S}} \|\mathbf{x} - \mathbf{s}\|.$$
This minimiser is characterised by the orthogonality condition
$$\mathbf{x} - \hat{\mathbf{x}} \;\perp\; \mathcal{S} \qquad\Longleftrightarrow\qquad \langle \mathbf{x} - \hat{\mathbf{x}},\; \mathbf{s} \rangle = 0 \quad \forall\,\mathbf{s} \in \mathcal{S}.$$
If $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for $\mathcal{S}$, the projection is given explicitly by
$$\hat{\mathbf{x}} \;=\; \sum_{i=1}^{k} \langle \mathbf{x}, \mathbf{u}_i \rangle\,\mathbf{u}_i \;=\; \mathbf{U}\mathbf{U}^H \mathbf{x},$$
where $\mathbf{U} = [\mathbf{u}_1 \;\cdots\; \mathbf{u}_k] \in \mathbb{C}^{n \times k}$.
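A sketch verifying both characterisations, minimality and orthogonality of the residual (NumPy assumed; the subspace and test points are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
U, _ = np.linalg.qr(A)            # orthonormal basis of a 2-dim subspace S
x = rng.normal(size=5) + 1j * rng.normal(size=5)

x_hat = U @ (U.conj().T @ x)      # projection U U^H x

# Residual is orthogonal to every basis vector of S.
assert np.allclose(U.conj().T @ (x - x_hat), 0)

# No other point of S is closer: test against random candidates s = U c.
for _ in range(100):
    s = U @ (rng.normal(size=2) + 1j * rng.normal(size=2))
    assert np.linalg.norm(x - x_hat) <= np.linalg.norm(x - s) + 1e-12
```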

Theorem: Pythagorean Theorem

If $\mathbf{x} \perp \mathbf{y}$ in $\mathbb{C}^n$, then $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$. More generally, for mutually orthogonal vectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$:
$$\Bigl\|\sum_{i=1}^{m} \mathbf{x}_i\Bigr\|^2 = \sum_{i=1}^{m} \|\mathbf{x}_i\|^2.$$

Classical Gram--Schmidt Orthogonalization

Complexity: $O(nk^2)$
Input: Linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k \in \mathbb{C}^n$.
Output: Orthonormal vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ spanning the same subspace.
1. $\mathbf{u}_1 \leftarrow \mathbf{v}_1 / \|\mathbf{v}_1\|$
2. for $i = 2, \ldots, k$ do
3. $\quad \mathbf{w}_i \leftarrow \mathbf{v}_i - \sum_{m=1}^{i-1} \langle \mathbf{v}_i, \mathbf{u}_m \rangle\,\mathbf{u}_m$  (subtract projections onto all previous directions)
4. $\quad \mathbf{u}_i \leftarrow \mathbf{w}_i / \|\mathbf{w}_i\|$  (normalise)
5. end for
6. return $\mathbf{u}_1, \ldots, \mathbf{u}_k$

Numerical stability. Classical Gram--Schmidt (CGS) suffers from catastrophic cancellation in floating-point arithmetic: the computed vectors lose orthogonality as rounding errors accumulate. Modified Gram--Schmidt (MGS) is algebraically equivalent but numerically superior. In MGS, the projections in line 3 are subtracted one at a time, updating $\mathbf{w}_i$ after each subtraction rather than computing all projections from the original $\mathbf{v}_i$. For production code (e.g. the QR factorisation in MATLAB/NumPy), Householder reflections or Givens rotations are preferred.
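Here is a minimal MGS sketch (NumPy assumed; illustrative, not production code). Note that `np.vdot(u, w)` computes $\mathbf{u}^H \mathbf{w} = \langle \mathbf{w}, \mathbf{u} \rangle$, the projection coefficient of $\mathbf{w}$ onto $\mathbf{u}$:

```python
import numpy as np

def modified_gram_schmidt(V: np.ndarray) -> np.ndarray:
    """Orthonormalise the (linearly independent) columns of V, one direction at a time."""
    U = V.astype(complex).copy()
    k = U.shape[1]
    for i in range(k):
        U[:, i] /= np.linalg.norm(U[:, i])                 # normalise current direction
        for j in range(i + 1, k):                          # immediately remove its component
            U[:, j] -= np.vdot(U[:, i], U[:, j]) * U[:, i] # from all later vectors
    return U

# The three vectors of the worked example below, as columns.
V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]).T
U = modified_gram_schmidt(V)
assert np.allclose(U.conj().T @ U, np.eye(3))              # orthonormal to machine precision
```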


Gram--Schmidt Orthogonalization Step by Step

[Animation: the Gram--Schmidt process builds an orthonormal basis from two vectors: normalise the first, project the second, subtract the projection, normalise. The right angle at the end confirms orthogonality.]
The key insight: each new basis vector is obtained by removing the components along all previously computed basis vectors, then normalising.

Example: Gram--Schmidt on Three Vectors in $\mathbb{C}^3$

Apply the Gram--Schmidt procedure to the vectors
$$\mathbf{v}_1 = \begin{pmatrix}1\\1\\0\end{pmatrix},\qquad \mathbf{v}_2 = \begin{pmatrix}1\\0\\1\end{pmatrix},\qquad \mathbf{v}_3 = \begin{pmatrix}0\\1\\1\end{pmatrix}$$
to obtain an orthonormal basis for $\mathbb{C}^3$.
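Following the algorithm step by step (all inner products here are real, so conjugation plays no role):
$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix};$$
$$\mathbf{w}_2 = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{u}_1 \rangle\,\mathbf{u}_1 = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1/2\\-1/2\\1\end{pmatrix}, \qquad \mathbf{u}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\-1\\2\end{pmatrix};$$
$$\mathbf{w}_3 = \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{u}_1 \rangle\,\mathbf{u}_1 - \langle \mathbf{v}_3, \mathbf{u}_2 \rangle\,\mathbf{u}_2 = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} - \frac{1}{6}\begin{pmatrix}1\\-1\\2\end{pmatrix} = \frac{2}{3}\begin{pmatrix}-1\\1\\1\end{pmatrix}, \qquad \mathbf{u}_3 = \frac{1}{\sqrt{3}}\begin{pmatrix}-1\\1\\1\end{pmatrix}.$$
A direct check confirms $\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij}$, so $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthonormal basis.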

Historical Note: The Many Names of the Cauchy--Schwarz Inequality


Few results in mathematics have been independently discovered, and named, as often as this one.

Augustin-Louis Cauchy (1821) proved the discrete inequality for real sequences in his Cours d'analyse. Viktor Bunyakovsky (1859) extended it to integrals, leading some authors (particularly in the Russian tradition) to call it the Cauchy--Bunyakovsky inequality. Hermann Amandus Schwarz (1885) independently proved the integral version with full rigour.

The inequality is therefore variously known as:

  • Cauchy--Schwarz (most common in Western engineering literature),
  • Cauchy--Bunyakovsky--Schwarz (CBS) (common in mathematical analysis),
  • Schwarz inequality (in some functional-analysis texts).

The proof technique we presented, subtracting the projection and exploiting non-negativity of the squared norm, is essentially the one Schwarz used, and it generalises verbatim to arbitrary inner product spaces, including the $L^2$ spaces of square-integrable functions fundamental to signal processing.

Common Mistake: Which Argument Is Conjugate-Linear?

Mistake:

Writing $\langle \mathbf{x}, \alpha\mathbf{y} \rangle = \alpha \langle \mathbf{x}, \mathbf{y} \rangle$, i.e. treating the inner product as linear in both arguments. This leads to sign and phase errors in every derivation that involves complex scalars.

Correction:

Under our convention (linear in the first argument, following the notation table for this chapter):
$$\langle \mathbf{x}, \alpha\mathbf{y} \rangle = \overline{\alpha}\,\langle \mathbf{x}, \mathbf{y} \rangle \qquad \text{(conjugate-linear in the second argument)}.$$
Always check which convention a reference uses before borrowing a formula. In particular, the projection formula reads
$$\hat{\mathbf{x}} = \frac{\langle \mathbf{x}, \mathbf{u}\rangle}{\langle \mathbf{u}, \mathbf{u}\rangle}\,\mathbf{u}$$
(numerator: the vector being projected goes in the first slot).

A quick sanity check: $\langle \mathbf{x}, \mathbf{x} \rangle$ must be a real non-negative number. If your calculation yields a complex value, you have mixed up the convention.

Common Mistake: Classical Gram--Schmidt Loses Orthogonality

Mistake:

Implementing Gram--Schmidt exactly as written in the Classical Gram--Schmidt algorithm above in floating-point arithmetic and trusting that the output vectors are orthogonal to machine precision.

Correction:

Classical Gram--Schmidt (CGS) is numerically unstable: rounding errors cause the computed $\mathbf{u}_i$ to drift away from orthogonality, often severely when the input vectors are nearly dependent.

Modified Gram--Schmidt (MGS) reorders the computation: instead of projecting $\mathbf{v}_i$ onto all previous $\mathbf{u}_m$ simultaneously, MGS subtracts each projection sequentially, updating the working vector after each subtraction. This yields the same result in exact arithmetic but reduces error propagation in finite precision.

For high-reliability implementations (QR decomposition, MIMO detection), prefer Householder reflections or library routines (numpy.linalg.qr, MATLAB qr), which are backward stable.

Engineering Note

Orthogonalization in Production: QR Factorization

In production signal-processing code, never implement Gram--Schmidt manually. Use QR factorization from a numerical linear algebra library:

  • Python: Q, R = numpy.linalg.qr(A) (Householder-based, backward stable)
  • MATLAB: [Q, R] = qr(A) (same algorithm)
  • C/Fortran: LAPACK dgeqrf / zgeqrf

Householder QR costs $2mn^2 - 2n^3/3$ flops for an $m \times n$ matrix, the same order as Modified Gram--Schmidt but with guaranteed backward stability. In MIMO detection, the QR decomposition of the channel matrix $\mathbf{H} = \mathbf{Q}\mathbf{R}$ enables efficient successive interference cancellation (SIC) by back-substitution on the upper-triangular $\mathbf{R}$.
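A usage sketch (NumPy assumed; the channel matrix is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # MIMO channel

Q, R = np.linalg.qr(H)        # Householder-based, backward stable

assert np.allclose(Q.conj().T @ Q, np.eye(4))   # Q has orthonormal columns
assert np.allclose(Q @ R, H)                    # exact factorisation
assert np.allclose(R, np.triu(R))               # R is upper-triangular

# y = H x + n  ->  Q^H y = R x + Q^H n: an upper-triangular system
# amenable to SIC by back-substitution, with unchanged noise statistics.
```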

Practical Constraints

  • Classical Gram--Schmidt: loss of orthogonality proportional to $\kappa^2(\mathbf{A})\,\varepsilon_{\text{mach}}$

  • Modified Gram--Schmidt: loss proportional to $\kappa(\mathbf{A})\,\varepsilon_{\text{mach}}$

  • Householder QR: backward stable; orthogonality loss bounded by $O(n\,\varepsilon_{\text{mach}})$ regardless of $\kappa(\mathbf{A})$

Why This Matters: From Inner Products to Beamforming

When a base station with $n_t$ antennas transmits signal $s$ using beamforming vector $\mathbf{w} \in \mathbb{C}^{n_t}$, the received signal at a single-antenna user is
$$y \;=\; \mathbf{h}^H \mathbf{w}\,s \;+\; n,$$
where $\mathbf{h} \in \mathbb{C}^{n_t}$ is the channel vector and $n$ is additive noise.

The effective channel gain is $|\mathbf{h}^H \mathbf{w}|$, which is the modulus of an inner product. Maximising this gain subject to the unit power constraint $\|\mathbf{w}\| = 1$ is a direct application of Cauchy--Schwarz:
$$|\mathbf{h}^H \mathbf{w}| \;\leq\; \|\mathbf{h}\|\,\|\mathbf{w}\| \;=\; \|\mathbf{h}\|,$$
with equality when $\mathbf{w} = e^{j\theta}\mathbf{h}/\|\mathbf{h}\|$ for any phase $\theta$. The optimal beamformer is therefore the matched-filter (or maximum-ratio transmission) beamformer
$$\mathbf{w}^\star = \frac{\mathbf{h}}{\|\mathbf{h}\|}.$$
This result generalises to multi-user settings (zero-forcing beamforming uses projections) and to receive combining (maximum-ratio combining).
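A numerical sketch of this result (NumPy assumed; the channel is randomly drawn for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
nt = 8
h = rng.normal(size=nt) + 1j * rng.normal(size=nt)   # channel vector

w_mrt = h / np.linalg.norm(h)                        # matched-filter / MRT beamformer
gain_mrt = abs(np.vdot(h, w_mrt))                    # |h^H w| = ||h||
assert np.isclose(gain_mrt, np.linalg.norm(h))

# Any other unit-norm beamformer does no better (Cauchy--Schwarz).
for _ in range(100):
    w = rng.normal(size=nt) + 1j * rng.normal(size=nt)
    w /= np.linalg.norm(w)
    assert abs(np.vdot(h, w)) <= gain_mrt + 1e-12
```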

See full treatment in Precoding with CSIT

Why This Matters: Orthogonal Projection as MMSE Estimation

The orthogonal projection theorem (stated above) is the geometric backbone of linear minimum-mean-square-error (LMMSE) estimation.

Suppose we observe $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ and wish to estimate $\mathbf{x}$ by a linear function $\hat{\mathbf{x}} = \mathbf{W}\mathbf{y}$. Minimising the MSE $E[\|\mathbf{x} - \hat{\mathbf{x}}\|^2]$ is equivalent to requiring the estimation error $\mathbf{x} - \hat{\mathbf{x}}$ to be orthogonal (in the stochastic inner product $\langle \mathbf{a}, \mathbf{b}\rangle = E[\mathbf{b}^H \mathbf{a}]$) to the observation space spanned by $\mathbf{y}$, which is precisely the orthogonality principle. The solution is
$$\mathbf{W}_{\mathrm{MMSE}} = \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1},$$
where $\mathbf{R}_{xy} = E[\mathbf{x}\mathbf{y}^H]$ and $\mathbf{R}_{yy} = E[\mathbf{y}\mathbf{y}^H]$.

Every LMMSE channel estimator and equaliser in this book traces back to this projection.
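A Monte Carlo sketch of the orthogonality principle (NumPy assumed; Gaussian signal and noise with a randomly drawn $\mathbf{H}$, so that $\mathbf{R}_{xy} = \mathbf{H}^H$ and $\mathbf{R}_{yy} = \mathbf{H}\mathbf{H}^H + \sigma^2\mathbf{I}$):

```python
import numpy as np

rng = np.random.default_rng(5)
n_rx, n_tx, sigma2, N = 4, 2, 0.1, 100_000

H = rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))

# x ~ CN(0, I), n ~ CN(0, sigma2*I)  =>  W = R_xy R_yy^{-1} = H^H (H H^H + sigma2 I)^{-1}
W = H.conj().T @ np.linalg.inv(H @ H.conj().T + sigma2 * np.eye(n_rx))

x = (rng.normal(size=(n_tx, N)) + 1j * rng.normal(size=(n_tx, N))) / np.sqrt(2)
n = (rng.normal(size=(n_rx, N)) + 1j * rng.normal(size=(n_rx, N))) * np.sqrt(sigma2 / 2)
y = H @ x + n
e = x - W @ y                                 # estimation error

# Orthogonality principle: the sample estimate of E[e y^H] vanishes.
assert np.allclose(e @ y.conj().T / N, 0, atol=1e-2)
```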

See full treatment in Estimation Theory Fundamentals

Quick Check

Let $\mathbf{x} = \begin{pmatrix}1\\j\end{pmatrix}$ and $\mathbf{y} = \begin{pmatrix}2\\2j\end{pmatrix}$ in $\mathbb{C}^2$. Does the Cauchy--Schwarz inequality hold with equality?

Yes, because $\mathbf{x} = \tfrac{1}{2}\mathbf{y}$.

No, strict inequality holds.

It depends on the choice of inner product convention.

Cannot be determined without computing.

Quick Check

Under the convention used in this book (linear in the first argument), what is $\langle j\mathbf{x}, \mathbf{x} \rangle$ for a nonzero $\mathbf{x} \in \mathbb{C}^n$?

$j\|\mathbf{x}\|^2$

$-j\|\mathbf{x}\|^2$

$\|\mathbf{x}\|^2$

$-\|\mathbf{x}\|^2$

Inner product

A function $\langle\cdot,\cdot\rangle: V \times V \to \mathbb{C}$ satisfying conjugate symmetry, linearity in (at least) one argument, and positive definiteness. Equips a vector space with geometric notions of length, angle, and orthogonality.

Related: Norm, Inner Product on $\mathbb{C}^n$, Orthogonality

Norm

A function $\|\cdot\|: V \to [0, \infty)$ satisfying positive definiteness, absolute homogeneity, and the triangle inequality. The Euclidean norm $\|\mathbf{x}\| = \sqrt{\langle\mathbf{x}, \mathbf{x}\rangle}$ is the norm induced by the standard inner product.

Related: Inner product, Norm Induced by an Inner Product, $\ell_p$ Norms

Orthogonal projection

The unique closest point in a subspace $\mathcal{S}$ to a given vector $\mathbf{x}$. Characterised by the condition that the residual $\mathbf{x} - \hat{\mathbf{x}}$ is orthogonal to every vector in $\mathcal{S}$. Computed via $\hat{\mathbf{x}} = \mathbf{U}\mathbf{U}^H\mathbf{x}$ when $\mathbf{U}$ has orthonormal columns spanning $\mathcal{S}$.

Related: Orthogonal Projection Theorem, Inner product, Orthogonal Projection as MMSE Estimation

Key Takeaway

  • The inner product $\langle \mathbf{x}, \mathbf{y}\rangle = \mathbf{y}^H \mathbf{x}$ is the fundamental tool for measuring similarity, length, and angle in $\mathbb{C}^n$. Our convention: linear in the first argument, conjugate-linear in the second.

  • Cauchy--Schwarz bounds the inner product by the product of norms and is the workhorse inequality of linear algebra. Its equality condition ($\mathbf{x} \propto \mathbf{y}$) directly gives the matched-filter / maximum-ratio-transmission beamformer.

  • Orthogonal projection onto a subspace is the unique best approximation in the least-squares sense. The orthogonality condition (residual $\perp$ subspace) underlies MMSE estimation, interference nulling, and subspace signal processing.

  • Gram--Schmidt converts any linearly independent set into an orthonormal basis. Use the modified variant or Householder reflections in numerical implementations.

  • The $\ell_p$ norm family ($p = 1, 2, \infty$) appears throughout communications: $\ell_2$ for energy, $\ell_1$ for sparsity promotion, and $\ell_\infty$ for per-antenna power constraints.