Trace, Determinant, and Matrix Inequalities

Why Trace and Determinant Are Everywhere in Capacity Expressions

Every capacity formula in multi-antenna communications rests on two matrix functionals: the trace and the determinant.

The MIMO capacity formula. For an n_r \times n_t channel \mathbf{H} with input covariance \mathbf{R}_x and noise variance \sigma^2, the mutual information is I = \log\det\!\left(\mathbf{I}_{n_r} + \frac{1}{\sigma^2}\mathbf{H}\mathbf{R}_x\mathbf{H}^H\right). The capacity-achieving \mathbf{R}_x is found by maximizing this \log\det subject to a trace constraint: \operatorname{tr}(\mathbf{R}_x) \leq P. The trace constrains total transmit power; the log-determinant measures information rate. Understanding the interplay between the two, through trace identities, Hadamard's inequality, Fischer's inequality, and the concavity of \log\det, is essential for deriving capacity bounds, water-filling solutions, and beamforming strategies.

Where these tools appear:

  • Trace identities simplify covariance manipulations: \operatorname{tr}(\mathbf{H}^H\mathbf{H}\mathbf{R}_x) = \operatorname{tr}(\mathbf{R}_x\mathbf{H}^H\mathbf{H}).
  • Hadamard's inequality bounds the determinant by the product of diagonal entries, yielding capacity upper bounds when the channel decomposes into independent sub-channels.
  • Fischer's inequality provides tighter bounds when the channel has block structure (e.g., multi-user MIMO).
  • Log-det concavity guarantees that the MIMO capacity optimization is a convex program (maximizing a concave function over a convex set), ensuring that water-filling is globally optimal.

Definition:

Trace of a Matrix

The trace of a square matrix \mathbf{A} \in \mathbb{C}^{n \times n} is the sum of its diagonal entries: \operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}.

The trace satisfies the following fundamental properties:

  1. Linearity. For any \alpha, \beta \in \mathbb{C} and \mathbf{A}, \mathbf{B} \in \mathbb{C}^{n \times n}: \operatorname{tr}(\alpha \mathbf{A} + \beta \mathbf{B}) = \alpha \operatorname{tr}(\mathbf{A}) + \beta \operatorname{tr}(\mathbf{B}).

  2. Cyclic property. For \mathbf{A} \in \mathbb{C}^{m \times n} and \mathbf{B} \in \mathbb{C}^{n \times m}: \operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A}). More generally, the trace is invariant under cyclic permutations of a matrix product (see the theorem Cyclic Property of the Trace below).

  3. Relation to eigenvalues. If \lambda_1, \ldots, \lambda_n are the eigenvalues of \mathbf{A} (counted with algebraic multiplicity), then \operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{n} \lambda_i.

  4. Similarity invariance. For any invertible \mathbf{P}: \operatorname{tr}(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}) = \operatorname{tr}(\mathbf{A}). This follows immediately from the cyclic property.

  5. Transpose and conjugate transpose. \operatorname{tr}(\mathbf{A}^T) = \operatorname{tr}(\mathbf{A}) and \operatorname{tr}(\mathbf{A}^H) = \overline{\operatorname{tr}(\mathbf{A})}.

In telecommunications, \operatorname{tr}(\mathbf{R}_x) represents total transmit power when \mathbf{R}_x = \mathbb{E}[\mathbf{x}\mathbf{x}^H] is the input covariance matrix. The power constraint is therefore a trace constraint.
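These properties are easy to confirm numerically. A minimal NumPy sketch (the random complex test matrices are arbitrary illustration data, not part of any standard API):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Linearity: tr(2A + 3B) = 2 tr(A) + 3 tr(B)
assert np.isclose(np.trace(2 * A + 3 * B), 2 * np.trace(A) + 3 * np.trace(B))

# Cyclic property: tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Trace equals the sum of the eigenvalues
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)))

# Conjugate transpose: tr(A^H) = conj(tr(A))
assert np.isclose(np.trace(A.conj().T), np.conj(np.trace(A)))
print("all trace identities verified")
```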

Definition:

Determinant of a Matrix

The determinant of a square matrix \mathbf{A} \in \mathbb{C}^{n \times n}, denoted \det(\mathbf{A}) or |\mathbf{A}|, is the unique multilinear, alternating function of the columns of \mathbf{A} satisfying \det(\mathbf{I}) = 1. Equivalently, via the Leibniz formula: \det(\mathbf{A}) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}, where S_n is the symmetric group on \{1, \ldots, n\}.

Fundamental properties:

  1. Product of eigenvalues. If \lambda_1, \ldots, \lambda_n are the eigenvalues of \mathbf{A}, then \det(\mathbf{A}) = \prod_{i=1}^{n} \lambda_i.

  2. Multiplicativity. For \mathbf{A}, \mathbf{B} \in \mathbb{C}^{n \times n}: \det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B}).

  3. Transpose and conjugate transpose. \det(\mathbf{A}^T) = \det(\mathbf{A}) and \det(\mathbf{A}^H) = \overline{\det(\mathbf{A})}. In particular, \det(\mathbf{A}) \in \mathbb{R} when \mathbf{A} is Hermitian.

  4. Invertibility. \mathbf{A} is invertible if and only if \det(\mathbf{A}) \neq 0.

  5. Determinant of a block triangular matrix. If \mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{0} & \mathbf{D} \end{pmatrix}, then \det(\mathbf{M}) = \det(\mathbf{A})\det(\mathbf{D}).

  6. Positive definiteness. If \mathbf{A} \succ 0, then \det(\mathbf{A}) > 0 (since all eigenvalues are strictly positive).

In capacity formulas, \log\det(\cdot) converts a product of eigenvalue contributions into a sum: \log\det(\mathbf{A}) = \sum_i \log \lambda_i. This is why the MIMO capacity decomposes into a sum of per-eigenmode rates.
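A short NumPy check of these determinant properties, including the log-det eigenvalue sum (random test matrices chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det equals the product of the eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)).real)

# Multiplicativity: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# For positive definite P, log det(P) = sum of log eigenvalues
P = A @ A.T + 4 * np.eye(4)          # construct a positive definite matrix
logdet = np.log(np.linalg.det(P))
assert np.isclose(logdet, np.sum(np.log(np.linalg.eigvalsh(P))))
print("determinant properties verified")
```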

Theorem: Cyclic Property of the Trace

Let \mathbf{A}_1 \in \mathbb{C}^{n_1 \times n_2}, \mathbf{A}_2 \in \mathbb{C}^{n_2 \times n_3}, \ldots, \mathbf{A}_k \in \mathbb{C}^{n_k \times n_1} be matrices such that the product \mathbf{A}_1 \mathbf{A}_2 \cdots \mathbf{A}_k is square. Then the trace is invariant under cyclic permutations: \operatorname{tr}(\mathbf{A}_1 \mathbf{A}_2 \cdots \mathbf{A}_k) = \operatorname{tr}(\mathbf{A}_2 \mathbf{A}_3 \cdots \mathbf{A}_k \mathbf{A}_1) = \cdots = \operatorname{tr}(\mathbf{A}_k \mathbf{A}_1 \cdots \mathbf{A}_{k-1}). In particular, for three square matrices: \operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \operatorname{tr}(\mathbf{B}\mathbf{C}\mathbf{A}) = \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{B}).

Note: The trace is not invariant under arbitrary permutations. In general, \operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) \neq \operatorname{tr}(\mathbf{A}\mathbf{C}\mathbf{B}).

A cyclic permutation moves the first matrix to the end. The trace "sees" a product of matrices as a closed loop of index contractions, and rotating the starting point of a loop does not change the loop itself.
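Both the cyclic invariance and the failure of arbitrary permutations can be seen numerically; a small NumPy sketch with arbitrary random test matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

t_abc = np.trace(A @ B @ C)
# Cyclic rotations leave the trace unchanged
assert np.isclose(t_abc, np.trace(B @ C @ A))
assert np.isclose(t_abc, np.trace(C @ A @ B))
# Swapping two factors (a non-cyclic permutation) changes it in general
print(t_abc, np.trace(A @ C @ B))  # two different values for generic matrices
```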

Theorem: Hadamard's Inequality

Let \mathbf{A} \in \mathbb{C}^{n \times n} be positive definite (\mathbf{A} \succ 0) with diagonal entries a_{11}, a_{22}, \ldots, a_{nn}. Then \det(\mathbf{A}) \leq \prod_{i=1}^{n} a_{ii}, with equality if and only if \mathbf{A} is diagonal.

Equivalently, if \mathbf{a}_1, \ldots, \mathbf{a}_n \in \mathbb{C}^n are the columns of an arbitrary matrix \mathbf{A} \in \mathbb{C}^{n \times n}, then |\det(\mathbf{A})| \leq \prod_{i=1}^{n} \|\mathbf{a}_i\|, with equality if and only if the columns are mutually orthogonal.

The determinant measures the volume of the parallelepiped spanned by the columns. If columns are not orthogonal, the parallelepiped "collapses" partially, reducing its volume below the product of the column lengths. For a positive definite matrix, the diagonal entries are the squared norms of the "contributions" in each coordinate direction, and any off-diagonal correlation reduces the determinant.
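A quick numerical sanity check of the column-norm form (random test matrix; the orthogonal-columns case is produced via a QR factorization, an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))

# Column-norm form: |det(A)| <= product of the column norms
col_norms = np.linalg.norm(A, axis=0)
assert abs(np.linalg.det(A)) <= np.prod(col_norms) + 1e-12

# Equality for mutually orthogonal columns: an orthogonal Q from QR
# has |det(Q)| = 1, which equals the product of its unit column norms.
Q, _ = np.linalg.qr(A)
assert np.isclose(abs(np.linalg.det(Q)), np.prod(np.linalg.norm(Q, axis=0)))
print("Hadamard column-norm bound verified")
```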

Hadamard's Inequality: Volume of a Parallelepiped

[Interactive figure: the parallelepiped spanned by the columns morphs into a rectangular box at equality.] The determinant equals the volume of the parallelepiped spanned by the columns, and Hadamard's inequality says this volume is maximized when the columns are orthogonal. Geometric interpretation: correlated columns waste volume; orthogonal columns achieve the maximum \det(\mathbf{A}) = \prod_i \|\mathbf{a}_i\|.

Theorem: Concavity of \log\det on the Positive Definite Cone

The function f(\mathbf{X}) = \log\det(\mathbf{X}) is concave on the cone of positive definite matrices \mathbb{S}_{++}^n. That is, for any \mathbf{A}, \mathbf{B} \succ 0 and \theta \in [0, 1]: \log\det\bigl(\theta \mathbf{A} + (1 - \theta)\mathbf{B}\bigr) \geq \theta \log\det(\mathbf{A}) + (1 - \theta) \log\det(\mathbf{B}).

Since \log\det(\mathbf{X}) = \sum_i \log \lambda_i(\mathbf{X}), the log-det is a sum of concave (\log) functions of the eigenvalues. However, the eigenvalues of a convex combination are not simply convex combinations of eigenvalues, so the proof requires more care. The key idea is to reduce to a one-dimensional argument by examining g(t) = \log\det(\mathbf{A} + t\mathbf{V}) and showing it is concave in t.
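The concavity inequality can be spot-checked along a chord between two randomly generated positive definite matrices (a numerical sketch, not a proof):

```python
import numpy as np

def logdet(X):
    # Numerically stable log-determinant for a positive definite X
    sign, ld = np.linalg.slogdet(X)
    assert sign > 0
    return ld

rng = np.random.default_rng(3)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A = M1 @ M1.T + np.eye(4)   # positive definite
B = M2 @ M2.T + np.eye(4)   # positive definite

for theta in np.linspace(0.0, 1.0, 11):
    lhs = logdet(theta * A + (1 - theta) * B)
    rhs = theta * logdet(A) + (1 - theta) * logdet(B)
    assert lhs >= rhs - 1e-12   # concavity: the function dominates the chord
print("concavity verified on the sampled chord")
```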

Theorem: Fischer's Inequality

Let \mathbf{M} \in \mathbb{C}^{n \times n} be positive definite and partitioned as \mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^H & \mathbf{D} \end{pmatrix}, where \mathbf{A} \in \mathbb{C}^{k \times k} and \mathbf{D} \in \mathbb{C}^{(n-k) \times (n-k)} are the diagonal blocks. Then \det(\mathbf{M}) \leq \det(\mathbf{A}) \det(\mathbf{D}), with equality if and only if \mathbf{B} = \mathbf{0} (i.e., the two blocks are uncorrelated).

Fischer's inequality says that correlations between two groups of variables (captured by the off-diagonal block \mathbf{B}) can only reduce the determinant, never increase it. Hadamard's inequality is the special case where each block is 1 \times 1.
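A minimal numerical check of Fischer's inequality for a randomly generated positive definite matrix (the 5 \times 5 size and the k = 2 partition are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(8)
M0 = rng.standard_normal((5, 5))
M = M0 @ M0.T + np.eye(5)         # positive definite 5x5 test matrix

k = 2                             # partition into k x k and (n-k) x (n-k) blocks
A, D = M[:k, :k], M[k:, k:]

detM = np.linalg.det(M)
bound = np.linalg.det(A) * np.linalg.det(D)
assert detM <= bound + 1e-10      # Fischer: det(M) <= det(A) det(D)
print("Fischer bound holds:", detM, "<=", bound)
```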

Example: Verifying Trace Identities and Hadamard's Inequality for a 3 \times 3 Matrix

Let \mathbf{A} = \begin{pmatrix} 4 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{pmatrix}. (a) Verify the cyclic property of the trace for \mathbf{A}^2. (b) Compute \det(\mathbf{A}) and verify Hadamard's inequality. (c) Verify Fischer's inequality for the 2 \times 2 / 1 \times 1 partition.
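A NumPy sketch working through the example; for part (a), one natural check is the eigenvalue identity \operatorname{tr}(\mathbf{A}^2) = \sum_i \lambda_i^2, which follows from the trace-eigenvalue relation:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# (a) tr(A^2) equals the sum of the squared eigenvalues
eigs = np.linalg.eigvalsh(A)                   # A is symmetric
print(np.trace(A @ A), np.sum(eigs**2))        # both equal 33

# (b) Hadamard: det(A) <= product of the diagonal entries
d = np.linalg.det(A)
print(d, np.prod(np.diag(A)))                  # 18 <= 24

# (c) Fischer for the 2x2 / 1x1 partition: det(A) <= det(A11) det(D)
A11, D = A[:2, :2], A[2:, 2:]
print(np.linalg.det(A11) * np.linalg.det(D))   # bound 22 >= 18
```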

\log\det of a Parameterized Positive Definite Matrix

Explore how \log\det(\mathbf{I} + \alpha \mathbf{A}) varies with \alpha, demonstrating the concavity of the log-det function. The plot shows \log\det(\mathbf{I} + \alpha \mathbf{A}) as a function of \alpha for several matrix types. Observe that the curve always bends downward (concavity), and that ill-conditioned matrices exhibit more pronounced concavity due to the spread of their eigenvalues.
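For readers without the interactive plot, a short NumPy sketch reproduces the experiment on a grid of \alpha values (the test matrix and grid are arbitrary choices), checking discrete concavity via nonpositive second differences:

```python
import numpy as np

def logdet(X):
    sign, ld = np.linalg.slogdet(X)
    return ld

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)     # positive definite test matrix

alphas = np.linspace(0.0, 4.0, 9)
vals = np.array([logdet(np.eye(4) + a * A) for a in alphas])

# A concave sampled curve has nonpositive second differences
second_diffs = np.diff(vals, n=2)
assert np.all(second_diffs <= 1e-10)
print("log det(I + alpha A) is concave in alpha on the sampled grid")
```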


Why This Matters: Log-Det in the MIMO Capacity Formula

The central object in MIMO information theory is the log-det capacity formula: C = \max_{\substack{\mathbf{R}_x \succeq 0 \\ \operatorname{tr}(\mathbf{R}_x) \leq P}} \log_2 \det\!\left( \mathbf{I} + \frac{1}{\sigma^2} \mathbf{H} \mathbf{R}_x \mathbf{H}^H \right).
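A minimal sketch of evaluating this objective for a fixed input covariance (the channel is drawn at random for illustration, and mimo_rate_bits is a name invented here, not a library function):

```python
import numpy as np

def mimo_rate_bits(H, Rx, sigma2):
    """Mutual information log2 det(I + H Rx H^H / sigma^2) in bits/channel use."""
    nr = H.shape[0]
    M = np.eye(nr) + (H @ Rx @ H.conj().T) / sigma2
    sign, ld = np.linalg.slogdet(M)   # natural-log determinant, stable
    return ld / np.log(2.0)

rng = np.random.default_rng(5)
nr, nt, P, sigma2 = 4, 4, 4.0, 1.0
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# Uniform power allocation: Rx = (P / nt) I satisfies tr(Rx) = P
Rx = (P / nt) * np.eye(nt)
print(f"rate with uniform allocation: {mimo_rate_bits(H, Rx, sigma2):.2f} bits/use")
```

Maximizing over all feasible \mathbf{R}_x (water-filling over the singular values of \mathbf{H}) can only improve on this uniform-allocation value.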

The results of this section connect directly to this formula:

  1. Log-det concavity (Concavity of \log\det on the Positive Definite Cone) guarantees that the capacity optimization is a concave maximization over the convex set \{\mathbf{R}_x \succeq 0 : \operatorname{tr}(\mathbf{R}_x) \leq P\}. Any local maximum is therefore the global maximum, and the water-filling solution is provably optimal.

  2. Hadamard's inequality yields the bound C \leq \sum_{i=1}^{n} \log_2(1 + \text{SNR}_{i}), showing that capacity is maximized when the effective sub-channels are uncorrelated, which is achieved by diagonalizing through the SVD of \mathbf{H}.

  3. Fischer's inequality provides capacity bounds for block-structured channels, such as multi-user MIMO systems where the channel matrix has a natural block partition corresponding to different users.

  4. Trace identities (the Cyclic Property of the Trace) are used repeatedly in manipulating covariance expressions: \operatorname{tr}(\mathbf{H}^H\mathbf{H}\mathbf{R}_x) = \operatorname{tr}(\mathbf{R}_x\mathbf{H}^H\mathbf{H}) = \operatorname{tr}(\mathbf{H}\mathbf{R}_x\mathbf{H}^H).

See full treatment in Physical MIMO Channel Modeling

Key Takeaway

The trace constrains power (\operatorname{tr}(\mathbf{R}_x) \leq P) while the log-determinant measures information rate (C = \log\det(\mathbf{I} + \text{SNR} \cdot \mathbf{H}\mathbf{H}^H)). The inequalities of this section (Hadamard, Fischer, and log-det concavity) are the analytical backbone of every capacity bound in MIMO communications. Hadamard bounds capacity from above by the sum of independent sub-channel rates; Fischer refines this for block-structured channels; and log-det concavity ensures that capacity optimization is a tractable convex program with a unique global optimum (water-filling).

Common Mistake: Trace of a Product Is Not the Product of Traces

Mistake:

A common error is to assume that \operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{A})\operatorname{tr}(\mathbf{B}). This is analogous to confusing "the sum of products" with "the product of sums." Students who have internalized the linearity of the trace sometimes extend it incorrectly to multiplicativity.

Correction:

The trace is linear but NOT multiplicative. In general, \operatorname{tr}(\mathbf{A}\mathbf{B}) \neq \operatorname{tr}(\mathbf{A})\operatorname{tr}(\mathbf{B}).

Counterexample. Let \mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \mathbf{B} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. Then \mathbf{A}\mathbf{B} = \mathbf{0}, so \operatorname{tr}(\mathbf{A}\mathbf{B}) = 0, but \operatorname{tr}(\mathbf{A})\operatorname{tr}(\mathbf{B}) = 1 \cdot 1 = 1 \neq 0.

What is true is the cyclic property: \operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A}). See the theorem Cyclic Property of the Trace.

Quick Check

Let \mathbf{A} \succ 0 be a 3 \times 3 positive definite matrix with a_{11} = 2, a_{22} = 3, a_{33} = 5. By Hadamard's inequality, what is the tightest upper bound on \det(\mathbf{A}) that the inequality provides?

10

30

15

\infty (Hadamard gives no finite bound)

Quick Check

Which of the following statements about f(\mathbf{X}) = \log\det(\mathbf{X}) on the positive definite cone is TRUE?

f is convex.

f is concave.

f is neither convex nor concave.

f is linear.

Trace

The sum of the diagonal entries of a square matrix: \operatorname{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}. Equivalently, the sum of the eigenvalues. The trace is linear, similarity-invariant, and satisfies the cyclic property \operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}\mathbf{A}). In wireless communications, the trace of the input covariance matrix equals total transmit power.

Related: Determinant of a Matrix, Eigenvalue and Eigenvector, power constraint

Hadamard Inequality

For a positive definite matrix \mathbf{A}, the bound \det(\mathbf{A}) \leq \prod_{i=1}^n a_{ii}, with equality if and only if \mathbf{A} is diagonal. Geometrically, the volume of the parallelepiped spanned by the columns of \mathbf{A} is maximized when the columns are orthogonal. In MIMO systems, this bounds the capacity by the sum of independent sub-channel capacities.

Related: Determinant of a Matrix, Fischer's Inequality, Positive Definite Matrix

Log-Det Concavity

The property that f(\mathbf{X}) = \log\det(\mathbf{X}) is a concave function on the cone of positive definite matrices. This ensures that MIMO capacity optimization (maximizing \log\det(\mathbf{I} + \text{SNR} \cdot \mathbf{H}\mathbf{R}_x\mathbf{H}^H) subject to a convex power constraint) is a convex program, guaranteeing global optimality of the water-filling solution.

Related: concave function, convex optimization, MIMO Capacity (Deterministic Channel)

⚠️Engineering Note

Computing log-det in Practice: Cholesky, Not Eigendecomposition

Never compute \log\det(\mathbf{A}) by finding all eigenvalues first: that costs O(n^3) for the eigendecomposition plus the risk of numerical overflow/underflow in the product. Instead, use the Cholesky factorization \mathbf{A} = \mathbf{L}\mathbf{L}^H (which requires \mathbf{A} \succ 0): \log\det(\mathbf{A}) = 2\sum_{i=1}^n \log L_{ii}. Cholesky costs n^3/3 flops, a fraction of the cost of a full eigendecomposition, and the sum of logarithms avoids overflow. In MIMO capacity computation this is the standard approach; for general square matrices, numpy.linalg.slogdet achieves the same overflow-free behavior via an LU factorization.
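A sketch of the Cholesky route, checked against numpy.linalg.slogdet (the test matrix is arbitrary, and logdet_chol is a name invented here for illustration):

```python
import numpy as np

def logdet_chol(A):
    """log det(A) for a Hermitian positive definite A via Cholesky."""
    L = np.linalg.cholesky(A)                 # A = L L^H, raises if not PD
    return 2.0 * np.sum(np.log(np.diag(L).real))

rng = np.random.default_rng(6)
M = rng.standard_normal((64, 64))
A = M @ M.T + 64 * np.eye(64)                 # well-conditioned PD test matrix

sign, ld = np.linalg.slogdet(A)               # general-purpose reference
assert sign == 1.0 and np.isclose(logdet_chol(A), ld)
print("Cholesky log-det matches slogdet")
```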

Practical Constraints

  • Cholesky: n^3/3 flops vs eigendecomposition: \sim 4n^3 flops

  • For 64 \times 64 MIMO: Cholesky takes ~90K flops, eigendecomposition ~1M flops

  • numpy.linalg.slogdet returns (sign, log|det|), which is numerically stable for any matrix

Historical Note: Hadamard's Inequality and the Maximum Determinant Problem


Jacques Hadamard proved his determinant inequality in 1893, establishing that the determinant of a positive definite matrix is bounded by the product of its diagonal entries. The geometric interpretation (the volume of a parallelepiped is maximized when its edges are orthogonal) was already implicit in the work of Gram (1883), but Hadamard gave the sharp algebraic bound.

The related Hadamard maximum determinant problem, finding the n \times n matrix with entries in \{-1, +1\} that maximizes |\det(\mathbf{A})|, remains open for most n. Hadamard matrices (those achieving the bound with entries \pm 1) exist only when n = 1, 2, or n is a multiple of 4, and their existence for all such n is a famous unsolved conjecture.

Why This Matters: Trace and Determinant in Information-Theoretic Capacity Bounds

The trace and log-determinant are the two fundamental matrix functionals in information theory. The trace constrains power (\operatorname{tr}(\mathbf{R}_x) \leq P), while the log-determinant measures rate (C = \log\det(\mathbf{I} + \text{SNR} \cdot \mathbf{H}\mathbf{R}_x\mathbf{H}^H)). The inequalities of this section (Hadamard, Fischer, and log-det concavity) are the analytical backbone of capacity analysis in Book ITA (Chapters 13-16) and Book MIMO (Chapters 1-5).