"Clar" is a root word of Latin origin, meaning "clear."
CLAR is inspired by the excellent Linear Algebra Review and Reference by Kolter and Do, but is intended as a more compact alternative.
- Fundamental Equation: \(Ax = b\)
- \(A \in \mathbb{R}^{m \times n}\) is a matrix with \(m\) rows and \(n\) columns
- \(m\) is the number of equations (rows), while \(n\) is the number of unknowns (columns)
- \(x \in \mathbb{R}^{n}\) is a (column) vector with \(n\) entries/unknowns (or an \(n \times 1\) matrix)
- \(b \in \mathbb{R}^{m}\) is a (column) vector with \(m\) entries/outputs (or an \(m \times 1\) matrix)
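A minimal NumPy sketch of the fundamental equation (the example system is illustrative):

```python
import numpy as np

# Hypothetical 2x2 system: x1 + 2*x2 = 5, 3*x1 + 4*x2 = 11
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # m = n = 2
b = np.array([5.0, 11.0])

x = np.linalg.solve(A, b)    # solves Ax = b for square, non-singular A
assert np.allclose(A @ x, b) # the solution satisfies the fundamental equation
```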
- Span: \(\text{span}(\{x_1, \dots, x_n\}) = \Big\{v : v = \sum_{i = 1}^n \alpha_ix_i, \alpha_i \in \mathbb{R}\Big\}\)
- Interpretation: set of all vectors that can be expressed as a linear combination of vectors \(x_1, \dots, x_n\)
- Matrix Rank: \(\text{rank}(A)\) is number of linearly independent rows or columns
- Note that Row Rank = Column Rank, and hence we refer to both simply as rank
- \(A\) is full rank iff \(\text{rank}(A) = \min(m, n)\); for a square matrix, this means \(\text{rank}(A) = m = n\)
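A quick rank check in NumPy (example matrices are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second row is 2x the first -> rank 1, not full rank
assert np.linalg.matrix_rank(A) == 1

B = np.eye(3)               # identity has 3 independent columns -> full rank
assert np.linalg.matrix_rank(B) == 3
```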
- Matrix Spaces:
- Range (or Columnspace): \(\mathcal{R}(A) = \{v \in \mathbb{R}^m : v = Ax, x \in \mathbb{R}^n\} = \text{span}(\text{Col}(A))\)
- Nullspace: \(\mathcal{N}(A) = \{x \in \mathbb{R}^n : Ax = 0\}\)
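One common way to compute a nullspace basis numerically is via the SVD; a sketch (the tolerance is an illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1, so the nullspace is 1-dimensional

# Right singular vectors whose singular values are (numerically) zero span N(A)
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]       # rows of Vt beyond rank(A)

for v in null_basis:
    assert np.allclose(A @ v, 0) # every basis vector satisfies Av = 0
```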
- Transpose: matrix \(A^T\) formed by swapping the rows and columns of \(A\)
- \((A^T)^T = A\)
- \((AB)^T = B^TA^T\)
- \((A + B)^T = A^T + B^T\)
- \(\forall a \in \mathbb{R}, a^T = a\)
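The transpose identities can be verified numerically on random matrices (shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

assert np.allclose((A.T).T, A)            # (A^T)^T = A
assert np.allclose((A @ B).T, B.T @ A.T)  # (AB)^T = B^T A^T
```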
- Vector Dot (or Inner) Product: If \(x, y \in \mathbb{R}^n\), then \(x^Ty = y^Tx = \sum_{i=1}^n x_iy_i\)
- \(x\) and \(y\) are orthogonal iff \(x^Ty = y^Tx = 0\) (geometrically, these two vectors will be perpendicular to each other)
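A small dot-product sketch (example vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
assert np.isclose(x @ y, sum(xi * yi for xi, yi in zip(x, y)))  # 4 + 10 + 18 = 32
assert np.isclose(x @ y, y @ x)                                 # symmetric

# Orthogonal pair: dot product is zero
u = np.array([1.0, 0.0])
v = np.array([0.0, 5.0])
assert np.isclose(u @ v, 0.0)
```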
- Matrix Multiplication (MatMul): if \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{n \times p}\), then \(C = AB \in \mathbb{R}^{m \times p}\), with \(C_{ij} = \sum_{k=1}^n A_{ik}B_{kj}\)
- MatMul is Associative: \((AB)C = A(BC)\)
- MatMul is Distributive: \(A(B + C) = AB + AC\)
- MatMul is NOT Commutative: \(AB \neq BA\)
- In fact, \(BA\) is not even defined unless \(p = m\)
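The three properties above can be checked directly (example matrices are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C = np.array([[2.0, 0.0],
              [0.0, 2.0]])

assert np.allclose((A @ B) @ C, A @ (B @ C))     # associative
assert np.allclose(A @ (B + C), A @ B + A @ C)   # distributive
assert not np.allclose(A @ B, B @ A)             # generally NOT commutative
```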
- Determinant: \(\det : \mathbb{R}^{n \times n} \rightarrow \mathbb{R}\)
- Interpretation: the signed \(n\)-volume of the parallelotope formed by the matrix's column vectors
- For \(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathbb{R}^{2 \times 2}\), we have \(\det A = ad - bc\)
- If \(\det A = 0\), matrix \(A\) is singular (non-invertible), otherwise it is non-singular (invertible)
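A determinant sketch illustrating the \(2 \times 2\) formula and the singular case:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.isclose(np.linalg.det(A), 1 * 4 - 2 * 3)  # ad - bc = -2, non-singular

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])                          # linearly dependent rows
assert np.isclose(np.linalg.det(S), 0.0)            # det 0 -> singular
```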
- Trace: \(\mathrm{tr} A = \sum_{i=1}^n A_{ii}\), which is the sum of diagonal elements
- \(\mathrm{tr}\ A = \mathrm{tr}\ A^T\)
- For \(A, B \in \mathbb{R}^{n \times n}, \mathrm{tr}\ (A + B) = \mathrm{tr}\ A + \mathrm{tr}\ B\)
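The trace properties, verified on small example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

assert np.isclose(np.trace(A), 1 + 4)                          # sum of diagonal
assert np.isclose(np.trace(A), np.trace(A.T))                  # tr A = tr A^T
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # linearity
```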
- Vector \(L_p\) Norm: \(\lvert\lvert x\rvert\rvert_p = \big(\sum_{i=1}^n \lvert x_i \rvert^p\big)^{1/p}\)
- \(\text{L}_1\) Norm: \(\lvert\lvert x\rvert\rvert_1 = \sum_{i=1}^n \lvert x_i \rvert\)
- \(\text{L}_2\) Norm: \(\lvert\lvert x\rvert\rvert_2 = \sqrt{\sum_{i=1}^n x_i^2}\)
- \(\text{L}_\infty\) Norm: \(\lvert\lvert x\rvert\rvert _\infty = \max_i \lvert x_i\rvert\)
- Matrix Frobenius Norm: \(\lvert\lvert A\rvert\rvert_{\text{F}} = \sqrt{\sum_{i=1}^m\sum_{j=1}^n A_{ij}^2} = \sqrt{\text{tr}(A^TA)}\)
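The norms above map to `np.linalg.norm` (the example vector is illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0])
assert np.isclose(np.linalg.norm(x, 1), 7.0)       # |3| + |-4|
assert np.isclose(np.linalg.norm(x, 2), 5.0)       # sqrt(9 + 16)
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)  # max |x_i|

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Frobenius norm agrees with the trace formulation
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A)))
```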
- Identity Matrix: \(I \in \mathbb{R}^{n \times n}\) s.t. \(AI = A = IA\) with \(I_{ij} = 1\) iff \(i = j\) and \(0\) otherwise
- Inverse Matrix: \(A^{-1}A = I = AA^{-1}\)
- \(A\) is invertible (non-singular) and \(A^{-1}\) exists iff \(\det A \neq 0\); otherwise it is non-invertible (singular)
- \((A^{-1})^{-1} = A\)
- \((AB)^{-1} = B^{-1}A^{-1}\)
- \((A^{-1})^T = (A^T)^{-1} = A^{-T}\)
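The inverse identities, checked on an invertible example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # det = -2 != 0, so invertible
A_inv = np.linalg.inv(A)
I = np.eye(2)

assert np.allclose(A_inv @ A, I)                 # A^-1 A = I
assert np.allclose(A @ A_inv, I)                 # A A^-1 = I
assert np.allclose(np.linalg.inv(A_inv), A)      # (A^-1)^-1 = A
assert np.allclose(np.linalg.inv(A.T), A_inv.T)  # (A^T)^-1 = (A^-1)^T
```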
- Symmetric Matrix: square matrix \(A \in \mathbb{R}^{n \times n}\) is symmetric iff \(A = A^T\)
- Matrix \(A + A^T\) is symmetric
- Anti-symmetric Matrix: square matrix \(A \in \mathbb{R}^{n \times n}\) is anti-symmetric iff \(A = -A^T\)
- Matrix \(A - A^T\) is anti-symmetric
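The two facts above combine into the symmetric/anti-symmetric split \(A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T)\); a sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

S = (A + A.T) / 2   # symmetric part
K = (A - A.T) / 2   # anti-symmetric part

assert np.allclose(S, S.T)    # S is symmetric
assert np.allclose(K, -K.T)   # K is anti-symmetric
assert np.allclose(S + K, A)  # any square matrix splits this way
```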
- Matrix Quadratic Form: for a symmetric matrix \(A \in \mathbb{R}^{n \times n}\), we have \(Q(x) = x^TAx\)
- Positive Definite (PD) Matrix: \(\forall x \in \mathbb{R}^n \setminus \{0\}, Q(x) > 0 \implies A \succ 0 \text{ (or just } A > 0)\)
- Positive Semidefinite (PSD) Matrix: \(\forall x \in \mathbb{R}^n, Q(x) \geq 0 \implies A \succeq 0 \text{ (or just } A \geq 0)\)
- Negative Definite (ND) Matrix: \(\forall x \in \mathbb{R}^n \setminus \{0\}, Q(x) < 0 \implies A \prec 0 \text{ (or just } A < 0)\)
- Negative Semidefinite (NSD) Matrix: \(\forall x \in \mathbb{R}^n, Q(x) \leq 0 \implies A \preceq 0 \text{ (or just } A \leq 0)\)
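For symmetric matrices, definiteness is equivalent to the signs of the eigenvalues; a sketch of a PD check (the helper name and tolerance are illustrative choices):

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Check PD for a symmetric matrix via the signs of its real eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd = np.array([[2.0, 0.0],
                 [0.0, 3.0]])   # Q(x) = 2*x1^2 + 3*x2^2 > 0 for x != 0
A_psd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])  # eigenvalues 2 and 0: PSD but not PD

assert is_positive_definite(A_pd)
assert not is_positive_definite(A_psd)
```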
- Orthogonal Matrix: a square matrix \(U \in \mathbb{R}^{n \times n}\) is orthogonal if all of its columns are orthogonal to each other and are normalized
- The columns of \(U\) are referred to as orthonormal
- It follows that \(U^TU = I = UU^T\)
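A 2D rotation matrix is a standard example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation by 45 degrees

assert np.allclose(U.T @ U, np.eye(2))              # U^T U = I
assert np.allclose(U @ U.T, np.eye(2))              # U U^T = I
assert np.allclose(np.linalg.norm(U, axis=0), 1.0)  # columns are unit-length
```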
- Eigenvalue and eigenvector: given \(A \in \mathbb{R}^{n \times n}\), a scalar \(\lambda \in \mathbb{C}\) is an eigenvalue with eigenvector \(x \in \mathbb{C}^n \setminus{\{0\}}\) if \(Ax = \lambda x\)
- Reformulation: \((\lambda, x)\) is an eigenvalue-eigenvector pair of \(A\) iff \((A - \lambda I)x = 0\) for some \(x \neq 0\)
- Interpretation: an eigenvector is a direction a matrix scales without changing direction, and the eigenvalue is how much it scales
- Trace equals the sum of eigenvalues: \(\mathrm{tr}\ A = \sum_{i=1}^n \lambda_i\)
- Determinant equals the product of eigenvalues: \(\det A = \prod_{i=1}^n \lambda_i\)
- For diagonalizable \(A\), rank equals the number of non-zero eigenvalues (counted with multiplicity)
- If \(A\) is non-singular, then \(A^{-1}x_i = (1/\lambda_i)x_i\) (i.e., \(1/\lambda_i\) is an eigenvalue of \(A^{-1}\) with the same eigenvector \(x_i\))
- Eigenvalues of a diagonal matrix \(D = \text{diag}(d_1,\dots,d_n)\) are the diagonal entries \(d_1,\dots,d_n\) themselves
- In PCA, eigenvalues represent the variance explained by each principal component, while eigenvectors define their directions
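The trace and determinant identities above, checked on a small symmetric matrix with eigenvalues 1 and 3:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, V = np.linalg.eig(A)  # eigenvalues and eigenvectors (as columns of V)

assert np.isclose(np.trace(A), lam.sum())        # tr A = sum of eigenvalues
assert np.isclose(np.linalg.det(A), lam.prod())  # det A = product of eigenvalues

# Each pair satisfies the defining equation Ax = lambda * x
for i in range(2):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])
```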
- Diagonalizable Matrix: a square matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable iff its eigenvector matrix is full rank (i.e., \(A\) has \(n\) linearly independent eigenvectors)
- Eigenvalue Decomposition (EVD): \(A = PDP^{-1}\) where \(P\) is the eigenvector matrix and \(D\) is a diagonal matrix of eigenvalues
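An EVD sketch: the example matrix has distinct real eigenvalues (5 and 2), so it is diagonalizable and reconstructs exactly:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

lam, P = np.linalg.eig(A)  # columns of P are eigenvectors
D = np.diag(lam)           # diagonal matrix of eigenvalues

# P is full rank here, so A = P D P^-1
assert np.allclose(P @ D @ np.linalg.inv(P), A)
```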
- LU Decomposition: \(A = L \cdot U\)
- Efficiently solves systems of equations \(Ax = b_1, \dots, Ax = b_n\) by factoring \(A = LU\) once, then using substitution for each \(b_i\)
- To solve \(Ax = b\) given \(A = LU\): first solve \(Ly = b\) for \(y\) (forward substitution), then solve \(Ux = y\) for \(x\) (back substitution)
- \(L\) and \(U\) are lower and upper triangular matrices, respectively, with \(L\) having 1s on the diagonal
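A minimal sketch of the Doolittle variant (unit diagonal on \(L\), no pivoting, so it assumes the leading pivots are non-zero; the helper name is illustrative):

```python
import numpy as np

def lu_doolittle(A):
    """Doolittle LU without pivoting: A = L @ U, with 1s on the diagonal of L."""
    n = A.shape[0]
    L, U = np.eye(n), np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i, n):       # fill row i of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        for j in range(i + 1, n):   # fill column i of L below the diagonal
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
L, U = lu_doolittle(A)
assert np.allclose(L @ U, A)

# Reuse the factorization to solve Ax = b by two triangular solves
b = np.array([10.0, 12.0])
y = np.linalg.solve(L, b)  # Ly = b (forward substitution)
x = np.linalg.solve(U, y)  # Ux = y (back substitution)
assert np.allclose(A @ x, b)
```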
- Gram Matrix: \(G = A^TA\), whose entry \(G_{ij}\) is the dot product of columns \(i\) and \(j\) of \(A\); \(G\) is always symmetric and positive semidefinite
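A quick check of the Gram matrix's symmetry and positive semidefiniteness on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
G = A.T @ A  # Gram matrix: G[i, j] is the dot product of columns i and j

assert np.allclose(G, G.T)                      # always symmetric
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)  # eigenvalues >= 0 -> PSD
```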