Background Knowledge
Glossary
- Inner product: $\langle a,b \rangle = \vec{a}^T \vec{b}$, also known as the “dot product”.
- Outer product: $a \otimes b = \vec{a} \, \vec{b}^T$, which results in a matrix.
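A quick numpy illustration of the two products (a minimal sketch; the variable names are just for demonstration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a @ b)           # inner product, a scalar: 1*4 + 2*5 + 3*6 = 32
print(np.outer(a, b))  # outer product, a 3x3 matrix with entries a[i] * b[j]
```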
Vector Projection
Projection of $a$ onto $b$: $\mathrm{proj}_b\, a = \frac{\langle a,b \rangle}{\langle b,b \rangle} b$
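A minimal numpy sketch of this formula (the `project` helper is illustrative, not a standard API):

```python
import numpy as np

def project(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """proj_b(a) = (<a, b> / <b, b>) * b: the component of a along b."""
    return (a @ b) / (b @ b) * b

a = np.array([2.0, 1.0])
b = np.array([1.0, 0.0])
print(project(a, b))  # [2. 0.]: a's component along the x-axis
```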
Gram-Schmidt Orthogonalization
Goal: given $m$ linearly independent $N$-dimensional vectors $\{v_1, \dots, v_m\}$, transform them into an orthogonal set of vectors spanning the same subspace.
Method: each orthogonal vector $u_k$ is formed by subtracting from $v_k$ its projections onto the previously computed orthogonal vectors $u_1, \dots, u_{k-1}$ (a code sketch follows the list below).
- $u_1 = v_1$
- $u_2 = v_2 - \frac{\langle v_2,u_1 \rangle}{\langle u_1,u_1 \rangle}u_1$
- $u_3 = v_3 - \frac{\langle v_3,u_1 \rangle}{\langle u_1,u_1 \rangle}u_1 - \frac{\langle v_3,u_2 \rangle}{\langle u_2,u_2 \rangle}u_2$
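A sketch of the method in numpy (the `gram_schmidt` name and column-vector convention are my own choices):

```python
import numpy as np

def gram_schmidt(V: np.ndarray) -> np.ndarray:
    """Classical Gram-Schmidt: columns of V (assumed linearly independent)
    become orthogonal (not normalized) columns of U."""
    U = np.zeros_like(V, dtype=float)
    for k in range(V.shape[1]):
        u = V[:, k].astype(float)  # start from v_k (copy)
        for j in range(k):
            # subtract v_k's projection onto each previous u_j
            u -= (V[:, k] @ U[:, j]) / (U[:, j] @ U[:, j]) * U[:, j]
        U[:, k] = u
    return U

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U = gram_schmidt(V)
print(np.round(U.T @ U, 10))  # diagonal matrix: off-diagonal dot products ~ 0
```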
Caveats
- It’s not numerically stable: rounding error accumulates at each step, so the computed vectors gradually lose orthogonality. In practice, people use Householder transformations or Givens rotations instead (see the comparison sketch below).
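The instability is easy to reproduce with a Läuchli-style matrix, whose columns are nearly dependent. A sketch reusing the `gram_schmidt` function above; `np.linalg.qr` uses Householder reflections via LAPACK:

```python
import numpy as np

eps = 1e-8  # small enough that 1 + eps**2 rounds to 1 in float64
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

U = gram_schmidt(A)                 # classical Gram-Schmidt (sketch above)
E = U / np.linalg.norm(U, axis=0)   # normalize the columns
Q, _ = np.linalg.qr(A)              # Householder-based QR

print(np.linalg.norm(E.T @ E - np.eye(3)))  # ~0.7: orthogonality badly lost
print(np.linalg.norm(Q.T @ Q - np.eye(3)))  # ~1e-16: near machine precision
```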
QR Decomposition
Goal: given a matrix $A$, decompose it into $A=QR$ such that $Q$ has orthonormal columns and $R$ is an upper triangular matrix.
How:
- Get an orthonormal basis of $A$’s column space using Gram-Schmidt (or Householder transformations). These basis vectors form the columns of $Q$.
- Get $R = Q^TA$: it is upper triangular because each column $a_k$ only has components along $e_1, \dots, e_k$ (see the example below).
Example:
\[\begin{gather*} A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \end{gather*}\]The Gram-Schmidt process:
\[\begin{equation*} \begin{aligned} \mathbf{u}_1 &= \mathbf{a}_1 = (1, 1, 0), \\ \mathbf{e}_1 &= \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} = \frac{1}{\sqrt{2}} (1, 1, 0) = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \right), \\ \mathbf{u}_2 &= \mathbf{a}_2 - (\mathbf{a}_2 \cdot \mathbf{e}_1)\mathbf{e}_1 = (1, 0, 1) - \frac{1}{2} (1, 1, 0) = \left(\frac{1}{2}, -\frac{1}{2}, 1\right), \\ \mathbf{e}_2 &= \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} = \frac{1}{\sqrt{3/2}} \left( \frac{1}{2}, -\frac{1}{2}, 1 \right) = \left( \frac{1}{\sqrt{6}}, -\frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}} \right), \\ \mathbf{u}_3 &= \mathbf{a}_3 - (\mathbf{a}_3 \cdot \mathbf{e}_1)\mathbf{e}_1 - (\mathbf{a}_3 \cdot \mathbf{e}_2)\mathbf{e}_2 \\ &= (0, 1, 1) - \frac{1}{2} (1, 1, 0) - \frac{1}{6} (1, -1, 2) = \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right), \\ \mathbf{e}_3 &= \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|} = \frac{\sqrt{3}}{2} \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right) = \left( -\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right). \end{aligned} \end{equation*}\]One important note about Gram-Schmidt: each new basis vector is perpendicular to the input vectors that produced the earlier basis vectors, but not to the input vector it is directly derived from. E.g., $e_2\perp a_1$, $e_3\perp a_1$, $e_3\perp a_2$, but $e_2 \not\perp a_2$.
Then, Q is
\[\begin{gather*} Q = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} \end{gather*}\] \[\begin{gather*} R = Q^T A = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{e}_1 & \mathbf{a}_2 \cdot \mathbf{e}_1 & \mathbf{a}_3 \cdot \mathbf{e}_1 \\ 0 & \mathbf{a}_2 \cdot \mathbf{e}_2 & \mathbf{a}_3 \cdot \mathbf{e}_2 \\ 0 & 0 & \mathbf{a}_3 \cdot \mathbf{e}_3 \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & \frac{3}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{2}{\sqrt{3}} \end{bmatrix} \end{gather*}\]When $A$ is $m \times n$ (with $m \ge n$), $Q$ will be $m \times n$ and $R$ will be $n \times n$. If $A$ has full rank, then $Q$ will have full rank, so $R$’s diagonal terms will be non-zero. Then $R$ is invertible.
On the other hand, if $A$ doesn’t have full rank, some of its columns are linear combinations of earlier columns. For such a column, Gram-Schmidt produces the zero vector (the projections onto the previous basis vectors subtract everything away), so the corresponding diagonal term of $R$ is zero and $R$ is not invertible.
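To sanity-check the worked example, one can recompute the factorization numerically (note LAPACK may flip the signs of some columns of $Q$ and rows of $R$, which is an equally valid QR):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))            # True: the factorization reproduces A
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q's columns are orthonormal
print(np.round(R, 4))                   # upper triangular, up to sign flips
```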
When $m < n$, the system $Ax = b$ is underdetermined, and $A$ will definitely have linearly dependent columns (more than $m$ vectors in $\mathbb{R}^m$ cannot be independent). $Q$ will be $m \times m$, and $R$ will be $m \times n$. In that case, $R$ will not be a square matrix.
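A quick shape check with numpy’s complete-mode QR for a wide matrix:

```python
import numpy as np

A = np.random.rand(2, 4)  # m = 2 rows, n = 4 columns, so m < n

Q, R = np.linalg.qr(A, mode='complete')
print(Q.shape, R.shape)   # (2, 2) (2, 4): Q is m x m, R is m x n (not square)
```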
Applications
- Find the least squares solution to $Ax = b$. The common solution is $\hat{x} = (A^TA)^{-1}A^Tb$.

Using QR decomposition:

\[\begin{gather*} \hat{x} = (A^TA)^{-1}A^Tb = (R^TR)^{-1}R^TQ^T b \end{gather*}\]Note that $R$ may not be a square matrix (when $m < n$), or may not be invertible (when $A$ doesn’t have full rank). Except in those cases, the expression simplifies to the triangular system $R\hat{x} = Q^Tb$.
One trick for solving for $\hat{x}$ is back substitution. Writing $y = Q^Tb$:

\[\begin{gather*} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix} \hat{x} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \\ \Rightarrow \hat{x}_3 = \frac{y_3}{r_{33}}, \quad \hat{x}_2 = \frac{y_2 - r_{23}\hat{x}_3}{r_{22}}, \quad \dots \end{gather*}\]So $\hat{x}$ can be solved quite easily by substituting from the bottom row up, as sketched below.
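A minimal sketch of least squares via QR with manual back substitution, assuming $A$ has full column rank (the function name is illustrative):

```python
import numpy as np

def solve_least_squares_qr(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve min ||Ax - b|| for an m x n matrix A with full column rank."""
    Q, R = np.linalg.qr(A)  # reduced QR: Q is m x n, R is n x n
    y = Q.T @ b             # right-hand side of R x = Q^T b
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # back substitution, bottom row up
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # overdetermined, 3 x 2
b = np.array([1.0, 2.0, 2.0])
print(solve_least_squares_qr(A, b))  # agrees with np.linalg.lstsq(A, b)[0]
```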
Gram Matrix
Gram matrix $G$ of a matrix $V$ with columns $v_1, \dots, v_n$ is defined as:

\[\begin{gather*} G = V^T V, \qquad G_{ij} = \langle v_i, v_j \rangle \end{gather*}\]In other words, $G$ compares the similarity of each of $V$’s columns against every other column.
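A one-line construction in numpy:

```python
import numpy as np

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])  # columns v_1, v_2, v_3

G = V.T @ V  # G[i, j] = <v_i, v_j>
print(G)     # diagonal: squared column norms; off-diagonal: pairwise inner products
```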