# Revision history

Q1. In his lecture, Prof. Strang begins with the simplest case: the orthogonal projection of a vector $\mathbf{b}$ onto a one-dimensional subspace $S=\mathrm{span}\langle\mathbf{a}\rangle$, where $\mathbf{a}$ is a non-null vector in $\mathbb{R}^n$. The projection vector $\mathbf{p}$ should be a multiple of $\mathbf{a}$, so Prof. Strang writes it as $\mathbf{p}=x\mathbf{a}$. Here $x$ is a scalar, not a matrix, and $\mathbf{a}$ is a vector, identified with an $n\times 1$ matrix. He wants to deduce that $\mathbf{p}=P\mathbf{b}$, where $P$ is the projection matrix $P=\frac{1}{\mathbf{a}^T\mathbf{a}}\mathbf{a}\mathbf{a}^T,$ which is an $n\times n$ matrix. To this end, Prof. Strang writes the projection vector as $\mathbf{p}=\mathbf{a}\,x$. Here you should view $\mathbf{p}$ and $\mathbf{a}$ as $n\times 1$ matrices and $x$ as a $1\times 1$ matrix, so that $\mathbf{a}$ and $x$ can be multiplied. Thus we have $x\,\mathbf{a}=\mathbf{a}\,x.$ This identity does not mean that $\mathbf{a}$ and $x$ are two matrices that commute; just read the left and right sides as stated above. Admittedly, the notation may be a bit misleading. Perhaps it would be better to write $(x)$ instead of $x$ when $x$ is to be seen as a $1\times 1$ matrix (parentheses are used to delimit matrices). For example, $4\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix} =\begin{pmatrix} 4 \\ 8 \\ 12\end{pmatrix} =\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix}(4).$
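To make the rank-one formula concrete, here is a minimal sketch using NumPy as a stand-in for SageMath (the vectors are made up for illustration; the formula $P=\frac{1}{\mathbf{a}^T\mathbf{a}}\mathbf{a}\mathbf{a}^T$ is the one from the lecture):

```python
import numpy as np

# a and b are arbitrary example vectors, kept as n x 1 matrices
# to match the discussion above.
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[1.0], [1.0], [1.0]])

# Rank-one projection matrix onto span{a}: P = a a^T / (a^T a).
# (a.T @ a) is a 1 x 1 matrix, so its unique entry is extracted first.
P = (a @ a.T) / (a.T @ a)[0, 0]

# Projection of b onto span{a}.
p = P @ b

# P behaves as a projection matrix: symmetric and idempotent.
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
```

For these example vectors, $\mathbf{a}^T\mathbf{b}=6$ and $\mathbf{a}^T\mathbf{a}=14$, so `p` is $\tfrac{6}{14}\mathbf{a}$, a multiple of $\mathbf{a}$ as expected.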

The advantage of expressing $\mathbf{p}$ as $\mathbf{a}x$ is that it generalizes. If $S$ is spanned by the linearly independent vectors $\mathbf{a}_1,\ldots,\mathbf{a}_k$, the orthogonal projection $\mathbf{p}$ of a vector $\mathbf{b}$ onto $S$ should be a linear combination of $\mathbf{a}_1,\ldots,\mathbf{a}_k$, that is, $\mathbf{p}=x_1\mathbf{a}_1+\cdots+x_k\mathbf{a}_k=AX$ with $A=\left(\mathbf{a}_1 \,\vert \cdots \vert\, \mathbf{a}_k\right)$ and $X=\begin{pmatrix}x_1 \\ \vdots \\ x_k\end{pmatrix}.$ The reasoning in Prof. Strang's lecture then shows that $\mathbf{p}=P\mathbf{b}$, where $P$ is now the projection matrix $P=A(A^TA)^{-1}A^T.$
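The general formula can be checked the same way. Below is a hedged sketch, again with NumPy standing in for SageMath, using a made-up matrix $A$ with $k=2$ independent columns in $\mathbb{R}^3$:

```python
import numpy as np

# Example data: two independent columns spanning a plane in R^3.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# General projection onto the column space of A:
# P = A (A^T A)^{-1} A^T, which is n x n.
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Projection of b onto C(A).
p = P @ b

# The error b - p is orthogonal to every column of A.
assert np.allclose(A.T @ (b - p), 0.0)
```

For this choice of `A` and `b`, the coefficient vector is $X=(5,-3)^T$, so `p` comes out as $(5, 2, -1)^T$.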

Returning to your question: you write `P=X*A` where, in fact, you should write $\mathbf{p}=x\mathbf{a}$ and consider $x$ a scalar, not a matrix. Consequently, since $x$ is a scalar, from $\mathbf{a}^T(\mathbf{b}-\mathbf{p})=0$, one has $\mathbf{a}^T\mathbf{b}=\mathbf{a}^T\mathbf{p}=\mathbf{a}^T(x\mathbf{a})=x\,\mathbf{a}^T\mathbf{a}.$ Hence $x=\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}.$
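Treating $x$ as a scalar from the start avoids the shape problem entirely. A small sketch with flat NumPy vectors (the example values are arbitrary):

```python
import numpy as np

# Flat 1-D vectors, so that a @ b is already a scalar dot product.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 1.0])

# Scalar coefficient from the orthogonality condition a^T (b - p) = 0:
# x = (a^T b) / (a^T a).
x = (a @ b) / (a @ a)

# Projection p = x a, a plain scalar-times-vector product.
p = x * a

# b - p is orthogonal to a.
assert abs(a @ (b - p)) < 1e-12
```

Here $\mathbf{a}^T\mathbf{b}=6$ and $\mathbf{a}^T\mathbf{a}=14$, so `x` is $3/7$ and no matrix division ever arises.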

Q2. In your code, `A.transpose()*A` is a $1\times 1$ matrix, and you cannot divide by a matrix. You need to extract the unique element of this matrix, either with the `.det()` method (the determinant of a $1\times 1$ matrix is its single entry) or simply by writing `(A.transpose()*A)[0,0]`. By the way, note that, in the video, the projection matrix is denoted by $P$, not $X$.
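The same pitfall can be illustrated in NumPy (used here only as a stand-in, since the shapes behave the same way): when $\mathbf{a}$ is kept as an $n\times 1$ matrix, the product $\mathbf{a}^T\mathbf{a}$ is a $1\times 1$ matrix, not a scalar, and the single entry must be extracted before dividing, mirroring the Sage fix `(A.transpose()*A)[0,0]`:

```python
import numpy as np

# a as an n x 1 matrix rather than a flat vector.
a = np.array([[1.0], [2.0], [3.0]])

# a^T a is a 1 x 1 matrix, shape (1, 1) -- not a number.
g = a.T @ a
assert g.shape == (1, 1)

# Extract the unique entry before dividing, the analogue of
# (A.transpose()*A)[0,0] in SageMath.
denom = g[0, 0]
P = (a @ a.T) / denom
```

Dividing by `g` directly would broadcast elementwise rather than raise an error in NumPy, which makes the explicit `[0, 0]` extraction a good habit in both systems.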

In SageMath, I think that it is better to use `vector` instead of `matrix` to represent $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{p}$, as shown here.