
Q1. In his lecture, Prof. Strang begins with the simplest case: the orthogonal projection of a vector \(\mathbf{b}\) onto a one-dimensional subspace \(S=\mathrm{span}\langle\mathbf{a}\rangle\), where \(\mathbf{a}\) is a nonzero vector in \(\mathbb{R}^n\). The projection vector \(\mathbf{p}\) should be a multiple of \(\mathbf{a}\), so Prof. Strang writes it as \(\mathbf{p}=x\mathbf{a}\). Here \(x\) is a scalar, not a matrix, and \(\mathbf{a}\) is a vector, identified with an \(n\times 1\) matrix. He wants to deduce that \(\mathbf{p}=P\mathbf{b}\), where \(P\) is the projection matrix \[P=\frac{1}{\mathbf{a}^T\mathbf{a}}\mathbf{a}\mathbf{a}^T,\] which is an \(n\times n\) matrix.

To this end, Prof. Strang writes the projection vector as \(\mathbf{p}=\mathbf{a}\,x\). Now you should see \(\mathbf{p}\) and \(\mathbf{a}\) as \(n\times1\) matrices and \(x\) as a one-dimensional vector, that is, a \(1\times1\) matrix, so that \(\mathbf{a}\) and \(x\) can be multiplied. Thus, we have \[x\,\mathbf{a}=\mathbf{a}\,x.\] This identity does not mean that \(\mathbf{a}\) and \(x\) are two matrices that commute: the left side is a scalar times a vector, whereas the right side is the product of an \(n\times1\) matrix and a \(1\times1\) matrix. Admittedly, the notation may be a bit misleading. Perhaps it would be better to write \((x)\) instead of \(x\) when \(x\) should be seen as a \(1\times 1\) matrix (parentheses are used to delimit matrices). For example, \[4\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix} =\begin{pmatrix} 4 \\ 8 \\ 12\end{pmatrix} =\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix}(4).\]
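The two readings of \(x\) can be checked numerically. The following is only a minimal sketch in plain Python with exact fractions, not SageMath code; the helper names mat_mul and transpose and the vector \(\mathbf{b}=(1,1,1)\) are assumptions made for illustration. Matrices are stored as lists of rows, so \(\mathbf{a}\) is a list of three one-element rows.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose a matrix stored as a list of rows (hypothetical helper)."""
    return [list(col) for col in zip(*A)]

# a as an n x 1 matrix (column), x as a scalar
a = [[Fraction(1)], [Fraction(2)], [Fraction(3)]]
x = Fraction(4)

# Left side: scalar times vector, x a
scalar_times_vector = [[x * entry for entry in row] for row in a]
# Right side: (n x 1 matrix) times (1 x 1 matrix), a (x)
vector_times_1x1 = mat_mul(a, [[x]])

# Projection matrix P = a a^T / (a^T a).
# a^T a is a 1 x 1 matrix, so its single entry is extracted before dividing.
ata = mat_mul(transpose(a), a)[0][0]
aat = mat_mul(a, transpose(a))            # n x n matrix
P = [[entry / ata for entry in row] for row in aat]

# Project an illustrative vector b (assumed, not from the lecture)
b = [[Fraction(1)], [Fraction(1)], [Fraction(1)]]
p = mat_mul(P, b)
print(scalar_times_vector == vector_times_1x1)
print(p)
```

Both products give the same column, and the projection \(\mathbf{p}=P\mathbf{b}\) is a multiple of \(\mathbf{a}\), as expected.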

The advantage of expressing \(\mathbf{p}\) as \(\mathbf{a}x\) is that it generalizes. If \(S\) is spanned by the linearly independent vectors \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), the orthogonal projection \(\mathbf{p}\) of a vector \(\mathbf{b}\) onto \(S\) should be a linear combination of \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), that is, \[\mathbf{p}=x_1\mathbf{a}_1+\cdots+x_k\mathbf{a}_k=AX\] with \[A=\left(\mathbf{a}_1 \vert\ldots \vert\mathbf{a}_k\right)\] and \[X=\begin{pmatrix}x_1 \\ \vdots \\ x_k\end{pmatrix}.\] The reasoning in Prof. Strang’s lecture would then show that \(\mathbf{p}=P\mathbf{b}\), where \(P\) is now the projection matrix \[P=A(A^TA)^{-1}A^T.\]
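For completeness, here is a sketch of that reasoning in the notation above. The error \(\mathbf{b}-\mathbf{p}\) must be orthogonal to each of \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), which in matrix form reads \[A^T(\mathbf{b}-AX)=0,\quad\text{that is,}\quad A^TA\,X=A^T\mathbf{b}.\] Since the columns of \(A\) are linearly independent, the \(k\times k\) matrix \(A^TA\) is invertible, so \[X=(A^TA)^{-1}A^T\mathbf{b}\quad\text{and}\quad \mathbf{p}=AX=A(A^TA)^{-1}A^T\mathbf{b}.\] For \(k=1\), \(A=\mathbf{a}\) and \(A^TA=\mathbf{a}^T\mathbf{a}\) is a \(1\times1\) matrix, so the formula reduces to \(P=\frac{1}{\mathbf{a}^T\mathbf{a}}\mathbf{a}\mathbf{a}^T\).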

Q2. In your code, A.transpose()*A is a \(1\times 1\) matrix. You cannot divide by a matrix. Thus, you need to extract the unique element of this matrix, either with the .det() method or by simply writing (A.transpose()*A)[0,0]. By the way, note that, in the video, the projection matrix is denoted by \(P\), not \(X\).

In SageMath, I think that it is better to use vector instead of matrix to represent \(\mathbf{a}\), \(\mathbf{b}\) and \(\mathbf{p}\), as shown here.

Returning to your question, you write P=X*A where, in fact, you should write \(\mathbf{p}=x\mathbf{a}\) and treat \(x\) as a scalar, not a matrix. Consequently, since \(x\) is a scalar, from \(\mathbf{a}^T(\mathbf{b}-\mathbf{p})=0\), one has \[\mathbf{a}^T\mathbf{b}=\mathbf{a}^T\mathbf{p}=\mathbf{a}^T(x\mathbf{a})=x\mathbf{a}^T\mathbf{a}.\] Hence \[x=\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}.\]
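The formula for \(x\) can be tried on concrete numbers. This is a minimal sketch in plain Python with exact fractions, not the SageMath code from the question; the helper dot and the sample vectors \(\mathbf{a}=(1,2,2)\), \(\mathbf{b}=(1,1,1)\) are assumptions chosen for illustration. Vectors are plain lists, so \(\mathbf{a}^T\mathbf{b}\) is an ordinary dot product and \(x\) is a genuine scalar.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists (hypothetical helper)."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Illustrative vectors (assumed, not taken from the lecture)
a = [Fraction(1), Fraction(2), Fraction(2)]
b = [Fraction(1), Fraction(1), Fraction(1)]

# x = (a^T b) / (a^T a): a scalar divided by a scalar
x = dot(a, b) / dot(a, a)

# Projection p = x a and error e = b - p
p = [x * ai for ai in a]
e = [bi - pi for bi, pi in zip(b, p)]

print(x)          # the scalar coefficient
print(p)          # the projection of b onto span<a>
print(dot(a, e))  # orthogonality check: a^T (b - p)
```

The last value is zero, which is exactly the condition \(\mathbf{a}^T(\mathbf{b}-\mathbf{p})=0\) used to derive \(x\).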
