
Matrix projection blues

asked 2019-06-03 19:31:17 +0200

ortollj

updated 2019-06-03 21:57:42 +0200

Hi, this question is about Lecture 15, "Projections onto subspaces", on the MIT OpenCourseWare site.

[Edited: the code on sagecell.sagemath.org was not the right one; it was the first version I wrote. Updating the code in the existing cell did not work, so I had to open a new cell.]

Code on sagecell.sagemath.org

p is the projection of vector b onto vector a.
Below, all letters are matrices.
E=B-P with P=X*A
E=B-X*A. Since E is perpendicular to A, dot_product(A,E) = 0, which means A.transpose()*E=0.
Then A^T*(B-X*A)=0, so A^T*(X*A)=A^T*B (and this is false!)
But Gilbert Strang writes X*A^T*A=A^T*B (and this is correct!)
But we are not allowed to replace A^T*X*A with X*A^T*A if X is a matrix!
I know I made a mistake in my reasoning, but I fail to find it!

Q2: Why must I write  X=A * (A.transpose())/(( (A.transpose()) * A ).det())
instead of            X=A * (A.transpose())/(( (A.transpose()) * A ))
Why do I need to add .det()?


I know the first question is a basic math question, not a SageMath question, sorry; tell me if I should post it on a math forum instead. The code in the link is now my latest version.

ortollj ( 2019-06-03 19:53:43 +0200 )

1 Answer


answered 2019-06-04 03:49:20 +0200

Juanjo

updated 2019-06-04 04:48:35 +0200

Q1. In his lecture, Prof. Strang begins with the simplest case: the orthogonal projection of a vector \(\mathbf{b}\) onto a one-dimensional subspace \(S=\mathrm{span}\langle\mathbf{a}\rangle\), where \(\mathbf{a}\) is a nonzero vector in \(\mathbb{R}^n\). The projection vector \(\mathbf{p}\) should be a multiple of \(\mathbf{a}\), so Prof. Strang writes it as \(\mathbf{p}=x\mathbf{a}\). Here \(x\) is a scalar, not a matrix, and \(\mathbf{a}\) is a vector, identified with an \(n\times 1\) matrix. He wants to deduce that \(\mathbf{p}=P\mathbf{b}\), where \(P\) is the projection matrix \[P=\frac{1}{\mathbf{a}^T\mathbf{a}}\mathbf{a}\mathbf{a}^T,\] which is an \(n\times n\) matrix. To this end, Prof. Strang writes the projection vector as \(\mathbf{p}=\mathbf{a}\,x\). Now, you should see \(\mathbf{p}\) and \(\mathbf{a}\) as \(n\times1\) matrices and \(x\) as a one-dimensional vector or a \(1\times1\) matrix, so that \(\mathbf{a}\) and \(x\) can be multiplied. Thus, we have \[x\,\mathbf{a}=\mathbf{a}\,x.\] This identity does not mean that \(\mathbf{a}\) and \(x\) are two matrices that commute: the left side is a scalar times a vector, whereas the right side is an \(n\times1\) matrix times a \(1\times1\) matrix. It is true that the notation may be a bit misleading. Perhaps it would be better to write \((x)\) instead of \(x\) when \(x\) should be seen as a \(1\times 1\) matrix (parentheses are used to delimit matrices). For example, \[4\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix} =\begin{pmatrix} 4 \\ 8 \\ 12\end{pmatrix} =\begin{pmatrix} 1 \\ 2 \\ 3\end{pmatrix}(4).\]

The advantage of expressing \(\mathbf{p}\) as \(\mathbf{a}x\) is that generalization is then possible. If \(S\) is spanned by the linearly independent vectors \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), the orthogonal projection \(\mathbf{p}\) of a vector \(\mathbf{b}\) onto \(S\) should be a linear combination of \(\mathbf{a}_1,\ldots,\mathbf{a}_k\), that is, \[\mathbf{p}=x_1\mathbf{a}_1+\cdots+x_k\mathbf{a}_k=AX\] with \[A=\left(\mathbf{a}_1 \vert\ldots \vert\mathbf{a}_k\right)\] and \[X=\begin{pmatrix}x_1 \\ \vdots \\ x_k\end{pmatrix}\] The reasoning in Prof. Strang’s lecture would then show that \(\mathbf{p}=P\mathbf{b}\), where \(P\) is now the projection matrix \[P=A(A^TA)^{-1}A^T.\]
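For completeness, the steps leading to that formula can be sketched as follows. This assumes, as above, that the columns of \(A\) are linearly independent, so that \(A^TA\) is invertible, and it uses the condition that the error \(\mathbf{b}-AX\) is orthogonal to every column of \(A\): \[A^T(\mathbf{b}-AX)=0 \;\Longrightarrow\; A^TAX=A^T\mathbf{b} \;\Longrightarrow\; X=(A^TA)^{-1}A^T\mathbf{b} \;\Longrightarrow\; \mathbf{p}=AX=A(A^TA)^{-1}A^T\mathbf{b}.\] Note that \(X\) stays to the right of \(A\) throughout, so no matrices are ever commuted. For \(k=1\), \(A^TA\) is the number \(\mathbf{a}^T\mathbf{a}\) and the formula reduces to the one-dimensional case.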

Returning to your question, you write P=X*A where, in fact, you should write \(\mathbf{p}=x\mathbf{a}\) and treat \(x\) as a scalar, not a matrix. Since \(x\) is a scalar, it can be moved to the front of any product, so from \(\mathbf{a}^T(\mathbf{b}-\mathbf{p})=0\) one has \[\mathbf{a}^T\mathbf{b}=\mathbf{a}^T\mathbf{p}=\mathbf{a}^T(x\mathbf{a})=x\mathbf{a}^T\mathbf{a}.\] Hence \[x=\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}.\]
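The formula for \(x\) can be checked numerically. The following is only a minimal sketch in plain Python (not SageMath-specific); the helper names `dot` and `project_onto_line` and the sample vectors are illustrative choices, not taken from the lecture or from the original code.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists of numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_line(a, b):
    """Return (x, p), where p = x*a is the projection of b onto span(a).

    Assumes a is a nonzero vector with integer entries (hypothetical helper).
    """
    x = Fraction(dot(a, b), dot(a, a))  # x = (a^T b) / (a^T a), a scalar
    p = [x * ai for ai in a]            # p = x * a
    return x, p

# Sample data chosen only for illustration
a = [1, 2, 3]
b = [1, 1, 1]

x, p = project_onto_line(a, b)
e = [bi - pi for bi, pi in zip(b, p)]   # error vector e = b - p

print("x =", x)
print("p =", p)
print("a . e =", dot(a, e))             # should vanish: e is orthogonal to a
```

Exact fractions are used so that the orthogonality check \(\mathbf{a}^T\mathbf{e}=0\) holds exactly rather than up to rounding.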

Q2. In your code, A.transpose()*A is a \(1\times 1\) matrix, not a number, so you cannot divide by it here. You therefore need to extract the single entry of this matrix, either with the .det() method or simply by writing (A.transpose()*A)[0,0]. By the way, note that, in the video, the projection matrix is denoted by \(P\), not \(X\).
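The same point can be illustrated outside SageMath. The sketch below uses plain Python with nested lists as matrices; `matmul` and `transpose` are hypothetical helpers written for this example, and the sample data are arbitrary. It shows that \(A^TA\) comes out as a \(1\times1\) matrix whose single entry has to be extracted before dividing.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    """Product of two matrices stored as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

A = [[1], [2], [3]]        # a as an n x 1 matrix (sample data)
B = [[1], [1], [1]]        # b as an n x 1 matrix (sample data)

At = transpose(A)          # 1 x n
AtA = matmul(At, A)        # 1 x 1 matrix, still a matrix and not a number
s = AtA[0][0]              # its single entry (equal to its determinant)

AAt = matmul(A, At)        # n x n matrix a a^T
P = [[Fraction(entry, s) for entry in row] for row in AAt]  # P = a a^T / (a^T a)

p = matmul(P, B)           # projection of b, as an n x 1 matrix

print("A^T A =", AtA)
print("p =", p)
print("P*P == P:", matmul(P, P) == P)  # projecting twice changes nothing
```

The last line checks the characteristic property \(P^2=P\) of a projection matrix.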

In SageMath, I think that it is better to use vector instead of matrix to represent \(\mathbf{a}\), \(\mathbf{b}\) and \(\mathbf{p}\), as shown here.



Thanks a lot, Juanjo. I understand my mistakes now. I'm a bit ashamed of the confusion I caused, sorry.

ortollj ( 2019-06-04 08:36:19 +0200 )

Don't worry. Glad to help you

Juanjo ( 2019-06-04 11:45:11 +0200 )

