A similarity transformation is\[ B=M^{-1}AM \] Where \(B,A,M\) are square matrices. The goal of similarity transformation is to find a \(B\) matrix which has a simpler form than \(A\) so that we can use \(B\) in place of \(A\) to ease some computational work. Lets set our goal in having \(B\) be a diagonal matrix (a general diagonal form is called block diagonal or Jordan form, but here we are just looking at the case of \(B\) being a diagonal matrix).
The question becomes: Given \(A\), can we find \(M\) such that \(M^{-1}AM\) is diagonal?
The standard method to show the above is via an algebraic method, which we show first. Next we show a geometric method that explains similarity transformation geometrically.
Starting with\[ B=M^{-1}AM \] Our goal is to find a real matrix \(B\) such that it is diagonal. From the above, by pre multiplying each side by \(M\) we obtain\[ AM=MB \] Now, since our goal is to make \(B\) diagonal, let us select the eigenvalues of \(A\) to be the diagonal of \(B\). Now we write the above in expanded form as follows\[\begin {bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn}\end {bmatrix}\begin {bmatrix} m_{11} & \cdots & m_{1n}\\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn}\end {bmatrix} =\begin {bmatrix} m_{11} & \cdots & m_{1n}\\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn}\end {bmatrix}\begin {bmatrix} \lambda _{1} & 0 & 0\\ 0 & \ddots & 0\\ 0 & 0 & \lambda _{n}\end {bmatrix} \] The above can be written as \(n\) separate equations by looking at the column view of matrix multiplication\[\begin {bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn}\end {bmatrix}\begin {bmatrix} m_{11}\\ \vdots \\ m_{n1}\end {bmatrix} =\lambda _{1}\begin {bmatrix} m_{11}\\ \vdots \\ m_{n1}\end {bmatrix} \] And\[\begin {bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn}\end {bmatrix}\begin {bmatrix} m_{12}\\ \vdots \\ m_{n2}\end {bmatrix} =\lambda _{2}\begin {bmatrix} m_{12}\\ \vdots \\ m_{n2}\end {bmatrix} \] All the way to the n\(^{th}\) column of \(M\)\[\begin {bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn}\end {bmatrix}\begin {bmatrix} m_{1n}\\ \vdots \\ m_{nn}\end {bmatrix} =\lambda _{n}\begin {bmatrix} m_{1n}\\ \vdots \\ m_{nn}\end {bmatrix} \] Each of the above \(n\) equations is in the form\[ A\mathbf {x=}\lambda \mathbf {x}\] and this is the key idea.
The above shows that if we take the diagonal elements of \(B\) to be the eigenvalues of \(A\) then the \(M\) matrix is the (right) eigenvectors of \(A.\)
Hence, if we want the \(B\) matrix to be diagonal and real, this means the \(A\) matrix itself must have some conditions put on it. Specifically, \(A\) eigenvalues must be all distinct and real. This happens when \(A\) is real and symmetric. Therefore, we have the general result that given a matrix \(A\) which has distinct and real eigenvalues, only then can we find a similar matrix to it which is real and diagonal, and these diagonal values are the eigenvalues of \(A\).
It turns out that this derivation is more complicated than the algebraic approach. It involves a change of basis matrix (which will be our \(M\) matrix) and the representation of linear transformation \(T\) as a matrix under some basis (which will be our \(A\) matrix). Let us start from the beginning.
Given the vector \(\mathbf {x}\) in \(\mathbb {R} ^{n}\), it can have many different representations (or coordinates) depending on which basis are used. There are an infinite number of basis that span \(\mathbb {R} ^{n}\), hence there are infinite representations for the same vector. Consider for example \(\mathbb {R} ^{2}\). The standard basis vectors are \(\left \{ \begin {bmatrix} 1\\ 0 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \), but another basis are \(\left \{ \begin {bmatrix} 1\\ 1 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \), and yet another are \(\left \{ \begin {bmatrix} 1\\ 1 \end {bmatrix} ,\begin {bmatrix} 0\\ -1 \end {bmatrix} \right \} \), and so on.
In \(\mathbb {R} ^{n}\) any \(n\) linearly independent vectors can be used as basis. Let \(\mathbf {x}_{v}\) be the vector representation in w.r.t. basis \(v\), and let \(\mathbf {x}_{v^{\prime }}\) be the vector representation w.r.t. basis \(v^{\prime }\).
An important problem in linear algebra is to obtain \(\mathbf {x}_{v}\) given \(\mathbf {x}_{v^{\prime }}\) and the change of basis matrix \(\left [ M\right ] _{v^{\prime }\to v}\).
This requires finding a matrix representation for the change of basis. Once the \(\left [ M\right ] _{v^{\prime }\to v}\) matrix is found, we can write \begin {equation} \mathbf {x}_{v}=\left [ M\right ] _{v^{\prime }\to v}\mathbf {x}_{v^{\prime }} \tag {1} \end {equation} Where \(\left [ M\right ] _{v^{\prime }\to v}\) is the change of basis matrix which when applied to \(\mathbf {x}_{v^{\prime }}\) returns the coordinates of the vector w.r.t. basis \(v.\)
From (1) we see that given \(\mathbf {x}_{v}\), then to obtain \(\mathbf {x}_{v^{\prime }}\) we write\begin {equation} \mathbf {x}_{v^{\prime }}=[M]_{v^{\prime }\rightarrow v}^{-1}\mathbf {x}_{v}\tag {1A} \end {equation} Another important problem is that of linear transformation \(T\), where now we apply some transformation on the whole space and we want to find what happens to the space coordinates (all the position vectors) due to this transformation.
Consider for example \(T\) which is a rotation in \(\mathbb {R} ^{2}\) by some angle \(\theta \). Here, every position vector in the space is affected by this rotation but the basis remain fixed, and we want to find the new coordinates of each point in space.
Let \(\mathbf {x}_{v}\) be the position vector before applying the linear transformation, and let \(\mathbf {y}_{v}\) be the new position vector. Notice that both vectors are written with respect to the same basis \(v\). Similar to change basis, we want a way to find \(\mathbf {y}_{v}\) from \(\mathbf {x}_{v}\), hence we need a way to represent this linear transformation by a matrix, say \(\left [ T\right ] _{v}\) with respect to the basis \(v\) and so we could write the following\begin {equation} \mathbf {y}_{v}=\left [ T\right ] _{v}\mathbf {x}_{v}\tag {2} \end {equation}
Assume the basis is changed to \(v^{\prime }\). Let the representation of the same linear transformation \(T\) w.r.t. basis \(v^{\prime }\) be called \(\left [ T\right ] _{v^{\prime }}\), so we write\begin {equation} \mathbf {y}_{v^{\prime }}=\left [ T\right ] _{v^{\prime }}\mathbf {x}_{v^{\prime }}\tag {2A} \end {equation}
Hence when the basis changes from \(v\) to \(v^{\prime }\), \(T\) representation changes from \(\left [ T\right ] _{v}\) to \(\left [ T\right ] _{v^{\prime }}.\)
Now assume we are given some linear transformation \(T\) and its representation \(\left [ T\right ] _{v}\) w.r.t. basis \(v\), how could we find \(T\) representation \(\left [ T\right ] _{v^{\prime }}\) w.r.t. to new basis \(v^{\prime }\)?
From (2) we have\[ \mathbf {y}_{v}=\left [ T\right ] _{v}\mathbf {x}_{v}\] But from (1) \(\mathbf {x}_{v}=\left [ M\right ] _{v^{\prime }\rightarrow v}\mathbf {x}_{v^{\prime }}\), hence the above becomes\[ \mathbf {y}_{v}=\left [ T\right ] _{v}\ \left ( \left [ M\right ] _{v^{\prime }\rightarrow v}\mathbf {x}_{v^{\prime }}\right ) \] Pre-multiplying from left the above equation by \([M]_{v^{\prime }\rightarrow v}^{-1}\) we obtain\[ \lbrack M]_{v^{\prime }\rightarrow v}^{-1}\mathbf {y}_{v}=[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\mathbf {x}_{v^{\prime }}\] But since \([M]_{v^{\prime }\rightarrow v}^{-1}\mathbf {y}_{v}=\mathbf {y}_{v^{\prime }}\), then above reduces to\[ \mathbf {y}_{v^{\prime }}=[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\mathbf {x}_{v^{\prime }}\] But from (2A) \(\mathbf {y}_{v^{\prime }}=\left [ T\right ] _{v^{\prime }}\mathbf {x}_{v^{\prime }}\), hence the above becomes\[ \lbrack T]_{v^{\prime }}\mathbf {x}_{v^{\prime }}=[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\mathbf {x}_{v^{\prime }}\] Therefore\begin {equation} \lbrack T]_{v^{\prime }}=[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\tag {3} \end {equation} Notice the difference between change of basis and linear transformation. In change of basis, the vector \(\mathbf {x}\) remained fixed in space, but the basis changed, and we want to find the coordinates of the vector w.r.t the new basis. With linear transformation, the vector itself is changed from \(\mathbf {x}_{v}\) to \(\mathbf {y}_{v}\), but both vectors are expressed w.r.t. the same basis.
Equation (3) allows us to obtain a new matrix representation of the linear transformation \(T\) by expressing the same \(T\) w.r.t. to different basis. Hence if we are given a representation of \(T\) which is not the most optimal representation, we can, by change of basis, obtain a different representation of the same \(T\) by using (3). The most optimal matrix representation for linear transformation is a diagonal matrix.
Equation (3) is called a similarity transformation. To make it easier to compare (3) above with what we wrote in the previous section when we derived the above using an algebraic approach, we let \(\left [ T\right ] _{v^{\prime }}=B,\left [ T\right ] _{v}=A\), hence (3) is \[ B=M^{-1}AM \]
The matrix \(\left [ T\right ] _{v^{\prime }}\) is similar to \(\left [ T\right ] _{v}\). (i.e. \(B\) is similar to \(A)\). Both matrices represent the same linear transformation applied on the space. We will show below some examples of how to find \(\left [ T\right ] _{v^{\prime }}\) given \(\left [ T\right ] _{v}\) and \(\left [ M\right ] \). But first we show how to obtain matrix representation of \(T.\)
Given a vector \(\mathbf {x}\in \mathbb {R} ^{n}\) and some linear transformation (or a linear operator) \(T\) that acts on the vector \(\mathbf {x}\) transforming this vector into another vector \(\mathbf {y}\in \mathbb {R} ^{n}\) according to some prescribed manner \(T:\mathbf {x\rightarrow y}\). Examples of such linear transformation are rotation, elongation, compression, and reflection, and any combination of these operations, but it can not include the translation of the vector, since translation is not linear.
The question to answer here is how to write down a representation of this linear transformation? It turns out that \(T\) can be represented by a matrix of size \(n\times n\), and the actual numbers that go into the matrix, or the shape of the matrix, will depend on which basis in \(\mathbb {R} ^{n}\) we choose to represent \(\mathbf {x}\).
We would like to pick some basis so that the final shape of the matrix is the most simple shape.
Let us pick a set of basis \(v=\left \{ \vec {e}_{1},\vec {e}_{2},\cdots ,\vec {e}_{n}\right \} \) that span \(\mathbb {R}^{n}\) hence \[ \mathbf {x}_{v}=a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n}\] Now \begin {equation} T\mathbf {x}_{v}=T\left ( a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n}\right ) \nonumber \end {equation} And since \(T\) is linear we can rewrite the above as\begin {equation} T\mathbf {x}_{v}=a_{1}T\vec {e}_{1}+a_{2}T\vec {e}_{2}+\cdots +a_{n}T\vec {e}_{n}\tag {1} \end {equation} We see from above that the new transformed vector \(T\mathbf {x}_{v}\) has the same coordinates as \(\mathbf {x}_{v}\) if we view \(T\vec {e}_{i}\) as the new basis.
Now \(T\vec {e}_{i}\) itself is an application of the linear transformation but now it is being done on each basis vector \(\vec {e}_{i}\). Hence it will cause each specific basis vector to transform in some manner to a new vector. Whatever this new vector is, we must be able to represent it in terms of the same basis vectors \(\left \{ \vec {e}_{1},\vec {e}_{2},\cdots ,\vec {e}_{n}\right \} ,\) therefore, we write\[ T\vec {e}_{1}=\xi _{11}\vec {e}_{1}+\xi _{21}\vec {e}_{2}+\cdots +\xi _{n1}\vec {e}_{n}\] Where \(\xi _{ij}\) is the \(i^{th}\) coordinate of the vector \(T\vec {e}_{j}\). And we do the same for \(T\vec {e}_{2}\)\[ T\vec {e}_{2}=\xi _{12}\vec {e}_{1}+\xi _{22}\vec {e}_{2}+\cdots +\xi _{n2}\vec {e}_{n}\] And so on until\[ T\vec {e}_{n}=\xi _{1n}\vec {e}_{1}+\xi _{2n}\vec {e}_{2}+\cdots +\xi _{nn}\vec {e}_{n}\] Or\[ T\vec {e}_{j}=\xi _{ij}\vec {e}_{i}\] Now plug the above into (1) and obtain\begin {align*} T\mathbf {x}_{v} & =a_{1}\left ( \xi _{11}\vec {e}_{1}+\xi _{21}\vec {e}_{2}+\cdots +\xi _{n1}\vec {e}_{n}\right ) \\ & +a_{2}\left ( \xi _{12}\vec {e}_{1}+\xi _{22}\vec {e}_{2}+\cdots +\xi _{n2}\vec {e}_{n}\right ) \\ & +\cdots \\ & +a_{n}\left ( \xi _{1n}\vec {e}_{1}+\xi _{2n}\vec {e}_{2}+\cdots +\xi _{nn}\vec {e}_{n}\right ) \end {align*}
Hence, if we take the \(\vec {e}_{i}\) as common factors, we have\begin {align*} T\mathbf {x} & =\vec {e}_{1}\left ( a_{1}\xi _{11}+a_{2}\xi _{12}+\cdots +a_{n}\xi _{1n}\right ) \\ & +\vec {e}_{2}\left ( a_{1}\xi _{21}+a_{2}\xi _{22}+\cdots +a_{n}\xi _{2n}\right ) \\ & +\cdots \\ & +\vec {e}_{n}\left ( a_{1}\xi _{n1}+a_{2}\xi _{n2}+\cdots +a_{n}\xi _{nn}\right ) \end {align*}
Hence, since \(T\mathbf {x}_{v}=a_{1}T\vec {e}_{1}+a_{2}T\vec {e}_{2}+\cdots +a_{n}T\vec {e}_{n}\) from (1), then by comparing coefficients of each basis vector \(\vec {e}_{i}\) we obtain\begin {align*} a_{1}T & =a_{1}\xi _{11}+a_{2}\xi _{12}+\cdots +a_{n}\xi _{1n}\\ a_{2}T & =a_{1}\xi _{21}+a_{2}\xi _{22}+\cdots +a_{n}\xi _{2n}\\ & \cdots \\ a_{n}T & =a_{1}\xi _{n1}+a_{2}\xi _{n2}+\cdots +a_{n}\xi _{nn} \end {align*}
Or in matrix form\[ T\begin {pmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{n}\end {pmatrix} =\begin {pmatrix} \xi _{11} & \xi _{12} & \cdots & \xi _{1n}\\ \xi _{21} & \xi _{22} & \cdots & \xi _{2n}\\ \vdots & \vdots & \cdots & \vdots \\ \xi _{n1} & \xi _{n2} & \cdots & \xi _{nn}\end {pmatrix}\begin {pmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{n}\end {pmatrix} \] Hence we finally obtain that\[ T\Rightarrow \left [ T\right ] _{v}=\begin {pmatrix} \xi _{11} & \xi _{12} & \cdots & \xi _{1n}\\ \xi _{21} & \xi _{22} & \cdots & \xi _{2n}\\ \vdots & \vdots & \cdots & \vdots \\ \xi _{n1} & \xi _{n2} & \cdots & \xi _{nn}\end {pmatrix} \] Let us call the matrix that represents \(T\) under basis \(v\) as \(\left [ T\right ] _{v}.\)
We see from the above that the \(j^{th}\) column of \(\left [ T\right ] _{v}\) contain the coordinates of the vector \(T\vec {e}_{j}\). This gives us a quick way to construct \(\left [ T\right ] _{v}\): Apply \(T\) to each basis vector \(\vec {e}_{j}\), and take the resulting vector and place it in the \(j^{th}\) column of \(\left [ T\right ] _{v}\).
We now see that \(\left [ T\right ] _{v}\) will have a different numerical values if the basis used to span \(\mathbb {R} ^{n}\) are different from the ones used to obtain the above \(\left [ T\right ] _{v}.\)
Let use pick some new basis, say \(v^{\prime }=\left \{ \vec {e}_{1}^{\prime },\vec {e}_{2}^{\prime },\cdots ,\vec {e}_{n}^{\prime }\right \} \). Let the new representation of \(T\) now be the matrix \(\left [ T\right ] _{v^{\prime }}\), then the \(j^{th}\) column of \(\left [ T\right ] _{v^{\prime }}\) is found from applying \(T\) on the new basis \(\vec {e}_{j}^{\prime }\) \[ T\vec {e}_{j}^{\prime }=\zeta _{ij}\vec {e}_{i}^{\prime }\] Where now \(\zeta _{ij}\) is the \(i^{th}\) coordinates of the the vector \(T\vec {e}_{j}^{\prime }\) which will be different from \(\xi _{ij}\) since in one case we used the basis set \(v^{\prime }=\left \{ \vec {e}_{i}^{\prime }\right \} \) and in the other case we used the basis set \(v=\left \{ \vec {e}_{i}\right \} \). Hence we see that \(\left [ T\right ] _{v^{\prime }}\) will numerically be different depending on the basis used to span the space even though the linear transformation \(T\) itself did not change.
Now we show how to determine \(\left [ M\right ] _{v\to v^{\prime }}\), the matrix which when applied to a vector \(\mathbf {x}_{v}\) will result in the vector \(\mathbf {x}_{v^{\prime }}.\)
Given a vector \(\mathbf {x}\), it is represented w.r.t. basis \(v=\left \{ \vec {e}_{1},\vec {e}_{2},\cdots ,\vec {e}_{n}\right \} \) as \[ \mathbf {x}_{v}=a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n}\] and w.r.t. basis \(v^{\prime }=\left \{ \vec {e}_{1}^{\prime },\vec {e}_{2}^{\prime },\cdots ,\vec {e}_{n}^{\prime }\right \} \) as\[ \mathbf {x}_{v^{\prime }}=b_{1}\vec {e}_{1}^{\prime }+b_{2}\vec {e}_{2}^{\prime }+\cdots +b_{n}\vec {e}_{n}^{\prime }\]
But the vector itself is invariant under any change of basis, hence \(\mathbf {x}_{v}=\mathbf {x}_{v^{\prime }}\)\begin {equation} a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n}=b_{1}\vec {e}_{1}^{\prime }+b_{2}\vec {e}_{2}^{\prime }+\cdots +b_{n}\vec {e}_{n}^{\prime }\tag {1} \end {equation} Now write each basis \(\vec {e}_{i}^{\prime }\) in terms of the basis \(\vec {e}_{j}\), hence we have\begin {align} \vec {e}_{1}^{\prime } & =c_{11}\vec {e}_{1}+c_{12}\vec {e}_{2}+\cdots +c_{1n}\vec {e}_{n}\nonumber \\ \vec {e}_{2}^{\prime } & =c_{21}\vec {e}_{1}+c_{22}\vec {e}_{2}+\cdots +c_{2n}\vec {e}_{n}\nonumber \\ & \vdots \nonumber \\ \vec {e}_{n}^{\prime } & =c_{n1}\vec {e}_{1}+c_{n2}\vec {e}_{2}+\cdots +c_{nn}\vec {e}_{n}\tag {2} \end {align}
Substitute (2) into RHS of (1) we obtain\begin {align*} a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n} & =b_{1}\left ( c_{11}\vec {e}_{1}+c_{12}\vec {e}_{2}+\cdots +c_{1n}\vec {e}_{n}\right ) +\\ & b_{2}\left ( c_{21}\vec {e}_{1}+c_{22}\vec {e}_{2}+\cdots +c_{2n}\vec {e}_{n}\right ) +\\ & \cdots \\ & +b_{n}\left ( c_{n1}\vec {e}_{1}+c_{n2}\vec {e}_{2}+\cdots +c_{nn}\vec {e}_{n}\right ) \end {align*}
Factor out the basis \(\vec {e}_{i}\) from the RHS, we obtain\begin {align*} a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n} & =\left ( b_{1}c_{11}\vec {e}_{1}+b_{1}c_{12}\vec {e}_{2}+\cdots +b_{1}c_{1n}\vec {e}_{n}\right ) +\\ & \left ( b_{2}c_{21}\vec {e}_{1}+b_{2}c_{22}\vec {e}_{2}+\cdots +b_{2}c_{2n}\vec {e}_{n}\right ) +\\ & \cdots \\ & +\left ( b_{n}c_{n1}\vec {e}_{1}+b_{n}c_{n2}\vec {e}_{2}+\cdots +b_{n}c_{nn}\vec {e}_{n}\right ) \end {align*}
Hence\begin {align*} a_{1}\vec {e}_{1}+a_{2}\vec {e}_{2}+\cdots +a_{n}\vec {e}_{n} & =\vec {e}_{1}\left ( b_{1}c_{11}+b_{2}c_{21}+\cdots +b_{n}c_{n1}\right ) +\\ & \vec {e}_{2}\left ( b_{1}c_{12}+b_{2}c_{22}+\cdots +b_{n}c_{n2}\right ) +\\ & \cdots \\ & +\vec {e}_{n}\left ( b_{1}c_{1n}+b_{2}c_{2n}+\cdots +b_{n}c_{nn}\right ) \end {align*}
Now comparing coefficients of each of the basis \(\vec {e}_{i}\) we obtain the following result\begin {align*} a_{1} & =b_{1}c_{11}+b_{2}c_{21}+\cdots +b_{n}c_{n1}\\ a_{2} & =b_{1}c_{12}+b_{2}c_{22}+\cdots +b_{n}c_{n2}\\ & \cdots \\ a_{n} & =b_{1}c_{1n}+b_{2}c_{2n}+\cdots +b_{n}c_{nn} \end {align*}
Or in Matrix form, we write\begin {equation} \begin {bmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{n}\end {bmatrix} =\begin {bmatrix} c_{11} & c_{21} & c_{31} & \cdots & c_{n1}\\ c_{12} & c_{22} & c_{32} & \cdots & c_{n2}\\ \vdots & \cdots & \cdots & \ddots & \vdots \\ c_{1n} & c_{2n} & c_{3n} & \cdots & c_{nn}\end {bmatrix}\begin {bmatrix} b_{1}\\ b_{2}\\ \vdots \\ b_{n}\end {bmatrix} \tag {3} \end {equation} The above gives a relation between the coordinates of the vector \(\mathbf {x}\) w.r.t. basis \(v\) (these are the \(a_{i}\) coordinates) to the coordinates of the same vector w.r.t. to basis \(v^{\prime }\) (these are the coordinates \(b_{j}\)). The mapping between these coordinates is the matrix shown above which we call the \(\left [ M\right ] \) matrix. Since the above matrix returns the coordinates of the vector w.r.t. \(v\) when it is multiplied by the coordinates of the vector w.r.t. basis \(v^{\prime }\), we write it as \(\left [ M\right ] _{v^{\prime }\rightarrow v}\) to make it clear from which basis to which basis is the conversion taking place.
Looking at (2) and (3) above, we see that column \(j\) in \(\left [ M\right ] _{v^{\prime }\to v}\) is the coordinates of the \(\vec {e}_{j}^{\prime }\) w.r.t. basis \(v\).
Hence the above gives a method to generate the change of basis matrix \(\left [ M\right ] _{v^{\prime }\to v}\). Simply represent each basis in \(v^{\prime }\) w.r.t. to basis \(v\). Take the result of each such representation and write it as a column in \(\left [ M\right ] _{v^{\prime }\to v}\). We now show few examples to illustrate this.
Example showing how to find change of basis matrix
Given the \(\mathbb {R}^{2}\) basis \(v=\left \{ \vec {e}_{1},\vec {e}_{2}\right \} =\left \{ \begin {bmatrix} 1\\ 0 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \), and new basis \(v^{\prime }=\left \{ \vec {e}_{1}^{\prime },\vec {e}_{2}^{\prime }\right \} =\left \{ \begin {bmatrix} 1\\ 1 \end {bmatrix} ,\begin {bmatrix} -1\\ 1 \end {bmatrix} \right \} \), find the change of basis matrix \(\left [ M\right ] _{v\rightarrow v^{\prime }}\)
Column 1 of \(\left [ M\right ] _{v\rightarrow v^{\prime }}\) is the coordinates of \(\vec {e}_{1}\) w.r.t. basis \(v^{\prime }\). i.e. \begin {equation} \vec {e}_{1}=c_{11}\vec {e}_{1}^{\prime }+c_{21}\vec {e}_{2}^{\prime }\tag {4} \end {equation} and column 2 of \(\left [ M\right ] _{v\rightarrow v^{\prime }}\) is the coordinates of \(\vec {e}_{2}\) w.r.t. basis \(v^{\prime }\). i.e. \begin {equation} \vec {e}_{2}=c_{12}\vec {e}_{1}^{\prime }+c_{22}\vec {e}_{2}^{\prime }\tag {5} \end {equation} But (4) is\[\begin {bmatrix} 1\\ 0 \end {bmatrix} =c_{11}\begin {bmatrix} 1\\ 1 \end {bmatrix} +c_{21}\begin {bmatrix} -1\\ 1 \end {bmatrix} \] Hence \(1=c_{11}-c_{21}\) and \(0=c_{11}+c_{21}\), solving, we obtain \(c_{11}=\frac {1}{2},c_{21}=\frac {-1}{2}\) and (5) is\[\begin {bmatrix} 0\\ 1 \end {bmatrix} =c_{12}\begin {bmatrix} 1\\ 1 \end {bmatrix} +c_{22}\begin {bmatrix} -1\\ 1 \end {bmatrix} \] Hence \(0=c_{12}-c_{22}\) and \(1=c_{12}+c_{22}\), solving we obtain \(c_{12}=\frac {1}{2}\), \(c_{22}=\frac {1}{2}\), hence\[ \left [ M\right ] _{v\rightarrow v^{\prime }}=\begin {bmatrix} \frac {1}{2} & \frac {1}{2}\\ \frac {-1}{2} & \frac {1}{2}\end {bmatrix} \] Now we can use the above change of basis matrix to find coordinates of the vector under \(v^{\prime }\) given its coordinates under \(v\). For example, given \(\mathbf {x}_{v}=\begin {bmatrix} 10\\ 5 \end {bmatrix} \), then \[ \mathbf {x}_{v^{\prime }}=\begin {bmatrix} \frac {1}{2} & \frac {1}{2}\\ \frac {-1}{2} & \frac {1}{2}\end {bmatrix}\begin {bmatrix} 10\\ 5 \end {bmatrix} =\begin {bmatrix} 7.5\\ -2.5 \end {bmatrix} \]
Example 1 This example from Strang book (Linear Algebra and its applications, 4th edition), but shown here with little more details.
Consider \(\mathbb {R} ^{2}\), and let \(T\) be the projection onto a line at angle \(\theta =135^{0}\). Let the first basis \(v=\left \{ \begin {bmatrix} 1\\ 0 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \) and let the second basis \(v^{\prime }=\left \{ \begin {bmatrix} 1\\ -1 \end {bmatrix} ,\begin {bmatrix} 1\\ 1 \end {bmatrix} \right \} \).
The first step is to find \(\left [ M\right ] _{v^{\prime }\rightarrow v}\). The first column of \(\left [ M\right ] _{v^{\prime }\rightarrow v}\) is found by writing \(e_{1}^{\prime }\)w.r.t. basis \(v\), and the second column of \(\left [ M\right ] _{v^{\prime }\rightarrow v}\) is found by writing \(e_{2}^{\prime }\)w.r.t. basis \(v\), Hence \begin {align*} e_{1}^{\prime } & =c_{11}e_{1}+c_{21}e_{2}\\\begin {bmatrix} 1\\ -1 \end {bmatrix} & =c_{11}\begin {bmatrix} 1\\ 0 \end {bmatrix} +c_{21}\begin {bmatrix} 0\\ 1 \end {bmatrix} \end {align*}
Or \(1=c_{11}\) and \(c_{21}=-1\). And\begin {align*} e_{2}^{\prime } & =c_{12}e_{1}+c_{22}e_{2}\\\begin {bmatrix} 1\\ 1 \end {bmatrix} & =c_{12}\begin {bmatrix} 1\\ 0 \end {bmatrix} +c_{22}\begin {bmatrix} 0\\ 1 \end {bmatrix} \end {align*}
Hence \(c_{12}=1\) and \(c_{22}=1\). Hence \[ \left [ M\right ] _{v^{\prime }\rightarrow v}=\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \] Now we need to find \(\left [ T\right ] _{v}\) the representation of \(T\) w.r.t. basis \(v\). The first column of \(\left [ T\right ] _{v}\) is the new coordinates of the basis \(v_{1}\) after applying \(T\) onto it. but \begin {align*} Te_{1} & =T\begin {bmatrix} 0\\ 1 \end {bmatrix} \\ & =\begin {bmatrix} 0.5\\ -0.5 \end {bmatrix} \end {align*}
and the second column of \(\left [ T\right ] _{v}\) is the new coordinates of the basis \(v_{2}\) after applying \(T\) onto it. but \begin {align*} Te_{2} & =T\begin {bmatrix} 1\\ 0 \end {bmatrix} \\ & =\begin {bmatrix} -0.5\\ 0.5 \end {bmatrix} \end {align*}
Hence \[ \left [ T\right ] _{v}=\begin {bmatrix} 0.5 & -0.5\\ -0.5 & 0.5 \end {bmatrix} \] Therefore\begin {align*} \lbrack T]_{v^{\prime }} & =[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\\ & =\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} ^{-1}\begin {bmatrix} 0.5 & -0.5\\ -0.5 & 0.5 \end {bmatrix}\begin {bmatrix} 1 & 1\\ -1 & 1 \end {bmatrix} \\ \left [ T\right ] _{v^{\prime }} & =\begin {bmatrix} 1 & 0\\ 0 & 0 \end {bmatrix} \end {align*}
Hence we see that the linear transformation \(T\) is represented as \(\begin {bmatrix} 1 & 0\\ 0 & 0 \end {bmatrix} \) w.r.t. basis \(v^{\prime }\), while it is represented as \(\begin {bmatrix} 0.5 & -0.5\\ -0.5 & 0.5 \end {bmatrix} \)w.r.t. basis \(v^{\prime }\). Therefore, it will be better to perform all calculations involving this linear transformation under basis \(v^{\prime }\) instead of basis \(v\)
Consider \(\mathbb {R} ^{2}\), and let \(T\) be a rotation of space by \(\theta =\frac {\pi }{4}\). Let the first basis \(v=\left \{ \begin {bmatrix} 1\\ 0 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \) and let the second basis be \(v^{\prime }=\left \{ \begin {bmatrix} 1\\ 1 \end {bmatrix} ,\begin {bmatrix} -1\\ 1 \end {bmatrix} \right \} \).
The first step is to find \(\left [ M\right ] _{v^{\prime }\rightarrow v}\). The first column of \(\left [ M\right ] _{v^{\prime }\rightarrow v}\) is found by writing \(e_{1}^{\prime }\)w.r.t. basis \(v\), and the second column of \(\left [ M\right ] _{v^{\prime }\rightarrow v}\) is found by writing \(e_{2}^{\prime }\)w.r.t. basis \(v\), Hence \begin {align*} e_{1}^{\prime } & =c_{11}e_{1}+c_{21}e_{2}\\\begin {bmatrix} 1\\ 1 \end {bmatrix} & =c_{11}\begin {bmatrix} 1\\ 0 \end {bmatrix} +c_{21}\begin {bmatrix} 0\\ 1 \end {bmatrix} \end {align*}
Or \(1=c_{11}\) and \(c_{21}=1\) and\begin {align*} e_{2}^{\prime } & =c_{12}e_{1}+c_{22}e_{2}\\\begin {bmatrix} -1\\ 1 \end {bmatrix} & =c_{12}\begin {bmatrix} 1\\ 0 \end {bmatrix} +c_{22}\begin {bmatrix} 0\\ 1 \end {bmatrix} \end {align*}
Hence \(c_{12}=-1\) and \(c_{22}=1\). Hence \[ \left [ M\right ] _{v^{\prime }\rightarrow v}=\begin {bmatrix} 1 & -1\\ 1 & 1 \end {bmatrix} \] Now we need to find \(\left [ T\right ] _{v}\) the representation of \(T\) w.r.t. basis \(v\). The first column of \([T]_{v}\) is the new coordinates of the basis \(v_{1}\) after applying \(T\) onto it. but \begin {align*} Te_{1} & =T\begin {bmatrix} 0\\ 1 \end {bmatrix} \\ & =\begin {bmatrix} \cos \theta \\ \sin \theta \end {bmatrix} \end {align*}
and the second column of \(\left [ T\right ] _{v}\) is the new coordinates of the basis \(v_{2}\) after applying \(T\) onto it. but \begin {align*} Te_{2} & =T\begin {bmatrix} 1\\ 0 \end {bmatrix} \\ Te_{2} & =\begin {bmatrix} -\sin \theta \\ \cos \theta \end {bmatrix} \end {align*}
Hence \begin {align*} \left [ T\right ] _{v} & =\begin {bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end {bmatrix} \\ & =\begin {bmatrix} \cos \frac {\pi }{4} & -\sin \frac {\pi }{4}\\ \sin \frac {\pi }{4} & \cos \frac {\pi }{4}\end {bmatrix} \\ \left [ T\right ] _{v} & =\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \end {align*}
Therefore\begin {align*} \lbrack T]_{v^{\prime }} & =[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\\ & =\begin {bmatrix} 1 & -1\\ 1 & 1 \end {bmatrix} ^{-1}\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix}\begin {bmatrix} 1 & -1\\ 1 & 1 \end {bmatrix} \\ & =\begin {bmatrix} \frac {1}{2} & \frac {1}{2}\\ -\frac {1}{2} & \frac {1}{2}\end {bmatrix}\begin {bmatrix} 0 & -\sqrt {2}\\ \sqrt {2} & 0 \end {bmatrix} \\ & =\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \end {align*}
Hence we see that the linear transformation \(T\) is represented as \(\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \) w.r.t. basis \(v^{\prime }\), while it is represented as \(\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \)w.r.t. basis \(v^{\prime }\). Therefore the change of basis selected above did not result in making the representation of \(T\) any simpler (in fact there was no change in the representation). This means we need to find a more direct method of finding the basis under which \(T\) has the simplest representation. Clearly we can’t just keep trying different basis to find if \(T\) has simpler representation under the new basis.
Our goal of simpler representation of \(T\) is that of a diagonal matrix or as close as possible to being diagonal matrix (i.e. Jordan form). Given \(\left [ T\right ] _{v}\) and an infinite number of basis \(v^{\prime }\) that we could select to represent \(T\) under in the hope we can find a new representation \(\left [ T\right ] _{v^{\prime }}\) such that it is simpler than \(\left [ T\right ] _{v}\), we now ask, how does one go about finding such basis?
It turns out the if we select the eigenvectors of \(\left [ T\right ] _{v}\) as the columns of \(\left [ M\right ] _{v^{\prime }\rightarrow v}\), this will result in \(\left [ T\right ] _{v^{\prime }}\) being diagonal or as close to being diagonal as possible (block diagonal or Jordan form).
Let us apply this to the second example in the previous section. In that example, we had\[ \left [ T\right ] _{v}=\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \] The eigenvectors of \(\left [ T\right ] _{v}\) are \(\begin {bmatrix} -i\\ 1 \end {bmatrix} \) and \(\begin {bmatrix} i\\ 1 \end {bmatrix} \), \(\left [ M\right ] _{v^{\prime }\rightarrow v}=\begin {bmatrix} -i & i\\ 1 & 1 \end {bmatrix} \). Therefore\begin {align*} \lbrack T]_{v^{\prime }} & =[M]_{v^{\prime }\rightarrow v}^{-1}[T]_{v}[M]_{v^{\prime }\rightarrow v}\\ & =\begin {bmatrix} -i & i\\ 1 & 1 \end {bmatrix} ^{-1}\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix}\begin {bmatrix} -i & i\\ 1 & 1 \end {bmatrix} \\ \left [ T\right ] _{v^{\prime }} & =\begin {bmatrix} \left ( \frac {1}{2}-\frac {1}{2}i\right ) \sqrt {2} & 0\\ 0 & \left ( \frac {1}{2}+\frac {1}{2}i\right ) \sqrt {2}\end {bmatrix} \end {align*}
Which is diagonal representation. Hence we see that the linear transformation \(T\) is represented as \(\begin {bmatrix} \left ( \frac {1}{2}-\frac {1}{2}i\right ) \sqrt {2} & 0\\ 0 & \left ( \frac {1}{2}+\frac {1}{2}i\right ) \sqrt {2}\end {bmatrix} \) w.r.t. basis \(v^{\prime }\)
To obtain \(B=M^{-1}AM\) such that \(B\) is real and diagonal requires that \(A\) be real and symmetric. The eigenvalues of \(A\) goes into the diagonal of \(B\) and the eigenvectors of \(A\) go into the columns of \(M.\) This is an algebraic view.
Geometrically, \(A\) is viewed as the matrix representation under some basis \(v\) of a linear transformation \(T\). And \(B\) is the matrix representation of the same linear transformation \(T\) but now under a new basis \(v^{\prime }\) and \(M\) is the matrix that represents the change of basis from \(v^{\prime }\) to \(v\).
The question then immediately arise: If \(A\) must be real and symmetric (for \(B\) to be real and diagonal), what does this mean in terms of the linear transformation \(T\) and change of basis matrix \(M\)? This clearly mean that, under some basis \(v\), not every linear transformation represented by \(A\) can have a similar matrix \(B\) which is real and diagonal. Only those linear transformations which result in real and symmetric representation can have a similar matrix which is real and diagonal. This is shown in the previous examples, where we saw that \(T\) defined as rotation by \(45^{0}\) under the standard basis \(v=\left \{ \begin {bmatrix} 1\\ 0 \end {bmatrix} ,\begin {bmatrix} 0\\ 1 \end {bmatrix} \right \} \) resulted in \(A=\begin {bmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}}\\ \frac {1}{\sqrt {2}} & \frac {1}{\sqrt {2}}\end {bmatrix} \) and since this is not symmetric, hence we will not be able to factor this matrix using similarity transformation to a diagonal matrix, no matter which change of basis we try to represent \(T\) under. The question then, under a given basis \(v\) what is the class of linear transformation which leads to a matrix representation that is symmetric? One such example was shown above which is the projection into the line at \(135^{0}\). This question needs more time to look into.
We now write \(\Lambda \) to represent a diagonal matrix, hence the similarity transformation above can be written as\[ \Lambda =M^{-1}AM \] Or\[ A=M\Lambda M^{-1}\]
Using similarity transformation, we found that for a real and symmetric matrix \(A\) we are able to decompose it as \(A=M\Lambda M^{-1}\) where \(\Lambda \) is diagonal and contains the eigenvalues of \(A\) on the diagonal, and \(M\) contains the right eigenvectors of \(A\) in its columns.
Right eigenvectors are the standard eigenvectors we have been talking about all the time. When \(\lambda \) is an eigenvalues of \(A\) then \(\mathbf {x}\) is a right eigenvector when we write\[ A\mathbf {x}=\lambda \mathbf {x}\] However, \(\mathbf {x}\) is a left eigenvector of \(A\) when we have\[ \mathbf {x}^{H}A=\lambda \mathbf {x}^{H}\] where \(\mathbf {x}^{H}\) is the Hermitian of \(\mathbf {x}\)
Given any arbitrary matrix \(A_{m\times n}\) it can be factored into 3 matrices as follows \(A_{m\times n}=P_{m\times m}D_{m\times n}Q_{n\times n}\) where \(P\) is a unitary matrix (\(P^{H}=P^{-1}\) or \(P^{H}P=I\)), and \(Q\) is also unitary matrix.
These are the steps to do SVD
The following diagram help illustrate the SVD process described above.
As a conclusion, the following diagram shows the difference between similarity transformation factorization of a matrix and SVD.
Trying to understand SVD from geometrical point view requires more time, and this is left for future research into this subject.