The test problem used is \[ \frac{\partial ^{2}u}{\partial x^{2}}+\frac{\partial ^{2}u}{\partial y^{2}}=0 \] on the unit square \(\left ( 0,1\right ) \times \left ( 0,1\right ) \) with zero boundary conditions.
The problem is solved using the conjugate gradient (CG) iterative solver. The mesh spacings used are \[ h=\left \{ \frac{1}{16},\frac{1}{32},\frac{1}{64},\frac{1}{128}\right \} \] and the tolerance used to check for convergence is \[ \varepsilon =10^{-6}\] The solver terminates when the mesh norm of the residual falls below this tolerance, using the check
\[ \sqrt{h}\left \Vert r^{\left ( k\right ) }\right \Vert _{2}<\varepsilon \] where \(r^{\left ( k\right ) }\) is the residual at the \(k^{th}\) iteration and \(h\) is the current mesh spacing.
The reason for using zero as the forcing term on the right-hand side of the PDE is that it allows the error to be computed and tracked at each iteration, since the exact solution \(u_{exact}\) of this problem is known to be zero. The error at iteration \(k\) is then \begin{align*} e^{\left ( k\right ) } & =\left \Vert u_{exact}-u^{\left ( k\right ) }\right \Vert \\ & =\left \Vert u^{\left ( k\right ) }\right \Vert \end{align*} where \(u^{\left ( k\right ) }\) is the approximate solution at the \(k^{th}\) iteration.
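Concretely, the convergence test and error tracking above amount to the following check inside the solver (a minimal Matlab sketch; here r and u are the residual and iterate at the current iteration, h is the mesh spacing, and tol is the tolerance \(10^{-6}\)):

resNorm = sqrt(h) * norm(r, 2);    % mesh norm of the residual
errNorm = sqrt(h) * norm(u, 2);    % mesh norm of the error: u_exact = 0, so e = ||u||
if resNorm < tol                   % convergence test used by the solver
    converged = true;
end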
A Matlab function \(CG.m\) was written to implement the conjugate gradient solver. One of the parameters this function accepts is the name of the preconditioner to use. The following preconditioners are supported\[ \text{NONE,\ SSOR,\ MultiGrid,\ IncompleteCholesky}\] When NONE is specified, no preconditioning is done.
For each preconditioner, the solver was run on the above test problem. The initial guess \(u^{\left ( 0\right ) }\) was generated using the Matlab rand() function.
For each preconditioner the following plots were generated
In addition to these plots, for each mesh spacing \(h\) the full result table is printed, tabulating the above values at each iteration; these tables were used to generate the plots. The printed tables also show the ratio of the residual norm at the current iteration to its value at the previous iteration, and similarly for the error norm.
Due to the large size of these tables, the tables for all the spacings and for each solver are available in the appendix.
Conjugate gradient algorithm description
The idea of preconditioning is to use a preconditioning matrix \(M\) to speed up the convergence of the conjugate gradient method. The original problem\[ Ax=f \] is transformed into a new problem\[ M^{-1}Ax=M^{-1}f \] such that \(M^{-1}A\) has a smaller condition number than \(A\). For most iterative solvers, the rate of convergence increases as the condition number of the system matrix decreases.
The conjugate gradient method applies only to symmetric positive definite matrices \(A\), and its speed of convergence is governed by the distribution of the eigenvalues of \(A\). The convergence estimate is most accurate when the eigenvalue distribution is uniform; for the discrete 2D Poisson problem this is the case, as verified by the eigenvalue spectrum plots generated below for each case.
Preconditioning is used to modify the spectrum of \(A\) so that the eigenvalues of the new system matrix \(M^{-1}A\) become more clustered; this lowers the condition number and thus increases the convergence rate.
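To illustrate this effect numerically, condition number estimates of the kind reported below can be obtained along the following lines (a sketch only, assuming the standard 5-point Laplacian built here with Matlab's gallery('poisson',n); cholinc is the older name used in this report, ichol in newer Matlab versions):

n = 15;                        % interior points per side, so h = 1/16 and N = 225
A = gallery('poisson', n);     % sparse 5-point Laplacian, symmetric positive definite

kA = condest(A);               % estimated condition number of A

R = cholinc(A, 1e-2);          % incomplete Cholesky factor, M = R'*R, drop tolerance 1e-2
MinvA = R \ (R' \ A);          % apply M^{-1} to A column by column
kMA = condest(MinvA);          % estimated condition number of M^{-1}A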
The following table was generated to show the effect of preconditioning on lowering the condition number. It shows the condition number for CG with no preconditioning (in other words, for the \(A\) matrix alone), and the condition number of \(M^{-1}A\) for the different preconditioners as the mesh spacing is changed. Next to each condition number it also shows, in parentheses, the maximum and minimum eigenvalues. Notice that applying the ratio \(\frac{\left \vert \max \lambda \right \vert }{\left \vert \min \lambda \right \vert }\) to these values will not reproduce the condition number shown: the condition numbers were not computed from the eigenvalue ratio but estimated with the Matlab function condest(), which estimates the 1-norm condition number of these sparse matrices.
solver | \(h=\frac{1}{16},N=225\) | \(h=\frac{1}{32},N=961\) | \(h=\frac{1}{64},N=3969\) | \(h=\frac{1}{128},N=16129\) |
NONE | \(150,(7.923,0.0768)\) | \(603,(7.9807,0.01926)\) | \(2413,(7.9995,0.0048)\) | \(9655,(7.9987,0.001205)\) |
\(SSOR\) | \(33,(0.25,0.018203)\) | \(128,(0.25,0.004748)\) | \(511,(0.25,0.0012)\) | |
IncCholesky \(\varepsilon =10^{-2}\) | \(32,(2.365,0.2279)\) | \(108,(2.44,0.0585)\) | \(422,(2.537,0.0146)\) | |
IncCholesky \(\varepsilon =10^{-3}\) | \(48,(2.337,0.508)\) | \(153,(2.526,0.1603)\) | \(373,(2.592,0.041756)\) | |
The diagram below reflects the results of the above table, clearly showing the reduction of the condition number achieved by preconditioning.
CG Algorithm pseudocode
The following is the algorithm used for the implementation of conjugate gradient with preconditioning.\[\begin{array} [c]{l}Input:\ A,f,tol,preconditionSolverName,dropTol\\ Output:\ \tilde{u}\ approximate\ solution\ to\ Au=f\\ \\ u_{0}=\operatorname{rand}()\ (\ast \ initial\ guess\ of\ solution\ \ast )\\ r_{0}=f-Au_{0}\ (\ast \ initial\ residual\ \ast )\\ z_{0}\leftarrow CALL\ preconditionSolver(r_{0},A,preconditionSolverName,dropTol)\\ p_{0}=z_{0}\\ FOR\ k=1,2,3,\cdots \\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}e_{k-1}=\sqrt{h}\left \Vert u_{k-1}\right \Vert _{2}\ \ (\ast \ the\ error,\ since\ u_{exact}\ is\ known\ to\ be\ zero\ \ast )\\ \omega _{k-1}=Ap_{k-1}\\ \alpha _{k-1}=\frac{z_{k-1}^{T}r_{k-1}}{p_{k-1}^{T}\omega _{k-1}}\\ u_{k}=u_{k-1}+\alpha _{k-1}p_{k-1}\\ r_{k}=r_{k-1}-\alpha _{k-1}\omega _{k-1}\\ IF\ \sqrt{h}\left \Vert r_{k}\right \Vert _{2}<tol\ THEN\\ \ \ \ \ \ RETURN\ u_{k}\\ END\ IF\\ z_{k}\leftarrow CALL\ preconditionSolver(r_{k},A,preconditionSolverName,dropTol)\\ \beta _{k-1}=\frac{z_{k}^{T}r_{k}}{z_{k-1}^{T}r_{k-1}}\\ p_{k}=z_{k}+\beta _{k-1}p_{k-1}\end{array} \\ END\ FOR \end{array} \] The algorithm for the function \(preconditionSolver()\) is as follows\[\begin{array} [c]{l}Input:\ r,A,preconditionSolverName,dropTol\\ Output:\ z\ approximate\ solution\ to\ Mz=r\\ CASE\ preconditionSolverName\ IS\\ \ \ \ \ \ \ \ \ \begin{array} [c]{l}WHEN\ NONE\ \ z\leftarrow r\ \ \ \ \ \ \ \ \ \ //\ no\ preconditioning\\ WHEN\ MultiGrid\\ \ \ \ \ \ \ \begin{array} [c]{l}\mu _{1}=\mu _{2}=1\ (\ast \ presmoother\ and\ postsmoother\ \ast )\\ z\leftarrow CALL\ VCYCLE(zeroInitialGuess,r)\\ \ \ \ //VCYCLE\ is\ the\ one\ implemented\ in\ HW4,\ changed\ to\ do\\ \ \ \ //one\ forward\ Gauss-Seidel/red-black\ sweep\ followed\ by\\ \ \ \ //one\ reverse\ Gauss-Seidel/red-black\ sweep\end{array} \\ WHEN\ SSOR\\ \ \ \ \ \ \ z\leftarrow CALL\ SOR\ forward\ followed\ by\ SOR\ in\ reverse\\ WHEN\ IncompleteCholesky\\ \ \ \ \ \ \ \begin{array} [c]{l}R=cholinc(A,dropTol)\\ z\leftarrow R\backslash (R^{T}\backslash r)\end{array} \end{array} \\ END\ CASE\\ RETURN\ z\end{array} \]
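The following is a minimal Matlab sketch of the preconditioned CG loop above (illustrative only; the function name pcg_sketch and the handle precondSolve are stand-ins for the report's CG.m and preconditionSolver() routines):

function u = pcg_sketch(A, f, h, tol, precondSolve)
% Minimal preconditioned conjugate gradient, following the pseudocode above.
% precondSolve(r) must return z, an approximate solution of M z = r.
u = rand(size(f));                 % random initial guess, as in the report
r = f - A*u;                       % initial residual
z = precondSolve(r);
p = z;
for k = 1:length(f)                % at most N iterations in exact arithmetic
    w     = A*p;
    alpha = (z'*r) / (p'*w);
    u     = u + alpha*p;
    rnew  = r - alpha*w;
    if sqrt(h)*norm(rnew) < tol    % mesh norm of the residual below the tolerance
        return
    end
    znew = precondSolve(rnew);
    beta = (znew'*rnew) / (z'*r);
    p    = znew + beta*p;
    r = rnew;  z = znew;
end
end

For example, no preconditioning corresponds to precondSolve = @(r) r, and incomplete Cholesky to precondSolve = @(r) R\(R'\r) with R = cholinc(A, dropTol), matching the cases in the pseudocode above.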
In addition to finding the number of iterations needed for convergence by each solver, the problem also asked to compare the efficiency of each solver. This is done by finding the work needed by each solver to converge.
Work needed is defined as\[ Work=NumberOfIterations\ \times \ WorkPerIteration \] Before determining the work for each solver, the following table lists the \(\boldsymbol{cputime}\) (in CPU seconds) used by each solver for the different spacings. The CPU time is measured with the Matlab cputime function and covers only the call to CG(), excluding other calls such as plotting.
preconditioning | \(\begin{array} [c]{l}h=\frac{1}{16}\\ N=225 \end{array} \) | \(\begin{array} [c]{l}h=\frac{1}{32}\\ N=961 \end{array} \) | \(\begin{array} [c]{l}h=\frac{1}{64}\\ N=3969 \end{array} \) | \(\begin{array} [c]{l}h=\frac{1}{128}\\ N=16129 \end{array} \) |
NONE | \(0.19\) | \(0.37\) | \(13.48\) | \(556.6\) |
Multigrid | \(0.34\) | \(0.6\) | \(13.48\) | \(566\) |
SSOR | \(0.23\) | \(0.5\) | \(13.3\) | \(564.6\) |
Incomplete Cholesky\(\ \varepsilon =10^{-2}\) | \(0.16\) | \(0.34\) | \(13.8\) | \(559\) |
Incomplete Cholesky\(\ \varepsilon =10^{-3}\) | \(0.19\) | \(0.5\) | \(13.3\) | \(559\) |
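The timings in the table above were collected roughly as follows (a sketch; the argument list of CG() is assumed here to match the inputs listed in the pseudocode above):

t0 = cputime;                             % start of CPU time measurement
u  = CG(A, f, 1e-6, 'SSOR', dropTol);     % hypothetical call; only CG() is timed
t  = cputime - t0;                        % CPU seconds reported in the table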
Surprisingly, no appreciable difference can be seen between the different solvers in terms of CPU time. It was expected that NONE would have the largest CPU time, as it has the lowest efficiency. This result can be attributed to the values of \(N\) used, which were not large enough to reflect the asymptotic difference; much larger values of \(N\) would be needed to see the effect of preconditioning on CPU time. Due to memory limitations, this was not possible at this time. The work per iteration is analyzed next.
All solvers perform similar work per iteration except for the step that applies the preconditioning to determine \(z_{k}\); the only difference between using no preconditioning and using a preconditioner is the step that solves \(Mz=r\) for \(z\). Taking the work per iteration of the base CG with no preconditioning as \(O\left ( N\right ) \), the work per iteration for each solver can be defined as follows:
From the lecture notes, the error in conjugate gradient (with no preconditioning) satisfies\[ \left \Vert e_{k}\right \Vert _{A}\leq 2\left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\left \Vert e_{0}\right \Vert _{A}\] where \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). It was shown that \(\Bbbk \left ( A\right ) =O\left ( h^{-2}\right ) \) for this problem, where \(h\) is the mesh spacing. Hence, for a fixed tolerance, which is the case here, the number of iterations is \(O\left ( \sqrt{\Bbbk \left ( A\right ) }\right ) =O\left ( h^{-1}\right ) =O\left ( N^{1/2}\right ) \) where \(N\) is the number of unknowns.
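For completeness, this estimate follows directly from the bound above: to reduce the error by a fixed factor \(\varepsilon \) it suffices that\[ \left ( \frac{\sqrt{\Bbbk \left ( A\right ) }-1}{\sqrt{\Bbbk \left ( A\right ) }+1}\right ) ^{k}\leq \frac{\varepsilon }{2}\quad \Longrightarrow \quad k\gtrsim \frac{\sqrt{\Bbbk \left ( A\right ) }}{2}\ln \frac{2}{\varepsilon }=O\left ( \sqrt{\Bbbk \left ( A\right ) }\right ) \]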
The results of the numerical experiments agree with this estimate, as shown below for the case of NONE (conjugate gradient with no preconditioning).
The following table was generated from the results of running the program. In this table, \(N\) is the number of unknowns, \(\Bbbk \left ( M^{-1}A\right ) \) is the condition number of \(M^{-1}A\), and \(\Bbbk \left ( A\right ) \) is the condition number of \(A\). Since sparse matrices are used, the Matlab function condest() was used to estimate the condition numbers. In this table, I.C. means Incomplete Cholesky.
preconditioner | iterations \(\left ( h=\frac{1}{16},\ N=225\right ) \) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations \(\left ( h=\frac{1}{32},\ N=961\right ) \) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) |
NONE | \(42\) | \(N/A\) | \(150\) | \(82\) | \(N/A\) | \(603\) |
Multigrid | \(4\) | | \(150\) | \(4\) | | \(603\) |
SSOR | \(18\) | \(33\) | \(150\) | \(30\) | \(128\) | \(603\) |
IC \(\varepsilon =10^{-2}\) | \(7\) | \(32\) | \(150\) | \(13\) | \(108\) | \(603\) |
IC \(\varepsilon =10^{-3}\) | \(4\) | \(48\) | \(150\) | \(6\) | \(153\) | \(603\) |
preconditioner | iterations \(\left ( h=\frac{1}{64},\ N=3969\right ) \) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) | iterations \(\left ( h=\frac{1}{128},\ N=16129\right ) \) | \(\Bbbk \left ( M^{-1}A\right ) \) | \(\Bbbk \left ( A\right ) \) |
NONE | \(157\) | \(N/A\) | \(2413\) | \(291\) | \(N/A\) | \(9655\) |
Multigrid | \(4\) | | \(2413\) | \(4\) | | \(9655\) |
SSOR | \(56\) | \(511\) | \(2413\) | \(103\) | Memory problem | \(9655\) |
IC \(\varepsilon =10^{-2}\) | \(22\) | \(422\) | \(2413\) | \(39\) | Memory problem | \(9655\) |
IC \(\varepsilon =10^{-3}\) | \(8\) | \(373\) | \(2413\) | \(14\) | Memory problem | \(9655\) |
The following is a plot that represents the above results.
From the above, one can see that Multigrid requires \(O\left ( 1\right ) \) iterations: the number of iterations was \(4\) for all cases from \(h=\frac{1}{16}\) to \(h=\frac{1}{128}\). For SSOR, the number of iterations grew sublinearly in \(N\); from the above it can be estimated as \(O\left ( N^{1/4}\right ) \). With no preconditioning, the number of iterations grew approximately as \(O\left ( N^{\frac{1}{2}}\right ) \), as predicted by the earlier analytical result.
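As a quick check of the \(O\left ( N^{1/2}\right ) \) estimate, the growth exponent can be fitted from the unpreconditioned iteration counts tabulated above (a small Matlab sketch using those values):

N     = [225 961 3969 16129];            % unknowns for h = 1/16, 1/32, 1/64, 1/128
iters = [42 82 157 291];                 % CG iterations with no preconditioning (from the tables)
p     = polyfit(log(N), log(iters), 1);  % least-squares fit of log(iters) against log(N)
p(1)                                     % slope, approximately 0.45, close to the predicted 1/2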
The use of preconditioning on \(A\) reduced the number of iterations needed to converge to the same fixed tolerance, compared with no preconditioning. This is due to the reduction of the condition number of the system matrix, as seen in the above table: by clustering the eigenvalues and reducing the ratio of the largest to the smallest eigenvalue, the rate of convergence increased. However, preconditioning also adds extra cost per iteration. This extra work was also of order \(N\), and hence the final efficiency is governed by the number of iterations for large \(N\).
Using the formula\[ \text{Work}=\text{NumberOfIterations}\ \times \ \text{WorkPerIteration}\] the work efficiency of the main solvers is summarized below.
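Based on the iteration growth rates estimated above and the \(O\left ( N\right ) \) work per iteration, the asymptotic work is (a summary derived from the estimates above, not an exact operation count)\[ \begin{array} [c]{lll}\text{NONE} & O\left ( N^{1/2}\right ) \times O\left ( N\right ) & =O\left ( N^{3/2}\right ) \\ \text{SSOR} & O\left ( N^{1/4}\right ) \times O\left ( N\right ) & =O\left ( N^{5/4}\right ) \\ \text{Multigrid} & O\left ( 1\right ) \times O\left ( N\right ) & =O\left ( N\right ) \end{array} \]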
Incomplete Cholesky was not included, as it was hard to estimate its number of iterations from the curve above, and because it depends on the drop tolerance value.
The above analysis shows that Multigrid is the most efficient, followed by Incomplete Cholesky (which depends on the drop tolerance), followed by SSOR, and finally by CG with no preconditioning.
In conclusion, using Multigrid as the preconditioner for conjugate gradient appears to be the most effective solver.
Plots for h=1/16
Plots for h=1/128
Plots for h=1/32
Plots for h=1/64
Plots for h=1/16
Plots for h=1/32
Plots for h=1/64
Plots for h=1/128
Plots for h=1/16
Plots for h=1/32
Plots for h=1/64
Plots for h=1/128
Tables for CG with no preconditioner
Table for CG with no preconditioner \(h=\frac{1}{16}\)
Tolerance=\(10^{-6}\) method=NONE, Iterations = \(42\), condition number A=\(150.4167\)
Table for CG with no preconditioner \(h=\frac{1}{32}\)
Tolerance=\(10^{-6}\) method=NONE, Iterations =\(82\), condition number A=\(603\)
Table for CG with no preconditioner \(h=\frac{1}{64}\)
Tolerance=\(10^{-6}\) method=NONE, Iterations = \(157\), condition number A=\(2413\)
Table for CG with no preconditioner \(h=\frac{1}{128}\)
Tolerance=\(10^{-6}\) method=NONE, Iterations = \(291\), condition number A=\(9655\)
Tables for CG with Multigrid preconditioner
Tables for CG with SSOR preconditioner
Table for CG with SSOR preconditioner \(h=\frac{1}{16}\)
Table for CG with SSOR preconditioner \(h=\frac{1}{32}\)
Table for CG with SSOR preconditioner \(h=\frac{1}{64}\)
Table for CG with SSOR preconditioner \(h=\frac{1}{128}\)
Table for CG with incomplete Cholesky preconditioner \(\epsilon =10^{-2}\), \(h=\frac{1}{16}\)
Table for CG with incomplete Cholesky preconditioner \(\epsilon =10^{-2}\), \(h=\frac{1}{32}\)
Table for CG with incomplete Cholesky preconditioner \(\epsilon =10^{-2}\), \(h=\frac{1}{64}\)
Table for CG with incomplete Cholesky preconditioner \(\epsilon =10^{-2}\), \(h=\frac{1}{128}\)
Tables for CG with incomplete Cholesky preconditioner \(\epsilon =10^{-3}\)