Recent Algorithmic and Computational Efficiency Improvements in the NIMROD Code
S. J. Plimpton, Sandia National Laboratories
C. R. Sovinec and T. A. Gianakon, Los Alamos National Laboratory
S. E. Parker, University of Colorado-Boulder
and The NIMROD Team
presented at the
41st Annual Meeting of the American Physical Society, Division of Plasma Physics, November 15-19, 1999
OUTLINE
The NIMROD Code:
The NIMROD Team:
PRESENT TEAM MEMBERS AND ADVISORS
Ahmet Aydemir IFS
James Callen U-WI
Ming Chu GA
John Finn LANL
Tom Gianakon LANL
Charlson Kim CU-Boulder
Scott Kruger SAIC
Jean-Noel Leboeuf UCLA
Richard Nebel LANL
Scott Parker CU-Boulder
Steve Plimpton SNL
Nina Popova MSU
Dalton Schnack SAIC
Carl Sovinec LANL
Alfonso Tarditi SAIC
NIMROD FEATURES
EQUATIONS
NIMROD presently solves Maxwell's equations plus the following set of fluid moment relations:
where quasineutrality, n_e ≈ Z n_i, is assumed, ω_p is the plasma frequency, and ρ is the mass density.
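The display equations themselves did not survive conversion of this poster. For orientation only, a representative single-fluid resistive-MHD set of the kind referenced here, reconstructed from the standard NIMROD literature rather than recovered from the poster, is:

    \frac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{V}) = 0
    \rho\left(\frac{\partial\mathbf{V}}{\partial t} + \mathbf{V}\cdot\nabla\mathbf{V}\right) = \mathbf{J}\times\mathbf{B} - \nabla p + \nabla\cdot\mathbf{\Pi}
    \frac{n}{\gamma-1}\left(\frac{\partial T}{\partial t} + \mathbf{V}\cdot\nabla T\right) = -\frac{p}{2}\,\nabla\cdot\mathbf{V} - \nabla\cdot\mathbf{q}
    \mathbf{E} = -\mathbf{V}\times\mathbf{B} + \eta\,\mathbf{J}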
CHALLENGES
SOLVING ILL-CONDITIONED MATRICES
When applied to a high-temperature magnetized plasma, the NIMROD time advance requires the solution of an ill-conditioned matrix.
The largest eigenvalue of this matrix scales as (ω_max Δt)^2, arising from the fastest normal mode represented in the numerical system on a given grid. [The smallest eigenvalue is approximately unity.] When simulating slow resistive behavior, we may need to take time steps that are much larger than the Alfven time for propagation across the entire domain. Thus, the condition number may be very large.
We have recently found that global line-Jacobi preconditioning in the structured grid region scales well with problem size and time step. [See below.]
As a refresher, the Jacobi iteration scheme is the simplest matrix-splitting technique for solving a matrix equation. If we split the matrix A into L + D + U, where D is the diagonal and L and U are the strictly lower and upper triangles, respectively, Jacobi iterations find the solution of A x = b through

    x^{(k+1)} = D^{-1} [ b - (L + U) x^{(k)} ].

The scheme converges provided the magnitude of the largest eigenvalue of D^{-1}(L+U) is less than unity. Furthermore, a Jacobi iteration step may be used to precondition the conjugate gradient scheme.
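A minimal Python sketch of this splitting (numpy assumed; illustrative only, not NIMROD code):

    import numpy as np

    def jacobi_solve(A, b, tol=1e-8, max_it=1000):
        """Jacobi iteration: x <- D^{-1} [ b - (L+U) x ]."""
        D = np.diag(A)                  # diagonal D of the splitting
        R = A - np.diag(D)              # off-diagonal part, L + U
        x = np.zeros_like(b)
        for _ in range(max_it):
            x_new = (b - R @ x) / D
            if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(b):
                return x_new
            x = x_new
        return x

    # Diagonally dominant test case, so the spectral radius of
    # D^{-1}(L+U) is below unity and the iteration converges.
    A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
    print(jacobi_solve(A, np.array([1., 2., 3.])))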
The definition of D may be generalized to improve the effectiveness of each iteration step.
We have found that using the 1D matrices formed by matrix elements along the radial grid lines and along the azimuthal grid lines in the structured grid region produces an effective preconditioner when the semi-implicit matrix is ill-conditioned. [This forms two sets of 1D matrices, and the solutions from the two sets are averaged.]
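A hedged sketch of applying such a line-Jacobi preconditioner for a constant-coefficient 5-point operator on an m x m grid (scipy assumed; NIMROD forms its line matrices from the actual semi-implicit operator, so the stencil here is purely illustrative):

    import numpy as np
    from scipy.linalg import solve_banded

    def line_jacobi_apply(r, m, diag=4.0, off=-1.0):
        """Apply M^{-1} r, where M is built from 1D line matrices."""
        R = r.reshape(m, m)
        ab = np.zeros((3, m))        # banded storage for one tridiagonal line
        ab[0, 1:] = off              # superdiagonal
        ab[1, :] = diag              # diagonal
        ab[2, :-1] = off             # subdiagonal
        # Solves along one grid direction ("radial" lines) ...
        zx = np.stack([solve_banded((1, 1), ab, R[i, :]) for i in range(m)])
        # ... and along the other ("azimuthal" lines).
        zy = np.stack([solve_banded((1, 1), ab, R[:, j]) for j in range(m)],
                      axis=1)
        return 0.5 * (zx + zy).ravel()   # average the two sets of solutions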
This line-Jacobi preconditioning is most effective when the 1D solves are extended as far as possible across the grid. Data for each line may be distributed on different processors. We use nonblocking MPI point-to-point communication to make the exchanges illustrated in the following schematic:
[Schematic: three rblocks (rblock 1, rblock 2, rblock 3) stacked in the block decomposition, and three lines (line 1, line 2, line 3) spanning the blocks in the line decomposition.]
Decomposition swaps reorganize from the normal block pattern (above) to the line pattern (below) for global preconditioning, then swap back to the block pattern.
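A toy mpi4py sketch of such a block-to-line swap using nonblocking point-to-point messages; the array layout and variable names are assumptions for illustration, not NIMROD's data structures:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nproc = comm.Get_rank(), comm.Get_size()

    ny, nx = 8 * nproc, 8 * nproc        # global grid dimensions
    ncols = nx // nproc                  # columns owned in block mode
    block = np.random.rand(ny, ncols)    # this rank's block of data

    # Post nonblocking receives: rank p will own row strip p in line mode.
    rows = ny // nproc
    recv = [np.empty((rows, ncols)) for _ in range(nproc)]
    reqs = [comm.Irecv(recv[p], source=p, tag=7) for p in range(nproc)]

    # Send each rank the piece of our block lying in its row strip.
    send = [np.ascontiguousarray(block[p * rows:(p + 1) * rows, :])
            for p in range(nproc)]
    reqs += [comm.Isend(send[p], dest=p, tag=7) for p in range(nproc)]
    MPI.Request.Waitall(reqs)

    # Each rank now holds complete grid lines (full rows of the domain),
    # ready for global 1D line solves; the reverse swap restores blocks.
    lines = np.hstack(recv)              # shape (rows, nx)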
SCALING RESULTS
Fixed-problem-size scaling for global line-Jacobi preconditioning on the T3E for a 100x100 polar grid case. The reported times are the total time-step loop ("loop"), the line-swapping time ("line") including data copies and parallel communication, and the seam communication time ("seam"). Times are per time step, averaged over 5 steps. The efficiency is (1-block loop time)/(# processors * loop time). The number of blocks equals the number of processors in each case, and the nxbl/nybl decomposition is balanced.
NIMROD & AZTEC
Steve Plimpton
Sandia National Labs
Seattle APS Meeting - Nov. 1999
NIMROD support: Carl Sovinec
AZTEC support: Mike Heroux
Good News: Fully implemented in NIMROD 2.3.2
Bad News: Doesn't compete with NIMROD rblock line-solve methods
What is AZTEC?
• Parallel linear solver library
• Solves Ax = b
• Variety of preconditioners - Jacobi, ILU, ICC, ILUT, ...
• Variety of solvers - CG, GMRES, ...
• Point or vector unknowns - MSR, VBR matrix formats
• Real and complex
• Designed for unstructured grids, asymmetric matrices
• Highly parallel
• Developed at Sandia National Labs
• Similar in scope to Argonne's PETSc
Why Do This?
Key point: Ax = b is different in NIMROD vs. AZTEC
AZTEC into NIMROD: One-time Setup
• Function of grid connectivity and processor blocks
• Label unknowns uniquely from 1 -> N
• Assign unknowns to processors (see the sketch after this list):
  - interior nodes to the obvious processor
  - shared nodes to exactly one processor
• Determine the structure of matrix rows:
  - how many elements
  - what global columns they are in
• Set up data movement for A and x, b:
  - on-processor copying: reordering, F90 vs. C
  - off-processor communication: determine processor partners, what to send, what to receive, how to pack/unpack
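A toy Python sketch of the numbering and ownership step; the block/node representation here is an assumption for illustration, not NIMROD's actual data structures:

    def number_unknowns(blocks):
        """blocks: list of (proc, node_list); node ids on block seams
        appear in more than one list. Returns a global 1..N labeling
        and an owner map assigning each shared node to exactly one
        processor (lowest rank wins ties)."""
        global_id, owner, next_id = {}, {}, 1
        for proc, nodes in blocks:
            for node in nodes:
                if node not in global_id:     # first encounter: label it
                    global_id[node] = next_id
                    next_id += 1
                    owner[node] = proc
                else:                         # shared node: one owner only
                    owner[node] = min(owner[node], proc)
        return global_id, owner

    # Two blocks sharing seam node "s": it gets one id and one owner.
    gid, own = number_unknowns([(0, ["a", "b", "s"]), (1, ["s", "c"])])
    print(gid, own)   # ids 1..4; 's' is owned by processor 0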
AZTEC into NIMROD: Matrix Creation
• Create the input matrix for each AZTEC solve call
• On-processor copying into AZTEC format (see the MSR sketch below)
• Off-processor communication:
  - asynchronous communication
  - one message per partner processor
  - form entire rows of the matrix on one processor
• Call the AZTEC global -> local reordering routine
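AZTEC's point matrices use the MSR (modified sparse row) format. A small Python sketch of packing a dense local matrix into the MSR arrays; the layout follows the AZTEC users' guide, while the conversion routine itself is illustrative:

    import numpy as np

    def dense_to_msr(A):
        """Pack a local n x n matrix into MSR arrays (val, bindx):
        val[0:n] = diagonal, val[n] unused; bindx[0] = n+1, and the
        off-diagonal entries of row i live at positions
        bindx[i] .. bindx[i+1]-1 of both arrays."""
        n = A.shape[0]
        diag = [A[i, i] for i in range(n)]
        ptrs, cols, vals = [n + 1], [], []
        for i in range(n):
            for j in range(n):
                if i != j and A[i, j] != 0.0:
                    cols.append(j)
                    vals.append(A[i, j])
            ptrs.append(n + 1 + len(cols))    # end of row i's entries
        return np.array(diag + [0.0] + vals), np.array(ptrs + cols)

    A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
    val, bindx = dense_to_msr(A)
    print(val)    # [ 4.  4.  4.  0. -1. -1. -1. -1.]
    print(bindx)  # [ 4  5  7  8  1  0  2  1]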
AZTEC into NIMROD: Solve
Results
Test #1: 64 x 100 circular grid w/ center point, 6465 unknowns
Test #2: Circular grid w/ center pt & 2 external tblocks, 4494 unknowns
Test #3: Same as #2 with 10x smaller timestep
Test #1: 64x100 circular grid w/ center point, 6465 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           60    1000   2x10^-3       --     --       --
  NIM line-Jacobi      48     453   1x10^-6      3.3     48   1x10^-6
  AZ CG/Jacobi         84    1000   2x10^-3       --     --       --
  AZ CG/ICC           267    1000   7x10^-4      7.1     56   1x10^-6
  AZ GMRES/Jacobi     351    1000   5x10^-4       --     --       --
Test #2: 2 external tblocks + 48x32 circular grid w/ center point, 4494 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           51    1000   5x10^-4       --     --       --
  NIM line-Jacobi      21     241   1x10^-6      1.5     25   1x10^-6
  AZ CG/Jacobi         58    1000   5x10^-4       --     --       --
  AZ CG/ICC           153     789   1x10^-6      3.9     29   1x10^-6
  AZ GMRES/Jacobi     231    1000   9x10^-5      4.0     72   1x10^-6
  AZ GMRES/ICC        219     583   1x10^-6      4.1     26   1x10^-6
Test #3: same as #2 w/ 10x smaller time step, 4494 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           29     571   1x10^-6      1.6     40   1x10^-6
  NIM line-Jacobi     5.6      65   1x10^-6     0.73     12   1x10^-6
  AZ CG/Jacobi         34     573   1x10^-6      1.2     40   1x10^-6
  AZ CG/ICC            48     183   1x10^-6      3.1     15   1x10^-6
  AZ GMRES/Jacobi     149     657   1x10^-6      1.6     35   1x10^-6
  AZ GMRES/ICC         71     169   1x10^-6      3.2     14   1x10^-6
Comments on Results
What is hard for AZTEC?
• NIMROD's matrices are symmetric, but AZTEC doesn't take advantage of that:
  -> few symmetric preconditioners
• AZTEC doesn't know about the structured grid:
  -> treats the grid as unstructured
  -> no line-solve preconditioners
  -> will this be true for future NIMROD geometries?
• C slows down complex arithmetic vs. F90
What to try with AZTEC?
ANISOTROPIC THERMAL CONDUCTION
See "Simulations of Neoclassical Tearing Modes with NIMROD," by T. A. Gianakon
GENERALIZED BASIS ELEMENTS
The truncation error of the finite element method is intimately tied to interpolation errors of the chosen discrete space.
For piecewise polynomials of order k, the interpolation u_h of a sufficiently smooth function u satisfies

    || u - u_h ||_{s,2,Ω} <= C h^{k+1-s} | u |_{k+1,2,Ω},   s = 0, 1,

where h is a measure of the grid spacing, and || · ||_{s,2,Ω} and | · |_{k+1,2,Ω} are the norm and seminorm, respectively, of the Sobolev space over the domain Ω, with the first subscript indicating the order of differentiation. [Jiang, The Least-Squares Finite Element Method, Springer-Verlag, 1998, p. 64.]
Hence, the order of convergence depends on the degree of the polynomials used.
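A quick 1D numerical check of this estimate (a sketch; numpy assumed): the measured L2 interpolation error should fall as h^{k+1}, so quadratics gain one order over linears.

    import numpy as np

    u = np.sin                          # a smooth test function on [0, pi]
    for k, name in [(1, "linear"), (2, "quadratic")]:
        errs = []
        for n in (16, 32):              # two grid resolutions, h and h/2
            h = np.pi / n
            err2 = 0.0
            for e in range(n):          # element-by-element interpolation
                nodes = np.linspace(e * h, (e + 1) * h, k + 1)
                coef = np.polyfit(nodes, u(nodes), k)   # degree-k interpolant
                xq = np.linspace(e * h, (e + 1) * h, 20)
                err2 += np.mean((u(xq) - np.polyval(coef, xq)) ** 2) * h
            errs.append(np.sqrt(err2))
        rate = np.log2(errs[0] / errs[1])
        print(f"{name}: observed L2 order ~ {rate:.2f} (theory: k+1 = {k+1})")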
Until now, the NIMROD algorithm has been restricted to using only bilinear basis functions in regions of quadrilateral elements and linear basis functions in regions of triangular elements.
This restriction has led to at least two difficulties:
We expect that using quadratic basis elements will improve both issues significantly.
NIMROD is presently being modified to use any Lagrange-type element, where only first-order derivatives are square-integrable.
The generalized version of NIMROD is partially functional, and we have been able to compare biquadratic and bilinear elements on two test problems.
1) Sheared slab with a linear n=0 tearing instability, where the number of cells in the "radial" direction is twice the number in the "azimuthal" direction.
Comments on these test cases:
PROBLEMS WITH SIGNIFICANT FLOWS
KINETIC EFFECTS THROUGH SIMULATION PARTICLES
See "Kinetic MHD: Making the MicroMacro Connection," by Scott Parker
SUMMARY
Please visit us at http://nimrodteam.org or http://nimrod.saic.com, where this poster will be made available.