Recent Algorithmic and Computational Efficiency Improvements in the NIMROD Code
S. J. Plimpton, Sandia National Laboratories
C. R. Sovinec and T. A. Gianakon, Los Alamos National Laboratory
S. E. Parker, University of Colorado-Boulder
and The NIMROD Team
presented at the
41st Annual Meeting of the American Physical Society, Division of Plasma Physics, November 15-19, 1999
OUTLINE
The NIMROD Code:
The NIMROD Team:
PRESENT TEAM MEMBERS AND ADVISORS
Ahmet Aydemir IFS
James Callen U-WI
Ming Chu GA
John Finn LANL
Tom Gianakon LANL
Charlson Kim CU-Boulder
Scott Kruger SAIC
Jean-Noel Leboeuf UCLA
Richard Nebel LANL
Scott Parker CU-Boulder
Steve Plimpton SNL
Nina Popova MSU
Dalton Schnack SAIC
Carl Sovinec LANL
Alfonso Tarditi SAIC
NIMROD FEATURES
EQUATIONS
NIMROD presently solves Maxwell's equations plus the following set of fluid moment relations:
where quasineutrality, n_e ≈ Z n_i, is assumed, ω_p is the plasma frequency, and ρ is the mass density.
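The display equations themselves did not survive conversion of this poster. For orientation only, a representative single-fluid resistive-MHD set of the kind referenced here, reconstructed from the standard NIMROD literature rather than recovered from the poster, is:

    \frac{\partial n}{\partial t} + \nabla\cdot(n\mathbf{V}) = 0
    \rho\left(\frac{\partial\mathbf{V}}{\partial t} + \mathbf{V}\cdot\nabla\mathbf{V}\right) = \mathbf{J}\times\mathbf{B} - \nabla p + \nabla\cdot\mathbf{\Pi}
    \frac{n}{\gamma-1}\left(\frac{\partial T}{\partial t} + \mathbf{V}\cdot\nabla T\right) = -\frac{p}{2}\,\nabla\cdot\mathbf{V} - \nabla\cdot\mathbf{q}
    \mathbf{E} = -\mathbf{V}\times\mathbf{B} + \eta\,\mathbf{J}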
CHALLENGES
SOLVING ILL-CONDITIONED MATRICES
When applied to a high-temperature magnetized plasma, the NIMROD time advance requires the solution of an ill-conditioned matrix.
The largest eigenvalue of this matrix scales as (ω_max Δt)^2, arising from the fastest normal mode represented in the numerical system on a given grid. [The smallest eigenvalue is approximately unity.] When simulating slow resistive behavior, we may need to take time steps that are much larger than the Alfven time for propagation across the entire domain. Thus, the condition number may be very large.
We have recently found that global line-Jacobi preconditioning in the structured grid region scales well with problem size and time step. [See below.]
As a refresher, the Jacobi iteration scheme is the simplest matrix-splitting technique for solving a matrix equation. If we split the matrix A into L + D + U, where D is the diagonal and L and U are the strictly lower and upper triangles, respectively, Jacobi iterations find the solution of A x = b through

    x^{(k+1)} = D^{-1} [ b - (L + U) x^{(k)} ].

The scheme converges provided the magnitude of the largest eigenvalue of D^{-1}(L+U) is less than unity. Furthermore, a Jacobi iteration step may be used to precondition the conjugate gradient scheme.
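A minimal Python sketch of this splitting (numpy assumed; illustrative only, not NIMROD code):

    import numpy as np

    def jacobi_solve(A, b, tol=1e-8, max_it=1000):
        """Jacobi iteration: x <- D^{-1} [ b - (L+U) x ]."""
        D = np.diag(A)                  # diagonal D of the splitting
        R = A - np.diag(D)              # off-diagonal part, L + U
        x = np.zeros_like(b)
        for _ in range(max_it):
            x_new = (b - R @ x) / D
            if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(b):
                return x_new
            x = x_new
        return x

    # Diagonally dominant test case, so the spectral radius of
    # D^{-1}(L+U) is below unity and the iteration converges.
    A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
    print(jacobi_solve(A, np.array([1., 2., 3.])))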
The definition of D may be generalized to improve the effectiveness of each iteration step.
We have found that using the 1D matrices formed by matrix elements along the radial grid lines and along the azimuthal grid lines in the structured grid region produces an effective preconditioner when the semi-implicit matrix is ill-conditioned. [This forms two sets of 1D matrices, and the solutions from the two sets are averaged.]
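A hedged sketch of applying such a line-Jacobi preconditioner for a constant-coefficient 5-point operator on an m x m grid (scipy assumed; NIMROD forms its line matrices from the actual semi-implicit operator, so the stencil here is purely illustrative):

    import numpy as np
    from scipy.linalg import solve_banded

    def line_jacobi_apply(r, m, diag=4.0, off=-1.0):
        """Apply M^{-1} r, where M is built from 1D line matrices."""
        R = r.reshape(m, m)
        ab = np.zeros((3, m))        # banded storage for one tridiagonal line
        ab[0, 1:] = off              # superdiagonal
        ab[1, :] = diag              # diagonal
        ab[2, :-1] = off             # subdiagonal
        # Solves along one grid direction ("radial" lines) ...
        zx = np.stack([solve_banded((1, 1), ab, R[i, :]) for i in range(m)])
        # ... and along the other ("azimuthal" lines).
        zy = np.stack([solve_banded((1, 1), ab, R[:, j]) for j in range(m)],
                      axis=1)
        return 0.5 * (zx + zy).ravel()   # average the two sets of solutions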
This line-Jacobi preconditioning is most effective when the 1D solves are extended as far as possible across the grid. Data for each line may be distributed on different processors. We use nonblocking MPI point-to-point communication to make the exchanges illustrated in the following schematic:
[Schematic: three rblocks (rblock 1, rblock 2, rblock 3) stacked in the block decomposition, and three lines (line 1, line 2, line 3) spanning the blocks in the line decomposition.]
Decomposition swaps reorganize from the normal block pattern (above) to the line pattern (below) for global preconditioning, then swap back to the block pattern.
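A toy mpi4py sketch of such a block-to-line swap using nonblocking point-to-point messages; the array layout and variable names are assumptions for illustration, not NIMROD's data structures:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nproc = comm.Get_rank(), comm.Get_size()

    ny, nx = 8 * nproc, 8 * nproc        # global grid dimensions
    ncols = nx // nproc                  # columns owned in block mode
    block = np.random.rand(ny, ncols)    # this rank's block of data

    # Post nonblocking receives: rank p will own row strip p in line mode.
    rows = ny // nproc
    recv = [np.empty((rows, ncols)) for _ in range(nproc)]
    reqs = [comm.Irecv(recv[p], source=p, tag=7) for p in range(nproc)]

    # Send each rank the piece of our block lying in its row strip.
    send = [np.ascontiguousarray(block[p * rows:(p + 1) * rows, :])
            for p in range(nproc)]
    reqs += [comm.Isend(send[p], dest=p, tag=7) for p in range(nproc)]
    MPI.Request.Waitall(reqs)

    # Each rank now holds complete grid lines (full rows of the domain),
    # ready for global 1D line solves; the reverse swap restores blocks.
    lines = np.hstack(recv)              # shape (rows, nx)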
SCALING RESULTS
Fixed-problem-size scaling for global line-Jacobi preconditioning on the T3E for a 100x100 polar grid case. The reported times are the total time-step loop ("loop"), the line-swapping time ("line") including data copies and parallel communication, and the seam communication time ("seam"). Times are per time step, averaged over 5 steps. The efficiency is (1-block loop time)/(# processors * loop time). The number of blocks equals the number of processors in each case, and the nxbl/nybl decomposition is balanced.
NIMROD & AZTEC
Steve Plimpton
Sandia National Labs
Seattle APS Meeting - Nov. 1999
NIMROD support: Carl Sovinec
AZTEC support: Mike Heroux
Good News: Fully implemented in NIMROD 2.3.2
Bad News: Doesn't compete with NIMROD rblock line-solve methods
What is AZTEC?
• Parallel linear solver library
• Solves Ax = b
• Variety of preconditioners - Jacobi, ILU, ICC, ILUT, ...
• Variety of solvers - CG, GMRES, ...
• Point or vector unknowns - MSR, VBR matrix formats
• Real and complex
• Designed for unstructured grids, asymmetric matrices
• Highly parallel
• Developed at Sandia National Labs
• Similar in scope to Argonne's PETSc
Why Do This?
Key point: Ax = b is different in NIMROD vs. AZTEC
AZTEC into NIMROD: One-time Setup
• Function of grid connectivity and processor blocks
• Label unknowns uniquely from 1 -> N
• Assign unknowns to processors (see the sketch after this list):
  - interior nodes to the obvious processor
  - shared nodes to exactly one processor
• Determine the structure of matrix rows:
  - how many elements
  - what global columns they are in
• Set up data movement for A and x, b:
  - on-processor copying: reordering, F90 vs. C
  - off-processor communication: determine processor partners, what to send, what to receive, how to pack/unpack
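A toy Python sketch of the numbering and ownership step; the block/node representation here is an assumption for illustration, not NIMROD's actual data structures:

    def number_unknowns(blocks):
        """blocks: list of (proc, node_list); node ids on block seams
        appear in more than one list. Returns a global 1..N labeling
        and an owner map assigning each shared node to exactly one
        processor (lowest rank wins ties)."""
        global_id, owner, next_id = {}, {}, 1
        for proc, nodes in blocks:
            for node in nodes:
                if node not in global_id:     # first encounter: label it
                    global_id[node] = next_id
                    next_id += 1
                    owner[node] = proc
                else:                         # shared node: one owner only
                    owner[node] = min(owner[node], proc)
        return global_id, owner

    # Two blocks sharing seam node "s": it gets one id and one owner.
    gid, own = number_unknowns([(0, ["a", "b", "s"]), (1, ["s", "c"])])
    print(gid, own)   # ids 1..4; 's' is owned by processor 0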
AZTEC into NIMROD: Matrix Creation
• Create the input matrix for each AZTEC solve call
• On-processor copying into AZTEC format (see the MSR sketch below)
• Off-processor communication:
  - asynchronous communication
  - one message per partner processor
  - form entire rows of the matrix on one processor
• Call the AZTEC global -> local reordering routine
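AZTEC's point matrices use the MSR (modified sparse row) format. A small Python sketch of packing a dense local matrix into the MSR arrays; the layout follows the AZTEC users' guide, while the conversion routine itself is illustrative:

    import numpy as np

    def dense_to_msr(A):
        """Pack a local n x n matrix into MSR arrays (val, bindx):
        val[0:n] = diagonal, val[n] unused; bindx[0] = n+1, and the
        off-diagonal entries of row i live at positions
        bindx[i] .. bindx[i+1]-1 of both arrays."""
        n = A.shape[0]
        diag = [A[i, i] for i in range(n)]
        ptrs, cols, vals = [n + 1], [], []
        for i in range(n):
            for j in range(n):
                if i != j and A[i, j] != 0.0:
                    cols.append(j)
                    vals.append(A[i, j])
            ptrs.append(n + 1 + len(cols))    # end of row i's entries
        return np.array(diag + [0.0] + vals), np.array(ptrs + cols)

    A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
    val, bindx = dense_to_msr(A)
    print(val)    # [ 4.  4.  4.  0. -1. -1. -1. -1.]
    print(bindx)  # [ 4  5  7  8  1  0  2  1]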
AZTEC into NIMROD: Solve
Results
Test #1: 64 x 100 circular grid w/ center point, 6465 unknowns
Test #2: Circular grid w/ center pt & 2 external tblocks, 4494 unknowns
Test #3: Same as #2 with 10x smaller timestep
Test #1: 64x100 circular grid w/ center point, 6465 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           60    1000   2x10^-3       --     --       --
  NIM line-Jacobi      48     453   1x10^-6      3.3     48   1x10^-6
  AZ CG/Jacobi         84    1000   2x10^-3       --     --       --
  AZ CG/ICC           267    1000   7x10^-4      7.1     56   1x10^-6
  AZ GMRES/Jacobi     351    1000   5x10^-4       --     --       --
Test #2: 2 external tblocks + 48x32 circular grid w/ center point, 4494 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           51    1000   5x10^-4       --     --       --
  NIM line-Jacobi      21     241   1x10^-6      1.5     25   1x10^-6
  AZ CG/Jacobi         58    1000   5x10^-4       --     --       --
  AZ CG/ICC           153     789   1x10^-6      3.9     29   1x10^-6
  AZ GMRES/Jacobi     231    1000   9x10^-5      4.0     72   1x10^-6
  AZ GMRES/ICC        219     583   1x10^-6      4.1     26   1x10^-6
Test #3: same as #2 w/ 10x smaller time step, 4494 unknowns

                        Complex Equation            Real Equation
                      CPU     its      err       CPU    its      err
  NIM Jacobi           29     571   1x10^-6      1.6     40   1x10^-6
  NIM line-Jacobi     5.6      65   1x10^-6     0.73     12   1x10^-6
  AZ CG/Jacobi         34     573   1x10^-6      1.2     40   1x10^-6
  AZ CG/ICC            48     183   1x10^-6      3.1     15   1x10^-6
  AZ GMRES/Jacobi     149     657   1x10^-6      1.6     35   1x10^-6
  AZ GMRES/ICC         71     169   1x10^-6      3.2     14   1x10^-6
Comments on Results
What is hard for AZTEC?
• NIMROD's matrices are symmetric, but AZTEC doesn't take advantage of that:
  -> few symmetric preconditioners
• AZTEC doesn't know about the structured grid:
  -> treats the grid as unstructured
  -> no line-solve preconditioners
  -> will this be true for future NIMROD geometries?
• C slows down complex arithmetic vs. F90
What to try with AZTEC?
ANISOTROPIC THERMAL CONDUCTION
See "Simulations of Neoclassical Tearing Modes with NIMROD," by T. A. Gianakon
GENERALIZED BASIS ELEMENTS
The truncation error of the finite element method is intimately tied to interpolation errors of the chosen discrete space.
For piecewise polynomials of order k, the interpolation u_h of a sufficiently smooth function u satisfies

    || u - u_h ||_{s,2,Ω} <= C h^{k+1-s} | u |_{k+1,2,Ω},   s = 0, 1,

where h is a measure of the grid spacing, and || · ||_{s,2,Ω} and | · |_{k+1,2,Ω} are the norm and seminorm, respectively, of the Sobolev space over the domain Ω, with the first subscript indicating the order of differentiation. [Jiang, The Least-Squares Finite Element Method, Springer-Verlag, 1998, p. 64.]
Hence, the order of convergence depends on the degree of the polynomials used.
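A quick 1D numerical check of this estimate (a sketch; numpy assumed): the measured L2 interpolation error should fall as h^{k+1}, so quadratics gain one order over linears.

    import numpy as np

    u = np.sin                          # a smooth test function on [0, pi]
    for k, name in [(1, "linear"), (2, "quadratic")]:
        errs = []
        for n in (16, 32):              # two grid resolutions, h and h/2
            h = np.pi / n
            err2 = 0.0
            for e in range(n):          # element-by-element interpolation
                nodes = np.linspace(e * h, (e + 1) * h, k + 1)
                coef = np.polyfit(nodes, u(nodes), k)   # degree-k interpolant
                xq = np.linspace(e * h, (e + 1) * h, 20)
                err2 += np.mean((u(xq) - np.polyval(coef, xq)) ** 2) * h
            errs.append(np.sqrt(err2))
        rate = np.log2(errs[0] / errs[1])
        print(f"{name}: observed L2 order ~ {rate:.2f} (theory: k+1 = {k+1})")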
Until now, the NIMROD algorithm has been restricted to using only bilinear basis functions in regions of quadrilateral elements and linear basis functions in regions of triangular elements.
This restriction has led to at least two difficulties:
We expect that using quadratic basis elements will improve both issues significantly.
NIMROD is presently being modified to use any Lagrange-type element, where only first-order derivatives are square-integrable.
The generalized version of NIMROD is partially functional, and we have been able to compare biquadratic and bilinear elements on two test problems.
1) Sheared slab with a linear n=0 tearing instability, where the number of cells in the "radial" direction is twice the number in the "azimuthal" direction.
Comments on these test cases:
PROBLEMS WITH SIGNIFICANT FLOWS
KINETIC EFFECTS THROUGH SIMULATION PARTICLES
See "Kinetic MHD: Making the MicroMacro Connection," by Scott Parker
SUMMARY
Please visit us at http://nimrodteam.org or http://nimrod.saic.com, where this poster will be made available.