Recent Algorithmic and Computational Efficiency Improvements in the NIMROD Code

 

S. J. Plimpton, Sandia National Laboratories

C. R. Sovinec and T. A. Gianakon, Los Alamos National Laboratory

S. E. Parker, University of Colorado-Boulder

and The NIMROD Team

presented at the

41st Annual Meeting of the American Physical Society, Division of Plasma Physics, November 15-19, 1999

 

 

OUTLINE

  1. Introduction
    1. Code and Team descriptions
    2. Algorithm features
    3. Equations
    4. Challenges
  2. Matrix Solution
    1. Global preconditioning for CG
    2. Linking AZTEC
  3. Anisotropic thermal conduction
  4. Generalized basis elements
  5. Strong flow
    1. Time advance
    2. Divergence condition
  6. Modeling kinetic effects

 

 

The NIMROD Code:

The NIMROD Team:

 

 

PRESENT TEAM MEMBERS AND ADVISORS

 

Ahmet Aydemir IFS

James Callen U-WI

Ming Chu GA

John Finn LANL

Tom Gianakon LANL

Charlson Kim CU-Boulder

Scott Kruger SAIC

Jean-Noel Leboeuf UCLA

Richard Nebel LANL

Scott Parker CU-Boulder

Steve Plimpton SNL

Nina Popova MSU

Dalton Schnack SAIC

Carl Sovinec LANL

Alfonso Tarditi SAIC

 

 

NIMROD FEATURES

The NIMROD spatial representation uses finite elements in one plane and a Fourier series in the third (periodic) direction.

 

Our time-split, semi-implicit advance of the coupled Maxwell/velocity-moment (electrons and ions) set of equations permits efficient parallel computation of slow, nonlinear electromagnetic phenomena.

 

 

 

 

EQUATIONS

 

NIMROD presently solves Maxwell's equations plus the following set of fluid moment relations:

 

 

where quasineutrality, $n_e \simeq Z n_i$, is assumed, $\omega_p$ is the plasma frequency, and $\rho$ is the mass density.

 

 

CHALLENGES

 

 

SOLVING ILL-CONDITIONED MATRICES

 

When applied to high-temperature magnetized plasma, the NIMROD time advance requires solution of an ill-conditioned matrix.

 

The largest eigenvalue of the matrix scales as $(\omega_{\max}\,\Delta t)^2$, where $\omega_{\max}$ is the frequency of the fastest normal mode represented in the numerical system on a given grid. [The smallest eigenvalue is approximately unity.] When simulating slow resistive behavior, we may need to take time steps that are much larger than the Alfvén time for propagation across the entire domain. Thus, the condition number may be very large.
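As a rough illustration of how large the condition number can get (the grid and time-step numbers below are assumed for illustration only, not taken from the runs reported here):

\[
\kappa \;\approx\; \frac{\lambda_{\max}}{\lambda_{\min}} \;\sim\; \left(\omega_{\max}\,\Delta t\right)^{2},
\qquad
\omega_{\max} \sim \frac{v_A}{\Delta x},
\]

so a time step of $10^3$ domain Alfvén times ($\Delta t = 10^3\, L/v_A$) on a grid with $L/\Delta x = 100$ cells gives $\omega_{\max}\,\Delta t \sim 10^5$ and $\kappa \sim 10^{10}$.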

We have recently found that global line Jacobi preconditioning in the structured grid region scales well with problem size and time step. [See below.]

 

As a refresher, a Jacobi iteration scheme is the simplest matrix-splitting technique for solving a matrix equation. If we split the matrix A into L + D + U, where D is the diagonal and L and U are the strictly lower and upper triangles, respectively, Jacobi iterations find the solution of

$A x = b$

by iterating

$D x^{k+1} = b - (L + U)\, x^{k}$,

or

$x^{k+1} = D^{-1}\left[\, b - (L + U)\, x^{k} \,\right]$.

 

The scheme converges provided the magnitude of the largest eigenvalue of $D^{-1}(L+U)$ is less than unity. Furthermore, a Jacobi iteration step may be used to precondition the conjugate gradient scheme.
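A minimal sketch of the scheme (Python/numpy, applied to a generic diagonally dominant test system; this illustrates the splitting itself, not the NIMROD solver):

import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi iterations x <- D^{-1} [ b - (L+U) x ]."""
    D = np.diag(A)                    # diagonal of A
    LU = A - np.diag(D)               # strictly lower + upper parts, L + U
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(max_iter):
        x_new = (b - LU @ x) / D      # x^{k+1} = D^{-1} (b - (L+U) x^k)
        if np.linalg.norm(x_new - x) < tol * np.linalg.norm(b):
            return x_new, k + 1
        x = x_new
    return x, max_iter

# converges when the spectral radius of D^{-1}(L+U) is below one,
# e.g. for this diagonally dominant test system:
A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
x, iters = jacobi(A, b)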

 

The definition of D may be generalized to improve the effectiveness of each iteration step.

We have found that using the 1D matrices formed by matrix elements along the radial grid lines and along the azimuthal grid lines in the structured grid region produces an effective preconditioner when the semi-implicit matrix is ill-conditioned. [This forms two sets of 1D matrices, and the solutions from the two sets are averaged.]
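A serial sketch of this preconditioner step (Python/scipy; the nx-by-ny structured grid, constant 5-point couplings, and the function name are illustrative assumptions, not the actual NIMROD semi-implicit operator):

import numpy as np
from scipy.linalg import solve_banded

def line_jacobi_apply(r, diag, off_x, off_y, nx, ny):
    """Approximate A^{-1} r by averaging tridiagonal solves along the two
    families of grid lines.  r and diag are (nx, ny) arrays; off_x and off_y
    are constant couplings along the two directions."""
    zx = np.empty((nx, ny))
    zy = np.empty((nx, ny))
    for j in range(ny):                      # solve along each "radial" line
        ab = np.zeros((3, nx))
        ab[0, 1:] = off_x                    # super-diagonal
        ab[1, :] = diag[:, j]                # main diagonal
        ab[2, :-1] = off_x                   # sub-diagonal
        zx[:, j] = solve_banded((1, 1), ab, r[:, j])
    for i in range(nx):                      # solve along each "azimuthal" line
        ab = np.zeros((3, ny))
        ab[0, 1:] = off_y
        ab[1, :] = diag[i, :]
        ab[2, :-1] = off_y
        zy[i, :] = solve_banded((1, 1), ab, r[i, :])
    return 0.5 * (zx + zy)                   # average the two sets of line solutions

# e.g. a 5-point operator with diagonal 4 and couplings -1:
#   z = line_jacobi_apply(r, 4.0 * np.ones((nx, ny)), -1.0, -1.0, nx, ny)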

This line-Jacobi preconditioning is most effective when the 1D solves are extended as far as possible across the grid. The data for each line may be distributed across different processors. We use non-blocking MPI point-to-point communication to make the exchanges illustrated in the following schematic:

 

[Schematic: block decomposition into rblock 1 / rblock 2 / rblock 3 (above) versus line decomposition into line 1 / line 2 / line 3 (below).]

Decomposition swaps reorganize from the normal block pattern (above) to the line pattern (below) for global preconditioning, then swap back to the block pattern.
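A simplified stand-in for this swap using non-blocking point-to-point messages (Python/mpi4py; here the "block" layout is a column slab and the "line" layout is a row slab, which is only a toy version of the rblock/line patterns above, not NIMROD's F90 communication code):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

ny, nx = 4 * nproc, 4 * nproc                         # small demo grid
all_rows = np.array_split(np.arange(ny), nproc)       # row slabs ("line" layout)
all_cols = np.array_split(np.arange(nx), nproc)       # column slabs ("block" layout)
my_rows, my_cols = all_rows[rank], all_cols[rank]

block = rank * np.ones((ny, len(my_cols)))            # data held in block layout

# post non-blocking receives for the pieces of my lines held by every rank
recv_bufs = [np.empty((len(my_rows), len(all_cols[p]))) for p in range(nproc)]
recv_reqs = [comm.Irecv(recv_bufs[p], source=p, tag=0) for p in range(nproc)]

# send each rank the rows it owns from my column slab
send_bufs = [np.ascontiguousarray(block[all_rows[p], :]) for p in range(nproc)]
send_reqs = [comm.Isend(send_bufs[p], dest=p, tag=0) for p in range(nproc)]

MPI.Request.Waitall(recv_reqs + send_reqs)

line = np.empty((len(my_rows), nx))                   # data in line layout
for p in range(nproc):                                # unpack received pieces
    line[:, all_cols[p]] = recv_bufs[p]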

 

 

SCALING RESULTS

 

 

 

 

Fixed-problem-size scaling for global line Jacobi on the T3E for a 100x100 polar grid case. The reported times are the total time step loop ("loop"), the line-swapping time ("line") including data copies and parallel communication, and the seam communication time ("seam"). Times are per time step, averaged over 5 steps. The efficiency is the one-block loop time divided by (number of processors x loop time). The number of blocks equals the number of processors in each case, and the nxbl / nybl decomposition is balanced.
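Written out as a small function (argument names hypothetical):

def parallel_efficiency(one_block_loop_time, n_procs, loop_time):
    # efficiency = (1-block loop time) / (number of processors * loop time)
    return one_block_loop_time / (n_procs * loop_time)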

 

 

 

Comparison of preconditioner effectiveness on the 100x100 case. Block-based line Jacobi with periodic blocks (nybl=1, nxbl=nprocs) and with a balanced decomposition are both compared with the global scheme.

 

 

 

Comparison of time-step rates (time steps per unit CPU time per processor). The rate is the inverse of the loop time per time step per node.

 

 

 

Increasing problem size scaling for global line Jacobi on the T3E. The nxbl / nybl decomposition is balanced.

 

 

 

Comparison of preconditioner effectiveness with increasing problem size.

 

 

 

Comparison of time per time step on the T3E with increasing problem size.

 

 

 

NIMROD & AZTEC

Steve Plimpton

Sandia National Labs

Seattle APS Meeting - Nov 99

NIMROD support: Carl Sovinec

AZTEC support: Mike Heroux

 

Good News: Fully implemented in 2.3.2

Bad News: Doesn't compete with NIMROD rblock line-solve methods

 

What is AZTEC?

• Parallel linear solver library

• Solves Ax = b

• Variety of pre-conditioners -- Jacobi, ILU, ICC, ILUT, ...

• Variety of solvers -- CG, GMRES, ...

• Point or vector unknowns -- MSR, VBR matrix formats

• Real and complex

• Designed for unstructured grids, asymmetric matrices

• Highly parallel

• Developed at Sandia National Labs

• Similar in scope to Argonne's PETSc

 

Why Do This?

 

Key point: Ax = b is represented differently in NIMROD vs. AZTEC

 

AZTEC into NIMROD: One-time Setup

Setup is a function of grid connectivity and processor blocks:

• Label unknowns uniquely from 1 -> N

• Assign unknowns to processors (see the sketch after this list):
    interior nodes to the obvious processor
    shared nodes to exactly one processor

• Determine structure of matrix rows:
    how many elements
    what global columns they are in

A and x, b:

• On-processor copying:
    reordering
    F90 vs. C

• Off-processor communication:
    determine processor partners
    what to send, what to receive
    how to pack/unpack
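A toy version of the shared-node numbering step (Python; the 1D chain of blocks and all names here are illustrative assumptions, not NIMROD's setup code):

nblocks = 4          # blocks laid end to end; adjacent blocks share one edge node
cells_per_block = 5  # each block has cells_per_block + 1 local nodes

global_index = {}    # (block, local_node) -> unique global unknown number
next_id = 0
for b in range(nblocks):
    for i in range(cells_per_block + 1):
        if b > 0 and i == 0:
            # node shared with the right edge of block b-1: reuse its number,
            # so each shared node is assigned to exactly one owner
            global_index[(b, i)] = global_index[(b - 1, cells_per_block)]
        else:
            global_index[(b, i)] = next_id
            next_id += 1

N = next_id          # total unknowns, with shared nodes counted once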

 

AZTEC into NIMROD: Matrix Creation

• Create the input matrix for the AZTEC solve call

• On-processor copying into AZTEC format (a sketch of one AZTEC layout, MSR, follows this list)

• Off-processor communication:
    asynchronous communication
    one message per partner processor
    form entire rows of the matrix on a processor

• Call the AZTEC global -> local reordering routine
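For reference, a small sketch of AZTEC's MSR ("modified sparse row") layout, built from a dense test matrix (Python; this only illustrates the data layout and is not the NIMROD copy routine; whether a given NIMROD system is copied into MSR or the block VBR format is not addressed here):

import numpy as np

def dense_to_msr(A):
    """Return (bindx, val) lists in MSR format for a square dense matrix A."""
    n = A.shape[0]
    ptr = [n + 1]                 # bindx[0]: off-diagonal data starts after the pointer block
    cols, offd = [], []
    for i in range(n):
        row = [j for j in range(n) if j != i and A[i, j] != 0.0]
        cols += row
        offd += [A[i, j] for j in row]
        ptr.append(n + 1 + len(cols))        # bindx[i+1]: start of the next row's entries
    bindx = ptr + cols                        # row pointers, then off-diagonal column indices
    val = [A[i, i] for i in range(n)] + [0.0] + offd   # diagonal, one unused slot, off-diagonals
    return bindx, val

A = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
bindx, val = dense_to_msr(A)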

 

AZTEC into NIMROD: Solve

 

Results

Test #1: 64 x 100 circular grid w/ center point, 6465 unknowns

Test #2: Circular grid w/ center pt & 2 external tblocks, 4494 unknowns

Test #3: Same as #2 with 10x smaller timestep

 

Test #1: 64x100 circular grid w/ center point, 6465 unknowns

 

                      Complex Equation              Real Equation
                      CPU     its    err            CPU    its    err
NIM Jacobi             60    1000    2x10^-3         --     --    --
NIM line-Jacobi        48     453    1x10^-6        3.3     48    1x10^-6
AZ CG/Jacobi           84    1000    2x10^-3         --     --    --
AZ CG/ICC             267    1000    7x10^-4        7.1     56    1x10^-6
AZ GMRES/Jacobi       351    1000    5x10^-4         --     --    --

 

Test #2: 2 external tblocks + 48x32 circular grid w/ center point, 4494 unknowns

 

                      Complex Equation              Real Equation
                      CPU     its    err            CPU    its    err
NIM Jacobi             51    1000    5x10^-4         --     --    --
NIM line-Jacobi        21     241    1x10^-6        1.5     25    1x10^-6
AZ CG/Jacobi           58    1000    5x10^-4         --     --    --
AZ CG/ICC             153     789    1x10^-6        3.9     29    1x10^-6
AZ GMRES/Jacobi       231    1000    9x10^-5        4.0     72    1x10^-6
AZ GMRES/ICC          219     583    1x10^-6        4.1     26    1x10^-6

 

Test #3: same as #2 w/ 10x smaller timestep, 4494 unknowns

 

                      Complex Equation              Real Equation
                      CPU     its    err            CPU    its    err
NIM Jacobi             29     571    1x10^-6        1.6     40    1x10^-6
NIM line-Jacobi       5.6      65    1x10^-6       0.73     12    1x10^-6
AZ CG/Jacobi           34     573    1x10^-6        1.2     40    1x10^-6
AZ CG/ICC              48     183    1x10^-6        3.1     15    1x10^-6
AZ GMRES/Jacobi       149     657    1x10^-6        1.6     35    1x10^-6
AZ GMRES/ICC           71     169    1x10^-6        3.2     14    1x10^-6

 

Comments on Results

 

What is hard for AZTEC?

• The NIMROD rblock systems are symmetric and come from a logically structured grid, but AZTEC doesn't take advantage of that:
    -> treats the grid as unstructured
    -> few symmetric preconditioners

• AZTEC doesn't know about the structured grid:
    -> no line-solve preconditioners
    Will this be true for future NIMROD geometries?

• Going through AZTEC slows down complex arithmetic vs. F90.

 

What to try with AZTEC?

 

 

 

ANISOTROPIC THERMAL CONDUCTION

See "Simulations of Neoclassical Tearing Modes with NIMROD," by T. A. Gianakon

 

 

 

GENERALIZED BASIS ELEMENTS

 

The truncation error of the finite element method is intimately tied to interpolation errors of the chosen discrete space.

 

For piecewise polynomials of order $k$, the interpolation $u_h$ of a sufficiently smooth function $u$ satisfies

$\| u - u_h \|_{0,\Omega} \le C h^{k+1} \, | u |_{k+1,\Omega}$

and

$\| u - u_h \|_{1,\Omega} \le C h^{k} \, | u |_{k+1,\Omega}$,

where h is a measure of the grid spacing, and $\| \cdot \|_{m,\Omega}$ and $| \cdot |_{m,\Omega}$ are the norm and semi-norm, respectively, of the Sobolev space $H^m$ over the domain $\Omega$, with the first subscript indicating the order of differentiation. [Jiang, The Least-Squares Finite Element Method, Springer-Verlag, 1998, p. 64.]

 

Hence, the order of convergence depends on the order of the polynomials used.
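A quick numerical check of these rates (Python/scipy, using spline interpolants as a stand-in for nodal Lagrange interpolation; illustrative only, not a NIMROD result):

import numpy as np
from scipy.interpolate import interp1d

u = np.sin                                   # a smooth test function on [0, 2*pi]

def rms_interp_error(n_cells, kind):
    x_nodes = np.linspace(0.0, 2 * np.pi, 2 * n_cells + 1)   # nodes incl. midpoints
    interp = interp1d(x_nodes, u(x_nodes), kind=kind)
    x_fine = np.linspace(0.0, 2 * np.pi, 20001)
    return np.sqrt(np.mean((u(x_fine) - interp(x_fine)) ** 2))

for n in (8, 16, 32):
    # halving h cuts the "linear" error ~4x (h^2) and the "quadratic" error ~8x (h^3)
    print(n, rms_interp_error(n, "linear"), rms_interp_error(n, "quadratic"))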

 

 

Until now, the NIMROD algorithm has been restricted to using only bilinear basis functions in regions of quadrilateral elements and linear basis functions in regions of triangular elements.

This restriction has led to at least two difficulties:

  1. Spatial convergence in the azimuthal direction takes a large number of elements, often comparable to the resolution required in the radial direction. Besides the obvious costs associated with using more elements, this also increases matrix condition numbers and tightens flow CFL restrictions.

  2. Our magnetic field representation has nonzero divergence, and controlling that error without being too invasive is often tricky.

We expect that using quadratic basis elements will significantly mitigate both difficulties.

 

NIMROD is presently being modified to use any Lagrange-type element, for which only first-order derivatives need to be square-integrable.

The generalized version of NIMROD is partially functional, and we have been able to compare biquadratic and bilinear elements on two test problems.

1) Sheared slab with a linear n=0 tearing instability, where the number of cells in the "radial" direction is twice the number in the "azimuthal" direction.

 

2) Resonant n=2 instability in an annulus. Azimuthal resolution is the same as radial resolution in these cases.

 

Comments on these test cases:

 

 

PROBLEMS WITH SIGNIFICANT FLOWS

 

 

KINETIC EFFECTS THROUGH SIMULATION PARTICLES

See "Kinetic MHD: Making the Micro-Macro Connection," by Scott Parker

 

 

SUMMARY

 

Please visit us at http://nimrodteam.org or http://nimrod.saic.com, where this poster will be made available.