A highly portable heterogeneous implementation of symmetry-preserving methods for magnetohydrodynamics

J.A. Hopman, F.X. Trias and J. Rigola

Heat and Mass Transfer Technological Center (CTTC)

Technical University of Catalonia (UPC), Terrassa, Spain


Outline

Symmetry preserving methods in MHD

HPC² framework

Exploiting symmetries in geometry

MHD introduction


Liquid metals in magnetic field


Lorentz force implementation

Following the method of Ni et al.¹,²

• Collocated + staggered current densities, j
• Conserves current density

Scheme basics

• Update U using projection method
• Solve 2nd Poisson equation for φ
• Update j
• Interpolate to cell center to calculate F_Lor
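For reference, the scheme can be written out as follows. This is a sketch, assuming the inductionless (low magnetic Reynolds number) formulation with constant conductivity σ and an imposed field B; those assumptions and the exact notation are not taken from the slide.

```latex
\begin{align}
  \nabla^2 \phi &= \nabla \cdot \left( \mathbf{U} \times \mathbf{B} \right)
    && \text{(Poisson equation for } \phi \text{)} \\
  \mathbf{j} &= \sigma \left( -\nabla \phi + \mathbf{U} \times \mathbf{B} \right)
    && \text{(Ohm's law, evaluated at the faces)} \\
  \mathbf{F}_{Lor} &= \mathbf{j} \times \mathbf{B}
    && \text{(after interpolating } \mathbf{j} \text{ to the cell centers)}
\end{align}
```

With j evaluated at the faces from the solved φ, its discrete divergence vanishes, which is the sense in which the current density is conserved.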

Preserving symmetries

Following the method of Trias et al.¹

Conserve physical properties by mimicking continuous operators

• Use projected distances in gradients
• Use midpoint interpolation
• Use face-volume weighted interpolation

Consequences for MHD

• Avoid iterative correction schemes
• Conserve total momentum from Lorentz force
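Why midpoint interpolation matters can be seen on a toy problem. The sketch below is a 1D periodic finite-volume mesh with stretched cells and a constant mass flux, not the collocated unstructured discretization of Trias et al.; all names are illustrative. With midpoint interpolation the convective operator C is skew-symmetric, so uᵀCu = 0 and convection neither creates nor destroys kinetic energy. Distance-weighted interpolation breaks this.

```python
import math

def convective_matrix(weights, flux=1.0):
    """1D periodic convective operator: (C u)_i = F*u_{i+1/2} - F*u_{i-1/2}.

    weights[i] is the weight given to cell i+1 in the face value
    u_{i+1/2} = (1 - w_i) * u_i + w_i * u_{i+1}.
    """
    n = len(weights)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ip, im = (i + 1) % n, (i - 1) % n
        # flux leaving through face i+1/2
        C[i][i] += flux * (1.0 - weights[i])
        C[i][ip] += flux * weights[i]
        # flux entering through face i-1/2
        C[i][im] -= flux * (1.0 - weights[im])
        C[i][i] -= flux * weights[im]
    return C

def energy_rate(C, u):
    """u^T C u: contribution of convection to the kinetic-energy budget."""
    n = len(u)
    return sum(u[i] * C[i][j] * u[j] for i in range(n) for j in range(n))

n = 16
h = [1.1 ** i for i in range(n)]                        # stretched cell widths
u = [math.sin(2.0 * math.pi * i / n) for i in range(n)]

midpoint = [0.5] * n
# distance-weighted linear interpolation between neighbouring cell centers
linear = [h[i] / (h[i] + h[(i + 1) % n]) for i in range(n)]

C_mid = convective_matrix(midpoint)
e_mid = energy_rate(C_mid, u)                           # zero up to round-off
e_lin = energy_rate(convective_matrix(linear), u)       # spurious energy change
print(e_mid, e_lin)
```

The linear weights are the more accurate local interpolation on a stretched mesh, yet they produce an artificial energy source or sink; this is the trade-off the symmetry-preserving approach resolves in favour of conservation.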

¹ Trias, F. X., Lehmkuhl, O., Oliva, A., Pérez-Segarra, C. D., & Verstappen, R. W. C. P. (2014). Symmetry-preserving discretization of Navier–Stokes equations on collocated unstructured grids. Journal of Computational Physics, 258, 246-267.


Case 1: 2D Taylor-Green vortex in transverse magnetic field

[Figure: panels "HydroDynamic TGV", "Transverse B-field", "Imposed φ-field" and "Magneto-", combined with "=" and "+"]

Case 2: MHD duct flow

Accuracy results

2D Taylor-Green vortex

• Should not generate Lorentz force
• Should not dissipate energy

Results

• Symmetry-preserving method outperforms Ni method
• Especially on distorted grids
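The claim that the Taylor-Green vortex should generate no Lorentz force can be checked analytically. The sketch below assumes the classical 2D velocity field U = (cos x sin y, −sin x cos y), a transverse field B = (0, 0, B₀) and the potential φ = B₀ cos x cos y. Whether this is exactly the imposed φ-field of the talk is an assumption. Under it, ∇φ equals U × B pointwise, so j = σ(−∇φ + U × B) vanishes and so does F_Lor = j × B.

```python
import math

def tgv_velocity(x, y):
    # classical 2D Taylor-Green vortex with unit amplitude (assumed form)
    return math.cos(x) * math.sin(y), -math.sin(x) * math.cos(y)

def grad_phi(x, y, b0):
    # gradient of the assumed potential phi = b0 * cos(x) * cos(y)
    return -b0 * math.sin(x) * math.cos(y), -b0 * math.cos(x) * math.sin(y)

def current_density(x, y, b0=1.0, sigma=1.0):
    ux, uy = tgv_velocity(x, y)
    # in-plane components of U x B for B = (0, 0, b0)
    uxb_x, uxb_y = uy * b0, -ux * b0
    gx, gy = grad_phi(x, y, b0)
    # Ohm's law: j = sigma * (-grad(phi) + U x B)
    return sigma * (-gx + uxb_x), sigma * (-gy + uxb_y)

# sample j on a uniform grid covering one period in each direction
n = 32
max_j = 0.0
for i in range(n):
    for k in range(n):
        jx, jy = current_density(2.0 * math.pi * i / n, 2.0 * math.pi * k / n, b0=2.0)
        max_j = max(max_j, abs(jx), abs(jy))
print(max_j)
```

Any current density a discrete scheme produces for this case is therefore pure discretization error, which makes it a convenient accuracy test.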


Stability for highly distorted meshes

[Figure: panels "Taylor-Green vortex" and "M-profile in duct"]

HPC²

Highly portable code for HPC

• Stencil based → Algebra based
• Only a few algebraic kernels are needed

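Algebraic kernels

From continuous NS equations:

To discrete algebraic equations:

Using three kernels only:

• SpMV: $\mathbf{y} \leftarrow A\mathbf{x}$
• axpy: $\mathbf{z} \leftarrow a\mathbf{x} + b\mathbf{y}$
• dot: $r \leftarrow \mathbf{x} \cdot \mathbf{y}$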
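The three kernels are small enough to write out in full. The sketch below uses pure Python and a CSR matrix layout; the storage format and the toy operator (a 1D periodic Laplacian on four cells) are assumptions for illustration, not the HPC² data structures. The last lines compose the kernels into one explicit diffusion update, to show how an algebra-based code expresses a time step without touching the mesh.

```python
def spmv(indptr, indices, data, x):
    """y <- A x, with A stored in CSR format (row pointers, column indices, values)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def axpy(a, x, b, y):
    """z <- a x + b y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def dot(x, y):
    """r <- x . y."""
    return sum(xi * yi for xi, yi in zip(x, y))

# 1D periodic Laplacian on 4 cells, one row per cell: (1, -2, 1) stencil
indptr = [0, 3, 6, 9, 12]
indices = [0, 1, 3, 0, 1, 2, 1, 2, 3, 0, 2, 3]
data = [-2.0, 1.0, 1.0, 1.0, -2.0, 1.0, 1.0, -2.0, 1.0, 1.0, 1.0, -2.0]

u = [1.0, 2.0, 4.0, 8.0]
lap_u = spmv(indptr, indices, data, u)   # discrete Laplacian of u
u_new = axpy(1.0, u, 0.1, lap_u)         # u + 0.1 * L u (illustrative step size)
total = dot([1.0] * 4, u_new)            # sum of u_new: unchanged by the update
print(lap_u, total)
```

Porting such a code to a new architecture then reduces to providing these three routines for it.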

Memory boundedness

[Figure: arithmetic intensity (FLOP/byte) of dot, axpy and SpMV]
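The arithmetic intensities can be estimated by counting operations and memory traffic. The sketch below uses an idealized model with assumed parameters: double-precision values (8 bytes), 32-bit CSR indices (4 bytes), every array streamed from memory exactly once, and about 7 nonzeros per row. None of these figures are from the slides.

```python
def intensity_dot(n, word=8):
    # r <- x . y: n multiplications + n additions; reads x and y
    return (2 * n) / (2 * n * word)

def intensity_axpy(n, word=8):
    # z <- a x + b y: 3 FLOP per entry; reads x and y, writes z
    return (3 * n) / (3 * n * word)

def intensity_spmv(n, nnz, word=8, index=4):
    # y <- A x in CSR: one multiply-add per nonzero
    flop = 2 * nnz
    # values + column indices, row pointers, then x read once and y written once
    traffic = nnz * (word + index) + (n + 1) * index + 2 * n * word
    return flop / traffic

n = 10 ** 6
ai_dot = intensity_dot(n)
ai_axpy = intensity_axpy(n)
ai_spmv = intensity_spmv(n, nnz=7 * n)
print(ai_dot, ai_axpy, ai_spmv)
```

All three kernels stay well below 1 FLOP/byte under this model, so on current CPUs and GPUs their speed is set by memory bandwidth rather than by peak floating-point rate.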

HPC²: hierarchy

• Multiple nodes interconnected via a high-bandwidth network: MPI is used at this level
• Multiple multi-core CPUs per node: OpenMP is used at this level
• Multiple accelerators per node

X. Álvarez, A. Gorobets, and F.X. Trias. "A hierarchical parallel implementation for heterogeneous computing. Application to algebra-based CFD simulations on hybrid supercomputers". Computers & Fluids, 214:104768, 2021.

HPC²: hierarchy

1. The MPI process
2. The host and co-processors
3. Multiple NUMA nodes in a manycore CPU
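One way to picture the three levels is as a nested split of the matrix rows. The sketch below is a hypothetical illustration of that idea only: the function names, the contiguous 1D row ranges and the throughput weights are all assumptions, and the actual HPC² decomposition may work differently.

```python
def split(start, stop, weights):
    """Split the half-open range [start, stop) into contiguous chunks
    whose sizes are proportional to the given integer weights."""
    total = sum(weights)
    length = stop - start
    bounds = [start]
    acc = 0
    for w in weights[:-1]:
        acc += w
        bounds.append(start + (length * acc) // total)
    bounds.append(stop)
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]

def hierarchical_partition(n_rows, n_ranks, device_weights, numa_nodes):
    """Level 1: MPI processes. Level 2: host and co-processors.
    Level 3: NUMA nodes of the host CPU."""
    plan = []
    for rank, (r0, r1) in enumerate(split(0, n_rows, [1] * n_ranks)):
        names = list(device_weights)
        chunks = split(r0, r1, [device_weights[name] for name in names])
        devices = dict(zip(names, chunks))
        h0, h1 = devices["host"]
        numa = split(h0, h1, [1] * numa_nodes)
        plan.append({"rank": rank, "devices": devices, "numa": numa})
    return plan

# 2 MPI processes, each driving a host CPU and one GPU assumed 3x faster,
# with the host share split over 2 NUMA nodes
plan = hierarchical_partition(1000, n_ranks=2,
                              device_weights={"host": 1, "gpu0": 3},
                              numa_nodes=2)
for entry in plan:
    print(entry)
```

Weighting the second level by device throughput is what would let a host and its accelerators finish their share of a kernel at roughly the same time.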

HPC²: tested architectures

MareNostrum 4 (rank #42), 3456 nodes with:
• 2× Intel Xeon 8160
• 1× Intel Omni-Path

Lomonosov-2 (rank #156), 1696 nodes with:
• 2× Intel Xeon E5-2697 v3
• 1× NVIDIA Tesla K40M
• 1× InfiniBand FDR

TSUBAME3.0 (rank #31), 540 nodes with:
• 2× Intel Xeon E5-2680 v4
• 4× NVIDIA Tesla P100
• 4× Intel Omni-Path

Exploiting symmetries

[Figure: panels "1 Symmetry" and "2 Symmetries"]

Symmetry-aware ordering

Block-diagonalising L

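$$L = \begin{pmatrix} L_{inn} & L_{out} \\ L_{out} & L_{inn} \end{pmatrix} \in \mathbb{R}^{N \times N}, \qquad S = \frac{1}{2} \begin{pmatrix} I_{N/2} & I_{N/2} \\ I_{N/2} & -I_{N/2} \end{pmatrix} \in \mathbb{R}^{N \times N}$$

$$\hat{L} = S L S^{-1} = \begin{pmatrix} L_{inn} + L_{out} & 0 \\ 0 & L_{inn} - L_{out} \end{pmatrix}$$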
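The block-diagonalisation can be verified on a small case. The sketch below uses N = 4 and made-up 2×2 blocks L_inn and L_out (illustrative values only). It assumes a symmetry-aware ordering in which the unknowns of each mirrored half appear in the same order. S carries the 1/2 scaling, so its inverse is the unscaled block matrix.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def block(A11, A12, A21, A22):
    """Assemble the 2x2 block matrix [[A11, A12], [A21, A22]]."""
    top = [r1 + r2 for r1, r2 in zip(A11, A12)]
    bottom = [r1 + r2 for r1, r2 in zip(A21, A22)]
    return top + bottom

def scale(c, A):
    return [[c * a for a in row] for row in A]

I = [[1.0, 0.0], [0.0, 1.0]]
L_inn = [[-3.0, 1.0], [1.0, -3.0]]   # couplings inside one half (illustrative)
L_out = [[1.0, 0.0], [0.0, 1.0]]     # couplings across the symmetry plane

L = block(L_inn, L_out, L_out, L_inn)
S_inv = block(I, I, I, scale(-1.0, I))
S = scale(0.5, S_inv)

L_hat = matmul(matmul(S, L), S_inv)
for row in L_hat:
    print(row)
```

The off-diagonal blocks of L_hat vanish and the diagonal blocks are L_inn + L_out and L_inn − L_out, so the original system decouples into two independent systems of half the size.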


Sparse matrix-matrix product

Rewriting the SpMV product:

into an SpMM product

• A reduction in time complexity
• A reduction in memory footprint
• An increase in arithmetic intensity
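A minimal sketch of the rewriting, assuming the simplest situation in which the same sub-matrix A acts on each symmetric half of the vector: instead of two SpMV calls, A x₁ and A x₂, the halves are stored as the columns of a dense matrix X and a single sparse-matrix times dense-matrix product Y ← A X is performed. The CSR layout and function names are illustrative, not the HPC² implementation.

```python
def spmv(indptr, indices, data, x):
    """y <- A x, with A in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for p in range(indptr[row], indptr[row + 1]):
            y[row] += data[p] * x[indices[p]]
    return y

def spmm(indptr, indices, data, X):
    """Y <- A X, with A in CSR format and X a dense list of rows with k columns."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(len(indptr) - 1)]
    for row in range(len(Y)):
        for p in range(indptr[row], indptr[row + 1]):
            a, col = data[p], indices[p]      # coefficient and index loaded once
            for c in range(k):
                Y[row][c] += a * X[col][c]    # and reused for every column
    return Y

# A = [[-3, 1], [1, -3]] in CSR (illustrative values)
indptr, indices, data = [0, 2, 4], [0, 1, 0, 1], [-3.0, 1.0, 1.0, -3.0]

x1, x2 = [1.0, 2.0], [3.0, 4.0]              # the two symmetric halves
X = [[x1[i], x2[i]] for i in range(2)]        # halves stored as columns

Y = spmm(indptr, indices, data, X)
y1 = spmv(indptr, indices, data, x1)
y2 = spmv(indptr, indices, data, x2)
print(Y, y1, y2)
```

Only the half-size matrix has to be stored, and each of its coefficients is read once for several multiply-adds, which is where the memory and arithmetic-intensity gains come from.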

Increasing arithmetic intensity

[Figure: arithmetic intensity (FLOP/byte) of dot, axpy, SpMV and SpMM]
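The gain can be estimated with a simple traffic model. The sketch assumes 8-byte values, 4-byte CSR indices, about 7 nonzeros per row, and each array streamed from memory once; these parameters are assumptions, not figures from the talk. With k right-hand sides the matrix is read once while the floating-point work scales with k.

```python
def intensity_spmm(n, nnz, k, word=8, index=4):
    """FLOP per byte of Y <- A X for a CSR matrix A (n rows, nnz nonzeros)
    and a dense X with k columns; k = 1 is the plain SpMV."""
    flop = 2 * nnz * k
    # matrix values + column indices and row pointers are read once;
    # X is read and Y written, both with n * k entries
    traffic = nnz * (word + index) + (n + 1) * index + 2 * n * k * word
    return flop / traffic

n = 10 ** 6
intensities = {k: intensity_spmm(n, 7 * n, k) for k in (1, 2, 4, 8)}
for k, ai in intensities.items():
    print(k, round(ai, 3))
```

Under this model the intensity grows monotonically with k but saturates at nnz / (n · word), the limit in which traffic is dominated by the dense vectors.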

Summarising

Symmetry preserving methods in MHD

HPC² framework

Exploiting symmetries in geometry


Thank you for attending!
