Multiscale Gyrokinetic Analysis in the Tokamak Pedestal

(1)

Multiscale Gyrokinetic Analysis in the Tokamak Pedestal

By E.A. Belli¹, J. Candy¹, I. Sfiligoi²

1 General Atomics

2 UCSD San Diego

Supercomputing Center Presented at the

Fusion HPC Workshop December 2022

Supported by U.S. DOE under DE-FG02-95ER54309 and DE-FC02-06ER54873

(2)

• The most promising operating scenario for achieving fusion in tokamaks is H-mode confinement regime.

• H-mode is characterized by the formation of edge transport

• Region of reduced transport leads to steeper gradients in density and temperature à “pedestal” structure at the edge of the plasma

• Pedestal plays a key role in determining global energy confinement.

Turbulent transport in pedestal is less well understood than in the core.

Understanding transport in the H-mode pedestal can help to develop operating regimes for optimal confinement and fusion performance.

0.0 0.2 0.4 0.6 0.8

r/a 0

2 4 6 8 10

Te(keV ) ne(£10¹⁹/m³)

DIII-D H-mode

ITER baseline #164988

𝑇_!(keV)

𝑟/𝑎

𝑛_!(𝑥10^"#/𝑚^$) DIII-D H-mode

ITER baseline #164988

(3)

While ion-scale turbulence in the core has been modeled extensively, multiscale pedestal simulations are far more challenging.

DIII-D H-mode ITER Baseline #164988

• Strongly shaped edge geometry – Need 2-6X increase in resolution

in parallel (field-line) direction 𝜃

(4)

4

• Strongly shaped edge geometry – Need 2-6X increase in resolution

• Weaker pitch of confining magnetic field (large safety factor q)

– Need large radial resolution

0.0 0.2 0.4 0.6 0.8

r/a 1

2 3 4 5 6

q

DIII-D H-mode ITER baseline #164988

1.0 1.5 2.0 2.5 3.0 3.5 4.0

q 0

5 10 15 20 25

Qe/QGB

exp r/a = 0.92

𝑞

𝑟/𝑎

𝑞 𝑄!/𝑄"#

𝑟

𝑎 = 0.92

exp

(5)

• Weaker pitch of confining magnetic field (large safety factor q)

• Large collisionality

– Need advanced collision models

0.0 0.2 0.4 0.6 0.8

r/a 10^°2

10^°1 10⁰ 10¹

(a/cs)¯∫e

DIII-D H-mode ITER baseline #164988

Standard e pitch angle scattering + energy diffusion

+ conservation (n,v,E)

+ inter-species (multiscale in velocity space)

𝑟/𝑎

̅𝜈!(𝑎/𝑐$)

(6)

• Weaker pitch of confining magnetic field (large safety factor q)

• Large collisionality

– Need advanced collision models

• Large gradients drive multiple instab.

across broad range of spatial scales – Ion-scales 𝒌_𝜽𝝆_𝒊 ≲ 𝟏 : ES modes

(ITG, TEM), EM modes (MTM) – Electron-scales 𝒌_𝜽𝝆_𝒆~𝟏 : ETG

𝑘_$𝜌_%

Electron heat transport will play a dominant role in reactors

à Multiscale resolution needed While ion-scale turbulence in the core has been modeled extensively,

multiscale pedestal simulations are far more challenging.

(7)

7 0 5 10 15 20 25 kµΩs

0.00 0.02 0.04 0.06 0.08 0.10

Q(kµ) e/QGB

0 5 10 15 20 25

k_µΩ_s 0.0

0.5 1.0 1.5 2.0

Q(kµ) e/QGB

”Typical” ion-scale ITG turbulence

High resolution multiscale ETG turbulence

3456 x 574 FFT

Requires leadership- scale computing

resources and highly optimized solvers.

𝑘_%𝜌_&

𝑄 !(() /𝑄"#𝑄 !(() /𝑄"#

(8)

• Solves the 5D (3 spatial+2 velocity) 𝛿𝑓 gyrokinetic-Poisson- Ampere equations using Eulerian approach

• Motivations are accurate collisions in H-mode pedestal and and to provide efficient nonlinear, electromagnetic

multiscale simulations.

– Complex nonlinear cross-scaling coupling requires extremely fine mesh in real space

à Specialized numerical schemes are needed to prevent severe bottlenecks related to:

– gyroaveraging

– Maxwell field solve – ExB nonlinearity

CGYRO: A multiscale-optimized gyrokinetic turbulence solver

(9)

• Fully spectral in 𝒙, 𝜶 provides maximal multiscale efficiency

– Ensures collision operator is algebraic in space

– Allows for most efficient

evaluation of gyroaverages

– Evaluation of nonlinear term on GPUs (cuFFT) ensures maximum performance and scalability

CGYRO implements highly efficient spectral/pseudospectral numerical schemes optimized for multiscale simulations.

x Radial spectral y Binormal spectral 𝜃 Poloidal Finite diff

𝜉 Pitch angle pseudospectral v Velocity psuedospectral

𝑯

_𝒂

𝒙, 𝒚, 𝜽, 𝝃, v

• Pseudospectral in 𝝃, v provides optimal

accuracy of collisions

(10)

Unlike most gyrokinetic codes, CGYRO uses velocity-space coordinates optimized for the collisional problem.

GYRO & GS2 use (λ,ε) coordinates 𝜆 = v_*⁺

v⁺^! 𝜀 = ^"^,v⁺

#$_,

Advantage:

No need for derivatives across

trapped/passing boundary since θ

discretization is aligned with particle orbits Disadvantage:

Difficult to evaluate collision operator due to irregular grid in (𝜉,θ)

NEO has instead had great success with (ξ,v) coordinates, implementing spectrally-

accurate collision operators.

GYRO

NEO/CGYRO

ℒ = 1 2

𝜕

𝜕𝜉 1 − 𝜉^- 𝜕

𝜕𝜉

(11)

CGYRO has the first pseudospectral implementation of the collision operator in a gyrokinetic code.

• Legendre polynomials in 𝝃

• Nonstandard orthogonal polynomials in v – Accurate for energy integration and

differentiation

– Appropriate for semi-infinite energy domain

,

%

&

𝑑𝑢 𝑒^'(⁺𝑄₎ 𝑢 𝑄_* 𝑢 = 𝛾₎𝛿_)*

𝑢 ∈ [0, 𝑏]

bà0: shifted monic Legendre bà

∞

: half-range Hermite

𝝃 = v_∥/v

𝒖_𝒂 = v/ 𝟐v_𝒕𝒂

ℒ = 1 2

𝜕

𝜕𝜉 1 − 𝜉^- 𝜕

𝜕𝜉

(12)

Spectral convergence in 𝝃, v velocity space is observed.

4 8 12 16 20 24

N_ª 10^°7

10^°6 10^°5 10^°4 10^°3 10^°2 10^°1

¢!

¯

∫_e = 0.4

¯

∫_e = 1.0

¯

∫e = 10.0

--- 4^th order

(13)

The GKE exhibits highly-collisional behavior at the lowest energies, transitioning to collisionless behavior at high energies.

|H_e| at various energy meshpoints

u_e=0.04481 u_e=0.22722 u_e=0.52603 u_e=0.90922

u_e=1.35004 u_e=1.82579 u_e=2.30486 u_e=2.70338

&̅𝜈 = 1.0

(14)

• Nonlinear, Collisionless step:

– Operates primarily in space (distributed in velocity dimensions)

• Spectral in x and y; finite difference in 𝜃

– Nonlinearity via 2D FFT with dealiasing (well-suited to GPUs à cuFFT) – Explicit in time: Adaptive embedded RK5(4)

• Time step restriction set by fastest Alfven wave

• Efficient for nonlinear multiscale (large number of radial &

binormal wavenumbers)

• Adaptive algorithm gives faster solution for systems with impulse and oscillatory behavior

CGYRO operator splitting for time integration

𝜕ℎ_.

𝜕𝜏 + 𝑨 𝑯_𝒂, 𝜳_𝒂 + 𝑩 𝑯_𝒂, 𝜳_𝒂 = 0

(15)

• Focus on schemes suitable for explicit advection

• Implement a 5^th order upwind scheme:

In the field line direction, a novel 5^th order conservative algorithms is used to permit high-accuracy electromagnetic simulation.

𝜕ℎ_'

𝜕𝜏 + 6v_∥ 𝜕𝐻_'

𝜕𝜃 𝐻_' = ℎ_' − 𝐺_)';v_∥𝛿𝐴_∥ + 𝐺_)'𝛿𝜙

𝑔_' = ℎ_' − 𝐺_)';v_∥𝛿𝐴_∥

𝜕(ℎ_')_*

𝜕𝜏 + 6v_∥𝑫^(𝟔)(𝐻_')_*+ 6v_∥ 𝑺^(𝟔)(𝑔_')_* 6^th-order finite difference

(7-pt derivative)

𝑺^(𝟔)~ ∆𝜽 ^𝟓: Continuum limit obtained as num gridpts increases 6^th-order filter

(7-pt smoother)

(16)

• Dissipation causes inaccuracy due to violation of number

conservation

• Project out the gyrocenter

density distribution caused by the dissipation

• Method conserves gyrocenter number with respect to the numerical dissipation

The conservative upwind scheme yields accurate discretization in the long-wavelength, high beta limit and for high-k ETG modes.

𝜕(ℎ_')_*

𝜕𝜏 + 6v_∥𝑫^(𝟔)(𝐻_')_*+𝑺^(𝟔) v6_∥ 𝒈_{𝒂 𝒋} − 𝒈_{𝒂 𝒋}

Density conservation

(17)

• Collisional + trapping step:

– Operates in velocity space (distributed in spatial dimensions) – Implicit in time: 2nd order CN

• Required for stability due to scaling of 𝜈_& with inverse powers of v

• Matrix is large and well-suited to execution on GPUs CGYRO operator splitting for time integration

𝜕ℎ_.

𝜕𝜏 + 𝑨 𝑯_𝒂, 𝜳_𝒂 + 𝑩 𝑯_𝒂, 𝜳_𝒂 = 0

𝐻₀¹ 𝐻₂¹

⋮ 𝐻_3'¹

= 𝕄

𝐻₀⁴ 𝐻₂⁴ 𝐻_3'⋮⁴

Rank(𝕄)=N_𝜉N_vN_a

(18)

• Operates on 5+1 dimensional grid

• Several steps in the simulation loop, where each step

– Can cleanly partition the problem in at least one dimension

• But no one dimension in common between all of them

– All dimensions are compute-parallel

• But some dimensions may rely on neighbor data from previous step

CGYRO uses a spatial discretization & array distribution scheme that targets scalability on next-generation HPC systems

Easy to split among several CPU/GPU cores and nodes

Requires frequent transpose ops (i.e.

MPI_ALLtoALL)

à Communication heavy

(19)

Kernel Data dependence Dominant operation Communication Collisionless 𝒌_𝒙^𝟎, 𝜽 [𝒌_𝒚]_𝟏,[𝝃, v, 𝒂]_𝟐 Loop (linear) MPI_ALLREDUCE Nonlinear 𝒌_𝒙^𝟎,𝒌_𝒚 𝜽, 𝝃,v, 𝒂 _{𝟐 0} FFT MPI_ALLTOALL Collisional 𝝃,v, 𝒂 [𝒌_𝒚]_𝟏, 𝒌_𝒙^𝟎, 𝜽 _𝟐 Matrix-vec multiply MPI_ALLTOALL

• All CGYRO kernels are ported to GPUs using OpenACC and cuFFT

• Critical use of GPUDirect MPI minimizes cost of memory movement

• Gives 30-40% reduction in comm timing on OLCF Summit

• Optimal for Frontier

Communication happens on 2 orthogonal communicators

(20)

CGYRO 2D communication pattern

• Communication happens on 2 orthogonal communicators

• One fixed size, the other increases with #MPI

• Different amounts of data on the two communicators

• First communicator typically much more “chatty”

• For small to medium simulations

• Keeping most of it inside the node will reduce network traffic

• But if we increase #MPI, the

other communicator data will increase

Jan 18, 2022 16

-30%

-27%

-12% +10%

• For small to medium simulations:

– Comm1 is typically more ”chatty”

– Keep most of data inside the node to reduce network traffic

– But increasing #MPI, increases data comm of Comm2

• For multiscale:

– When #MPI is multiple of #species, can exchange only per-species data and comm2 data volume cut by #species – Smarter, adapative time advance

reduces both compute time and data volume

(21)

CGYRO strong scaling shows excellent performance on GPU systems on both a per node and maximum performance basis.

16 64 256 1024

Number of Nodes 10

100 1000

Wallclocktime(s)

CGYRO strong scaling

CPU-only

GPU

Skylake Cori KNL

Summit Perlmutter

(22)

On CPU systems, compute time is dominated by nonlinear FFT and cost of communication:compute ratio is smaller.

0.0 0.2 0.4 0.6 0.8

Fractionoftime

Skylake Cori Summit Perlmutter

str nl field coll

CPU-only GPU

compute

Comm

compute

Comm

(23)

On GPU systems, high performance cuFFT means short time spent in nonlinear kernel & code is communication-intensive.

Reflects high absolute performance of GPUs rather than poor

performance of interconnect

0.0 0.2 0.4 0.6 0.8

Fractionoftime

Skylake Cori Summit Perlmutter

str nl field coll

CPU-only GPU

compute compute Comm

Comm

Requires 600 GB of GPU memory

(24)

CGYRO multiscale simulation is well-suited to capability simulation on accelerated systems like Summit/Frontier.

512 1024 2048 4096

Number of Nodes 2

8 16 32 64

Wallclocktime(s) > 20% Summit

strong scaling weak scaling

Strong scaling:

4-species

Weak scaling:

Varying 𝑁_' = (2 − 8)

Requires >10 TB of GPU memory

(25)

25

0 10 20 30

Qtot/QGB

Exp Target Multiscale branch

Ion-scale branch

0 1 2 3 4

a/L 0

5 10 15

Qa/QGB

e i

CGYRO multiscale gyrokinetic turbulence analysis in the tokamak pedestal

DIII-D ITER Baseline H-mode #164988, r/a=0.92

The experiment lies in a

bifurcation region between ion-scale-dominated and multiscale-dominated

turbulence regimes.

CGYRO: 250K Summit node-hrs

1

𝐿_:% = −𝑑𝑙𝑛𝑇_% 𝑑𝑟

(26)

26

10^°1 10⁰ 10¹

kµΩs

10^°1 10⁰ 10¹

(a/cs)∞

a/LT i = 3.8 a/L_{T i} = 3.0 a/L_{T i} = 2.5 a/LT i = 1.5 a/L_{T i} = 0

Multiple drift modes are linearly unstable across a broad range of spatial scales from ion-scales to electron-scales.

𝛾𝑎/𝑐_$

𝑘_%𝜌_{$ /!0(} = 0.25

𝛾𝑎/𝑐_$

𝑘_%𝜌_{$ /!0(} = 𝟎. 𝟓 → 𝟎. 𝟐𝟓

ETG

MTM

Hybrid mode

(27)

Multiscale branch

Ion-scale branch

10^°1 10⁰ 10¹

(kµΩs)_max 10^°3

10^°2 10^°1 10⁰

Q(kµ) e/QGB

a/L_{T i} = 0 a/LT i = 1.5 a/LT i = 2.5 a/L_{T i} = 3 a/LT i = 3.8

𝛿𝑛_&/𝑛_&)

The electron-scale turbulence is reduced by ion-scale fluctuations through nonlinear mode-mode interaction & an increase in zonal flows.

(28)

On the ion-scale branch, the fluctuating electrostatic potential intensity is peaked around 𝒌_𝒙^𝟎=0 and the total amplitude is large.

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 k_x⁰Ωs

10^°9 10^°7 10^°5 10^°3 10^°1

I(0)

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 k⁰_xΩs

10^°9 10^°7 10^°5 10^°3 10^°1

I(+)

a/L_{T i}= 3 a/L_{T i}= 3.8

High-k_x nonzonal modes damped by ion FLR.

Zf potential is driven by low-𝑘_%

modes and increases significantly with ion drive.

nonzonal zonal

𝐼^(") = 7

$_!%&

𝐼 𝑘_', 𝑘₍^& 𝐼⁽¹⁾ = 𝐼 0, 𝑘₂¹

𝐼 𝑘_%, 𝑘₂¹ ≐ 𝛿 &𝜙 𝑘_%, 𝑘₂^{1 -}

3

(29)

Increase in the nonzonal fluctuating intensity is well correlated with increase in the ion energy flux.

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 k_x⁰Ω_s

10^°9 10^°7 10^°5 10^°3 10^°1

I(0)

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 k_x⁰Ω_s

10^°9 10^°7 10^°5 10^°3 10^°1

I(+)

a/L_{T i} = 0 a/LT i = 1.5 a/L_{T i} = 2.5 a/L_{T i} = 3 a/L_{T i} = 3.8

Long wavelength nonzonal fluctuations

& low-k-driven zfs play a role in suppressing high-kETG turbulence.

𝐼 𝑘_%, 𝑘₂¹ ≐ 𝛿 &𝜙 𝑘_%, 𝑘₂^{1 -}

3

nonzonal zonal

𝐼^(") = 7

$_!%&

𝐼 𝑘_', 𝑘₍^& 𝐼⁽¹⁾ = 𝐼 0, 𝑘₂¹

ETG-driven zf’s give secondary peaking & help to regularize multiscale turbulence.

(30)

30

The drift energy associated with the fluctuating ExB velocity is enhanced for the multiscale branch.

0 5 10 15 20 25

kµΩs

a/LT i= 0

0 5 10 15 20 25

kµΩs

a/LT i= 2.5

0 5 10 15 20 25

kµΩs

a/LT i= 3

°72 °36 0 36 72

0 5 10 15 20 25

kµΩs

a/LT i= 3.8

𝐾 𝑘_$, 𝑘_;⁾ ≐ 𝑘_<²𝜌₌² 𝛿 W𝜙 𝑘_$, 𝑘_;^{) 2}

>

v_?;@ 𝑘_< = −𝑖 𝑐

𝐵𝛿 W𝜙 𝑘_< 𝑘_<×𝑏

(31)

31

In multiscale to ion-scale transition, the energy shifts from dominantly drift kinetic to potential intensity & is correlated with the energy flux.

10^°1 10⁰

Multiscale branch

Ion-scale branch

E_tot/E_avg Q_tot/Q_avg

0 1 2 3 4

a/L 10^°3

10^°2 10^°1

I_tot K_tot Etot

Fluctuating electrostatic potential intensity:

Drift energy associated with ExB velocity:

𝑬_𝒕𝒐𝒕 = 𝑲_𝒕𝒐𝒕 + 𝑰_𝒕𝒐𝒕

𝑲 𝒌_𝜽, 𝒌_𝒙^𝟎 ≐ 𝑘_<²𝜌₌² 𝛿 W𝜙 𝑘_$, 𝑘_;^{) 2}

>

𝑰 𝒌_𝜽, 𝒌_𝒙^𝟎 ≐ 𝛿 W𝜙 𝑘_$, 𝑘_;^{) 2}

>

(32)

32

• A multiscale-optimized gyrokinetic turbulence solver (CGYRO) was developed.

– Uses highly efficient spectral/pseudospectral numerical schemes in 4/5 dims – Pseudospectral in velocity space gives optimal accuracy of collisions

– Fully spectral in 𝑘₄ gives spectral gyroaverages à efficiency for multiscale – Nonlinear evaluation on GPUs (cuFFT) à maximum performance & scalability – Novel conservative upwind scheme in 𝜃 permits high accuracy EM simulation – Spatial discretization and array distribution scheme targets scalability on next-

generation, exascale HPC systems (GPU-accelerated)

• Optimizations enabled a first multiscale analysis of pedestal-like transport with full ion-electron cross-coupling.

– Experiment lies in bifurcation region btw multiscale-dominated & ion-scale- dominated turbulence regimes. In the transition, electron-scale transport is reduced by nonlinear mixing w/ ion-scale fluctuations & ion-scale-driven zfs.

Summary

First CGYRO Exascale simulations of multi-species burning plasmas on Frontier in 2023