• No se han encontrado resultados

Multiscale Gyrokinetic Analysis in the Tokamak Pedestal

N/A
N/A
Protected

Academic year: 2023

Share "Multiscale Gyrokinetic Analysis in the Tokamak Pedestal"

Copied!
32
0
0

Texto completo

(1)

Multiscale Gyrokinetic Analysis in the Tokamak Pedestal

By E.A. Belli1, J. Candy1, I. Sfiligoi2

1 General Atomics

2 UCSD San Diego

Supercomputing Center Presented at the

Fusion HPC Workshop December 2022

Supported by U.S. DOE under DE-FG02-95ER54309 and DE-FC02-06ER54873

(2)

• The most promising operating scenario for achieving fusion in tokamaks is H-mode confinement regime.

• H-mode is characterized by the formation of edge transport

• Region of reduced transport leads to steeper gradients in density and temperature à “pedestal” structure at the edge of the plasma

• Pedestal plays a key role in determining global energy confinement.

Turbulent transport in pedestal is less well understood than in the core.

Understanding transport in the H-mode pedestal can help to develop operating regimes for optimal confinement and fusion performance.

0.0 0.2 0.4 0.6 0.8

r/a 0

2 4 6 8 10

Te(keV ) ne(£1019/m3)

DIII-D H-mode

ITER baseline #164988

𝑇!(keV)

𝑟/𝑎

𝑛!(𝑥10"#/𝑚$) DIII-D H-mode

ITER baseline #164988

(3)

While ion-scale turbulence in the core has been modeled extensively, multiscale pedestal simulations are far more challenging.

DIII-D H-mode ITER Baseline #164988

Strongly shaped edge geometry Need 2-6X increase in resolution

in parallel (field-line) direction 𝜃

(4)

4

Strongly shaped edge geometry Need 2-6X increase in resolution

in parallel (field-line) direction 𝜃

Weaker pitch of confining magnetic field (large safety factor q)

Need large radial resolution

While ion-scale turbulence in the core has been modeled extensively, multiscale pedestal simulations are far more challenging.

0.0 0.2 0.4 0.6 0.8

r/a 1

2 3 4 5 6

q

DIII-D H-mode ITER baseline #164988

1.0 1.5 2.0 2.5 3.0 3.5 4.0

q 0

5 10 15 20 25

Qe/QGB

exp r/a = 0.92

𝑞

𝑟/𝑎

𝑞 𝑄!/𝑄"#

𝑟

𝑎 = 0.92

exp

(5)

Strongly shaped edge geometry Need 2-6X increase in resolution

in parallel (field-line) direction 𝜃

Weaker pitch of confining magnetic field (large safety factor q)

Need large radial resolution

Large collisionality

Need advanced collision models

While ion-scale turbulence in the core has been modeled extensively, multiscale pedestal simulations are far more challenging.

0.0 0.2 0.4 0.6 0.8

r/a 10°2

10°1 100 101

(a/cs)¯∫e

DIII-D H-mode ITER baseline #164988

Standard e pitch angle scattering + energy diffusion

+ conservation (n,v,E)

+ inter-species (multiscale in velocity space)

𝑟/𝑎

̅𝜈!(𝑎/𝑐$)

(6)

Strongly shaped edge geometry Need 2-6X increase in resolution

in parallel (field-line) direction 𝜃

Weaker pitch of confining magnetic field (large safety factor q)

Need large radial resolution

Large collisionality

Need advanced collision models

Large gradients drive multiple instab.

across broad range of spatial scales Ion-scales 𝒌𝜽𝝆𝒊 ≲ 𝟏 : ES modes

(ITG, TEM), EM modes (MTM) Electron-scales 𝒌𝜽𝝆𝒆~𝟏 : ETG

𝑘$𝜌%

Electron heat transport will play a dominant role in reactors

à Multiscale resolution needed While ion-scale turbulence in the core has been modeled extensively,

multiscale pedestal simulations are far more challenging.

(7)

7 0 5 10 15 20 25 kµΩs

0.00 0.02 0.04 0.06 0.08 0.10

Q(kµ) e/QGB

0 5 10 15 20 25

kµΩs 0.0

0.5 1.0 1.5 2.0

Q(kµ) e/QGB

”Typical” ion-scale ITG turbulence

High resolution multiscale ETG turbulence

3456 x 574 FFT

Requires leadership- scale computing

resources and highly optimized solvers.

While ion-scale turbulence in the core has been modeled extensively, multiscale pedestal simulations are far more challenging.

𝑘%𝜌&

𝑘%𝜌&

𝑄 !(() /𝑄"#𝑄 !(() /𝑄"#

(8)

• Solves the 5D (3 spatial+2 velocity) 𝛿𝑓 gyrokinetic-Poisson- Ampere equations using Eulerian approach

Motivations are accurate collisions in H-mode pedestal and and to provide efficient nonlinear, electromagnetic

multiscale simulations.

– Complex nonlinear cross-scaling coupling requires extremely fine mesh in real space

à Specialized numerical schemes are needed to prevent severe bottlenecks related to:

– gyroaveraging

– Maxwell field solve – ExB nonlinearity

CGYRO: A multiscale-optimized gyrokinetic turbulence solver

(9)

Fully spectral in 𝒙, 𝜶 provides maximal multiscale efficiency

– Ensures collision operator is algebraic in space

– Allows for most efficient

evaluation of gyroaverages

– Evaluation of nonlinear term on GPUs (cuFFT) ensures maximum performance and scalability

CGYRO implements highly efficient spectral/pseudospectral numerical schemes optimized for multiscale simulations.

x Radial spectral y Binormal spectral 𝜃 Poloidal Finite diff

𝜉 Pitch angle pseudospectral v Velocity psuedospectral

𝑯

𝒂

𝒙, 𝒚, 𝜽, 𝝃, v

Pseudospectral in 𝝃, v provides optimal

accuracy of collisions

(10)

Unlike most gyrokinetic codes, CGYRO uses velocity-space coordinates optimized for the collisional problem.

GYRO & GS2 use (λ,ε) coordinates 𝜆 = v*+

v+! 𝜀 = ",v+

#$,

Advantage:

No need for derivatives across

trapped/passing boundary since θ

discretization is aligned with particle orbits Disadvantage:

Difficult to evaluate collision operator due to irregular grid in (𝜉,θ)

NEO has instead had great success with (ξ,v) coordinates, implementing spectrally-

accurate collision operators.

GYRO

NEO/CGYRO

ℒ = 1 2

𝜕

𝜕𝜉 1 − 𝜉- 𝜕

𝜕𝜉

(11)

CGYRO has the first pseudospectral implementation of the collision operator in a gyrokinetic code.

Legendre polynomials in 𝝃

Nonstandard orthogonal polynomials in v – Accurate for energy integration and

differentiation

– Appropriate for semi-infinite energy domain

,

%

&

𝑑𝑢 𝑒'(+𝑄) 𝑢 𝑄* 𝑢 = 𝛾)𝛿)*

𝑢 ∈ [0, 𝑏]

bà0: shifted monic Legendre bà

: half-range Hermite

𝝃 = v/v

𝒖𝒂 = v/ 𝟐v𝒕𝒂

ℒ = 1 2

𝜕

𝜕𝜉 1 − 𝜉- 𝜕

𝜕𝜉

(12)

Spectral convergence in 𝝃, v velocity space is observed.

4 8 12 16 20 24

Nª 10°7

10°6 10°5 10°4 10°3 10°2 10°1

¢!

¯

e = 0.4

¯

e = 1.0

¯

e = 10.0

--- 4th order

(13)

The GKE exhibits highly-collisional behavior at the lowest energies, transitioning to collisionless behavior at high energies.

|He| at various energy meshpoints

ue=0.04481 ue=0.22722 ue=0.52603 ue=0.90922

ue=1.35004 ue=1.82579 ue=2.30486 ue=2.70338

&̅𝜈 = 1.0

(14)

Nonlinear, Collisionless step:

Operates primarily in space (distributed in velocity dimensions)

Spectral in x and y; finite difference in 𝜃

Nonlinearity via 2D FFT with dealiasing (well-suited to GPUs à cuFFT) Explicit in time: Adaptive embedded RK5(4)

Time step restriction set by fastest Alfven wave

Efficient for nonlinear multiscale (large number of radial &

binormal wavenumbers)

Adaptive algorithm gives faster solution for systems with impulse and oscillatory behavior

CGYRO operator splitting for time integration

𝜕ℎ.

𝜕𝜏 + 𝑨 𝑯𝒂, 𝜳𝒂 + 𝑩 𝑯𝒂, 𝜳𝒂 = 0

(15)

Focus on schemes suitable for explicit advection

• Implement a 5th order upwind scheme:

In the field line direction, a novel 5th order conservative algorithms is used to permit high-accuracy electromagnetic simulation.

𝜕ℎ'

𝜕𝜏 + 6v 𝜕𝐻'

𝜕𝜃 𝐻' = ℎ' − 𝐺)';v𝛿𝐴 + 𝐺)'𝛿𝜙

𝑔' = ℎ' − 𝐺)';v𝛿𝐴

𝜕(ℎ')*

𝜕𝜏 + 6v𝑫(𝟔)(𝐻')*+ 6v 𝑺(𝟔)(𝑔')* 6th-order finite difference

(7-pt derivative)

𝑺(𝟔)~ ∆𝜽 𝟓: Continuum limit obtained as num gridpts increases 6th-order filter

(7-pt smoother)

(16)

Dissipation causes inaccuracy due to violation of number

conservation

Project out the gyrocenter

density distribution caused by the dissipation

Method conserves gyrocenter number with respect to the numerical dissipation

The conservative upwind scheme yields accurate discretization in the long-wavelength, high beta limit and for high-k ETG modes.

𝜕(ℎ')*

𝜕𝜏 + 6v𝑫(𝟔)(𝐻')*+𝑺(𝟔) v6 𝒈𝒂 𝒋 − 𝒈𝒂 𝒋

Density conservation

(17)

Collisional + trapping step:

Operates in velocity space (distributed in spatial dimensions) Implicit in time: 2nd order CN

Required for stability due to scaling of 𝜈& with inverse powers of v

Matrix is large and well-suited to execution on GPUs CGYRO operator splitting for time integration

𝜕ℎ.

𝜕𝜏 + 𝑨 𝑯𝒂, 𝜳𝒂 + 𝑩 𝑯𝒂, 𝜳𝒂 = 0

𝐻01 𝐻21

𝐻3'1

= 𝕄

𝐻04 𝐻24 𝐻3'4

Rank(𝕄)=N𝜉NvNa

(18)

Operates on 5+1 dimensional grid

Several steps in the simulation loop, where each step

Can cleanly partition the problem in at least one dimension

But no one dimension in common between all of them

All dimensions are compute-parallel

But some dimensions may rely on neighbor data from previous step

CGYRO uses a spatial discretization & array distribution scheme that targets scalability on next-generation HPC systems

Easy to split among several CPU/GPU cores and nodes

Requires frequent transpose ops (i.e.

MPI_ALLtoALL)

à Communication heavy

(19)

Kernel Data dependence Dominant operation Communication Collisionless 𝒌𝒙𝟎, 𝜽 [𝒌𝒚]𝟏,[𝝃, v, 𝒂]𝟐 Loop (linear) MPI_ALLREDUCE Nonlinear 𝒌𝒙𝟎,𝒌𝒚 𝜽, 𝝃,v, 𝒂 𝟐 0 FFT MPI_ALLTOALL Collisional 𝝃,v, 𝒂 [𝒌𝒚]𝟏, 𝒌𝒙𝟎, 𝜽 𝟐 Matrix-vec multiply MPI_ALLTOALL

All CGYRO kernels are ported to GPUs using OpenACC and cuFFT

Critical use of GPUDirect MPI minimizes cost of memory movement

Gives 30-40% reduction in comm timing on OLCF Summit

Optimal for Frontier

CGYRO uses a spatial discretization & array distribution scheme that targets scalability on next-generation HPC systems

Communication happens on 2 orthogonal communicators

(20)

CGYRO 2D communication pattern

• Communication happens on 2 orthogonal communicators

One fixed size, the other increases with #MPI

Different amounts of data on the two communicators

• First communicator typically much more “chatty”

• For small to medium simulations

Keeping most of it inside the node will reduce network traffic

But if we increase #MPI, the

other communicator data will increase

Jan 18, 2022 16

-30%

-27%

-12% +10%

For small to medium simulations:

Comm1 is typically more ”chatty”

Keep most of data inside the node to reduce network traffic

But increasing #MPI, increases data comm of Comm2

For multiscale:

When #MPI is multiple of #species, can exchange only per-species data and comm2 data volume cut by #species Smarter, adapative time advance

reduces both compute time and data volume

CGYRO uses a spatial discretization & array distribution scheme that targets scalability on next-generation HPC systems

(21)

CGYRO strong scaling shows excellent performance on GPU systems on both a per node and maximum performance basis.

16 64 256 1024

Number of Nodes 10

100 1000

Wallclocktime(s)

CGYRO strong scaling

CPU-only

GPU

Skylake Cori KNL

Summit Perlmutter

(22)

On CPU systems, compute time is dominated by nonlinear FFT and cost of communication:compute ratio is smaller.

0.0 0.2 0.4 0.6 0.8

Fractionoftime

Skylake Cori Summit Perlmutter

str nl field coll

CPU-only GPU

compute

Comm

compute

Comm

(23)

On GPU systems, high performance cuFFT means short time spent in nonlinear kernel & code is communication-intensive.

Reflects high absolute performance of GPUs rather than poor

performance of interconnect

0.0 0.2 0.4 0.6 0.8

Fractionoftime

Skylake Cori Summit Perlmutter

str nl field coll

CPU-only GPU

compute compute Comm

Comm

Requires 600 GB of GPU memory

(24)

CGYRO multiscale simulation is well-suited to capability simulation on accelerated systems like Summit/Frontier.

512 1024 2048 4096

Number of Nodes 2

8 16 32 64

Wallclocktime(s) > 20% Summit

strong scaling weak scaling

Strong scaling:

4-species

Weak scaling:

Varying 𝑁' = (2 − 8)

Requires >10 TB of GPU memory

(25)

25

0 10 20 30

Qtot/QGB

Exp Target Multiscale branch

Ion-scale branch

0 1 2 3 4

a/L 0

5 10 15

Qa/QGB

e i

CGYRO multiscale gyrokinetic turbulence analysis in the tokamak pedestal

DIII-D ITER Baseline H-mode #164988, r/a=0.92

The experiment lies in a

bifurcation region between ion-scale-dominated and multiscale-dominated

turbulence regimes.

CGYRO: 250K Summit node-hrs

1

𝐿:% = −𝑑𝑙𝑛𝑇% 𝑑𝑟

(26)

26

10°1 100 101

kµΩs

10°1 100 101

(a/cs)

a/LT i = 3.8 a/LT i = 3.0 a/LT i = 2.5 a/LT i = 1.5 a/LT i = 0

Multiple drift modes are linearly unstable across a broad range of spatial scales from ion-scales to electron-scales.

𝛾𝑎/𝑐$

𝑘%𝜌$ /!0( = 0.25

𝛾𝑎/𝑐$

𝑘%𝜌$ /!0( = 𝟎. 𝟓 𝟎. 𝟐𝟓

ETG

MTM

Hybrid mode

(27)

Multiscale branch

Ion-scale branch

10°1 100 101

(kµΩs)max 10°3

10°2 10°1 100

Q(kµ) e/QGB

a/LT i = 0 a/LT i = 1.5 a/LT i = 2.5 a/LT i = 3 a/LT i = 3.8

𝛿𝑛&/𝑛&)

The electron-scale turbulence is reduced by ion-scale fluctuations through nonlinear mode-mode interaction & an increase in zonal flows.

(28)

On the ion-scale branch, the fluctuating electrostatic potential intensity is peaked around 𝒌𝒙𝟎=0 and the total amplitude is large.

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 kx0Ωs

10°9 10°7 10°5 10°3 10°1

I(0)

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 k0xΩs

10°9 10°7 10°5 10°3 10°1

I(+)

a/LT i= 3 a/LT i= 3.8

High-kx nonzonal modes damped by ion FLR.

Zf potential is driven by low-𝑘%

modes and increases significantly with ion drive.

nonzonal zonal

𝐼(") = 7

$!%&

𝐼 𝑘', 𝑘(& 𝐼(1) = 𝐼 0, 𝑘21

𝐼 𝑘%, 𝑘21 ≐ 𝛿 &𝜙 𝑘%, 𝑘21 -

3

(29)

Increase in the nonzonal fluctuating intensity is well correlated with increase in the ion energy flux.

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 kx0Ωs

10°9 10°7 10°5 10°3 10°1

I(0)

°10.0 °7.5 °5.0 °2.5 0.0 2.5 5.0 7.5 10.0 kx0Ωs

10°9 10°7 10°5 10°3 10°1

I(+)

a/LT i = 0 a/LT i = 1.5 a/LT i = 2.5 a/LT i = 3 a/LT i = 3.8

Long wavelength nonzonal fluctuations

& low-k-driven zfs play a role in suppressing high-kETG turbulence.

𝐼 𝑘%, 𝑘21 ≐ 𝛿 &𝜙 𝑘%, 𝑘21 -

3

nonzonal zonal

𝐼(") = 7

$!%&

𝐼 𝑘', 𝑘(& 𝐼(1) = 𝐼 0, 𝑘21

ETG-driven zf’s give secondary peaking & help to regularize multiscale turbulence.

(30)

30

The drift energy associated with the fluctuating ExB velocity is enhanced for the multiscale branch.

0 5 10 15 20 25

kµΩs

a/LT i= 0

0 5 10 15 20 25

kµΩs

a/LT i= 2.5

0 5 10 15 20 25

kµΩs

a/LT i= 3

°72 °36 0 36 72

0 5 10 15 20 25

kµΩs

a/LT i= 3.8

𝐾 𝑘$, 𝑘;) ≐ 𝑘<2𝜌=2 𝛿 W𝜙 𝑘$, 𝑘;) 2

>

v?;@ 𝑘< = −𝑖 𝑐

𝐵𝛿 W𝜙 𝑘< 𝑘<×𝑏

(31)

31

In multiscale to ion-scale transition, the energy shifts from dominantly drift kinetic to potential intensity & is correlated with the energy flux.

10°1 100

Multiscale branch

Ion-scale branch

Etot/Eavg Qtot/Qavg

0 1 2 3 4

a/L 10°3

10°2 10°1

Itot Ktot Etot

Fluctuating electrostatic potential intensity:

Drift energy associated with ExB velocity:

𝑬𝒕𝒐𝒕 = 𝑲𝒕𝒐𝒕 + 𝑰𝒕𝒐𝒕

𝑲 𝒌𝜽, 𝒌𝒙𝟎 ≐ 𝑘<2𝜌=2 𝛿 W𝜙 𝑘$, 𝑘;) 2

>

𝑰 𝒌𝜽, 𝒌𝒙𝟎 ≐ 𝛿 W𝜙 𝑘$, 𝑘;) 2

>

(32)

32

A multiscale-optimized gyrokinetic turbulence solver (CGYRO) was developed.

Uses highly efficient spectral/pseudospectral numerical schemes in 4/5 dims Pseudospectral in velocity space gives optimal accuracy of collisions

Fully spectral in 𝑘4 gives spectral gyroaverages à efficiency for multiscale Nonlinear evaluation on GPUs (cuFFT) à maximum performance & scalability Novel conservative upwind scheme in 𝜃 permits high accuracy EM simulation Spatial discretization and array distribution scheme targets scalability on next-

generation, exascale HPC systems (GPU-accelerated)

Optimizations enabled a first multiscale analysis of pedestal-like transport with full ion-electron cross-coupling.

Experiment lies in bifurcation region btw multiscale-dominated & ion-scale- dominated turbulence regimes. In the transition, electron-scale transport is reduced by nonlinear mixing w/ ion-scale fluctuations & ion-scale-driven zfs.

Summary

First CGYRO Exascale simulations of multi-species burning plasmas on Frontier in 2023

Referencias

Documento similar

In addition, given the widespread use of prescription benzodiazepines in society, and their diversion to the illicit drug market, the increase in new benzodiazepines might also

Robertson, &#34;Heart rate spectral analysis as an indicator of autonomie function in patients with autonomie failure&#34;, Annual International Conference of the IEEE Engineering

8 Thus, this thesis contrasts the metaphorical expressions and their underlying conceptual metaphors identified in AmE and PneSp, showing the differences between

In this paper we analyze the spectra of a sample of asteroids in cometary orbits (ACOs) in order to understand the relationship between them, the Jupiter family comets (JFCs), and

Recent observations of the bulge display a gradient of the mean metallicity and of [Ƚ/Fe] with distance from galactic plane.. Bulge regions away from the plane are less

In this work we present a fully automatic system based on formant trajectories extracted from linguistic units, with high potential of fusion with state of the art spectral

(hundreds of kHz). Resolution problems are directly related to the resulting accuracy of the computation, as it was seen in [18], so 32-bit floating point may not be appropriate

In this paper, inspired by the analysis carried in [22–24], we exploit the holographic structure of the Multiscale Entanglement Renormalization Ansatz (MERA) tensor net- works,