• No se han encontrado resultados

An Ecosystem of Tools

N/A
N/A
Protected

Academic year: 2023

Share "An Ecosystem of Tools"

Copied!
12
0
0

Texto completo

(1)

An Ecosystem of Tools for Broad

Heterogeneous Memory Usage

27 June, 2022 Marc Jordà

(2)

Overview & Motivation

• To efficiently exploit heterogeneous memory:

• Bring them as first-class citizens

• Move from hierarchical to explicitly managed

• Application’s data distribution?

• OS? Heuristics? On-the-fly monitoring? Hardware-assisted? Historic data? User hints?

• Need ecosystem to assist users/developers: tools

• Profilers, libraries, runtime systems

2

(3)

Overview & Motivation

• Heterogeneous Memory Systems

• KNL: DRAM + MCDRAM (↑ BW, ↑ Lat.)  R.I.P.

• Byte-addressable NVRAM DIMMs (↓ BW, ↑ Lat., persistent)

• Intel® Optane™ Persistent Memory (PMem)

• Goal: Assess optimal data distribution

• Maximize performance

• Minimize energy

• …

DRAM DRAMDIMMs SSDs Optane

(4)

Systems w/ Optane DIMMs

• Two levels of memory

• Main memory (DRAM)

• Processor has direct access to all of main memory

• Regular DRAM latency/bandwidth

• Intel® Optane™ PMem

• Very high capacity + persistency

• Higher latency, but much better than SSDs

4

Processors

DRAM

Optane

DIMMs

(5)

Optane PMem Modes

• App Direct Mode (Heterogeneous Memory)

• DRAM and Optane DIMMs are both available

• More overall memory available

• Software managed (applications need to handle themselves)

• Memory Mode (Deep Memory)

• DRAM as cache for Optane DIMMs

• Only Optane DIMMs address space

• Done in hardware (applications and OS don’t need to be modified)

Processor DRAM Optane DIMMs Processor

DRAM

Optane

DIMMs

(6)

Framework

(7)

Our Framework

Steps

 Profiling (Extrae & Paramedir)

– Collects allocation´s address, size, and allocation point callstack (our object ID)

– Sampling of precise profiling events (PEBS) for LLC load and L1 store misses, which include the accessed address

 Analysis (Hmem Advisor)

– Computes per-object cost using heuristics based on profiling data to assign each object to a memory tier – Greedy relaxation of the 0/1 multiple knapsack problem + optional fine-tuning heuristics (e.g. BW-aware)

 Execution with data placement

– Execution of the original binary with the FlexMalloc library, which interposes memory allocation functions to redirect each allocation to the corresponding memory tier

H. Servat, A. J. Peña, G. Llort, E. Mercadal, H. C. Hoppe, and J. Labarta

“Automating the application data placement in hybrid memory systems”

Initial framework work:

Compiler Toolchain

Memory Profiler

Profile Analyzer Source

Code

Executable Object

Execution Input

Runtime Allocator

Profile Data

Object Distribution

1

2 3

4

5 6

7 8

Extrae

Hmem Advisor FlexMalloc

Paramedir

 Based on offline profiling + user-level interposition

 Manages memory at the dynamic allocation granularity

(8)

Experimental Evaluation

 Hardware

– Intel Xeon Platinum 8260L CPU @ 2.30GHz, HT disabled

– 16 GB of DRAM

– Two Optane configurations

• PMem6: 6x 512 GB Optane™ PMem

• PMem2: 2x 512 GB Optane™ PMem

 Software stack

– Fedora 27 (kernel 4.18.8-100.fc27.x86_64) – Intel Compiler Suite 2019u3

– Intel MPI

8

 Hmem Advisor parameters – DRAM limit: 4, 8, 12 GB

– Profiling data: Loads only, loads+stores

 Comparison with – Memory Mode

– Experimental Kernel-level page migration (tiering-0.7 kernel)

– ProfDP

(9)

Some Results

Mini apps

 Comparison with memory-mode and an experimental kernel-level page migration approach

 With 12GB, in all PMem-6 and most Pmem-2 cases, our framework performs better than MM and KPM

– In some of them even using 4x less DRAM

 Similar to ProfDP performance

Full apps

 LAMMPS

– Ld: 4% overhead vs. MM – Ld+St: 3% overhead

 OpenFOAM

– Ld: 3% speedup – Ld+St: 6.2% speedup

 More complex memory access patterns -> harder to capture object relevance in the cost heuristics

 We plan to analyse more applications in the future

Miniapps performance results

(10)

Ongoing & Future work

• Applications:

• Try out ecoHMEM with more HPC applications (DEEP-SEA project, …)

• TensorFlow

• Automatic application code modification

• Alternative to Flexmalloc´s runtime-interposition approach

• Source-to-source compiler to modify app’s allocation points according to Hmem Advisor output

• Integration with Kernel-level page migration approaches

• Initial placement from ecoHMEM

(11)

ecoHMEM 1.0 Released

• Download tarball from BSC SW repository:

• https://www.bsc.es/discover-bsc/organisation/scientific-structure/accelerators-and-communications- hpc/team-software

• Source code currently available from EPEEC’s Project repository:

• https://github.com/epeec/ecoHMEM

• Flexible Memory Allocation Tool also available:

• https://github.com/intel/flexmalloc

(12)

Referencias

Documento similar

Our goal was to study immune responses against viral infections in order to obtain potentially useful information for the design of new vaccination strategies, focusing on

2)  Cycle  through  the  snapshots,  recording  the  object  each  participant  was  looking  at,  using  a  set  of  categories,  including  the  devices, 

In the light of these reflections, and in order to fulfil the initial premises, our proposal presents a knowledge-model for object identification that: 1) describes an object

The language is called OOCSMP, an object-oriented extension of the CSMP simulation language, also extended to solve partial differential equations using different methods.. Programs

It sends a packet of data containing the current state of the telescope mechanisms once per second to TCS_TELD.. Also it sends a packet containing object-related data when

Method: This article aims to bring some order to the polysemy and synonymy of the terms that are often used in the production of graphic representations and to

These data can lead us to think that within Latin American and European universities aimed at teaching and researching on communication as an object of study,

1) Stationary: This problem is also known as sleeping person and could lead, in some approaches, to the ghost phenomenon. It consists on having in a spatial location an object