
Hardware accelerators for HEP


Academic year: 2024


(1)

Computing Challenges (COMCHA):

Hardware accelerators for HEP

A. Oyanguren (IFIC – Valencia)

XIII CPAN Days – Huelva, March 2022

(2)

Outline

• Motivation
• Hardware accelerators
• LHCb
• ATLAS
• CMS
• About COMCHA
• Outlook

(3)

Motivation

2808 bunches of protons per beam

10¹¹ protons per bunch; beam energy of 7 TeV (access to ~10⁻¹⁶ cm); luminosity 10³⁴ cm⁻² s⁻¹

Crossing rate 40 MHz, i.e. 40 M collisions/s; about 1 MB of data per collision

40 TB/s

(New physics rate: ~0.00001 events/s)

[Figure: proton-proton collision; label: B_s]
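The 40 TB/s figure follows directly from the crossing rate and the event size. The sketch below simply redoes that arithmetic with the slide's round numbers. It also converts the quoted new-physics rate into an average number of collisions per interesting event, which shows why almost everything must be discarded online.

```python
# Back-of-the-envelope LHC data rates from the round numbers on this slide.
CROSSING_RATE_HZ = 40e6      # 40 MHz bunch-crossing rate
EVENT_SIZE_BYTES = 1e6       # ~1 MB of raw data per collision
NEW_PHYSICS_RATE_HZ = 1e-5   # ~0.00001 new-physics events per second

# Raw data rate if every collision were read out: 40e6 * 1e6 bytes/s.
raw_rate_tb_per_s = CROSSING_RATE_HZ * EVENT_SIZE_BYTES / 1e12

# Average number of collisions per new-physics event.
collisions_per_signal = CROSSING_RATE_HZ / NEW_PHYSICS_RATE_HZ

print(f"raw data rate: {raw_rate_tb_per_s:.0f} TB/s")                    # raw data rate: 40 TB/s
print(f"collisions per new-physics event: {collisions_per_signal:.1e}")  # collisions per new-physics event: 4.0e+12
```

In other words, with these figures there is on average one new-physics event per roughly 4×10¹² collisions.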

(4)

Motivation

The trigger systems:

[Figure: trigger systems of ATLAS, CMS and LHCb; labels: "1 MHz", "(Run3: 2022) >", "New strategy!"]

(5)

Motivation

How much data can we record?

The storage need is set by the trigger bandwidth:

Bandwidth [GB/s] ≈ Trigger output rate [kHz] × Average event size [MB]

Raw event data size and trigger output rate:

- ATLAS and CMS: ~1 MB at ~1 kHz
- LHCb: ~0.1 MB at 12.5 kHz

→ ~1 GB/s

• Moving to Real Time Analysis schemes: Turbo (LHCb), scouting (CMS) and TLA (ATLAS)

• Analysis Object Data (AOD) formats at the trigger level: ~1 kB per event in Run 3 and <5 kB in HL-LHC

• Fast reconstruction in real time becomes crucial!

• Fast decisions: each event must be either discarded forever or sent online to permanent storage, at the pace of the collisions
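The bandwidth formula on this slide works with mixed units because they cancel: 10³ events/s × 10⁶ bytes/event = 10⁹ bytes/s. The sketch below checks the quoted raw-format numbers. As an illustration only, it also shows how far the output rate could rise at the same ~1 GB/s budget if events were stored as ~1 kB trigger-level objects. The `budget_gb_per_s` and `aod_size_mb` names are hypothetical, introduced here for the example.

```python
def trigger_bandwidth_gb_per_s(rate_khz: float, event_size_mb: float) -> float:
    """Bandwidth [GB/s] ~ trigger output rate [kHz] x average event size [MB].

    1e3 events/s * 1e6 bytes/event = 1e9 bytes/s, so kHz x MB gives GB/s directly.
    """
    return rate_khz * event_size_mb


# Raw-format figures quoted on the slide.
atlas_cms = trigger_bandwidth_gb_per_s(rate_khz=1.0, event_size_mb=1.0)   # ~1 GB/s
lhcb = trigger_bandwidth_gb_per_s(rate_khz=12.5, event_size_mb=0.1)       # ~1.25 GB/s

# Illustrative only: the same ~1 GB/s budget with ~1 kB (1e-3 MB) events.
budget_gb_per_s = 1.0
aod_size_mb = 1e-3
max_rate_khz = budget_gb_per_s / aod_size_mb  # ~1000 kHz, i.e. ~1 MHz

print(atlas_cms, lhcb, max_rate_khz)
```

This is the arithmetic behind the move to reduced, analysis-level formats at the trigger: shrinking the event size is the only way to raise the stored rate at fixed bandwidth.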


(6)

Motivation

(From S. Campana, LHCC March 2022)

• The HL-LHC:

- LHCb and ALICE have already been upgraded for Run3 (LHCb: ×5 luminosity)
- HL-LHC will come next…

• Data reconstruction and storage will become a tough issue; reduced data formats will not be enough

→ More complex event reconstruction needs to move to the earliest stage of the trigger

(7)

Hardware accelerators

• Use more than one kind of processor or core to maximize performance or energy efficiency.

• Exploit the high level of parallelism to handle particular tasks.

Graphics Processing Units (GPUs):
- Multicore processors, highly commercial
- High throughput
- Ideal for data-intensive parallelizable applications

Field Programmable Gate Arrays (FPGAs):
- Programmable and flexible devices
- Low latency
- Low power consumption
- Ideal for compute- and data-intensive workloads
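The "high level of parallelism" on GPUs usually means running one small function (a kernel) once per data element, with each thread finding its element from its block and thread indices. The sketch below emulates that indexing idiom (global index = block index × block size + thread index, as in CUDA) serially in plain Python. It does not run anything in parallel. The kernel `calibrate_kernel`, the ADC values and the gain are hypothetical, chosen only for illustration.

```python
from typing import Callable, List


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Serially emulate a 1D CUDA-style launch: every (block, thread) pair runs the kernel.

    On a GPU these invocations run concurrently; here they run one after another,
    which is enough to show how thread indices map onto data elements.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def calibrate_kernel(block_idx: int, block_dim: int, thread_idx: int,
                     adc: List[int], gain: float, out: List[float]) -> None:
    """Each 'thread' converts exactly one detector channel (toy calibration)."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(adc):                        # guard: the last block may be partially filled
        out[i] = adc[i] * gain


adc_counts = [12, 40, 7, 0, 255, 128, 33, 90, 61, 5]
result = [0.0] * len(adc_counts)
threads_per_block = 4
blocks = (len(adc_counts) + threads_per_block - 1) // threads_per_block  # ceiling division
launch(calibrate_kernel, blocks, threads_per_block, adc_counts, 0.5, result)
print(result)
```

Because no thread depends on another thread's result, the hardware is free to run all of them at once. This independence is what makes per-hit and per-event reconstruction steps good candidates for GPUs.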


(8)

Hardware accelerators

• In practice (e.g. at LHCb)

(9)

Hardware accelerators

• In practice (e.g. at LHCb), the accelerators are mounted in the Event Building servers:

[Figure: Event Building server; labels: CPU, RAM, fans, PCIe slots, 3 PCIe40 (FPGAs), 2 network connections, 1-3 GPUs, Event Building at 40 Tb/s]

(10)

LHCb

The upgraded LHCb for Run3:

- No L0 hardware trigger → full detector read-out at 30 MHz!

- Detector data are received by O(500) FPGAs and built into events in the Event Building servers

- Full HLT1 in real time on GPUs (Allen project): O(200) NVIDIA RTX A5000
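Dividing the full read-out rate among the HLT1 GPUs gives the throughput each card must sustain. The sketch below uses the slide's round numbers (30 MHz, O(200) GPUs). Treating "O(200)" as exactly 200 is an assumption made for the arithmetic.

```python
# Rough per-GPU throughput requirement for LHCb HLT1 in Run 3, using the slide's
# round numbers: a 30 MHz input rate shared by O(200) GPUs.
INPUT_RATE_HZ = 30e6
N_GPUS = 200  # "O(200)": an order of magnitude, not an exact count

required_rate_per_gpu_hz = INPUT_RATE_HZ / N_GPUS

# Average GPU time available per event. A GPU processes many events concurrently,
# so this is a throughput budget, not the latency of a single event.
time_budget_per_event_us = N_GPUS / INPUT_RATE_HZ * 1e6

print(f"{required_rate_per_gpu_hz / 1e3:.0f} kHz per GPU")          # 150 kHz per GPU
print(f"{time_budget_per_event_us:.2f} us of GPU time per event")  # 6.67 us of GPU time per event
```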

HLT1 sequence on GPUs:

RAW DATA → Global Event Cut → VELO decoding and clustering → VELO tracking → Simple Kalman → Find PV → UT decoding → UT tracking → SciFi decoding → SciFi tracking → Parameterized Kalman → Muon decoding → Muon ID → Find secondary vertices → Selected events

[LHCB-TDR-021]
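Structurally, HLT1 is an ordered chain of stages in which an early, cheap cut can stop an event before the expensive reconstruction runs. The sketch below expresses that structure in Python, taking the stage order from the HLT1 flow chart on this slide as we read it. It assumes the Global Event Cut is an occupancy cut on the summed UT and SciFi hits. The threshold `MAX_HITS` is a hypothetical value, and every other stage is a stub. This is not Allen's actual code or interface.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
Stage = Callable[[Event], Optional[Event]]

MAX_HITS = 10_000  # hypothetical occupancy threshold, for illustration only


def global_event_cut(event: Event) -> Optional[Event]:
    """Reject very busy events before any expensive reconstruction runs."""
    n_hits = event["n_ut_hits"] + event["n_scifi_hits"]
    return event if n_hits <= MAX_HITS else None


def make_stage(name: str) -> Stage:
    """Stub for one reconstruction step: it only records that it ran."""
    def stage(event: Event) -> Optional[Event]:
        event.setdefault("steps", []).append(name)
        return event
    return stage


# Stage order as read from the HLT1 sequence on this slide; all but the GEC are stubs.
HLT1_SEQUENCE: List[Stage] = [global_event_cut] + [
    make_stage(name)
    for name in (
        "velo_decoding_clustering", "velo_tracking", "simple_kalman", "find_pv",
        "ut_decoding", "ut_tracking", "scifi_decoding", "scifi_tracking",
        "parameterized_kalman", "muon_decoding", "muon_id", "find_secondary_vertices",
    )
]


def run_hlt1(event: Event) -> Optional[Event]:
    """Run the stages in order and stop as soon as one of them rejects the event."""
    for stage in HLT1_SEQUENCE:
        result = stage(event)
        if result is None:
            return None
        event = result
    return event


quiet = run_hlt1({"n_ut_hits": 3000, "n_scifi_hits": 4000})
busy = run_hlt1({"n_ut_hits": 8000, "n_scifi_hits": 9000})
print(len(quiet["steps"]), busy)  # 12 None
```

Putting the cheapest rejection first means that the busiest events, which would cost the most GPU time, never reach the tracking stages.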

(11)

LHCb

• GPU HLT1 sequence: algorithm breakdown (indicative), throughput and performance:

[Figure: HLT1 timing breakdown; labels: VELO, UT, PV, SciFi, Muons, Kalman]

• Work ongoing on implementing more time-consuming algorithms (LLPs, long-lived particles)

LHCb-FIGURE-2020-014; arXiv:2105.04031 [physics.ins-det]; [Comput Softw Big Sci 4, 7 (2020)]

(12)

LHCb

• Real-time reconstruction on FPGAs with the "artificial retina" architecture

• VELO clustering already implemented in FPGAs for Run3!

• Tracking in development for Run5 (~2030); a coprocessor testbed has been established at CERN for tests in realistic conditions

[NIMA 453 (2000) 425-429]
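The artificial-retina idea (NIMA 453 (2000) 425) discretizes track-parameter space into cells ("receptors"). Each receptor is excited by every hit, with a weight that falls off with the distance between the hit and the receptor's track. Tracks then appear as local maxima of the excitation. The sketch below is a minimal sequential version under simplifying assumptions: straight tracks in 2D, Gaussian weights, and a hypothetical grid, σ and threshold. On an FPGA the receptors would be evaluated in parallel. This is not the LHCb firmware.

```python
import math
from typing import List, Tuple

Hit = Tuple[float, float]  # (z of the detector layer, measured x coordinate)


def retina_response(hits: List[Hit], slope: float, intercept: float, sigma: float) -> float:
    """Excitation of one receptor, i.e. one cell (slope, intercept) of track-parameter space.

    Each hit adds a Gaussian weight of its distance from the point where the
    straight track x = intercept + slope * z crosses that hit's layer.
    """
    total = 0.0
    for z, x in hits:
        d = x - (intercept + slope * z)
        total += math.exp(-d * d / (2.0 * sigma * sigma))
    return total


def find_tracks(hits: List[Hit], slopes: List[float], intercepts: List[float],
                sigma: float, threshold: float) -> List[Tuple[float, float, float]]:
    """Evaluate every receptor, then keep local maxima above the threshold."""
    grid = [[retina_response(hits, m, q, sigma) for q in intercepts] for m in slopes]
    found = []
    for i, row in enumerate(grid):
        for j, r in enumerate(row):
            if r < threshold:
                continue
            neighbours = [
                grid[a][b]
                for a in range(max(i - 1, 0), min(i + 2, len(slopes)))
                for b in range(max(j - 1, 0), min(j + 2, len(intercepts)))
                if (a, b) != (i, j)
            ]
            if all(r >= n for n in neighbours):
                found.append((slopes[i], intercepts[j], r))
    return found


# Toy event: one track x = 0.5 + 0.2 z crossing 6 layers, plus two noise hits.
layers = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hits = [(z, 0.5 + 0.2 * z) for z in layers] + [(2.0, 3.0), (5.0, -1.0)]
slopes = [k * 0.05 for k in range(-10, 11)]      # -0.5 .. 0.5
intercepts = [k * 0.1 for k in range(-20, 21)]   # -2.0 .. 2.0
tracks = find_tracks(hits, slopes, intercepts, sigma=0.05, threshold=4.0)
print(tracks)
```

Every receptor needs only the hits and its own two parameters, with no iteration between them. That independence is what lets the retina map naturally onto the fixed, parallel logic of an FPGA.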

(13)

ATLAS

• The HL-LHC pileup degrades the pulse quality of the detectors, and with it the performance of the reconstruction algorithms.

Investigating FPGA implementation of deep learning algorithms for real-time signal reconstruction in particle detectors under high pile-up conditions [JINST 14 (2019) 09, P09002]

Machine Learning: NN models usually use floating-point arithmetic, which is not efficient on FPGAs → study of the impact of quantization on convolutional neural network models

Tests of quantized models on FPGA (Xilinx ZC104) showed up to 5 times higher power efficiency than a GPU (Nvidia RTX 2080 Ti) for the CNN reconstruction.
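Quantization replaces floating-point weights and inputs with small integers plus a scale factor, so the multiply-accumulate work of a network layer becomes integer arithmetic, which is cheap on FPGAs. The sketch below shows one common scheme, symmetric per-tensor linear quantization, applied to a single dot product. It is not necessarily the scheme used in JINST 14 P09002. The weights and "pulse samples" are made-up toy values.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed n-bit integers sharing one scale factor."""
    q_max = 2 ** (n_bits - 1) - 1                  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / q_max
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


def dot_float(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def dot_quantized(w: List[float], x: List[float], n_bits: int = 8) -> float:
    """Integer multiply-accumulate, followed by a single rescale at the end."""
    qw, sw = quantize_symmetric(w, n_bits)
    qx, sx = quantize_symmetric(x, n_bits)
    acc = sum(a * b for a, b in zip(qw, qx))  # pure integer arithmetic
    return acc * sw * sx


weights = [0.42, -0.17, 0.08, 0.91, -0.55]
samples = [1.3, 0.2, -0.7, 2.1, 0.05]  # toy "digitized pulse samples"

exact = dot_float(weights, samples)
for bits in (16, 8, 4):
    approx = dot_quantized(weights, samples, bits)
    print(f"{bits:2d}-bit: {approx:+.4f} (float: {exact:+.4f})")
```

Fewer bits mean smaller multipliers and less memory on the FPGA, at the cost of a larger approximation error. Studying that trade-off for the CNN is the purpose of the quantization study mentioned above.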


(14)

ATLAS

A demonstrator equipped with prototype Phase-II (HL-LHC) electronics is installed at Point 1 and reads out a slice of the ATLAS calorimeter: Δη × Δφ = 1.0 × 0.1

- Inserted in July 2019, it will read out data during Run3
- Thoroughly validated in multiple testbeam campaigns

ATLAS Tile CPM (2020):
- AMC ~7.4 × 18 cm
- Throughput (16 Gb/s per line): TX 512 Gb/s, RX 512 Gb/s
- 8 FireFly (24 links), 1 Xilinx KU115 FPGA
- FPGA total: 1 Tb/s; 7.4 Gb/s/cm²

(15)

CMS

Real-time muon tracking algorithm on FPGAs for the upgraded CMS

DT trigger primitives built from the input hits (asynchronous)

Maximum resolution and reduced dead time: resolutions close to offline

400 ns drift time, but only 25 ns between collisions; also, left-right ambiguity of the hits

Extensions of the algorithm include a Pseudobayes approach to improve the grouping step, particularly under aging

Better performance expected for aged-chamber scenarios towards the end of HL-LHC

Full algorithm includes DT+RPC superprimitives (as in Phase 1)
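A drift-tube hit gives the distance from the track to the wire but not the side, and a 400 ns maximum drift spans 400 / 25 = 16 bunch crossings. The sketch below illustrates only the left-right part of the problem. It enumerates all left/right assignments for one hit per layer, fits a straight line to each, and keeps the best fit. It assumes the drift distances are already known (i.e. the bunch-crossing time is given). It also uses a hypothetical DT-like geometry of 4 layers 1.3 cm apart with wires staggered by half a 4.2 cm cell. It is not the CMS firmware algorithm.

```python
import itertools
from typing import List, Optional, Tuple

# One hit per layer: (z of the layer, x of the sense wire, drift distance).
# The drift distance says how far from the wire the muon passed, not on which side.
Hit = Tuple[float, float, float]
Fit = Tuple[Tuple[int, ...], float, float, float]  # (sides, a, b, sum of squared residuals)


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares straight line x = a + b * z; returns (a, b, sum of squared residuals)."""
    n = len(points)
    sz = sum(z for z, _ in points)
    sx = sum(x for _, x in points)
    szz = sum(z * z for z, _ in points)
    szx = sum(z * x for z, x in points)
    b = (n * szx - sz * sx) / (n * szz - sz * sz)
    a = (sx - b * sz) / n
    ssr = sum((x - (a + b * z)) ** 2 for z, x in points)
    return a, b, ssr


def resolve_lateralities(hits: List[Hit]) -> Fit:
    """Try all 2**n left/right assignments and keep the best-fitting straight line."""
    best: Optional[Fit] = None
    for sides in itertools.product((-1, 1), repeat=len(hits)):
        points = [(z, wire + side * d) for (z, wire, d), side in zip(hits, sides)]
        a, b, ssr = fit_line(points)
        if best is None or ssr < best[3]:
            best = (sides, a, b, ssr)
    assert best is not None
    return best


# Toy DT-like superlayer (assumed geometry): 4 layers 1.3 cm apart,
# wires in alternate layers staggered by half a 4.2 cm cell.
layers = [(0.0, 0.0), (1.3, 2.1), (2.6, 0.0), (3.9, 2.1)]
true_a, true_b = 1.0, 0.3
hits = [(z, w, abs(true_a + true_b * z - w)) for z, w in layers]
sides, a, b, ssr = resolve_lateralities(hits)
print(sides, round(a, 3), round(b, 3))
```

The half-cell stagger matters: with all wires aligned, the mirror image of a straight track would also be a straight line, and the ambiguity could not be resolved from a single superlayer. With only 2⁴ = 16 combinations per group, an exhaustive search like this is small enough to be laid out in parallel logic.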

(16)

CMS

Firmware demonstration performed in a Xilinx Virtex 7 (1 chamber, φ view).

Validated in the lab (firmware-emulator comparison).

Installed at P5 and validated with cosmic-ray campaigns.

Exercised in KU115. Target implementation in a Xilinx Virtex UltraScale+ VU13P (ATCA module).

Aiming at 1 or 2 sectors per FPGA.

Results for the φ-view DT chamber AM algorithm in KU115 (9%)

(17)

About COMCHA

2nd COMCHA School

IFIC, Valencia, November 2021

https://twiki.ific.uv.es/twiki/bin/view/Main/ComCha

• Forum for discussions related to Computing Challenges in HEP and other fields

• Aiming to be cross-cutting and synergetic

• Important for communication, coordination of activities, use of infrastructures, training, etc.

Artificial Intelligence, Machine Learning, GPU and FPGA programming, use of Artemisa @ IFIC

(Contact: L. Fiorini, A. Oyanguren)

(18)

Outlook

On FPGAs:

• Around 90% of the FPGA market is held by Xilinx and Altera.

(Intel acquired Altera in 2015; AMD's acquisition of Xilinx, announced in 2020, was completed in 2022)

• Wide range of FPGA models, with families and models for:

> High performance

> System On Chip

> General purpose

On GPUs:

Market dominated by NVIDIA and AMD

A huge number of commercial models, both professional and gaming (cheaper)

Extensive AI developments and tools

• Hybrid: systems combining FPGA, GPU and CPU features:

- Xilinx/AMD ACAP Versal
- Altera/Intel Agilex

• Other processors (IPUs… ) ….

• HL-LHC will be characterized by improved detectors and huge data volumes

• Hardware accelerators are becoming crucial, in particular for the trigger systems
