Computing Challenges (COMCHA):
Hardware accelerators for HEP
A. Oyanguren (IFIC – Valencia)
XIII CPAN Days – Huelva, March 2022
Outline
Motivation
Hardware accelerators
LHCb
ATLAS
CMS
About COMCHA
Outlook
Motivation
2028 bunches of protons per beam
10^11 protons per bunch
Beam energy of 7 TeV (access to ~10^-16 cm)
Luminosity 10^34 cm^-2 s^-1
Crossing rate 40 MHz, i.e. 40 M collisions/s
About 1 MB of data per collision
→ 40 TB/s
(New physics rate: ~0.00001 events/s)
[Figure: proton-proton collision]
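The 40 TB/s figure follows directly from the crossing rate and the event size quoted above. A minimal arithmetic sketch, assuming decimal units (1 TB = 10^6 MB); the function name is illustrative only:

```python
def raw_data_rate_tb_per_s(crossing_rate_mhz: float, event_size_mb: float) -> float:
    """Raw detector output in TB/s for a given crossing rate and event size."""
    events_per_second = crossing_rate_mhz * 1e6       # MHz -> collisions per second
    megabytes_per_second = events_per_second * event_size_mb
    return megabytes_per_second / 1e6                 # MB/s -> TB/s (decimal units)


# Values from the slide: 40 MHz crossing rate, about 1 MB per collision.
print(raw_data_rate_tb_per_s(40, 1.0))
```

With the slide's inputs this reproduces the quoted 40 TB/s, which is the rate the trigger has to reduce before anything is stored.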
Motivation
The trigger systems:
[Figure: trigger schemes of ATLAS, CMS and LHCb; labels: "1 MHz", "(Run 3: 2022)", "New strategy!"]
Motivation
How much data can we record?
The storage need is set by the trigger bandwidth:
Bandwidth [GB/s] ~ Trigger output rate [kHz] x Average event size [MB]
Raw event data size: ~1 MB (ATLAS and CMS), ~0.1 MB (LHCb)
Trigger output rate: 1 kHz (ATLAS & CMS), 12.5 kHz (LHCb)
Bandwidth: ~1 GB/s
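The bandwidth relation above can be checked with the quoted numbers. This is a small sketch, not an official tool; the function and dictionary names are made up for illustration:

```python
def trigger_bandwidth_gb_per_s(output_rate_khz: float, event_size_mb: float) -> float:
    """Bandwidth [GB/s] ~ trigger output rate [kHz] x average event size [MB]."""
    # kHz x MB = (1e3 / s) x (1e6 B) = 1e9 B/s = 1 GB/s, so no extra factor is needed.
    return output_rate_khz * event_size_mb


# (trigger output rate in kHz, raw event size in MB), as quoted on the slide
experiments = {
    "ATLAS/CMS": (1.0, 1.0),
    "LHCb": (12.5, 0.1),
}

for name, (rate_khz, size_mb) in experiments.items():
    print(f"{name}: {trigger_bandwidth_gb_per_s(rate_khz, size_mb):.2f} GB/s")
```

Both cases land at roughly 1 GB/s, consistent with the slide: a higher output rate is affordable only with a proportionally smaller event size.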
Moving to Real Time Analysis schemes: turbo (LHCb), scouting (CMS) and TLA (ATLAS)
Analysis Object Data formats (AOD) at the trigger level
~1 kB in Run 3 and <5 kB in HL-LHC
Fast reconstruction in real time becomes crucial!
Fast decisions: between two collisions, each event must be either discarded forever or sent for permanent storage
Motivation
(From S. Campana, LHCC March 2022)
The HL-LHC:
- LHCb and ALICE have already been upgraded for Run 3 (LHCb: x 5 luminosity)
- HL-LHC will come next…
Data reconstruction and storage will become a tough issue; reduced data formats will not be enough
→ need to move more complex event reconstruction to the earliest stage of the trigger
Hardware accelerators
Use more than one kind of processor or cores to maximize performance or energy efficiency.
Exploit the high level of parallelism to handle particular tasks.
Field Programmable Gate Arrays (FPGAs)
- Programmable and flexible devices
- Low latency
- Low power consumption
- Ideal for compute- and data-intensive workloads
Graphics Processing Units (GPUs)
- Multicore processors, highly commercial
- High throughput
- Ideal for data-intensive parallelizable applications
Hardware accelerators
In practice (e.g. at LHCb): Event Building at 40 Tb/s
PCIe slots:
- 3 PCIe40 (FPGAs)
- 2 network connections
- 1-3 GPUs
Hardware accelerators
In practice (e.g. at LHCb), mounted server’s CPUs:
[Photo labels: CPU, RAM, fans]
LHCb
The upgraded LHCb for Run 3:
- No L0 hardware trigger → full detector read-out at 30 MHz!
- Detector data received by O(500) FPGAs and built into events in the Event Building servers
- Full HLT1 in real time on GPUs (Allen project): O(200) Nvidia RTX A5000
[Figure: HLT1 sequence, diagram labels in order: RAW DATA → Global Event Cut → VELO decoding and clustering → VELO tracking → Simple KALMAN → Find PV → UT decoding → UT tracking → SciFi decoding → SciFi tracking → Parameterized KALMAN → Muon decoding → Muon ID → Find 2aryPV → Selected events]
[LHCB-TDR-021]
LHCb
GPU HLT1 sequence: algorithm breakdown (indicative), throughput and performance:
[Figure: breakdown by algorithm group: VELO, UT, PV, SciFi, muons, Kalman]
Working on the implementation of more time-consuming algorithms (LLPs)
[LHCb-FIGURE-2020-014] [arXiv:2105.04031 [physics.ins-det]]
[Comput Softw Big Sci 4, 7 (2020)]
LHCb
Real-time reconstruction on FPGAs with the “artificial retina” architecture
VELO clustering already implemented on FPGAs for Run 3!
Tracking in development for Run 5 (~2030); coprocessor testbed established at CERN for tests in realistic conditions
[NIMA 453 (2000) 425-429]
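The retina idea from the cited reference can be illustrated with a toy example: track-parameter space is divided into a grid of cells, each cell sums a Gaussian-weighted response to the distance between the hits and its ideal track, and the most excited cells indicate track candidates. The sketch below is a sequential Python toy under strong assumptions (straight 2D tracks x = x0 + tx * z, made-up hits, grid and sigma, hypothetical function names); it is not the LHCb firmware, where all cells would be evaluated in parallel on the FPGA.

```python
import math


def retina_response(hits, x0, tx, sigma=0.1):
    """Sum of Gaussian weights of the hit distances to the line x = x0 + tx * z."""
    return sum(
        math.exp(-((x - (x0 + tx * z)) ** 2) / (2.0 * sigma**2))
        for z, x in hits
    )


def best_receptor(hits, x0_values, tx_values, sigma=0.1):
    """Return (response, x0, tx) of the most excited cell of the parameter grid."""
    return max(
        (retina_response(hits, x0, tx, sigma), x0, tx)
        for x0 in x0_values
        for tx in tx_values
    )


# Toy event: five planes, one straight track (x0 = 1.0, tx = 0.5) plus two noise hits.
planes_z = [0.0, 1.0, 2.0, 3.0, 4.0]
hits = [(z, 1.0 + 0.5 * z) for z in planes_z] + [(1.0, -2.0), (3.0, 4.5)]

x0_grid = [i * 0.25 for i in range(-8, 9)]  # -2.0 ... 2.0
tx_grid = [i * 0.25 for i in range(-4, 5)]  # -1.0 ... 1.0

response, x0, tx = best_receptor(hits, x0_grid, tx_grid)
print(x0, tx, round(response, 2))
```

In this toy the cell matching the true track collects a response close to the number of hits on the track, while the noise hits contribute almost nothing. A realistic implementation would look for local maxima above a threshold and interpolate between neighbouring cells.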
ATLAS
Investigating FPGA implementation of deep learning algorithms for real-time signal reconstruction in particle detectors under high pile-up conditions [JINST 14 (2019) 09, P09002]
The HL-LHC pile-up degrades the pulse quality of the detectors, so the performance of the reconstruction algorithms deteriorates.
Machine Learning: NN models usually use floating-point arithmetic, which is not efficient on FPGAs
→ study of the impact of quantization on convolutional neural network models
Tests of quantized models on an FPGA (Xilinx ZC104) showed up to 5 times better power efficiency than a GPU (Nvidia RTX 2080 Ti) for the CNN reconstruction.
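To make the quantization step concrete, here is a minimal sketch of symmetric uniform quantization: floating-point weights are mapped to signed n-bit integers that share one scale factor, which is the kind of fixed-point representation FPGAs handle efficiently. The scheme, the example weights and the function names are assumptions for illustration; the cited study may use a different quantization method.

```python
def quantize(weights, n_bits=8):
    """Map floats to signed n-bit integer codes sharing one scale factor.

    Assumes at least one weight is non-zero (otherwise the scale would be 0).
    """
    q_max = 2 ** (n_bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(w) for w in weights) / q_max  # float value of one integer step
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover the approximate float values represented by the integer codes."""
    return [c * scale for c in codes]


weights = [0.504, -1.27, 0.033, 0.9]  # made-up example weights
codes, scale = quantize(weights, n_bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(f"step = {scale:.4f}, max error = {max_error:.4f}")
```

The rounding error per weight is at most half a step, so fewer bits mean a coarser step and a larger error. Studying how that error affects the network output is what a quantization impact study has to quantify.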
ATLAS
Demonstrator of the HL-LHC electronics, equipped with prototype Phase-II electronics, installed at Point 1 and reading out a slice of the ATLAS calorimeter: Δη x Δφ = 1.0 x 0.1
Inserted in July 2019; it will read out data during Run 3
Thoroughly validated in multiple testbeam campaigns
ATLAS Tile CPM (2020): AMC ~ 7.4 x 18 cm, 8 Firefly (24 links), 1 Xilinx KU115 FPGA.
Throughput (16 Gb/s line): TX: 512 Gb/s; RX: 512 Gb/s
FPGA total: 1 Tbps
7.4 Gbps/cm^2
CMS
Real-time muon tracking algorithm on FPGAs for the CMS upgrade
• DT trigger primitives built from the input hits (asynchronous)
• Maximum resolution and reduced dead time: resolutions ~ offline
• 400 ns drift time, but 25 ns between collisions. Also, left-right hit ambiguity
• Extension of the algorithm with a pseudo-Bayes approach to improve the grouping step, particularly under aging
• Better performance expected in aged-chamber scenarios towards the end of HL-LHC
• Full algorithm includes DT+RPC superprimitives (as in Phase 1)
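Two of the difficulties listed above can be put into numbers: a 400 ns drift time spans many 25 ns bunch crossings, and each hit has a left and a right position hypothesis around its wire. The sketch below is illustrative only; the drift velocity is an assumed approximate value, and the function names are hypothetical rather than taken from the firmware.

```python
from itertools import product

BX_SPACING_NS = 25.0               # time between collisions (from the slide)
MAX_DRIFT_TIME_NS = 400.0          # maximum drift time (from the slide)
DRIFT_VELOCITY_MM_PER_NS = 0.054   # assumed approximate value, for illustration only


def crossings_spanned() -> int:
    """Number of bunch crossings that fit within one maximum drift time."""
    return int(MAX_DRIFT_TIME_NS // BX_SPACING_NS)


def hit_candidates(wire_x_mm: float, drift_time_ns: float):
    """Left and right position hypotheses for one hit (left-right ambiguity)."""
    drift_distance = DRIFT_VELOCITY_MM_PER_NS * drift_time_ns
    return (wire_x_mm - drift_distance, wire_x_mm + drift_distance)


def laterality_combinations(n_hits: int):
    """All left/right assignments a grouping step may have to consider."""
    return list(product("LR", repeat=n_hits))


print(crossings_spanned())              # bunch crossings covered by one drift time
print(hit_candidates(0.0, 100.0))       # two mirror positions around the wire
print(len(laterality_combinations(4)))  # assignments for a group of 4 hits
```

Since the drift time itself depends on the unknown collision time, the algorithm has to resolve the bunch crossing and the left-right assignment together, which is why the grouping step is the critical part.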
CMS
Firmware demonstration performed in Xilinx Virtex 7 (1 chamber phi view).
Validated at the lab (firmware-emulator comparison).
Installed at P5 and validated with Cosmic campaigns.
Exercised in KU115. Target implementation in Xilinx Virtex Ultrascale Plus VU13P (ATCA module).
Aiming at 1 or 2 sectors/FPGA.
Results for phi view DT chamber AM algo in KU115 (9%)
About COMCHA
2nd COMCHA School, IFIC, Valencia, November 2021
https://twiki.ific.uv.es/twiki/bin/view/Main/ComCha
Forum for discussions related to computing challenges in HEP and other fields
Aiming to be transversal and synergetic
Important for communication, coordination of activities, use of infrastructures, training, etc.
Artificial Intelligence, Machine Learning, GPU and FPGA programming, use of Artemisa @ IFIC
(Contact: L. Fiorini, A. Oyanguren)
Outlook
On FPGAs:
Xilinx and Altera together hold around 90% of the FPGA market.
(Intel acquired Altera in 2015; AMD announced its acquisition of Xilinx in 2020 and completed it in February 2022)
Wide range of FPGA families and models for
> High performance
> System On Chip
> General purpose
On GPUs:
Market dominated by NVIDIA and AMD
Huge number of commercial models, both professional and gaming (cheaper)
Large AI developments and tools
Hybrid: systems combining FPGA, GPU and CPU features:
- Xilinx/AMD ACAP Versal
- Altera/Intel Agilex
Other processors (IPUs…)
HL-LHC will be characterized by improved detectors and huge data volumes
Hardware accelerators are becoming crucial, in particular for the trigger systems