Status of Deep-Inelastic Scattering and PDFs for the LHC

(1)

ATLAS Data Acquisition: from Run I to Run II

William Panduro Vazquez

Royal Holloway, University of London

On behalf of the ATLAS collaboration

(2)

Overview

• DAQ System for Run 1

• Run 2 Challenges & Solutions

• Readout System upgrade

• DataFlow network overhaul

• High Level Trigger (HLT) changes

• New software concepts, techniques, libraries

04/07/2014 W. Panduro Vazquez (RHUL), ICHEP 2014 2

(3)

ATLAS TDAQ System (Run 1)

3

04/07/2014 W. Panduro Vazquez (RHUL), ICHEP 2014

(4)

ATLAS TDAQ System (Run 1)

4

The focus of this talk

(5)

Run 2 Challenges

• Run conditions from 2015

– Pileup

(multiple interactions per bunch crossing)

– Peak of 40 in 2012

(luminosity 7x10³³)

• Design: avg pileup of 21 for luminosity 10³⁴

– For Run 2 estimate max average 55

• Assuming 25 ns crossings

– With 50 ns could reach 80

• Peak value more significant for DAQ system

– Potentially larger event size (up to 2 MB) – Longer HLT processing time

• Increased buffering capacity required

• ATLAS DAQ changes for 2015

– L1 trigger rate 100 kHz (70 kHz in 2012)

– 20% increase in number of readout channels

• Higher luminosity and energy requiring more channels to read out existing hardware

• New detector components and trigger readout

– Insertable B-layer (IBL) – new pixel detector layer (talks by W. Lukas & B. Mandelli – Detector RD & Performance) – L1 topological trigger (talk by F. Pastore – Detector RD & Performance)

– In 2016: Fast tracker (FTK) (poster by T. Iizawa)

– Average rate written to disc ~1 kHz

• Run 1 reality becomes a requirement

• Peak rates potentially much higher

2012 Avg: 20.7

(6)

Run 2 Solutions

• Global requirement: higher capacity, more flexible DAQ system

– Increase buffering capacity and readout rate

• Overhaul of readout system (ROS)

– Increase network capacity

• Simpler, more scalable design

– Upgraded High Level Trigger

• Increase number of available cores

• Merge/optimise Level 2 and Event Filter Layers

• Major undertaking over tight 2 year timescale!

(7)

Readout System in Run 1

• Based on off-the shelf PCs (~150 in run 1)

• Receives data via ~1600 fibre readout links (ROLs) after L1 accept

– Stored in buffer on custom PCI card (ROBIN) – 3 ROLs per ROBIN

– 4-5 ROBINs per PC (12 – 15 ROLs) – 2 x 1 GbE output network connections

• Serve data to HLT and hold buffered data for entire HLT latency period

• Read out desired data to event building system on L2 accept, delete rejected events

• Not quite capable of satisfying max required rates for Run 2

• Run 1 experience suggests requirements will evolve beyond expectation

• Increase in number of ROLs (20%) will use majority of remaining spares

– Rack space a problem with current link density

• Beginning to run into hardware obsolescence and end-of-life problems

• Limited memory capacity per channel could limit HLT performance

• Upgrade Goals:

– Manage 100 kHz input rate from L1 while reading out 50% of this data

– Significantly increase link density and reduce system footprint Max Output Rate

Predicted

Readout Scenario

Pixel Detector

(kHz)

Hadron Calo Endcap

(kHz) 2012 Data Taking

(75 kHz L1 Rate)

30 33

2015 Rate Predictions (100 kHz L1 Rate)

53 48

Current ROS Max Lab Performance (100 kHz L1 Rate)

50 45

Example rates for ROS PCs from most demanding systems

(8)

8

Card ROBIN RobinNP

Number of Readout Links 3 12

Total RAM on-board (MB) 192 8192

Memory per link (MB) 64 682

Max Input Bandwidth (MB/s) 480 1920 Output Bandwidth (MB/s) 264 1600

(3200 possible)

Readout System Upgrade

12 x 6 Gb/s optical link

1156-pin Xilinx Virtex 6 FPGA 2x DDR3 SODIMM

Gen2 x8 PCI Express

• Faster , smaller ROS PCs with higher link density

– Buffers housed on new PCIe card: RobinNP – New ROS PCs will be half the height of the old – 2 RobinNPs per ROS PC (24 ROLs)

– Link density 3x greater

• Significant space, power and cooling saving

• Much greater scope for later expansion

– Significant output network upgrade (4 x 10 GbE)

ROS Features Run 1 Run 2

Number of PCs 150 100

Max Number of PCs per Rack 10 15 Average Readout Links per PC 12 24 Max PC Input Bandwidth (MB/s) 1920 3840 Max PC Output Bandwidth (MB/s) 200 4000

• Data management done by CPU of host PC

– Original ROBIN tasks managed by on-board Power PC processor

– Host processing easier to maintain, can upgrade CPU for performance boost

• Hardware based on board developed by ALICE (Common Readout Receiver Card or ‘C-RORC’)

– Custom ATLAS firmware implementing enhanced

‘ROBIN’ functionality

(9)

Dataflow Network in Run 1

Data Collection Network (DC1, DC2)

Back-end

Network (BE1, BE2)

DC1 DC2

BE1 BE2

L2 Trigger and Event Filter Nodes (~8k cores)

Event Builder

Nodes (48) running 96 processes in total

Readout PCs ~150

Data Logger (~5) Event Filter

Nodes

(~8k cores)

Data from front-end

Offline processing and storage

10 GbE 1 GbE rack fan in/out

Level 2

Supervisors (~6)

04/07/2014

W. Panduro Vazquez (RHUL), ICHEP 2014 9

(10)

Dataflow Network in Run 2

Data Collection Network (DC1, DC2)

DC1 DC2

Combined High Level Trigger (L2, EF & EB) Nodes (~20k Cores)

Next Gen Readout PCs ~100

Data Logger (~10) Data from

front-end

Offline processing and storage

10 GbE 1 GbE rack fan in/out

• Direct 10 GbE connection to readout PCs

• Merger of L2, EF and EB functionality into HLT

• No need for back-end network layer

• Backbone implemented and

• ROS connectivity Q4 2014

HLT

Supervisor (1)

04/07/2014

W. Panduro Vazquez (RHUL), ICHEP 2014 10

(11)

High Level Trigger Upgrade

• Remove distinction between Level-2 (L2) nodes and Event Filter (EF) nodes

– Details in talk by F. Pastore (Detector RD & Performance)

• Previous complicated dataflow and control replaced by Data Collection Manager (DCM)

Run 1 Run 2

Level 1 Trigger

RoI Builder

HLT Supervisor

HLT Processor

DCM

Readout System

Data Logger

Regions of Interest

Regions of Interest Discard data

Serve Data

End-of-event processing

Data Request

Functionality Combined

• Common nodes performing complete decision process

– Optimised and more flexible event building

• Eliminates previous limit of 7 kHz

• Major saving in bandwidth and processing time

– No need to pack/unpack and transfer data between nodes

– Event building no longer needs to duplicate requests for data that the HLT already holds through prior L2 request

• More advanced processing allowing simpler, more efficient dataflow

– Better HLT node load balancing

(12)

New Software Technologies

• To achieve upgrade goals many new techniques implemented

– Advantage also taken of opportunity to upgrade from obsolete architectures

– Widespread adoption of C++11 capabilities

• Dataflow request-gathering protocol

– Now based on asynchronous messaging based on boost::asio

• Easier to maintain (fewer custom implementations/packages)

• Simpler, more generic threads

• Threading overhaul (throughout ROS and HLT)

– Adoption of C++11 threading libraries accompanied by well supported externals: Intel threaded building blocks (tbb)

(13)

ATLAS TDAQ System (Run 2)

13

(14)

Summary

• Major overhaul of DAQ system under way

– Needed to handle new run conditions (pileup) and performance requirements (higher rates) – New detector and trigger components being integrated

• ROS upgrade

– New smaller, faster PCs, 3 x denser packing of links – New custom buffer hardware (RobinNP)

– Installation scheduled for Q4 2014

• Dataflow network

– Comprehensive capacity upgrade

– Simpler, more efficient structure eliminating need for backend layer – Upgrade nearing completion

• Testing and validation ongoing

• ROS connectivity to be implemented throughout summer 2014

• HLT upgrade

– Merging of level 2 trigger and event filter

– Major overhaul of trigger algorithms (see talk by F. Pastore – Detector RD & Performance) – Ongoing integration and testing

• New technology (hardware, firmware and software) throughout

– Integrating lessons of Run 1

• On schedule to be ready by end of 2014

– Regular programme of milestone weeks throughout the year to drive integration

• 2014 is proving a busy but rewarding year!

(15)

Extra Slides

(16)

Effect of Pileup on DAQ

L2 Trigger Event Filter

• Processing time vs average pileup <μ> for different parts of the HLT

• Different performance for different core types

• Average pileup also results in an

increased overall event size

(17)

Run 1 ROBIN Card

• Memory buffers implemented on custom PCI hardware (ROBIN)

– 3 readout links per card – 64 MB memory per link

– Virtex-II FPGA + PPC chip for event indexing and transfer management – 64 bit PCI (264 MB/s) Bus for data transfer to ROS PC

– Up to 5 ROBINs per PC (15 channels)

• 4 in most cases (12 channels) Virtex II FPGA

3 x 64 MB Memory buffers Power PC

Processor 64 bit 66 Mhz PCI Bus

264 MB/s 3 x 2 Gb/s

optical links

(18)

New ROS Performance

• Current baseline measurements (more performance improvement possible)

• New ROS PC: Intel E5-1650v2 (6 cores + HT) @ 3.5 Ghz

• Exceed Run 2 performance requirements

• Readout of 100% of data possible for fewer links

(19)

RobinNP Indexer Thread

(1 per ROBgroup)

Request Processor Thread

(1 per ROBgroup)

DMA Monitoring Thread

(1 per ROBgroup)

Async IO Threads

Free Descriptor Queue (Semaphore)

Request Queue (Semaphore)

Memory Pages (FIFO + Interrupts)

RobinNP Hardware

Queue DMA

Receive DMA (FIFO + Interrupts) Descriptor Flow

Page Flow

DMA & DMA Control

HLT Traffic and Data Collection

HLT Traffic

Data Collector Thread

Processed

Descriptor Queue (Semaphore)

RobinNPReadoutModule

TriggerIn

Status of Deep-Inelastic Scattering and PDFs for the LHC

ATLAS Data Acquisition: from Run I to Run II

ATLAS TDAQ System (Run 1)

– Potentially larger event size (up to 2 MB) – Longer HLT processing time

– Average rate written to disc ~1 kHz

– Increase network capacity

Readout System Upgrade

• Hardware based on board developed by ALICE (Common Readout Receiver Card or ‘C-RORC’)

Data from front-end

front-end

High Level Trigger Upgrade

Run 1 Run 2

New Software Technologies

• Threading overhaul (throughout ROS and HLT)

• ROS upgrade

• HLT upgrade

• 2014 is proving a busy but rewarding year!

increased overall event size

New ROS Performance

Free Descriptor Queue (Semaphore)

RobinNP Hardware

Queue DMA

Data Collector Thread

Processed

ROS Software Architecture