
(1)

Illustrative Example of Distributed Analysis in ATLAS Spanish Tier2 and Tier3

S. González, E. Oliver, M. Villaplana, A. Fernández, M. Kaci, A. Lamas, J. Salt, J. Sánchez

PCI2010 Workshop

Rabat, 5th-7th October 2011

(2)

The ATLAS Computing Challenge

•  Since November 2009, when the LHC started:

–  700 million events recorded

–  66 petabytes stored (1 PB = 1 million gigabytes)

–  2700 physicists (from 174 institutes)

•  This task has been done thanks to the Worldwide LHC Computing Grid project (WLCG)

–  Global collaboration linking grid infrastructures

•  References: 

–  http://lcg.web.cern.ch/LCG/Default.htm

–  http://atlas-runquery.cern.ch

–  http://bourricot.cern.ch/dq2/accounting/global_view/0

(3)

Distributed Analysis in ATLAS 

•  The GRID consists of computing resources around the world

•  The WLCG defines different types of computing centres, organised in Tiers:

•  Reference: http://lcg.web.cern.ch/LCG/Default.htm

(4)

Distributed Analysis in ATLAS 

•  ATLAS has a specific system for Production and Distributed Analysis (PanDA):

–  Includes all ATLAS requirements

–  Highly automated

–  Low manpower

–  Unifies the different grid environments (EGI-gLite, OSG and EGI-ARC)

–  Monitoring web pages

•  Reference:

http://panda.cern.ch

(5)

Distributed Analysis in ATLAS 

•  For ATLAS users, GRID tools have been developed:

–  For data management

•  Don Quijote 2 (DQ2)

–  Data info: name, files, sites, number, …

–  Download and register files on the GRID, …

•  ATLAS Metadata Interface (AMI)

–  Data info: number of events, availability

–  For simulation: generation parameters, …

•  Data Transfer Request (DaTri)

–  Users request that a set of data (datasets) be replicated to other sites (under restrictions)

–  For Grid jobs

•  PanDA Client

–  Tools from the PanDA team for sending jobs in an easy way for the user

•  Ganga (Gaudi/Athena and Grid Alliance)

–  A job management tool for local, batch system and grid use

(6)

Distributed Analysis in ATLAS 

References: http://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasComputing

(7)

Tier2 and Tier3 examples from Spain 

•  The ATLAS Spanish Tier2 (T2-ES) is a federation of 3 Spanish institutions (see Jose's talk):

–  IFAE-Barcelona (25%)

–  UAM-Madrid (25%)

–  IFIC-Valencia (50%, coordinator)

•  The T2-ES represents 5% of the ATLAS resources (among 30-40 T2s):

•  References:

–  J. Phys. Conf. Ser. 219 072046

–  http://indico.ific.uv.es/indico/conferenceDisplay.py?confId=440
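As a rough worked example of the shares quoted above, the sketch below converts each site's fraction of T2-ES into a fraction of the ATLAS total. It assumes that the 5% figure and the 25/25/50 split refer to the same resource measure, which the slide does not state; the function and variable names are illustrative.

```python
# Site shares of the T2-ES federation, as quoted on the slide.
T2_ES_SHARES = {"IFAE-Barcelona": 0.25, "UAM-Madrid": 0.25, "IFIC-Valencia": 0.50}

# Fraction of ATLAS resources provided by T2-ES (assumed to use the same measure).
T2_ES_FRACTION_OF_ATLAS = 0.05


def site_fraction_of_atlas(shares, federation_fraction):
    """Return each site's share of the ATLAS total, in percent."""
    return {site: round(share * federation_fraction * 100, 2)
            for site, share in shares.items()}


site_percent = site_fraction_of_atlas(T2_ES_SHARES, T2_ES_FRACTION_OF_ATLAS)
for site, percent in site_percent.items():
    print(f"{site}: {percent}% of ATLAS resources")
```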

(8)

Tier2 and Tier3 examples from Spain 

•  At IFIC the Tier3 resources are being split into two parts:

–  Resources coupled to the IFIC Tier2

•  Grid environment

•  Used by IFIC-ATLAS users

•  When these resources are idle, they are used by the ATLAS community

–  A computer farm to perform interactive analysis (PROOF)

•  Outside the grid framework

•  Reference:

–  ATL-SOFT-PROC-2011-018

(9)

Daily user activity in Distributed Analysis

•  An example of Distributed Analysis in heavy exotic particles

–  Input files

–  Workflow:

(10)

Daily user activity in Distributed Analysis

•  1) A Python script is created where the requirements are defined

–  Application address

–  Input, Output

–  A replica request to IFIC

–  Splitting

•  2) The script is executed with Ganga/Panda

–  The Grid job is sent

•  3) Once the job has finished successfully, the output files are copied to the IFIC Tier3

–  Easy access for the user
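The kind of information collected in step 1 can be sketched in plain Python. This is a minimal illustration, not the Ganga or PanDA API: the `AnalysisJob` class, the `split` helper, and all dataset, path and file names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisJob:
    """Hypothetical job description; not the Ganga or PanDA API."""
    application: str            # path to the user's analysis application
    input_datasets: List[str]   # datasets to read
    output_dataset: str         # dataset that will hold the results
    replica_site: str           # site asked to receive a copy of the output
    files_per_subjob: int       # splitting granularity


def split(job: AnalysisJob, input_files: List[str]) -> List[List[str]]:
    """Split the input files into chunks, one chunk per grid subjob."""
    n = job.files_per_subjob
    return [input_files[i:i + n] for i in range(0, len(input_files), n)]


job = AnalysisJob(
    application="/path/to/analysis",           # placeholder path
    input_datasets=["example.input.dataset"],  # placeholder name
    output_dataset="user.example.output",      # placeholder name
    replica_site="IFIC",
    files_per_subjob=2,
)
subjobs = split(job, ["f1.root", "f2.root", "f3.root", "f4.root", "f5.root"])
print(len(subjobs), "subjobs")
```

In a real submission the splitting and the replica request would be handled by the grid tools themselves; the sketch only shows how the four requirements fit together in one job description.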

In just two weeks, the 6 users of this analysis sent:

•  35728 jobs,

•  to 64 sites,

•  of which 1032 jobs ran in T2-ES (2.89%),

•  Input: 815 datasets

•  Output: 1270 datasets
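The quoted T2-ES share follows directly from the job counts. A short check, with variable names chosen for illustration:

```python
total_jobs = 35728   # jobs sent by the 6 users in two weeks
t2_es_jobs = 1032    # jobs that ran in T2-ES

# Share of the jobs that ran in the Spanish Tier2, in percent.
t2_es_share = round(100 * t2_es_jobs / total_jobs, 2)

# Average number of jobs per user, assuming equal weight for each user.
jobs_per_user = total_jobs / 6

print(f"T2-ES share: {t2_es_share}%")
print(f"Average jobs per user: {jobs_per_user:.0f}")
```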

(11)

New ATLAS Computing Model

•  Hierarchical ATLAS Computing Model

–  Tier2/3s receive data transfers from their assigned Tier1.

•  New Computing Model (Mesh): some Tier2s (T2D) connect directly to other Tier1s and Tier2s.

–  Requirements for being/becoming a T2D are based on satisfying transfer metrics with all Tier1s (network) and providing a certain level of commitment and reliability (robustness).

–  Any site can replicate data from any other site.

–  Dynamic data caching. Analysis sites receive datasets from any other site "on demand" based on usage pattern, possibly using a dynamic placement of datasets by centrally managed replication of whole datasets. Unused data is removed.

–  Remote data access. Local jobs could access data stored at remote sites using local caching at file or sub-file level.

–  Panda Dynamic Data Placement (PD2P) makes replicas at other sites according to user activity

•  References:

–  https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsFTS/#T2Ds_channels
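The idea of usage-driven replication with removal of unused data can be illustrated with a toy model. This is only a sketch under simple assumptions and does not describe how PD2P is implemented: the `DynamicPlacement` class, the access threshold and the cleanup policy are all hypothetical.

```python
from collections import Counter


class DynamicPlacement:
    """Toy model of usage-driven replication; thresholds are illustrative."""

    def __init__(self, replicate_after=3):
        self.replicate_after = replicate_after  # accesses before a replica is made
        self.accesses = Counter()               # (dataset, site) -> access count
        self.replicas = set()                   # (dataset, site) pairs holding a copy

    def record_access(self, dataset, site):
        """Count an access and replicate once the dataset is popular at a site."""
        key = (dataset, site)
        self.accesses[key] += 1
        if self.accesses[key] >= self.replicate_after:
            self.replicas.add(key)

    def cleanup(self):
        """Remove replicas not accessed since the last cleanup, then reset counts."""
        self.replicas = {key for key in self.replicas if self.accesses[key] > 0}
        self.accesses.clear()


placement = DynamicPlacement(replicate_after=2)
placement.record_access("dataset_A", "IFIC")
placement.record_access("dataset_A", "IFIC")   # second access triggers a replica
has_replica = ("dataset_A", "IFIC") in placement.replicas
placement.cleanup()   # dataset_A was used in this period, so its replica stays
kept_after_first_cleanup = ("dataset_A", "IFIC") in placement.replicas
placement.cleanup()   # no accesses since the previous cleanup: replica removed
```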

(12)

New ATLAS Computing Model

•  HammerCloud:

–  Distributed Analysis testing system

–  Prevents jobs from going to problematic sites

–  Can exclude sites if test jobs do not pass

–  Reference:

•  https://twiki.cern.ch/twiki/bin/view/IT/HammerCloud

•  ATLAS grid tools are improving day by day

–  For instance: automatic jobs for merging output files

–  https://twiki.cern.ch/twiki/bin/viewauth/ATLAS/AnalysisJobOutputMerging

•  ATLAS users can ask the Distributed Analysis Support Team (DAST, hn‐atlas‐dist‐analysis‐[email protected]) about:

–  Problems with their jobs

–  Useful for developers

•  Improve the tools and services
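The site exclusion performed by HammerCloud can be pictured as a filter on test-job results. The sketch below is an assumption-laden illustration, not HammerCloud code: the function name, the 0.8 pass-rate threshold and the site names are hypothetical.

```python
def sites_to_exclude(test_results, min_pass_rate=0.8):
    """Return the sites whose test-job pass rate is below the threshold.

    test_results maps a site name to a list of booleans (True = test job passed).
    The 0.8 threshold is an illustrative assumption.
    """
    excluded = []
    for site, results in test_results.items():
        pass_rate = sum(results) / len(results) if results else 0.0
        if pass_rate < min_pass_rate:
            excluded.append(site)
    return sorted(excluded)


results = {
    "SITE_A": [True, True, True, True],
    "SITE_B": [True, False, False, True],
    "SITE_C": [],   # no test results: treated as failing in this sketch
}
excluded = sites_to_exclude(results)
print(excluded)
```

User jobs would then be brokered only to sites that are not in the returned list.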

(13)

Analysis Efficiency in September: ATLAS Tier0 + Tier1s (ANALY*_queues)

(14)

Analysis Efficiency in September: ATLAS Tier2s (ANALY*_queues)
