Illustrative Example of Distributed Analysis in ATLAS Spanish Tier2 and Tier3
S. González, E. Oliver, M. Villaplana, A. Fernández, M. Kaci, A. Lamas, J.
Salt, J. Sánchez
PCI2010 Workshop
Rabat, 5th-7th October 2011
The ATLAS Computing Challenge
• Since November 2009, when the LHC started:
– 700 million events recorded
– 66 petabytes stored (1 PB = 1 million gigabytes)
– 2700 physicists (from 174 institutes)
• This task has been accomplished thanks to the Worldwide LHC Computing Grid (WLCG) project
– Global collaboration linking grid infrastructures
• References:
– http://lcg.web.cern.ch/LCG/Default.htm
– http://atlas-runquery.cern.ch
– http://bourricot.cern.ch/dq2/accounting/global_view/0
Distributed Analysis in ATLAS
• The Grid consists of computing resources around the world
• The WLCG defines different types of computing centres, organised in Tiers:
• Reference: http://lcg.web.cern.ch/LCG/Default.htm
Distributed Analysis in ATLAS
• ATLAS has a dedicated system for Production and Distributed Analysis (PANDA):
– Covers all ATLAS requirements
– Highly automated
– Low manpower
– Unifies the different grid environments (EGI-gLite, OSG and EGI-ARC)
– Monitoring web pages
• Reference:
http://panda.cern.ch
Distributed Analysis in ATLAS
• Grid tools have been developed for ATLAS users:
– For data management
• Don Quijote 2 (DQ2)
– Data info: name, files, sites, number,…
– Download and register files on the Grid, ...
• ATLAS Metadata Interface (AMI)
– Data info: number of events, availability
– For simulation: generation parameters, …
• Data Transfer Request (DaTri)
– Users request that a set of data (datasets) be replicated at other sites (under restrictions)
– For Grid jobs
• PanDa Client
– Tools from the PanDa team that let users send jobs easily
• Ganga (Gaudi/Athena and Grid alliance)
– A job management tool for local, batch-system and grid execution
Distributed Analysis in ATLAS
References: http://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasComputing
Tier2 and Tier3 examples from Spain
• The ATLAS Spanish Tier2 (T2-ES) consists of a federation of 3 Spanish institutions (see Jose’s talk):
– IFAE-Barcelona (25%)
– UAM-Madrid (25%)
– IFIC-Valencia (50%, coordinator)
• The T2-ES represents 5% of the ATLAS resources (among 30-40 T2s)
• References:
– J. Phys. Conf. Ser. 219 072046
– http://indico.ific.uv.es/indico/conferenceDisplay.py?confId=440
Tier2 and Tier3 examples from Spain
• At IFIC the Tier3 resources are being split into two parts:
– Resources coupled to IFIC Tier2
• Grid environment
• Used by IFIC-ATLAS users
• When these resources are idle, they are used by the ATLAS community
– A computer farm to perform interactive analysis (PROOF)
• Outside the grid framework
• Reference:
– ATL-SOFT-PROC-2011-018
Daily user activity in Distributed Analysis
• An example of Distributed Analysis for heavy exotic particles
– Input files
– Work flow:
Daily user activity in Distributed Analysis
• 1) A Python script is created where the requirements are defined
– Application address
– Input, Output
– A replica request to IFIC
– Splitting
• 2) The script is executed with Ganga/Panda
– The Grid job is sent
• 3) Once the job finishes successfully, the output files are copied to the IFIC Tier3
– Easy access for the user
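The splitting defined in step 1 can be pictured with a small, self-contained sketch. It does not use the Ganga or PanDa client API: `AnalysisJob`, `split_into_subjobs`, the dataset name and the file names are all hypothetical, chosen only to show how one job description with a list of input files becomes several Grid subjobs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisJob:
    """Hypothetical job description; not the Ganga or PanDa API."""
    application: str          # path to the user's analysis code
    input_files: List[str]    # files of the input dataset
    output_dataset: str       # name of the dataset to be produced
    replica_site: str         # site asked to hold a copy of the output


def split_into_subjobs(job: AnalysisJob, files_per_subjob: int) -> List[AnalysisJob]:
    """Split one job into subjobs, each handling a slice of the input files."""
    if files_per_subjob < 1:
        raise ValueError("files_per_subjob must be at least 1")
    subjobs = []
    for start in range(0, len(job.input_files), files_per_subjob):
        chunk = job.input_files[start:start + files_per_subjob]
        subjobs.append(AnalysisJob(job.application, chunk,
                                   job.output_dataset, job.replica_site))
    return subjobs


# Illustrative values only.
job = AnalysisJob(
    application="run_analysis.py",
    input_files=[f"file_{i:03d}.root" for i in range(10)],
    output_dataset="user.example.exotics.v1",
    replica_site="IFIC",
)
subjobs = split_into_subjobs(job, files_per_subjob=4)
print([len(s.input_files) for s in subjobs])  # prints [4, 4, 2]
```

Each subjob keeps the same application, output dataset and replica site, so only the input slice differs between them.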
In just two weeks, the 6 users of this analysis sent:
• 35728 jobs,
• to 64 sites,
• 1032 of which ran in T2-ES (2.89%),
• Input: 815 datasets
• Output: 1270 datasets
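The quoted T2-ES share follows directly from the job counts above. A quick check in Python, using only the numbers from this slide:

```python
total_jobs = 35728   # jobs sent by the 6 users in two weeks
t2es_jobs = 1032     # of those, jobs that ran in T2-ES

share_percent = 100 * t2es_jobs / total_jobs
print(f"{share_percent:.2f}%")  # prints 2.89%
```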
New ATLAS Computing Model
• Hierarchical ATLAS Computing Model
– Tier2/3s receive data transfers from their assigned Tier1.
• New Computing Model (Mesh): some Tier2s (T2D) connect directly to other Tier1s and Tier2s.
– Requirements for being/becoming a T2D are based on satisfying transfer metrics with all Tier1s (network) and providing a certain level of commitment and reliability (robustness).
– Any site can replicate data from any other site.
– Dynamic data caching. Analysis sites receive datasets from any other site “on demand” based on usage pattern, possibly using dynamic placement of datasets through centrally managed replication of whole datasets. Unused data is removed.
– Remote data access. Local jobs could access data stored at remote sites using local caching at the file or sub-file level.
– Panda Dynamic Data Placement (PD2P) makes replicas at other sites according to user activity
• References:
– https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsFTS/#T2Ds_channels
New ATLAS Computing Model
• HammerCloud:
– Distributed Analysis testing system
– Prevents jobs from going to problematic sites
– Can exclude sites whose test jobs do not pass
– Reference:
• https://twiki.cern.ch/twiki/bin/view/IT/HammerCloud
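The exclusion idea can be sketched in a few lines. This is not HammerCloud's actual code or policy: the function name, the site names, the result format and the 80% pass threshold are assumptions made purely for illustration.

```python
from typing import Dict, List


def sites_to_exclude(test_results: Dict[str, List[bool]],
                     min_pass_fraction: float = 0.8) -> List[str]:
    """Return sites whose test jobs passed less often than the threshold.

    test_results maps a site name to the pass/fail outcomes of its recent
    test jobs. Sites with no test results are left alone.
    """
    excluded = []
    for site, outcomes in sorted(test_results.items()):
        if not outcomes:
            continue
        pass_fraction = sum(outcomes) / len(outcomes)
        if pass_fraction < min_pass_fraction:
            excluded.append(site)
    return excluded


# Hypothetical test outcomes for three analysis queues.
results = {
    "ANALY_SITE_A": [True, True, True, True],
    "ANALY_SITE_B": [True, False, False, True],
    "ANALY_SITE_C": [False, False, False, False],
}
print(sites_to_exclude(results))  # prints ['ANALY_SITE_B', 'ANALY_SITE_C']
```

User jobs would then be brokered only to the remaining sites until the excluded ones pass their tests again.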
• ATLAS grid tools are improving day by day
– For instance: automatic jobs for merging output files
– https://twiki.cern.ch/twiki/bin/viewauth/ATLAS/AnalysisJobOutputMerging
• ATLAS users can ask the Distributed Analysis Support Team (DAST, hn‐atlas‐dist‐analysis‐[email protected]):
– About problems with their jobs
– Useful for developers
• To improve the tools and services
Analysis Efficiency in September
ATLAS Tier0 + Tier1s (ANALY*_queues)
Analysis Efficiency in September
ATLAS Tier2s (ANALY*_queues)