(1)

Illustrative Example of Distributed Analysis in ATLAS Spanish Tier2 and Tier3

S. González, E. Oliver, M. Villaplana, A. Fernández, M. Kaci, A. Lamas, J. Salt, J. Sánchez

PCI2010 Workshop
Rabat, 5th-7th October 2011


(2)

The ATLAS Computing Challenge

•  Since November 2009, when the LHC started:
  –  700 million events recorded
  –  66 petabytes stored (1 PB = 1 million gigabytes)
  –  2700 physicists (from 174 institutes)
•  This task has been accomplished thanks to the Worldwide LHC Computing Grid (WLCG) project
  –  Global collaboration linking grid infrastructures
•  References:
  –  http://lcg.web.cern.ch/LCG/Default.htm
  –  http://atlas-runquery.cern.ch
  –  http://bourricot.cern.ch/dq2/accounting/global_view/0


(3)

Distributed Analysis in ATLAS

•  The Grid consists of computing resources around the world
•  The WLCG defines different types of computing centres, organised in Tiers
•  Reference: http://lcg.web.cern.ch/LCG/Default.htm


(4)

Distributed Analysis in ATLAS

•  ATLAS has a dedicated system for Production and Distributed Analysis (PanDA):
  –  Includes all ATLAS requirements
  –  Highly automated
  –  Low manpower
  –  Unifies the different grid environments (EGI-gLite, OSG and EGI-ARC)
  –  Monitoring web pages
•  Reference: http://panda.cern.ch


(5)

Distributed Analysis in ATLAS

•  For ATLAS users, Grid tools have been developed:
  –  For data management
    •  Don Quijote 2 (DQ2)
      –  Data info: name, files, sites, number, …
      –  Download and register files on the Grid, …
    •  ATLAS Metadata Interface (AMI)
      –  Data info: number of events, availability
      –  For simulation: generation parameters, …
    •  Data Transfer Request (DaTri)
      –  Users request that replicas of a set of data (datasets) be created at other sites (under restrictions)
  –  For Grid jobs
    •  PanDA Client
      –  Tools from the PanDA team that let users send jobs in an easy way
    •  Ganga (Gaudi/Athena and Grid Alliance)
      –  A job management tool for local, batch-system and grid execution

(6)

Distributed Analysis in ATLAS

References: http://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasComputing


(7)

Tier2 and Tier3 examples from Spain

•  The ATLAS Spanish Tier2 (T2-ES) is a federation of 3 Spanish institutions (see Jose's talk):
  –  IFAE-Barcelona (25%)
  –  UAM-Madrid (25%)
  –  IFIC-Valencia (50%, coordinator)
•  The T2-ES represents 5% of the ATLAS resources (there are between 30 and 40 T2s)


•  References:
  –  J. Phys. Conf. Ser. 219 072046
  –  http://indico.ific.uv.es/indico/conferenceDisplay.py?confId=440
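
As a worked illustration of the figures on this slide, each site's share of the overall ATLAS resources follows from multiplying the 5% federation share by the site's fraction of the federation. The sketch below assumes that both percentages refer to the same resource measure; the variable and function names are illustrative only.

```python
from fractions import Fraction

# Federation share of the ATLAS resources, as quoted on the slide (5%).
T2_ES_SHARE = Fraction(5, 100)

# Fraction of the T2-ES federation provided by each site, as quoted on the slide.
SITE_FRACTIONS = {
    "IFAE-Barcelona": Fraction(25, 100),
    "UAM-Madrid": Fraction(25, 100),
    "IFIC-Valencia": Fraction(50, 100),
}


def site_share_of_atlas(site_fractions, federation_share):
    """Return each site's share of the total ATLAS resources, in percent."""
    return {
        site: float(fraction * federation_share * 100)
        for site, fraction in site_fractions.items()
    }


shares = site_share_of_atlas(SITE_FRACTIONS, T2_ES_SHARE)
for site, percent in shares.items():
    print(f"{site}: {percent:.2f}% of ATLAS resources")
```

Exact fractions are used so that the three site shares add up to the federation share without rounding drift.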


(8)

Tier2 and Tier3 examples from Spain

•  At IFIC the Tier3 resources are split into two parts:
  –  Resources coupled to the IFIC Tier2
    •  Grid environment
    •  Used by IFIC-ATLAS users
    •  When the resources are idle, they are used by the ATLAS community
  –  A computer farm for interactive analysis (PROOF)
    •  Outside the grid framework
•  Reference:
  –  ATL-SOFT-PROC-2011-018


(9)

Daily user activity in Distributed Analysis

•  An example of Distributed Analysis in heavy exotic particles
  –  Input files
  –  Workflow:

(10)

Daily user activity in Distributed Analysis

•  1) A Python script is created in which the requirements are defined:
  –  Application address
  –  Input, output
  –  A replica request to IFIC
  –  Splitting
•  2) The script is executed with Ganga/PanDA
  –  A Grid job is sent
•  3) When the job finishes successfully, the output files are copied to the IFIC Tier3
  –  Easy access for the user

In just two weeks, the 6 users of this analysis sent:

•  35728 jobs,
•  to 64 sites,
•  of which 1032 jobs ran in T2-ES (2.89%),
•  Input: 815 datasets
•  Output: 1270 datasets
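
The quoted T2-ES percentage can be checked directly from the job counts. The short calculation below uses only the numbers on the slide; the average number of jobs per site is an extra derived figure and assumes nothing about how the jobs were actually distributed.

```python
# Job statistics quoted on the slide.
total_jobs = 35728
t2_es_jobs = 1032
sites = 64

# Share of the jobs that ran in the Spanish Tier2, in percent.
t2_es_percent = 100 * t2_es_jobs / total_jobs

# Average number of jobs per site (a derived figure, not from the slide).
average_jobs_per_site = total_jobs / sites

print(f"T2-ES share: {t2_es_percent:.2f}%")
print(f"Average jobs per site: {average_jobs_per_site:.2f}")
```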


(11)

New ATLAS Computing Model

•  Hierarchical ATLAS Computing Model
  –  Tier2/3s receive data transfers from their assigned Tier1.
•  New Computing Model (Mesh): some Tier2s (T2Ds) connect directly to other Tier1s and Tier2s.
  –  Requirements for being/becoming a T2D are based on satisfying transfer metrics with all Tier1s (network) and providing a certain level of commitment and reliability (robustness).
  –  Any site can replicate data from any other site.
  –  Dynamic data caching. Analysis sites receive datasets from any other site "on demand", based on usage patterns, possibly using dynamic placement of datasets through centrally managed replication of whole datasets. Unused data is removed.
  –  Remote data access. Local jobs could access data stored at remote sites, using local caching at file or sub-file level.
  –  PanDA Dynamic Data Placement (PD2P) makes replicas at other sites according to user activity.
•  References:
  –  https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsFTS/#T2Ds_channels


(12)

New ATLAS Computing Model

•  HammerCloud:
  –  Distributed Analysis testing system
  –  Prevents jobs from going to problematic sites
  –  Can exclude sites if test jobs do not pass
  –  Reference:
    •  https://twiki.cern.ch/twiki/bin/view/IT/HammerCloud
•  ATLAS grid tools are improving day by day
  –  For instance: automatic jobs for merging output files
  –  https://twiki.cern.ch/twiki/bin/viewauth/ATLAS/AnalysisJobOutputMerging
•  ATLAS users can contact the Distributed Analysis Support Team (DAST, hn-atlas-dist-analysis-[email protected]):
  –  About problems with their jobs
  –  Useful for developers
    •  To improve the tools and services
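
The site-exclusion idea attributed to HammerCloud on this slide can be sketched as follows. This is not HammerCloud's actual algorithm or API: the function name, the format of the test results, the site names and the 80% pass-rate threshold are all assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def sites_to_exclude(test_results: Dict[str, List[bool]],
                     min_pass_rate: float = 0.8) -> Set[str]:
    """Return the sites whose test-job pass rate is below the threshold.

    test_results maps a site name to the outcomes of its test jobs
    (True = passed). Sites without any test result are excluded too,
    since nothing is known about their health (an assumed policy).
    """
    excluded = set()
    for site, outcomes in test_results.items():
        if not outcomes:
            excluded.add(site)
            continue
        pass_rate = sum(outcomes) / len(outcomes)
        if pass_rate < min_pass_rate:
            excluded.add(site)
    return excluded


# Hypothetical test outcomes for three made-up sites.
results = {
    "SITE_A": [True, True, True, True],
    "SITE_B": [True, False, False, True],
    "SITE_C": [],
}
print(sorted(sites_to_exclude(results)))
```

A job broker could then skip the returned sites when choosing where to run user analysis jobs.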


(13)

Analysis Efficiency in September

ATLAS Tier0 + Tier1s (ANALY*_queues)


(14)

Analysis Efficiency in September

ATLAS Tier2s (ANALY*_queues)

