At the time of writing, there are five ATLAS sites acting as TAG sites, four in Europe and one in Canada. They all host one or more TAG services as defined in Section 2.3.4. More sites are currently considering joining the TAG project.
Regarding data distribution, a central unit at Tier-0 configures the TAG uploads with regard to the sites to which the data are sent. The general strategy is to have at least two copies of
2.5. DISTRIBUTION OF DATA AND SERVICES AND RESULTING CHALLENGE 23
Figure 2.5: Typical TAG Workflows. (Diagram with two workflows. Extract TAG data: Call iELSSI → Define Query → Review Selection → Selection ok? (yes/no); yes: Extract data → Extracted TAG File. Involved services: TASK Lookup, Trigger Decoding, iELSSI, TAG Database, Extract XML Builder, Extract. Display events in iELSSI: Call iELSSI → Define Query → Review Selection → Selection ok? (yes/no); yes: Count/Display Events, Stored Selection. Involved services: TASK Lookup, Trigger Decoding, iELSSI, TAG Database.)
each data set for backup purposes, and the most recent data at more sites, because they are queried most frequently. The distributed TAG databases currently have an overall capacity of 80 terabytes. Data are grouped into so-called projects and reprocessing passes, and usually a whole reprocessing set is sent to a site, not only a fragment. However, such fragmentation can happen, and it may become more common in the future when space limitations arise. A system keeping track of the data distribution in real time has been implemented in the context of this thesis work. This system, the so-called TASK (TAG Application Service Knowledge base), is described in detail in Chapter 4 (Section 4.4). The knowledge base also records all available services, together with monitoring information and computed metrics. It allows all TAG databases to be treated as a single distributed database management system.
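The bookkeeping role of TASK described above can be illustrated with a minimal in-memory sketch. It assumes that data are tracked per (project, reprocessing pass) pair and that each pair maps to the set of hosting sites; the class, its methods and the project labels are hypothetical and do not reflect the actual TASK schema presented in Section 4.4.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Hypothetical stand-in for the data-distribution records of a knowledge
    base such as TASK: (project, reprocessing pass) -> set of hosting sites."""

    replicas: dict = field(default_factory=dict)

    def register(self, project: str, rpass: str, site: str) -> None:
        # Record that a (whole) reprocessing set is now hosted at `site`.
        self.replicas.setdefault((project, rpass), set()).add(site)

    def sites_for(self, project: str, rpass: str) -> set:
        # Every site able to answer a query on this data set, so that the
        # distributed databases can be addressed as one logical system.
        return set(self.replicas.get((project, rpass), set()))

    def under_replicated(self, min_copies: int = 2) -> list:
        # Data sets violating the "at least two copies" backup strategy.
        return sorted(key for key, sites in self.replicas.items()
                      if len(sites) < min_copies)


catalog = ReplicaCatalog()
catalog.register("projectA", "pass1", "CERN")
catalog.register("projectA", "pass1", "RAL")
catalog.register("projectB", "pass1", "TRIUMF")
print(sorted(catalog.sites_for("projectA", "pass1")))  # ['CERN', 'RAL']
print(catalog.under_replicated())                       # [('projectB', 'pass1')]
```

In such a sketch, a per-request lookup only needs `sites_for`, while a periodic `under_replicated` check would flag data sets that need an additional copy.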
TAG sites host services other than the TAG database on a voluntary basis, as resources allow. The minimum requirement is one set of web services per continent, to avoid high network latencies. However, more service deployments are preferable, as they allow for load balancing and fail-over.
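The fail-over benefit of having several deployments can be sketched as a loop that tries the deployments of one service in order of preference until one succeeds. This is an illustrative pattern under stated assumptions, not the mechanism used in the TAG system: load balancing is not covered, and the deployment labels such as `iELSSI@RAL` and the `invoke` callable are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllDeploymentsFailed(Exception):
    """Raised when no deployment of a service could serve the request."""


def call_with_failover(deployments: Sequence[str],
                       invoke: Callable[[str], T]) -> Tuple[str, T]:
    """Try deployments in the given order and return (deployment, result)
    of the first successful call.

    `deployments` is assumed to be pre-sorted by preference, e.g. deployments
    on the user's continent first; `invoke` is a hypothetical callable that
    performs the remote call and raises an exception on failure.
    """
    errors = {}
    for dep in deployments:
        try:
            return dep, invoke(dep)
        except Exception as exc:  # site down, timeout, network error, ...
            errors[dep] = exc
    raise AllDeploymentsFailed(errors)


def fake_invoke(dep: str) -> str:
    # Simulated outage of the first-choice deployment.
    if dep == "iELSSI@CERN":
        raise ConnectionError("service unavailable")
    return f"result from {dep}"


print(call_with_failover(["iELSSI@CERN", "iELSSI@RAL"], fake_invoke))
# ('iELSSI@RAL', 'result from iELSSI@RAL')
```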
To summarize, the TAG system is composed of several data sources, several abstract services, and one or more concrete deployments per service. All deployments of a service are identical in terms of functionality and software version, but they have different QoS attributes, because they run on heterogeneous hardware and the hosting sites differ in availability, access policies, etc. Figure 2.6 is a simplified schematic representation of the distributed TAG system.
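The structure summarized above (abstract services, each bound to one or more concrete deployments) can be captured in a small data model. The sketch below is illustrative: the field names, the version strings and the free-form `qos` dictionary are assumptions. It also shows that the number of candidate compositions is the product of the deployment counts per service.

```python
from dataclasses import dataclass, field
from itertools import product
from math import prod


@dataclass
class Deployment:
    service: str   # abstract service, e.g. "iELSSI" or "TAG DB"
    site: str      # hosting site
    version: str   # software version, identical for all deployments of a service
    qos: dict = field(default_factory=dict)  # hypothetical QoS attributes


def by_service(deployments):
    """Group concrete deployments by the abstract service they implement."""
    groups = {}
    for d in deployments:
        groups.setdefault(d.service, []).append(d)
    return groups


def consistent_versions(groups) -> bool:
    # All deployments of one service must run the same software version.
    return all(len({d.version for d in deps}) == 1 for deps in groups.values())


def compositions(groups, services):
    """Every way of binding one deployment to each abstract service, in order."""
    return list(product(*(groups[s] for s in services)))


deps = (
    [Deployment("iELSSI", s, "1.0") for s in ("CERN", "TRIUMF")]
    + [Deployment("TAG DB", s, "2.1") for s in ("CERN", "RAL", "TRIUMF")]
)
groups = by_service(deps)
print(consistent_versions(groups))                          # True
print(len(compositions(groups, ["iELSSI", "TAG DB"])))      # 6
print(prod(len(groups[s]) for s in ["iELSSI", "TAG DB"]))   # 6
```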
When a concrete workflow, such as one of the examples described in Section 2.4, is built to respond to a request, there are thus several ways of composing deployments. The choice of data sources and service deployments should be made automatically and based on rational decisions, in line with the goals stated in Section 1.2 (minimize the response time per request and ensure fair and efficient usage of all available resources). As TAG services are used more and more frequently, since data taking at ATLAS is a continuous process and a large amount of data is available
Figure 2.6: Distributed TAG System. (Schematic: five TAG sites, each hosting a TAG DB, and a Services and QoS Registry.)
for analysis, careful selection of services on a per-request basis is becoming crucial for an efficient TAG-based event selection system. With the current setup, the problem size is rather small, and it might be argued that no specific algorithm or approach is needed to address the service selection problem. However, the system is expected to grow, both in the number of sites and in the number of deployments. Figure 2.7 shows the differences in execution times for the simple example of composing an iELSSI deployment with a TAG database deployment for an event count based on the query stated in Equation 2.4. This query returns 9,146 matching events out of a total of 36,056,886 events. CERN refers to the Tier-0 facility, TRIUMF is a data center in Vancouver, Canada, and RAL is the data center at the Rutherford Appleton Laboratory in the U.K. The total time is the time in seconds from query submission to the display of the results in the iELSSI browser; it thus encompasses the query time on the database, the time for sending the results over the network, building the page, etc.
Figure 2.7(a) shows the values measured in ten consecutive runs, with a user at CERN using the iELSSI interface at CERN and varying database back-ends. The decrease between the first run and the following ones can be explained by database caching. The total time when querying the database at TRIUMF is approximately 20 times higher than when the CERN database is queried, although the two database times (i.e., the actual query times on the database) do not differ much. The difference thus clearly lies in the link between CERN and the geographically distant TRIUMF. Figure 2.7(b) supports similar conclusions. In this test series, the user issuing the query is still at CERN, but accesses iELSSI at TRIUMF.
Although the database time at TRIUMF appears to be higher than at CERN and RAL, the combination of iELSSI at TRIUMF and the TAG database at TRIUMF yields the lowest total time. In summary, the values show substantial differences in total response time depending on the invoked deployments. These differences can be explained by differing database server performance, network latencies due to geographical distances, the location of the user, and time-dependent factors such as the current load on the invoked resources. A random or rank-based selection of deployments therefore cannot ensure an appropriate system response, which makes a more sophisticated deployment selection mechanism a necessity.
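The observation that the best choice depends on the combination of deployments, rather than on any single ranking, can be illustrated with a small exhaustive search over (iELSSI, TAG database) pairs. This is a minimal sketch assuming an additive cost model and made-up timings; it is not the selection approach developed in this thesis, and exhaustive enumeration only remains practical while the number of deployments is small.

```python
from itertools import product


def link_time(net_time, a, b):
    # Symmetric network cost between two sites; communication within one
    # site is assumed to be negligible.
    return 0.0 if a == b else net_time[frozenset((a, b))]


def estimated_total_time(user, ui, db, db_time, net_time):
    """Simplified additive model (an assumption): database query time plus
    network time user<->iELSSI and iELSSI<->database. Page building and
    time-dependent load are ignored."""
    return db_time[db] + link_time(net_time, user, ui) + link_time(net_time, ui, db)


def select_pair(user, ui_sites, db_sites, db_time, net_time):
    """Exhaustively evaluate all (iELSSI, TAG DB) pairs and return the one
    with the lowest estimated total time."""
    return min(product(ui_sites, db_sites),
               key=lambda p: estimated_total_time(user, p[0], p[1], db_time, net_time))


# Illustrative numbers in seconds, not the measurements of Figure 2.7.
db_time = {"CERN": 2.0, "RAL": 2.5, "TRIUMF": 3.0}
net_time = {frozenset(("CERN", "RAL")): 0.5,
            frozenset(("CERN", "TRIUMF")): 8.0,
            frozenset(("RAL", "TRIUMF")): 7.0}

print(select_pair("CERN", ["CERN", "TRIUMF"], ["CERN", "RAL", "TRIUMF"], db_time, net_time))
# ('CERN', 'CERN')
print(select_pair("CERN", ["TRIUMF"], ["CERN", "RAL", "TRIUMF"], db_time, net_time))
# ('TRIUMF', 'TRIUMF')
```

With iELSSI fixed at TRIUMF, the sketch prefers the TRIUMF database despite its slower query time, qualitatively mirroring the effect discussed above.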
\[
\text{Collection} = \texttt{data10 7TeV physics Egamma r1774 p327 p333 p372 READ}\ \text{AND}\ \text{RunNumber} = 166094\ \text{AND}\ \text{NLooseElectron} > 1 \tag{2.3}
\]
Figure 2.7: Time in seconds over ten distinct runs, showing the DB time and the total time for the CERN, RAL and TRIUMF databases. (a) User at CERN, iELSSI at CERN. (b) User at CERN, iELSSI at TRIUMF (Canada).
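The reasoning applied to Figure 2.7(a) can be made explicit with a simplified additive decomposition of the total time. This is an assumption for illustration: the three terms are the components listed in the text, superscripts denote the queried database, and since iELSSI runs at CERN in both cases the page-building time is assumed unchanged. The last line gives the selectivity of the test query.

```latex
\begin{align*}
T_{\text{total}} &\approx T_{\text{DB}} + T_{\text{net}} + T_{\text{page}} \\
\frac{T_{\text{total}}^{\text{TRIUMF}}}{T_{\text{total}}^{\text{CERN}}} \approx 20,
\quad T_{\text{DB}}^{\text{TRIUMF}} \approx T_{\text{DB}}^{\text{CERN}},
\quad T_{\text{page}}^{\text{TRIUMF}} \approx T_{\text{page}}^{\text{CERN}}
&\;\Longrightarrow\; T_{\text{net}}^{\text{TRIUMF}} \gg T_{\text{net}}^{\text{CERN}} \\
\sigma = \frac{9146}{36\,056\,886} &\approx 2.54 \times 10^{-4} \approx 0.025\,\%
\end{align*}
```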
Before addressing the service selection challenge, several aspects have to be studied and taken into account:
• What are the components of the TAG system, and how are they related to each other?
• Which system QoS attributes can be gathered, and which of them must be considered in the service selection process?
• What are the possible workflows, and which patterns do they implement? Based on these patterns, how are the QoS attributes aggregated?
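The last question, how QoS attributes aggregate along workflow patterns, can be illustrated with aggregation rules commonly used in the QoS-aware composition literature: sums and products for sequences, the maximum for parallel branches, and expected values for exclusive choices. The sketch is limited to two attributes and all numbers are hypothetical; the patterns and rules actually adopted for the TAG system are derived in Chapter 4.

```python
from dataclasses import dataclass
from math import isclose, prod


@dataclass(frozen=True)
class QoS:
    response_time: float  # seconds
    availability: float   # probability in [0, 1]


def sequence(*steps: QoS) -> QoS:
    # Steps run one after another: times add up, every step must be available.
    return QoS(sum(s.response_time for s in steps),
               prod(s.availability for s in steps))


def parallel(*branches: QoS) -> QoS:
    # AND-split/AND-join: the slowest branch dominates; all branches must be up.
    return QoS(max(b.response_time for b in branches),
               prod(b.availability for b in branches))


def choice(*weighted: tuple) -> QoS:
    # XOR-split: expected values, weighted by (assumed known) branch probabilities.
    if not isclose(sum(w for w, _ in weighted), 1.0):
        raise ValueError("branch probabilities must sum to 1")
    return QoS(sum(w * q.response_time for w, q in weighted),
               sum(w * q.availability for w, q in weighted))


# Hypothetical workflow: lookup, then two parallel tasks, then a database query.
lookup, query = QoS(0.5, 0.99), QoS(3.0, 0.95)
workflow = sequence(lookup, parallel(QoS(1.0, 0.98), QoS(2.0, 0.97)), query)
print(workflow.response_time)            # 5.5
print(round(workflow.availability, 4))   # 0.894
```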
Chapter 4 addresses these questions, both in a generic way and tailored to the TAG system. Service selection, and especially Web service selection, is a widely studied research topic, and several approaches have been proposed, as will be discussed in Chapter 3. However, all system characteristics, requirements and optimization goals have to be studied in detail in order to find an appropriate solution approach. In particular, it is argued that in an experiment like ATLAS, which is planned to run over several decades, the use cases and optimization goals of an event selection system can change unexpectedly, and the service selection framework has to be able to adapt to these changes. In this thesis, a detailed study of the service selection problem is carried out, motivated by the challenges arising in the TAG system. The proposed approach is, however, generic enough to be applied to other systems.
Chapter 3
Background and Related Work: Service Selection in Heterogeneous Environments
Optimizing a selection of items according to defined objective functions, subject to given constraints, is a common and broadly studied problem in many applied research areas such as operations research, production planning, logistics, and many more. In computer science, it arises, for example, in scheduling in distributed systems, database query optimization, and network routing. Across all of these examples, the underlying problem is similar, but the items (i.e., the entities to be selected), the context, the objectives and the constraints differ. Moreover, depending on the concrete definition and context, the problems can map to slightly different mathematical models.
QoS-aware service selection is another example of such problems and has recently attracted considerable interest with the emergence of distributed, service-oriented systems, in which multiple service instances, differing only in their QoS properties, can be used for a given task in a service chain or workflow. This chapter defines the basic concepts and investigates related work, in order to set the problem context precisely and to present a survey of the state of the art. While this chapter lays the foundations, related work on more specific concepts used in the course of this thesis is presented in the respective dedicated chapters.
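A generic formulation often used for such problems can be written as follows; it is a sketch with hypothetical notation, not the model adopted later in this thesis. A workflow has tasks $t_1, \dots, t_n$, each task $t_i$ can be served by any instance in a candidate set $S_i$, $F$ is an aggregated objective (e.g., response time), and $Q_k$ are aggregated QoS attributes bounded by constraints $C_k$, each oriented so that lower values are better.

```latex
\begin{align*}
\min_{(s_1,\dots,s_n) \in S_1 \times \dots \times S_n} \quad & F(s_1,\dots,s_n) \\
\text{subject to} \quad & Q_k(s_1,\dots,s_n) \le C_k, \qquad k = 1,\dots,m, \\
\text{search space:} \quad & \left|S_1 \times \dots \times S_n\right| = \prod_{i=1}^{n} |S_i|.
\end{align*}
```

The product in the last line grows exponentially with the number of tasks, which is why exhaustive enumeration quickly becomes impractical.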
This chapter is organized as follows. Section 3.1 defines the basic concepts used throughout the thesis. Section 3.2 defines the problem context and setting by comparing it to related problems. In Section 3.3, models of service-oriented systems and frequently considered QoS attributes are discussed. Section 3.4 surveys the most common approaches to solving QoS-aware service selection problems. Finally, Section 3.5 summarizes research efforts and results in the area of multi-objective service selection optimization.
3.1 Basic Concepts
Terms like Web Service and Service-Oriented Architecture (SOA) are widespread in the area of computer science and beyond, but several – sometimes conflicting – definitions exist. It is thus important to set working definitions in the context of this thesis, in order to precisely state how the terms are used. To this end, this section provides definitions of basic concepts relevant to the understanding of this thesis.
Web Service. Although this thesis refers to services in general, not only to Web services in the strict sense, the term Web service is used frequently, especially in the context of related work on Web service selection. The following definition from the World Wide Web Consortium (W3C) is adopted: “A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.” [101].
Service (extended service definition). In this work, “services” do not strictly refer to Web services implementing particular standards, but to any software component providing a functionality, independent of its internal realization and interfaces. W3C provides the following general definition of a service: “A service is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the point of view of providers entities and requesters entities. To be used, a service must be realized by a concrete provider agent.” [101]. A service can thus be a Web service, a Grid service, or an independent piece of software that can be invoked over a network. For instance, iELSSI as defined in Section 2.3.4 is considered a service.
Service-Oriented Architecture (SOA). A software system composed of several components, or services, that can act independently or together to provide a defined overall functionality is referred to as service-oriented. In a SOA, the functionality is thus packaged in services, which makes the system flexible and its software components easy to reuse. W3C provides the following definition of a SOA: “A set of components which can be invoked, and whose interface descriptions can be published and discovered.” [101].
Service Selection. In case several concrete services (deployments) exist for a given functionality, one has to be chosen out of all possible or feasible ones. This requires the ability to determine what the possible ones are, which is usually done by querying a service registry. The process of querying a registry and choosing one concrete service (deployment) for a given task according to some defined criteria is referred to as service selection.
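The selection process defined above (query a registry, keep the feasible deployments, choose one according to defined criteria) can be sketched as follows. The registry is a plain dictionary, feasibility is a single response-time bound, and ranking uses simple additive weighting (SAW), a common multi-criteria technique; all names, weights and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    name: str
    response_time: float  # seconds, lower is better
    availability: float   # fraction of time up, higher is better


def _normalize(value, lo, hi, higher_is_better):
    # Map an attribute to [0, 1] so that 1 is always the best value.
    if hi == lo:
        return 1.0
    x = (value - lo) / (hi - lo)
    return x if higher_is_better else 1.0 - x


def select_service(registry, functionality, max_response_time,
                   w_time=0.5, w_avail=0.5) -> Optional[Deployment]:
    """Query a (hypothetical) registry, drop infeasible deployments and rank
    the remaining ones with simple additive weighting (SAW)."""
    candidates = registry.get(functionality, [])
    feasible = [d for d in candidates if d.response_time <= max_response_time]
    if not feasible:
        return None
    t = [d.response_time for d in feasible]
    a = [d.availability for d in feasible]

    def score(d):
        return (w_time * _normalize(d.response_time, min(t), max(t), False)
                + w_avail * _normalize(d.availability, min(a), max(a), True))

    return max(feasible, key=score)


registry = {"iELSSI": [Deployment("A", 1.0, 0.90),
                       Deployment("B", 2.0, 0.99),
                       Deployment("C", 10.0, 0.999)]}
print(select_service(registry, "iELSSI", 5.0, w_time=0.7, w_avail=0.3).name)  # A
print(select_service(registry, "iELSSI", 5.0, w_time=0.3, w_avail=0.7).name)  # B
```

Deployment C is excluded by the response-time bound before ranking; changing the weights alone then flips the choice between A and B.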
Service Composition. If several services need to be selected to respond to a given request, they need to be composed in order to operate together. This process involves defining inputs and outputs as well as exchange formats, i.e., the interoperability between the services has to be defined and instantiated. Two terms recur in the area of service composition: orchestration and choreography, as defined below. The Business Process Execution Language for Web Services (BPEL4WS) is an example of a Web service composition specification.
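A composition coordinated by a single central component (orchestration, in the usual sense of the term) can be sketched as a loop that feeds each service's output to the next one. The steps and toy events below are hypothetical in-process stand-ins for remote services, loosely modeled on the event count example of Chapter 2; BPEL4WS itself is an XML-based language and is not shown.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def orchestrate(steps: List[Step], request: Any) -> Tuple[Any, List[str]]:
    """Central coordinator: invoke each service in turn, passing the output of
    one step as the input of the next, and record which steps ran."""
    data, trace = request, []
    for name, service in steps:
        data = service(data)
        trace.append(name)
    return data, trace


# Hypothetical toy events and services; a real composition would also fix
# the exchange formats between the services.
events = [{"RunNumber": 166094, "NLooseElectron": 2},
          {"RunNumber": 166094, "NLooseElectron": 1},
          {"RunNumber": 165000, "NLooseElectron": 3}]
steps = [
    ("select_run", lambda evs: [e for e in evs if e["RunNumber"] == 166094]),
    ("select_electrons", lambda evs: [e for e in evs if e["NLooseElectron"] > 1]),
    ("count", len),
]
result, trace = orchestrate(steps, events)
print(result, trace)  # 1 ['select_run', 'select_electrons', 'count']
```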
3.2. OVERVIEW AND COMPARISON OF RELATED RESEARCH AREAS 29