Any AQP technique requires at least one framework component of each kind to cover all the adaptivity phases. Thus, even the simplest adaptation is based on the collaboration of decoupled entities. To this end, the components support a publish/subscribe interface [EFGK03] to provide services to, and ask for services from, other components. (In the context of the adaptivity framework, the terms event and notification will be used interchangeably.) The behaviour of the framework components, i.e., their functionality and interactions, is as follows:
Monitoring: a monitoring component (MC) acts as a source of notifications on the dynamic behaviour of distributed resources and of the ongoing query execution. Other adaptivity components (including monitoring ones) interact with the MC in order to subscribe to it. The subscription procedure enables the MC to compile a list of the modules that are interested in its notifications. It also specifies the mode of transmission (either push or pull, i.e., on request) and which of the notifications that the MC is able to produce the subscriber requires.
The MC interacts with other adaptivity components to deliver the notifications requested. Such notifications are in a standardised or commonly agreed form that hides how monitoring is carried out. Finally, the MC performs basic integration and filtering of events both to avoid flooding the system with low-level notifications, and to provide support for higher-level notification specification (e.g., by sending a notification only if the load of a machine and the amount of available memory have changed by more than 10%).
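The subscription and filtering behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the framework's actual interface: the names Notification, Subscription, MonitoringComponent, observe and pull are hypothetical. The filter is also simplified to a relative-change threshold applied to each property separately, whereas the example in the text combines two properties.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Notification:
    """Hypothetical standardised notification format."""
    kind: str      # e.g. "machine_load" or "free_memory"
    source: str    # e.g. a machine identifier
    value: float


@dataclass
class Subscription:
    kinds: Set[str]                                   # notification kinds requested
    mode: str                                         # "push" or "pull"
    callback: Optional[Callable[[Notification], None]] = None
    pending: List[Notification] = field(default_factory=list)


class MonitoringComponent:
    """Hypothetical MC: keeps a subscriber list and filters small changes."""

    def __init__(self, min_relative_change: float = 0.10):
        self.min_relative_change = min_relative_change
        self._subs: Dict[str, Subscription] = {}
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def subscribe(self, subscriber_id, kinds, mode="push", callback=None):
        if mode not in ("push", "pull"):
            raise ValueError("mode must be 'push' or 'pull'")
        if mode == "push" and callback is None:
            raise ValueError("push subscriptions need a callback")
        self._subs[subscriber_id] = Subscription(set(kinds), mode, callback)

    def _is_significant(self, last, value):
        if last is None:          # first observation of this property
            return True
        if last == 0:
            return value != 0
        return abs(value - last) / abs(last) > self.min_relative_change

    def observe(self, kind, source, value):
        """Feed one raw observation; return True if a notification was emitted."""
        key = (kind, source)
        if not self._is_significant(self._last_sent.get(key), value):
            return False
        self._last_sent[key] = value
        note = Notification(kind, source, value)
        for sub in self._subs.values():
            if kind not in sub.kinds:
                continue
            if sub.mode == "push":
                sub.callback(note)        # delivered immediately
            else:
                sub.pending.append(note)  # kept until the subscriber asks
        return True

    def pull(self, subscriber_id):
        """Return and clear the notifications queued for a pull subscriber."""
        sub = self._subs[subscriber_id]
        queued, sub.pending = sub.pending, []
        return queued
```

In this sketch a push subscriber receives each notification through its callback as soon as it is emitted, while a pull subscriber collects the queued notifications on request.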
Assessment: the role of the assessment component (AC) is to establish whether there exist opportunities for improvement of plan performance (or any other QoS criteria), and whether there is a problem with the current execution that needs to be addressed in order to activate the self-adaptive mechanisms. In either case, the AC sends a relevant notification to the appropriate response component. The AC performs its task by correlating and analysing notifications from multiple monitoring components. The notification analysis may involve the evaluation of event-condition-action rules, the rerun of (parts of) the query optimiser, the computation of rolling averages, the update of prediction models, the comparison against static estimates and more. An AC interacts with other services for two purposes: to subscribe to monitoring components, and to send notifications about problems and opportunities to response components. Response components interact with ACs to subscribe, and monitoring components interact with ACs to deliver notifications.
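As an illustration of two of the analysis options listed above (rolling averages and comparison against static estimates), a minimal Python sketch of an assessment component is given below. The class name, the window size, the tolerance and the structure of the emitted issue are all assumptions made for the example, and the static estimate is assumed to be non-zero.

```python
from collections import deque


class AssessmentComponent:
    """Hypothetical AC: compares a rolling average with a static estimate."""

    def __init__(self, static_estimate, window=3, tolerance=0.2):
        self.static_estimate = static_estimate   # e.g. optimiser's cost estimate
        self.tolerance = tolerance               # accepted relative deviation
        self._window = deque(maxlen=window)      # most recent observed values
        self._subscribers = []                   # response-component callbacks

    def subscribe(self, callback):
        """Called by a response component that wants to be notified."""
        self._subscribers.append(callback)

    def on_monitoring_event(self, value):
        """Consume one monitoring notification; return the issue raised, if any."""
        self._window.append(value)
        if len(self._window) < self._window.maxlen:
            return None                          # not enough feedback yet
        average = sum(self._window) / len(self._window)
        deviation = (average - self.static_estimate) / self.static_estimate
        if abs(deviation) <= self.tolerance:
            return None                          # execution matches the estimate
        issue = {
            "rolling_average": average,
            "static_estimate": self.static_estimate,
            "deviation": deviation,
        }
        for notify in self._subscribers:
            notify(issue)                        # forward to response components
        return issue
```

The rolling average smooths out isolated spikes, so in this sketch a response component is notified only when the deviation from the estimate persists across the window.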
Response: the response component (RC) is responsible for: (i) identifying valid responses to the issues identified by the assessment component (e.g., by exploring a search space, or by using predefined lists for each identified issue); (ii) evaluating the expected benefit and cost for each valid response (e.g., by using cost functions); (iii) selecting the most efficient one; and (iv) interacting with the evaluation engine in order to enforce its decisions. The RC is able to subscribe to other components (e.g., ACs) and to receive notifications.
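The four responsibilities of the RC can be sketched in Python as follows. The sketch assumes the predefined-list variant of step (i) and interprets "most efficient" as the highest expected benefit minus expected cost. Both choices, together with the names Response, ResponseComponent and enforce, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Response:
    name: str
    expected_benefit: float   # e.g. estimated time saved
    expected_cost: float      # e.g. estimated overhead of adapting

    @property
    def net_gain(self) -> float:
        return self.expected_benefit - self.expected_cost


class ResponseComponent:
    """Hypothetical RC using a predefined list of responses per issue."""

    def __init__(self,
                 candidates_by_issue: Dict[str, List[Response]],
                 enforce: Callable[[Response], None]):
        self.candidates_by_issue = candidates_by_issue
        self.enforce = enforce   # stands in for the evaluation engine

    def on_assessment(self, issue_kind: str) -> Optional[Response]:
        # (i) identify the valid responses for this issue
        candidates = self.candidates_by_issue.get(issue_kind, [])
        # (ii) + (iii) evaluate each candidate and select the best one
        best = max(candidates, key=lambda r: r.net_gain, default=None)
        if best is None or best.net_gain <= 0:
            return None          # adapting is not expected to pay off
        # (iv) ask the evaluation engine to enforce the decision
        self.enforce(best)
        return best
```

A response is enforced only when its expected benefit exceeds its expected cost, so in this sketch the RC may legitimately decide not to adapt at all.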
4.3 Analysis of Adaptive Query Processing
To date, there is no mechanism or benchmark to compare AQP techniques systematically. In fact, there are too many non-comparable features amongst them [GPFS02]. The framework introduced in the previous section provides a background for describing each technique in terms of the way in which it ranges over the three adaptivity phases of monitoring, assessment and response. As these phases form a pipeline, the input and the output of each phase are of particular interest. This section provides a taxonomy of the different types of such output in existing AQP proposals, along with other, complementary aspects of the behaviour of the components.
4.3.1 Monitoring
4.3.1.1 Monitoring Events and Focus of Adaptivity
The monitoring component analyses raw monitoring information (or feedback, as these terms will be used interchangeably in the context of adaptivity monitoring) from the query engine and the resources available. It also produces notifications (monitoring events in Figure 4.1) that are processed by other adaptivity components, such as the assessment ones. Depending on the focus of the AQP system, such notifications cover various interesting, “suspicious” observations that denote updated property values, but not necessarily problems. The updated values of these properties are the conditions that the AQP technique attempts to adapt to. Thus there is a very close correlation between the focus of adaptivity and the type of monitoring events produced. The focus of AQP can fall into the following categories:
1. Dataset Volume, which includes
(a) Dataset Cardinality (i.e., the number of tuples in a dataset), and
(b) Dataset Size (i.e., the size of tuples in a dataset).
2. Data Characteristics, which has to do with the properties of the initial, intermediate and final datasets. It also includes
(b) Indices, and
(c) Data Order.
3. Operator Cost, either in time units or in any other QoS metric.
4. Resources, which can be divided into
(a) Resource Pool for changes in the number of resources that are candidate for participating in the query evaluation,
(b) Resource Memory for up-to-date information on the amount of memory available, and
(c) Resource Performance and capabilities for statistics on
i. Resource Processing Power, and/or
ii. Resources Connection Bandwidth.
5. User Input, such as priority ratings for different parts of the result, or for the rate of updates of partial results.
It is worth noting that these updates may be filtered, potentially in a configurable way, so that non-interesting changes are not passed on to other components. For example, changes in the expected selectivity may be considered interesting only if they differ by more than 10% from the previous expected value, or modifications in the resource pool may be examined only if the new machines have connection speed and memory that exceed a corresponding threshold. Additionally, updates can be combined to provide more meaningful information. For example, the selectivity of an operator can be calculated as the ratio of the output to the input dataset cardinality of this operator.
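The two examples above, deriving selectivity from cardinalities and filtering out small changes, can be written down as a short Python sketch. The function names are hypothetical, and the 10% threshold is interpreted here as a relative difference from the previous expected value, which is an assumption about the intended meaning.

```python
def selectivity(input_cardinality: int, output_cardinality: int) -> float:
    """Selectivity of an operator: output tuples divided by input tuples."""
    if input_cardinality <= 0:
        raise ValueError("input cardinality must be positive")
    return output_cardinality / input_cardinality


def is_interesting(previous_expected: float, observed: float,
                   threshold: float = 0.10) -> bool:
    """True if the observed value differs from the previous expected
    value by more than `threshold`, taken as a relative difference."""
    if previous_expected == 0:
        return observed != 0
    return abs(observed - previous_expected) / abs(previous_expected) > threshold


# An operator that consumed 1000 tuples and produced 250 of them.
observed = selectivity(1000, 250)
# Pass the update on only if it deviates enough from the expected 0.20.
notify = is_interesting(previous_expected=0.20, observed=observed)
```

Here the observed selectivity is 0.25, a relative difference of 25% from the expected 0.20, so the update would be passed on.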
4.3.1.2 Raw Monitoring Information
The focus of adaptivity, whose categories were presented previously, defines, to a large extent, the nature of the feedback collected from the query execution and/or the execution environment. Indeed, the relationship between the raw monitoring information and the type of monitoring events produced by the monitoring component is quite straightforward and intuitive in most of the cases. For example, for changes in dataset cardinalities and dataset sizes, monitoring information about the number of tuples consumed and produced by operators, and their sizes, respectively, is needed. Monitoring the number of available resources is the most basic information required for detecting modifications in the resource pool. Changes in the resources’ memory can be identified either by monitoring the memory available in that resource or the memory consumed by the operators running in it. To adapt to resource processing power, several approaches can be followed: e.g., to monitor either the CPU load of a machine explicitly, or to infer the CPU load by monitoring the time cost of CPU-bound operators. Similarly, for the connection bandwidth changes, the core monitoring information can be either the bandwidth, or, in some scenarios, the time operators wait for data from remote sources.
4.3.1.3 Monitoring Frequency
The monitoring frequency is a significant characteristic of any AQP technique, as the frequency of collecting feedback determines the maximum frequency of potential adaptations. This happens because it is the feedback collection that drives the complete adaptivity cycle, which consists of the three phases of monitoring, assessment and response. The different levels of frequency are as follows:
1. Inter-operator, which refers to the cases in which the adaptivity cycle can be triggered through acquisition of raw monitoring information only between the execution of the operators in the query plan; and
2. Intra-operator, which refers to the cases in which the adaptivity cycle can be triggered many times during the execution of the same operator.
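The difference between the two levels can be illustrated with a minimal Python sketch of a selection operator that reports raw monitoring information through a callback. The function run_selection, its parameters and the (consumed, produced) feedback format are hypothetical. The operator is assumed to materialise its output, so that the operator boundary is a well-defined point.

```python
def run_selection(tuples, predicate, on_feedback, intra_every=None):
    """Hypothetical selection operator with monitoring hooks.

    on_feedback(consumed, produced) receives raw monitoring information.
    With intra_every=None, feedback is reported only once the operator has
    finished (inter-operator). With intra_every=k, feedback is also reported
    after every k consumed tuples (intra-operator).
    """
    consumed = produced = 0
    output = []
    for t in tuples:
        consumed += 1
        if predicate(t):
            produced += 1
            output.append(t)
        if intra_every and consumed % intra_every == 0:
            on_feedback(consumed, produced)   # during operator execution
    on_feedback(consumed, produced)           # at the operator boundary
    return output


inter_feedback, intra_feedback = [], []
is_even = lambda x: x % 2 == 0

# Inter-operator: one report, available only after the operator completes.
run_selection(range(10), is_even, lambda c, p: inter_feedback.append((c, p)))

# Intra-operator: reports every 4 tuples, plus the final one.
run_selection(range(10), is_even, lambda c, p: intra_feedback.append((c, p)),
              intra_every=4)
```

Since each report can trigger the adaptivity cycle, the intra-operator run offers three opportunities for adaptation where the inter-operator run offers one.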