butes Influencing Filtering and Routing
Having introduced the general concepts of content-based pub-sub systems and the tasks arising, we now elaborate on the commonly accepted quality measures for pub-sub systems, and the parameters and attributes influencing them.
Quality Measures For Filtering and Routing
We can identify two main quality measures, system efficiency and system scalability, that influence the suitability of content-based pub-sub systems in practice. These general measures largely comply with current assumptions given, for example, in [CCC+01, CRW00, FJL+01]:
By system efficiency, we refer to the average time to process an event message by the overall system for a given problem size. We define the processing of an event message e in this context as the task of determining all subscriptions within the distributed system that are fulfilled by e. The problem size in this context refers to the number of registered subscriptions or advertisements.
This measure thus includes both the event filtering and the event routing task, but it excludes the event delivery task. Pub-sub systems aim at high system efficiency, that is, a small processing time per message.
By system scalability¹, we refer to the behavior of system efficiency with an increasing problem size. By “problem size” we again refer to the number of subscriptions or advertisements. Thus, this definition focuses on event filtering and routing, but not on event delivery. Our definition of scalability refers to the notions of space-time scalability, as described in [Bon00]. Pub-sub systems aim for sound scalability properties.
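The two definitions above can be made concrete with a small measurement sketch. The following is only an illustration under stated assumptions, not a method from the cited literature: subscriptions are modeled as conjunctions of range predicates over a single hypothetical attribute `price`, events are matched by a naive scan on a single machine (so no routing is involved), and all function names are invented for this example. Efficiency is approximated as the average processing time per event for one problem size; scalability is read off from how that time changes as the number of subscriptions grows.

```python
import random
import time


def matches(subscription, event):
    """True if the event fulfills every (attribute, low, high) range predicate."""
    return all(
        attr in event and low <= event[attr] <= high
        for attr, low, high in subscription
    )


def process_event(subscriptions, event):
    """Determine the ids of all subscriptions fulfilled by the event (naive scan)."""
    return [sid for sid, sub in subscriptions.items() if matches(sub, event)]


def average_processing_time(subscriptions, events):
    """Efficiency proxy: mean wall-clock seconds needed to process one event."""
    start = time.perf_counter()
    for event in events:
        process_event(subscriptions, event)
    return (time.perf_counter() - start) / len(events)


def random_subscriptions(n, rng):
    """Hypothetical workload: n highly selective subscriptions on 'price'."""
    subs = {}
    for sid in range(n):
        low = rng.uniform(0, 99)
        subs[sid] = [("price", low, low + 1.0)]
    return subs


def efficiency_curve(sizes, num_events=100, seed=0):
    """Scalability proxy: efficiency measured for increasing problem sizes."""
    rng = random.Random(seed)
    events = [{"price": rng.uniform(0, 100)} for _ in range(num_events)]
    return {
        n: average_processing_time(random_subscriptions(n, rng), events)
        for n in sizes
    }


if __name__ == "__main__":
    for n, seconds in efficiency_curve([100, 1000, 5000]).items():
        print(f"{n:>5} subscriptions: {seconds * 1e6:.1f} microseconds per event")
```

For the naive scan, the time per event should grow roughly linearly with the number of subscriptions; the absolute numbers depend entirely on the machine and workload.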
Parameters Influencing the Quality Measures
Using these definitions, the two named quality measures have an effect on, and are themselves influenced by, the solutions that are applied to the filtering and routing task in content-based pub-sub systems (see Section 2.1.3):
• Filtering algorithm
• Routing algorithm and routing optimization
These solutions influence two important parameters of pub-sub systems that, in turn, also affect the two quality measures:
• Memory usage
• Network load
We give an overview of the direct dependencies among these quality measures, algorithms, and parameters in Figure 2.9. We describe these dependencies in detail later on.
Internal-Subscription-Model Attribute
An important attribute is the internal model of subscriptions. On the one hand, it affects the algorithms in pub-sub systems. On the other hand, it influences what algorithms can be applied in these systems. We illustrate this twofold effect in Figure 2.9 and elaborate on its occurrence later on (page 27).
¹ By providing our own definition of scalability in this dissertation, we hope to avoid confusion about this term arising from the lack of consensus as to its meaning [DRW06] and the absence of a generally accepted definition [Hil90].
To describe what we mean by the term “internal subscription model”, we have to start by analyzing the use of the notion of expressiveness in the current literature: the term “expressiveness” in the context of pub-sub systems, to our knowledge, has never been properly defined. There are some general explanations, but these fail to provide an acknowledged definition:
Carzaniga and colleagues [CRW00] define expressiveness as the ability of a pub-sub system to express subscriptions². Eugster and colleagues [EFGH02] state that the expressiveness of subscriptions defines how accurately subscriptions can represent the interests of subscribers³. Various other works, for example, [AAGC04, BBC+04, CS04, CMPC03, EFGK03, LJ03, PCM03], identify different levels of expressiveness in the distinction between topic-based and content-based pub-sub systems. However, content-based systems (even those supporting range queries) can be mapped to topic-based ones, as shown in [TAJ03]. Hence, the term “expressiveness” in the context of pub-sub systems does not model the general notion of expressiveness describing what facts can be represented by a language [MG85].
Li and colleagues [LHJ05] explicitly include the opportunities to combine predicates in subscriptions into their expressiveness definition. They state that in contrast to conjunctive approaches, by providing for arbitrarily complicated Boolean functions in subscriptions, an expressive subscription language can be naturally supported⁴. Mühl [Müh02] also takes this approach and states that the restriction to conjunctions in current pub-sub systems reduces the expressiveness of these systems⁵. These descriptions again show the different use of the term “expressiveness” in the pub-sub context.
² “Expressiveness refers to the power of the data model that is offered to publishers and subscribers of notifications.” [CRW00]
³ “The expressiveness of subscriptions defines how accurately subscriptions can represent the interests of the subscribers. With different kinds of subscription languages, it is possible to achieve different ‘levels’ of expressiveness.” [EFGH02]
⁴ “Siena and Jedi exploit covering-based routing. Unfortunately, they restrict the expressiveness of content-based routing, and do not consider merging techniques. ... Since BDDs can be used to represent arbitrarily complicated Boolean functions, an expressive subscription language can be naturally supported.” [LHJ05]
⁵ “Siena and Rebeca restrict filters to be conjunctions of attribute filters. On one hand, this restriction reduces the expressiveness of the filter model, but on the other hand it enables routing optimizations like covering to be applied efficiently.” [Müh02]
To avoid this mismatch between the notion of expressiveness in the general literature and the various notions of expressiveness in the pub-sub area, we refer to the concept of the “expressiveness of a subscription language” (in terms of pub-sub) as internal subscription model in the following (we similarly use the term internal advertisement model for advertisements). For content-based pub-sub systems, we can distinguish between subscriptions and advertisements as purely conjunctive filter expressions, and subscriptions and advertisements as general Boolean filter expressions.
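The distinction between the two internal subscription models can be sketched in a few lines of code. This is a minimal illustration under our own assumptions, not the representation used by any of the cited systems: predicates are hypothetical `(attribute, operator, value)` triples, a purely conjunctive subscription is a flat list of predicates, and a general Boolean subscription is a nested tuple tree with `"and"`, `"or"`, `"not"`, and `"pred"` nodes.

```python
import operator

# Comparison operators allowed in a predicate (an assumption of this sketch).
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def eval_predicate(pred, event):
    """Evaluate one (attribute, operator, value) predicate against an event."""
    attr, op, value = pred
    return attr in event and OPS[op](event[attr], value)


def eval_conjunctive(preds, event):
    """Purely conjunctive model: every predicate in the flat list must hold."""
    return all(eval_predicate(p, event) for p in preds)


def eval_boolean(expr, event):
    """General Boolean model: recursively evaluate an and/or/not expression tree."""
    kind = expr[0]
    if kind == "pred":
        return eval_predicate(expr[1], event)
    if kind == "not":
        return not eval_boolean(expr[1], event)
    if kind == "and":
        return all(eval_boolean(child, event) for child in expr[1:])
    if kind == "or":
        return any(eval_boolean(child, event) for child in expr[1:])
    raise ValueError(f"unknown node type: {kind!r}")


# Conjunctive: symbol = ACME AND price < 50
conjunctive = [("symbol", "=", "ACME"), ("price", "<", 50)]

# General Boolean: symbol = ACME AND (price < 50 OR volume > 1000)
general = (
    "and",
    ("pred", ("symbol", "=", "ACME")),
    ("or", ("pred", ("price", "<", 50)), ("pred", ("volume", ">", 1000))),
)

event = {"symbol": "ACME", "price": 72, "volume": 5000}
conjunctive_result = eval_conjunctive(conjunctive, event)
general_result = eval_boolean(general, event)
```

For the sample event, the conjunctive subscription is not fulfilled (the price predicate fails), whereas the general Boolean subscription is fulfilled through its volume alternative. A single flat list cannot express that alternative, which is the difference between the two models.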
Applicability Attribute
As outlined previously, the focus in this dissertation is on general-purpose pub-sub systems, as opposed to system solutions for a particular application setting. The filtering and routing solutions applied in such a system, therefore, need to constitute generic approaches to filtering and routing. Clearly, the suitability as a general-purpose solution is not contradicted if a filtering or routing approach effectively exploits certain application-specific attributes. As long as such attributes are only exploited by an algorithm, but their absence does not impair the functioning of the algorithm, that is, these attributes are not a mandatory requirement, this algorithm classifies as a general-purpose solution.
However, if, for example, a filtering algorithm is entirely restricted to a certain specific application, we do not consider this algorithm a general-purpose solution. Nor is an algorithm suitable as a general-purpose approach if, for example, in general settings⁶ the space or time efficiency properties of a filtering algorithm degrade to those of a basic approach and contradict its original design goals. Furthermore, we do not consider solutions to be generally applicable if they merely represent a static system solution, for example, filtering algorithms that, due to their inherent structure, cannot efficiently register or deregister subscriptions.
We also refer to the attribute of the suitability of a solution as a general-purpose approach by the term applicability. In accordance with the current practice, we consider subscriptions as highly selective filter expressions. That is, usually only a small proportion of messages fulfills a registered subscription. Thus, the consideration of such an application scenario does not oppose the applicability attribute.
We included this applicability attribute in our overview of the dependencies among quality measures, algorithms, and parameters in Figure 2.9. The illustrated cross-influences are caused by the following observations.
[Figure 2.9: diagram connecting the attributes (Applicability, Internal Subscription Model), the algorithms (Filtering algorithm, Routing algorithm/optimization), the influencing parameters (Memory usage, Network load), and the quality measures (Efficiency, Scalability) by influences labeled (1) to (11).]
Figure 2.9: Overview of the cross-influences among quality measures, algorithms, and parameters. We named these influences to be able to reference them.
Dependencies Among Quality Measures, Algorithms, and Parameters
All recent content-based pub-sub systems apply main memory filtering algorithms to achieve a high system efficiency. This development has become feasible due to the employment of cheap, large main memories in computers. The result is efficient event filtering in individual broker components. Evidently, the applied main memory filtering algorithm still plays an extremely relevant part regarding filter efficiency (Influence 4 in Figure 2.9). The efficiency of the overall distributed system, however, depends on the applied event routing algorithm and optimization as well, due to their influence on the network load (Influence 5 in conjunction with Influence 3 in Figure 2.9).
Although large main memories are standard today, filtering algorithms should require as few memory resources as possible. Memory usage is the one influence on space scalability [Bon00] and a crucial influence on space-time scalability in individual broker components (Influence 1 in Figure 2.9). The less memory the filtering algorithm demands per subscription and advertisement, the more subscriptions and advertisements are supported. Thus a filtering algorithm requiring less memory than another one achieving the same efficiency is the preferred choice. Requiring too much memory, on the other hand, leads to frequent page swaps, degrading the achieved system efficiency by several orders of magnitude.
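The relation between per-subscription memory demand and the number of supported subscriptions is simple arithmetic, sketched below. All figures are hypothetical and chosen only for illustration (a 4 GiB broker and algorithms needing 200 or 800 bytes per subscription); the function name and the optional fixed overhead are assumptions of this sketch, and the linear model ignores effects such as shared index structures.

```python
GIB = 1024 ** 3


def max_subscriptions(available_bytes, bytes_per_subscription, fixed_overhead_bytes=0):
    """Largest number of subscriptions that fits into main memory.

    Assumes memory grows linearly with the number of subscriptions,
    plus an optional fixed overhead for the filtering data structure.
    """
    if bytes_per_subscription <= 0:
        raise ValueError("bytes_per_subscription must be positive")
    usable = max(available_bytes - fixed_overhead_bytes, 0)
    return usable // bytes_per_subscription


# Two hypothetical filtering algorithms with equal efficiency but
# different memory demands per subscription.
compact = max_subscriptions(4 * GIB, 200)
bulky = max_subscriptions(4 * GIB, 800)
```

Under these assumptions, the algorithm with a quarter of the per-subscription memory supports about four times as many subscriptions before the broker would start swapping.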
The overall scalability of a content-based pub-sub system additionally depends on the utilized event routing algorithm, partially determining the sizes of routing tables, the complexity of routing table entries (Influence 6 in Figure 2.9 for both of them), and the number of routed messages (Influence 5 in Figure 2.9). These properties influence each other. For example, an increase in internally routed (and processed) event messages with a simultaneous decrease in the complexity of routing entries might improve the overall system efficiency. The number of routed event messages, however, is the strongest influence on system scalability if assuming limited network resources (Influence 2 in Figure 2.9). Next to the applied routing algorithm, the utilized routing optimization strongly affects overall scalability (Influences 6 and 5 in conjunction with Influences 1 and 2, respectively, in Figure 2.9).
The internal subscription model affects the choice of a filtering algorithm and thus implies memory usage and filter efficiency. That is, the internal subscription model has an indirect effect on both scalability (Influences 10, 7, and 1 in Figure 2.9) and efficiency (Influences 10 and 4 in Figure 2.9), the two quality measures. For the other direction, the internal subscription model of a system, obviously, has to be supported by the applied routing algorithm and optimization (Influence 11 in Figure 2.9), and the filtering algorithm used (Influence 10 in Figure 2.9).
With respect to applicability, filtering algorithm, routing algorithm, and routing optimization have to fulfill this attribute (Influences 8 and 9 in Figure 2.9). These algorithms directly influence the quality measures of the system, as illustrated in Figure 2.9 and stated in our definition of applicability. Considering the other direction, the applied algorithms either constitute general-purpose approaches or specialized solutions, that is, based on their internal functioning a general applicability is given or not given.
In the following section, we present the state of the art for solutions to one of the introduced tasks, the filtering algorithm. As can be seen in the figure, the applied filtering algorithm influences the two identified quality measures (system efficiency and scalability). Furthermore, the filtering algorithm needs to fulfill the general-purpose attribute and supports a particular internal subscription model.
In Section 2.4, we then present current event routing algorithms, followed by an analysis of existing routing optimizations in Section 2.5 (solutions to the other task). Routing algorithm and optimization indirectly influence both quality measures, as illustrated in Figure 2.9. Later on, we develop novel solutions (for the filtering and routing task) and show their effects on the identified parameters and quality measures.