Objetivos Específicos - MARCO METODOLÓGICO

2. MARCO METODOLÓGICO

2.2 OBJETIVO

2.2.2 Objetivos Específicos

As mentioned previously, Polar* follows the two-step optimisation paradigm, which is popular for both parallel and distributed database systems [Kos00]. For the single-node optimisation, large parts of the optimiser of lambda-DB [FSRM00] have been incorpo- rated. The internal representation of plans in lambda-DB, and consequently in Polar*, is based upon the monoid calculus and the associated algebra [FM00] of Fegaras and Maier. As such, the OQL queries, are parsed and transformed into a monoid calculus expression. Subsequently, the plan in this intermediate format is typechecked and then is ready for the actual plan construction.

2.4.3.1 Construction of the single-node plan

The single-node plan is constructed by applying logical and physical optimisation to the output of the parser.

The logical optimiser operates as follows.

• It normalises the plan in monoid calculus, which involves (i) query unnesting,

(ii) fusion of multiple selection operators into a single one, and (iii) application of DeMorgan’s laws to the predicates.

• It maps the monoid calculus into the logical algebra of [FM00]. Figure 2.7(a) de-

picts a plan for the example query in Figure 2.3 expressed in the logical algebra. The logical algebra contains object-relational operators such as scan, unnest, and project without specifying their actual implementation, when more than one ex- ists (e.g., hash join and nested loop are both possible implementations of the join logical operator).

• It creates multiple equivalent logical plans and chooses the one that results in the

production of less intermediate data [Feg98]. Changing the order of the operators and the shape of the query results in plans that produce different numbers of intermediate results.

• It pushes, or inserts, projections as close to the scans as possible.

The overall aim of the optimisation is to create a plan with as low a query response time as possible. The main heuristic employed (i.e., to reduce the volume of temporary data) performs well to this end [Feg98], incurring minimal overhead, as it is based on a greedy bottom-up algorithm.

The physical optimiser transforms the optimised logical expressions into physical plans by selecting algorithms that implement each of the operations in the logical plan (Figure 2.7(b)). For example, in the presence of data ordering, the optimiser prefers sort-merge joins to hash joins. The mapping policies are defined in rules, and trans- formed into code by the OPTGEN optimiser generator [Feg97].

The number of physical plans that are produced is configurable; the higher this number is, the more options for the multi-node optimiser exist, but also, the higher the query compilation cost is.

hash_join table_scan table_scan ! ! " " " " # # $ $ $ $ % % % % & & & & ' ' ( ( ( ( ) ) ) ) * * + + , , - - . . / / (a) project operation_call project join scan project (blast(I.protein)) (blast) (C.cproteinid=I.iproteinid) (C.cproteinid) scan (Classifications) (contologyid−catA) (Interactions) (I.iproteinid, I.protein) (b) project (blast) (blast(I.protein)) operation_call (C.cproteinid=I.iproteinid) project (C.cproteinid) (Classifications) (contologyid−catA) (Interactions) (I.iproteinid, I.protein) project (iproportion>0.5) (iproportion>0.5) exchange hash_join exchange exchange table_scan table_scan 4,5 operation_call (c) 2,3 6 3,6 project (blast) (blast(I.protein)) (C.cproteinid=I.iproteinid) (C.cproteinid) project (Classifications) (contologyid−catA) (Interactions) (iproportion>0.5) project (I.iproteinid, I.protein)

Figure 2.7: Example query: (a) single-node logical plan, (b) single-node physical plan, (c) multi-node physical plan.

2.4.3.2 Construction of the multi-node plan

The multi-node optimiser comprises two components: the partitioner, which splits the plan into fragments, thereby defining the points where communication over the network takes place, and the scheduler, which allocates a set of machines to each such fragment.

Plan partitioning The first step to transform a single-node plan into a multi-node one is by inserting parallelisation operators into the query plan, i.e., Polar* follows the operator model of parallelisation [Gra90], by using the exchange operator. Ex- changes comprise of two parts that can run independently: exchange producers and exchange consumers (Ex-Prod and Ex-Cons in Figure 2.8, respectively). The producers have an outgoing buffer for each consumer, in which they add the tuples they collect from the operators lower in the query plan. For each exchange operator, a

Ex−Cons Ex−Cons Ex−Prod Ex−Prod data flow thread boundaries Machine A Machine B

Figure 2.8: The exchange operators

data distribution policy needs to defined, in order to identify the consumer for each tuple. Currently, the policies Polar* supports include round robin, hash distribution and range partitioning. The last two policies provide support for non-uniform data distribution among instances of the same physical operator, which is desirable for het- erogeneous environments.

The partitioner firstly identifies whether an operator requires its input data to be partitioned by a specific attribute when executed on multiple processors (for example, so that the potentially matching tuples from the operands of a join can be compared). These operators are called attribute sensitive operators [Has96]. [SPWS99] presents the classification of the parallel operators as attribute sensitive or attribute insensitive. Secondly, the partitioner checks whether data repartitioning is required, i.e., whether data needs to be exchanged among the processors, for example for joining or for sub- mitting to an operation call on a specific machine. There are two such cases: (i) when the children of an attribute sensitive operator in a query plan are partitioned by an attribute other than its partitioning attribute, or there is no data partitioning defined for them, then data repartitioning needs to take place; and (ii), when not all the candidate nodes for evaluating parts of the query can evaluate a physical operator, which may be case for operation calls. The physical algebra extended with exchange constitutes the parallel algebra used by Polar*. The exchanges are placed immediately below the operators that require the data to be repartitioned. A multi-node query plan is shown in Figure 2.7(c), where the exchanges partition the initial plan into many subplans (delimited by dashed lines in the figure).

In the example query, there is one attribute sensitive operator, the hash join, which needs to receive tuples partitioned by the proteinID attribute. Since the children of the hash join are not partitioned in this way, two exchanges are inserted at the top of each subplan rooted by the join. Operation call is attribute insensitive, but since it can be called only from specific nodes, an exchange is inserted as its child. The optimiser checks if the exchanges transmit redundant data. A project operator ensures that only data that are required by operators higher in the query plan are transferred through the network. If there are no such projects placed by the single-node optimiser, the parallel optimiser inserts them.

Plan scheduling The final phase of query optimisation is to allocate machine re- sources to each of the subplans derived by the partitioner, a task carried out by the scheduler in Figure 2.6 using an algorithm based on that of Rahm and Marek [RM95]. This algorithm is principally for parallel databases, which consist of homogeneous nodes. As such, it can operate well in a limited range of cases in grid querying.

To run the example query, suppose that six machines are available, and that three of the machines host databases (numbers 2 and 3 for the Classification database table, and 6 for the Interactions). The table scan operators are placed on these machines in order to save communication cost. For the hash join, the scheduler tries to ensure that the relation used to construct the hash table can fit into main memory, for example, by allocating more nodes to the join until predicted memory requirements are satisfied or all available memory is exhausted [RM95]. In the example, nodes 3 and 6 are allocated to run the hash join. As some of the data is already on these nodes, this helps to reduce the total network traffic. The data dictionary records which nodes support BLAST, and thus the scheduler is able to place the operation call for BLAST on suitable nodes (4 and 5 in Figure 2.7(c)). The cost of a call to BLAST is much higher than the cost to send a tuple to another node over the network. Additionally, in any case, the whole set of the results of the hash join needs to be moved from the nodes 3 and 6 at some point, to be returned to the user, which means that some communication cost is inevitable. For these two reasons, the optimiser has chosen the maximal degree of parallelism for the BLAST operation. This choice incurs lower computation cost, without increasing the communication.

In document DECANATO DE POSGRADO TRABAJO DE TESIS PARA OBTAR POR EL TITULO DE MAESTRÍA EN GERENCIA DE LA COMUNICACIÓN CORPORATIVA (página 42-0)