The goal of this phase is to add optimizer support for the selection and project operators, which makes the query optimizer able to process the whole range of PJS-queries. Supporting the selection operator requires the planning step of the optimizer to take constraints into account, including how they change the size of the data exchanged between the sites. When dealing with the selection and project operators, it is important to pay attention to their placement in the algebra tree. Both operators remove elements from the data set (selection horizontally and project vertically), and if an element is removed at the wrong point, it will have consequences for the rest of the execution.
3.3.1 Design
The first step towards realizing selection support in the query optimizer is to add constraint information to the RSID and to the options, the basic building blocks of the optimizer. We now consider phase 1, described in Section 3.2, to be an integral part of DASCOSA and will design the operators in such a way that they support the semantic cache.
Figure 3.2: RSID containment. (Axis: value range, marked at 5 and 8; ranges labeled "contained by A" and "contained by B".)

To determine if a cache RSID will be usable, it will be checked for containment against the query RSID. By containment we mean that every element of the cache RSID must exist, or be contained, in the query RSID. In the previous phase this simply required the tables of the cache RSID to be a subset of the tables in the query RSID, as given in Equation 3.1. This rule is still relevant, but it will have to be extended. We will now refer to this rule as table containment as given in Equation 3.2.
\[ \mathit{tableCon}(RSID_A, RSID_B) \Leftrightarrow \mathit{Tables}_A \supseteq \mathit{Tables}_B \tag{3.2} \]

In addition to table containment, we have to consider selections and projections. So we extend Equation 3.1 to become Equation 3.3.

\[ RSID_A \supseteq RSID_B \Leftrightarrow \mathit{tableCon}(RSID_A, RSID_B) \land \mathit{projectionCon}(RSID_A, RSID_B) \land \mathit{selectionCon}(RSID_A, RSID_B) \tag{3.3} \]
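As a minimal sketch of how table containment and the conjunction in Equation 3.3 fit together, consider the Python below. The RSID class is a simplified, hypothetical stand-in that only carries a table set, and the projection and selection checks are passed in as callables because they are defined separately. None of these names come from DASCOSA itself.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class RSID:
    """Hypothetical, simplified result-set identifier: only the table set."""
    tables: FrozenSet[str]


def table_con(a: RSID, b: RSID) -> bool:
    """Equation 3.2: the tables of A are a superset of the tables of B."""
    return a.tables >= b.tables


Check = Callable[[RSID, RSID], bool]


def rsid_contains(a: RSID, b: RSID, projection_con: Check, selection_con: Check) -> bool:
    """Equation 3.3: all three containment checks must hold."""
    return table_con(a, b) and projection_con(a, b) and selection_con(a, b)


# A is the query RSID, B is the cache RSID (table names are illustrative).
query = RSID(frozenset({"nation", "region"}))
cache = RSID(frozenset({"nation"}))

print(table_con(query, cache))  # True: the cache tables are a subset of the query tables
print(table_con(cache, query))  # False: the cache does not cover "region"
```

With A as the query and B as the cache entry, the superset test expresses the rule that the cache tables must be a subset of the query tables.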
Projection containment is similar to table containment, except that it runs in the opposite direction. If a query is to use a cache entry, all the columns it requires have to be included for all the tables in the query. Extra columns in the cache entry are no problem. Then we get Equation 3.4.
\[ \mathit{projectionCon}(RSID_A, RSID_B) \Leftrightarrow \forall p\,[(p \in \mathit{AllColumns}(R_B) \land p \in \pi_A) \rightarrow p \in \pi_B] \tag{3.4} \]

We define selection containment to mean that every selection constraint in the cache entry has to be part of the query or be contained by a constraint in the query, and we get Equation 3.5.
\[ \mathit{selectionCon}(RSID_A, RSID_B) \Leftrightarrow \forall y_B \exists y_A\,[y_B \in (\sigma_B - \sigma_A) \land y_A \in \sigma_A \land \mathit{contained}(y_B, y_A)] \tag{3.5} \]

An example of constraint containment can be seen in Figure 3.2, where the value range for B is contained in A. These are the value ranges for the queries represented by Trees A and B in Figure 3.1. From Figure 3.2 it is clear that constraint B can be applied either to the entire table or to the result of constraint A. In order to adhere to previous terminology, we would like to say that constraint B contains constraint A. This is why we define the contained relation between two constraints by the tuples they exclude, as given by Equation 3.6.
\[ \mathit{constraintCon}(x, y) \Leftrightarrow \forall t_x t_y\; \mathit{excludedby}(t_y, y) \rightarrow \mathit{excludedby}(t_x, x) \tag{3.6} \]
In other words, the containee does not exclude any tuple that is not also excluded by the container. The containee is therefore also applicable to the given query: the container can be applied after the containee, and the result is exactly the same as if only the container had been executed.
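The definitions in Equations 3.4 to 3.6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the DASCOSA implementation. The names Constraint, constraint_con, selection_con and projection_con are hypothetical. Equation 3.6 is read with a single shared tuple variable and evaluated by brute force over a small finite integer domain. contained(y_B, y_A) in Equation 3.5 is taken to mean constraintCon(y_A, y_B), and selection containment follows the prose definition: every cache constraint is either part of the query or contained by a query constraint.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable

# Comparison operators for single-column constraints (illustrative set).
OPS: Dict[str, Callable[[int, int], bool]] = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass(frozen=True)
class Constraint:
    """Hypothetical constraint of the form: column <op> value."""
    column: str
    op: str
    value: int

    def excludes(self, v: int) -> bool:
        """True if a tuple with this column value is removed by the constraint."""
        return not OPS[self.op](v, self.value)


def constraint_con(x: Constraint, y: Constraint, domain: Iterable[int]) -> bool:
    """Equation 3.6 (assumed reading): x, the container, excludes every
    domain value that y, the containee, excludes."""
    if x.column != y.column:
        return False
    return all(x.excludes(v) for v in domain if y.excludes(v))


def selection_con(sigma_a: FrozenSet[Constraint], sigma_b: FrozenSet[Constraint],
                  domain: Iterable[int]) -> bool:
    """Equation 3.5 (prose reading): every constraint of B that is not in A
    is contained by some constraint of A."""
    values = list(domain)
    return all(
        any(constraint_con(y_a, y_b, values) for y_a in sigma_a)
        for y_b in sigma_b - sigma_a
    )


def projection_con(pi_a: FrozenSet[str], pi_b: FrozenSet[str],
                   all_columns_b: FrozenSet[str]) -> bool:
    """Equation 3.4: every column of B's tables that A projects is in B's projection."""
    return all(p in pi_b for p in all_columns_b if p in pi_a)


domain = range(0, 20)
query_c = Constraint("x", ">", 8)   # constraint in the query (A)
cache_c = Constraint("x", ">", 5)   # constraint in the cache entry (B)

print(constraint_con(query_c, cache_c, domain))                           # True
print(constraint_con(cache_c, query_c, domain))                           # False
print(selection_con(frozenset({query_c}), frozenset({cache_c}), domain))  # True
```

Here the cache constraint x > 5 excludes only values that the query constraint x > 8 also excludes, so the query constraint can be applied on top of the cached result without losing tuples. The brute-force check is only valid for the sampled domain; a real implementation would reason about the operators and bounds directly.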
Table 3.3 shows which logical operators can be substituted in the containment technique used for matching queries against cached data. From the table we can read whether a constraint is contained within another constraint and, if necessary, what data is not contained and must be fetched independently. A contained query must satisfy Equation 3.6 to qualify. The process of generating such remainder queries and evaluating whether they are profitable is outside the scope of this report. Although we do not see any immediate problems with doing so, our time frame is limited and remainder queries are not necessary for cache investment.
A   =   !=   <   <=   >   >=
1   xy xy xy
2   xy xy xy
3   x y y xy x
4   y x y x xy
5   xy xy xy
Figure 3.3: The process of cache investment. (Stages: query logging, where queries are logged in history; history analysis, where candidates are identified from history; evaluation, where the optimizer evaluates the benefit of a candidate to the history; publish candidate, where profitable candidates are added to the cache index; cache creation.)
3.3.2 Implementation
The implementation of selection support was done by adding a set of constraints to the options used in query planning. With the new set of constraints, the planner was able to determine a reduction estimate for join operations. As explained in the design section above, the constraints on each option were made into independent constraint nodes when building the query tree from the query execution plan produced by the optimizer. These nodes were placed directly above the algebra node linked to the option with the constraint. The exception was constraints on table scans, which were performed by Derby before the data was loaded into DASCOSA.
Because of time constraints and low priority, the project operator was not a part of the solution at this point. This will be a part of future work on the DASCOSA optimizer.
After the best plan has been determined, it must be translated into an executable algebra tree. The plan will be traversed top-down. When an option with constraints is encountered, two nodes will be created. First a constraint node with the given constraint is created, then the algebra node for the option itself is created as a child node of the constraint. This way, the constraint will be executed immediately over the algebra node, as seen in Figure 3.4.

Figure 3.4: A plan with constraints from the query optimizer is translated into an algebra tree. (Labels: Nation, Region, Norway, σ, n_name = Norway, Composite Option, Derby, Dascosa.)
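The translation rule can be sketched as a small recursive function. The Option and AlgebraNode classes below are hypothetical simplifications, not DASCOSA types, and the constraint is kept as an opaque string. The sketch covers only the general rule; constraints on table scans are not treated specially here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Option:
    """Hypothetical plan option: operator name, optional constraint, child options."""
    name: str
    constraint: Optional[str] = None
    children: List["Option"] = field(default_factory=list)


@dataclass
class AlgebraNode:
    """Hypothetical algebra tree node; kind is "select" for constraint nodes, "op" otherwise."""
    kind: str
    label: str
    children: List["AlgebraNode"] = field(default_factory=list)


def translate(option: Option) -> AlgebraNode:
    """Traverse the plan top-down and build the algebra tree."""
    # First create the constraint node, if the option carries a constraint.
    constraint_node = None
    if option.constraint is not None:
        constraint_node = AlgebraNode("select", option.constraint)

    # Then create the algebra node for the option itself and recurse into its children.
    node = AlgebraNode("op", option.name)
    node.children = [translate(child) for child in option.children]

    # The algebra node becomes the child of the constraint node, so the
    # constraint is evaluated directly over the output of the algebra node.
    if constraint_node is not None:
        constraint_node.children.append(node)
        return constraint_node
    return node


# Illustrative plan: a join option with a constraint over two scans.
plan = Option("join", constraint="n_name = 'Norway'",
              children=[Option("scan nation"), Option("scan region")])
tree = translate(plan)
print(tree.kind, "->", tree.children[0].label)  # select -> join
```

An option without a constraint produces exactly one node, so the tree only grows where a selection actually has to be evaluated.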
Constraints on leaf options, or table access operators, will be handled as a special case. The constraint will in this case be applied directly on the table access in the database layer under DASCOSA. This is a highly efficient way of reducing the data set before it enters DASCOSA.
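To illustrate the effect of applying a leaf constraint in the database layer, the sketch below pushes an equality constraint into the table access as a WHERE clause. It uses Python's built-in sqlite3 module as a stand-in for Derby, and the function scan_with_constraint, the table layout and the rows are all hypothetical.

```python
import sqlite3


def scan_with_constraint(conn: sqlite3.Connection, table: str, column: str, value):
    """Table access with the constraint evaluated inside the database.

    Identifiers cannot be bound as SQL parameters, so this sketch assumes
    that `table` and `column` come from a trusted plan, not from user input.
    """
    sql = f"SELECT * FROM {table} WHERE {column} = ?"
    return conn.execute(sql, (value,)).fetchall()


# In-memory database standing in for the local database under DASCOSA.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT)")
conn.executemany(
    "INSERT INTO nation VALUES (?, ?)",
    [(1, "Norway"), (2, "Sweden"), (3, "Denmark")],
)

rows = scan_with_constraint(conn, "nation", "n_name", "Norway")
print(rows)  # [(1, 'Norway')]
```

Only the matching row leaves the database, so the layers above never see the tuples that the constraint would have removed anyway.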