INTRODUCTION - FACULTAD DE CIENCIAS EMPRESARIALES

Two essential problems arise in a multi-view based adaptation approach (as presented in [Bel04]): how to identify common sub-expressions between views and the extent of the change between the new and old views; and how to determine which view segments must be materialised. The first issue is closely related to the query containment problem, whereas the latter is treated as a selection problem over existing views. Although these problems have been well researched in the relational context, e.g., [FTU98, FTU99] for the containment problem and [CHS02, Bel04] for the selection problem, XML poses different challenges. In the remainder of this section, we examine related research in containment checking for XML views, and in the following section, we examine fragment selection.

Existing efforts on the containment problem are based on a subset of XPath expressions, mainly those using the two most important axes, the child and descendant axes. Compared to the classical containment problem for relational conjunctive queries, the challenge for containment in XPath is that queries might involve recursion (e.g., queries may require navigation along the descendant axis). The first attempt at containment checking for XPath queries was made in [MS02, MS04], which proposed two techniques, the canonical model and the homomorphism, both covered in this section.

The general concept of containment between two XPath expressions is that the evaluation result of the first expression over an XML tree is contained in the result of the second expression. The first expression is then said to be contained in the second. To verify the containment relationship, it is necessary to show that, for all trees, the evaluation result of the first expression is contained in that of the second. Conversely, as shown in [MS02], to refute containment it is sufficient to find a single counterexample, i.e., an XML tree over which the evaluation result of the first expression is not contained in the evaluation result of the second.
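The definition can be stated compactly. The notation is an assumption introduced here for illustration and is not taken from the cited work: p and p' denote the two XPath expressions, t ranges over XML trees, and p(t) is the result of evaluating p over t.

```latex
p \subseteq p' \iff \forall t:\; p(t) \subseteq p'(t)
\qquad\text{and hence}\qquad
p \not\subseteq p' \iff \exists t:\; p(t) \not\subseteq p'(t)
```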

However, as the set of trees to consider may be infinite [MS04], it is necessary to reduce the search space. The canonical model approach reduces the search space of containment checking from an infinite set of trees to a finite set. However, this finite set is still very large, which leads to an exponential-time algorithm for checking containment.
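The idea of searching a finite set of candidate trees can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [MS02, MS04]: patterns are boolean tree patterns over child ("/") and descendant ("//") edges with wildcards, the names PNode, TNode, FRESH and max_extra are hypothetical, and the chain bound max_extra is simply a parameter. The bound required for a complete check is given in the cited work and is not reproduced here, so a positive answer only means that no counterexample was found among the enumerated trees.

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class PNode:
    """Pattern node: label is an element name or "*"; children are (axis, PNode) pairs."""
    label: str
    children: list = field(default_factory=list)

@dataclass
class TNode:
    """Node of a concrete XML tree."""
    label: str
    children: list = field(default_factory=list)

FRESH = "_z"  # assumption: this label occurs in neither pattern

def canonical_models(p, max_extra):
    """Yield trees built from pattern p: wildcards become FRESH, and each
    descendant edge becomes a chain of 0..max_extra FRESH nodes."""
    label = FRESH if p.label == "*" else p.label
    options = []
    for axis, child in p.children:
        lengths = range(max_extra + 1) if axis == "//" else [0]
        subtrees = []
        for sub in canonical_models(child, max_extra):
            for k in lengths:
                wrapped = sub
                for _ in range(k):
                    wrapped = TNode(FRESH, [wrapped])
                subtrees.append(wrapped)
        options.append(subtrees)
    # One tree per combination of choices for the children.
    for combo in product(*options):
        yield TNode(label, list(combo))

def descendants(t):
    """Yield every tree node strictly below t."""
    for c in t.children:
        yield c
        yield from descendants(c)

def matches(q, t):
    """True if pattern node q can be embedded at tree node t."""
    if q.label != "*" and q.label != t.label:
        return False
    for axis, q_child in q.children:
        candidates = t.children if axis == "/" else descendants(t)
        if not any(matches(q_child, c) for c in candidates):
            return False
    return True

def no_counterexample_found(p, p2, max_extra=2):
    """Bounded search: does every enumerated model of p also satisfy p2?"""
    return all(matches(p2, t) for t in canonical_models(p, max_extra))
```

The number of enumerated trees grows multiplicatively with the number of descendant edges, which reflects the cost noted above.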

The homomorphism approach provides a much more efficient mechanism for containment checking. However, it is an incomplete algorithm: the existence of a homomorphism is not a necessary criterion for containment, so the check may return false negatives.
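A homomorphism test of this kind can be sketched as below. This is a simplified illustration rather than the procedure of [MS02, MS04]: it assumes boolean tree patterns (no distinguished output node) over child ("/") and descendant ("//") edges with wildcards, and the names PNode, maps_to and homomorphism_exists are hypothetical. The mapping goes from the containing pattern to the contained one: labels must agree unless the source node is a wildcard, child edges must map to child edges, and descendant edges may map to any downward path of length at least one.

```python
from dataclasses import dataclass, field

@dataclass
class PNode:
    """Pattern node: label is an element name or "*"; children are (axis, PNode) pairs."""
    label: str
    children: list = field(default_factory=list)

def proper_descendants(node):
    """Yield every pattern node strictly below node, across both edge types."""
    for _, child in node.children:
        yield child
        yield from proper_descendants(child)

def maps_to(q, p):
    """True if the sub-pattern rooted at q maps homomorphically onto p."""
    if q.label != "*" and q.label != p.label:
        return False
    for axis, q_child in q.children:
        if axis == "/":
            # A child edge must land on a child edge of the target pattern.
            candidates = [c for a, c in p.children if a == "/"]
        else:
            # A descendant edge may land on any node strictly below p.
            candidates = list(proper_descendants(p))
        if not any(maps_to(q_child, c) for c in candidates):
            return False
    return True

def homomorphism_exists(container, containee):
    """Sufficient (not necessary) test that containee is contained in container."""
    return maps_to(container, containee)
```

The false negatives can be seen on a small case: a/*//b is contained in a//*/b, since both require a b element at least two levels below a, yet no homomorphism exists from a//*/b to a/*//b, so the sketch answers False for that pair.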

Besides the canonical model and the homomorphism, there is also an automata-based technique [Nev02, NS03], which relies on tree automata. The idea is to characterise the set of all counterexamples, i.e., trees for which the containment relationship does not hold. This set is represented by a tree automaton and is empty exactly when containment holds. At the start of containment checking, the process constructs an automaton for the first expression, representing all trees over which the expression returns a result, and then builds an automaton for the second expression, representing all trees over which the second expression returns no result. The containment relationship is verified by joining the two automata and checking whether the join is empty. The join is empty if the containment relationship holds; otherwise, it is non-empty. While this approach provides a complete containment checking algorithm, it requires exponential time, which is not feasible for practical applications.
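In symbols, the check amounts to an emptiness test. The notation is assumed here for illustration only: A_p is the automaton accepting the trees over which p returns a result, \bar{A}_{p'} is the automaton accepting the trees over which p' returns no result, L(A) is the set of trees accepted by A, and the join of the two automata is read as the intersection of their accepted sets.

```latex
p \subseteq p' \iff L(A_p) \cap L(\bar{A}_{p'}) = \emptyset
```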

As a consequence, the mainstream of existing research tries to narrow the gap between the canonical model and homomorphism approaches, that is, to provide an approach which is more efficient than the canonical model approach and more complete than the homomorphism approach.

To extend the homomorphism-based approach, existing efforts focus on the containment problem in the presence of a DTD or XML Schema file, e.g., the chase technique [Woo01, AYCLS02, Woo03], where the containment relationship is checked against the constraints outlined by the DTD or XML Schema. However, although those approaches inherit the advantage of the homomorphism approach, they also suffer from its disadvantage. Another problem is that recursion may be defined in a DTD or XML Schema file and occur in an XML tree, while the depth of the recursion is unknown, which makes containment difficult to verify.

The conditioned and hidden conditioned approaches proposed in [FLZ07] also extend the homomorphism technique; the algorithms they provide are complete under the conjunction of some conditions. The problem with the conditioned homomorphism is that there are special cases that still return false negatives. The hidden conditioned homomorphism covers those special cases and provides a more complete algorithm. However, both approaches need to compute all potential conditions that might be required to satisfy the containment relationship. Due to the lack of metadata information, e.g., constraints covered by DTD and XML Schema files, some of the computed conditions are redundant.

The summary-based approach [ABMP07] provides containment checking under constraints outlined by a strong DataGuide [GW97] of tree-structured data. The benefit of a summary-based approach is that it provides more precision regarding the structure of the XML data and exposes the exact depth of the recursion defined in the DTD or XML Schema files. However, the containment algorithm they provide is concerned only with path constraints. A path constraint requires that each node to be processed satisfy the root-to-node path defined in the constraint. A similar approach is presented in [LWH10], which determines the equivalence between two tree patterns in the presence of a DataGuide.

However, neither approach takes subtree constraints into consideration. A subtree constraint requires that each node to be processed have the exact subtree structure defined in the constraint. As a result, both approaches may produce incorrect containment checking results.
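The difference between path and subtree constraints can be illustrated with a small sketch. It is not taken from [ABMP07] or [LWH10]: the two documents, the tuple encoding (label, children) and the function names are hypothetical, and the set of root-to-node paths only stands in for the kind of information a path-based summary records. Both documents have the same set of paths, but only one of them contains a b element with both a c and a d child.

```python
def root_paths(tree, prefix=()):
    """Return the set of root-to-node label paths of a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    paths = {path}
    for child in children:
        paths |= root_paths(child, path)
    return paths

def has_b_with_c_and_d(tree):
    """Evaluate the boolean pattern a/b[c][d], anchored at the root."""
    label, children = tree
    if label != "a":
        return False
    for child_label, grandchildren in children:
        if child_label == "b":
            names = {name for name, _ in grandchildren}
            if {"c", "d"} <= names:
                return True
    return False

# Hypothetical documents: c and d sit under different b elements in doc1,
# and under the same b element in doc2.
doc1 = ("a", [("b", [("c", [])]), ("b", [("d", [])])])
doc2 = ("a", [("b", [("c", []), ("d", [])])])

same_paths = root_paths(doc1) == root_paths(doc2)
```

Here same_paths is True, while has_b_with_c_and_d returns False for doc1 and True for doc2, so reasoning from root-to-node paths alone cannot separate the two cases.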

Summary and Issues. Containment checking with constraints reduces the search space by avoiding unnecessary checks. However, the approaches discussed here are based on XML Schemas, DTDs or a path-based DataGuide. Only root-to-node paths are considered, and the potential subtree constraint is ignored. Both the search space required by the containment checking algorithm and its performance should be further optimised by taking a more comprehensive set of constraints into account. For example, the subtree structure and the ordering of nodes can significantly reduce the search space for containment checking.
