
Graphs

The language Xcerpt presented in this thesis uses pattern matching to select data items in a semistructured database (or XML document). A pattern can be considered as an example of the data in the database, albeit one that usually is augmented by variables and omits much of the structure that is irrelevant for the selection. A pattern thus has to be similar to the queried data.

Pattern matching in Xcerpt (and UnQL, for that matter) is based on a similarity relation between the graphs induced by two semistructured expressions, which is called graph simulation [56, 77]. Graph simulation is closely related to graph homomorphism, but more general in the sense that it allows two nodes in one graph to be matched with a single node in the other graph, and vice versa.

2.6. ROOTED GRAPH SIMULATION – A SIMILARITY RELATION FOR ROOTED GRAPHS

[Figure 2.7: Graph induced by the semistructured expression of Example 2.15. Vertex labels shown in the figure: address-book{...}, person{...}, name[...], phone[...], knows[...], first[...], last[...], with the leaves "Mickey", "Mouse", "50773", "Donald" and "Duck"; the edges carry numeric labels.]

[Figure 2.8: Rooted Graph Simulations (with respect to vertex adornment equality). The figure shows graphs with vertices labelled A to G.]

The following definition is inspired by [56, 77] and refines the simulation considered in [24]. Recall that a (directed) rooted graph $G = (V, E, r)$ consists of a set $V$ of vertices, a set $E$ of edges (i.e. ordered pairs of vertices), and a vertex $r$ called the root of $G$ such that $G$ contains a path from $r$ to each vertex of $G$. Note that the initial definition of a rooted graph simulation does not take into account the edge labels of graphs induced by a semistructured expression; it is defined on generic, node-labelled, rooted graphs. Note furthermore that, in general, there might be more than one simulation between two graphs, which leads to the notion of minimal simulations also defined below.

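Definition 2.2 (Rooted Graph Simulation)

Let $G_1 = (V_1, E_1, r_1)$ and $G_2 = (V_2, E_2, r_2)$ be two rooted graphs and let ${\sim} \subseteq V_1 \times V_2$ be an order or equivalence relation. A relation $\mathcal{S} \subseteq V_1 \times V_2$ is a rooted simulation of $G_1$ in $G_2$ with respect to $\sim$ if:

1. $r_1 \,\mathcal{S}\, r_2$.

2. If $v_1 \,\mathcal{S}\, v_2$, then $v_1 \sim v_2$.

3. If $v_1 \,\mathcal{S}\, v_2$ and $(v_1, v_1', i) \in E_1$, then there exists $v_2' \in V_2$ such that $v_1' \,\mathcal{S}\, v_2'$ and $(v_2, v_2', j) \in E_2$.

A rooted simulation $\mathcal{S}$ of $G_1$ in $G_2$ with respect to $\sim$ is minimal if there are no rooted simulations $\mathcal{S}'$ of $G_1$ in $G_2$ with respect to $\sim$ such that $\mathcal{S}' \subsetneq \mathcal{S}$ (and $\mathcal{S} \neq \mathcal{S}'$).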
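The three conditions of Definition 2.2 can be checked mechanically for a given candidate relation. The following Python sketch is only an illustration under stated assumptions: graphs are given as sets of (source, target) pairs, edge labels are ignored because condition 3 places no constraint on them, and the function name `is_rooted_simulation` as well as the example vertices are hypothetical.

```python
from typing import Callable, Hashable, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]          # (source, target); edge labels are ignored
Relation = Set[Tuple[Vertex, Vertex]]


def is_rooted_simulation(
    edges1: Set[Edge], root1: Vertex,
    edges2: Set[Edge], root2: Vertex,
    relation: Relation,
    similar: Callable[[Vertex, Vertex], bool],
) -> bool:
    """Check conditions 1 to 3 of Definition 2.2 for a candidate relation."""
    # Condition 1: the roots must be related.
    if (root1, root2) not in relation:
        return False
    for v1, v2 in relation:
        # Condition 2: related vertices must be similar with respect to ~.
        if not similar(v1, v2):
            return False
        # Condition 3: every successor of v1 must be related to some successor of v2.
        successors1 = [t for (s, t) in edges1 if s == v1]
        successors2 = [t for (s, t) in edges2 if s == v2]
        for w1 in successors1:
            if not any((w1, w2) in relation for w2 in successors2):
                return False
    return True


# Hypothetical example: G1 has a root labelled A with two children labelled B,
# G2 has a root labelled A with a single child labelled B.
labels1 = {"r": "A", "x": "B", "y": "B"}
edges1 = {("r", "x"), ("r", "y")}
labels2 = {"r2": "A", "z": "B"}
edges2 = {("r2", "z")}


def same_label(v1: Vertex, v2: Vertex) -> bool:
    """Similarity taken as equality of vertex labels (an assumption)."""
    return labels1[v1] == labels2[v2]


# Both children of G1 are related to the single child of G2.
candidate = {("r", "r2"), ("x", "z"), ("y", "z")}
full_ok = is_rooted_simulation(edges1, "r", edges2, "r2", candidate, same_label)
# Dropping one pair leaves the successor y of r without a counterpart.
partial_ok = is_rooted_simulation(
    edges1, "r", edges2, "r2", candidate - {("y", "z")}, same_label
)
```

In this example the full candidate relation satisfies all three conditions, while the reduced relation violates condition 3.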

Definition 2.2 does not preclude that two distinct vertices $v_1$ and $v_1'$ of $G_1$ are simulated by the same vertex $v_2$ of $G_2$, i.e. $v_1 \,\mathcal{S}\, v_2$ and $v_1' \,\mathcal{S}\, v_2$. Figure 2.8 gives examples of simulations with respect to the

CHAPTER 2. DATA REPRESENTATION ON THE WEB

The existence of a simulation relation between two graphs (without variables) can be computed efficiently: results presented in [67] suggest that such problems can generally be solved in polynomial time and space. However, pattern matching usually requires computing not just one, but all minimal simulations between two graphs, in which case the complexity increases with the size of the “answer”.
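One simple way to decide existence is to start from all pairs of similar vertices and repeatedly discard pairs that violate condition 3 of Definition 2.2; a rooted simulation exists exactly when the pair of roots survives. The Python sketch below illustrates this naive refinement. It is not the algorithm of [67], it takes similarity to be label equality (an assumption), it ignores edge labels, and all function and variable names are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]  # (source, target); edge labels are ignored


def greatest_simulation(
    labels1: Dict[Vertex, str], edges1: Set[Edge],
    labels2: Dict[Vertex, str], edges2: Set[Edge],
) -> Set[Tuple[Vertex, Vertex]]:
    """Largest relation satisfying conditions 2 and 3, found by refinement."""
    succ1 = {v: [t for (s, t) in edges1 if s == v] for v in labels1}
    succ2 = {v: [t for (s, t) in edges2 if s == v] for v in labels2}
    # Start with every pair of equally labelled vertices (condition 2).
    rel = {(v1, v2) for v1 in labels1 for v2 in labels2
           if labels1[v1] == labels2[v2]}
    changed = True
    while changed:
        changed = False
        for v1, v2 in list(rel):
            # Condition 3: each successor of v1 needs a related successor of v2.
            supported = all(
                any((w1, w2) in rel for w2 in succ2[v2]) for w1 in succ1[v1]
            )
            if not supported:
                rel.discard((v1, v2))
                changed = True
    return rel


def simulation_exists(labels1, edges1, root1, labels2, edges2, root2) -> bool:
    """A rooted simulation exists iff the root pair survives the refinement."""
    return (root1, root2) in greatest_simulation(labels1, edges1, labels2, edges2)


# Hypothetical data graph: A with children B and C.
data_labels = {"d0": "A", "d1": "B", "d2": "C"}
data_edges = {("d0", "d1"), ("d0", "d2")}

# Pattern A -> B omits the C child and still matches.
pattern_labels = {"p0": "A", "p1": "B"}
pattern_edges = {("p0", "p1")}
matches = simulation_exists(
    pattern_labels, pattern_edges, "p0", data_labels, data_edges, "d0"
)

# Pattern A -> D asks for a child that the data does not have.
bad_labels = {"q0": "A", "q1": "D"}
bad_edges = {("q0", "q1")}
fails = simulation_exists(bad_labels, bad_edges, "q0", data_labels, data_edges, "d0")
```

Each pass removes at least one pair or terminates, so the loop runs at most once per candidate pair plus one final pass, which keeps the procedure polynomial in the sizes of the two graphs. Enumerating all minimal simulations is a different and potentially much more expensive task.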

Interestingly, graph simulation can also be used for schema validation (cf. e.g. [4]). In this case, a schema is considered as a graph in which every instance has to be simulated. This suggests that schema validation and querying are closely related: schema validation can be considered as querying the schema with a semistructured expression. If the query succeeds, the expression is an instance of the schema (i.e. valid). If the query fails, the expression is not an instance of the schema (i.e. invalid).

CHAPTER THREE

Web Query Languages

As we have seen in Chapter 2, XML is increasingly used not only as a format for representing text documents, but also as a format for representing semistructured databases and for exchanging data on the Web. As such, it becomes more and more important to be able to query XML data. Obviously, query languages for XML need to respect the peculiarities of the data and thus differ from traditional query languages. Likewise, a query language for the Web needs not only to be capable of querying XML data; it also needs to be able to perform network operations and, following the Reasoning Capabilities design principle of Section 1.3.8, to support reasoning mechanisms for the Semantic Web.

This chapter first argues why Web query languages need to provide a higher expressive power than traditional database query languages (Section 3.1). It then continues with an overview of desirable characteristics of Web query languages following [73] (Section 3.2). Finally, existing Web query languages are summarised (Section 3.3), with a focus on the predominant languages XPath, XSLT and XQuery.

3.1 Database vs. Web Query Languages

Traditionally, access to a database management system is realised using a query language (the so-called data manipulation language) embedded in a so-called host language (which can be any programming language available on a system, e.g. Java or C). In this setup, the query language has only limited expressive power, whereas more complex computations are performed in the host language [100]. For example, in relational database systems, query languages are usually relationally complete (i.e. they support all of the operations of the relational algebra, like projection, selection and joins), but exclude recursion and thus do not provide the same expressive power as general-purpose programming languages.

Example 3.1

The original versions of SQL, for example, did not allow computing the transitive closure of a relation (this functionality was later added in SQL’99 [6], but is not part of the core standard). Consider e.g. a binary relation uncle that relates nephews with their (immediate) uncles:

uncle

nephew         uncle
Donald Duck    Scrooge Duck
Huey Duck      Donald Duck
Dewey Duck     Donald Duck
Louie Duck     Donald Duck

Note that the (transitive) uncle relationship between e.g. Dewey Duck and Scrooge is not directly represented in the table. Query languages like SQL (earlier than SQL3) are in general (i.e. if the number of transitive steps is not known in advance) not capable of retrieving this information. In contrast, more expressive languages like Datalog can do so by using recursion. The following recursive Datalog query describes this transitive closure:
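A conventional way to express such a closure uses two rules: a base case, `trans_uncle(X, Y) :- uncle(X, Y).`, and a recursive case, `trans_uncle(X, Z) :- uncle(X, Y), trans_uncle(Y, Z).` The predicate name `trans_uncle` and this exact formulation are assumptions for illustration, not the original listing. The Python sketch below evaluates these two rules bottom-up by repeating the join until no new pairs appear, which is roughly what a recursive query engine would need to do.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (nephew, uncle)

# The uncle relation from the table of Example 3.1.
uncle: Set[Pair] = {
    ("Donald Duck", "Scrooge Duck"),
    ("Huey Duck", "Donald Duck"),
    ("Dewey Duck", "Donald Duck"),
    ("Louie Duck", "Donald Duck"),
}


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Bottom-up evaluation of the two (assumed) rules until a fixpoint."""
    # Base rule: trans_uncle(X, Y) :- uncle(X, Y).
    closure = set(pairs)
    while True:
        # Recursive rule: trans_uncle(X, Z) :- uncle(X, Y), trans_uncle(Y, Z).
        derived = {
            (x, z)
            for (x, y) in pairs
            for (y2, z) in closure
            if y == y2
        }
        if derived <= closure:
            return closure  # nothing new was derived: fixpoint reached
        closure |= derived


trans_uncle = transitive_closure(uncle)
# The indirect relationship is now explicit.
dewey_has_scrooge = ("Dewey Duck", "Scrooge Duck") in trans_uncle
```

The number of loop iterations depends on the data rather than on the query text, which is the part that a query language without recursion cannot express for chains of unknown length.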
