XPath, the XML Path Language [108], is a selection language for addressing parts of an XML document. Because it lacks capabilities for restructuring data items, it is a selection language rather than a true query language. However, many other query languages are based on XPath, in particular the two most prominent ones, XSLT and XQuery, which are presented below.
Data Model: Ordered Tree
XPath models an XML document as an ordered tree. XPath differentiates between several kinds of nodes, including document nodes, element nodes, attribute nodes and text nodes. This document tree induces the so-called document order, which is obtained by traversing the document tree in a depth-first, left-to-right manner. XPath does not consider non-tree graph structures like semistructured expressions, and ID/IDREF are only supported by explicit dereferencing.
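Document order can be illustrated with a short Python sketch. It is not part of XPath itself: it assumes Python's standard xml.etree.ElementTree module and a hypothetical address book (the document from Chapter 2 is not reproduced here, so element names and phone numbers are made up). ElementTree exposes only element nodes as objects, so text and attribute nodes do not appear in the traversal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not the one from Chapter 2).
DOC = """<address-book>
  <person><name><first>Mickey</first><last>Mouse</last></name><phone>555-0100</phone></person>
  <person><name><first>Donald</first><last>Duck</last></name><phone>555-0199</phone></person>
</address-book>"""


def document_order(node):
    """Yield element nodes depth-first, left to right (pre-order)."""
    yield node
    for kid in node:  # children are stored in the order they appear
        yield from document_order(kid)


root = ET.fromstring(DOC)
order = [element.tag for element in document_order(root)]
print(order)
```

The built-in root.iter() performs the same traversal; the explicit recursion only makes the depth-first, left-to-right rule visible.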
Navigation Steps
An XPath expression specifies a sequence of navigation or location steps (separated by and beginning with “/”) in this tree, similar to what a car navigation system might provide to locate a certain address. For example, to select the phone number of Mickey Mouse in the address book used in Chapter 2, an XPath expression would specify to start at the document node, proceed to the element node address-book, from there move to each of its children, and for each child move to its name to determine whether the name is Mickey Mouse. If so, the next step selects the child node with label phone:
/child::address-book/child::person[
CHAPTER 3. WEB QUERY LANGUAGES
axis                 description
/                    select the document root (which is considered the parent of the document element)
ancestor             proper ancestor of current node
ancestor-or-self     current node or proper ancestor of current node
attribute            attribute of current node
child                immediate descendant (child) of current node
descendant           proper descendant of current node
descendant-or-self   current node or proper descendant of current node
following            node following the current node in document order
following-sibling    node following the current node in document order and at the same depth as the current node
preceding            node preceding the current node in document order
preceding-sibling    node preceding the current node in document order and at the same depth as the current node
namespace            namespace node of the current node
parent               immediate ancestor (parent) of current node
self                 current node
Table 3.1: Axis Specifications available in XPath
The result of such a selection is always a sequence of nodes. XPath does not differentiate between a single value and the sequence consisting only of that value. This has serious implications: for instance, the = operator is not true equality but only “existential” equality, i.e. it tests whether the intersection of two sequences is non-empty.
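One consequence of existential equality can be sketched in a few lines of Python. The function name existential_eq and the sample values are assumptions made for illustration; sequences are modelled as plain lists of strings rather than node sequences.

```python
def existential_eq(left, right):
    """XPath-style '=': true if some item of left equals some item of right."""
    return any(a == b for a in left for b in right)


# Hypothetical sequences of string values.
a = ["Mickey", "Donald"]
b = ["Donald", "Goofy"]
c = ["Goofy", "Pluto"]

a_eq_b = existential_eq(a, b)  # the two sequences share "Donald"
b_eq_c = existential_eq(b, c)  # the two sequences share "Goofy"
a_eq_c = existential_eq(a, c)  # no common item
single = existential_eq(["Mickey"], a)  # a single value acts as a one-item sequence
print(a_eq_b, b_eq_c, a_eq_c, single)
```

Here a = b and b = c both hold while a = c does not, so the operator is not transitive, which ordinary equality would be.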
Axis Specifications
The navigation steps in XPath expressions contain so-called axis specifications that specify the “direction” of the traversal in the document tree. In the example above, the only axis specifier used was child. Other frequently used axis specifiers are descendant, which selects not only immediate child nodes but also child nodes of child nodes and so forth, and following-sibling, which selects all siblings that come after the currently selected node in document order. Axis specifications are separated from node tests by ::. Table 3.1 summarises the axis specifications available in XPath.
An XPath expression beginning with a forward slash (i.e. /) always specifies a traversal anchored at the root, and is thus called an absolute XPath expression. An XPath expression beginning with any other axis specification is relative to the current context node.
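To make the notion of an axis concrete, the following Python sketch emulates four axes over an element tree. It is an illustration under stated assumptions, not an XPath implementation: the function names (child_axis and so on) and the sample document are hypothetical, only element nodes are considered, and ElementTree has no document node, so the root element has no parent here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not the one from Chapter 2).
DOC = """<address-book>
  <person><name><first>Mickey</first><last>Mouse</last></name><phone>555-0100</phone></person>
  <person><name><first>Donald</first><last>Duck</last></name><phone>555-0199</phone></person>
</address-book>"""

root = ET.fromstring(DOC)
# ElementTree keeps no parent pointers, so build a child -> parent map once.
PARENT = {kid: parent for parent in root.iter() for kid in parent}


def child_axis(node):
    return list(node)


def descendant_axis(node):
    # iter() yields the node itself first; drop it to keep proper descendants.
    return list(node.iter())[1:]


def ancestor_axis(node):
    result = []
    while node in PARENT:  # walk upwards, nearest ancestor first
        node = PARENT[node]
        result.append(node)
    return result


def following_sibling_axis(node):
    parent = PARENT.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


person = root[0]  # first person element
name = person[0]  # its name element
first = name[0]   # its first element
print([e.tag for e in child_axis(person)])
print([e.tag for e in descendant_axis(person)])
print([e.tag for e in ancestor_axis(first)])
print([e.tag for e in following_sibling_axis(name)])
```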
Node Tests
Navigation steps consist of node tests that specify what kinds of nodes to select. XPath supports, among others, the following node tests:
name              matches elements of type name
*                 matches every element
namespace:name    matches elements of type name from the given namespace
namespace:*       matches every element from the given namespace
comment()         matches comment nodes
text()            matches text nodes
node()            matches every node
The most common form of node test is to specify the element name, as in the example above.
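Some of these node tests can be tried out with the limited path subset of Python's xml.etree.ElementTree. The fragment below is a sketch: the namespace prefixes and URIs are made up for illustration, and only the name, * and namespace:name tests are shown, since that subset offers no comment(), text() or node() tests.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced fragment; prefixes and URIs are assumptions.
DOC = """<ab:address-book xmlns:ab="http://example.org/ab" xmlns:x="http://example.org/x">
  <ab:person/>
  <x:note/>
  <ab:person/>
</ab:address-book>"""

NS = {"ab": "http://example.org/ab"}  # prefix -> namespace URI
root = ET.fromstring(DOC)

by_name = root.findall("ab:person", NS)  # node test namespace:name
every_child = root.findall("*")          # node test *

print(len(by_name))
print([element.tag for element in every_child])
```

ElementTree reports tags with the namespace URI in braces, so the ab:person elements appear as {http://example.org/ab}person.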
3.3. EXISTING WEB QUERY LANGUAGES
Predicates
Predicates express further conditions on node tests that go beyond the capabilities of simple matching. For example, they may be used to select every second element node, or all person element nodes that contain a child node with label first and further text child Mickey, together with a child node with label last and further text child Mouse. Predicates are enclosed in square brackets [ ] and follow the node test (or other predicates). Predicates may contain:
location path    the predicate succeeds if the evaluation of the location path returns a non-empty sequence
exp OP exp       compares two expressions, which may be atomic values, location paths or function calls, with OP. The following comparison operators are supported:
                 • = tests whether the intersection of two sequences is non-empty
                 • != tests whether some item of the first sequence differs from some item of the second; like =, it is existential and therefore not the negation of =
                 • >, >=, < and <= convert the two expressions to numbers and compare them accordingly
pred and pred    connects two predicates with and
pred or pred     connects two predicates with or
Abbreviated Syntax
Those axis specifications that are most frequently used (e.g. child and descendant-or-self) can also be expressed using an abbreviated syntax, which closely resembles path specifications for directories and files in UNIX. The following table summarises the available abbreviations:
Expression                                Abbreviation
child::name                               name
/descendant-or-self::node()/child::name   //name
self::node()                              .
parent::node()                            ..
attribute::name                           @name
[position()=n]                            [n]
All other axes have no counterpart in the abbreviated syntax, but it is possible to mix abbreviated and non-abbreviated syntax as required.
In the abbreviated syntax, the selection of the phone number of “Mickey Mouse” is more conveniently expressed as:
/address-book/person[name[first = "Mickey" and last = "Mouse"]]/phone
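The evaluation of this expression can be traced step by step in Python. The sketch below is an approximation under stated assumptions: it uses the standard xml.etree.ElementTree module, whose path subset does not cover nested predicates combined with and, so the predicate is spelled out in Python. The helper names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not the one from Chapter 2).
DOC = """<address-book>
  <person><name><first>Mickey</first><last>Mouse</last></name><phone>555-0100</phone></person>
  <person><name><first>Donald</first><last>Duck</last></name><phone>555-0199</phone></person>
</address-book>"""


def has_child_text(node, tag, text):
    """Existential test: some child element `tag` of node has the given text."""
    return any(kid.text == text for kid in node.findall(tag))


def phones_of(root, first, last):
    """Mimic /address-book/person[name[first = ... and last = ...]]/phone."""
    if root.tag != "address-book":  # step 1: the document element
        return []
    result = []
    for person in root.findall("person"):  # step 2: person children
        if any(  # predicate on the name children
            has_child_text(name, "first", first) and has_child_text(name, "last", last)
            for name in person.findall("name")
        ):
            result.extend(phone.text for phone in person.findall("phone"))  # step 3
    return result


root = ET.fromstring(DOC)
mickey = phones_of(root, "Mickey", "Mouse")
# The positional abbreviation [n] is within ElementTree's subset.
second = [phone.text for phone in root.findall("person[2]/phone")]
print(mickey, second)
```

A library with full XPath support would accept the expression directly; the manual loop is used here only to keep the sketch within the standard library.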