As stated, XML specifies a set of rules that make up the grammar of an XML document. Besides the possible components, it determines, for instance, where elements may be placed, which names are allowed, and how attributes are included. Documents that fulfill the grammar are said to be well-formed. There are many rules, but some of the most important ones that a well-formed XML document must satisfy are the following: i) it has a unique root element, ii) every start-tag has a matching end-tag, iii) elements cannot overlap (i.e. an element cannot be closed until all the elements it contains have been closed), iv) attribute values must be quoted, v) an element may not have two attributes with the same name, vi) the markup characters `<' and `&' may not occur literally in the character data of elements and attributes. Notice that the first three rules induce a proper tree structure on an XML document. Figure 2.1 illustrates an example. Furthermore, the grammar sets the basis needed to create XML parsers able to read any XML document.
XML document:

<book>
  <title>Three ways to capsize a boat</title>
  <year>2010</year>
  <author>
    <name>Chris Stewart</name>
    <country>UK</country>
  </author>
  <price>11.25</price>
</book>

Tree view:

book
  title
    Three ways to capsize a boat
  year
    2010
  author
    name
      Chris Stewart
    country
      UK
  price
    11.25

Figure 2.1: Tree view of a sample XML document.
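The well-formedness rules listed above can be exercised with any conforming parser. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree as the parser; the helper name is_well_formed and the sample strings are illustrative and not part of the original text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One well-formed document and one violation for each of rules i) to vi).
samples = {
    "ok": "<book><title>Three ways to capsize a boat</title></book>",
    "two roots": "<book/><book/>",
    "missing end-tag": "<book><title>Boat</book>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute": "<book ref=CHS001/>",
    "duplicate attribute": '<book ref="a" ref="b"/>',
    "raw ampersand": "<title>Fish & chips</title>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample is accepted; each of the others breaks one rule and is rejected by the parser before any tree is built.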
There are, basically, two main APIs for XML. The Simple API for XML (SAX) [SAXa] is an event-based API. A SAX parser sequentially scans an XML document and raises events that are then handled by the application. Examples of events are the occurrence of a start-tag or an end-tag, content characters, a processing instruction, a comment, etc. In contrast, the Document Object Model (DOM) [DOM] is an API that builds a tree representation of the entire document in memory. It thus uses much more memory than the former approach, but it allows random access to the document and its manipulation.
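The contrast between the two APIs can be sketched with Python's standard library, assuming xml.sax as the SAX implementation and xml.dom.minidom as a lightweight DOM implementation. The class name EventLogger and the shortened sample document are hypothetical and chosen only for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = "<book><title>Three ways to capsize a boat</title><year>2010</year></book>"


class EventLogger(xml.sax.ContentHandler):
    """Record the events reported by the SAX parser, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))


# SAX: one sequential pass, nothing is kept unless the handler stores it.
handler = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)
for event in handler.events:
    print(event)

# DOM: the whole tree is in memory, so any node can be reached and modified.
dom = minidom.parseString(DOC)
year = dom.getElementsByTagName("year")[0]
year_text = year.firstChild.data
year.firstChild.data = "2011"
print(dom.documentElement.toxml())
```

The SAX handler sees each tag exactly once and cannot go back, whereas the DOM tree allows the year element to be located and updated after parsing.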
In addition to being well-formed, an XML document may also be valid. Particular XML applications may need to ensure that a given XML document adheres to some guidelines (rules) imposed by the application itself. In that case, the allowed markup, as well as its composition, is specified in a schema. Whenever an XML document matches the schema it is said to be valid. If not, we say that the XML document is invalid. Hence, the validity of a document depends on the schema it is compared against. Documents do not always need to be valid; for many applications it is enough that the document is well-formed. There are several XML schema languages, each with a different level of expressiveness. The most widely supported XML schema language³ is the Document Type Definition (DTD). A DTD defines the list of markup constructs (e.g. elements, attributes, entities, etc.) that can be used in a document and how they can be combined, together with basic content specifications. For example:
<!ELEMENT library (book+)>
<!ELEMENT book (title, summary, chapter*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT summary (#PCDATA | keyword)*>
<!ELEMENT chapter (#PCDATA)>
<!ATTLIST book ref  CDATA #REQUIRED
               href CDATA #IMPLIED>
The first element declaration of the DTD sample above states that each library element must contain one or more book child elements⁴. In turn, the second line indicates that each book element must have exactly one title child element, followed by exactly one summary element and zero or more chapter elements⁵. That is, every book must contain a title and a summary, and may have any number of chapter elements, including none. Moreover, the title must come before the summary, and the summary must appear before all chapters.
Regarding title and chapter elements, lines 3 and 5 say that each occurrence of these elements may only contain parsed character data (denoted by #PCDATA), that is, raw text without any child element. When mixed content is allowed, we use an element declaration similar to that shown in line 4. It states that a summary element may contain parsed character data as well as keyword children. It specifies neither the order in which they appear nor how many instances of each occur. This declaration allows a summary to have 0 keyword children, 1 keyword child, or 26 keyword children.
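To make the notion of mixed content concrete, the sketch below parses a summary element in which text and keyword children are interleaved. It assumes Python's xml.etree.ElementTree; the marked-up summary is a hypothetical variant of the sample text and does not come from the original document.

```python
import xml.etree.ElementTree as ET

# Hypothetical summary mixing raw text with keyword children.
summary = ET.fromstring(
    "<summary>A <keyword>charming</keyword> and "
    "<keyword>lyrical</keyword> read</summary>"
)

# Walk the content in document order: leading text, then each child
# followed by the text that trails it.
pieces = []
if summary.text:
    pieces.append(("text", summary.text))
for child in summary:
    pieces.append((child.tag, child.text))
    if child.tail:
        pieces.append(("text", child.tail))

for kind, value in pieces:
    print(kind, repr(value))
```

Text and keyword elements alternate freely here, which is exactly what the declaration (#PCDATA | keyword)* permits.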
In addition, ATTLIST declarations are used to declare element attributes. For instance, lines 6 and 7 of the sample DTD indicate that any book element must have a ref attribute (#REQUIRED). The href attribute, however, is optional (#IMPLIED) and may be omitted from particular book elements. Both attributes are declared to contain character data (i.e. any string of text)⁶.
Therefore, according to the DTD sample just seen, the following XML document would be valid:
<library>
  <book ref="CHS001">
    <title>Three ways to capsize a boat</title>
    <summary>A charming and lyrical read, awash with the joy of discovery</summary>
    <chapter>The proposal</chapter>
    <chapter>When dreams come true</chapter>
    <chapter>Sailing to Greek Islands</chapter>
    ...
  </book>
</library>

⁴ The `+' after book stands for one or more.
⁵ This time `*' after chapter denotes zero or more.
⁶ CDATA is the most generic attribute type. Other attribute types are: NMTOKEN, NMTOKENS, Enumeration, ENTITY, ID, IDREF, etc.
However, the next document is not valid, since the summary element comes before the title element and the book element lacks the mandatory ref attribute:
<library>
  <book>
    <summary>A charming and lyrical read, awash with the joy of discovery</summary>
    <title>Three ways to capsize a boat</title>
    <chapter>The proposal</chapter>
    <chapter>When dreams come true</chapter>
    <chapter>Sailing to Greek Islands</chapter>
    ...
  </book>
</library>
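The two checks that distinguish these documents can be sketched by hand. Python's standard library does not include a DTD validator, so the code below is not a general validator: it assumes that the content model (title, summary, chapter*) is translated manually into a regular expression over child element names, and that the required attribute is listed explicitly. The names BOOK_MODEL, REQUIRED_BOOK_ATTRS and book_problems are hypothetical, and the sample documents are abbreviated.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT book (title, summary, chapter*)>.
BOOK_MODEL = re.compile(r"title summary( chapter)*")
# Hand-translated from <!ATTLIST book ref CDATA #REQUIRED ...>.
REQUIRED_BOOK_ATTRS = {"ref"}


def book_problems(book):
    """Return a list of violations found in one book element."""
    problems = []
    children = " ".join(child.tag for child in book)
    if not BOOK_MODEL.fullmatch(children):
        problems.append(f"children do not match (title, summary, chapter*): {children}")
    for attr in sorted(REQUIRED_BOOK_ATTRS - set(book.attrib)):
        problems.append(f"missing required attribute: {attr}")
    return problems


VALID = """<library><book ref="CHS001">
<title>Three ways to capsize a boat</title>
<summary>A charming and lyrical read</summary>
<chapter>The proposal</chapter>
<chapter>When dreams come true</chapter>
</book></library>"""

INVALID = """<library><book>
<summary>A charming and lyrical read</summary>
<title>Three ways to capsize a boat</title>
<chapter>The proposal</chapter>
</book></library>"""

for label, text in (("valid", VALID), ("invalid", INVALID)):
    for book in ET.fromstring(text).iter("book"):
        print(label, book_problems(book))
```

Treating a content model as a regular expression over child names reflects how DTD content models are defined; a real validator would also check every other declaration in the DTD.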
Usually schemas are supplied in files separate from the documents they describe. DTDs, however, are the only ones that can also be included inside the XML document. In both cases, the XML markup known as the document type declaration is used. It is included in the prolog of the XML document, just after the XML declaration and before the root element, and it allows one to specify either a reference to an external DTD against which the document should be compared or the DTD itself (between square brackets). For instance, let us assume that the previously discussed sample DTD is available at http://dtdsamples.com/library.dtd. Then, the document type declaration of an XML document conforming to this DTD looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="book.css" type="text/css"?>
<!DOCTYPE library SYSTEM "http://dtdsamples.com/library.dtd">
<library>
  ...
</library>
This document type declaration states that the root element of the document is library and that the DTD for the document can be found at http://dtdsamples.com/library.dtd.
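The information carried by a document type declaration can be read back after parsing. The sketch below assumes Python's xml.dom.minidom, which records the declaration without downloading the external DTD and without validating against it; the sample document is reduced to an empty library element for brevity.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "http://dtdsamples.com/library.dtd">
<library></library>"""

# Parsing only records the declaration; the DTD itself is not fetched.
dom = minidom.parseString(DOC.encode("utf-8"))
doctype = dom.doctype

print(doctype.name)                 # declared root element name
print(doctype.systemId)             # location of the external DTD
print(dom.documentElement.tagName)  # actual root element of the document
```

A validating parser would go further, retrieving the DTD from the system identifier and checking the document against it.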
Nevertheless, DTDs may not always be enough, since they provide limited support for defining the type of the contained data. That is, a DTD does not allow one to specify, for instance, that an element contains a real number or a date range. Other well-known and more powerful schema languages that permit this kind of constraint are the W3C XML Schema Language [XSD], RELAX NG [CM01] and Schematron [Sch].