As a final conceptual note before getting into the code, newbies to XML may be wondering why they can't just use SAX for dealing with XML. But sometimes using SAX is like taking a hammer to a scratch on a wall; it's just not the right tool for the job. I discuss a few issues with SAX that make it less than ideal in certain situations.
5.1.3.1 SAX is sequential
The sequential model that SAX provides does not allow for random access to an XML document. In other words, in SAX you get information about the XML document as the parser does, and lose that information when the parser does. When the second element in a document comes along, it cannot access information in the fourth element, because that fourth element hasn't been parsed yet. When the fourth element does come along, it can't "look back" on that second element. Certainly, you have every right to save the information encountered as the process moves along; coding all these special cases can be very tricky, though. The other, more extreme option is to build an in-memory representation of the XML document. We will see in a moment that a DOM parser does exactly that, so performing the same task in SAX would be pointless, and probably slower and more difficult.
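To make the sequential behavior concrete, here is a minimal sketch using the JAXP SAX parser that ships with the JDK. The class name SequentialSaxDemo, the trace method, and the sample document are illustrative assumptions, not code from this chapter. Each entry records what the handler knows at the moment an element starts: only the elements already reported, never the ones still to come.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names here are illustrative only.
class SequentialSaxDemo {

    /** Remembers element names by hand, because SAX itself keeps nothing. */
    static class TraceHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();
        final List<String> trace = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Only earlier elements are available; later ones are not parsed yet.
            trace.add(qName + " after " + seen);
            seen.add(qName);
        }
    }

    /** Parses the XML string and returns one trace entry per start tag. */
    static List<String> trace(String xml) {
        try {
            TraceHandler handler = new TraceHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.trace;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><first/><second/><third/><fourth/></doc>";
        for (String entry : trace(xml)) {
            System.out.println(entry);
        }
    }
}
```

When the second element is reported, the handler has seen only doc and first. Anything it wants from those earlier elements must have been saved explicitly, as the seen list does here.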
Moving laterally between elements is also difficult with the SAX model. The access provided in SAX is largely hierarchical, as well as sequential. You are going to reach leaf nodes of the first element, then move back up the tree, then down again to leaf nodes of the second element, and so on. At no point is there any clear indication of what "level" of the hierarchy you are at. Although this can be implemented with some clever counters, it is not what SAX is designed for. There is no concept of a sibling element, or of the next element at the same level, or of which elements are nested within which other elements.
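The "clever counters" mentioned above might look something like the following sketch. It assumes a hypothetical DepthTrackingDemo class and a small sample document, neither of which comes from this chapter. The handler increments a counter on every start tag and decrements it on every end tag, which is bookkeeping the application has to supply because SAX does not report the level itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names here are illustrative only.
class DepthTrackingDemo {

    /** Tracks nesting depth manually with a counter. */
    static class DepthHandler extends DefaultHandler {
        private int depth = 0;
        final List<String> levels = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            levels.add(qName + "@" + depth); // record the level before descending
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--; // moving back up the tree
        }
    }

    /** Returns each element name tagged with the depth at which it opened. */
    static List<String> levels(String xml) {
        try {
            DepthHandler handler = new DepthHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.levels;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(levels("<a><b><c/></b><d/></a>"));
    }
}
```

Even with the counter in place, the handler knows only that b and d share a depth. Recognizing them as siblings of the same parent would take still more state, such as a stack of open element names.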
The problem with this lack of information is that an XSLT processor (refer to Chapter 2) must be able to determine the siblings of an element, and more importantly, the children of an element. Consider the following code snippet in an XSL template:
<xsl:template match="parentElement">
  <!-- Add content to the output tree -->
  <xsl:apply-templates select="childElementOne|childElementTwo" />
</xsl:template>
Here, templates are applied via the xsl:apply-templates construct, but they are being applied to a specific node set that matches the given XPath expression. In this example, the template should be applied only to the elements childElementOne or childElementTwo (joined by the pipe, the XPath union operator). In addition, because a relative path is used, these must be direct children of the element parentElement.
Determining and locating these nodes with a SAX representation of an XML document would be extremely difficult. With an in-memory, hierarchical representation of the XML document, locating these nodes is trivial, a primary reason why the DOM approach is heavily used for input into XSLT processors.
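As a rough illustration of why the in-memory tree makes this easy, the following sketch parses a document with the JDK's DocumentBuilder and walks the direct children of the root element, keeping only those named childElementOne or childElementTwo. The class DomChildSelectionDemo, its method, and the sample document are assumptions made for this example, and the root is assumed to be parentElement. It imitates what the select expression asks for; it is not how any particular XSLT processor is implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative only.
class DomChildSelectionDemo {

    /** Lists the direct children of the root that match either element name. */
    static List<String> matchingChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element parent = doc.getDocumentElement(); // assumed to be parentElement

            List<String> matches = new ArrayList<>();
            // Sibling navigation is a single method call on the DOM tree.
            for (Node child = parent.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip text, comments, and so on
                }
                String name = child.getNodeName();
                if (name.equals("childElementOne") || name.equals("childElementTwo")) {
                    matches.add(name + "=" + child.getTextContent());
                }
            }
            return matches;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<parentElement>"
                + "<childElementOne>1</childElementOne>"
                + "<other><childElementTwo>nested</childElementTwo></other>"
                + "<childElementTwo>2</childElementTwo>"
                + "</parentElement>";
        System.out.println(matchingChildren(xml));
    }
}
```

The childElementTwo nested inside other is skipped because it is not a direct child of the root, which matches the behavior of the relative path in the template.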
5.1.3.3 Why use SAX at all?
All these discussions about the "shortcomings" of SAX may have you wondering why one would ever choose to use SAX at all. But these shortcomings are all in regard to a specific application of XML data, in this case processing it through XSL, or using random access for any other purpose. In fact, all of these "problems" with using SAX are the exact reasons you would choose to use SAX.
Imagine parsing a table of contents represented in XML for an issue of National Geographic. This document could easily be 500 lines in length, more if there is a lot of content within the issue. Imagine an XML index for an O'Reilly book: hundreds of words, with page numbers, cross-references, and more. And these are all fairly small, concise applications of XML. As an XML document grows in size, so does its in-memory representation as a DOM tree. Imagine (yes, keep imagining) an XML document so large and with so many nestings that the representation of it using the DOM begins to affect the performance of your application. And now imagine that the same results could be obtained by parsing the input document sequentially using SAX, and would only require one-tenth, or one-hundredth, of your system's resources to accomplish the task.
Just as in Java there are many ways to do the same job, there are many ways to obtain the data in an XML document. In some scenarios, SAX is easily the better choice for quick, less-intensive parsing and processing. In others, the DOM provides an easy-to-use, clean interface to data in a desirable format. You, the developer, must always analyze your application and its purpose to make the correct decision as to which method to use, or how to use both in concert. As always, the power to make good or bad decisions lies in your knowledge of the alternatives. Keeping that in mind, it's time to look at the DOM in action.
5.2 Serialization
One of the most common questions about using DOM is, "I have a DOM tree; how do I write it out to a file?" This question is asked so often because DOM Levels 1 and 2 do not provide a standard means of serialization for DOM trees. While this is a bit of a shortcoming of the API, it provides a great example of using DOM (and as you'll see in the next chapter, DOM Level 3 seeks to correct this problem). In this section, to familiarize you with the DOM, I'm going to walk you through a class that takes a DOM tree as input, and serializes that tree to a supplied output.
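Before the full walkthrough, it may help to see the basic shape of the idea: recurse over the tree and write each node according to its type. The sketch below is only a rough illustration under stated assumptions, not the class developed in this section. The name MinimalDomSerializer and its methods are hypothetical, and it handles only document, element, attribute, and text nodes, silently skipping everything else (comments, processing instructions, CDATA sections, and so on).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the serializer class this section builds.
class MinimalDomSerializer {

    /** Writes the node and its descendants to the supplied output. */
    static void serialize(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE: {
                out.write("<" + node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.write(" " + attr.getNodeName() + "=\""
                            + escape(attr.getNodeValue()) + "\"");
                }
                out.write(">");
                writeChildren(node, out);
                out.write("</" + node.getNodeName() + ">");
                break;
            }
            case Node.TEXT_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            default:
                // Other node types are ignored in this sketch.
                break;
        }
    }

    private static void writeChildren(Node node, Writer out) throws IOException {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, out);
        }
    }

    /** Escapes the characters that cannot appear literally in XML content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Convenience helper for the demo: parse a string, then serialize it back. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            serialize(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<a x=\"1\"><b>t &amp; u</b></a>"));
    }
}
```

Because serialize accepts any Writer, the same method can target a file, a socket, or an in-memory buffer. A real serializer would also need to deal with the XML declaration, encodings, namespaces, and the remaining node types.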