CAPÍTULO 4. ANÁLISIS E INTERPRETACIÓN DE LOS DATOS
4.5 Presión social para ser madre Los discursos
First on the list is a class that comes in the basic SAX download from David Megginson's site, and should be included with any parser distribution supporting SAX 2.0. The class in question here is org.xml.sax.XMLFilter. This class extends the XMLReader interface, and adds two new methods to that class:
public void setParent(XMLReader parent); public XMLReader getParent( );
It might not seem like there is much to say here; what's the big deal, right? Well, by allowing a hierarchy of XMLReaders through this filtering mechanism, you can build up a processing chain, or pipeline, of events. To understand what I mean by a pipeline, here's the normal flow of a SAX parse:
Events in an XML document are passed to the SAX reader.
The SAX reader and registered handlers pass events and data to an application. What developers started realizing, though, is that it is simple to insert one or more additional links into this chain:
Events in an XML document are passed to the SAX reader.
The SAX reader performs some processing and passes information to another SAX reader. Repeat until all SAX processing is done.
Finally, the SAX reader and registered handlers pass events and data to an application. It's the middle steps that introduce a pipeline, where one reader that performed specific processing passes its information on to another reader, repeatedly, instead of having to lump all code into one reader. When this pipeline is set up with multiple readers, modular and efficient programming results. And that's what the XMLFilter class allows for:
chaining of XMLReader implementations through filtering. Enhancing this even further is the class org.xml.sax.helpers.XMLFilterImpl , which provides a helpful
implementation of XMLFilter. It is the convergence of an XMLFilter and the
DefaultHandler class I showed you in the last section; the XMLFilterImpl class implements XMLFilter, ContentHandler, ErrorHandler, EntityResolver, and
DTDHandler, providing pass-through versions of each method of each handler. In other words, it sets up a pipeline for all SAX events, allowing your code to override any methods that need to insert processing into the pipeline.
Let's use one of these filters. Example 4-5 is a working, ready-to-use filter. You're past the basics, so we will move through this rapidly.
Example 4-5. NamespaceFilter class
package javaxml2;
import org.xml.sax.Attributes; import org.xml.sax.SAXException; import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
public class NamespaceFilter extends XMLFilterImpl {
/** The old URI, to replace */ private String oldURI;
/** The new URI, to replace the old URI with */ private String newURI;
public NamespaceFilter(XMLReader reader, String oldURI, String newURI) { super(reader);
this.oldURI = oldURI; this.newURI = newURI;
}
public void startPrefixMapping(String prefix, String uri) throws SAXException {
// Change URI, if needed if (uri.equals(oldURI)) { super.startPrefixMapping(prefix, newURI); } else { super.startPrefixMapping(prefix, uri); } }
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
// Change URI, if needed if (uri.equals(oldURI)) {
super.startElement(newURI, localName, qName, attributes); } else {
super.startElement(uri, localName, qName, attributes); }
}
public void endElement(String uri, String localName, String qName) throws SAXException {
// Change URI, if needed if (uri.equals(oldURI)) {
super.endElement(newURI, localName, qName); } else {
super.endElement(uri, localName, qName); }
} }
I start out by extending XMLFilterImpl, so I don't have to worry about any events that I don't explicitly need to change; the XMLFilterImpl class takes care of them by passing on all events unchanged unless a method is overridden. I can get down to the business of what I want the filter to do; in this case, that's changing a namespace URI from one to another. If this task seems trivial, don't underestimate its usefulness. Many times in the last several years, the URI of a namespace for a specification (such as XML Schema or XSLT) has changed. Rather than having to hand-edit all of my XML documents or write code for XML that I receive, this NamespaceFilter takes care of the problem for me.
Passing an XMLReader instance to the constructor sets that reader as its parent, so the parent reader receives any events passed on from the filter (which is all events, by virtue of the XMLFilterImpl class, unless the NamespaceFilter class overrides that behavior). By supplying two URIs, the original and the URI to replace it with, you set this filter up. The three overridden methods handle any needed interchanging of that URI. Once you have a filter like this in place, you supply a reader to it, and then operate upon the filter, not the
reader. Going back to contents.xml and SAXTreeViewer, suppose that O'Reilly has informed me that my book's online URL is no longer http://www.oreilly.com/javaxml2, but
http://www.oreilly.com/catalog/javaxml2. Rather than editing all my XML samples and uploading them, I can just use the NamespaceFilter class:
public void buildTree(DefaultTreeModel treeModel,
throws IOException, SAXException { // Create instances needed for parsing XMLReader reader = XMLReaderFactory.createXMLReader(vendorParserClass); NamespaceFilter filter = new NamespaceFilter(reader, "http://www.oreilly.com/javaxml2", "http://www.oreilly.com/catalog/javaxml2"); ContentHandler jTreeContentHandler =
new JTreeContentHandler(treeModel, base, reader); ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( ); // Register content handler
filter.setContentHandler(jTreeContentHandler); // Register error handler
filter.setErrorHandler(jTreeErrorHandler);
// Register entity resolver
filter.setEntityResolver(new SimpleEntityResolver( )); // Parse
InputSource inputSource = new InputSource(xmlURI); filter.parse(inputSource); }
Notice, as I said, that all operation occurs upon the filter, not the reader instance. With this filtering in place, you can compile both source files (NamespaceFilter.java and
SAXTreeViewer.java), and run the viewer on the contents.xml file. You'll see that the O'Reilly namespace URI for my book is changed in every occurrence, shown in Figure 4- 2.
Of course, you can chain these filters together as well, and use them as standard libraries. When I'm dealing with older XML documents, I often create several of these with old XSL and XML Schema URIs and put them in place so I don't have to worry about incorrect URIs:
XMLReader reader = XMLReaderFactory.createXMLReader(vendorParserClass); NamespaceFilter xslFilter = new NamespaceFilter(reader, "http://www.w3.org/TR/XSL", "http://www.w3.org/1999/XSL/Transform"); NamespaceFilter xsdFilter = new NamespaceFilter(xslFilter, "http://www.w3.org/TR/XMLSchema", "http://www.w3.org/2001/XMLSchema");
Here, I'm building a longer pipeline to ensure that no old namespace URIs sneak by and cause my applications any trouble. Be careful not to build too long a pipeline; each new link in the chain adds some processing time. All the same, this is a great way to build reusable components for SAX.