How to Speed Up Apache Xalan’s XPath Processor by Factor 10x

There has been a bit of an awkward bug in Apache Xalan for a while now, and that bug is XALANJ-2540. The effect of this bug is that an internal SPI configuration file is loaded by Xalan thousands of times per XPath expression evaluation, which can be measured easily as such:

this:

Element e = (Element)
  document.getElementsByTagName("SomeElementName")
          .item(0);
String result = ((Element) e).getTextContent();

Seems to be an incredible 100x faster than this:

// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();

// Negligible
XPath xpath = factory.newXPath();

// Negligible
XPathExpression expression =
  xpath.compile("//SomeElementName");

// Accounts for 70%
String result = (String) expression
  .evaluate(document, XPathConstants.STRING);

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to lookup the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on the ObjectFactory.class itself. When the access fails (by default), then the configuration is loaded from the xalan.jar file’s configuration file:

META-INF/service/org.apache.xml.dtm.DTMManager

Every time!:

A profiling session on Xalan

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=
  org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
  com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works, as this will allow to bypass the expensive work in lookUpFactoryClassName() if the factory class name is the default anyway:

// Code from c.s.o.a.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(
       String factoryId,
       String propertiesFilename,
       String fallbackClassName) {
  SecuritySupport ss = SecuritySupport
    .getInstance();

  try {
    String systemProp = ss
      .getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later

References

The above text is an extract from a Stack Overflow question and answer that I’ve contributed to the public a while ago. I’m posting it again, here on my blog, such that the community’s awareness for this rather heavy bug can be raised. Feel free to upvote on this ticket here, as every Sun/Oracle JDK on this planet is affected:

https://issues.apache.org/jira/browse/XALANJ-2540

Contributing a fix to Apache would be even better, of course…

CSS selectors in Java

CSS selectors are a nice and intuitive alternative to XPath for DOM navigation. While XPath is more complete and has more functionality, CSS selectors were tailored for HTML DOM, where the document content is usually less structured than in XML.

Here are some examples of CSS selector and equivalent XPath expressions:

CSS:   document > library > books > book
XPath: //document/library/books/book

CSS:   document book
XPath: //document//book

CSS:   document book#id3
XPath: //document//book[@id='3']

CSS:   document book[title='CSS for dummies']
XPath: //document//book[@title='CSS for dummies']

 

This becomes more interesting when implementing pseudo-selectors in XPath:

CSS:   book:first-child
XPath: //book[not(preceding-sibling::*)]

CSS:   book:empty
XPath: //book[not(*|@*|node())]

 

A very nice library that allows for parsing selector expressions according to the w3c specification is this “css-selectors” by Christer Sandberg:

https://github.com/chrsan/css-selectors

The next version of jOOX will include css-selector’s parser for simpler DOM navigation. The following two expressions will hold the same result:

Match match1 = $(document).find("book:empty");
Match match2 = $(document).xpath("//book[not(*|@*|node())]");

jOOX answers many Stack Overflow questions

jOOX - a jQuery port to Java When you search for Stack Overflow questions regarding XML, DOM, XPath, JAXB, etc, you could very often answer them simply with an example involving jOOX. Take this question extract for example:

Goal

My goal is to achieve following from this ex xml file :

<root>
    <elemA>one</elemA>
    <elemA attribute1='first' attribute2='second'>two</elemA>
    <elemB>three</elemB>
    <elemA>four</elemA>
    <elemC>
        <elemB>five</elemB>
    </elemC>
</root>

to produce the following :

//root[1]/elemA[1]='one'
//root[1]/elemA[2]='two'
//root[1]/elemA[2][@attribute1='first']
//root[1]/elemA[2][@attribute2='second']
//root[1]/elemB[1]='three'
//root[1]/elemA[3]='four'
//root[1]/elemC[1]/elemB[1]='five'

jOOX answer

The above can be achieved quite simply with jOOX (compared to the other answers to the question involving XSLT, SAX, DOM, etc):

List<String> list = $(document).xpath("//*[not(*)]").map(new Mapper<String>() {
  public String map(Context context) {
    return $(context).xpath() + "='" + $(context).text() + "'";
  }
}});

This will produce

/root[1]/elemA[1]='one'
/root[1]/elemA[2]='two'
/root[1]/elemB[1]='three'
/root[1]/elemA[3]='four'
/root[1]/elemC[1]/elemB[1]='five'

It is an “almost” solution to the OP’s problem, jOOX does not (yet) support matching/mapping attributes. Hence, attributes will not produce any output. This will be implemented in the near future, though.

See the full answer and question here:

http://stackoverflow.com/questions/4746299/generate-get-xpath-from-xml-node-java/8943144#8943144