Using SQL Server FOR XML and FOR JSON Syntax on Other RDBMS With jOOQ

SQL Server supports transforming flat tabular SQL result sets into hierarchical structures by convention using the convenient FOR XML or FOR JSON syntaxes. This is really convenient and less verbose than the standard SQL/XML or SQL/JSON APIs – although the standard ones are more powerful.

In this blog post, I’d like to show a few core features of the SQL Server syntax, and what they correspond to in standard SQL. jOOQ 3.14 will support both SQL Server’s syntax and the standard syntax, and will be able to translate from one to the other, such that you can use SQL Server syntax also on Db2, MariaDB, MySQL, Oracle, PostgreSQL. You can play around with the current state of development on our website here.

As always, using the Sakila database, here’s a simple example as a teaser:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
FOR XML RAW;

-- Db2, Oracle, PostgreSQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  -- Original query here
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
) AS t

Producing in both cases something like:

<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>

FOR XML and FOR JSON concepts

As could be seen in the above teaser, the SQL Server syntax is far less verbose and concise, and it seems to produce a reasonable default behaviour, where the Db2, Oracle, PostgreSQL (and SQL Standard) SQL/XML APIs are more verbose, but also more powerful. For example, it is possible to map column a to attribute x and column b to a nested XML element y very easily.

The advantages of both approaches are clear. SQL Server’s approach is much more usable in the general case. But what is the general case? Let’s summarise a few key parallels between SQL result sets, and XML/JSON data structures:


Tables are XML elements or JSON arrays

Tables (i.e. sets of data) are not a foreign concept to both XML and JSON documents. The most natural way to represent a set of data in XML is a set of elements using the same element name, optionally wrapped by a wrapper element. For example:

<!-- With wrapper element -->
<films>
  <film title="OKLAHOMA JUMANJI"/>
  <film title="RULES HUMAN"/>
  <film title="SPLASH GUMP"/>
</films>

<!-- Without wrapper element -->
<film title="OKLAHOMA JUMANJI"/>
<film title="RULES HUMAN"/>
<film title="SPLASH GUMP"/>

The distinction of whether a wrapper element is added is mostly significant when nesting data.

With JSON, the obvious choice of data structure to represent a table is an array. For example:

[
  {"title": "OKLAHOMA JUMANJI"},
  {"title": "RULES HUMAN"},
  {"title": "SPLASH GUMP"}
]


Rows are XML elements or JSON objects

As we’ve already seen above, a SQL row is represented in XML using an element.

<film title="OKLAHOMA JUMANJI"/>

The question is only what the element name should be. It can usually be any of:

  • A standard name, such as “row”
  • The name of the table the row stems from
  • A custom name

In JSON, it is an object.

{"title": "OKLAHOMA JUMANJI"}

Unlike in XML, there is no such thing as an element name, so the row is “anonymous”. The row type is defined by what table / array the JSON object is contained in.


Column values are XML elements or attributes, or JSON attributes

We have a bit more choices of how to represent SQL column values in XML. Mainly two choices:

  • Represent values as attributes
  • Represent values as elements

Scalar values can easily be represented as attributes. If a value needs further nesting (e.g. an array, user defined type, etc.), then elements are a better choice. In most cases, the choice is not relevant, so we can pick both:

<!-- Using attributes -->
<film film_id="635" title="OKLAHOMA JUMANJI"/>

<!-- Using elements from table and column names -->
<film>
  <film_id>635</film_id>
  <title>OKLAHOMA JUMANJI</title>
</film>

<!-- Using standard element names
<row>
  <value name="film_id" value="635"/>
  <value name="title" value="OKLAHOMA JUMANJI"/>
</row>

There are a few other reasonable default options for the representation of a column value in XML.

In JSON, on the other hand, there are two main reasonable approaches. In most cases, an object will be chosen, where column values are identified by column name. But just like SQL records are a mixture between “structs” and “tuples”, we could imagine a representation that maps column values to array indexes as well:

// Using objects
{"film_id": 635, "title": "OKLAHOMA JUMANJI"}

// Using arrays
[635, "OKLAHOMA JUMANJI"]


GROUP BY and ORDER BY can be seen as a way to nest data

So far, all data was represented in a flat way, just like the SQL table. There was some nesting when wrapping XML elements of JSON arrays in some wrapper element or object, or when representing XML data with more elements rather than attributes, but the data was still always tabular.

Very often, we want to consume data in a hierarchical form, though. An actor played in films, so we’d like to group the films by actor, rather than repeating the actor information for every film. In general, operations like GROUP BY or ORDER BY serve this purpose. GROUP BY allows for aggregating all data into nested data structures per group (e.g. into strings, arrays, XML elements, JSON arrays, JSON objects). ORDER BY does the same, “visually” – perhaps a bit less formally. When we look at this set of XML elements, we can see visually that they’re “grouped” (i.e. ordered) by actor:

<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>

SQL Server supports such grouping in at least two ways:

  • Implicitly by convention, using ORDER BY
  • Explicity by creating correlated subqueries

The implicit approach could transform the above flat representation into something like this:

<a first_name="PENELOPE" last_name="GUINESS">
    <f title="OKLAHOMA JUMANJI"/>
    <f title="RULES HUMAN"/>
    <f title="SPLASH GUMP"/>
    <f title="VERTIGO NORTHWEST"/>
    <f title="WESTWARD SEABISCUIT"/>
    <f title="WIZARD COLDBLOODED"/>
</a>
<a first_name="NICK" last_name="WAHLBERG">
    <f title="ADAPTATION HOLES"/>
    <f title="APACHE DIVINE"/>
</a>

… where “a” and “f” are the table names in the query (actor a and film f).

How do FOR XML and FOR JSON work in detail?

There are several features that can be combined in SQL Server. The complete picture can be seen from the docs. We’ll omit a few features in this blog post here.

  • The transformation algorithm RAW (flat results, only in XML), AUTO (hierarchical, automatic results), PATH (hierarchical, explicit results)
  • The “root” name, which corresponds to an XML wrapper element, or a JSON wrapper object
  • XML only: Whether values should be placed in ELEMENTS or attributes
  • JSON only: INCLUDE_NULL_VALUES specifies whether NULL values are explicit, or implicit (absent from the JSON object).
  • JSON only: WITHOUT_ARRAY_WRAPPER specifies whether the set of JSON objects should be listed as a JSON array, or a comma separated list of objects (which could be combined with other queries)

This is not complete, there are more flags and features, but instead of discussing them in theory, let’s look at a few examples:

FOR XML RAW

Producing flat results with attributes for values

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t

This produces

<row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
<row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>

FOR XML RAW, ROOT

Producing flat results with attributes for values, and a root element to wrap the listed elements

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ROOT('rows');

-- Standard SQL
SELECT xmlelement(
  NAME rows,
  xmlagg(xmlelement(
    NAME row,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name,
      t.title AS title
    )
  ))
)
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t

This produces

<rows>
  <row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
  <row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>
</rows>

FOR XML RAW, ELEMENTS

Producing flat results with elements for values.

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ELEMENTS;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(
    NAME first_name,
    first_name
  ),
  xmlelement(
    NAME last_name,
    last_name
  ),
  xmlelement(
    NAME title,
    title
  )
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
  FOR XML RAW, ELEMENTS
) AS t

This produces

<row>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <title>SMILE EARRING</title>
</row>
<row>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <title>WARDROBE PHANTOM</title>
</row>
<row>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <title>ACADEMY DINOSAUR</title>
</row>
<row>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <title>ANACONDA CONFESSIONS</title>
</row>

This could also be combined with ROOT, which we’re omitting for brevity.

FOR XML/JSON AUTO

This approach derives results completely automatically from your query structure. Mainly:

  • The SELECT clause defines in what order XML or JSON data is nested.
  • The FROM clause defines the table names (via aliasing), which are translated to XML element or JSON object attribute names.
  • The ORDER BY clause produces the “grouping”, which is translated to nesting XML elements or JSON objects.
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) AS e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) AS t
  GROUP BY
    first_name,
    last_name
) AS t

Notice how this emulation requires two steps of XMLAGG with GROUP BY. It gets more hairy with more tables being joined and projected! I won’t add more complex examples here, but try it online!

This produces

<a first_name="NICK" last_name="WAHLBERG">
    <f title="SMILE EARRING"/>
    <f title="WARDROBE PHANTOM"/>
</a>
<a first_name="PENELOPE" last_name="GUINESS">
    <f title="ACADEMY DINOSAUR"/>
    <f title="ANACONDA CONFESSIONS"/>
</a>

Let’s try the same thing again with JSON:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO;

-- Standard SQL
SELECT json_arrayagg(e)
FROM (
  SELECT JSON_OBJECT(
    KEY 'FIRST_NAME' VALUE first_name,
    KEY 'LAST_NAME' VALUE last_name,
    KEY 'F' VALUE JSON_ARRAYAGG(JSON_OBJECT(
      KEY 'TITLE' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

The result being:

[
    {
        "first_name": "NICK",
        "last_name": "WAHLBERG",
        "f": [
            {
                "title": "SMILE EARRING"
            },
            {
                "title": "WARDROBE PHANTOM"
            }
        ]
    },
    {
        "first_name": "PENELOPE",
        "last_name": "GUINESS",
        "f": [
            {
                "title": "ACADEMY DINOSAUR"
            },
            {
                "title": "ANACONDA CONFESSIONS"
            }
        ]
    }
]

FOR XML/JSON AUTO, ROOT

Like before, we could wrap this in a root XML element or a root JSON object if need be.

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ROOT;

-- Standard SQL
SELECT xmlelement(
  NAME join,
  xmlagg(e)
)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

This does the same thing as before, but just wraps the previous root XMLAGG() element in another XMLELEMENT() function call.

This produces

<root>
    <a first_name="NICK" last_name="WAHLBERG">
        <f title="SMILE EARRING"/>
        <f title="WARDROBE PHANTOM"/>
    </a>
    <a first_name="PENELOPE" last_name="GUINESS">
        <f title="ACADEMY DINOSAUR"/>
        <f title="ANACONDA CONFESSIONS"/>
    </a>
</root>

Let’s try the same thing again with JSON:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO, ROOT;

-- Standard SQL
SELECT JSON_OBJECT(KEY 'a' VALUE json_arrayagg(e))
FROM (
  SELECT JSON_OBJECT(
    KEY 'FIRST_NAME' VALUE first_name,
    KEY 'LAST_NAME' VALUE last_name,
    KEY 'F' VALUE JSON_ARRAY_AGG(JSON_OBJECT(
      KEY 'TITLE' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

The result being:

{
    "a": [
        {
            "first_name": "NICK",
            "last_name": "WAHLBERG",
            "f": [
                {
                    "title": "SMILE EARRING"
                },
                {
                    "title": "WARDROBE PHANTOM"
                }
            ]
        },
        {
            "first_name": "PENELOPE",
            "last_name": "GUINESS",
            "f": [
                {
                    "title": "ACADEMY DINOSAUR"
                },
                {
                    "title": "ANACONDA CONFESSIONS"
                }
            ]
        }
    ]
}

FOR XML AUTO, ELEMENTS

Like before, instead of producing attributes, we might decide to produce elements instead (in XML only):

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ELEMENTS;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlelement(
      NAME first_name,
      first_name
    ),
    xmlelement(
      NAME last_name,
      last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlelement(
        NAME title,
        title
      )
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

Not much has changed, except for the fact that a set of XMLELEMENT() calls are made, rather than XMLATTRIBUTES() callse.

This produces

<a>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <f>
        <title>SMILE EARRING</title>
    </f>
    <f>
        <title>WARDROBE PHANTOM</title>
    </f>
</a>
<a>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <f>
        <title>ACADEMY DINOSAUR</title>
    </f>
    <f>
        <title>ANACONDA CONFESSIONS</title>
    </f>
</a>

FOR XML/JSON PATH

The PATH strategy is my personal favourite. It is used to create nested XML or JSON path structures more explicitly, and also allows for additional nesting levels when grouping projections together. This is best shown by example. Notice, how I’m now using aliases for my columns, and the alias looks like an XPath expression using '/' (slashes):

-- SQL Server
SELECT 
  a.first_name AS [author/first_name], 
  a.last_name AS [author/last_name], 
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML PATH;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(
    NAME author,
    xmlelement(
      NAME first_name,
      "author/first_name"
    ),
    xmlelement(
      NAME last_name,
      "author/last_name"
    )
  ),
  xmlelement(
    NAME title,
    title
  )
))
FROM (
  SELECT 
    a.first_name AS "author/first_name", 
    a.last_name AS "author/last_name", 
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t

Check out how by convention, we’re now getting an additional level of nesting for author related columns under the row/author element:

<row>
    <author>
        <first_name>NICK</first_name>
        <last_name>WAHLBERG</last_name>
    </author>
    <title>SMILE EARRING</title>
</row>
<row>
    <author>
        <first_name>NICK</first_name>
        <last_name>WAHLBERG</last_name>
    </author>
    <title>WARDROBE PHANTOM</title>
</row>
<row>
    <author>
        <first_name>PENELOPE</first_name>
        <last_name>GUINESS</last_name>
    </author>
    <title>ACADEMY DINOSAUR</title>
</row>
<row>
    <author>
        <first_name>PENELOPE</first_name>
        <last_name>GUINESS</last_name>
    </author>
    <title>ANACONDA CONFESSIONS</title>
</row>

This is really neat! The SQL Server syntax is definitely much more convenient for this common use-case.

Let’s try the same thing again with JSON. The only thing we change is we now use a JSON-path-ish syntax using dots ('.') rather than slashes ('/'):

-- SQL Server
SELECT 
  a.first_name AS [author.first_name], 
  a.last_name AS [author.last_name], 
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON PATH;

-- Standard SQL
SELECT JSON_ARRAYAGG(JSON_OBJECT(
  KEY 'author' VALUE JSON_OBJECT(
    KEY 'first_name' VALUE author.first_name,
    KEY 'last_name' VALUE author.last_name
  ),
  KEY 'TITLE' VALUE title
  ABSENT ON NULL
))
FROM (
  SELECT 
    a.first_name AS "author.first_name", 
    a.last_name AS "author.last_name", 
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t

The result being (again, with nested objects):

[
    {
        "author": {
            "first_name": "NICK",
            "last_name": "WAHLBERG"
        },
        "title": "SMILE EARRING"
    },
    {
        "author": {
            "first_name": "NICK",
            "last_name": "WAHLBERG"
        },
        "title": "WARDROBE PHANTOM"
    },
    {
        "author": {
            "first_name": "PENELOPE",
            "last_name": "GUINESS"
        },
        "title": "ACADEMY DINOSAUR"
    },
    {
        "author": {
            "first_name": "PENELOPE",
            "last_name": "GUINESS"
        },
        "title": "ANACONDA CONFESSIONS"
    }
]

For more sophisticated nesting, including nesting of collections, a correlated subquery is needed in SQL Server, also with a FOR XML or FOR JSON syntax.

Conclusion

XML and JSON are popular document formats outside and inside of the database. SQL Server has some of the most conventient syntax for most cases, while standard SQL supports much more basic, and thus more powerful constructs. In standard SQL, almost any kind of XML or JSON projection is possible, and with XMLTABLE() and JSON_TABLE(), the documents can be transformed back to SQL tables, as well. In many applications, using these XML or JSON features natively would lead to much less boilerplate code, as many applications do not need middleware between the database and some client, just to transform data between formats.

Most ORMs don’t expose this functionality for a variety of reasons, the main one being that the devil is in the details. While both XML and JSON are nicely standardised, the implementations differ greatly:

  • The SQL/XML standard is implemented mostly by DB2, Oracle, and PostgreSQL. Many dialects offer some XML capabilities, but not as impressive as the standard and the previous three. SQL Server has FOR XML which is very powerful for standard XML serialisations, but may be a bit difficult to use for edge cases
  • The SQL/JSON standard was added late and is implemented again to large extents by DB2 and Oracle, but inceasingly also by MariaDB and MySQL. PostgreSQL (and by consequence, compatible dialects, like CockroachDB) had their own proprietary functions and APIs, which are not compatible with the standard. And again, SQL Server has FOR JSON which works well for standard serialisations, but a bit less well for edge cases

These technologies are poorly adopted in clients because of the many subtle differences. jOOQ has been leveling out these minor differences for many years without hiding the core functionality. SQL/XML and SQL/JSON are perfect use-cases for jOOQ 3.14 (due in Q2 2020), which now allows for using both the standard SQL/XML and SQL/JSON syntaxes as well as the SQL Server FOR XML and FOR JSON syntax in the jOOQ Professional and Enterprise Editions.

Before jOOQ 3.14 is out, you can already play with the current functionality on our website: https://www.jooq.org/translate

Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead

It’s been a while since I’ve ranted on this blog, but I was recently challenged by a reddit thread to write about this topic, so here goes…

So, you’re writing a service that produces some JSON from your database model. What do you need? Let’s see:

  • Read a book on DDD
  • Read another book on DDD
  • Write some entities, DTOs, factories, and factory builders
  • Discuss whether your entities, DTOs, factories, and factory builders should be immutable, and use Lombok, Autovalue, or Immutables to ease the pain of construction of said objects
  • Discuss whether you want to use standard JPA, or Hibernate specific features for your mapping
  • Plug in Jackson, the XML and JSON mapper library, because you’ve read a nice blog post about it
  • Debug 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations. Minor thing
  • Debug 1-2 N+1 cases

STOP IT

No, seriously. Just stop it right there!

What you needed was this kind of JSON structure, exported form your favourite Sakila database:

[{
  "first_name": "PENELOPE",
  "last_name": "GUINESS",
  "categories": [{
    "name": "Animation",
    "films": [{
      "title": "ANACONDA CONFESSIONS"
    }]
   }, {
    "name": "Family",
    "films": [{
      "title": "KING EVOLUTION"
    }, {
      "title": "SPLASH GUMP"
    }]
  }]
}, {
   ...

In English: We need a list of actors, and the film categories they played in, and grouped in each category, the individual films they played in.

Let me show you how easy this is with SQL Server SQL (all other database dialects can do it these days, I just happen to have a SQL Server example ready:

-- 1) Produce actors
SELECT
  a.first_name,
  a.last_name, (

    -- 2) Nest categories in each actor
    SELECT
      c.name, (

        -- 3) Nest films in each category
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR JSON PATH -- 4) Turn into JSON
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR JSON PATH -- 4) Turn into JSON
  ) AS categories
FROM
  actor AS a 
FOR JSON PATH, ROOT ('actors') -- 4) Turn into JSON

That’s it. That’s all there is to it. Only basic SQL-92, enhanced with some vendor-specific JSON export syntax. (There are also SQL standard JSON APIs as implemented in other RDBMS). Let’s discuss it quickly:

  1. The outer most query produces a set of actors. As you would have expected
  2. For each actor, a correlated subquery produces a nested JSON array of categories
  3. For each category, another correlated subquery finds all the films per actor and category
  4. Finally, turn all the result structures into JSON

That’s it.

Want to change the result structure? Super easy. Just modify the query accordingly. No need to modify:

  • Whatever you thought your DDD “root aggregate was”
  • Your gazillion entities, DTOs, factories, and factory builders
  • Your gazillion Lombok, Autovalue, or Immutables annotations
  • Your hacks and workarounds to get this stuff through your standard JPA, or Hibernate specific features for your mapping
  • Your gazilion Jackson, the XML and JSON mapper library annotations
  • Debugging another 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations
  • Debugging another 1-2 N+1 cases

No! No need! It’s so simple. Just stream the JSON directly from the database to the client using whatever SQL API of your preference: JDBC, jOOQ, JdbcTemplate, MyBatis, or even JPA native query. Just don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware. Let me repeat that for emphasis:

Don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware.

Oh, want to switch to XML? Easy. In SQL Server, this amounts to almost nothing but replacing JSON by XML:

SELECT
  a.first_name,
  a.last_name, (
    SELECT
      c.name, (
	    SELECT title
	    FROM film AS f
	    JOIN film_category AS fc ON f.film_id = fc.film_id
	    JOIN film_actor AS fa ON fc.film_id = fa.film_id
	    WHERE fc.category_id = c.category_id
	    AND a.actor_id = fa.actor_id
	    FOR XML PATH ('film'), TYPE
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR XML PATH ('category'), TYPE
  ) AS categories
FROM
  actor AS a 
FOR XML PATH ('actor'), ROOT ('actors')

And now, you’re getting:

<actors>
  <actor>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <categories>
      <category>
        <name>Animation</name>
        <films>
          <film>
            <title>ANACONDA CONFESSIONS</title>
          </film>
        </films>
      </category>
      <category>
        <name>Family</name>
        <films>
          <film>
            <title>KING EVOLUTION</title>
          </film>
          <film>
            <title>SPLASH GUMP</title>
          </film>
        </films>
      </category>
      ...

It’s so easy with SQL!

Want to support both without rewriting too much logic? Produce XML and use XSLT to automatically generate the JSON. Whatever.

FAQ, Q&A

But my favourite Java SQL API can’t handle it

So what. Write a view and query that instead. By the way, jOOQ 3.14 supports all of it.

But this doesn’t fit our architecture

Then fix the architecture

But SQL is bad

No, it’s great. It’s based on relational algebra and augments it in many many useful ways. It’s a declarative 4GL, the optimiser produces way better execution plans than you could ever imagine (see my talk), and it’s way more fun than your gazillion 3GL mapping libraries.

But SQL is evil because of Oracle

Then use PostgreSQL. It can do JSON.

But what about testing

Just spin up a test database with https://www.testcontainers.org, install your schema with some migration framework like Flyway or Liquibase in it, fill in some sample data, and write your simple integration tests.

But mocking is better

It is not. The more you mock away the database, the more you’re writing your own database.

But I’m paid by the lines of code

Well, good riddance, then.

But what if we have to change the RDBMS

So what? Your management paid tens of millions for the new licensing. They can pay you tens of hundreds to spend 20 minutes rewriting your 5-10 SQL queries. You already wrote the integration tests above.

Anyway. It won’t happen. And if it will, then those few JSON queries will not be your biggest problem.

What was that talk of yours again?

Here, highly recommended:

But we’ve already spent so many person years implementing our middleware

It has a name

But I’ve read this other blog post…

And now you’ve read mine.

But that’s like 90s style 2 tier architecture

So what? You’ve spent 5% the time to implement it. That’s 95% more time adding value to your customers, rather than bikeshedding mapping technology. I call that a feature.

What about ingestion? We need abstraction over ingestion

No, you don’t. You can send the JSON directly into your database, and transform / normalise it from there, using the same technique. You don’t need middleware abstraction and mapping, you just want middleware abstraction and mapping.

How to Implement Your Own XJC Plugin to Generate toString(), equals(), and hashCode() Methods

When you work with JAXB to generate Java code from XSD (or *ghasp* WSDL) files, you’re going to use the XJC tool, which is shipped with the JDK (at least until JDK 9 – with Jigsaw, it will soon be externalised into its own external dependency).

Adding plugins to XJC when running it via Maven is fairly straightforward. There are a few useful plugins available for free from here: https://github.com/highsource/jaxb2-basics and from other sources. But if you’re not entirely happy with the results, you might need to roll your own.

In this article, we’ll look into how we can write a simple plugin to generate custom renditions of these methods:

  • toString()
  • equals()
  • hashCode()

Set up a project

First, we need to set up a new Maven project, which contains the plugin code. This is quite straightforward. Just work with a single dependency and you’re done:

<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>org.jooq</groupId>
    <artifactId>jooq-tools-xjc-plugin</artifactId>
    <version>3.11.0-SNAPSHOT</version>
    <name>jOOQ XJC Code Generation Plugin</name>
    
    <dependencies>      
        <dependency>
            <groupId>com.sun.xml.bind</groupId>
            <artifactId>jaxb-xjc</artifactId>
            <version>2.3.0</version>
        </dependency>
    </dependencies>
</project>

Now, add the plugin logic

An empty plugin essentially looks like this:

package org.jooq.xjc;

import org.xml.sax.ErrorHandler;

import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.ClassOutline;
import com.sun.tools.xjc.outline.Outline;

/**
 * @author Lukas Eder
 */
public class XJCPlugin extends Plugin {

    @Override
    public String getOptionName() {
        return "Xjooq-equals-hashcode-tostring";
    }

    @Override
    public int parseArgument(Options opt, String[] args, int i) {
        return 1;
    }

    @Override
    public String getUsage() {
        return "  -Xjooq-equals-hashcode-tostring    :  xjc plugin";
    }

    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler) {
        return true;
    }
}

The important parts in our case are the getOptionName() method, which provides a flag that can be used from XJC code generation configuration, to activate our plugin, and the run() method, which will contain our code, adding the three desired methods.

Let’s fill in some actual code!

package org.jooq.xjc;

import static com.sun.codemodel.JMod.FINAL;
import static com.sun.codemodel.JMod.PUBLIC;
import static com.sun.codemodel.JMod.STATIC;

import java.util.Map.Entry;

import org.xml.sax.ErrorHandler;

import com.sun.codemodel.JBlock;
import com.sun.codemodel.JClass;
import com.sun.codemodel.JCodeModel;
import com.sun.codemodel.JConditional;
import com.sun.codemodel.JExpr;
import com.sun.codemodel.JFieldVar;
import com.sun.codemodel.JMethod;
import com.sun.codemodel.JOp;
import com.sun.codemodel.JVar;
import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.ClassOutline;
import com.sun.tools.xjc.outline.Outline;

/**
 * @author Lukas Eder
 */
public class XJCPlugin extends Plugin {

    @Override
    public String getOptionName() {
        return "Xjooq-equals-hashcode-tostring";
    }

    @Override
    public int parseArgument(Options opt, String[] args, int i) {
        return 1;
    }

    @Override
    public String getUsage() {
        return "  -Xjooq-equals-hashcode-tostring    :  xjc example plugin";
    }

    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler) {
        JCodeModel m = new JCodeModel();

        for (ClassOutline o : model.getClasses()) {

            // toString()
            // ---------------------------------------------------------------------------
            {
                JMethod method = o.implClass.method(PUBLIC, String.class, "toString");
                method.annotate(Override.class);
                JBlock body = method.body();
                JClass sbType = m.ref(StringBuilder.class);
                JVar sb = body.decl(0, sbType, "sb", JExpr._new(sbType));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        body.invoke(sb, "append").arg("<" + e.getKey() + ">");
                        body.invoke(sb, "append").arg(v);
                        body.invoke(sb, "append").arg("</" + e.getKey() + ">");
                    }
                }

                body._return(JExpr.invoke(sb, "toString"));
            }

            // equals()
            // ---------------------------------------------------------------------------
            {
                JMethod method = o.implClass.method(PUBLIC, boolean.class, "equals");
                method.annotate(Override.class);
                JVar that = method.param(Object.class, "that");
                JBlock body = method.body();
                body._if(JExpr._this().eq(that))
                    ._then()._return(JExpr.lit(true));
                body._if(that.eq(JExpr._null()))
                    ._then()._return(JExpr.lit(false));
                body._if(JExpr.invoke("getClass").ne(JExpr.invoke(that, "getClass")))
                    ._then()._return(JExpr.lit(false));

                JVar other = body.decl(0, o.implClass, "other", JExpr.cast(o.implClass, that));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        if (v.type().isPrimitive()) {
                            body._if(v.ne(other.ref(v)))
                                ._then()._return(JExpr.lit(false));
                        }
                        else {
                            JConditional i = body._if(v.eq(JExpr._null()));
                            i._then()._if(other.ref(v).ne(JExpr._null()))
                                     ._then()._return(JExpr.lit(false));
                            i._elseif(v.invoke("equals").arg(other.ref(v)).not())
                             ._then()._return(JExpr.lit(false));
                        }
                    }
                }

                body._return(JExpr.lit(true));
            }

            // hashCode()
            {
                JMethod method = o.implClass.method(PUBLIC, int.class, "hashCode");
                method.annotate(Override.class);
                JBlock body = method.body();
                JVar prime = body.decl(FINAL, m.INT, "prime", JExpr.lit(31));
                JVar result = body.decl(0, m.INT, "result", JExpr.lit(1));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        body.assign(result, prime.mul(result).plus(
                            v.type().isPrimitive()
                          ? v
                          : JOp.cond(v.eq(JExpr._null()), JExpr.lit(0), v.invoke("hashCode"))
                        ));
                    }
                }

                body._return(result);
            }
        }

        return true;
    }
}

The above logic generates an XML document fragment of the JAXB-annotated classes (without formatting) and an equals() and hashCode() implementation that is inspired by the generated code you would obtain from an IDE like Eclipse.

Some example output:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "MappedTable", propOrder = {

})
public class MappedTable {

    protected String input;
    @XmlElement(type = String.class)
    protected String inputExpression;
    @XmlElement(required = true)
    protected String output;

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("<input>");
        sb.append(input);
        sb.append("</input>");
        sb.append("<inputExpression>");
        sb.append(inputExpression);
        sb.append("</inputExpression>");
        sb.append("<output>");
        sb.append(output);
        sb.append("</output>");
        return sb.toString();
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) {
            return true;
        }
        if (that == null) {
            return false;
        }
        if (getClass()!= that.getClass()) {
            return false;
        }
        MappedTable other = ((MappedTable) that);
        if (input == null) {
            if (other.input!= null) {
                return false;
            }
        } else {
            if (!input.equals(other.input)) {
                return false;
            }
        }
        if (inputExpression == null) {
            if (other.inputExpression!= null) {
                return false;
            }
        } else {
            if (!inputExpression.equals(other.inputExpression)) {
                return false;
            }
        }
        if (output == null) {
            if (other.output!= null) {
                return false;
            }
        } else {
            if (!output.equals(other.output)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = ((prime*result)+((input == null)? 0 :input.hashCode()));
        result = ((prime*result)+((inputExpression == null)? 0 :inputExpression.hashCode()));
        result = ((prime*result)+((output == null)? 0 :output.hashCode()));
        return result;
    }

}

Don’t forget to register your plugin

The last step prior to building your plugin is to create a file in your project at:

src/main/resources/META-INF/services/com.sun.tools.xjc.Plugin

And put the qualified name of your plugin in it:

org.jooq.xjc.XJCPlugin

Done. Now install your plugin…

mvn clean install

… and use it from your code generation configuration as follows:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb2-plugin</artifactId>
    <version>0.13.1</version>
    <executions>
        <execution>
            <id>codegen</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>

                <!-- The usual configuration -->
                <encoding>UTF-8</encoding>
                <locale>us</locale>
                <forceRegenerate>true</forceRegenerate>
                <extension>true</extension>
                <strict>false</strict>
                <schemaDirectory>../jOOQ-meta/src/main/resources/xsd</schemaDirectory>
                <bindingDirectory>../jOOQ-meta/src/main/resources/xjb/codegen</bindingDirectory>
                <generateDirectory>../jOOQ-meta/src/main/java</generateDirectory>
                <generatePackage>org.jooq.util.jaxb</generatePackage>
                <schemaIncludes>
                    <include>jooq-codegen-3.11.0.xsd</include>
                </schemaIncludes>

                <!-- activate it with this line. Must match getOptionName() -->
                <args>
                    <arg>-Xjooq-equals-hashcode-tostring</arg>
                </args>
                <plugins>

                    <!-- include it with these lines. -->
                    <plugin>
                        <groupId>org.jooq.trial</groupId>
                        <artifactId>jooq-tools-xjc-plugin</artifactId>
                        <version>3.11.0-SNAPSHOT</version>
                    </plugin>
                </plugins>
            </configuration>
        </execution>
    </executions>
</plugin>

Done!

When to Choose SQL and When to Choose NoSQL

Some people make architecture decisions purely based on the loudest consultant:

For most others, however, decisions are not as simple as this. For instance: When should we start evaluating NoSQL storage systems as an alternative to RDBMS?

RDBMS as the hammer for all nails

This question obviously assumes that you’re starting out with an RDBMS, which is classically the database system that solves pretty much any problem decently enough not to be replaced easily. What does this mean? Simply put:

  • RDBMS have been around forever, so they have a huge advantage compared to “newcomers” in the market, who don’t have all the excellent tooling, community, support, maturity yet
  • E.F. Codd’s work may have been the single biggest influence on our whole industry. There has hardly been anything as revolutionary as the relational model ever since. It’s hard for an alternative database to be equally universal, i.e. they’re mostly solving niche problems

Having said so, sometimes you do have a niche problem. For instance a graph database problem. In fact, a graph is nothing fundamentally different from what you can represent in the relational model. It is easy to model a graph with a many-to-many relationship table.

The same is true for XML/JSON in the database (don’t forget, JSON is just XML with less syntax and less features, so it’s less awesome than XML). Sometimes you need to actually store document structures (hierarchical data) in their hierarchical form in your database, instead of normalising them. Sure, you could normalise a document, but that’s a lot of work and not necessarily beneficial. This was actually the question on Twitter by Gareth, that triggered the subsequent discussion:

Most modern RDBMS offer means to store (and much more importantly, query) XML/JSON data structures in simple forms. This includes PostgreSQL, Oracle, DB2, SQL Server and others.

So, when do we decide to switch?

As developers, we may be inclined to switch quite quickly. E.g. when working with graphs, we’d love to use Neo4j which implements the awesome Cypher query language. When working with JSON, we’d love to use something like Couchbase which implements the interesting N1QL query language. Both of these query languages are heavily inspired by SQL, which were wise vendor choices in my opinion (as opposed to MongoDB’s weird JSON-based query language), because in the end, the SQL language is still the most powerful and popular 4GL that was ever created.

But as developers, we shouldn’t make that decision lightly. Yes, at first, these specialised databases seem like a better fit. But the additional cost for operations teams to maintain, monitor, patch, tune production systems must not be underestimated. This is even true within the RDBMS ecosystem. A prominent recent example was Uber’s switch (back!) from PostgreSQL to MySQL:

Why Uber Engineering Switched from Postgres to MySQL

Do note that they switched the other way round before, only to regret things. Truth is, there are tons of reasons why your operations teams prefer to always use the same database, even if that’s quite expensive in terms of licensing. In many cases, however, it would be even more expensive to:

  • Engage in additional license and/or support contracts with new database vendors
  • Find skilled DBA for the new technology (can be very hard with niche databases)
  • Maintain two data silos and possibly sustain the cost of keeping them in sync

Ultimately, there’s a threshold:

In terms of using JSON in the database, it’s simple:

  • Occasional JSON storage: Stick with RDBMS.
  • Everything is JSON: Perhaps not RDBMS.

The same is true for graph problems. SQL can totally handle graphs and traverse them with recursive SQL. Here’s a funky statement that recursively calculates the subset sum problem:

subset-sum

Full article here: How to Find the Closest Subset Sum with SQL

If you only have a few tree/graph traversals to calculate (e.g. for a simple menu structure), don’t jump ship with RDBMS yet. If graphs are your business and that’s all you do, then probably RDBMS aren’t the right choice.

Conclusion

Whatever problem you’re solving. Do remember. Yes, all you have is a hammer, every problem starts looking like a nail. But don’t dismiss RDBMS as a silly hammer. It is a very mighty hammer and in 2016, it can do a lot of non-relational niche things decently well.

RDBMS is still the best default choice for all sorts of data problems. Only once you have gone above a certain threshold (or if you can foresee doing that), then you should look for alternatives. Because those alternatives will give you a much rougher time when you go outside of that niche (JSON, graphs, etc.) back to your “ordinary” relational business.

Please, Java. Do Finally Support Multiline String Literals

I understand the idea of Java-the-language being rather hard to maintain in a backwards-compatible way. I understand the idea of JDK API, such as the collections, to be rather tough not to break. Yes.

I don’t understand why Java still doesn’t have multiline string literals.

How often do you write JDBC code (or whatever other external language or markup, say, JSON or XML you want to embed in Java) like this?

try (PreparedStatement s = connection.prepareStatement(
    "SELECT * "
  + "FROM my_table "
  + "WHERE a = b "
)) {
    ...
}

What’s the issue?

  • Syntax correctness, i.e. don’t forget to add a whitespace at the end of each line
  • Style in host language vs style in external language, sure the above code looks “nicely” formatted in Java, but it’s not formatted for the consuming server side
  • SQL injection, didn’t we teach our juniors not to perform this kind of string concatenation in SQL, to prevent SQL injection? Sure, the above is still safe, but what keeps a less experienced maintainer from embedding, accidentally, user input?

Today, I was working with some code written in Xtend, a very interesting language that compiles into Java source code. Xtend is exceptionally useful for templating (e.g. for generating jOOQ’s Record1Record22 API). I noticed another very nice feature of multi line strings:

The lack of need for escaping!

Multi line strings in Xtend are terminated by triple-apostrophes. E.g.

// Xtend
val regex = '''import java\.lang\.AutoCloseable;'''

Yes, the above is a valid Java regular expression. I’m escaping the dots when matching imports of the AutoCloseable type. I don’t have to do this tedious double-escaping that I have to do in ordinary strings to tell the Java compiler that the backslash is really a backslash, not Java escaping of the following character:

// Java
String regex = "import java\\.lang\\.AutoCloseable;";

So… Translated to our original SQL example, I would really like to write this, instead:

try (PreparedStatement s = connection.prepareStatement(
    '''SELECT *
       FROM my_table
       WHERE a = b'''
)) {
    ...
}

With a big nice-to-have plus: String interpolation (even PHP has it)!

String tableName = "my_table";
int b = 1;
try (PreparedStatement s = connection.prepareStatement(
    '''SELECT *
       FROM ${tableName}
       WHERE a = ${b}'''
)) {
    ...
}

Small but very effective improvement

This would be a very small (in terms of language complexity budget: Just one new token) but very effective improvement for all of us out there who are embedding an external language (SQL, XML, XPath, Regex, you name it) in Java. We do that a lot. And we hate it.

It doesn’t have to be as powerful as Xtend’s multiline string literals (which really rock with their whitespace management for formatting, and templating expressions). But it would be a start.

Please, make this a New Year’s resolution! :)

Java 8 Friday Goodies: Lambdas and XML

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. We have blogged a couple of times about some nice Java 8 goodies, and now we feel it’s time to start a new blog series, the…

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub. tweet this

Java 8 Goodie: Lambdas and XML

There isn’t too much that Java 8 can do to the existing SAX and DOM APIs. The SAX ContentHandler has too many abstract methods to qualify as a @FunctionalInterface, and DOM is a huge, verbose API specified by w3c, with little chance of adding new extension methods.

Luckily there is a small Open Source library called jOOX that allows for processing the w3c standard DOM API through a wrapper API that mimicks the popular jQuery library. jQuery leverages JavaScript’s language features by allowing users to pass functions to the API for DOM traversal. The same is the case with jOOX. Let’s have a closer look:

Assume that we’re using the following pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.jooq</groupId>
  <artifactId>java8-goodies</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.jooq</groupId>
      <artifactId>joox</artifactId>
      <version>1.2.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <fork>true</fork>
          <maxmem>512m</maxmem>
          <meminitial>256m</meminitial>
          <encoding>UTF-8</encoding>
          <source>1.8</source>
          <target>1.8</target>
          <debug>true</debug>
          <debuglevel>lines,vars,source</debuglevel>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Let’s assume we wanted to know all the involved artifacts in Maven’s groupId:artifactId:version notation. Here’s how we can do that with jOOX and lambda expressions:

$(new File("./pom.xml")).find("groupId")
                        .each(ctx -> {
    System.out.println(
        $(ctx).text() + ":" +
        $(ctx).siblings("artifactId").text() + ":" +
        $(ctx).siblings("version").text()
    );
});

Executing the above yields:

org.jooq:java8-goodies:1.0-SNAPSHOT
org.jooq:joox:1.2.0
org.apache.maven.plugins:maven-compiler-plugin:2.3.2

Let’s assume we only wanted to display those artifacts that don’t have SNAPSHOT in their version numbers. Simply add a filter:

$(new File("./pom.xml"))
    .find("groupId")
    .filter(ctx -> $(ctx).siblings("version")
                         .matchText(".*-SNAPSHOT")
                         .isEmpty())
    .each(ctx -> {
        System.out.println(
        $(ctx).text() + ":" +
        $(ctx).siblings("artifactId").text() + ":" +
        $(ctx).siblings("version").text());
    });

This will now yield

org.jooq:joox:1.2.0
org.apache.maven.plugins:maven-compiler-plugin:2.3.2

We can also transform the XML content. For instance, if the target document doesn’t need to be a POM, we could replace the matched groupId elements by an artifical artifact element that contains the artifact name in Maven notation. Here’s how to do this:

$(new File("./pom.xml"))
    .find("groupId")
    .filter(ctx -> $(ctx).siblings("version")
                         .matchText(".*-SNAPSHOT")
                         .isEmpty())
    .content(ctx ->
        $(ctx).text() + ":" +
        $(ctx).siblings("artifactId").text() + ":" +
        $(ctx).siblings("version").text()
    )
    .rename("artifact")
    .each(ctx -> System.out.println(ctx));

The above puts new content in place of the previous one through .content(), and then renames the groupId elements to artifact, before printing out the element. The result is:

<artifact>org.jooq:joox:1.2.0</artifact>
<artifact>org.apache.maven.plugins:maven-compiler-plugin:2.3.2</artifact>

More goodies next week

What becomes immediately obvious is the fact that the lambda expert group’s choice of making all SAMs (Single Abstract Method interfaces) eligible for use with lambda expressions adds great value to pre-existing APIs. Quite a clever move.

But there are also new APIs. Last week, we have discussed how the existing JDK 1.2 File API can be improved through the use of lambdas. Some of our readers have expressed their concerns that the java.io API has been largely replaced by java.nio (nio as in New I/O). Next week, we’ll have a look at Java 8’s java.nnio API (for new-new I/O ;-) ) and how it relates to the Java 8 Streams API.

More on Java 8

In the mean time, have a look at Eugen Paraschiv’s awesome Java 8 resources page

How to Speed Up Apache Xalan’s XPath Processor by Factor 10x

There has been a bit of an awkward bug in Apache Xalan for a while now, and that bug is XALANJ-2540. The effect of this bug is that an internal SPI configuration file is loaded by Xalan thousands of times per XPath expression evaluation, which can be measured easily as such:

this:

Element e = (Element)
  document.getElementsByTagName("SomeElementName")
          .item(0);
String result = ((Element) e).getTextContent();

Seems to be an incredible 100x faster than this:

// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();

// Negligible
XPath xpath = factory.newXPath();

// Negligible
XPathExpression expression =
  xpath.compile("//SomeElementName");

// Accounts for 70%
String result = (String) expression
  .evaluate(document, XPathConstants.STRING);

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to lookup the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on the ObjectFactory.class itself. When the access fails (by default), then the configuration is loaded from the xalan.jar file’s configuration file:

META-INF/service/org.apache.xml.dtm.DTMManager

Every time!:

A profiling session on Xalan

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=
  org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
  com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works, as this will allow to bypass the expensive work in lookUpFactoryClassName() if the factory class name is the default anyway:

// Code from c.s.o.a.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(
       String factoryId,
       String propertiesFilename,
       String fallbackClassName) {
  SecuritySupport ss = SecuritySupport
    .getInstance();

  try {
    String systemProp = ss
      .getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later

References

The above text is an extract from a Stack Overflow question and answer that I’ve contributed to the public a while ago. I’m posting it again, here on my blog, such that the community’s awareness for this rather heavy bug can be raised. Feel free to upvote on this ticket here, as every Sun/Oracle JDK on this planet is affected:

https://issues.apache.org/jira/browse/XALANJ-2540

Contributing a fix to Apache would be even better, of course…

A jOOX First-Time Experience Article

Here’s some nice first-time user experience about jOOX, my lesser-known product:
http://www.kubrynski.com/2013/03/as-developer-i-want-to-use-xml.html

As a reminder, here’s what jOOX is all about:

jOOX stands for Java Object Oriented XML. It is a simple wrapper for the org.w3c.dom package, to allow for fluent XML document creation and manipulation where DOM is required but too verbose. jOOX only wraps the underlying document and can be used to enhance DOM, not as an alternative.

Unlike other, similar tools that mimick jQuery (e.g. jsoup, jerry, gwtquery), jOOX really aims to leverage standard w3c DOM usage, which isn’t such a bad thing after all, with its performant, standard Xerces implementation.

Some simple example code:

// Find the order at index 4 and 
// add an element "paid"
$(document).find("orders")
           .children().eq(4)
           .append("<paid>true</paid>");

// Find those orders that are paid 
// and flag them as "settled"
$(document).find("orders")
           .children().find("paid")
           .after("<settled>true</settled>");

jOOX and XSLT. An XML love story, continued

jOOX - a jQuery port to Java The somewhat functional way of thinking involved with jOOX’s XML manipulation cries for an additional API enhancement simply supporting XSLT. XSL transformation has become quite a standard way of transforming large amounts of XML into other structures, where normal DOM manipulation (or jOOX manipulation) becomes too tedious. Let’s have a look at how things are done in standard Java

Example input:

<books>
  <book id="1"/>
  <book id="2"/>
</books>

Example XSL:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Match all books and increment their IDs -->
    <xsl:template match="book">
        <book id="{@id + 1}">
            <xsl:apply-templates/>
        </book>
    </xsl:template>

    <!-- Identity-transform all the other elements and attributes -->
    <xsl:template match="@*|*">
        <xsl:copy>
            <xsl:apply-templates select="*|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Verboseness of XSL transformation in Java

The standard way of doing XSL transformation in Java is pretty verbose – as just about anything XML-related in standard Java. See an example of how to apply the above transformation:

Source source = new StreamSource(new File("increment.xsl"));
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(source);
DOMResult result = new DOMResult();
transformer.transform(new DOMSource(document), result);

Node output = result.getNode();

Drastically decrease verbosity with jOOX

With jOOX, you can write exactly the same in much less code:

Apply transformation:
// Applies transformation to the document element:
$(document).transform("increment.xsl");

// Applies transformation to every book element:
$(document).find("book").transform("increment.xsl");

The result in both cases is:

<books>
  <book id="2"/>
  <book id="3"/>
</books>

CSS selectors in Java

CSS selectors are a nice and intuitive alternative to XPath for DOM navigation. While XPath is more complete and has more functionality, CSS selectors were tailored for HTML DOM, where the document content is usually less structured than in XML.

Here are some examples of CSS selector and equivalent XPath expressions:

CSS:   document > library > books > book
XPath: //document/library/books/book

CSS:   document book
XPath: //document//book

CSS:   document book#id3
XPath: //document//book[@id='3']

CSS:   document book[title='CSS for dummies']
XPath: //document//book[@title='CSS for dummies']

 

This becomes more interesting when implementing pseudo-selectors in XPath:

CSS:   book:first-child
XPath: //book[not(preceding-sibling::*)]

CSS:   book:empty
XPath: //book[not(*|@*|node())]

 

A very nice library that allows for parsing selector expressions according to the w3c specification is this “css-selectors” by Christer Sandberg:

https://github.com/chrsan/css-selectors

The next version of jOOX will include css-selector’s parser for simpler DOM navigation. The following two expressions will hold the same result:

Match match1 = $(document).find("book:empty");
Match match2 = $(document).xpath("//book[not(*|@*|node())]");