Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead

It’s been a while since I’ve ranted on this blog, but I was recently challenged by a reddit thread to write about this topic, so here goes…

So, you’re writing a service that produces some JSON from your database model. What do you need? Let’s see:

  • Read a book on DDD
  • Read another book on DDD
  • Write some entities, DTOs, factories, and factory builders
  • Discuss whether your entities, DTOs, factories, and factory builders should be immutable, and use Lombok, Autovalue, or Immutables to ease the pain of construction of said objects
  • Discuss whether you want to use standard JPA, or Hibernate specific features for your mapping
  • Plug in Jackson, the XML and JSON mapper library, because you’ve read a nice blog post about it
  • Debug 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations. Minor thing
  • Debug 1-2 N+1 cases

STOP IT

No, seriously. Just stop it right there!

What you needed was this kind of JSON structure, exported form your favourite Sakila database:

[{
  "first_name": "PENELOPE",
  "last_name": "GUINESS",
  "categories": [{
    "name": "Animation",
    "films": [{
      "title": "ANACONDA CONFESSIONS"
    }]
   }, {
    "name": "Family",
    "films": [{
      "title": "KING EVOLUTION"
    }, {
      "title": "SPLASH GUMP"
    }]
  }]
}, {
   ...

In English: We need a list of actors, and the film categories they played in, and grouped in each category, the individual films they played in.

Let me show you how easy this is with SQL Server SQL (all other database dialects can do it these days, I just happen to have a SQL Server example ready:

-- 1) Produce actors
SELECT
  a.first_name,
  a.last_name, (

    -- 2) Nest categories in each actor
    SELECT
      c.name, (

        -- 3) Nest films in each category
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR JSON PATH -- 4) Turn into JSON
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR JSON PATH -- 4) Turn into JSON
  ) AS categories
FROM
  actor AS a 
FOR JSON PATH, ROOT ('actors') -- 4) Turn into JSON

That’s it. That’s all there is to it. Only basic SQL-92, enhanced with some vendor-specific JSON export syntax. (There are also SQL standard JSON APIs as implemented in other RDBMS). Let’s discuss it quickly:

  1. The outer most query produces a set of actors. As you would have expected
  2. For each actor, a correlated subquery produces a nested JSON array of categories
  3. For each category, another correlated subquery finds all the films per actor and category
  4. Finally, turn all the result structures into JSON

That’s it.

Want to change the result structure? Super easy. Just modify the query accordingly. No need to modify:

  • Whatever you thought your DDD “root aggregate was”
  • Your gazillion entities, DTOs, factories, and factory builders
  • Your gazillion Lombok, Autovalue, or Immutables annotations
  • Your hacks and workarounds to get this stuff through your standard JPA, or Hibernate specific features for your mapping
  • Your gazilion Jackson, the XML and JSON mapper library annotations
  • Debugging another 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations
  • Debugging another 1-2 N+1 cases

No! No need! It’s so simple. Just stream the JSON directly from the database to the client using whatever SQL API of your preference: JDBC, jOOQ, JdbcTemplate, MyBatis, or even JPA native query. Just don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware. Let me repeat that for emphasis:

Don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware.

Oh, want to switch to XML? Easy. In SQL Server, this amounts to almost nothing but replacing JSON by XML:

SELECT
  a.first_name,
  a.last_name, (
    SELECT
      c.name, (
	    SELECT title
	    FROM film AS f
	    JOIN film_category AS fc ON f.film_id = fc.film_id
	    JOIN film_actor AS fa ON fc.film_id = fa.film_id
	    WHERE fc.category_id = c.category_id
	    AND a.actor_id = fa.actor_id
	    FOR XML PATH ('film'), TYPE
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR XML PATH ('category'), TYPE
  ) AS categories
FROM
  actor AS a 
FOR XML PATH ('actor'), ROOT ('actors')

And now, you’re getting:

<actors>
  <actor>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <categories>
      <category>
        <name>Animation</name>
        <films>
          <film>
            <title>ANACONDA CONFESSIONS</title>
          </film>
        </films>
      </category>
      <category>
        <name>Family</name>
        <films>
          <film>
            <title>KING EVOLUTION</title>
          </film>
          <film>
            <title>SPLASH GUMP</title>
          </film>
        </films>
      </category>
      ...

It’s so easy with SQL!

Want to support both without rewriting too much logic? Produce XML and use XSLT to automatically generate the JSON. Whatever.

FAQ, Q&A

But my favourite Java SQL API can’t handle it

So what. Write a view and query that instead.

But this doesn’t fit our architecture

Then fix the architecture

But SQL is bad

No, it’s great. It’s based on relational algebra and augments it in many many useful ways. It’s a declarative 4GL, the optimiser produces way better execution plans than you could ever imagine (see my talk), and it’s way more fun than your gazillion 3GL mapping libraries.

But SQL is evil because of Oracle

Then use PostgreSQL. It can do JSON.

But what about testing

Just spin up a test database with https://www.testcontainers.org, install your schema with some migration framework like Flyway or Liquibase in it, fill in some sample data, and write your simple integration tests.

But mocking is better

It is not. The more you mock away the database, the more you’re writing your own database.

But I’m paid by the lines of code

Well, good riddance, then.

But what if we have to change the RDBMS

So what? Your management paid tens of millions for the new licensing. They can pay you tens of hundreds to spend 20 minutes rewriting your 5-10 SQL queries. You already wrote the integration tests above.

Anyway. It won’t happen. And if it will, then those few JSON queries will not be your biggest problem.

What was that talk of yours again?

Here, highly recommended:

But we’ve already spent so many person years implementing our middleware

It has a name

But I’ve read this other blog post…

And now you’ve read mine.

But that’s like 90s style 2 tier architecture

So what? You’ve spent 5% the time to implement it. That’s 95% more time adding value to your customers, rather than bikeshedding mapping technology. I call that a feature.

What about ingestion? We need abstraction over ingestion

No, you don’t. You can send the JSON directly into your database, and transform / normalise it from there, using the same technique. You don’t need middleware abstraction and mapping, you just want middleware abstraction and mapping.