Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead

It’s been a while since I’ve ranted on this blog, but I was recently challenged by a Reddit thread to write about this topic, so here goes…

So, you’re writing a service that produces some JSON from your database model. What do you need? Let’s see:

  • Read a book on DDD
  • Read another book on DDD
  • Write some entities, DTOs, factories, and factory builders
  • Discuss whether your entities, DTOs, factories, and factory builders should be immutable, and use Lombok, Autovalue, or Immutables to ease the pain of construction of said objects
  • Discuss whether you want to use standard JPA, or Hibernate specific features for your mapping
  • Plug in Jackson, the XML and JSON mapper library, because you’ve read a nice blog post about it
  • Debug 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations. Minor thing
  • Debug 1-2 N+1 cases

STOP IT

No, seriously. Just stop it right there!

What you needed was this kind of JSON structure, exported from your favourite Sakila database:

[{
  "first_name": "PENELOPE",
  "last_name": "GUINESS",
  "categories": [{
    "name": "Animation",
    "films": [{
      "title": "ANACONDA CONFESSIONS"
    }]
   }, {
    "name": "Family",
    "films": [{
      "title": "KING EVOLUTION"
    }, {
      "title": "SPLASH GUMP"
    }]
  }]
}, {
   ...

In English: we need a list of actors, the film categories they played in, and, grouped within each category, the individual films they played in.

Let me show you how easy this is with SQL Server SQL (all other database dialects can do it these days, I just happen to have a SQL Server example ready):

-- 1) Produce actors
SELECT
  a.first_name,
  a.last_name, (

    -- 2) Nest categories in each actor
    SELECT
      c.name, (

        -- 3) Nest films in each category
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR JSON PATH -- 4) Turn into JSON
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR JSON PATH -- 4) Turn into JSON
  ) AS categories
FROM
  actor AS a 
FOR JSON PATH, ROOT ('actors') -- 4) Turn into JSON

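That’s it. That’s all there is to it. Only basic SQL-92, enhanced with some vendor-specific JSON export syntax. (There are also SQL standard JSON APIs, as implemented in other RDBMS.) One caveat: ROOT ('actors') wraps the result in an {"actors": [...]} object, so drop that clause if you want the bare array shown above. Let’s discuss it quickly: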

  1. The outermost query produces a set of actors, as you would expect
  2. For each actor, a correlated subquery produces a nested JSON array of categories
  3. For each category, another correlated subquery finds all the films per actor and category
  4. Finally, turn all the result structures into JSON

That’s it.
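
If you don’t have SQL Server at hand, here is a minimal sketch of the same four steps. It assumes SQLite (through Python’s sqlite3 module, in a build that ships the JSON functions) rather than SQL Server, so json_group_array() and json_object() stand in for FOR JSON PATH, and an EXISTS predicate stands in for the GROUP BY that de-duplicates categories. The schema is a cut-down stand-in for Sakila, loaded only with the sample rows from above:

```python
import json
import sqlite3

# Assumption: a tiny stand-in for the Sakila schema, not the real thing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);

INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS');
INSERT INTO category VALUES (1, 'Animation'), (2, 'Family');
INSERT INTO film VALUES (1, 'ANACONDA CONFESSIONS'), (2, 'KING EVOLUTION'), (3, 'SPLASH GUMP');
INSERT INTO film_category VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO film_actor VALUES (1, 1), (1, 2), (1, 3);
""")

NESTED_JSON_QUERY = """
-- 1) Produce actors, 4) aggregated into one JSON array
SELECT json_group_array(json_object(
  'first_name', a.first_name,
  'last_name', a.last_name,

  -- 2) Nest categories in each actor; json() keeps the nested value
  --    from being embedded as an escaped string
  'categories', json((
    SELECT json_group_array(json_object(
      'name', c.name,

      -- 3) Nest films in each category
      'films', json((
        SELECT json_group_array(json_object('title', f.title))
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND fa.actor_id = a.actor_id
      ))
    ))
    FROM category AS c
    WHERE EXISTS (
      SELECT 1
      FROM film_category AS fc
      JOIN film_actor AS fa ON fc.film_id = fa.film_id
      WHERE fc.category_id = c.category_id
      AND fa.actor_id = a.actor_id
    )
  ))
))
FROM actor AS a
"""

# The database returns a single JSON document as text.
actors_json = conn.execute(NESTED_JSON_QUERY).fetchone()[0]

# Parsed here only to pretty-print it; a service would not need to.
actors = json.loads(actors_json)
print(json.dumps(actors, indent=2))
```

The shape of the query is the same as the SQL Server one: one correlated subquery per nesting level, and nothing else.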

Want to change the result structure? Super easy. Just modify the query accordingly. No need to modify:

  • Whatever you thought your DDD “root aggregate” was
  • Your gazillion entities, DTOs, factories, and factory builders
  • Your gazillion Lombok, Autovalue, or Immutables annotations
  • Your hacks and workarounds to get this stuff through your standard JPA, or Hibernate specific features for your mapping
  • Your gazillion Jackson, the XML and JSON mapper library annotations
  • Debugging another 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations
  • Debugging another 1-2 N+1 cases

No! No need! It’s so simple. Just stream the JSON directly from the database to the client using whatever SQL API of your preference: JDBC, jOOQ, JdbcTemplate, MyBatis, or even JPA native query. Just don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware. Let me repeat that for emphasis:

Don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware.
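
To make “not mapping” concrete, here is a rough sketch of what the middleware is left with. It assumes SQLite through Python’s sqlite3 module purely so the example is self-contained. The function name actors_as_json and the one-table schema are made up for illustration. The string the database returns is already the response body, so there’s no entity, no DTO, and no serialiser in between:

```python
import sqlite3


def actors_as_json(conn: sqlite3.Connection) -> str:
    """Return the JSON document exactly as the database produced it."""
    row = conn.execute(
        "SELECT json_group_array(json_object("
        "'first_name', first_name, 'last_name', last_name)) "
        "FROM actor"
    ).fetchone()
    # No parsing, no object graph: hand the text to the client as-is.
    return row[0]


# Hypothetical one-table stand-in for the actor table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS')")

body = actors_as_json(conn)
print(body)
```

Whatever HTTP framework you use only has to write body to the response with a JSON content type.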

Oh, want to switch to XML? Easy. In SQL Server, this amounts to almost nothing but replacing JSON by XML:

SELECT
  a.first_name,
  a.last_name, (
    SELECT
      c.name, (
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR XML PATH ('film'), TYPE
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR XML PATH ('category'), TYPE
  ) AS categories
FROM
  actor AS a 
FOR XML PATH ('actor'), ROOT ('actors')

And now, you’re getting:

<actors>
  <actor>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <categories>
      <category>
        <name>Animation</name>
        <films>
          <film>
            <title>ANACONDA CONFESSIONS</title>
          </film>
        </films>
      </category>
      <category>
        <name>Family</name>
        <films>
          <film>
            <title>KING EVOLUTION</title>
          </film>
          <film>
            <title>SPLASH GUMP</title>
          </film>
        </films>
      </category>
      ...

It’s so easy with SQL!

Want to support both without rewriting too much logic? Produce XML and use XSLT to automatically generate the JSON. Whatever.

FAQ, Q&A

But my favourite Java SQL API can’t handle it

So what. Write a view and query that instead.

But this doesn’t fit our architecture

Then fix the architecture

But SQL is bad

No, it’s great. It’s based on relational algebra and augments it in many many useful ways. It’s a declarative 4GL, the optimiser produces way better execution plans than you could ever imagine (see my talk), and it’s way more fun than your gazillion 3GL mapping libraries.

But SQL is evil because of Oracle

Then use PostgreSQL. It can do JSON.

But what about testing

Just spin up a test database with https://www.testcontainers.org, install your schema with some migration framework like Flyway or Liquibase in it, fill in some sample data, and write your simple integration tests.

But mocking is better

It is not. The more you mock away the database, the more you’re writing your own database.

But I’m paid by the lines of code

Well, good riddance, then.

But what if we have to change the RDBMS

So what? Your management paid tens of millions for the new licensing. They can pay you tens of hundreds to spend 20 minutes rewriting your 5-10 SQL queries. You already wrote the integration tests above.

Anyway. It won’t happen. And if it does, then those few JSON queries will not be your biggest problem.

What was that talk of yours again?

Here, highly recommended:

But we’ve already spent so many person years implementing our middleware

It has a name: the sunk cost fallacy.

But I’ve read this other blog post…

And now you’ve read mine.

But that’s like 90s style 2 tier architecture

So what? You’ve spent 5% of the time to implement it. That’s 95% more time adding value to your customers, rather than bikeshedding mapping technology. I call that a feature.

What about ingestion? We need abstraction over ingestion

No, you don’t. You can send the JSON directly into your database, and transform / normalise it from there, using the same technique. You don’t need middleware abstraction and mapping, you just want middleware abstraction and mapping.
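
Here is a minimal sketch of that direction as well, again assuming SQLite through Python’s sqlite3 module instead of SQL Server, with json_each() and json_extract() doing the shredding. The payload shape, the cut-down schema, and matching actors and films by name are all assumptions made for the example:

```python
import json
import sqlite3

# Assumption: a cut-down schema; film titles are treated as unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
""")

# The document as it arrives from the client, passed to the database as text.
payload = json.dumps([{
    "first_name": "PENELOPE",
    "last_name": "GUINESS",
    "films": [{"title": "KING EVOLUTION"}, {"title": "SPLASH GUMP"}],
}])

# 1) One actor row per element of the top-level array
conn.execute("""
INSERT INTO actor (first_name, last_name)
SELECT json_extract(a.value, '$.first_name'),
       json_extract(a.value, '$.last_name')
FROM json_each(?) AS a
""", (payload,))

# 2) One film row per element of each nested "films" array
conn.execute("""
INSERT OR IGNORE INTO film (title)
SELECT json_extract(f.value, '$.title')
FROM json_each(?) AS a, json_each(a.value, '$.films') AS f
""", (payload,))

# 3) Link rows, looking the generated keys up by name (an assumption)
conn.execute("""
INSERT INTO film_actor (actor_id, film_id)
SELECT ac.actor_id, fi.film_id
FROM json_each(?) AS a, json_each(a.value, '$.films') AS f, actor AS ac, film AS fi
WHERE ac.first_name = json_extract(a.value, '$.first_name')
AND ac.last_name = json_extract(a.value, '$.last_name')
AND fi.title = json_extract(f.value, '$.title')
""", (payload,))
conn.commit()

linked_titles = [row[0] for row in conn.execute("""
SELECT f.title
FROM film_actor AS fa
JOIN film AS f ON f.film_id = fa.film_id
ORDER BY f.title
""")]
print(linked_titles)
```

SQL Server has OPENJSON for the same job, and other databases have their own equivalents. The middleware only passes the text along.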

17 thoughts on “Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead”

  1. I liked that you accepted the challenge, but the point of my article was to provide a simple way to avoid the Cartesian Product when having to fetch a model like this one:

              post
             /    \
      post_tag    post_comment
         |
        tag
    
    1. I did not reply to your article, I replied to a comment you made on Reddit. And that model can be fetched in just the same way, without any cartesian products (I’m using correlated subqueries here):

      SELECT 
        p.*,
        -- Magic happens here
        (SELECT t.* 
         FROM tag AS t 
         JOIN post_tag AS pt ON t.id = pt.tag_id
         WHERE pt.post_id = p.id
         FOR JSON PATH) AS tags,
        -- And then, more magic here
        (SELECT pc.*
         FROM post_comment AS pc
         WHERE pc.post_id = p.id
         FOR JSON PATH) AS comments
      FROM post AS p
      FOR JSON PATH, ROOT ('post')
      
  2. I also hate mappings. Mapping “objects” (i.e. inanimate data objects) is what turns us into puppet masters and pollutes our code with dozens of useless things. Whenever possible I use the concept of “animated data”.
    That is, let a Java Interface be your model and add an implementation of it which has a JsonObject or a ResultSet as its backbone:

    1. My article here is mostly about mapping stuff just to send it elsewhere as JSON or XML. There’s absolutely no need to do that with a 3GL, as SQL can already do it, much more concisely.

      If the data is consumed in the middleware, that’s a different story, in which case I have less of an opinion.

  3. “So, you’re writing a service that produces some JSON from your database model.”
    Is that all, really? No business logic, no external service to call? Well in that case sure, the DB can do all the work. But what’s the point of such a “service”?
    “You don’t need middleware abstraction and mapping”
    Ah, I see now.
    Stay tuned for the next post: How to return HTML and CSS from an SQL query :) “So you’re using a web framework? Just stop it right there!” :D

    1. Look. I’m not advocating all or nothing solutions. Middleware has lots of uses. But very often, it has none, and the myriad lines of code that just shovel stupid data around between different representations add no value at all, but hurt developer productivity and, more often than not, query performance.

      It’s perfectly possible to have 20 services with business logic in middleware next to 20 other services with no logic in middleware.

  4. I read that using the voice of the guy that does the voice over for Cinema Sins.
    That’s an interesting idea, I think I will give it a try sometime.

  5. That’s what I did over the last few years on PostgreSQL. I’m with you on the general approach. There is a drawback to take into account: indexes. In my case, indexes work on jsonb in some situations but do not cover
    all kinds of 1-n queries (when you have a multi-nested JSON).
    If your JSON represents, say, a complex nested item and you need to extract some data for a search grid in the GUI, you need a fast query (i.e. server side filtering) to extract some data flattened at some level, and if indexes do not work, it’s impossible to use because of performance issues.
    My approach was to surface some fields in a slave (not master data) table, acting as something like a materialized view, to meet performance requirements when indexes are not usable.

    1. I’m not sure if we’re talking about the same thing here. This article doesn’t advocate storing data as JSON. It advocates using SQL to map relational (normalised) data to JSON when querying. Of course, the two things can be combined, and of course, each vendor has their limits.

      However, I’m not sure if this particular set of problems arising from these limits is related to the mapping requirement that this article talks about, or am I perhaps missing something?

  6. Hello Lukas!

    Thank you, for your interesting article.
    So you are suggesting to write complex SQL queries to generate JSON. Okay.

    You, as the author of jooq, told us to use jooq with all the advantages of having a typed Java API and not to write SQL statements as text.

    But why is the json output of jooq so “hard” to handle on the client side?
    I would love to see “better” JSON output, namely as an array of records and then key/value pairs, like MOST of the other APIs are working. This way, we could combine complex statements with jooq and outputting a reasonable JSON format for REST APIs.

    Thank you!

    1. “I would love to see “better” JSON output, namely as an array of records and then key/value pairs, like MOST of the other APIs are working. This way, we could combine complex statements with jooq and outputting a reasonable JSON format for REST APIs.”

      jOOQ 3.12 finally supports some standardised JSON API, including:

      jsonArray()
      jsonEntry()
      jsonObject()

      And we’re working on adding more API and more vendor support. Would love to hear your feedback on what the priorities could be for you, specifically, here or as feature requests: https://github.com/jOOQ/jOOQ/issues/new/choose

  7. Hey Lukas,
    thanks for the awesome post.
    I did the same stuff with PostgreSQL in my last project instead of the DDD you’ve described above and absolutely agreed with you. So much OOP overhead can be reduced by just some aggregate functions!
    Happy to see more interesting posts from you. :)

    -> So you are suggesting to write complex SQL queries to generate JSON. Okay.
    I started working more intensively with PostgreSQL about six weeks ago and would not say it’s complicated to aggregate JSON output using SQL.

    1. It really isn’t complicated, indeed. All that opposition is just cargo culting. Of course, OO has its merits, but Java, specifically, is making a lot of things very complicated, and a lot of ceremony is being done for ceremony’s sake only…

      Glad to hear you’ve been successful with this approach!

  8. Hi Lucas,
    Great article. I have been using this technique for years with SQL Server, MySQL, and PostgreSQL, with all solutions running in production for years. The first thing is to convince your management; I say that it’s the same SQL code and data, just packaged differently. Now, think about the database as a service: single entry point, JSON/XML in/out, dynamic routing, consistent data format.
    My setup is: 1. Create schema ‘service’. 2. Create entry point procedure ‘service.process’; it does (a) error handling and (b) routing to other procedures.
    Entry point procedure has three parameters: who (like token), what (action string like ‘user.add’), data package (XML/JSON data elements needed to execute the action)
    Bottom line is that UI developers love consuming JSON/XML data, as they no longer need to build ORM or DAL solutions, and need no code to map parameters and output for hundreds of various stored procedures. Developers de-serialize the JSON/XML into simple Java, .Net, JS, etc. data structures and work with those in the app.
    Moving from one database to another is mostly a change of vendor specific syntax.
    One additional feature worth mentioning is that your output JSON/XML structure may contain additional instructions for app/web server, like setting cookie, sending email, or saving this output in cache, see ‘session_create’ in my code:
    https://github.com/latomcus/api-platform/blob/master/sql/mssql.sql
    It would be great to popularize this design pattern and teach more developers to use it.
    Thank you!
