Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead

It’s been a while since I’ve ranted on this blog, but I was recently challenged by a reddit thread to write about this topic, so here goes…

So, you’re writing a service that produces some JSON from your database model. What do you need? Let’s see:

  • Read a book on DDD
  • Read another book on DDD
  • Write some entities, DTOs, factories, and factory builders
  • Discuss whether your entities, DTOs, factories, and factory builders should be immutable, and use Lombok, Autovalue, or Immutables to ease the pain of construction of said objects
  • Discuss whether you want to use standard JPA, or Hibernate specific features for your mapping
  • Plug in Jackson, the XML and JSON mapper library, because you’ve read a nice blog post about it
  • Debug 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations. Minor thing
  • Debug 1-2 N+1 cases

STOP IT

No, seriously. Just stop it right there!

What you needed was this kind of JSON structure, exported from your favourite Sakila database:

[{
  "first_name": "PENELOPE",
  "last_name": "GUINESS",
  "categories": [{
    "name": "Animation",
    "films": [{
      "title": "ANACONDA CONFESSIONS"
    }]
   }, {
    "name": "Family",
    "films": [{
      "title": "KING EVOLUTION"
    }, {
      "title": "SPLASH GUMP"
    }]
  }]
}, {
   ...

In English: we need a list of actors, the film categories they played in, and, grouped within each category, the individual films they played in.

Let me show you how easy this is with SQL Server (all other database dialects can do it these days; I just happen to have a SQL Server example ready):

-- 1) Produce actors
SELECT
  a.first_name,
  a.last_name, (

    -- 2) Nest categories in each actor
    SELECT
      c.name, (

        -- 3) Nest films in each category
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR JSON PATH -- 4) Turn into JSON
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR JSON PATH -- 4) Turn into JSON
  ) AS categories
FROM
  actor AS a 
FOR JSON PATH -- 4) Turn into JSON

That’s it. That’s all there is to it. Only basic SQL-92, enhanced with some vendor-specific JSON export syntax. (There are also SQL standard JSON APIs as implemented in other RDBMS). Let’s discuss it quickly:

  1. The outermost query produces a set of actors, as you would have expected
  2. For each actor, a correlated subquery produces a nested JSON array of categories
  3. For each category, another correlated subquery finds all the films per actor and category
  4. Finally, turn all the result structures into JSON

That’s it.
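For reference, the same nesting works with PostgreSQL's JSON functions, too. The following is only a sketch (untested here, same Sakila tables): json_agg() collects rows into a JSON array, json_build_object() builds each object, and an EXISTS predicate replaces the JOIN + GROUP BY so each category appears once:

```sql
-- Rough PostgreSQL equivalent (sketch, not a tuned query)
SELECT json_agg(json_build_object(
  'first_name', a.first_name,
  'last_name',  a.last_name,
  'categories', (
    SELECT json_agg(json_build_object(
      'name',  c.name,
      'films', (
        SELECT json_agg(json_build_object('title', f.title))
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND fa.actor_id = a.actor_id
      )
    ))
    FROM category AS c
    -- EXISTS instead of JOIN + GROUP BY: list each category once
    WHERE EXISTS (
      SELECT 1
      FROM film_category AS fc
      JOIN film_actor AS fa ON fc.film_id = fa.film_id
      WHERE fc.category_id = c.category_id
      AND fa.actor_id = a.actor_id
    )
  )
))
FROM actor AS a
```

The shape of the solution is identical: one correlated subquery per nesting level, plus an aggregate function that turns the rows into a JSON document.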

Want to change the result structure? Super easy. Just modify the query accordingly. No need to modify:

  • Whatever you thought your DDD “aggregate root” was
  • Your gazillion entities, DTOs, factories, and factory builders
  • Your gazillion Lombok, Autovalue, or Immutables annotations
  • Your hacks and workarounds to get this stuff through your standard JPA, or Hibernate specific features for your mapping
  • Your gazillion Jackson (the XML and JSON mapper library) annotations
  • Debugging another 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations
  • Debugging another 1-2 N+1 cases

No! No need! It’s so simple. Just stream the JSON directly from the database to the client using whichever SQL API you prefer: JDBC, jOOQ, JdbcTemplate, MyBatis, or even a JPA native query. Just don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware. Let me repeat that for emphasis:

Don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware.

Oh, want to switch to XML? Easy. In SQL Server, this amounts to almost nothing but replacing JSON by XML:

SELECT
  a.first_name,
  a.last_name, (
    SELECT
      c.name, (
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR XML PATH ('film'), TYPE
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR XML PATH ('category'), TYPE
  ) AS categories
FROM
  actor AS a 
FOR XML PATH ('actor'), ROOT ('actors')

And now, you’re getting:

<actors>
  <actor>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <categories>
      <category>
        <name>Animation</name>
        <films>
          <film>
            <title>ANACONDA CONFESSIONS</title>
          </film>
        </films>
      </category>
      <category>
        <name>Family</name>
        <films>
          <film>
            <title>KING EVOLUTION</title>
          </film>
          <film>
            <title>SPLASH GUMP</title>
          </film>
        </films>
      </category>
      ...

It’s so easy with SQL!

Want to support both without rewriting too much logic? Produce XML and use XSLT to automatically generate the JSON. Whatever.

FAQ

But my favourite Java SQL API can’t handle it

So what? Write a view and query that instead. By the way, jOOQ 3.14 supports all of it.
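For instance, the whole JSON construction can be hidden behind a view; the view name and the simplified single-level document below are made up for illustration (SQL Server syntax):

```sql
-- Hypothetical view wrapping the JSON construction. Any SQL API can
-- then consume the document as an ordinary string column.
CREATE VIEW v_actor_json AS
SELECT
  a.actor_id,
  (
    -- WITHOUT_ARRAY_WRAPPER: emit a single JSON object,
    -- not a one-element array
    SELECT a.first_name, a.last_name
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
  ) AS json_doc
FROM actor AS a;

-- Your favourite Java SQL API then only ever sees:
-- SELECT json_doc FROM v_actor_json WHERE actor_id = 1
```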

But this doesn’t fit our architecture

Then fix the architecture.

But SQL is bad

No, it’s great. It’s based on relational algebra and augments it in many, many useful ways. It’s a declarative 4GL, the optimiser produces way better execution plans than you could ever imagine (see my talk), and it’s way more fun than your gazillion 3GL mapping libraries.

But SQL is evil because of Oracle

Then use PostgreSQL. It can do JSON.

But what about testing

Just spin up a test database with https://www.testcontainers.org, install your schema in it with some migration framework like Flyway or Liquibase, fill in some sample data, and write your simple integration tests.

But mocking is better

It is not. The more you mock away the database, the more you’re writing your own database.

But I’m paid by the lines of code

Well, good riddance, then.

But what if we have to change the RDBMS

So what? Your management paid tens of millions for the new licensing. They can pay you tens of hundreds to spend 20 minutes rewriting your 5-10 SQL queries. You already wrote the integration tests above.

Anyway. It won’t happen. And if it does, those few JSON queries will not be your biggest problem.

What was that talk of yours again?

Here, highly recommended:

But we’ve already spent so many person years implementing our middleware

It has a name: the sunk cost fallacy.

But I’ve read this other blog post…

And now you’ve read mine.

But that’s like 90s style 2 tier architecture

So what? You’ve spent 5% of the time to implement it. That’s 95% more time adding value to your customers, rather than bikeshedding mapping technology. I call that a feature.

What about ingestion? We need abstraction over ingestion

No, you don’t. You can send the JSON directly into your database, and transform / normalise it from there, using the same technique. You don’t need middleware abstraction and mapping, you just want middleware abstraction and mapping.
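As a sketch of that idea in SQL Server syntax (the document and the column sizes below are made up to resemble Sakila’s actor table), OPENJSON shreds the incoming document back into rows:

```sql
-- Hypothetical ingestion endpoint: accept the JSON document as-is,
-- then normalise it in the database, no middleware mapping involved
DECLARE @doc NVARCHAR(MAX) =
  N'[{"first_name": "PENELOPE", "last_name": "GUINESS"}]';

INSERT INTO actor (first_name, last_name)
SELECT first_name, last_name
FROM OPENJSON(@doc)
WITH (
  first_name VARCHAR(45) '$.first_name',
  last_name  VARCHAR(45) '$.last_name'
);
```

Other RDBMS have equivalent functions, e.g. json_to_recordset() or jsonb_array_elements() in PostgreSQL.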

34 thoughts on “Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead”

  1. I liked that you accepted the challenge, but the point of my article was to provide a simple way to avoid the Cartesian Product when having to fetch a model like this one:

             post
             /  \
       post_tag  post_comment
          |
         tag
    
    1. I did not reply to your article; I replied to a comment you made on reddit. And that model can be fetched in just the same way, without any Cartesian products (I’m using correlated subqueries here):

      SELECT 
        p.*,
        -- Magic happens here
        (SELECT t.* 
         FROM tag AS t 
         JOIN post_tag AS pt ON t.id = pt.tag_id
         WHERE pt.post_id = p.id
         FOR JSON PATH) AS tags,
        -- And then, more magic here
        (SELECT pc.*
         FROM post_comment AS pc
         WHERE pc.post_id = p.id
         FOR JSON PATH) AS comments
      FROM post AS p
      FOR JSON PATH, ROOT ('post')
      
  2. I also hate mappings. Mapping “objects” (i.e. inanimate data objects) is what turns us into puppet masters and pollutes our code with dozens of useless things. Whenever possible I use the concept of “animated data”.
    That is, let a Java Interface be your model and add an implementation of it which has a JsonObject or a ResultSet as its backbone:

    1. My article here is mostly about mapping stuff just to send it elsewhere as JSON or XML. There’s absolutely no need to do that with a 3GL, as SQL can already do it, much more concisely.

      If the data is consumed in the middleware, that’s a different story, in case of which I have less of an opinion.

    2. The Facade pattern is your friend. I did that kind of stuff a few years ago on a legacy code base, the datamodel used tightly-packed binary structures and I wrote Facade classes above it to hide all the bit-twiddling code behind accessors. It worked great.

  3. “So, you’re writing a service that produces some JSON from your database model.”
    Is that all, really? No business logic, no external service to call? Well in that case sure, the DB can do all the work. But what’s the point of such a “service”?
    “You don’t need middleware abstraction and mapping”
    Ah, I see now.
    Stay tuned for the next post: How to return HTML and CSS from an SQL query :) “So you’re using a web framework? Just stop it right there!” :D

    1. Look. I’m not advocating all-or-nothing solutions. Middleware has lots of uses. But very often, it does not, and the myriad lines of code that just shovel stupid data around between different representations add no value at all, but hurt developer productivity and, more often than not, query performance.

      It’s perfectly possible to have 20 services with business logic in middleware next to 20 other services with no logic in middleware.

  4. I read that using the voice of the guy that does the voice over for Cinema Sins.
    That’s an interesting idea, I think I will give it a try sometime.

  5. That’s what I did in the last years on Postgres. I’m with you on the general approach. There is a drawback to take into account: indexes. In my case, indexes work on jsonb in some situations but do not cover all kinds of 1-n queries (when you have a multi-nested JSON).
    If your JSON e.g. represents a complex nested item and you need to extract some data for a search grid in the GUI, you need a fast query (i.e. server-side filtering) to extract some data, flattened at some level, and if the indexes don’t work, it’s unusable due to performance issues.
    My approach was to surface some fields in a secondary (not master data) table, something like a materialized view, to achieve the performance requirements when indexes are not usable.

    1. I’m not sure if we’re talking about the same thing here. This article doesn’t advocate storing data as JSON. It advocates using SQL to map relational (normalised) data to JSON when querying. Of course, the two things can be combined, and of course, each vendor has their limits.

      However, I’m not sure if this particular set of problems arising from these limits is related to the mapping requirement that this article talks about, or am I perhaps missing something?

  6. Hello Lukas!

    Thank you, for your interesting article.
    So you are suggesting to write complex SQL queries to generate JSON. Okay.

    You, as the author of jooq, told us to use jooq with all the advantages of having a typed Java API and not to write SQL statements as text.

    But why is the JSON output of jOOQ so “hard” to handle on the client side?
    I would love to see “better” JSON output, namely as an array of records with key/value pairs, like MOST of the other APIs produce. This way, we could combine complex statements with jOOQ and output a reasonable JSON format for REST APIs.

    Thank you!

    1. I would love to see “better” JSON output, namely as an array of records with key/value pairs, like MOST of the other APIs produce. This way, we could combine complex statements with jOOQ and output a reasonable JSON format for REST APIs.

      jOOQ 3.12 finally supports some standardised JSON API, including:

      jsonArray()
      jsonEntry()
      jsonObject()

      And we’re working on adding more API and more vendor support. Would love to hear your feedback on what the priorities could be for you, specifically, here or as feature requests: https://github.com/jOOQ/jOOQ/issues/new/choose

    2. I completely agree with this. I would rather have jooq do normal sql queries and return modeled classes that can be worked on and serialized/deserialized to JSON (ex via Google’s GSON library). It’s a shame that the generated jooq model classes weren’t made gson friendly (or at least an accessible member). It would make it so API projects can embed jooq generated classes into a final API to be later serialized to JSON (and if imported as a library on the client side, the ability to deserialize back to classes). This would give middleware full control on querying data, including the ability to work on it, the ability to quickly make API projects ready for client code, and the ability to build a custom JSON APIs out of jooq classes with very little effort, resulting in better kept Middleware wrapped around jooq and gson.

    3. Why aren’t jOOQ’s generated model classes gson friendly? I’d prefer that over returning JSON from SQL.

      There’s a much better use case to having Middleware that can query a database as well as other sources to produce a final JSON API, backed by Java classes so it can be packaged as a server/client library for JSON de/serialization. Something like GSON with the backing classes already generated is a hot answer. It’s actually what I thought I was getting, but wasn’t.

      If the jOOQ generated model (or at least a package in the model) was gson friendly, it would solve the JSON packaging issue, create both client and server code, and allow a developer to wrap jOOQ straight into a JSON service with little effort using something like GSON.

      1. The point the article was trying to make (how easy it is to nest collections directly in SQL) would still not be addressed…

        I’m sure we can improve integrations with libraries like gson. May I invite you to discuss your feature request with some details about what you’re trying to do, and why the status quo doesn’t work for you on github? https://github.com/jOOQ/jOOQ/issues/new/choose

        I’m particularly interested in a more formal description of what it means to be “gson friendly”.

  7. Hey Lukas,
    thanks for the awesome post.
    I did the same stuff with PostgreSQL in my last project instead of the DDD you’ve described above and absolutely agreed with you. So much OOP overhead can be reduced by just some aggregate functions!
    Happy to see more interesting posts from you. :)

    -> So you are suggesting to write complex SQL queries to generate JSON. Okay.
    I’ve started to work more intensively with PostgreSQL about six weeks ago and would not say it’s complicated to aggregate JSON output using SQL.

    1. It really isn’t complicated, indeed. All that opposition is just cargo culting. Of course, OO has its merits, but Java, specifically, is making a lot of things very complicated, and a lot of ceremony is being done for ceremony’s sake only…

      Glad to hear you’ve been successful with this approach!

  8. Hi Lucas,
    Great article. I’ve been using this technique for years with SQL Server, MySQL, and PostgreSQL, all solutions running in production for years. The first thing is to convince your management; I say that it’s the same SQL code and data, just packaged differently. Now, think about the database as a service: single entry point, JSON/XML in/out, dynamic routing, consistent data format.
    My setup is: 1. Create schema ‘service’. 2. Create entry point procedure ‘service.process’; it does (a) error handling and (b) routing to other procedures.
    Entry point procedure has three parameters: who (like token), what (action string like ‘user.add’), data package (XML/JSON data elements needed to execute the action)
    Bottom line is that UI developers love consuming JSON/XML data, as they no longer need to build ORM or DAL solutions, and there is no code to map parameters and output for hundreds of various stored procedures. Developers use de-serialization from JSON/XML to simple Java, .Net, JS, etc. data structures and work with those in the app.
    Moving from one database to another is mostly change for vendor specific syntax.
    One additional feature worth mentioning is that your output JSON/XML structure may contain additional instructions for app/web server, like setting cookie, sending email, or saving this output in cache, see ‘session_create’ in my code:
    https://github.com/latomcus/api-platform/blob/master/sql/mssql.sql
    It would be great to popularize this design pattern and teach more developers to use it.
    Thank you!

  9. I always think that we don’t need mapping and a middle layer when fetching data from the database, and I try to bypass them and send data to the controller directly. But to write data to the database, it’s reasonable to have a middle layer and business logic checks, especially in CRUD systems.

    I thought this technique was illegal, but now I’m glad to see your great post.
    Thanks lukaseder

  10. I’m trying to apply what you’re suggesting with jOOQ, but today, I don’t think it’s possible in a strongly typed way. How would you do it today? Pure SQL with jOOQ? A mix of strongly typed API and plain SQL (I haven’t succeeded with this one)?

    I’m convinced that our application architectures are overly complicated and would love to simplify ours. Thanks in advance for your answer.

    1. Granted, we’re not entirely there yet. The missing feature is this one here: https://github.com/jOOQ/jOOQ/issues/3884, and its emulations using XML/JSON. Another option would be to resort to using views for more complex cases.

      There’s also a middle ground. What would be an example of what you’re trying/hoping to do? And with what RDBMS?

        1. Well, you said type safety for nested collections, and that will be best implemented through MULTISET and its emulations. Unfortunately, a full elaboration would be a bit out of scope here.

          I don’t know what “middle ground” would look like for you. This turns into quite a hypothetical discussion, starting from the premise “it should be better” (or “I am convinced” in your words), but without a clear goal.

          Again, we don’t have all the tools yet to make SQL as imagined by the third manifesto and the SQL standard MULTISET operator and others (nested collections) as powerful as they should be. Without them, everyone is just discussing workarounds, as is the stack overflow question you’ve linked, as is this article…

          We will also natively support more JSON operators, and you can already add support yourself via plain SQL templating.

            1. Sorry if I come across as vague or hypothetical; that’s not the case. I’m actually trying to apply your article to a product I’m building (because I think it’s worth a shot). I finally have a working solution. I wondered if I could have made better use of jOOQ. In my query, json_agg and json_build_object are not supported by jOOQ. Could (/should) I use jOOQ more heavily?
            I have the SQL query below in a dedicated file and then I’m using jOOQ plain SQL templating and executing the query with jOOQ:

            with
              fields as (
                select nf.note_id as note_id,
                       json_agg(json_build_object(
                           'noteTypeField', nf.note_type_field_id,
                           'text', nf.text
                         ))       as json
                from (select * from note_field order by id) nf
                group by nf.note_id
              ),
              questions as (
                select q.note_id as note_id, coalesce(json_agg(q.question_definition_id), '[]'::json) as json
                from (select * from question order by id) q
                group by q.note_id
              ),
              notes as (
                select json_build_object(
                           'id', n.id,
                           'noteType', n.note_type_id,
                           'fields', f.json,
                           'questions', q.json
                         ) as json
                from (select note.* from note order by id) n
                       left join fields f on n.id = f.note_id
                       left join questions q on n.id = q.note_id
                where {0} and {1}
              )
            select json_build_object('notes', coalesce(json_agg(notes.json), '[]'::json))
            from notes
            
            1. You certainly could use jOOQ more heavily, but should you? I don’t know :)

              You can easily add support for any missing feature through plain SQL templating as I’ve mentioned. What I meant by that is not to put the entire thing in a single template, but to put the missing functions in individual templates. You can write a library like this:

              static Field<JSONB> jsonBuildObject(...) {
                  return DSL.field("json_build_object({0}, {1}, ...)", SQLDataType.JSONB, ...);
              }
              
              static Field<JSONB> jsonAgg(...) {
                  return DSL.field("json_agg({0}, {1}, ...)", SQLDataType.JSONB, ...);
              }
              

              And now, you have (almost) native jOOQ support for those missing things.

              The benefit of using jOOQ is always: You get the entirety of the jOOQ tool chain for every bit of SQL you’re writing in jOOQ, including the possibility of writing SQL dynamically, to transform SQL (e.g. for multi tenancy, row level security, etc.), to better log SQL, format SQL, translate it to other dialects, have auto completion in your IDE, have your build fail when you rename columns, etc. etc.

              If that’s what you’re already doing otherwise, then adding support for 1-2 missing PostgreSQL specific functions using the above templates is a quick win.

  11. Late to the party but I LOVE this and will start using it. Thank you.

    One thing not mentioned, I think, is that it must have positive runtime performance benefits:
    * less network traffic (no duplication), though maybe the JSON/XML markup overhead outweighs the reduced network traffic
    * less churn in the middle tier, as there’s no ResultSet to JSON/XML munging; just return it to the client without all the hateful crap you highlight. Less CPU and less RAM (ResultSet -> DTO -> JSON)

    Vastly reduced development effort and reduced runtime effort. Perhaps good for the environment as well :) Excellent post. Thanks again.

    1. Just in case your target document structure isn’t actual JSON/XML, read this as well, first! https://blog.jooq.org/jooq-3-15s-new-multiset-operator-will-change-how-you-think-about-sql/

      I’m positive that there’s a lot of potential for gains with the JSON or XML format, though there might also be some overhead. Most database network protocols are binary, whereas these formats are text based. Especially the XML format is quite wasteful, so I guess it depends.

      The reason why I haven’t mentioned these things yet in past blog posts, is because I want to thoroughly benchmark this on numerous RDBMS with actual data (e.g. the sakila database: https://www.jooq.org/sakila)
