Please, Run That Calculation in Your RDBMS

There’s one thing that you can do terribly wrong when working with RDBMS. And that thing is not running your calculations in the database, when you should.

We’re not advocating to blindly move all business logic into the database, but when I see a Stack Overflow question like this, I feel the urge to gently remind you of the second item in our popular 10 Common Mistakes Java Developers Make When Writing SQL.

Gently reminding you of the "right way"
Gently reminding you of the second item.
Some background on this illustration here (in German).tweet this

The Stack Overflow question essentially boils down to this (liberally quoted):

From the following medium-sized table, I wish to count the number of documents with status 0 or 1 per application ID:

AppID | DocID | DocStatus 
1     | 100   | 0
1     | 101   | 1    
2     | 200   | 0    
2     | 300   | 1
...   | ...   | ...

Should I use Hibernate for that?

And the answer: NO! Don’t use Hibernate for that (unless you mean native querying). You should use SQL for that. Es-Queue-El! You have so many trivial options to make your SQL Server help you run this query in a fraction of the time it would take if you loaded all that data into Java memory before aggregating!

For instance (using SQL Server):


This is the most trivial one, but it might not return result in exactly the way you wanted, i.e. different aggregation results are in different rows:

SELECT [AppID], [DocStatus], count(*)
FROM [MyTable]
GROUP BY [AppID], [DocStatus]

Example on SQLFiddle, returning something like

|     1 |         0 |        2 |
|     2 |         0 |        3 |
|     1 |         1 |        3 |
|     2 |         1 |        2 |

Using nested selects

This is probably the solution that this particular user was looking for. They probably want each aggregation in a separate column, and one very generic way to achieve this is by using nested selects. Note that this solution might prove to be a bit slow in some databases that have a hard time optimising these things

       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 0) [Status_0],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 1) [Status_1]
FROM [MyTable] [t1]

Example on SQLFiddle, returning something like

|     1 |        2 |        3 |
|     2 |        3 |        2 |

Using SUM()

This solution is probably the optimal one. It is equivalent to the previous one with nested selects, although it only works for simple queries, whereas the nested selects version is more versatile.

       SUM(IIF([DocStatus] = 0, 1, 0)) [Status_0],
       SUM(IIF([DocStatus] = 1, 1, 0)) [Status_1]
FROM [MyTable] [t1]

Example on SQLFiddle, same result as before


This solution is for the SQL Aficionados among yourselves. It uses the T-SQL PIVOT clause!

SELECT [AppID], [0], [1]
    SELECT [AppID], [DocStatus]
    FROM [MyTable]
) [t]
    FOR [DocStatus] 
    IN ([0], [1])
) [pvt]

SQL aficionados use PIVOT tweet this

Example on SQLFiddle, same result as before


You may freely choose your weapon among the above suggestions, and I’m sure there are more alternatives. All of them will outperform any Java-based aggregation implementation by orders of magnitude, even for trivially small data sets for sure. We’ll say this time and again, and we’ll quote Gavin King time and again for the same thing:

Just because you’re using Hibernate, doesn’t mean you have to use it for everything. A point I’ve been making for about ten years now.

And in our words:

Use SQL whenever appropriate! And that is much more often than you might think!

8 thoughts on “Please, Run That Calculation in Your RDBMS

  1. BTW, are you aware that mixing hibernate/JPA/JDO and update/delete SQL queries might be quite complicated due to cache invalidation? you never mentioned that in your posts about “no hibernate”.

    cache invalidating is much simpler in fjorm and I guess in jooq as well since you are doing SQL translator which is much better than bloated hibernate (yet another hibernate hater here).

    1. yet another hibernate hater here

      We’re no Hibernate haters. We just advocate that you should know your tools. Unfortunately, Hibernate’s cache is used too naively by too many developers…

  2. So can’t you generate queries like these via Hibernate? I don’t have experience with Java ORMs but in the .NET world the ORMs have no problem generating SQL like this one and not pulling everything in memory to do a count.

    1. You probably can do some simple analytics with Hibernate / HQL. It’s certainly possible for the GROUP BY query.

      Could you show an example for the other cases?

      1. Here is what the nested selects and sum queries look like using LINQ

        from doc in DocsTables
        group doc by doc.AppID into g
        select new 
                AppID = g.Key,
                Status0Count = g.Count(d=> d.DocStatus == 0),
                Status1Count = g.Count(d=>d.DocStatus == 1)

        Generates the following SQL:

        -- Region Parameters
        DECLARE @p0 Int = 0
        DECLARE @p1 Int = 1
        -- EndRegion
        SELECT [t1].[AppID], (
            SELECT COUNT(*)
            FROM [DocsTable] AS [t2]
            WHERE ([t2].[DocStatus] = @p0) AND ([t1].[AppID] = [t2].[AppID])
            ) AS [Status0Count], (
            SELECT COUNT(*)
            FROM [DocsTable] AS [t3]
            WHERE ([t3].[DocStatus] = @p1) AND ([t1].[AppID] = [t3].[AppID])
            ) AS [Status1Count]
        FROM (
            SELECT [t0].[AppID]
            FROM [DocsTable] AS [t0]
            GROUP BY [t0].[AppID]
            ) AS [t1]
        from doc in DocsTables
        group doc by doc.AppID into g
        select new 
                AppID = g.Key,
                Status0Count = g.Sum(d=> d.DocStatus == 0 ? 1 : 0),
                Status1Count = g.Sum(d=>d.DocStatus == 1 ? 1 : 0)

        generates the following SQL

        -- Region Parameters
        DECLARE @p0 Int = 0
        DECLARE @p1 Int = 1
        DECLARE @p2 Int = 0
        DECLARE @p3 Int = 1
        DECLARE @p4 Int = 1
        DECLARE @p5 Int = 0
        -- EndRegion
        SELECT SUM(
                WHEN [t0].[DocStatus] = @p0 THEN @p1
                ELSE @p2
             END)) AS [Status0Count], SUM(
                WHEN [t0].[DocStatus] = @p3 THEN @p4
                ELSE @p5
             END)) AS [Status1Count], [t0].[AppID]
        FROM [DocsTable] AS [t0]
        GROUP BY [t0].[AppID]

        Several points
        1. I am using LINQPad to write this. I don’t even know what ORM LINQPad is using I just assume it is LINQ to SQL. It is entirely possible that Entity Framework or NHibernate generate better or worse SQL
        2. I am using the query syntax for LINQ. While this is surely convertible to normal method calls as soon as I wrote the two queries I realized that it would take me quite a bit of time to convert them to extension method syntax. Since Java lacks the query syntax and its fluent syntax is less expressive due to the lack of lambdas I kind of agree with your original suggestion that it will be better to write this as SQL. On the other hand isn’t HQL supposed to solve this problem?
        3. The sum example generates case statements instead of IFF. While it looks to me that they are semantically the same I don’t know if one of them is more efficient.
        4. I doubt it is possible to generate something as specific as PIVOT. To be honest I am not even sure what it does.

        1. Thank you very much for sharing this! Regarding your points:

          1. I don’t think that generated SQL is too bad from a performance perspective.

          2. Maybe Java 8’s Streams API and lambda expressions lend themselves to something comparable to LINQ. I’m hoping to have a relevant guest post on this subject pretty soon on this blog. It involves dynamic bytecode rewriting to transform lambda and other expressions into SQL – and hopefully also into jOOQ API calls.

          2. HQL can do a lot of things, agreed. But its syntax is less expressive than that of SQL, while it adds complexity on an ORM level. When expressing HQL / JPQL queries, you’re not actually querying your database tables, but you’re querying your mapped entities. Our argument here is that this will force you to (deeply) understand HQL and SQL, instead of just writing SQL for the calculation.

          3. I think that more sophisticated databases can transform IFF and CASE and other expressions into each other.

          4. PIVOT is quite neat. It transposes rows into columns. Currently, only Oracle and SQL Server support PIVOT.

  3. HI,
    thanks for nice and informative blog.I also favor SQL for complex queries, but people use ORM to make it generic(in case if we switch to different RDBMS). I just wanted to know upto how long we can use native sql using hibernate because all the databases have some difference in there support for SQL. I mean is there any standard SQL that will support all the database.

    1. I just wanted to know upto how long we can use native sql using hibernate because all the databases have some difference in there support for SQL.

      Good question! To my knowledge, HQL / JPQL make a guarantee that they can be transformed into any SQL dialect. However, in practice, things might be a bit more difficult as indicated in this blog post here:

      Also, HQL and JPQL are very limited in terms of query functionality, compared to SQL.

      jOOQ tackles the multi-RDBMS-support problem differently, by standardising the various SQL dialects into a single SQLesque DSL in Java. This allows for performing quite complex SQL transformation rules to accommodate most SQL databases, as indicated in this blog post:

      The whole jOOQ API is annotated with the @Support annotation. This can be used to see if a given SQL clause or function is supported in your SQL dialect. Take the bitwise AND operator, for instance. It is supported by (or emulated for) any of these SQL dialects:

      • CUBRID
      • DB2
      • Firebird
      • H2
      • HSQLDB
      • MariaDB
      • MySQL
      • Oracle
      • PostgreSQL
      • SQL Server
      • SQLite
      • Sybase Adaptive Server Enterprise
      • Sybase SQL Anywhare

      More SQL function compatibility examples can be seen here:

      Hope this helps,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.