Please, Run That Calculation in Your RDBMS


There’s one thing that you can do terribly wrong when working with RDBMS. And that thing is not running your calculations in the database, when you should.

We’re not advocating to blindly move all business logic into the database, but when I see a Stack Overflow question like this, I feel the urge to gently remind you of the second item in our popular 10 Common Mistakes Java Developers Make When Writing SQL.

Gently reminding you of the "right way"

Gently reminding you of the second item.
Some background on this illustration here (in German).tweet this

The Stack Overflow question essentially boils down to this (liberally quoted):

From the following medium-sized table, I wish to count the number of documents with status 0 or 1 per application ID:

AppID | DocID | DocStatus 
------+-------+----------
1     | 100   | 0
1     | 101   | 1    
2     | 200   | 0    
2     | 300   | 1
...   | ...   | ...

Should I use Hibernate for that?

And the answer: NO! Don’t use Hibernate for that (unless you mean native querying). You should use SQL for that. Es-Queue-El! You have so many trivial options to make your SQL Server help you run this query in a fraction of the time it would take if you loaded all that data into Java memory before aggregating!

For instance (using SQL Server):

Using GROUP BY

This is the most trivial one, but it might not return result in exactly the way you wanted, i.e. different aggregation results are in different rows:

SELECT [AppID], [DocStatus], count(*)
FROM [MyTable]
GROUP BY [AppID], [DocStatus]

Example on SQLFiddle, returning something like

| APPID | DOCSTATUS | COLUMN_2 |
|-------|-----------|----------|
|     1 |         0 |        2 |
|     2 |         0 |        3 |
|     1 |         1 |        3 |
|     2 |         1 |        2 |

Using nested selects

This is probably the solution that this particular user was looking for. They probably want each aggregation in a separate column, and one very generic way to achieve this is by using nested selects. Note that this solution might prove to be a bit slow in some databases that have a hard time optimising these things

SELECT [AppID],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 0) [Status_0],
       (SELECT count(*) FROM [MyTable] [t2]
        WHERE [t1].[AppID] = [t2].[AppID]
        AND [DocStatus] = 1) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, returning something like

| APPID | STATUS_0 | STATUS_1 |
|-------|----------|----------|
|     1 |        2 |        3 |
|     2 |        3 |        2 |

Using SUM()

This solution is probably the optimal one. It is equivalent to the previous one with nested selects, although it only works for simple queries, whereas the nested selects version is more versatile.

SELECT [AppID],
       SUM(IIF([DocStatus] = 0, 1, 0)) [Status_0],
       SUM(IIF([DocStatus] = 1, 1, 0)) [Status_1]
FROM [MyTable] [t1]
GROUP BY [AppID]

Example on SQLFiddle, same result as before

Using PIVOT

This solution is for the SQL Aficionados among yourselves. It uses the T-SQL PIVOT clause!

SELECT [AppID], [0], [1]
FROM (
    SELECT [AppID], [DocStatus]
    FROM [MyTable]
) [t]
PIVOT (
    count([DocStatus]) 
    FOR [DocStatus] 
    IN ([0], [1])
) [pvt]

SQL aficionados use PIVOT tweet this

Example on SQLFiddle, same result as before

Conclusion

You may freely choose your weapon among the above suggestions, and I’m sure there are more alternatives. All of them will outperform any Java-based aggregation implementation by orders of magnitude, even for trivially small data sets for sure. We’ll say this time and again, and we’ll quote Gavin King time and again for the same thing:

Just because you’re using Hibernate, doesn’t mean you have to use it for everything. A point I’ve been making for about ten years now.

And in our words:

Use SQL whenever appropriate! And that is much more often than you might think!

Tags: , , ,

8 responses to “Please, Run That Calculation in Your RDBMS”

  1. Mladen Adamovic says :

    BTW, are you aware that mixing hibernate/JPA/JDO and update/delete SQL queries might be quite complicated due to cache invalidation? you never mentioned that in your posts about “no hibernate”.

    cache invalidating is much simpler in fjorm and I guess in jooq as well since you are doing SQL translator which is much better than bloated hibernate (yet another hibernate hater here).

    • lukaseder says :

      yet another hibernate hater here

      We’re no Hibernate haters. We just advocate that you should know your tools. Unfortunately, Hibernate’s cache is used too naively by too many developers…

  2. Stilgar says :

    So can’t you generate queries like these via Hibernate? I don’t have experience with Java ORMs but in the .NET world the ORMs have no problem generating SQL like this one and not pulling everything in memory to do a count.

    • lukaseder says :

      You probably can do some simple analytics with Hibernate / HQL. It’s certainly possible for the GROUP BY query.

      Could you show an example for the other cases?

      • g2-b6451444b49d128e41971ce1dffa9600 says :

        Here is what the nested selects and sum queries look like using LINQ

        from doc in DocsTables
        group doc by doc.AppID into g
        select new 
            {
                AppID = g.Key,
                Status0Count = g.Count(d=> d.DocStatus == 0),
                Status1Count = g.Count(d=>d.DocStatus == 1)
            }
        

        Generates the following SQL:

        -- Region Parameters
        DECLARE @p0 Int = 0
        DECLARE @p1 Int = 1
        -- EndRegion
        SELECT [t1].[AppID], (
            SELECT COUNT(*)
            FROM [DocsTable] AS [t2]
            WHERE ([t2].[DocStatus] = @p0) AND ([t1].[AppID] = [t2].[AppID])
            ) AS [Status0Count], (
            SELECT COUNT(*)
            FROM [DocsTable] AS [t3]
            WHERE ([t3].[DocStatus] = @p1) AND ([t1].[AppID] = [t3].[AppID])
            ) AS [Status1Count]
        FROM (
            SELECT [t0].[AppID]
            FROM [DocsTable] AS [t0]
            GROUP BY [t0].[AppID]
            ) AS [t1]
        
        from doc in DocsTables
        group doc by doc.AppID into g
        select new 
            {
                AppID = g.Key,
                Status0Count = g.Sum(d=> d.DocStatus == 0 ? 1 : 0),
                Status1Count = g.Sum(d=>d.DocStatus == 1 ? 1 : 0)
            }
        

        generates the following SQL

        -- Region Parameters
        DECLARE @p0 Int = 0
        DECLARE @p1 Int = 1
        DECLARE @p2 Int = 0
        DECLARE @p3 Int = 1
        DECLARE @p4 Int = 1
        DECLARE @p5 Int = 0
        -- EndRegion
        SELECT SUM(
            (CASE 
                WHEN [t0].[DocStatus] = @p0 THEN @p1
                ELSE @p2
             END)) AS [Status0Count], SUM(
            (CASE 
                WHEN [t0].[DocStatus] = @p3 THEN @p4
                ELSE @p5
             END)) AS [Status1Count], [t0].[AppID]
        FROM [DocsTable] AS [t0]
        GROUP BY [t0].[AppID]
        

        Several points
        1. I am using LINQPad to write this. I don’t even know what ORM LINQPad is using I just assume it is LINQ to SQL. It is entirely possible that Entity Framework or NHibernate generate better or worse SQL
        2. I am using the query syntax for LINQ. While this is surely convertible to normal method calls as soon as I wrote the two queries I realized that it would take me quite a bit of time to convert them to extension method syntax. Since Java lacks the query syntax and its fluent syntax is less expressive due to the lack of lambdas I kind of agree with your original suggestion that it will be better to write this as SQL. On the other hand isn’t HQL supposed to solve this problem?
        3. The sum example generates case statements instead of IFF. While it looks to me that they are semantically the same I don’t know if one of them is more efficient.
        4. I doubt it is possible to generate something as specific as PIVOT. To be honest I am not even sure what it does.

        • lukaseder says :

          Thank you very much for sharing this! Regarding your points:

          1. I don’t think that generated SQL is too bad from a performance perspective.

          2. Maybe Java 8′s Streams API and lambda expressions lend themselves to something comparable to LINQ. I’m hoping to have a relevant guest post on this subject pretty soon on this blog. It involves dynamic bytecode rewriting to transform lambda and other expressions into SQL – and hopefully also into jOOQ API calls.

          2. HQL can do a lot of things, agreed. But its syntax is less expressive than that of SQL, while it adds complexity on an ORM level. When expressing HQL / JPQL queries, you’re not actually querying your database tables, but you’re querying your mapped entities. Our argument here is that this will force you to (deeply) understand HQL and SQL, instead of just writing SQL for the calculation.

          3. I think that more sophisticated databases can transform IFF and CASE and other expressions into each other.

          4. PIVOT is quite neat. It transposes rows into columns. Currently, only Oracle and SQL Server support PIVOT.

  3. khushnood Abbas says :

    HI,
    thanks for nice and informative blog.I also favor SQL for complex queries, but people use ORM to make it generic(in case if we switch to different RDBMS). I just wanted to know upto how long we can use native sql using hibernate because all the databases have some difference in there support for SQL. I mean is there any standard SQL that will support all the database.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 2,083 other followers

%d bloggers like this: