Or: Move That Loop into the Server Already!
This article will illustrate the significance of something that I always thought to be common sense, but I keep seeing people getting this (very) wrong in their production systems. Chances are, in fact, that most applications out there suffer from this performance problem – and the fix is really easy.

Transferring data in bulk vs. transferring it row by row
This blog post is inspired by a recent Stack Overflow question asking how to call Oracle's DBMS_OUTPUT.GET_LINES from JDBC. The answer to that specific question can also be seen in a previous blog post here.
This time, I don't want to discuss that particular Oracle-specific technical problem, but the performance aspect of whether to call:

- DBMS_OUTPUT.GET_LINES: which allows for fetching a bulk of server output into an array
- DBMS_OUTPUT.GET_LINE: which fetches a single line of server output into a string

To find out, I've written a JDBC benchmark comparing the two GET_LINE[S] procedures (the full code can be seen here). The beef of the benchmark is this:
int max = 50;
long[] getLines = new long[max];
long[] getLine = new long[max];

try (Connection c = DriverManager.getConnection(url, properties);
     Statement s = c.createStatement()) {

    for (int warmup = 0; warmup < 2; warmup++) {
        for (int i = 0; i < max; i++) {
            s.executeUpdate("begin dbms_output.enable(); end;");

            String sql =
                  "begin "
                + "for i in 1 .. 100 loop "
                + "dbms_output.put_line('Message ' || i); "
                + "end loop; "
                + "end;";

            long t1 = System.nanoTime();
            logGetLines(c, 100, () -> s.executeUpdate(sql));
            long t2 = System.nanoTime();
            logGetLine(c, 100, () -> s.executeUpdate(sql));
            long t3 = System.nanoTime();

            s.executeUpdate("begin dbms_output.disable(); end;");

            if (warmup > 0) {
                getLines[i] = t2 - t1;
                getLine[i] = t3 - t2;
            }
        }
    }
}

System.out.println(LongStream.of(getLines).summaryStatistics());
System.out.println(LongStream.of(getLine).summaryStatistics());
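The helper methods logGetLines() and logGetLine() are not shown above (they're part of the full benchmark). Here's a minimal sketch of what they might look like – the SQLRunnable interface and the exact call syntax are my assumptions, not the benchmark's actual code; DBMSOUTPUT_LINESARRAY is the Oracle-supplied array type used in the earlier blog post:

// Hypothetical functional interface, so the benchmark lambda may throw
// SQLException (the real benchmark may solve this differently)
interface SQLRunnable {
    void run() throws SQLException;
}

// Bulk variant: run the PL/SQL block, then fetch up to "max" lines of
// server output in a single round trip
static void logGetLines(Connection c, int max, SQLRunnable run)
throws SQLException {
    run.run();
    try (CallableStatement call = c.prepareCall(
            "{ call dbms_output.get_lines(?, ?) }")) {
        call.registerOutParameter(1, Types.ARRAY, "DBMSOUTPUT_LINESARRAY");
        call.setInt(2, max);                    // numlines is IN OUT
        call.registerOutParameter(2, Types.INTEGER);
        call.execute();
        // The benchmark never consumes the array contents
    }
}

// Row-by-row variant: run the PL/SQL block, then fetch the lines with
// one prepare and one round trip each
static void logGetLine(Connection c, int max, SQLRunnable run)
throws SQLException {
    run.run();
    for (int i = 0; i < max; i++) {
        try (CallableStatement call = c.prepareCall(
                "{ call dbms_output.get_line(?, ?) }")) {
            call.registerOutParameter(1, Types.VARCHAR);
            call.registerOutParameter(2, Types.INTEGER);
            call.execute();
        }
    }
}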
- It contains a warmup loop whose first iteration doesn't contribute to our measurements (this is always a good idea)
- It runs the benchmarked logic 50 times
- It generates 100 DBMS_OUTPUT.PUT_LINE messages for each run in an anonymous PL/SQL loop …
- … and then fetches those 100 messages immediately with either 1 call to GET_LINES or 100 calls to GET_LINE
- Finally, all the stored execution times are aggregated and printed out conveniently with Java 8 Stream's summary statistics feature
Both approaches, using GET_LINES and GET_LINE respectively, are functionally equivalent, i.e. the only difference is performance. Again, the full benchmark can be seen here (it also benchmarks the effect of the JDBC fetchSize, which is none in this case).
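As an aside, the JDBC fetch size is the bulk-vs-row-by-row knob for result sets: it controls how many rows travel per network round trip. It's irrelevant here because these are procedure calls, not result-set fetches. A minimal sketch of setting it, using the hypothetical MESSAGES table and process() consumer that also appear in the sketches further below:

try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages")) {
    ps.setFetchSize(100); // transfer up to 100 rows per round trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            process(rs.getString(1)); // hypothetical consumer
    }
}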
The results are devastating:
{count=50, sum=  69120455, min= 1067521, average= 1382409.100000, max= 2454614}
{count=50, sum=2088201423, min=33737827, average=41764028.460000, max=64498375}

We're gaining a factor of 30x in this benchmark run on my machine. The actual results may vary depending on your hardware and software (e.g. I'm running Oracle 12cR2 in Docker), but regardless of the setup, the results are always very significant.
Note that caching the prepared statement in the client yields better results (see the gist), but nowhere near as good as moving the loop to the server.
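For context, "caching the prepared statement" means hoisting prepareCall() out of the loop, so the statement is parsed once and executed many times. A minimal sketch for the GET_LINE case, which also shows why it can't match GET_LINES: the per-iteration prepare is saved, but we still pay one execute round trip per line:

// Prepare once, execute many times: no re-prepare per line,
// but still one network round trip per call.execute()
try (CallableStatement call = c.prepareCall(
        "{ call dbms_output.get_line(?, ?) }")) {
    call.registerOutParameter(1, Types.VARCHAR);
    call.registerOutParameter(2, Types.INTEGER);
    for (int i = 0; i < 100; i++)
        call.execute();
}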
Does this mean that GET_LINE is slow?
When looking at benchmark results, we must always be very, very careful not to draw the wrong conclusions. There can be many reasons why benchmarks show differences in performance. One simple (albeit improbable) reason could be that the GET_LINE implementation simply sucks.
So, I’ve tried to re-implement this benchmark in pure PL/SQL. The full benchmark can be seen here. The beef of it is this:
FOR r IN 1..5 LOOP
  v_ts := SYSTIMESTAMP;

  FOR i IN 1..v_repeat LOOP
    m();
    v_i := v_max;
    dbms_output.get_lines(v_array, v_i);
  END LOOP;

  INSERT INTO results VALUES (1, (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  FOR i IN 1..v_repeat LOOP
    m();

    FOR j IN 1 .. v_max LOOP
      dbms_output.get_line(v_string, v_i);
    END LOOP;
  END LOOP;

  INSERT INTO results VALUES (2, (SYSTIMESTAMP - v_ts));
END LOOP;

Where the helper procedure m() generates the server output:

PROCEDURE m IS BEGIN
  FOR i IN 1 .. v_max LOOP
    dbms_output.put_line('Message ' || i);
  END LOOP;
END m;
The results are:

stmt  sum      avg       min      max
1     0.0609   0.01218   0.0073   0.0303
2     0.0333   0.00666   0.0063   0.007

This time, calling GET_LINE individually seems to have been 2x faster than the GET_LINES version. Again, it is important not to draw the wrong conclusions! The difference could be due to:

- GET_LINES allocating an additional array copy of the original lines, which resides in the PGA and might be costly
- GET_LINE possibly profiting from some additional optimisation, because we're never actually consuming the result in the benchmark

What we can conclude, though, is that there is clearly no degenerate implementation behind GET_LINE, so calling it is not inherently worse than calling GET_LINES.
Which brings us back to the JDBC calls
While guesswork and hypotheses are usually dangerous, in this case I'm certain of the reason why the JDBC-based approach shows such drastic differences: just watch this excellent talk by Toon Koppelaars from Oracle, "NoPLSql and Thick Database Approaches with Toon Koppelaars", where he explains this with some impressive flame graphs.
The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:
- JVM overhead
- JDBC logic
- Network overhead
- Various “outer” layers inside the Oracle database
- Oracle’s API layers to get into the SQL and PL/SQL execution engines
- The actual code running in the PL/SQL engine
All of these layers have to be traversed on every single round trip, so there are essentially two ways to cut that cost:

- The super hard way: change and/or tweak the API technology, e.g. in Oracle's case, using the C/OCI bindings can be much faster than JDBC
- The easy way: just move some data collection logic into the database and fetch data in bulk (see the sketch below)
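To make "the easy way" concrete beyond the DBMS_OUTPUT example, here's a minimal JDBC sketch. The MESSAGES(ID, TEXT) table, the ids array, and the process() consumer are hypothetical names for illustration; c is a Connection as in the benchmark above:

// Slow-by-slow: one network round trip per ID
try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages WHERE id = ?")) {
    for (int id : ids) {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next())
                process(rs.getString(1));
        }
    }
}

// Bulk: one round trip for all IDs with a single set-based query
// (mind Oracle's 1000-element IN list limit for larger sets)
String placeholders = String.join(", ",
    Collections.nCopies(ids.length, "?"));
try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages WHERE id IN (" + placeholders + ")")) {
    for (int i = 0; i < ids.length; i++)
        ps.setInt(i + 1, ids[i]);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            process(rs.getString(1));
    }
}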
Fetch (or send) data in bulk… it's almost always faster than processing things row by row (or, as the Oracle folks call it, "slow-by-slow"). And it does not matter at all whether that database logic is written in SQL or in a procedural language. The point is that accessing objects over the network (any network) is expensive, so you should minimise the access calls whenever possible.

As I've tweeted recently: when calling logic over the network (any network), we should move logic to the data, not data to the logic. When working with RDBMS, we do this through SQL (preferably), or, if SQL doesn't suffice, we resort to stored procedures. When working with HTTP, we do this with – well, it doesn't have a name, but we should prefer making few physical HTTP calls that each aggregate several logical API calls, in order to transfer a lot of data in bulk. When working with "map reduce" or "serverless" etc. technology, we call these "functions" or "lambdas", which are just fancy, more modern names for stored procedures.
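The same principle applies in the write direction. A sketch using the standard JDBC batch API, again with the hypothetical MESSAGES table (whether the whole batch travels in a single round trip depends on the driver):

// Slow-by-slow: one round trip per inserted row
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    for (int i = 0; i < 100; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.executeUpdate();
    }
}

// Bulk: collect the rows client-side, send them as one batch
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    for (int i = 0; i < 100; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.addBatch();
    }
    ps.executeBatch();
}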
Conclusion
I'm not an expert in how to design complex distributed systems. This stuff is really hard. But I've spent many years working with RDBMS, which are also, in a very simple way, distributed systems (data on one server, client logic on another one). A very significant amount of performance problems with RDBMS is related to the simple fact of clients making way too many calls to the database for what could be implemented in a single SQL query. Within the database, once your logic has reached the kernel, stuff gets executed really, really fast. Adding more logic to a query causes far less trouble than adding more queries.

Does your application take these things into account? Especially if you're using an ORM that generates the SQL for you: does it generate the right amount of queries, or are you suffering from "N+1 problems" (see the sketch at the end of this section)? The main reason why ORM-based systems tend to be slow is that developers are not aware of the SQL their ORMs generate, e.g. due to excessive lazy loading, when a single SQL (or JPQL) query would have been a much better choice. It's not the ORM's fault. Most ORMs can get this right. It's the developer's responsibility to think about where the logic should be executed.

The rule of thumb is: if you're looping in the client to fetch individual things from the same server, you're doing it wrong. Move the loop into the server. And by doing so, you've written your first (gasp) "stored procedure". You'll write many more, and you'll love it, once you realise how much speed you're gaining.
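As referenced above, here's the N+1 shape spelled out in plain JDBC terms – AUTHOR and BOOK are hypothetical tables, and process() a hypothetical consumer; an ORM with lazy loading produces the first shape behind your back:

// N+1: 1 query for the authors, then 1 query per author for the books
try (Statement s = c.createStatement();
     ResultSet authors = s.executeQuery("SELECT id, name FROM author")) {
    while (authors.next()) {
        try (PreparedStatement ps = c.prepareStatement(
                "SELECT title FROM book WHERE author_id = ?")) {
            ps.setInt(1, authors.getInt(1));
            try (ResultSet books = ps.executeQuery()) {
                while (books.next())
                    process(authors.getString(2), books.getString(1));
            }
        }
    }
}

// 1: a single join moves the loop into the server
try (Statement s = c.createStatement();
     ResultSet rs = s.executeQuery(
         "SELECT a.name, b.title FROM author a "
       + "JOIN book b ON b.author_id = a.id")) {
    while (rs.next())
        process(rs.getString(1), rs.getString(2));
}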
Update: Some criticism from the reddit discussion of this article

/u/cogman10 made good points in his comment warning about batching "too big" workloads, which is perfectly correct when batching write-heavy tasks. Large batches may increase the contention inside of your database. If you're working in an MVCC environment (such as Oracle), having transactions that span millions of updates will put a lot of pressure on the UNDO/REDO log, and all other sessions reading from the same data will feel that pain.

Also, when working with HTTP, beware of the fact that batches are harder to cache than individual requests. This article made the assumption that HTTP requests are:

- Authorised – i.e. caching doesn't even make sense, as the authorisation might be revoked or the session terminated
- Operating on similar resources, e.g. fetching a bulk of IDs in one go might be more sensible than fetching each ID individually
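Following up on /u/cogman10's point: a common middle ground is to keep batching, but in bounded chunks, so that no single transaction spans millions of rows. A sketch under the same hypothetical MESSAGES table; the chunk size and rowCount are illustrative, and the right size is workload-specific:

final int CHUNK = 1_000; // illustrative; tune for your workload
c.setAutoCommit(false);
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    int pending = 0;
    for (int i = 0; i < rowCount; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.addBatch();
        // Flush and commit periodically to bound UNDO/REDO pressure
        if (++pending == CHUNK) {
            ps.executeBatch();
            c.commit();
            pending = 0;
        }
    }
    if (pending > 0) {
        ps.executeBatch();
        c.commit();
    }
}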
Great article,
This is a common issue I've been seeing in almost all projects I've passed through. In fact, I've done it myself sometimes, when I was completely sure about the amount of data I was going to handle, but always paying attention to this kind of issue! Usually, I favor sending/retrieving lots of data in bulk.

And as you well said, one of the top performance problems with most ORMs (Hibernate, for example) is the N+1 select!
Thanks, Lukas.
To me, perf problems mostly occur with ORMs that hide the SQL queries from the developer (resulting in too many calls to the DB, with the famous N+1 select, or with multiple inserts/updates/deletes in a client-side loop).

However, I don't think that moving logic into the DB is a good idea (harder to deploy, harder to test, arguably harder to develop).

Replacing ORMs with "handcrafted" SQL (via query builders), and replacing loops of inserts/updates/deletes with an array-bound query, are usually enough in my experience (the remaining 20% needs schema changes, denormalisation, tricks on the DB, …).

Also, most developers don't know powerful SQL constructs like "merge into" or analytic queries (lead/lag especially), and don't even know how to read an explain plan.

If you still have a client-side loop after all that, then yes, you can write a stored procedure… but a fairy probably dies somewhere 🙂

That was my 2 cents, thanks for the good post!
Ehm…:

"However, I don't think that moving logic into the DB is a good idea"

Yet, and then…:

"Replacing ORMs with "handcrafted" SQL (via query builders), and replacing loops of inserts/updates/deletes with an array-bound query"

How is writing SQL not moving logic into the database? First class "moving logic into the database" right there!
By "moving logic into the database", I understood that you were talking about stored procedures only.

Quote: "And by doing so, you've written your first (gasp) "stored procedure". You'll write many more, and you'll love it, once you realise how much speed you're gaining."

To me, writing correct SQL is not "moving logic into the database". Writing a stored procedure is dispatching logic between the app and the DB, and this is why I think it is not a good idea to use them.
Why does it matter to you whether the logic is written in a 4GL (SQL) or a 3GL (PL/SQL et al.)? The point is, the logic is executed on the server, regardless of the language.

The last paragraph that you've quoted is a metaphor, indicating that it doesn't matter what technology you're using: whenever you're executing logic on the server, you're running a "stored procedure" (hence the double quotes).
"The point is, the logic is executed on the server, regardless of the language."

=> To me, the point is that apps consist of multiple servers, so even a server is a client of another server.

So what matters in the end is that the business logic is written, as much as possible, in a technology that is easy to develop, deploy, and maintain.

Correct SQL in a server-side language is such a technology.

PL/SQL et al. are no such technologies.

So writing PL/SQL is kind of a last-resort option: yes, it is surely faster in many situations (and some situations require it), but it is so hard to develop and maintain that I keep it to a minimum.
Have you read the article? :-) I’m saying the same thing.
That’s dogmatic, and thus not very helpful. My experience totally doesn’t match yours.