Or: Move That Loop into the Server Already!
This article will illustrate the significance of something that I always thought to be common sense, but I keep seeing people getting this (very) wrong in their production systems. Chances are, in fact, that most applications out there suffer from this performance problem – and the fix is really easy.

Transferring data in bulk vs. transferring it row by row
This blog post is inspired by a recent Stack Overflow question asking how to call Oracle's DBMS_OUTPUT.GET_LINES from JDBC. The answer to that specific question can also be seen in a previous blog post here.
This time, I don't want to discuss that particular Oracle-specific technical problem, but the performance aspect of whether to call:

- DBMS_OUTPUT.GET_LINES: which allows for fetching a bulk of server output into an array
- DBMS_OUTPUT.GET_LINE: which fetches a single line of server output into a string

To find out, I've written a JDBC benchmark comparing the two GET_LINE[S] procedures (the full code can be seen here). The beef of the benchmark is this:
int max = 50;
long[] getLines = new long[max];
long[] getLine = new long[max];

try (Connection c = DriverManager.getConnection(url, properties);
     Statement s = c.createStatement()) {

    for (int warmup = 0; warmup < 2; warmup++) {
        for (int i = 0; i < max; i++) {
            s.executeUpdate("begin dbms_output.enable(); end;");

            String sql =
                  "begin "
                + "for i in 1 .. 100 loop "
                + "dbms_output.put_line('Message ' || i); "
                + "end loop; "
                + "end;";

            long t1 = System.nanoTime();
            logGetLines(c, 100, () -> s.executeUpdate(sql));
            long t2 = System.nanoTime();
            logGetLine(c, 100, () -> s.executeUpdate(sql));
            long t3 = System.nanoTime();

            s.executeUpdate("begin dbms_output.disable(); end;");

            if (warmup > 0) {
                getLines[i] = t2 - t1;
                getLine[i] = t3 - t2;
            }
        }
    }
}

System.out.println(LongStream.of(getLines).summaryStatistics());
System.out.println(LongStream.of(getLine).summaryStatistics());
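The helper methods logGetLines() and logGetLine() are not shown above (they're part of the full benchmark). Here's a minimal sketch of what they might look like – the SQLRunnable interface and the exact call syntax are my assumptions, not the benchmark's actual code; DBMSOUTPUT_LINESARRAY is the Oracle-supplied array type used in the earlier blog post:

// Hypothetical functional interface, so the benchmark lambda may throw
// SQLException (the real benchmark may solve this differently)
interface SQLRunnable {
    void run() throws SQLException;
}

// Bulk variant: run the PL/SQL block, then fetch up to "max" lines of
// server output in a single round trip
static void logGetLines(Connection c, int max, SQLRunnable run)
throws SQLException {
    run.run();
    try (CallableStatement call = c.prepareCall(
            "{ call dbms_output.get_lines(?, ?) }")) {
        call.registerOutParameter(1, Types.ARRAY, "DBMSOUTPUT_LINESARRAY");
        call.setInt(2, max);                    // numlines is IN OUT
        call.registerOutParameter(2, Types.INTEGER);
        call.execute();
        // The benchmark never consumes the array contents
    }
}

// Row-by-row variant: run the PL/SQL block, then fetch the lines with
// one prepare and one round trip each
static void logGetLine(Connection c, int max, SQLRunnable run)
throws SQLException {
    run.run();
    for (int i = 0; i < max; i++) {
        try (CallableStatement call = c.prepareCall(
                "{ call dbms_output.get_line(?, ?) }")) {
            call.registerOutParameter(1, Types.VARCHAR);
            call.registerOutParameter(2, Types.INTEGER);
            call.execute();
        }
    }
}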
- It contains a warmup loop whose first iteration doesn't contribute to our measurements (this is always a good idea)
- It runs the benchmarked logic 50 times
- It generates 100 DBMS_OUTPUT.PUT_LINE messages for each run in an anonymous PL/SQL loop …
- … and then fetches those 100 messages immediately with either 1 call to GET_LINES or 100 calls to GET_LINE
- Finally, all the stored execution times are aggregated and printed out conveniently with Java 8 Stream's summary statistics feature
Both approaches, using GET_LINES and GET_LINE respectively, are functionally equivalent, i.e. the only difference is performance. Again, the full benchmark can be seen here (it also benchmarks the effect of the JDBC fetchSize, which is none in this case).
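As an aside, the JDBC fetch size is the bulk-vs-row-by-row knob for result sets: it controls how many rows travel per network round trip. It's irrelevant here because these are procedure calls, not result-set fetches. A minimal sketch of setting it, using the hypothetical MESSAGES table and process() consumer that also appear in the sketches further below:

try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages")) {
    ps.setFetchSize(100); // transfer up to 100 rows per round trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            process(rs.getString(1)); // hypothetical consumer
    }
}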
The results are devastating:
{count=50, sum=  69120455, min= 1067521, average= 1382409.100000, max= 2454614}
{count=50, sum=2088201423, min=33737827, average=41764028.460000, max=64498375}

We're gaining a factor of 30x in this benchmark run on my machine. The actual results may vary depending on your hardware and software (e.g. I'm running Oracle 12cR2 in Docker), but regardless of the setup, the results are always very significant.
Note that caching the prepared statement in the client yields better results (see the gist), but nowhere near as good as moving the loop to the server.
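For context, "caching the prepared statement" means hoisting prepareCall() out of the loop, so the statement is parsed once and executed many times. A minimal sketch for the GET_LINE case, which also shows why it can't match GET_LINES: the per-iteration prepare is saved, but we still pay one execute round trip per line:

// Prepare once, execute many times: no re-prepare per line,
// but still one network round trip per call.execute()
try (CallableStatement call = c.prepareCall(
        "{ call dbms_output.get_line(?, ?) }")) {
    call.registerOutParameter(1, Types.VARCHAR);
    call.registerOutParameter(2, Types.INTEGER);
    for (int i = 0; i < 100; i++)
        call.execute();
}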
Does this mean that GET_LINE is slow?
When looking at benchmark results, we must always be very, very careful not to draw the wrong conclusions. There can be many reasons why benchmarks show differences in performance. One simple (albeit improbable) reason could be that the GET_LINE implementation simply sucks.
So, I’ve tried to re-implement this benchmark in pure PL/SQL. The full benchmark can be seen here. The beef of it is this:
FOR r IN 1..5 LOOP
  v_ts := SYSTIMESTAMP;

  FOR i IN 1..v_repeat LOOP
    m();
    v_i := v_max;
    dbms_output.get_lines(v_array, v_i);
  END LOOP;

  INSERT INTO results VALUES (1, (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;

  FOR i IN 1..v_repeat LOOP
    m();

    FOR j IN 1 .. v_max LOOP
      dbms_output.get_line(v_string, v_i);
    END LOOP;
  END LOOP;

  INSERT INTO results VALUES (2, (SYSTIMESTAMP - v_ts));
END LOOP;

Where the helper procedure m() generates the server output:

PROCEDURE m IS BEGIN
  FOR i IN 1 .. v_max LOOP
    dbms_output.put_line('Message ' || i);
  END LOOP;
END m;
The results are:

stmt  sum      avg       min      max
1     0.0609   0.01218   0.0073   0.0303
2     0.0333   0.00666   0.0063   0.007

This time, calling GET_LINE individually seems to have been 2x faster than the GET_LINES version. Again, it is important not to draw the wrong conclusions! The difference could be due to:

- GET_LINES allocating an additional array copy of the original lines, which resides in the PGA and might be costly
- GET_LINE possibly profiting from some additional optimisation, because we're never actually consuming the result in the benchmark

What we can conclude, though, is that there is clearly no degenerate implementation behind GET_LINE, so calling it is not inherently worse than calling GET_LINES.
Which brings us back to the JDBC calls
While guesswork and hypotheses are usually dangerous, in this case I'm certain of the reason why the JDBC-based approach shows such drastic differences: just watch this excellent talk by Toon Koppelaars from Oracle, "NoPLSql and Thick Database Approaches with Toon Koppelaars", where he explains this with some impressive flame graphs.
The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:
- JVM overhead
- JDBC logic
- Network overhead
- Various “outer” layers inside the Oracle database
- Oracle’s API layers to get into the SQL and PL/SQL execution engines
- The actual code running in the PL/SQL engine
All of these layers have to be traversed on every single round trip, so there are essentially two ways to cut that cost:

- The super hard way: change and/or tweak the API technology, e.g. in Oracle's case, using the C/OCI bindings can be much faster than JDBC
- The easy way: just move some data collection logic into the database and fetch data in bulk (see the sketch below)
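To make "the easy way" concrete beyond the DBMS_OUTPUT example, here's a minimal JDBC sketch. The MESSAGES(ID, TEXT) table, the ids array, and the process() consumer are hypothetical names for illustration; c is a Connection as in the benchmark above:

// Slow-by-slow: one network round trip per ID
try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages WHERE id = ?")) {
    for (int id : ids) {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next())
                process(rs.getString(1));
        }
    }
}

// Bulk: one round trip for all IDs with a single set-based query
// (mind Oracle's 1000-element IN list limit for larger sets)
String placeholders = String.join(", ",
    Collections.nCopies(ids.length, "?"));
try (PreparedStatement ps = c.prepareStatement(
        "SELECT text FROM messages WHERE id IN (" + placeholders + ")")) {
    for (int i = 0; i < ids.length; i++)
        ps.setInt(i + 1, ids[i]);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            process(rs.getString(1));
    }
}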
Fetch (or send) data in bulk… it's almost always faster than processing things row by row (or, as the Oracle folks call it, "slow-by-slow"). And it does not matter at all whether that database logic is written in SQL or in a procedural language. The point is that accessing objects over the network (any network) is expensive, so you should minimise the access calls whenever possible.

As I've tweeted recently: when calling logic over the network (any network), we should move logic to the data, not data to the logic. When working with RDBMS, we do this through SQL (preferably), or, if SQL doesn't suffice, we resort to stored procedures. When working with HTTP, we do this with – well, it doesn't have a name, but we should prefer making few physical HTTP calls that each aggregate several logical API calls, in order to transfer a lot of data in bulk. When working with "map reduce" or "serverless" etc. technology, we call these "functions" or "lambdas", which are just fancy, more modern names for stored procedures.
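The same principle applies in the write direction. A sketch using the standard JDBC batch API, again with the hypothetical MESSAGES table (whether the whole batch travels in a single round trip depends on the driver):

// Slow-by-slow: one round trip per inserted row
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    for (int i = 0; i < 100; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.executeUpdate();
    }
}

// Bulk: collect the rows client-side, send them as one batch
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    for (int i = 0; i < 100; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.addBatch();
    }
    ps.executeBatch();
}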
Conclusion
I'm not an expert in how to design complex distributed systems. This stuff is really hard. But I've spent many years working with RDBMS, which are also, in a very simple way, distributed systems (data on one server, client logic on another one). A very significant amount of performance problems with RDBMS is related to the simple fact of clients making way too many calls to the database for what could be implemented in a single SQL query. Within the database, once your logic has reached the kernel, stuff gets executed really, really fast. Adding more logic to a query causes far less trouble than adding more queries.

Does your application take these things into account? Especially if you're using an ORM that generates the SQL for you: does it generate the right amount of queries, or are you suffering from "N+1 problems" (see the sketch at the end of this section)? The main reason why ORM-based systems tend to be slow is that developers are not aware of the SQL their ORMs generate, e.g. due to excessive lazy loading, when a single SQL (or JPQL) query would have been a much better choice. It's not the ORM's fault. Most ORMs can get this right. It's the developer's responsibility to think about where the logic should be executed.

The rule of thumb is: if you're looping in the client to fetch individual things from the same server, you're doing it wrong. Move the loop into the server. And by doing so, you've written your first (gasp) "stored procedure". You'll write many more, and you'll love it, once you realise how much speed you're gaining.
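As referenced above, here's the N+1 shape spelled out in plain JDBC terms – AUTHOR and BOOK are hypothetical tables, and process() a hypothetical consumer; an ORM with lazy loading produces the first shape behind your back:

// N+1: 1 query for the authors, then 1 query per author for the books
try (Statement s = c.createStatement();
     ResultSet authors = s.executeQuery("SELECT id, name FROM author")) {
    while (authors.next()) {
        try (PreparedStatement ps = c.prepareStatement(
                "SELECT title FROM book WHERE author_id = ?")) {
            ps.setInt(1, authors.getInt(1));
            try (ResultSet books = ps.executeQuery()) {
                while (books.next())
                    process(authors.getString(2), books.getString(1));
            }
        }
    }
}

// 1: a single join moves the loop into the server
try (Statement s = c.createStatement();
     ResultSet rs = s.executeQuery(
         "SELECT a.name, b.title FROM author a "
       + "JOIN book b ON b.author_id = a.id")) {
    while (rs.next())
        process(rs.getString(1), rs.getString(2));
}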
Update: Some criticism from the reddit discussion of this article

/u/cogman10 made good points in his comment warning about batching "too big" workloads, which is perfectly correct when batching write-heavy tasks. Large batches may increase the contention inside of your database. If you're working in an MVCC environment (such as Oracle), having transactions that span millions of updates will put a lot of pressure on the UNDO/REDO log, and all other sessions reading from the same data will feel that pain.

Also, when working with HTTP, beware of the fact that batches are harder to cache than individual requests. This article made the assumption that HTTP requests are:

- Authorised – i.e. caching doesn't even make sense, as the authorisation might be revoked or the session terminated
- Operating on similar resources, e.g. fetching a bulk of IDs in one go might be more sensible than fetching each ID individually
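Following up on /u/cogman10's point: a common middle ground is to keep batching, but in bounded chunks, so that no single transaction spans millions of rows. A sketch under the same hypothetical MESSAGES table; the chunk size and rowCount are illustrative, and the right size is workload-specific:

final int CHUNK = 1_000; // illustrative; tune for your workload
c.setAutoCommit(false);
try (PreparedStatement ps = c.prepareStatement(
        "INSERT INTO messages (id, text) VALUES (?, ?)")) {
    int pending = 0;
    for (int i = 0; i < rowCount; i++) {
        ps.setInt(1, i);
        ps.setString(2, "Message " + i);
        ps.addBatch();
        // Flush and commit periodically to bound UNDO/REDO pressure
        if (++pending == CHUNK) {
            ps.executeBatch();
            c.commit();
            pending = 0;
        }
    }
    if (pending > 0) {
        ps.executeBatch();
        c.commit();
    }
}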
Great article,
This is a common issue I've been seeing in almost all projects I've passed through. In fact, I've done it myself sometimes, when I was completely sure about the amount of data I was going to handle, but always paying attention to this kind of issue! Usually, I favor sending/retrieving lots of data in bulk.

And as you well said, one of the top performance problems with most ORMs (Hibernate, for example) is the N+1 select!
Thanks, Lukas.
To me, perf problems mostly occur with ORMs that hide the SQL queries from the developer (resulting in too many calls to the DB, with the famous N+1 select, or with multiple inserts/updates/deletes in a client-side loop).

However, I don't think that moving logic into the DB is a good idea (harder to deploy, harder to test, arguably harder to develop).

Replacing ORMs with "handcrafted" SQL (via query builders), and replacing loops of inserts/updates/deletes with an array-bound query, are usually enough in my experience (the remaining 20% needs schema changes, denormalisation, tricks on the DB, …).

Also, most developers don't know powerful SQL constructs like "merge into" or analytic queries (lead/lag especially), and don't even know how to read an explain plan.

If you still have a client-side loop after all that, then yes, you can write a stored procedure… but a fairy probably dies somewhere 🙂

That was my 2 cents, thanks for the good post!
Ehm…:

"However, I don't think that moving logic into the DB is a good idea"

Yet, and then…:

"Replacing ORMs with "handcrafted" SQL (via query builders), and replacing loops of inserts/updates/deletes with an array-bound query"

How is writing SQL not moving logic into the database? First class "moving logic into the database" right there!
By "moving logic into the database", I understood that you were talking about stored procedures only.

Quote: "And by doing so, you've written your first (gasp) "stored procedure". You'll write many more, and you'll love it, once you realise how much speed you're gaining."

To me, writing correct SQL is not "moving logic into the database". Writing a stored procedure is dispatching logic between the app and the DB, and this is why I think it is not a good idea to use them.
Why does it matter to you whether the logic is written in a 4GL (SQL) or a 3GL (PL/SQL et al.)? The point is, the logic is executed on the server, regardless of the language.

The last paragraph that you've quoted is a metaphor, indicating that it doesn't matter what technology you're using: whenever you're executing logic on the server, you're running a "stored procedure" (hence the double quotes).
"The point is, the logic is executed on the server, regardless of the language."

=> To me, the point is that apps consist of multiple servers, so even a server is a client of another server.

So what matters in the end is that the business logic is written, as much as possible, in a technology that is easy to develop, deploy, and maintain.

Correct SQL in a server-side language is such a technology.

PL/SQL et al. are no such technologies.

So writing PL/SQL is kind of a last-resort option: yes, it is surely faster in many situations (and some situations require it), but it is so hard to develop and maintain that I keep it to a minimum.
Have you read the article? :-) I’m saying the same thing.
That’s dogmatic, and thus not very helpful. My experience totally doesn’t match yours.