SQL is full of myths

SQL is a 4GL (fourth-generation programming language) and as such has always been a cool, convenient way to express data-related constraints and queries. But the declarative nature of the language also often means that programmers are really looking into a crystal ball. A lot of people have blogged about a lot of half-true discoveries that might have been correct in some context and at some point in time (this blog is no exception). For instance:
- Are correlated subqueries slower than their LEFT JOIN equivalents?
- Are derived tables faster than views or common table expressions?
Measure your queries

To bust a myth, if you have good reasons to think that a differently written but semantically equivalent query might be faster (on your database), you should measure. Don’t even trust any execution plan, because ultimately, what really counts is the wall clock time in your production system. If you can measure your queries in production, that’s perfect. But often you cannot – and you don’t always have to. One way to compare two queries with each other is to benchmark them by executing each query hundreds or even thousands of times in a row. Like any technique, benchmarking has pros and cons. Here is a non-exhaustive list:

Pros
- Easy to do (see examples below)
- Easy to reproduce, also on different environments
- Easy to quickly get an idea in terms of orders of magnitude difference
Cons

- Not actually measuring production situations (no one runs the same query thousands of times in a row, without any other queries in parallel)
- Queries may benefit from unrealistic caching due to heavy repetition
- The “real query” might be dynamic, so the “same query” might really manifest itself in dozens of different production queries
Let’s bust the first myth above: is a correlated subquery really slower than its LEFT JOIN equivalent for a given query? Note my using italics here, because even if you find out it’s slower for that given query, it might be faster for other queries. Never jump to generalised rules before measuring again! (More info and scripts about benchmarks here.) For instance, consider these two equivalent queries that run on the Sakila database. Both versions try to find those actors whose last name starts with the letter A and count their corresponding films:

LEFT JOIN
SELECT first_name, last_name, count(fa.actor_id) AS c
FROM actor a
LEFT JOIN film_actor fa
ON a.actor_id = fa.actor_id
WHERE last_name LIKE 'A%'
GROUP BY a.actor_id, first_name, last_name
ORDER BY c DESC
Correlated subquery

SELECT first_name, last_name, (
  SELECT count(*)
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
) AS c
FROM actor a
WHERE last_name LIKE 'A%'
ORDER BY c DESC

The result is always the same. But the queries have different execution plans on PostgreSQL, Oracle, and SQL Server, as can be seen below:

PostgreSQL LEFT JOIN (plan looks “better”)
PostgreSQL correlated subquery (plan looks “worse”)
Oracle LEFT JOIN (plan looks “more complicated”)
Oracle correlated subquery (plan looks “simpler”)
SQL Server LEFT JOIN (plan looks “reasonable”)
SQL Server correlated subquery (plan looks… geez, where’s my correlated subquery? It’s been transformed to a LEFT JOIN!)

Huh. As you can see, in SQL Server, both queries produce the exact same plan (as they should, because the queries really are equivalent). But not all databases recognise and/or optimise this. At least, that’s what the estimated plans suggest. Also, don’t jump to the conclusion that if the cost of one plan is lower, then it’s a better plan than the alternative. Costs can only really be compared between alternative plans for the same query. E.g., in the Oracle example, we had both a HASH JOIN and a NESTED LOOP JOIN in a single plan, because Oracle 12c may collect runtime statistics and switch plans in flight, thanks to the Oracle 12c Adaptive Query Optimization features. But let’s ignore all of this and look at actual execution times, instead:
Benchmarking the alternatives

As always, a disclaimer: some commercial databases do not allow publishing benchmark results without prior written consent. As I never ask for permission, but always ask for forgiveness, I do not have consent, and I’m thus not publishing actual benchmark results. I have anonymised the benchmark results by introducing hypothetical, non-comparable units of measurement, so you cannot see that PostgreSQL is totally slower than Oracle and/or SQL Server. And you cannot see that SQL Server’s procedural language is totally uglier than PostgreSQL’s and/or Oracle’s.
Legal people. Solving problems we wouldn’t have without legal people, in the first place.

Enough ranting. Some important considerations:
- Ideally, you’ll run benchmarks directly in the database using a procedural language, rather than, e.g., over JDBC, to avoid the network latency incurred by JDBC calls and other undesired side effects.
- Repeat the benchmarks several times to prevent warmup side effects and other random noise, as your OS / file system may be busy with accidental Scala compilation or Slack UI refreshes.
- Be sure to actually consume the entire result set of each query in a loop, rather than just executing the query. Some databases may optimise for lazy cursor consumption (and possibly early abortion). It would be unfair not to consume the entire result set.
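The same pattern can be sketched client-side too (bearing in mind the first bullet’s warning about measuring over a driver). Here is a minimal, hypothetical illustration using Python’s standard sqlite3 module against a made-up miniature version of the Sakila tables – it shows the pattern of the scripts below, not any real benchmark:

```python
import sqlite3
import time

# In-memory stand-in for the Sakila tables used in this article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY,
                        first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
""")
conn.executemany("INSERT INTO actor VALUES (?, ?, ?)",
                 [(i, "F%d" % i, "A-last%d" % i) for i in range(200)])
conn.executemany("INSERT INTO film_actor VALUES (?, ?)",
                 [(i % 200, i) for i in range(5000)])

QUERIES = {
    "Statement 1": """
        SELECT first_name, last_name, count(fa.actor_id) AS c
        FROM actor a LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
        WHERE last_name LIKE 'A%'
        GROUP BY a.actor_id, first_name, last_name
        ORDER BY c DESC""",
    "Statement 2": """
        SELECT first_name, last_name,
               (SELECT count(*) FROM film_actor fa
                WHERE a.actor_id = fa.actor_id) AS c
        FROM actor a
        WHERE last_name LIKE 'A%'
        ORDER BY c DESC""",
}

REPEAT = 100  # use thousands in a real benchmark

# Repeat the whole benchmark several times to avoid warmup penalty
for run in range(1, 4):
    for label, sql in QUERIES.items():
        start = time.perf_counter()
        for _ in range(REPEAT):
            for _ in conn.execute(sql):  # consume the entire result set
                pass
        print("Run %d, %s: %.6f" % (run, label, time.perf_counter() - start))
```

The table sizes, names, and repeat counts here are all invented for the example; the point is only the shape: repeated outer runs, a tight inner repetition loop, and full consumption of every result set.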
PostgreSQL

Here’s the benchmark logic, written in PL/pgSQL:
DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name, count(fa.actor_id) AS c
        FROM actor a
        LEFT JOIN film_actor fa
        ON a.actor_id = fa.actor_id
        WHERE last_name LIKE 'A%'
        GROUP BY a.actor_id, first_name, last_name
        ORDER BY c DESC
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name, (
          SELECT count(*)
          FROM film_actor fa
          WHERE a.actor_id = fa.actor_id
        ) AS c
        FROM actor a
        WHERE last_name LIKE 'A%'
        ORDER BY c DESC
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
  END LOOP;
END$$;
The result is:

INFO: Run 1, Statement 1: 00:00:01.708257
INFO: Run 1, Statement 2: 00:00:01.252012
INFO: Run 2, Statement 1: 00:00:02.33151 -- Slack message received here
INFO: Run 2, Statement 2: 00:00:01.064007
INFO: Run 3, Statement 1: 00:00:01.638518
INFO: Run 3, Statement 2: 00:00:01.149005
INFO: Run 4, Statement 1: 00:00:01.670045
INFO: Run 4, Statement 2: 00:00:01.230755
INFO: Run 5, Statement 1: 00:00:01.81718
INFO: Run 5, Statement 2: 00:00:01.166089

As you can see, in all 5 benchmark executions, the version with the correlated subquery outperformed the version with the LEFT JOIN, in this case by roughly 60%! As this is PostgreSQL and open source, the benchmark results are in actual seconds for 10000 query executions. Neat. Let’s move on to…

Oracle
Gee, check out the difference now (and remember, these are totally not seconds, but a hypothetical unit of measurement; let’s call them Newtons. Or Larrys. Let’s call them Larrys (great idea, Axel)):
SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 10000;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name, count(fa.actor_id) AS c
        FROM actor a
        LEFT JOIN film_actor fa
        ON a.actor_id = fa.actor_id
        WHERE last_name LIKE 'A%'
        GROUP BY a.actor_id, first_name, last_name
        ORDER BY c DESC
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    dbms_output.put_line('Run ' || r || ', Statement 1 : ' || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name, (
          SELECT count(*)
          FROM film_actor fa
          WHERE a.actor_id = fa.actor_id
        ) AS c
        FROM actor a
        WHERE last_name LIKE 'A%'
        ORDER BY c DESC
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    dbms_output.put_line('Run ' || r || ', Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  END LOOP;
END;
/
Run 1, Statement 1 : 07.721731000
Run 1, Statement 2 : 00.622992000
Run 2, Statement 1 : 08.077535000
Run 2, Statement 2 : 00.666481000
Run 3, Statement 1 : 07.756182000
Run 3, Statement 2 : 00.640541000
Run 4, Statement 1 : 07.495021000
Run 4, Statement 2 : 00.731321000
Run 5, Statement 1 : 07.809564000
Run 5, Statement 2 : 00.632615000

Wow, the correlated subquery totally outperformed the LEFT JOIN query, by an order of magnitude. This is totally insane. Now, check out…

SQL Server

… the beautiful procedural language in SQL Server: Transact-SQL. With nice features like:
- Needing to cast INT values to VARCHAR when concatenating them.
- No indexed loop, only WHILE loop
- No implicit cursor loops (instead: DEALLOCATE!)
And again, remember, these aren’t seconds. Really. They’re… kilowatts. Yeah, let’s settle on kilowatts.
DECLARE @ts DATETIME;
DECLARE @repeat INT = 10000;
DECLARE @r INT;
DECLARE @i INT;
DECLARE @dummy1 VARCHAR;
DECLARE @dummy2 VARCHAR;
DECLARE @dummy3 INT;
DECLARE @s1 CURSOR;
DECLARE @s2 CURSOR;
SET @r = 0;

WHILE @r < 5
BEGIN
  SET @r = @r + 1

  SET @s1 = CURSOR FOR
    SELECT first_name, last_name, count(fa.actor_id) AS c
    FROM actor a
    LEFT JOIN film_actor fa
    ON a.actor_id = fa.actor_id
    WHERE last_name LIKE 'A%'
    GROUP BY a.actor_id, first_name, last_name
    ORDER BY c DESC

  SET @s2 = CURSOR FOR
    SELECT first_name, last_name, (
      SELECT count(*)
      FROM film_actor fa
      WHERE a.actor_id = fa.actor_id
    ) AS c
    FROM actor a
    WHERE last_name LIKE 'A%'
    ORDER BY c DESC

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s1;
    FETCH NEXT FROM @s1 INTO @dummy1, @dummy2, @dummy3;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s1 INTO @dummy1, @dummy2, @dummy3;
    END;

    CLOSE @s1;
  END;
  DEALLOCATE @s1;
  PRINT 'Run ' + CAST(@r AS VARCHAR) + ', Statement 1: '
    + CAST(DATEDIFF(ms, @ts, current_timestamp) AS VARCHAR) + 'ms';

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s2;
    FETCH NEXT FROM @s2 INTO @dummy1, @dummy2, @dummy3;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s2 INTO @dummy1, @dummy2, @dummy3;
    END;

    CLOSE @s2;
  END;
  DEALLOCATE @s2;
  PRINT 'Run ' + CAST(@r AS VARCHAR) + ', Statement 2: '
    + CAST(DATEDIFF(ms, @ts, current_timestamp) AS VARCHAR) + 'ms';
END;
Run 1, Statement 1: 2626
Run 1, Statement 2: 20340
Run 2, Statement 1: 2450
Run 2, Statement 2: 17910
Run 3, Statement 1: 2706
Run 3, Statement 2: 18396
Run 4, Statement 1: 2696
Run 4, Statement 2: 19103
Run 5, Statement 1: 2716
Run 5, Statement 2: 20453

Oh my… Wait a second. Now, suddenly, the correlated subquery is roughly a factor 7… more energy consuming (remember: kilowatts). Who would have thought?
Conclusion

This article won’t explain the differences in execution time between the different databases. There are a lot of reasons why a given execution plan will outperform another. There are also a lot of reasons why the same plan (or at least what looks like the same plan) really isn’t, because a plan is only a description of an algorithm, and each plan operation can still contain other operations that might be different.

In summary, we can say that in this case (I can’t stress this enough: this isn’t a general rule, it only explains what happens in this case – don’t create the next SQL myth!), the correlated subquery and the LEFT JOIN performed in the same order of magnitude on PostgreSQL (the subquery being a bit faster), the correlated subquery drastically outperformed the LEFT JOIN in Oracle, whereas the LEFT JOIN drastically outperformed the correlated subquery in SQL Server (despite the plans having been the same!). This means:
- Don’t trust your initial judgment
- Don’t trust any historic blog posts claiming that A is faster than B
- Don’t trust execution plans
- Don’t trust this blog post here, because it is using incomparable time scales (seconds vs. Larrys vs. kilowatts)
- Don’t fully trust your own benchmarks, because you’re not measuring things as they happen in production
- Even for such a simple query, there’s no optimal query for all databases
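As a small sanity check of the “same order of magnitude (subquery being a bit faster)” claim for PostgreSQL, the per-run ratios can be computed from the published timings, e.g. with a few lines of Python:

```python
# PostgreSQL timings transcribed from the benchmark output above:
# Statement 1 = LEFT JOIN, Statement 2 = correlated subquery,
# seconds per 10000 query executions.
s1 = [1.708257, 2.33151, 1.638518, 1.670045, 1.81718]
s2 = [1.252012, 1.064007, 1.149005, 1.230755, 1.166089]

ratios = [a / b for a, b in zip(s1, s2)]
avg = sum(ratios) / len(ratios)
print("per-run ratios:", ["%.2f" % r for r in ratios])
print("average ratio: %.2f" % avg)  # about 1.6, i.e. roughly 60% in favour of the subquery
```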
6 thoughts on “How to Benchmark Alternative SQL Queries to Find the Fastest Query”
Small correction: Oracle performance units are called Larrys, not Newtons ;-)
You’re totally right. Fixed!
Your units are funny, but maybe you could keep it simpler by scaling all values so that the fastest (or slowest) result for a given database would be 1.0. If you don’t like the idea, then I’d recommend using furlongs per fortnight (you’re measuring speed, right?).
A small rant: Assuming there’s really such a thing like 4GL (starting in the seventies), then we should have 20GL by now, shouldn’t we?
I wonder how it is that nearly no database “sees” that the queries are equivalent. I’d naively expect such insight to be a prerequisite for optimisations.
I did scale the results. There always is a result that is 1.0, I just didn’t display it. Furlong per fortnight sounds reasonable too, will remember for the next time.
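For what it’s worth, the scaling itself is trivial – a quick sketch in Python, with made-up timings:

```python
# Hypothetical raw timings for one database; the units don't matter,
# because they cancel out in the division.
timings = {"Statement 1": 7.72, "Statement 2": 0.62}

fastest = min(timings.values())
scaled = {k: round(v / fastest, 2) for k, v in timings.items()}
print(scaled)  # the fastest statement becomes 1.0
```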
Yeah, 4GLs should be a trivial thing compared to modern languages. We should all be asking Siri to compute things by now. But we aren’t.
There are many reasons why a database doesn’t “see” that the queries are equivalent. Mostly, transformations are omitted because each transformation (especially when cost based) adds a tremendous amount of complexity to the optimisation algorithm, which may already have a hard enough time to figure out the best order of joins.
In theory, all these transformations were known in the 70s. In practice, only recent computation power has allowed to actually explore many of them. Perhaps, machine learning will help an optimiser find a better path towards the best plan than costing statistics, though.
Can’t every nested query be replaced by a join? I don’t know, but if so, then it might be easiest to always do that, so the optimizer doesn’t have to deal with the former at all.
The way it works now makes it hard for the programmer: there are different equivalent forms of the same query, which lead to different timings depending on the RDBMS and also on the current stats. So, for the best speed, profiling is necessary but not sufficient, as the plan may change later with changed stats or even a version upgrade.
The same holds for Java microoptimizations, but the microoptimizations are rarely important.
Does a database recognize a bad plan after it gets executed? It could get ashamed and try something else. ;)
I would say that probably (without fully proving it), a correlated subquery can always be replaced by a corresponding outer join, although sometimes, that transformation might be awfully complex. Also, the correlated subquery must not return more than one row, whereas a join may well do so, so the optimiser would make quite different choices as it wouldn’t know that the subquery returns at most one row…
And then: do note that the correlated subquery can outperform the join!
Yes, Oracle 11g knows adaptive cursor sharing (getting ashamed after a single bad execution) and Oracle 12c knows adaptive execution plans (getting ashamed while executing badly). SQL Server probably has something similar.
Also: Don’t get too upset about these differences. They hit me in a benchmark. They (probably) won’t hit you in production, mostly. A factor 5 performance difference is usually irrelevant for most operations because they’re sufficiently fast anyway…