Oracle’s OFFSET .. FETCH Can be Slower than Classic ROWNUM Filtering

One of Oracle 12c’s coolest features was the introduction of the SQL standard OFFSET .. FETCH clause, as we can now write things like:

SELECT *
FROM film 
ORDER BY film_id
FETCH FIRST 1 ROW ONLY

This is querying the Sakila database. Most other databases had this clause (or a non-standard version of it) for ages, e.g. MySQL with LIMIT. For all the different LIMIT syntaxes, check out the jOOQ manual.

Implementation wise, the Oracle folks chose to rewrite this clause as a simple window function filter. In principle, behind the scenes, the following is executed:

Teradata syntax

SELECT *
FROM film
QUALIFY row_number() OVER (ORDER BY film_id) = 1
ORDER BY film_id

Standard syntax

SELECT * -- Except rn
FROM (
  SELECT film.*, row_number() OVER (ORDER BY film_id) rn
  FROM film
) t
WHERE rn = 1
ORDER BY film_id

This does definitely look much better than the “old” approach using ROWNUM filtering, which many of us have written for years:

Legacy Oracle syntax

SELECT t.*
FROM (
  SELECT *
  FROM film 
  ORDER BY film_id
) t
WHERE ROWNUM = 1

What I don’t like about this “old” approach is that we’re relying on the ORDER BY clause of a derived table, which should not guaranteed to be retained in the outer most query in my opinion (although it is, in Oracle, in this case).

So, having the SQL standard syntax is definitely good.

What’s the problem?

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  COUNT STOPKEY                |         |       |
|   2 |   VIEW                        |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

-------------------------------------------------
| Id  | Operation                | Name | Rows  |
-------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |
|*  1 |  VIEW                    |      |     1 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000 |
|   3 |    TABLE ACCESS FULL     | FILM |  1000 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

The cardinalities are both correct, but one of the queries seems to traverse the entire table to find the top FILM_ID, which the other query found in the index directly. A workaround would be to hint the number of rows to the FETCH FIRST query:

SELECT /*+FIRST_ROWS(1)*/ *
FROM film
ORDER BY film_id
FETCH FIRST 1 ROW ONLY;

… in case of which we’ll get a similar plan as that of the ROWNUM filtering query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  VIEW                         |         |     1 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

Measuring this using our measurement technique yields quite devastating results:

Run 1, Statement 1 :  1.11230  -- ROWNUM
Run 1, Statement 2 :  1.15508  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 46.92781  -- FETCH FIRST

Run 2, Statement 1 :  1.68449
Run 2, Statement 2 :  1.99465
Run 2, Statement 3 : 47.32620

Run 3, Statement 1 :  1.10428
Run 3, Statement 2 :  1.13904
Run 3, Statement 3 : 68.06417

Run 4, Statement 1 :  1
Run 4, Statement 2 :  6.00535
Run 4, Statement 3 : 44.88235

The above results don’t show any time measurement, but a number relative to the fastest execution (1)

There is a 40x performance difference between the approaches, with ROWNUM based filtering being the fastest, FETCH FIRST plus +FIRST_ROWS hint being slightly slower, and “naked” FETCH FIRST being terribly slow, when repeating the measurement 5x and running each query 10000x on my machine on Oracle 12.2.0.1.0 in Docker.

Things get worse when joining tables. Let’s run a query that fetches the first customer and their full address information:

-- Legacy Oracle syntax
SELECT t.*
FROM (
  SELECT *
  FROM customer 
  JOIN address USING (address_id)
  JOIN city USING (city_id)
  JOIN country USING (country_id)
  ORDER BY customer_id
) t
WHERE ROWNUM = 1;

-- Standard syntax with hint
SELECT /*+FIRST_ROWS(1)*/ *
FROM customer 
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

-- Standard syntax without hint
SELECT *
FROM customer 
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

The two queries are equivalent, they both produce the same result. Yet, the plans are very different.

Oracle’s legacy syntax

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  COUNT STOPKEY                    |             |       |
|   2 |   VIEW                            |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |     TABLE ACCESS BY INDEX ROWID   | COUNTRY     |     1 |
-----------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

We’re seeing tons of nested loop joins because that’s what we’ll expect given the low cardinalities imposed by the COUNT STOPKEY operation.

SQL standard syntax with hint

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  VIEW                             |             |     1 |
|*  2 |   WINDOW NOSORT STOPKEY           |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |     TABLE ACCESS BY INDEX ROWID   | COUNTRY     |     1 |
-----------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Quite a similar plan

SQL standard syntax without hint

---------------------------------------------------------------
| Id  | Operation                        | Name       | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                 |            |       |
|*  1 |  VIEW                            |            |     1 |
|*  2 |   WINDOW SORT PUSHED RANK        |            |   599 |
|*  3 |    HASH JOIN                     |            |   599 |
|   4 |     TABLE ACCESS FULL            | CUSTOMER   |   599 |
|*  5 |     HASH JOIN                    |            |   603 |
|   6 |      MERGE JOIN                  |            |   600 |
|   7 |       TABLE ACCESS BY INDEX ROWID| COUNTRY    |   109 |
|   8 |        INDEX FULL SCAN           | PK_COUNTRY |   109 |
|*  9 |       SORT JOIN                  |            |   600 |
|  10 |        TABLE ACCESS FULL         | CITY       |   600 |
|  11 |      TABLE ACCESS FULL           | ADDRESS    |   603 |
---------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
   3 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
   5 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
   9 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
       filter("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Oops. Many HASH JOIN and MERGE JOIN and TABLE ACCESS FULL operations, as well as a WINDOW SORT operation, rather than a WINDOW NOSORT operation. That cannot be good. Let’s measure this again.

Run 1, Statement 1 :  1.26157  -- ROWNUM
Run 1, Statement 2 :  1.32394  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 66.97384  -- FETCH FIRST

Run 2, Statement 1 :  1.31992
Run 2, Statement 2 :  1.76459
Run 2, Statement 3 : 72.76056

Run 3, Statement 1 :  1
Run 3, Statement 2 :  1.36419
Run 3, Statement 3 : 74.06439

Run 4, Statement 1 :  1.08451
Run 4, Statement 2 :  1.64990
Run 4, Statement 3 : 66.83702

The difference is even worse: Factor 60x. And make no mistake, these are trivial data set sizes. As we can see in the last execution plan, the cardinality of the CUSTOMER table is 599. This can get much worse for larger tables.

Why even use this syntax?

The SQL standard syntax is much nicer to write, and it allows for nice TOP-N style queries using CROSS APPLY or LATERAL, e.g. to find the TOP 3 longest film titles per actor:

SELECT actor_id, first_name, last_name, title
FROM actor a
OUTER APPLY (
  SELECT /*+FIRST_ROWS(1)*/ title
  FROM film f
  JOIN film_actor fa USING (film_id)
  WHERE fa.actor_id = a.actor_id
  ORDER BY length(title) DESC
  FETCH FIRST 3 ROWS ONLY
) t
ORDER BY actor_id, length(title) DESC;

This would have been much harder with the ROWNUM approach. In older Oracle versions, it was even impossible, because we could not reference A.ACTOR_ID from doubly nested derived tables / correlated subqueries. Luckily, this is no longer the case. So, syntactically, it is definitely a much better way to do paginated queries or TOP-N queries, but the price is very high.

Disclaimer

The optimiser might make much better choices when:

The base data set is much bigger than the above 600 to 1000 rows “strong” tables.

Indeed, when fetching the first row from the PAYMENT table (with ~16000 rows), the difference becomes smaller or even inexistent:

Run 1, Statement 1 : 1        -- ROWNUM
Run 1, Statement 2 : 1.72246  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 1.76165  -- FETCH FIRST

Run 2, Statement 1 : 1.03919
Run 2, Statement 2 : 1.78284
Run 2, Statement 3 : 1.75742

Run 3, Statement 1 : 1.2553
Run 3, Statement 2 : 1.86441
Run 3, Statement 3 : 2.39089

Run 4, Statement 1 : 2.28814
Run 4, Statement 2 : 3.02436
Run 4, Statement 3 : 2.39407

Run 5, Statement 1 : 1.31462
Run 5, Statement 2 : 2.27225
Run 5, Statement 3 : 1.70975

As can be seen, the measurement errors start to outweigh the difference in performance, so the difference isn’t really as significant anymore.

The limit is not 1 or 3, but 10 or 50

When fetching the top 50 rows from the joined customer/address query, the measurements actually changed quite a bit. Suddenly, the ROWNUM query wasn’t optimal anymore and behaved like the un-hinted FETCH FIRST query. Adding a /*+FIRST_ROWS(1)*/ hint (not /*+FIRST_ROWS(50)*/ !) did help:

Run 1, Statement 1 : 1.00545  -- ROWNUM +FIRST_ROWS
Run 1, Statement 2 : 7.24842  -- ROWNUM
Run 1, Statement 3 : 1.35691  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 4 : 7.15264  -- FETCH FIRST

Run 2, Statement 1 : 1.08054
Run 2, Statement 2 : 6.51922
Run 2, Statement 3 : 1.35960
Run 2, Statement 4 : 7.94527

Run 3, Statement 1 : 1.02824
Run 3, Statement 2 : 7.16228
Run 3, Statement 3 : 1.19702
Run 3, Statement 4 : 7.55008

Run 4, Statement 1 : 1.08364
Run 4, Statement 2 : 6.66652
Run 4, Statement 3 : 1.18559
Run 4, Statement 4 : 7.36938

Run 5, Statement 1 : 1
Run 5, Statement 2 : 6.89051
Run 5, Statement 3 : 1.24211
Run 5, Statement 4 : 7.15167

Conclusion

What we’ve seen here is a bit unfortunate. For some cases, one approach is better than the other, performance wise. For others, it’s vice versa. Paginated queries are still a bit tricky for Oracle to get right and we have to measure things explicitly.

Workaround in jOOQ

Until this is fixed by Oracle, if you’re using jOOQ, you can use the SQLDialect.ORACLE11G dialect to run classic ROWNUM filtering queries also on Oracle 12c. Alternatively, a future version of jOOQ will optionally generate a +FIRST_ROWS hint with a reasonably approximated cardinality: https://github.com/jOOQ/jOOQ/issues/5793