A large-ish customer in banking (largest tables on that particular system: ~1 billion rows) once decided to separate the OLTP database from the “log database” in order to better use resources and prevent contention on some tables, as the append-only log database is used heavily for analytic querying of all sorts. That seems to make perfect sense. Except that sometimes, joins need to be done between “main database” and “log database” tables. This is when things get really hard to tune in Oracle – and probably in other databases too.
In this article, however, I’d like to focus on a much simpler example. One that seems to cause no trouble to the optimiser because all joined tables are from the “log database” only. Let’s use the following setup:
-- This is the database link
CREATE PUBLIC DATABASE LINK LOOPBACK
CONNECT TO TEST IDENTIFIED BY TEST
USING 'ORCLCDB';

-- Just making sure we get all statistics in execution plans
ALTER SESSION SET statistics_level = ALL;
And then, create this schema:
CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL,
  CONSTRAINT pk_t PRIMARY KEY (a)
);

CREATE TABLE u (
  a INT NOT NULL,
  b INT NOT NULL,
  CONSTRAINT pk_u PRIMARY KEY (a)
);

INSERT INTO t
SELECT level, level
FROM dual
CONNECT BY level <= 500000;

INSERT INTO u
SELECT level, level
FROM dual
CONNECT BY level <= 500000;

CREATE INDEX i_t ON t(b);

ALTER TABLE u ADD CONSTRAINT fk_u FOREIGN KEY (a) REFERENCES t;

EXEC dbms_stats.gather_table_stats('TEST', 'T');
EXEC dbms_stats.gather_table_stats('TEST', 'U');
It’s a really boring emulation of the real schema, and it doesn’t have nearly as many columns / rows. But the essence is:
- There are (at least) two tables
- Both have quite a few rows (that’s important here; I’ll show why later)
- We’ll use an index for searching rows
- We’ll join by a one-to-many relationship
There may be other setups to reproduce the same issue, of course.
Now, let’s consider the following query (not using the database link yet).
SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t
  JOIN u USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual
Unfortunately, Oracle doesn’t support boolean types and always requires a FROM clause. Otherwise, we could write this more concise version, as in PostgreSQL:
SELECT EXISTS (
  SELECT *
  FROM t
  JOIN u USING (a)
  WHERE t.b BETWEEN 0 AND 1000
)
We’re checking for the existence of rows in both tables, given a predicate that runs on the previously created index.
As shown in a previous article, it’s much better to use EXISTS rather than COUNT(*), in pretty much all databases. The algorithm is optimal, because the EXISTS predicate hints to the optimiser that a SEMI JOIN can be used instead of an INNER JOIN:
--------------------------------------------------------------------------
| Operation                           | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                    |      |      1 |        |      1 |
| NESTED LOOPS SEMI                   |      |      1 |      4 |      1 |
| TABLE ACCESS BY INDEX ROWID BATCHED | T    |      1 |   1000 |      1 |
| INDEX RANGE SCAN                    | I_T  |      1 |   1000 |      1 |
| INDEX UNIQUE SCAN                   | PK_U |      1 |   333K |      1 |
| FAST DUAL                           |      |      1 |      1 |      1 |
--------------------------------------------------------------------------
Some observations:
- The database doesn’t see the very strong correlation between the data in our two tables, and estimates that 333,000 rows will be selected from U. This might be the topic of a future blog post.
- Even if we’re selecting an estimated 1,000 rows with our predicate on T.B, the actual number of rows (“A-Rows” column) is always 1, because thanks to EXISTS() we can stop looking for rows as soon as we find one.
- This can also be seen in the “Starts” column, which shows that each operation in the execution plan is started only once, even if we’re inside of a nested loop
So, this is optimal (until I learn a new trick, of course).
Let’s bring in database links
Assuming that these two tables are on a remote database, we might naively proceed with writing this query:
SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t@loopback
  JOIN u@loopback USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual;
So now, we’re selecting from T@LOOPBACK and U@LOOPBACK, but the rest is exactly the same. For the sake of simplicity, I’m running this reproduction on the same instance, thus “LOOPBACK”. The logical impact is the same, though.
------------------------------------------------------
| Operation        | Name | Starts | E-Rows | A-Rows |
------------------------------------------------------
| SELECT STATEMENT |      |      1 |        |      1 |
| REMOTE           |      |      1 |        |      1 |
| FAST DUAL        |      |      1 |      1 |      1 |
------------------------------------------------------
Interesting. Or rather: not too interesting. Sure, our own database knows the correct estimate: 1 row comes out of the EXISTS() predicate. But the interesting thing happens at the remote database. Let’s look at the plan there. The query being executed on the remote database is this:
SQL_ID  80fczs4r1c9yd, child number 0
-------------------------------------
SELECT 0
FROM "T" "A2", "U" "A1"
WHERE "A2"."B" >= 0
AND "A2"."B" <= 1000
AND "A2"."A" = "A1"."A"
So, the EXISTS() predicate is not propagated to the remote database. Thus, the plan:
Plan hash value: 165433672

--------------------------------------------------------------------------
| Operation                           | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                    |      |      1 |        |      1 |
| HASH JOIN                           |      |      1 |   1000 |      1 |
| TABLE ACCESS BY INDEX ROWID BATCHED | T    |      1 |   1000 |   1000 |
| INDEX RANGE SCAN                    | I_T  |      1 |   1000 |   1000 |
| INDEX FAST FULL SCAN                | PK_U |      1 |   500K |      1 |
--------------------------------------------------------------------------
Oops. Observations:
- We’re now running a hash join (as expected, given the query that the remote database knows of)
- We’re materialising the expected 1,000 rows from the predicate on T.B
- But we’re still not fetching all the expected 500,000 rows from the U table, because the database that calls this query aborts as soon as it finds a single row
Huh. Bummer. So while we’re not running into a major catastrophe (of materialising all the rows from U), this is still far from optimal. The remote database has no knowledge at all of the fact that we’re going to select only 0 or 1 rows, and that it should thus always run a SEMI JOIN.
You can try adding a /*+FIRST_ROWS(1)*/ hint, but that doesn’t work. It won’t make it to the remote database.
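For reference, this is where such a hint would go. The hint syntax itself is standard Oracle; the point is merely that, placed on the calling side, it is not shipped across the link:

```sql
SELECT /*+FIRST_ROWS(1)*/
  CASE WHEN EXISTS (
    SELECT *
    FROM t@loopback
    JOIN u@loopback USING (a)
    WHERE t.b BETWEEN 0 AND 1000
  ) THEN 1 ELSE 0 END
FROM dual; -- The remote database never sees the hint
```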
Arcane DUAL@LINK to the rescue
This is when I had an idea. The problem might just be the fact that Oracle always needs a FROM clause, even if it doesn’t make any sense here. So what if we use DUAL@LOOPBACK instead of DUAL? After all, that DUAL table is, technically, a table on our own database, so even though it looks as though the entire query could be run on the remote database, that seems not to be true here! So let’s try this:
SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t@loopback
  JOIN u@loopback USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual@loopback; -- Subtle difference here!
As I hoped, this subtle change leads to the EXISTS() predicate being sent to the remote database. The query executed on the remote database is now:
SQL_ID  9bz87xw0zc23c, child number 0
-------------------------------------
SELECT CASE WHEN EXISTS (
  SELECT 0
  FROM "T" "A3", "U" "A2"
  WHERE "A3"."B" >= 0
  AND "A3"."B" <= 1000
  AND "A3"."A" = "A2"."A"
) THEN 1 ELSE 0 END
FROM "DUAL" "A1"
And the plan, now again including the desired SEMI JOIN:
Plan hash value: 1561559448

--------------------------------------------------------------------------
| Operation                           | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                    |      |      1 |        |      1 |
| NESTED LOOPS SEMI                   |      |      1 |      4 |      1 |
| TABLE ACCESS BY INDEX ROWID BATCHED | T    |      1 |   1000 |      1 |
| INDEX RANGE SCAN                    | I_T  |      1 |   1000 |      1 |
| INDEX UNIQUE SCAN                   | PK_U |      1 |   333K |      1 |
| FAST DUAL                           |      |      1 |      1 |      1 |
--------------------------------------------------------------------------
Excellent!
Benchmark time
Plans and estimates are one thing. What ultimately counts to business is wall clock time. So, let’s try this again using a benchmark:
SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 100;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t
          JOIN u USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    dbms_output.put_line('Run ' || r || ', Statement 1 : ' || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t@loopback
          JOIN u@loopback USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    dbms_output.put_line('Run ' || r || ', Statement 2 : ' || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t@loopback
          JOIN u@loopback USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual@loopback
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    dbms_output.put_line('Run ' || r || ', Statement 3 : ' || (SYSTIMESTAMP - v_ts));
    dbms_output.put_line('');
  END LOOP;
END;
/
And here are the resulting times:
Run 1, Statement 1 : +000000000 00:00:00.008110000
Run 1, Statement 2 : +000000000 00:00:00.213404000
Run 1, Statement 3 : +000000000 00:00:00.043044000

Run 2, Statement 1 : +000000000 00:00:00.003466000
Run 2, Statement 2 : +000000000 00:00:00.198487000
Run 2, Statement 3 : +000000000 00:00:00.042717000

Run 3, Statement 1 : +000000000 00:00:00.003077000
Run 3, Statement 2 : +000000000 00:00:00.191802000
Run 3, Statement 3 : +000000000 00:00:00.048740000

Run 4, Statement 1 : +000000000 00:00:00.005008000
Run 4, Statement 2 : +000000000 00:00:00.192828000
Run 4, Statement 3 : +000000000 00:00:00.043461000

Run 5, Statement 1 : +000000000 00:00:00.002970000
Run 5, Statement 2 : +000000000 00:00:00.190786000
Run 5, Statement 3 : +000000000 00:00:00.043910000
Clearly, not using the database link is always the fastest, roughly by a factor of 10 compared to the DUAL@LOOPBACK solution. But due to the system design, we don’t have this choice. Nonetheless, you can still see that DUAL@LOOPBACK consistently outperforms DUAL by another factor of around 5, as it still prevents the HASH JOIN!
Caveat: Small data != Big data
There, I said it. “Big Data”. Before, we had a predicate that ran on 1,000 rows in a table of 500,000 rows. Our customer had millions of rows. But what happens if you query small data sets? Let’s reduce the predicate to this:
WHERE t.b BETWEEN 0 AND 10
The benchmark result is now completely different:
Run 1, Statement 1 : +000000000 00:00:00.007093000
Run 1, Statement 2 : +000000000 00:00:00.047539000
Run 1, Statement 3 : +000000000 00:00:00.071546000

Run 2, Statement 1 : +000000000 00:00:00.003023000
Run 2, Statement 2 : +000000000 00:00:00.041259000
Run 2, Statement 3 : +000000000 00:00:00.052132000

Run 3, Statement 1 : +000000000 00:00:00.002767000
Run 3, Statement 2 : +000000000 00:00:00.034190000
Run 3, Statement 3 : +000000000 00:00:00.054023000

Run 4, Statement 1 : +000000000 00:00:00.003468000
Run 4, Statement 2 : +000000000 00:00:00.026141000
Run 4, Statement 3 : +000000000 00:00:00.047415000

Run 5, Statement 1 : +000000000 00:00:00.002818000
Run 5, Statement 2 : +000000000 00:00:00.026100000
Run 5, Statement 3 : +000000000 00:00:00.046875000
And as you can see, the DUAL@LOOPBACK solution actually worsens performance for these queries. The reason for this is that we’re now running, again, a NESTED LOOP JOIN (but not a SEMI JOIN) rather than a HASH JOIN on the remote database:
Query on remote database:
SQL_ID  7349t2363uc9m, child number 0
-------------------------------------
SELECT 0
FROM "T" "A2", "U" "A1"
WHERE "A2"."B" >= 0
AND "A2"."B" <= 10
AND "A2"."A" = "A1"."A"
Plan on remote database:
Plan hash value: 2558931407

--------------------------------------------------------------------------
| Operation                           | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                    |      |      1 |        |      1 |
| NESTED LOOPS                        |      |      1 |     10 |      1 |
| TABLE ACCESS BY INDEX ROWID BATCHED | T    |      1 |     10 |      1 |
| INDEX RANGE SCAN                    | I_T  |      1 |     10 |      1 |
| INDEX UNIQUE SCAN                   | PK_U |      1 |      1 |      1 |
--------------------------------------------------------------------------
I haven’t analysed the reason for this difference, as it is not significant enough compared to the improvement for large data sets.
Conclusion
Tuning queries over database links is hard. Much, much harder than tuning “ordinary” queries. Ideally, you’ll simply avoid database links and run all queries on a single instance. But sometimes that’s not possible.
In that case, the best solution is to move the logic to the remote query completely and collect only the result. Ideally, this is done using a stored procedure on the remote database, calculating this 1/0 result completely remotely. I think hipsters these days call this a Microservice, or better, a Lambda:
CREATE FUNCTION t_u RETURN NUMBER IS
  v_result NUMBER;
BEGIN
  SELECT CASE WHEN EXISTS (
    SELECT *
    FROM t
    JOIN u USING (a)
    WHERE t.b BETWEEN 0 AND 1000
  ) THEN 1 ELSE 0 END
  INTO v_result
  FROM dual;

  RETURN v_result;
END t_u;
/
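With the function in place on the remote database, the caller only has to fetch a single NUMBER over the link. A minimal sketch of the call, using the same remote invocation syntax as the rest of this article:

```sql
-- Call the remote function over the database link;
-- only the single 1/0 result travels across the network
SELECT t_u@loopback FROM dual;
```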
Comparing the benchmark call with the other options:
T.B BETWEEN 0 AND 10
Statement 1 : +000000000 00:00:00.003022000 -- Local query
Statement 2 : +000000000 00:00:00.027416000 -- Local DUAL
Statement 3 : +000000000 00:00:00.043823000 -- Remote DUAL
Statement 4 : +000000000 00:00:00.022181000 -- Remote stored procedure
T.B BETWEEN 0 AND 1000
Statement 1 : +000000000 00:00:00.002877000 -- Local query
Statement 2 : +000000000 00:00:00.188588000 -- Local DUAL
Statement 3 : +000000000 00:00:00.050163000 -- Remote DUAL
Statement 4 : +000000000 00:00:00.018736000 -- Remote stored procedure
But that, too, might not be possible, as you may not have the required rights to create stored procedures on that database. You could call DBMS_SQL on the remote database and run a PL/SQL block dynamically there (I didn’t try that in my benchmark).
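An untested sketch of that DBMS_SQL variant might look as follows. The @loopback invocations of the DBMS_SQL procedures and the exact shape of the anonymous block are my assumptions here, not something from the benchmark:

```sql
DECLARE
  v_cur    NUMBER;
  v_ignore NUMBER;
  v_result NUMBER;
BEGIN
  -- Open and parse the anonymous block on the remote database
  v_cur := dbms_sql.open_cursor@loopback;
  dbms_sql.parse@loopback(v_cur,
    'BEGIN
       SELECT CASE WHEN EXISTS (
         SELECT * FROM t JOIN u USING (a)
         WHERE t.b BETWEEN 0 AND 1000
       ) THEN 1 ELSE 0 END
       INTO :result
       FROM dual;
     END;',
    dbms_sql.native);

  -- Bind the OUT variable, execute remotely, fetch the result back
  dbms_sql.bind_variable@loopback(v_cur, ':result', v_result);
  v_ignore := dbms_sql.execute@loopback(v_cur);
  dbms_sql.variable_value@loopback(v_cur, ':result', v_result);
  dbms_sql.close_cursor@loopback(v_cur);

  dbms_output.put_line('Result: ' || v_result);
END;
/
```

As with the stored procedure, only the single 1/0 result crosses the link, so the SEMI JOIN can happen entirely on the remote side.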
Or, you simply use an occasional DUAL@LINK, which might already do the trick with a minimal change to the original query.