The Difficulty of Tuning Queries Over a Database Link – Or How I Learned to Stop Worrying and Love the DUAL@LINK Table

A large-ish customer in banking (largest tables on that particular system: ~1 billion rows) once decided to separate the OLTP database from the “log database” in order to better use resources and prevent contention on some tables, as the append-only log database is used heavily for analytic querying of all sorts. That seems to make perfect sense. Except that sometimes, joins need to be done between “main database” and “log database” tables. This is when things get really hard to tune in Oracle – and probably in other databases too.

In this article, however, I’d like to focus on a much simpler example. One that seems to cause no trouble to the optimiser because all joined tables are from the “log database” only. Let’s use the following setup:

-- This is the database link
CREATE PUBLIC DATABASE LINK LOOPBACK 
CONNECT TO TEST IDENTIFIED BY TEST 
USING 'ORCLCDB';

-- Just making sure we get all statistics in execution plans
ALTER SESSION SET statistics_level = ALL;

And then, create this schema:

CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL,
  CONSTRAINT pk_t PRIMARY KEY (a)
);
CREATE TABLE u (
  a INT NOT NULL,
  b INT NOT NULL,
  CONSTRAINT pk_u PRIMARY KEY (a)
);

INSERT INTO t
SELECT
  level,
  level
FROM dual
CONNECT BY level <= 500000;

INSERT INTO u
SELECT
  level,
  level
FROM dual
CONNECT BY level <= 500000;

CREATE INDEX i_t ON t(b);

ALTER TABLE u ADD CONSTRAINT fk_u FOREIGN KEY (a) REFERENCES t;

EXEC dbms_stats.gather_table_stats('TEST', 'T');
EXEC dbms_stats.gather_table_stats('TEST', 'U');

It’s a really boring emulation of the real schema, and it doesn’t have nearly as many columns / rows. But the essence is:

  • There are (at least) two tables
  • Both have quite a few rows (that’s important here. I’ll show why, later)
  • We’ll use an index for searching rows
  • We’ll join by a one-to-many relationship

There may be other setups to reproduce the same issue, of course.

Now, let’s consider the following query (not using the database link yet).

SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t 
  JOIN u USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual

Unfortunately, Oracle doesn’t support boolean types and always requires a FROM clause. Otherwise, we could be writing this more concise version, as in PostgreSQL:

SELECT EXISTS (
  SELECT *
  FROM t 
  JOIN u USING (a)
  WHERE t.b BETWEEN 0 AND 1000
)

We’re checking for the existence of rows in both tables, given a predicate that runs on the previously created index.

As shown in a previous article, it’s much better to use EXISTS rather than COUNT(*), in pretty much all databases. The algorithm is optimal, because the usage of the EXISTS predicate hints to the optimiser that a SEMI JOIN can be used instead of an INNER JOIN

--------------------------------------------------------------------------
| Operation                            | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                     |      |      1 |        |      1 |
|  NESTED LOOPS SEMI                   |      |      1 |      4 |      1 |
|   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |   1000 |      1 |
|    INDEX RANGE SCAN                  | I_T  |      1 |   1000 |      1 |
|   INDEX UNIQUE SCAN                  | PK_U |      1 |    333K|      1 |
|  FAST DUAL                           |      |      1 |      1 |      1 |
--------------------------------------------------------------------------

Some observations:

So, this is optimal (until I learn a new trick, of course).

Let’s bring in database links

Assuming that these two tables are on a remote database, we might naively proceed with writing this query:

SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t@loopback 
  JOIN u@loopback USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual;

So now, we’re selecting from T@LOOPBACK and U@LOOPBACK, but the rest is exactly the same. For the sake of simplicity, I’m running this reproduction on the same instance, thus “LOOPBACK”. The logical impact is the same, though.

------------------------------------------------------
| Operation        | Name | Starts | E-Rows | A-Rows |
------------------------------------------------------
| SELECT STATEMENT |      |      1 |        |      1 |
|  REMOTE          |      |      1 |        |      1 |
|  FAST DUAL       |      |      1 |      1 |      1 |
------------------------------------------------------

Interesting. Or rather: Not too interesting. Sure, our own database knows the correct estimate: 1 row that comes out of the EXISTS() predicate. But the interesting thing happens at the remote database. Let’s look at the plan there. The query being executed on the remote database is this:

SQL_ID  80fczs4r1c9yd, child number 0
-------------------------------------

SELECT 0 FROM "T" "A2","U" "A1" WHERE "A2"."B">=0 AND "A2"."B"=0

So, the EXISTS() predicate is not propagated to the remote database. Thus, the plan:

Plan hash value: 165433672
 
--------------------------------------------------------------------------
| Operation                            | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                     |      |      1 |        |      1 |
|  HASH JOIN                           |      |      1 |   1000 |      1 |
|   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |   1000 |   1000 |
|    INDEX RANGE SCAN                  | I_T  |      1 |   1000 |   1000 |
|   INDEX FAST FULL SCAN               | PK_U |      1 |    500K|      1 |
--------------------------------------------------------------------------

Oops. Observations:

  • We’re now running a hash join (as expected, given the query that the remote database knows of)
  • We’re materialising the expected 1000 rows from the predicate on T.B
  • But we’re still not fetching all the expected 500,000 rows from the U table because the database that calls this query will abort as soon as it finds a single row

Huh. Bummer. So while we’re not running into a major catastrophe (of materialising all the rows from U), this is still far from optimal. The remote database has no knowledge at all of the fact that we’re going to be selecting 0 or 1 rows only, and that it thus should always run a SEMI JOIN.

You can try adding a /*+FIRST_ROWS(1)*/ hint, but that doesn’t work. It won’t make it to the remote database.

Arcane DUAL@LINK to the rescue

This is when I had an idea. The problem might just be the fact that Oracle always needs a FROM clause, even if it doesn’t make any sense here. So what if we use DUAL@LOOPBACK instead of DUAL. Because that DUAL table, technically, is a table on our own database, so even if it looks as though the entire query can be run on the remote database, that seems not to be true here! So let’s try this:

SELECT CASE WHEN EXISTS (
  SELECT *
  FROM t@loopback 
  JOIN u@loopback USING (a)
  WHERE t.b BETWEEN 0 AND 1000
) THEN 1 ELSE 0 END
FROM dual@loopback; -- Subtle difference here!

As I hoped, this subtle change leads to the EXISTS() predicate being sent to the remote database. The query executed on the remote database is now:

SQL_ID  9bz87xw0zc23c, child number 0
-------------------------------------
SELECT CASE  WHEN  EXISTS (SELECT 0 FROM "T" "A3","U" "A2" WHERE 
"A3"."B">=0 AND "A3"."B"<=1000 AND "A3"."A"="A2"."A") THEN 1 ELSE 0 END 
 FROM "DUAL" "A1"

And the plan, now again including the desired SEMI JOIN:

Plan hash value: 1561559448
 
--------------------------------------------------------------------------
| Operation                            | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                     |      |      1 |        |      1 |
|  NESTED LOOPS SEMI                   |      |      1 |      4 |      1 |
|   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |   1000 |      1 |
|    INDEX RANGE SCAN                  | I_T  |      1 |   1000 |      1 |
|   INDEX UNIQUE SCAN                  | PK_U |      1 |    333K|      1 |
|  FAST DUAL                           |      |      1 |      1 |      1 |
--------------------------------------------------------------------------

Excellent!

Benchmark time

Plans and estimates are one thing. What ultimately counts to business is wall clock time. So, let’s try this again using a benchmark:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 100;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t 
          JOIN u USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 1 : ' 
      || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t@loopback 
          JOIN u@loopback USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 2 : ' 
      || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT CASE WHEN EXISTS (
          SELECT *
          FROM t@loopback 
          JOIN u@loopback USING (a)
          WHERE t.b BETWEEN 0 AND 1000
        ) THEN 1 ELSE 0 END
        FROM dual@loopback
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 3 : ' 
      || (SYSTIMESTAMP - v_ts));
    dbms_output.put_line('');
  END LOOP;
END;
/

And here are the resulting times:

Run 1, Statement 1 : +000000000 00:00:00.008110000
Run 1, Statement 2 : +000000000 00:00:00.213404000
Run 1, Statement 3 : +000000000 00:00:00.043044000

Run 2, Statement 1 : +000000000 00:00:00.003466000
Run 2, Statement 2 : +000000000 00:00:00.198487000
Run 2, Statement 3 : +000000000 00:00:00.042717000

Run 3, Statement 1 : +000000000 00:00:00.003077000
Run 3, Statement 2 : +000000000 00:00:00.191802000
Run 3, Statement 3 : +000000000 00:00:00.048740000

Run 4, Statement 1 : +000000000 00:00:00.005008000
Run 4, Statement 2 : +000000000 00:00:00.192828000
Run 4, Statement 3 : +000000000 00:00:00.043461000

Run 5, Statement 1 : +000000000 00:00:00.002970000
Run 5, Statement 2 : +000000000 00:00:00.190786000
Run 5, Statement 3 : +000000000 00:00:00.043910000

Clearly, not using the database link is always the fastest, roughly by a factor of 10 compared to the DUAL@LOOPBACK solution. But due to the system design, we don’t have this choice. Nonetheless, you can still see that DUAL@LOOPBACK consistently outperforms DUAL by another factor of around 5 as it still prevents the HASH JOIN!

Caveat: Small data != Big data

There, I said it. “Big Data”. Before, we had a predicate that ran on 1,000 rows in a 500,000 row strong table. Our customer had millions of rows. But what happens if you query small data sets? Let’s reduce the predicate to this:

WHERE t.b BETWEEN 0 AND 10

The benchmark result is now completely different:

Run 1, Statement 1 : +000000000 00:00:00.007093000
Run 1, Statement 2 : +000000000 00:00:00.047539000
Run 1, Statement 3 : +000000000 00:00:00.071546000

Run 2, Statement 1 : +000000000 00:00:00.003023000
Run 2, Statement 2 : +000000000 00:00:00.041259000
Run 2, Statement 3 : +000000000 00:00:00.052132000

Run 3, Statement 1 : +000000000 00:00:00.002767000
Run 3, Statement 2 : +000000000 00:00:00.034190000
Run 3, Statement 3 : +000000000 00:00:00.054023000

Run 4, Statement 1 : +000000000 00:00:00.003468000
Run 4, Statement 2 : +000000000 00:00:00.026141000
Run 4, Statement 3 : +000000000 00:00:00.047415000

Run 5, Statement 1 : +000000000 00:00:00.002818000
Run 5, Statement 2 : +000000000 00:00:00.026100000
Run 5, Statement 3 : +000000000 00:00:00.046875000

And as you can see, the DUAL@LOOPBACK solution actually worsens performance for these queries. The reason for this is that we’re now running, again, a NESTED LOOP JOIN (but not SEMI JOIN) rather than a HASH JOIN on the remote database:

Query on remote database:

SQL_ID  7349t2363uc9m, child number 0
-------------------------------------
SELECT 0 FROM "T" "A2","U" "A1" WHERE "A2"."B">=0 AND "A2"."B"=0

Plan on remote database:

Plan hash value: 2558931407
 
--------------------------------------------------------------------------
| Operation                            | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
| SELECT STATEMENT                     |      |      1 |        |      1 |
|  NESTED LOOPS                        |      |      1 |     10 |      1 |
|   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |     10 |      1 |
|    INDEX RANGE SCAN                  | I_T  |      1 |     10 |      1 |
|   INDEX UNIQUE SCAN                  | PK_U |      1 |      1 |      1 |
--------------------------------------------------------------------------

I haven’t analysed what the reason for this difference is, as the difference is not significant enough, compared to the improvement for large data sets.

Conclusion

Tuning queries over database links is hard. Much much harder than tuning “ordinary” queries. Ideally, you’ll simply avoid database links and run all queries on a single instance. But sometimes that’s not possible.

In that case, the best solution is to move the logic to the remote query completely and collect only the result. Ideally, this is done using a stored procedure on the remote database and calculating this 1/0 result completely remotely. I think, hipsters these days call this a Microservice, or better, a Lambda:

CREATE FUNCTION t_u RETURN NUMBER IS
  v_result NUMBER;
BEGIN
  SELECT CASE WHEN EXISTS (
    SELECT *
    FROM t 
    JOIN u USING (a)
    WHERE t.b BETWEEN 0 AND 1000
  ) THEN 1 ELSE 0 END
  INTO v_result
  FROM dual;
  
  RETURN v_result;
END t_u;
/

Comparing the benchmark call with the other options:

T.B BETWEEN 0 AND 10

Statement 1 : +000000000 00:00:00.003022000 -- Local query
Statement 2 : +000000000 00:00:00.027416000 -- DUAL
Statement 3 : +000000000 00:00:00.043823000 -- Remote DUAL
Statement 4 : +000000000 00:00:00.022181000 -- Remote stored procedure

T.B BETWEEN 0 AND 1000

Statement 1 : +000000000 00:00:00.002877000 -- Local query
Statement 2 : +000000000 00:00:00.188588000 -- Local DUAL
Statement 3 : +000000000 00:00:00.050163000 -- Remote DUAL
Statement 4 : +000000000 00:00:00.018736000 -- Remote stored procedure

But that, too, might not be possible as you may not have the required rights to create stored procedures on that database. You could call DBMS_SQL on the remote database and run a PL/SQL block dynamically on the remote database (didn’t try that in my benchmark).

Or, you simply use an occasional DUAL@LINK, which might already do the trick with a minimal change to the original query.

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

When you’re spoiled with Oracle’s fabulous query transformation capabilities and its really well-done cost-based optimiser, then you might forget how difficult SQL query tuning used to be in the “old days” or with those less sophisticated databases. Here’s a really nice explanation of the various means of implementing an ANTI-JOIN in MySQL:

http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/