Beware of Hidden PL/SQL to SQL Context Switches

I recently stumbled upon a curious query on a customer’s productive Oracle database:

SELECT USER FROM SYS.DUAL

Two things caught my attention:

  • The query was executed many billions of times per month, accounting for about 0.3% of that system’s load. That’s 0.3% for something extremely silly!
  • I don’t think that customer would ever qualify the DUAL table as SYS.DUAL, which hints at some system functionality

I found it in Oracle Enterprise Manager, but you could also find it using a query like this one:

SELECT 
  sql_id, 
  executions, 
  elapsed_time, 
  ratio_to_report(elapsed_time) over() p, 
  sql_text
FROM v$sql
ORDER BY p DESC;

Why was this query being run so often? In Enterprise Manager, the query’s statistics overview displayed that the query originated from a function called STANDARD.USER (I don’t know yet where I could find this information in the dictionary views, manually).

Naively, I had always thought that the USER pseudo column or pseudo constant is some value from the context, but like many other functions, it’s really just a function in that package.

What does STANDARD.USER() do?

Now, I’m not 100% sure if that source code is something that I am allowed to reproduce from a legal perspective, this being Oracle and all. But if you run this query here, which I am freely allowing you to:

WITH s AS (
  SELECT s.*,
    MIN(CASE 
      WHEN upper(text) LIKE '%FUNCTION USER%' 
      THEN line END
    ) OVER () s
  FROM all_source s
  WHERE owner = 'SYS' 
  AND name = 'STANDARD'
  AND type = 'PACKAGE BODY'
)
SELECT text
FROM s
WHERE line >= s AND line < s + 6;

Then you might be able to see something like this:

  function USER return varchar2 is
  c varchar2(255);
  begin
        select user into c from sys.dual;
        return c;
  end;

This is just the result of some SQL query I’ve shown you. Any correspondence with actual source code is merely coincidental.

Let’s assume this were the actual source code of the STANDARD.USER() function. We can now clearly see that this very SQL query that I’ve observed before is being executed! Want to verify this?

Let’s benchmark

As always, I’m using the benchmark technique described here. The full benchmark logic is at the end of the article.

In essence, I’m comparing the performances of 500000 executions of this loop:

FOR i IN 1 .. v_repeat LOOP
  v := USER;
END LOOP;

With this one:

FOR i IN 1 .. v_repeat LOOP
  SELECT USER INTO v FROM dual;
END LOOP;

And this one:

FOR i IN 1 .. v_repeat LOOP
  v := sys_context('USERENV', 'CURRENT_USER');
END LOOP;

The result of this benchmark is:

Run 1, Statement 1 : 2.40509 (avg : 2.43158)
Run 1, Statement 2 : 2.13208 (avg : 2.11816)
Run 1, Statement 3 : 1.01452 (avg : 1.02081)

Run 2, Statement 1 : 2.41889 (avg : 2.43158)
Run 2, Statement 2 : 2.09753 (avg : 2.11816)
Run 2, Statement 3 : 1.00203 (avg : 1.02081)

Run 3, Statement 1 : 2.45384 (avg : 2.43158)
Run 3, Statement 2 : 2.09060 (avg : 2.11816)
Run 3, Statement 3 : 1.02239 (avg : 1.02081)

Run 4, Statement 1 : 2.39516 (avg : 2.43158)
Run 4, Statement 2 : 2.14140 (avg : 2.11816)
Run 4, Statement 3 : 1.06512 (avg : 1.02081)

Run 5, Statement 1 : 2.48493 (avg : 2.43158)
Run 5, Statement 2 : 2.12922 (avg : 2.11816)
Run 5, Statement 3 : 1.00000 (avg : 1.02081)

How to read this benchmark result? These aren’t actual times, which are not interesting, but relative times compared to the fastest run (run 5, statement 3 = 1). The explicit SELECT USER FROM DUAL is about half as fast as the SYS_CONTEXT call, and the USER call is a bit slower, even.

When re-running this query:

SELECT 
  sql_id, 
  executions, 
  ratio_to_report(elapsed_time) over() p, 
  sql_text
FROM v$sql
ORDER BY p DESC;

We can see:

SQL_ID          EXECUTIONS  P     SQL_TEXT
6r9s58qfu339c   1           0.26  DECLARE ...
1v717nvrhgbn9   2500000     0.14  SELECT USER FROM SYS.DUAL
...

So, this query has definitely been run way too many times, including the PL/SQL to SQL context switch that is involved.

I’m running this benchmark in Oracle 18.0.0.0.0 in Docker on a Windows machine. More close-to-the-metal and less virtualised setups might achieve more drastic results. See, e.g. Connor McDonald got a much better improvement from using SYS_CONTEXT:

In this particular case, The STANDARD.USER() reference was used very often in triggers to fill in audit columns of many tables. Very easy to fix. Just use sys_context('USERENV', 'CURRENT_USER') instead.

Full benchmark logic

SET SERVEROUTPUT ON

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 500000;
  v NUMBER;
BEGIN

  -- Repeat the whole benchmark several times to 
  -- avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1 .. v_repeat LOOP
      v := v + length(USER);
    END LOOP;
  
    INSERT INTO results VALUES (r, 1, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1 .. v_repeat LOOP
      SELECT v + length(USER) INTO v FROM dual;
    END LOOP;
      
    INSERT INTO results VALUES (r, 2, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1 .. v_repeat LOOP
      v := v + length(sys_context('USERENV', 'CURRENT_USER'));
    END LOOP;
      
    INSERT INTO results VALUES (r, 3, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) / 
           MIN(elapsed) OVER() AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE results;

Auto-Creation of Indexes in RDBMS

[…] generally speaking, I’m also surprised to see that in 2013 we’re creating our indexes manually.

Interesting thought! Has this thought ever occurred to you?

How this comment came about

Hackernews is very predictable. Our latest pro-SQL marketing campaign for jOOQ got quite a bit of traction as expected. It is easy to trigger love and hate for NoSQL databases with a little bit of humour, such as with Mark Madsen’s “history of databases in no-tation”.

A much more interesting and more serious blog post is Doug Turnbull’s “Codd’s Relational Vision – Has NoSQL Come Full Circle?”, which we are going to blog about soon, in a separate post. We’ve also put the latter on Hackernews and on Reddit, both of which generated tremendous traction for the subject. Comparing the current state of “NoSQL” with pre-Codd, pre-relational, pre-SQL times is clever and matches Michael Stonebraker’s and Joseph M. Hellerstein’s observations in “What Goes Around Comes Around”.

NoSQL is a movement that emerged out of necessity, when SQL databases did not evolve fast enough to keep up with what keen observers and Gartner now call “Webscale”, a new buzzword to name old things. But as history has shown, the old elephants can be taught new tricks, and SQL databases will eventually catch up.

Auto-creation of indexes in RDBMS

In the middle of the above Hackernews discussion, MehdiEG made this interesting observation about creating indexes manually being tedious. Indeed, why do we have to maintain all of our indexes manually? While platforms like Use-The-Index-Luke.com profit from teaching people how to do proper indexing, I wonder if a highly sophisticated database couldn’t gather statistics about a productive system and then generate suggestions for index additions / removals. Even more so, if the database is “absolutely sure”, it could also create/drop or at least activate/deactivate relevant indexes.

What does “absolutely sure” mean?

The Oracle database for instance, is already quite good at gathering relevant statistics and giving DBA hints about potentially effective new indexes, as it can simulate execution plans in case indexes were added. Some more information can be seen in this Stack Overflow question.

But wouldn’t it be great if Oracle (or SQL Server, DB2, any other database) had an auto-index-creation feature? On a productive system, the database could gather statistics for the longest-running queries, analyse their execution plans, simulate alternative execution plans in case potentially useful indexes were added to improve SELECT statements, or removed to improve INSERT, UPDATE, DELETE, MERGE statements. This wouldn’t be a simple task, as all available (or at least the 100 most executed) execution plans would have to be re-calculated to see how the newly added or removed index would impact the productive system.

There are a couple of things to note here:

  1. Fine-tuning indexing is easiest on a productive system. If you’re tuning your development environment, you will get most of the cases right. But only the productive system will show all those weird edge-cases that you simply cannot foresee
  2. Analysing the productive system is hard and is usually performed by the devops team or the DBA team. They’re often not the same people as the ones who developed the application / database. Since they often cannot access the DML or DDL of the application, it’s always good if they have some automatic tuning features such as the existing cost-based optimiser
  3. Blindly adding indexes without measuring is bad practice. If you know that a table is mostly-read-only, then you’re mostly-on-the-safe-side. But what happens if a table is often bulk updated? If a batch job creates large transactions with long UNDO / REDO logs? Each unnecessary index will only slow down the batch job, increasing the risk of race conditions, rollbacks or even deadlocks.

Automatic index creation or deletion could greatly improve the productive experience with commercial databases that already have many very useful tuning features. Let us hope that Oracle, IBM, Microsoft will hear us and build such a feature into their future databases!

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

When you’re spoiled with Oracle’s fabulous query transformation capabilities and its really well-done cost-based optimiser, then you might forget how difficult SQL query tuning used to be in the “old days” or with those less sophisticated databases. Here’s a really nice explanation of the various means of implementing an ANTI-JOIN in MySQL:

http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/