Impress Your Coworkers With the Incredible NATURAL FULL OUTER JOIN!


There are already only very few real-world use-cases for FULL [ OUTER ] JOIN, but maybe, you have run into this beast in the past. But when was the last time you’ve seen a NATURAL JOIN? Right.

A quick reminder from our article about JOINs:

FULL JOIN

A FULL JOIN is a type of OUTER JOIN that retains data from both sides of the JOIN operation, regardless if the JOIN predicate matches or not. For example, querying the Sakila database:

SELECT first_name, last_name, title
FROM actor
FULL JOIN film_actor USING (actor_id)
FULL JOIN film USING (film_id)

The result will look something like this:

first_name   last_name   title
--[ LEFT JOIN ] -------------------------
Jon          Doe                          <-- Actors that didn't play in any film
Claire       Miller                      
--[ INNER JOIN ] ------------------------
Samuel       Jackson     Django Unchained <-- Actors played in many films
Samuel       Jackson     Pulp Fiction
John         Travolta    Pulp Fiction     <-- Films featured several actors
--[ RIGHT JOIN ]-------------------------
                         Meh              <-- Films that don't have actors

Note the special “markers” that delimit the parts that would have also been contributed by LEFT JOIN, INNER JOIN, RIGHT JOIN. In a way, a FULL JOIN combines them all.

Handy, occasionally

NATURAL JOIN

This one is less obviously useful. It looks useful at first, because if we carefully design our schemas as in the Sakila database used above, then we can simplify our joins as such:

SELECT first_name, last_name, title
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film

Which is just syntax sugar for this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (<all common columns>)
JOIN film USING (<all common columns>)

So, a NATURAL JOIN will join two tables using all the columns they have in common. You’d think that this would be the useful outcome:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

Unfortunately, the outcome looks more like this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id, last_update)
JOIN film USING (film_id, last_update)

In other words: It makes no sense at all. In real world situations (and the Sakila database is already close enough to the real world), we’ll always have accidentally conflicting column names that will be considered for NATURAL JOIN, when they hsould not be.

OK, so why NATURAL FULL JOIN?

… you’re asking. One is very rarely needed, and the other is pretty useless in the real world. Not so fast!

I recently helped a customer who had two very similar tables that both contained payments. Let’s call them: PAYMENT and STANDING_ORDER, the latter being a PAYMENT template which generates actual PAYMENTs periodically.

Now, there was some PL/SQL logic that operated on payments, which after some change request, needed to operate both on payments and/or on standing orders. If this was Java, we’d just (ab)use subtype polymorphism for the quick win, or factor out common logic into common component types for the thorough solution. For instance, these kinds of columns certainly appear in both cases:

“System” columns

  • ID – The surrogate key
  • CREATED_AT – Audit info
  • MODIFIED_AT – Audit info
  • DELETED – Soft deletion

“Business” columns

  • AMOUNT
  • CURRENCY

There are many more common columns. There are other columns that are specific to one or the other type:

Payment only

  • STANDING_ORDER_ID – Only payments reference the standing order that may have created them.

Standing order only

  • PERIODICITY – This is a standing order’s core feature.

Now, in order to refactor that PL/SQL logic, I needed a row type that combines all these columns easily. I didn’t want to manually track the exact columns of both tables, I wanted this to always work, even if the maintainers of the tables add/remove/rename columns.

NATURAL FULL JOIN to the rescue

I created a view like this:

CREATE VIEW like_a_payment
SELECT *
FROM payment
NATURAL FULL JOIN standing_order
WHERE 1 = 0

This view itself doesn’t do anything. It just sits there and produces the row type I need. Why? Because the NATURAL JOIN will join by ALL the common columns, namely ID, CREATED_AT, MODIFIED_AT, DELETED, AMOUNT, CURRENCY and then adds the columns from the left table, namely STANDING_ORDER_ID, and then from the right table, namely PERIODICITY.

I can now use this type in PL/SQL:

DECLARE
  paym like_a_payment%ROWTYPE;
BEGIN
  -- Do things...
END;

But why FULL?

True, I didn’t need FULL yet, because I’m not producing any rows anyway. Correct. But observe how I will populate the PAYM local variable in the PL/SQL block:

DECLARE
  l_paym_id like_a_payment.id%TYPE;
  l_paym    like_a_payment%ROWTYPE;
BEGIN
  l_paym_id := 1;

  SELECT *
  INTO l_paym
  FROM (
    SELECT * FROM payment WHERE id = l_paym_id
  )
  NATURAL FULL JOIN (
    SELECT * FROM standing_order WHERE id = l_paym_id
  );
END;

And I’m done! This could not have been done any more easily!

  • I’m using NATURAL because now I have to use it all the time to populate the record (or select individual columns, but meh)
  • I’m using FULL because only one of the two queries will produce a result, and the remaining columns will be NULL

Excellent!

Some observations

Of course, this is kind of a hack. Please don’t abuse this too often. Here’s some hints to follow:

  • The refactoring would have been substantial. I needed a quick win.
  • The logic actually needs almost all the columns from the payments. Otherwise, you might want to review that SELECT * usage.
  • The derived tables are necessary in Oracle for performance reasons.
  • The actual code contained two distinct queries, each one joining “one side” with an empty substitute query, as we know that the ID is either a payment or a standing order.

Having said so, now you know one of the very few real world use-cases of NATURAL FULL JOIN. Put this in your code base and impress your coworkers!

Beautiful SQL: Lateral Unnesting of Array Columns


Sometimes, SQL can just be so beautiful. One of the less mainstream features in SQL is the array type (or nested collections). In fact, it’s so not mainstream that only 2 major databases actually support it: Oracle and PostgreSQL (and HSQLDB and H2 in the Java ecosystem).

In PostgreSQL, you can write:

CREATE TABLE blogs (
  id    SERIAL NOT NULL PRIMARY KEY,
  title text   NOT NULL,
  tags  text[]
)

Or in Oracle:

-- Oracle only knows nominal array types, so we have to declare
-- them in advance
CREATE TYPE tag_t AS VARRAY(100) OF VARCHAR2(100 CHAR);

CREATE TABLE blogs (
  id    NUMBER(18) GENERATED BY DEFAULT AS IDENTITY 
                   NOT NULL PRIMARY KEY,
  title VARCHAR2(100 CHAR) NOT NULL,
  tags  tag_t
)

So, roughly the same thing. Now, let’s insert some data. How about the 3 most recent posts on the jOOQ blog, prior to this one:

In PostgreSQL:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  ARRAY[
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ]
), (
  'How to Execute SQL Batches With JDBC and jOOQ',
  ARRAY[
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ]
), (
  'How to Emulate Partial Indexes in Oracle',
  ARRAY[
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ]
)

Or in Oracle:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  tag_t(
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Execute SQL Batches With JDBC and jOOQ',
  tag_t(
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Emulate Partial Indexes in Oracle',
  tag_t(
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ));

Now, the array type by itself is not very useful. When it gets really interesting is when we unnest it again into a table. For instance in PostgreSQL:

SELECT title, tag
FROM blogs, LATERAL unnest(tags) AS tags(tag);

Or in Oracle:

-- Classic style
SELECT title, tags.*
FROM blogs, TABLE(tags) tags;

-- Since Oracle 12c
SELECT title, tags.*
FROM blogs, LATERAL (SELECT * FROM TABLE(tags)) tags;

Note that we’re using the keyword LATERAL in some of the above queries. For those of you who are used to T-SQL syntax, it’s almost the same thing as APPLY. Both LATERAL and APPLY are also very useful with table valued functions (stay tuned for a blog post on those).

The idea behind LATERAL is that the table (derived table, subquery, function call, array unnesting) on the right side of LATERAL can “laterally” access stuff from the left side of LATERAL in order to produce new tables. In the above query, we’re producing a new table of tags for each blog post, and then we cross join the two tables.

Here’s what the above queries result in:

title                                                         tag
-----------------------------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   implicit cursor
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   batch
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   oracle
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jooq
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jdbc
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   resultset
How to Execute SQL Batches With JDBC and jOOQ                 batch
How to Execute SQL Batches With JDBC and jOOQ                 batch statement
How to Execute SQL Batches With JDBC and jOOQ                 mysql
How to Execute SQL Batches With JDBC and jOOQ                 jooq
How to Execute SQL Batches With JDBC and jOOQ                 jdbc
How to Execute SQL Batches With JDBC and jOOQ                 sql server
How to Execute SQL Batches With JDBC and jOOQ                 sql
How to Emulate Partial Indexes in Oracle                      optimisation
How to Emulate Partial Indexes in Oracle                      index
How to Emulate Partial Indexes in Oracle                      partial index
How to Emulate Partial Indexes in Oracle                      oracle
How to Emulate Partial Indexes in Oracle                      sql
How to Emulate Partial Indexes in Oracle                      postgresql
How to Emulate Partial Indexes in Oracle                      t-sql
How to Emulate Partial Indexes in Oracle                      sql server

You can immediately see the cross join semantics here, as we’re combining each tag (per post) with its post.

Looking for ordinals (i.e. the tag number inside of the array) along with the array? Easy!

Just add the powerful WITH ORDINALITY clause after the UNNEST() call in PostgreSQL:

SELECT title, tag
FROM blogs, LATERAL unnest(tags) WITH ORDINALITY AS tags(tag);

A bit more complicated to emulate in Oracle:

-- Fancy, with a window function
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROW_NUMBER() OVER (ORDER BY NULL)
  FROM TABLE(tags) tags
) tags;

-- Classic, with ROWNUM
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROWNUM
  FROM TABLE(tags) tags
) tags;

The result now contains the tag “ID”, i.e the ordinal of the tag inside of the array:

title                                           tag               ordinal
-------------------------------------------------------------------------
How to Fetch ... Cursors with JDBC and jOOQ     implicit cursor   1
How to Fetch ... Cursors with JDBC and jOOQ     batch             2
How to Fetch ... Cursors with JDBC and jOOQ     oracle            3
How to Fetch ... Cursors with JDBC and jOOQ     jooq              4
How to Fetch ... Cursors with JDBC and jOOQ     jdbc              5
How to Fetch ... Cursors with JDBC and jOOQ     resultset         6
How to Execute SQL Batches With JDBC and jOOQ   batch             1
How to Execute SQL Batches With JDBC and jOOQ   batch statement   2
How to Execute SQL Batches With JDBC and jOOQ   mysql             3
How to Execute SQL Batches With JDBC and jOOQ   jooq              4
How to Execute SQL Batches With JDBC and jOOQ   jdbc              5
How to Execute SQL Batches With JDBC and jOOQ   sql server        6
How to Execute SQL Batches With JDBC and jOOQ   sql               7
How to Emulate Partial Indexes in Oracle        optimisation      1
How to Emulate Partial Indexes in Oracle        index             2
How to Emulate Partial Indexes in Oracle        partial index     3
How to Emulate Partial Indexes in Oracle        oracle            4
How to Emulate Partial Indexes in Oracle        sql               5
How to Emulate Partial Indexes in Oracle        postgresql        6
How to Emulate Partial Indexes in Oracle        t-sql             7
How to Emulate Partial Indexes in Oracle        sql server        8

Now, imagine looking for those blog posts that are tagged “jooq”. Easy!

PostgreSQL:

SELECT title
FROM blogs
WHERE 'jooq' = ANY(tags);

Oracle:

SELECT title
FROM blogs
WHERE 'jooq' IN (SELECT * FROM TABLE(tags));

Yielding:

title
-----------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ
How to Execute SQL Batches With JDBC and jOOQ

Conclusion

These are just a few nice things we can do when we denormalise our data into nested collections / arrays, and then use features like UNNEST to bring them back to the table level. Both Oracle and PostgreSQL support a variety of really nice features building on top of arrays, so do check them out!

How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ


Earlier this week, I’ve blogged about how to execute SQL batches with JDBC and jOOQ. This was useful for the MySQL, SQL Server, and Sybase users among you.

Today, we’ll discuss a slightly more difficult task, how to fetch Oracle 12c implicit cursors – which are essentially the same thing.

What’s an implicit cursor?

Oracle 12c added new procedures to their dynamic SQL API DBMS_SQL. Just run the following query in SQL Developer to see the results:

DECLARE
  c1 sys_refcursor;
  c2 sys_refcursor;
BEGIN
  OPEN c1 FOR SELECT 1 AS a FROM dual;
  dbms_sql.return_result(c1);
  OPEN c2 FOR SELECT 2 AS b FROM dual;
  dbms_sql.return_result(c2);
END;

The anonymous PL/SQL block contains two cursors that are opened and returned to whoever calls this block using DBMS_SQL.RETURN_RESULT. This is kind of magic, as we’re calling a procedure, passing a cursor to it, and somehow, this has a side effect on the client of this program after the program ends.

Not only can you do this in anonymous PL/SQL blocks, you can nest these calls in any procedure, of course. So, in other words, from Oracle 12c onwards, you don’t know for sure if you call a procedure if there will be more results than what you can see. For instance:

BEGIN
  any_procedure();
END;

The above call might just as well yield some implicit cursors. You can’t know for sure.

How to discover implicit cursors with JDBC

With JDBC, if you don’t know for sure what your query will yield as a result, you use the Statement.execute(String), or the PreparedStatement.execute() method to find out. As mentioned in the previous post, this is what you would do:

try (PreparedStatement s = c.prepareStatement(sql)) {
    fetchLoop:
    for (int i = 0, updateCount = 0;; i++) {
        boolean result = (i == 0)
            ? s.execute()
            : s.getMoreResults();
 
        if (result)
            try (ResultSet rs = s.getResultSet()) {
                System.out.println("\nResult:");
 
                while (rs.next())
                    System.out.println("  " + rs.getInt(1));
            }
        else if ((updateCount = s.getUpdateCount()) != -1)
            System.out.println("\nUpdate Count: " + updateCount);
        else
            break fetchLoop;
    }
}

Unfortunately, that won’t work on Oracle as Oracle’s JDBC driver doesn’t implement the JDBC spec correctly. I’ve documented this flaw in length on this Stack Overflow question here.

Using ojdbc, the following “improved” loop needs to be written:

/* Alternatively, use this for non-PreparedStatements:
try (Statement s = cn.createStatement()) {
    Boolean result = s.execute(sql); */
try (PreparedStatement s = cn.prepareStatement(sql)) {
    // Use good old three-valued boolean logic
    Boolean result = s.execute();

    fetchLoop:
    for (int i = 0;; i++) {

        // Check for more results if not already done in 
        // this iteration
        if (i > 0 && result == null)
            result = s.getMoreResults();
        System.out.println(result);

        if (result) {
            result = null;

            try (ResultSet rs = s.getResultSet()) {
                System.out.println("Fetching result " + i);
            }
            catch (SQLException e) {
                // Ignore ORA-17283: No resultset available (1)
                if (e.getErrorCode() == 17283)
                    continue fetchLoop;
                else
                    throw e;
            }
        }
        else if (s.getUpdateCount() == -1)
            // Ignore -1 value if there is one more result! (2)
            if (result = s.getMoreResults())
                continue fetchLoop;
            else
                break fetchLoop;
    }
}

Two elements of the above logic need more explanation:

  1. There’s a possibility of an ORA-17283: No resultset available error being raised when accessing the Statement.getResultSet() despite the previous call to Statement.execute() yielding true. If that happens, we’ll just ignore the error and try fetching another result set
  2. In case we’re using PreparedStatement, the original call to PreparedStatement.execute() will yield false (!) and the Statement.getUpdateCount() value is -1, which would normally mean that we should stop. Not in this case. Let’s just try one more time to get a result set, and tah-dah, here are our implicit result sets.

Note that the algorithm now works with both static Statement and PreparedStatement, which (very unfortunately) behave differently when calling execute().

The above will now work with any SQL statement. In case you’re using the previous SQL statement returning implicit cursors:

String sql =
    "\nDECLARE"
  + "\n  c1 sys_refcursor;"
  + "\n  c2 sys_refcursor;"
  + "\nBEGIN"
  + "\n  OPEN c1 FOR SELECT 1 AS a FROM dual;"
  + "\n  dbms_sql.return_result(c1);"
  + "\n  OPEN c2 FOR SELECT 2 AS a FROM dual;"
  + "\n  dbms_sql.return_result(c2);"
  + "\nEND;";

… you will now be able to fetch all the results:

true
true
Fetching result 1
true
Fetching result 2
false

How to get those cursors with jOOQ?

With jOOQ 3.10 (as always), you don’t need to worry about those low level JDBC details. Just call the following code:

System.out.println(
    DSL.using(cn).fetchMany(sql)
);

And you’ll get a convenient, object oriented representation of your multiple result sets in the form of an org.jooq.Results:

Result set:
+----+
|   A|
+----+
|   1|
+----+
Result set:
+----+
|   A|
+----+
|   2|
+----+

Even better, when you use a code generator to return multiple implicit cursors like this in a stored procedure, just call the generated stored procedure object like this, to get all the cursors automatically:

MyProcedure p = new MyProcedure();
p.setParameter1(x);
p.setParameter2(y);
p.execute(configuration);
Results results = p.getResults();

for (Result<?> result : results)
  for (Record record : result)
    System.out.println(record);

Done!

How to Emulate Partial Indexes in Oracle


A very interesting feature of the SQL Server and PostgreSQL databases (and some others, including SQLite) is the partial index (sometimes also called “filtered index”). That’s an index that contains only “parts” of the table data. For instance, we can write the following index in SQL Server and PostgreSQL:

CREATE INDEX i ON message WHERE deleted = 1;

Let’s imagine you have a house keeping job that periodically removes deleted messages. Now, let’s assume you have discovered, that only 0.1% of all messages are really deleted, so an index on the DELETED column is very selective if you’re looking for deleted messages.

On the other hand, it’s not selective at all if you’re looking for non-deleted messages, as such a query would return 99.9% of all messages, in case of which a full table scan is more efficient.

So, since the index is never useful for non-deleted messages, why index those messages at all? If we can avoid indexing non-deleted messages, then we can:

  • Save a lot of disk space, as the index will be much smaller
  • Save a lot of time inserting into the messages table, since we don’t have to update the index all the time
  • Save a lot of time scanning the index, since it will contain a lot less blocks

Unfortunately, Oracle doesn’t support this feature

… but “luckily”, Oracle has another controversial “feature”. In Oracle, all indexes are partial indexes, because they don’t contain NULL values. This is probably due to an ancient optimisation (remember, partial indexes are smaller), which occasionally gets into your way, performance wise, if you do want to query for NULL values.

But in this case, it’s useful. Check this out:

CREATE TABLE message(deleted number(1));

CREATE INDEX i ON message (
  CASE WHEN deleted > 0 THEN deleted END
);

The above index is a function-based index, i.e. an index that contains not the value of the deleted column itself, but an expression based on it. Concretely, it contains only deleted values that are strictly greater than zero, because if the value is zero, then it is turned to NULL by the CASE expression, and Oracle doesn’t index NULL values. Check this out:

INSERT INTO message
SELECT DECODE(LEVEL, 1, 1, 0)
FROM dual
CONNECT BY LEVEL < 100000;

The above query is inserting a single row containing a deleted value of 1, and almost 100k rows containing a value of 0. The insert is very quick, because only one row has to be added to the index. The other almost 100k rows are skipped:

EXEC dbms_stats.gather_table_stats('SANDBOX', 'MESSAGE');

SELECT NUM_ROWS, BLOCKS
FROM SYS.ALL_TAB_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

SELECT NUM_ROWS, LEAF_BLOCKS
FROM SYS.ALL_IND_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

The result is:

NUM_ROWS       BLOCKS
---------------------
   99999          152 <-- table size

NUM_ROWS  LEAF_BLOCKS
---------------------
       1            1 <-- index size

The “trouble” with this kind of emulation is: It’s a function-based index. We can use this index only if we really reproduce the same “function” (or in this case, expression) as in the index itself. So, in order to fetch all the deleted messages, we must not write the following query:

SELECT *
FROM message
WHERE deleted = 1;

But this one, instead:

SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

Check out execution plans:

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE deleted = 1;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

SELECT * FROM TABLE(dbms_xplan.display);

The output being:

------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         | 50000 |   146K|    44   (3)|
|*  1 |  TABLE ACCESS FULL| MESSAGE | 50000 |   146K|    44   (3)|
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("DELETED"=1)

And

----------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     3 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| MESSAGE |     1 |     3 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | I       |     1 |       |     1   (0)|
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(CASE  WHEN "DELETED">0 THEN "DELETED" END =1)

As you can see, the first query runs a full table scan, estimating to retrieve 50% of all the rows, when the actual result is only 1 row as can be seen in the second execution plan!

Insertion speed

What’s even more impressive is the difference in insertion speed. Consider the following code block, which measures the time it takes to insert 1 million times 0 and one million times 1:

SET SERVEROUTPUT ON
DECLARE
  ts TIMESTAMP;
BEGIN
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 0 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
  
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 1 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
END;
/

The result being:

+000000000 00:00:01.501000000
+000000000 00:00:08.162000000

The insertion is much faster if we don’t have to modify the index!

Conclusion

Partial indexes are a very neat trick in cases where your data is highly skewed and some values in a column are extremely rare and very frequently queried. They may drastically reduce the index size, which greatly improves performance in some situations, including inserting into the table, and querying the index.

In Oracle, they can be emulated using function-based indexes, which means you have to use the exact function expression from the index also in queries, in order to profit. But it may well be worth the trouble!

Use jOOQ to Read / Write Oracle PL/SQL RECORD Types


Some of the biggest limitations when working with Oracle PL/SQL from Java is the lack of support for a variety of PL/SQL features through the JDBC interface. This lack of support is actually not limited to JDBC, but also extends to Oracle SQL. For instance, if you’re using the useful PL/SQL BOOLEAN type as such:

CREATE OR REPLACE FUNCTION yes RETURN boolean AS
BEGIN
  RETURN true;
END yes;
/

It would now be terrific if you could do this:

SELECT yes FROM dual;

But it’s not possible. You’ll be getting an error along the lines of the following:

ORA-06552: PL/SQL: Statement ignored
ORA-06553: PLS-382: expression is of wrong type
06552. 00000 -  "PL/SQL: %s"
*Cause:    
*Action:

It’s crazy to think that the Oracle SQL language still doesn’t support the SQL standard boolean type, which is so useful as I’ve shown in previous blog posts. Here’s where you can upvote the feature request:

https://community.oracle.com/ideas/2633

BOOLEAN isn’t the only “inaccessible” SQL feature

Now, there are a couple of other data types, which cannot be “bridged” to the SQL engine, and thus (for some reason only the OJDBC driver gods can fathom) cannot be “bridged” to a JDBC client. Among them: The very useful PL/SQL RECORD type.

Very often, you want to do this:

CREATE PACKAGE customers AS
  TYPE person IS RECORD (
    first_name VARCHAR2(50),
    last_name VARCHAR2(50)
  );
  
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person;
END customers;
/

CREATE PACKAGE BODY customers AS
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person IS
    v_person customers.person;
  BEGIN
    SELECT c.first_name, c.last_name
    INTO v_person
    FROM customer c
    WHERE c.customer_id = p_customer_id;
    
    RETURN v_person;
  END get_customer;
END customers;
/

(we’re running this on the SAKILA database).

As in any language with the least bit of sophistication, we can define “structs” or records in PL/SQL, which we can now frequently reuse. Everyone knows what a PERSON is and we can pass them around between procedures and functions.

For instance, in PL/SQL client code:

SET SERVEROUTPUT ON
DECLARE
  v_person customers.person;
BEGIN
  v_person := customers.get_customer(1);
  
  dbms_output.put_line(v_person.first_name);
  dbms_output.put_line(v_person.last_name);
END;
/

… which yields:

MARY
SMITH

What about JDBC client code?

After having added support for PL/SQL BOOLEAN types in jOOQ 3.9, with jOOQ 3.9, we now finally support PL/SQL record types in stored procedures as well, at least in standalone calls, which are not embedded in SQL statements. The jOOQ code generator will pick up all of these package-level PL/SQL record types and their structures and generate the boring boiler plate code for you. E.g. (simplified):

package org.jooq.demo.sakila.packages.customers.records;

public class PersonRecord extends UDTRecordImpl<PersonRecord> {
    public void   setFirstName(String value) { ... }
    public String getFirstName()             { ... }
    public void   setLastName(String value)  { ... }
    public String getLastName()              { ... }

    public PersonRecord() { ... }
    public PersonRecord(String firstName, String lastName) { ... }
}

Notice how jOOQ doesn’t really make any difference in its API between the generated code for an Oracle SQL OBJECT type or an Oracle PL/SQL RECORD type. They’re essentially the same thing (from a jOOQ API perspective).

More interesting, what happened to the generated package and the function? This code is generated (simplified):

public class Customers extends PackageImpl {
    public static PersonRecord getCustomer(
        Configuration configuration, Long pCustomerId
    ) { ... }
}

That’s all! So all we now need to do is pass the ubiquitous jOOQ Configuration (which contains information such as SQLDialect or JDBC Connection) and the actual stored function parameter, the P_CUSTOMER_ID value, and we’re done!

This is how jOOQ client code might look:

PersonRecord person = Customers.getCustomer(configuration(), 1L);
System.out.println(person);

As you can see, this is just the same thing as the corresponding PL/SQL code. And the output of this println call is this:

SAKILA.CUSTOMERS.PERSON('MARY', 'SMITH')

A fully qualified RECORD declaration with schema, package, and type name.

How does it work?

Let’s turn on jOOQ’s built-in TRACE logging to see what jOOQ did behind the scenes:

Calling routine          : 
  DECLARE
    r1 "CUSTOMERS"."PERSON";
  BEGIN
    r1 := "CUSTOMERS"."GET_CUSTOMER"("P_CUSTOMER_ID" => ?);
    ? := r1."FIRST_NAME";
    ? := r1."LAST_NAME";
  END;
Binding variable 1       : 1 (class java.lang.Long)
Registering variable 2   : class java.lang.String
Registering variable 3   : class java.lang.String
Fetched OUT parameters   : +-----------------+
                         : |RETURN_VALUE     |
                         : +-----------------+
                         : |('MARY', 'SMITH')|
                         : +-----------------+

So, jOOQ usually doesn’t use JDBC’s very limited escape syntax to call stored procedures, it just produces an anonymous PL/SQL block with a local variable of type CUSTOMER.PERSON, i.e. of our RECORD type. The function call is then assigned to this local variable, and the local variable is descructured into its individual parts.

In the TRACE log, you can see the individual bind variables, i.e. there’s an IN variable at index 1 of type Long, and two OUT variables of type String at indexes 2 and 3.

How does jOOQ know the record types?

At runtime, all the information is hard-wired to the generated code. So, the magic is inside of the code generator. Warning: Some serious SQL ahead!

This beauty here queries the dictionary views for PL/SQL record types:

AllArguments a = ALL_ARGUMENTS.as("a");
AllArguments x = ALL_ARGUMENTS.as("x");
Field<BigDecimal> nextSibling = field(name("next_sibling"), x.SEQUENCE.getDataType());

DSL.using(configuration)
   .select(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME)
   .select(
       a.ARGUMENT_NAME                                               .as(ALL_TYPE_ATTRS.ATTR_NAME),
       a.SEQUENCE                                                    .as(ALL_TYPE_ATTRS.ATTR_NO),
       a.TYPE_OWNER                                                  .as(ALL_TYPE_ATTRS.ATTR_TYPE_OWNER),
       nvl2(a.TYPE_SUBNAME, a.TYPE_NAME, inline(null, a.TYPE_NAME))  .as("package_name"),
       coalesce(a.TYPE_SUBNAME, a.TYPE_NAME, a.DATA_TYPE)            .as(ALL_TYPE_ATTRS.ATTR_TYPE_NAME),
       a.DATA_LENGTH                                                 .as(ALL_TYPE_ATTRS.LENGTH),
       a.DATA_PRECISION                                              .as(ALL_TYPE_ATTRS.PRECISION),
       a.DATA_SCALE                                                  .as(ALL_TYPE_ATTRS.SCALE))
   .from(a)
   .join(table(
       select(
           a.TYPE_OWNER,
           a.TYPE_NAME,
           a.TYPE_SUBNAME,
           min(a.OWNER        ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.OWNER),
           min(a.PACKAGE_NAME ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.PACKAGE_NAME),
           min(a.SUBPROGRAM_ID).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SUBPROGRAM_ID),
           min(a.SEQUENCE     ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SEQUENCE),
           min(nextSibling    ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(nextSibling),
           min(a.DATA_LEVEL   ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.DATA_LEVEL))
      .from(table(
          select(
              lead(a.SEQUENCE, 1, a.SEQUENCE).over(
                  partitionBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.DATA_LEVEL)
                 .orderBy(a.SEQUENCE)
              ).as("next_sibling"),
              a.TYPE_OWNER,
              a.TYPE_NAME,
              a.TYPE_SUBNAME,
              a.OWNER,
              a.PACKAGE_NAME,
              a.SUBPROGRAM_ID,
              a.SEQUENCE,
              a.DATA_LEVEL,
              a.DATA_TYPE)
         .from(a)
         .where(a.OWNER.in(getInputSchemata()))
      ).as("a"))
      .where(a.TYPE_OWNER.in(getInputSchemata()))
      .and(a.OWNER.in(getInputSchemata()))
      .and(a.DATA_TYPE.eq("PL/SQL RECORD"))
      .groupBy(a.TYPE_OWNER, a.TYPE_NAME, a.TYPE_SUBNAME)
   ).as("x"))
   .on(row(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID).eq(x.OWNER, x.PACKAGE_NAME, x.SUBPROGRAM_ID))
   .and(a.SEQUENCE.between(x.SEQUENCE).and(nextSibling))
   .and(a.DATA_LEVEL.eq(x.DATA_LEVEL.plus(one())))
   .orderBy(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME, a.SEQUENCE)
   .fetch();

This is a nice little jOOQ query, which corresponds to the following equially impressive SQL query, which you can run directly in your SQL developer, or some other SQL client for Oracle:

SELECT 
  "x"."TYPE_OWNER",
  "x"."TYPE_NAME",
  "x"."TYPE_SUBNAME",
  "a"."ARGUMENT_NAME" "ATTR_NAME",
  "a"."SEQUENCE" "ATTR_NO",
  "a"."TYPE_OWNER" "ATTR_TYPE_OWNER",
  nvl2("a"."TYPE_SUBNAME", "a"."TYPE_NAME", NULL) "package_name",
  COALESCE("a"."TYPE_SUBNAME", "a"."TYPE_NAME", "a"."DATA_TYPE") "ATTR_TYPE_NAME",
  "a"."DATA_LENGTH" "LENGTH",
  "a"."DATA_PRECISION" "PRECISION",
  "a"."DATA_SCALE" "SCALE"
FROM "SYS"."ALL_ARGUMENTS" "a"
JOIN (
  SELECT 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME",
    MIN("a"."OWNER") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "OWNER",
    MIN("a"."PACKAGE_NAME") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "PACKAGE_NAME",
    MIN("a"."SUBPROGRAM_ID") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SUBPROGRAM_ID",
    MIN("a"."SEQUENCE") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SEQUENCE",
    MIN("next_sibling") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "next_sibling",
    MIN("a"."DATA_LEVEL") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "DATA_LEVEL"
  FROM (
    SELECT 
	  lead("a"."SEQUENCE", 1, "a"."SEQUENCE") OVER (
	    PARTITION BY "a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID", "a"."DATA_LEVEL" 
		ORDER BY "a"."SEQUENCE" ASC
	  ) "next_sibling",
      "a"."TYPE_OWNER",
      "a"."TYPE_NAME",
      "a"."TYPE_SUBNAME",
      "a"."OWNER",
      "a"."PACKAGE_NAME",
      "a"."SUBPROGRAM_ID",
      "a"."SEQUENCE",
      "a"."DATA_LEVEL",
      "a"."DATA_TYPE"
    FROM "SYS"."ALL_ARGUMENTS" "a"
    WHERE "a"."OWNER" IN ('SAKILA', 'SYS')     -- Possibly replace schema here
    ) "a"
  WHERE ("a"."TYPE_OWNER" IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."OWNER"         IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."DATA_TYPE"      = 'PL/SQL RECORD')
  GROUP BY 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME"
  ) "x"
ON (("a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID")
 = (("x"."OWNER", "x"."PACKAGE_NAME", "x"."SUBPROGRAM_ID"))
AND "a"."SEQUENCE" BETWEEN "x"."SEQUENCE" AND "next_sibling"
AND "a"."DATA_LEVEL" = ("x"."DATA_LEVEL" + 1))
ORDER BY 
  "x"."TYPE_OWNER" ASC,
  "x"."TYPE_NAME" ASC,
  "x"."TYPE_SUBNAME" ASC,
  "a"."SEQUENCE" ASC

Whew.

The output shows that we got all the required information for our RECORD type:

[  TYPE INFO COLUMNS  ]   ATTR_NAME   ATTR_TYPE_NAME  LENGTH
SAKILA CUSTOMERS PERSON	  FIRST_NAME  VARCHAR2        50
SAKILA CUSTOMERS PERSON	  LAST_NAME   VARCHAR2        50

All of this also works for:

  • Nested types
  • Multiple IN and OUT parameters

I’ll blog about a more advanced use-case in the near future, so stay tuned.

Why You Should (Sometimes) Avoid Expressions in SQL Predicates


I’ve recently discovered a rather significant performance issue on a productive Oracle 11g customer database. And I’m sure you have this issue too, which is why I’m documenting it here.

This is a simplified representation of the setup at the customer site:

        ID PAYMENT_DATE TEXT                               
---------- ------------ -----------------------------------
     33803 21.05.16     DcTNBOrkQIgMtbietUWOsSFNMIqGLlDw...
     29505 09.03.16     VIuPaOAQqzCMlFBYPQtvqUSbWYPDndJD...
     10738 25.07.16     TUkxGpZPrGKaHzDRxczrktkFWvGmiwjR...

Let’s produce the above table with the following statement:

CREATE TABLE payment (
  id            NOT NULL PRIMARY KEY,
  payment_date  NOT NULL,
  text
) AS
SELECT
  level,
  SYSDATE - dbms_random.value(1, 500),
  dbms_random.string('a', 500)
FROM dual
CONNECT BY level <= 50000
ORDER BY dbms_random.value;

CREATE INDEX i_payment_date ON payment(payment_date);

EXEC dbms_stats.gather_table_stats('TEST', 'PAYMENT');

What we’re doing here: There’s a payment table with an ID primary key column, an indexed payment_date column and some “text” info. The real table is, of course, much bigger than this.

Now, this table needs infrequent house keeping. In nightly batch jobs, the customer ran through the table and removed all payments that were older than some fixed amount of days, e.g.:

DELETE
FROM payment
WHERE payment_date < SYSDATE - 470 // Older than 470 days

This might look fine at first, because:

  1. We have an index on payment_date, and it can be used for the deletion
  2. SYSDATE is constant per query, so the expression SYSDATE - 470 should also be constant per query

We can prove both statements easily:

1. Index usage

Let’s run:

EXPLAIN PLAN FOR
DELETE
FROM payment
WHERE payment_date < SYSDATE - 470;

SELECT * FROM TABLE(dbms_xplan.display);

And we’re getting:

----------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                |  3008 |    10   (0)| 00:00:01 |
|   1 |  DELETE           | PAYMENT        |       |            |          |
|*  2 |   INDEX RANGE SCAN| I_PAYMENT_DATE |  3008 |    10   (0)| 00:00:01 |
----------------------------------------------------------------------------
                                                                            
Predicate Information (identified by operation id):                         
---------------------------------------------------
                                                                            
   2 - access("PAYMENT_DATE"<SYSDATE@!-470)                                 

So, we get a relatively cheap access predicate on the I_PAYMENT_DATE index, and there’s this weird SYSDATE@!-470 constant in the plan.

2. SYSDATE being constant

There’s a lot of confusion about what the actual value of these non-deterministic, non-pure date/time functions is in each database. Sometimes, the function is evaluated on each row individually and produces a new value each time. In my opinion, that’s the worst, as such functions are completely unpredictable.

Sometimes, there’s a guarantee that these timestamps stay the same for the entire duration of a transaction. That’s a bit weird as a default, but OK why not.

In Oracle, it seems that SYSDATE (and SYSTIMESTAMP) are evaluated only a single time on a per-statement level. This would explain that weird SYSDATE@! notation in the execution plan, and it can be “proven” by doing something like this:

CREATE OR REPLACE FUNCTION sleep(i NUMBER) RETURN NUMBER 
AS
BEGIN
  dbms_lock.sleep(i);
  RETURN i;
END;
/

SELECT 
  min(CAST(sysdate AS TIMESTAMP)), 
  max(CAST(sysdate AS TIMESTAMP))
FROM dual
CONNECT BY level <= 5 AND sleep(1) > 0;

DROP FUNCTION sleep;
/

The result of this statement (which unsurprisingly takes around 4 seconds to execute) is:

MIN(CAST(SYSDATEASTIMESTAMP)) MAX(CAST(SYSDATEASTIMESTAMP))
----------------------------- -----------------------------
01.11.16 11:15:20.000000000   01.11.16 11:15:20.000000000  

Some background info on this OTN thread here. Unfortunately, the SYSDATE documentation fails to clearly specify this behaviour…

So, what’s the problem?

Even if we’re probably relying on an undocumented feature, it looks like everything is fine, right? SYSDATE - 470 is more or less a constant, so we’re fine putting it in that WHERE clause.

Wrong!

Let’s run a benchmark, comparing 3 times an equivalent query running each query 100 times (and for repeatability, we use SELECT, not DELETE):

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 100;
  v_range CONSTANT NUMBER := 470;
  v_date CONSTANT DATE := SYSDATE - v_range;
BEGIN
  v_ts := SYSTIMESTAMP;
  
  -- Original query with inline expression
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < SYSDATE - v_range
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
  
  -- Pre-calculated PL/SQL local variable
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < v_date
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
  
  -- Magical 11g scalar subquery caching
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < (SELECT SYSDATE - v_range FROM dual)
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 3 : ' || (SYSTIMESTAMP - v_ts));
END;
/

The above benchmark uses 3 techniques that all produce the same result but have different timing characteristics:

  1. is using an inline expression in the predicate
  2. is using a bind variable
  3. is using a scalar subquery

When we check out the estimated execution plans, nothing seems to indicate that any solution might be better than the other:

EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < SYSDATE - :v_range;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < :v_date;

SELECT * FROM TABLE(dbms_xplan.display);

-- CAST is there because of a "bug" in the EXPLAIN PLAN parser
EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < (
  SELECT CAST(SYSDATE - :v_range AS DATE) FROM dual
);

SELECT * FROM TABLE(dbms_xplan.display);

The output is:

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   453   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      
Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"<SYSDATE@!-:V_RANGE)                                      


--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   453   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      
Predicate Information (identified by operation id):                                   

Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"<:V_DATE)                                                 


--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   455   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
|   3 |    FAST DUAL                |                |     1 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      

Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"< (SELECT CAST(SYSDATE@!-:V_RANGE AS DATE) FROM           
              "SYS"."DUAL" "DUAL"))                                                   

The only exception is that in the 3rd plan, we have this FAST DUAL operation which hints at scalar subquery caching kicking in.

Let’s compare actual results, though! (as always, I’m not publishing the real results, just qualitative numbers in a fictive unit of measurement to comply with Oracle legal blah blah)

-- Run 1
Statement 1 : 1.98 wombos
Statement 2 : 1.17 wombos
Statement 3 : 0.80 wombos

-- Run 2
Statement 1 : 1.49 blorfs
Statement 2 : 1.19 blorfs
Statement 3 : 0.78 blorfs

-- Run 3
Statement 1 : 1.46 gnarls
Statement 2 : 1.20 gnarls
Statement 3 : 0.75 gnarls

Wow! As you can see, using a bind variable certainly helps, but when you apply scalar subquery caching, THIS is where you get the most benefits. Now, this query is just a very simple example. The actual customer query was immensely more complex, and trust me – the performance improvement was 10x (I couldn’t believe it myself at first, and I still don’t know exactly why!).

Conclusion (for Oracle 11g)

Modern optimisers “recognise” a lot of expressions to be constant. For instance, in most databases, it doesn’t matter if you’re writing COUNT(1) or COUNT(*), they’re both translated to the same thing.

In this particular case, however, I was quite disappointed by the fact that there is a significant difference between these three perfectly equivalent queries as far as I’m concerned, and the least intuitive solution using scalar subquery caching performed the best on Oracle 11g.

But wait (Oracle 12c)

I’ve tested the same behaviour on my Oracle 12c on Docker instance:

Statement 1 : 1.23 xmfs
Statement 2 : 1.11 xmfs
Statement 3 : 1.14 xmfs

Statement 1 : 1.27 xlorgs
Statement 2 : 1.07 xlorgs
Statement 3 : 1.30 xlorgs

Statement 1 : 1.23 glrls (I can invent units of measurement all day!)
Statement 2 : 1.00 glrls
Statement 3 : 1.39 glrls

… where now there seems to be a “regression” in the scalar subquery solution (it’s now the same as without the scalar subquery) and the bind variable solution now seems to be the fastest.

If anything at all, this just proves that with SQL, you will always have to measure stuff on your end, because your setup and database versions may differ. The bottom line is: If performance of a single statement matters to you, chances are that you can improve things by 2-digit percentages with just some simple tricks, and in a batch job or under heavy load, this definitely matters!

Certainly, you should not use any expressions of the form SYSDATE - some_value in your predicates.

How to Quickly Rename all Primary Keys in Oracle


Are you working with someone else’s schema and they haven’t declared nice names for all their constraints?

Unfortunately, it is all too easy to create a table like this:

CREATE TABLE order1 (
  order_id NUMBER(18) NOT NULL PRIMARY KEY
);

Or like this:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  PRIMARY KEY (order_id)
);

Sure, you get a little convenience when writing the table. But from now on, you’re stuck with weird, system generated names both for the constraint and for the backing index. For instance, when doing execution plan analyses:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|   1 |  NESTED LOOPS      |              |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0042007 |
|*  3 |   INDEX UNIQUE SCAN| SYS_C0042005 |
-------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

So, I got these system generated index names called SYS_C0042007 and SYS_C0042005. What do they mean? I can derive the actual meaning perhaps from the predicate information, as SYS_C0042007 is accessed in operation #2, which uses an access predicate on ORDER2. Fine. But do I really need to look these things up all the time?

Just name your constraints. Always!

Don’t be fooled into this convenience. It’ll hurt you time and again, not just when doing analyses. You might not be able to easily import / export your schema to some other database, because another database might already occupy these generated names.

So, do this instead:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  CONSTRAINT pk_order2 PRIMARY KEY (order_id)
);

Find a naming schema (any naming scheme), like for instance PK_[table name]. If you’re cleaning up an existing database, this might help:

SET SERVEROUTPUT ON
BEGIN
  FOR stmts IN (
    SELECT 
      'ALTER TABLE ' || table_name || 
      ' RENAME CONSTRAINT ' || constraint_name || 
      ' TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE constraint_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
  
  FOR stmts IN (
    SELECT 
      'ALTER INDEX ' || index_name || 
      ' RENAME TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE index_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
END;
/

The above yields (and runs)

ALTER TABLE ORDER1 RENAME CONSTRAINT SYS_C0042005 TO PK_ORDER1
ALTER TABLE ORDER2 RENAME CONSTRAINT SYS_C0042007 TO PK_ORDER2
ALTER INDEX SYS_C0042005 RENAME TO PK_ORDER1
ALTER INDEX SYS_C0042007 RENAME TO PK_ORDER2

You can of course repeat the exercise for unique constraints, etc. I omit the example here because the naming scheme might be a bit more complicated there. Now re-calculate the execution plan and check this out:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  NESTED LOOPS      |           |
|*  2 |   INDEX UNIQUE SCAN| PK_ORDER2 |
|*  3 |   INDEX UNIQUE SCAN| PK_ORDER1 |
----------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

There! That’s much more like it.