Beautiful SQL: Lateral Unnesting of Array Columns


Sometimes, SQL can just be so beautiful. One of the less mainstream features in SQL is the array type (or nested collections). In fact, it’s so far from mainstream that only two major databases actually support it: Oracle and PostgreSQL (plus HSQLDB and H2 in the Java ecosystem).

In PostgreSQL, you can write:

CREATE TABLE blogs (
  id    SERIAL NOT NULL PRIMARY KEY,
  title text   NOT NULL,
  tags  text[]
)

Or in Oracle:

-- Oracle only knows nominal array types, so we have to declare
-- them in advance
CREATE TYPE tag_t AS VARRAY(100) OF VARCHAR2(100 CHAR);

CREATE TABLE blogs (
  id    NUMBER(18) GENERATED BY DEFAULT AS IDENTITY 
                   NOT NULL PRIMARY KEY,
  title VARCHAR2(100 CHAR) NOT NULL,
  tags  tag_t
)

So, roughly the same thing. Now, let’s insert some data. How about the three most recent posts on the jOOQ blog, prior to this one?

In PostgreSQL:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  ARRAY[
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ]
), (
  'How to Execute SQL Batches With JDBC and jOOQ',
  ARRAY[
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ]
), (
  'How to Emulate Partial Indexes in Oracle',
  ARRAY[
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ]
)

Or in Oracle:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  tag_t(
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Execute SQL Batches With JDBC and jOOQ',
  tag_t(
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Emulate Partial Indexes in Oracle',
  tag_t(
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ));

Now, the array type by itself is not very useful. It gets really interesting when we unnest it back into a table. For instance, in PostgreSQL:

SELECT title, tag
FROM blogs, LATERAL unnest(tags) AS tags(tag);

Or in Oracle:

-- Classic style
SELECT title, tags.*
FROM blogs, TABLE(tags) tags;

-- Since Oracle 12c
SELECT title, tags.*
FROM blogs, LATERAL (SELECT * FROM TABLE(tags)) tags;

Note that we’re using the keyword LATERAL in some of the above queries. For those of you who are used to T-SQL syntax, it’s almost the same thing as APPLY. Both LATERAL and APPLY are also very useful with table-valued functions (stay tuned for a blog post on those).

The idea behind LATERAL is that the table (derived table, subquery, function call, array unnesting) on the right side of LATERAL can “laterally” access stuff from the left side of LATERAL in order to produce new tables. In the above query, we’re producing a new table of tags for each blog post, and then we cross join the two tables.
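Purely as a conceptual sketch (hypothetical Python, not how the database actually executes anything), the lateral unnesting above amounts to expanding each row’s array and pairing every element with its own row:

```python
# Hypothetical, minimal data standing in for the blogs table
blogs = [
    ("Post A", ["oracle", "sql"]),
    ("Post B", ["jooq"]),
]

# "LATERAL unnest": for each row, expand its array and
# cross join the elements back to that row's title
rows = [(title, tag) for title, tags in blogs for tag in tags]

for title, tag in rows:
    print(title, tag)
```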

Here’s what the above queries result in:

title                                                         tag
-----------------------------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   implicit cursor
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   batch
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   oracle
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jooq
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jdbc
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   resultset
How to Execute SQL Batches With JDBC and jOOQ                 batch
How to Execute SQL Batches With JDBC and jOOQ                 batch statement
How to Execute SQL Batches With JDBC and jOOQ                 mysql
How to Execute SQL Batches With JDBC and jOOQ                 jooq
How to Execute SQL Batches With JDBC and jOOQ                 jdbc
How to Execute SQL Batches With JDBC and jOOQ                 sql server
How to Execute SQL Batches With JDBC and jOOQ                 sql
How to Emulate Partial Indexes in Oracle                      optimisation
How to Emulate Partial Indexes in Oracle                      index
How to Emulate Partial Indexes in Oracle                      partial index
How to Emulate Partial Indexes in Oracle                      oracle
How to Emulate Partial Indexes in Oracle                      sql
How to Emulate Partial Indexes in Oracle                      postgresql
How to Emulate Partial Indexes in Oracle                      t-sql
How to Emulate Partial Indexes in Oracle                      sql server

You can immediately see the cross join semantics here, as we’re combining each tag (per post) with its post.

Looking for ordinals (i.e. the tag number inside of the array) along with the array? Easy!

Just add the powerful WITH ORDINALITY clause after the UNNEST() call in PostgreSQL:

SELECT title, tag, ordinal
FROM blogs, LATERAL unnest(tags) WITH ORDINALITY AS tags(tag, ordinal);

A bit more complicated to emulate in Oracle:

-- Fancy, with a window function
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROW_NUMBER() OVER (ORDER BY NULL) AS ordinal
  FROM TABLE(tags) tags
) tags;

-- Classic, with ROWNUM
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROWNUM AS ordinal
  FROM TABLE(tags) tags
) tags;

The result now contains the tag “ID”, i.e. the ordinal of the tag inside of the array:

title                                           tag               ordinal
-------------------------------------------------------------------------
How to Fetch ... Cursors with JDBC and jOOQ     implicit cursor   1
How to Fetch ... Cursors with JDBC and jOOQ     batch             2
How to Fetch ... Cursors with JDBC and jOOQ     oracle            3
How to Fetch ... Cursors with JDBC and jOOQ     jooq              4
How to Fetch ... Cursors with JDBC and jOOQ     jdbc              5
How to Fetch ... Cursors with JDBC and jOOQ     resultset         6
How to Execute SQL Batches With JDBC and jOOQ   batch             1
How to Execute SQL Batches With JDBC and jOOQ   batch statement   2
How to Execute SQL Batches With JDBC and jOOQ   mysql             3
How to Execute SQL Batches With JDBC and jOOQ   jooq              4
How to Execute SQL Batches With JDBC and jOOQ   jdbc              5
How to Execute SQL Batches With JDBC and jOOQ   sql server        6
How to Execute SQL Batches With JDBC and jOOQ   sql               7
How to Emulate Partial Indexes in Oracle        optimisation      1
How to Emulate Partial Indexes in Oracle        index             2
How to Emulate Partial Indexes in Oracle        partial index     3
How to Emulate Partial Indexes in Oracle        oracle            4
How to Emulate Partial Indexes in Oracle        sql               5
How to Emulate Partial Indexes in Oracle        postgresql        6
How to Emulate Partial Indexes in Oracle        t-sql             7
How to Emulate Partial Indexes in Oracle        sql server        8
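Conceptually (again a hypothetical Python sketch, not database internals), WITH ORDINALITY simply numbers each array’s elements, starting at 1:

```python
# Hypothetical data standing in for the blogs table
blogs = [
    ("Post A", ["oracle", "sql"]),
    ("Post B", ["jooq"]),
]

# "WITH ORDINALITY": each unnested element carries its
# 1-based position inside its own array
rows = [
    (title, tag, ordinal)
    for title, tags in blogs
    for ordinal, tag in enumerate(tags, start=1)
]
```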

Now, imagine looking for those blog posts that are tagged “jooq”. Easy!

PostgreSQL:

SELECT title
FROM blogs
WHERE 'jooq' = ANY(tags);

Oracle:

SELECT title
FROM blogs
WHERE 'jooq' IN (SELECT * FROM TABLE(tags));

Yielding:

title
-----------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ
How to Execute SQL Batches With JDBC and jOOQ
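The = ANY(...) predicate is plain membership semantics on the array. As a hypothetical Python analogy (made-up data, not an API):

```python
# Hypothetical data standing in for the blogs table
blogs = [
    ("Post A", ["oracle", "sql"]),
    ("Post B", ["jooq", "jdbc"]),
]

# "'jooq' = ANY(tags)": keep rows whose array contains the value
titles = [title for title, tags in blogs if "jooq" in tags]
```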

Conclusion

These are just a few nice things we can do when we denormalise our data into nested collections / arrays, and then use features like UNNEST to bring them back to the table level. Both Oracle and PostgreSQL support a variety of really nice features building on top of arrays, so do check them out!

How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ


Earlier this week, I blogged about how to execute SQL batches with JDBC and jOOQ. That post was useful for the MySQL, SQL Server, and Sybase users among you.

Today, we’ll discuss a slightly more difficult task: how to fetch Oracle 12c implicit cursors – which yield multiple results in much the same way.

What’s an implicit cursor?

Oracle 12c added new procedures to its dynamic SQL API DBMS_SQL. Just run the following block in SQL Developer to see the results:

DECLARE
  c1 sys_refcursor;
  c2 sys_refcursor;
BEGIN
  OPEN c1 FOR SELECT 1 AS a FROM dual;
  dbms_sql.return_result(c1);
  OPEN c2 FOR SELECT 2 AS b FROM dual;
  dbms_sql.return_result(c2);
END;

The anonymous PL/SQL block contains two cursors that are opened and returned to whoever calls this block using DBMS_SQL.RETURN_RESULT. This is kind of magic, as we’re calling a procedure, passing a cursor to it, and somehow, this has a side effect on the client of this program after the program ends.

Not only can you do this in anonymous PL/SQL blocks, you can of course also nest these calls in any procedure. In other words, from Oracle 12c onwards, when you call a procedure, you can no longer know for sure whether it will return more results than its signature suggests. For instance:

BEGIN
  any_procedure();
END;

The above call might just as well yield some implicit cursors. You can’t know for sure.

How to discover implicit cursors with JDBC

With JDBC, if you don’t know for sure what your query will yield as a result, you use the Statement.execute(String) or PreparedStatement.execute() method to find out. As mentioned in the previous post, this is what you would do:

try (PreparedStatement s = c.prepareStatement(sql)) {
    fetchLoop:
    for (int i = 0, updateCount = 0;; i++) {
        boolean result = (i == 0)
            ? s.execute()
            : s.getMoreResults();
 
        if (result)
            try (ResultSet rs = s.getResultSet()) {
                System.out.println("\nResult:");
 
                while (rs.next())
                    System.out.println("  " + rs.getInt(1));
            }
        else if ((updateCount = s.getUpdateCount()) != -1)
            System.out.println("\nUpdate Count: " + updateCount);
        else
            break fetchLoop;
    }
}

Unfortunately, that won’t work on Oracle, as Oracle’s JDBC driver doesn’t implement the JDBC spec correctly. I’ve documented this flaw at length in this Stack Overflow question.

Using ojdbc, the following “improved” loop needs to be written:

/* Alternatively, use this for non-PreparedStatements:
try (Statement s = cn.createStatement()) {
    Boolean result = s.execute(sql); */
try (PreparedStatement s = cn.prepareStatement(sql)) {
    // Use good old three-valued boolean logic
    Boolean result = s.execute();

    fetchLoop:
    for (int i = 0;; i++) {

        // Check for more results if not already done in 
        // this iteration
        if (i > 0 && result == null)
            result = s.getMoreResults();
        System.out.println(result);

        if (result) {
            result = null;

            try (ResultSet rs = s.getResultSet()) {
                System.out.println("Fetching result " + i);
            }
            catch (SQLException e) {
                // Ignore ORA-17283: No resultset available (1)
                if (e.getErrorCode() == 17283)
                    continue fetchLoop;
                else
                    throw e;
            }
        }
        else if (s.getUpdateCount() == -1)
            // Ignore -1 value if there is one more result! (2)
            if (result = s.getMoreResults())
                continue fetchLoop;
            else
                break fetchLoop;
    }
}

Two elements of the above logic need more explanation:

  1. There’s a possibility of an ORA-17283: No resultset available error being raised when calling Statement.getResultSet(), despite the previous call to Statement.execute() yielding true. If that happens, we’ll just ignore the error and try fetching another result set.
  2. In case we’re using a PreparedStatement, the original call to PreparedStatement.execute() will yield false (!) and the Statement.getUpdateCount() value is -1, which would normally mean that we should stop. Not in this case. Let’s just try one more time to get a result set, and tah-dah, here are our implicit result sets.

Note that the algorithm now works with both static Statement and PreparedStatement, which (very unfortunately) behave differently when calling execute().

The above will now work with any SQL statement. In case you’re using the previous SQL statement returning implicit cursors:

String sql =
    "\nDECLARE"
  + "\n  c1 sys_refcursor;"
  + "\n  c2 sys_refcursor;"
  + "\nBEGIN"
  + "\n  OPEN c1 FOR SELECT 1 AS a FROM dual;"
  + "\n  dbms_sql.return_result(c1);"
  + "\n  OPEN c2 FOR SELECT 2 AS a FROM dual;"
  + "\n  dbms_sql.return_result(c2);"
  + "\nEND;";

… you will now be able to fetch all the results:

true
true
Fetching result 1
true
Fetching result 2
false

How to get those cursors with jOOQ?

With jOOQ 3.10 (as always), you don’t need to worry about those low level JDBC details. Just call the following code:

System.out.println(
    DSL.using(cn).fetchMany(sql)
);

And you’ll get a convenient, object oriented representation of your multiple result sets in the form of an org.jooq.Results:

Result set:
+----+
|   A|
+----+
|   1|
+----+
Result set:
+----+
|   A|
+----+
|   2|
+----+

Even better, if a stored procedure returns multiple implicit cursors like this and you’re using the code generator, just call the generated stored procedure object as follows to get all the cursors automatically:

MyProcedure p = new MyProcedure();
p.setParameter1(x);
p.setParameter2(y);
p.execute(configuration);
Results results = p.getResults();

for (Result<?> result : results)
  for (Record record : result)
    System.out.println(record);

Done!

How to Emulate Partial Indexes in Oracle


A very interesting feature of the SQL Server and PostgreSQL databases (and some others, including SQLite) is the partial index (sometimes also called “filtered index”). That’s an index that contains only “parts” of the table data. For instance, we can write the following index in SQL Server and PostgreSQL:

CREATE INDEX i ON message (deleted) WHERE deleted = 1;

Let’s imagine you have a housekeeping job that periodically removes deleted messages. Now, let’s assume you have discovered that only 0.1% of all messages are actually deleted, so an index on the DELETED column is very selective if you’re looking for deleted messages.

On the other hand, it’s not selective at all if you’re looking for non-deleted messages, as such a query would return 99.9% of all messages, in which case a full table scan is more efficient.

So, since the index is never useful for non-deleted messages, why index those messages at all? If we can avoid indexing non-deleted messages, then we can:

  • Save a lot of disk space, as the index will be much smaller
  • Save a lot of time inserting into the messages table, since we don’t have to update the index all the time
  • Save a lot of time scanning the index, since it will contain a lot fewer blocks
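The effect of the three points above can be seen in a toy model (hypothetical Python, nothing Oracle-specific): only rows matching the predicate ever enter the index structure, so the index stays tiny no matter how big the table grows:

```python
# Toy model: a table of 100,000 messages, of which exactly one is deleted
table = [{"id": i, "deleted": 1 if i == 42 else 0} for i in range(100_000)]

# A "partial index" on deleted = 1: only matching rows are indexed at all
partial_index = {row["id"]: row for row in table if row["deleted"] == 1}

print(len(table), len(partial_index))
```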

Unfortunately, Oracle doesn’t support this feature

… but “luckily”, Oracle has another controversial “feature”. In Oracle, all indexes are partial indexes, because they don’t contain NULL values. This is probably due to an ancient optimisation (remember, partial indexes are smaller), which occasionally gets in your way, performance-wise, when you do want to query for NULL values.

But in this case, it’s useful. Check this out:

CREATE TABLE message(deleted number(1));

CREATE INDEX i ON message (
  CASE WHEN deleted > 0 THEN deleted END
);

The above index is a function-based index, i.e. an index that contains not the value of the deleted column itself, but an expression based on it. Concretely, it contains only deleted values that are strictly greater than zero, because if the value is zero, then it is turned to NULL by the CASE expression, and Oracle doesn’t index NULL values. Check this out:

INSERT INTO message
SELECT DECODE(LEVEL, 1, 1, 0)
FROM dual
CONNECT BY LEVEL < 100000;

The above query is inserting a single row containing a deleted value of 1, and almost 100k rows containing a value of 0. The insert is very quick, because only one row has to be added to the index. The other almost 100k rows are skipped:

EXEC dbms_stats.gather_table_stats('SANDBOX', 'MESSAGE');

SELECT NUM_ROWS, BLOCKS
FROM SYS.ALL_TAB_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

SELECT NUM_ROWS, LEAF_BLOCKS
FROM SYS.ALL_IND_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

The result is:

NUM_ROWS       BLOCKS
---------------------
   99999          152 <-- table size

NUM_ROWS  LEAF_BLOCKS
---------------------
       1            1 <-- index size

The “trouble” with this kind of emulation is: It’s a function-based index. We can use this index only if we really reproduce the same “function” (or in this case, expression) as in the index itself. So, in order to fetch all the deleted messages, we must not write the following query:

SELECT *
FROM message
WHERE deleted = 1;

But this one, instead:

SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

Check out execution plans:

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE deleted = 1;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

SELECT * FROM TABLE(dbms_xplan.display);

The output being:

------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         | 50000 |   146K|    44   (3)|
|*  1 |  TABLE ACCESS FULL| MESSAGE | 50000 |   146K|    44   (3)|
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("DELETED"=1)

And

----------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     3 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| MESSAGE |     1 |     3 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | I       |     1 |       |     1   (0)|
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(CASE  WHEN "DELETED">0 THEN "DELETED" END =1)

As you can see, the first query runs a full table scan, estimating to retrieve 50% of all the rows, when the actual result is only 1 row as can be seen in the second execution plan!

Insertion speed

What’s even more impressive is the difference in insertion speed. Consider the following code block, which measures the time it takes to insert one million rows with a value of 0 and one million rows with a value of 1:

SET SERVEROUTPUT ON
DECLARE
  ts TIMESTAMP;
BEGIN
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 0 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
  
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 1 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
END;
/

The result being:

+000000000 00:00:01.501000000
+000000000 00:00:08.162000000

The insertion is much faster if we don’t have to modify the index!

Conclusion

Partial indexes are a very neat trick in cases where your data is highly skewed and some values in a column are extremely rare and very frequently queried. They may drastically reduce the index size, which greatly improves performance in some situations, including inserting into the table, and querying the index.

In Oracle, they can be emulated using function-based indexes, which means you have to use the exact function expression from the index also in queries, in order to profit. But it may well be worth the trouble!

Use jOOQ to Read / Write Oracle PL/SQL RECORD Types


One of the biggest limitations when working with Oracle PL/SQL from Java is the lack of support for a variety of PL/SQL features through the JDBC interface. This lack of support actually isn’t limited to JDBC; it extends to Oracle SQL itself. For instance, if you’re using the useful PL/SQL BOOLEAN type as such:

CREATE OR REPLACE FUNCTION yes RETURN boolean AS
BEGIN
  RETURN true;
END yes;
/

It would now be terrific if you could do this:

SELECT yes FROM dual;

But it’s not possible. You’ll be getting an error along the lines of the following:

ORA-06552: PL/SQL: Statement ignored
ORA-06553: PLS-382: expression is of wrong type
06552. 00000 -  "PL/SQL: %s"
*Cause:    
*Action:

It’s crazy to think that the Oracle SQL language still doesn’t support the SQL standard boolean type, which is so useful as I’ve shown in previous blog posts. Here’s where you can upvote the feature request:

https://community.oracle.com/ideas/2633

BOOLEAN isn’t the only “inaccessible” SQL feature

Now, there are a couple of other data types, which cannot be “bridged” to the SQL engine, and thus (for some reason only the OJDBC driver gods can fathom) cannot be “bridged” to a JDBC client. Among them: The very useful PL/SQL RECORD type.

Very often, you want to do this:

CREATE PACKAGE customers AS
  TYPE person IS RECORD (
    first_name VARCHAR2(50),
    last_name VARCHAR2(50)
  );
  
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person;
END customers;
/

CREATE PACKAGE BODY customers AS
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person IS
    v_person customers.person;
  BEGIN
    SELECT c.first_name, c.last_name
    INTO v_person
    FROM customer c
    WHERE c.customer_id = p_customer_id;
    
    RETURN v_person;
  END get_customer;
END customers;
/

(we’re running this on the SAKILA database).

As in any language with the least bit of sophistication, we can define “structs” or records in PL/SQL, which we can now frequently reuse. Everyone knows what a PERSON is and we can pass them around between procedures and functions.
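For comparison, here’s what the same record looks like as a struct in another language (a hypothetical Python dataclass, purely to illustrate the “struct” claim, nothing jOOQ-specific):

```python
from dataclasses import dataclass

# Analogy only: the PL/SQL RECORD customers.person as a plain struct
@dataclass
class Person:
    first_name: str
    last_name: str

# Records can be passed between functions just like in PL/SQL
def full_name(p: Person) -> str:
    return f"{p.first_name} {p.last_name}"

print(full_name(Person("MARY", "SMITH")))
```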

For instance, in PL/SQL client code:

SET SERVEROUTPUT ON
DECLARE
  v_person customers.person;
BEGIN
  v_person := customers.get_customer(1);
  
  dbms_output.put_line(v_person.first_name);
  dbms_output.put_line(v_person.last_name);
END;
/

… which yields:

MARY
SMITH

What about JDBC client code?

After having added support for PL/SQL BOOLEAN types, jOOQ 3.9 now finally supports PL/SQL record types in stored procedures as well, at least in standalone calls that are not embedded in SQL statements. The jOOQ code generator will pick up all of these package-level PL/SQL record types and their structures, and generate the boring boilerplate code for you. E.g. (simplified):

package org.jooq.demo.sakila.packages.customers.records;

public class PersonRecord extends UDTRecordImpl<PersonRecord> {
    public void   setFirstName(String value) { ... }
    public String getFirstName()             { ... }
    public void   setLastName(String value)  { ... }
    public String getLastName()              { ... }

    public PersonRecord() { ... }
    public PersonRecord(String firstName, String lastName) { ... }
}

Notice how jOOQ doesn’t really make any difference in its API between the generated code for an Oracle SQL OBJECT type or an Oracle PL/SQL RECORD type. They’re essentially the same thing (from a jOOQ API perspective).

More interesting, what happened to the generated package and the function? This code is generated (simplified):

public class Customers extends PackageImpl {
    public static PersonRecord getCustomer(
        Configuration configuration, Long pCustomerId
    ) { ... }
}

That’s all! So all we now need to do is pass the ubiquitous jOOQ Configuration (which contains information such as SQLDialect or JDBC Connection) and the actual stored function parameter, the P_CUSTOMER_ID value, and we’re done!

This is how jOOQ client code might look:

PersonRecord person = Customers.getCustomer(configuration(), 1L);
System.out.println(person);

As you can see, this is just the same thing as the corresponding PL/SQL code. And the output of this println call is this:

SAKILA.CUSTOMERS.PERSON('MARY', 'SMITH')

A fully qualified RECORD declaration with schema, package, and type name.

How does it work?

Let’s turn on jOOQ’s built-in TRACE logging to see what jOOQ did behind the scenes:

Calling routine          : 
  DECLARE
    r1 "CUSTOMERS"."PERSON";
  BEGIN
    r1 := "CUSTOMERS"."GET_CUSTOMER"("P_CUSTOMER_ID" => ?);
    ? := r1."FIRST_NAME";
    ? := r1."LAST_NAME";
  END;
Binding variable 1       : 1 (class java.lang.Long)
Registering variable 2   : class java.lang.String
Registering variable 3   : class java.lang.String
Fetched OUT parameters   : +-----------------+
                         : |RETURN_VALUE     |
                         : +-----------------+
                         : |('MARY', 'SMITH')|
                         : +-----------------+

So, jOOQ usually doesn’t use JDBC’s very limited escape syntax to call stored procedures; it just produces an anonymous PL/SQL block with a local variable of type CUSTOMERS.PERSON, i.e. of our RECORD type. The function call’s result is assigned to this local variable, and the local variable is then destructured into its individual parts.

In the TRACE log, you can see the individual bind variables, i.e. there’s an IN variable at index 1 of type Long, and two OUT variables of type String at indexes 2 and 3.

How does jOOQ know the record types?

At runtime, all the information is hard-wired to the generated code. So, the magic is inside of the code generator. Warning: Some serious SQL ahead!

This beauty here queries the dictionary views for PL/SQL record types:

AllArguments a = ALL_ARGUMENTS.as("a");
AllArguments x = ALL_ARGUMENTS.as("x");
Field<BigDecimal> nextSibling = field(name("next_sibling"), x.SEQUENCE.getDataType());

DSL.using(configuration)
   .select(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME)
   .select(
       a.ARGUMENT_NAME                                               .as(ALL_TYPE_ATTRS.ATTR_NAME),
       a.SEQUENCE                                                    .as(ALL_TYPE_ATTRS.ATTR_NO),
       a.TYPE_OWNER                                                  .as(ALL_TYPE_ATTRS.ATTR_TYPE_OWNER),
       nvl2(a.TYPE_SUBNAME, a.TYPE_NAME, inline(null, a.TYPE_NAME))  .as("package_name"),
       coalesce(a.TYPE_SUBNAME, a.TYPE_NAME, a.DATA_TYPE)            .as(ALL_TYPE_ATTRS.ATTR_TYPE_NAME),
       a.DATA_LENGTH                                                 .as(ALL_TYPE_ATTRS.LENGTH),
       a.DATA_PRECISION                                              .as(ALL_TYPE_ATTRS.PRECISION),
       a.DATA_SCALE                                                  .as(ALL_TYPE_ATTRS.SCALE))
   .from(a)
   .join(table(
       select(
           a.TYPE_OWNER,
           a.TYPE_NAME,
           a.TYPE_SUBNAME,
           min(a.OWNER        ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.OWNER),
           min(a.PACKAGE_NAME ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.PACKAGE_NAME),
           min(a.SUBPROGRAM_ID).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SUBPROGRAM_ID),
           min(a.SEQUENCE     ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SEQUENCE),
           min(nextSibling    ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(nextSibling),
           min(a.DATA_LEVEL   ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.DATA_LEVEL))
      .from(table(
          select(
              lead(a.SEQUENCE, 1, a.SEQUENCE).over(
                  partitionBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.DATA_LEVEL)
                 .orderBy(a.SEQUENCE)
              ).as("next_sibling"),
              a.TYPE_OWNER,
              a.TYPE_NAME,
              a.TYPE_SUBNAME,
              a.OWNER,
              a.PACKAGE_NAME,
              a.SUBPROGRAM_ID,
              a.SEQUENCE,
              a.DATA_LEVEL,
              a.DATA_TYPE)
         .from(a)
         .where(a.OWNER.in(getInputSchemata()))
      ).as("a"))
      .where(a.TYPE_OWNER.in(getInputSchemata()))
      .and(a.OWNER.in(getInputSchemata()))
      .and(a.DATA_TYPE.eq("PL/SQL RECORD"))
      .groupBy(a.TYPE_OWNER, a.TYPE_NAME, a.TYPE_SUBNAME)
   ).as("x"))
   .on(row(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID).eq(x.OWNER, x.PACKAGE_NAME, x.SUBPROGRAM_ID))
   .and(a.SEQUENCE.between(x.SEQUENCE).and(nextSibling))
   .and(a.DATA_LEVEL.eq(x.DATA_LEVEL.plus(one())))
   .orderBy(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME, a.SEQUENCE)
   .fetch();

This is a nice little jOOQ query, which corresponds to the following, equally impressive SQL query that you can run directly in SQL Developer or any other SQL client for Oracle:

SELECT 
  "x"."TYPE_OWNER",
  "x"."TYPE_NAME",
  "x"."TYPE_SUBNAME",
  "a"."ARGUMENT_NAME" "ATTR_NAME",
  "a"."SEQUENCE" "ATTR_NO",
  "a"."TYPE_OWNER" "ATTR_TYPE_OWNER",
  nvl2("a"."TYPE_SUBNAME", "a"."TYPE_NAME", NULL) "package_name",
  COALESCE("a"."TYPE_SUBNAME", "a"."TYPE_NAME", "a"."DATA_TYPE") "ATTR_TYPE_NAME",
  "a"."DATA_LENGTH" "LENGTH",
  "a"."DATA_PRECISION" "PRECISION",
  "a"."DATA_SCALE" "SCALE"
FROM "SYS"."ALL_ARGUMENTS" "a"
JOIN (
  SELECT 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME",
    MIN("a"."OWNER") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "OWNER",
    MIN("a"."PACKAGE_NAME") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "PACKAGE_NAME",
    MIN("a"."SUBPROGRAM_ID") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SUBPROGRAM_ID",
    MIN("a"."SEQUENCE") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SEQUENCE",
    MIN("next_sibling") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "next_sibling",
    MIN("a"."DATA_LEVEL") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "DATA_LEVEL"
  FROM (
    SELECT 
      lead("a"."SEQUENCE", 1, "a"."SEQUENCE") OVER (
        PARTITION BY "a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID", "a"."DATA_LEVEL"
        ORDER BY "a"."SEQUENCE" ASC
      ) "next_sibling",
      "a"."TYPE_OWNER",
      "a"."TYPE_NAME",
      "a"."TYPE_SUBNAME",
      "a"."OWNER",
      "a"."PACKAGE_NAME",
      "a"."SUBPROGRAM_ID",
      "a"."SEQUENCE",
      "a"."DATA_LEVEL",
      "a"."DATA_TYPE"
    FROM "SYS"."ALL_ARGUMENTS" "a"
    WHERE "a"."OWNER" IN ('SAKILA', 'SYS')     -- Possibly replace schema here
    ) "a"
  WHERE ("a"."TYPE_OWNER" IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."OWNER"         IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."DATA_TYPE"      = 'PL/SQL RECORD')
  GROUP BY 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME"
  ) "x"
ON (("a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID")
 = (("x"."OWNER", "x"."PACKAGE_NAME", "x"."SUBPROGRAM_ID"))
AND "a"."SEQUENCE" BETWEEN "x"."SEQUENCE" AND "next_sibling"
AND "a"."DATA_LEVEL" = ("x"."DATA_LEVEL" + 1))
ORDER BY 
  "x"."TYPE_OWNER" ASC,
  "x"."TYPE_NAME" ASC,
  "x"."TYPE_SUBNAME" ASC,
  "a"."SEQUENCE" ASC

Whew.

The output shows that we got all the required information for our RECORD type:

[  TYPE INFO COLUMNS  ]   ATTR_NAME   ATTR_TYPE_NAME  LENGTH
SAKILA CUSTOMERS PERSON   FIRST_NAME  VARCHAR2        50
SAKILA CUSTOMERS PERSON   LAST_NAME   VARCHAR2        50

All of this also works for:

  • Nested types
  • Multiple IN and OUT parameters

I’ll blog about a more advanced use-case in the near future, so stay tuned.

Why You Should (Sometimes) Avoid Expressions in SQL Predicates


I’ve recently discovered a rather significant performance issue on a production Oracle 11g customer database. And I’m sure you have this issue too, which is why I’m documenting it here.

This is a simplified representation of the setup at the customer site:

        ID PAYMENT_DATE TEXT                               
---------- ------------ -----------------------------------
     33803 21.05.16     DcTNBOrkQIgMtbietUWOsSFNMIqGLlDw...
     29505 09.03.16     VIuPaOAQqzCMlFBYPQtvqUSbWYPDndJD...
     10738 25.07.16     TUkxGpZPrGKaHzDRxczrktkFWvGmiwjR...

Let’s produce the above table with the following statement:

CREATE TABLE payment (
  id            NOT NULL PRIMARY KEY,
  payment_date  NOT NULL,
  text
) AS
SELECT
  level,
  SYSDATE - dbms_random.value(1, 500),
  dbms_random.string('a', 500)
FROM dual
CONNECT BY level <= 50000
ORDER BY dbms_random.value;

CREATE INDEX i_payment_date ON payment(payment_date);

EXEC dbms_stats.gather_table_stats('TEST', 'PAYMENT');

What we’re doing here: There’s a payment table with an ID primary key column, an indexed payment_date column and some “text” info. The real table is, of course, much bigger than this.

Now, this table needs periodic housekeeping. In nightly batch jobs, the customer ran through the table and removed all payments older than some fixed number of days, e.g.:

DELETE
FROM payment
WHERE payment_date < SYSDATE - 470 -- Older than 470 days

This might look fine at first, because:

  1. We have an index on payment_date, and it can be used for the deletion
  2. SYSDATE is constant per query, so the expression SYSDATE - 470 should also be constant per query

We can prove both statements easily:

1. Index usage

Let’s run:

EXPLAIN PLAN FOR
DELETE
FROM payment
WHERE payment_date < SYSDATE - 470;

SELECT * FROM TABLE(dbms_xplan.display);

And we’re getting:

----------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                |  3008 |    10   (0)| 00:00:01 |
|   1 |  DELETE           | PAYMENT        |       |            |          |
|*  2 |   INDEX RANGE SCAN| I_PAYMENT_DATE |  3008 |    10   (0)| 00:00:01 |
----------------------------------------------------------------------------
                                                                            
Predicate Information (identified by operation id):                         
---------------------------------------------------
                                                                            
   2 - access("PAYMENT_DATE"<SYSDATE@!-470)                                 

So, we get a relatively cheap access predicate on the I_PAYMENT_DATE index, and there’s this weird SYSDATE@!-470 constant in the plan.

2. SYSDATE being constant

There’s a lot of confusion about what the actual value of these non-deterministic, non-pure date/time functions is in each database. Sometimes, the function is evaluated on each row individually and produces a new value each time. In my opinion, that’s the worst, as such functions are completely unpredictable.

Sometimes, there’s a guarantee that these timestamps stay the same for the entire duration of a transaction. That’s a bit weird as a default, but OK, why not.

In Oracle, it seems that SYSDATE (and SYSTIMESTAMP) are evaluated only a single time on a per-statement level. This would explain that weird SYSDATE@! notation in the execution plan, and it can be “proven” by doing something like this:

CREATE OR REPLACE FUNCTION sleep(i NUMBER) RETURN NUMBER 
AS
BEGIN
  dbms_lock.sleep(i);
  RETURN i;
END;
/

SELECT 
  min(CAST(sysdate AS TIMESTAMP)), 
  max(CAST(sysdate AS TIMESTAMP))
FROM dual
CONNECT BY level <= 5 AND sleep(1) > 0;

DROP FUNCTION sleep;

The result of this statement (which unsurprisingly takes around 4 seconds to execute) is:

MIN(CAST(SYSDATEASTIMESTAMP)) MAX(CAST(SYSDATEASTIMESTAMP))
----------------------------- -----------------------------
01.11.16 11:15:20.000000000   01.11.16 11:15:20.000000000  

Some background info can be found in this OTN thread. Unfortunately, the SYSDATE documentation fails to clearly specify this behaviour…
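As a rough client-side analogy (plain Python, not Oracle internals), per-statement evaluation behaves like reading the clock once and reusing that value for every row, whereas per-row evaluation reads it anew each time:

```python
import time

def per_row(rows):
    # Hypothetical "evaluated per row": a fresh clock read for each row
    return [time.monotonic_ns() for _ in rows]

def per_statement(rows):
    # Oracle-style SYSDATE: one clock read, reused for the whole statement
    now = time.monotonic_ns()
    return [now for _ in rows]

# Every row sees the identical timestamp in the per-statement variant,
# which is exactly what the MIN/MAX query above demonstrated
values = per_statement(range(1000))
```

This is just the observable contract, under the assumption (supported by the experiment above) that SYSDATE is fixed per statement.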

So, what’s the problem?

Even if we’re probably relying on an undocumented feature, it looks like everything is fine, right? SYSDATE - 470 is more or less a constant, so we’re fine putting it in that WHERE clause.

Wrong!

Let’s run a benchmark comparing three equivalent queries, executing each one 100 times (and for repeatability, we use SELECT rather than DELETE):

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 100;
  v_range CONSTANT NUMBER := 470;
  v_date CONSTANT DATE := SYSDATE - v_range;
BEGIN
  v_ts := SYSTIMESTAMP;
  
  -- Original query with inline expression
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < SYSDATE - v_range
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
  
  -- Pre-calculated PL/SQL local variable
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < v_date
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
  
  -- Magical 11g scalar subquery caching
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT *
      FROM payment
      WHERE payment_date < (SELECT SYSDATE - v_range FROM dual)
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 3 : ' || (SYSTIMESTAMP - v_ts));
END;
/

The above benchmark uses 3 techniques that all produce the same result but have different timing characteristics:

  1. Statement 1 uses an inline expression in the predicate
  2. Statement 2 uses a pre-calculated bind variable (the PL/SQL constant)
  3. Statement 3 uses a scalar subquery

When we check out the estimated execution plans, nothing seems to indicate that any solution might be better than the other:

EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < SYSDATE - :v_range;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < :v_date;

SELECT * FROM TABLE(dbms_xplan.display);

-- CAST is there because of a "bug" in the EXPLAIN PLAN parser
EXPLAIN PLAN FOR
SELECT *
FROM payment
WHERE payment_date < (
  SELECT CAST(SYSDATE - :v_range AS DATE) FROM dual
);

SELECT * FROM TABLE(dbms_xplan.display);

The output is:

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   453   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      
Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"<SYSDATE@!-:V_RANGE)                                      


--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   453   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      
Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"<:V_DATE)                                                 


--------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |  2500 |   455   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PAYMENT        |  2500 |   453   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | I_PAYMENT_DATE |   450 |     3   (0)| 00:00:01 |
|   3 |    FAST DUAL                |                |     1 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
                                                                                      

Predicate Information (identified by operation id):                                   
---------------------------------------------------                                   
                                                                                      
   2 - access("PAYMENT_DATE"< (SELECT CAST(SYSDATE@!-:V_RANGE AS DATE) FROM           
              "SYS"."DUAL" "DUAL"))                                                   

The only exception is that in the 3rd plan, we have this FAST DUAL operation which hints at scalar subquery caching kicking in.
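Scalar subquery caching can be thought of as memoisation: within one statement execution, Oracle caches the subquery’s result per distinct input and reuses it instead of re-evaluating it for every row. A rough Python analogy (the compute_cutoff function is hypothetical, standing in for the scalar subquery):

```python
from functools import lru_cache

evaluations = 0

@lru_cache(maxsize=None)
def compute_cutoff(v_range):
    # Stand-in for evaluating (SELECT SYSDATE - :v_range FROM dual)
    global evaluations
    evaluations += 1
    return f"cutoff-{v_range}"

# 1000 "rows" probe the predicate, but the cached result is reused
# after the first evaluation, much like Oracle's scalar subquery cache
results = [compute_cutoff(470) for _ in range(1000)]
```

This is an analogy only; Oracle’s cache is a small hash table scoped to the statement execution, not an unbounded memo.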

Let’s compare actual results, though! (as always, I’m not publishing the real results, just qualitative numbers in a fictitious unit of measurement to comply with Oracle legal blah blah)

-- Run 1
Statement 1 : 1.98 wombos
Statement 2 : 1.17 wombos
Statement 3 : 0.80 wombos

-- Run 2
Statement 1 : 1.49 blorfs
Statement 2 : 1.19 blorfs
Statement 3 : 0.78 blorfs

-- Run 3
Statement 1 : 1.46 gnarls
Statement 2 : 1.20 gnarls
Statement 3 : 0.75 gnarls

Wow! As you can see, using a bind variable certainly helps, but when you apply scalar subquery caching, THIS is where you get the most benefits. Now, this query is just a very simple example. The actual customer query was immensely more complex, and trust me – the performance improvement was 10x (I couldn’t believe it myself at first, and I still don’t know exactly why!).

Conclusion (for Oracle 11g)

Modern optimisers “recognise” a lot of expressions to be constant. For instance, in most databases, it doesn’t matter if you’re writing COUNT(1) or COUNT(*), they’re both translated to the same thing.

In this particular case, however, I was quite disappointed to see a significant difference between three queries that are, as far as I’m concerned, perfectly equivalent. The least intuitive solution, scalar subquery caching, performed best on Oracle 11g.

But wait (Oracle 12c)

I’ve tested the same behaviour on my Oracle 12c on Docker instance:

Statement 1 : 1.23 xmfs
Statement 2 : 1.11 xmfs
Statement 3 : 1.14 xmfs

Statement 1 : 1.27 xlorgs
Statement 2 : 1.07 xlorgs
Statement 3 : 1.30 xlorgs

Statement 1 : 1.23 glrls (I can invent units of measurement all day!)
Statement 2 : 1.00 glrls
Statement 3 : 1.39 glrls

… where now there seems to be a “regression” in the scalar subquery solution (it’s now the same as without the scalar subquery) and the bind variable solution now seems to be the fastest.

If anything at all, this just proves that with SQL, you always have to measure things on your end, because setups and database versions differ. The bottom line is: if the performance of a single statement matters to you, chances are that you can improve things by double-digit percentages with some simple tricks, and in a batch job or under heavy load, that definitely matters!

At the very least on Oracle 11g, you should avoid expressions of the form SYSDATE - some_value in your predicates.
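One safe pattern, sketched here with a hypothetical DB-API-style cursor (the table and column names follow the example above), is to compute the cutoff once on the client and pass it as a bind variable, rather than embedding the SYSDATE arithmetic in the statement text:

```python
from datetime import datetime, timedelta

def delete_old_payments(cursor, days=470):
    # Compute the cutoff date once; the statement text carries only a bind
    cutoff = datetime.now() - timedelta(days=days)
    sql = "DELETE FROM payment WHERE payment_date < :cutoff"
    cursor.execute(sql, {"cutoff": cutoff})
    return sql, cutoff
```

This mirrors Statement 2 from the benchmark: the date arithmetic happens exactly once per batch run, and the database only ever sees a plain bind value.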

How to Quickly Rename all Primary Keys in Oracle


Are you working with someone else’s schema and they haven’t declared nice names for all their constraints?

Unfortunately, it is all too easy to create a table like this:

CREATE TABLE order1 (
  order_id NUMBER(18) NOT NULL PRIMARY KEY
);

Or like this:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  PRIMARY KEY (order_id)
);

Sure, you get a little convenience when creating the table. But from now on, you’re stuck with weird, system-generated names, both for the constraint and for the backing index. For instance, when doing execution plan analyses:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|   1 |  NESTED LOOPS      |              |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0042007 |
|*  3 |   INDEX UNIQUE SCAN| SYS_C0042005 |
-------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

So, I got these system generated index names called SYS_C0042007 and SYS_C0042005. What do they mean? I can derive the actual meaning perhaps from the predicate information, as SYS_C0042007 is accessed in operation #2, which uses an access predicate on ORDER2. Fine. But do I really need to look these things up all the time?

Just name your constraints. Always!

Don’t be lured by this convenience. It’ll hurt you time and again, not just when doing analyses. For instance, you might not be able to easily import / export your schema to some other database, because that database might already have occupied these generated names.

So, do this instead:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  CONSTRAINT pk_order2 PRIMARY KEY (order_id)
);

Find a naming scheme (any naming scheme), for instance PK_[table name]. If you’re cleaning up an existing database, this might help:

SET SERVEROUTPUT ON
BEGIN
  FOR stmts IN (
    SELECT 
      'ALTER TABLE ' || table_name || 
      ' RENAME CONSTRAINT ' || constraint_name || 
      ' TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE constraint_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
  
  FOR stmts IN (
    SELECT 
      'ALTER INDEX ' || index_name || 
      ' RENAME TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE index_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
END;
/

The above yields (and runs):

ALTER TABLE ORDER1 RENAME CONSTRAINT SYS_C0042005 TO PK_ORDER1
ALTER TABLE ORDER2 RENAME CONSTRAINT SYS_C0042007 TO PK_ORDER2
ALTER INDEX SYS_C0042005 RENAME TO PK_ORDER1
ALTER INDEX SYS_C0042007 RENAME TO PK_ORDER2

You can of course repeat the exercise for unique constraints, etc. I omit the example here because the naming scheme might be a bit more complicated there. Now re-calculate the execution plan and check this out:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  NESTED LOOPS      |           |
|*  2 |   INDEX UNIQUE SCAN| PK_ORDER2 |
|*  3 |   INDEX UNIQUE SCAN| PK_ORDER1 |
----------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

There! That’s much more like it.

Be Careful When Emulating Parameterised Views with SYS_CONTEXT in Oracle


Everyone who writes complex SQL (or, dare I say, moves business logic into the database?) wishes for parameterised views from time to time. This hypothetical syntax would be terrific:

CREATE VIEW v_categories_per_actor(
  p_actor_id NUMBER
) AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = p_actor_id

Or as user-defined functions if you insist (as available in SQL Server):

CREATE FUNCTION v_categories_per_actor(
  p_actor_id NUMBER
) RETURNS TABLE AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = p_actor_id

As always on this blog, I’m using the useful Sakila database for examples.

The above parameterised view could now be used very nicely with awesome SQL features like APPLY / LATERAL:

-- Oracle 12c / SQL Server syntax:
SELECT a.first_name, a.last_name, c.name
FROM actor a
OUTER APPLY v_categories_per_actor(a.actor_id)

-- Oracle 12c / PostgreSQL / SQL standard syntax:
SELECT a.first_name, a.last_name, c.name
FROM actor a
LEFT JOIN LATERAL v_categories_per_actor(a.actor_id) ON 1 = 1

More info about LATERAL here. The good news about the above is: SQL Server supports this through inline table valued functions, where the database optimiser can actually go inside the function / view and optimise it away, potentially implementing it with ordinary JOIN algorithms!

Unfortunately, this isn’t true for Oracle (yet). Even in Oracle 12c (where APPLY / LATERAL is now supported), table valued functions are opaque for the optimiser, which cannot “peek inside” to apply nice SQL transformation operations to optimise such statements.

People emulate parameterised views with SYS_CONTEXT

A lot of people use Oracle’s SYS_CONTEXT feature to emulate parameterised views:

A SYS_CONTEXT is essentially a global variable (per session), which you can set e.g. when getting a connection out of your connection pool in your Java client application. From then on, all SQL statements that use SYS_CONTEXT are globally “parameterised”. For example, our previous view becomes this:
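Client-side, the context is typically initialised right after a connection is borrowed from the pool. A minimal sketch with a DB-API-style driver (the pool and connection objects here are hypothetical; a cx_Oracle-style callproc method is assumed):

```python
def checkout(pool, actor_id):
    # Borrow a connection and initialise the application context via the
    # set_ctx procedure before handing the connection to application code
    conn = pool.acquire()
    cursor = conn.cursor()
    try:
        cursor.callproc("set_ctx", [actor_id])
    finally:
        cursor.close()
    return conn
```

From that point on, every statement executed on that connection sees the same SYS_CONTEXT values, until they are overwritten or the session ends.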

CREATE VIEW v_categories_per_actor AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))

Now, of course, this view cannot be used (yet), because it will always return an empty result:

SELECT * 
FROM v_categories_per_actor
ORDER BY name;

Which yields:

NAME 
----

Now, let’s do that SYS_CONTEXT initialisation magic:

-- Just some boilerplate you have to do once
CREATE CONTEXT my_app USING set_ctx;

CREATE OR REPLACE PROCEDURE set_ctx(
  p_actor_id NUMBER := NULL
) IS
BEGIN
  dbms_session.set_context('MY_APP', 'ACTOR_ID', p_actor_id);
END;
/

-- This will now set the ACTOR_ID for our session
EXEC set_ctx(1);

If we re-run our previous statement that queries the view now:

SELECT * 
FROM v_categories_per_actor
ORDER BY name;

We’ll get an actual result!

NAME
-----------
Animation
Children
Classics
Comedy
Documentary
Family
Foreign
Games
Horror
Music
New
Sci-Fi
Sports

Now, of course this is nowhere near as powerful as actual parameterised views, because I cannot laterally join this thing to an actual actor table, setting that context on each row. But it already helps if you want your Java client application to be able to set a parameter for such a view. Let’s call it:

Oracle’s poor man’s parameterised views

How does SYS_CONTEXT impact performance?

A client of mine makes heavy use of SYS_CONTEXT for security purposes, in order to implement something similar to row-level security. I’ll show an example later on.

You’ll also find a simpler “security” use-case on an interesting Ask Tom article:

Note how in the WHERE clause, I always use the SYS_CONTEXT function. I never put the literal value into the query, as that would be very bad for:

  • Performance
  • Scalability
  • Shared pool utilization
  • Perhaps most important: security (SQL injection)

I don’t really agree with this, because in my opinion, bind variables are a much better approach to this simplistic security concern.

In addition to that, since Oracle 11g, we have adaptive cursor sharing and bind variable peeking, which means that the same query can produce distinct execution plans depending on the actual bind variable values. This isn’t (currently) being done for SYS_CONTEXT. See also this very interesting article by Connor McDonald:
https://connormcdonald.wordpress.com/2016/10/20/taking-a-peek-at-sys_context

Connor was kind enough to write this article as a reply to a question about SYS_CONTEXT that I asked on Twitter.

Not peeking at SYS_CONTEXT may be good or bad, depending on your taste and on your opinion about Oracle 11g’s adaptive cursor sharing feature. I’m personally a big fan of it, although I do appreciate that some experts don’t like it too much (see also Tony Hasler’s opinions in Expert Oracle SQL).

Row level security with SYS_CONTEXT

The important thing to remember is that no peeking is done at SYS_CONTEXT values. What does this mean?

This client of mine has always used SYS_CONTEXT just like I mentioned before, for security reasons. More particularly, they implemented sophisticated row-level security with it (they did this before Oracle had out-of-the-box support for row-level security). Again, if you look at the previous query:

CREATE VIEW v_categories_per_actor AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))

You could interpret this as being a security feature. Actors can only access their own data across the entire database. In this case, if clients don’t get GRANTs to tables, but only to these views, it’s possible to prevent access to all sorts of data that is not related to “my own actor_id”, even if client code forgets to add the appropriate predicate.

The execution plan for a query against this view shows that it’s quite decent:

-----------------------------------------------------------------------------
| Id  | Operation                 | Name         | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |              |      1 |        |     13 |
|   1 |  SORT ORDER BY            |              |      1 |     16 |     13 |
|   2 |   VIEW                    | V_CATEGORIE..|      1 |     16 |     13 |
|   3 |    HASH UNIQUE            |              |      1 |     16 |     13 |
|*  4 |     HASH JOIN             |              |      1 |     27 |     19 |
|   5 |      NESTED LOOPS         |              |      1 |     27 |     19 |
|   6 |       INDEX FAST FULL SCAN| PK_FILM_CAT..|      1 |   1000 |   1000 |
|*  7 |       INDEX UNIQUE SCAN   | PK_FILM_ACTOR|   1000 |      1 |     19 |
|   8 |      TABLE ACCESS FULL    | CATEGORY     |      1 |     16 |     16 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
   7 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID')) 
              AND "FC"."FILM_ID"="FA"."FILM_ID")

As you can see, the cardinality estimates are all pretty OK and this query does the right thing as the database knows it can use the single SYS_CONTEXT value for a fast access predicate on the FILM_ACTOR primary key.

Let’s ignore for a moment the slightly wrong cardinality estimate on PK_FILM_ACTOR, whose effects can be fixed by using a /*+LEADING(fa fc)*/ hint inside of the view. But this is not related to using SYS_CONTEXT.

Using SYS_CONTEXT to overload view behaviour

Now, that particular client evaluated whether SYS_CONTEXT could be used for something else entirely: to put two UNION ALL subqueries in a view and, depending on the “session mode”, execute one or the other query. Let’s assume you have different types of users in your client application:

  • Normal customers: “C”
  • Operators: “O”

The latter have more privileges and can see entirely different data. In the terms of our view this might mean the following:

CREATE OR REPLACE VIEW v_categories_per_actor AS

SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'

UNION ALL

SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE sys_context('MY_APP', 'USER_TYPE') = 'O'

What this means now is that the same view is reused for both types of users: customers/actors (C) and operators (O). Customers only get to see their own data whereas operators get to see all data. Imagine that the real query is much more complex.

So, this is really nice, because you can start reusing complex views, put those views in other complex views, and the behaviour of your entire application changes depending on who is logged in. So, this shouldn’t just be called row-level security; it’s really an access control list, because that’s what they’re implementing.

Excellent!

So where’s the problem? The problem lies in the execution plans. Let’s update our procedure again:

CREATE OR REPLACE PROCEDURE set_ctx(
  p_actor_id NUMBER := NULL
) IS
BEGIN
  dbms_session.set_context('MY_APP', 'ACTOR_ID', p_actor_id);
  dbms_session.set_context('MY_APP', 'USER_TYPE', 
    CASE WHEN p_actor_id IS NULL THEN 'O' ELSE 'C' END);
END;
/

EXEC set_ctx(1);

Hardly high security, but it will do for this example. If we initialise the context with a P_ACTOR_ID, we’re a customer (C); otherwise, we’re an operator (O). What’s our plan now?

SELECT * FROM TABLE (dbms_xplan.display_cursor(
  sql_id => 'cmdfbydppfqbu',
  format => 'ALLSTATS LAST'
));

Which yields…

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name                   | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                        |      1 |        |     13 |
|   1 |  SORT ORDER BY              |                        |      1 |   2275 |     13 |
|   2 |   VIEW                      | V_CATEGORIES_PER_ACTOR |      1 |   2275 |     13 |
|   3 |    UNION-ALL                |                        |      1 |        |     13 |
|   4 |     HASH UNIQUE             |                        |      1 |     12 |     13 |
|*  5 |      FILTER                 |                        |      1 |        |     19 |
|*  6 |       HASH JOIN             |                        |      1 |     27 |     19 |
|   7 |        NESTED LOOPS         |                        |      1 |     27 |     19 |
|   8 |         INDEX FAST FULL SCAN| PK_FILM_CATEGORY       |      1 |   1000 |   1000 |
|*  9 |         INDEX UNIQUE SCAN   | PK_FILM_ACTOR          |   1000 |      1 |     19 |
|  10 |        TABLE ACCESS FULL    | CATEGORY               |      1 |     16 |     16 |
|  11 |     HASH UNIQUE             |                        |      1 |   2263 |      0 |
|* 12 |      FILTER                 |                        |      1 |        |      0 |
|* 13 |       HASH JOIN             |                        |      0 |   5462 |      0 |
|* 14 |        HASH JOIN            |                        |      0 |   1000 |      0 |
|  15 |         TABLE ACCESS FULL   | CATEGORY               |      0 |     16 |      0 |
|  16 |         INDEX FAST FULL SCAN| PK_FILM_CATEGORY       |      0 |   1000 |      0 |
|  17 |        INDEX FAST FULL SCAN | PK_FILM_ACTOR          |      0 |   5462 |      0 |
-----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
   6 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
   9 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID')) 
              AND "FC"."FILM_ID"="FA"."FILM_ID")
  12 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='O')
  13 - access("FC"."FILM_ID"="FA"."FILM_ID")
  14 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")

As you can see, the first part of the UNION-ALL concatenation is still roughly the same, except that we now have this FILTER operation #5. The second part of the UNION-ALL, however, got its cardinalities completely wrong. Operation #11 estimates 2263 rows even though there are none.

The E-Rows column (estimated rows) suggests that all data is selected from our Sakila database, i.e. 1000 FILM_CATEGORY relationships and 5462 FILM_ACTOR relationships, which is all of our data. But the A-Rows column (actual rows) is zero, as expected, because we set our USER_TYPE context value to 'C' for customer, not 'O' for operator.

The Starts column is also interesting: it shows that the operations below the FILTER operation on line #12 aren’t started at all. This was confirmed by Franck Pachot in that interesting Twitter conversation.

And also by Chris Antognini.

Good news, no?

Yes and no.

  • YES: Because even if the plan looks quite bad (for customer usage), it performed optimally. At least, Oracle knew when to stop even if estimates were wrong.
  • NO: Because all these cardinality (and cost) estimates will propagate leading to bigger and bigger errors, depending on how you use this view.
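The runtime behaviour of those two FILTER operations can be mimicked in a few lines. Both branches are present in the “plan”, but only the branch whose guard matches the session context is ever started (a rough analogy in plain Python, not how Oracle actually executes plans):

```python
def union_all_view(ctx, film_actor):
    # Both UNION ALL branches exist; each is guarded by a runtime FILTER
    # on the context, so only one branch produces (or even scans) rows
    rows = []
    if ctx["USER_TYPE"] == "C":   # FILTER on the customer branch
        rows += [r for r in film_actor if r["actor_id"] == ctx["ACTOR_ID"]]
    if ctx["USER_TYPE"] == "O":   # FILTER on the operator branch
        rows += film_actor
    return rows
```

The catch, as the plan above shows, is that the optimiser still has to cost both branches without knowing which guard will hold, so the combined cardinality estimate includes the branch that never runs.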

Check out the following query:

SELECT actor_id, name, COUNT(*)
FROM v_categories_per_actor ca
JOIN category c USING (name)
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id, actor_id)
GROUP BY actor_id, name
ORDER BY actor_id, name;

The above query could of course be stored in a view again to fit our security concept of giving grants only to views, not tables… In any case, what we’re doing here is the following:

We take all categories per actor from our previous view, then we want to count the number of total films in that category for that actor. If we’re still running this in a customer (C) context with ACTOR_ID = 1, we’ll get:

  ACTOR_ID NAME                        COUNT(*)
---------- ------------------------- ----------
         1 Animation                          1
         1 Children                           1
         1 Classics                           2
         1 Comedy                             1
         1 Documentary                        1
         1 Family                             2
         1 Foreign                            1
         1 Games                              2
         1 Horror                             3
         1 Music                              1
         1 New                                2
         1 Sci-Fi                             1
         1 Sports                             1

Which translates to: I (my user = ACTOR_ID = 1) have played in these categories, and these categories contain this many films in total that I played in. Again, the actual query might be much more complex, where we can't easily factor things out (e.g. avoid accessing the same tables twice). I'm just trying to make a point here.

What’s the plan of this query? It’s very bad:

---------------------------------------------------------------------------------------
| Id  | Operation                       | Name             | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                  |      1 |        |     13 |
|   1 |  SORT ORDER BY                  |                  |      1 |   2263 |     13 |
|   2 |   HASH GROUP BY                 |                  |      1 |   2263 |     13 |
|   3 |    NESTED LOOPS                 |                  |      1 |   3883 |     19 |
|*  4 |     HASH JOIN                   |                  |      1 |  62130 |    247 |
|*  5 |      HASH JOIN                  |                  |      1 |   2275 |     13 |
|   6 |       TABLE ACCESS FULL         | CATEGORY         |      1 |     16 |     16 |
|   7 |       VIEW                      | V_CATEGORIES_... |      1 |   2275 |     13 |
|   8 |        UNION-ALL                |                  |      1 |        |     13 |
|   9 |         HASH UNIQUE             |                  |      1 |     12 |     13 |
|* 10 |          FILTER                 |                  |      1 |        |     19 |
|* 11 |           HASH JOIN             |                  |      1 |     27 |     19 |
|  12 |            NESTED LOOPS         |                  |      1 |     27 |     19 |
|  13 |             INDEX FAST FULL SCAN| PK_FILM_CATEGORY |      1 |   1000 |   1000 |
|* 14 |             INDEX UNIQUE SCAN   | PK_FILM_ACTOR    |   1000 |      1 |     19 |
|  15 |            TABLE ACCESS FULL    | CATEGORY         |      1 |     16 |     16 |
|  16 |         HASH UNIQUE             |                  |      1 |   2263 |      0 |
|* 17 |          FILTER                 |                  |      1 |        |      0 |
|* 18 |           HASH JOIN             |                  |      0 |   5462 |      0 |
|* 19 |            HASH JOIN            |                  |      0 |   1000 |      0 |
|  20 |             TABLE ACCESS FULL   | CATEGORY         |      0 |     16 |      0 |
|  21 |             INDEX FAST FULL SCAN| PK_FILM_CATEGORY |      0 |   1000 |      0 |
|  22 |            INDEX FAST FULL SCAN | PK_FILM_ACTOR    |      0 |   5462 |      0 |
|  23 |      INDEX FAST FULL SCAN       | PK_FILM_ACTOR    |      1 |   5462 |   5462 |
|* 24 |     INDEX UNIQUE SCAN           | PK_FILM_CATEGORY |    247 |      1 |     19 |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("CA"."ACTOR_ID"="FA"."ACTOR_ID")
   5 - access("CA"."NAME"="C"."NAME")
  10 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
  11 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
  14 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID'))
       AND "FC"."FILM_ID"="FA"."FILM_ID")
  17 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='O')
  18 - access("FC"."FILM_ID"="FA"."FILM_ID")
  19 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
  24 - access("FC"."FILM_ID"="FA"."FILM_ID" AND "C"."CATEGORY_ID"="FC"."CATEGORY_ID")

Why is it so bad?

Observe how at some point, I had a cardinality estimate of 62130 (operation #4), and the whole query was still expected to return 2263 rows (operation #1), when in fact, I got only 13.

Even if the optimiser got the number of rows almost right for the first UNION ALL subquery (estimated 12, got 13 on operation #9), the estimate for the second UNION ALL subquery made it think that, with so many rows coming out of the view (2263 on operation #16), a hash join would be optimal to count the number of films (operations #4 and #5). While the actual numbers aren’t as bad as estimated, the hash join operation is much more costly for small data sets than an equivalent nested loop join.
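If we wanted to double-check that nested loops really are cheaper here without touching the view, one option (a sketch, not part of the original benchmark) is to override the optimiser's join method choice with hints and compare plans and timings:

```sql
-- Sketch: force nested loop joins instead of the hash joins that the
-- bad cardinality estimates led the optimiser to pick. The LEADING and
-- USE_NL hints below are illustrative; verify the resulting plan with
-- DBMS_XPLAN before drawing any conclusions.
SELECT /*+ LEADING(ca c fc fa) USE_NL(fc fa) */
  actor_id, name, COUNT(*)
FROM v_categories_per_actor ca
JOIN category c USING (name)
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id, actor_id)
GROUP BY actor_id, name
ORDER BY actor_id, name;
```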

If we again remove that UNION ALL operation from the view, restricting it to the original customer (C) only use-case:

CREATE OR REPLACE VIEW v_categories_per_actor AS
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'

… then we get a much better plan for that aggregation, where we now get the desired nested loops:

-------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |      1 |        |     13 |
|   1 |  SORT ORDER BY                |                  |      1 |     12 |     13 |
|   2 |   HASH GROUP BY               |                  |      1 |     12 |     13 |
|   3 |    VIEW                       | VM_NWVW_1        |      1 |     47 |     19 |
|   4 |     HASH UNIQUE               |                  |      1 |     47 |     19 |
|*  5 |      FILTER                   |                  |      1 |        |     33 |
|   6 |       NESTED LOOPS            |                  |      1 |     47 |     33 |
|*  7 |        HASH JOIN              |                  |      1 |    746 |    361 |
|   8 |         NESTED LOOPS          |                  |      1 |     27 |     19 |
|*  9 |          HASH JOIN            |                  |      1 |   1000 |   1000 |
|* 10 |           HASH JOIN           |                  |      1 |     16 |     16 |
|  11 |            TABLE ACCESS FULL  | CATEGORY         |      1 |     16 |     16 |
|  12 |            TABLE ACCESS FULL  | CATEGORY         |      1 |     16 |     16 |
|  13 |           INDEX FAST FULL SCAN| PK_FILM_CATEGORY |      1 |   1000 |   1000 |
|* 14 |          INDEX UNIQUE SCAN    | PK_FILM_ACTOR    |   1000 |      1 |     19 |
|* 15 |         INDEX RANGE SCAN      | PK_FILM_ACTOR    |      1 |     27 |     19 |
|* 16 |        INDEX UNIQUE SCAN      | PK_FILM_CATEGORY |    361 |      1 |     33 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
   7 - access("FA"."ACTOR_ID"="FA"."ACTOR_ID")
   9 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
  10 - access("C"."NAME"="C"."NAME")
  14 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID')) 
              AND "FC"."FILM_ID"="FA"."FILM_ID")
  15 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID')))
  16 - access("FC"."FILM_ID"="FA"."FILM_ID" AND "C"."CATEGORY_ID"="FC"."CATEGORY_ID")

All the cardinality estimates are now much better, unsurprisingly. Let’s benchmark and measure, too:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 1000;
BEGIN
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT actor_id, name, COUNT(*)
      FROM v_categories_per_actor ca -- No UNION ALL
      JOIN category c USING (name)
      JOIN film_category fc USING (category_id)
      JOIN film_actor fa USING (film_id, actor_id)
      GROUP BY actor_id, name
      ORDER BY actor_id, name
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT actor_id, name, COUNT(*)
      FROM v_categories_per_actor2 ca -- With UNION ALL
      JOIN category c USING (name)
      JOIN film_category fc USING (category_id)
      JOIN film_actor fa USING (film_id, actor_id)
      GROUP BY actor_id, name
      ORDER BY actor_id, name
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
END;
/

Which yields:

Statement 1 : +000000000 00:00:01.940000000
Statement 2 : +000000000 00:00:02.923000000

Clearly, that UNION ALL and its resulting hash joins are hurting us drastically!

Conclusion: Use SYS_CONTEXT with care

SYS_CONTEXT can be useful to emulate parameterised views, as we’ve seen. There’s nothing wrong with implementing “extended row-level security” by adding SYS_CONTEXT predicates to views and granting access only to views, not tables, e.g.

CREATE OR REPLACE VIEW v_categories_per_actor AS
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'

However, because the optimiser does not peek at those SYS_CONTEXT values, we cannot profit from the adaptive cursor sharing feature. This might have been possible with real parameterised views (as they are supported in SQL Server), but in this case, we don’t get multiple alternative execution plans for the same SQL query depending on SYS_CONTEXT values. This has been shown by Connor McDonald in his blog post:
https://connormcdonald.wordpress.com/2016/10/20/taking-a-peek-at-sys_context

I do hope that a future version of Oracle will treat SYS_CONTEXT more like bind variables, because ultimately, that’s what they are: constant external values for the scope of a single query execution, or “parameters” as in “parameterised views”. Until we have that (or real parameterised views), I strongly advise against using SYS_CONTEXT for the use-case that my client was testing (cutting off individual UNION ALL subtrees from execution plans).
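To see what treating SYS_CONTEXT “like bind variables” would buy us, compare with a version of the predicate that uses an actual bind variable (a sketch for illustration): bind values are subject to bind peeking and adaptive cursor sharing, so the optimiser can produce different plans for different values.

```sql
-- Sketch: the same filtering logic with a real bind variable.
-- Unlike SYS_CONTEXT, the optimiser peeks at :actor_id and can
-- maintain multiple plans for this statement via adaptive
-- cursor sharing.
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = :actor_id
```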

But the good news is: There’s nothing wrong with the ordinary use-case (forming predicates in WHERE clauses).

PS: If you like the idea of peeking at SYS_CONTEXT, be sure to also leave a comment here: https://community.oracle.com/ideas/15826