Beautiful SQL: Lateral Unnesting of Array Columns

Sometimes, SQL can just be so beautiful. One of the less mainstream features in SQL is the array type (or nested collections). In fact, it’s so not mainstream that only 2 major databases actually support it: Oracle and PostgreSQL (and HSQLDB and H2 in the Java ecosystem).

In PostgreSQL, you can write:

CREATE TABLE blogs (
  id    SERIAL NOT NULL PRIMARY KEY,
  title text   NOT NULL,
  tags  text[]
)

Or in Oracle:

-- Oracle only knows nominal array types, so we have to declare
-- them in advance
CREATE TYPE tag_t AS VARRAY(100) OF VARCHAR2(100 CHAR);

CREATE TABLE blogs (
  id    NUMBER(18) GENERATED BY DEFAULT AS IDENTITY 
                   NOT NULL PRIMARY KEY,
  title VARCHAR2(100 CHAR) NOT NULL,
  tags  tag_t
)

So, roughly the same thing. Now, let’s insert some data. How about the 3 most recent posts on the jOOQ blog, prior to this one:

In PostgreSQL:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  ARRAY[
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ]
), (
  'How to Execute SQL Batches With JDBC and jOOQ',
  ARRAY[
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ]
), (
  'How to Emulate Partial Indexes in Oracle',
  ARRAY[
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ]
)

Or in Oracle:

INSERT INTO blogs (title, tags)
VALUES (
  'How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ',
  tag_t(
    'implicit cursor',
    'batch',
    'oracle',
    'jooq',
    'jdbc',
    'resultset'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Execute SQL Batches With JDBC and jOOQ',
  tag_t(
    'batch',
    'batch statement',
    'mysql',
    'jooq',
    'jdbc',
    'sql server',
    'sql'
  ));
INSERT INTO blogs (title, tags)
VALUES (
  'How to Emulate Partial Indexes in Oracle',
  tag_t(
    'optimisation',
    'index',
    'partial index',
    'oracle',
    'sql',
    'postgresql',
    't-sql',
    'sql server'
  ));

Now, the array type by itself is not very useful. When it gets really interesting is when we unnest it again into a table. For instance in PostgreSQL:

SELECT title, tag
FROM blogs, LATERAL unnest(tags) AS tags(tag);

Or in Oracle:

-- Classic style
SELECT title, tags.*
FROM blogs, TABLE(tags) tags;

-- Since Oracle 12c
SELECT title, tags.*
FROM blogs, LATERAL (SELECT * FROM TABLE(tags)) tags;

Note that we’re using the keyword LATERAL in some of the above queries. For those of you who are used to T-SQL syntax, it’s almost the same thing as APPLY. Both LATERAL and APPLY are also very useful with table valued functions (stay tuned for a blog post on those).

The idea behind LATERAL is that the table (derived table, subquery, function call, array unnesting) on the right side of LATERAL can “laterally” access stuff from the left side of LATERAL in order to produce new tables. In the above query, we’re producing a new table of tags for each blog post, and then we cross join the two tables.

Here’s what the above queries result in:

title                                                         tag
-----------------------------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   implicit cursor
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   batch
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   oracle
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jooq
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   jdbc
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ   resultset
How to Execute SQL Batches With JDBC and jOOQ                 batch
How to Execute SQL Batches With JDBC and jOOQ                 batch statement
How to Execute SQL Batches With JDBC and jOOQ                 mysql
How to Execute SQL Batches With JDBC and jOOQ                 jooq
How to Execute SQL Batches With JDBC and jOOQ                 jdbc
How to Execute SQL Batches With JDBC and jOOQ                 sql server
How to Execute SQL Batches With JDBC and jOOQ                 sql
How to Emulate Partial Indexes in Oracle                      optimisation
How to Emulate Partial Indexes in Oracle                      index
How to Emulate Partial Indexes in Oracle                      partial index
How to Emulate Partial Indexes in Oracle                      oracle
How to Emulate Partial Indexes in Oracle                      sql
How to Emulate Partial Indexes in Oracle                      postgresql
How to Emulate Partial Indexes in Oracle                      t-sql
How to Emulate Partial Indexes in Oracle                      sql server

You can immediately see the cross join semantics here, as we’re combining each tag (per post) with its post.

Looking for ordinals (i.e. the tag number inside of the array) along with the array? Easy!

Just add the powerful WITH ORDINALITY clause after the UNNEST() call in PostgreSQL:

SELECT title, tag
FROM blogs, LATERAL unnest(tags) WITH ORDINALITY AS tags(tag);

A bit more complicated to emulate in Oracle:

-- Fancy, with a window function
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROW_NUMBER() OVER (ORDER BY NULL)
  FROM TABLE(tags) tags
) tags;

-- Classic, with ROWNUM
SELECT title, tags.*
FROM blogs, LATERAL (
  SELECT tags.*, ROWNUM
  FROM TABLE(tags) tags
) tags;

The result now contains the tag “ID”, i.e the ordinal of the tag inside of the array:

title                                           tag               ordinal
-------------------------------------------------------------------------
How to Fetch ... Cursors with JDBC and jOOQ     implicit cursor   1
How to Fetch ... Cursors with JDBC and jOOQ     batch             2
How to Fetch ... Cursors with JDBC and jOOQ     oracle            3
How to Fetch ... Cursors with JDBC and jOOQ     jooq              4
How to Fetch ... Cursors with JDBC and jOOQ     jdbc              5
How to Fetch ... Cursors with JDBC and jOOQ     resultset         6
How to Execute SQL Batches With JDBC and jOOQ   batch             1
How to Execute SQL Batches With JDBC and jOOQ   batch statement   2
How to Execute SQL Batches With JDBC and jOOQ   mysql             3
How to Execute SQL Batches With JDBC and jOOQ   jooq              4
How to Execute SQL Batches With JDBC and jOOQ   jdbc              5
How to Execute SQL Batches With JDBC and jOOQ   sql server        6
How to Execute SQL Batches With JDBC and jOOQ   sql               7
How to Emulate Partial Indexes in Oracle        optimisation      1
How to Emulate Partial Indexes in Oracle        index             2
How to Emulate Partial Indexes in Oracle        partial index     3
How to Emulate Partial Indexes in Oracle        oracle            4
How to Emulate Partial Indexes in Oracle        sql               5
How to Emulate Partial Indexes in Oracle        postgresql        6
How to Emulate Partial Indexes in Oracle        t-sql             7
How to Emulate Partial Indexes in Oracle        sql server        8

Now, imagine looking for those blog posts that are tagged “jooq”. Easy!

PostgreSQL:

SELECT title
FROM blogs
WHERE 'jooq' = ANY(tags);

Oracle:

SELECT title
FROM blogs
WHERE 'jooq' IN (SELECT * FROM TABLE(tags));

Yielding:

title
-----------------------------------------------------------
How to Fetch Oracle 12c Implicit Cursors with JDBC and jOOQ
How to Execute SQL Batches With JDBC and jOOQ

Conclusion

These are just a few nice things we can do when we denormalise our data into nested collections / arrays, and then use features like UNNEST to bring them back to the table level. Both Oracle and PostgreSQL support a variety of really nice features building on top of arrays, so do check them out!

What if every object was an array? No more NullPointerExceptions!

To NULL or not to NULL? Programming language designers inevitably have to decide whether they support NULLs or not. And they’ve proven to have a hard time getting this right. NULL is not intuitive in any language, because NULL is an axiom of that language, not a rule that can be derived from lower-level axioms. Take Java for instance, where

// This yields true:
null == null

// These throw an exception (or cannot be compiled)
null.toString();
int value = (Integer) null;

It’s not like there weren’t any alternatives. SQL, for instance, implements a more expressive but probably less intuitive three-value logic, which most developers get wrong in subtle ways once in a while.

At the same time, SQL doesn’t know “NULL” results, only “NULL” column values. From a set theory perspective, there are only empty sets, not NULL sets.

Other languages allow for dereferencing null through special operators, letting the compiler generate tedious null checks for you, behind the scenes. An example for this is Groovy with its null-safe dereferencing operator. This solution is far from being generally accepted, as can be seen in this discussion about a Scala equivalent. Scala uses Option, which Java 8 will imitate using Optional (or @Nullable).

Let’s think about a much broader solution

To me, nullability isn’t a first-class citizen. I personally dislike the fact that Scala’s Option[T] type pollutes my type system by introducing a generic wrapper type (even if it seems to implement similar array-features through the traversable trait). I don’t want to distinguish the types of Option[T] and T. This is specifically true when reasoning about types from a reflection API perspective, where Scala’s (and Java’s) legacy will forever keep me from accessing the type of T at runtime.

But much worse, most of the times, in my application I don’t really want to distinguish between “option” references and “some” references. Heck, I don’t even want to distinguish between having 1 reference and having dozens. jQuery got this quite right. One of the main reasons why jQuery is so popular is because everything you do, you do on a set of wrapped DOM elements. The API never distinguishes between matching 1 or 100 div’s. Check out the following code:

// This clearly operates on a single object or none
$('div#unique-id').html('new content')
                  .click(function() { ... });

// This possibly operates on several objects or none
$('div.any-class').html('new content')
                  .click(function() { ... });

This is possible because JavaScript allows you to override the prototype of the JavaScript Array type, modifying arrays in general, at least for the scope of the jQuery library. How more awesome can it get? .html() and .click() are actions performed on the array as a whole, no matter if you have zero, one, or 100 elements in your match. What would a more typesafe language look like, where everything behaves like an array (or an ArrayList)? Think about the following model:

class Customer {
  String firstNames;  // Read as String[] firstNames
  String lastName;    // Read as String[] lastName
  Order orders;       // Read as Order[] orders
}

class Order {
  int value;          // Read as int[] value
  boolean shipped() { // Read as boolean[] shipped
  }
}

Don’t rant (just yet). Let’s assume this wouldn’t lead to memory or computation overhead. Let’s continue thinking about the advantages of this. So, I want to see if a Customer’s orders have been shipped. Easy:

Customer customer = // ...
boolean shipped = customer.orders.shipped();

This doesn’t look spectacular (yet). But beware of the fact that a customer can have several orders, and the above check is really to see if all orders have been shipped. I really don’t want to write the loop, I find it quite obvious that I want to perform the shipped() check on every order. Consider:

// The length pseudo-field would still be
// present on orders
customer.orders.length;

// In fact, the length pseudo-field is also
// present on customer, in case there are several
customer.length;

// Let's add an order to the customer:
customer.orders.add(new Order());

// Let's reset order
customer.orders.clear();

// Let's calculate the sum of all values
// OO-style:
customer.orders.value.sum();
// Functional style:
sum(customer.orders.value);

Of course there would be a couple of caveats and the above choice of method names might not be the best one. But being able to deal with single references (nullable or non-nullable) or array references (empty, single-valued, multi-valued) in the same syntactic way is just pure syntax awesomeness. Null-checks would be replaced by length checks, but mostly you don’t even have to do those, because each method would always be called on every element in the array. The current single-reference vs. multi-reference semantics would be documented by naming conventions. Clearly, naming something “orders” indicates that multi-references are possible, whereas naming something “customer” indicates that multi-references are improbable.

As users have commented, this technique is commonly referred to as array programming, which is implemented in Matlab or R.

Convinced?

I’m curious to hear your thoughts!

How to Execute Something Multiple Times in Java

When writing unit / integration tests, you often want to execute something multiple times, with different configurations / parameters / arguments every time. For instance, if you want to pass a “limit” or “timeout” or any other argument value of 1, 10, and 100, you could do this:

@Test
public void test() {
    runCode(1);
    runCode(10);
    runCode(100);
}

private void runCode(int argument) {

    // Run the actual test
    assertNotNull(new MyObject(argument).execute());
}

Exracting methods is the most obvious approach, but it can quickly get nasty, as these extracted methods are hardly re-usable outside of that single test-case and thus don’t really deserve being put in their own methods. Instead, just use this little trick:

@Test
public void test() {

    // Repeat the contents 3 times, for values 1, 10, 100
    for (int argument : new int[] { 1, 10, 100 }) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }

    // Alternatively, use Arrays.asList(), which has a similar effect:
    for (Integer argument : Arrays.asList(1, 10, 100)) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }
}

The power of REF CURSOR types

Many RDBMS have started implementing support for some sort of CURSOR, REF CURSOR or ARRAY/TABLE types. These types have roughly the same semantics as JDBC‘s  java.sql.Array and java.sql.ResultSet. In principle, such types can appear anywhere in SQL, even if some RDBMS limit support to a certain functionality subset. What exactly are these types?

ARRAY types

ARRAY types are the easiest to understand. An array is usually implemented as a typed collection of values. They can be used both for table columns and with stored procedures. Most database schema designers would agree that using ARRAY types in tables is not necessarily a good idea, as this will lead to a schema that is normalised in a “non-first normal form“. On the other hand, ARRAY types can be very powerful when used as stored procedure parameters and especially as stored function results. In both scenarios (table columns, function results), arrays can be unnested using operators such as TABLE(…) or UNNEST(…). These operators will make the contents of an array available to any SQL clause that takes tables as arguments. An example (in HSQLDB):

-- unnest an ad-hoc anonymously typed array
SELECT element
FROM unnest(array[1, 2, 3]) AS unnested(element);

A similar example (in Oracle, which doesn’t support anonymous ARRAY types):

-- create a typed array of numbers
CREATE TYPE number_array AS VARRAY(10) OF NUMBER(7);

-- unnest an ad-hoc array instance. Oracle will name
-- the column of such an unnested array "COLUMN_VALUE"
SELECT column_value
FROM table(number_array(1, 2, 3));

Both constructs will result in a simple table holding one column and three records: 1, 2, 3. As mentioned previously, the power of such ARRAY types becomes obvious, when using them as results in stored functions. In Oracle, for instance, you could define a function like this:

-- a simple example function
CREATE FUNCTION get_array RETURN number_array IS
BEGIN
  RETURN number_array(1, 2, 3);
END;

-- unnest results from a function returning an array
SELECT column_value
FROM TABLE(get_array);

With these syntactical outlines, you can define very complex functions returning well-defined types, which can then again be unnested into SQL tables. The power of this functionality knows little limits, when you combine ARRAY types with OBJECT types, in those RDBMS that support it. Again, with Oracle you can write:

-- A simple, reusable person type
CREATE TYPE person AS OBJECT(
  id NUMBER(7),
  name VARCHAR2(100)
);

-- An array of such users
CREATE TYPE person_array AS VARRAY(10) OF person;

-- An unnested array of such users used in SQL:
SELECT *
FROM TABLE(person_array(
  person(1, 'Jim'),
  person(2, 'Joe')
));

The above SELECT statement will intuitively result in a two-column, two-record table looking like this:

ID NAME
1 Jim
2 Joe

Again, this could be used in combination with stored functions that do a lot more complex processing first to calculate the above result including Jim and Joe.

TABLE types

Some RDBMS (e.g. Oracle) distinguish between in-memory ARRAY types (e.g. the VARRAY type we’ve seen before), and in-memory TABLE types. The main difference for this article is the fact that VARRAY types have a maximum size, whereas TABLE types can be extended to any arbitrary length. Also, Oracle’s API for manipulating nested tables directly in SQL is richer than that of VARRAY types, if you want to add a record to a table nested in another table, for instance. A thorough comparison would be out of scope here, though.

REF CURSORS

Cursors, and in particular REF CURSORS, are handled differently in a way that they don’t directly contain all of their data, but can be iterated (or “looped”) over. Also, a REF CURSOR is a weakly typed object, which means that the number and types of a REF CURSOR’s columns cannot be known at SQL compile-time (or “parse-time”), but only when the SQL statement is executed. This makes it harder to use REF CURSOR directly in SQL. Oracle’s TABLE(…) function, for instance, does not support REF CURSOR types as parameter. See also an overview of what’s possible and what isn’t on this Stack Overflow question:

http://stackoverflow.com/questions/6410452/fetch-oracle-table-type-from-stored-procedure-using-jdbc

Nevertheless, a REF CURSOR can be returned from a stored procedure or from a stored function, and retrieved in JDBC like any other java.sql.ResultSet.

jOOQ’s support for ARRAY/CURSOR types

One of jOOQ’s major goals is to allow its users to easily integrate advanced RDBMS concepts into Java, a non-trivial endeavour when using JDBC. These concepts have not yet (to my knowledge) been standardised in SQL:2008. Every RDBMS that supports those concepts, employs its own syntax. The most curious (yet also the most powerful) syntax I have encountered is that of the H2 database, where unnesting of arrays can be done like this:

-- unnest an ad-hoc anonymously typed array
SELECT *
FROM TABLE(
  ID INT=(1, 2),
  NAME VARCHAR=('Hello', 'World'));

The above example can be found here:

http://www.h2database.com/html/functions.html#table

Apart from hiding these many SQL syntax facts from the user, jOOQ also aims at hiding JDBC statement preparing and type mapping from the user. Passing arrays to a prepared statement is not trivial with Oracle, fetching ResultSets from stored functions isn’t either. And none of todays major frameworks, including Hibernate/JPA, Spring, myBatis, etc, support an easy way to integrate such data types and stored procedures automatically into Java – although Spring allows you to write one custom mapper per procedure.

With jOOQ this is going to be possible. While ARRAY types have been supported by jOOQ for quite some time now, unnesting them is part of the currently running “project CURSOR”. The API is being extended to handle the examples given in SQL, above. An example of such API usage is this:

// Create the usual jOOQ factory
Factory create = new OracleFactory(connection);

// Loop over the values returned from thegenerated getArray function
for (Record record : create.select()
                           .from(create.table(getArray()))
                           .fetch()) {

    // This will print 1, 2, 3
    System.out.println(record.getValue(0));
}

Advanced data types as a future investment

The power of these data types have long been known (and loved) by DBA and database programmers using PL/SQL and other database languages. They have been avoided by Java developers mainly because of the awkwardness (or even lack) of JDBC and/or JPA support. It is difficult for an unexperienced JDBC developer, to correctly bind arrays of object to an Oracle prepared statement, or to fetch a REF CURSOR from a CallableStatement.

The depicted functionality is already available with jOOQ 1.6.3. Many other features dealing with CURSOR and ARRAY types will be implemented in the near future.

…so start using the full functionality set of your database!