How to Write Multiset Conditions With Oracle VARRAY Types

Oracle is one of the few databases that implements the SQL standard ORDBMS extensions, which essentially allow for nested collections. Other databases that have these features to some extent are CUBRID, Informix, PostgreSQL.

Oracle has two types of nested collections:

-- Nested tables
CREATE TYPE t1 AS TABLE OF VARCHAR2(10);
/

-- Varrays
CREATE TYPE t2 AS VARRAY(10) OF VARCHAR2(10);
/

The main difference at first is that a nested table can be of arbitrary size, whereas a varray has a fixed maximum size. Other than that, they behave in similar ways.

When storing a nested collection in a table, there is another difference. Varrays can be inlined into the table just like any other data type, whereas nested tables have to be accompanied by an additional storage clause:

CREATE TABLE t (
  id NUMBER(10),
  t1 t1,
  t2 t2
)
NESTED TABLE t1 STORE AS t1_nt;

This is a minor hassle in terms of DDL. The runtime implications are more significant.

Multiset Conditions

The most important difference is the fact that all the useful multiset conditions are not available with varrays. For instance, consider running these statements:

INSERT INTO t VALUES (1, NULL, NULL);
INSERT INTO t VALUES (2, t1(), t2());
INSERT INTO t VALUES (
  3, 
  t1('abc', 'xyz', 'zzz'), 
  t2('abc', 'xyz', 'zzz')
);
INSERT INTO t VALUES (
  4, 
  t1('dup', 'dup', 'dup'), 
  t2('dup', 'dup', 'dup')
);

SELECT * FROM t WHERE 'abc' MEMBER OF t1;
SELECT * FROM t WHERE 'abc' MEMBER OF t2;

The result of these queries is:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

ORA-00932: inconsistent datatypes: expected UDT got TEST.T2

Bummer. The documentation is a bit unclear about this. It reads (emphasis mine):

he return value is TRUE if expr is equal to a member of the specified nested table or varray. The return value is NULL if expr is null or if the nested table is empty.

There is some explicit mention of varrays supporting these operations, but in most of the documentation, varrays are not mentioned. So, how can we write such operations with varrays? Here’s an list of translations of the nested table operator to the equivalent SQL expression for use with varrays.

These are the multiset conditions:

IS A SET condition

In SQL, everything is a (partially ordered) multiset by default. Sometimes, however, we want to work with sets, i.e. a special type of multiset that has no duplicate values. We can easily check whether nested tables are sets (or whether they aren’t):

-- Nested table version
SELECT * FROM t WHERE t1 IS A SET;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND (SELECT count(*) FROM TABLE(t2)) 
  = (SELECT count(DISTINCT column_value) FROM TABLE(t2));

The IS A SET operation yields UNKNOWN if the nested table is NULL, so we have to take that into account as well. If it isn’t NULL, we can count the total values in the varray and compare that with the total distinct values in the varray.

The result is:

ID  T1                        T2
-----------------------------------------------------
2   T1()                      T2()
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

IS EMPTY condition

This predicate needs no explanation. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE t1 IS EMPTY;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT * FROM TABLE (t2)
);

The result being:

ID  T1                 T2
---------------------------------------
2   T1()               T2()

MEMBER condition

This handy predicate can help check if a specific value is contained in a nested collection. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE 'abc' MEMBER OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) WHERE column_value = 'abc'
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

SUBMULTISET condition

Just like the previous MEMBER condition, this predicate can help check if specific values (more than one) are contained in a nested collection. This is a bit more tricky than the previous emulations. The MEMBER condition works the same way for sets and multisets, as we’re checking if exactly one element is contained in the (multi)set.

When working with multisets, duplicates are allowed, and in the case of the SUBMULTISET operation, the following can be observed:

-- Equal multisets
t1() SUBMULTISET OF t1();
t1('a', 'a') SUBMULTISET OF t1('a', 'a');

-- Subsets
t1('a') SUBMULTISET OF t1('a', 'a');

-- But this is not true
t1('a', 'a') SUBMULTISET OF t1('a');

When we omit the fact that nested collections can be multisets and pretend we’re working with sets only, then the emulation of the SUBMULTISET operator is relatively easy:

-- Nested table version
SELECT * FROM t WHERE t1('abc', 'xyz') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'abc'
  INTERSECT
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'xyz'
);

Yielding, once more:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

If we’re really working with multisets, things are a bit more tricky:

-- Nested table version
SELECT * FROM t WHERE t1('dup', 'dup') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT column_value, count(*)
  FROM TABLE (t2('dup', 'dup')) x
  GROUP BY column_value
  HAVING count(*) > (
    SELECT count(*)
    FROM TABLE (t2) y
    WHERE y.column_value = x.column_value
  )
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
4   T1('dup', 'dup', 'dup')   T2('dup', 'dup', 'dup')

How does it work? In the NOT EXISTS correlated subquery, we’re counting the number of duplicate values in the potential SUBMULTISET, effectively turning that SUBMULTISET into a SET using the GROUP BY operation.

We’re then comparing that count value from the left operand with the corresponding count value from the right operand. If there is no value in the left operand whose number of occurrences is bigger than the number of occurrences of that value in the right operand, then the whole left operand is a SUBMULTISET of the right operand.

Cool, eh? We’ll talk about performance another time :-)

MULTISET operators

Also very interesting, the multiset operators:

  • MULTISET EXCEPT [ ALL | DISTINCT ]
  • MULTISET INTERSECT [ ALL | DISTINCT ]
  • MULTISET UNION [ ALL | DISTINCT ]

Notice how there are some differences to the ordinary set operators that can be used in SELECT statements. In particular:

  • EXCEPT is used as defined in the standard, not MINUS
  • ALL is supported on all three operators, not just on UNION
  • ALL is the default, not DISTINCT

How can we work with these operators? Consider these queries:

SELECT id, t1 MULTISET EXCEPT t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

SELECT id, t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

Both yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1('dup')

With this operator, we’re removing each element of the right operand once from the left operand:

  • 'aaa' does not appear in the left operand, so nothing happens
  • 'abc' appears on row with ID = 3 and we remove it
  • 'dup' appears on row with ID = 4, 3 times, and we remove it twice, leaving one value

Conversely, when adding DISTINCT, we’ll get:

SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup') FROM t;

Yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1('')

The only difference is on row with ID = 4, where all 'dup' values were removed, regardless how many there were on either side of the MULTISET EXCEPT DISTINCT operator.

How to emulate this for varrays?

DISTINCT version

This is a bit easier, because we can now use MINUS:

-- Nested table version
SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM TABLE (t2)
        MINUS
        SELECT column_value
        FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
      ) AS t2)
  END r
FROM t;

Luckily, we can still cast a structural MULTISET type that we can obtain using the MULTISET() operator to a varray type. This greatly simplifies the task.

ALL version

If we want the MULTISET EXCEPT or MULTISET EXCEPT ALL semantics, things are trickier. Here’s a solution that resorts to using window functions, in order to turn a MULTISET back into a SET:

-- Nested table version
SELECT t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM (
          SELECT 
            column_value,
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2)
          MINUS
          SELECT 
            column_value, 
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
        )
      ) AS t2)
  END r
FROM t;

How does this work? Ideally, we’ll look at what this ROW_NUMBER() evaluates to on each row. For this, we use OUTER APPLY:

SELECT id, t2, column_value, rn
FROM t
OUTER APPLY (
  SELECT 
    column_value,
    row_number() OVER (
      PARTITION BY column_value
      ORDER BY column_value) rn
  FROM TABLE (t2)
);

The result is:

ID      T2                       COLUMN_VALUE  RN
-----------------------------------------------------
1       (null)                   (null)        (null)
2       T2()                     (null)        (null)
3       T2('abc', 'xyz', 'zzz')  abc           1
3       T2('abc', 'xyz', 'zzz')  xyz           1
3       T2('abc', 'xyz', 'zzz')  zzz           1
4       T2('dup', 'dup', 'dup')  dup           1
4       T2('dup', 'dup', 'dup')  dup           2
4       T2('dup', 'dup', 'dup')  dup           3

As can be seen, each duplicate value gets assigned a unique row number due to the nature of how ROW_NUMBER() works (this property can be very useful for solving the gaps-and-islands-problem. See trick #4).

Now that we turned our (COLUMN_VALUE) multiset into a (COLUMN_VALUE, RN) set (without duplicates), we can use MINUS again.

MULTISET INTERSECT and MULTISET UNION

MULTISET INTERSECT works exactly the same way as MULTISET EXCEPT, with the same window function based emulation in the MULTISET INTERSECT ALL case. MULTISET UNION is simpler, because Oracle knows UNION ALL, so we do not need to resort to such trickery.

Conclusion

Nested collections are a very powerful tool in Oracle SQL. Oracle knows two types of nested collections:

  • Nested tables
  • Varrays

Nested tables are trickier to maintain as you have to think of their storage more explicitly. Varrays can just be embedded into ordinary tables like any other column. But there’s a price to pay for using varrays. Oracle regrettably doesn’t support all of the above very useful multiset conditions and multiset operators.

Luckily, when you encounter a situation where you have varrays and cannot change that, you can still emulate each of the operators using more traditional SQL.

The power of REF CURSOR types

Many RDBMS have started implementing support for some sort of CURSOR, REF CURSOR or ARRAY/TABLE types. These types have roughly the same semantics as JDBC‘s  java.sql.Array and java.sql.ResultSet. In principle, such types can appear anywhere in SQL, even if some RDBMS limit support to a certain functionality subset. What exactly are these types?

ARRAY types

ARRAY types are the easiest to understand. An array is usually implemented as a typed collection of values. They can be used both for table columns and with stored procedures. Most database schema designers would agree that using ARRAY types in tables is not necessarily a good idea, as this will lead to a schema that is normalised in a “non-first normal form“. On the other hand, ARRAY types can be very powerful when used as stored procedure parameters and especially as stored function results. In both scenarios (table columns, function results), arrays can be unnested using operators such as TABLE(…) or UNNEST(…). These operators will make the contents of an array available to any SQL clause that takes tables as arguments. An example (in HSQLDB):

-- unnest an ad-hoc anonymously typed array
SELECT element
FROM unnest(array[1, 2, 3]) AS unnested(element);

A similar example (in Oracle, which doesn’t support anonymous ARRAY types):

-- create a typed array of numbers
CREATE TYPE number_array AS VARRAY(10) OF NUMBER(7);

-- unnest an ad-hoc array instance. Oracle will name
-- the column of such an unnested array "COLUMN_VALUE"
SELECT column_value
FROM table(number_array(1, 2, 3));

Both constructs will result in a simple table holding one column and three records: 1, 2, 3. As mentioned previously, the power of such ARRAY types becomes obvious, when using them as results in stored functions. In Oracle, for instance, you could define a function like this:

-- a simple example function
CREATE FUNCTION get_array RETURN number_array IS
BEGIN
  RETURN number_array(1, 2, 3);
END;

-- unnest results from a function returning an array
SELECT column_value
FROM TABLE(get_array);

With these syntactical outlines, you can define very complex functions returning well-defined types, which can then again be unnested into SQL tables. The power of this functionality knows little limits, when you combine ARRAY types with OBJECT types, in those RDBMS that support it. Again, with Oracle you can write:

-- A simple, reusable person type
CREATE TYPE person AS OBJECT(
  id NUMBER(7),
  name VARCHAR2(100)
);

-- An array of such users
CREATE TYPE person_array AS VARRAY(10) OF person;

-- An unnested array of such users used in SQL:
SELECT *
FROM TABLE(person_array(
  person(1, 'Jim'),
  person(2, 'Joe')
));

The above SELECT statement will intuitively result in a two-column, two-record table looking like this:

ID NAME
1 Jim
2 Joe

Again, this could be used in combination with stored functions that do a lot more complex processing first to calculate the above result including Jim and Joe.

TABLE types

Some RDBMS (e.g. Oracle) distinguish between in-memory ARRAY types (e.g. the VARRAY type we’ve seen before), and in-memory TABLE types. The main difference for this article is the fact that VARRAY types have a maximum size, whereas TABLE types can be extended to any arbitrary length. Also, Oracle’s API for manipulating nested tables directly in SQL is richer than that of VARRAY types, if you want to add a record to a table nested in another table, for instance. A thorough comparison would be out of scope here, though.

REF CURSORS

Cursors, and in particular REF CURSORS, are handled differently in a way that they don’t directly contain all of their data, but can be iterated (or “looped”) over. Also, a REF CURSOR is a weakly typed object, which means that the number and types of a REF CURSOR’s columns cannot be known at SQL compile-time (or “parse-time”), but only when the SQL statement is executed. This makes it harder to use REF CURSOR directly in SQL. Oracle’s TABLE(…) function, for instance, does not support REF CURSOR types as parameter. See also an overview of what’s possible and what isn’t on this Stack Overflow question:

http://stackoverflow.com/questions/6410452/fetch-oracle-table-type-from-stored-procedure-using-jdbc

Nevertheless, a REF CURSOR can be returned from a stored procedure or from a stored function, and retrieved in JDBC like any other java.sql.ResultSet.

jOOQ’s support for ARRAY/CURSOR types

One of jOOQ’s major goals is to allow its users to easily integrate advanced RDBMS concepts into Java, a non-trivial endeavour when using JDBC. These concepts have not yet (to my knowledge) been standardised in SQL:2008. Every RDBMS that supports those concepts, employs its own syntax. The most curious (yet also the most powerful) syntax I have encountered is that of the H2 database, where unnesting of arrays can be done like this:

-- unnest an ad-hoc anonymously typed array
SELECT *
FROM TABLE(
  ID INT=(1, 2),
  NAME VARCHAR=('Hello', 'World'));

The above example can be found here:

http://www.h2database.com/html/functions.html#table

Apart from hiding these many SQL syntax facts from the user, jOOQ also aims at hiding JDBC statement preparing and type mapping from the user. Passing arrays to a prepared statement is not trivial with Oracle, fetching ResultSets from stored functions isn’t either. And none of todays major frameworks, including Hibernate/JPA, Spring, myBatis, etc, support an easy way to integrate such data types and stored procedures automatically into Java – although Spring allows you to write one custom mapper per procedure.

With jOOQ this is going to be possible. While ARRAY types have been supported by jOOQ for quite some time now, unnesting them is part of the currently running “project CURSOR”. The API is being extended to handle the examples given in SQL, above. An example of such API usage is this:

// Create the usual jOOQ factory
Factory create = new OracleFactory(connection);

// Loop over the values returned from thegenerated getArray function
for (Record record : create.select()
                           .from(create.table(getArray()))
                           .fetch()) {

    // This will print 1, 2, 3
    System.out.println(record.getValue(0));
}

Advanced data types as a future investment

The power of these data types have long been known (and loved) by DBA and database programmers using PL/SQL and other database languages. They have been avoided by Java developers mainly because of the awkwardness (or even lack) of JDBC and/or JPA support. It is difficult for an unexperienced JDBC developer, to correctly bind arrays of object to an Oracle prepared statement, or to fetch a REF CURSOR from a CallableStatement.

The depicted functionality is already available with jOOQ 1.6.3. Many other features dealing with CURSOR and ARRAY types will be implemented in the near future.

…so start using the full functionality set of your database!