How to Write Multiset Conditions With Oracle VARRAY Types

Oracle is one of the few databases that implements the SQL standard ORDBMS extensions, which essentially allow for nested collections. Other databases that have these features to some extent are CUBRID, Informix, PostgreSQL.

Oracle has two types of nested collections:

-- Nested tables
CREATE TYPE t1 AS TABLE OF VARCHAR2(10);
/

-- Varrays
CREATE TYPE t2 AS VARRAY(10) OF VARCHAR2(10);
/

The main difference at first is that a nested table can be of arbitrary size, whereas a varray has a fixed maximum size. Other than that, they behave in similar ways.

When storing a nested collection in a table, there is another difference. Varrays can be inlined into the table just like any other data type, whereas nested tables have to be accompanied by an additional storage clause:

CREATE TABLE t (
  id NUMBER(10),
  t1 t1,
  t2 t2
)
NESTED TABLE t1 STORE AS t1_nt;

This is a minor hassle in terms of DDL. The runtime implications are more significant.

Multiset Conditions

The most important difference is the fact that all the useful multiset conditions are not available with varrays. For instance, consider running these statements:

INSERT INTO t VALUES (1, NULL, NULL);
INSERT INTO t VALUES (2, t1(), t2());
INSERT INTO t VALUES (
  3, 
  t1('abc', 'xyz', 'zzz'), 
  t2('abc', 'xyz', 'zzz')
);
INSERT INTO t VALUES (
  4, 
  t1('dup', 'dup', 'dup'), 
  t2('dup', 'dup', 'dup')
);

SELECT * FROM t WHERE 'abc' MEMBER OF t1;
SELECT * FROM t WHERE 'abc' MEMBER OF t2;

The result of these queries is:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

ORA-00932: inconsistent datatypes: expected UDT got TEST.T2

Bummer. The documentation is a bit unclear about this. It reads (emphasis mine):

he return value is TRUE if expr is equal to a member of the specified nested table or varray. The return value is NULL if expr is null or if the nested table is empty.

There is some explicit mention of varrays supporting these operations, but in most of the documentation, varrays are not mentioned. So, how can we write such operations with varrays? Here’s an list of translations of the nested table operator to the equivalent SQL expression for use with varrays.

These are the multiset conditions:

IS A SET condition

In SQL, everything is a (partially ordered) multiset by default. Sometimes, however, we want to work with sets, i.e. a special type of multiset that has no duplicate values. We can easily check whether nested tables are sets (or whether they aren’t):

-- Nested table version
SELECT * FROM t WHERE t1 IS A SET;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND (SELECT count(*) FROM TABLE(t2)) 
  = (SELECT count(DISTINCT column_value) FROM TABLE(t2));

The IS A SET operation yields UNKNOWN if the nested table is NULL, so we have to take that into account as well. If it isn’t NULL, we can count the total values in the varray and compare that with the total distinct values in the varray.

The result is:

ID  T1                        T2
-----------------------------------------------------
2   T1()                      T2()
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

IS EMPTY condition

This predicate needs no explanation. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE t1 IS EMPTY;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT * FROM TABLE (t2)
);

The result being:

ID  T1                 T2
---------------------------------------
2   T1()               T2()

MEMBER condition

This handy predicate can help check if a specific value is contained in a nested collection. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE 'abc' MEMBER OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) WHERE column_value = 'abc'
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

SUBMULTISET condition

Just like the previous MEMBER condition, this predicate can help check if specific values (more than one) are contained in a nested collection. This is a bit more tricky than the previous emulations. The MEMBER condition works the same way for sets and multisets, as we’re checking if exactly one element is contained in the (multi)set.

When working with multisets, duplicates are allowed, and in the case of the SUBMULTISET operation, the following can be observed:

-- Equal multisets
t1() SUBMULTISET OF t1();
t1('a', 'a') SUBMULTISET OF t1('a', 'a');

-- Subsets
t1('a') SUBMULTISET OF t1('a', 'a');

-- But this is not true
t1('a', 'a') SUBMULTISET OF t1('a');

When we omit the fact that nested collections can be multisets and pretend we’re working with sets only, then the emulation of the SUBMULTISET operator is relatively easy:

-- Nested table version
SELECT * FROM t WHERE t1('abc', 'xyz') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'abc'
  INTERSECT
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'xyz'
);

Yielding, once more:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

If we’re really working with multisets, things are a bit more tricky:

-- Nested table version
SELECT * FROM t WHERE t1('dup', 'dup') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT column_value, count(*)
  FROM TABLE (t2('dup', 'dup')) x
  GROUP BY column_value
  HAVING count(*) > (
    SELECT count(*)
    FROM TABLE (t2) y
    WHERE y.column_value = x.column_value
  )
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
4   T1('dup', 'dup', 'dup')   T2('dup', 'dup', 'dup')

How does it work? In the NOT EXISTS correlated subquery, we’re counting the number of duplicate values in the potential SUBMULTISET, effectively turning that SUBMULTISET into a SET using the GROUP BY operation.

We’re then comparing that count value from the left operand with the corresponding count value from the right operand. If there is no value in the left operand whose number of occurrences is bigger than the number of occurrences of that value in the right operand, then the whole left operand is a SUBMULTISET of the right operand.

Cool, eh? We’ll talk about performance another time :-)

MULTISET operators

Also very interesting, the multiset operators:

  • MULTISET EXCEPT [ ALL | DISTINCT ]
  • MULTISET INTERSECT [ ALL | DISTINCT ]
  • MULTISET UNION [ ALL | DISTINCT ]

Notice how there are some differences to the ordinary set operators that can be used in SELECT statements. In particular:

  • EXCEPT is used as defined in the standard, not MINUS
  • ALL is supported on all three operators, not just on UNION
  • ALL is the default, not DISTINCT

How can we work with these operators? Consider these queries:

SELECT id, t1 MULTISET EXCEPT t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

SELECT id, t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

Both yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1('dup')

With this operator, we’re removing each element of the right operand once from the left operand:

  • 'aaa' does not appear in the left operand, so nothing happens
  • 'abc' appears on row with ID = 3 and we remove it
  • 'dup' appears on row with ID = 4, 3 times, and we remove it twice, leaving one value

Conversely, when adding DISTINCT, we’ll get:

SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup') FROM t;

Yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1('')

The only difference is on row with ID = 4, where all 'dup' values were removed, regardless how many there were on either side of the MULTISET EXCEPT DISTINCT operator.

How to emulate this for varrays?

DISTINCT version

This is a bit easier, because we can now use MINUS:

-- Nested table version
SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM TABLE (t2)
        MINUS
        SELECT column_value
        FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
      ) AS t2)
  END r
FROM t;

Luckily, we can still cast a structural MULTISET type that we can obtain using the MULTISET() operator to a varray type. This greatly simplifies the task.

ALL version

If we want the MULTISET EXCEPT or MULTISET EXCEPT ALL semantics, things are trickier. Here’s a solution that resorts to using window functions, in order to turn a MULTISET back into a SET:

-- Nested table version
SELECT t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM (
          SELECT 
            column_value,
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2)
          MINUS
          SELECT 
            column_value, 
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
        )
      ) AS t2)
  END r
FROM t;

How does this work? Ideally, we’ll look at what this ROW_NUMBER() evaluates to on each row. For this, we use OUTER APPLY:

SELECT id, t2, column_value, rn
FROM t
OUTER APPLY (
  SELECT 
    column_value,
    row_number() OVER (
      PARTITION BY column_value
      ORDER BY column_value) rn
  FROM TABLE (t2)
);

The result is:

ID      T2                       COLUMN_VALUE  RN
-----------------------------------------------------
1       (null)                   (null)        (null)
2       T2()                     (null)        (null)
3       T2('abc', 'xyz', 'zzz')  abc           1
3       T2('abc', 'xyz', 'zzz')  xyz           1
3       T2('abc', 'xyz', 'zzz')  zzz           1
4       T2('dup', 'dup', 'dup')  dup           1
4       T2('dup', 'dup', 'dup')  dup           2
4       T2('dup', 'dup', 'dup')  dup           3

As can be seen, each duplicate value gets assigned a unique row number due to the nature of how ROW_NUMBER() works (this property can be very useful for solving the gaps-and-islands-problem. See trick #4).

Now that we turned our (COLUMN_VALUE) multiset into a (COLUMN_VALUE, RN) set (without duplicates), we can use MINUS again.

MULTISET INTERSECT and MULTISET UNION

MULTISET INTERSECT works exactly the same way as MULTISET EXCEPT, with the same window function based emulation in the MULTISET INTERSECT ALL case. MULTISET UNION is simpler, because Oracle knows UNION ALL, so we do not need to resort to such trickery.

Conclusion

Nested collections are a very powerful tool in Oracle SQL. Oracle knows two types of nested collections:

  • Nested tables
  • Varrays

Nested tables are trickier to maintain as you have to think of their storage more explicitly. Varrays can just be embedded into ordinary tables like any other column. But there’s a price to pay for using varrays. Oracle regrettably doesn’t support all of the above very useful multiset conditions and multiset operators.

Luckily, when you encounter a situation where you have varrays and cannot change that, you can still emulate each of the operators using more traditional SQL.

5 Things You May Not Have Known About jOOQ

jOOQ has been around for a while now (since 2009!) and by now we can say we’ve seen quite a bit of things about the SQL and Java languages. Some of our design decisions are particular in the way jOOQ thinks about programming with SQL. These include:

  • Nullability (let’s stop fighting it)
  • Value types (let’s stop pretending SQL has identities)
  • Everything is a table (this really helps get the most out of SQL)
  • Queries are side-effect free functions

jOOQ incorporates all of these ideas. Here are 5 Things You May Not Have Known About jOOQ:

1. Every Column Type is Nullable

SQL NULL is a subtly different beast from Java null, even if programmers often use it for the same thing: Something that is “uninitialised”, some value that we don’t “care about” (yet), or some value that we “don’t need”. A good example would be a middle name:

CREATE TABLE person (
  first_name  VARCHAR(50) NOT NULL,
  middle_name VARCHAR(50),
  last_name   VARCHAR(50) NOT NULL,
  ..
);

Of course, a sufficiently pessimistic programmer will immediately see tons of flaws with the above design. Go read this article about falsehoods programmers believe about names for details.

But anyway, the important thing to understand about NOT NULL constraints in SQL is the fact that they’re… constaints. Just like UNIQUE constraints, FOREIGN KEY constraints, or CHECK constraints.

In fact, they are CHECK constraints. We could rewrite the above table as such:

CREATE TABLE person (
  first_name  VARCHAR(50) CHECK (first_name IS NOT NULL),
  middle_name VARCHAR(50),
  last_name   VARCHAR(50) CHECK (last_name IS NOT NULL),
  ..
);

… and the table would be semantically equivalent. This constraint is just so common that it has a special syntax for it (which is also sometimes better optimised than the equivalent check constraint).

Sidenote: An even more sophisticated constraint type is the SQL standard assertion, which unfortunately hasn’t been implemented in any database I’m aware of yet. There are discussions of adding it to a future Oracle version, though. Assertions are like CHECK constraints, but they work on the entire table / schema / whatever scope. For instance, we could assert that every department of a company must have at least one manager. Currently, we can do this sort of thing only through triggers.

The important message here is that a constraint is a validation on the entire data set (or on a subset, down to an individual row). It is not a type modifier, because even if the NOT NULL constraint may have direct optimisation implications on the column type it is attached to, it is a separate construct that can even be deferred. While languages don’t have to be this way (e.g. Ceylon models constraints directly on types), SQL works like this.

Two examples:

  1. DEFAULT columns: When you have an identity column or some sort of GENERATED BY DEFAULT AS... clause on your column, then the value in the column may be generated by default (duh), which may include – depending on the RDBMS vendor – the generation of a value when it is NULL.
  2. DEFERRED constraints: Some databases (e.g. PostgreSQL) support deferred constraints, i.e. constraints that are validated only when the transaction is committed. This can be specified on individual constraints, or on the session. Which means that the value NULL is a totally acceptable value for a NOT NULL column for a certain amount of time.

Both of the above imply that we must not take NOT NULL as a type modifier, the way some languages have started doing it, like Ceylon or Kotlin:

val a: String? = null;
val b: String = a; // Error

In such languages, String? and String are distinct types, specifically in Ceylon where String? is just syntax sugar for the union type String|Null.

But not in SQL. If a Java API wants to properly reflect the SQL language the way jOOQ does, then all types must be nullable. It is a mistake to:

  • Use primitive types
  • Use Option(al) (there are other caveats with these related to generic type erasure)
  • Use non-null types in languages that have them
  • Use validation annotations (we made that mistake, unfortunately)
  • Use JSR-305 or JSR-308 annotations

Sidenote: If this constraint information should be annotated in Java classes, then JPA @Column(nullable=true) annotations are acceptable, because they simply map to the constraint without any implications on the type. The implications are applied on the persistence behaviour, which is reasonable.

Besides, even if at first, encoding nullability through e.g. Option(al) seems reasonable, it breaks as soon as you outer join anything, e.g.:

SELECT p.*
FROM dual
LEFT JOIN person p
ON p.first_name = 'Ooops, no one by that name'

The above query will produce a single person record with only NULL values in its columns. DESPITE the NOT NULL constraints. Ooops. We’ll get null in non-optional types.

Similar things can happen with unions, grouping sets, functions, and a few other operations.

Takeaway

In SQL, all types are always nullable. We simply have to deal with this. Every clever type safety is contrary to SQL logic. If your API does this, you may get some minor convenience in 80% of the use-cases for the price of a major annoyance in 20% of the use-cases. That’s not a reasonable tradeoff given that in Java, every non-primitive type is nullable as well, so we got a perfect and intuitive match.

2. SQL is a Set-Based, Values-Only Language

Values or Objects? That’s a tricky question for people who work with Java, a language that claims to be mainly object-oriented. Java has value support as well. There are 8 different value types as of Java 8:

  • byte
  • short
  • int
  • long
  • float
  • double
  • boolean
  • char

Values have a couple of nice properties:

  • They are immutable. It may be possible to mutate a variable holding such a value, but we cannot mutate the value itself. 42 will always stay 42
  • Two values that are equal are undistinguishable. 42 == 42 really means that they’re the exact same thing. Reusing == for value equality and identity equality has been a bit of an unfortunate choice in Java, because technically, a String is also a value, yet we cannot compare it with == because there’s a possibility of two identical strings having different identity. (True) values don’t have identity.

Java 8 introduced the notion of a “ValueBased” class, which is really a weird thing, because a “ValueBased” wrapper like Optional can reference a non-value based type, say, a java.sql.Connection. Not a good idea, but certainly possible.

A future Java might have more complex value types, for instance:

// Hypothetical syntax
value Point(int x, int y) {}
value Box(Point a, Point b) {
  int area() {
    return Math.abs(a.x - b.x * a.y - b.y);
  }
}

This will certainly be helpful (as soon as we’ll figure out how to model nullability in such scenarios).

In SQL, all records are values. They do not have a true identity (although most databases choose to provide implementation specific identities like ROWIDs). Do not confuse primary keys with identity descriptors. A primary key is a special value that is guaranteed to be unique within a table. It happens to be used as a logical identity (at least when using surrogate keys). But as NOT NULL constraints, PRIMARY KEY constraints are constraints, and they’re deferrable in some databases.

And there are many ways how we can produce results where primary keys are no longer meaningful, e.g. this:

SELECT * FROM person
UNION ALL
SELECT * FROM person

SQL, unlike relational algebra, doesn’t operate on sets but on bags (or multisets), i.e. data structures that allow for duplicate values. Multisets make analytics much more powerful, while making OLTP quite harder. As always, with useful things, they come at a price.

jOOQ, by consequence, also works in the value-oriented multi set paradigm. This is completely contrary to what Hibernate / JPA does, as Hibernate emulates entity identity through the primary key, which it has to do, being an object-graph persistence API. It doesn’t have to do this because of working with sets rather than multisets, although having identities does make things easier in that paradigm. If you want to read an interesting and entertaining discussion on the subject, check out these tweets between Gavin King and myself:

The importance here is to understand: Neither approach is absolutely better. Both have their advantages. If a RDBMS vendor had implemented a database following a set-based approach instead of SQL’s multiset-based approach, a lot of persistence problems would have been much easier to implement on that RDBMS. On the other hand, a lot of reporting and analytics would have been harder, because with sets being sets, we’d have to constantly prevent “duplicates” from being removed early by keeping primary keys around in queries until the final aggregation.

Now even if we could re-start this interesting discussion, fact is, that we have SQL and it is multiset-based. The default is SELECT "ALL", not SELECT DISTINCT (the ALL keyword being specified in the standard, but not available in most implementations).

When using jOOQ, a value-based record-centric programming approach is recommended, where result sets from jOOQ queries are really “just” streams of records, which will be further transformed without ever thinking about persisting any elements from those streams again. Sure there can be write operations as well, but a jOOQ (or SQL) write operation is also a multiset-based streaming of records (values) back into the database. That’s important to know, because all of

  • INSERT
  • UPDATE
  • DELETE
  • MERGE

statements are multiset-based, i.e. they take a set of values, not just a single row. For instance, INSERT:

-- Not all databases support this standard syntax:
INSERT INTO t (a, b, c)
VALUES (1, 2, 3),
       (4, 5, 6),
       (7, 8, 9);

-- But all databases support this one:
INSERT INTO t1 (a, b, c)
SELECT a, b, c
FROM t2;

Notice how this has absolutely nothing to do with identity-based object-graph persistence. In SQL, we’re always streaming a set of values from one place to another, possibly targeting a table where we store that set. The approach is really beautiful, try to think this way and it’ll open up a whole new world to the SQL-oriented programmer.

In a future article, I’ll even go a step further and claim that SQL is an (almost) completely side-effect free language (and this includes statements like INSERT – stay tuned).

Takeaway

In SQL, everything is a value. There is no identity. It is not needed, because SQL is a multiset-based language, where we’re always operating on the entire data set, not on individual records, even if CRUD operations may make you think otherwise. jOOQ encourages this way of thinking by putting the table and the “value-based” record into the center of the programming model.

3. ResultQuery is an Iterable

I’ve blogged about this before, and some users may have discovered it by accident, intrigued. A jOOQ ResultQuery is an Iterable, meaning that you can “foreach it”:

ResultQuery<?> query =
DSL.using(configuration)
   .select(PERSON.FIRST_NAME, PERSON.LAST_NAME)
   .from(PERSON);

// Java 5 style
for (Record record : query)
  System.out.println(record);

// Java 8 style
query.forEach(System.out::println);

It makes a lot of sense. A SQL query is a description of a set of tuples. SQL is a functional programming language, and if you forget about some concurrency aspects, it is, in principle, side-effect free. This means that the query really IS the set of tuples (another nice way to think about SQL!). With that thought in mind, we can simply iterate it.

To the procedural mind of many Java developers, this might be a bit funky and surprising, but give this a little thought and it might “click”. Consider also this previous article, claiming that streams, for comprehensions, and SQL are all the same:

Or also this fun tweet:

Takeaway

We’re not there yet in Java, we still explicitly iterate, but when we do, and the data source is a SQL query, make it a jOOQ query because that helps you forget about the difference between the query and the data, which are really the same thing in SQL.

4. Ordering is Nice When It’s Cheap. Let’s Retain It

You should avoid ORDER BY in SQL if you don’t really need it. Why? Because unless you can profit from an index that has already pre-ordered your result sets, sorting is a super expensive operation in all programming languages, including SQL. It’s essentially O(n log n).

But let’s assume you do have to sort your results, well, we better want to make sure that this ordering stays the same for as long as possible.

By default, jOOQ returns a Result type, or List types, but there are many utility methods like the ResultQuery.fetchMap() method, which can return something like this:

Map<Integer, String> people =
DSL.using(configuration)
   .select(PERSON.ID, PERSON.FIRST_NAME)
   .from(PERSON)
   .orderBy(PERSON.ID)
   .fetchMap(PERSON.ID, PERSON.FIRST_NAME);

Internally, jOOQ collects all data into a LinkedHashMap, which is a slightly more resource intensive map than the similar HashMap. In case you haven’t used this very often, it’s a map that preserves the insertion order when iterating the map using Map.entrySet() and all the other methods. Quite useful when displaying the map, too. After all, if you do specify the ordering, then you wanted that order to appear in the results, right?

In a similar way, when using Collections.sort() in Java, the sort algorithm guarantees that sorting is stable. If you sort a list twice, then the original ordering will be retained for elements that are not re-ordered. I.e. when sorting by first name, and then by last name, the first name ordering will be retained for equal last names.

Takeaway

ORDER BY is expensive, so if you go through the trouble of actually doing it, you want to retain that order.

5. Dynamic SQL is the Default

In the old days, people mostly wrote static SQL, e.g. using stored procedures in languages like PL/SQL. When you write an implicit cursor loop in PL/SQL:

FOR rec IN (SELECT * FROM person)
LOOP
  dbms_output.put_line(rec.first_name || ' ' || rec.last_name);
END LOOP;

… then, this SQL statement is compiled along with the surrounding procedural code and it will never be changed again. That’s useful for batch processing, reporting, etc. (Strictly speaking it isn’t really “static”, because the SQL statement will still be parsed by the SQL engine like any other query, but the PL/SQL programming model allows for hiding this from you).

In modern days, we require dynamic SQL very often, because the SQL code is often generated from user input. Mostly, because:

  • Users can add predicates through the UI
  • Users can specify aggregations through the UI
  • Users can specify ordering through the UI

In some more remote use-cases, users might also influence the JOIN tree and other parts of a dynamically created query.

From a JDBC perspective, all queries are dynamic, even if you’re doing something like this:

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person"
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

Clearly, the SQL string seems “static” in the way that the Java compiler will compile it once and then never touch it again. The above program will always send the exact same SQL string to the server. Yet from a JDBC API perspective, the string is just an argument to the executeQuery() method, just as if we wrote it like this:

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person" + 
  (active ? " WHERE active = 1" : "")
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

Yuck! String concatenation to build SQL strings. There’s a substantial risk of:

Of course, the above example is SQL injection “safe”, because the SQL string is entirely constructed from constants, not user input. But how quickly could the code be refactored to this?

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person" + 
  (active ? (" WHERE active = " + active) : "")
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

SQL builders like jOOQ help prevent SQL injection, even for dynamic SQL queries. The above query will be written as follows in jOOQ:

for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(person)
		.where(active
		    ? PERSON.ACTIVE.eq(active)
			: trueCondition()))
  out.println(rec.getFirstName() + " " + rec.getLastName());

The active flag check that is added to the SQL query dynamically will default to creating a bind variable, and even if it is inlined, it will be escaped, depending on its type.

The interesting bit here, however, is that the jOOQ query is always a dynamic SQL query. The above approach used an inline expression to decide whether a certain predicate needs to be added to the statement. If that predicate gets more complex, we can extract the construction of the predicate to a local variable, or a function.

Local variable

Condition condition = trueCondition();

if (active)
  condition = PERSON.ACTIVE.eq(active);
	
if (searchForFirstName)
  condition = condition.and(PERSON.FIRST_NAME.like(pattern));

for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(person)
		.where(condition))
  out.println(rec.getFirstName() + " " + rec.getLastName());

This is quite neat.

Functions

Or, if things get even more complex, we might like to factor out the logic to a method, or a function. Some people have started calling such an approach “functional relational mapping”:

Condition personCondition(boolean active, String pattern) {
  Condition condition = trueCondition();

  if (active)
    condition = PERSON.ACTIVE.eq(active);
	
  if (pattern != null)
    condition = condition.and(PERSON.FIRST_NAME.like(pattern));
	
  return condition;
}

// And then:
for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(person)
		.where(personCondition(active, pattern)))
  out.println(rec.getFirstName() + " " + rec.getLastName());

Or even:

BiFunction<Boolean, String, Condition> personCondition() {
  return (active, pattern) -> {
    Condition condition = trueCondition();

    if (active)
      condition = PERSON.ACTIVE.eq(active);
	
    if (pattern != null)
      condition = condition.and(PERSON.FIRST_NAME.like(pattern));
	
    return condition;
  };
}

// And then:
for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(person)
		.where(personCondition.apply(active, pattern)))
  out.println(rec.getFirstName() + " " + rec.getLastName());

Not only is this approach to writing dynamic SQL extremely useful for client code that relies on dynamic SQL, the expression tree that is built behind the scenes is also available at runtime for more complex transformations, such as applying row level security to certain queries, or more simply to apply something like schema-based multi-tenancy. While the Java code stays exactly the same, the generated SQL string may be transformed by your own library code, behind the scenes.

Static SQL

Of course, jOOQ doesn’t imply that you have to write dynamic SQL. You can store jOOQ-generated SQL strings in caches, or you can use stored procedures with jOOQ. In fact, jOOQ encourages you to use stored procedures!

Takeaway

Dynamic SQL is really useful. jOOQ defaults to writing dynamic SQL in a way that you don’t even notice. A SQL query is a function just as much as it is a collection description. jOOQ helps you think about SQL this way.

Conclusion

SQL is a beautiful language with an interesting syntax. If we look at the concepts that are the foundation of the SQL language, we see that SQL queries are functional / declarative collection descriptions. With this paradigm in mind, we can write really powerful SQL statements, and jOOQ encourages this as this paradigm is at the core of the jOOQ API design.

Enjoy writing functional-relational mapping code.