Create Empty Optional SQL Clauses with jOOQ

When using jOOQ to create dynamic SQL statements (one of jOOQ’s core value propositions), it is often necessary to add query elements conditionally, with a default “No-op” behaviour. For first time users, this default “no-op” behaviour is not always obvious as the jOOQ API is vast, and as with any vast API, there are many different options to do similar things.

How not to do it

A common pitfall is to be tempted to work with the many XYZStep types. What types are these? They are usually invisible to the developer as developers use jOOQ’s DSL API in a fluent fashion, just like the JDK Stream API. For example:

DSLContext ctx = ...;

Result<?> result =
ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1))
   .and(T.D.eq(2))
   .fetch();

Let’s decompose the above query to see what happens in the API. We could assign every method result to a local variable:

SelectFromStep<?> s1 = ctx.select(T.A, T.B);
SelectWhereStep<?> s2 = s1.from(T);
SelectConditionStep<?> s3 = s2.where(T.C.eq(1));
SelectConditionStep<?> s4 = s3.and(T.D.eq(2))

Result<?> result = s4.fetch();

Our previous fluent API design blog post explains this API design technique.

This is not what people usually do with “static SQL” statements, but they might be tempted to do this if they wanted to add the last predicate (T.D = 2) conditionally, e.g:

DSLContext ctx = ...;

SelectConditionStep<?> c =
ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1));

if (something)
    c = c.and(T.D.eq(2));

Result<?> result = c.fetch();

This is perfectly valid API usage, but we do not recommend it because it is very messy and leads to difficult to maintain client code. Also, it is absolutely unnecessary, because there is a better way:

Composing queries from its parts

The problem with the above approach is that it is trying to use an imperative approach of adding things to a query step by step. This is how many developers tend to structure their code, but with SQL (and by consequence, jOOQ) that can turn out to be difficult to get right. A functional approach tends to work better.

Notice that not only the entire DSL structure could be assigned to local variables, but also the individual SELECT clause arguments. For example:

DSLContext ctx = ...;

List<SelectField<?>> select = Arrays.asList(T.A, T.B);
Table<?> from = T;
Condition where = T.C.eq(1).and(T.D.eq(2));

Result<?> result =
ctx.select(select)
   .from(from)
   .where(where)
   .fetch();

In fact, every jOOQ query is a dynamic SQL query. Many queries just happen to look like static queries, due to jOOQ’s API design.

Again, we wouldn’t be assigning every SELECT clause argument to a local variable, only the truly dynamic ones. For example:

DSLContext ctx = ...;

Condition where = T.C.eq(1);

if (something)
    where = where.and(T.D.eq(2));

Result<?> result =
ctx.select(T.A, T.B)
   .from(T)
   .where(where)
   .fetch();

This already looks quite decent.

Avoid breaking readability

A lot of people aren’t happy with this approach either, because it breaks a query’s readability by making its components non-local. The predicates in the query are declared up front, away from the query itself. This isn’t how many people like to reason about SQL.

And you don’t have to! It is totally possible to embed the condition directly in the WHERE clause like this:

DSLContext ctx = ...;

Result<?> result =
ctx.select(T.A, T.B)
   .from(T)

   // We always need this predicate
   .where(T.C.eq(1))

   // This is only added conditionally
   .and(something
      ? T.D.eq(2)
      : DSL.noCondition())
   .fetch();

The magic is in the above usage of DSL.noCondition, which is a pseudo predicate that does not generate any content. It is a placeholder where an org.jooq.Condition type is required without actually materialising one.

There is also:

… but that requires having to think about these identities and the reductions all the time. Also, if you append many of these trueCondition() or falseCondition() to a query, the resulting SQL tends to be quite ugly, for example for people having to analyse performance in production. noCondition() just never generates any content at all.

Note that noCondition() does not act as an identity! If your noCondition() is the only predicate left in a WHERE clause, there will not be any WHERE clause, regardless if you work with AND predicates or OR predicates.

No-op expressions in jOOQ

When using dynamic SQL like this, and adding things conditionally to queries, such “no-op expressions” become mandatory. In the previous example, we’ve seen how to add a “no-op predicate” to a WHERE clause (the same approach would obviously work with HAVING and all other clauses that work with boolean expressions).

The three most important jOOQ query types are:

Users may want to add all of these conditionally to queries.

org.jooq.Condition

We’ve already seen how to do this with org.jooq.Condition.

org.jooq.Field

What about dynamic column expressions in the projection (the SELECT clause)? Assuming you want to project columns only in certain cases. In our example, the T.B column is something we don’t always need. That’s easy! The same approach can be used (assuming T.B is a string column):

DSLContext ctx = ...;

Result<Record2<String, String>> result =
ctx.select(T.A, something ? T.B : DSL.inline("").as(T.B))
   .from(T)
   .where(T.C.eq(1))
   .and(T.D.eq(2))
   .fetch();

Using inlined parameters via DSL.inline(), you can easily produce a no-op value in your projection if you don’t want to modify the projection’s row type. The advantage is that you can now use this subquery in a union that expects two columns:

DSLContext ctx = ...;

Result<Record2<String, String>> result =

// First union subquery has a conditionally projected column
ctx.select(T.A, something ? T.B : DSL.inline("").as(T.B))
   .from(T)
   .where(T.C.eq(1))
   .and(T.D.eq(2))

   .union(

// Second union subquery has no such conditions
    select(U.A, U.B)
   .from(U))
   .fetch();

You can take this one step further, and make an entire union subquery conditional this way!

DSLContext ctx = ...;

Result<Record2<String, String>> result =

// First union subquery has a conditionally projected column
ctx.select(T.A, T.B)
   .from(T)
   .union(
      something
        ? select(U.A, U.B).from(U)
        : select(inline(""), inline("")).where(falseCondition())
   )
   .fetch();

This is a bit more syntactic work, but it’s nice to see how easy it is to add something conditionally to a jOOQ query without making the query completely unreadable. Everything is local to where it is being used. No local variables are needed, no imperative control flow is invoked.

And because everything is now an expression (and not a statement / no control flow), we can factor out parts of this query into auxiliary methods, that can be made reusable.

org.jooq.Table

Conditional table expressions usually appear when doing conditional joins. This is usually not done in isolation, but together with other conditional elements in a query. E.g. if some columns are projected conditionally, those columns may require an additional join, as they originate from another table than the tables that are used unconditionally. For example:

DSLContext ctx = ...;

Result<?> result =
ctx.select(
      T.A, 
      T.B, 
      something ? U.X : inline(""))
   .from(
      something
      ? T.join(U).on(T.Y.eq(U.Y))
      : T)
   .where(T.C.eq(1))
   .and(T.D.eq(2))
   .fetch();

There isn’t a more simple way to produce the conditional JOIN expression, because JOIN and ON need to be provided independently. For simple cases as shown above, this is perfectly fine. In more complex cases, some auxiliary methods may be needed, or views.

Conclusion

There are two important messages here in this post:

  1. The XYZStep types are auxiliary types only. They are there to make your dynamically constructed SQL statement look like static SQL. But you should never feel the need to assign them to local variables, or return them from methods. While it is not wrong to do so, there is almost always a better way to write dynamic SQL.
  2. In jOOQ, every query is a dynamic query. This is the benefit of composing SQL queries using an expression tree like the one that is used in jOOQ’s internals. You may not see the expression tree because the jOOQ DSL API mimicks static SQL statement syntax. But behind the scenes, you’re effectively building this expression tree. Every part of the expression tree can be produced dynamically, from local variables, methods, or expressions, such as conditional expressions. I’m looking forward to using the new JEP 361 switch expressions in dynamic SQL. Just like a SQL CASE expression, some SQL statement parts can be constructed dynamically in the client, prior to passing them to the server.

Once these two things are internalised, you can write very fancy dynamic SQL, including using FP approaches to constructing data structures, such as a jOOQ query object.

Using Java 13+ Text Blocks for Plain SQL with jOOQ

Most jOOQ users use the jOOQ DSL API, which provides compile time type safety and an easy way to write dynamic SQL.

But occasionally, this DSL get in the way, because it might be

  • Overkill for some very simple quick and dirty SQL queries
  • Too limited when running very advanced vendor specific SQL, such as Oracle’s MODEL or MATCH_RECOGNIZE clauses

In such cases, you can still benefit from jOOQ’s many secondary features, including for example its nice integration with the Stream API, export functionality, and much more. Think of jOOQ as an improved JDBC!

Starting with Java 13, when enabling preview features, you can now make use of text blocks, or “multi line strings”, which are very useful for embedding static SQL strings in Java code (and XML, JSON, regular expressions, etc). There are two main use cases for using text blocks in jOOQ:

Plain SQL

The main use case is to use plain SQL and also plain SQL templating. For example, to run a quick and dirty jOOQ query like this on H2 or PostgreSQL, for example:

System.out.println(ctx.fetch("""
        SELECT table_schema, count(*)
        FROM information_schema.tables
        GROUP BY table_schema
        ORDER BY table_schema
        """));

(unfortunately, this blog’s syntax highlighter is not quite there yet…)

The output is jOOQ’s nicely formatted text result:

+------------------+--------+
|TABLE_SCHEMA      |COUNT(*)|
+------------------+--------+
|INFORMATION_SCHEMA|      33|
|MCVE              |       2|
|PUBLIC            |       1|
+------------------+--------+

The above is a simple plain SQL query, but we can also make use of plain SQL templating with text blocks! What if we wanted to have a dynamic GROUP BY clause?

Always assuming this static import:

import static org.jooq.impl.DSL.*;

We can now write:

Stream.of(
        field("table_schema"),
        list(field("table_schema"), field("table_type")))
    .forEach(q -> {
        System.out.println(ctx.fetch("""
          SELECT {0}, count(*), row_number() OVER (ORDER BY {0}) AS rn
          FROM information_schema.tables
          GROUP BY {0}
          ORDER BY {0}
          """, q));
    });

We’ve written a dynamic SQL query with jOOQ, but without using much type safety (we could still use some type safe, generated schema object references if we wanted to). jOOQ’s plain SQL templating capabilities are a much underestimated feature set. People have been using MyBatis for many years precisely for these purposes, or maybe built home grown, velocity template based frameworks to produce dynamic SQL using native SQL strings.

You can also use jOOQ for this, and profit from all of the jOOQ API, including, again the formatted output from the above:

+------------------+--------+----+
|TABLE_SCHEMA      |COUNT(*)|  RN|
+------------------+--------+----+
|INFORMATION_SCHEMA|      33|   1|
|MCVE              |       2|   2|
|PUBLIC            |       1|   3|
+------------------+--------+----+

+------------------+------------+--------+----+
|TABLE_SCHEMA      |TABLE_TYPE  |COUNT(*)|  RN|
+------------------+------------+--------+----+
|INFORMATION_SCHEMA|SYSTEM TABLE|      33|   1|
|MCVE              |TABLE       |       1|   2|
|MCVE              |VIEW        |       1|   3|
|PUBLIC            |TABLE       |       1|   4|
+------------------+------------+--------+----+

All of this was available before text blocks, but with text blocks, it makes even more sense to use these features. There is not just one way to use jOOQ, but this approach is really very underestimated!

The parser

Another very important jOOQ feature that works with string based SQL is the jOOQ parser. There are a variety of use cases for using jOOQ’s parser, as we’re adding more and more features to it. One use-case is simple formatting of SQL. Why would you want to do this? One example is to standardise your SQL strings for better execution plan cache usage. As you may know, some database vendors (e.g. Oracle) treat these two SQL strings as different:

SELECT 1 FROM dual
SELECT  1 FROM dual

Observe the irrelevant whitespace difference. Oracle will parse a new SQL string, produce a new SQL_ID, and yet come up with the same execution plan. You don’t want this to happen too often, because it puts a lot of contention on the plan cache and its locks and latches, and creates extra work.

This can happen, nonetheless, with string based dynamic SQL. Using jOOQ’s parser, you can normalise all sorts of SQL strings (including removing comments)

System.out.println(
    ctx.parser()
       .parseResultQuery("""
            SELECT table_schema, count(*)
            FROM information_schema.tables
            GROUP BY table_schema
            -- Order by column index!
            ORDER BY 1
            """)
       .fetch()
);

The SQL string that gets sent to the JDBC driver is this:

select table_schema, count(*) from information_schema.tables group by table_schema order by 1

Alternatively, you could specify some formatting in jOOQ:

DSLContext ctx = DSL.using(connection, 
    new Settings().withRenderFormatted(true));

And now, the SQL string that gets sent to the JDBC driver is this:

select 
  table_schema, 
  count(*)
from information_schema.tables
group by table_schema
order by 1

You can tweak the various supported formatting rules, and use this for logging your hand written SQL strings in a human readable way, for example, rather than executing the SQL through jOOQ. The options are endless. You could also translate parser output to some other SQL dialect. Assuming you run your queries against SQL Server, but want to translate them to H2. No problem! The following SQL Server specific query can be run easily on H2 as well:

System.out.println(
    ctx.parser()
       .parseResultQuery("""
            SELECT TOP 1 table_schema, count(*)
            FROM information_schema.tables
            GROUP BY table_schema
            ORDER BY count(*) DESC
            """)
       .fetch()
);

The translated SQL that gets executed on H2 is this:

select 
  table_schema, 
  count(*)
from information_schema.tables
group by table_schema
order by count(*) desc
limit 1

Or, because I like showing off:

System.out.println(
    ctx.parser()
       .parseResultQuery("""
            SELECT TOP 1 WITH TIES table_schema, count(*)
            FROM information_schema.tables
            GROUP BY table_schema
            ORDER BY count(*) DESC
            """)
       .fetch()
);

Producing this in H2:

select 
  TABLE_SCHEMA, 
  count(*)
from INFORMATION_SCHEMA.TABLES
group by TABLE_SCHEMA
order by 2 desc
fetch next 1 rows with ties

Or this, in PostgreSQL:

select 
  "v0" as table_schema, 
  "v1" as "count"
from (
  select 
    table_schema as "v0", 
    count(*) as "v1", 
    rank() over (order by 2 desc) as "rn"
  from information_schema.tables
  group by table_schema
) "x"
where "rn" > 0
and "rn" <= (0 + 1)
order by "rn"

I know, right?

Parser-derived features

The parser is used in more and more of jOOQ’s many features. Including for example in jOOQ 3.13’s schema diff and management utility.

Run this with jOOQ:

System.out.println(
    ctx.meta("""
    create table t (
      i int
    )
    """).apply("""
    alter table t
      add j int;
    alter table t
      add constraint t_pk primary key (i)
    """)
);

I.e. just copy paste your plain SQL schema definitions and increments (as used in Flyway, for example, or hand written while developing them against your database), and start using the entire tool chain on it. For example, the above prints:

create table T(
  I int null,
  J int null,
  constraint T_PK
    primary key (I)
);

Instead of applying increments to a schema, you can also calculate the diff between two schema versions:

System.out.println(
    ctx.meta("""
    create table t (
      i int
    )
    """).migrateTo(ctx.meta("""
    create table t (
      i int,
      j int,
      constraint t_pk primary key (i)
    )
    """))
);

And now, the output is the increment we’ve applied earlier

alter table T
  add J int null;
alter table T
  add constraint T_PK
    primary key (I);

Conclusion

jOOQ’s DSL is very powerful because:

  • It offers type safety
  • It is compile time checked
  • It helps with auto completion

But sometimes, it gets in the way and plain SQL strings work better, especially when using text blocks. In that case, jOOQ is still also very powerful. The entirety of the jOOQ tool chain is still available to you behind the scenes. You’re just not using the DSL as an API anymore.

Never Concatenate Strings With jOOQ

jOOQ supports a vast amount of SQL syntax out of the box. As such, most users will not think of resorting to string concatenation like in the old days when writing dynamic SQL with JDBC.

But every now and then, a vendor specific feature is not supported by jOOQ (yes, it happens). In that case, jOOQ supports a variety of “plain SQL” API, which can be used to construct almost all types of jOOQ API elements, such as:

// Static import is implied, as always
import static org.jooq.impl.DSL.*;

// Column expressions
Field<String> f = field("cool_function(1, 2, 3)", String.class);

// Predicates
Condition c = condition("col1 <fancy operator> col2");

// Tables
Table<?> t = table("wicked_table_valued_function(x, y)");

But then, sometimes, you need to pass an argument to such a function dynamically, such as another column expression. And you want to do that in a type safe way, because the jOOQ code generator already produced type safe column expressions. So you might be inclined to concatenate, nonetheless:

field("cool_function(1, " + MY_TABLE.MY_COLUMN + ", 3)");

Never do this!

For these reasons:

  1. Despite jOOQ being very SQL injection safe in general, this is where you can in fact introduce a plain SQL injection vulnerability nonetheless. Not in this case, as the column is generated code, but maybe, you will concatenate user input. Note that in order to increase SQL injection protection, plain SQL usage can be prevented globally, and allowed only locally when needed by adding our PlainSQL checker, using the checker framework or Google ErrorProne.
  2. As always with string concatenation, you’re prone to SQL syntax errors. In this case, the generated SQL is not specific to any dialect, as MY_TABLE.MY_COLUMN.toString() is being called, without any contextual information, such as the SQLDialect and all the other configuration flags.

Instead, use jOOQ’s plain SQL templating mini language, which allows for template placeholders like {0}, {1}, {2}:

field("cool_function(1, {0}, 3)", MY_TABLE.MY_COLUMN);

And if you’re doing this more often, you can factor out this call in your own mini DSL:

public static Field<String> coolFunction(Field<?> field) {
    field("cool_function(1, {0}, 3)", field);
}

And now, call it like this:

coolFunction(MY_TABLE.MY_COLUMN)

As a rule of thumb:

With jOOQ, you should never need to resort to SQL string concatenation

You can always use either:

  • The type safe jOOQ DSL API
  • The plain SQL templating API (and ideally hide such usage behind your own type safe DSL API)

5 Ways to Better Understand SQL by Adding Optional Parentheses

It appears that our recent beginner SQL articles explaining SQL syntax were quite popular. These include:

Today, I want to tackle the problem of understanding and learning SQL syntax from a new angle.

SQL’s COBOL-esque syntactic heritage

Unlike most C-style languages, SQL syntax is a messy mixture of keywords, special characters, and both with a lot of overloaded meanings. I’d call myself rather experienced with SQL, yet I’m still often unsure about whether something is a keyword or an identifier (and yes, it does matter):

From my past SQL training sessions, however, I have learned that one of the best tools to understand what’s really going on in SQL syntax is to add parentheses at random places. Parentheses can be noisy, but they can definitely help understand the syntactic structure of a statement.

A trivial example from arithmetics, or logic:

a + b * c
a OR b AND c

In almost all languages, operator precedence dictates that the above be equivalent to this:

a + (b * c)
a OR (b AND c)

And already, we’ve introduced some syntactic clarity to readers who might not be aware of operator precedence. While the parentheses are not necessary in these cases, they can help with readability if the expressions are much more complex than the above.

Did you know you can place parentheses at many other locations in SQL? Here’s a list of interesting places, where most people usually omit parentheses. I’m going to use the Sakila database for examples.

1. Row value expressions

In every day language, we don’t call them “row value expressions”. We call them ordinary “column expressions”. For example, we might look for a specific actor by name

SELECT * 
FROM actor
WHERE first_name = 'SUSAN'
AND last_name = 'DAVIS';

We will get:

|actor_id   |first_name|last_name |
|-----------|----------|----------|
|101        |SUSAN     |DAVIS     |
|110        |SUSAN     |DAVIS     |

But what we really did was this:

-- Works in many dialects
SELECT * 
FROM actor
WHERE (first_name) = ('SUSAN')
AND (last_name) = ('DAVIS');

Or also:

-- Works in a few dialects, including e.g. PostgreSQL
SELECT * 
FROM actor
WHERE ROW (first_name) = ROW ('SUSAN')
AND ROW (last_name) = ROW ('DAVIS');

The parentheses are somewhat ambiguous in this case, the ROW constructor makes it more obvious. The SQL standard specifies a “row value expression special case”, i.e. the case where a row value expression (i.e. a tuple expression) is of degree 1. In case of which most people will “colloquially” omit the parentheses, thinking we’re comparing column expressions.

In this example, we could rewrite the query to this:

SELECT * 
FROM actor
WHERE (first_name, last_name) = ('SUSAN', 'DAVIS');

Or, in funky Oracle, double parentheses are required for no good reason on the right hand side.

SELECT * 
FROM actor
WHERE (first_name, last_name) = (('SUSAN', 'DAVIS'));

But I was not going to confuse you, I was going to clarify things ;-)

In fact, once you get a hang of row value expressions, you will quickly want to use them for less trivial cases than avoiding two AND connected predicates. For example, you can use this approach also with the IN predicate (and if you’re daring enough, with the NOT IN predicate, but beware of NULLS):

SELECT * 
FROM actor
WHERE (first_name, last_name) IN (
  ('SUSAN', 'DAVIS'),
  ('NICK' , 'WAHLBERG')
)

Wow! Result:

|actor_id   |first_name|last_name |
|-----------|----------|----------|
|2          |NICK      |WAHLBERG  |
|101        |SUSAN     |DAVIS     |
|110        |SUSAN     |DAVIS     |

And not only that. You can also use the same approach with IN and subqueries:

-- Actors that have the same name as a customer
SELECT *
FROM actor
WHERE (first_name, last_name) IN (
  SELECT first_name, last_name
  FROM customer
)

And indeed:

|actor_id   |first_name|last_name |
|-----------|----------|----------|
|4          |JENNIFER  |DAVIS     |

Yes! The “colloquial” IN predicate that you all wrote without the parentheses around the row value expression, in reality, had a row value expression of degree 1 on the left side, AND on the right side. Increase that degree to 2, and now you need to write a subquery with 2 columns as well.

It’s quite idiomatic, and readable, and nifty. Quite a few dialects support this.

2. JOINs

Joins are very misunderstood, in SQL. Very very misunderstood. One reason is because people confuse them with other set operations, such as UNION, and thus illustrate them with Venn Diagrams. I’ve already covered that. That article explains that JOINs are just (filtered) cartesian products.

The simplest JOIN type is CROSS JOIN. It does not have an ON or USING clause. Much more common, however, is INNER JOIN, or just JOIN (INNER being an optional keyword).

SELECT a.first_name, a.last_name, f.title
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)

We’ve all typed millions of these to obtain:

|first_name|last_name |title                         |
|----------|----------|------------------------------|
|...       |...       |...                           |
|PENELOPE  |GUINESS   |SPLASH GUMP                   |
|PENELOPE  |GUINESS   |VERTIGO NORTHWEST             |
|PENELOPE  |GUINESS   |WESTWARD SEABISCUIT           |
|PENELOPE  |GUINESS   |WIZARD COLDBLOODED            |
|NICK      |WAHLBERG  |ADAPTATION HOLES              |
|NICK      |WAHLBERG  |APACHE DIVINE                 |
|NICK      |WAHLBERG  |BABY HALL                     |
|...       |...       |...                           |

But what are these JOIN things? They are not SELECT clauses! They are operators like + or * or AND or OR.

And like any operator, the messy laws of associativity apply (as all the different join types have the same operator precedence, at least in the SQL standard. Some wonky implementations may disagree).

Luckily, left associativity applies with JOIN (right associativity would be madness). So, our magic parentheses come into play again. Our query from before is really this:

SELECT a.first_name, a.last_name, f.title
FROM (
  actor AS a
    JOIN film_actor AS fa 
      USING (actor_id)
)
JOIN film AS f 
  USING (film_id)

Can you see it? The readability problem here is the USING (or ON) clause. Without it, I could have written:

(actor JOIN film_actor) JOIN film

So, join FILM_ACTOR to ACTOR and then join the FILM table to the product.

Because INNER JOIN is associative (both left and right), we could write the following equivalent expression:

actor JOIN (film_actor JOIN film)

So, join FILM to FILM_ACTOR first, and then join the product to ACTOR. Quite a different beast, logically (the optimiser shouldn’t make a difference in this case).

… or in full SQL

SELECT a.first_name, a.last_name, f.title
FROM actor AS a 
  JOIN (
    film_actor
      JOIN film AS f 
        USING (film_id)
  )
USING (actor_id)

Egh. Don’t be clever. Use this only if you really have to. I find it quite unreadable. Careful formatting is essential here. But it works.

With more than 3 tables, you could build entire JOIN trees. While, in order to stay sane, most people will not actually build JOIN trees (yes, a list is also a tree), it is totally possible to do this confusing thing:

SELECT a.first_name, a.last_name, sum(p.amount)
FROM (
  actor AS a
    JOIN film_actor AS fa 
      USING (actor_id)
)
JOIN (
  film AS f
    JOIN (
      inventory AS i 
        JOIN rental AS r
          USING (inventory_id)
    ) USING (film_id)
) USING (film_id)
JOIN payment AS p 
  USING (rental_id)
GROUP BY a.actor_id

Instead of this nice clean SQL:

SELECT a.first_name, a.last_name, sum(p.amount)
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
JOIN inventory AS i USING (film_id)
JOIN rental AS r USING (inventory_id)
JOIN payment AS p USING (rental_id)
GROUP BY a.actor_id

… to get the revenue of all films we made for any given actor. Something like this:

|first_name|last_name |sum         |
|----------|----------|------------|
|ADAM      |GRANT     |974.19      |
|ADAM      |HOPPER    |1532.21     |
|AL        |GARLAND   |1525.87     |
|ALAN      |DREYFUSS  |1850.29     |
|ALBERT    |JOHANSSON |2202.78     |
|ALBERT    |NOLTE     |2183.75     |

But nevertheless. It is good to know how JOIN works, syntactically, because there will be that edge case where you want to give precedence to an INNER JOIN and then LEFT JOIN the entire product, or something like that. In that case, parentheses are your friend.

3. DISTINCT

There are no parentheses around the “DISTINCT arguments”. Despite a lot of people thinking there are. This is not what some people think it is:

SELECT DISTINCT (actor_id), first_name, last_name
FROM actor

The parentheses can be omitted, and it is the exact same thing.

The other nice thing about SQL syntax is that you never know whether parentheses are part of the language, or part of an expression. In fact, the above (id) is a row value expression again. If you (accidentally) wrapped more than one column in those parentheses:

SELECT DISTINCT (actor_id, first_name), last_name
FROM actor

Then you’d still get the same behaviour (in PostgreSQL, which supports nesting records like that), but the result is not what you wanted:

|row              |last_name|
|-----------------|---------|
|(1,PENELOPE)     |GUINESS  |
|(2,NICK)         |WAHLBERG |
|(3,ED)           |CHASE    |
|(4,JENNIFER)     |DAVIS    |

Funky eh?

4. UNION, INTERSECT, EXCEPT

The nice thing about SQL is that you can never derive a rule from another rule. While all types of JOIN have the same precedence, the set operators do not. Duh. Luckily, most people don’t use INTERSECT or EXCEPT at all, only UNION [ ALL ], so they don’t run into this problem.

In case there is any doubt, INTERSECT has a higher precedence, whereas UNION and EXCEPT have the same precedence (in the SQL standard). I’ll prove it by quoting it. From ISO/IEC 9075-2:2016(E) 7.17 <query expression> (I’ve removed some irrelevant bits)

<query expression body> ::=
  <query term>
| <query expression body> UNION [ ALL ] <query term>
| <query expression body> EXCEPT [ ALL ] <query term>

<query term> ::=
  <query primary>
| <query term> INTERSECT [ ALL ] <query primary>

See. INTERSECT has a higher precedence. Let’s try it (in PostgreSQL):

SELECT 2 AS a, 3 AS b
UNION
SELECT 1 AS a, 2 AS b
INTERSECT
SELECT 1 AS a, 2 AS b

We get:

|a          |b          |
|-----------|-----------|
|1          |2          |
|2          |3          |

So, what we’ve done effectively (and the parenthesis are totally allowed, because <query term> allows it), is this

SELECT 2 AS a, 3 AS b
UNION
(
  SELECT 1 AS a, 2 AS b
  INTERSECT
  SELECT 1 AS a, 2 AS b
)

Now, for the life of me, I really don’t know how to nicely format parenthesised set operations in SQL. It looks awkward. But here we are. The above is a totally different thing than this:

(
  SELECT 2 AS a, 3 AS b
  UNION
  SELECT 1 AS a, 2 AS b
)
INTERSECT
SELECT 1 AS a, 2 AS b

If we do the union first, and then the intersection, the only resulting record is:

|a          |b          |
|-----------|-----------|
|1          |2          |

A similar effect can be seen when combining UNION and UNION ALL, which have the same precedence and are … left associative. Lucky again. So, this query here:

SELECT 2 AS a, 3 AS b
UNION
SELECT 1 AS a, 2 AS b
UNION ALL
SELECT 1 AS a, 2 AS b

Has a UNION that does not get to remove any duplicates, and then a UNION ALL that adds a duplicate. It is equivalent to this (remember, left associative):

(
  SELECT 2 AS a, 3 AS b
  UNION
  SELECT 1 AS a, 2 AS b
)
UNION ALL
SELECT 1 AS a, 2 AS b

The result is:

|a          |b          |
|-----------|-----------|
|1          |2          |
|2          |3          |
|1          |2          |

It is quite a different beast from this one:

SELECT 2 AS a, 3 AS b
UNION
(
  SELECT 1 AS a, 2 AS b
  UNION ALL
  SELECT 1 AS a, 2 AS b
)

Now, we’ve used parentheses to force the query to do the UNION ALL operator first (producing duplicates), but then, the UNION removes them agian. The result is now:

|a          |b          |
|-----------|-----------|
|1          |2          |
|2          |3          |

Bonus: What about the ORDER BY clause

Ah, yes. The miracles of SQL syntax. In theory (and in the SQL standard), there is an optional ORDER BY clause over all set operations. E.g. like this:

SELECT 2 AS a, 3 AS b
UNION
SELECT 1 AS a, 2 AS b
INTERSECT
SELECT 1 AS a, 2 AS b
ORDER BY a DESC

We’re getting:

|a          |b          |
|-----------|-----------|
|2          |3          |
|1          |2          |

Think of it this way:

(
  SELECT 2 AS a, 3 AS b
  UNION
  SELECT 1 AS a, 2 AS b
  INTERSECT
  SELECT 1 AS a, 2 AS b
)
ORDER BY a DESC

Some dialects may or may not allow you to actually place those parentheses. E.g. PostgreSQL does.

5. Subqueries

When you write subqueries, or specifically:

  • Scalar subqueries (subqueries in SELECT or WHERE, etc.)
  • Derived tables (subqueries in FROM)

… then you must put them in parentheses. For example, a correlated (scalar) subquery:

SELECT 
  first_name, 
  last_name, (
    -- Correlated subquery here
    SELECT count(*) 
    FROM film_actor AS fa 
    WHERE fa.actor_id = a.actor_id
  ) c
FROM actor AS a

These parentheses are not optional. Neither are those around the derived table:

SELECT 
  first_name, 
  last_name, 
  c
FROM actor AS a
JOIN (
  -- Derived table here
  SELECT actor_id, count(*) AS c
  FROM film_actor
  GROUP BY actor_id
) fa USING (actor_id)

Conclusion

SQL (and its syntax) is a device whose mystery is only exceeded by its power. With parentheses (as with other syntactic tokens), you can never really know if they are part of the language, of some clause, or part of an expression.

With some operators, parentheses help better understand the syntactic structure of the language.

  • Row value expressions of degree 1 have optional parentheses. We never write them, but we could have!
  • JOINs are really trees, not lists. We can nest them arbitrarily to get associativity under control.
  • DISTINCT doesn’t have parentheses. Don’t be fooled by your coworkers’ coding style!
  • Set operations have two levels of precedence: 1) INTERSECT and 2) UNION and EXCEPT (assuming standards compliance).
  • Subqueries (outside of set operations) have mandatory parentheses

I’m curious about your own greatest SQL syntax confusions. Please leave them in the comments.

SQL DISTINCT is not a function

A very common misconception I often encounter with SQL users is the idea that DISTINCT is something like a function, and that it can take parenthesised arguments. Just recently, I’ve seen this Stack Overflow question where the OP was looking for a way to express this in jOOQ:

SELECT DISTINCT (emp.id), emp.fname, emp.name
FROM employee emp;

Notice the parentheses around (emp.id), which look as though this is some special kind of DISTINCT usage, which is akin to a DISTINCT function. The idea is often that:

  • The behaviour is somewhat different from omitting the parentheses
  • The performance is faster, because only the ID needs to be considered for distinctness

This is incorrect

These claims are incorrect, of course. There is no semantic or performance difference between the two. The parentheses are merely parentheses around a column expression, in a similar way as you would use parentheses to influence operator precedence. Think of it this way:

SELECT DISTINCT (emp.id + 1) * 2, emp.fname, emp.name
FROM employee emp;

In the above example, we do not apply a “DISTINCT function” to the expression emp.id + 1. We merely placed parentheses around a column expression emp.id + 1 to make sure the addition happens before the multiplication. The DISTINCT operator happens after the projection, always. If SQL had used a more logical syntax, rather than following English grammar (it was originally called Structured English QUEry Language, or SEQUEL), then we would write the OP’s statement like this:

FROM employee
SELECT id, fname, name
DISTINCT

Again, the DISTINCT operation always happens after the projection (SELECT clause content), and is applied to the entirety of the projection. There is no way in standard SQL to apply distinctness only to parts of the projection (there is in PostgreSQL, see further down).

To clarify this a bit more, I recommend reading our previous blog post about the logical order of operations in SQL, and how DISTINCT and ORDER BY are related.

What would it mean anyway?

We can revert the question back to the OP and ask ourselves, what would it mean for a DISTINCT operation to apply to only one column, anyway? Let’s assume this data set:

|id |fname|name|
|---|-----|----|
|1  |A    |A   |
|1  |B    |B   |

If we applied DISTINCT only to the ID column (and didn’t project anything else), clearly, we’d only get one row as a result:

SELECT DISTINCT id FROM employee
|id |
|---|
|1  |

But if we wanted to also project FNAME and NAME, which row would “win”? Would we display the first or the second row, or any random row? The behaviour would be undefined, and SQL doesn’t like undefined behaviour, so this is not possible. The only reasonable application of DISTINCT is always on the entire projection.

Exception: PostgreSQL

Fortunately (or to add more to syntactic confusion: unfortunately), PostgreSQL has implemented an extension to the SQL standard. With DISTINCT ON, it is effectively possible to apply distinctness only to parts of the projection:

WITH emp (id, fname, name) AS (
  VALUES (1, 'A', 'A'),
         (1, 'B', 'B')
)
SELECT DISTINCT ON (id) id, fname, name
FROM emp
ORDER BY id, fname, name

The output is now what the OP desired (but couldn’t use, because they were using MySQL):

|id         |fname|name |
|-----------|-----|-----|
|1          |A    |A    |

I personally don’t like DISTINCT ON. While it is very useful, no doubt, it makes something that is already very difficult to explain to SQL beginners even more complicated. With a “more reasonable” syntax, the query would be written like this:

FROM emp
SELECT id, fname, name
ORDER BY id, fname, name
DISTINCT ON (id) 

With this syntactic order of operation, there would be no doubt about the semantics of DISTINCT or DISTINCT ON.

Use the jOOQ-Refaster Module for Automatic Migration off of Deprecated jOOQ API

Deprecation notice

After encountering numerous problems building the jOOQ-refaster module in various JDK versions and after receiving no feedback from the community about this feature, we have decided to remove it again in jOOQ 3.15: https://github.com/jOOQ/jOOQ/issues/10803

Starting with jOOQ 3.13, we’re offering a new module called jOOQ Refaster, which provides refaster templates for automatic API migration.

What’s Refaster?

Refaster is a lesser known sub project of the popular Google Error Prone library, a library for static code analysis, implemented as a Java compiler plugin. We already support a few such static code analysis tools through the jOOQ Checker module, which can be used with both the Checker Framework and with Google’s Error Prone.

Refaster allows for finding “bad” API usage patterns in your code, and produce .patch files automatically, which fix such patterns, producing better code.

What are we offering in jOOQ?

Over the years, we have deprecated quite a bit of API, as new API came along. For example, there is the ResultQuery.fetchLazy(int) method, where the int parameter corresponds to the JDBC fetchSize that is being passed along to JDBC.

Long ago, we introduced better API for passing the fetchSize. You can:

  • Specify a Settings value with your configuration for a global fetchSize to apply
  • Specify ResultQuery.fetchSize(int), which can then be combined with fetchLazy(), or with any type of fetch() call, as this isn’t really specific to “lazy” fetching

The Javadoc says:

Deprecated. [#2811] – 3.3.0 – Use fetchSize(int) and fetchLazy() instead.

…and your IDE will probably warn you about deprecated API usage, but who has time to fix all these warnings and read all these Javadoc?

Instead, you can now download the refaster file for your jOOQ version from our website https://www.jooq.org/refaster, configure your Java compiler to include ErrorProne’s refaster library (as documented in the jOOQ manual), and get a patch right away.

For example, if your code reads:

public void runQuery() {
    try (Cursor<?> cursor = ctx.select(T.A)
       .from(T)
       .fetchLazy(42)) {
        // Do something
    }
}

You might be getting a patch file like this one:

--- ..\..\src\main\java\org\jooq\refaster\test\RefasterTests.java
+++ ..\..\src\main\java\org\jooq\refaster\test\RefasterTests.java
@@ -62,6 +62,5 @@
 public void runQuery() {
     try (Cursor<?> cursor = ctx.select(T.A)
-       .from(T)
-       .fetchLazy(42)) {
+       .from(T).fetchSize(42).fetchLazy()) {
         // Do something
     }

And you’ll get the improved version of your code automatically:

public void runQuery() {
    try (Cursor<?> cursor = ctx.select(T.A)
       .from(T).fetchSize(42).fetchLazy()) {
        // Do something
    }
}

In our opinion, all libraries should offer such refaster files (or another technology doing something similar) to help users upgrade more easily between versions.

What’s next?

In addition to upgrading deprecated API usage (which is a very obvious use-case for Refaster), we’re also experimenting with some other static code analysis and improvement approaches that deal with our understanding of the SQL language, and best practices. For example, as we’ve blogged before, it is not recommended to use COUNT(*) queries in SQL (or in any language), when a simple EXISTS() check would suffice.

jOOQ 3.13 already offers an introspection that replaces calls like these:

ctx.fetchCount(table) != 0
ctx.fetchCount(table) >= 1 // And many more

By equivalent, but likely much more performant expressions:

ctx.fetchExists(table)

So, stay tuned for more such goodies in the future!

jOOQ 3.13 Released with More API and Tooling for DDL Management

jOOQ 3.13 has been released with CockroachDB support, much more API and tooling for DDL management, and SQL:2011 temporal table support

Starting with this release, we will further embrace our support for parsing,
translating, executing, and now also interpreting DDL statements. The driving
force is better code generation support, but in the future, also better
integration with database change management systems like Flyway or Liquibase.

We’ve added support for a very promising new dialect: CockroachDB.

We’re starting to support one of SQL:2011’s most interesting features: temporal
tables for system and application time versioning, and we’ve generally improved
the way we build and distribute jOOQ to our paying customers.

SQL Dialects

We’ve worked together with a team of great engineers at CockroachDB to speed up
our newly added, formal CockroachDB support in the jOOQ Professional and
Enterprise Editions. While many users have successfully used the POSTGRES
dialect with CockroachDB for simple integrations, this new dialect is now fully
integration tested and supported for future jOOQ and CockroachDB versions. The
integration fixed a variety of issues and minor differences that we have
encountered in the code generator and the runtime library.

We’ve added support for a variety of features now available in Firebird 3.0+,
MariaDB 10.3+, MySQL 8+, and PostgreSQL 10+.

DDL interpretation

Our biggest investment has been made in the area of DDL interpretation. Since
jOOQ 3.9, we can parse more and more SQL syntax. A logical next step is to allow
for interpreting parsed (or ordinary, DSL constructed) DDL statements in order
to maintain an incremental, in-memory meta model of a database. This has a
variety of benefits, which we will take advantage of in this release and in the
future.

The main out of the box benefits in this version include:

  • Being able to create a diff between two versions of a schema, programmatically through Meta.migrateTo(Meta), on the command line through DiffCLI (https://www.jooq.org/doc/latest/manual/sql-building/schema-diff-cli), or online, through our website: https://www.jooq.org/diff.
  • Such a diff or the meta model itself can be used for ad-hoc introspections, or formally, for automated migrations by tools built on top of jOOQ. Future versions will offer such migrations out of the box.
  • Unfortunately, we have not yet replaced our H2 backed DDLDatabase by our new DDL interpretation capabilities. This is a high priority for jOOQ 3.14, though

In the context of these investments, we have improved a variety of
infrastructure that were needed for these features and that users can benefit as
well. These improvements include:

  • A new LiquibaseDatabase, which uses Liquibase behind the scenes to simulate a schema migration in memory (similar to the DDLDatabase for Flyway), to reverse engineer that for the code generator. If you’re using Liquibase, chances are that you no longer need to connect to an actual database to use jOOQ’s code generator. More info here: https://blog.jooq.org/2019/10/16/how-to-simulate-a-liquibase-migration-using-h2/
  • We now support the inclusion of SQL text in generated code and other parts of the runtime meta model, including for CHECK constraints and views
  • A lot of improvements to the JDBC DatabaseMetaData backed org.jooq.Meta API have been implemented.
  • A lot of additional DDL statement and syntax support has been added throughout the API, parser, and other tooling.
  • The runtime meta model has been enhanced with new information, such as sequence flags, check constraints, table type information, and more.

Temporal table support

Perhaps the SQL:2011 standard’s most exciting “new” feature are temporal tables,
an enhancement on top of Oracle’s historic flashback query feature, and much
more.

In many systems, single, or bi temporal versioning of data is essential. The two
dimensions of versioning that are now supported in jOOQ are:

  • System versioning, useful for backups and auditing, where every mutating operation results in a backup copy of the data in a managed backup table.
  • Application versioning, useful for data that has a timely limited validity, such as a price for a product, which could have a temporary discount.

jOOQ 3.13 supports org.jooq.Period and related API in Select, Update, and Delete
to manipulate and query temporal tables in dialects that ship with native
support for these features. To different degrees, these dialects include:

  • DB2
  • MariaDB
  • Oracle
  • SQL Server

Emulations of incompletely supported syntax in the above dialects, as well as in
all other dialects will be added in future jOOQ versions, making temporal
validity a problem of the past.

More details here:
https://www.jooq.org/doc/3.13/manual/sql-building/table-expressions/temporal-tables/

This is a commercially licensed only feature.

Build improvements

We’ve invested a lot of time in improving our build automation using newer
Jenkins features that allow us to publish builds more frequently, especially
snapshot builds of upcoming minor releases for early adopters, as well as patch
releases for customers needing urgent fixes:

https://www.jooq.org/download/versions

These new releases also include trial versions for Java 6, 8, 11, without the
need to register for extended trials.

In addition to the above, we’ve improved and continue to improve our
documentation of features per edition, in the future.

Procedural API improvement

In jOOQ 3.12, we’ve introduced support for procedural languages, which we are
improving in every new release. jOOQ 3.13 adds support for additional
procedural statements, including:

  • EXECUTE for dynamic SQL execution embedded in procedural blocks
  • RETURN to return from functions
  • Plain SQL statement support

In the future, we’re planning to support also the generation of functions,
procedures, and triggers through standard jOOQ API.

More details here:
https://www.jooq.org/doc/3.13/manual/sql-building/procedural-statements/

This is a commercially licensed only feature.

jOOQ-Refaster

Some of our users have had a lot of success using our checker-framework or
ErrorProne integrations, which allow for some static API usage analysis in
client code (e.g. plain SQL / SQLi risk, dialect compatiblity, etc.)

Starting from jOOQ 3.13, we’re also adding support for ErrorProne’s sub project
“refaster”, which allows for automatically rewriting bad API usage to something
better, such as for example:

  • Migrate deprecated API usage like field.trim(), to its replacement trim(field)
  • Migrate inefficient SQL usage like ctx.fetchCount(select) != 0 to ctx.fetchExists(select)

The complete list of changes can be found on our website:
https://www.jooq.org/notes

Stop Mapping Stuff in Your Middleware. Use SQL’s XML or JSON Operators Instead

It’s been a while since I’ve ranted on this blog, but I was recently challenged by a reddit thread to write about this topic, so here goes…

So, you’re writing a service that produces some JSON from your database model. What do you need? Let’s see:

  • Read a book on DDD
  • Read another book on DDD
  • Write some entities, DTOs, factories, and factory builders
  • Discuss whether your entities, DTOs, factories, and factory builders should be immutable, and use Lombok, Autovalue, or Immutables to ease the pain of construction of said objects
  • Discuss whether you want to use standard JPA, or Hibernate specific features for your mapping
  • Plug in Jackson, the XML and JSON mapper library, because you’ve read a nice blog post about it
  • Debug 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations. Minor thing
  • Debug 1-2 N+1 cases

STOP IT

No, seriously. Just stop it right there!

What you needed was this kind of JSON structure, exported form your favourite Sakila database:

[{
  "first_name": "PENELOPE",
  "last_name": "GUINESS",
  "categories": [{
    "name": "Animation",
    "films": [{
      "title": "ANACONDA CONFESSIONS"
    }]
   }, {
    "name": "Family",
    "films": [{
      "title": "KING EVOLUTION"
    }, {
      "title": "SPLASH GUMP"
    }]
  }]
}, {
   ...

In English: We need a list of actors, and the film categories they played in, and grouped in each category, the individual films they played in.

Let me show you how easy this is with SQL Server SQL (all other database dialects can do it these days, I just happen to have a SQL Server example ready:

-- 1) Produce actors
SELECT
  a.first_name,
  a.last_name, (

    -- 2) Nest categories in each actor
    SELECT
      c.name, (

        -- 3) Nest films in each category
        SELECT title
        FROM film AS f
        JOIN film_category AS fc ON f.film_id = fc.film_id
        JOIN film_actor AS fa ON fc.film_id = fa.film_id
        WHERE fc.category_id = c.category_id
        AND a.actor_id = fa.actor_id
        FOR JSON PATH -- 4) Turn into JSON
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR JSON PATH -- 4) Turn into JSON
  ) AS categories
FROM
  actor AS a 
FOR JSON PATH, ROOT ('actors') -- 4) Turn into JSON

That’s it. That’s all there is to it. Only basic SQL-92, enhanced with some vendor-specific JSON export syntax. (There are also SQL standard JSON APIs as implemented in other RDBMS). Let’s discuss it quickly:

  1. The outer most query produces a set of actors. As you would have expected
  2. For each actor, a correlated subquery produces a nested JSON array of categories
  3. For each category, another correlated subquery finds all the films per actor and category
  4. Finally, turn all the result structures into JSON

That’s it.

Want to change the result structure? Super easy. Just modify the query accordingly. No need to modify:

  • Whatever you thought your DDD “root aggregate was”
  • Your gazillion entities, DTOs, factories, and factory builders
  • Your gazillion Lombok, Autovalue, or Immutables annotations
  • Your hacks and workarounds to get this stuff through your standard JPA, or Hibernate specific features for your mapping
  • Your gazilion Jackson, the XML and JSON mapper library annotations
  • Debugging another 1-2 problems arising from combining Jackson, JAXB, Lombok, and JPA annotations
  • Debugging another 1-2 N+1 cases

No! No need! It’s so simple. Just stream the JSON directly from the database to the client using whatever SQL API of your preference: JDBC, jOOQ, JdbcTemplate, MyBatis, or even JPA native query. Just don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware. Let me repeat that for emphasis:

Don’t go mapping that stuff in the middleware if you’re not consuming it in the middleware.

Oh, want to switch to XML? Easy. In SQL Server, this amounts to almost nothing but replacing JSON by XML:

SELECT
  a.first_name,
  a.last_name, (
    SELECT
      c.name, (
	    SELECT title
	    FROM film AS f
	    JOIN film_category AS fc ON f.film_id = fc.film_id
	    JOIN film_actor AS fa ON fc.film_id = fa.film_id
	    WHERE fc.category_id = c.category_id
	    AND a.actor_id = fa.actor_id
	    FOR XML PATH ('film'), TYPE
      ) AS films
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    WHERE fa.actor_id = a.actor_id
    GROUP BY c.category_id, c.name
    FOR XML PATH ('category'), TYPE
  ) AS categories
FROM
  actor AS a 
FOR XML PATH ('actor'), ROOT ('actors')

And now, you’re getting:

<actors>
  <actor>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <categories>
      <category>
        <name>Animation</name>
        <films>
          <film>
            <title>ANACONDA CONFESSIONS</title>
          </film>
        </films>
      </category>
      <category>
        <name>Family</name>
        <films>
          <film>
            <title>KING EVOLUTION</title>
          </film>
          <film>
            <title>SPLASH GUMP</title>
          </film>
        </films>
      </category>
      ...

It’s so easy with SQL!

Want to support both without rewriting too much logic? Produce XML and use XSLT to automatically generate the JSON. Whatever.

FAQ, Q&A

But my favourite Java SQL API can’t handle it

So what. Write a view and query that instead. By the way, jOOQ 3.14 supports all of it.

But this doesn’t fit our architecture

Then fix the architecture

But SQL is bad

No, it’s great. It’s based on relational algebra and augments it in many many useful ways. It’s a declarative 4GL, the optimiser produces way better execution plans than you could ever imagine (see my talk), and it’s way more fun than your gazillion 3GL mapping libraries.

But SQL is evil because of Oracle

Then use PostgreSQL. It can do JSON.

But what about testing

Just spin up a test database with https://www.testcontainers.org, install your schema with some migration framework like Flyway or Liquibase in it, fill in some sample data, and write your simple integration tests.

But mocking is better

It is not. The more you mock away the database, the more you’re writing your own database.

But I’m paid by the lines of code

Well, good riddance, then.

But what if we have to change the RDBMS

So what? Your management paid tens of millions for the new licensing. They can pay you tens of hundreds to spend 20 minutes rewriting your 5-10 SQL queries. You already wrote the integration tests above.

Anyway. It won’t happen. And if it will, then those few JSON queries will not be your biggest problem.

What was that talk of yours again?

Here, highly recommended:

But we’ve already spent so many person years implementing our middleware

It has a name

But I’ve read this other blog post…

And now you’ve read mine.

But that’s like 90s style 2 tier architecture

So what? You’ve spent 5% the time to implement it. That’s 95% more time adding value to your customers, rather than bikeshedding mapping technology. I call that a feature.

What about ingestion? We need abstraction over ingestion

No, you don’t. You can send the JSON directly into your database, and transform / normalise it from there, using the same technique. You don’t need middleware abstraction and mapping, you just want middleware abstraction and mapping.

A Guide to SQL Naming Conventions

One of Java’s big strengths, in my opinion, is the fact that most naming conventions have been established by the creators of the language. For example:

  • Class names are in PascalCase
  • Member names are in camelCase
  • Constants are in SNAKE_CASE

If someone does not adhere to these conventions, the resulting code quickly looks non-idiomatic.

What about SQL?

SQL is different. While some people claim UPPER CASE IS FASTEST:

Others do not agree on the “correct” case:

There seems to be a tendency towards writing identifiers in lower case, with no agreement on the case of keywords. Also, in most dialects, people prefer snake_case for identifiers, although in SQL Server, people seem to prefer PascalCase or camelCase.

That’s for style. And I’d love to hear your opinion on style and naming conventions in the comments!

What about naming conventions?

In many languages, naming conventions (of identifiers) is not really relevant, because the way the language designs namespacing, there is relatively little risk for conflict. In SQL, this is a bit different. Most SQL databases support only a 3-4 layered set of namespaces:

  1. Catalog
  2. Schema
  3. Table (or procedure, type)
  4. Column (or parameter, attribute)

Some dialect dependent caveats:

  • While SQL Server supports both catalog AND schema, most dialects only support one of them
  • MySQL treats the catalog (“database”) as the schema
  • Oracle supports a package namespace for procedures, between schema and procedure

In any case, there is no such concept as package (“schema”) hierarchies as there is in languages like Java, which makes namespacing in SQL quite tricky. A problem that can easily happen when writing stored procedures:

FUNCTION get_name (id NUMBER) IS
  result NUMBER;
BEGIN
  SELECT name
  INTO result
  FROM customer
  WHERE id = id; -- Ehm...

  RETURN result;
END;

As can be seen above, both the CUSTOMER.ID column as well as the GET_NAME.ID parameter could be resolved by the unqualified ID expression. This is easy to work around, but a tedious problem to think of all the time.

Another example is when joining tables, which probably have duplicate column names:

SELECT *
FROM customer c
JOIN address a ON c.id = a.customer_id

This query might produce two ambiguous ID columns: CUSTOMER.ID and ADDRESS.ID. In the SQL language, it is mostly easy to distinguish between them by qualifying them. But in clients (e.g. Java), they are less easy to qualify properly. If we put the query in a view, it gets even trickier.

“Hungarian notation”

Hence, SQL and the procedural languages are a rare case where some type of Hungarian notation could be useful. Unlike with hungarian notation itself, where the data type is encoded in the name, in this case, we might encode some other piece of information in the name. Here’s a list of rules I’ve found very useful in the past:

1. Prefixing objects by semantic type

Tables, views, and other “tabular things” may quickly conflict with each other. Especially in Oracle, where one does not simply create a schema because of all the security hassles this produces (schemas and users are kinda the same thing, which is nuts of course. A schema should exist completely independently from a user), it may be useful to encode a schema in the object name:

  • C_TABLE: The C_ prefix denotes “customer data”, e.g. as opposed to:
  • S_TABLE: The S_ prefix denotes “system data” or “master data”
  • L_TABLE: The L_ prefix denotes “log data”
  • V_VIEW: The V_ prefix denotes a view
  • P_PARAMETER: The P_ prefix denotes a procedure or function parameter
  • L_VARIABLE: The L_ prefix denotes a local variable

Besides, when using views for security and access control, one might have additional prefixes or suffixes to denote the style of view:

  • _R: The _R suffix denotes read only views
  • _W: The _W suffix denotes writeable (updatable) views

This list is obviously incomplete. I’m undecided whether this is necessarily a good thing in general. For example, should packages, procedures, sequences, constraints be prefixed as well? Often, they do not lead to ambiguities in namespace resolution. But sometimes they do. The importance, as always, is to be consistent with a ruleset. So, once this practice is embraced, it should be applied everywhere.

2. Singular or plural table names

Who cares. Just pick one and use it consistently.

3. Establishing standard aliasing

Another technique that I’ve found very useful in the past is a standard approach to aliasing things. We need to alias tables all the time, e.g. in queries like this:

SELECT *
FROM customer c
JOIN address a ON c.id = a.customer_id

But what if we have to join ACCOUNT as well? We already used A for ADDRESS, so we cannot reuse A. But if we don’t re-use the same aliases in every query, the queries start to be a bit confusing to read.

We could just not use aliases and always fully qualify all identifiers:

SELECT *
FROM customer
JOIN address ON customer.id = address.customer_id

But that quickly turns out to be verbose, especially with longer table names, so also not very readable. The standard approach to aliasing things I’ve found very useful is to use this simple algorithm that produces 4 letter aliases for every table. Given the Sakila database, we could establish:

PREFIX TABLE NAME
ACTO ACTOR
ADDR ADDRESS
CATE CATEGORY
CITY CITY
COUN COUNTRY
CUST CUSTOMER
FILM FILM
FIAC FILM_ACTOR
FICA FILM_CATEGORY
FITE FILM_TEXT
INVE INVENTORY
LANG LANGUAGE
PAYM PAYMENT
RENT RENTAL
STAF STAFF
STOR STORE

The algorithm to shorten a table name is simple:

  • If the name does not contain an underscore, take the four first letters, e.g CUSTOMER becomes CUST
  • If the name contains 1 underscore, take the first two letters of each word, e.g. FILM_ACTOR becomes FIAC
  • If the name contains 2 underscores, take the first two letters of the first word, and the first letter of the other words, e.g. FILM_CATEGORY_DETAILS becomes FICD
  • If the name contains 3 or more underscores, take the first letter of each word
  • If a new abbreviation causes a conflict with the existing ones, make a pragmatic choice

This technique worked well for large-ish schemas with 500+ tables. You’d think that abbreviations like FICD are meaningless, and indeed, they are, at first. But once you start writing a ton of SQL against this schema, you start “learning” the abbreviations, and they become meaningful.

What’s more, you can use these abbreviations everywhere, not just when writing joins:

SELECT 
  cust.first_name,
  cust.last_name,
  addr.city
FROM customer cust
JOIN address addr ON cust.id = addr.customer_id

But also when aliasing columns in views or derived tables:

SELECT 
  cust.first_name AS cust_first_name,
  cust.last_name AS cust_last_name,
  addr.city AS addr_city
FROM customer cust
JOIN address addr ON cust.id = addr.customer_id

This becomes invaluable when your queries become more complex (say, 20-30 joins) and you start projecting tons of columns in a library of views that select from other views that select from other views. It’s easy to keep consistent, and you can also easily recognise things like:

  • What table a given column originates from
  • If that column has an index you can use (on a query against the view!)
  • If two columns that look the same (e.g. FIRST_NAME) really are the same

I think that if you work with views extensively (I’ve worked with schemas of 1000+ views in the past), then such a naming convention is almost mandatory.

Conclusion

There isn’t really a “correct” way to name things in any language, including SQL. But given the limitations of some SQL dialects, or the fact that after joining, two names may easily conflict, I’ve found the above two tools very useful in the past: 1) Prefixing identifiers with a hint about their object types, 2) Establishing a standard for aliasing tables, and always alias column names accordingly.

When you’re using a code generator like jOOQ’s, the generated column names on views will already include the table name as a prefix, so you can easily “see” what you’re querying.

I’m curious about your own naming conventions, looking forward to your comments in the comment section!

Dogfooding in Product Development

Dogfooding, or eating your own dog food, is a practice that all product developers should implement all the time. According to wikipedia:

Dogfooding, occurs when an organization uses its own product. This can be a way for an organization to test its products in real-world usage. Hence dogfooding can act as quality control, and eventually a kind of testimonial advertising. Once in the market, dogfooding demonstrates confidence in the developers’ own products

I’ve recently started delivering this talk about API design at conferences, where I mentioned dogfooding as an excellent approach to make sure the user experience of your API is great:

The more you use your own API, the better it gets from a UX and usability perspective.

Dogfooding via tests

We do a lot of dogfooding ourselves, inevitably, as we write tons and tons of tests for jOOQ, to make sure jOOQ works correctly on all the currently 26 RDBMS that we support.

Writing a test for new API means we have to immediately use the new API for the first time. This helps discover the first usability problems. But there are even better ways:

Dogfooding via significant new functionality

Another great way to dogfood is to create significant new features. For example, this new schema diff tool that jOOQ 3.13 will ship with (#9425). It is part of a bigger project that we call DDL interpretation, where we start implementing the DDL part of a database by maintaining an up to date database schema depending on a stream of DDL statements. This will be part of a variety of improvements and value propositions in the area of database change management / SQL migrations. Recently, we’ve published a post about an improved Liquibase integration, which goes in a similar direction.

The schema diff tool is something many database products are offering. For us, it is relatively simple to implement as we:

  • Support a variety of schema meta representations
  • Support a lot of DDL syntax
  • Support exporting schema meta representations as DDL

We’ve always had a type called org.jooq.Meta. That type represents your database. Historically, it just gave jOOQ-style access to JDBC’s java.sql.DatabaseMetaData. But over time, we also started supporting exposing generated jOOQ code, or XML files as org.jooq.Meta.

With jOOQ 3.13, we’ll support interpreting arbitrary DDL (parsed or manually created using the jOOQ API), or also Liquibase XML files to create such a org.jooq.Meta representation. Which can then be turned again into DDL using the new Meta.ddl() method:

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
  );

When we run the above program, we’re getting the following output:

create table t(
  i int not null,
  j int null,
  primary key (i)
);

That’s already cool – we can create a snapshot of our schema at any point of a set of database migration scripts. Can we also do the inverse? We can, with the new diff tool:

System.out.println(
  ctx.meta(
    "create table t (i int)"
  ).migrateTo(ctx.meta(
    "create table t ("
  + "i int not null, "
  + "j int null, "
  + "primary key (i))"
  ))
);

The output generated from this is:

alter table t alter i set not null;
alter table t add j integer null;
alter table t add constraint primary key (i);

This is part of an exciting new set of DDL / migration features, we can hardly wait to publish with jOOQ 3.13 (around Q1 2020), as we believe it will greatly help with your existing database change management solution – perhaps as a Flyway or Liquibase plugin.

What does this have to do with dogfooding?

While developing this feature, we’ve discovered numerous missing features, which we also implemented for jOOQ 3.13:

  • #7752: Our current meta model for sequences does remember flags like MAXVALUE
  • #9428: Meta.toString() could just call Meta.ddl(), the DDL export. This is what I’ve shown above, very useful when debugging with small schemas!
  • #9433: Meta.equals() and Meta.hashCode() was not yet implemented, as this didn’t make sense, historically.
  • #9434: Our current DDL export should support reordering objects in alphabetical order for the export. This is useful for equality checks, text-based diffs, etc.
  • #9437: We don’t support ALTER SEQUENCE statements that allow for modifying sequence properties like MAXVALUE yet.
  • #9438: The current emulation of ALTER SEQUENCE .. RESTART hard codes restarting the sequence at 1, when in fact, it could be some other START WITH value
  • #9440: We’ll need a synthetic syntax to drop unnamed foreign keys.

This is just a short extract of things we’ve implemented this week based on our discoveries of what’s missing when implementing the “big” feature: The Meta.migrateTo(Meta) method. The migrateTo() method will be the one highlighted in the release notes of 3.13. But the little things above are what ultimately makes jOOQ so useful – the many little things that were discovered while dogfooding and that help everyone, not just the users that use the migrateTo() method.

Dogfooding via blogging and documentation

But, perhaps even better than when implementing new features is blogging or documenting API. When blogging or documenting, the vendor / maintainer has to put themselves into the position of the user, in particular, the first time user of a specific API.

Imagine, after writing all this code that I as an author love, I have to reset my experience, and pretend I don’t know what to expect. Then, sit down, and write a really good and simple example using the API that I wrote.

The first example I came up with was not like this (as that API didn’t exist yet):

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
);

Instead, it was like this (which works the same way):

System.out.println(
  ctx.meta("create table t (i int);\n"
+ "alter table t add j int;\n"
+ "alter table t alter i set not null;\n"
+ "alter table t add primary key (i);")
);

I thought to myself, do I really need all that string concatenation? I already have this Meta.apply(Queries) method, which would add even more value. But then, my code would look like this:

System.out.println(
  ctx.meta("")
     .apply(queries(createTable("t").column("i", INTEGER)))
     .apply(queries(alterTable("t").add("i", INTEGER)))
     .apply(queries(alterTable("t").alter("i").setNotNull()))
     .apply(queries(alterTable("t").add(primaryKey("i")))
);

That would be cool, too. I could show that our DSL API and our parser can do the same things. But I didn’t want to show the DSL API in this blog post. It would only distract from my main point. I could, of course, parse the queries explicitly:

System.out.println(
  ctx.meta("")
     .apply(ctx.parse("create table t (i int)"))
     .apply(ctx.parse("alter table t add j int"))
     .apply(ctx.parse("alter table t alter i set not null"))
     .apply(ctx.parse("alter table t add primary key (i)"))
);

But why not just add convenience API that does this for me?

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
);

Where Meta.apply(String) looks like this:

public final Meta apply(String diff) {
return apply(dsl().parser().parse(diff));
}

That’s mere convenience. It is not needed. But it is very useful:

  • For tests
  • For blog posts
  • For documentation
  • For new users
  • For simple applications

The more thorough, more powerful API that accepts Queries (which wraps a bunch of parsed or hand-constructed jOOQ Query) objects is the real feature here. But convenience helps the user very much!

I would not have discovered this requirement without dogfooding.

Convenience

A much underrated tweet by Brian Goetz is this one:

APIs need a ton of abstractions. The JDK Collector API is a very good example. In Java and in SQL, in order to write a custom aggregate function, you need these four operations:

  • Supplier<A>: A supplier that provides an empty, intermediary data structure to aggregate into
  • BiConsumer<A, T>: A accumulator that accumulates new values from the stream into our intermediary data structure.
  • BinaryOperator<A>: A combiner that combines two intermediary data structures. This is used for parallel streams only.
  • Function<A, R>: The finisher function that extracts the result from the intermediary data structure.

Writing a Collector from scratch is super annoying and “low level”. Using convenience API is much better, this is why the JDK has Collectors, or jOOλ has Agg, and there are other Collector libraries that provide some basic building blocks. The Stream API does not have to know about all of these building blocks, it just knows a single, abstract type. (serves the API). But we users, we don’t want only a single abstraction. We want those convenient building blocks (serves the user).

Convenience is never requirement, but always a big plus. In APIs like in languages.

Have you noticed this new method on InputStream, which has been added in Java 9?

public long transferTo(OutputStream out) throws IOException {
  Objects.requireNonNull(out, "out");
  long transferred = 0;
  byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
  int read;
  while ((read = this.read(buffer, 0, DEFAULT_BUFFER_SIZE)) >= 0) {
    out.write(buffer, 0, read);
    transferred += read;
  }
  return transferred;
}

How many times have we written this kind of code? N times (and a big N, too)

How many times have we enjoyed writing this code? 1 time

How many times too many have we written this code? N + 1 times.

Dogfooding is a very good way to make sure this kind of convenience our users love us for will make it into products much earlier on. Because we, the vendors, are the first users, and we hate writing this silly code all the time. And who is a better first user to implement cool new features for, if not ourselves?