jOOQ Internals: Pushing up SQL fragments

Over the past 13 years, jOOQ has accrued quite some internal features, which you, the user, are not exposed to. One very interesting feature is the capability for any arbitrary jOOQ expression tree element to push a SQL fragment up to a higher level.

How does it work?

The jOOQ expression tree model

When you write a jOOQ query, you’re really creating an expression tree of your SQL statement that just happens to look like actual SQL syntax. For example:

ctx.select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .where(BOOK.AUTHOR_ID.eq(1))
   .fetch();

In jOOQ, this is nothing else than an expression tree of this form

                     Select (Statement)
                             |
          +------------------+-----------------+
          |                  |                 |
        Select             From              Where
          |                  |                 |
    +-----+------+           |                 |
    |            |           |                 |
TableField  TableField     Table        CompareCondition
  |     |     |     |        |           |     |       |
Table Field Table Field    BOOK    TableField  EQ     Val
  |     |     |     |                |     |           |
BOOK    ID   BOOK TITLE            Table Field         1
                                     |     |
                                    BOOK AUTHOR_ID

When generating the SQL, jOOQ mostly just traverses the tree brepth first (just kidding. It’s mostly depth first, but some child elements on the same level are often considered, prior to going down one level), collecting each element into a StringBuilder to the expected form:

SELECT book.id, book.title
FROM book
WHERE book.author_id = 1

Each of these so-called QueryPart expression tree elements can decide for itself how it’s going to render its SQL. For example, the CompareCondition will roughly generate this sequence:

<lhs> <operator> <rhs>

… further delegating SQL generation to its children, whatever they may be. A TableField will decide whether to fully / partially / or not qualify its Field reference at all, etc., e.g. based on the schema mapping (multi tenancy) feature.

If you’re using functions, such as Substring, that function can decide on its own how it should generate its SQL. As can be seen in the manual, these are all the same:

-- ACCESS
mid('hello world', 7)

-- ASE, SQLDATAWAREHOUSE, SQLSERVER
substring('hello world', 7, 2147483647)

-- AURORA_MYSQL, AURORA_POSTGRES, COCKROACHDB, CUBRID, H2, 
-- HANA, HSQLDB, IGNITE, MARIADB, MEMSQL, MYSQL, POSTGRES, REDSHIFT, 
-- SNOWFLAKE, SYBASE, VERTICA
substring('hello world', 7)

-- DB2, DERBY, INFORMIX, ORACLE, SQLITE
substr('hello world', 7)

-- FIREBIRD, TERADATA
substring('hello world' FROM 7)

This way, a jOOQ expression tree can emulate any syntax in any dialect.

But what if the emulation is not local?

Non local emulations

Sometimes, a QueryPart needs to assume the presence of a non-local syntactic element to work. A recent case was https://github.com/jOOQ/jOOQ/issues/11366.

When writing this procedural logic in jOOQ:

for_(i).in(1, 10).loop(
    insertInto(t).columns(a).values(i)
)

Of course, you wouldn’t do this. You’d write a bulk insert statement instead and solve this with SQL only! But you have your reasons, right?

Then, the indexed FOR loop may have to be emulated in some dialects, using an equivalent WHILE statement. So, instead of this straight forward procedural SQL generation, which we might get in Oracle for example:

FOR i IN 1 .. 10 LOOP
  INSERT INTO t (a) VALUES (i);
END LOOP;

… we generate this in MySQL, more or less:

DECLARE i INT DEFAULT 1;
WHILE i <= 10 DO
  INSERT INTO t (a) VALUES (i);
  SET i = i + 1;
END WHILE;

This can still be done completely locally, as shown before. There’s a FOR expression tree element that can locally generate the DECLARE and WHILE query parts, instead. But what if local variables aren’t possible? What if there’s no block scope, as in Firebird?

In Firebird, all your local variables must be declared in a top level declaration section. If we run the above in an anonymous block, the correct generated procedural SQL would be this:

EXECUTE BLOCK AS
  DECLARE i INT;
BEGIN
  :i = 1; -- The loop variable must still be initialised locally
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (a) VALUES (i);
    :i = :i + 1;
  END
END

This is still the case when we further nest the loop in procedural control flow elements, such as:

for_(i).in(1, 10).loop(
    for_(j).in(1, 10).loop(
        insertInto(t).columns(a, b).values(i, j)
    )
)

Of course, you still wouldn’t do this. You’d write a bulk insert statement from a cartesian product instead and solve this with SQL only! But alas, let’s keep the example simple

We now have nested variable declarations, which still work well in MySQL

DECLARE i INT DEFAULT 1;
WHILE i <= 10 DO 
  BEGIN
    DECLARE j INT DEFAULT 1;
    WHILE j <= 10 DO
      INSERT INTO t (a, b) VALUES (i, j);
      SET j = j + 1;
    END WHILE;
  END
  SET i = i + 1;
END WHILE;

But in Firebird, the declarations will both have to be pushed up to the top:

EXECUTE BLOCK AS
  DECLARE i INT;
  DECLARE j INT;
BEGIN
  :i = 1; -- The loop variable must still be initialised locally
  WHILE (:i <= 10) DO BEGIN
    :j = 1; -- The loop variable must still be initialised locally
    WHILE (:j <= 10) DO BEGIN
      INSERT INTO t (a, b) VALUES (i, j);
      :j = :j + 1;
    END
    :i = :i + 1;
  END
END

This doesn’t handle all the edge cases yet (e.g. some dialects allow for “hiding” local variables, such as PL/SQL) but it goes a long way already for simple procedures, functions, and triggers, all of which will be supported starting in jOOQ 3.15.

How does it work?

Alternative 1: Expression tree transformation

There are numerous ways to make such an expression tree transformation work. In the long run, we will re-design our internal expression tree model, and make it available also as a public API to those who wish to use the jOOQ parser and expression tree transformation as a standalone product. To some extent, this is already possible using the VisitListener SPI as shown in this post about row level security, but the current implementation is complex.

Also, it is relatively rare (so far) for an expression tree to require non-local transformation. This means that eagerly trying to look for possible candidates every time is probably overkill.

Alternative 2: Lazy expression tree transformation

We could transform the expression tree “lazily”, i.e. still assuming it is unnecessary, and when it is, throw an exception and start again. We actually do this, the “pattern” in jOOQ is called a ControlFlowSignal, and it is mainly used to work around the maximum number of bind parameters per statement limit of different dialects. I.e. we just count bind values, and if there are more than 2000 in SQL Server (2100 are supported by SQL Server, but only 2000 by jtds), then we just re-generate the SQL from scratch using inline values in a static statement.

As always with jOOQ, you can re-configure these limits to your own value.

Another case is when you forgot to turn on the ROWNUM to LIMIT transformation when migrating off Oracle’s old ROWNUM filtering. It would be silly to eagerly search for ROWNUM instances every time. Instead, we just re-generate the entire SQL query when we encounter one, and when you’re not using Oracle.

The assumption here is that these things happens very rarely, and if they do, you haven’t thought of it, and you don’t want the query to fail in production. The fact that a probably already slow query takes just a bit more time for jOOQ to generate is a price worth paying for the query to still Just Work™

Alternative 3: Patching generated SQL strings

Now, this is what we are actually doing.

Better assume almost all SQL transformations are local (as in the Substring example), and patch the SQL in case they’re not. In the end, we’re just generating SQL strings! And as such, why not introduce an infrastructure where we can put special markers to special text areas, and then replace that area with alternative SQL content.

Without the fix #11366, the generated code may have looked like this:

EXECUTE BLOCK AS
  -- Special marker here
BEGIN 
  -- Entering scope
  DECLARE i INT DEFAULT 1;
  WHILE (:i <= 10) DO BEGIN
    DECLARE j INT DEFAULT 1;
    WHILE (:j <= 10) DO BEGIN
      INSERT INTO t (a, b) VALUES (i, j);
      :j = :j + 1;
    END
    :i = :i + 1;
  END
  -- Exiting scope
END

This doesn’t work in Firebird, so we apply the fix. Note there’s a cheap, special marker that is placed by the SQL generation of anonymous blocks, but also for procedures, functions, and triggers, e.g.

CREATE FUNCTION func1()
RETURNS INTEGER
AS 
  -- Special marker here
BEGIN
  -- Entering scope
  RETURN 1;
  -- Exiting scope
END

Now, whenever the org.jooq.impl.DeclarationImpl query part generates its SQL locally, then, instead of generating something like:

DECLARE i INT DEFAULT 1;
DECLARE j INT;

We generate (locally)

:i = 1;
-- j isn't initialised here, so nothing needs to be done locally

At the same time, we push the org.jooq.impl.DeclarationImpl into some context variable that is visible to the entire scope (see “entering scope” and “exiting scope” comments).

As soon as we exit the scope, we must render all collected declarations, this time without the defaults, e.g.

DECLARE i INT;
DECLARE j INT;

… and then insert that rendered SQL right where the marker was located. All subsequent markers, if any, are then shifted by the difference of text lengths, of course.

Applications in jOOQ

This is currently used a few times within jOOQ:

  • To emulate calling Oracle PL/SQL functions with BOOLEAN parameters / return values. We patch the generated SQL producing a synthetic WITH clause with some BOOLEAN to NUMBER translation logic. Since Oracle 12c, Oracle supports PL/SQL embedded in WITH, which is quite a nifty feature!
  • The entire implicit JOIN feature is implemented this way! Markers delimit each table in the FROM clause (e.g. FROM BOOK), and if any path originating from any such marked table is encountered in the query (e.g. SELECT BOOK.AUTHOR.FIRST_NAME), then 1) instead of generating the path, a synthetic alias for the path is used to qualify the column, 2) instead of generating the FROM BOOK table, a synthetic LEFT JOIN or INNER JOIN is generated, joining all the necessary to-one relationships. An example will be shown below.
  • The above Firebird (and possibly also T-SQL, let’s see) procedural local variable scoping fixes are implemented this way.
  • A few emulations that require prepending SQL to the complete statement, such as CREATE OR REPLACE PROCEDURE x emulations that prepend DROP PROCEDURE x to CREATE PROCEDURE x work in a similar fashion. These types of emissions are “special” in that they add another statement to the statement batch. This means we’ll also have to be careful to skip any possible result sets or update counts this produces when calling the batch from JDBC.

Future applications may include:

  • More top level CTE, which are quite handy for a variety of emulations

An example for implicit joins:

SELECT
  book.author.first_name,
  book.author.last_name
FROM book -- Special marker around book

The above SQL doesn’t work in any dialect, it’s just jOOQ specific. We generate an alias for each unique path, based on the path’s hash code, so the query might look like this, instead:

SELECT
  -- The path isn't generated, but an alias for it is
  alias_xyz.first_name,
  alias_xyz.last_name
FROM (
  -- The marked "book" table is replaced by a join tree
  book 
    LEFT JOIN author AS alias_xyz 
    ON book.author_id = author.id
)

We simply replace book by (book LEFT JOIN ...) in the resulting SQL string. Thanks to our infrastructure of being able to define scopes and register context and variables for each scope, this works for arbitrary levels of nesting. We can always identify the appropriate book identifier for each path expression, even for things like this:

SELECT
  book.author.first_name,
  book.author.last_name,

  -- Different book tables here, because the nested scope 
  -- hides the "book" identifier from the outer scope
  (SELECT count(*) FROM book),
  (SELECT count(DISTINCT book.author.first_name) FROM book),

  -- Outer book, again
  (SELECT book.author.first_name)
FROM 
  book

The above being emulated like this, by patching two marked occurrences of book by the same join graph:

SELECT
  alias_xyz.first_name,
  alias_xyz.last_name,

  -- No patching done to this book
  (SELECT count(*) FROM book),

  -- The alias_xyz alias is used again, the path is the same
  (SELECT count(DISTINCT alias_xyz.first_name) 

  -- And the book table is patched again with the same left join
   FROM (
     book 
       LEFT JOIN author AS alias_xyz 
       ON book.author_id = author.id
  )),

  -- Outer book, again
  (SELECT alias_xyz.first_name)
FROM (
  book 
    LEFT JOIN author AS alias_xyz 
    ON book.author_id = author.id
)

It sounds as sophisticated as it sounds complex, but it works really really well.

Perhaps, in the future, expression tree transformation will be preferred over patching the result string. So far, with the current applications, this was the path of least resistance, and highest performance.

Automatically Transform Oracle Style Implicit Joins to ANSI JOIN using jOOQ

While jOOQ is mostly being used as an internal SQL DSL for embedded, dynamic SQL in Java, where it offers the best solution on the market, jOOQ is increasingly also used for one of its secondary features: Its parser.

Having been introduced in jOOQ 3.9 primarly for the purpose of being able to parse DDL statements, e.g. to reverse engineer your schema for code generation purposes, we’ve added an increasing amount of features and SQL transformation capabilities that allow for the parser to be used as a standalone product, through the command line interface, our website, or through the ordinary jOOQ API.

One feature that has been added to jOOQ 3.14, which is mostly useful to those using jOOQ as a parser, is the capability of transforming old Oracle style implicit joins to ANSI JOIN.

Why avoid “implicit joins”?

The old Oracle style implicit join syntax is supported and properly optimised by most RDBMS vendors. In the past, prior to SQL-92, this is how we used to inner join tables, e.g. when querying the Sakila database:

SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id

Granted, the syntax is kind of intuitive. Just declare all the tables you want to fetch data from, and then make sure the proper data is retained only by filtering for matching primary key / foreign key values.

Of course, this can go terribly wrong. For the many obvious reasons, e.g. when you forget a join predicate after adding a table. If the query is complex, this may be hard to debug. The solution is ANSI JOIN. Starting from SQL-92 (almost 30 years now!), this is how we join in most RDBMS:

SELECT *
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id

While it’s still possible to define wrong join predicates, at least it’s no longer possible to forget a predicate, because this is syntactically incorrect (except for MySQL, where, regrettably, the ON clause is optional):

SELECT *
FROM actor a
JOIN film_actor fa -- Syntax error
JOIN film f -- Syntax error

jOOQ’s implicit JOIN

Notice that it is common to refer to the above syntax as “implicit join”, while JPQL and jOOQ recycled the term for another kind of “implicit join”, which is foreign key path based, and even less error prone than the ANSI SQL syntax. With jOOQ, the above query can be written as follows:

ctx.select(
      FILM_ACTOR.actor().asterisk(), 
      FILM_ACTOR.asterisk(),
      FILM_ACTOR.film().asterisk())
   .from(FILM_ACTOR)
   .fetch();

The mere presence of these to-one relationship paths in the query will implicitly add the appropriate LEFT JOIN or INNER JOIN to the FROM clause. This is merely convenience on top of ordinary ANSI JOINs, not a replacement.

Transforming Oracle implicit joins

When you have an old code base that you wish to upgrade and transform all your queries to using ANSI JOIN, use jOOQ for that. You can use jOOQ’s programmatic capabilities (as mentioned before), or the free website https://www.jooq.org/translate.

On the website, just pick the “Oracle style to ANSI JOIN” option, place the following SQL on the left:

Input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  CROSS JOIN (
    film_category fc
      JOIN category c
        ON fc.category_id = c.category_id
  )
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

And… whoopsies. The output correctly displays the resulting, undesired CROSS JOIN because one of the join predicates was missing:

Yep, the tool already helped! Let’s fix the input query:

Fixed input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND f.film_id = fc.film_id -- This was missing
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  JOIN film_category fc
    ON f.film_id = fc.film_id
  JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

This also works if you were using Oracle’s arcane outer join syntax using (+) (or SQL Server’s *=, which has been unsupported for a while now). You might have this input:

Input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id(+) = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Producing this output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN (
    film_category fc
      RIGHT OUTER JOIN category c
        ON fc.category_id = c.category_id
  )
    ON f.film_id = fc.film_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Err, wat. Whoopsies again! One of the (+) symbols was on the wrong side, which is why we got that RIGHT OUTER JOIN. Again, the tool has shown that the old syntax was quite error prone. Let’s fix it.

Fixed input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id = c.category_id(+)
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN film_category fc
    ON f.film_id = fc.film_id
  LEFT OUTER JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Conclusion

Play around with it and tell us what you think! https://www.jooq.org/translate

jOOQ 3.11 Released With 4 New Databases, Implicit Joins, Diagnostics, and Much More

Today, jOOQ 3.11 has been released with support for 4 new databases, implicit joins, diagnostics, and much more

New Databases Supported

At last, 4 new SQL dialects have been added to jOOQ! These are:

jOOQ Professional Edition

  • Aurora MySQL Edition
  • Aurora PostgreSQL Edition
  • Azure SQL Data Warehouse

jOOQ Enterprise Edition

  • Teradata

Implicit Joins

One of the really cool features in ORMs like Hibernate, Doctrine, and others, is
the capability of using a relationship graph notation to access another entity’s
columns through what is often called “implicit joins”.

Instead of explicitly joining a to-one relationship to access its columns:

SELECT author.first_name, author.last_name, book.title
FROM book
JOIN author ON book.author_id = author.id

We would like to be able to access those columns directly, using this notation:

SELECT book.author.first_name, book.author.last_name, book.title
FROM book

The join is implied and should be added implicitly. jOOQ now allows for this to
happen when you use the code generator:

ctx.select(BOOK.author().FIRST_NAME, BOOK.author().LAST_NAME, BOOK.TITLE)
   .from(BOOK)
   .fetch();

When rendering this query, the implicit join graph will be calculated on the fly
and added behind the scenes to the BOOK table. This works for queries of
arbitrary complexity and on any level of nested SELECT.

More details in this blog post:
https://blog.jooq.org/2018/02/20/type-safe-implicit-join-through-path-navigation-in-jooq-3-11/

DiagnosticsListener SPI

A new DiagnosticsListener SPI has been added to jOOQ:
https://github.com/jOOQ/jOOQ/issues/5960

The purpose of this SPI is to sanitise your SQL language, JDBC and jOOQ API
usage. Listeners can listen to events such as:

  • duplicateStatements (similar SQL is executed, bind variables should be used)
  • repeatedStatements (identical SQL is executed, should be batched or rewritten)
  • tooManyColumnsFetched (not all projected columns were needed)
  • tooManyRowsFetched (not all fetched rows were needed)

The great thing about this SPI is that it can be exposed to clients through the
JDBC API, in case of which the diagnostics feature can reverse engineer your
JDBC or even JPA generated SQL. Ever wanted to detect N+1 queries from
Hibernate? Pass those Hibernate-generated queries through this SPI.

Want to find missing bind variables leading to cursor cache contention or SQLi?
Let jOOQ find similar SQL statements and report them. E.g.

  • SELECT name FROM person WHERE id = 1
  • SELECT name FROM person WHERE id = 2

Or also:

  • SELECT name FROM person WHERE id IN (?, ?)
  • SELECT name FROM person WHERE id IN (?, ?, ?)

Anonymous blocks

Many databases support anonymous blocks to run several statements in a single
block scope. For example, Oracle:

DECLARE
  l_var NUMBER(10);
BEGIN
  l_var := 10;
  dbms_output.put_line(l_var);
END;

jOOQ now supports the new org.jooq.Block API to allow for wrapping DDL and DML
statements in such a block. This is a first step towards a future jOOQ providing
support for:

  • Abstractions over procedural languages
  • CREATE PROCEDURE and CREATE FUNCTION statements
  • Trigger support
  • And much more

Parser

jOOQ’s parser support is an ongoing effort. This release has added support for
a lot of new SQL clauses and functions from various vendors and in various DDL
and DML statements.

The parser is now also exposed through a public website and API, where SQL can
be translated from one dialect to another:
https://www.jooq.org/translate

This website will help further drive jOOQ API development by helping to find
missing functionality that is used in real-world SQL.

Another way to access this API is through the new org.jooq.ParserCLI command
line tool. For example, run:

$ java -cp jooq-3.11.0.jar org.jooq.ParserCLI -f -t ORACLE -s "SELECT * FROM (VALUES(1),(2)) AS t(a)"

To get:

select *
from (
  (
    select null a
    from dual
    where 1 = 0
  )
  union all (
    select *
    from (
      (
        select 1
        from dual
      )
      union all (
        select 2
        from dual
      )
    ) t
  )
) t;

Formal Java 10 Support

jOOQ 3.11 is the first release that is formally integration tested with Java 10.
To use jOOQ with Java 10, use the Java 8 distribution which has not yet been
modularised, but contains Automatic-Module-Name specification to be forward
compatible with future, modularised jOOQ distributions.

Additionally, package names between jOOQ, jOOQ-meta, and jOOQ-codegen have been
cleaned up to prevent duplicate package names, and the JAXB dependency has been
added explicitly to the various artefacts.

Other great improvements

  • Finally, asterisks (SELECT * or SELECT t.*) are formally supported in the API.
  • Collations can now be specified on a variety of syntax elements
  • The org.jooq.Comment type has been added, and DDL statements for it
  • The DefaultBinding implementation has been rewritten for better peformance
  • Several performance improvements in jOOQ’s internals
  • Many more DDL statements are supported including GRANT and REVOKE
  • Support for the EXPLAIN statement
  • FETCH n PERCENT ROWS and TOP n PERCENT clauses are supported
  • Better org.jooq.Name and org.jooq.Named API for identifier handling
  • Support for PostgreSQL 10
  • Support for SQL Server 2017
  • Support for DB2 11
  • Upgraded MariaDB support for window functions, inv dist functions, WITH
  • jOOU dependency updated to 0.9.3
  • jOOR dependency updated to 0.9.8
  • Server output (e.g. DBMS_OUTPUT) can now be fetched automatically, by jOOQ
  • Code generation support for PL/SQL TABLE types
  • SQL Keywords Can Now Be Rendered In Pascal Style If You Must
  • Emulate PostgreSQL’s ON CONFLICT clause using MERGE

The complete list can be seen here:
https://www.jooq.org/notes/?version=3.11

Type Safe Implicit JOIN Through Path Navigation in jOOQ 3.11

One of the biggest contributors to SQL syntax verbosity is the need to explicitly JOIN every table that somehow contributes to the query, even if that contribution is “trivial”. When looking at the Sakila database, an example could be seen easily when fetching customer data:

SELECT 
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)  

That single access to the country information cost us 3 additional lines of SQL code as well as the cognitive overhead of mentally navigating through the to-one relationships in order to get the joins right.

This can be equally tedious when writing the SQL as well as when reading it! There is separation of concerns (projection vs joins) where there shouldn’t be in this particular case. We’re just projecting the country, not doing anything with it, let alone care about the individual table / primary key / foreign key names. Imagine if we had composite keys in the path from customer to country…

Implicit JOIN from SELECT clause

Wouldn’t it be much better (in this case) to be able to write:

SELECT 
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

Because after all, that’s really the same thing. We’re fetching only customers, and we load some additional content from its parent table(s). Since we’re navigating to-one relationships only (as opposed to navigating to-many relationships), we don’t really need actual JOIN semantics, a JOIN being a filtered cartesian product.

Implicit JOIN from WHERE clause

The same is true when fetching customers from a particular country. Why write:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)
WHERE co.country = 'Switzerland'

When this would be a lot more natural:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
WHERE cu.address.city.country.country = 'Switzerland'

Implicit JOIN from multiple clauses

Another example would be when grouping by country to find out how many customers per country we have. Standard SQL, explicit JOIN version:

SELECT 
  co.country,
  COUNT(*),
  COUNT(DISTINCT city.city)
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)  
GROUP BY co.country
ORDER BY co.country

Again, the many JOINs could be seen as noise, when the implicit version may seem much leaner:

SELECT 
  cu.address.city.country.country,
  COUNT(*),
  COUNT(DISTINCT cu.address.city.city)
FROM customer AS cu
GROUP BY cu.address.city.country.country
ORDER BY cu.address.city.country.country

Even if the same expression is repeated 3x (and we could easily alias it, of course), the output query would still do only that single JOIN graph that we’ve seen before. In fact, there are two different paths:

  • cu.address.city.*
  • cu.address.city.country.*

Internally, we should recognise that the paths are part of the same tree traversal, so the JOIN graph produced by cu.address.city.* can be re-used for cu.address.city.country.*

In fact, we could actually add one (semi-)explicit JOIN to avoid the repetition:

SELECT 
  ci.country.country,
  COUNT(*),
  COUNT(DISTINCT ci.city)
FROM customer AS cu
IMPLICIT JOIN cu.address.city AS ci
GROUP BY ci.country.country
ORDER BY ci.country.country

Implicit JOIN from correlated subqueries

A more sophisticated case would be an implicit join in a correlated subquery, which should really affect the outer query rather than the subquery. Consider finding all customers and the number of customers from the same country:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    JOIN address USING (address_id)
    JOIN city AS ci2 USING (city_id)
    WHERE ci2.country_id = ci.country_id
  ) AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Now clearly, the JOINs start getting into the way of readability (and writeability as well). There’s a slight risk of getting semantics wrong because of all the aliasing going on. A much leaner solution is:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    WHERE cu2.address.city.country_id =
          cu.address.city.country_id
  ) AS customers_from_same_country
FROM customer AS cu

Now, of course, many of you cringed and were ready to point out that a correlated subquery isn’t the best solution in this case, and you’re absolutely correct. Use window functions, instead!

Implicit JOIN from window functions

Still, you can profit from implicit JOIN again. Plain SQL version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY ci.country_id)
    AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Implicit JOIN version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY cu.address.city.country_id)
    AS customers_from_same_country
FROM customer AS cu

It doesn’t matter where the implicit JOIN appears, i.e. where the path-based parent table access appears. The translation from implicit JOIN syntax to explicit JOIN will always append a JOIN or several JOINs to the left-most child table in the JOIN path, wherever that table is declared. This is a simple matter of scope resolution.

Drawbacks

Technically, there are no drawbacks of the implicit JOIN syntax for to-one relationships compared to the explicit JOIN syntax. But of course, as always with syntax sugar, there’s a slight risk of a developer not fully aware of how things work behind the scenes choosing a less optimal (but visually more elegant) solution over a more performant one.

This could be the case when modelling ANTI JOINs as implicit JOINs with a IS NULL predicate. In some databases, that might still be better, but in most databases, using NOT EXISTS() should be preferred when ANTI JOIN semantics is implemented.

Implicit JOIN for to-many relationship

Having a syntax for navigating to-many relationships is desireable as well, although the implications on semantics are vastly different. While implicit JOINs on to-one relationships have no unexpected effects on the semantics of the query, implicit JOINs on to-many relationships implicitly change the cardinalities of queries they’re contained in. For example:

SELECT
  a.first_name,
  a.last_name,
  a.film.title
FROM actor AS a

When navigating from the ACTOR to the FILM table (via the FILM_ACTOR relationship table), we’re going to duplicate the actor results. It is rather unexpected to have an expression in the SELECT clause to modify the cardinalities of a query, and thus, probably not a good idea. Specifically, there are many cases of implicit JOINs on to-many relationships where the semantics is unclear, ambiguous, or even wrong, because of this change of cardinalities.

For the sake of simplicity, this discussion is out of scope for this article, and for the upcoming jOOQ feature:

jOOQ support for implicit JOIN

Some ORMs like Hibernate, Doctrine, and others have implemented this feature in the past in their own respective query languages, such as HQL, DQL. jOOQ 3.11 follows suit and offers this feature as well through its type safe SQL query API (see https://github.com/jOOQ/jOOQ/issues/1502)

This will be done for the entirety of the SQL language, not just a limited subset, such as HQL or DQL.

All of the above queries can be written in jOOQ as such:

Customer cu = CUSTOMER.as("cu");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      cu.address().city().country().COUNTRY)
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME)
   .from(cu)
   .where(cu.address().city().country().COUNTRY.eq("Switzerland"))
   .fetch();

ctx.select(cu.address().city().country().COUNTRY, count())
   .from(cu)
   .groupBy(cu.address().city().country().COUNTRY)
   .orderBy(cu.address().city().country().COUNTRY)
   .fetch();

Customer cu2 = CUSTOMER.as("cu2");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      field(selectCount()
          .from(cu2)
          .where(cu2.address().city().COUNTRY_ID.eq(
                 cu.address().city().COUNTRY_ID))
      ).as("customers_from_same_country"))
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      count().over(partitionBy(cu.address().city().COUNTRY_ID))
        .as("customers_from_same_country"))
   .from(cu)
   .fetch();

The navigation is completely type safe thanks to jOOQ’s code generator which generates navigational methods from child table to parent table in the presence of foreign keys. By default, the method name matches the parent table name (single foreign key between child and parent) or the foreign key constraint name (multiple foreign keys between child and parent), but as always, this can be overridden easily using generator strategies.

The feature is really extremely powerful. For a much more complex example, see:

Bringing implicit JOIN to actual SQL

A nice jOOQ feature that hasn’t been advertised too often yet is the new jOOQ parser, whose main purpose (so far) is to offer support for the DDLDatabase, a tool that reverse engineers your DDL scripts to generate jOOQ code. The parser will have many other uses in the future, though, including its capability of being exposed behind a JDBC proxy API, which can parse any JDBC based application’s SQL and re-generate it using different settings (e.g. a different dialect).

Of course, the parser (if supplied with schema meta information, see https://github.com/jOOQ/jOOQ/issues/5296) will be able to resolve such path expressions and transform the input SQL string using implicit JOINs to the equivalent output SQL string with natural SQL joins.

This topic is still under research. More information will follow as the scope of this functionality will become more clear.

Availability in jOOQ

jOOQ 3.11 is due for late Q3 2018 / early Q4 2018. You can already play around with this feature by checking out jOOQ from GitHub:
https://github.com/jOOQ/jOOQ

Your feedback is very welcome!