Type Safe Implicit JOIN Through Path Navigation in jOOQ 3.11

One of the biggest contributors to SQL syntax verbosity is the need to explicitly JOIN every table that somehow contributes to the query, even if that contribution is “trivial”. When looking at the Sakila database, an example could be seen easily when fetching customer data:

SELECT 
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)

That single access to the country information cost us 3 additional lines of SQL code as well as the cognitive overhead of mentally navigating through the to-one relationships in order to get the joins right.

This can be equally tedious when writing the SQL as well as when reading it! There is separation of concerns (projection vs joins) where there shouldn’t be in this particular case. We’re just projecting the country, not doing anything with it, let alone care about the individual table / primary key / foreign key names. Imagine if we had composite keys in the path from customer to country…

Implicit JOIN from SELECT clause

Wouldn’t it be much better (in this case) to be able to write:

SELECT 
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

Because after all, that’s really the same thing. We’re fetching only customers, and we load some additional content from its parent table(s). Since we’re navigating to-one relationships only (as opposed to navigating to-many relationships), we don’t really need actual JOIN semantics, a JOIN being a filtered cartesian product.

Implicit JOIN from WHERE clause

The same is true when fetching customers from a particular country. Why write:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)
WHERE co.country = 'Switzerland'

When this would be a lot more natural:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
WHERE cu.address.city.country.country = 'Switzerland'

Implicit JOIN from multiple clauses

Another example would be when grouping by country to find out how many customers per country we have. Standard SQL, explicit JOIN version:

SELECT 
  co.country,
  COUNT(*),
  COUNT(DISTINCT city.city)
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)  
GROUP BY co.country
ORDER BY co.country

Again, the many JOINs could be seen as noise, when the implicit version may seem much leaner:

SELECT 
  cu.address.city.country.country,
  COUNT(*),
  COUNT(DISTINCT cu.address.city.city)
FROM customer AS cu
GROUP BY cu.address.city.country.country
ORDER BY cu.address.city.country.country

Even if the same expression is repeated 3x (and we could easily alias it, of course), the output query would still do only that single JOIN graph that we’ve seen before. In fact, there are two different paths:

  • cu.address.city.*
  • cu.address.city.country.*

Internally, we should recognise that the paths are part of the same tree traversal, so the JOIN graph produced by cu.address.city.* can be re-used for cu.address.city.country.*

In fact, we could actually add one (semi-)explicit JOIN to avoid the repetition:

SELECT 
  ci.country.country,
  COUNT(*),
  COUNT(DISTINCT ci.city)
FROM customer AS cu
IMPLICIT JOIN cu.address.city AS ci
GROUP BY ci.country.country
ORDER BY ci.country.country

Implicit JOIN from correlated subqueries

A more sophisticated case would be an implicit join in a correlated subquery, which should really affect the outer query rather than the subquery. Consider finding all customers and the number of customers from the same country:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    JOIN address USING (address_id)
    JOIN city AS ci2 USING (city_id)
    WHERE ci2.country_id = ci.country_id
  ) AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Now clearly, the JOINs start getting into the way of readability (and writeability as well). There’s a slight risk of getting semantics wrong because of all the aliasing going on. A much leaner solution is:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    WHERE cu2.address.city.country_id =
          cu.address.city.country_id
  ) AS customers_from_same_country
FROM customer AS cu

Now, of course, many of you cringed and were ready to point out that a correlated subquery isn’t the best solution in this case, and you’re absolutely correct. Use window functions, instead!

Implicit JOIN from window functions

Still, you can profit from implicit JOIN again. Plain SQL version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY ci.country_id)
    AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Implicit JOIN version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY cu.address.city.country_id)
    AS customers_from_same_country
FROM customer AS cu

It doesn’t matter where the implicit JOIN appears, i.e. where the path-based parent table access appears. The translation from implicit JOIN syntax to explicit JOIN will always append a JOIN or several JOINs to the left-most child table in the JOIN path, wherever that table is declared. This is a simple matter of scope resolution.

Drawbacks

Technically, there are no drawbacks of the implicit JOIN syntax for to-one relationships compared to the explicit JOIN syntax. But of course, as always with syntax sugar, there’s a slight risk of a developer not fully aware of how things work behind the scenes choosing a less optimal (but visually more elegant) solution over a more performant one.

This could be the case when modelling ANTI JOINs as implicit JOINs with a IS NULL predicate. In some databases, that might still be better, but in most databases, using NOT EXISTS() should be preferred when ANTI JOIN semantics is implemented.

Implicit JOIN for to-many relationship

Having a syntax for navigating to-many relationships is desireable as well, although the implications on semantics are vastly different. While implicit JOINs on to-one relationships have no unexpected effects on the semantics of the query, implicit JOINs on to-many relationships implicitly change the cardinalities of queries they’re contained in. For example:

SELECT
  a.first_name,
  a.last_name,
  a.film.title
FROM actor AS a

When navigating from the ACTOR to the FILM table (via the FILM_ACTOR relationship table), we’re going to duplicate the actor results. It is rather unexpected to have an expression in the SELECT clause to modify the cardinalities of a query, and thus, probably not a good idea. Specifically, there are many cases of implicit JOINs on to-many relationships where the semantics is unclear, ambiguous, or even wrong, because of this change of cardinalities.

For the sake of simplicity, this discussion is out of scope for this article, and for the upcoming jOOQ feature:

jOOQ support for implicit JOIN

Some ORMs like Hibernate, Doctrine, and others have implemented this feature in the past in their own respective query languages, such as HQL, DQL. jOOQ 3.11 follows suit and offers this feature as well through its type safe SQL query API (see https://github.com/jOOQ/jOOQ/issues/1502)

This will be done for the entirety of the SQL language, not just a limited subset, such as HQL or DQL.

All of the above queries can be written in jOOQ as such:

Customer cu = CUSTOMER.as("cu");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      cu.address().city().country().COUNTRY)
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME)
   .from(cu)
   .where(cu.address().city().country().COUNTRY.eq("Switzerland"))
   .fetch();

ctx.select(cu.address().city().country().COUNTRY, count())
   .from(cu)
   .groupBy(cu.address().city().country().COUNTRY)
   .orderBy(cu.address().city().country().COUNTRY)
   .fetch();

Customer cu2 = CUSTOMER.as("cu2");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      field(selectCount()
          .from(cu2)
          .where(cu2.address().city().COUNTRY_ID.eq(
                 cu.address().city().COUNTRY_ID))
      ).as("customers_from_same_country"))
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      count().over(partitionBy(cu.address().city().COUNTRY_ID))
        .as("customers_from_same_country"))
   .from(cu)
   .fetch();

The navigation is completely type safe thanks to jOOQ’s code generator which generates navigational methods from child table to parent table in the presence of foreign keys. By default, the method name matches the parent table name (single foreign key between child and parent) or the foreign key constraint name (multiple foreign keys between child and parent), but as always, this can be overridden easily using generator strategies.

The feature is really extremely powerful. For a much more complex example, see:

Bringing implicit JOIN to actual SQL

A nice jOOQ feature that hasn’t been advertised too often yet is the new jOOQ parser, whose main purpose (so far) is to offer support for the DDLDatabase, a tool that reverse engineers your DDL scripts to generate jOOQ code. The parser will have many other uses in the future, though, including its capability of being exposed behind a JDBC proxy API, which can parse any JDBC based application’s SQL and re-generate it using different settings (e.g. a different dialect).

Of course, the parser (if supplied with schema meta information, see https://github.com/jOOQ/jOOQ/issues/5296) will be able to resolve such path expressions and transform the input SQL string using implicit JOINs to the equivalent output SQL string with natural SQL joins.

This topic is still under research. More information will follow as the scope of this functionality will become more clear.

Availability in jOOQ

jOOQ 3.11 is due for late Q3 2018 / early Q4 2018. You can already play around with this feature by checking out jOOQ from GitHub:
https://github.com/jOOQ/jOOQ

Your feedback is very welcome!

7 thoughts on “Type Safe Implicit JOIN Through Path Navigation in jOOQ 3.11

  1. That is definitely a fucking GREAT feature! It will help a lot when writing SQL queries in Java!

    Thanks!

  2. First, this is so great I dream it becomes part of sql standard some day

    Second, both my personal experience and this particular post shows that pretty much every time you write GROUP BY x, you will write ORDER BY x.

    I wish we could write GROUP ORDER BY x in sql…

    1. First, this is so great I dream it becomes part of sql standard some day

      Same here. The difficulty is “only” with the naming of the path elements. As shown in this article, non-ambiguous naming will depend on foreign key names, although target table names are more desireable. We’d only need to be able to bi-directionally label foreign keys in the key specification, and we would be done.

      Second, both my personal experience and this particular post shows that pretty much every time you write GROUP BY x, you will write ORDER BY x.

      I wish we could write GROUP ORDER BY x in sql…

      You could try your luck and hope for grouping to be implemented using a sort. In some cases, you get free ordering ;-)

      Indeed, there are many possible improvements to the SQL language. In my opinion, what you’d really want is implicit GROUP BY as discussed here: https://blog.jooq.org/2015/05/05/how-sql-group-by-should-have-been-designed-like-neo4js-implicit-group-by/

      Something that might come to a future jOOQ version as well: https://github.com/jOOQ/jOOQ/issues/4596

  3. What is actual mechanism by which JOOQ finds related tables? Is it foreign keys meta info? Convention of naming fk’s. Separate configs?

    Can it be extended to allow column name as parameter of table for cases where there are multiple connections?

    Why is implicit join needed for occurrence of multiple paths with same “prefix”? If it’s builder pattern, can’t it be analyzed before execution of query? I’m a bit confused since, “SQL-like” pseudo code was provided by not actual code example.

    Is there an intention for future support of OUTER joins? Those do require ON clause (for special cases). (Trivia: until OUTER joins where added to SQL specs, there was no ON clause, all predicates where placed in WHERE)

    1. What is actual mechanism by which JOOQ finds related tables? Is it foreign keys meta info? Convention of naming fk’s. Separate configs?

      The code generator generates all foreign key information. By default, it will generate the parent table name as the method name if there is only one FK between a given child and parent table, or the foreign key name otherwise.

      Can it be extended to allow column name as parameter of table for cases where there are multiple connections?

      You can override the code generator in any way you want by specifying a generator strategy:
      https://www.jooq.org/doc/latest/manual/code-generation/codegen-matcherstrategy

      That way, the generated method name can be anything you want.

      Why is implicit join needed for occurrence of multiple paths with same “prefix”? If it’s builder pattern, can’t it be analyzed before execution of query?

      I don’t understand this question, sorry.

      I’m a bit confused since, “SQL-like” pseudo code was provided by not actual code example.

      There’s actual jOOQ code at the end of the article.

      Is there an intention for future support of OUTER joins? Those do require ON clause (for special cases).

      I’m not sure what you mean by this. jOOQ supports all types of explicit JOIN. Just write TABLE_A.leftJoin(TABLE_B).on(...). When “implicit joining” (new jOOQ 3.11 feature), you don’t have to specify the ON clause, because that is added implicitly from FK meta information.

      The article keeps it simple. Behind the scenes, OUTER JOINs are generated, not INNER JOINs. The rationale can be seen here in my criticism of Hibernate’s implicit join feature:
      http://lists.jboss.org/pipermail/hibernate-dev/2018-February/017299.html

      (Trivia: until OUTER joins where added to SQL specs, there was no ON clause, all predicates where placed in WHERE)

      Thanks. Believe it or not, I know ;-)

Leave a Reply