Automatically Transform Oracle Style Implicit Joins to ANSI JOIN using jOOQ

While jOOQ is mostly being used as an internal SQL DSL for embedded, dynamic SQL in Java, where it offers the best solution on the market, jOOQ is increasingly also used for one of its secondary features: Its parser.

Having been introduced in jOOQ 3.9 primarly for the purpose of being able to parse DDL statements, e.g. to reverse engineer your schema for code generation purposes, we’ve added an increasing amount of features and SQL transformation capabilities that allow for the parser to be used as a standalone product, through the command line interface, our website, or through the ordinary jOOQ API.

One feature that has been added to jOOQ 3.14, which is mostly useful to those using jOOQ as a parser, is the capability of transforming old Oracle style implicit joins to ANSI JOIN.

Why avoid “implicit joins”?

The old Oracle style implicit join syntax is supported and properly optimised by most RDBMS vendors. In the past, prior to SQL-92, this is how we used to inner join tables, e.g. when querying the Sakila database:

SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id

Granted, the syntax is kind of intuitive. Just declare all the tables you want to fetch data from, and then make sure the proper data is retained only by filtering for matching primary key / foreign key values.

Of course, this can go terribly wrong. For the many obvious reasons, e.g. when you forget a join predicate after adding a table. If the query is complex, this may be hard to debug. The solution is ANSI JOIN. Starting from SQL-92 (almost 30 years now!), this is how we join in most RDBMS:

SELECT *
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id

While it’s still possible to define wrong join predicates, at least it’s no longer possible to forget a predicate, because this is syntactically incorrect (except for MySQL, where, regrettably, the ON clause is optional):

SELECT *
FROM actor a
JOIN film_actor fa -- Syntax error
JOIN film f -- Syntax error

jOOQ’s implicit JOIN

Notice that it is common to refer to the above syntax as “implicit join”, while JPQL and jOOQ recycled the term for another kind of “implicit join”, which is foreign key path based, and even less error prone than the ANSI SQL syntax. With jOOQ, the above query can be written as follows:

ctx.select(
      FILM_ACTOR.actor().asterisk(), 
      FILM_ACTOR.asterisk(),
      FILM_ACTOR.film().asterisk())
   .from(FILM_ACTOR)
   .fetch();

The mere presence of these to-one relationship paths in the query will implicitly add the appropriate LEFT JOIN or INNER JOIN to the FROM clause. This is merely convenience on top of ordinary ANSI JOINs, not a replacement.

Transforming Oracle implicit joins

When you have an old code base that you wish to upgrade and transform all your queries to using ANSI JOIN, use jOOQ for that. You can use jOOQ’s programmatic capabilities (as mentioned before), or the free website https://www.jooq.org/translate.

On the website, just pick the “Oracle style to ANSI JOIN” option, place the following SQL on the left:

Input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  CROSS JOIN (
    film_category fc
      JOIN category c
        ON fc.category_id = c.category_id
  )
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

And… whoopsies. The output correctly displays the resulting, undesired CROSS JOIN because one of the join predicates was missing:

Yep, the tool already helped! Let’s fix the input query:

Fixed input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND f.film_id = fc.film_id -- This was missing
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  JOIN film_category fc
    ON f.film_id = fc.film_id
  JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

This also works if you were using Oracle’s arcane outer join syntax using (+) (or SQL Server’s *=, which has been unsupported for a while now). You might have this input:

Input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id(+) = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Producing this output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN (
    film_category fc
      RIGHT OUTER JOIN category c
        ON fc.category_id = c.category_id
  )
    ON f.film_id = fc.film_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Err, wat. Whoopsies again! One of the (+) symbols was on the wrong side, which is why we got that RIGHT OUTER JOIN. Again, the tool has shown that the old syntax was quite error prone. Let’s fix it.

Fixed input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id = c.category_id(+)
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN film_category fc
    ON f.film_id = fc.film_id
  LEFT OUTER JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Conclusion

Play around with it and tell us what you think! https://www.jooq.org/translate

4 thoughts on “Automatically Transform Oracle Style Implicit Joins to ANSI JOIN using jOOQ

  1. As always superb content. I found two cases that you may find interesting:

    Case 1:

    SELECT *
    FROM t
    LEFT JOIN t_ch
      ON t.id = t_ch.t_id
    WHERE t.id < 2;
    
    SELECT *
    FROM t
    LEFT JOIN t_ch
      ON t.id = t_ch.t_id
      AND t.id < 2;
    

    It produces the same output in Oracle style, where we have 2 different input queries.

    Case 2: Order of condition changes matters(I hoped it will eliminate unnecessary LEFT JOIN from both queries

    select *
    from T, T_CH
    where (T.ID = T_CH.T_ID(+) AND T_CH.ID = 1)
    union all
    select *
    from T, T_CH
    where (T_CH.ID = 1 AND T.ID = T_CH.T_ID(+))
    

    are transformed into :

    select *
    from T
    join T_CH
    on T.ID = T_CH.T_ID
    where T_CH.ID = 1
    union all
    select *
    from T
    left outer join T_CH
    on T.ID = T_CH.T_ID
    where T_CH.ID = 1
    

    Live Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=126d3ae3492cc455787fb6a9cd71973b

    1. Thanks a lot for your comments. Nice – I was aware of the first issue (a “known” issue) but didn’t give it much thought. I’ve created a bug report for this: https://github.com/jOOQ/jOOQ/issues/10982. The solution is quite cumbersome, e.g.

      -- Project and rename all columns back to their original names
      SELECT t_id id, t_ch_id id
      FROM (
        SELECT 
          -- Project and rename all other t columns
          t.id t_id,
      
          -- Project and rename all other t_ch columns
          CASE WHEN t.id < 2 THEN t_ch.id END t_ch_id
        FROM t, t_ch
        WHERE t.id = t_ch.t_id(+)
      ) t
      -- Continue with all other clauses, such as GROUP BY, etc. referencing the renamed columns rather than the original ones
      

      Probably not worth it, in this case.

      Your second case is clearly a bug: https://github.com/jOOQ/jOOQ/issues/10983. There is an optimisation that turns outer joins back to inner joins if the outer join doesn’t make any sense, as in your example. The optimisation is applied eagerly instead of after having the full picture. Perhaps, it’s not worth applying the optimisation at all, because the RDBMS will apply it anyway?

      1. I agree, the second case will be handled by RDBMS(which we could observe in execution plan) so it is not such big deal.

        My initial idea was to overuse translator a bit :)
        Starting with query LEFT OUTER JOIN ... WHERE ... -> transform to Oracle syntax and back to get INNER JOIN ... WHERE .... This will quickly expose that developer used “impossible” condition for outer query.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.