Using jOOQ 3.14 Synthetic Foreign Keys to Write Implicit Joins on Views

jOOQ has supported one of JPQL’s most cool features for a while now: implicit joins. Using jOOQ, you can navigate your to-one relationships in a type safe way, generating LEFT JOIN operations implicitly without the effort of having to keep thinking about join predicates, and the correct join order. Consider this Sakila database query here, if SQL supported implicit joins natively:

SELECT
cu.first_name,
cu.last_name
FROM customer AS cu
WHERE cu.address.city.country.country = 'Switzerland'

It translates to this query in native SQL:

SELECT
cu.first_name,
cu.last_name
FROM customer AS cu
JOIN address AS ad ON cu.address_id = ad.address_id
JOIN city AS ci ON a.city_id = ci.city_id
JOIN country AS co ON ci.country_id = co.country_id
WHERE co.country = 'Switzerland'

Note: Inner joins are supported starting from jOOQ 3.14, depending on whether the foreign key is mandatory / not null. The default behaviour is to produce LEFT JOIN which are the correct way to implicitly join optional foreign keys.

Implicit joins aren’t a silver bullet. Not every JOIN graph can be completely transformed into implicit join usage, and not every implicit join usage is more readable than native SQL JOIN graphs. But to have this option is great. Especially, when your keys are composite keys.

Classic joins on views

In classic relational design, surrogate keys are often avoided, and I think we should still avoid them in many cases. Even if you don’t agree, you may occasionally work on a schema where there are few to no surrogate keys. One such example is the standard SQL INFORMATION_SCHEMA which is implemented, for example, in H2, HSQLDB, MariaDB, MySQL, PostgreSQL, or SQL Server.

For example, when querying HSQLDB’s DOMAIN_CONSTRAINTS view to reverse engineer DOMAIN types. The jOOQ query for that used to be:

Domains d = DOMAINS.as("d");
DomainConstraints dc = DOMAIN_CONSTRAINTS.as("dc");
CheckConstraints cc = CHECK_CONSTRAINTS.as("cc");

for (Record record : create()
.select(
d.DOMAIN_SCHEMA,
d.DOMAIN_NAME,
d.DATA_TYPE,
d.CHARACTER_MAXIMUM_LENGTH,
d.NUMERIC_PRECISION,
d.NUMERIC_SCALE,
d.DOMAIN_DEFAULT,
cc.CHECK_CLAUSE)
.from(d)
.join(dc)
.on(row(d.DOMAIN_CATALOG, d.DOMAIN_SCHEMA, d.DOMAIN_NAME)
.eq(dc.DOMAIN_CATALOG, dc.DOMAIN_SCHEMA, dc.DOMAIN_NAME))
.join(cc)
.on(row(dc.CONSTRAINT_CATALOG,
dc.CONSTRAINT_SCHEMA,
dc.CONSTRAINT_NAME)
.eq(cc.CONSTRAINT_CATALOG,
cc.CONSTRAINT_SCHEMA,
cc.CONSTRAINT_NAME))
.where(d.DOMAIN_SCHEMA.in(getInputSchemata()))
.orderBy(d.DOMAIN_SCHEMA, d.DOMAIN_NAME)
) { ... }

So, the query joined the many-to-many relationship between DOMAINS - DOMAIN_CONSTRAINTS - CHECK_CONSTRAINTS to get all the information required for generating domain types.

These views are not updatable, nor do they have any constraint information associated with them, but what if we were able to define synthetic constraints? jOOQ has supported synthetic primary keys to help make views updatable.

Synthetic foreign keys

Starting with jOOQ 3.14, we’ve reworked the way synthetic keys work, and the commercial editions will support synthetic foreign keys as well.

You can specify a configuration like this:

<configuration>
<generator>
<database>
<syntheticObjects>
<primaryKeys>
<primaryKey>
<tables>
CHECK_CONSTRAINTS|CONSTRAINTS|TABLE_CONSTRAINTS
</tables>
<fields>
<field>CONSTRAINT_(CATALOG|SCHEMA|NAME)</field>
</fields>
</primaryKey>
<primaryKey>
<tables>DOMAINS</tables>
<fields>
<field>DOMAIN_(CATALOG|SCHEMA|NAME)</field>
</fields>
</primaryKey>
</primaryKeys>
<foreignKeys>
<foreignKey>
<tables>DOMAIN_CONSTRAINTS</tables>
<fields>
<field>CONSTRAINT_(CATALOG|SCHEMA|NAME)</field>
</fields>
<referencedTable>CHECK_CONSTRAINTS</referencedTable>
</foreignKey>
<foreignKey>
<tables>DOMAIN_CONSTRAINTS</tables>
<fields>
<field>DOMAIN_(CATALOG|SCHEMA|NAME)</field>
</fields>
<referencedTable>DOMAINS</referencedTable>
</foreignKey>
</foreignKeys>
</syntheticObjects>
</database>
</generator>
</configuration>

And already jOOQ’s code generator will think that these views were tables that actually had constraints like the below:

ALTER TABLE CHECK_CONSTRAINTS
ADD PRIMARY KEY (
CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
);

ALTER TABLE DOMAINS
ADD PRIMARY KEY (
DOMAIN_CATALOG, DOMAIN_SCHEMA, DOMAIN_NAME
);

ALTER TABLE DOMAIN_CONSTRAINTS
ADD FOREIGN KEY (
CONSTRAINT_CATALOG, CONSTRAINT_SCHEMA, CONSTRAINT_NAME
)
REFERENCES CHECK_CONSTRAINTS;

ALTER TABLE DOMAIN_CONSTRAINTS
ADD FOREIGN KEY (
DOMAIN_CATALOG, DOMAIN_SCHEMA, DOMAIN_NAME
)
REFERENCES DOMAINS;

More sophisticated configuration is possible, e.g. to assign names to constraints, to have composite constraints using field ordering differing from the ordering in the table, or foreign keys referencing unique keys rather than primary keys. For a full description of what’s available, please refer to the manual.

With the above synthetic meta data available to the code generator, all the numerous goodies are now available on views as well, including:

Let’s look at

Implicit joins

Implicit joins are now also possible on these views, meaning:

  1. You’ll never have to remember including all the key columns in join predicates anymore (bye bye accidental cartesian products)
  2. Your code will still be correct in case your composite key changes to something else!

So, this is more than just merely a convenience thing, it’s also a correctness thing. Our query from before can now be written like this, in a much more concise way:

DomainConstraints dc = DOMAIN_CONSTRAINTS.as("dc");

for (Record record : create()
.select(
dc.domains().DOMAIN_SCHEMA,
dc.domains().DOMAIN_NAME,
dc.domains().DATA_TYPE,
dc.domains().CHARACTER_MAXIMUM_LENGTH,
dc.domains().NUMERIC_PRECISION,
dc.domains().NUMERIC_SCALE,
dc.domains().DOMAIN_DEFAULT,
dc.checkConstraints().CHECK_CLAUSE)
.from(dc)
.where(dc.domains().DOMAIN_SCHEMA.in(getInputSchemata()))
.orderBy(dc.domains().DOMAIN_SCHEMA, dc.domains().DOMAIN_NAME)
) { ... }

Notice how we’re using the relationship table as the only table to put in the FROM clause. This way, we can navigate in both directions of the to-one relationships from DOMAIN_CONSTRAINTS -> DOMAINS and DOMAIN_CONSTRAINTS -> CHECK_CONSTRAINTS. The resulting SQL query is equivalent to the previous one, but all the nastiness of joining by 3-column-composite-keys is gone. I personally find this much more readable.

Future work

So far, only to-one relationships can be navigated this way. JPQL also offers navigating to-many relationships with a few restrictions. This is a slippery slope. When offering to-many relationships, some use-cases are obvious, but the semantics of others is less so. For example, it is not a good idea to let the SELECT clause produce more (or less) rows depending on the presence of a projected column. This is why jOOQ so far produced only LEFT JOIN for implicit joins, because that would guarantee that an implicitly joined column does not reduce the number of rows because of an INNER JOIN not producing any matches.

Nevertheless, there is a lot that we can still add in #7536, including:

  • Implicit to-many joins in the FROM clause, where they don’t cause any trouble
  • Implicit joins in DML
  • Parser support to offer this functionality also in https://www.jooq.org/translate and to everyone working with jOOQ SQL through jOOQ’s ParsingConnection

And much more!

Serious SQL: A “convex hull” of “correlated tables”

Now THIS is an interesting, and challenging question on the jOOQ user group:
https://groups.google.com/d/topic/jooq-user/6TBBLYt9eR8/discussion

Say you have a big database with lots of tables and foreign key references. Now you would like to know all tables that are somehow inter-connected by their respective foreign key relationship “paths”. You could call this a “convex hull” around all of your “correlated tables”. Here’s a pseudo-algorithm to achieve this:

// Initialise the hull with an "origin" table
Set tables = {"any table"};
int size = 0;

// Grow the "tables" result until no new tables are added
while (size < tables.size) {
  size = tables.size;

  for (table in tables) {
    tables.addAll(table.referencedTables);
    tables.addAll(table.referencingTables);
  }
}

At the end of this algorithm, you would have all tables in the “tables” set, that are somehow connected with the original “any table”.

Calculate this with jOOQ

With jOOQ’s generated classes, you can easily implement the above algorithm in Java. This would be an example implementation

public class Hull {
  public static Set<Table<?>> hull(Table<?>... tables) {
    Set<Table<?>> result =
        new HashSet<Table<?>>(Arrays.asList(tables));

    // Loop as long as there are no new result tables
    int size = 0;
    while (result.size() > size) {
      size = result.size();

      for (Table<?> table : new ArrayList<Table<?>>(result)) {

        // Follow all outbound foreign keys
        for (ForeignKey<?, ?> fk : table.getReferences()) {
          result.add(fk.getKey().getTable());
        }

        // Follow all inbound foreign keys from tables
        // within the same schema
        for (Table<?> other : table.getSchema().getTables()) {
          if (other.getReferencesTo(table).size() > 0) {
            result.add(other);
          }
        }
      }
    }

    return result;
  }

  public static void main(String[] args) {
    // Calculate the "convex hull" for the T_AUTHOR table
    System.out.println(hull(T_AUTHOR));
  }
}

Do it with SQL

Now this still looks straightforward. But we’re SQL pro’s and we love weird queries, so let’s give Oracle SQL a shot at resolving this problem in a single SQL statement. Here goes (warning, some serious SQL ahead)!

-- "graph" denotes an undirected foreign key reference graph
-- for schema "TEST"
with graph as (
  select c1.table_name t1, c2.table_name t2
  from all_constraints c1
    join all_constraints c2
      on c1.owner = c2.r_owner
      and c1.constraint_name = c2.r_constraint_name
  where c1.owner = 'TEST'
  union all
  select c2.table_name t1, c1.table_name t2
  from all_constraints c1
    join all_constraints c2
      on c1.owner = c2.r_owner
      and c1.constraint_name = c2.r_constraint_name
  where c1.owner = 'TEST'
),
-- "paths" are all directed paths within that schema
-- as a #-delimited string
paths as (
  select sys_connect_by_path(t1, '#') || '#' path
  from graph
  connect by nocycle prior t1 = t2
),
-- "subgraph" are all those directed paths that go trough
-- a given table T_AUTHOR
subgraph as (
  select distinct t.table_name,
    regexp_replace(p.path, '^#(.*)#$', '\1') path
  from paths p
  cross join all_tables t
  where t.owner = 'TEST'
  and p.path like '%#' || t.table_name || '#%'
),
-- This XML-trick splits paths and generates rows for every distinct
-- table name
split_paths as (
select distinct table_name origin,
  cast(t.column_value.extract('//text()') as varchar2(4000)) table_names
from
  subgraph,
  table(xmlsequence(xmltype(
      '<x><x>' || replace(path, '#', '</x><x>') ||
'</x></x>').extract('//x/*'))) t
),
-- "table_graphs" lists every table and its associated graph
table_graphs as (
  select
    origin,
    count(*) graph_size,
    listagg(table_names, ', ') within group (order by 1) table_names
  from split_paths
  group by origin
)
select
  origin,
  graph_size "SIZE",
  dense_rank() over (order by table_names) id,
  table_names
from table_graphs
order by origin

When run against the jOOQ integration test database, this beautiful query will return:

+----------------------+------+----+-----------------------------------------+
| ORIGIN               | SIZE | ID | TABLE_NAMES                             |
+----------------------+------+----+-----------------------------------------+
| T_658_11             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_12             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_21             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_22             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_31             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_32             |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_658_REF            |    7 |  3 | T_658_11, T_658_12, T_658_21, T_658_22, |
|                      |      |    | T_658_31, T_658_32, T_658_REF           |
| T_AUTHOR             |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| T_BOOK               |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| T_BOOK_DETAILS       |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| T_BOOK_STORE         |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| T_BOOK_TO_BOOK_STORE |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| T_DIRECTORY          |    1 |  2 | T_DIRECTORY                             |
| T_LANGUAGE           |    7 |  1 | T_AUTHOR, T_BOOK, T_BOOK_DETAILS,       |
|                      |      |    | T_BOOK_SALE, T_BOOK_STORE,              |
|                      |      |    | T_BOOK_TO_BOOK_STORE, T_LANGUAGE        |
| X_TEST_CASE_64_69    |    4 |  4 | X_TEST_CASE_64_69, X_TEST_CASE_71,      |
|                      |      |    | X_TEST_CASE_85, X_UNUSED                |
| X_TEST_CASE_71       |    4 |  4 | X_TEST_CASE_64_69, X_TEST_CASE_71,      |
|                      |      |    | X_TEST_CASE_85, X_UNUSED                |
| X_TEST_CASE_85       |    4 |  4 | X_TEST_CASE_64_69, X_TEST_CASE_71,      |
|                      |      |    | X_TEST_CASE_85, X_UNUSED                |
| X_UNUSED             |    4 |  4 | X_TEST_CASE_64_69, X_TEST_CASE_71,      |
|                      |      |    | X_TEST_CASE_85, X_UNUSED                |
+----------------------+------+----+-----------------------------------------+

Can you beat this? :-)

I challenge you to write a shorter query and to achieve the same result! Here’s the integration test database:
https://github.com/jOOQ/jOOQ/blob/master/jOOQ-test/src/org/jooq/test/oracle/create.sql

Note that the above query is horribly inefficient. There’s a lot of potential in beating that, too!