Using UNPIVOT to Traverse a Configuration Table’s Rows and Columns

Imagine you have a configuration table like the following:

CREATE TABLE rule (
  name     VARCHAR2(50)         NOT NULL PRIMARY KEY,
  enabled  NUMBER(1)  DEFAULT 1 NOT NULL CHECK (enabled IN (0,1)),
  priority NUMBER(10) DEFAULT 0 NOT NULL,
  flag1    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag2    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag3    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag4    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag5    NUMBER(3)  DEFAULT 0 NOT NULL
);

It specifies a set of rules that

  • Can be enabled / disabled
  • Can be given a priority among themselves
  • Include a set of flags which correspond to the thing you want to configure (e.g. some check to execute)
  • Those flags can be ordered as well

So, given the following data:

INSERT INTO rule (name, priority, flag1, flag5) 
  VALUES ('RULE 1', 1, 1, 2);
INSERT INTO rule (name, priority, flag2, flag5) 
  VALUES ('RULE 2', 2, 2, 1);
INSERT INTO rule (name, priority, flag3, flag4, flag5) 
  VALUES ('RULE 3', 3, 3, 1, 2);
INSERT INTO rule (name, priority, flag3) 
  VALUES ('RULE 4', 4, 1);

SELECT * FROM rule;

We’ll get our configuration “spreadsheet”:

NAME    ENABLED  PRIORITY  FLAG1  FLAG2  FLAG3  FLAG4  FLAG5
------------------------------------------------------------
RULE 1  1        1         1      0      0      0      2
RULE 2  1        2         0      2      0      0      1
RULE 3  1        3         0      0      3      1      2
RULE 4  1        4         0      0      1      0      0

This form is really useful to edit the configuration. If we want to activate FLAG2 in RULE 1, we just go to that cell in some SQL tool like Oracle SQL Developer, and change the value.

But reading the configuration is a bit different. FLAG1 through FLAG5 are not nicely normalised. How to read the data as though it were normalised?

Using UNPIVOT

In Oracle and SQL Server, we can use UNPIVOT for this use case. I’m using Oracle syntax in this blog post. SQL Server’s is just slightly different. Consider the following query:

SELECT name, flag, value
FROM rule
UNPIVOT (
  value FOR flag IN (
    flag1,  
    flag2,  
    flag3,  
    flag4,  
    flag5
  )
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

This will result in the following result set:

NAME    FLAG    VALUE
---------------------
RULE 1  FLAG1   1
RULE 1  FLAG5   2
RULE 2  FLAG5   1
RULE 2  FLAG2   2
RULE 3  FLAG4   1
RULE 3  FLAG5   2
RULE 3  FLAG3   3
RULE 4  FLAG3   1

In this representation, the rules are ordered by priority, and the flags are ordered by their respective value within a rule. The flags that are not turned on (value 0) are simply omitted. This form is much easier to traverse procedurally, when “consuming” the configuration.

How does it work?

In principle, UNPIVOT is just syntax sugar for a bunch of UNION ALL subqueries. We could have written our query like this, instead:

SELECT name, flag, value
FROM (
  SELECT rule.*, 'FLAG1' AS flag, FLAG1 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG2' AS flag, FLAG2 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG3' AS flag, FLAG3 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG4' AS flag, FLAG4 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG5' AS flag, FLAG5 AS value FROM rule
) rule
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

Which is decidedly more code. It’s also more work for the database. The execution plans are different (I’m using Oracle 12.2.0.1.0):

UNPIVOT version – single table access

---------------------------------------------
| Id  | Operation            | Name | Rows  |
---------------------------------------------
|   0 | SELECT STATEMENT     |      |       |
|   1 |  SORT ORDER BY       |      |     5 |
|*  2 |   VIEW               |      |     5 |
|   3 |    UNPIVOT           |      |       |
|*  4 |     TABLE ACCESS FULL| RULE |     1 |
---------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(("unpivot_view_005"."VALUE">0 AND 
              "unpivot_view_005"."ENABLED"=1))
   4 - filter("RULE"."ENABLED"=1)

UNION ALL version – multi table access

---------------------------------------------
| Id  | Operation            | Name | Rows  |
---------------------------------------------
|   0 | SELECT STATEMENT     |      |       |
|   1 |  SORT ORDER BY       |      |     8 |
|   2 |   VIEW               |      |     8 |
|   3 |    UNION-ALL         |      |       |
|*  4 |     TABLE ACCESS FULL| RULE |     1 |
|*  5 |     TABLE ACCESS FULL| RULE |     1 |
|*  6 |     TABLE ACCESS FULL| RULE |     2 |
|*  7 |     TABLE ACCESS FULL| RULE |     1 |
|*  8 |     TABLE ACCESS FULL| RULE |     3 |
---------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(("RULE"."ENABLED"=1 AND "FLAG1">0))
   5 - filter(("RULE"."ENABLED"=1 AND "FLAG2">0))
   6 - filter(("RULE"."ENABLED"=1 AND "FLAG3">0))
   7 - filter(("RULE"."ENABLED"=1 AND "FLAG4">0))
   8 - filter(("RULE"."ENABLED"=1 AND "FLAG5">0))

We can also measure the time it takes to execute these queries thousands of times. The following shows resulting times relative to the fastest execution (1):

Run 1, Statement 1 : 1.155
Run 1, Statement 2 : 1.88056

Run 2, Statement 1 : 1.04333
Run 2, Statement 2 : 1.95148

Run 3, Statement 1 : 1.02185
Run 3, Statement 2 : 1.86074

Run 4, Statement 1 : 1
Run 4, Statement 2 : 1.85241

Run 5, Statement 1 : 1.0263
Run 5, Statement 2 : 1.82944

The UNION ALL version is consistently about 2x slower on this very small data set. This is significant in the use case presented here, as a configuration table is probably read many times per day.

Knowing when a rule starts and when it ends

The real world use case that is behind this blog post also needed to know when a rule started and when it ended. I.e., which flag entry was the first and which was the last of the rule. This was easy in the non-normalised representation where each rule was a single row.

In the normalised version, we can use LEAD() and LAG().

Using this query:

SELECT 
  CASE WHEN lag(name, 1, 'NULL') 
            OVER (ORDER BY priority, value) != name 
       THEN 1 ELSE 0 END rule_begin,
  CASE WHEN lead(name, 1, 'NULL') 
            OVER (ORDER BY priority, value) != name 
       THEN 1 ELSE 0 END rule_end,
  name, flag, value
FROM rule
UNPIVOT (
  value FOR flag IN (
    flag1,  
    flag2,  
    flag3,  
    flag4,  
    flag5
  )
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

We’re now getting (with some visual emphasis):

RULE_BEGIN  RULE_END  NAME    FLAG    VALUE
-------------------------------------------
1           0         RULE 1  FLAG1   1
0           1         RULE 1  FLAG5   2

1           0         RULE 2  FLAG5   1
0           1         RULE 2  FLAG2   2

1           0         RULE 3  FLAG4   1
0           0         RULE 3  FLAG5   2
0           1         RULE 3  FLAG3   3

1           1         RULE 4  FLAG3   1

LEAD() looks ahead one row to see if the rule name there is different from the rule name on the current row.

LAG() looks behind one row to see if the rule name there is different from the rule name on the current row.

That’s it – very simple. The window functions part of this example is part of my 10 SQL Tricks talk, which I highly recommend you watch.

Oracle’s OFFSET .. FETCH Can be Slower than Classic ROWNUM Filtering

One of Oracle 12c’s coolest features was the introduction of the SQL standard OFFSET .. FETCH clause, as we can now write things like:

SELECT *
FROM film 
ORDER BY film_id
FETCH FIRST 1 ROW ONLY

This is querying the Sakila database. Most other databases had this clause (or a non-standard version of it) for ages, e.g. MySQL with LIMIT. For all the different LIMIT syntaxes, check out the jOOQ manual.

Implementation wise, the Oracle folks chose to rewrite this clause as a simple window function filter. In principle, behind the scenes, the following is executed:

Teradata syntax

SELECT *
FROM film
QUALIFY row_number() OVER (ORDER BY film_id) = 1
ORDER BY film_id

Standard syntax

SELECT * -- Except rn
FROM (
  SELECT film.*, row_number() OVER (ORDER BY film_id) rn
  FROM film
) t
WHERE rn = 1
ORDER BY film_id

This does definitely look much better than the “old” approach using ROWNUM filtering, which many of us have written for years:

Legacy Oracle syntax

SELECT t.*
FROM (
  SELECT *
  FROM film 
  ORDER BY film_id
) t
WHERE ROWNUM = 1

What I don’t like about this “old” approach is that we’re relying on the ORDER BY clause of a derived table, which should not guaranteed to be retained in the outer most query in my opinion (although it is, in Oracle, in this case).

So, having the SQL standard syntax is definitely good.

What’s the problem?

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  COUNT STOPKEY                |         |       |
|   2 |   VIEW                        |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

-------------------------------------------------
| Id  | Operation                | Name | Rows  |
-------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |
|*  1 |  VIEW                    |      |     1 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000 |
|   3 |    TABLE ACCESS FULL     | FILM |  1000 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

The cardinalities are both correct, but one of the queries seems to traverse the entire table to find the top FILM_ID, which the other query found in the index directly. A workaround would be to hint the number of rows to the FETCH FIRST query:

SELECT /*+FIRST_ROWS(1)*/ *
FROM film
ORDER BY film_id
FETCH FIRST 1 ROW ONLY;

… in case of which we’ll get a similar plan as that of the ROWNUM filtering query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  VIEW                         |         |     1 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

Measuring this using our measurement technique yields quite devastating results:

Run 1, Statement 1 :  1.11230  -- ROWNUM
Run 1, Statement 2 :  1.15508  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 46.92781  -- FETCH FIRST

Run 2, Statement 1 :  1.68449
Run 2, Statement 2 :  1.99465
Run 2, Statement 3 : 47.32620

Run 3, Statement 1 :  1.10428
Run 3, Statement 2 :  1.13904
Run 3, Statement 3 : 68.06417

Run 4, Statement 1 :  1
Run 4, Statement 2 :  6.00535
Run 4, Statement 3 : 44.88235

The above results don’t show any time measurement, but a number relative to the fastest execution (1)

There is a 40x performance difference between the approaches, with ROWNUM based filtering being the fastest, FETCH FIRST plus +FIRST_ROWS hint being slightly slower, and “naked” FETCH FIRST being terribly slow, when repeating the measurement 5x and running each query 10000x on my machine on Oracle 12.2.0.1.0 in Docker.

Things get worse when joining tables. Let’s run a query that fetches the first customer and their full address information:

-- Legacy Oracle syntax
SELECT t.*
FROM (
  SELECT *
  FROM customer 
  JOIN address USING (address_id)
  JOIN city USING (city_id)
  JOIN country USING (country_id)
  ORDER BY customer_id
) t
WHERE ROWNUM = 1;

-- Standard syntax with hint
SELECT /*+FIRST_ROWS(1)*/ *
FROM customer 
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

-- Standard syntax without hint
SELECT *
FROM customer 
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;

The two queries are equivalent, they both produce the same result. Yet, the plans are very different.

Oracle’s legacy syntax

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  COUNT STOPKEY                    |             |       |
|   2 |   VIEW                            |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |     TABLE ACCESS BY INDEX ROWID   | COUNTRY     |     1 |
-----------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

We’re seeing tons of nested loop joins because that’s what we’ll expect given the low cardinalities imposed by the COUNT STOPKEY operation.

SQL standard syntax with hint

-----------------------------------------------------------------
| Id  | Operation                         | Name        | Rows  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |             |       |
|*  1 |  VIEW                             |             |     1 |
|*  2 |   WINDOW NOSORT STOPKEY           |             |     1 |
|   3 |    NESTED LOOPS                   |             |     1 |
|   4 |     NESTED LOOPS                  |             |     1 |
|   5 |      NESTED LOOPS                 |             |     1 |
|   6 |       NESTED LOOPS                |             |     1 |
|   7 |        TABLE ACCESS BY INDEX ROWID| CUSTOMER    |   302 |
|   8 |         INDEX FULL SCAN           | PK_CUSTOMER |     1 |
|   9 |        TABLE ACCESS BY INDEX ROWID| ADDRESS     |     1 |
|* 10 |         INDEX UNIQUE SCAN         | PK_ADDRESS  |     1 |
|  11 |       TABLE ACCESS BY INDEX ROWID | CITY        |     1 |
|* 12 |        INDEX UNIQUE SCAN          | PK_CITY     |     1 |
|* 13 |      INDEX UNIQUE SCAN            | PK_COUNTRY  |     1 |
|  14 |     TABLE ACCESS BY INDEX ROWID   | COUNTRY     |     1 |
-----------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
  10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
  12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
  13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Quite a similar plan

SQL standard syntax without hint

---------------------------------------------------------------
| Id  | Operation                        | Name       | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                 |            |       |
|*  1 |  VIEW                            |            |     1 |
|*  2 |   WINDOW SORT PUSHED RANK        |            |   599 |
|*  3 |    HASH JOIN                     |            |   599 |
|   4 |     TABLE ACCESS FULL            | CUSTOMER   |   599 |
|*  5 |     HASH JOIN                    |            |   603 |
|   6 |      MERGE JOIN                  |            |   600 |
|   7 |       TABLE ACCESS BY INDEX ROWID| COUNTRY    |   109 |
|   8 |        INDEX FULL SCAN           | PK_COUNTRY |   109 |
|*  9 |       SORT JOIN                  |            |   600 |
|  10 |        TABLE ACCESS FULL         | CITY       |   600 |
|  11 |      TABLE ACCESS FULL           | ADDRESS    |   603 |
---------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
   3 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
   5 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
   9 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
       filter("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Oops. Many HASH JOIN and MERGE JOIN and TABLE ACCESS FULL operations, as well as a WINDOW SORT operation, rather than a WINDOW NOSORT operation. That cannot be good. Let’s measure this again.

Run 1, Statement 1 :  1.26157  -- ROWNUM
Run 1, Statement 2 :  1.32394  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 66.97384  -- FETCH FIRST

Run 2, Statement 1 :  1.31992
Run 2, Statement 2 :  1.76459
Run 2, Statement 3 : 72.76056

Run 3, Statement 1 :  1
Run 3, Statement 2 :  1.36419
Run 3, Statement 3 : 74.06439

Run 4, Statement 1 :  1.08451
Run 4, Statement 2 :  1.64990
Run 4, Statement 3 : 66.83702

The difference is even worse: Factor 60x. And make no mistake, these are trivial data set sizes. As we can see in the last execution plan, the cardinality of the CUSTOMER table is 599. This can get much worse for larger tables.

Why even use this syntax?

The SQL standard syntax is much nicer to write, and it allows for nice TOP-N style queries using CROSS APPLY or LATERAL, e.g. to find the TOP 3 longest film titles per actor:

SELECT actor_id, first_name, last_name, title
FROM actor a
OUTER APPLY (
  SELECT /*+FIRST_ROWS(1)*/ title
  FROM film f
  JOIN film_actor fa USING (film_id)
  WHERE fa.actor_id = a.actor_id
  ORDER BY length(title) DESC
  FETCH FIRST 3 ROWS ONLY
) t
ORDER BY actor_id, length(title) DESC;

This would have been much harder with the ROWNUM approach. In older Oracle versions, it was even impossible, because we could not reference A.ACTOR_ID from doubly nested derived tables / correlated subqueries. Luckily, this is no longer the case. So, syntactically, it is definitely a much better way to do paginated queries or TOP-N queries, but the price is very high.

Disclaimer

The optimiser might make much better choices when:

The base data set is much bigger than the above 600 to 1000 rows “strong” tables.

Indeed, when fetching the first row from the PAYMENT table (with ~16000 rows), the difference becomes smaller or even inexistent:

Run 1, Statement 1 : 1        -- ROWNUM
Run 1, Statement 2 : 1.72246  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 1.76165  -- FETCH FIRST

Run 2, Statement 1 : 1.03919
Run 2, Statement 2 : 1.78284
Run 2, Statement 3 : 1.75742

Run 3, Statement 1 : 1.2553
Run 3, Statement 2 : 1.86441
Run 3, Statement 3 : 2.39089

Run 4, Statement 1 : 2.28814
Run 4, Statement 2 : 3.02436
Run 4, Statement 3 : 2.39407

Run 5, Statement 1 : 1.31462
Run 5, Statement 2 : 2.27225
Run 5, Statement 3 : 1.70975

As can be seen, the measurement errors start to outweigh the difference in performance, so the difference isn’t really as significant anymore.

The limit is not 1 or 3, but 10 or 50

When fetching the top 50 rows from the joined customer/address query, the measurements actually changed quite a bit. Suddenly, the ROWNUM query wasn’t optimal anymore and behaved like the un-hinted FETCH FIRST query. Adding a /*+FIRST_ROWS(1)*/ hint (not /*+FIRST_ROWS(50)*/ !) did help:

Run 1, Statement 1 : 1.00545  -- ROWNUM +FIRST_ROWS
Run 1, Statement 2 : 7.24842  -- ROWNUM
Run 1, Statement 3 : 1.35691  -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 4 : 7.15264  -- FETCH FIRST

Run 2, Statement 1 : 1.08054
Run 2, Statement 2 : 6.51922
Run 2, Statement 3 : 1.35960
Run 2, Statement 4 : 7.94527

Run 3, Statement 1 : 1.02824
Run 3, Statement 2 : 7.16228
Run 3, Statement 3 : 1.19702
Run 3, Statement 4 : 7.55008

Run 4, Statement 1 : 1.08364
Run 4, Statement 2 : 6.66652
Run 4, Statement 3 : 1.18559
Run 4, Statement 4 : 7.36938

Run 5, Statement 1 : 1
Run 5, Statement 2 : 6.89051
Run 5, Statement 3 : 1.24211
Run 5, Statement 4 : 7.15167

Conclusion

What we’ve seen here is a bit unfortunate. For some cases, one approach is better than the other, performance wise. For others, it’s vice versa. Paginated queries are still a bit tricky for Oracle to get right and we have to measure things explicitly.

Workaround in jOOQ

Until this is fixed by Oracle, if you’re using jOOQ, you can use the SQLDialect.ORACLE11G dialect to run classic ROWNUM filtering queries also on Oracle 12c. Alternatively, a future version of jOOQ will optionally generate a +FIRST_ROWS hint with a reasonably approximated cardinality: https://github.com/jOOQ/jOOQ/issues/5793

How to Patch Your IDE to Fix an Urgent Bug

Clock’s ticking. JDK 11 will remove a bunch of deprecated modules through JEP 320, which includes the Java EE modules, which again includes JAXB, a dependency of many libraries, including jOOQ. Thus far, few people have upgraded to Java 9 or 10, as these aren’t LTS releases. Unlike in the old days, however, people will be forced much earlier to upgrade to Java 11, because Java 8 (the free version) will reach end of life soon after Java 11 is released:

End of Public Updates for Oracle JDK 8
As outlined in the Oracle JDK Support Roadmap below, Oracle will not post further updates of Java SE 8 to its public download sites for commercial use after January 2019

So, we library developers must act and finally modularise our libraries. Which is, quite frankly, a pain. Not because of the module system itself, which works surprisingly well. But because of the toolchain, which is far from being production ready. This mostly includes:

It’s still almost not possible to maintain a modularised project in an IDE (I’ve tried Eclipse and IntelliJ, not Netbeans so far) as there are still tons of bugs. Some of which are showstoppers, halting compilation in the IDE (despite compilation working in Maven). For example:

But rather than just complaining, let’s complain and fix it

Let’s fix our own IDE by patching it

Disclaimer: The following procedure assumes that you have the right to modify your IDE’s source and binaries. To my understanding, this is the case with the EPL licensed Eclipse. It may not be the case for other IDEs.

Disclaimer2: Note, as reddit user fubarbazqux so eloquently put it, there are cleaner ways to apply patches (and contribute them) to the Eclipse community, if you have more time. This article just displays a very easy way to do things without spending too much time to figure out how the Eclipse development processes work, internally. It shows a QUICK FIX recipe

The first bug was already discovered and fixed for Eclipse 4.8, but its RC4 version seems to have tons of other problems, so let’s not upgrade to that yet. Instead, let’s apply the fix that can be seen here to our own distribution:

https://github.com/eclipse/eclipse.jdt.core/commit/e60c4f1f36f7efd5fbc1bbc661872b78c6939230#diff-e517e5944661053f0fcff49d9432b74e

It’s just a single line:

How do we do this?

First off, go to the Eclipse Packages Download page:

http://www.eclipse.org/downloads/eclipse-packages

And download the “Eclipse IDE for Eclipse Committers” distribution:

It will contain all the Eclipse source code, which we’ll need to compile the above class. In the new workspace, create a new empty plugin project:

Specify the correct execution environment (in our case Java 10) and add all the Java Development Tools (JDT) dependencies:

Or just add all the available dependencies, it doesn’t really matter.

You can now open the type that you want to edit:

Now, simply copy the source code from the editor and paste it in a new class inside of your project, which you put in the same package as the original (split packages are still possible in this case, yay)

Inside of your copy, apply the desired patch and build the project. Since you already included all the dependencies, it will be easy to compile your copy of the class, and you don’t have to build the entirety of Eclipse.

Now, go to your Windows Explorer or Mac OS X Finder, or Linux shell or whatever and find the compiled class:

This class can now be copied into the Eclipse plugin. How to find the appropriate Eclipse plugin? Just go to your plugin dependencies and check out the location of the class you’ve opened earlier:

Open that plugin from your Eclipse distribution’s /plugins folder using 7zip or whatever zipping tool you prefer, and overwrite the original class file(s). You may need to close Eclipse first, before you can write to the plugin zip file. And it’s always a good idea to make backup copies of the original plugin(s).

Be careful that if your class has any nested classes, you will need to copy them all, e.g.

MyClass.class
MyClass$1.class // Anonymous class
MyClass$Nested.class // Named, nested class

Restart Eclipse, and your bug should be fixed!

How to fix my own bugs?

You may not always be lucky to find a bug with an existing fix in the bug tracker as in the second case:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=535927

No problemo, we can hack our way around that as well. Launch your normal Eclipse instance (not the “Eclipse IDE for Eclipse Committers” one) with a debug agent running, by adding the following lines to your eclipse.ini file:

-Xdebug 
-Xnoagent 
-Djava.compile=NONE 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

Launch Eclipse again, then connect to your Eclipse from your other “Eclipse IDE for Eclipse Committers” instance by connecting a debugger:

And start setting breakpoints wherever you need, e.g. here, in my case:

java.lang.NullPointerException
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:145)
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:226)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.handle(ProblemReporter.java:2513)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1831)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1808)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.checkAndRecordImportBinding(CompilationUnitScope.java:960)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:471)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:501)
	at org.eclipse.jdt.internal.compiler.Compiler.process(Compiler.java:878)
	at org.eclipse.jdt.internal.compiler.ProcessTaskManager.run(ProcessTaskManager.java:141)
	at java.lang.Thread.run(Unknown Source)

And start analysing the problem like your own bugs. The nice thing is, you don’t have to fix the problem, just find it, and possibly comment out some lines of code if you think they’re not really needed. In my case, luckily, the regression was introduced by a new method that is applied to JDK 9+ projects only:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    // ...
}

The method will check for the new @Deprecated(since="9") attribute on the @Deprecated annotation. Not an essential feature, so let’s just turn it off by adding this line to the source file:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    if (true) return;
    // ...
}

This will effectively prevent the faulty logic from ever running. Not a fix, but a workaround. For more details about this specific issue, see the report. Of course, never forget to actually report the issue to Eclipse (or whatever your IDE is), so it can be fixed thoroughly for everyone else as well

Compile. Patch. Restart. Done!

Conclusion

Java is a cool platform. It has always been a very dynamic language at runtime, where compiled class files can be replaced by new versions at any moment, and re-loaded by the class loaders. This makes patching code by other vendors very easy, just:

  • Create a project containing the vendors’ code (or if you don’t have the code, the binaries)
  • Apply a fix / workaround to the Java class that is faulty (or if you don’t have the code, decompile the binaries if you are allowed to)
  • Compile your own version
  • Replace the version of the class file from the vendor by yours
  • Restart

This works with all software, including IDEs. In the case of jOOQ, all our customers have the right to modification, and they get the sources as well. We know how useful it is to be able to patch someone else’s code. This article shows it. Now, I can continue modularising jOOQ, and as a side product, improve the tool chain for everybody else as well.

Again, this article displayed a QUICK FIX approach (some call it “hack”). There are more thorough ways to apply patches / fixes, and contribute them back to the vendor.

Another, very interesting option would be to instrument your runtime and apply the fix only to byte code:

And:

https://www.sitepoint.com/fixing-bugs-in-running-java-code-with-dynamic-attach/

A note on IntelliJ and NetBeans

Again, I haven’t tried NetBeans yet (although I’ve heard its Java 9 support has been working very well for quite a while).

While IntelliJ’s Jigsaw support seems more advanced than Eclipse’s (still with a few flaws as well), it currently has a couple of performance issues when compiling projects like jOOQ or jOOλ. In a future blog post, I will show how to “fix” those by using a profiler, like:

  • Java Mission Control (can be used as a profiler, too)
  • YourKit
  • JProfiler

Profilers can be used to very easily track down the main source of a performance problem. I’ve reported a ton to Eclipse already. For instance, this one:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=474686

Where a lot of time is being spent in the processing of Task Tags, like:

  • TODO
  • FIXME
  • XXX

The great thing about profiling this is:

  • You can report a precise bug to the vendor
  • You can find the flawed feature and turn it off as a workaround. Turning off the above task tag feature was a no-brainer. I’m not even using the feature.

So, stay tuned for another blog post, soon.

jOOQ 3.11 Released With 4 New Databases, Implicit Joins, Diagnostics, and Much More

Today, jOOQ 3.11 has been released with support for 4 new databases, implicit joins, diagnostics, and much more

New Databases Supported

At last, 4 new SQL dialects have been added to jOOQ! These are:

jOOQ Professional Edition

  • Aurora MySQL Edition
  • Aurora PostgreSQL Edition
  • Azure SQL Data Warehouse

jOOQ Enterprise Edition

  • Teradata

Implicit Joins

One of the really cool features in ORMs like Hibernate, Doctrine, and others, is
the capability of using a relationship graph notation to access another entity’s
columns through what is often called “implicit joins”.

Instead of explicitly joining a to-one relationship to access its columns:

SELECT author.first_name, author.last_name, book.title
FROM book
JOIN author ON book.author_id = author.id

We would like to be able to access those columns directly, using this notation:

SELECT book.author.first_name, book.author.last_name, book.title
FROM book

The join is implied and should be added implicitly. jOOQ now allows for this to
happen when you use the code generator:

ctx.select(BOOK.author().FIRST_NAME, BOOK.author().LAST_NAME, BOOK.TITLE)
   .from(BOOK)
   .fetch();

When rendering this query, the implicit join graph will be calculated on the fly
and added behind the scenes to the BOOK table. This works for queries of
arbitrary complexity and on any level of nested SELECT.

More details in this blog post:
https://blog.jooq.org/2018/02/20/type-safe-implicit-join-through-path-navigation-in-jooq-3-11/

DiagnosticsListener SPI

A new DiagnosticsListener SPI has been added to jOOQ:
https://github.com/jOOQ/jOOQ/issues/5960

The purpose of this SPI is to sanitise your SQL language, JDBC and jOOQ API
usage. Listeners can listen to events such as:

  • duplicateStatements (similar SQL is executed, bind variables should be used)
  • repeatedStatements (identical SQL is executed, should be batched or rewritten)
  • tooManyColumnsFetched (not all projected columns were needed)
  • tooManyRowsFetched (not all fetched rows were needed)

The great thing about this SPI is that it can be exposed to clients through the
JDBC API, in case of which the diagnostics feature can reverse engineer your
JDBC or even JPA generated SQL. Ever wanted to detect N+1 queries from
Hibernate? Pass those Hibernate-generated queries through this SPI.

Want to find missing bind variables leading to cursor cache contention or SQLi?
Let jOOQ find similar SQL statements and report them. E.g.

  • SELECT name FROM person WHERE id = 1
  • SELECT name FROM person WHERE id = 2

Or also:

  • SELECT name FROM person WHERE id IN (?, ?)
  • SELECT name FROM person WHERE id IN (?, ?, ?)

Anonymous blocks

Many databases support anonymous blocks to run several statements in a single
block scope. For example, Oracle:

DECLARE
  l_var NUMBER(10);
BEGIN
  l_var := 10;
  dbms_output.put_line(l_var);
END;

jOOQ now supports the new org.jooq.Block API to allow for wrapping DDL and DML
statements in such a block. This is a first step towards a future jOOQ providing
support for:

  • Abstractions over procedural languages
  • CREATE PROCEDURE and CREATE FUNCTION statements
  • Trigger support
  • And much more

Parser

jOOQ’s parser support is an ongoing effort. This release has added support for
a lot of new SQL clauses and functions from various vendors and in various DDL
and DML statements.

The parser is now also exposed through a public website and API, where SQL can
be translated from one dialect to another:
https://www.jooq.org/translate

This website will help further drive jOOQ API development by helping to find
missing functionality that is used in real-world SQL.

Another way to access this API is through the new org.jooq.ParserCLI command
line tool. For example, run:

$ java -cp jooq-3.11.0.jar org.jooq.ParserCLI -f -t ORACLE -s "SELECT * FROM (VALUES(1),(2)) AS t(a)"

To get:

select *
from (
  (
    select null a
    from dual
    where 1 = 0
  )
  union all (
    select *
    from (
      (
        select 1
        from dual
      )
      union all (
        select 2
        from dual
      )
    ) t
  )
) t;

Formal Java 10 Support

jOOQ 3.11 is the first release that is formally integration tested with Java 10.
To use jOOQ with Java 10, use the Java 8 distribution which has not yet been
modularised, but contains Automatic-Module-Name specification to be forward
compatible with future, modularised jOOQ distributions.

Additionally, package names between jOOQ, jOOQ-meta, and jOOQ-codegen have been
cleaned up to prevent duplicate package names, and the JAXB dependency has been
added explicitly to the various artefacts.

Other great improvements

  • Finally, asterisks (SELECT * or SELECT t.*) are formally supported in the API.
  • Collations can now be specified on a variety of syntax elements
  • The org.jooq.Comment type has been added, and DDL statements for it
  • The DefaultBinding implementation has been rewritten for better peformance
  • Several performance improvements in jOOQ’s internals
  • Many more DDL statements are supported including GRANT and REVOKE
  • Support for the EXPLAIN statement
  • FETCH n PERCENT ROWS and TOP n PERCENT clauses are supported
  • Better org.jooq.Name and org.jooq.Named API for identifier handling
  • Support for PostgreSQL 10
  • Support for SQL Server 2017
  • Support for DB2 11
  • Upgraded MariaDB support for window functions, inv dist functions, WITH
  • jOOU dependency updated to 0.9.3
  • jOOR dependency updated to 0.9.8
  • Server output (e.g. DBMS_OUTPUT) can now be fetched automatically, by jOOQ
  • Code generation support for PL/SQL TABLE types
  • SQL Keywords Can Now Be Rendered In Pascal Style If You Must
  • Emulate PostgreSQL’s ON CONFLICT clause using MERGE

The complete list can be seen here:
https://www.jooq.org/notes/?version=3.11

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is), the latter approach leading to a long road of pain and suffering, once your project grows.

This article is inspired by a recent Stack Overflow question.

Interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate that latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: Another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in case of which the code is generated by the compiler while compiling. It never really exists in its source form. A similar (but less sophisticated) tool in Java are annotation processors, e.g. Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

“But Hibernate / JPA makes coding Java first easy”

Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model

Depending on the amount of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column("book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.