This form is really useful for editing the configuration. If we want to activate FLAG2 in RULE 1, we just navigate to that cell in a SQL tool like Oracle SQL Developer and change the value.
But reading the configuration is a different matter. FLAG1 through FLAG5 are not nicely normalised. How can we read the data as though it were normalised?
Using UNPIVOT
In Oracle and SQL Server, we can use UNPIVOT for this use case. I’m using Oracle syntax in this blog post. SQL Server’s is just slightly different. Consider the following query:
SELECT name, flag, value
FROM rule
UNPIVOT (
value FOR flag IN (
flag1,
flag2,
flag3,
flag4,
flag5
)
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;
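As mentioned above, SQL Server’s UNPIVOT syntax differs only slightly. A sketch of the equivalent query (not from the original post; the main difference is the mandatory alias after the UNPIVOT clause):
-- A sketch: the same query in SQL Server syntax.
-- RULE is a reserved word in T-SQL, hence the brackets.
SELECT name, flag, value
FROM [rule]
UNPIVOT (
  value FOR flag IN (flag1, flag2, flag3, flag4, flag5)
) AS u
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;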
In this representation, the rules are ordered by priority, and the flags are ordered by their respective value within a rule. The flags that are not turned on (value 0) are simply omitted. This form is much easier to traverse procedurally, when “consuming” the configuration.
How does it work?
In principle, UNPIVOT is just syntax sugar for a bunch of UNION ALL subqueries. We could have written our query like this, instead:
SELECT name, flag, value
FROM (
SELECT rule.*, 'FLAG1' AS flag, FLAG1 AS value FROM rule
UNION ALL
SELECT rule.*, 'FLAG2' AS flag, FLAG2 AS value FROM rule
UNION ALL
SELECT rule.*, 'FLAG3' AS flag, FLAG3 AS value FROM rule
UNION ALL
SELECT rule.*, 'FLAG4' AS flag, FLAG4 AS value FROM rule
UNION ALL
SELECT rule.*, 'FLAG5' AS flag, FLAG5 AS value FROM rule
) rule
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;
Which is decidedly more code. It’s also more work for the database. The execution plans are different (I’m using Oracle 12.2.0.1.0):
UNPIVOT version – single table access
---------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | SORT ORDER BY | | 5 |
|* 2 | VIEW | | 5 |
| 3 | UNPIVOT | | |
|* 4 | TABLE ACCESS FULL| RULE | 1 |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("unpivot_view_005"."VALUE">0 AND
"unpivot_view_005"."ENABLED"=1))
4 - filter("RULE"."ENABLED"=1)
Benchmarking the two statements against each other (Statement 1 being the UNPIVOT version, Statement 2 the UNION ALL version), with execution times normalised to the fastest run, yields:
Run 1, Statement 1 : 1.155
Run 1, Statement 2 : 1.88056
Run 2, Statement 1 : 1.04333
Run 2, Statement 2 : 1.95148
Run 3, Statement 1 : 1.02185
Run 3, Statement 2 : 1.86074
Run 4, Statement 1 : 1
Run 4, Statement 2 : 1.85241
Run 5, Statement 1 : 1.0263
Run 5, Statement 2 : 1.82944
The UNION ALL version is consistently about 2x slower on this very small data set. This is significant in the use case presented here, as a configuration table is probably read many times per day.
Knowing when a rule starts and when it ends
The real world use case that is behind this blog post also needed to know when a rule started and when it ended. I.e., which flag entry was the first and which was the last of the rule. This was easy in the non-normalised representation where each rule was a single row.
SELECT
CASE WHEN lag(name, 1, 'NULL')
OVER (ORDER BY priority, value) != name
THEN 1 ELSE 0 END rule_begin,
CASE WHEN lead(name, 1, 'NULL')
OVER (ORDER BY priority, value) != name
THEN 1 ELSE 0 END rule_end,
name, flag, value
FROM rule
UNPIVOT (
value FOR flag IN (
flag1,
flag2,
flag3,
flag4,
flag5
)
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;
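Oracle 12c and the FETCH FIRST clause
Oracle 12c finally supports the SQL standard FETCH FIRST clause for TOP-N queries. The simplest possible example (the same query that is benchmarked further down, just without any hint):
SELECT *
FROM film
ORDER BY film_id
FETCH FIRST 1 ROW ONLY;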
Implementation-wise, the Oracle folks chose to rewrite the FETCH FIRST clause as a simple window function filter. In principle, behind the scenes, the following is executed:
Teradata syntax
SELECT *
FROM film
QUALIFY row_number() OVER (ORDER BY film_id) = 1
ORDER BY film_id
Standard syntax
SELECT * -- Except rn
FROM (
SELECT film.*, row_number() OVER (ORDER BY film_id) rn
FROM film
) t
WHERE rn = 1
ORDER BY film_id
This does definitely look much better than the “old” approach using ROWNUM filtering, which many of us have written for years:
Legacy Oracle syntax
SELECT t.*
FROM (
SELECT *
FROM film
ORDER BY film_id
) t
WHERE ROWNUM = 1
What I don’t like about this “old” approach is that we’re relying on the ORDER BY clause of a derived table, which, in my opinion, is not guaranteed to be retained in the outermost query (although it is, in Oracle, in this case).
So, having the SQL standard syntax is definitely good.
What’s the problem?
Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:
---------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | COUNT STOPKEY | | |
| 2 | VIEW | | 1 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILM | 1000 |
| 4 | INDEX FULL SCAN | PK_FILM | 1 |
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
And compare that to the FETCH FIRST query:
-------------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 1 |
|* 2 | WINDOW SORT PUSHED RANK| | 1000 |
| 3 | TABLE ACCESS FULL | FILM | 1000 |
-------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)
The cardinalities are both correct, but one of the queries seems to traverse the entire table to find the top FILM_ID, which the other query found in the index directly. A workaround would be to hint the number of rows to the FETCH FIRST query:
SELECT /*+FIRST_ROWS(1)*/ *
FROM film
ORDER BY film_id
FETCH FIRST 1 ROW ONLY;
… in which case we get a plan similar to that of the ROWNUM filtering query:
---------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 1 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 |
| 3 | TABLE ACCESS BY INDEX ROWID| FILM | 1000 |
| 4 | INDEX FULL SCAN | PK_FILM | 1 |
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)
Run 1, Statement 1 : 1.11230 -- ROWNUM
Run 1, Statement 2 : 1.15508 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 46.92781 -- FETCH FIRST
Run 2, Statement 1 : 1.68449
Run 2, Statement 2 : 1.99465
Run 2, Statement 3 : 47.32620
Run 3, Statement 1 : 1.10428
Run 3, Statement 2 : 1.13904
Run 3, Statement 3 : 68.06417
Run 4, Statement 1 : 1
Run 4, Statement 2 : 6.00535
Run 4, Statement 3 : 44.88235
The above results don’t show absolute execution times, but numbers relative to the fastest execution (1).
There is a 40x performance difference between the approaches, with ROWNUM based filtering being the fastest, FETCH FIRST plus +FIRST_ROWS hint being slightly slower, and “naked” FETCH FIRST being terribly slow, when repeating the measurement 5x and running each query 10000x on my machine on Oracle 12.2.0.1.0 in Docker.
Things get worse when joining tables. Let’s run a query that fetches the first customer and their full address information:
-- Legacy Oracle syntax
SELECT t.*
FROM (
SELECT *
FROM customer
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
) t
WHERE ROWNUM = 1;
-- Standard syntax with hint
SELECT /*+FIRST_ROWS(1)*/ *
FROM customer
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;
-- Standard syntax without hint
SELECT *
FROM customer
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country USING (country_id)
ORDER BY customer_id
FETCH FIRST 1 ROW ONLY;
The three queries are equivalent, they all produce the same result. Yet, the plans are very different.
Oracle’s legacy syntax
-----------------------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | COUNT STOPKEY | | |
| 2 | VIEW | | 1 |
| 3 | NESTED LOOPS | | 1 |
| 4 | NESTED LOOPS | | 1 |
| 5 | NESTED LOOPS | | 1 |
| 6 | NESTED LOOPS | | 1 |
| 7 | TABLE ACCESS BY INDEX ROWID| CUSTOMER | 302 |
| 8 | INDEX FULL SCAN | PK_CUSTOMER | 1 |
| 9 | TABLE ACCESS BY INDEX ROWID| ADDRESS | 1 |
|* 10 | INDEX UNIQUE SCAN | PK_ADDRESS | 1 |
| 11 | TABLE ACCESS BY INDEX ROWID | CITY | 1 |
|* 12 | INDEX UNIQUE SCAN | PK_CITY | 1 |
|* 13 | INDEX UNIQUE SCAN | PK_COUNTRY | 1 |
| 14 | TABLE ACCESS BY INDEX ROWID | COUNTRY | 1 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
We’re seeing tons of nested loop joins because that’s what we’d expect given the low cardinalities imposed by the COUNT STOPKEY operation.
SQL standard syntax with hint
-----------------------------------------------------------------
| Id | Operation | Name | Rows |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 1 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 |
| 3 | NESTED LOOPS | | 1 |
| 4 | NESTED LOOPS | | 1 |
| 5 | NESTED LOOPS | | 1 |
| 6 | NESTED LOOPS | | 1 |
| 7 | TABLE ACCESS BY INDEX ROWID| CUSTOMER | 302 |
| 8 | INDEX FULL SCAN | PK_CUSTOMER | 1 |
| 9 | TABLE ACCESS BY INDEX ROWID| ADDRESS | 1 |
|* 10 | INDEX UNIQUE SCAN | PK_ADDRESS | 1 |
| 11 | TABLE ACCESS BY INDEX ROWID | CITY | 1 |
|* 12 | INDEX UNIQUE SCAN | PK_CITY | 1 |
|* 13 | INDEX UNIQUE SCAN | PK_COUNTRY | 1 |
| 14 | TABLE ACCESS BY INDEX ROWID | COUNTRY | 1 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
10 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
12 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
13 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
Quite a similar plan
SQL standard syntax without hint
---------------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | |
|* 1 | VIEW | | 1 |
|* 2 | WINDOW SORT PUSHED RANK | | 599 |
|* 3 | HASH JOIN | | 599 |
| 4 | TABLE ACCESS FULL | CUSTOMER | 599 |
|* 5 | HASH JOIN | | 603 |
| 6 | MERGE JOIN | | 600 |
| 7 | TABLE ACCESS BY INDEX ROWID| COUNTRY | 109 |
| 8 | INDEX FULL SCAN | PK_COUNTRY | 109 |
|* 9 | SORT JOIN | | 600 |
| 10 | TABLE ACCESS FULL | CITY | 600 |
| 11 | TABLE ACCESS FULL | ADDRESS | 603 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_008"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "CUSTOMER"."CUSTOMER_ID")<=1)
3 - access("CUSTOMER"."ADDRESS_ID"="ADDRESS"."ADDRESS_ID")
5 - access("ADDRESS"."CITY_ID"="CITY"."CITY_ID")
9 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
filter("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")
Oops. Many HASH JOIN and MERGE JOIN and TABLE ACCESS FULL operations, as well as a WINDOW SORT operation, rather than a WINDOW NOSORT operation. That cannot be good. Let’s measure this again.
Run 1, Statement 1 : 1.26157 -- ROWNUM
Run 1, Statement 2 : 1.32394 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 66.97384 -- FETCH FIRST
Run 2, Statement 1 : 1.31992
Run 2, Statement 2 : 1.76459
Run 2, Statement 3 : 72.76056
Run 3, Statement 1 : 1
Run 3, Statement 2 : 1.36419
Run 3, Statement 3 : 74.06439
Run 4, Statement 1 : 1.08451
Run 4, Statement 2 : 1.64990
Run 4, Statement 3 : 66.83702
The difference is even worse: roughly a factor of 60x. And make no mistake, these are trivial data set sizes. As we can see in the last execution plan, the cardinality of the CUSTOMER table is 599. This can get much worse for larger tables. So why use FETCH FIRST at all? Apart from being standard SQL, the new syntax is much more versatile. For instance, it can be used in correlated subqueries, as in this OUTER APPLY query that fetches the three longest film titles for each actor:
SELECT actor_id, first_name, last_name, title
FROM actor a
OUTER APPLY (
SELECT /*+FIRST_ROWS(1)*/ title
FROM film f
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = a.actor_id
ORDER BY length(title) DESC
FETCH FIRST 3 ROWS ONLY
) t
ORDER BY actor_id, length(title) DESC;
This would have been much harder with the ROWNUM approach. In older Oracle versions, it was even impossible, because we could not reference A.ACTOR_ID from doubly nested derived tables / correlated subqueries. Luckily, this is no longer the case. So, syntactically, it is definitely a much better way to do paginated queries or TOP-N queries, but the price is very high.
Disclaimer
The optimiser might make much better choices when:
The base data set is much bigger than the above tables of 600 to 1000 rows.
Indeed, when fetching the first row from the PAYMENT table (with ~16000 rows), the difference becomes smaller or even nonexistent:
Run 1, Statement 1 : 1 -- ROWNUM
Run 1, Statement 2 : 1.72246 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 3 : 1.76165 -- FETCH FIRST
Run 2, Statement 1 : 1.03919
Run 2, Statement 2 : 1.78284
Run 2, Statement 3 : 1.75742
Run 3, Statement 1 : 1.2553
Run 3, Statement 2 : 1.86441
Run 3, Statement 3 : 2.39089
Run 4, Statement 1 : 2.28814
Run 4, Statement 2 : 3.02436
Run 4, Statement 3 : 2.39407
Run 5, Statement 1 : 1.31462
Run 5, Statement 2 : 2.27225
Run 5, Statement 3 : 1.70975
As can be seen, the measurement errors start to outweigh the difference in performance, so the difference isn’t really as significant anymore.
The limit is not 1 or 3, but 10 or 50
When fetching the top 50 rows from the joined customer/address query, the measurements actually changed quite a bit. Suddenly, the ROWNUM query was no longer optimal and behaved like the un-hinted FETCH FIRST query. Adding a /*+FIRST_ROWS(1)*/ hint (not /*+FIRST_ROWS(50)*/!) did help; a sketch of the hinted ROWNUM query follows the measurements below:
Run 1, Statement 1 : 1.00545 -- ROWNUM +FIRST_ROWS
Run 1, Statement 2 : 7.24842 -- ROWNUM
Run 1, Statement 3 : 1.35691 -- FETCH FIRST +FIRST_ROWS
Run 1, Statement 4 : 7.15264 -- FETCH FIRST
Run 2, Statement 1 : 1.08054
Run 2, Statement 2 : 6.51922
Run 2, Statement 3 : 1.35960
Run 2, Statement 4 : 7.94527
Run 3, Statement 1 : 1.02824
Run 3, Statement 2 : 7.16228
Run 3, Statement 3 : 1.19702
Run 3, Statement 4 : 7.55008
Run 4, Statement 1 : 1.08364
Run 4, Statement 2 : 6.66652
Run 4, Statement 3 : 1.18559
Run 4, Statement 4 : 7.36938
Run 5, Statement 1 : 1
Run 5, Statement 2 : 6.89051
Run 5, Statement 3 : 1.24211
Run 5, Statement 4 : 7.15167
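For reference, a sketch of the hinted ROWNUM query measured above as Statement 1 (not spelled out in the original post); it is the earlier derived-table query with the /*+FIRST_ROWS(1)*/ hint and a top-50 filter:
-- A sketch: classic ROWNUM filtering with the FIRST_ROWS hint, top 50 rows
SELECT /*+FIRST_ROWS(1)*/ t.*
FROM (
  SELECT *
  FROM customer
  JOIN address USING (address_id)
  JOIN city USING (city_id)
  JOIN country USING (country_id)
  ORDER BY customer_id
) t
WHERE ROWNUM <= 50;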
Conclusion
What we’ve seen here is a bit unfortunate. For some cases, one approach is better than the other, performance wise. For others, it’s vice versa. Paginated queries are still a bit tricky for Oracle to get right and we have to measure things explicitly.
Workaround in jOOQ
Until this is fixed by Oracle, if you’re using jOOQ, you can use the SQLDialect.ORACLE11G dialect to run classic ROWNUM filtering queries also on Oracle 12c. Alternatively, a future version of jOOQ will optionally generate a +FIRST_ROWS hint with a reasonably approximated cardinality: https://github.com/jOOQ/jOOQ/issues/5793
Clock’s ticking. JDK 11 will remove a bunch of deprecated modules through JEP 320, which includes the Java EE modules, which again includes JAXB, a dependency of many libraries, including jOOQ. Thus far, few people have upgraded to Java 9 or 10, as these aren’t LTS releases. Unlike in the old days, however, people will be forced much earlier to upgrade to Java 11, because Java 8 (the free version) will reach end of life soon after Java 11 is released:
End of Public Updates for Oracle JDK 8
As outlined in the Oracle JDK Support Roadmap below, Oracle will not post further updates of Java SE 8 to its public download sites for commercial use after January 2019
So, we library developers must act and finally modularise our libraries. Which is, quite frankly, a pain. Not because of the module system itself, which works surprisingly well. But because of the toolchain, which is far from being production ready. This mostly includes:
It is still almost impossible to maintain a modularised project in an IDE (I’ve tried Eclipse and IntelliJ, not NetBeans so far), as there are still tons of bugs. Some of them are showstoppers, halting compilation in the IDE (even though compilation works fine in Maven). For example:
But rather than just complaining, let’s complain and fix it.
Let’s fix our own IDE by patching it
Disclaimer: The following procedure assumes that you have the right to modify your IDE’s source and binaries. To my understanding, this is the case with the EPL licensed Eclipse. It may not be the case for other IDEs.
Disclaimer 2: Note, as reddit user fubarbazqux so eloquently put it, there are cleaner ways to apply patches (and contribute them) to the Eclipse community, if you have more time. This article just shows a very easy way to do things without spending too much time figuring out how the Eclipse development processes work internally. It shows a QUICK FIX recipe.
The first bug was already discovered and fixed for Eclipse 4.8, but its RC4 version seems to have tons of other problems, so let’s not upgrade to that yet. Instead, let’s apply the fix that can be seen here to our own distribution:
In a plug-in project, add the plugin that contains the affected class to the dependencies (or just add all the available dependencies, it doesn’t really matter).
You can now open the type that you want to edit:
Now, simply copy the source code from the editor and paste it in a new class inside of your project, which you put in the same package as the original (split packages are still possible in this case, yay)
Inside of your copy, apply the desired patch and build the project. Since you already included all the dependencies, it will be easy to compile your copy of the class, and you don’t have to build the entirety of Eclipse.
Now, go to your Windows Explorer or Mac OS X Finder, or Linux shell or whatever and find the compiled class:
This class can now be copied into the Eclipse plugin. How to find the appropriate Eclipse plugin? Just go to your plugin dependencies and check out the location of the class you’ve opened earlier:
Open that plugin from your Eclipse distribution’s /plugins folder using 7zip or whatever zipping tool you prefer, and overwrite the original class file(s). You may need to close Eclipse first, before you can write to the plugin zip file. And it’s always a good idea to make backup copies of the original plugin(s).
Be careful that if your class has any nested classes, you will need to copy them all, e.g.
MyClass.class
MyClass$1.class // Anonymous class
MyClass$Nested.class // Named, nested class
What about a bug that hasn’t been located yet? No problemo, we can hack our way around that as well: debug Eclipse with Eclipse. Launch your normal Eclipse instance (not the “Eclipse IDE for Eclipse Committers” one) with a debug agent running, by adding the following lines to your eclipse.ini file:
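A standard JDWP debug agent configuration, appended after the -vmargs line (shown as an illustration; the port number 8000 is an arbitrary choice, not taken from the original setup):
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000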
Launch Eclipse again, then connect to your Eclipse from your other “Eclipse IDE for Eclipse Committers” instance by connecting a debugger:
And start setting breakpoints wherever you need, e.g. here, in my case:
java.lang.NullPointerException
at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:145)
at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:226)
at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.handle(ProblemReporter.java:2513)
at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1831)
at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1808)
at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.checkAndRecordImportBinding(CompilationUnitScope.java:960)
at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:471)
at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:501)
at org.eclipse.jdt.internal.compiler.Compiler.process(Compiler.java:878)
at org.eclipse.jdt.internal.compiler.ProcessTaskManager.run(ProcessTaskManager.java:141)
at java.lang.Thread.run(Unknown Source)
And start analysing the problem like your own bugs. The nice thing is, you don’t have to fix the problem, just find it, and possibly comment out some lines of code if you think they’re not really needed. In my case, luckily, the regression was introduced by a new method that is applied to JDK 9+ projects only:
The method will check for the new @Deprecated(since="9") attribute on the @Deprecated annotation. Not an essential feature, so let’s just turn it off by adding this line to the source file:
String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
if (true) return null;
// ...
}
This will effectively prevent the faulty logic from ever running. Not a fix, but a workaround. For more details about this specific issue, see the report. Of course, never forget to actually report the issue to Eclipse (or whatever your IDE is), so it can be fixed thoroughly for everyone else as well.
Compile. Patch. Restart. Done!
Conclusion
Java is a cool platform. It has always been a very dynamic language at runtime, where compiled class files can be replaced by new versions at any moment, and re-loaded by the class loaders. This makes patching code by other vendors very easy, just:
Create a project containing the vendors’ code (or if you don’t have the code, the binaries)
Apply a fix / workaround to the Java class that is faulty (or if you don’t have the code, decompile the binaries if you are allowed to)
Compile your own version
Replace the vendor’s version of the class file with yours
Restart
This works with all software, including IDEs. In the case of jOOQ, all our customers have the right to modification, and they get the sources as well. We know how useful it is to be able to patch someone else’s code. This article shows it. Now, I can continue modularising jOOQ, and as a side product, improve the tool chain for everybody else as well.
Again, this article displayed a QUICK FIX approach (some call it “hack”). There are more thorough ways to apply patches / fixes, and contribute them back to the vendor.
Another, very interesting option would be to instrument your runtime and apply the fix only to byte code:
Could have used a Java agent to modify the class even without fixing it in the Eclipse source. Makes it easier to upgrade.
Again, I haven’t tried NetBeans yet (although I’ve heard its Java 9 support has been working very well for quite a while).
While IntelliJ’s Jigsaw support seems more advanced than Eclipse’s (still with a few flaws as well), it currently has a couple of performance issues when compiling projects like jOOQ or jOOλ. In a future blog post, I will show how to “fix” those by using a profiler, like:
Java Mission Control (can be used as a profiler, too)
YourKit
JProfiler
Profilers can be used to very easily track down the main source of a performance problem. I’ve reported a ton to Eclipse already. For instance, this one:
Where a lot of time is being spent in the processing of Task Tags, like:
TODO
FIXME
XXX
The great thing about profiling this is:
You can report a precise bug to the vendor
You can find the flawed feature and turn it off as a workaround. Turning off the above task tag feature was a no-brainer. I’m not even using the feature.
Today, jOOQ 3.11 has been released with support for 4 new databases, implicit joins, diagnostics, and much more.
New Databases Supported
At last, 4 new SQL dialects have been added to jOOQ! These are:
jOOQ Professional Edition:
Aurora MySQL Edition
Aurora PostgreSQL Edition
Azure SQL Data Warehouse
jOOQ Enterprise Edition:
Teradata
Implicit Joins
One of the really cool features in ORMs like Hibernate, Doctrine, and others, is
the capability of using a relationship graph notation to access another entity’s
columns through what is often called “implicit joins”.
Instead of explicitly joining a to-one relationship to access its columns:
SELECT author.first_name, author.last_name, book.title
FROM book
JOIN author ON book.author_id = author.id
We would like to be able to access those columns directly, using this notation:
SELECT book.author.first_name, book.author.last_name, book.title
FROM book
The join is implied and should be added implicitly. jOOQ now allows for this to
happen when you use the code generator:
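For example (a sketch, assuming code-generated BOOK and AUTHOR tables where BOOK.AUTHOR_ID references AUTHOR.ID):
// The path expression BOOK.author() makes jOOQ add the join to AUTHOR
// behind the scenes. Assumes generated table classes and static imports.
DSL.using(configuration)
   .select(BOOK.author().FIRST_NAME, BOOK.author().LAST_NAME, BOOK.TITLE)
   .from(BOOK)
   .fetch();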
When rendering this query, the implicit join graph will be calculated on the fly
and added behind the scenes to the BOOK table. This works for queries of
arbitrary complexity and on any level of nested SELECT.
Diagnostics
A new diagnostics SPI has been added. The purpose of this SPI is to sanitise your SQL language, JDBC and jOOQ API
usage. Listeners can listen to events such as:
duplicateStatements (similar SQL is executed, bind variables should be used)
repeatedStatements (identical SQL is executed, should be batched or rewritten)
tooManyColumnsFetched (not all projected columns were needed)
tooManyRowsFetched (not all fetched rows were needed)
The great thing about this SPI is that it can be exposed to clients through the
JDBC API, in case of which the diagnostics feature can reverse engineer your
JDBC or even JPA generated SQL. Ever wanted to detect N+1 queries from
Hibernate? Pass those Hibernate-generated queries through this SPI.
Want to find missing bind variables leading to cursor cache contention or SQLi?
Let jOOQ find similar SQL statements and report them. E.g.
SELECT name FROM person WHERE id = 1
SELECT name FROM person WHERE id = 2
Or also:
SELECT name FROM person WHERE id IN (?, ?)
SELECT name FROM person WHERE id IN (?, ?, ?)
Anonymous blocks
Many databases support anonymous blocks to run several statements in a single
block scope. For example, Oracle:
DECLARE
l_var NUMBER(10);
BEGIN
l_var := 10;
dbms_output.put_line(l_var);
END;
jOOQ now supports the new org.jooq.Block API to allow for wrapping DDL and DML
statements in such a block. This is a first step towards a future jOOQ providing
support for:
Abstractions over procedural languages
CREATE PROCEDURE and CREATE FUNCTION statements
Trigger support
And much more
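As an illustration (a sketch only, not taken from the release notes; it assumes jOOQ’s DSLContext.begin(Statement...) method and generated BOOK classes):
// Wrap two DML statements in a single anonymous block and execute it.
// Assumes static imports of org.jooq.impl.DSL.* and the generated tables.
DSL.using(configuration)
   .begin(
        insertInto(BOOK).columns(BOOK.ID, BOOK.TITLE).values(1, "1984"),
        insertInto(BOOK).columns(BOOK.ID, BOOK.TITLE).values(2, "Animal Farm"))
   .execute();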
Parser
jOOQ’s parser support is an ongoing effort. This release has added support for
a lot of new SQL clauses and functions from various vendors and in various DDL
and DML statements.
The parser is now also exposed through a public website and API, where SQL can
be translated from one dialect to another: https://www.jooq.org/translate
This website will help further drive jOOQ API development by helping to find
missing functionality that is used in real-world SQL.
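The parser can also be called programmatically (a sketch, assuming the DSLContext.parser() and Parser.parseQuery() methods of recent jOOQ versions):
// Parse a SQL string and re-render it with the configuration's dialect
// and settings.
Query query = DSL.using(configuration)
                 .parser()
                 .parseQuery("SELECT * FROM (VALUES (1), (2)) AS t (a)");
System.out.println(query.getSQL());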
Another way to access this API is through the new org.jooq.ParserCLI command
line tool. For example, run:
$ java -cp jooq-3.11.0.jar org.jooq.ParserCLI -f -t ORACLE -s "SELECT * FROM (VALUES(1),(2)) AS t(a)"
To get:
select *
from (
(
select null a
from dual
where 1 = 0
)
union all (
select *
from (
(
select 1
from dual
)
union all (
select 2
from dual
)
) t
)
) t;
Formal Java 10 Support
jOOQ 3.11 is the first release that is formally integration tested with Java 10.
To use jOOQ with Java 10, use the Java 8 distribution which has not yet been
modularised, but contains Automatic-Module-Name specification to be forward
compatible with future, modularised jOOQ distributions.
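An Automatic-Module-Name is a simple JAR manifest entry, e.g. (a sketch; org.jooq is an assumed module name, not quoted from the actual manifest):
Automatic-Module-Name: org.jooq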
Additionally, package names between jOOQ, jOOQ-meta, and jOOQ-codegen have been
cleaned up to prevent duplicate package names, and the JAXB dependency has been
added explicitly to the various artefacts.
Other great improvements
Finally, asterisks (SELECT * or SELECT t.*) are formally supported in the API.
Collations can now be specified on a variety of syntax elements
The org.jooq.Comment type has been added, and DDL statements for it
The DefaultBinding implementation has been rewritten for better performance
Several performance improvements in jOOQ’s internals
Many more DDL statements are supported including GRANT and REVOKE
Support for the EXPLAIN statement
FETCH n PERCENT ROWS and TOP n PERCENT clauses are supported
Better org.jooq.Name and org.jooq.Named API for identifier handling
Support for PostgreSQL 10
Support for SQL Server 2017
Support for DB2 11
Upgraded MariaDB support for window functions, inverse distribution functions, WITH
jOOU dependency updated to 0.9.3
jOOR dependency updated to 0.9.8
Server output (e.g. DBMS_OUTPUT) can now be fetched automatically, by jOOQ
Code generation support for PL/SQL TABLE types
SQL keywords can now be rendered in Pascal style, if you must
Emulate PostgreSQL’s ON CONFLICT clause using MERGE
In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is). The latter approach leads to a long road of pain and suffering once your project grows.
To my surprise, a small group of first time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:
for (Record2<String, String> record : DSL.using(configuration)
// ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the
// generated code referenced from the below SELECT clause
.select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
// vvvvv ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^ Generated names
.from(ACTOR)
.orderBy(1, 2)) {
// ...
}
There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.
In other words:
There is some truth (internal or external), like a specification, a data model, etc.
We need a local representation of that truth in our programming language.
And it almost always makes sense to generate the latter, to avoid redundancy.
Type providers and annotation processing
Noteworthy: Another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in which case the code is generated by the compiler while compiling. It never really exists in its source form. Similar (but less sophisticated) tools in Java are annotation processors, e.g. Lombok.
In a way, this does the same thing except:
You don’t see the generated code (perhaps that’s less appalling to some?)
You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.
What’s the problem with code generation?
Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.
If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.
Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say: that’s the team pushing a cart on square wheels, too busy to stop and mount the round ones they’re being offered.
“But Hibernate / JPA makes coding Java first easy”
Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:
@Entity
class Book {
@Id
int id;
String title;
}
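For this quick-start style of working, Hibernate’s schema export needs to be switched on, e.g. with the classic hibernate.hbm2ddl.auto property (a sketch, not from the original article):
# Let Hibernate (re-)create the schema from the entities on startup
hibernate.hbm2ddl.auto=create-drop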
And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:
CREATE TABLE book (
id INTEGER GENERATED ALWAYS AS IDENTITY,
title VARCHAR(50),
CONSTRAINT pk_book PRIMARY KEY (id)
);
CREATE INDEX i_book_title ON book (title);
… and start running the application. That’s really cool to get started quickly and to try out things.
But, huh, wait. I cheated.
Will Hibernate really apply that named primary key definition?
Will it create the index on TITLE, which I know we’ll need?
Will it add an identity specification?
Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:
@Entity
@Table(name = "book", indexes = {
@Index(name = "i_book_title", columnList = "title")
})
class Book {
@Id
@GeneratedValue(strategy = IDENTITY)
int id;
String title;
}
Cool. Re-generate. Again, this makes it really easy to get started.
But you’ll pay the price later on
At some point, you go to production. And that’s when this model no longer works. Because
Once you go live, you can no longer throw away your database, as your database has become legacy.
From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.
Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…
Instead, you could have done things entirely differently from the beginning. Like using those round wheels.
Go Database First
The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.
And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:
In what tablespace your table resides
What PCTFREE value it has
What the cache size of your sequence (behind the identity) is
Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations such as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.
But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.
What about the client model?
As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator.
All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:
-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables
-- DB2
SELECT tabschema, tabname
FROM syscat.tables
-- Oracle
SELECT owner, table_name
FROM all_tables
-- SQLite
SELECT name
FROM sqlite_master
-- Teradata
SELECT databasename, tablename
FROM dbc.tables
These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.
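For illustration, here is what the equivalent JDBC call looks like (a sketch; the H2 connection details are placeholders):
// import java.sql.*;
// Lists all tables of the connected database, using the same kind of
// dictionary queries as above behind the scenes.
String url = "jdbc:h2:~/test", user = "sa", password = "";
try (Connection con = DriverManager.getConnection(url, user, password);
     ResultSet rs = con.getMetaData()
                       .getTables(null, null, "%", new String[] { "TABLE" })) {
    while (rs.next())
        System.out.println(
            rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
}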
From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.
If you’re using JDBC or Spring, you can create a bunch of String constants
If you’re using JPA, you can generate the entities themselves
If you’re using jOOQ, you can generate the jOOQ meta model
Now, any database increment will automatically lead to updated client code. For instance, imagine:
ALTER TABLE book RENAME COLUMN title TO book_title;
Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:
@Entity
@Table(name = "book", indexes = {
// Would you have thought of this?
@Index(name = "i_book_title", columnList = "book_title")
})
class Book {
@Id
@GeneratedValue(strategy = IDENTITY)
int id;
@Column("book_title")
String bookTitle;
}
Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.
A single truth
Regardless of what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form: https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd
XSD is well understood
XSD specifies XML content very well, and allows for validation in all client languages
XSD can be versioned easily, and evolved backwards compatibly
XSD can be translated to Java code using XJC
The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.
Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…
Databases: Same thing
When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.
Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.
So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.
Thank me later.
Clarification
Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.
In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.
Exceptions
There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):
When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.
The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.