Oracle’s BINARY_DOUBLE Can Be Much Faster Than NUMBER

Using the right data type for some calculation sounds like some obvious advice.

There are many blogs about using temporal data types for temporal data, instead of strings. An obvious reason is data integrity and correctness. We don’t gain much in storing dates as 2019-09-10 in one record, and as Nov 10, 2019 in the next one. Performance is also important in this case – for example, because of statistics being more optimal (an often overlooked side effect of unfit data types).

But at a customer site, I’ve recently discovered a surprising (not surprising in hindsight) performance issue when using NUMBER instead of BINARY_DOUBLE in an Oracle database.

NUMBER for monetary amounts

The NUMBER type (or DECIMAL, NUMERIC in other RDBMS) is perfectly suited for all decimal numbers, which require the correct precision and rounding. I mean, if you ever encounter this kind of display in an invoice, such as when I purchase video games on steam:

… your trust level for the platform immediately goes down by factor 1.0e+1 (even if technically, it does not matter in this case). So, the default, e.g. in banking systems for monetary amounts is always to use NUMBER or an equivalent type, e.g. java.math.BigDecimal.

Calculations on numbers

But now, let’s assume we want to do some statistics on these amounts. Some aggregations of various numeric values, maybe even of their logarithms. We can establish that these two expressions are equal:

ln(a * b) = ln(a) + ln(b)

Or in other words, for positive numbers:

a * b = exp(ln(a) + ln(b))

We’ve already blogged about how this approach can be used to emulate a PRODUCT() aggregate function in SQL, which is very useful for some cases, but none of the SQL databases jOOQ supports has built-in support for this yet. Notice, the blog post also takes care of zero and negative numbers.

But which number type to choose?

Now, we might be tempted to just calculate the LN(SOME_NUMBER) value, and sum that up using SUM(LN(SOME_NUMBER)) for this use-case. This turns out to be terribly slow in Oracle. We were thinking of bad indexes, first, even doubted aggregation in general, until I suggested we try using BINARY_DOUBLE instead, in this case. In our case, we didn’t care about the exact numeric value of the amount. A IEEE 754 floating point number with double precision was going to be good enough.

The results were surprising. In a simple benchmark, we compared 10 approaches to calculating this sum of logarithms:

  1. Using NUMBER(20, 10) and SUM(LN(the_number))
  2. Using NUMBER(20, 10) and SUM(LN(CAST(the_number AS BINARY_DOUBLE)))
  3. Using NUMBER(20, 10) and SUM(LN(TO_BINARY_DOUBLE(the_number)))
  4. Using NUMBER(20, 10), containing a pre-calculated LN value
  5. Using DOUBLE PRECISION and SUM(LN(the_double))
  6. Using DOUBLE PRECISION and SUM(LN(CAST(the_double AS BINARY_DOUBLE)))
  7. Using DOUBLE PRECISION and SUM(LN(TO_BINARY_DOUBLE(the_double)))
  8. Using DOUBLE PRECISION, containing a pre-calculated LN value
  9. Using BINARY_DOUBLE and SUM(LN(the_binary))
  10. Using BINARY_DOUBLE, containing a pre-calculated LN value

These were the thoughts:

  • We tried the above 3 possible numeric data types, expecting BINARY_DOUBLE to be the fastest
  • We tried to pre-calculate the LN() value for the benchmark, to see how much effort goes into summing, and how much effort goes into the LN() calculation with each type. While in general, in this system, such precalculation is impractical, we still wanted to have a benchmark comparison, in case a materialised view or other technique would be feasible.
  • We tried casting and converting each type to BINARY_DOUBLE prior to passing the value to the LN() function. Instead of migrating all the data (with possible side effects), we wanted to see if we can solve this to a reasonable extent “on the fly”

The benchmark that I’m posting here uses this table and example data (full benchmark code at the end of the article):

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

So, we have 100000 records, whose SUM(LN(x)) we want to calculate in the above 10 different ways. N1 contains the raw numeric value, and N2 contains the pre-calculated LN(N1).

The benchmark technique is described here. Do note it has a lot of caveats and is only useful for a limited number of verifications. Please always be careful when running such benchmarks, they often do not test production-like situations – and never use such benchmarks to compare different RDBMS products directly. They are only useful to compare two approaches on the same RDBMS product.

The results as run on Oracle 18c XE in Docker on development hardware are below. Times are compared relative to the fastest run, as actual time spent in each execution is not interesting for comparison. We did have similar results on production-like hardware, though, and I’m sure you can produce similar results in other RDBMS:

NUMBER(20, 10)
-------------------------------------------------
Run 3, Statement 1 : 280.34143  (avg : 280.75347)
Run 3, Statement 2 : 7.99402    (avg : 8.03506)
Run 3, Statement 3 : 7.71383    (avg : 7.73017)
Run 3, Statement 4 : 1.05456    (avg : 1.11735)

DOUBLE PRECISION
------------------------------------------------
Run 3, Statement 5 : 278.89476 (avg : 279.72981)
Run 3, Statement 6 : 8.06512   (avg : 8.07033)
Run 3, Statement 7 : 7.81873   (avg : 7.80063)
Run 3, Statement 8 : 1.5315    (avg : 1.54347)

BINARY_DOUBLE
------------------------------------------------
Run 3, Statement 9 : 2.4963    (avg : 2.57184)
Run 3, Statement 10: 1         (avg : 1.02943)

How to read these results?

  • Statement 10 is the fastest one, unsurprisingly, as it aggregates pre-calculated LN(binary_double) values. The precalculation of the function means that all the work has been done already before the report, and the data type is the one we expected to perform best in general
  • Statements 4 and 8 are almost as fast (precalculated LN() values). Being only slight factors off, we can attribute the difference to the usual benchmark flaws, although it’s interesting to see that DOUBLE PRECISION seems 1.5x slower to sum than BINARY_DOUBLE and even NUMBER
  • Statements 1, 5, 9 are the ones where no data type conversion is applied and SUM(LN(the_value)) is being calculated. It is staggering how much slower both NUMBER and DOUBLE PRECISION are than BINARY_DOUBLE. Statements 1 and 5 are a factor of 112x slower than statement 9!
  • Statements 2-3, 6-7 prove that converting the NUMBER or DOUBLE PRECISION vales to BINARY_DOUBLE on the fly provides a sufficiently performant workaround, which made statements 2-3, 6-7 only 3x slower than statement 9
  • Statements 2-3, 6-7 show that casting and converting are about equivalent

Note, we found similar results with other functions, such as EXP()

Analysis

The order of magnitude may seem surprising at first, but thinking about it, it is not. We would never do CPU intensive computation with java.math.BigDecimal in Java. The BigDecimal type is there for numeric accuracy, e.g. when it really matters what the monetary amount is, exactly. When we run analytics on monetary amounts, using double is sufficient in Java as well.

If our data is BigDecimal, and we cannot reasonably change that, it might still be better to use the BigDecimal::doubleValue conversion prior to further processing using e.g. Math::log. So, this translates directly to SQL, whose LN() implementations are data type specific. IEEE 754 having been designed precisely for this purpose.

When doing CPU intensive computations both in Java, or in the database, we should always evaluate our various options of

  • Quick fixing our data sets for the report (ad-hoc conversion prior to calculation)
  • Thoroughly fixing our data sets in the schema (migration towards a better data type)
  • Preprocessing our data sets (precalculating some very commonly used computations)

Benchmark logic

Just run the below on your own hardware. I’m curious to see your results:

-- Copyright Data Geekery GmbH
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- This version displays relative execution times (fastest execution = 1)
SET SERVEROUTPUT ON

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 10;
  v_stmt NUMBER;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
    v_stmt := 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(n1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(n1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(n1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(n2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(d1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(d1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(d1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(d2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(b1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(b2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) / MIN(elapsed) OVER() AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE data;
DROP TABLE results;

Using DISTINCT ON in Non-PostgreSQL Databases

A nice little gem in PostgreSQL’s SQL syntax is the DISTINCT ON clause, which is as powerful as it is esoteric.

In a previous post, we’ve blogged about some caveats to think of when DISTINCT and ORDER BY are used together. The bigger picture can be seen in our article about the logical order of operations in SQL SELECT.

The PostgreSQL documentation explains it well:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY. […] For example:

SELECT DISTINCT ON (location) location, time, report
    FROM weather_reports
    ORDER BY location, time DESC;

retrieves the most recent weather report for each location. […]

Again, this is quite esoteric as the distinct-ness is now decided only based on the columns listed separately in parentheses. For all the other columns, only the first row according to ORDER BY is projected. The SQL language is probably one of the only ones where the syntactic order of operations has absolutely nothing to do with the logical order of operations, and this DISTINCT ON syntax isn’t helping. In a more straightforward language design, the above statement could read, instead:

FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
SELECT 
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
DISTINCT
ORDER BY location

In other words, we would execute these operations in the following logical order:

  1. FROM: Access the weather_reports table
  2. WINDOW: Specify “windows” or “groups” in our data set, grouping by location and ordering contents of each group by time, descendingly (using the term “WINDOW” is no accident as we’ll see afterwards)
  3. SELECT: Project the location (which is the grouping) and per group, the first values of time / report ordered by time
  4. DISTINCT: Remove all the duplicates, because the above operation will produce the same time and report value for each record that shares the same location
  5. ORDER BY: Finally, order the results per grouping

That’s what really happens, and incidentally, the above synthetic SQL syntax matches the actual logical order of operations in the SQL language, so translating it back to an actual SQL statement would yield:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
ORDER BY location

If your database doesn’t support the WINDOW clause, just expand it into the individual window functions. E.g. in Oracle, write:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER (PARTITION BY location ORDER BY time DESC),
  FIRST_VALUE (report) OVER (PARTITION BY location ORDER BY time DESC)
FROM weather_reports
ORDER BY location

From a readability perspective, I would definitely prefer the standard SQL syntax over DISTINCT ON.

Want to play around with it? Here’s some sample data:

create table weather_reports (location text, time date, report text);
insert into weather_reports values ('X', DATE '2000-01-01', 'X1');
insert into weather_reports values ('X', DATE '2000-01-02', 'X2');
insert into weather_reports values ('X', DATE '2000-01-03', 'X3');
insert into weather_reports values ('Y', DATE '2000-01-03', 'Y1');
insert into weather_reports values ('Y', DATE '2000-01-05', 'Y2');
insert into weather_reports values ('Z', DATE '2000-01-04', 'Z1');

The result being:

|location|time      |report|
|--------|----------|------|
|X       |2000-01-03|X3    |
|Y       |2000-01-05|Y2    |
|Z       |2000-01-04|Z1    |

Notice that jOOQ already supports PostgreSQL DISTINCT ON and in the future, we might emulate it for other dialects using the above technique: https://github.com/jOOQ/jOOQ/issues/3564

Quantified LIKE ANY predicates in jOOQ 3.12

Quantified comparison predicates

One of SQL’s weirdes features are quantified comparison predicates. I’ve hardly ever seen these in the wild:

SELECT *
FROM t
WHERE id = ANY (1, 2, 3)

The above example is equivalent to using the much more readable IN predicate:

SELECT *
FROM t
WHERE id IN (1, 2, 3)

This equivalence is defined in the SQL standard. There are more esoteric cases that could be solved using such quantified comparison predicates more conveniently than otherwise, such as:

SELECT *
FROM t
WHERE (a, b) > ALL (
  SELECT x, y
  FROM u
)

This is the same thing as writing the more verbose, and in my opinion, a bit less readable:

SELECT *
FROM t
WHERE (a, b) > (
  SELECT x, y
  FROM u
  ORDER BY x, y
  FETCH FIRST ROW ONLY
)

Assuming, of course, that your RDBMS can compare row value expressions like that.

Quantified LIKE predicate

Unfortuantely, the SQL standard and most implementations support the above quantified comparison predicates only for the <, <=, >, >=, =, != comparison operators. Not for other predicate types. For example, the LIKE predicate would profit greatly from such a syntax:

SELECT *
FROM customers
WHERE last_name LIKE ANY ('A%', 'B%', 'C%')

The syntax is immediately understandable and translates to

SELECT *
FROM customers
WHERE last_name LIKE 'A%'
OR last_name LIKE 'B%'
OR last_name LIKE 'C%'

… which is much less convenient to write!

Furthermore, imagine producing such patterns from a subquery:

SELECT *
FROM customers
WHERE last_name LIKE ANY (
  SELECT pattern
  FROM patterns
  WHERE pattern.customer_type = customer.customer_type
)

This is a bit trickier to emulate in standard SQL. For example, in PostgreSQL, we could write:

SELECT *
FROM customers
WHERE true = ANY (
  SELECT last_name LIKE pattern
  FROM patterns
  WHERE pattern.customer_type = customer.customer_type
)

In this case, we can use boolean types. Oracle would make this a bit more difficult:

SELECT *
FROM customers
WHERE 1 = ANY (
  SELECT CASE
    WHEN last_name LIKE pattern THEN 1
    WHEN NOT(last_name LIKE pattern) THEN 0
    ELSE NULL
  END
  FROM patterns
  WHERE pattern.customer_type = customer.customer_type
)

Wouldn’t this a useful SQL feature to support?

jOOQ 3.12 support for this

jOOQ does support this syntax starting from jOOQ 3.12. You can now write

ctx.selectFrom(CUSTOMERS)
   .where(CUSTOMERS.LAST_NAME.like(any("A%", "B%", "C%")))
   .fetch();

ctx.selectFrom(CUSTOMERS)
   .where(CUSTOMERS.LAST_NAME.like(any(
      select(PATTERNS.PATTERN)
      .from(PATTERNS)
      .where(PATTERN.CUSTOMER_TYPE.eq(CUSTOMER.CUSTOMER_TYPE))
   )))
   .fetch();

All the previously mentioned emulations are available. You can play around with it downloading jOOQ:
https://www.jooq.org/download

Or directly on our website:
https://www.jooq.org/translate

jOOQ 3.12 Released With a new Procedural Language API

jOOQ 3.12 has been released with a new procedural language API, new data types, MemSQL support, formal Java 11+ support, a much better parser, and reactive stream API support

In this release, we’ve focused on a lot of minor infrastructure tasks, greatly
improving the overall quality of jOOQ. We’ve reworked some of our automated
integration tests, which has helped us fix a big number of not yet discovered
issues, including a much better coverage of our 26 supported RDBMS dialects.

We’re excited about these internal changes, as they will help us implement a lot
of features that have been requested by many for a long time, including an
immutable query object model, with all the secondary benefits like caching of
generated SQL, and much more powerful dynamic SQL construction and
transformation in the future.

Major new features include the new procedural language API shipped with our
commercial distributions, new data types including native JSON support, MemSQL
support, formal Java 11+ support, a much better parser, and reactive stream API
support.

Procedural languages

Following up on jOOQ 3.11’s support for anonymous blocks, the jOOQ 3.12
Professional and Enterprise Editions now include support for a variety of
procedural language features, including

  • Variable declarations
  • Variable assignments
  • Loops (WHILE, REPEAT, FOR, LOOP)
  • If Then Else
  • Labels
  • Exit, Continue, Goto
  • Execute

This feature set is part of our ongoing efforts to continue supporting more
advanced vendor specific functionality, including our planned definition and
translation of stored procedures, triggers, and other, ad-hoc procedural logic
that helps move data processing logic into the database server.

New Databases Supported

The jOOQ Professional Edition now supports the MemSQL dialect. MemSQL is derived
from MySQL, although our integration tests have shown that there are numerous
differences, such that supporting MemSQL formally will add a lot of value to our
customers by providing much increased syntactic correctness.

Reactive streams

The reactive programming model is gaining traction in some environments as new,
useful streaming APIs emerge, such as e.g. Reactor. These APIs have agreed to
work with a common SPI: reactive streams, and since JDK 9 the new
java.util.concurrent.Flow SPI. jOOQ 3.12 now implements these paradigms on an
API level, such that an integration with APIs like Reactor becomes much more
easy. The implementation still binds to JDBC, and is thus blocking. Future
versions of jOOQ will abstract over JDBC to allow for running queries against
ADBA (by Oracle) or R2DBC (by Spring)

New data types

We’ve introduced native support for a few new data types, which are often very
useful in specific situations. These include:

  • JSON / JSONB: A native string wrapper for textual and binary JSON data. While
    users will still want to bind more specific JSON to maps and lists using
    custom data type Bindings, in a lot of cases, being able to just serialise and
    deserialise JSON content as strings is sufficient. jOOQ now provides out of
    the box support for this approach for various SQL dialects.
  • INSTANT: RDBMS do not agree on the meaning of the SQL standard TIMESTAMP WITH
    TIME ZONE. PostgreSQL, for example, interprets it as a unix timestamp, just
    like java.time.Instant. For an optimal PostgreSQL experience, this new INSTANT
    type will be much more useful than the standard JDBC java.time.OffsetDateTime
    binding.
  • ROWID: Most RDBMS have a native ROWID / OID / CTID / physloc identity
    value that physically identifies a row on the underlying storage system,
    irrespective of any logical primary key. These ROWIDs can be leveraged to run
    more performant, vendor specific queries. Supporting this type allows for
    easily using this feature in arbitrary queries.

Parser

Our parser is seeing a lot of continued improvements over the releases as we
gather feedback from our users. Our main drivers for feedback are:

  • The DDLDatabase which allows for generating code from DDL scripts rather than
    live JDBC connections to your database
  • The https://www.jooq.org/translate website, which translates any kind of SQL
    between database dialects.

SQL dialect translation will evolve into an independent product in the future.
DDL parsing is already very powerful, and a lot of customers rely on it for
their production systems.

In the next versions, we will be able to simulate DDL on our own, without H2,
which will open up a variety of possible use cases, including better schema
management.

Specific jOOQ 3.12 parser improvements include:

  • Being able to access schema meta information (column types, constraints) to
    better emulate SQL features / translate SQL syntax between dialects
  • A parse search path, similar to PostgreSQL’s search_path, or other dialects’
    current_schema, allowing support for unqualified object references.
  • The DDL simulation from the DDLDatabase is now moved into the core library,
    supporting it also out of the box as a DDL script based meta data source
  • A new special comment syntax that helps ignoring SQL fragments in the jOOQ
    parser only, while executing it in your ordinary SQL execution.
  • A new interactive mode in the ParserCLI
  • Support for nested block comments
  • And much more

Formal Java 11 Support

While we have been supporting Java 11 for a while through our integration tests,
jOOQ 3.12 now fully supports Java 11 to help improve the experience around the
transitive JAXB dependency, which we now removed entirely out of jOOQ.

The commercial editions ship with a Java 11+ supporting distribution, which
includes more optimal API usage, depending on new Java 9-11 APIs. All editions,
including the jOOQ Open Source Edition, have a Java 8+ distribution that
supports any Java version starting from Java 8.

Commercial Editions

Dual licensing is at the core of our business, helping us to provide continued
value to our customers.

In the past, the main distinction between the different jOOQ editions was the
number of database products each edition supported. In the future, we want to
provide even more value to our customers with commercial subscriptions. This is
why, starting from jOOQ 3.12, we are now offering some new, advanced features
only in our commercial distributions. Such features include:

  • The procedural language API, which is available with the jOOQ Professional
    and Enterprise Editions
  • While the jOOQ 3.12 Open Source Edition supports Java 8+, the jOOQ 3.12
    Professional Edition also ships with a Java 11+ distribution, leveraging some
    newer JDK APIs, and the jOOQ 3.12 Enterprise Edition continues supporting
    Java 6 and 7.
  • Since Java 8 still sees very substantial market adoption, compared to Java 11,
    we still support Java 8 in the jOOQ 3.12 Open Source Edition.
  • Starting from jOOQ 3.12, formal support for older RDBMS dialect versions in
    the runtime libraries is reserved to the jOOQ Professional and Enterprise
    Editions. The jOOQ Open Source Edition will ship with support for the latest
    version of an RDBMS dialect, only. The code generator is not affected by this
    change.

By offering more value to our paying customers, we believe that we can continue
our successful business model, which in turn allows us to continue the free
jOOQ Open Source Edition for free. Our strategy is:

  • To implement new, advanced, commercial only features.
  • To offer legacy support (legacy Java versions, legacy database versions) to
    paying customers only.
  • To continue supporting a rich set of features to Open Source Edition users.

H2 and SQLite integration

Over the past year, both H2 and SQLite have seen a lot of improvements, which we
have now supported in jOOQ as well. Specifically, H2 is moving at a very fast
pace, and our traditional close cooperation got even better as we’re helping
the H2 team with our insights into the SQL standards, while the H2 team is
helping us with our own implementations.

Other improvements

The complete list of changes can be found on our website:
https://www.jooq.org/notes

A few improvements are worth summarising here explicitly

  • We’ve added support for a few new SQL predicates, such as the standard
    UNIQUE and SIMILAR TO predicates, as well as the synthetic, but very useful
    LIKE ANY predicate.
  • The JAXB implementation dependency has been removed and replaced by our own
    simplified implementation for a better Java 9+ experience.
  • The historic log4j (1.x) dependency has been removed. We’re now logging only
    via the optional slf4j dependency (which supports log4j bridges), or
    java.util.logging, if slf4j cannot be found on the classpath.
  • The shaded jOOR dependency has been upgraded to 0.9.12.
  • We’ve greatly improved our @Support annotation usage for better use with
    jOOQ-checker.
  • jOOQ-checker can now run with ErrorProne as well as with the checker framework
    as the latter still does not support Java 9+.
  • We’ve added support for a lot of new DDL statements and clauses.
  • There is now a synthetic PRODUCT() aggregate and window function.
  • We added support for the very useful window functions GROUPS mode.
  • Formatting CSV, JSON, XML now supports nested formatting.
  • UPDATE / DELETE statements now support (and emulate) ORDER BY and LIMIT.
  • When constructing advanced code generation configuration, users had to resort
    to using programmatic configuration. It is now possible to use SQL statements
    to dynamically construct regular expression matching tables, columns, etc.
  • Configuration has a new UnwrapperProvider SPI.
  • MockFileDatabase can now handle regular expressions and update statements.
  • Settings can cleanly separate the configuration of name case and quotation.
  • MySQL DDL character sets are now supported, just like collations.
  • A new Table.where() API simplifies the construction of simple derived tables.
    This feature will be very useful in the future, for improved row level
    security support.
  • A nice BigQuery and H2 feature is the “* EXCEPT (…)” syntax, which allows
    for removing columns from an asterisked expression. We now have
    Asterisk.except() and QualifiedAsterisk.except().
  • A lot of improvements in date time arithmetic were added, including support
    for vendor specific DateParts, like WEEK.

Full release notes here: https://www.jooq.org/notes

How to Fetch All Current Identity Values in Oracle

Oracle 12c has introduced the useful SQL standard IDENTITY feature, which is essentially just syntax sugar for binding a sequence to a column default. We can use it like this:

create table t1 (col1 number generated always as identity);
create table t2 (col2 number generated always as identity);

insert into t1 values (default);
insert into t1 values (default);
insert into t1 values (default);
insert into t2 values (default);

select * from t1;
select * from t2;

Which produces

COL1
----
  1
  2
  3

COL2
----
  1

For unit testing against our database, we might want to know what “state” our identities are in. For each table, we would like to know the next value such an identity would produce. If we knew all the backing sequence names, we could query their seq.currval, but we don’t know those sequence names as they are generated.

However, we can query the dictionary views to get this information as follows:

select data_default
from user_tab_cols
where data_default is not null
and identity_column = 'YES'
and table_name in ('T1', 'T2');

An alternative is to query user_tab_identity_cols

This would produce:

"TEST"."ISEQ$$_116601".nextval
"TEST"."ISEQ$$_116603".nextval

Now, if we’re lazy, we could just run EXECUTE IMMEDIATE on each of those expressions and we’re done:

set serveroutput on
declare
  v_current number;
begin
  for rec in (
    select table_name, data_default
    from user_tab_cols
    where data_default is not null
    and identity_column = 'YES'
    and table_name in ('T1', 'T2')
  ) loop
    execute immediate replace(
      'select ' || rec.data_default || ' from dual', 
      '.nextval', 
      '.currval'
    ) into v_current;
    dbms_output.put_line(
      'Table : ' || rec.table_name || 
      ', currval : ' || v_current
    );
  end loop;
end;
/

This would produce:

Table : T1, currval : 3
Table : T2, currval : 1

Alternatively, if you want this result to be a SQL result instead of DBMS_OUTPUT content, you could run this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select data_default
      from user_tab_cols
      where table_name = p_table_name
      and data_default is not null
      and identity_column = 'YES'
    )
    loop
      execute immediate replace(
        'select ' || rec.data_default || ' from dual', 
        '.nextval', 
        '.currval'
      ) into v_current;
      return v_current;
    end loop;
    
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
  where table_name in ('T1', 'T2')
)
where current_value is not null
order by table_name;
/

The alternative using user_tab_identity_cols would look like this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select sequence_name
      from user_tab_identity_cols
      where table_name = p_table_name
    )
    loop
      execute immediate 
        'select ' || rec.sequence_name || '.currval from dual'
      into v_current;
      return v_current;
    end loop;
     
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
)
where current_value is not null
order by table_name;
/

The result is now a nice SQL result set:

TABLE_NAME   CURRENT_VALUE
--------------------------
T1           3
T2           1

How to Use jOOQ’s Commercial Distributions with Spring Boot

Spring Boot is great to get started very quickly with what the Spring Boot authors have evaluated to be useful defaults. This can be a lot of help when you’re doing things for the first time, and have no way to copy paste working Maven pom.xml files from existing projects, for example.

When working with the jOOQ Open Source Edition, just go to https://start.spring.io, add the jOOQ dependency, and start working!

It is a bit different when you want to work with the commercial distributions of jOOQ, for two reasons:

  1. They are not on Maven Central, but in your own repository or artifactory, after you’ve installed the latest version from our website: https://www.jooq.org/download/versions
  2. They use a different Maven groupId, to make sure the different distributions can be easily distinguished.

The different groupIds for jOOQ distributions are:

org.jooq For the jOOQ Open Source Edition
org.jooq.trial For the jOOQ Trial Edition
org.jooq.pro For the jOOQ Express, Professional and Enterprise Edition (supporting the latest JDK versions)
org.jooq.pro-java-6 For the jOOQ Express, Professional and Enterprise Edition (supporting Java 6+)
org.jooq.pro-java-8 For the jOOQ Express, Professional and Enterprise Edition (supporting Java 8+, starting from jOOQ 3.12)

Spring Boot doesn’t know this, and doesn’t have to. All of these distributions are largely source and binary compatible, so you can switch editions in your application simply by replacing dependencies. A vanilla https://start.spring.io pom.xml configuration might look like this.

Notice: I’m leaving out spring-boot-starter-test, spring-boot-maven-plugin, and other things not essential for this blog post, please use https://start.spring.io to generate a more complete pom.xml stub!

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    </dependency>
  </dependencies>
</project>

What dependencies are we getting from this?

mvn dependency:tree

We’re getting:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] \- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO]    +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO]    |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO]    |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO]    |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO]    |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO]    |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO]    |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO]    |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO]    |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO]    |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO]    |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO]    +- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO]    |  +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO]    |  \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO]    |     \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO]    \- org.jooq:jooq:jar:3.11.11:compile
[INFO]       \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]          \- javax.activation:javax.activation-api:jar:1.2.0:compile

When this blog post was written, 3.11.11 was the latest jOOQ Open Source Edition version. But perhaps, you want a newer version or an older version. You can override this easily by specifying the ${jooq.version} property in Maven:

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <properties>
    <jooq.version>3.11.0</jooq.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    </dependency>
  </dependencies>
</project>

The dependency tree is now:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] \- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO]    +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO]    |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO]    |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO]    |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO]    |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO]    |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO]    |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO]    |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO]    |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO]    |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO]    |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO]    +- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO]    |  +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO]    |  \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO]    |     \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO]    \- org.jooq:jooq:jar:3.11.0:compile
[INFO]       \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]          \- javax.activation:javax.activation-api:jar:1.2.0:compile

But it’s still the jOOQ Open Source Edition. What if you want a commercial distribution, e.g. to try out jOOQ? One way is to explicitly exclude Spring Boot’s transitive jOOQ Open Source Edition dependency, and introduce your own explicit dependency. For example:

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <properties>
    <jooq.version>3.11.11</jooq.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    
      <!-- Exclude the jOOQ Open Source Edition -->
      <exclusions>
        <exclusion>
          <groupId>org.jooq</groupId>
          <artifactId>jooq</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  
    <!-- Include a commercial jOOQ distribution -->
    <dependency>
      <groupId>org.jooq.trial</groupId>
      <artifactId>jooq</artifactId>
      <version>${jooq.version}</version>
    </dependency>
  </dependencies>
</project>

The new dependency tree is now:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO] |  |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO] |  |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO] |  |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO] |  |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO] |  |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO] |  |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO] |  |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO] |  |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO] |  |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO] |  \- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO] |     +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO] |     \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO] |        \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO] \- org.jooq.trial:jooq:jar:3.11.11:compile
[INFO]    \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]       \- javax.activation:javax.activation-api:jar:1.2.0:compile

And you’re all set!

How to Write a Simple, yet Extensible API

How to write a simple API is already an art on its own.

I didn’t have time to write a short letter, so I wrote a long one instead.

― Mark Twain

But keeping an API simple for beginners and most users, and making it extensible for power users seems even more of a challenge. But is it?

What does “extensible” mean?

Imagine an API like, oh say, jOOQ. In jOOQ, you can write SQL predicates like this:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1)) // Predicate with bind value here
   .fetch();

By default (as this should always be the default), jOOQ will generate and execute this SQL statement on your JDBC driver, using a bind variable:

SELECT t.a, t.b
FROM t
WHERE t.c = ?

The API made the most common use case simple. Just pass your bind variable as if the statement was written in e.g. PL/SQL, and let the language / API do the rest. So we passed that test.

The use case for power users is to occasionally not use bind variables, for whatever reasons (e.g. skew in data and bad statistics, see also this post about bind variables).Will we pass that test as well?

jOOQ mainly offers two ways to fix this:

On a per-query basis

You can turn your variable into an inline value explicitly for this single occasion:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(inline(1))) // Predicate without bind value here
   .fetch();

This is using the static imported DSL.inline() method. Works, but not very convenient, if you have to do this for several queries, for several bind values, or worse, depending on some context.

This is a necessary API enhancement, but it does not make the API extensible.

On a global basis

Notice that ctx object there? It is the DSLContext object, the “contextual DSL”, i.e. the DSL API that is in the context of a jOOQ Configuration. You can thus set:

ctx2 = DSL.using(ctx
    .configuration()
    .derive()
    .set(new Settings()
    .withStatementType(StatementType.STATIC_STATEMENT));

// And now use this new DSLContext instead of the old one
ctx2.select(T.A, T.B)
    .from(T)
    .where(T.C.eq(1)) // No longer a bind variable
    .fetch();

Different approaches to offering such extensibility

We have our clean and simple API. Now some user wants to extend it. So often, we’re tempted to resort to a hack, e.g. by using thread locals, because they would work easily when under the assumption of a thread-bound execution model – such as e.g. classic Java EE Servlets

The price we’re paying for such a hack is high.

  1. It’s a hack, and as such it will break easily. If we offer this as functionality to a user, they will start depending on it, and we will have to support and maintain it
  2. It’s a hack, and it is based on assumptions, such as thread bound ness. It will not work in an async / reactive / parallel stream context, where our logic may jump back and forth between threads
  3. It’s a hack, and deep inside, we know it’s wrong. Obligatory XKCD: https://xkcd.com/292

This might obviously work, just like global (static) variables. You can set this variable globally (or “globally” for your own thread), and then the API’s internals will be able to read it. No need to pass around parameters, so no need to compromise on the APIs simplicity by adding optional and often ugly, distractive parameters.

What are better approaches to offering such extensibility?

Dependency Injection

One way is to use explicit Dependency Injection (DI). If you have a container like Spring, you can rely on Spring injecting arbitrary objects into your method call / whatever, where you need access to it:

This way, if you maintain several contextual objects of different lifecycle scopes, you can let the DI framework make appropriate decisions to figure out where to get that contextual information from. For example, when using JAX-RS, you can do this using an annotation based approach:


// These annotations bind the method to some HTTP address
@GET
@Produces("text/plain")
@Path("/api")
public String method(

    // This annotation fetches a request-scoped object
    // from the method call's context
    @Context HttpServletRequest request,

    // This annotation produces an argument from the
    // URL's query parameters
    @QueryParam("arg") String arg
) {
    ...
}

This approach works quite nicely for static environments (annotations being static), where you do not want to react to dynamic URLs or endpoints. It is declarative, and a bit magic, but well designed, so once you know all the options, you can choose the right one for your use case very easily.

While @QueryParam is mere convenience (you could have gotten the argument also from the HttpServletRequest), the @Context is powerful. It can help inject values of arbitrary lifecycle scope into your method / class / etc.

I personally favour explicit programming over annotation-based magic (e.g. using Guice for DI), but that’s probably a matter of taste. Both are a great way for implementors of APIs (e.g. HTTP APIs) to help get access to framework objects.

However, if you’re an API vendor, and want to give users of your API a way to extend the API, I personally favour jOOQ’s SPI approach.

SPIs

One of jOOQ’s strengths, IMO, is precisely this single, central place to register all SPI implementations that can be used for all sorts of purposes: The Configuration.

For example, on such a Configuration you can specify a JSR-310 java.time.Clock. This clock will be used by jOOQ’s internals to produce client side timestamps, instead of e.g. using System.currentTimeMillis(). Definitely a use case for power users only, but once you have this use case, you really only want to tweak a single place in jOOQ’s API: The Configuration.

All of jOOQ’s internals will always have a Configuration reference available. And it’s up to the user to decide what the scope of this object is, jOOQ doesn’t care. E.g.

  • per query
  • per thread
  • per request
  • per session
  • per application

In other words, to jOOQ, it doesn’t matter at all if you’re implementing a thread-bound, blocking, classic servlet model, or if you’re running your code reactively, or in parallel, or whatever. Just manage your own Configuration lifecycle, jOOQ doesn’t care.

In fact, you can have a global, singleton Configuration and implement thread bound components of it, e.g. the ConnectionProvider SPI, which takes care of managing the JDBC Connection lifecycle for jOOQ. Typically, users will use e.g. a Spring DataSource, which manages JDBC Connection (and transactions) using a thread-bound model, internally using ThreadLocal. jOOQ does not care. The SPI specifies that jOOQ will:

Again, it does not matter to jOOQ what the specific ConnectionProvider implementation does. You can implement it in any way you want if you’re a power user. By default, you’ll just pass jOOQ a DataSource, and it will wrap it in a default implementation called DataSourceConnectionProvider for you.

The key here is again:

  • The API is simple by default, i.e. by default, you don’t have to know about this functionality, just pass jOOQ a DataSource as always when working with Java and SQL, and you’re ready to go
  • The SPI allows for easily extending the API without compromising on its simplicity, by providing a single, central access point to this kind of functionality

Other SPIs in Configuration include:

  • ExecuteListener: An extremely useful and simple way to hook into the entire jOOQ query management lifecycle, from generating the SQL string to preparing the JDBC statement, to binding variables, to execution, to fetching result sets. A single SPI can accomodate various use cases like SQL logging, patching SQL strings, patching JDBC statements, listening to result set events, etc.
  • ExecutorProvider: Whenever jOOQ runs something asynchronously, it will ask this SPI to provide a standard JDK Executor, which will be used to run the asynchronous code block. By default, this will be the JDK default (the default ForkJoinPool), as always. But you probably want to override this default, and you want to be in full control of this, and not think about it every single time you run a query.
  • MetaProvider: Whenever jOOQ needs to look up database meta information (schemas, tables, columns, types, etc.), it will ask this MetaProvider about the available meta information. By default, this will run queries on the JDBC DatabaseMetaData, which is good enough, but maybe you want to wire these calls to your jOOQ-generated classes, or something else.
  • RecordMapperProvider and RecordUnmapperProvider: jOOQ has a quite versatile default implementation of how to map between a jOOQ Record and an arbitrary Java class, supporting a variety of standard approaches including JavaBeans getter/setter naming conventions, JavaBeans @ConstructorProperties, and much more. These defaults apply e.g. when writing query.fetchInto(MyBean.class). But sometimes, the defaults are not good enough, and you want this particular mapping to work differently. Sure, you could write query.fetchInto(record -> mymapper(record)), but you may not want to remember this for every single query. Just override the mapper (and unmapper) at a single, central spot for your own chosen Configuration scope (e.g. per query, per request, per session, etc.) and you’re done

Conclusion

Writing a simple API is difficult. Making it extensible in a simple way, however, is not. If your API has achieved “simplicity”, then it is very easy to support injecting arbitrary SPIs for arbitrary purposes at a single, central location, such as jOOQ’s Configuration.

In my most recent talk “10 Reasons Why we Love Some APIs and Why we Hate Some Others”, I’ve made a point that things like simplicity, discoverability, consistency, and convenience are among the most important aspects of a great API. How do you define a good API? The most underrated answer on this (obviously closed) Stack Overflow question is this one:

.

Again, this is hard in terms of creating a simple API. But it is extremely easy when making this simple API extensible. Make your SPIs very easily discoverable. A jOOQ power user will always look for extension points in jOOQ’s Configuration. And because the extension points are explicit types which have to be implemented (as opposed to annotations and their magic), no documentation is needed to learn the SPI (of course it is still beneficial as a reference).

I’d love to hear your alternative approaches to this API design challenge in the comments.

Watch the full talk here: