What’s Faster? COUNT(*) or COUNT(1)?

One of the biggest, most persistent myths in SQL is that COUNT(*) is faster than COUNT(1). Or was it that COUNT(1) is faster than COUNT(*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified?

Let’s measure!

How does COUNT(…) work?

But first, let’s look at some theory. The two ways of counting are not exactly the same thing. Why?

  • COUNT(*) counts all the tuples in a group
  • COUNT(<expr>) counts all the tuples in a group for which <expr> evaluates to something that IS NOT NULL
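
To make the distinction concrete, here’s a minimal sketch (PostgreSQL syntax, counting over a derived table that contains one NULL value):

SELECT count(*), count(x)
FROM (VALUES (1), (NULL)) AS t (x);

count|count|
-----|-----|
    2|    1|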

This distinction can be quite useful. Most of the time, we’ll simply COUNT(*) for convenience, but there are (at least) two cases where we don’t want that, for example:

When outer joining

Imagine that in the Sakila database, we have some actors that did not play in any films. Let’s make sure such an actor actually exists:

INSERT INTO actor (actor_id, first_name, last_name)
VALUES (201, 'SUSAN', 'DAVIS');

When inner joining, we might write the following (using PostgreSQL syntax):

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

And we won’t get the newly added SUSAN DAVIS, because of the nature of inner join:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

So we might change our query to use LEFT JOIN instead:

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

There she is now, but oops, wrong count! She doesn’t have any films, as the INNER JOIN query has shown before. Yet we get 1:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 1|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

Her COUNT(*) value is 1, because we do get 1 film tuple for her in the group, with all columns being NULL. The solution is to count FILM_ID instead. Being a primary key, it cannot be NULL in the table itself; it can only be NULL in our result because of the LEFT JOIN:

SELECT actor_id, a.first_name, a.last_name, count(film_id) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

Notice that we could count things other than the primary key, but with the primary key, we’re quite certain we won’t get any “accidental” NULLs in our groups that we did not want excluded from the count value.

Now, we’re getting the correct result:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 0|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

When counting subsets of a group

An even more powerful application of counting only non-null evaluations of an expression is counting only subsets of a group. We’ve already blogged about this technique in our previous post about aggregating several expressions in one single query.

For example, counting in a single query:

  • All actors
  • Actors with their first_name starting with A
  • Actors with their first_name ending with A
  • Actors with their first_name containing A

In SQL:

SELECT 
  count(*),
  count(CASE WHEN first_name LIKE 'A%' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A%' THEN 1 END)
FROM actor;

This yields:

count|count|count|count|
-----|-----|-----|-----|
  201|   13|   30|  105|

This is very useful when pivoting data sets (see also Oracle/SQL Server PIVOT clause).
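
For example, the first-letter counts from above could be pivoted explicitly like this (a sketch using Oracle’s PIVOT syntax; the IN list enumerates the pivoted values):

SELECT *
FROM (SELECT substr(first_name, 1, 1) AS initial FROM actor)
PIVOT (count(*) FOR initial IN ('A' AS a, 'B' AS b));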

Notice that PostgreSQL supports the SQL standard FILTER clause for this, which is more convenient and more readable. The above query can be written like this, in PostgreSQL:

SELECT 
  count(*),
  count(*) FILTER (WHERE first_name LIKE 'A%'),
  count(*) FILTER (WHERE first_name LIKE '%A'),
  count(*) FILTER (WHERE first_name LIKE '%A%')
FROM actor;

Back to COUNT(*) vs COUNT(1)

Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1)? There is none, effectively. The 1 in COUNT(1) is a constant expression evaluated for each row in the group, and it can be proven that this constant expression never evaluates to NULL, so effectively, we’re running COUNT(*), counting ALL the rows in the group again.

There should be no difference, and parsers / optimisers should be able to recognise this and not do the extra work of checking every expression evaluation for NULL-ness.

I recently saw a discussion on Twitter, though, where Vik Fearing looked up the PostgreSQL sources, showing that PostgreSQL does do the extra work instead of optimising this away.

So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results:

  • MySQL: Doesn’t matter. Sometimes COUNT(1) was faster, sometimes COUNT(*) was faster, so all differences were only benchmark artifacts
  • Oracle: Doesn’t matter. Like MySQL
  • PostgreSQL: Does matter (!). COUNT(*) was consistently faster by around 10% on 1M rows, that’s much more than I had expected
  • SQL Server: Doesn’t matter. Like MySQL

The benchmark code can be found in the following gists:

The results are below. Each benchmark run repeated SELECT COUNT(*) FROM t or SELECT COUNT(1) FROM t 100 times on a 1M row table, and then the benchmark was repeated 5 times to mitigate any warmup penalties and be fair with respect to caching.

The times displayed are relative to the fastest run per database product. This removes any distraction that may be caused by interpreting actual execution times as we do not want to compare database products against each other.
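
To illustrate the approach, the PostgreSQL variant of such a benchmark loop could look roughly like this (a sketch only, assuming a 1M row table t; the actual scripts in the gists differ per database):

DO $$
DECLARE
  v_ts TIMESTAMPTZ;
  v_count BIGINT;
BEGIN
  -- Time 100 repetitions of COUNT(*)
  v_ts := clock_timestamp();
  FOR i IN 1..100 LOOP
    SELECT count(*) INTO v_count FROM t;
  END LOOP;
  RAISE NOTICE 'count(*): %', clock_timestamp() - v_ts;

  -- Time 100 repetitions of COUNT(1)
  v_ts := clock_timestamp();
  FOR i IN 1..100 LOOP
    SELECT count(1) INTO v_count FROM t;
  END LOOP;
  RAISE NOTICE 'count(1): %', clock_timestamp() - v_ts;
END$$;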

The database versions I’ve used are:

  • MySQL 8.0.16 (in Docker)
  • Oracle 18c XE (in Docker)
  • PostgreSQL 11.3 (in Docker)
  • SQL Server 2017 Express (in Windows)

MySQL

No relevant difference, nor a clear winner:

RUN     STMT    RELATIVE_TIME
-----------------------------
  0        1           1.0079
  0        2           1.0212
  1        1           1.0229
  1        2           1.0256
  2        1           1.0009
  2        2           1.0031
  3        1           1.0291
  3        2           1.0256
  4        1           1.0618
  4        2           1.0000

Oracle

No relevant difference, nor a clear winner:

Run 1, Statement 1 : 1.06874
Run 1, Statement 2 : 1.01982
Run 2, Statement 1 : 1.09175
Run 2, Statement 2 : 1.0301
Run 3, Statement 1 : 1.00308
Run 3, Statement 2 : 1.02499
Run 4, Statement 1 : 1.02503
Run 4, Statement 2 : 1
Run 5, Statement 1 : 1.01259
Run 5, Statement 2 : 1.05828

PostgreSQL

A significant, consistent difference of almost 10%:

RUN 1, Statement 1: 1.00134
RUN 1, Statement 2: 1.09538
RUN 2, Statement 1: 1.00190
RUN 2, Statement 2: 1.09115
RUN 3, Statement 1: 1.00000
RUN 3, Statement 2: 1.09858
RUN 4, Statement 1: 1.00266
RUN 4, Statement 2: 1.09260
RUN 5, Statement 1: 1.00454
RUN 5, Statement 2: 1.09694

Again, I’m surprised by the order of magnitude of this difference. I would have expected it to be less. Curious to hear about your own results in the comments, or further ideas why this is so significant in PostgreSQL.

SQL Server

No relevant difference, nor a clear winner:

Run 1, Statement 1: 1.00442
Run 1, Statement 2: 1.00702
Run 2, Statement 1: 1.00468
Run 2, Statement 2: 1.00000
Run 3, Statement 1: 1.00208
Run 3, Statement 2: 1.00624
Run 4, Statement 1: 1.00780
Run 4, Statement 2: 1.00364
Run 5, Statement 1: 1.00468
Run 5, Statement 2: 1.00702

Conclusion

As of 2019, given the database versions mentioned above, there is, unfortunately, a significant difference between COUNT(*) and COUNT(1) in PostgreSQL. Luckily (and this is rare in SQL), all the other dialects don’t care, so consistently using COUNT(*) rather than COUNT(1) is a slightly better choice for ALL the database products measured in this article.

Do note that the benchmark only tried a very simple query! The results might be different when using joins, unions, or any other SQL constructs, or in other edge cases, e.g. when using COUNT() in HAVING or ORDER BY or with window functions, etc.

In any case, there shouldn’t be any difference, and I’m sure that a future PostgreSQL version will optimise the constant expression in the COUNT(<expr>) aggregate function directly in the parser to avoid the extra work.

For other interesting optimisations that do not depend on the cost model, see this article.

Oracle’s BINARY_DOUBLE Can Be Much Faster Than NUMBER

Using the right data type for a calculation sounds like obvious advice.

There are many blogs about using temporal data types for temporal data, instead of strings. An obvious reason is data integrity and correctness. We don’t gain much in storing dates as 2019-09-10 in one record, and as Nov 10, 2019 in the next one. Performance is also important in this case – for example, because of statistics being more optimal (an often overlooked side effect of unfit data types).

But at a customer site, I’ve recently discovered a surprising (not surprising in hindsight) performance issue when using NUMBER instead of BINARY_DOUBLE in an Oracle database.

NUMBER for monetary amounts

The NUMBER type (or DECIMAL, NUMERIC in other RDBMS) is perfectly suited for all decimal numbers which require correct precision and rounding. I mean, if you ever encounter an invoice whose total displays a floating point rounding artifact, such as when I purchase video games on Steam:

… your trust level for the platform immediately goes down by factor 1.0e+1 (even if technically, it does not matter in this case). So, the default, e.g. in banking systems for monetary amounts is always to use NUMBER or an equivalent type, e.g. java.math.BigDecimal.

Calculations on numbers

But now, let’s assume we want to do some statistics on these amounts. Some aggregations of various numeric values, maybe even of their logarithms. We can establish that these two expressions are equal:

ln(a * b) = ln(a) + ln(b)

Or in other words, for positive numbers:

a * b = exp(ln(a) + ln(b))

We’ve already blogged about how this approach can be used to emulate a PRODUCT() aggregate function in SQL, which is very useful for some cases, but none of the SQL databases jOOQ supports has built-in support for this yet. Notice, the blog post also takes care of zero and negative numbers.
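
In its simplest form, for strictly positive values, that emulation is just this (a sketch; the table t and column x are hypothetical):

SELECT exp(sum(ln(x))) AS product
FROM t;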

But which number type to choose?

Now, we might be tempted to just calculate the LN(SOME_NUMBER) value and sum it up using SUM(LN(SOME_NUMBER)) for this use case. This turns out to be terribly slow in Oracle. We suspected bad indexes at first, and even doubted aggregation in general, until I suggested we try using BINARY_DOUBLE instead. In our case, we didn’t care about the exact numeric value of the amount: an IEEE 754 floating point number with double precision was going to be good enough.

The results were surprising. In a simple benchmark, we compared 10 approaches to calculating this sum of logarithms:

  1. Using NUMBER(20, 10) and SUM(LN(the_number))
  2. Using NUMBER(20, 10) and SUM(LN(CAST(the_number AS BINARY_DOUBLE)))
  3. Using NUMBER(20, 10) and SUM(LN(TO_BINARY_DOUBLE(the_number)))
  4. Using NUMBER(20, 10), containing a pre-calculated LN value
  5. Using DOUBLE PRECISION and SUM(LN(the_double))
  6. Using DOUBLE PRECISION and SUM(LN(CAST(the_double AS BINARY_DOUBLE)))
  7. Using DOUBLE PRECISION and SUM(LN(TO_BINARY_DOUBLE(the_double)))
  8. Using DOUBLE PRECISION, containing a pre-calculated LN value
  9. Using BINARY_DOUBLE and SUM(LN(the_binary))
  10. Using BINARY_DOUBLE, containing a pre-calculated LN value

These were the thoughts:

  • We tried the above 3 possible numeric data types, expecting BINARY_DOUBLE to be the fastest
  • We tried to pre-calculate the LN() value for the benchmark, to see how much effort goes into summing, and how much effort goes into the LN() calculation with each type. While in general, in this system, such precalculation is impractical, we still wanted to have a benchmark comparison, in case a materialised view or other technique would be feasible.
  • We tried casting and converting each type to BINARY_DOUBLE prior to passing the value to the LN() function. Instead of migrating all the data (with possible side effects), we wanted to see if we could solve this to a reasonable extent “on the fly”.

The benchmark that I’m posting here uses this table and example data (full benchmark code at the end of the article):

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

So, we have 100000 records whose SUM(LN(x)) we want to calculate in the above 10 different ways. N1 contains the raw numeric value and N2 the pre-calculated LN(N1); D1/D2 and B1/B2 contain the same values as DOUBLE PRECISION and BINARY_DOUBLE, respectively.

The benchmark technique is described here. Do note it has a lot of caveats and is only useful for a limited number of verifications. Please always be careful when running such benchmarks, they often do not test production-like situations – and never use such benchmarks to compare different RDBMS products directly. They are only useful to compare two approaches on the same RDBMS product.

The results as run on Oracle 18c XE in Docker on development hardware are below. Times are compared relative to the fastest run, as actual time spent in each execution is not interesting for comparison. We did have similar results on production-like hardware, though, and I’m sure you can produce similar results in other RDBMS:

NUMBER(20, 10)
-------------------------------------------------
Run 3, Statement 1 : 280.34143  (avg : 280.75347)
Run 3, Statement 2 : 7.99402    (avg : 8.03506)
Run 3, Statement 3 : 7.71383    (avg : 7.73017)
Run 3, Statement 4 : 1.05456    (avg : 1.11735)

DOUBLE PRECISION
------------------------------------------------
Run 3, Statement 5 : 278.89476 (avg : 279.72981)
Run 3, Statement 6 : 8.06512   (avg : 8.07033)
Run 3, Statement 7 : 7.81873   (avg : 7.80063)
Run 3, Statement 8 : 1.5315    (avg : 1.54347)

BINARY_DOUBLE
------------------------------------------------
Run 3, Statement 9 : 2.4963    (avg : 2.57184)
Run 3, Statement 10: 1         (avg : 1.02943)

How to read these results?

  • Statement 10 is the fastest one, unsurprisingly, as it aggregates pre-calculated LN(binary_double) values. The precalculation of the function means that all the work has been done already before the report, and the data type is the one we expected to perform best in general
  • Statements 4 and 8 are almost as fast (precalculated LN() values). Being only slightly off, we can attribute the difference to the usual benchmark flaws, although it’s interesting to see that DOUBLE PRECISION seems 1.5x slower to sum than BINARY_DOUBLE and even NUMBER
  • Statements 1, 5, 9 are the ones where no data type conversion is applied and SUM(LN(the_value)) is being calculated. It is staggering how much slower both NUMBER and DOUBLE PRECISION are than BINARY_DOUBLE. Statements 1 and 5 are a factor of 112x slower than statement 9!
  • Statements 2-3, 6-7 prove that converting the NUMBER or DOUBLE PRECISION values to BINARY_DOUBLE on the fly provides a sufficiently performant workaround, making statements 2-3, 6-7 only about 3x slower than statement 9
  • Statements 2-3, 6-7 show that casting and converting are about equivalent

Note, we found similar results with other functions, such as EXP().

Analysis

The order of magnitude may seem surprising at first, but thinking about it, it is not. We would never do CPU intensive computation with java.math.BigDecimal in Java. The BigDecimal type is there for numeric accuracy, e.g. when it really matters what the monetary amount is, exactly. When we run analytics on monetary amounts, using double is sufficient in Java as well.

If our data is BigDecimal, and we cannot reasonably change that, it might still be better to use the BigDecimal::doubleValue conversion prior to further processing using e.g. Math::log. This translates directly to SQL, whose LN() implementations are data type specific. IEEE 754 was designed precisely for this purpose.

When doing CPU intensive computations, both in Java and in the database, we should always evaluate our various options:

  • Quick fixing our data sets for the report (ad-hoc conversion prior to calculation)
  • Thoroughly fixing our data sets in the schema (migration towards a better data type)
  • Preprocessing our data sets (precalculating some very commonly used computations)

Benchmark logic

Just run the below on your own hardware. I’m curious to see your results:

-- Copyright Data Geekery GmbH
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- This version displays relative execution times (fastest execution = 1)
SET SERVEROUTPUT ON

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 10;
  v_stmt NUMBER;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
    v_stmt := 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(n1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    -- The SYSDATE arithmetic converts the INTERVAL into a NUMBER of seconds
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(n1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(n1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(n2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(d1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(d1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(d1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(d2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(b1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(b2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) / MIN(elapsed) OVER() AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE data;
DROP TABLE results;

Using DISTINCT ON in Non-PostgreSQL Databases

A nice little gem in PostgreSQL’s SQL syntax is the DISTINCT ON clause, which is as powerful as it is esoteric.

In a previous post, we’ve blogged about some caveats to think of when DISTINCT and ORDER BY are used together. The bigger picture can be seen in our article about the logical order of operations in SQL SELECT.

The PostgreSQL documentation explains it well:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY. […] For example:

SELECT DISTINCT ON (location) location, time, report
    FROM weather_reports
    ORDER BY location, time DESC;

retrieves the most recent weather report for each location. […]

Again, this is quite esoteric as the distinct-ness is now decided only based on the columns listed separately in parentheses. For all the other columns, only the first row according to ORDER BY is projected. The SQL language is probably one of the only ones where the syntactic order of operations has absolutely nothing to do with the logical order of operations, and this DISTINCT ON syntax isn’t helping. In a more straightforward language design, the above statement could read, instead:

FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
SELECT 
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
DISTINCT
ORDER BY location

In other words, we would execute these operations in the following logical order:

  1. FROM: Access the weather_reports table
  2. WINDOW: Specify “windows” or “groups” in our data set, grouping by location and ordering the contents of each group by time in descending order (using the term “WINDOW” is no accident, as we’ll see afterwards)
  3. SELECT: Project the location (which is the grouping) and per group, the first values of time / report ordered by time
  4. DISTINCT: Remove all the duplicates, because the above operation will produce the same time and report value for each record that shares the same location
  5. ORDER BY: Finally, order the results per grouping

That’s what really happens, and incidentally, the above synthetic SQL syntax matches the actual logical order of operations in the SQL language, so translating it back to an actual SQL statement would yield:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
ORDER BY location

If your database doesn’t support the WINDOW clause, just expand it into the individual window functions. E.g. in Oracle, write:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER (PARTITION BY location ORDER BY time DESC),
  FIRST_VALUE (report) OVER (PARTITION BY location ORDER BY time DESC)
FROM weather_reports
ORDER BY location

From a readability perspective, I would definitely prefer the standard SQL syntax over DISTINCT ON.
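
Another common emulation, for dialects that support window functions, is to filter on ROW_NUMBER() in a derived table, avoiding the DISTINCT step entirely (a sketch):

SELECT location, time, report
FROM (
  SELECT
    location, time, report,
    row_number() OVER (PARTITION BY location ORDER BY time DESC) rn
  FROM weather_reports
) t
WHERE rn = 1
ORDER BY location;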

Want to play around with it? Here’s some sample data:

create table weather_reports (location text, time date, report text);
insert into weather_reports values ('X', DATE '2000-01-01', 'X1');
insert into weather_reports values ('X', DATE '2000-01-02', 'X2');
insert into weather_reports values ('X', DATE '2000-01-03', 'X3');
insert into weather_reports values ('Y', DATE '2000-01-03', 'Y1');
insert into weather_reports values ('Y', DATE '2000-01-05', 'Y2');
insert into weather_reports values ('Z', DATE '2000-01-04', 'Z1');

The result being:

|location|time      |report|
|--------|----------|------|
|X       |2000-01-03|X3    |
|Y       |2000-01-05|Y2    |
|Z       |2000-01-04|Z1    |

Notice that jOOQ already supports PostgreSQL DISTINCT ON and in the future, we might emulate it for other dialects using the above technique: https://github.com/jOOQ/jOOQ/issues/3564

Quantified LIKE ANY predicates in jOOQ 3.12

Quantified comparison predicates

One of SQL’s weirdest features are quantified comparison predicates. I’ve hardly ever seen these in the wild:

SELECT *
FROM t
WHERE id = ANY (1, 2, 3)

The above example is equivalent to using the much more readable IN predicate:

SELECT *
FROM t
WHERE id IN (1, 2, 3)

This equivalence is defined in the SQL standard. There are more esoteric cases that could be solved using such quantified comparison predicates more conveniently than otherwise, such as:

SELECT *
FROM t
WHERE (a, b) > ALL (
  SELECT x, y
  FROM u
)

This is the same thing as writing the more verbose and, in my opinion, a bit less readable equivalent, which compares against the greatest (x, y) row:

SELECT *
FROM t
WHERE (a, b) > (
  SELECT x, y
  FROM u
  ORDER BY x DESC, y DESC
  FETCH FIRST ROW ONLY
)

Assuming, of course, that your RDBMS can compare row value expressions like that.

Quantified LIKE predicate

Unfortunately, the SQL standard and most implementations support the above quantified comparison predicates only for the <, <=, >, >=, =, != comparison operators, not for other predicate types. For example, the LIKE predicate would profit greatly from such a syntax:

SELECT *
FROM customers
WHERE last_name LIKE ANY ('A%', 'B%', 'C%')

The syntax is immediately understandable and translates to

SELECT *
FROM customers
WHERE last_name LIKE 'A%'
OR last_name LIKE 'B%'
OR last_name LIKE 'C%'

… which is much less convenient to write!

Furthermore, imagine producing such patterns from a subquery:

SELECT *
FROM customers
WHERE last_name LIKE ANY (
  SELECT pattern
  FROM patterns
  WHERE patterns.customer_type = customers.customer_type
)

This is a bit trickier to emulate in standard SQL. For example, in PostgreSQL, we could write:

SELECT *
FROM customers
WHERE true = ANY (
  SELECT last_name LIKE pattern
  FROM patterns
  WHERE patterns.customer_type = customers.customer_type
)

In this case, we can use boolean types. Oracle would make this a bit more difficult:

SELECT *
FROM customers
WHERE 1 = ANY (
  SELECT CASE
    WHEN last_name LIKE pattern THEN 1
    WHEN NOT(last_name LIKE pattern) THEN 0
    ELSE NULL
  END
  FROM patterns
  WHERE patterns.customer_type = customers.customer_type
)

Wouldn’t this be a useful SQL feature to support?

jOOQ 3.12 support for this

jOOQ does support this syntax starting from jOOQ 3.12. You can now write:

ctx.selectFrom(CUSTOMERS)
   .where(CUSTOMERS.LAST_NAME.like(any("A%", "B%", "C%")))
   .fetch();

ctx.selectFrom(CUSTOMERS)
   .where(CUSTOMERS.LAST_NAME.like(any(
      select(PATTERNS.PATTERN)
      .from(PATTERNS)
      .where(PATTERNS.CUSTOMER_TYPE.eq(CUSTOMERS.CUSTOMER_TYPE))
   )))
   .fetch();

All the previously mentioned emulations are available. You can play around with it by downloading jOOQ:
https://www.jooq.org/download

Or directly on our website:
https://www.jooq.org/translate

jOOQ 3.12 Released With a new Procedural Language API

jOOQ 3.12 has been released with a new procedural language API, new data types, MemSQL support, formal Java 11+ support, a much better parser, and reactive stream API support.

In this release, we’ve focused on a lot of minor infrastructure tasks, greatly
improving the overall quality of jOOQ. We’ve reworked some of our automated
integration tests, which has helped us fix a large number of previously
undiscovered issues and much improved the coverage of our 26 supported RDBMS
dialects.

We’re excited about these internal changes, as they will help us implement a lot
of features that have been requested by many for a long time, including an
immutable query object model, with all the secondary benefits like caching of
generated SQL, and much more powerful dynamic SQL construction and
transformation in the future.

Major new features include the new procedural language API shipped with our
commercial distributions, new data types including native JSON support, MemSQL
support, formal Java 11+ support, a much better parser, and reactive stream API
support.

Procedural languages

Following up on jOOQ 3.11’s support for anonymous blocks, the jOOQ 3.12
Professional and Enterprise Editions now include support for a variety of
procedural language features, including

  • Variable declarations
  • Variable assignments
  • Loops (WHILE, REPEAT, FOR, LOOP)
  • If Then Else
  • Labels
  • Exit, Continue, Goto
  • Execute
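
As a sketch of the kind of vendor specific logic this API targets, consider this hand-written Oracle PL/SQL block (the table t(col) is made up for illustration; jOOQ’s API renders dialect specific equivalents of such logic):

DECLARE
  i NUMBER := 1;
BEGIN
  -- A simple WHILE loop with a variable declaration and assignments
  WHILE i <= 5 LOOP
    INSERT INTO t (col) VALUES (i);
    i := i + 1;
  END LOOP;
END;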

This feature set is part of our ongoing efforts to continue supporting more
advanced vendor specific functionality, including our planned definition and
translation of stored procedures, triggers, and other, ad-hoc procedural logic
that helps move data processing logic into the database server.

New Databases Supported

The jOOQ Professional Edition now supports the MemSQL dialect. MemSQL is derived
from MySQL, although our integration tests have shown that there are numerous
differences, such that supporting MemSQL formally will add a lot of value to our
customers by providing much increased syntactic correctness.

Reactive streams

The reactive programming model is gaining traction in some environments as new,
useful streaming APIs emerge, such as e.g. Reactor. These APIs have agreed to
work with a common SPI: reactive streams, and since JDK 9 the new
java.util.concurrent.Flow SPI. jOOQ 3.12 now implements these paradigms on an
API level, such that an integration with APIs like Reactor becomes much
easier. The implementation still binds to JDBC, and is thus blocking. Future
versions of jOOQ will abstract over JDBC to allow for running queries against
ADBA (by Oracle) or R2DBC (by Spring).

New data types

We’ve introduced native support for a few new data types, which are often very
useful in specific situations. These include:

  • JSON / JSONB: A native string wrapper for textual and binary JSON data. While
    users will still want to bind more specific JSON to maps and lists using
    custom data type Bindings, in a lot of cases, being able to just serialise and
    deserialise JSON content as strings is sufficient. jOOQ now provides out of
    the box support for this approach for various SQL dialects.
  • INSTANT: RDBMS do not agree on the meaning of the SQL standard TIMESTAMP WITH
    TIME ZONE. PostgreSQL, for example, interprets it as a unix timestamp, just
    like java.time.Instant. For an optimal PostgreSQL experience, this new INSTANT
    type will be much more useful than the standard JDBC java.time.OffsetDateTime
    binding.
  • ROWID: Most RDBMS have a native ROWID / OID / CTID / physloc identity
    value that physically identifies a row on the underlying storage system,
    irrespective of any logical primary key. These ROWIDs can be leveraged to run
    more performant, vendor specific queries. Supporting this type allows for
    easily using this feature in arbitrary queries (see the sketch after this list).
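
For example, the physical row identifier can be queried directly in various dialects; a sketch (the table t is made up for illustration):

-- Oracle
SELECT t.rowid, t.* FROM t;

-- PostgreSQL
SELECT t.ctid, t.* FROM t;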

Parser

Our parser is seeing a lot of continued improvements over the releases as we
gather feedback from our users. Our main drivers for feedback are:

  • The DDLDatabase which allows for generating code from DDL scripts rather than
    live JDBC connections to your database
  • The https://www.jooq.org/translate website, which translates any kind of SQL
    between database dialects.

SQL dialect translation will evolve into an independent product in the future.
DDL parsing is already very powerful, and a lot of customers rely on it for
their production systems.

In the next versions, we will be able to simulate DDL on our own, without H2,
which will open up a variety of possible use cases, including better schema
management.

Specific jOOQ 3.12 parser improvements include:

  • Being able to access schema meta information (column types, constraints) to
    better emulate SQL features / translate SQL syntax between dialects
  • A parse search path, similar to PostgreSQL’s search_path, or other dialects’
    current_schema, allowing support for unqualified object references.
  • The DDL simulation from the DDLDatabase is now moved into the core library,
    supporting it also out of the box as a DDL script based meta data source
  • A new special comment syntax that makes only the jOOQ parser ignore SQL
    fragments, while your ordinary SQL execution still runs them.
  • A new interactive mode in the ParserCLI
  • Support for nested block comments
  • And much more

Formal Java 11 Support

While we have been supporting Java 11 for a while through our integration tests,
jOOQ 3.12 now fully supports Java 11 to help improve the experience around the
transitive JAXB dependency, which we have now removed entirely from jOOQ.

The commercial editions ship with a Java 11+ supporting distribution, which
includes more optimal API usage, depending on new Java 9-11 APIs. All editions,
including the jOOQ Open Source Edition, have a Java 8+ distribution that
supports any Java version starting from Java 8.

Commercial Editions

Dual licensing is at the core of our business, helping us to provide continued
value to our customers.

In the past, the main distinction between the different jOOQ editions was the
number of database products each edition supported. In the future, we want to
provide even more value to our customers with commercial subscriptions. This is
why, starting from jOOQ 3.12, we are now offering some new, advanced features
only in our commercial distributions. Such features include:

  • The procedural language API, which is available with the jOOQ Professional
    and Enterprise Editions
  • While the jOOQ 3.12 Open Source Edition supports Java 8+, the jOOQ 3.12
    Professional Edition also ships with a Java 11+ distribution, leveraging some
    newer JDK APIs, and the jOOQ 3.12 Enterprise Edition continues supporting
    Java 6 and 7.
  • Since Java 8 still sees very substantial market adoption, compared to Java 11,
    we still support Java 8 in the jOOQ 3.12 Open Source Edition.
  • Starting from jOOQ 3.12, formal support for older RDBMS dialect versions in
    the runtime libraries is reserved to the jOOQ Professional and Enterprise
    Editions. The jOOQ Open Source Edition will ship with support for the latest
    version of an RDBMS dialect, only. The code generator is not affected by this
    change.

By offering more value to our paying customers, we believe that we can continue
our successful business model, which in turn allows us to continue the free
jOOQ Open Source Edition for free. Our strategy is:

  • To implement new, advanced, commercial only features.
  • To offer legacy support (legacy Java versions, legacy database versions) to
    paying customers only.
  • To continue supporting a rich set of features to Open Source Edition users.

H2 and SQLite integration

Over the past year, both H2 and SQLite have seen a lot of improvements, which we
have now supported in jOOQ as well. Specifically, H2 is moving at a very fast
pace, and our traditional close cooperation got even better as we’re helping
the H2 team with our insights into the SQL standards, while the H2 team is
helping us with our own implementations.

Other improvements

The complete list of changes can be found on our website:
https://www.jooq.org/notes

A few improvements are worth summarising here explicitly:

  • We’ve added support for a few new SQL predicates, such as the standard
    UNIQUE and SIMILAR TO predicates, as well as the synthetic, but very useful
    LIKE ANY predicate.
  • The JAXB implementation dependency has been removed and replaced by our own
    simplified implementation for a better Java 9+ experience.
  • The historic log4j (1.x) dependency has been removed. We’re now logging only
    via the optional slf4j dependency (which supports log4j bridges), or
    java.util.logging, if slf4j cannot be found on the classpath.
  • The shaded jOOR dependency has been upgraded to 0.9.12.
  • We’ve greatly improved our @Support annotation usage for better use with
    jOOQ-checker.
  • jOOQ-checker can now run with ErrorProne as well as with the checker framework
    as the latter still does not support Java 9+.
  • We’ve added support for a lot of new DDL statements and clauses.
  • There is now a synthetic PRODUCT() aggregate and window function.
  • We added support for the very useful window functions GROUPS mode.
  • Formatting CSV, JSON, XML now supports nested formatting.
  • UPDATE / DELETE statements now support (and emulate) ORDER BY and LIMIT.
  • When constructing advanced code generation configuration, users had to resort
    to using programmatic configuration. It is now possible to use SQL statements
    to dynamically construct regular expression matching tables, columns, etc.
  • Configuration has a new UnwrapperProvider SPI.
  • MockFileDatabase can now handle regular expressions and update statements.
  • Settings can cleanly separate the configuration of name case and quotation.
  • MySQL DDL character sets are now supported, just like collations.
  • A new Table.where() API simplifies the construction of simple derived tables.
    This feature will be very useful in the future, for improved row level
    security support.
  • A nice BigQuery and H2 feature is the “* EXCEPT (…)” syntax, which allows
    for removing columns from an asterisked expression. We now have
    Asterisk.except() and QualifiedAsterisk.except() (see the sketch after this list).
  • A lot of improvements in date time arithmetic were added, including support
    for vendor specific DateParts, like WEEK.
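
A sketch of the “* EXCEPT (…)” syntax (table and column names made up for illustration):

-- BigQuery / H2: project all columns except the listed ones
SELECT * EXCEPT (created_at, updated_at)
FROM t;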

Full release notes here: https://www.jooq.org/notes

How to Fetch All Current Identity Values in Oracle

Oracle 12c has introduced the useful SQL standard IDENTITY feature, which is essentially just syntactic sugar for binding a sequence to a column default. We can use it like this:

create table t1 (col1 number generated always as identity);
create table t2 (col2 number generated always as identity);

insert into t1 values (default);
insert into t1 values (default);
insert into t1 values (default);
insert into t2 values (default);

select * from t1;
select * from t2;

Which produces:

COL1
----
  1
  2
  3

COL2
----
  1
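
Conceptually, such an identity column is little more than sugar for a column default bound to a sequence. A rough, hand-written equivalent might look like this (a sketch; in reality, Oracle generates and manages the backing ISEQ$$ sequences itself):

create sequence s1;
create table t1 (col1 number default s1.nextval);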

For unit testing against our database, we might want to know what “state” our identities are in. For each table, we would like to know the next value such an identity would produce. If we knew all the backing sequence names, we could query their currval, but we don’t know those sequence names, as they are generated.

However, we can query the dictionary views to get this information as follows:

select data_default
from user_tab_cols
where data_default is not null
and identity_column = 'YES'
and table_name in ('T1', 'T2');

An alternative is to query user_tab_identity_cols (more on that below).

This would produce:

"TEST"."ISEQ$$_116601".nextval
"TEST"."ISEQ$$_116603".nextval

Now, if we’re lazy, we could just run EXECUTE IMMEDIATE on each of those expressions and we’re done:

set serveroutput on
declare
  v_current number;
begin
  for rec in (
    select table_name, data_default
    from user_tab_cols
    where data_default is not null
    and identity_column = 'YES'
    and table_name in ('T1', 'T2')
  ) loop
    execute immediate replace(
      'select ' || rec.data_default || ' from dual', 
      '.nextval', 
      '.currval'
    ) into v_current;
    dbms_output.put_line(
      'Table : ' || rec.table_name || 
      ', currval : ' || v_current
    );
  end loop;
end;
/

This would produce:

Table : T1, currval : 3
Table : T2, currval : 1

Alternatively, if you want this result to be a SQL result instead of DBMS_OUTPUT content, you could run this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select data_default
      from user_tab_cols
      where table_name = p_table_name
      and data_default is not null
      and identity_column = 'YES'
    )
    loop
      execute immediate replace(
        'select ' || rec.data_default || ' from dual', 
        '.nextval', 
        '.currval'
      ) into v_current;
      return v_current;
    end loop;
    
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
  where table_name in ('T1', 'T2')
)
where current_value is not null
order by table_name;
/

The alternative using user_tab_identity_cols would look like this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select sequence_name
      from user_tab_identity_cols
      where table_name = p_table_name
    )
    loop
      execute immediate 
        'select ' || rec.sequence_name || '.currval from dual'
      into v_current;
      return v_current;
    end loop;
     
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
)
where current_value is not null
order by table_name;
/

The result is now a nice SQL result set:

TABLE_NAME   CURRENT_VALUE
--------------------------
T1           3
T2           1

How to Use jOOQ’s Commercial Distributions with Spring Boot

Spring Boot is great for getting started very quickly, with what the Spring Boot authors have evaluated to be useful defaults. This can be a lot of help when you’re doing things for the first time and have no way of copy-pasting working Maven pom.xml files from existing projects, for example.

When working with the jOOQ Open Source Edition, just go to https://start.spring.io, add the jOOQ dependency, and start working!

It is a bit different when you want to work with the commercial distributions of jOOQ, for two reasons:

  1. They are not on Maven Central, but in your own repository or Artifactory, after you’ve installed the latest version from our website: https://www.jooq.org/download/versions
  2. They use a different Maven groupId, to make sure the different distributions can be easily distinguished.

The different groupIds for jOOQ distributions are:

  • org.jooq: the jOOQ Open Source Edition
  • org.jooq.trial: the jOOQ Trial Edition
  • org.jooq.pro: the jOOQ Express, Professional and Enterprise Editions (supporting the latest JDK versions)
  • org.jooq.pro-java-6: the jOOQ Express, Professional and Enterprise Editions (supporting Java 6+)
  • org.jooq.pro-java-8: the jOOQ Express, Professional and Enterprise Editions (supporting Java 8+, starting from jOOQ 3.12)

Spring Boot doesn’t know this, and doesn’t have to. All of these distributions are largely source and binary compatible, so you can switch editions in your application simply by replacing dependencies. A vanilla https://start.spring.io pom.xml configuration might look like this.

Notice: I’m leaving out spring-boot-starter-test, spring-boot-maven-plugin, and other things not essential for this blog post, please use https://start.spring.io to generate a more complete pom.xml stub!

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    </dependency>
  </dependencies>
</project>

What dependencies are we getting from this?

mvn dependency:tree

We’re getting:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] \- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO]    +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO]    |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO]    |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO]    |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO]    |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO]    |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO]    |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO]    |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO]    |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO]    |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO]    |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO]    +- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO]    |  +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO]    |  \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO]    |     \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO]    \- org.jooq:jooq:jar:3.11.11:compile
[INFO]       \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]          \- javax.activation:javax.activation-api:jar:1.2.0:compile

When this blog post was written, 3.11.11 was the latest jOOQ Open Source Edition version. But perhaps, you want a newer version or an older version. You can override this easily by specifying the ${jooq.version} property in Maven:

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <properties>
    <jooq.version>3.11.0</jooq.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    </dependency>
  </dependencies>
</project>

The dependency tree is now:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] \- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO]    +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO]    |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO]    |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO]    |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO]    |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO]    |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO]    |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO]    |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO]    |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO]    |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO]    |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO]    |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO]    |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO]    |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO]    +- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO]    |  +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO]    |  \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO]    |     \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO]    \- org.jooq:jooq:jar:3.11.0:compile
[INFO]       \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]          \- javax.activation:javax.activation-api:jar:1.2.0:compile

But it’s still the jOOQ Open Source Edition. What if you want a commercial distribution, e.g. to try out jOOQ? One way is to explicitly exclude Spring Boot’s transitive jOOQ Open Source Edition dependency, and introduce your own explicit dependency. For example:

<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.6.RELEASE</version>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  
  <properties>
    <jooq.version>3.11.11</jooq.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jooq</artifactId>
    
      <!-- Exclude the jOOQ Open Source Edition -->
      <exclusions>
        <exclusion>
          <groupId>org.jooq</groupId>
          <artifactId>jooq</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  
    <!-- Include a commercial jOOQ distribution -->
    <dependency>
      <groupId>org.jooq.trial</groupId>
      <artifactId>jooq</artifactId>
      <version>${jooq.version}</version>
    </dependency>
  </dependencies>
</project>

The new dependency tree is now:

[INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ demo ---
[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.boot:spring-boot-starter-jooq:jar:2.1.6.RELEASE:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.1.6.RELEASE:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-starter:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  |  \- org.springframework:spring-context:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  |     +- org.springframework:spring-aop:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  |     \- org.springframework:spring-expression:jar:5.1.8.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.1.6.RELEASE:compile
[INFO] |  |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO] |  |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO] |  |  |  |  +- org.apache.logging.log4j:log4j-to-slf4j:jar:2.11.2:compile
[INFO] |  |  |  |  |  \- org.apache.logging.log4j:log4j-api:jar:2.11.2:compile
[INFO] |  |  |  |  \- org.slf4j:jul-to-slf4j:jar:1.7.26:compile
[INFO] |  |  |  +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO] |  |  |  \- org.yaml:snakeyaml:jar:1.23:runtime
[INFO] |  |  +- com.zaxxer:HikariCP:jar:3.2.0:compile
[INFO] |  |  |  \- org.slf4j:slf4j-api:jar:1.7.26:compile
[INFO] |  |  \- org.springframework:spring-jdbc:jar:5.1.8.RELEASE:compile
[INFO] |  \- org.springframework:spring-tx:jar:5.1.8.RELEASE:compile
[INFO] |     +- org.springframework:spring-beans:jar:5.1.8.RELEASE:compile
[INFO] |     \- org.springframework:spring-core:jar:5.1.8.RELEASE:compile
[INFO] |        \- org.springframework:spring-jcl:jar:5.1.8.RELEASE:compile
[INFO] \- org.jooq.trial:jooq:jar:3.11.11:compile
[INFO]    \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO]       \- javax.activation:javax.activation-api:jar:1.2.0:compile

And you’re all set!