What’s Faster? COUNT(*) or COUNT(1)?

One of the biggest and most undead myths in SQL is that COUNT(*) is faster than COUNT(1). Or was it that COUNT(1) is faster than COUNT(*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified?

Let’s measure!

How does COUNT(…) work?

But first, let’s look into some theory. The two ways of counting are not exactly the same. Why?

  • COUNT(*) counts all the tuples in a group
  • COUNT(<expr>) counts all the tuples in a group for which <expr> evaluates to something that IS NOT NULL
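
To illustrate the distinction on an ad-hoc table (PostgreSQL syntax):

WITH t(x) AS (VALUES (1), (NULL::int), (3))
SELECT count(*) AS all_rows, count(x) AS non_null_x
FROM t;

all_rows|non_null_x|
--------|----------|
       3|         2|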

This distinction can be quite useful. Most of the time, we’ll simply COUNT(*) for convenience, but there are (at least) two cases where we don’t want that, for example:

When outer joining

Imagine that in the Sakila database, we have some actors that did not play in any films. Making sure such an actor actually exists:

INSERT INTO actor (actor_id, first_name, last_name)
VALUES (201, 'SUSAN', 'DAVIS');

When inner joining, we might write the following (using PostgreSQL syntax):

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

And we won’t get the newly added SUSAN DAVIS, because of the nature of inner join:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

So we might change our query to use LEFT JOIN instead:

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

There she is now, but oops, wrong count! She doesn’t have any films, which we have proven before with the INNER JOIN query. Yet we get 1:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 1|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|

Her COUNT(*) value is 1, because we do get 1 film tuple for her in the group, with all of its columns being NULL. The solution is to count FILM_ID instead, which cannot be NULL in the table (being a primary key), and which is NULL here only because of the LEFT JOIN:

SELECT actor_id, a.first_name, a.last_name, count(film_id) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

Notice that we could count things other than the primary key, but with the primary key, we can be quite certain we don’t pick up any other “accidental” NULLs in our groups, which would have excluded rows from the count that we did not want to exclude.

Now, we’re getting the correct result:

actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 0|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|
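
To see why the primary key is the safest column to count, consider counting a nullable column instead. In a stock Sakila installation, FILM.ORIGINAL_LANGUAGE_ID is nullable (and, as far as I recall, NULL for every film), so counting it would silently exclude rows we did want to count:

-- A sketch: counting a nullable column excludes all rows where it is NULL
SELECT count(*), count(original_language_id)
FROM film;

-- Yields 1000 for count(*), but 0 for count(original_language_id)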

When counting subsets of a group

An even more powerful application of counting only non-null evaluations of an expression is counting only subsets of a group. We’ve already blogged about this technique in our previous post about aggregating several expressions in one single query.

For example, counting in a single query:

  • All actors
  • Actors with their first_name starting with A
  • Actors with their first_name ending with A
  • Actors with their first_name containing A

In SQL:

SELECT 
  count(*),
  count(CASE WHEN first_name LIKE 'A%' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A%' THEN 1 END)
FROM actor;

This yields:

count|count|count|count|
-----|-----|-----|-----|
  201|   13|   30|  105|

This is very useful when pivoting data sets (see also Oracle/SQL Server PIVOT clause).

Notice that PostgreSQL supports the SQL standard FILTER clause for this, which is more convenient and more readable. The above query can be written like this, in PostgreSQL:

SELECT 
  count(*),
  count(*) FILTER (WHERE first_name LIKE 'A%'),
  count(*) FILTER (WHERE first_name LIKE '%A'),
  count(*) FILTER (WHERE first_name LIKE '%A%')
FROM actor;

Back to COUNT(*) vs COUNT(1)

Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1)? Effectively, there is none. The 1 in COUNT(1) is a constant expression that is evaluated for each row in the group, and it can be proven that this constant expression will never evaluate to NULL, so effectively, we’re running COUNT(*), counting ALL the rows in the group again.
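
For example, on the Sakila FILM table, both queries produce the exact same result:

SELECT count(*), count(1)
FROM film;

-- Both columns yield 1000; the only difference is the (theoretical) extra
-- work of evaluating the constant 1 per row and checking it for NULL-ness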

There should be no difference, and parsers / optimisers should be able to recognise this and not do the extra work of checking every expression evaluation for NULL-ness.

I recently saw this discussion on Twitter, though, where Vik Fearing looked up the PostgreSQL sources, showing that PostgreSQL does do the extra work instead of optimising this away.

So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results:

  • MySQL: Doesn’t matter. Sometimes COUNT(1) was faster, sometimes COUNT(*) was faster, so all differences were only benchmark artifacts
  • Oracle: Doesn’t matter. Like MySQL
  • PostgreSQL: Does matter (!). COUNT(*) was consistently faster by around 10% on 1M rows, that’s much more than I had expected
  • SQL Server: Doesn’t matter. Like MySQL

The benchmark code is available online as gists.

The results are below. Each benchmark run repeated SELECT COUNT(*) FROM t or SELECT COUNT(1) FROM t 100 times on a 1M row table, and then the benchmark was repeated 5 times to mitigate any warmup penalties and be fair with respect to caching.
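
For reference, here’s a minimal sketch of what such a benchmark loop looks like in PL/pgSQL. This is not the exact code from the gists; it assumes a 1M row table t:

DO $$
DECLARE
  v_ts TIMESTAMPTZ;
  v_repeat CONSTANT INT := 100;
  rec RECORD;
BEGIN
  -- Repeat the whole benchmark 5 times to mitigate warmup penalties
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    FOR i IN 1..v_repeat LOOP
      FOR rec IN SELECT count(*) FROM t LOOP NULL; END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 1: %', r, clock_timestamp() - v_ts;

    v_ts := clock_timestamp();
    FOR i IN 1..v_repeat LOOP
      FOR rec IN SELECT count(1) FROM t LOOP NULL; END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 2: %', r, clock_timestamp() - v_ts;
  END LOOP;
END$$;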

The times displayed are relative to the fastest run per database product. This removes any distraction that may be caused by interpreting actual execution times as we do not want to compare database products against each other.

The database versions I’ve used are:

  • MySQL 8.0.16 (in Docker)
  • Oracle 18c XE (in Docker)
  • PostgreSQL 11.3 (in Docker)
  • SQL Server 2017 Express (in Windows)

MySQL

No relevant difference, nor a clear winner:

RUN     STMT    RELATIVE_TIME
-----------------------------
0       1       1.0079
0       2       1.0212
1       1       1.0229
1       2       1.0256
2       1       1.0009
2       2       1.0031
3       1       1.0291
3       2       1.0256
4       1       1.0618
4       2       1.0000

Oracle

No relevant difference, nor a clear winner:

Run 1, Statement 1 : 1.06874
Run 1, Statement 2 : 1.01982
Run 2, Statement 1 : 1.09175
Run 2, Statement 2 : 1.0301
Run 3, Statement 1 : 1.00308
Run 3, Statement 2 : 1.02499
Run 4, Statement 1 : 1.02503
Run 4, Statement 2 : 1
Run 5, Statement 1 : 1.01259
Run 5, Statement 2 : 1.05828

PostgreSQL

A significant, consistent difference of almost 10%:

RUN 1, Statement 1: 1.00134
RUN 1, Statement 2: 1.09538
RUN 2, Statement 1: 1.00190
RUN 2, Statement 2: 1.09115
RUN 3, Statement 1: 1.00000
RUN 3, Statement 2: 1.09858
RUN 4, Statement 1: 1.00266
RUN 4, Statement 2: 1.09260
RUN 5, Statement 1: 1.00454
RUN 5, Statement 2: 1.09694

Again, I’m surprised by the order of magnitude of this difference. I would have expected it to be less. Curious to hear about your own results in the comments, or further ideas why this is so significant in PostgreSQL.

SQL Server

No relevant difference, nor a clear winner:

Run 1, Statement 1: 1.00442
Run 1, Statement 2: 1.00702
Run 2, Statement 1: 1.00468
Run 2, Statement 2: 1.00000
Run 3, Statement 1: 1.00208
Run 3, Statement 2: 1.00624
Run 4, Statement 1: 1.00780
Run 4, Statement 2: 1.00364
Run 5, Statement 1: 1.00468
Run 5, Statement 2: 1.00702

Conclusion

As it is now in 2019, given the database versions mentioned above, there is unfortunately a significant difference between COUNT(*) and COUNT(1) in PostgreSQL. Luckily (and this is rare in SQL), all the other dialects don’t care, and thus consistently using COUNT(*) rather than COUNT(1) is a slightly better choice for ALL the measured database products from this article.

Do note that the benchmark only tried a very simple query! The results might be different when using joins, unions, or any other SQL constructs, or in other edge cases, e.g. when using COUNT() in HAVING or ORDER BY or with window functions, etc.

In any case, there shouldn’t be any difference, and I’m sure that a future PostgreSQL version will optimise the constant expression in the COUNT(<expr>) aggregate function directly in the parser to avoid the extra work.

For other interesting optimisations that do not depend on the cost model, see this article here.

Writing Custom Aggregate Functions in SQL Just Like a Java 8 Stream Collector

All SQL databases support the standard aggregate functions COUNT(), SUM(), AVG(), MIN(), MAX().

Some databases support other aggregate functions, like:

  • EVERY()
  • STDDEV_POP()
  • STDDEV_SAMP()
  • VAR_POP()
  • VAR_SAMP()
  • ARRAY_AGG()
  • STRING_AGG()

But what if you want to roll your own?

Java 8 Stream Collector

When using Java 8 streams, we can easily roll our own aggregate function (i.e. a Collector). Let’s assume we want to find the second highest value in a stream. The highest value can be obtained like this:

System.out.println(
    Stream.of(1, 2, 3, 4)
          .collect(Collectors.maxBy(Integer::compareTo))
);

Yielding:

Optional[4]

Now, what about the second highest value? We can write the following collector:

System.out.println(
    Stream.of(1, 6, 2, 3, 4, 4, 5).parallel()
          .collect(Collector.of(
              () -> new int[] { 
                  Integer.MIN_VALUE, 
                  Integer.MIN_VALUE 
              },
              (a, i) -> {
                  if (a[0] < i) {
                      a[1] = a[0];
                      a[0] = i;
                  }
                  else if (a[1] < i)
                      a[1] = i;
              },
              (a1, a2) -> {
                  if (a2[0] > a1[0]) {
                      a1[1] = a1[0];
                      a1[0] = a2[0];

                      if (a2[1] > a1[1])
                          a1[1] = a2[1];
                  }
                  else if (a2[0] > a1[1])
                      a1[1] = a2[0];

                  return a1;
              },
              a -> a[1]
          ))
);

It doesn’t do anything fancy. It has these 4 functions:

  • Supplier<int[]>: A supplier that provides an intermediary int[] of length 2, with both elements initialised to Integer.MIN_VALUE. This array remembers the MAX() value in the stream at position 0 and the SECOND_MAX() value at position 1
  • BiConsumer<int[], Integer>: An accumulator that accumulates new values from the stream into our intermediary data structure.
  • BinaryOperator<int[]>: A combiner that combines two intermediary data structures. This is used for parallel streams only.
  • Function<int[], Integer>: The finisher function that extracts the SECOND_MAX() value from the second position in our intermediary array.

The output is now:

5

How to do the same thing with SQL?

Many SQL databases offer a very similar way of calculating custom aggregate functions. Here’s how to do the exact same thing with…

Oracle:

With the usual syntactic ceremony…

CREATE TYPE u_second_max AS OBJECT (

  -- Intermediary data structure
  MAX NUMBER,
  SECMAX NUMBER,

  -- Corresponds to the Collector.supplier() function
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max) RETURN NUMBER,

  -- Corresponds to the Collector.accumulator() function
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER,

  -- Corresponds to the Collector.combiner() function
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER,

  -- Corresponds to the Collector.finisher() function
  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER
)
/

-- This is our "collector" implementation
CREATE OR REPLACE TYPE BODY u_second_max IS
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max)
  RETURN NUMBER IS
  BEGIN
    SCTX := U_SECOND_MAX(0, 0);
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER IS
  BEGIN
    IF VALUE > SELF.MAX THEN
      SELF.SECMAX := SELF.MAX;
      SELF.MAX := VALUE;
    ELSIF VALUE > SELF.SECMAX THEN
      SELF.SECMAX := VALUE;
    END IF;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
  BEGIN
    RETURNVALUE := SELF.SECMAX;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER IS
  BEGIN
    IF CTX2.MAX > SELF.MAX THEN
      SELF.SECMAX := SELF.MAX;
      SELF.MAX := CTX2.MAX;
    
      IF CTX2.SECMAX > SELF.SECMAX THEN
        SELF.SECMAX := CTX2.SECMAX;
      END IF;
    ELSIF CTX2.MAX > SELF.SECMAX THEN
      SELF.SECMAX := CTX2.MAX;
    END IF;
  
    RETURN ODCIConst.Success;
  END;
END;
/

-- Finally, we have to give this aggregate function a name
CREATE FUNCTION SECOND_MAX (input NUMBER) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING u_second_max;
/

We can now run the above on the Sakila database:

SELECT 
  max(film_id), 
  second_max(film_id) 
FROM film;

To get:

MAX     SECOND_MAX
------------------
1000    999

And what’s even better, we can use the aggregate function as a window function for free!

SELECT 
  film_id,
  length,
  max(film_id) OVER (PARTITION BY length), 
  second_max(film_id) OVER (PARTITION BY length)
FROM film
ORDER BY length, film_id;

The above yields:

FILM_ID  LENGTH  MAX   SECOND_MAX
---------------------------------
15       46      730   505
469      46      730   505
504      46      730   505
505      46      730   505
730      46      730   505
237      47      869   784
247      47      869   784
393      47      869   784
398      47      869   784
407      47      869   784
784      47      869   784
869      47      869   784
2        48      931   866
410      48      931   866
575      48      931   866
630      48      931   866
634      48      931   866
657      48      931   866
670      48      931   866
753      48      931   866
845      48      931   866
866      48      931   866
931      48      931   866

Beautiful, right?

PostgreSQL

PostgreSQL supports a slightly more concise syntax in the CREATE AGGREGATE statement. If we don’t allow for parallelism, we can write this minimal implementation:

CREATE FUNCTION second_max_sfunc (
  state INTEGER[], data INTEGER
) RETURNS INTEGER[] AS
$$
BEGIN
  IF state IS NULL THEN
    RETURN ARRAY[data, NULL];
  ELSE
    RETURN CASE 
      WHEN state[1] > data
      THEN CASE 
        WHEN state[2] > data
        THEN state
        ELSE ARRAY[state[1], data]
      END
      ELSE ARRAY[data, state[1]]
    END;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION second_max_ffunc (
  state INTEGER[]
) RETURNS INTEGER AS
$$
BEGIN
  RETURN state[2];
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE second_max (INTEGER) (
  SFUNC     = second_max_sfunc,
  STYPE     = INTEGER[],
  FINALFUNC = second_max_ffunc
);

Here, we use the STYPE (Collector.supplier()), the SFUNC (Collector.accumulator()), and the FINALFUNC (Collector.finisher()) specifications.
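
A quick sanity check, re-using the numbers from the Java example on an ad-hoc table:

WITH v(i) AS (VALUES (1), (6), (2), (3), (4), (4), (5))
SELECT max(i), second_max(i)
FROM v;

max|second_max|
---|----------|
  6|         5|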

Other databases

Many other databases allow for specifying user-defined aggregate functions. Look up the details in your database manual to learn more. They always work in the same way as a Java 8 Collector.

How to Write a Multiplication Aggregate Function in SQL

Everyone knows the SQL SUM() aggregate function (and many people also know its window function variant).

When querying the Sakila database, we can get the daily revenue (using PostgreSQL syntax):

WITH p AS (
  SELECT
    CAST (payment_date AS DATE) AS date,
    amount
  FROM payment
)
SELECT
  date,
  SUM (amount) AS daily_revenue,
  SUM (SUM (amount)) OVER (ORDER BY date) AS cumulative_revenue
FROM p
GROUP BY date
ORDER BY date

The result will look something like this:

date       |daily_revenue |cumulative_revenue 
-----------|--------------|-------------------
2005-05-24 |29.92         |29.92              
2005-05-25 |573.63        |603.55             
2005-05-26 |754.26        |1357.81            
2005-05-27 |685.33        |2043.14            
2005-05-28 |804.04        |2847.18            
2005-05-29 |648.46        |3495.64            
2005-05-30 |628.42        |4124.06            
2005-05-31 |700.37        |4824.43            
2005-06-14 |57.84         |4882.27            
...

Doing the same with multiplication

This is already quite useful. Very occasionally, however, we do not need to aggregate multiple values in a sum (through addition), but in a product (through multiplication). I recently stumbled upon such a case on Stack Overflow.

The question wanted to achieve the following result:

date        factor          accumulated
---------------------------------------
1986-01-10  null            1000
1986-01-13  -0.026595745    973.4042548
1986-01-14  0.005464481     978.7234036
1986-01-15  -0.016304348    962.7659569
1986-01-16  0               962.7659569
1986-01-17  0               962.7659569
1986-01-20  0               962.7659569
1986-01-21  0.005524862     968.0851061
1986-01-22  -0.005494506    962.765957
1986-01-23  0               962.765957
1986-01-24  -0.005524862    957.4468078
1986-01-27  0.005555556     962.7659569
1986-01-28  0               962.7659569
1986-01-29  0               962.7659569
1986-01-30  0               962.7659569
1986-01-31  0.027624309     989.3617013
1986-02-03  0.016129032     1005.319148
1986-02-04  0.042328041     1047.872338
1986-02-05  0.04568528      1095.744679

If this were a Microsoft Excel spreadsheet, the ACCUMULATED column would simply start with 1000 and have the following formula in all other rows:

accumulated(i) = accumulated(i - 1) * (1 + factor)

In other words (values truncated for simplicity):

1000.0 = start
 973.4 = 1000.0 * (1 - 0.026)
 978.7 =  973.4 * (1 + 0.005)
 962.7 =  978.7 * (1 - 0.016)
 962.7 =  962.7 * (1 - 0.000)
 962.7 =  962.7 * (1 - 0.000)
 962.7 =  962.7 * (1 - 0.000)
 968.0 =  962.7 * (1 + 0.005)
 ...

This is exciting because we’re not only requiring multiplicative aggregation, but even cumulative multiplicative aggregation. So, another window function.

But regrettably, SQL doesn’t offer a MUL() aggregate function, even though it would be relatively simple to implement. We have two options:

  • Implementing a custom aggregate function (stay tuned for a future blog post)
  • Using a trick by summing logarithms, rather than multiplying operands directly

We’re implementing the latter for now. Check out the Wikipedia article about logarithmic identities, which we are going to blindly trust. In the middle of it, we have:

b^x * b^y = b^(x + y)

Which leads to:

log_b(x * y) = log_b(x) + log_b(y)

How cool is that? And thus:

x * y = b^(log_b(x) + log_b(y))

So, we can define any multiplication in terms of exponentiation and logarithms to some common base (say, e). Or, in SQL:

x * y = EXP(LN(x) + LN(y))

Or, as an aggregate function:

MUL(x) = EXP(SUM(LN(x)))

Heh!
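
A quick sanity check of this identity on an ad-hoc table (PostgreSQL syntax):

WITH v(i) AS (VALUES (2), (3), (4))
SELECT EXP(SUM(LN(i))) AS mul
FROM v;

-- Yields 2 * 3 * 4 = 24, give or take some floating point rounding error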

Our original problem can thus be solved very easily using this, as shown in my stack overflow answer:

SELECT
  date,
  factor,
  1000 * EXP(SUM(LN(1 + COALESCE(factor, 0))) 
       OVER (ORDER BY date)) AS accumulated
FROM t

And we get the nice result as previously shown. You may have to replace LN() by LOG() depending on your database.

Caveat: Negative numbers

Try running this:

SELECT LN(-1)

You’ll get:

SQL Error [2201E]: ERROR: cannot take logarithm of a negative number

Logarithms are defined only for strictly positive numbers, unless your database is capable of handling complex numbers as well, in which case a single zero value would still break the aggregation.

But if your data set is defined to contain only strictly positive numbers, you’ll be fine – give or take some floating point rounding errors. Or, you’ll do some sign handling, which looks like this:

WITH v(i) AS (VALUES (-2), (-3), (-4))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(i)))) multiplication1
FROM v;

WITH v(i) AS (VALUES (-2), (-3), (-4), (-5))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(i)))) multiplication2
FROM v;

The above yields:

multiplication1      
--------------------
-23.999999999999993 


multiplication2     
-------------------
119.99999999999997 

Close enough.

Caveat: Zero

Try running this:

SELECT LN(0)

You’ll get:

SQL Error [2201E]: ERROR: cannot take logarithm of zero

Zero is different from negative numbers. A product that has a zero operand is always zero, so we should be able to handle this. We’ll do it in two steps:

  • Exclude zero values from the actual aggregation that uses EXP() and LN()
  • Add an additional CASE expression that checks if any of the operands is zero

The first step might not be necessary depending on how your database optimiser executes the second step.

WITH v(i) AS (VALUES (2), (3), (0))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
    THEN 0
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(NULLIF(i, 0))))) multiplication
FROM v;

Extension: DISTINCT

Calculating the product of all DISTINCT values requires adding the DISTINCT keyword to 2 of the above 3 aggregate functions. Note that the sign of the result depends on the parity of the number of distinct negative values, so rather than summing -1 constants (which would collapse into a single distinct value and lose the parity), we count the distinct negative values themselves:

WITH v(i) AS (VALUES (2), (3), (3))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
    THEN 0
    WHEN COUNT (DISTINCT CASE WHEN i < 0 THEN i END) % 2 = 1 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(DISTINCT LN(ABS(NULLIF(i, 0))))) multiplication
FROM v;

The result is now:

multiplication |
---------------|
6              |

Notice that the first SUM(), the one that checks for the presence of zero values, doesn’t require a DISTINCT keyword, so we omit it to improve performance.

Extension: Window functions

Of course, if we are able to emulate a PRODUCT() aggregate function, we’d love to turn it into a window function as well. This can be done simply by transforming each individual SUM() into a window function:

WITH v(i, j) AS (
  VALUES (1, 2), (2, -3), (3, 4), 
         (4, -5), (5, 0), (6, 1)
)
SELECT i, j, 
  CASE 
    WHEN SUM (CASE WHEN j = 0 THEN 1 END) 
      OVER (ORDER BY i) > 0
    THEN 0
    WHEN SUM (CASE WHEN j < 0 THEN -1 END) 
      OVER (ORDER BY i) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(NULLIF(j, 0)))) 
    OVER (ORDER BY i)) multiplication
FROM v;

The result is now:

i |j  |multiplication      |
--|---|--------------------|
1 | 2 |2                   |
2 |-3 |-6                  |
3 | 4 |-23.999999999999993 |
4 |-5 |119.99999999999997  |
5 | 0 |0                   |
6 | 1 |0                   |

So cool! The cumulative product gets bigger and bigger until it hits the first zero; from then on, it stays zero.

jOOQ support

jOOQ 3.12 will support this as well and emulate it correctly on all databases:
https://github.com/jOOQ/jOOQ/issues/5939

A note on Oracle performance

Do note that Oracle is very slow to calculate LN(number_type). It can be MUCH faster to calculate LN(binary_double_type) instead. An explicit type cast produced a 100x performance improvement in a simple test, documented here.
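
For example (a sketch; the table v and its NUMBER column i are hypothetical):

-- Casting to BINARY_DOUBLE before taking the logarithm can speed up the
-- aggregation considerably in Oracle
SELECT EXP(SUM(LN(CAST(i AS BINARY_DOUBLE))))
FROM v;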

How to Calculate Multiple Aggregate Functions in a Single Query

At a customer site, I’ve recently encountered a report where a programmer needed to count quite a bit of stuff from a single table. The counts all differed in the way they used specific predicates. The report looked roughly like this (as always, I’m using the Sakila database for illustration):

-- Total number of films
SELECT count(*)
FROM film

-- Number of films with a given length
SELECT count(*)
FROM film
WHERE length BETWEEN 120 AND 150

-- Number of films with a given language
SELECT count(*)
FROM film
WHERE language_id = 1

-- Number of films for a given rating
SELECT count(*)
FROM film
WHERE rating = 'PG'

And then, unsurprisingly, combinations of these predicates were needed as well, i.e.

-- Number of films with a given length / language_id
SELECT count(*)
FROM film
WHERE length BETWEEN 120 AND 150
AND language_id = 1

-- Number of films with a given length / rating
SELECT count(*)
FROM film
WHERE length BETWEEN 120 AND 150
AND rating = 'PG'

-- Number of films with a given language_id / rating
SELECT count(*)
FROM film
WHERE language_id = 1
AND rating = 'PG'

-- Number of films with a given length / language_id / rating
SELECT count(*)
FROM film
WHERE length BETWEEN 120 AND 150
AND language_id = 1
AND rating = 'PG'

In the end, there were 32 queries in total (or 8 in my example) with all the possible combinations of predicates. Needless to say, running them all took quite a while, because the table had around 200M records, and only a single predicate could profit from an index.

But in fact, the improvement is really easy. There are several options to calculate all these counts in a single query.

The simplest solution works in all databases: Filtered aggregate functions (or manual pivot)

This solution allows for calculating all results in a single query by using 8 different, explicit, filtered aggregate functions and no GROUP BY clause (none in this example; more complex cases where a GROUP BY persists are still imaginable).

This is how it works on all databases:

SELECT 
  count(*),
  count(length),
  count(language_id),
  count(rating),
  count(length + language_id),
  count(length + rating),
  count(language_id + rating),
  count(length + language_id + rating)
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
    CASE WHEN language_id = 1            THEN 1 END language_id,
    CASE WHEN rating = 'PG'              THEN 1 END rating
  FROM film
) film

Which yields:

col1  col2  col3  col4  col5  col6  col7  col8
1000  224   1000  194   224   43    194   43

How to read the above query?

Instead of evaluating the three different predicates in a WHERE clause, we pre-calculate them in a derived table (a subquery in the FROM clause) and translate each predicate into some arbitrary value (e.g. 1) if TRUE and NULL if FALSE. Note that I omitted the ELSE clause from the CASE expressions, which means that we get NULLs by default. Running the nested select on its own…

SELECT
  CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
  CASE WHEN language_id = 1            THEN 1 END language_id,
  CASE WHEN rating = 'PG'              THEN 1 END rating
FROM film

… yields something along the lines of:

length  language_id  rating
---------------------------
NULL    1            1
NULL    1            NULL
NULL    1            NULL
NULL    1            NULL
1       1            NULL
NULL    1            1
NULL    1            NULL
...

(Note, of course, we could have used actual BOOLEAN types, e.g. in PostgreSQL, but that wouldn’t work on all databases)

Now, in the outer query, we’re using COUNT(*) once, which simply counts all the rows regardless of any predicates in the CASE expressions. The other COUNT(expr) aggregate functions do something that surprisingly few people are aware of (yet a lot of people use this form “by accident”): they count only the number of non-NULL rows. For instance:

SELECT 
  ...
  count(length),
  ...
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
    ...
  FROM film
) film

Or also:

SELECT 
  count(CASE WHEN length BETWEEN 120 AND 150 THEN 1 END)
FROM
  film

These queries will count those films whose length is BETWEEN 120 AND 150 (because those rows produce the value 1, which is non-NULL, and thus counted), whereas all the other films are not being counted.

Finally, I just used a trick to combine nullable values such that the combination is non-NULL only if all operands are non-NULL:

SELECT 
  ...
  count(length + language_id),
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
    CASE WHEN language_id = 1            THEN 1 END language_id,
    ...
  FROM film
) film

This counts those rows whose length BETWEEN 120 AND 150 and whose language_id = 1, because if either predicate were FALSE, its value would be NULL, and thus the sum would be NULL as well.
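
A quick illustration of that NULL propagation on an ad-hoc table (PostgreSQL syntax):

WITH v(a, b) AS (VALUES (1, 1), (1, NULL), (NULL, NULL))
SELECT count(a + b) AS c
FROM v;

-- c is 1: only the (1, 1) row produces a non-NULL sum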

PostgreSQL and HSQLDB variant: FILTER

In PostgreSQL and HSQLDB (and in the SQL standard), there’s a special syntax for this. We can use the FILTER clause instead of encoding values in NULL / non-NULL like this:

SELECT 
  count(*),
  count(*) FILTER (WHERE length IS NOT NULL),
  count(*) FILTER (WHERE language_id IS NOT NULL),
  count(*) FILTER (WHERE rating IS NOT NULL),
  count(*) FILTER (WHERE length + language_id IS NOT NULL),
  count(*) FILTER (WHERE length + rating IS NOT NULL),
  count(*) FILTER (WHERE language_id + rating IS NOT NULL),
  count(*) FILTER (
    WHERE length + language_id + rating IS NOT NULL)
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
    CASE WHEN language_id = 1            THEN 1 END language_id,
    CASE WHEN rating = 'PG'              THEN 1 END rating
  FROM film
) film

Or even, writing out the entire predicates again:

SELECT 
  count(*),
  count(*) FILTER (WHERE length BETWEEN 120 AND 150),
  count(*) FILTER (WHERE language_id = 1),
  count(*) FILTER (WHERE rating = 'PG'),
  count(*) FILTER (
    WHERE length BETWEEN 120 AND 150 AND language_id = 1),
  count(*) FILTER (
    WHERE length BETWEEN 120 AND 150 AND rating = 'PG'),
  count(*) FILTER (
    WHERE language_id = 1 AND rating = 'PG'),
  count(*) FILTER (
    WHERE length BETWEEN 120 AND 150 
    AND language_id = 1 AND rating = 'PG')
FROM film

Usually, the FILTER clause is more convenient, but both approaches are equivalent, and we’re running only a single query!

I also call this “manual PIVOT“, because it really works like a PIVOT table. And the good news is… There is a PIVOT syntax!

A more fancy solution: PIVOT

This solution is vendor-specific and only works in Oracle and, with fewer features, in SQL Server. Here’s the Oracle version:

SELECT 
  a + b + c + d + e + f + g + h,
                  e + f + g + h,
          c + d         + g + h,
      b     + d     + f     + h,
                          g + h,
                      f     + h,
              d             + h,
                              h
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 
         THEN 1 ELSE 0 END length,
    CASE WHEN language_id = 1            
         THEN 1 ELSE 0 END language_id,
    CASE WHEN rating = 'PG'              
         THEN 1 ELSE 0 END rating
  FROM film
) film
PIVOT (
  count(*) FOR (length, language_id, rating) IN (
    (0, 0, 0) AS a,
    (0, 0, 1) AS b,
    (0, 1, 0) AS c,
    (0, 1, 1) AS d,
    (1, 0, 0) AS e,
    (1, 0, 1) AS f,
    (1, 1, 0) AS g,
    (1, 1, 1) AS h
  )
)

How to read this solution? There are 3 steps:

Step 1: The derived table

As in the previous example, we’re translating the desired predicates for our report into three columns that produce the values 1 and 0. That’s understood by now, so I won’t repeat the explanation.

Step 2: The PIVOT clause

The PIVOT clause can be applied to a table expression to “pivot” it in a similar way as we know from Microsoft Excel’s powerful pivot tables. It takes three parts:

  • A list of aggregate functions
  • An expression (FOR clause)
  • A list of expected values (IN clause)

The resulting table expression groups the PIVOT‘s input table by all the remaining columns (i.e. all the columns that are not part of the FOR clause, in our example, that’s no columns), and aggregates all the aggregate functions (in our case, only one) for all the values in the IN list.

If we SELECT * from this PIVOT table:

SELECT *
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 
         THEN 1 ELSE 0 END length,
    CASE WHEN language_id = 1            
         THEN 1 ELSE 0 END language_id,
    CASE WHEN rating = 'PG'              
         THEN 1 ELSE 0 END rating
  FROM film
) film
PIVOT (
  count(*) FOR (length, language_id, rating) IN (
    (0, 0, 0) AS a,
    (0, 0, 1) AS b,
    (0, 1, 0) AS c,
    (0, 1, 1) AS d,
    (1, 0, 0) AS e,
    (1, 0, 1) AS f,
    (1, 1, 0) AS g,
    (1, 1, 1) AS h
  )
)

… we’ll get these values:

a    b    c    d    e    f    g    h
------------------------------------
0    0  625  151    0    0  181   43

As you can see, the column names are generated from the IN list of expected values and the values contained in these columns are aggregations for the different predicates. These aggregations are not exactly the ones we wanted. For instance, column G is all the films whose length BETWEEN 120 AND 150 and whose language_id = 1 and whose RATING != 'PG'.
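
We can verify any of these partial counts with an explicit query, e.g. for column g (length and language_id predicates TRUE, rating predicate FALSE):

SELECT count(*)
FROM film
WHERE length BETWEEN 120 AND 150
AND language_id = 1
AND rating != 'PG';

-- Yields 181, matching column g above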

Step 3: Summing the count values

So, in order to get the expected results, we have to sum all the partial counts as such:

SELECT 
  a + b + c + d + e + f + g + h,
                  e + f + g + h,
          c + d         + g + h,
      b     + d     + f     + h,
                          g + h,
                      f     + h,
              d             + h,
                              h
FROM 
  ...

The result is now the same.

A more fancy solution: GROUPING SETS

GROUPING SETS are a SQL standard feature, and they’re supported in at least:

  • DB2
  • HANA
  • Oracle
  • PostgreSQL
  • SQL Server
  • Sybase SQL Anywhere

Simply put, GROUPING SETS allow for grouping a table several times and creating a UNION of all the results. For example, the following two queries are the same, conceptually, although the GROUPING SETS one is usually faster:

-- Grouping once by language_id, then by rating
SELECT language_id, rating, count(*)
FROM film
GROUP BY GROUPING SETS (
  (language_id),
  (rating)
)

-- Grouping first by language_id
SELECT language_id, NULL, count(*)
FROM film
GROUP BY language_id
UNION ALL
SELECT NULL, rating, count(*)
FROM film
GROUP BY rating

Both queries yield:

language_id   rating   count
          1             1000 -- First grouping set / union subquery
              G          178 \
              PG         194  |
              PG-13      223  | Second grouping set / union subquery
              R          195  |
              NC-17      210 /

Clearly, the GROUPING SETS variant is more concise. Let’s imagine we’d like to add more combinations of grouping columns, e.g.

SELECT language_id, rating, count(*)
FROM film
GROUP BY GROUPING SETS (
  (),
  (language_id),
  (rating),
  (language_id, rating)
)

Now, we’re grouping by all the combinations of columns, and the result is:

language_id   rating   count
                        1000 -- First grouping set: ()
          1             1000 -- Second grouping set: (language_id)
              G          178 \
              PG         194  |
              PG-13      223  | Third grouping set: (rating)
              R          195  |
              NC-17      210 /
          1   G          178 \
          1   PG         194  |
          1   PG-13      223  | Fourth grouping set: (language_id, rating)
          1   R          195  |
          1   NC-17      210 /

Of course, this would all be more impressive if we had more than one language in the system…

So, how do we solve the original problem with GROUPING SETS? Here’s how:

SELECT 
  GROUPING_ID (length, language_id, rating),
  length,
  language_id,
  rating,
  count(*)
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 
         THEN 1 ELSE 0 END length,
    CASE WHEN language_id = 1            
         THEN 1 ELSE 0 END language_id,
    CASE WHEN rating = 'PG'              
         THEN 1 ELSE 0 END rating
  FROM film
) film
GROUP BY GROUPING SETS (
  (),
  (length),
  (language_id),
  (rating),
  (length, language_id),
  (length, rating),
  (rating, language_id),
  (length, language_id, rating)
)
HAVING COALESCE (length, 1) != 0 
AND COALESCE (language_id, 1) != 0 
AND COALESCE (rating, 1) != 0
ORDER BY GROUPING_ID (length, language_id, rating) DESC

Wow. How to read this? In 4 steps:

Step 1: Again, the derived table

This time, we’ll encode FALSE as 0, not NULL, because NULL already has a different meaning in GROUPING SETS. It means that for a given GROUPING SET, we didn’t group by that column. We’ll see that in step 3.

Step 2: The GROUPING SETS

In this section, we’re just listing all the possible combinations of GROUP BY columns that we want to use, which produces 8 distinct GROUPING SETS. I’ve already explained this in the previous introduction to GROUPING SETS, so this is no different.

Step 3: Filter out unwanted groupings

Just like in the PIVOT example, we’re also getting results for which the predicates are FALSE, but we don’t want those in the result. So we’re filtering them out in the HAVING clause:

SELECT 
  ...
HAVING COALESCE (length, 1) != 0 
AND COALESCE (language_id, 1) != 0 
AND COALESCE (rating, 1) != 0
...

How to read this? E.g. LENGTH can be any of:

  • 1: The length predicate was TRUE
  • 0: The length predicate was FALSE
  • NULL: The length column is not considered for a given GROUPING SET, e.g. () or (rating, language_id)

So, using COALESCE, we’re making sure that we include only 1 and NULL lengths, not 0 lengths.
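
Spelled out, the HAVING predicate covers the three possible cases per column like this:

-- COALESCE(1, 1)    != 0  -- TRUE:  predicate matched, row is kept
-- COALESCE(NULL, 1) != 0  -- TRUE:  column not part of this grouping set, row is kept
-- COALESCE(0, 1)    != 0  -- FALSE: predicate didn't match, row is filtered out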

Step 4: Ordering the results

This is optional, but in order to get the same output order as before, we can use the special GROUPING_ID() (or GROUPING(), depending on the DB) function, which returns an ID for each GROUPING SET – a bit vector indicating which of the argument columns were not grouped. The output is:

grouping   length   language_id   rating   count
------------------------------------------------
       7     NULL          NULL     NULL    1000
       6     NULL          NULL        1     194
       5     NULL             1     NULL    1000
       4     NULL             1        1     194
       3        1          NULL     NULL     224
       2        1          NULL        1      43
       1        1             1     NULL     224
       0        1             1        1      43

Excellent! And hey, there’s even syntactic sugar for “special” GROUPING SETS configurations like ours, where we list all the possible column combinations. In this case, we can use CUBE()!

SELECT 
  GROUPING_ID (length, language_id, rating),
  length,
  language_id,
  rating,
  count(*)
FROM (
  SELECT
    CASE WHEN length BETWEEN 120 AND 150 
         THEN 1 ELSE 0 END length,
    CASE WHEN language_id = 1            
         THEN 1 ELSE 0 END language_id,
    CASE WHEN rating = 'PG'              
         THEN 1 ELSE 0 END rating
  FROM film
) film
GROUP BY CUBE (length, language_id, rating)
HAVING COALESCE(length, 1) != 0 
AND COALESCE(language_id, 1) != 0 
AND COALESCE(rating, 1) != 0
ORDER BY GROUPING_ID (length, language_id, rating) DESC

Performance

Such a comparison blog post wouldn’t be complete if we didn’t benchmark for performance. This time, I’ll be benchmarking only for Oracle, as PostgreSQL doesn’t support PIVOT and SQL Server’s PIVOT is more limited than Oracle’s.

Here’s the complete benchmark:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 2000;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP

    -- Individual statements
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT count(*) FROM film
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE length BETWEEN 120 AND 150
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE language_id = 1
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE rating = 'PG'
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE length BETWEEN 120 AND 150
        AND language_id = 1
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE length BETWEEN 120 AND 150
        AND rating = 'PG'
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE language_id = 1
        AND rating = 'PG'
      ) LOOP
        NULL;
      END LOOP;

      FOR rec IN (
        SELECT count(*) FROM film 
        WHERE length BETWEEN 120 AND 150
        AND language_id = 1
        AND rating = 'PG'
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 1 : ' || (SYSTIMESTAMP - v_ts));

    -- Manual PIVOT
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT
          count(*),
          count(length),
          count(language_id),
          count(rating),
          count(length + language_id),
          count(length + rating),
          count(language_id + rating),
          count(length + language_id + rating)
        FROM (
          SELECT
            CASE WHEN length BETWEEN 120 AND 150 THEN 1 END length,
            CASE WHEN language_id = 1            THEN 1 END language_id,
            CASE WHEN rating = 'PG'              THEN 1 END rating
          FROM film
        ) film
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 2 : ' || (SYSTIMESTAMP - v_ts));
    
    -- PIVOT
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT 
          a + b + c + d + e + f + g + h,
                          e + f + g + h,
                  c + d         + g + h,
              b     + d     + f     + h,
                                  g + h,
                              f     + h,
                      d             + h,
                                      h
        FROM (
          SELECT
            CASE WHEN length BETWEEN 120 AND 150 THEN 1 ELSE 0 END length,
            CASE WHEN language_id = 1            THEN 1 ELSE 0 END language_id,
            CASE WHEN rating = 'PG'              THEN 1 ELSE 0 END rating
          FROM film
        ) film
        PIVOT (
          count(*) FOR (length, language_id, rating) IN (
            (0, 0, 0) AS a,
            (0, 0, 1) AS b,
            (0, 1, 0) AS c,
            (0, 1, 1) AS d,
            (1, 0, 0) AS e,
            (1, 0, 1) AS f,
            (1, 1, 0) AS g,
            (1, 1, 1) AS h
          )
        )
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 3 : ' || (SYSTIMESTAMP - v_ts));

    -- GROUPING SETS
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT 
          GROUPING_ID (length, language_id, rating),
          length,
          language_id,
          rating,
          count(*)
        FROM (
          SELECT
            CASE WHEN length BETWEEN 120 AND 150 THEN 1 ELSE 0 END length,
            CASE WHEN language_id = 1            THEN 1 ELSE 0 END language_id,
            CASE WHEN rating = 'PG'              THEN 1 ELSE 0 END rating
          FROM film
        ) film
        GROUP BY CUBE (length, language_id, rating)
        HAVING COALESCE (length, 1) != 0 
        AND COALESCE (language_id, 1) != 0 
        AND COALESCE (rating, 1) != 0
        ORDER BY GROUPING_ID (length, language_id, rating) DESC
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 4 : ' || (SYSTIMESTAMP - v_ts));
  END LOOP;
END;
/

And the results:

Run 1, Statement 1 : +000000000 00:00:01.928497000
Run 1, Statement 2 : +000000000 00:00:01.136341000
Run 1, Statement 3 : +000000000 00:00:02.751679000
Run 1, Statement 4 : +000000000 00:00:00.797529000

Run 2, Statement 1 : +000000000 00:00:01.695543000
Run 2, Statement 2 : +000000000 00:00:01.004073000
Run 2, Statement 3 : +000000000 00:00:02.490895000
Run 2, Statement 4 : +000000000 00:00:00.838979000

Run 3, Statement 1 : +000000000 00:00:01.634047000
Run 3, Statement 2 : +000000000 00:00:01.016266000
Run 3, Statement 3 : +000000000 00:00:02.566895000
Run 3, Statement 4 : +000000000 00:00:00.790159000

Run 4, Statement 1 : +000000000 00:00:01.669844000
Run 4, Statement 2 : +000000000 00:00:01.015502000
Run 4, Statement 3 : +000000000 00:00:02.574646000
Run 4, Statement 4 : +000000000 00:00:00.807804000

Run 5, Statement 1 : +000000000 00:00:01.653498000
Run 5, Statement 2 : +000000000 00:00:00.980375000
Run 5, Statement 3 : +000000000 00:00:02.556186000
Run 5, Statement 4 : +000000000 00:00:00.890283000

Very disappointingly, the PIVOT solution is the slowest every time. I’m assuming there’s some substantial temporary object overhead which wouldn’t be as severe if the table were much larger, but clearly, the manual PIVOT solution (COUNT(CASE ...)) and the GROUPING SETS solution heavily outperform the initial attempt, where we calculate 8 counts individually.

To get back to the original report where 32 counts were calculated: the report was roughly 20x as fast with the manual PIVOT solution on 200M rows. And imagine if you also need to JOIN – you definitely want to avoid those 32 individual queries and calculate everything in one go.

Cheers!

A True SQL Gem You Didn’t Know Yet: The EVERY() Aggregate Function

We’ve just added support for the EVERY() aggregate function (#1391) to jOOQ, and would like to take the opportunity of letting you know of this true SQL gem that can come in handy EVERY(now and then) (pun intended).

Let’s assume we have four books in our table:

INSERT INTO book VALUES (1, 1, '1984');
INSERT INTO book VALUES (2, 1, 'Animal Farm');
INSERT INTO book VALUES (3, 2, 'O Alquimista');
INSERT INTO book VALUES (4, 2, 'Brida');

Now the question is:

Is EVERY() ID lower than 10?

We’ll ask:

SELECT EVERY(id < 10)
FROM book

And the answer is:

every
-----
true

Does EVERY() book for each author end with the letter ‘a’?

We’ll ask:

SELECT author_id, EVERY(title LIKE '%a')
FROM book
GROUP BY author_id

And the answer is:

author_id   every
-----------------
1           false
2           true

Wonderful!

As with all aggregate functions, we can even use them as a window function!

SELECT 
  book.*, 
  EVERY(title LIKE '%a') OVER (PARTITION BY author_id)
FROM book

Which will produce

id  author_id   title          every
------------------------------------
1   1           1984           false
2   1           Animal Farm    false
3   2           O Alquimista   true
4   2           Brida          true

Who supports EVERY()

Well, the SQL standard has it:

10.9 <aggregate function>

<aggregate function> ::=
    COUNT <left paren> <asterisk> <right paren> [ <filter clause> ]
  | <general set function> [ <filter clause> ]
  | <binary set function> [ <filter clause> ]
  | <ordered set function> [ <filter clause> ]
  | <array aggregate function> [ <filter clause> ]

<general set function> ::=
    <set function type> <left paren> [ <set quantifier> ]
        <value expression> <right paren>

<set function type> ::=
    <computational operation>

<computational operation> ::=
    AVG
  | MAX
  | MIN
  | SUM
  | EVERY <-- yes, here! EVERY!
  | ANY
  | SOME
  | COUNT
  | STDDEV_POP
  | STDDEV_SAMP
  | VAR_SAMP
  | VAR_POP
  | COLLECT
  | FUSION
  | INTERSECTION

And, of course, PostgreSQL (where EVERY() is an alias for the BOOL_AND() aggregate function)!

But if your database is not PostgreSQL, don’t worry. EVERY() can be emulated on EVERY() database using MIN() and CASE expressions. Here’s how to emulate the first query:

-- SELECT EVERY(id < 10)
-- FROM book

SELECT MIN(CASE WHEN id < 10 THEN 1 ELSE 0 END) 
FROM book;

Or as window functions

-- SELECT 
--   book.*, 
--   EVERY(title LIKE '%a') OVER (PARTITION BY author_id)
-- FROM book

SELECT 
  book.*,
  MIN(CASE WHEN title LIKE '%a' THEN 1 ELSE 0 END)
    OVER(PARTITION BY author_id)
FROM book;
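
Note that the MIN() emulation yields 1 or 0 rather than a true boolean value. In databases that allow boolean expressions in the SELECT clause (e.g. PostgreSQL, where you wouldn’t need the emulation in the first place), the boolean can be restored by comparing with 1:

SELECT 
  book.*,
  MIN(CASE WHEN title LIKE '%a' THEN 1 ELSE 0 END)
    OVER(PARTITION BY author_id) = 1 AS every
FROM book;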

For a more concise emulation, see Stew Ashton’s comments.

And, as always on this blog, we’re happy to conclude that the upcoming jOOQ 3.6 will now handle EVERY(emulation) for you, so you can write:

DSL.using(configuration)
   .select(BOOK.fields())
   .select(every(BOOK.TITLE.like("%a"))
           .over(partitionBy(BOOK.AUTHOR_ID)))
   .from(BOOK)
   .fetch();

Have fun with this new function!

jOOQ: the best way to write SQL in Java