Using DISTINCT ON in Non-PostgreSQL Databases

A nice little gem in PostgreSQL’s SQL syntax is the DISTINCT ON clause, which is as powerful as it is esoteric.

In a previous post, we’ve blogged about some caveats to think of when DISTINCT and ORDER BY are used together. The bigger picture can be seen in our article about the logical order of operations in SQL SELECT.

The PostgreSQL documentation explains it well:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY. […] For example:

SELECT DISTINCT ON (location) location, time, report
    FROM weather_reports
    ORDER BY location, time DESC;

retrieves the most recent weather report for each location. […]

Again, this is quite esoteric as the distinct-ness is now decided only based on the columns listed separately in parentheses. For all the other columns, only the first row according to ORDER BY is projected. The SQL language is probably one of the only ones where the syntactic order of operations has absolutely nothing to do with the logical order of operations, and this DISTINCT ON syntax isn’t helping. In a more straightforward language design, the above statement could read, instead:

FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
SELECT 
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
DISTINCT
ORDER BY location

In other words, we would execute these operations in the following logical order:

  1. FROM: Access the weather_reports table
  2. WINDOW: Specify “windows” or “groups” in our data set, grouping by location and ordering contents of each group by time, descendingly (using the term “WINDOW” is no accident as we’ll see afterwards)
  3. SELECT: Project the location (which is the grouping) and per group, the first values of time / report ordered by time
  4. DISTINCT: Remove all the duplicates, because the above operation will produce the same time and report value for each record that shares the same location
  5. ORDER BY: Finally, order the results per grouping

That’s what really happens, and incidentally, the above synthetic SQL syntax matches the actual logical order of operations in the SQL language, so translating it back to an actual SQL statement would yield:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER w AS time,
  FIRST_VALUE (report) OVER w AS report
FROM weather_reports
WINDOW w AS (PARTITION BY location ORDER BY time DESC)
ORDER BY location

If your database doesn’t support the WINDOW clause, just expand it into the individual window functions. E.g. in Oracle, write:

SELECT DISTINCT
  location, 
  FIRST_VALUE (time) OVER (PARTITION BY location ORDER BY time DESC),
  FIRST_VALUE (report) OVER (PARTITION BY location ORDER BY time DESC)
FROM weather_reports
ORDER BY location

From a readability perspective, I would definitely prefer the standard SQL syntax over DISTINCT ON.

Want to play around with it? Here’s some sample data:

create table weather_reports (location text, time date, report text);
insert into weather_reports values ('X', DATE '2000-01-01', 'X1');
insert into weather_reports values ('X', DATE '2000-01-02', 'X2');
insert into weather_reports values ('X', DATE '2000-01-03', 'X3');
insert into weather_reports values ('Y', DATE '2000-01-03', 'Y1');
insert into weather_reports values ('Y', DATE '2000-01-05', 'Y2');
insert into weather_reports values ('Z', DATE '2000-01-04', 'Z1');

The result being:

|location|time      |report|
|--------|----------|------|
|X       |2000-01-03|X3    |
|Y       |2000-01-05|Y2    |
|Z       |2000-01-04|Z1    |

Notice that jOOQ already supports PostgreSQL DISTINCT ON and in the future, we might emulate it for other dialects using the above technique: https://github.com/jOOQ/jOOQ/issues/3564

Using IGNORE NULLS With SQL Window Functions to Fill Gaps

I found a very interesting SQL question on Twitter recently:

Rephrasing the question: We have a set of sparse data points:

+------------+-------+
| VALUE_DATE | VALUE |
+------------+-------+
| 2019-01-01 |   100 |
| 2019-01-02 |   120 |
| 2019-01-05 |   125 |
| 2019-01-06 |   128 |
| 2019-01-10 |   130 |
+------------+-------+

Since dates can be listed as discrete, contiguous data points, why not fill in the gaps between 2019-01-02 and 2019-01-05, or between 2019-01-06 and 2019-01-10? The desired output would be:

+------------+-------+
| VALUE_DATE | VALUE |
+------------+-------+
| 2019-01-01 |   100 |
| 2019-01-02 |   120 | <-+
| 2019-01-03 |   120 |   | -- Generated
| 2019-01-04 |   120 |   | -- Generated
| 2019-01-05 |   125 |
| 2019-01-06 |   128 | <-+
| 2019-01-07 |   128 |   | -- Generated
| 2019-01-08 |   128 |   | -- Generated
| 2019-01-09 |   128 |   | -- Generated
| 2019-01-10 |   130 |
+------------+-------+

In the generated rows, we'll just repeat the most recent value.

How to do this with SQL?

For the sake of this example, I’m using Oracle SQL, as the OP was expecting to do this with Oracle. The idea is to do this in two steps:

  1. Generate all the dates between the first and the last data points
  2. For each date, find either the current data point, or the most recent one

But first, let’s create the data:

create table t (value_date, value) as
  select date '2019-01-01', 100 from dual union all
  select date '2019-01-02', 120 from dual union all
  select date '2019-01-05', 125 from dual union all
  select date '2019-01-06', 128 from dual union all
  select date '2019-01-10', 130 from dual;

1. Generating all the dates

In Oracle, we can use the convenient CONNECT BY syntax for this. We could also use some other tool to generate dates to fill the gaps, including SQL standard recursion using WITH, or some PIPELINED function, but I like CONNECT BY for this purpose.
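For reference, the recursive WITH variant might look something like this. This is only a sketch, carrying the date range along in the recursion and relying on Oracle's DATE + number arithmetic:

-- Recursive WITH alternative (sketch): the anchor row computes the date range,
-- the recursive member adds one day until the last date is reached
with bounds (value_date, last_date) as (
  select min(value_date), max(value_date) from t
  union all
  select value_date + 1, last_date
  from bounds
  where value_date < last_date
)
select value_date from bounds;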

We’ll write:

select (
  select min(t.value_date) 
  from t
) + level - 1 as value_date
from dual
connect by level <= (
  select max(t.value_date) - min(t.value_date) + 1
  from t
)

This produces:

VALUE_DATE|
----------|
2019-01-01|
2019-01-02|
2019-01-03|
2019-01-04|
2019-01-05|
2019-01-06|
2019-01-07|
2019-01-08|
2019-01-09|
2019-01-10|

Now we wrap the above query in a derived table and left join the actual data set:

select 
  d.value_date,
  t.value
from (
  select (
    select min(t.value_date) 
    from t
  ) + level - 1 as value_date
  from dual
  connect by level <= (
    select max(t.value_date) - min(t.value_date) + 1
    from t
  )
) d
left join t
on d.value_date = t.value_date
order by d.value_date;

The date gaps are now filled, but our values column is still sparse:

VALUE_DATE|VALUE|
----------|-----|
2019-01-01|  100|
2019-01-02|  120|
2019-01-03|     |
2019-01-04|     |
2019-01-05|  125|
2019-01-06|  128|
2019-01-07|     |
2019-01-08|     |
2019-01-09|     |
2019-01-10|  130|

2. Fill the value gaps

On each row, the VALUE column should either contain the actual value, or the "last_value" preceding the current row, ignoring all the nulls. Note that I deliberately phrased this requirement in very specific English. We can now translate that sentence directly to SQL:

last_value (t.value) ignore nulls over (order by d.value_date)

Since we have added an ORDER BY clause to the window function, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW applies, which colloquially means “all the preceding rows”. (Technically, that’s not accurate. It means all rows with values less than or equal to the value of the current row – see Kim Berg Hansen’s comment)
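Spelled out explicitly, the window specification above is therefore equivalent to the following (just a sketch making the implicit default frame visible):

last_value (t.value) ignore nulls over (
  order by d.value_date
  range between unbounded preceding and current row
)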

Convenient! We’re trying to find the last value in the window of all the preceding rows, ignoring the nulls.

This is standard SQL, but unfortunately not all RDBMS support IGNORE NULLS. Among the dialects supported by jOOQ, these currently support the syntax:

  • DB2
  • H2
  • Informix
  • Oracle
  • Redshift
  • Sybase SQL Anywhere
  • Teradata

Sometimes, the exact standard syntax is not supported, but the standard feature is. Use https://www.jooq.org/translate to see the different syntax variants.

The full query now reads:

select 
  d.value_date,
  last_value (t.value) ignore nulls over (order by d.value_date)
from (
  select (
    select min(t.value_date) 
    from t
  ) + level - 1 as value_date
  from dual
  connect by level <= (
    select max(t.value_date) - min(t.value_date) + 1
    from t
  )
) d
left join t
on d.value_date = t.value_date
order by d.value_date;

… and it yields the desired result:

VALUE_DATE         |VALUE|
-------------------|-----|
2019-01-01 00:00:00|  100|
2019-01-02 00:00:00|  120|
2019-01-03 00:00:00|  120|
2019-01-04 00:00:00|  120|
2019-01-05 00:00:00|  125|
2019-01-06 00:00:00|  128|
2019-01-07 00:00:00|  128|
2019-01-08 00:00:00|  128|
2019-01-09 00:00:00|  128|
2019-01-10 00:00:00|  130|

Other RDBMS

This solution made use of some Oracle specific features such as CONNECT BY. In other RDBMS, the same idea can be implemented by using a different way of generating data. This article focuses only on using IGNORE NULLS. If you’re interested, feel free to post an alternative solution in the comments for your RDBMS.
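As one example, in a database that has a series-generating function, the date-generating derived table could be replaced by something like PostgreSQL's generate_series. This is only a sketch of the generation step (PostgreSQL itself does not support IGNORE NULLS, so the gap-filling step would still need a different emulation there):

select d.value_date, t.value
from (
  -- Generate one row per day between the first and the last data point
  select generate_series(
    (select min(value_date) from t),
    (select max(value_date) from t),
    interval '1 day'
  )::date as value_date
) d
left join t on d.value_date = t.value_date
order by d.value_date;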

How to Calculate a Cumulative Percentage in SQL

A fun report to write is to calculate a cumulative percentage. For example, when querying the Sakila database, we might want to calculate the percentage of our total revenue at any given date.

The result, as raw data, might look like this:

payment_date |amount  |percentage
-------------|--------|----------
2005-05-24   |29.92   |0.04      
2005-05-25   |573.63  |0.90      
2005-05-26   |754.26  |2.01      
2005-05-27   |685.33  |3.03      
2005-05-28   |804.04  |4.22      
2005-05-29   |648.46  |5.19      
2005-05-30   |628.42  |6.12      
2005-05-31   |700.37  |7.16      
...
2005-08-18   |2710.79 |79.59     
2005-08-19   |2615.72 |83.47     
2005-08-20   |2723.76 |87.51     
2005-08-21   |2809.41 |91.67     
2005-08-22   |2576.74 |95.49     
2005-08-23   |2523.01 |99.24     
2005-08-24   |514.18  |100.00    

In other words, at the beginning of our timeline, we’ve made 0% revenue, and then that percentage increases over time, until we reach 100% of our revenue at the end of our timeline.

How to do it?

We’re going to do it in two steps. Our PAYMENT table has a PAYMENT_DATE column, which is really a timestamp, i.e. the exact amount in time when we received a payment. We can query the table to see its data (I will be using PostgreSQL syntax in this post):

SELECT
  payment_date,
  amount
FROM payment
ORDER BY payment_date;

This yields:

payment_date        |amount
--------------------|------
2005-05-24 22:53:30 |2.99  
2005-05-24 22:54:33 |2.99  
2005-05-24 23:03:39 |3.99  
2005-05-24 23:04:41 |4.99  
2005-05-24 23:05:21 |6.99  
2005-05-24 23:08:07 |0.99  
2005-05-24 23:11:53 |1.99  
2005-05-24 23:31:46 |4.99  
2005-05-25 00:00:40 |4.99  
2005-05-25 00:02:21 |5.99  
2005-05-25 00:09:02 |8.99  
2005-05-25 00:19:27 |4.99  
2005-05-25 00:22:55 |6.99  
...

Now we could calculate that percentage on this timeline, but that wouldn’t be terribly interesting. We’re interested in the cumulative revenue per date, so let’s run a classic GROUP BY:

SELECT 
  CAST(payment_date AS DATE),
  sum(amount) AS amount
FROM payment
GROUP BY CAST(payment_date AS DATE)
ORDER BY CAST(payment_date AS DATE);

This yields the first two columns of our desired result:

payment_date |amount 
-------------|-------
2005-05-24   |29.92  
2005-05-25   |573.63 
2005-05-26   |754.26 
2005-05-27   |685.33 
2005-05-28   |804.04 
2005-05-29   |648.46 
2005-05-30   |628.42 
2005-05-31   |700.37 
...
2005-08-18   |2710.79
2005-08-19   |2615.72
2005-08-20   |2723.76
2005-08-21   |2809.41
2005-08-22   |2576.74
2005-08-23   |2523.01
2005-08-24   |514.18 

Now about that percentage. The formula in pseudo SQL is this:

cumulative_percentage[N] = SUM(amount[M <= N]) / SUM(amount[any])

In other words, the percentage of the revenue we’ve made up until a given day is equal to the SUM of all amounts until that day divided by the SUM of all amounts. We could do that relatively easily in Microsoft Excel. But we can also do it with SQL, using window functions. The syntax is:

-- Sum of all amounts until that day:
SUM(amount) OVER (ORDER BY payment_date)

-- Sum of all amounts
SUM(amount) OVER ()

So, let’s just plug that into our SQL. For simplicity, we’ll first nest our previous GROUP BY statement in a derived table:

SELECT 
  payment_date,
  amount,
  CAST(100 * sum(amount) OVER (ORDER BY payment_date) 
           / sum(amount) OVER () AS numeric(10, 2)) percentage
FROM (
  SELECT 
    CAST(payment_date AS DATE),
    sum(amount) AS amount
  FROM payment
  GROUP BY CAST(payment_date AS DATE)
) p
ORDER BY payment_date;

Running this yields the desired result:

payment_date |amount  |percentage
-------------|--------|----------
2005-05-24   |29.92   |0.04      
2005-05-25   |573.63  |0.90      
2005-05-26   |754.26  |2.01      
2005-05-27   |685.33  |3.03      
2005-05-28   |804.04  |4.22      
2005-05-29   |648.46  |5.19      
2005-05-30   |628.42  |6.12      
2005-05-31   |700.37  |7.16      
...
2005-08-18   |2710.79 |79.59     
2005-08-19   |2615.72 |83.47     
2005-08-20   |2723.76 |87.51     
2005-08-21   |2809.41 |91.67     
2005-08-22   |2576.74 |95.49     
2005-08-23   |2523.01 |99.24     
2005-08-24   |514.18  |100.00    

Bonus: Nest aggregate functions in window functions

Because of the nature of SQL syntax, and the fact that both GROUP BY and aggregate functions “happen before” window functions, i.e. they are calculated logically before window functions, we can nest aggregate functions in window functions.

This definitely doesn’t drastically improve readability, especially if you are not used to writing window functions every day. But in some more complex cases, it might help to shorten your SQL syntax. The above query is equivalent to this one:

SELECT 
  CAST(payment_date AS DATE) AS payment_date,
  sum(amount) AS amount,
  CAST(100 * sum(sum(amount)) OVER (
               ORDER BY CAST(payment_date AS DATE)) 
           / sum(sum(amount)) OVER () AS numeric(10, 2)) percentage
FROM payment
GROUP BY CAST(payment_date AS DATE)
ORDER BY CAST(payment_date AS DATE);

Beauty is in the eye of the beholder. My eye definitely likes this sum(sum(amount)) OVER () syntax. If you cannot decipher this, don’t worry. You’re not alone. I invite you to review the following post on the order of SQL operations, first.

How to Emulate PERCENTILE_DISC in MySQL and Other RDBMS

In my previous article, I showed what the very useful percentile functions (also known as inverse distribution functions) can be used for.

Unfortunately, these functions are not ubiquitously available in SQL dialects. As of jOOQ 3.11, they are known to work in these dialects:

|Dialect         |As aggregate function|As window function|
|----------------|---------------------|------------------|
|MariaDB 10.3.3  |No                   |Yes               |
|Oracle 18c      |Yes                  |Yes               |
|PostgreSQL 11   |Yes                  |No                |
|SQL Server 2017 |No                   |Yes               |
|Teradata 16     |Yes                  |No                |

Oracle has the most sophisticated implementation, which supports both the ordered set aggregate function, and the window function version:

  • Aggregate function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x)
  • Window function: PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x) OVER (PARTITION BY y)

Workarounds if the feature is unavailable

Luckily, as soon as an RDBMS supports window functions, we can easily emulate PERCENTILE_DISC using PERCENT_RANK and FIRST_VALUE as follows. We’re using the Sakila database in this example.

Emulating window functions

Let’s emulate these first, as it requires a bit less SQL transformations. This query works out of the box in Oracle:

SELECT DISTINCT
  rating,
  percentile_disc(0.5) 
    WITHIN GROUP (ORDER BY length) 
    OVER() x1,
  percentile_disc(0.5) 
    WITHIN GROUP (ORDER BY length) 
    OVER (PARTITION BY rating) x2
FROM film
ORDER BY rating;

Yielding

RATING  X1      X2
-------------------
G       114     107
NC-17   114     112
PG      114     113
PG-13   114     125
R       114     115

What we can read from this is that the median length of all films is 114 minutes, and the median lengths of films per rating range from 107 minutes to 125 minutes. I’ve used DISTINCT because we don’t care about visualising these values on a per-row basis in this case. This also works in SQL Server.

Now, let’s assume we’re using PostgreSQL, which doesn’t support inverse distribution window functions, or MySQL, which doesn’t support inverse distribution functions at all, but both support PERCENT_RANK and FIRST_VALUE. Here’s the complete query:

SELECT DISTINCT
  rating,
  first_value(length) OVER (
    ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1,
  first_value(length) OVER (
    PARTITION BY rating 
    ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
FROM (
  SELECT
    rating,
    length,
    percent_rank() OVER (ORDER BY length) p1,
    percent_rank() OVER (PARTITION BY rating ORDER BY length) p2
  FROM film
) t
ORDER BY rating;

So, we’re doing this in two steps (visual example further down):

  1. PERCENT_RANK: In a derived table, we're calculating the PERCENT_RANK value, which attributes a rank to each row ordered by length, going from 0 to 1. This makes sense. When looking for the median value, we're really looking for the value whose PERCENT_RANK is 0.5 or less. When looking for the 90th percentile, we're looking for the value whose PERCENT_RANK is 0.9 or less
  2. FIRST_VALUE: Once we've found the PERCENT_RANK, we're not quite done yet. We need to find the last row whose PERCENT_RANK is less than or equal to the percentile we're interested in. I could have used LAST_VALUE, but then I would have needed to resort to using the quite verbose range clause of window functions. Instead, when ordering the rows by PERCENT_RANK (p1 or p2), I translated all ranks higher than the percentile I'm looking for into NULL using a CASE expression, and then made sure, using NULLS LAST, that the row I'm looking for will be the first row in the FIRST_VALUE function's window specification. Easy!

To visualise this, let’s run these queries, which also project the p1 and p2 values respectively:

SELECT
  length,
  CASE WHEN p1 <= 0.5 THEN p1 END::numeric(3,2) p1,
  first_value(length) OVER (
    ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1
FROM (
  SELECT
    length,
    percent_rank() OVER (ORDER BY length) p1
  FROM film
) t
ORDER BY length;

The result is

length |p1   |x1  |
-------|-----|----|
46     |0.00 |114 |
46     |0.00 |114 |
46     |0.00 |114 |
46     |0.00 |114 |
46     |0.00 |114 |
47     |0.01 |114 |
...
113    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 |
114    |0.49 |114 | <-- Last row whose PERCENT_RANK is <= 0.5
115    |     |114 |
115    |     |114 |
115    |     |114 |
115    |     |114 |
115    |     |114 |
115    |     |114 |
...
185    |     |114 |
185    |     |114 |
185    |     |114 |

So the FIRST_VALUE function just searches for that first row (descendingly, i.e. bottom up) whose p1 value is non-null.

The same for p2:

SELECT 
  length,
  rating,
  CASE WHEN p2 <= 0.5 THEN p2 END::numeric(3,2) p2,
  first_value(length) OVER (
    PARTITION BY rating 
    ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
FROM (
  SELECT
    rating,
    length,
    percent_rank() OVER (PARTITION BY rating ORDER BY length) p2
  FROM film
) t
ORDER BY rating, length;

Yielding:

length |rating |p2   |x2  |
-------|-------|-----|----|
47     |G      |0.00 |107 |
47     |G      |0.00 |107 |
48     |G      |0.01 |107 |
48     |G      |0.01 |107 |
...
105    |G      |0.47 |107 |
106    |G      |0.49 |107 |
107    |G      |0.49 |107 |
107    |G      |0.49 |107 | <-- Last row in G partition whose
108    |G      |     |107 |     PERCENT_RANK is <= 0.5
108    |G      |     |107 |
109    |G      |     |107 |
...
185    |G      |     |107 |
185    |G      |     |107 |
46     |PG     |0.00 |113 |
47     |PG     |0.01 |113 |
47     |PG     |0.01 |113 |
...
111    |PG     |0.49 |113 |
113    |PG     |0.49 |113 |
113    |PG     |0.49 |113 | <-- Last row in PG partition whose
114    |PG     |     |113 |     PERCENT_RANK is <= 0.5
114    |PG     |     |113 |
...

Perfect! Notice that if your RDBMS doesn't support the NULLS LAST clause in the ORDER BY clause (e.g. MySQL), you can either hope that it defaults to sorting NULLS LAST (MySQL does, at least for the descending sort used here), or you can emulate it as such:

-- This
ORDER BY x NULLS LAST

-- Is the same as this
ORDER BY
  CASE WHEN x IS NULL THEN 1 ELSE 0 END,
  x
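Applied to the x1 window function from above, that mechanical rewrite might look like this (a sketch; the data expression keeps its DESC ordering, and the extra flag expression pushes the NULLs to the end):

first_value(length) OVER (
  ORDER BY
    CASE WHEN (CASE WHEN p1 <= 0.5 THEN p1 END) IS NULL THEN 1 ELSE 0 END,
    CASE WHEN p1 <= 0.5 THEN p1 END DESC
) x1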

Emulating aggregate functions

If you’re using SQL Server and want aggregate function behaviour, I recommend using the window function instead and emulate aggregation using DISTINCT. It will probably be easier than the emulation below. Do check for performance though!

When you’re using e.g. MySQL, which doesn’t have inverse distribution function support at all, then this chapter is for you.

Here’s how to use the aggregate function version in Oracle:

-- Without GROUP BY
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY length) x1
FROM film;

-- With GROUP BY
SELECT
  rating,
  percentile_disc(0.5) WITHIN GROUP (ORDER BY length) x2
FROM film
GROUP BY rating
ORDER BY rating;

Trivial! The result is the same as before:

X1
---
114


RATING  X2
-----------
G       107
NC-17   112
PG      113
PG-13   125
R       115

Now, let’s emulate these on e.g. MySQL, using window functions.

-- Without GROUP BY
SELECT
  MAX(x1) x1
FROM (
  SELECT first_value(length) OVER (
    ORDER BY CASE WHEN p1 <= 0.5 THEN p1 END DESC NULLS LAST) x1
  FROM (
    SELECT
      length,
      percent_rank() OVER (ORDER BY length) p1
    FROM film
  ) t
) t;

It’s exactly the same technique as before, except we now have to turn the window function behaviour (don’t group, preserve rows, repeat aggregation value on each row) back into aggregate function behaviour (group, collapse rows) by using an aggregate function, such as MAX(). This is the same as what I did before with DISTINCT, for illustration purposes.

-- With GROUP BY
SELECT
  rating,
  MAX(x2) x2
FROM (
  SELECT
    rating,
    first_value(length) OVER (
      PARTITION BY rating 
      ORDER BY CASE WHEN p2 <= 0.5 THEN p2 END DESC NULLS LAST) x2
  FROM (
    SELECT
      rating,
      length,
      percent_rank() OVER (
        PARTITION BY rating 
        ORDER BY length) p2
    FROM film
  ) t
) t
GROUP BY rating
ORDER BY rating;

All we’re really doing (again) is translate the GROUP BY expression to a PARTITION BY expression in the window function, and then redo the previous exercise.

Conclusion

Window functions are extremely powerful. They can be used and combined to calculate a variety of other aggregations. With the above approach, we can emulate the PERCENTILE_DISC inverse distribution function, which is not readily available in most RDBMS, using a more verbose but equally powerful combination of PERCENT_RANK and FIRST_VALUE that works in all RDBMS supporting window functions. A similar exercise could be made with PERCENTILE_CONT, with a slightly more tricky approach to finding that FIRST_VALUE, which I'll leave as an exercise to the reader.

A future jOOQ version might emulate this for you, automatically.

Liked this article? You may also like 10 SQL Tricks That You Didn’t Think Were Possible.

Writing Custom Aggregate Functions in SQL Just Like a Java 8 Stream Collector

All SQL databases support the standard aggregate functions COUNT(), SUM(), AVG(), MIN(), MAX().

Some databases support other aggregate functions, like:

  • EVERY()
  • STDDEV_POP()
  • STDDEV_SAMP()
  • VAR_POP()
  • VAR_SAMP()
  • ARRAY_AGG()
  • STRING_AGG()

But what if you want to roll your own?

Java 8 Stream Collector

When using Java 8 streams, we can easily roll our own aggregate function (i.e. a Collector). Let’s assume we want to find the second highest value in a stream. The highest value can be obtained like this:

System.out.println(
    Stream.of(1, 2, 3, 4)
          .collect(Collectors.maxBy(Integer::compareTo))
) ;

Yielding:

Optional[4]

Now, what about the second highest value? We can write the following collector:

System.out.println(
    Stream.of(1, 6, 2, 3, 4, 4, 5).parallel()
          .collect(Collector.of(
              () -> new int[] { 
                  Integer.MIN_VALUE, 
                  Integer.MIN_VALUE 
              },
              (a, i) -> {
                  if (a[0] < i) {
                      a[1] = a[0];
                      a[0] = i;
                  }
                  else if (a[1] < i)
                      a[1] = i;
              },
              (a1, a2) -> {
                  if (a2[0] > a1[0]) {
                      a1[1] = a1[0];
                      a1[0] = a2[0];

                      if (a2[1] > a1[1])
                          a1[1] = a2[1];
                  }
                  else if (a2[0] > a1[1])
                      a1[1] = a2[0];

                  return a1;
              },
              a -> a[1]
          ))
) ;

It doesn’t do anything fancy. It has these 4 functions:

  • Supplier<int[]>: A supplier that provides an intermediary int[] of length 2, initialised with Integer.MIN_VALUE, each. This array will remember the MAX() value in the stream at position 0 and the SECOND_MAX() value in the stream at position 1
  • BiConsumer<int[], Integer>: An accumulator that accumulates new values from the stream into our intermediary data structure.
  • BinaryOperator<int[]>: A combiner that combines two intermediary data structures. This is used for parallel streams only.
  • Function<int[], Integer>: The finisher function that extracts the SECOND_MAX() value from the second position in our intermediary array.

The output is now:

5

How to do the same thing with SQL?

Many SQL databases offer a very similar way of calculating custom aggregate functions. Here’s how to do the exact same thing with…

Oracle:

With the usual syntactic ceremony…

CREATE TYPE u_second_max AS OBJECT (

  -- Intermediary data structure
  MAX NUMBER,
  SECMAX NUMBER,

  -- Corresponds to the Collector.supplier() function
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max) RETURN NUMBER,

  -- Corresponds to the Collector.accumulate() function
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER,

  -- Corresponds to the Collector.combiner() function
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER,

  -- Corresponds to the Collector.finisher() function
  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER
)
/

-- This is our "collector" implementation
CREATE OR REPLACE TYPE BODY u_second_max IS
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max)
  RETURN NUMBER IS
  BEGIN
    SCTX := U_SECOND_MAX(0, 0);
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER IS
  BEGIN
    IF VALUE > SELF.MAX THEN
      SELF.SECMAX := SELF.MAX;
      SELF.MAX := VALUE;
    ELSIF VALUE > SELF.SECMAX THEN
      SELF.SECMAX := VALUE;
    END IF;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
  BEGIN
    RETURNVALUE := SELF.SECMAX;
    RETURN ODCIConst.Success;
  END;

  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER IS
  BEGIN
    IF CTX2.MAX > SELF.MAX THEN
      SELF.SECMAX := SELF.MAX;
      SELF.MAX := CTX2.MAX;
    
      IF CTX2.SECMAX > SELF.SECMAX THEN
        SELF.SECMAX := CTX2.SECMAX;
      END IF;
    ELSIF CTX2.MAX > SELF.SECMAX THEN
      SELF.SECMAX := CTX2.MAX;
    END IF;
  
    RETURN ODCIConst.Success;
  END;
END;
/

-- Finally, we have to give this aggregate function a name
CREATE FUNCTION SECOND_MAX (input NUMBER) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING u_second_max;
/

We can now run the above on the Sakila database:

SELECT 
  max(film_id), 
  second_max(film_id) 
FROM film;

To get:

MAX     SECOND_MAX
------------------
1000    999

And what’s even better, we can use the aggregate function as a window function for free!

SELECT 
  film_id,
  length,
  max(film_id) OVER (PARTITION BY length), 
  second_max(film_id) OVER (PARTITION BY length)
FROM film
ORDER BY length, film_id;

The above yields:

FILM_ID  LENGTH  MAX   SECOND_MAX
---------------------------------
15       46      730   505
469      46      730   505
504      46      730   505
505      46      730   505
730      46      730   505
237      47      869   784
247      47      869   784
393      47      869   784
398      47      869   784
407      47      869   784
784      47      869   784
869      47      869   784
2        48      931   866
410      48      931   866
575      48      931   866
630      48      931   866
634      48      931   866
657      48      931   866
670      48      931   866
753      48      931   866
845      48      931   866
866      48      931   866
931      48      931   866

Beautiful, right?

PostgreSQL

PostgreSQL supports a slightly more concise syntax in the CREATE AGGREGATE statement. If we don’t allow for parallelism, we can write this minimal implementation:

CREATE FUNCTION second_max_sfunc (
  state INTEGER[], data INTEGER
) RETURNS INTEGER[] AS
$$
BEGIN
  IF state IS NULL THEN
    RETURN ARRAY[data, NULL];
  ELSE
    RETURN CASE 
      WHEN state[1] > data
      THEN CASE 
        WHEN state[2] > data
        THEN state
        ELSE ARRAY[state[1], data]
      END
      ELSE ARRAY[data, state[1]]
    END;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION second_max_ffunc (
  state INTEGER[]
) RETURNS INTEGER AS
$$
BEGIN
  RETURN state[2];
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE second_max (INTEGER) (
  SFUNC     = second_max_sfunc,
  STYPE     = INTEGER[],
  FINALFUNC = second_max_ffunc
);

Here, we use the STYPE (Collector.supplier()), the SFUNC (Collector.accumulator()), and the FINALFUNC (Collector.finisher()) specifications.
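A quick sanity check, assuming the same Sakila film table that was used in the Oracle example:

SELECT 
  max(film_id), 
  second_max(film_id) 
FROM film;

-- Expected to yield 1000 and 999 again, as in the Oracle version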

Other databases

Many other databases allow for specifying user-defined aggregate functions. Look up your database manual for details. They generally work in the same way as a Java 8 Collector.

How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

SQL is a verbose language, and one of its most verbose features is window functions.

In a Stack Overflow question that I encountered recently, someone asked how to calculate the difference between the first and the last value in a time series for any given day:

Input

volume  tstamp
---------------------------
29011   2012-12-28 09:00:00
28701   2012-12-28 10:00:00
28830   2012-12-28 11:00:00
28353   2012-12-28 12:00:00
28642   2012-12-28 13:00:00
28583   2012-12-28 14:00:00
28800   2012-12-29 09:00:00
28751   2012-12-29 10:00:00
28670   2012-12-29 11:00:00
28621   2012-12-29 12:00:00
28599   2012-12-29 13:00:00
28278   2012-12-29 14:00:00

Desired output

first  last   difference  date
------------------------------------
29011  28583  428         2012-12-28
28800  28278  522         2012-12-29

How to write the query?

Notice that the value and timestamp progression do not correlate as it may appear. So, there is not a rule that if Timestamp2 > Timestamp1 then Value2 < Value1. Otherwise, this simple query would work (using PostgreSQL syntax):

SELECT 
  max(volume)               AS first,
  min(volume)               AS last,
  max(volume) - min(volume) AS difference,
  CAST(tstamp AS DATE)      AS date
FROM t
GROUP BY CAST(tstamp AS DATE);

There are several ways to find the first and last values within a group that do not involve window functions. For example:

  • In Oracle, you can use the FIRST and LAST functions, which for some arcane reason are not written FIRST(...) WITHIN GROUP (ORDER BY ...) or LAST(...) WITHIN GROUP (ORDER BY ...), like other sorted set aggregate functions, but some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...). Go figure (a sketch follows after this list)
  • In PostgreSQL, you could use the DISTINCT ON syntax along with ORDER BY and LIMIT
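For the first approach, a sketch of what the Oracle KEEP syntax might look like for this problem (untested here; the desired column names collide with Oracle keywords, hence the quoted aliases):

SELECT
  max(volume) KEEP (DENSE_RANK FIRST ORDER BY tstamp) AS "first",
  max(volume) KEEP (DENSE_RANK LAST  ORDER BY tstamp) AS "last",
  max(volume) KEEP (DENSE_RANK FIRST ORDER BY tstamp)
    - max(volume) KEEP (DENSE_RANK LAST ORDER BY tstamp) AS difference,
  CAST(tstamp AS DATE) AS "date"
FROM t
GROUP BY CAST(tstamp AS DATE)
ORDER BY CAST(tstamp AS DATE);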

More details about the various approaches can be found here:
https://blog.jooq.org/2017/09/22/how-to-write-efficient-top-n-queries-in-sql

The best performing approach would be to use an aggregate function like Oracle’s, but few databases have this function. So, we’ll resort to using the FIRST_VALUE and LAST_VALUE window functions:

SELECT DISTINCT
  first_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first,
  last_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last,
  first_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) 
  - last_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
ORDER BY CAST(tstamp AS DATE)

Oops 🤔

That doesn’t look too readable. But it will yield the correct result. Granted, we could wrap the definition for the columns FIRST and LAST in a derived table, but that would still leave us with two repetitions of the window definition:

PARTITION BY CAST(tstamp AS DATE) 
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

WINDOW clause to the rescue

Luckily, at least 3 databases have implemented the SQL standard WINDOW clause:

  • MySQL
  • PostgreSQL
  • Sybase SQL Anywhere

The above query can be refactored to this one:

SELECT DISTINCT
  first_value(volume) OVER w AS first,
  last_value(volume) OVER w AS last,
  first_value(volume) OVER w 
    - last_value(volume) OVER w AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW w AS (
  PARTITION BY CAST(tstamp AS DATE) 
  ORDER BY tstamp
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY CAST(tstamp AS DATE)

Notice how I can specify a window name with a window specification in a similar way as I can define a common table expression (WITH clause):

WINDOW 
    <window-name> AS (<window-specification>)
{  ,<window-name> AS (<window-specification>)... }

Not only can I reuse entire specifications, I could also build a specification from a partial specification, and reuse only parts. My previous query could have been rewritten as such:

SELECT DISTINCT
  first_value(volume) OVER w3 AS first,
  last_value(volume) OVER w3 AS last,
  first_value(volume) OVER w3 
    - last_value(volume) OVER w3 AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW 
  w1 AS (PARTITION BY CAST(tstamp AS DATE)),
  w2 AS (w1 ORDER BY tstamp),
  w3 AS (w2 ROWS BETWEEN UNBOUNDED PRECEDING 
                     AND UNBOUNDED FOLLOWING)
ORDER BY CAST(tstamp AS DATE)

Each window specification can be created from scratch, or be based on a previously defined window specification. Note that this also works when referencing a window name directly in the OVER() clause. If I wanted to reuse the PARTITION BY clause and the ORDER BY clause, but change the FRAME clause (ROWS ...), then I could have written this:

SELECT DISTINCT
  first_value(volume) OVER (
    w2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS first,
  last_value(volume) OVER (
    w2 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS last,
  first_value(volume) OVER (
    w2 ROWS UNBOUNDED PRECEDING
  ) - last_value(volume) OVER (
    w2 ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
  ) AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW 
  w1 AS (PARTITION BY CAST(tstamp AS DATE)),
  w2 AS (w1 ORDER BY tstamp)
ORDER BY CAST(tstamp AS DATE)

What if my database doesn’t support the WINDOW clause?

In that case, you either have to write the window specification manually on each window function, or you can use a SQL builder like jOOQ, which can emulate the WINDOW clause for you.

You can try this translation online on our website: https://www.jooq.org/translate

Find the Next Non-NULL Row in a Series With SQL

I’ve stumbled across this fun SQL question on reddit, recently. The question was looking at a time series of data points where some events happened. For each event, we have the start time and the end time

timestamp             start    end
-----------------------------------
2018-09-03 07:00:00   1        null
2018-09-03 08:00:00   null     null
2018-09-03 09:00:00   null     null
2018-09-03 10:00:00   null     1
2018-09-03 12:00:00   null     null
2018-09-03 13:00:00   null     null
2018-09-03 14:00:00   1        null
2018-09-03 15:00:00   null     1

The desired output of the query should be this additional count column:

timestamp             start    end    count
-------------------------------------------
2018-09-03 07:00:00   1        null   4
2018-09-03 08:00:00   null     null   null
2018-09-03 09:00:00   null     null   null
2018-09-03 10:00:00   null     1      null
2018-09-03 12:00:00   null     null   null
2018-09-03 13:00:00   null     null   null
2018-09-03 14:00:00   1        null   2
2018-09-03 15:00:00   null     1      null

So, the rule is simple. Whenever an event starts, we would like to know how many consecutive entries it takes until the event stops again. We can visually see how that makes sense:

timestamp             start    end    count
-------------------------------------------
2018-09-03 07:00:00   1        null   4     -- 4 Rows in this event
2018-09-03 08:00:00   null     null   null
2018-09-03 09:00:00   null     null   null
2018-09-03 10:00:00   null     1      null

2018-09-03 12:00:00   null     null   null  -- No event here
2018-09-03 13:00:00   null     null   null

2018-09-03 14:00:00   1        null   2     -- 2 Rows in this event
2018-09-03 15:00:00   null     1      null

Some observations and assumptions about the problem at hand:

  • No two events will ever overlap
  • The time series does not progress in uniform steps, i.e. even if most data points are 1h apart, there can be larger or smaller gaps between data points
  • There are, however, no two identical timestamps in the series

How can we solve this problem?

Create the data set, first

We’re going to be using PostgreSQL for this example, but it will work with any database that supports window functions, which are most databases these days.

In PostgreSQL, we can use the VALUES() clause to generate data in memory easily. For the sake of simplicity, we’re not going to use timestamps, but integer representations of the timestamps. I’ve included the same out-of-the-ordinary gap between 4 and 6:

values (1, 1, null),
       (2, null, null),
       (3, null, null),
       (4, null, 1),
       (6, null, null),
       (7, null, null),
       (8, 1, null),
       (9, null, 1)

If we run this statement (yes, this is a standalone statement in PostgreSQL!), then the database will simply echo back the values we’ve sent it:

column1 |column2 |column3 |
--------|--------|--------|
1       |1       |        |
2       |        |        |
3       |        |        |
4       |        |1       |
6       |        |        |
7       |        |        |
8       |1       |        |
9       |        |1       |

How to deal with a non-uniformly growing series

The fact that column1 does not grow in uniform, gapless steps means that we cannot use it / trust it as a means to calculate the length of an event. We need to calculate an additional column that is guaranteed to contain consecutive integers without gaps. The ROW_NUMBER() window function is perfect for that.

Consider this SQL statement:

with 
  d(a, b, c) as (
	values (1, 1, null),
	       (2, null, null),
	       (3, null, null),
	       (4, null, 1),
	       (6, null, null),
	       (7, null, null),
	       (8, 1, null),
	       (9, null, 1)
  ),
  t as (
    select 
      row_number() over (order by a) as rn, a, b, c
    from d
  )
select * from t;

The new rn column is a row number calculated for each row based on the ordering of a. For simplicity, I’ve aliased:

  • a = timestamp
  • b = start
  • c = end

The result of this query is:

rn |a |b |c |
---|--|--|--|
1  |1 |1 |  |
2  |2 |  |  |
3  |3 |  |  |
4  |4 |  |1 |
5  |6 |  |  |
6  |7 |  |  |
7  |8 |1 |  |
8  |9 |  |1 |

Nothing fancy yet.

Now, how to use this rn column to find the length of an event?

Visually, we can get the idea quickly, seeing that an event’s length can be calculated using the formula RN2 - RN1 + 1:

rn |a |b |c |
---|--|--|--|
1  |1 |1 |  | RN1 = 1
2  |2 |  |  |
3  |3 |  |  |
4  |4 |  |1 | RN2 = 4

5  |6 |  |  |
6  |7 |  |  |

7  |8 |1 |  | RN1 = 7
8  |9 |  |1 | RN2 = 8

We have two events:

  • 4 – 1 + 1 = 4
  • 8 – 7 + 1 = 2

So, all we have to do is for each starting point of an event at RN1, find the corresponding RN2, and run the arithmetic. This is quite a bit of syntax, but it isn’t so hard, so bear with me while I explain:

with 
  d(a, b, c) as (
	values (1, 1, null),
	       (2, null, null),
	       (3, null, null),
	       (4, null, 1),
	       (6, null, null),
	       (7, null, null),
	       (8, 1, null),
	       (9, null, 1)
  ),
  t as (
    select 
      row_number() over (order by a) as rn, a, b, c
    from d
  )

-- Interesting bit here:
select
  a, b, c,
  case 
    when b is not null then 
      min(case when c is not null then rn end) 
        over (order by rn 
          rows between 1 following and unbounded following) 
      - rn + 1 
  end cnt
from t;

Let’s look at this new cnt column, step by step. First, the easy part:

The CASE expression

There’s a case expression that goes like this:

case 
  when b is not null then 
    ...
end cnt

All this does is check if b is not null and if this is true, then calculate something. Remember, b = start, so we’re putting a calculated value in the row where an event started. That was the requirement.

The new window function

So, what do we calculate there?

min(...) over (...) ...

A window function that finds the minimum value over a window of data. That minimum value is RN2, the next row number value where the event ends. So, what do we put in the min() function to get that value?

min(case when c is not null then rn end) 
over (...) 
...

Another case expression. When c is not null, we know the event has ended (remember, c = end). And if the event has ended, we want to find that row’s rn value. So that would be the minimum value of that case expression for all the rows after the row that started the event. Visually:

rn |a |b |c | case expr | minimum "next" value
---|--|--|--|-----------|---------------------
1  |1 |1 |  | null      | 4
2  |2 |  |  | null      | null
3  |3 |  |  | null      | null
4  |4 |  |1 | 4         | null

5  |6 |  |  | null      | null
6  |7 |  |  | null      | null

7  |8 |1 |  | null      | 8
8  |9 |  |1 | 8         | null

Now, we only need to specify that OVER() clause to form a window of all rows that follow the current row.

min(case when c is not null then rn end) 
  over (order by rn 
    rows between 1 following and unbounded following) 
...

The window is ordered by rn and it starts 1 row after the current row (1 following) and ends in infinity (unbounded following).

The only thing left to do now is do the arithmetic:

min(case when c is not null then rn end) 
  over (order by rn 
    rows between 1 following and unbounded following) 
- rn + 1

This is a verbose way of calculating RN2 - RN1 + 1, and we're doing that only in those rows that start an event. The result of the complete query above is now:

a |b |c |cnt |
--|--|--|----|
1 |1 |  |4   |
2 |  |  |    |
3 |  |  |    |
4 |  |1 |    |
6 |  |  |    |
7 |  |  |    |
8 |1 |  |2   |
9 |  |1 |    |

Read more about window functions on this blog.

PostgreSQL 11’s Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses

Exciting discovery when playing around with PostgreSQL 11! New SQL standard window function clauses are now supported. If you want to play with this, you can do so very easily using docker:

docker pull postgres:11
docker run --name POSTGRES11 -e POSTGRES_PASSWORD=postgres -d postgres:11
docker run -it --rm --link POSTGRES11:postgres postgres psql -h postgres -U postgres

See also: https://hub.docker.com/r/_/postgres

The frame clause

When working with window functions, in some cases you want to add the optional frame clause. For example, to get a sliding average over your data, you will write:

SELECT 
  payment_date,
  amount,
  avg(amount) OVER (
    ORDER BY payment_date, payment_id
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
  )::DECIMAL(10, 2),
  array_agg(amount) OVER (
    ORDER BY payment_date, payment_id
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
  )
FROM payment;

As always, I will be running queries against the Sakila database. The above query yields:

payment_date        |amount |avg  |array_agg                   |
--------------------|-------|-----|----------------------------|
2005-05-24 22:53:30 |2.99   |3.32 |          {2.99,2.99,3.99}  |
2005-05-24 22:54:33 |2.99   |3.74 |     {2.99,2.99,3.99,4.99}  |
2005-05-24 23:03:39 |3.99   |4.39 |{2.99,2.99,3.99,4.99,6.99}  |
2005-05-24 23:04:41 |4.99   |3.99 |{2.99,3.99,4.99,6.99,0.99}  |
2005-05-24 23:05:21 |6.99   |3.79 |{3.99,4.99,6.99,0.99,1.99}  |
2005-05-24 23:08:07 |0.99   |3.99 |{4.99,6.99,0.99,1.99,4.99}  |
2005-05-24 23:11:53 |1.99   |3.99 |{6.99,0.99,1.99,4.99,4.99}  |
2005-05-24 23:31:46 |4.99   |3.79 |{0.99,1.99,4.99,4.99,5.99}  |

The array_agg function helps display how the sliding average came to be. For each average value, we’re looking 2 rows ahead and 2 rows behind in the ordered window.

In the above query, I’m using the optional frame clause to specify the frame size. It has three “modes” or “units”:

<window frame units> ::=
  ROWS
| RANGE
| GROUPS

Almost all databases that support window functions support the first two unit types. To my knowledge, only PostgreSQL 11 and H2 1.4.198 also support GROUPS. The difference is rather simple to explain:

  • ROWS counts the exact number of rows in the frame.
  • RANGE performs logical windowing where we don’t count the number of rows, but look for a value offset.
  • GROUPS counts all groups of tied rows within the window.

I think this is best explained by example. Let’s look at payments with payment timestamps truncated to the hour:

WITH hourly_payment AS (
  SELECT 
    payment_id,
    date_trunc('h', payment_date) AS hour,
    amount
  FROM payment
)
SELECT *
FROM hourly_payment
ORDER BY hour;

This gives us:

payment_id |hour                |amount |
-----------|--------------------|-------|
12377      |2005-05-24 22:00:00 |2.99   | \  Tied group
3504       |2005-05-24 22:00:00 |2.99   | /

6440       |2005-05-24 23:00:00 |4.99   | \
11032      |2005-05-24 23:00:00 |3.99   |  |
8987       |2005-05-24 23:00:00 |4.99   |  | Tied group
6003       |2005-05-24 23:00:00 |6.99   |  |
14728      |2005-05-24 23:00:00 |0.99   |  |
7274       |2005-05-24 23:00:00 |1.99   | /

12025      |2005-05-25 00:00:00 |0.99   | \
3831       |2005-05-25 00:00:00 |8.99   |  |
7044       |2005-05-25 00:00:00 |4.99   |  |
8623       |2005-05-25 00:00:00 |9.99   |  | Tied group
3386       |2005-05-25 00:00:00 |4.99   |  |
8554       |2005-05-25 00:00:00 |4.99   |  |
10785      |2005-05-25 00:00:00 |5.99   |  |
9014       |2005-05-25 00:00:00 |6.99   | /

15394      |2005-05-25 01:00:00 |2.99   | \
10499      |2005-05-25 01:00:00 |4.99   |  |
5020       |2005-05-25 01:00:00 |2.99   |  | Tied group
490        |2005-05-25 01:00:00 |0.99   |  |
12305      |2005-05-25 01:00:00 |4.99   | /

11796      |2005-05-25 02:00:00 |4.99   | \
9463       |2005-05-25 02:00:00 |4.99   |  | Tied group
13711      |2005-05-25 02:00:00 |4.99   | /

Now we can see that for each hour, we have several payments. When we order payments by hour, there are some “tied” payments within that hour (or “group”), i.e. the payments on 2005-05-24 22:00:00 are not ordered deterministically among themselves. The payment ids are pretty random.

Now, if we look at the three window frame units again, how do they behave?

ROWS

WITH hourly_payment AS (
  SELECT 
    payment_id,
    date_trunc('h', payment_date) AS hour
  FROM payment
)
SELECT 
  payment_id,
  hour,
  array_agg(payment_id) OVER (
    ORDER BY hour
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
  )
FROM hourly_payment
ORDER BY hour;

We can see that the size of the window is always precisely 5 rows (except at the beginning and end of the data set):

payment_id |hour                |array_agg                      |
-----------|--------------------|-------------------------------|
12377      |2005-05-24 22:00:00 |{12377,3504,6440}              |
3504       |2005-05-24 22:00:00 |{12377,3504,6440,11032}        |
6440       |2005-05-24 23:00:00 |{12377,3504,6440,11032,8987}   |
11032      |2005-05-24 23:00:00 |{3504,6440,11032,8987,6003}    |
8987       |2005-05-24 23:00:00 |{6440,11032,8987,6003,14728}   |
6003       |2005-05-24 23:00:00 |{11032,8987,6003,14728,7274}   |
14728      |2005-05-24 23:00:00 |{8987,6003,14728,7274,12025}   |
7274       |2005-05-24 23:00:00 |{6003,14728,7274,12025,3831}   |
12025      |2005-05-25 00:00:00 |{14728,7274,12025,3831,7044}   |
3831       |2005-05-25 00:00:00 |{7274,12025,3831,7044,8623}    |
7044       |2005-05-25 00:00:00 |{12025,3831,7044,8623,3386}    |
8623       |2005-05-25 00:00:00 |{3831,7044,8623,3386,8554}     |
3386       |2005-05-25 00:00:00 |{7044,8623,3386,8554,10785}    |
8554       |2005-05-25 00:00:00 |{8623,3386,8554,10785,9014}    |
10785      |2005-05-25 00:00:00 |{3386,8554,10785,9014,15394}   |
9014       |2005-05-25 00:00:00 |{8554,10785,9014,15394,10499}  |
15394      |2005-05-25 01:00:00 |{10785,9014,15394,10499,5020}  |
10499      |2005-05-25 01:00:00 |{9014,15394,10499,5020,490}    |
5020       |2005-05-25 01:00:00 |{15394,10499,5020,490,12305}   |
490        |2005-05-25 01:00:00 |{10499,5020,490,12305,11796}   |
12305      |2005-05-25 01:00:00 |{5020,490,12305,11796,9463}    |
11796      |2005-05-25 02:00:00 |{490,12305,11796,9463,13711}   |
9463       |2005-05-25 02:00:00 |{12305,11796,9463,13711,8167}  |
13711      |2005-05-25 02:00:00 |{11796,9463,13711,8167,1011}   |

There is no notion of a “group” among the rows in the window. But the problem is that we’re getting random PAYMENT_ID values unless we also add the PAYMENT_ID to the ORDER BY clause. This isn’t really what we want, most of the time, so we use:

RANGE

WITH hourly_payment AS (
  SELECT 
    payment_id,
    date_trunc('h', payment_date) AS hour
  FROM payment
)
SELECT 
  payment_id,
  hour,
  EXTRACT(epoch FROM hour) / 3600,
  array_agg(payment_id) OVER (
    ORDER BY EXTRACT(epoch FROM hour) / 3600
    RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
  )
FROM hourly_payment
ORDER BY hour;

I have switched from ROWS to RANGE and now the ORDER BY clause works on a number based on the epoch of the hour. What happens now?

This now yields:

payment_id |hour                |?column? |array_agg                                                                                                                                                              
-----------|--------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
12377      |2005-05-24 22:00:00 |310270   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014}
3504       |2005-05-24 22:00:00 |310270   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014}

6440       |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}
11032      |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}
8987       |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}
6003       |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}
14728      |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}
7274       |2005-05-24 23:00:00 |310271   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305}

12025      |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
3831       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
7044       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
8623       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
3386       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
8554       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
10785      |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}
9014       |2005-05-25 00:00:00 |310272   |{12377,3504,  6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711}

15394      |2005-05-25 01:00:00 |310273   |{6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245}
10499      |2005-05-25 01:00:00 |310273   |{6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245}
5020       |2005-05-25 01:00:00 |310273   |{6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245}
490        |2005-05-25 01:00:00 |310273   |{6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245}
12305      |2005-05-25 01:00:00 |310273   |{6440,11032,8987,6003,14728,7274,  12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245}

11796      |2005-05-25 02:00:00 |310274   |{12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
9463       |2005-05-25 02:00:00 |310274   |{12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
13711      |2005-05-25 02:00:00 |310274   |{12025,3831,7044,8623,3386,8554,10785,9014,  15394,10499,5020,490,12305,  11796,9463,13711,  8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}

I’ve visually separated the rows by their hour and the array aggregation by the “tied” payment_ids, i.e. the payment IDs that have the same hour.

Observations:

  1. We get the same aggregation value for the entire set of tied rows, so if in two rows, HOUR is the same, then ARRAY_AGG is the same as well
  2. The window size is now a logical size, no longer an offset size, so we're going back 2 hours and ahead 2 hours (instead of 2 rows). This is why I've extracted the epoch and divided it by 3600, so I get consecutive integer values for consecutive hours

The same result could have been achieved using interval types:

WITH hourly_payment AS (
  SELECT 
    payment_id,
    date_trunc('h', payment_date) AS hour
  FROM payment
)
SELECT 
  payment_id,
  hour,
  EXTRACT(epoch FROM hour) / 3600,
  array_agg(payment_id) OVER (
    ORDER BY hour
    RANGE BETWEEN INTERVAL '2 hours' PRECEDING 
              AND INTERVAL '2 hours' FOLLOWING
  )
FROM hourly_payment
ORDER BY hour;

See also this article for details:
https://blog.jooq.org/2016/10/31/a-little-known-sql-feature-use-logical-windowing-to-aggregate-sliding-ranges/

GROUPS

The third frame unit is quite useful, as we can now make the frame span a number of groups of equal values. In our case, all payments of the same hour are in the same group. So, in order to get a similar result again, we can write:

WITH hourly_payment AS (
  SELECT 
    payment_id,
    payment_date,
    date_trunc('h', payment_date) AS hour
  FROM payment
)
SELECT 
  payment_id,
  hour,
  array_agg(payment_id) OVER (
    ORDER BY hour
    GROUPS BETWEEN 2 PRECEDING AND 2 FOLLOWING
  )
FROM hourly_payment
ORDER BY hour;

In fact, this is not exactly the same result, because if we have gaps in the hours, GROUPS will simply jump over the gaps, whereas RANGE will not.
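To illustrate that difference with a minimal, self-contained sketch (hypothetical data, not from the Sakila set, assuming a database that supports GROUPS frames, e.g. PostgreSQL 11): with “hours” 1, 2 and 4, the frame around hour 4 includes hour 2 under GROUPS (it is the previous group, regardless of the gap), but not under RANGE (2 is outside the logical range [3, 5]):

WITH t(hour, payment_id) AS (
  VALUES (1, 101), (2, 102), (4, 104)
)
SELECT
  hour,
  payment_id,
  array_agg(payment_id) OVER (
    ORDER BY hour
    RANGE  BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS by_range,
  array_agg(payment_id) OVER (
    ORDER BY hour
    GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS by_groups
FROM t
ORDER BY hour;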

Summary of ROWS, RANGE, GROUPS

The above case was a real-world use case. A more artificial example that might be easier to digest visually can be seen here:

WITH t(id, v) AS (
  VALUES (1, 1), (2, 1), (3, 3), (4, 5), (5, 5), (6, 5), (7, 6)
)
SELECT
  id,
  v,
  array_agg(id) OVER rows,
  array_agg(v)  OVER rows,
  array_agg(id) OVER range,
  array_agg(v)  OVER range,
  array_agg(id) OVER groups,
  array_agg(v)  OVER groups
FROM t
WINDOW 
  o AS (ORDER BY v),
  rows AS (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING),
  range AS (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING),
  groups AS (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

Notice that I’m using the SQL standard WINDOW clause to name and reuse a repeated window specification. I’ve seen this clause supported in:

  • MySQL 8.0
  • PostgreSQL
  • Sybase SQL Anywhere

The query yields:

id |v |array_agg |array_agg |array_agg |array_agg |array_agg     |array_agg     |
---|--|----------|----------|----------|----------|--------------|--------------|
1  |1 |{1,2}     |{1,1}     |{1,2}     |{1,1}     |{1,2,3}       |{1,1,3}       |
2  |1 |{1,2,3}   |{1,1,3}   |{1,2}     |{1,1}     |{1,2,3}       |{1,1,3}       |
3  |3 |{2,3,4}   |{1,3,5}   |{3}       |{3}       |{1,2,3,4,5,6} |{1,1,3,5,5,5} |
4  |5 |{3,4,5}   |{3,5,5}   |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7}   |{3,5,5,5,6}   |
5  |5 |{4,5,6}   |{5,5,5}   |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7}   |{3,5,5,5,6}   |
6  |5 |{5,6,7}   |{5,5,6}   |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7}   |{3,5,5,5,6}   |
7  |6 |{6,7}     |{5,6}     |{4,5,6,7} |{5,5,5,6} |{4,5,6,7}     |{5,5,5,6}     |

Observations:

  • The ROWS framed window is of size 3 max in this case (1 row preceding, the current row, and 1 row following)
  • The RANGE framed window is a logical window that looks behind a value of 1 and ahead a value of 1
  • The GROUPS framed window is of size 3 groups max in this case (1 group preceding, the current group, and 1 group following)

Neat, huh?

jOOQ 3.12 will add support for this feature: https://github.com/jOOQ/jOOQ/issues/7646

EXCLUDE clause

This is probably a bit less frequently useful than the new GROUPS clause. There is now a new window frame exclusion clause:

<window frame exclusion> ::=
  EXCLUDE CURRENT ROW
| EXCLUDE GROUP
| EXCLUDE TIES
| EXCLUDE NO OTHERS

It can be used to exclude some rows around the current row from the window. I have yet to think of a use case for this. Here’s how it works for each of the frame units:

ROWS

WITH t(v) AS (
  VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
  v,
  array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                       EXCLUDE CURRENT ROW) AS current_row,
  array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE GROUP) AS group,
  array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE TIES) AS ties,
  array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)

Resulting in:

v |current_row |group |ties    |no_others |
--|------------|------|--------|----------|
1 |{1}         |NULL  |{1}     |{1,1}     |
1 |{1,3}       |{3}   |{1,3}   |{1,1,3}   |
3 |{1,5}       |{1,5} |{1,3,5} |{1,3,5}   |
5 |{3,5}       |{3}   |{3,5}   |{3,5,5}   |
5 |{5,5}       |NULL  |{5}     |{5,5,5}   |
5 |{5,6}       |{6}   |{5,6}   |{5,5,6}   |
6 |{5}         |{5}   |{5,6}   |{5,6}     |

As you can see, the window may now be completely empty, which results in NULL being emitted.

  • Excluding the current row seems obvious
  • Excluding the current group also
  • Excluding ties excludes all rows of the current row’s group except the current row itself
  • Excluding no others is the default, exactly as if you hadn’t specified the EXCLUDE clause at all

RANGE

The exclusion can be applied to logical windowing as well:

WITH t(v) AS (
  VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
  v,
  array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE CURRENT ROW) AS current_row,
  array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE GROUP) AS group,
  array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE TIES) AS ties,
  array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)

Resulting in:

v |current_row |group   |ties      |no_others |
--|------------|--------|----------|----------|
1 |{1}         |NULL    |{1}       |{1,1}     |
1 |{1}         |NULL    |{1}       |{1,1}     |
3 |NULL        |NULL    |{3}       |{3}       |
5 |{5,5,6}     |{6}     |{5,6}     |{5,5,5,6} |
5 |{5,5,6}     |{6}     |{5,6}     |{5,5,5,6} |
5 |{5,5,6}     |{6}     |{5,6}     |{5,5,5,6} |
6 |{5,5,5}     |{5,5,5} |{5,5,5,6} |{5,5,5,6} |

GROUPS

Same for grouped windows:

WITH t(v) AS (
  VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
  v,
  array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE CURRENT ROW) AS current_row,
  array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE GROUP) AS group,
  array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE TIES) AS ties,
  array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING 
                       EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)

Resulting in:

v |current_row |group       |ties          |no_others     |
--|------------|------------|--------------|--------------|
1 |{1,3}       |{3}         |{1,3}         |{1,1,3}       |
1 |{1,3}       |{3}         |{1,3}         |{1,1,3}       |
3 |{1,1,5,5,5} |{1,1,5,5,5} |{1,1,3,5,5,5} |{1,1,3,5,5,5} |
5 |{3,5,5,6}   |{3,6}       |{3,5,6}       |{3,5,5,5,6}   |
5 |{3,5,5,6}   |{3,6}       |{3,5,6}       |{3,5,5,5,6}   |
5 |{3,5,5,6}   |{3,6}       |{3,5,6}       |{3,5,5,5,6}   |
6 |{5,5,5}     |{5,5,5}     |{5,5,5,6}     |{5,5,5,6}     |

Needless to say, this clause will be supported in jOOQ 3.12 as well: https://github.com/jOOQ/jOOQ/issues/7647

Bonus points for the reader who can think of a real-world use case for this clause. Please leave a comment!
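One candidate that comes to mind, as a sketch only (assuming frame exclusion support, e.g. PostgreSQL 11, and reusing the Sakila payment table): comparing each payment against the average of the customer’s other payments, by excluding the current row from an otherwise unbounded frame:

SELECT
  customer_id,
  payment_id,
  amount,
  amount - avg(amount) OVER (
    PARTITION BY customer_id
    ORDER BY payment_date, payment_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    EXCLUDE CURRENT ROW
  ) AS diff_from_other_payments
FROM payment
ORDER BY customer_id, payment_date;

If a customer has only a single payment, the frame is empty and the difference becomes NULL, matching the NULL behaviour shown above.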

Using UNPIVOT to Traverse a Configuration Table’s Rows and Columns

Imagine you have a configuration table like the following:

CREATE TABLE rule (
  name     VARCHAR2(50)         NOT NULL PRIMARY KEY,
  enabled  NUMBER(1)  DEFAULT 1 NOT NULL CHECK (enabled IN (0,1)),
  priority NUMBER(10) DEFAULT 0 NOT NULL,
  flag1    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag2    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag3    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag4    NUMBER(3)  DEFAULT 0 NOT NULL,
  flag5    NUMBER(3)  DEFAULT 0 NOT NULL
);

It specifies a set of rules that

  • Can be enabled / disabled
  • Can be given a priority among themselves
  • Include a set of flags which correspond to the thing you want to configure (e.g. some check to execute)
  • Those flags can be ordered as well

So, given the following data:

INSERT INTO rule (name, priority, flag1, flag5) 
  VALUES ('RULE 1', 1, 1, 2);
INSERT INTO rule (name, priority, flag2, flag5) 
  VALUES ('RULE 2', 2, 2, 1);
INSERT INTO rule (name, priority, flag3, flag4, flag5) 
  VALUES ('RULE 3', 3, 3, 1, 2);
INSERT INTO rule (name, priority, flag3) 
  VALUES ('RULE 4', 4, 1);

SELECT * FROM rule;

We’ll get our configuration “spreadsheet”:

NAME    ENABLED  PRIORITY  FLAG1  FLAG2  FLAG3  FLAG4  FLAG5
------------------------------------------------------------
RULE 1  1        1         1      0      0      0      2
RULE 2  1        2         0      2      0      0      1
RULE 3  1        3         0      0      3      1      2
RULE 4  1        4         0      0      1      0      0

This form is really useful to edit the configuration. If we want to activate FLAG2 in RULE 1, we just go to that cell in some SQL tool like Oracle SQL Developer, and change the value.

But reading the configuration is a bit different. FLAG1 through FLAG5 are not nicely normalised. How can we read the data as though it were normalised?

Using UNPIVOT

In Oracle and SQL Server, we can use UNPIVOT for this use case. I’m using Oracle syntax in this blog post. SQL Server’s is just slightly different. Consider the following query:

SELECT name, flag, value
FROM rule
UNPIVOT (
  value FOR flag IN (
    flag1,  
    flag2,  
    flag3,  
    flag4,  
    flag5
  )
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

This will result in the following result set:

NAME    FLAG    VALUE
---------------------
RULE 1  FLAG1   1
RULE 1  FLAG5   2
RULE 2  FLAG5   1
RULE 2  FLAG2   2
RULE 3  FLAG4   1
RULE 3  FLAG5   2
RULE 3  FLAG3   3
RULE 4  FLAG3   1

In this representation, the rules are ordered by priority, and the flags are ordered by their respective value within a rule. The flags that are not turned on (value 0) are simply omitted. This form is much easier to traverse procedurally, when “consuming” the configuration.
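For reference, the SQL Server flavour of the same query would look roughly like this (a sketch, assuming an equivalent table definition; the main syntactic difference I’m aware of is the mandatory alias after the UNPIVOT clause):

SELECT name, flag, value
FROM rule
UNPIVOT (
  value FOR flag IN (
    flag1,
    flag2,
    flag3,
    flag4,
    flag5
  )
) AS u
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;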

How does it work?

In principle, UNPIVOT is just syntactic sugar for a bunch of UNION ALL subqueries. We could have written our query like this, instead:

SELECT name, flag, value
FROM (
  SELECT rule.*, 'FLAG1' AS flag, FLAG1 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG2' AS flag, FLAG2 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG3' AS flag, FLAG3 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG4' AS flag, FLAG4 AS value FROM rule
  UNION ALL
  SELECT rule.*, 'FLAG5' AS flag, FLAG5 AS value FROM rule
) rule
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

Which is decidedly more code. It’s also more work for the database. The execution plans are different (I’m using Oracle 12.2.0.1.0):

UNPIVOT version – single table access

---------------------------------------------
| Id  | Operation            | Name | Rows  |
---------------------------------------------
|   0 | SELECT STATEMENT     |      |       |
|   1 |  SORT ORDER BY       |      |     5 |
|*  2 |   VIEW               |      |     5 |
|   3 |    UNPIVOT           |      |       |
|*  4 |     TABLE ACCESS FULL| RULE |     1 |
---------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(("unpivot_view_005"."VALUE">0 AND 
              "unpivot_view_005"."ENABLED"=1))
   4 - filter("RULE"."ENABLED"=1)

UNION ALL version – multi table access

---------------------------------------------
| Id  | Operation            | Name | Rows  |
---------------------------------------------
|   0 | SELECT STATEMENT     |      |       |
|   1 |  SORT ORDER BY       |      |     8 |
|   2 |   VIEW               |      |     8 |
|   3 |    UNION-ALL         |      |       |
|*  4 |     TABLE ACCESS FULL| RULE |     1 |
|*  5 |     TABLE ACCESS FULL| RULE |     1 |
|*  6 |     TABLE ACCESS FULL| RULE |     2 |
|*  7 |     TABLE ACCESS FULL| RULE |     1 |
|*  8 |     TABLE ACCESS FULL| RULE |     3 |
---------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(("RULE"."ENABLED"=1 AND "FLAG1">0))
   5 - filter(("RULE"."ENABLED"=1 AND "FLAG2">0))
   6 - filter(("RULE"."ENABLED"=1 AND "FLAG3">0))
   7 - filter(("RULE"."ENABLED"=1 AND "FLAG4">0))
   8 - filter(("RULE"."ENABLED"=1 AND "FLAG5">0))

We can also measure the time it takes to execute these queries thousands of times. The following shows resulting times relative to the fastest execution (1):

Run 1, Statement 1 : 1.155
Run 1, Statement 2 : 1.88056

Run 2, Statement 1 : 1.04333
Run 2, Statement 2 : 1.95148

Run 3, Statement 1 : 1.02185
Run 3, Statement 2 : 1.86074

Run 4, Statement 1 : 1
Run 4, Statement 2 : 1.85241

Run 5, Statement 1 : 1.0263
Run 5, Statement 2 : 1.82944

The UNION ALL version is consistently about 2x slower on this very small data set. This is significant in the use case presented here, as a configuration table is probably read many times per day.
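A benchmark of this kind can be reproduced with a simple PL/SQL loop along these lines (a rough sketch of the technique, not the exact harness that produced the numbers above; the repetition count is arbitrary):

SET SERVEROUTPUT ON
DECLARE
  v_ts     TIMESTAMP;
  v_repeat CONSTANT NUMBER := 2000; -- arbitrary repetition count
BEGIN
  -- Statement 1: the UNPIVOT version
  v_ts := SYSTIMESTAMP;
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT name, flag, value
      FROM rule
      UNPIVOT (value FOR flag IN (flag1, flag2, flag3, flag4, flag5))
      WHERE enabled = 1 AND value > 0
      ORDER BY priority, value
    ) LOOP
      NULL; -- consume the rows, discard the results
    END LOOP;
  END LOOP;
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));

  -- Statement 2: the UNION ALL version, measured with the same loop
  -- ...
END;
/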

Knowing when a rule starts and when it ends

The real-world use case behind this blog post also needed to know when a rule starts and when it ends, i.e. which flag entry is the first and which is the last of each rule. This was easy in the non-normalised representation, where each rule was a single row.

In the normalised version, we can use LEAD() and LAG().

Using this query:

SELECT 
  CASE WHEN lag(name, 1, 'NULL') 
            OVER (ORDER BY priority, value) != name 
       THEN 1 ELSE 0 END rule_begin,
  CASE WHEN lead(name, 1, 'NULL') 
            OVER (ORDER BY priority, value) != name 
       THEN 1 ELSE 0 END rule_end,
  name, flag, value
FROM rule
UNPIVOT (
  value FOR flag IN (
    flag1,  
    flag2,  
    flag3,  
    flag4,  
    flag5
  )
)
WHERE enabled = 1
AND value > 0
ORDER BY priority, value;

We’re now getting (with some visual emphasis):

RULE_BEGIN  RULE_END  NAME    FLAG    VALUE
-------------------------------------------
1           0         RULE 1  FLAG1   1
0           1         RULE 1  FLAG5   2

1           0         RULE 2  FLAG5   1
0           1         RULE 2  FLAG2   2

1           0         RULE 3  FLAG4   1
0           0         RULE 3  FLAG5   2
0           1         RULE 3  FLAG3   3

1           1         RULE 4  FLAG3   1

LEAD() looks ahead one row to see if the rule name there is different from the rule name on the current row.

LAG() looks behind one row to see if the rule name there is different from the rule name on the current row.

That’s it – very simple. The window functions part of this example is part of my 10 SQL Tricks talk, which I highly recommend you watch.

How to Avoid Excessive Sorts in Window Functions

Usually, this blog is 100% pro window functions and advocates using them at any occasion. But like any tool, window functions come at a price and we must carefully evaluate if that’s a price we’re willing to pay. That price can be a sort operation. And as we all know, sort operations are expensive. They follow O(n log n) complexity, which must be avoided at all costs for large data sets.

In a previous post, I’ve described how to calculate a running total with window functions (among other ways). In this post, we’re going to calculate the cumulative revenue at each payment in our Sakila database.

SELECT
  customer_id,
  payment_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY payment_date, payment_id
  ) cumulative_amount
FROM payment
ORDER BY customer_id, payment_date, payment_id;

The above will yield something like this:

customer_id |payment_date        |amount |cumulative_amount 
------------|--------------------|-------|------------------
1           |2005-05-25 11:30:37 |2.99   |2.99              
1           |2005-05-28 10:35:23 |0.99   |3.98              
1           |2005-06-15 00:54:12 |5.99   |9.97              
1           |2005-06-15 18:02:53 |0.99   |10.96             
1           |2005-06-15 21:08:46 |9.99   |20.95             
1           |2005-06-16 15:18:57 |4.99   |25.94             
...

As can be seen, in spreadsheet notation, cumulative_amount[N] = cumulative_amount[N-1] + amount[N].
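It’s worth remembering that the implicit frame behind that SUM is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which will matter again later. Spelled out explicitly, the equivalent query would read (just more verbose):

SELECT
  customer_id,
  payment_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY payment_date, payment_id
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) cumulative_amount
FROM payment
ORDER BY customer_id, payment_date, payment_id;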

Reusing this calculation in several queries

As in any other language, we don’t want to repeat ourselves, so the SQL way of doing DRY is to create a view or a table valued function. Let’s create a view, first. Something like this:

CREATE VIEW payment_with_revenue AS
SELECT
  customer_id,
  payment_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY payment_date, payment_id
  ) cumulative_amount
FROM payment

Now, we can do nice things like this:

SELECT 
  customer_id,
  payment_date,
  amount,
  cumulative_amount
FROM payment_with_revenue
WHERE customer_id IN (1, 2, 3)
AND payment_date 
  BETWEEN DATE '2005-05-25'
  AND     DATE '2005-05-29'
ORDER BY customer_id, payment_date

yielding:

customer_id |payment_date        |amount |cumulative_amount 
------------|--------------------|-------|------------------
1           |2005-05-25 11:30:37 |2.99   |2.99              
1           |2005-05-28 10:35:23 |0.99   |3.98              
2           |2005-05-27 00:09:24 |4.99   |4.99              
3           |2005-05-27 17:17:09 |1.99   |1.99              

What about performance?

Now, if we have an index on (CUSTOMER_ID, PAYMENT_DATE), we’d expect to be able to use it, right? Because it seems that our predicate should be able to profit from it:

SELECT 
  count(*),
  count(*) FILTER (
    WHERE customer_id IN (1, 2, 3)
  ),
  count(*) FILTER (
    WHERE customer_id IN (1, 2, 3)
    AND payment_date < DATE '2005-05-29'
  ) 
FROM payment;

yielding:

count |count |count 
------|------|-----
16049 |85    |4     

(To learn more about the cool FILTER clause, read this article here)
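For reference, the index assumed in this section might be created like this (a sketch; the name idx_payment_i1 matches what shows up in the execution plans below):

CREATE INDEX idx_payment_i1 ON payment (customer_id, payment_date);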

How could we best use the index? Let’s look again at our original query, but this time, with an inlined view (“inlined”):

SELECT 
  customer_id,
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
) inlined
WHERE customer_id IN (1, 2, 3)
AND payment_date 
  BETWEEN DATE '2005-05-25'
  AND     DATE '2005-05-29'
ORDER BY customer_id, payment_date;

We should be able to apply two transformations that allow for using the index:

CUSTOMER_ID IN (1, 2, 3) predicate

The CUSTOMER_ID IN (1, 2, 3) predicate should be pushed down into the view, “past” the window function, because it does not affect the window function calculation, which partitions the data set by CUSTOMER_ID. By pushing it “past” the window function, I mean that window functions are calculated late in the logical order of operations of a SELECT statement.

This means that our original query should be equivalent to this one:

SELECT 
  customer_id,
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id IN (1, 2, 3) -- Pushed down
) inlined
WHERE payment_date 
  BETWEEN DATE '2005-05-25'
  AND     DATE '2005-05-29'
ORDER BY customer_id, payment_date;

The PAYMENT_DATE predicate

The PAYMENT_DATE predicate is a bit more tricky. It cannot be pushed “past” the window function completely, because that would alter the semantics of the window function, which calculates the cumulative amount in the RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW range (which is the default, if we do not specify it).

But intuitively (and if you want to spend the time: formally as well), we can show that we can at least push the upper bound of our range predicate into the view, like this:

SELECT 
  customer_id,
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id IN (1, 2, 3)
  AND payment_date <= DATE '2005-05-29' -- Pushed down
) inlined
WHERE payment_date >= DATE '2005-05-25'
ORDER BY customer_id, payment_date;

And now, we can profit from the index very easily! But is this transformation being done by any database? Unfortunately not. Some databases manage to push down the “more obvious” CUSTOMER_ID predicate past the window function, but none can do the same with the “less obvious” range predicate on PAYMENT_DATE:

DB2 LUW 10.5

The CUSTOMER_ID predicate is pushed down into the view, which generates an index scan (operation 8) on the pre-existing foreign key index (which doesn’t contain the PAYMENT_DATE column), but the PAYMENT_DATE predicate is only applied much later, using an in-memory filter (operation 2):

Explain Plan                                                       
-------------------------------------------------------------------
ID | Operation                       |                  Rows | Cost
 1 | RETURN                          |                       |   40
 2 |  FILTER                         |     4 of 80 (  5.00%) |   40
 3 |   TBSCAN                        |    80 of 80 (100.00%) |   40
 4 |    SORT                         |    80 of 80 (100.00%) |   40
 5 |     NLJOIN                      |               80 of 3 |   40
 6 |      TBSCAN GENROW              |      3 of 3 (100.00%) |    0
 7 |      FETCH PAYMENT              |    27 of 27 (100.00%) |   13
 8 |       IXSCAN IDX_FK_CUSTOMER_ID | 27 of 16049 (   .17%) |    6
                                                                   
Predicate Information                                              
 2 - RESID (Q5.PAYMENT_DATE <= '2005-05-29')                       
     RESID ('2005-05-25' <= Q5.PAYMENT_DATE)                       
 5 - JOIN (Q3.CUSTOMER_ID = Q2.$C0)                                
 8 - START (Q3.CUSTOMER_ID = Q2.$C0)                               
      STOP (Q3.CUSTOMER_ID = Q2.$C0)                               

Conversely, see the plan of the manually optimised query:

Explain Plan                                                  
--------------------------------------------------------------
ID | Operation                   |                 Rows | Cost
 1 | RETURN                      |                      |   40
 2 |  FILTER                     |     4 of 4 (100.00%) |   40
 3 |   TBSCAN                    |     4 of 4 (100.00%) |   40
 4 |    SORT                     |     4 of 4 (100.00%) |   40
 5 |     NLJOIN                  |               4 of 1 |   40
 6 |      TBSCAN GENROW          |     3 of 3 (100.00%) |    0
 7 |      FETCH PAYMENT          |     1 of 1 (100.00%) |   13
 8 |       IXSCAN IDX_PAYMENT_I1 | 1 of 16049 (   .01%) |    6
                                                              
Predicate Information                                         
 2 - RESID ('2005-05-25' <= Q5.PAYMENT_DATE)                  
 5 - JOIN (Q3.CUSTOMER_ID = Q2.$C0)                           
 8 - START (Q3.CUSTOMER_ID = Q2.$C0)                          
      STOP (Q3.CUSTOMER_ID = Q2.$C0)                          
      STOP (Q3.PAYMENT_DATE <= '2005-05-29')                  

This is certainly a better plan.

MySQL 8.0.2

MySQL, very regrettably, doesn’t seem to make any effort at all to optimise this. We’re accessing the entire payment table to get this result.

id   table        type  rows    filtered    Extra
-----------------------------------------------------------------------
1    <derived2>   ALL   16086    3.33       Using where
2    payment      ALL   16086  100.00       Using filesort

Here’s the manually optimised plan:

id   table        type  key             rows  filtered    Extra
-------------------------------------------------------------------------------
1    <derived2>   ALL                   4     3.33        Using where
2    payment      range idx_payment_i1  4      100.00     Using index condition

Oracle 12.2.0.1

Oracle also cannot do this beyond pushing the more obvious CUSTOMER_ID predicate into the view:

-------------------------------------------------------------------------------
| Id  | Operation                              | Name                 | Rows  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                      |       |
|*  1 |  VIEW                                  | PAYMENT_WITH_REVENUE |    80 |
|   2 |   WINDOW SORT                          |                      |    80 |
|   3 |    INLIST ITERATOR                     |                      |       |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| PAYMENT              |    80 |
|*  5 |      INDEX RANGE SCAN                  | IDX_FK_CUSTOMER_ID   |    80 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(("PAYMENT_DATE">=TO_DATE('2005-05-25 00:00:00') AND 
              "PAYMENT_DATE"<=TO_DATE('2005-05-29 00:00:00')))
   5 - access(("CUSTOMER_ID"=1 OR "CUSTOMER_ID"=2 OR "CUSTOMER_ID"=3))

The manually optimised plan looks better:

-------------------------------------------------------------------------
| Id  | Operation                              | Name           | Rows  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                |       |
|*  1 |  VIEW                                  |                |     1 |
|   2 |   WINDOW SORT                          |                |     1 |
|   3 |    INLIST ITERATOR                     |                |       |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| PAYMENT        |     1 |
|*  5 |      INDEX RANGE SCAN                  | IDX_PAYMENT_I1 |     1 |
-------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("PAYMENT_DATE">=TO_DATE('2005-05-25 00:00:00'))
   5 - access(("CUSTOMER_ID" IN (1, 2, 3)) AND 
              "PAYMENT_DATE"<=TO_DATE('2005-05-29 00:00:00'))

Much better cardinality estimates!

PostgreSQL 10

PostgreSQL’s version of the Sakila database uses a partitioned payment table, but that should be irrelevant for this analysis. The CUSTOMER_ID predicate could be pushed down…

QUERY PLAN                                                                                          
---------------------------------------------------------------------------------------------------
Subquery Scan on payment_with_revenue  (cost=117.06..124.45 rows=8 width=52)                       
  Filter: ((payment_date >= '2005-05-25') AND (payment_date <= '2005-05-29'))
-> WindowAgg  (cost=117.06..121.49 rows=197 width=56)
   -> Sort  (cost=117.06..117.55 rows=197 width=24)
      Sort Key: payment.customer_id, payment.payment_date, payment.payment_id
      -> Result  (cost=0.29..109.55 rows=197 width=24)
         -> Append  (cost=0.29..107.58 rows=197 width=24)
            -> Index Scan using idx_fk.. on payment  (cost=0.29..18.21 rows=77 width=20)
               Index Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
            -> Bitmap Heap Scan on payment_p2007_01  (cost=4.62..14.90 rows=20 width=26)
               Recheck Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
               -> Bitmap Index Scan on idx_fk.. (cost=0.00..4.61 rows=20 width=0)
                  Index Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
            -> Bitmap Heap Scan on payment_p2007_02  (cost=4.62..14.90 rows=20 width=26)
               Recheck Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
               -> Bitmap Index Scan on idx_fk.. (cost=0.00..4.61 rows=20 width=0)
                  Index Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
            ...

But manual optimisation is required to get better behaviour for the date range:

QUERY PLAN                                                                                           
-----------------------------------------------------------------------------------------------------
Subquery Scan on inlined  (cost=18.46..18.56 rows=3 width=48)                                        
  Filter: (inlined.payment_date >= '2005-05-25'::date)                    
-> WindowAgg  (cost=18.46..18.52 rows=3 width=52)                                                 
   -> Sort  (cost=18.46..18.46 rows=3 width=20)                                                
      Sort Key: payment.customer_id, payment.payment_date, payment.payment_id                
      -> Result  (cost=0.29..18.43 rows=3 width=20)                                         
         -> Append  (cost=0.29..18.40 rows=3 width=20)                                   
            -> Index Scan using idx_fk.. on payment  (cost=0.29..18.40 rows=3 width=20)
                Index Cond: (customer_id = ANY ('{1,2,3}'::integer[]))
                Filter: (payment_date <= '2005-05-29'::date)

Interestingly, the index still isn’t used optimally on both columns, which has nothing to do with the current discussion on window functions. PostgreSQL seems to be unable to think of the IN predicate as an equality predicate. See also this article about other optimisations (such as predicate merging) that are not possible (yet) in PostgreSQL.

But still, this is much better as it brings down the estimated cardinalities (in case this query is a subquery in a more sophisticated context), and more importantly, it filters out many, many rows prior to calculating the window function.
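If using both index columns matters, one workaround worth trying (a sketch only; I haven’t verified how the planner reacts to it) is to turn the IN list into a lateral join, so that each customer_id reaches the index as an equality predicate:

SELECT
  inlined.customer_id,
  inlined.payment_date,
  inlined.amount,
  inlined.cumulative_amount
FROM (VALUES (1), (2), (3)) AS c(customer_id)
CROSS JOIN LATERAL (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id = c.customer_id
  AND payment_date <= DATE '2005-05-29'
) inlined
WHERE inlined.payment_date >= DATE '2005-05-25'
ORDER BY inlined.customer_id, inlined.payment_date;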

SQL Server 2014

Another database that cannot push down this predicate past the window function optimally. Only the “obvious” part is pushed down:

|--Sort(ORDER BY:([payment_date] ASC))
   |--Filter(WHERE:([payment_date]>='2005-05-25' AND [payment_date]<='2005-05-29'))
      |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005] END))
         |--Stream Aggregate(GROUP BY:([WindowCount1009]) DEFINE:(..))
            |--Window Spool(RANGE BETWEEN:(UNBOUNDED, [[payment_date], [payment_id]]))
               |--Segment
                  |--Segment
                     |--Sort(ORDER BY:([customer_id] ASC, [payment_date] ASC, [payment_id] ASC))
                        |--Table Scan(OBJECT:([payment]), WHERE:([customer_id] IN (1, 2, 3)))

Interestingly, this doesn’t even use the index at all, but at least the data is filtered out prior to the calculation that relies on sorting. With the manual optimisation, we get the same, much better effect again:

|--Filter(WHERE:([payment_date]>='2005-05-25'))
   |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005] END))
      |--Stream Aggregate(GROUP BY:([WindowCount1011]) DEFINE:(..))
         |--Window Spool(RANGE BETWEEN:(UNBOUNDED, [[payment_date], [payment_id]]))
            |--Segment
               |--Segment
                  |--Sort(ORDER BY:([payment_date] ASC, [payment_id] ASC))
                     |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))
                        |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1007], [Expr1008], [Expr1006]))
                        |  |--Compute Scalar(DEFINE:(([Expr1007],[Expr1008],[Expr1006])=GetRangeWithMismatchedTypes(NULL,'2005-05-29',(42))))
                        |  |  |--Constant Scan
                        |  |--Index Seek(OBJECT:([idx_payment_i1]), SEEK:([customer_id] IN (1, 2, 3) AND [payment_date] > [Expr1007] AND [payment_date] < [Expr1008]))
                        |--RID Lookup(OBJECT:([payment]))

Certainly, this is a bit cryptic to read but it really means the same thing as always: The manual optimisation worked and we got a better plan.

Meh, does it matter?

I hope so! Let’s benchmark these things against each other! Some info about our benchmarking technique in our previous post and on this page here. Specifically, we don’t publish actual execution times, only relative times within the benchmark as we do not want to compare databases against each other but only against themselves.

DB2 LUW 10.5

RUN |STMT |RATIO  |
----|-----|-------|
1   |1    |3.0890 |
1   |2    |1.2272 |
2   |1    |3.0624 |
2   |2    |1.0100 |
3   |1    |3.0389 |
3   |2    |1.0000 |
4   |1    |3.1566 |
4   |2    |1.0948 |
5   |1    |3.1817 |
5   |2    |1.0905 |

The manually optimised statement is 3x faster in our benchmark. Do bear in mind that we’re operating on a rather small data set of a total of a few thousand rows! This gets worse in a larger data set.

MySQL 8.0.2

The difference is devastating in MySQL 8.0.2, which just recently introduced window functions. Surely, the MySQL team will be able to apply some further optimisations prior to GA – I’ve filed an issue for review. The relative execution times:

Run 0, Statement 1 : 431.1905
Run 0, Statement 2 : 1.0000
Run 1, Statement 1 : 372.4286
Run 1, Statement 2 : 1.0000
Run 2, Statement 1 : 413.4762
Run 2, Statement 2 : 1.0000
Run 3, Statement 1 : 381.2857
Run 3, Statement 2 : 1.0000
Run 4, Statement 1 : 400.1429
Run 4, Statement 2 : 1.2857

Oracle 12.2.0.1

Another factor 4x can be observed in Oracle:

Run 1, Statement 1 : 4.58751
Run 1, Statement 2 : 1.37639
Run 2, Statement 1 : 4.71833
Run 2, Statement 2 : 1.03693
Run 3, Statement 1 : 4.05729
Run 3, Statement 2 : 1.04719
Run 4, Statement 1 : 3.86653
Run 4, Statement 2 : 1
Run 5, Statement 1 : 3.99603
Run 5, Statement 2 : 1.0212

PostgreSQL 10

PostgreSQL is quite bad too, here. A factor 7x can be observed:

RUN 1, Statement 1: 7.23373
RUN 1, Statement 2: 1.01438
RUN 2, Statement 1: 6.62028
RUN 2, Statement 2: 1.26183
RUN 3, Statement 1: 8.40322
RUN 3, Statement 2: 1.04074
RUN 4, Statement 1: 6.33401
RUN 4, Statement 2: 1.06750
RUN 5, Statement 1: 6.41649
RUN 5, Statement 2: 1.00000

SQL Server 2014

Another very significant penalty in SQL Server for the unoptimised version:

Run 1, Statement 1: 29.50000
Run 1, Statement 2: 1.07500
Run 2, Statement 1: 28.15000
Run 2, Statement 2: 1.00000
Run 3, Statement 1: 28.00000
Run 3, Statement 2: 1.00000
Run 4, Statement 1: 28.00000
Run 4, Statement 2: 1.00000
Run 5, Statement 1: 31.07500
Run 5, Statement 2: 1.00000

Bad news for views. Is there a better solution?

This is rather bad news for window functions inside of reusable views. None of the databases, not even DB2 or Oracle, can push down range predicates past a derived table’s window function if the column in the range predicate doesn’t correspond to the window function’s PARTITION BY clause.

The problem described above can be easily fixed when the query is written manually, expanding all possible views into their calling SQL, but that kind of sucks – we’d love to make our code reusable. There’s one solution in databases that support inline table valued functions. Among the tested databases, these include:

  • DB2
  • PostgreSQL
  • SQL Server

MySQL doesn’t have table valued functions, and Oracle’s (very regrettably) are not inlineable because they have to be written in PL/SQL.

Here’s how to write these functions:

DB2

Function definition:

CREATE OR REPLACE FUNCTION f_payment_with_revenue (
  p_customer_id BIGINT,
  p_from_date DATE,
  p_to_date DATE
)
RETURNS TABLE (
  customer_id BIGINT,
  payment_date DATE,
  amount DECIMAL(10, 2),
  cumulative_amount DECIMAL(10, 2)
)
LANGUAGE SQL
RETURN
SELECT *
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id = p_customer_id
  AND payment_date <= p_to_date
) t
WHERE payment_date >= p_from_date;

Function call:

SELECT 
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT customer_id FROM customer WHERE customer_id IN (1, 2, 3)
) c(customer_id),
TABLE(sakila.f_payment_with_revenue(
  c.customer_id,
  CAST('2005-05-25' AS DATE),
  CAST('2005-05-29' AS DATE)
))
ORDER BY payment_date;

Execution plan:

Explain Plan                                                    
----------------------------------------------------------------
ID | Operation                     |                 Rows | Cost
 1 | RETURN                        |                      |   33
 2 |  TBSCAN                       |     4 of 4 (100.00%) |   33
 3 |   SORT                        |     4 of 4 (100.00%) |   33
 4 |    NLJOIN                     |               4 of 1 |   33
 5 |     NLJOIN                    |               3 of 1 |   20
 6 |      TBSCAN GENROW            |     3 of 3 (100.00%) |    0
 7 |      IXSCAN PK_CUSTOMER       |   1 of 599 (   .17%) |    6
 8 |     FILTER                    |     1 of 1 (100.00%) |   13
 9 |      TBSCAN                   |     1 of 1 (100.00%) |   13
10 |       SORT                    |     1 of 1 (100.00%) |   13
11 |        FETCH PAYMENT          |     1 of 1 (100.00%) |   13
12 |         IXSCAN IDX_PAYMENT_I1 | 1 of 16049 (   .01%) |    6
                                                                
Predicate Information                                           
  5 - JOIN (Q3.CUSTOMER_ID = Q2.$C0)                            
  7 - START (Q3.CUSTOMER_ID = Q2.$C0)                           
       STOP (Q3.CUSTOMER_ID = Q2.$C0)                           
  8 - RESID ('2005-05-25' <= Q6.PAYMENT_DATE)                   
 12 - START (Q4.CUSTOMER_ID = Q3.CUSTOMER_ID)                   
       STOP (Q4.CUSTOMER_ID = Q3.CUSTOMER_ID)                   
       STOP (Q4.PAYMENT_DATE <= '2005-05-29')                   

Much better!

Benchmark result (Statement 1 = function call, Statement 2 = manually optimised):

RUN |STMT |RATIO  |
----|-----|-------|
1   |1    |1.5945 |
1   |2    |1.0080 |
2   |1    |1.6310 |
2   |2    |1.0768 |
3   |1    |1.5827 |
3   |2    |1.0090 |
4   |1    |1.5486 |
4   |2    |1.0084 |
5   |1    |1.5569 |
5   |2    |1.0000 |

Definitely a huge improvement. The comparison might not be entirely fair because

  • CROSS APPLY / LATERAL unnesting tends to generate nested loops that could be written more optimally with a classic join
  • We have an additional auxiliary customer table access (which could probably be tuned away with another rewrite)

PostgreSQL

Function definition:

CREATE OR REPLACE FUNCTION f_payment_with_revenue (
  p_customer_id BIGINT,
  p_from_date DATE,
  p_to_date DATE
)
RETURNS TABLE (
  customer_id SMALLINT,
  payment_date TIMESTAMP,
  amount DECIMAL(10, 2),
  cumulative_amount DECIMAL(10, 2)
)
AS $$
SELECT *
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id = p_customer_id
  AND payment_date <= p_to_date
) t
WHERE payment_date >= p_from_date
$$ LANGUAGE SQL;

Function call:

SELECT 
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT customer_id FROM customer WHERE customer_id IN (1, 2, 3)
) c(customer_id)
CROSS JOIN LATERAL f_payment_with_revenue(
  c.customer_id,
  CAST('2005-05-25' AS DATE),
  CAST('2005-05-29' AS DATE)
)
ORDER BY payment_date;

Execution plan:

QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------
Sort  (cost=250.39..257.89 rows=3000 width=72)                                                
  Sort Key: f_payment_with_revenue.payment_date                                               
  ->  Nested Loop  (cost=0.53..77.13 rows=3000 width=72)                                      
        ->  Index Only Scan using customer_pkey on customer  (cost=0.28..16.88 rows=3 width=4)
              Index Cond: (customer_id = ANY ('{1,2,3}'::integer[]))                          
        ->  Function Scan on f_payment_with_revenue  (cost=0.25..10.25 rows=1000 width=72)    

Oops, no unnesting of the function is happening. The cardinality defaults to 1000. That’s bad news!
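As a side note, the default estimate of 1000 rows could at least be corrected with the ROWS attribute of the function (a sketch; the value 30 is arbitrary, and this only fixes the estimate, not the missing inlining):

ALTER FUNCTION f_payment_with_revenue(BIGINT, DATE, DATE) ROWS 30;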

Benchmark result (Statement 1 = function call, Statement 2 = manually optimised):

RUN 1, Statement 1: 25.77538
RUN 1, Statement 2: 1.00000
RUN 2, Statement 1: 27.55197
RUN 2, Statement 2: 1.11581
RUN 3, Statement 1: 27.99331
RUN 3, Statement 2: 1.16463
RUN 4, Statement 1: 29.11022
RUN 4, Statement 2: 1.01159
RUN 5, Statement 1: 26.65781
RUN 5, Statement 2: 1.01654

Rats. This has gotten much worse than with the view. Not surprising, though. Table valued functions are not that good of an idea when they cannot be inlined! Oracle would have had a similar result if I wasn’t too lazy to translate my function to an ordinary PL/SQL table valued function, or a pipelined function.
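For completeness, such a (non-inlined) Oracle pipelined version might look roughly like this. It’s a sketch only, untested, and it would most likely suffer from the same problem as the PostgreSQL function, since it cannot be inlined either:

CREATE TYPE payment_with_revenue_t AS OBJECT (
  customer_id       NUMBER,
  payment_date      DATE,
  amount            NUMBER(10, 2),
  cumulative_amount NUMBER(10, 2)
);
/

CREATE TYPE payment_with_revenue_tt AS TABLE OF payment_with_revenue_t;
/

CREATE OR REPLACE FUNCTION f_payment_with_revenue (
  p_customer_id NUMBER,
  p_from_date   DATE,
  p_to_date     DATE
)
RETURN payment_with_revenue_tt PIPELINED
IS
BEGIN
  FOR r IN (
    SELECT *
    FROM (
      SELECT
        customer_id,
        payment_date,
        amount,
        SUM(amount) OVER (
          PARTITION BY customer_id
          ORDER BY payment_date, payment_id
        ) cumulative_amount
      FROM payment
      WHERE customer_id = p_customer_id
      AND payment_date <= p_to_date
    )
    WHERE payment_date >= p_from_date
  ) LOOP
    -- Emit one object per qualifying payment
    PIPE ROW (payment_with_revenue_t(
      r.customer_id, r.payment_date, r.amount, r.cumulative_amount
    ));
  END LOOP;
  RETURN;
END;
/

The call site could then use TABLE(...) together with CROSS APPLY or LATERAL (available as of Oracle 12c), similar to the DB2 example above.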

SQL Server

Function definition:

CREATE FUNCTION f_payment_with_revenue (
  @customer_id BIGINT,
  @from_date DATE,
  @to_date DATE
)
RETURNS TABLE
AS RETURN
SELECT *
FROM (
  SELECT
    customer_id,
    payment_date,
    amount,
    SUM(amount) OVER (
      PARTITION BY customer_id
      ORDER BY payment_date, payment_id
    ) cumulative_amount
  FROM payment
  WHERE customer_id = @customer_id
  AND payment_date <= @to_date
) t
WHERE payment_date >= @from_date;

Function call:

SELECT 
  payment_date,
  amount,
  cumulative_amount
FROM (
  SELECT customer_id FROM customer WHERE customer_id IN (1, 2, 3)
) AS c(customer_id)
CROSS APPLY f_payment_with_revenue(
  c.customer_id,
  CAST('2005-05-25' AS DATE),
  CAST('2005-05-29' AS DATE)
)
ORDER BY payment_date;

Execution plan

|--Sort(ORDER BY:([payment_date] ASC))
   |--Nested Loops(Inner Join, OUTER REFERENCES:([customer_id]))
      |--Index Seek(OBJECT:([PK__customer__CD65CB84E826462D]), SEEK:([customer_id] IN (1, 2, 3))
      |--Filter(WHERE:([payment_date]>='2005-05-25'))
         |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1007]=(0) THEN NULL ELSE [Expr1008] END))
            |--Stream Aggregate(GROUP BY:([WindowCount1014]) DEFINE:(..)))
               |--Window Spool(RANGE BETWEEN:(UNBOUNDED, [[payment_date], [payment_id]]))
                  |--Segment
                     |--Segment
                        |--Sort(ORDER BY:([payment_date] ASC, [payment_id] ASC))
                           |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003]))
                              |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1009]))
                              |  |--Compute Scalar(DEFINE:(([Expr1010],[Expr1011],[Expr1009])=GetRangeWithMismatchedTypes(NULL,'2005-05-29',(42))))
                              |  |  |--Constant Scan
                              |  |--Index Seek(OBJECT:([idx_payment_i1]), SEEK:([customer_id]=CONVERT_IMPLICIT(bigint,[customer_id],0) AND [payment_date] > [Expr1010] AND [payment_date] < [Expr1011]))
                              |--RID Lookup(OBJECT:([payment]), SEEK:([Bmk1003]=[Bmk1003]))

Again, super unreadable IMO, but after looking a bit more closely, we can see that the plan is almost the same as the manually optimised one, and the predicate is applied early on, where it belongs.

Benchmark result (Statement 1 = function call, Statement 2 = manually optimised):

Run 1, Statement 1: 2.50000
Run 1, Statement 2: 1.27778
Run 2, Statement 1: 2.11111
Run 2, Statement 2: 1.27778
Run 3, Statement 1: 2.11111
Run 3, Statement 2: 1.00000
Run 4, Statement 1: 2.22222
Run 4, Statement 2: 1.11111
Run 5, Statement 1: 2.02778
Run 5, Statement 2: 1.19444

Conclusion

Window functions are super cool and powerful. But they come at a price. They sort your data. Normally, when we write complex queries and reuse parts in views, we can profit from predicate push down operations into derived tables and views, which is something that most databases support (see also our previous blog post about such optimisations).

But when it comes to using window functions, they act like a “fence”, past which only few predicates can be pushed automatically. It’s not that it wouldn’t be possible, it simply isn’t done very well by most databases (and in the case of MySQL, not at all as of 8.0.2).

Inline table valued functions can be a remedy to avoid manual building of complex queries, such that at least some parts of your logic can be reused among queries. Unfortunately, they rely on CROSS APPLY or LATERAL JOIN, which can also cause performance issues in more complex setups. Besides, among the databases covered in this article, only DB2 and SQL Server support inline table valued functions. Oracle doesn’t support SQL functions at all, and PostgreSQL’s SQL functions are not inlinable (yet), which means that in these databases, in order to tune such queries, you might not be able to reuse the parts that use window functions in views or stored functions.

However, as always, do measure. Perhaps a 4x performance penalty for a particular query is OK.