Exciting discovery when playing around with PostgreSQL 11! New SQL standard window function clauses have been supported. If you want to play with this, you can do so very easily using docker:
docker pull postgres:11 docker run --name POSTGRES11 -e POSTGRES_PASSWORD=postgres -d postgres:11 docker run -it --rm --link POSTGRES11:postgres postgres psql -h postgres -U postgres
See also: https://hub.docker.com/r/_/postgres
The frame clause
When working with window functions, in some cases you want to add the optional frame clause. For example, to get a sliding average over your data, you will write:
SELECT payment_date, amount, avg(amount) OVER ( ORDER BY payment_date, payment_id ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING )::DECIMAL(10, 2), array_agg(amount) OVER ( ORDER BY payment_date, payment_id ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING ) FROM payment;
As always I will be running queries against the Sakila database. The above query yields:
payment_date |amount |avg |array_agg | --------------------|-------|-----|----------------------------| 2005-05-24 22:53:30 |2.99 |3.32 | {2.99,2.99,3.99} | 2005-05-24 22:54:33 |2.99 |3.74 | {2.99,2.99,3.99,4.99} | 2005-05-24 23:03:39 |3.99 |4.39 |{2.99,2.99,3.99,4.99,6.99} | 2005-05-24 23:04:41 |4.99 |3.99 |{2.99,3.99,4.99,6.99,0.99} | 2005-05-24 23:05:21 |6.99 |3.79 |{3.99,4.99,6.99,0.99,1.99} | 2005-05-24 23:08:07 |0.99 |3.99 |{4.99,6.99,0.99,1.99,4.99} | 2005-05-24 23:11:53 |1.99 |3.99 |{6.99,0.99,1.99,4.99,4.99} | 2005-05-24 23:31:46 |4.99 |3.79 |{0.99,1.99,4.99,4.99,5.99} |
The array_agg
function helps display how the sliding average came to be. For each average value, we’re looking 2 rows ahead and 2 rows behind in the ordered window.
In the above query, I’m using the optional frame clause to specify the frame size. It has three “modes” or “units”:
<window frame units> ::= ROWS | RANGE | GROUPS
Almost all databases that support window functions support the first two unit types. To my knowledge, only PostgreSQL 11 and H2 1.4.198 now also supports GROUPS
. The difference is rather simple to explain:
ROWS
counts the exact number of rows in the frame.RANGE
performs logical windowing where we don’t count the number of rows, but look for a value offset.GROUPS
counts all groups of tied rows within the window.
I think this is best explained by example. Let’s look at payments with payment timestamps truncated to the hour:
WITH hourly_payment AS ( SELECT payment_id, date_trunc('h', payment_date) AS hour, amount FROM payment ) SELECT * FROM hourly_payment ORDER BY hour;
This gives us:
payment_id |hour |amount | -----------|--------------------|-------| 12377 |2005-05-24 22:00:00 |2.99 | \ Tied group 3504 |2005-05-24 22:00:00 |2.99 | / 6440 |2005-05-24 23:00:00 |4.99 | \ 11032 |2005-05-24 23:00:00 |3.99 | | 8987 |2005-05-24 23:00:00 |4.99 | | Tied group 6003 |2005-05-24 23:00:00 |6.99 | | 14728 |2005-05-24 23:00:00 |0.99 | | 7274 |2005-05-24 23:00:00 |1.99 | / 12025 |2005-05-25 00:00:00 |0.99 | \ 3831 |2005-05-25 00:00:00 |8.99 | | 7044 |2005-05-25 00:00:00 |4.99 | | 8623 |2005-05-25 00:00:00 |9.99 | | Tied group 3386 |2005-05-25 00:00:00 |4.99 | | 8554 |2005-05-25 00:00:00 |4.99 | | 10785 |2005-05-25 00:00:00 |5.99 | | 9014 |2005-05-25 00:00:00 |6.99 | / 15394 |2005-05-25 01:00:00 |2.99 | \ 10499 |2005-05-25 01:00:00 |4.99 | | 5020 |2005-05-25 01:00:00 |2.99 | | Tied group 490 |2005-05-25 01:00:00 |0.99 | | 12305 |2005-05-25 01:00:00 |4.99 | / 11796 |2005-05-25 02:00:00 |4.99 | \ 9463 |2005-05-25 02:00:00 |4.99 | | Tied group 13711 |2005-05-25 02:00:00 |4.99 | /
Now we can see that for each hour, we have several payments. When we order payments by hour, there are some “tied” payments within that hour (or “group”), i.e. the order among payments on 2005-05-24 22:00:00 are not ordered deterministically among themselves. The payment ids are pretty random.
Now, if we look at the three window frame units again, how do they behave?
ROWS
WITH hourly_payment AS ( SELECT payment_id, date_trunc('h', payment_date) AS hour FROM payment ) SELECT payment_id, hour, array_agg(payment_id) OVER ( ORDER BY hour ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING ) FROM hourly_payment ORDER BY hour;
We can see that the size of the window is always precisely 5 rows (except at the beginning and end of the data set):
payment_id |hour |array_agg | -----------|--------------------|-------------------------------| 12377 |2005-05-24 22:00:00 |{12377,3504,6440} | 3504 |2005-05-24 22:00:00 |{12377,3504,6440,11032} | 6440 |2005-05-24 23:00:00 |{12377,3504,6440,11032,8987} | 11032 |2005-05-24 23:00:00 |{3504,6440,11032,8987,6003} | 8987 |2005-05-24 23:00:00 |{6440,11032,8987,6003,14728} | 6003 |2005-05-24 23:00:00 |{11032,8987,6003,14728,7274} | 14728 |2005-05-24 23:00:00 |{8987,6003,14728,7274,12025} | 7274 |2005-05-24 23:00:00 |{6003,14728,7274,12025,3831} | 12025 |2005-05-25 00:00:00 |{14728,7274,12025,3831,7044} | 3831 |2005-05-25 00:00:00 |{7274,12025,3831,7044,8623} | 7044 |2005-05-25 00:00:00 |{12025,3831,7044,8623,3386} | 8623 |2005-05-25 00:00:00 |{3831,7044,8623,3386,8554} | 3386 |2005-05-25 00:00:00 |{7044,8623,3386,8554,10785} | 8554 |2005-05-25 00:00:00 |{8623,3386,8554,10785,9014} | 10785 |2005-05-25 00:00:00 |{3386,8554,10785,9014,15394} | 9014 |2005-05-25 00:00:00 |{8554,10785,9014,15394,10499} | 15394 |2005-05-25 01:00:00 |{10785,9014,15394,10499,5020} | 10499 |2005-05-25 01:00:00 |{9014,15394,10499,5020,490} | 5020 |2005-05-25 01:00:00 |{15394,10499,5020,490,12305} | 490 |2005-05-25 01:00:00 |{10499,5020,490,12305,11796} | 12305 |2005-05-25 01:00:00 |{5020,490,12305,11796,9463} | 11796 |2005-05-25 02:00:00 |{490,12305,11796,9463,13711} | 9463 |2005-05-25 02:00:00 |{12305,11796,9463,13711,8167} | 13711 |2005-05-25 02:00:00 |{11796,9463,13711,8167,1011} |
There is no notion of a “group” among the rows in the window. But the problem is that we’re getting random PAYMENT_ID
values unless we also add the PAYMENT_ID
to the ORDER BY
clause. This isn’t really what we want, most of the time, so we use:
RANGE
WITH hourly_payment AS ( SELECT payment_id, date_trunc('h', payment_date) AS hour FROM payment ) SELECT payment_id, hour, EXTRACT(epoch FROM hour) / 3600, array_agg(payment_id) OVER ( ORDER BY EXTRACT(epoch FROM hour) / 3600 RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING ) FROM hourly_payment ORDER BY hour;
I have switched from ROWS
to RANGE
and now the ORDER BY
clause works on a number based on the epoch of the hour. What happens now?
This now yields:
payment_id |hour |?column? |array_agg -----------|--------------------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------- 12377 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014} 3504 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014} 6440 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 11032 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 8987 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 6003 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 14728 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 7274 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305} 12025 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 3831 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 7044 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 8623 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 3386 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 8554 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 10785 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 9014 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711} 15394 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245} 10499 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245} 5020 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245} 490 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245} 12305 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245} 11796 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186} 9463 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186} 13711 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
I’ve visually separated the rows by their hour and the array aggregation by the “tied” payment_ids, i.e. the payment IDs that have the same hour.
Observations:
- We get the same aggregation value for the entire set of tied rows, so if in two rows,
HOUR
is the same, thenARRAY_AGG
is the same as well - The window size is now a logical size, no longer an offset size, so we’re going back 2 hours and ahead 2 hours (instead of 2 rows). This is why I’ve extracted epoch and divided it by hour, so I will get consecutive integer values for consecutive hours
The same result could have been achieved using interval types:
WITH hourly_payment AS ( SELECT payment_id, date_trunc('h', payment_date) AS hour FROM payment ) SELECT payment_id, hour, EXTRACT(epoch FROM hour) / 3600, array_agg(payment_id) OVER ( ORDER BY hour RANGE BETWEEN INTERVAL '2 hours' PRECEDING AND INTERVAL '2 hours' FOLLOWING ) FROM hourly_payment ORDER BY hour;
See also this article for details:
https://blog.jooq.org/2016/10/31/a-little-known-sql-feature-use-logical-windowing-to-aggregate-sliding-ranges/
GROUPS
The third frame unit is quite useful, as we can now frame the window to a number of groups of same values. In our case, all payments of the same hour are in the same group. So, in order to get a similar result again, we can write:
WITH hourly_payment AS ( SELECT payment_id, payment_date, date_trunc('h', payment_date) AS hour FROM payment ) SELECT payment_id, hour, array_agg(payment_id) OVER ( ORDER BY hour GROUPS BETWEEN 2 PRECEDING AND 2 FOLLOWING ) FROM hourly_payment ORDER BY hour;
In fact, this is not exactly the same result, because if we have gaps in the hours, GROUPS
will simply jump over the gaps, whereas RANGE
will not.
Summary of ROWS, RANGE, GROUPS
The above case was a real world use-case. A more constructed example that might be easier to digest, visually, can be seen here:
WITH t(id, v) AS ( VALUES (1, 1), (2, 1), (3, 3), (4, 5), (5, 5), (6, 5), (7, 6) ) SELECT id, v, array_agg(id) OVER rows, array_agg(v) OVER rows, array_agg(id) OVER range, array_agg(v) OVER range, array_agg(id) OVER groups, array_agg(v) OVER groups FROM t WINDOW o AS (ORDER BY v), rows AS (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING), range AS (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING), groups AS (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
Notice, I’m using the SQL standard WINDOW
clause to be able to name and reuse a repeated window specification. I’ve seen this clause to be supported in:
- MySQL 8.0
- PostgreSQL
- Sybase SQL Anywhere
The query yields:
id |v |array_agg |array_agg |array_agg |array_agg |array_agg |array_agg | ---|--|----------|----------|----------|----------|--------------|--------------| 1 |1 |{1,2} |{1,1} |{1,2} |{1,1} |{1,2,3} |{1,1,3} | 2 |1 |{1,2,3} |{1,1,3} |{1,2} |{1,1} |{1,2,3} |{1,1,3} | 3 |3 |{2,3,4} |{1,3,5} |{3} |{3} |{1,2,3,4,5,6} |{1,1,3,5,5,5} | 4 |5 |{3,4,5} |{3,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} | 5 |5 |{4,5,6} |{5,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} | 6 |5 |{5,6,7} |{5,5,6} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} | 7 |6 |{6,7} |{5,6} |{4,5,6,7} |{5,5,5,6} |{4,5,6,7} |{5,5,5,6} |
Observation:
- The
ROWS
framed window is of size 3 max in this case (1 row preceding, the current row, and 1 row following) - The
RANGE
framed window is a logical window that looks behind a value of 1 and ahead a value of 1 - The
GROUPS
framed window is of size 3 groups max in this case (1 group preceding, the current group, and 1 group following)
Neat, huh?
jOOQ 3.12 will add support for this feature: https://github.com/jOOQ/jOOQ/issues/7646
EXCLUDE clause
This is probably a bit less frequently useful than the new GROUPS
clause. There is now a new window frame exclusion clause:
<window frame exclusion> ::= EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS
It can be used to exclude some rows around the current row from being in the window. I have yet to think of a use case for this. Here’s how it works for:
ROWS
WITH t(v) AS ( VALUES (1), (1), (3), (5), (5), (5), (6) ) SELECT v, array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW) AS current_row, array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS group, array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) AS ties, array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE NO OTHERS) AS no_others FROM t WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others | --|------------|------|--------|----------| 1 |{1} |NULL |{1} |{1,1} | 1 |{1,3} |{3} |{1,3} |{1,1,3} | 3 |{1,5} |{1,5} |{1,3,5} |{1,3,5} | 5 |{3,5} |{3} |{3,5} |{3,5,5} | 5 |{5,5} |NULL |{5} |{5,5,5} | 5 |{5,6} |{6} |{5,6} |{5,5,6} | 6 |{5} |{5} |{5,6} |{5,6} |
As you can see, the window may now be completely empty, which results in NULL
being emitted.
- Excluding the current row seems obvious
- Excluding the current group also
- Excluding ties excludes all other rows from the group
- Excluding no others is the default, just like when you don’t put this
EXCLUDE
clause
RANGE
The exclusion can be applied to logical windowing as well:
WITH t(v) AS ( VALUES (1), (1), (3), (5), (5), (5), (6) ) SELECT v, array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW) AS current_row, array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS group, array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) AS ties, array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE NO OTHERS) AS no_others FROM t WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others | --|------------|--------|----------|----------| 1 |{1} |NULL |{1} |{1,1} | 1 |{1} |NULL |{1} |{1,1} | 3 |NULL |NULL |{3} |{3} | 5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} | 5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} | 5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} | 6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
GROUPS
Same for grouped windows:
WITH t(v) AS ( VALUES (1), (1), (3), (5), (5), (5), (6) ) SELECT v, array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW) AS current_row, array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS group, array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) AS ties, array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE NO OTHERS) AS no_others FROM t WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others | --|------------|------------|--------------|--------------| 1 |{1,3} |{3} |{1,3} |{1,1,3} | 1 |{1,3} |{3} |{1,3} |{1,1,3} | 3 |{1,1,5,5,5} |{1,1,5,5,5} |{1,1,3,5,5,5} |{1,1,3,5,5,5} | 5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} | 5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} | 5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} | 6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
Needless to say that this clause will be supported in jOOQ 3.12 as well: https://github.com/jOOQ/jOOQ/issues/7647
Bonus points for the reader who can think of a real world use-case for this clause, please leave a comment!