Exciting discovery when playing around with PostgreSQL 11! New SQL standard window function clauses have been supported. If you want to play with this, you can do so very easily using docker:
docker pull postgres:11 docker run --name POSTGRES11 -e POSTGRES_PASSWORD=postgres -d postgres:11 docker run -it --rm --link POSTGRES11:postgres postgres psql -h postgres -U postgres
See also: https://hub.docker.com/r/_/postgres
The frame clause
When working with window functions, in some cases you want to add the optional frame clause. For example, to get a sliding average over your data, you will write:
SELECT
payment_date,
amount,
avg(amount) OVER (
ORDER BY payment_date, payment_id
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)::DECIMAL(10, 2),
array_agg(amount) OVER (
ORDER BY payment_date, payment_id
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM payment;
As always I will be running queries against the Sakila database. The above query yields:
payment_date |amount |avg |array_agg |
--------------------|-------|-----|----------------------------|
2005-05-24 22:53:30 |2.99 |3.32 | {2.99,2.99,3.99} |
2005-05-24 22:54:33 |2.99 |3.74 | {2.99,2.99,3.99,4.99} |
2005-05-24 23:03:39 |3.99 |4.39 |{2.99,2.99,3.99,4.99,6.99} |
2005-05-24 23:04:41 |4.99 |3.99 |{2.99,3.99,4.99,6.99,0.99} |
2005-05-24 23:05:21 |6.99 |3.79 |{3.99,4.99,6.99,0.99,1.99} |
2005-05-24 23:08:07 |0.99 |3.99 |{4.99,6.99,0.99,1.99,4.99} |
2005-05-24 23:11:53 |1.99 |3.99 |{6.99,0.99,1.99,4.99,4.99} |
2005-05-24 23:31:46 |4.99 |3.79 |{0.99,1.99,4.99,4.99,5.99} |
The array_agg function helps display how the sliding average came to be. For each average value, we’re looking 2 rows ahead and 2 rows behind in the ordered window.
In the above query, I’m using the optional frame clause to specify the frame size. It has three “modes” or “units”:
<window frame units> ::= ROWS | RANGE | GROUPS
Almost all databases that support window functions support the first two unit types. To my knowledge, only PostgreSQL 11 now also supports GROUPS The difference is rather simple to explain:
ROWScounts the exact number of rows in the frame.RANGEperforms logical windowing where we don’t count the number of rows, but look for a value offset.GROUPScounts all groups of tied rows within the window.
I think this is best explained by example. Let’s look at payments with payment timestamps truncated to the hour:
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour,
amount
FROM payment
)
SELECT *
FROM hourly_payment
ORDER BY hour;
This gives us:
payment_id |hour |amount | -----------|--------------------|-------| 12377 |2005-05-24 22:00:00 |2.99 | \ Tied group 3504 |2005-05-24 22:00:00 |2.99 | / 6440 |2005-05-24 23:00:00 |4.99 | \ 11032 |2005-05-24 23:00:00 |3.99 | | 8987 |2005-05-24 23:00:00 |4.99 | | Tied group 6003 |2005-05-24 23:00:00 |6.99 | | 14728 |2005-05-24 23:00:00 |0.99 | | 7274 |2005-05-24 23:00:00 |1.99 | / 12025 |2005-05-25 00:00:00 |0.99 | \ 3831 |2005-05-25 00:00:00 |8.99 | | 7044 |2005-05-25 00:00:00 |4.99 | | 8623 |2005-05-25 00:00:00 |9.99 | | Tied group 3386 |2005-05-25 00:00:00 |4.99 | | 8554 |2005-05-25 00:00:00 |4.99 | | 10785 |2005-05-25 00:00:00 |5.99 | | 9014 |2005-05-25 00:00:00 |6.99 | / 15394 |2005-05-25 01:00:00 |2.99 | \ 10499 |2005-05-25 01:00:00 |4.99 | | 5020 |2005-05-25 01:00:00 |2.99 | | Tied group 490 |2005-05-25 01:00:00 |0.99 | | 12305 |2005-05-25 01:00:00 |4.99 | / 11796 |2005-05-25 02:00:00 |4.99 | \ 9463 |2005-05-25 02:00:00 |4.99 | | Tied group 13711 |2005-05-25 02:00:00 |4.99 | /
Now we can see that for each hour, we have several payments. When we order payments by hour, there are some “tied” payments within that hour (or “group”), i.e. the order among payments on 2005-05-24 22:00:00 are not ordered deterministically among themselves. The payment ids are pretty random.
Now, if we look at the three window frame units again, how do they behave?
ROWS
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
array_agg(payment_id) OVER (
ORDER BY hour
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
We can see that the size of the window is always precisely 5 rows (except at the beginning and end of the data set):
payment_id |hour |array_agg |
-----------|--------------------|-------------------------------|
12377 |2005-05-24 22:00:00 |{12377,3504,6440} |
3504 |2005-05-24 22:00:00 |{12377,3504,6440,11032} |
6440 |2005-05-24 23:00:00 |{12377,3504,6440,11032,8987} |
11032 |2005-05-24 23:00:00 |{3504,6440,11032,8987,6003} |
8987 |2005-05-24 23:00:00 |{6440,11032,8987,6003,14728} |
6003 |2005-05-24 23:00:00 |{11032,8987,6003,14728,7274} |
14728 |2005-05-24 23:00:00 |{8987,6003,14728,7274,12025} |
7274 |2005-05-24 23:00:00 |{6003,14728,7274,12025,3831} |
12025 |2005-05-25 00:00:00 |{14728,7274,12025,3831,7044} |
3831 |2005-05-25 00:00:00 |{7274,12025,3831,7044,8623} |
7044 |2005-05-25 00:00:00 |{12025,3831,7044,8623,3386} |
8623 |2005-05-25 00:00:00 |{3831,7044,8623,3386,8554} |
3386 |2005-05-25 00:00:00 |{7044,8623,3386,8554,10785} |
8554 |2005-05-25 00:00:00 |{8623,3386,8554,10785,9014} |
10785 |2005-05-25 00:00:00 |{3386,8554,10785,9014,15394} |
9014 |2005-05-25 00:00:00 |{8554,10785,9014,15394,10499} |
15394 |2005-05-25 01:00:00 |{10785,9014,15394,10499,5020} |
10499 |2005-05-25 01:00:00 |{9014,15394,10499,5020,490} |
5020 |2005-05-25 01:00:00 |{15394,10499,5020,490,12305} |
490 |2005-05-25 01:00:00 |{10499,5020,490,12305,11796} |
12305 |2005-05-25 01:00:00 |{5020,490,12305,11796,9463} |
11796 |2005-05-25 02:00:00 |{490,12305,11796,9463,13711} |
9463 |2005-05-25 02:00:00 |{12305,11796,9463,13711,8167} |
13711 |2005-05-25 02:00:00 |{11796,9463,13711,8167,1011} |
There is no notion of a “group” among the rows in the window. But the problem is that we’re getting random PAYMENT_ID values unless we also add the PAYMENT_ID to the ORDER BY clause. This isn’t really what we want, most of the time, so we use:
RANGE
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
EXTRACT(epoch FROM hour) / 3600,
array_agg(payment_id) OVER (
ORDER BY EXTRACT(epoch FROM hour) / 3600
RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
I have switched from ROWS to RANGE and now the ORDER BY clause works on a number based on the epoch of the hour. What happens now?
This now yields:
payment_id |hour |?column? |array_agg
-----------|--------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
12377 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014}
3504 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014}
6440 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
11032 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
8987 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
6003 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
14728 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
7274 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
12025 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
3831 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
7044 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
8623 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
3386 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
8554 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
10785 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
9014 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
15394 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
10499 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
5020 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
490 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
12305 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
11796 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
9463 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
13711 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
I’ve visually separated the rows by their hour and the array aggregation by the “tied” payment_ids, i.e. the payment IDs that have the same hour.
Observations:
- We get the same aggregation value for the entire set of tied rows, so if in two rows,
HOURis the same, thenARRAY_AGGis the same as well - The window size is now a logical size, no longer an offset size, so we’re going back 2 hours and ahead 2 hours (instead of 2 rows). This is why I’ve extracted epoch and divided it by hour, so I will get consecutive integer values for consecutive hours
The same result could have been achieved using interval types:
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
EXTRACT(epoch FROM hour) / 3600,
array_agg(payment_id) OVER (
ORDER BY hour
RANGE BETWEEN INTERVAL '2 hours' PRECEDING
AND INTERVAL '2 hours' FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
See also this article for details:
https://blog.jooq.org/2016/10/31/a-little-known-sql-feature-use-logical-windowing-to-aggregate-sliding-ranges/
GROUPS
The third frame unit is quite useful, as we can now frame the window to a number of groups of same values. In our case, all payments of the same hour are in the same group. So, in order to get a similar result again, we can write:
WITH hourly_payment AS (
SELECT
payment_id,
payment_date,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
array_agg(payment_id) OVER (
ORDER BY hour
GROUPS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
In fact, this is not exactly the same result, because if we have gaps in the hours, GROUPS will simply jump over the gaps, whereas RANGE will not.
Summary of ROWS, RANGE, GROUPS
The above case was a real world use-case. A more constructed example that might be easier to digest, visually, can be seen here:
WITH t(id, v) AS ( VALUES (1, 1), (2, 1), (3, 3), (4, 5), (5, 5), (6, 5), (7, 6) ) SELECT id, v, array_agg(id) OVER rows, array_agg(v) OVER rows, array_agg(id) OVER range, array_agg(v) OVER range, array_agg(id) OVER groups, array_agg(v) OVER groups FROM t WINDOW o AS (ORDER BY v), rows AS (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING), range AS (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING), groups AS (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
Notice, I’m using the SQL standard WINDOW clause to be able to name and reuse a repeated window specification. I’ve seen this clause to be supported in:
- MySQL 8.0
- PostgreSQL
- Sybase SQL Anywhere
The query yields:
id |v |array_agg |array_agg |array_agg |array_agg |array_agg |array_agg |
---|--|----------|----------|----------|----------|--------------|--------------|
1 |1 |{1,2} |{1,1} |{1,2} |{1,1} |{1,2,3} |{1,1,3} |
2 |1 |{1,2,3} |{1,1,3} |{1,2} |{1,1} |{1,2,3} |{1,1,3} |
3 |3 |{2,3,4} |{1,3,5} |{3} |{3} |{1,2,3,4,5,6} |{1,1,3,5,5,5} |
4 |5 |{3,4,5} |{3,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
5 |5 |{4,5,6} |{5,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
6 |5 |{5,6,7} |{5,5,6} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
7 |6 |{6,7} |{5,6} |{4,5,6,7} |{5,5,5,6} |{4,5,6,7} |{5,5,5,6} |
Observation:
- The
ROWSframed window is of size 3 max in this case (1 row preceding, the current row, and 1 row following) - The
RANGEframed window is a logical window that looks behind a value of 1 and ahead a value of 1 - The
GROUPSframed window is of size 3 groups max in this case (1 group preceding, the current group, and 1 group following)
Neat, huh?
jOOQ 3.12 will add support for this feature: https://github.com/jOOQ/jOOQ/issues/7646
EXCLUDE clause
This is probably a bit less frequently useful than the new GROUPS clause. There is now a new window frame exclusion clause:
<window frame exclusion> ::= EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS
It can be used to exclude some rows around the current row from being in the window. I have yet to think of a use case for this. Here’s how it works for:
ROWS
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|------|--------|----------|
1 |{1} |NULL |{1} |{1,1} |
1 |{1,3} |{3} |{1,3} |{1,1,3} |
3 |{1,5} |{1,5} |{1,3,5} |{1,3,5} |
5 |{3,5} |{3} |{3,5} |{3,5,5} |
5 |{5,5} |NULL |{5} |{5,5,5} |
5 |{5,6} |{6} |{5,6} |{5,5,6} |
6 |{5} |{5} |{5,6} |{5,6} |
As you can see, the window may now be completely empty, which results in NULL being emitted.
- Excluding the current row seems obvious
- Excluding the current group also
- Excluding ties excludes all other rows from the group
- Excluding no others is the default, just like when you don’t put this
EXCLUDEclause
RANGE
The exclusion can be applied to logical windowing as well:
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|--------|----------|----------|
1 |{1} |NULL |{1} |{1,1} |
1 |{1} |NULL |{1} |{1,1} |
3 |NULL |NULL |{3} |{3} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
GROUPS
Same for grouped windows:
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|------------|--------------|--------------|
1 |{1,3} |{3} |{1,3} |{1,1,3} |
1 |{1,3} |{3} |{1,3} |{1,1,3} |
3 |{1,1,5,5,5} |{1,1,5,5,5} |{1,1,3,5,5,5} |{1,1,3,5,5,5} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
Needless to say that this clause will be supported in jOOQ 3.12 as well: https://github.com/jOOQ/jOOQ/issues/7647
Bonus points for the reader who can think of a real world use-case for this clause, please leave a comment!