Exciting discovery when playing around with PostgreSQL 11! New SQL standard window function clauses have been supported. If you want to play with this, you can do so very easily using docker:
docker pull postgres:11
docker run --name POSTGRES11 -e POSTGRES_PASSWORD=postgres -d postgres:11
docker run -it --rm --link POSTGRES11:postgres postgres psql -h postgres -U postgres
See also:
https://hub.docker.com/r/_/postgres
The frame clause
When working with window functions, in some cases you want to add the optional frame clause. For example, to get a sliding average over your data, you will write:
SELECT
payment_date,
amount,
avg(amount) OVER (
ORDER BY payment_date, payment_id
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)::DECIMAL(10, 2),
array_agg(amount) OVER (
ORDER BY payment_date, payment_id
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM payment;
As always I will be running queries against the
Sakila database. The above query yields:
payment_date |amount |avg |array_agg |
--------------------|-------|-----|----------------------------|
2005-05-24 22:53:30 |2.99 |3.32 | {2.99,2.99,3.99} |
2005-05-24 22:54:33 |2.99 |3.74 | {2.99,2.99,3.99,4.99} |
2005-05-24 23:03:39 |3.99 |4.39 |{2.99,2.99,3.99,4.99,6.99} |
2005-05-24 23:04:41 |4.99 |3.99 |{2.99,3.99,4.99,6.99,0.99} |
2005-05-24 23:05:21 |6.99 |3.79 |{3.99,4.99,6.99,0.99,1.99} |
2005-05-24 23:08:07 |0.99 |3.99 |{4.99,6.99,0.99,1.99,4.99} |
2005-05-24 23:11:53 |1.99 |3.99 |{6.99,0.99,1.99,4.99,4.99} |
2005-05-24 23:31:46 |4.99 |3.79 |{0.99,1.99,4.99,4.99,5.99} |
The
array_agg
function helps display how the sliding average came to be. For each average value, we’re looking 2 rows ahead and 2 rows behind in the ordered window.
In the above query, I’m using the optional
frame clause to specify the frame size. It has three “modes” or “units”:
<window frame units> ::=
ROWS
| RANGE
| GROUPS
Almost all databases that support window functions support the first two unit types. To my knowledge, only PostgreSQL 11 and H2 1.4.198 now also supports
GROUPS
. The difference is rather simple to explain:
ROWS
counts the exact number of rows in the frame.
RANGE
performs logical windowing where we don’t count the number of rows, but look for a value offset.
GROUPS
counts all groups of tied rows within the window.
I think this is best explained by example. Let’s look at payments with payment timestamps truncated to the hour:
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour,
amount
FROM payment
)
SELECT *
FROM hourly_payment
ORDER BY hour;
This gives us:
payment_id |hour |amount |
-----------|--------------------|-------|
12377 |2005-05-24 22:00:00 |2.99 | \ Tied group
3504 |2005-05-24 22:00:00 |2.99 | /
6440 |2005-05-24 23:00:00 |4.99 | \
11032 |2005-05-24 23:00:00 |3.99 | |
8987 |2005-05-24 23:00:00 |4.99 | | Tied group
6003 |2005-05-24 23:00:00 |6.99 | |
14728 |2005-05-24 23:00:00 |0.99 | |
7274 |2005-05-24 23:00:00 |1.99 | /
12025 |2005-05-25 00:00:00 |0.99 | \
3831 |2005-05-25 00:00:00 |8.99 | |
7044 |2005-05-25 00:00:00 |4.99 | |
8623 |2005-05-25 00:00:00 |9.99 | | Tied group
3386 |2005-05-25 00:00:00 |4.99 | |
8554 |2005-05-25 00:00:00 |4.99 | |
10785 |2005-05-25 00:00:00 |5.99 | |
9014 |2005-05-25 00:00:00 |6.99 | /
15394 |2005-05-25 01:00:00 |2.99 | \
10499 |2005-05-25 01:00:00 |4.99 | |
5020 |2005-05-25 01:00:00 |2.99 | | Tied group
490 |2005-05-25 01:00:00 |0.99 | |
12305 |2005-05-25 01:00:00 |4.99 | /
11796 |2005-05-25 02:00:00 |4.99 | \
9463 |2005-05-25 02:00:00 |4.99 | | Tied group
13711 |2005-05-25 02:00:00 |4.99 | /
Now we can see that for each hour, we have several payments. When we order payments by hour, there are some “tied” payments within that hour (or “group”), i.e. the order among payments on 2005-05-24 22:00:00 are not ordered deterministically among themselves. The payment ids are pretty random.
Now, if we look at the three window frame units again, how do they behave?
ROWS
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
array_agg(payment_id) OVER (
ORDER BY hour
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
We can see that the size of the window is always precisely 5 rows (except at the beginning and end of the data set):
payment_id |hour |array_agg |
-----------|--------------------|-------------------------------|
12377 |2005-05-24 22:00:00 |{12377,3504,6440} |
3504 |2005-05-24 22:00:00 |{12377,3504,6440,11032} |
6440 |2005-05-24 23:00:00 |{12377,3504,6440,11032,8987} |
11032 |2005-05-24 23:00:00 |{3504,6440,11032,8987,6003} |
8987 |2005-05-24 23:00:00 |{6440,11032,8987,6003,14728} |
6003 |2005-05-24 23:00:00 |{11032,8987,6003,14728,7274} |
14728 |2005-05-24 23:00:00 |{8987,6003,14728,7274,12025} |
7274 |2005-05-24 23:00:00 |{6003,14728,7274,12025,3831} |
12025 |2005-05-25 00:00:00 |{14728,7274,12025,3831,7044} |
3831 |2005-05-25 00:00:00 |{7274,12025,3831,7044,8623} |
7044 |2005-05-25 00:00:00 |{12025,3831,7044,8623,3386} |
8623 |2005-05-25 00:00:00 |{3831,7044,8623,3386,8554} |
3386 |2005-05-25 00:00:00 |{7044,8623,3386,8554,10785} |
8554 |2005-05-25 00:00:00 |{8623,3386,8554,10785,9014} |
10785 |2005-05-25 00:00:00 |{3386,8554,10785,9014,15394} |
9014 |2005-05-25 00:00:00 |{8554,10785,9014,15394,10499} |
15394 |2005-05-25 01:00:00 |{10785,9014,15394,10499,5020} |
10499 |2005-05-25 01:00:00 |{9014,15394,10499,5020,490} |
5020 |2005-05-25 01:00:00 |{15394,10499,5020,490,12305} |
490 |2005-05-25 01:00:00 |{10499,5020,490,12305,11796} |
12305 |2005-05-25 01:00:00 |{5020,490,12305,11796,9463} |
11796 |2005-05-25 02:00:00 |{490,12305,11796,9463,13711} |
9463 |2005-05-25 02:00:00 |{12305,11796,9463,13711,8167} |
13711 |2005-05-25 02:00:00 |{11796,9463,13711,8167,1011} |
There is no notion of a “group” among the rows in the window. But the problem is that we’re getting random
PAYMENT_ID
values unless we also add the
PAYMENT_ID
to the
ORDER BY
clause. This isn’t really what we want, most of the time, so we use:
RANGE
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
EXTRACT(epoch FROM hour) / 3600,
array_agg(payment_id) OVER (
ORDER BY EXTRACT(epoch FROM hour) / 3600
RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
I have switched from
ROWS
to
RANGE
and now the
ORDER BY
clause works on a number based on the epoch of the hour. What happens now?
This now yields:
payment_id |hour |?column? |array_agg
-----------|--------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
12377 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014}
3504 |2005-05-24 22:00:00 |310270 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014}
6440 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
11032 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
8987 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
6003 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
14728 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
7274 |2005-05-24 23:00:00 |310271 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305}
12025 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
3831 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
7044 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
8623 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
3386 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
8554 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
10785 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
9014 |2005-05-25 00:00:00 |310272 |{12377,3504, 6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711}
15394 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
10499 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
5020 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
490 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
12305 |2005-05-25 01:00:00 |310273 |{6440,11032,8987,6003,14728,7274, 12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245}
11796 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
9463 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
13711 |2005-05-25 02:00:00 |310274 |{12025,3831,7044,8623,3386,8554,10785,9014, 15394,10499,5020,490,12305, 11796,9463,13711, 8167,1011,1203,10019,6245,14396,13055,15984,9975,8188,5596,2388,7347,11598,6186}
I’ve visually separated the rows by their hour and the array aggregation by the “tied” payment_ids, i.e. the payment IDs that have the same hour.
Observations:
- We get the same aggregation value for the entire set of tied rows, so if in two rows,
HOUR
is the same, then ARRAY_AGG
is the same as well
- The window size is now a logical size, no longer an offset size, so we’re going back 2 hours and ahead 2 hours (instead of 2 rows). This is why I’ve extracted epoch and divided it by hour, so I will get consecutive integer values for consecutive hours
The same result could have been achieved using interval types:
WITH hourly_payment AS (
SELECT
payment_id,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
EXTRACT(epoch FROM hour) / 3600,
array_agg(payment_id) OVER (
ORDER BY hour
RANGE BETWEEN INTERVAL '2 hours' PRECEDING
AND INTERVAL '2 hours' FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
See also this article for details:
A Little Known SQL Feature: Use Logical Windowing to Aggregate Sliding Ranges
GROUPS
The third frame unit is quite useful, as we can now frame the window to a number of groups of same values. In our case, all payments of the same hour are in the same group. So, in order to get a similar result again, we can write:
WITH hourly_payment AS (
SELECT
payment_id,
payment_date,
date_trunc('h', payment_date) AS hour
FROM payment
)
SELECT
payment_id,
hour,
array_agg(payment_id) OVER (
ORDER BY hour
GROUPS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM hourly_payment
ORDER BY hour;
In fact, this is not
exactly the same result, because if we have gaps in the hours,
GROUPS
will simply jump over the gaps, whereas
RANGE
will not.
Summary of ROWS, RANGE, GROUPS
The above case was a real world use-case. A more constructed example that might be easier to digest, visually, can be seen here:
WITH t(id, v) AS (
VALUES (1, 1), (2, 1), (3, 3), (4, 5), (5, 5), (6, 5), (7, 6)
)
SELECT
id,
v,
array_agg(id) OVER rows,
array_agg(v) OVER rows,
array_agg(id) OVER range,
array_agg(v) OVER range,
array_agg(id) OVER groups,
array_agg(v) OVER groups
FROM t
WINDOW
o AS (ORDER BY v),
rows AS (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING),
range AS (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING),
groups AS (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
Notice, I’m using the SQL standard
WINDOW
clause to be able to name and reuse a repeated window specification. I’ve seen this clause to be supported in:
- MySQL 8.0
- PostgreSQL
- Sybase SQL Anywhere
The query yields:
id |v |array_agg |array_agg |array_agg |array_agg |array_agg |array_agg |
---|--|----------|----------|----------|----------|--------------|--------------|
1 |1 |{1,2} |{1,1} |{1,2} |{1,1} |{1,2,3} |{1,1,3} |
2 |1 |{1,2,3} |{1,1,3} |{1,2} |{1,1} |{1,2,3} |{1,1,3} |
3 |3 |{2,3,4} |{1,3,5} |{3} |{3} |{1,2,3,4,5,6} |{1,1,3,5,5,5} |
4 |5 |{3,4,5} |{3,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
5 |5 |{4,5,6} |{5,5,5} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
6 |5 |{5,6,7} |{5,5,6} |{4,5,6,7} |{5,5,5,6} |{3,4,5,6,7} |{3,5,5,5,6} |
7 |6 |{6,7} |{5,6} |{4,5,6,7} |{5,5,5,6} |{4,5,6,7} |{5,5,5,6} |
Observation:
- The
ROWS
framed window is of size 3 max in this case (1 row preceding, the current row, and 1 row following)
- The
RANGE
framed window is a logical window that looks behind a value of 1 and ahead a value of 1
- The
GROUPS
framed window is of size 3 groups max in this case (1 group preceding, the current group, and 1 group following)
Neat, huh?
jOOQ 3.12 will add support for this feature:
https://github.com/jOOQ/jOOQ/issues/7646
EXCLUDE clause
This is probably a bit less frequently useful than the new
GROUPS
clause. There is now a new window frame exclusion clause:
<window frame exclusion> ::=
EXCLUDE CURRENT ROW
| EXCLUDE GROUP
| EXCLUDE TIES
| EXCLUDE NO OTHERS
It can be used to exclude some rows around the current row from being in the window. I have yet to think of a use case for this. Here’s how it works for:
ROWS
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|------|--------|----------|
1 |{1} |NULL |{1} |{1,1} |
1 |{1,3} |{3} |{1,3} |{1,1,3} |
3 |{1,5} |{1,5} |{1,3,5} |{1,3,5} |
5 |{3,5} |{3} |{3,5} |{3,5,5} |
5 |{5,5} |NULL |{5} |{5,5,5} |
5 |{5,6} |{6} |{5,6} |{5,5,6} |
6 |{5} |{5} |{5,6} |{5,6} |
As you can see, the window may now be completely empty, which results in
NULL
being emitted.
- Excluding the current row seems obvious
- Excluding the current group also
- Excluding ties excludes all other rows from the group
- Excluding no others is the default, just like when you don’t put this
EXCLUDE
clause
RANGE
The exclusion can be applied to logical windowing as well:
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|--------|----------|----------|
1 |{1} |NULL |{1} |{1,1} |
1 |{1} |NULL |{1} |{1,1} |
3 |NULL |NULL |{3} |{3} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
5 |{5,5,6} |{6} |{5,6} |{5,5,5,6} |
6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
GROUPS
Same for grouped windows:
WITH t(v) AS (
VALUES (1), (1), (3), (5), (5), (5), (6)
)
SELECT
v,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE CURRENT ROW) AS current_row,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE GROUP) AS group,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE TIES) AS ties,
array_agg(v) OVER (o GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
EXCLUDE NO OTHERS) AS no_others
FROM t
WINDOW o AS (ORDER BY v)
Resulting in:
v |current_row |group |ties |no_others |
--|------------|------------|--------------|--------------|
1 |{1,3} |{3} |{1,3} |{1,1,3} |
1 |{1,3} |{3} |{1,3} |{1,1,3} |
3 |{1,1,5,5,5} |{1,1,5,5,5} |{1,1,3,5,5,5} |{1,1,3,5,5,5} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
5 |{3,5,5,6} |{3,6} |{3,5,6} |{3,5,5,5,6} |
6 |{5,5,5} |{5,5,5} |{5,5,5,6} |{5,5,5,6} |
Needless to say that this clause will be supported in jOOQ 3.12 as well:
https://github.com/jOOQ/jOOQ/issues/7647
Bonus points for the reader who can think of a real world use-case for this clause, please leave a comment!
Like this:
Like Loading...