I’m frequently telling developers to
put window functions almost everywhere, because they’re so awesome!
One feature that I rarely see in the wild (even if it is extremely useful for reporting) is called “logical windowing” in Oracle, and it’s most useful when used with
INTERVAL
ranges. Let’s see what we may want to do.
I have a
payment
table in my
Sakila DVD rental store database. The payment table has payment dates, as can be seen here:
SELECT CAST(payment_date AS TIMESTAMP) ts
FROM payment
ORDER BY ts
output:
TS
---------------------------
24.05.05 22:53:30.000000000
24.05.05 22:54:33.000000000
24.05.05 23:03:39.000000000
24.05.05 23:04:41.000000000
24.05.05 23:05:21.000000000
24.05.05 23:08:07.000000000
24.05.05 23:11:53.000000000
24.05.05 23:31:46.000000000
25.05.05 00:00:40.000000000
25.05.05 00:02:21.000000000
25.05.05 00:09:02.000000000
25.05.05 00:19:27.000000000
Now, let’s assume I’m interested in these things:
- How many payments were there in the same hour as any given payment?
- How many payments were there in the same hour before any given payment?
- How many payments were there within one hour before any given payment?
Those are three entirely different questions. The expected solution will be this:
TS 1 2 3
-------------------------------------------
24.05.05 22:53:30.000000000 2 1 1
24.05.05 22:54:33.000000000 2 2 2
24.05.05 23:03:39.000000000 6 1 3
24.05.05 23:04:41.000000000 6 2 4
24.05.05 23:05:21.000000000 6 3 5
24.05.05 23:08:07.000000000 6 4 6
24.05.05 23:11:53.000000000 6 5 7
24.05.05 23:31:46.000000000 6 6 8
25.05.05 00:00:40.000000000 4 1 7
25.05.05 00:02:21.000000000 4 2 8
25.05.05 00:09:02.000000000 4 3 5
25.05.05 00:19:27.000000000 4 4 5
As you can see, in column #1, we’re getting always the same number of payments for any given hour. If we were using
GROUP BY trunc(payment_date, 'HH24')
, it would be simple to see that this would the result we were looking for:
TS 1
-----------------------------------
24.05.05 22:00:00 2
24.05.05 23:00:00 6
25.05.05 00:00:00 4
Column #2 is a bit more sophisticated as it will also aggregatee the number of payments per hour, but only those that are before any given payment. For instance, in the hour of
24.05.05 23:00:00
, we have 6 payments (see above), and those are the different payment numbers within that hour. E.g.
24.05.05 23:11:53
is the fifth payment within that hour:
TS 2
-----------------------------------
24.05.05 23:03:39.000000000 1
24.05.05 23:04:41.000000000 2
24.05.05 23:05:21.000000000 3
24.05.05 23:08:07.000000000 4
24.05.05 23:11:53.000000000 5
24.05.05 23:31:46.000000000 6
Finally, column #3 is using a sliding interval. For each payment, it checks how many payments were there in the time range between the current payment and one hour before. Now, excuse my ASCII art:
TS 3
-----------------------------------
24.05.05 22:53:30.000000000 1
24.05.05 22:54:33.000000000 2
24.05.05 23:03:39.000000000 3 <------\
24.05.05 23:04:41.000000000 4 |
24.05.05 23:05:21.000000000 5 |
24.05.05 23:08:07.000000000 6 |
24.05.05 23:11:53.000000000 7 <----\ |
24.05.05 23:31:46.000000000 8 <--\ | |
25.05.05 00:00:40.000000000 7 | | |
25.05.05 00:02:21.000000000 8 ---|-|-/
25.05.05 00:09:02.000000000 5 ---|-/
25.05.05 00:19:27.000000000 5 ---/
All of these are really useful for reporting, and all of these can be done rather easily with window functions. I’m using Oracle syntax here:
SELECT
CAST(payment_date AS TIMESTAMP) ts,
-- Column 1: Payments in the same hour
COUNT(*) OVER (
PARTITION BY TRUNC(payment_date, 'HH24')
),
-- Column 2: Preceding payments in the same hour
COUNT(*) OVER (
PARTITION BY TRUNC(payment_date, 'HH24')
ORDER BY payment_date
),
-- Column 3: Preceding payments within one hour
COUNT(*) OVER (
ORDER BY payment_date
RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
)
FROM payment
ORDER BY ts
Remember how column #1 is essentially the same thing as running a
GROUP BY
operation? That’s simple with window functions too. Just use a single
PARTITION BY
clause, and you get that grouping by the hour:
COUNT(*) OVER (
PARTITION BY TRUNC(payment_date, 'HH24')
)
Column #2 is more interesting. Because we’re ordering the window, and because we’re not putting any explicit frame clause (
ROWS
or
RANGE
), by default,
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
is implied. The following two are the same:
COUNT(*) OVER (
PARTITION BY TRUNC(payment_date, 'HH24')
ORDER BY payment_date
)
COUNT(*) OVER (
PARTITION BY TRUNC(payment_date, 'HH24')
ORDER BY payment_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
In other words, we group (
PARTITION BY
) all payments by the hour, order each group by the actual date, and limit / frame the window to the payments that preceed the current payment.
Column #3 is the most interesting and the least known, however. Unfortunately, it is not yet supported in all databases, even if it’s part of the SQL standard.
COUNT(*) OVER (
ORDER BY payment_date
RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
)
We’re no longer using the
PARITION BY
clause, because we don’t want to group payments into hours. Instead, we want to look behind one hour from each individual payment to see how many payments there were in that hour. But hours now don’t start / end at
00:00
minutes/seconds, they end in a given payment’s
payment_date
column (where Oracle DATE is really a timestamp).
This is a really nice feature that is available with
DATE
or
TIMESTAMP
columns only, when you pass an
INTERVAL
data type to the frame clause. I bet, however, that most of you have already had 1-2 reports where this might have been useful to know!
Want to learn more?
Like this:
Like Loading...
At the risk of being accused of self-marketing, I’ve done a 30-video series on Analytic functions (including windowing functions) here
https://www.youtube.com/playlist?list=PLJMaoEWvHwFJDyhMLCkNSSUQWw9waFkIj
Cheers,
Connor
That’s a huge risk, but really interesting looking series! Thanks for sharing
I was pleasantly surprised they are present in PostgreSQL: https://www.postgresql.org/docs/9.1/static/tutorial-window.html
Window functions: Yes of course :). But this particular feature (logical windowing over date ranges) is not yet available, unfortunately.
Correction: It was added to PostgreSQL 11