I’m frequently telling developers to put window functions almost everywhere, because they’re so awesome!
One feature that I rarely see in the wild (even though it is extremely useful for reporting) is called “logical windowing” in Oracle, and it’s most useful when combined with INTERVAL ranges. Let’s see what we may want to do.
I have a payment table in my Sakila DVD rental store database. The payment table has payment dates, as can be seen here:
SELECT CAST(payment_date AS TIMESTAMP) ts
FROM payment
ORDER BY ts
Output:
TS
---------------------------
24.05.05 22:53:30.000000000
24.05.05 22:54:33.000000000
24.05.05 23:03:39.000000000
24.05.05 23:04:41.000000000
24.05.05 23:05:21.000000000
24.05.05 23:08:07.000000000
24.05.05 23:11:53.000000000
24.05.05 23:31:46.000000000
25.05.05 00:00:40.000000000
25.05.05 00:02:21.000000000
25.05.05 00:09:02.000000000
25.05.05 00:19:27.000000000
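By the way, if you want to follow along, here’s a minimal sketch of the payment table, assuming the standard Sakila column list with simplified Oracle types (the real table has a few more columns and constraints):

-- A sketch only: standard Sakila columns, types simplified for Oracle
CREATE TABLE payment (
  payment_id   NUMBER       PRIMARY KEY,
  customer_id  NUMBER       NOT NULL,
  staff_id     NUMBER       NOT NULL,
  rental_id    NUMBER,
  amount       NUMBER(5, 2) NOT NULL,
  payment_date DATE         NOT NULL  -- Oracle DATE includes a time part
);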
Now, let’s assume I’m interested in these things:
- How many payments were there in the same hour as any given payment?
- How many payments were there in the same hour before any given payment?
- How many payments were there within one hour before any given payment?
Those are three entirely different questions. The expected result looks like this:
TS                            1  2  3
-------------------------------------------
24.05.05 22:53:30.000000000   2  1  1
24.05.05 22:54:33.000000000   2  2  2
24.05.05 23:03:39.000000000   6  1  3
24.05.05 23:04:41.000000000   6  2  4
24.05.05 23:05:21.000000000   6  3  5
24.05.05 23:08:07.000000000   6  4  6
24.05.05 23:11:53.000000000   6  5  7
24.05.05 23:31:46.000000000   6  6  8
25.05.05 00:00:40.000000000   4  1  7
25.05.05 00:02:21.000000000   4  2  8
25.05.05 00:09:02.000000000   4  3  5
25.05.05 00:19:27.000000000   4  4  5
As you can see, column #1 always contains the same number of payments for any given hour. If we were using GROUP BY TRUNC(payment_date, 'HH24'), it would be easy to see that this is the result we were looking for:
TS                  1
-----------------------------------
24.05.05 22:00:00   2
24.05.05 23:00:00   6
25.05.05 00:00:00   4
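For reference, a sketch of what that GROUP BY query would look like:

-- A sketch of the GROUP BY equivalent of column #1
SELECT TRUNC(payment_date, 'HH24') ts, COUNT(*)
FROM payment
GROUP BY TRUNC(payment_date, 'HH24')
ORDER BY ts

The difference, of course, is that GROUP BY collapses the result to one row per hour, whereas the window function we’ll see below keeps every individual payment row.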
Column #2 is a bit more sophisticated, as it also aggregates the number of payments per hour, but it counts only those payments that precede any given payment. For instance, in the hour of 24.05.05 23:00:00, we have 6 payments (see above), and column #2 numbers them within that hour. E.g. 24.05.05 23:11:53 is the fifth payment within that hour:
TS                            2
-----------------------------------
24.05.05 23:03:39.000000000   1
24.05.05 23:04:41.000000000   2
24.05.05 23:05:21.000000000   3
24.05.05 23:08:07.000000000   4
24.05.05 23:11:53.000000000   5
24.05.05 23:31:46.000000000   6
Finally, column #3 uses a sliding interval. For each payment, it counts how many payments there were in the time range between one hour before that payment and the payment itself. For example, the last payment at 25.05.05 00:19:27 looks back to 24.05.05 23:19:27 and finds 5 payments in that range, including itself. Now, excuse my ASCII art:
TS                              3
-----------------------------------
24.05.05 22:53:30.000000000     1
24.05.05 22:54:33.000000000     2
24.05.05 23:03:39.000000000     3 <------\
24.05.05 23:04:41.000000000     4        |
24.05.05 23:05:21.000000000     5        |
24.05.05 23:08:07.000000000     6        |
24.05.05 23:11:53.000000000     7 <----\ |
24.05.05 23:31:46.000000000     8 <--\ | |
25.05.05 00:00:40.000000000     7    | | |
25.05.05 00:02:21.000000000     8 ---|-|-/
25.05.05 00:09:02.000000000     5 ---|-/
25.05.05 00:19:27.000000000     5 ---/
All of these are really useful for reporting, and all of them can be done rather easily with window functions. I’m using Oracle syntax here:
SELECT
  CAST(payment_date AS TIMESTAMP) ts,

  -- Column 1: Payments in the same hour
  COUNT(*) OVER (
    PARTITION BY TRUNC(payment_date, 'HH24')
  ),

  -- Column 2: Preceding payments in the same hour
  COUNT(*) OVER (
    PARTITION BY TRUNC(payment_date, 'HH24')
    ORDER BY payment_date
  ),

  -- Column 3: Preceding payments within one hour
  COUNT(*) OVER (
    ORDER BY payment_date
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING
              AND CURRENT ROW
  )
FROM payment
ORDER BY ts
Remember how column #1 is essentially the same thing as running a GROUP BY operation? That’s simple with window functions, too. Just use a single PARTITION BY clause, and you get that grouping by the hour:
COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24')
)
Column #2 is more interesting. Because we’re ordering the window and we’re not specifying any explicit frame clause (ROWS or RANGE), the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is implied. The following two are the same:
COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24')
  ORDER BY payment_date
)

COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24')
  ORDER BY payment_date
  RANGE BETWEEN UNBOUNDED PRECEDING
            AND CURRENT ROW
)
In other words, we group (PARTITION BY) all payments by the hour, order each group by the actual date, and limit / frame the window to the payments that precede the current payment.
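As an aside: the implicit frame uses RANGE, so payments sharing the exact same payment_date are peers and all get the same count. If you wanted to count physical rows instead, you would spell the frame out with ROWS, sketched like this:

-- A sketch: ROWS counts physical rows, so two payments with an identical
-- payment_date could get different counts, whereas RANGE gives peers the
-- same count
COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24')
  ORDER BY payment_date
  ROWS BETWEEN UNBOUNDED PRECEDING
           AND CURRENT ROW
)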
Column #3 is the most interesting and the least known, however. Unfortunately, it is not yet supported in all databases, even though it is part of the SQL standard.
COUNT(*) OVER (
  ORDER BY payment_date
  RANGE BETWEEN INTERVAL '1' HOUR PRECEDING
            AND CURRENT ROW
)
We’re no longer using the PARTITION BY clause, because we don’t want to group payments into hours. Instead, we want to look back one hour from each individual payment and see how many payments there were in that hour. These “hours” no longer start and end at minute/second 00:00; each one ends at a given payment’s payment_date (where Oracle DATE is really a timestamp).
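Side note, since the comments below bring it up: databases that have added this part of the standard (e.g. PostgreSQL 11 and later) accept essentially the same expression. A minimal sketch, assuming the same payment table:

-- A sketch of column #3 in PostgreSQL 11+ syntax
SELECT payment_date,
       COUNT(*) OVER (
         ORDER BY payment_date
         RANGE BETWEEN INTERVAL '1 hour' PRECEDING
                   AND CURRENT ROW
       )
FROM payment
ORDER BY payment_date;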
This is a really nice feature that is available only with DATE or TIMESTAMP columns, when you pass an INTERVAL data type to the frame clause. I bet, however, that most of you have already had one or two reports where this would have been useful to know!
Want to learn more?
At the risk of being accused of self-marketing, I’ve done a 30-video series on analytic functions (including windowing functions) here.
Cheers,
Connor
That’s a huge risk, but a really interesting-looking series! Thanks for sharing!
I was pleasantly surprised they are present in PostgreSQL: https://www.postgresql.org/docs/9.1/static/tutorial-window.html
Window functions: Yes of course :). But this particular feature (logical windowing over date ranges) is not yet available, unfortunately.
Correction: It was added to PostgreSQL 11