A Little Known SQL Feature: Use Logical Windowing to Aggregate Sliding Ranges


I’m frequently telling developers to put window functions almost everywhere, because they’re so awesome!

One feature that I rarely see in the wild (even if it is extremely useful for reporting) is called “logical windowing” in Oracle, and it’s most useful when used with INTERVAL ranges. Let’s see what we may want to do.

I have a payment table in my Sakila DVD rental store database. The payment table has payment dates, as can be seen here:

SELECT CAST(payment_date AS TIMESTAMP) ts
FROM payment
ORDER BY ts

output:

TS                        
---------------------------
24.05.05 22:53:30.000000000 
24.05.05 22:54:33.000000000 
24.05.05 23:03:39.000000000 
24.05.05 23:04:41.000000000 
24.05.05 23:05:21.000000000 
24.05.05 23:08:07.000000000 
24.05.05 23:11:53.000000000 
24.05.05 23:31:46.000000000 
25.05.05 00:00:40.000000000 
25.05.05 00:02:21.000000000 
25.05.05 00:09:02.000000000 
25.05.05 00:19:27.000000000 

Now, let’s assume I’m interested in these things:

  1. How many payments were there in the same hour as any given payment?
  2. How many payments were there in the same hour before any given payment?
  3. How many payments were there within one hour before any given payment?

Those are three entirely different questions. The expected solution will be this:

TS                                1   2   3
-------------------------------------------
24.05.05 22:53:30.000000000       2   1   1
24.05.05 22:54:33.000000000       2   2   2
24.05.05 23:03:39.000000000       6   1   3
24.05.05 23:04:41.000000000       6   2   4
24.05.05 23:05:21.000000000       6   3   5
24.05.05 23:08:07.000000000       6   4   6
24.05.05 23:11:53.000000000       6   5   7
24.05.05 23:31:46.000000000       6   6   8
25.05.05 00:00:40.000000000       4   1   7
25.05.05 00:02:21.000000000       4   2   8
25.05.05 00:09:02.000000000       4   3   5
25.05.05 00:19:27.000000000       4   4   5

As you can see, in column #1, we’re getting always the same number of payments for any given hour. If we were using GROUP BY trunc(payment_date, 'HH24'), it would be simple to see that this would the result we were looking for:

TS                                1
-----------------------------------
24.05.05 22:00:00                 2
24.05.05 23:00:00                 6
25.05.05 00:00:00                 4

Column #2 is a bit more sophisticated as it will also aggregatee the number of payments per hour, but only those that are before any given payment. For instance, in the hour of 24.05.05 23:00:00, we have 6 payments (see above), and those are the different payment numbers within that hour. E.g. 24.05.05 23:11:53 is the fifth payment within that hour:

TS                                2
-----------------------------------
24.05.05 23:03:39.000000000       1
24.05.05 23:04:41.000000000       2
24.05.05 23:05:21.000000000       3
24.05.05 23:08:07.000000000       4
24.05.05 23:11:53.000000000       5
24.05.05 23:31:46.000000000       6

Finally, column #3 is using a sliding interval. For each payment, it checks how many payments were there in the time range between the current payment and one hour before. Now, excuse my ASCII art:

TS                                3
-----------------------------------
24.05.05 22:53:30.000000000       1
24.05.05 22:54:33.000000000       2
24.05.05 23:03:39.000000000       3 <------\
24.05.05 23:04:41.000000000       4        |
24.05.05 23:05:21.000000000       5        |
24.05.05 23:08:07.000000000       6        |
24.05.05 23:11:53.000000000       7 <----\ |
24.05.05 23:31:46.000000000       8 <--\ | |
25.05.05 00:00:40.000000000       7    | | |
25.05.05 00:02:21.000000000       8 ---|-|-/
25.05.05 00:09:02.000000000       5 ---|-/
25.05.05 00:19:27.000000000       5 ---/

All of these are really useful for reporting, and all of these can be done rather easily with window functions. I’m using Oracle syntax here:

SELECT 
  CAST(payment_date AS TIMESTAMP) ts,

  -- Column 1: Payments in the same hour
  COUNT(*) OVER (
    PARTITION BY TRUNC(payment_date, 'HH24')
  ),

  -- Column 2: Preceding payments in the same hour
  COUNT(*) OVER (
    PARTITION BY TRUNC(payment_date, 'HH24') 
    ORDER BY payment_date
  ),

  -- Column 3: Preceding payments within one hour
  COUNT(*) OVER (
    ORDER BY payment_date 
    RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
  )
FROM payment
ORDER BY ts

Remember how column #1 is essentially the same thing as running a GROUP BY operation? That’s simple with window functions too. Just use a single PARTITION BY clause, and you get that grouping by the hour:

COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24')
)

Column #2 is more interesting. Because we’re ordering the window, and because we’re not putting any explicit frame clause (ROWS or RANGE), by default, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is implied. The following two are the same:

COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24') 
  ORDER BY payment_date
)

COUNT(*) OVER (
  PARTITION BY TRUNC(payment_date, 'HH24') 
  ORDER BY payment_date
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

In other words, we group (PARTITION BY) all payments by the hour, order each group by the actual date, and limit / frame the window to the payments that preceed the current payment.

Column #3 is the most interesting and the least known, however. Unfortunately, it is not yet supported in all databases – e.g. PostgreSQL doesn’t have it), even if it’s part of the SQL standard.

COUNT(*) OVER (
  ORDER BY payment_date 
  RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
)

We’re no longer using the PARITION BY clause, because we don’t want to group payments into hours. Instead, we want to look behind one hour from each individual payment to see how many payments there were in that hour. But hours now don’t start / end at 00:00 minutes/seconds, they end in a given payment’s payment_date column (where Oracle DATE is really a timestamp).

This is a really nice feature that is available with DATE or TIMESTAMP columns only, when you pass an INTERVAL data type to the frame clause. I bet, however, that most of you have already had 1-2 reports where this might have been useful to know!

Want to learn more?

4 thoughts on “A Little Known SQL Feature: Use Logical Windowing to Aggregate Sliding Ranges

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s