How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

SQL is a verbose language, and one of the most verbose features are window functions. In a stack overflow question that I’ve encountered recently, someone asked to calculate the difference between the first and the last value in a time series for any given day: Input

volume  tstamp
---------------------------
29011   2012-12-28 09:00:00
28701   2012-12-28 10:00:00
28830   2012-12-28 11:00:00
28353   2012-12-28 12:00:00
28642   2012-12-28 13:00:00
28583   2012-12-28 14:00:00
28800   2012-12-29 09:00:00
28751   2012-12-29 10:00:00
28670   2012-12-29 11:00:00
28621   2012-12-29 12:00:00
28599   2012-12-29 13:00:00
28278   2012-12-29 14:00:00

Desired output

first  last   difference  date
------------------------------------
29011  28583  428         2012-12-28
28800  28278  522         2012-12-29

How to write the query?

Notice that the value and timestamp progression do not correlate as it may appear. So, there is not a rule that if Timestamp2 > Timestamp1 then Value2 < Value1. Otherwise, this simple query would work (using PostgreSQL syntax):


SELECT 
  max(volume)               AS first,
  min(volume)               AS last,
  max(volume) - min(volume) AS difference,
  CAST(tstamp AS DATE)      AS date
FROM t
GROUP BY CAST(tstamp AS DATE);

There are several ways to find the first and last values within a group that do not involve window functions. For example:

In Oracle, you can use the FIRST and LAST functions, which for some arcane reason are not written FIRST(...) WITHIN GROUP (ORDER BY ...) or LAST(...) WITHIN GROUP (ORDER BY ...), like other sorted set aggregate functions, but some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...). Go figure
In PostgreSQL, you could use the DISTINCT ON syntax along with ORDER BY and LIMIT

More details about the various approaches can be found here:

How to Write Efficient TOP N Queries in SQL

The best performing approach would be to use an aggregate function like Oracle’s, but few databases have this function. So, we’ll resort to using the FIRST_VALUE and LAST_VALUE window functions:


SELECT DISTINCT
  first_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first,
  last_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last,
  first_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) 
  - last_value(volume) OVER (
    PARTITION BY CAST(tstamp AS DATE) 
    ORDER BY tstamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
ORDER BY CAST(tstamp AS DATE)

Oops 🤔 That doesn’t look too readable. But it will yield the correct result. Granted, we could wrap the definition for the columns FIRST and LAST in a derived table, but that would still leave us with two repetitions of the window definition:

PARTITION BY CAST(tstamp AS DATE) 
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

WINDOW clause to the rescue

Luckily, at least 3 databases have implemented the SQL standard WINDOW clause:

MySQL
PostgreSQL
Sybase SQL Anywhere

The above query can be refactored to this one:


SELECT DISTINCT
  first_value(volume) OVER w AS first,
  last_value(volume) OVER w AS last,
  first_value(volume) OVER w 
    - last_value(volume) OVER w AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW w AS (
  PARTITION BY CAST(tstamp AS DATE) 
  ORDER BY tstamp
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY CAST(tstamp AS DATE)

Notice how I can specify a window name with a window specification in a similar way as I can define a common table expression (WITH clause):

WINDOW 
    <window-name> AS (<window-specification>)
{  ,<window-name> AS (<window-specification>)... }

Not only can I reuse entire specifications, I could also build a specification from a partial specification, and reuse only parts. My previous query could have been rewritten as such:


SELECT DISTINCT
  first_value(volume) OVER w3 AS first,
  last_value(volume) OVER w3 AS last,
  first_value(volume) OVER w3 
    - last_value(volume) OVER w3 AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW 
  w1 AS (PARTITION BY CAST(tstamp AS DATE)),
  w2 AS (w1 ORDER BY tstamp),
  w3 AS (w2 ROWS BETWEEN UNBOUNDED PRECEDING 
                     AND UNBOUNDED FOLLOWING)
ORDER BY CAST(tstamp AS DATE)

Each window specification can be created from scratch, or be based on a previously defined window specification. Note this is also true when referencing the window definition. If I wanted to reuse the PARTITION BY clause and the ORDER BY clause, but change the FRAME clause (ROWS ...), then I could have written this:


SELECT DISTINCT
  first_value(volume) OVER (
    w2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS first,
  last_value(volume) OVER (
    w2 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS last,
  first_value(volume) OVER (
    w2 ROWS UNBOUNDED PRECEDING
  ) - last_value(volume) OVER (
    w2 ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
  ) AS diff,
  CAST(tstamp AS DATE) AS date
FROM t
WINDOW 
  w1 AS (PARTITION BY CAST(tstamp AS DATE)),
  w2 AS (w1 ORDER BY tstamp)
ORDER BY CAST(tstamp AS DATE)

What if my database doesn’t support the WINDOW clause?

In that case, you have to either manually write the window specification on each window function, or you use a SQL builder like jOOQ, which can emulate the window clause:

So, jOOQ can translate this as well, now pic.twitter.com/ucbKNkhTjW
— Lukas Eder (@lukaseder) July 12, 2018

You can try this translation online on our website: https://www.jooq.org/translate

4 thoughts on “How to Reduce Syntactic Overhead Using the SQL WINDOW Clause”

Dag Wanvik says:

September 26, 2018 at 06:46
Hi Lukas,

thanks for the blog!

It seems part of the BETWEEN specification fell out on this line (#9 in last example in
section “WINDOW clause to the rescue”):
```
w2 ROWS BETWEEN UNBOUNDED PRECEDING  
```
The query works (MySQL 8.0) if I add “AND CURRENT ROW”

Thanks,
Dag

Loading...
1. lukaseder says:
  
  September 26, 2018 at 09:40
  
  Thanks a lot for the pointer, Dag. Fixed. What I meant to write there is w3 ROWS UNBOUNDED PRECEDING, i.e. use the syntax that specifies only the lower bound of the window.
  
  Loading...
  
Lukasz says:

September 26, 2018 at 16:48

Starting from SQLite 3.25.0(2018-09-18 ) also supports WINDOW clause/windowed functions.

https://www.sqlite.org/windowfunctions.html

so we have at least 4 DBs :)

Loading...

1. lukaseder says:
  
  September 26, 2018 at 16:53
  
  You’re right, thanks for the pointer
  
  Loading...

How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

How to write the query?

WINDOW clause to the rescue

What if my database doesn’t support the WINDOW clause?

Like this:

Published by lukaseder

4 thoughts on “How to Reduce Syntactic Overhead Using the SQL WINDOW Clause”

Leave a ReplyCancel reply

How to write the query?

WINDOW clause to the rescue

What if my database doesn’t support the WINDOW clause?

Like this:

Published by lukaseder

4 thoughts on “How to Reduce Syntactic Overhead Using the SQL WINDOW Clause”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.