Use this Neat Window Function Trick to Calculate Time Differences in a Time Series

Whenever you feel that itch…

Can’t I calculate this with SQL?

The answer is: Yes, you can! And you should! Let’s see how…

Calculating time differences between rows

Let’s consider the following table containing timestamps (e.g. in a log database). We’re using PostgreSQL syntax for this:

CREATE TABLE timestamps (
  ts timestamp
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0'),
  ('2015-05-01 12:15:24.0'),
  ('2015-05-01 12:15:27.0'),
  ('2015-05-01 12:15:31.0'),
  ('2015-05-01 12:15:40.0'),
  ('2015-05-01 12:15:55.0'),
  ('2015-05-01 12:16:01.0'),
  ('2015-05-01 12:16:03.0'),
  ('2015-05-01 12:16:04.0'),
  ('2015-05-01 12:16:04.0');

Obviously, you’ll be adding constraints and indexes, etc. Now, let’s assume that each individual timestamp represents an event in your system, and you’d like to keep track of how long ago the previous event happened. I.e. you’d like the following result:

ts                   delta
-------------------------------
2015-05-01 12:15:23
2015-05-01 12:15:24  00:00:01
2015-05-01 12:15:27  00:00:03
2015-05-01 12:15:31  00:00:04
2015-05-01 12:15:40  00:00:09
2015-05-01 12:15:55  00:00:15
2015-05-01 12:16:01  00:00:06
2015-05-01 12:16:03  00:00:02
2015-05-01 12:16:04  00:00:01
2015-05-01 12:16:04  00:00:00

In other words:

  • ts1 (12:15:23) + delta (00:00:01) = ts2 (12:15:24)
  • ts2 (12:15:24) + delta (00:00:03) = ts3 (12:15:27)

This can be achieved very easily with the LAG() window function:

SELECT 
  ts, 
  ts - lag(ts, 1) OVER (ORDER BY ts) delta
FROM timestamps
ORDER BY ts;

The above reads simply:

Give me the difference between the ts value of the current row and the ts value of the row that “lags” behind this row by one, with rows ordered by ts.

Easy, right? With LAG() you can actually access any row from another row within a “sliding window” by simply specifying the lag index.
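
For illustration, here’s a minimal sketch of that flexibility (the two-row lag, the default value, and the column names delta2 / delta_or_zero are additions for this example, not part of the original query):

SELECT 
  ts, 
  -- difference to the row two positions back
  ts - lag(ts, 2) OVER (ORDER BY ts) delta2,
  -- like before, but the first row falls back to its own ts,
  -- yielding 00:00:00 instead of NULL
  ts - lag(ts, 1, ts) OVER (ORDER BY ts) delta_or_zero
FROM timestamps
ORDER BY ts;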

We’ve already described this wonderful window function in a previous blog post.

Bonus: A running total interval

In addition to the difference between this timestamp and the previous one, we might be interested in the total difference between this timestamp and the first timestamp. This may sound like a running total (see our previous article about running totals using SQL), but it can be calculated much more easily using FIRST_VALUE() – a “cousin” of LAG():

SELECT 
  ts, 
  ts - lag(ts, 1) OVER w delta,
  ts - first_value(ts) OVER w total
FROM timestamps
WINDOW w AS (ORDER BY ts)
ORDER BY ts;

… the above query then yields

ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:04
2015-05-01 12:15:31  00:00:04  00:00:08
2015-05-01 12:15:40  00:00:09  00:00:17
2015-05-01 12:15:55  00:00:15  00:00:32
2015-05-01 12:16:01  00:00:06  00:00:38
2015-05-01 12:16:03  00:00:02  00:00:40
2015-05-01 12:16:04  00:00:01  00:00:41
2015-05-01 12:16:04  00:00:00  00:00:41

Extra bonus: The total since a “reset” event

We can take this as far as we want. Let’s assume that we want to reset the total from time to time:

CREATE TABLE timestamps (
  ts timestamp,
  event varchar(50)
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0', null),
  ('2015-05-01 12:15:24.0', null),
  ('2015-05-01 12:15:27.0', 'reset'),
  ('2015-05-01 12:15:31.0', null),
  ('2015-05-01 12:15:40.0', null),
  ('2015-05-01 12:15:55.0', 'reset'),
  ('2015-05-01 12:16:01.0', null),
  ('2015-05-01 12:16:03.0', null),
  ('2015-05-01 12:16:04.0', null),
  ('2015-05-01 12:16:04.0', null);

We can now run the following query:

SELECT
  ts, 
  ts - lag(ts, 1) 
       OVER (ORDER BY ts) delta,
  ts - first_value(ts) 
       OVER (PARTITION BY c ORDER BY ts) total
FROM (
  SELECT 
    COUNT(*) FILTER (WHERE EVENT = 'reset') 
             OVER (ORDER BY ts) c,
    ts
  FROM timestamps
) timestamps
ORDER BY ts;

… to produce

ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:00 <-- reset
2015-05-01 12:15:31  00:00:04  00:00:04
2015-05-01 12:15:40  00:00:09  00:00:13
2015-05-01 12:15:55  00:00:15  00:00:00 <-- reset
2015-05-01 12:16:01  00:00:06  00:00:06
2015-05-01 12:16:03  00:00:02  00:00:08
2015-05-01 12:16:04  00:00:01  00:00:09
2015-05-01 12:16:04  00:00:00  00:00:09

The beautiful part is in the derived table

  SELECT 
    COUNT(*) FILTER (WHERE EVENT = 'reset') 
             OVER (ORDER BY ts) c,
    ts
  FROM timestamps

This derived table simply assigns a “partition” number c to each timestamp, counting how many “reset” events have occurred up to (and including) that timestamp. The result of the above subquery is:

c  ts
----------------------
0  2015-05-01 12:15:23
0  2015-05-01 12:15:24
1  2015-05-01 12:15:27 <-- reset
1  2015-05-01 12:15:31
1  2015-05-01 12:15:40
2  2015-05-01 12:15:55 <-- reset
2  2015-05-01 12:16:01
2  2015-05-01 12:16:03
2  2015-05-01 12:16:04
2  2015-05-01 12:16:04

As you can see, the COUNT(*) window function counts all the previous “reset” events, ordered by timestamp. This information can then be used as the PARTITION for the FIRST_VALUE() window function in order to find the first timestamp in each partition, i.e. at the time of the most recent “reset” event:

  ts - first_value(ts) 
       OVER (PARTITION BY c ORDER BY ts) total
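
Note that not every database supports the standard FILTER clause used in the derived table. As a sketch of an equivalent, the running count of “reset” events can also be expressed with a conditional SUM():

  SELECT 
    -- 1 for "reset" rows, 0 otherwise, summed up in ts order,
    -- which yields the same running count as COUNT(*) FILTER (...)
    SUM(CASE WHEN event = 'reset' THEN 1 ELSE 0 END)
             OVER (ORDER BY ts) c,
    ts
  FROM timestamps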

Conclusion

It’s almost a running gag on this blog to say that…

There was SQL before window functions and SQL after window functions

Window functions are extremely powerful and they’re a part of the SQL standard, supported in most commercial databases, in PostgreSQL, in Firebird 3.0, and in CUBRID. If you aren’t using them already, start using them today!

If you’ve liked this article, find out more about window functions in any of the following articles:

Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()

If you’re using a commercial database or PostgreSQL / Firebird / CUBRID, you will be able to take advantage of the full power of window functions. We’ve blogged about window functions’ awesomeness a couple of times, in particular about ROW_NUMBER(), RANK(), DENSE_RANK().

Today, we’re going to look into some awesome window functions that produce values of other rows that are positioned before or after the current row.

Setting up the test data

We’re going to do some interesting statistics today using publicly available data from the World Bank. To keep things simple, we’ll only do analyses for the G8 countries:

  • Canada (CA)
  • France (FR)
  • Germany (DE)
  • Italy (IT)
  • Japan (JP)
  • Russian Federation (RU)
  • United Kingdom (GB)
  • United States (US)

And for those countries, let’s consider the following data points for the years 2009-2012:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

Central government debt, total (% of GDP)

          2009    2010    2011    2012
CA        51.3    51.4    52.5    53.5	
DE        47.6    55.5    55.1    56.9	
FR        85.0    89.2    93.2   103.8	
GB        71.7    85.2    99.6   103.2	
IT       121.3   119.9   113.0   131.1	
JP       166.8   174.8   189.5   196.5	
RU         8.7     9.1     9.3     9.4	
US        76.3    85.6    90.1    93.8	

Let’s put all that data into a fact table like so (PostgreSQL syntax):

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL,
  govt_debt DECIMAL(10, 2) NOT NULL
);

INSERT INTO countries
VALUES ('CA', 2009, 40764, 51.3),
       ('CA', 2010, 47465, 51.4),
       ('CA', 2011, 51791, 52.5),
       ('CA', 2012, 52409, 53.5),
       ('DE', 2009, 40270, 47.6),
       ('DE', 2010, 40408, 55.5),
       ('DE', 2011, 44355, 55.1),
       ('DE', 2012, 42598, 56.9),
       ('FR', 2009, 40488, 85.0),
       ('FR', 2010, 39448, 89.2),
       ('FR', 2011, 42578, 93.2),
       ('FR', 2012, 39759,103.8),
       ('GB', 2009, 35455,121.3),
       ('GB', 2010, 36573, 85.2),
       ('GB', 2011, 38927, 99.6),
       ('GB', 2012, 38649,103.2),
       ('IT', 2009, 35724,121.3),
       ('IT', 2010, 34673,119.9),
       ('IT', 2011, 36988,113.0),
       ('IT', 2012, 33814,131.1),
       ('JP', 2009, 39473,166.8),
       ('JP', 2010, 43118,174.8),
       ('JP', 2011, 46204,189.5),
       ('JP', 2012, 46548,196.5),
       ('RU', 2009,  8616,  8.7),
       ('RU', 2010, 10710,  9.1),
       ('RU', 2011, 13324,  9.3),
       ('RU', 2012, 14091,  9.4),
       ('US', 2009, 46999, 76.3),
       ('US', 2010, 48358, 85.6),
       ('US', 2011, 49855, 90.1),
       ('US', 2012, 51755, 93.8);

Start the querying fun

People who are used to SQL-92 syntax will be able to quickly find the highest GDP per capita or the highest debt from the table. It’s an easy query like this one:

SELECT MAX(gdp_per_capita), MAX(govt_debt)
FROM countries;

Which will return:

52409.00    196.50

But that’s not interesting. We don’t even know which countries and years these values are associated with.

A standard SQL-92 (and also a standard relational) query to return all of these values would look something like this:

SELECT 
  'highest gdp per capita' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.gdp_per_capita < c2.gdp_per_capita
)
UNION ALL
SELECT
  'highest government debt' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.govt_debt < c2.govt_debt
)

In essence, we select those rows for which there doesn’t exist any other row with a higher value for either gdp_per_capita (first subselect) or govt_debt (second subselect).

Trick! Use quantified comparison predicates!

If your database supports quantified comparison predicates, then you can write this a bit more concisely like this:

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita >= ALL (
  SELECT gdp_per_capita FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt >= ALL (
  SELECT govt_debt FROM countries
)

Which is essentially the same as…

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita = (
  SELECT MAX(gdp_per_capita) FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt = (
  SELECT MAX(govt_debt) FROM countries
)

The output would be:

what                     code year       gdp    debt
----------------------------------------------------
highest gdp per capita   CA   2012  52409.00   53.50
highest government debt  JP   2012  46548.00  196.50

That’s a lot of SQL for so little analysis capability, and somehow, it just doesn’t feel entirely right to query the same table four times with all these subselects!

FIRST_VALUE() and LAST_VALUE()

This is where window functions come into play, and in this particular case, FIRST_VALUE() or LAST_VALUE(). For now, let’s focus on calculating the maximum GDP per capita from the data set:

SELECT
  countries.*,
  FIRST_VALUE (code)           OVER (w_gdp) AS max_gdp_code,
  FIRST_VALUE (year)           OVER (w_gdp) AS max_gdp_year,
  FIRST_VALUE (gdp_per_capita) OVER (w_gdp) AS max_gdp_gdp,
  FIRST_VALUE (govt_debt)      OVER (w_gdp) AS max_gdp_debt
FROM
  countries
WINDOW
  w_gdp  AS (ORDER BY gdp_per_capita DESC)
ORDER BY
  code, year

Notice how we make use of the SQL standard WINDOW clause, which is currently only supported by PostgreSQL and Sybase SQL Anywhere.

If you’re using Oracle or any other commercial database, you can simply substitute the window reference w_gdp into the various OVER() clauses to achieve equivalent behaviour – or you can use jOOQ’s WINDOW clause support and let jOOQ do the same for you.
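
For reference, here’s a sketch of what that substitution would look like, with w_gdp inlined into each OVER() clause (equivalent to the query above):

SELECT
  countries.*,
  FIRST_VALUE (code)           OVER (ORDER BY gdp_per_capita DESC) AS max_gdp_code,
  FIRST_VALUE (year)           OVER (ORDER BY gdp_per_capita DESC) AS max_gdp_year,
  FIRST_VALUE (gdp_per_capita) OVER (ORDER BY gdp_per_capita DESC) AS max_gdp_gdp,
  FIRST_VALUE (govt_debt)      OVER (ORDER BY gdp_per_capita DESC) AS max_gdp_debt
FROM
  countries
ORDER BY
  code, year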

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

The above query will not produce any aggregates, but it will add the values for the country / year with the highest GDP per capita to every row in the table:

each country             highest gdp per capita
-----------------------------------------------
CA 2009 40764.00 51.30   CA 2012 52409.00 53.50
CA 2010 47465.00 51.40   CA 2012 52409.00 53.50
CA 2011 51791.00 52.50   CA 2012 52409.00 53.50
CA 2012 52409.00 53.50   CA 2012 52409.00 53.50

This is extremely interesting because the data is not yet aggregated – the original data set remains unchanged, enriched with new computed columns.

You can then further process things, e.g. compare each country / year with the highest GDP per capita and with the highest debt per GDP of that year:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp) , '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt      / FIRST_VALUE (govt_debt)      OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY year ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY year ORDER BY govt_debt DESC)
ORDER BY
  code, year

Notice how I’ve added PARTITION BY to the window definitions of the WINDOW clause. I’ve done this because I want to partition the data set by year, in order to find the highest GDP / debt values for each year, not for the whole data set.

The outcome of the above query can then be seen here:

country                   percentages
------------------------------------------
CA   2009  40764   51.3    86.73%   30.76%
CA   2010  47465   51.4    98.15%   29.41%
CA   2011  51791   52.5   100.00%   27.70%
CA   2012  52409   53.5   100.00%   27.23%
DE   2009  40270   47.6    85.68%   28.54%
DE   2010  40408   55.5    83.56%   31.75%
DE   2011  44355   55.1    85.64%   29.08%
DE   2012  42598   56.9    81.28%   28.96%
FR   2009  40488   85.0    86.15%   50.96%
FR   2010  39448   89.2    81.57%   51.03%
FR   2011  42578   93.2    82.21%   49.18%
FR   2012  39759  103.8    75.86%   52.82%
GB   2009  35455  121.3    75.44%   72.72%
GB   2010  36573   85.2    75.63%   48.74%
GB   2011  38927   99.6    75.16%   52.56%
GB   2012  38649  103.2    73.74%   52.52%
IT   2009  35724  121.3    76.01%   72.72%
IT   2010  34673  119.9    71.70%   68.59%
IT   2011  36988  113.0    71.42%   59.63%
IT   2012  33814  131.1    64.52%   66.72%
JP   2009  39473  166.8    83.99%  100.00%
JP   2010  43118  174.8    89.16%  100.00%
JP   2011  46204  189.5    89.21%  100.00%
JP   2012  46548  196.5    88.82%  100.00%
RU   2009   8616    8.7    18.33%    5.22%
RU   2010  10710    9.1    22.15%    5.21%
RU   2011  13324    9.3    25.73%    4.91%
RU   2012  14091    9.4    26.89%    4.78%
US   2009  46999   76.3   100.00%   45.74%
US   2010  48358   85.6   100.00%   48.97%
US   2011  49855   90.1    96.26%   47.55%
US   2012  51755   93.8    98.75%   47.74%

We could say that among the G8 countries, Canada has improved the most over these years, decreasing its debt per GDP relative to the other countries while at the same time increasing its GDP per capita relative to them.

Instead of partitioning the data set by year, we could also partition it by country, and find the best / worst year for each country over the years:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp), '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt / FIRST_VALUE (govt_debt) OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY code ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY code ORDER BY govt_debt DESC)
ORDER BY
  code, year

The result would now look quite different:

country                    percentages
------------------------------------------
CA   2009  40764   51.3    77.78%   95.89%
CA   2010  47465   51.4    90.57%   96.07%
CA   2011  51791   52.5    98.82%   98.13%
CA   2012  52409   53.5   100.00%  100.00%
DE   2009  40270   47.6    90.79%   83.66%
DE   2010  40408   55.5    91.10%   97.54%
DE   2011  44355   55.1   100.00%   96.84%
DE   2012  42598   56.9    96.04%  100.00%
FR   2009  40488   85.0    95.09%   81.89%
FR   2010  39448   89.2    92.65%   85.93%
FR   2011  42578   93.2   100.00%   89.79%
FR   2012  39759  103.8    93.38%  100.00%
GB   2009  35455  121.3    91.08%  100.00%
GB   2010  36573   85.2    93.95%   70.24%
GB   2011  38927   99.6   100.00%   82.11%
GB   2012  38649  103.2    99.29%   85.08%
IT   2009  35724  121.3    96.58%   92.52%
IT   2010  34673  119.9    93.74%   91.46%
IT   2011  36988  113.0   100.00%   86.19%
IT   2012  33814  131.1    91.42%  100.00%
JP   2009  39473  166.8    84.80%   84.89%
JP   2010  43118  174.8    92.63%   88.96%
JP   2011  46204  189.5    99.26%   96.44%
JP   2012  46548  196.5   100.00%  100.00%
RU   2009   8616    8.7    61.15%   92.55%
RU   2010  10710    9.1    76.01%   96.81%
RU   2011  13324    9.3    94.56%   98.94%
RU   2012  14091    9.4   100.00%  100.00%
US   2009  46999   76.3    90.81%   81.34%
US   2010  48358   85.6    93.44%   91.26%
US   2011  49855   90.1    96.33%   96.06%
US   2012  51755   93.8   100.00%  100.00%

As you can see, most countries have generally performed better in terms of GDP per capita over the years, and most countries have also almost strictly increased their debt per GDP, the exceptions being Germany, Italy, and the United Kingdom. Russia and Canada have seen the strongest growth in GDP per capita.

In the above examples, we’ve been mainly using FIRST_VALUE(). LAST_VALUE() is almost the opposite function with respect to ordering, much like MAX() is the opposite of MIN(). Almost, because there is a caveat when using LAST_VALUE() with ORDER BY: a window definition that uses ORDER BY is implicitly equivalent to one that also specifies a so-called “frame clause”:

-- Find the "last" year over the complete data set
-- This may not behave as expected, so always provide
-- an explicit ORDER BY clause
LAST_VALUE (year) OVER()

-- These two are implicitly equivalent. We're not
-- looking for the "last" year in the complete data
-- set, but only in the frame that is "before" the
-- current row. In other words, the current row is
-- always the "last value"!
LAST_VALUE (year) OVER(ORDER BY year)
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND CURRENT ROW
)

-- Find the "last" year in the complete data set with
-- explicit ordering
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND UNBOUNDED FOLLOWING
)
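
To see the difference on actual rows, here’s a minimal sketch against the distinct years of our fact table. The default frame returns the current row’s year, whereas the explicit frame always returns 2012:

WITH years AS (
  SELECT DISTINCT year
  FROM countries
)
SELECT
  year,
  -- default frame: ends at the current row, so the current
  -- row's year is always the "last value"
  LAST_VALUE (year) OVER (ORDER BY year) AS last_default_frame,
  -- explicit frame: spans the whole data set, so 2012 is
  -- always the "last value"
  LAST_VALUE (year) OVER (
    ORDER BY year
    ROWS BETWEEN UNBOUNDED PRECEDING
             AND UNBOUNDED FOLLOWING
  ) AS last_full_frame
FROM years
ORDER BY year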

LEAD() and LAG()

The previous functions were about comparing values with the maximum / minimum (FIRST_VALUE() and LAST_VALUE()) within a data set. But using window functions, you can also compare things with the next / previous value. Or with the second next / second previous, etc. The functions used for this are called LEAD() (for the next value) and LAG() (for the previous value).

This is best explained by example:

-- Use this common table expression as a data source
-- containing all the distinct years: 2009-2012
WITH years AS (
  SELECT DISTINCT year
  FROM countries
)
SELECT
  FIRST_VALUE (year)    OVER w_year AS first,
  LEAD        (year, 2) OVER w_year AS lead2,
  LEAD        (year)    OVER w_year AS lead1,
  year,
  LAG         (year)    OVER w_year AS lag1,
  LAG         (year, 2) OVER w_year AS lag2,
  LAST_VALUE  (year)    OVER w_year AS last
FROM
  years
WINDOW
  w_year AS (
    ORDER BY year DESC
    ROWS BETWEEN UNBOUNDED PRECEDING
             AND UNBOUNDED FOLLOWING
  )
ORDER BY year

The result is now simply:

first  lead2  lead1  year   lag1   lag2   last
----------------------------------------------
2012                 2009   2010   2011   2009
2012          2009   2010   2011   2012   2009
2012   2009   2010   2011   2012          2009
2012   2010   2011   2012                 2009

LEAD() and LAG() are really the best window functions to help understand the whole concept of window functions. For each year, you can see immediately how the previous and next year in the same window and frame can be generated using very simple function calls.

This could be used, for instance, to find the “neighboring” countries in terms of GDP per capita for every country / year:

SELECT
  year,
  code,
  gdp_per_capita,
  LEAD (code)           OVER w_gdp AS runner_up_code,
  LEAD (gdp_per_capita) OVER w_gdp AS runner_up_gdp,
  LAG  (code)           OVER w_gdp AS leader_code,
  LAG  (gdp_per_capita) OVER w_gdp AS leader_gdp
FROM
  countries
WINDOW
  w_gdp AS (PARTITION BY year ORDER BY gdp_per_capita DESC)
ORDER BY year DESC, gdp_per_capita DESC

Which returns:

year   country      runner-up    leader
------------------------------------------
2012   CA  52409    US  51755
2012   US  51755    JP  46548    CA  52409
2012   JP  46548    DE  42598    US  51755
2012   DE  42598    FR  39759    JP  46548
2012   FR  39759    GB  38649    DE  42598
2012   GB  38649    IT  33814    FR  39759
2012   IT  33814    RU  14091    GB  38649
2012   RU  14091                 IT  33814

2011   CA  51791    US  49855
2011   US  49855    JP  46204    CA  51791
2011   JP  46204    DE  44355    US  49855
2011   DE  44355    FR  42578    JP  46204
2011   FR  42578    GB  38927    DE  44355
2011   GB  38927    IT  36988    FR  42578
2011   IT  36988    RU  13324    GB  38927
2011   RU  13324                 IT  36988

2010   US  48358    CA  47465
2010   CA  47465    JP  43118    US  48358
2010   JP  43118    DE  40408    CA  47465
2010   DE  40408    FR  39448    JP  43118
2010   FR  39448    GB  36573    DE  40408
2010   GB  36573    IT  34673    FR  39448
2010   IT  34673    RU  10710    GB  36573
2010   RU  10710                 IT  34673

2009   US  46999    CA  40764
2009   CA  40764    FR  40488    US  46999
2009   FR  40488    DE  40270    CA  40764
2009   DE  40270    JP  39473    FR  40488
2009   JP  39473    IT  35724    DE  40270
2009   IT  35724    GB  35455    JP  39473
2009   GB  35455    RU   8616    IT  35724
2009   RU   8616                 GB  35455

If you want to do more fancy analyses, you could now compare percentages between leaders and runner-ups, etc. Another great use-case for LEAD() and LAG() can be seen in this article.
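
As a sketch of the percentage comparison just mentioned (the column names leader_code and pct_of_leader are made up for illustration), the following compares each country’s GDP per capita with that of the country ranked directly above it in the same year:

SELECT
  year,
  code,
  gdp_per_capita,
  LAG (code) OVER w_gdp AS leader_code,
  -- this country's GDP per capita as a percentage of the country
  -- ranked directly above it (NULL for the yearly leader)
  ROUND(100 * gdp_per_capita / LAG (gdp_per_capita) OVER w_gdp, 2)
    AS pct_of_leader
FROM
  countries
WINDOW
  w_gdp AS (PARTITION BY year ORDER BY gdp_per_capita DESC)
ORDER BY year DESC, gdp_per_capita DESC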

Conclusion

Window functions are an incredibly powerful feature that is available from all major commercial databases, and also from a couple of Open Source databases like PostgreSQL, Firebird, and CUBRID. There has essentially been SQL before window functions, and SQL after window functions.

With jOOQ, you can leverage window functions in a type safe way, like everything else related to SQL. The last query we’ve seen can be written simply like this:

// Static import the generated tables and all
// of jOOQ's functions from DSL
import static org.jooq.example.db.postgres.Tables.*;
import static org.jooq.impl.DSL.*;

// Shorten the table reference by aliasing
Countries c = COUNTRIES;

// Specify a window definition
WindowDefinition w_gdp = 
  name("w_gdp").as(
    partitionBy(c.YEAR)
      .orderBy(c.GDP_PER_CAPITA.desc())
  );

// Write the query as if it were native SQL
System.out.println(
    DSL.using(conn)
       .select(
           c.YEAR,
           c.CODE,
           c.GDP_PER_CAPITA,
           lead(c.CODE)          .over(w_gdp).as("runner_up_code"),
           lead(c.GDP_PER_CAPITA).over(w_gdp).as("runner_up_gdp"),
           lag (c.CODE)          .over(w_gdp).as("leader_code"),
           lag (c.GDP_PER_CAPITA).over(w_gdp).as("leader_gdp")
       )
       .from(c)
       .window(w_gdp)
       .orderBy(c.YEAR.desc(), c.GDP_PER_CAPITA.desc())
       .fetch()
);

The above program will output

+----+----+--------------+--------------+-------------+-----------+----------+
|year|code|gdp_per_capita|runner_up_code|runner_up_gdp|leader_code|leader_gdp|
+----+----+--------------+--------------+-------------+-----------+----------+
|2012|CA  |      52409.00|US            |     51755.00|{null}     |    {null}|
|2012|US  |      51755.00|JP            |     46548.00|CA         |  52409.00|
|2012|JP  |      46548.00|DE            |     42598.00|US         |  51755.00|
|2012|DE  |      42598.00|FR            |     39759.00|JP         |  46548.00|
|2012|FR  |      39759.00|GB            |     38649.00|DE         |  42598.00|
|2012|GB  |      38649.00|IT            |     33814.00|FR         |  39759.00|
|2012|IT  |      33814.00|RU            |     14091.00|GB         |  38649.00|
|2012|RU  |      14091.00|{null}        |       {null}|IT         |  33814.00|
|2011|CA  |      51791.00|US            |     49855.00|{null}     |    {null}|
|2011|US  |      49855.00|JP            |     46204.00|CA         |  51791.00|
|2011|JP  |      46204.00|DE            |     44355.00|US         |  49855.00|
|2011|DE  |      44355.00|FR            |     42578.00|JP         |  46204.00|
|2011|FR  |      42578.00|GB            |     38927.00|DE         |  44355.00|
|2011|GB  |      38927.00|IT            |     36988.00|FR         |  42578.00|
|2011|IT  |      36988.00|RU            |     13324.00|GB         |  38927.00|
|2011|RU  |      13324.00|{null}        |       {null}|IT         |  36988.00|
|2010|US  |      48358.00|CA            |     47465.00|{null}     |    {null}|
|2010|CA  |      47465.00|JP            |     43118.00|US         |  48358.00|
|2010|JP  |      43118.00|DE            |     40408.00|CA         |  47465.00|
|2010|DE  |      40408.00|FR            |     39448.00|JP         |  43118.00|
|2010|FR  |      39448.00|GB            |     36573.00|DE         |  40408.00|
|2010|GB  |      36573.00|IT            |     34673.00|FR         |  39448.00|
|2010|IT  |      34673.00|RU            |     10710.00|GB         |  36573.00|
|2010|RU  |      10710.00|{null}        |       {null}|IT         |  34673.00|
|2009|US  |      46999.00|CA            |     40764.00|{null}     |    {null}|
|2009|CA  |      40764.00|FR            |     40488.00|US         |  46999.00|
|2009|FR  |      40488.00|DE            |     40270.00|CA         |  40764.00|
|2009|DE  |      40270.00|JP            |     39473.00|FR         |  40488.00|
|2009|JP  |      39473.00|IT            |     35724.00|DE         |  40270.00|
|2009|IT  |      35724.00|GB            |     35455.00|JP         |  39473.00|
|2009|GB  |      35455.00|RU            |      8616.00|IT         |  35724.00|
|2009|RU  |       8616.00|{null}        |       {null}|GB         |  35455.00|
+----+----+--------------+--------------+-------------+-----------+----------+


No matter whether you’re using jOOQ for your database integration, or just plain SQL – start using window functions today.

Liked this article?

Read more about how ROW_NUMBER(), RANK(), and DENSE_RANK() work.

jOOQ Newsletter: April 16, 2014 – Monthly, Yearly, Perpetual licenses now available

Subscribe to this newsletter here

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world. Here are two of them:

Mahmud, who cannot wait to make more magic with jOOQ:

https://twitter.com/bigthingist/status/455985890125287424

Peter Kopfler, who, after hearing about jOOQ and SQL in Vienna, is thrilled to take a deep dive into the awesome features of PostgreSQL.

Thanks for the shouts, guys!

New license models – now available

We’ve done all the legal work and we’re happy to announce that we’re now ready to offer you a new set of alternative licensing options! For each of the jOOQ Express, jOOQ Professional, and jOOQ Enterprise licenses, you may now purchase any of the following subscriptions:

  • A new monthly subscription for short-running tasks, such as DB migrations
  • The existing yearly subscription for default use-cases
  • A new major release perpetual license for long-running jOOQ 3.x integrations with little need for upgrades

We would like to thank our customers who have been giving us great feedback on our licensing model, and those of you who have been eagerly waiting for the perpetual license.

Can’t wait? Download your copy of jOOQ now

Are you an existing customer of the jOOQ yearly subscription interested in a switch to other terms? We’ll offer you a 50% refund discount on your existing yearly subscription, should you choose to switch to the perpetual license by the end of April.

Contact sales for a tailor-made license migration discount.

Internet Explorer 8 support on our website

No one loves the old Internet Explorer versions, agreed, but that is not a reason not to support them. We’ve finally re-worked our manual and the rest of our website to also support Internet Explorer 8. We had jumped on the HTML5 train prematurely, which is why many of our customers in the banking sector, who cannot upgrade their browsers or switch to Firefox, had to go through hassles to read the jOOQ manual.

We would like to apologise for any inconvenience this has caused some of you! If you encounter any issues with our website, please drop us a note, and we’ll fix it immediately.

Community Zone – Another great article by Petri Kainulainen

It’s hard to believe, but Petri Kainulainen (author of a variety of books and tutorials on Spring) has done it again! And he did it even better than before. We’re very proud to present to you part 3 of his great jOOQ / Spring tutorial. This time:

CRUD is a very important part of your application, and getting it right is essential to save time and money on your development efforts. jOOQ implements an ActiveRecord-like pattern, similar to Ruby’s ActiveRecord. In his article, Petri shows how to tie these records to Spring’s Repository pattern. Convince yourselves! And while you’re at it, don’t miss Petri’s other two tutorials.

SQL Zone – Window Functions – A Must-Have Tool

There is SQL before window functions and SQL after window functions. If you’re fortunate enough to use a commercial database, or PostgreSQL, then you get to enjoy the merits of one of the greatest SQL features that have ever been standardised (into SQL:2003).

We often blog about window functions, and when we go to conferences to talk about jOOQ or about SQL, window functions are all over our slides.

CUME_DIST()

In this blog post, we show you the great CUME_DIST() function, which is essentially the same as ROW_NUMBER() divided by the number of rows. So, if you ever need to indicate the position of your row within the whole result set as a percentage, CUME_DIST() is your weapon of choice.
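
As a minimal sketch (reusing the countries fact table from the window functions article above), the following shows both expressions side by side. For distinct values they match; with ties, CUME_DIST() counts all peers of the current row, so the two columns can differ:

SELECT
  code,
  gdp_per_capita,
  CUME_DIST()  OVER (ORDER BY gdp_per_capita) AS cume_dist,
  -- ROW_NUMBER() divided by the total number of rows
  ROW_NUMBER() OVER (ORDER BY gdp_per_capita) * 1.0
    / COUNT(*) OVER () AS row_number_ratio
FROM countries
WHERE year = 2012
ORDER BY gdp_per_capita;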

LEAD() and LAG()

Just yesterday, we were able to solve a very fun data problem for our friends at FanPictor, a startup neighbouring our offices. In an Excel export of their stadium data, they wanted to group blocks of similar colours and create delimiters at the beginning and at the end of each block. Essentially, they wanted to create instructions like “The next five seats are red, the next two seats are white, the next ten seats are red”. This can be done very easily using the awesome LEAD() and LAG() functions.
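
The following is a hedged sketch of that technique on a hypothetical seats(seat_no, colour) table (not FanPictor’s actual schema or query): a LAG() comparison marks the start of each colour block, and a running SUM() over those markers numbers the blocks, which can then be grouped.

-- seats(seat_no, colour) is a made-up table for illustration
WITH marked AS (
  SELECT
    seat_no,
    colour,
    -- 1 whenever the colour changes from the previous seat
    CASE WHEN colour IS DISTINCT FROM
              LAG (colour) OVER (ORDER BY seat_no)
         THEN 1 ELSE 0 END AS block_start
  FROM seats
)
SELECT
  colour,
  MIN (seat_no) AS first_seat,
  MAX (seat_no) AS last_seat,
  COUNT (*)     AS seats_in_block
FROM (
  -- running sum of block starts = block number
  SELECT
    seat_no,
    colour,
    SUM (block_start) OVER (ORDER BY seat_no) AS block_no
  FROM marked
) blocks
GROUP BY block_no, colour
ORDER BY first_seat;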

Upcoming Events

After a great JUG Saxony Day in Dresden and an awesome Java/Scala/jOOQ/SQL talk at VSUG in Vienna, we’re looking forward to a couple of great conferences in May / June.

Have you missed any of our previous jOOQ talks? Soon you’ll get another chance to hear us talk about jOOQ or SQL in general at one of our upcoming events.

Stay informed about 2014 events on www.jooq.org/news.