Here are 10 SQL Tricks That You Didn’t Think Were Possible:
Introduction
In order to understand the value of these 10 SQL tricks, it is first important to understand the context of the SQL language. Why do I talk about SQL at Java conferences? (and I’m usually the only one!) This is why:
- Establish a formal foreign key relationship between the tables (this tells the database that every address is guaranteed to have a corresponding user)
- Add an index on the search field, the country (this tells the database that specific countries can be found in O(log N) instead of O(N))
1. Everything is a Table
This is the most trivial of tricks, and not even really a trick, but it is fundamental to a thorough understanding of SQL: Everything is a table! When you see a SQL statement like this:
SELECT *
FROM person
There is the table person, sitting right there in the FROM
clause. That's cool, that is a table. But did you realise that the whole statement is also a table? For instance, you can write:
SELECT *
FROM (
SELECT *
FROM person
) t
We've just used a SELECT statement in a FROM clause.
That's trivial but, if you think about it, quite elegant. You can also create ad-hoc, in-memory tables with the VALUES()
constructor, like this, in some databases (e.g. PostgreSQL, SQL Server):
SELECT *
FROM (
VALUES(1),(2),(3)
) t(a)
a
---
1
2
3

If that clause is not supported, you can revert to derived tables, e.g. in Oracle:
SELECT *
FROM (
SELECT 1 AS a FROM DUAL UNION ALL
SELECT 2 AS a FROM DUAL UNION ALL
SELECT 3 AS a FROM DUAL
) t
To see that VALUES()
and derived tables are really the same thing, conceptually, let's review the INSERT
statement, which comes in two flavours:
-- SQL Server, PostgreSQL, some others:
INSERT INTO my_table(a)
VALUES(1),(2),(3);
-- Oracle, many others:
INSERT INTO my_table(a)
SELECT 1 AS a FROM DUAL UNION ALL
SELECT 2 AS a FROM DUAL UNION ALL
SELECT 3 AS a FROM DUAL
In both cases, we insert a table into another table, which is all that INSERT
really does.
Everything is a table. In PostgreSQL, even functions are tables:
SELECT *
FROM substring('abcde', 2, 3)
substring
---------
bcd

If you're programming in Java, you can use the analogy of the Java 8 Stream API to take this one step further. Consider the following equivalent concepts:
TABLE          : Stream<Tuple<..>>
SELECT         : map()
DISTINCT       : distinct()
JOIN           : flatMap()
WHERE / HAVING : filter()
GROUP BY       : collect()
ORDER BY       : sorted()
UNION ALL      : concat()

With Java 8, "everything is a Stream" (as soon as you start working with Streams, at least). No matter how you transform a stream, e.g. with map() or filter(), the resulting type is always a Stream again.
We've written an entire article to explain this more deeply, and to compare the Stream API with SQL: Common SQL Clauses and Their Equivalents in Java 8 Streams.

And if you're looking for "better streams" (i.e. streams with even more SQL semantics), do check out jOOλ, an open source library that brings SQL window functions to Java.
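The analogy can be made concrete even without Java. Here is a minimal sketch in Python (standing in for the Java 8 Stream API), where map(), filter() and sorted() play the roles of SELECT, WHERE and ORDER BY; the people list is a made-up stand-in for a table:

```python
# A made-up "table" of people
people = [
    {"name": "Alice", "country": "CH"},
    {"name": "Bob",   "country": "DE"},
    {"name": "Carol", "country": "CH"},
]

# SELECT name FROM people WHERE country = 'CH' ORDER BY name
result = sorted(
    map(lambda p: p["name"],                    # SELECT : map()
        filter(lambda p: p["country"] == "CH",  # WHERE  : filter()
               people)))                        # FROM   : the source table
print(result)  # -> ['Alice', 'Carol']
```

No matter how you transform the iterable, the result is again an iterable, just like every SQL statement is again a table.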
2. Data Generation with Recursive SQL
Common Table Expressions (also: CTE, also referred to as subquery factoring, e.g. in Oracle) are the only way to declare variables in SQL (apart from the obscure WINDOW
clause that only PostgreSQL and Sybase SQL Anywhere know).
This is a powerful concept. Extremely powerful. Consider the following statement:
-- Table variables
WITH
t1(v1, v2) AS (SELECT 1, 2),
t2(w1, w2) AS (
SELECT v1 * 2, v2 * 2
FROM t1
)
SELECT *
FROM t1, t2
v1 v2 w1 w2
-----------
 1  2  2  4

Using the simple WITH
clause, you can specify a list of table variables (remember: everything is a table), which may even depend on each other.
That is easy to understand. This makes CTE (Common Table Expressions) already very useful, but what’s really really awesome is that they’re allowed to be recursive! Consider the following PostgreSQL example:
WITH RECURSIVE t(v) AS (
SELECT 1 -- Seed Row
UNION ALL
SELECT v + 1 -- Recursion
FROM t
)
SELECT v
FROM t
LIMIT 5
v
---
1
2
3
4
5

How does it work? It's relatively easy, once you see through the many keywords. You define a common table expression that has exactly two subqueries, combined with UNION ALL.
The first UNION ALL
subquery is what I usually call the “seed row”. It “seeds” (initialises) the recursion. It can produce one or several rows on which we will recurse afterwards. Remember: everything is a table, so our recursion will happen on a whole table, not on an individual row/value.
The second UNION ALL
subquery is where the recursion happens. If you look closely, you will observe that it selects from t
. I.e. the second subquery is allowed to select from the very CTE that we’re about to declare. Recursively. It thus has also access to the column v
, which is being declared by the CTE that already uses it.
In our example, we seed the recursion with the row (1)
, and then recurse by adding v + 1
. The recursion is then stopped at the use-site by setting a LIMIT 5
(beware of potentially infinite recursions – just like with Java 8 Streams).
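This recursive CTE runs almost unchanged in most databases. Here's a self-contained sketch using SQLite through Python's sqlite3 module; the only adjustment is that the stop condition is moved into the recursive subquery (WHERE v < 5), which is the most portable way to avoid infinite recursion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE t(v) AS (
      SELECT 1          -- Seed Row
      UNION ALL
      SELECT v + 1      -- Recursion
      FROM t
      WHERE v < 5       -- Stop condition inside the CTE
    )
    SELECT v FROM t
""").fetchall()
print([v for (v,) in rows])  # -> [1, 2, 3, 4, 5]
```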
Side note: Turing completeness
Recursive CTEs make SQL:1999 Turing complete, which means that any program can be written in SQL (if you're crazy enough)!
One impressive example that frequently shows up on blogs: The Mandelbrot Set, e.g. as displayed on http://explainextended.com/2013/12/31/happy-new-year-5/
WITH RECURSIVE q(r, i, rx, ix, g) AS (
SELECT r::DOUBLE PRECISION * 0.02, i::DOUBLE PRECISION * 0.02,
.0::DOUBLE PRECISION , .0::DOUBLE PRECISION, 0
FROM generate_series(-60, 20) r, generate_series(-50, 50) i
UNION ALL
SELECT r, i, CASE WHEN abs(rx * rx + ix * ix) <= 2 THEN rx * rx - ix * ix END + r,
CASE WHEN abs(rx * rx + ix * ix) <= 2 THEN 2 * rx * ix END + i, g + 1
FROM q
WHERE rx IS NOT NULL AND g < 99
)
SELECT array_to_string(array_agg(s ORDER BY r), '')
FROM (
SELECT i, r, substring(' .:-=+*#%@', max(g) / 10 + 1, 1) s
FROM q
GROUP BY i, r
) q
GROUP BY i
ORDER BY i
.-.:-.......==..*.=.::-@@@@@:::.:.@..*-. =. ...=...=...::+%.@:@@@@@@@@@@@@@+*#=.=:+-. ..- .:.:=::*....@@@@@@@@@@@@@@@@@@@@@@@@=@@.....::...:. ...*@@@@=.@:@@@@@@@@@@@@@@@@@@@@@@@@@@=.=....:...::. .::@@@@@:-@@@@@@@@@@@@@@@@@@@@@@@@@@@@:@..-:@=*:::. .-@@@@@-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.=@@@@=..: ...@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:@@@@@:.. ....:-*@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:: .....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@-.. .....@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@-:... .--:+.@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@... .==@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@-.. ..+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@-#. ...=+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.. -.=-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@..: .*%:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:@- . ..:... ..-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ .............. ....-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@%@= .--.-.....-=.:..........::@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.. ..=:-....=@+..=.........@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:. .:+@@::@==@-*:%:+.......:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@. ::@@@-@@@@@@@@@-:=.....:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@: .:@@@@@@@@@@@@@@@=:.....%@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ .:@@@@@@@@@@@@@@@@@-...:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:- :@@@@@@@@@@@@@@@@@@@-..%@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@. %@@@@@@@@@@@@@@@@@@@-..-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@. @@@@@@@@@@@@@@@@@@@@@::+@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@+ @@@@@@@@@@@@@@@@@@@@@@:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.. @@@@@@@@@@@@@@@@@@@@@@-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@- @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@.Impressive, huh?
3. Running Total Calculations
This blog is full of running total examples. They're some of the most educational examples to learn about advanced SQL, because there are at least a dozen ways to implement a running total. A running total is easy to understand, conceptually.
The most elegant way to calculate a running total is with window functions. What are window functions?

Window functions are aggregations / rankings on a subset of rows, relative to the current row being transformed by SELECT.

That's it. :) What it essentially means is that a window function can perform calculations on rows that are "above" or "below" the current row. Unlike ordinary aggregations and GROUP BY
, however, they don't transform the rows, which makes them very useful.
The syntax can be summarised as follows, with individual parts being optional:
function(...) OVER (
PARTITION BY ...
ORDER BY ...
ROWS BETWEEN ... AND ...
)
The function is followed by an OVER()
clause, which specifies the window. I.e. this OVER()
clause defines:
- The PARTITION: Only rows that are in the same partition as the current row will be considered for the window
- The ORDER: The window can be ordered independently of what we're selecting
- The ROWS (or RANGE) frame definition: The window can be restricted to a fixed amount of rows "ahead" and "behind"
Let's look at some bank account transaction data:

| ID   | VALUE_DATE | AMOUNT | BALANCE  |
|------|------------|--------|----------|
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |

Let's assume that BALANCE
is what we want to calculate from AMOUNT.
Intuitively, we can immediately see that the following holds true:

TOP_BALANCE - SUM(AMOUNT) OVER (
  "all the rows on top of the current row"
)

In real SQL, that would then be written as follows:
SUM(t.amount) OVER (
PARTITION BY t.account_id
ORDER BY t.value_date DESC,
t.id DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING
)
- The partition will calculate the sum for each bank account, not for the entire data set
- The ordering will guarantee that transactions are ordered (within the partition) prior to summing
- The rows clause will consider only preceding rows (within the partition, given the ordering) prior to summing
The window function is applied to the rows that remain after the FROM .. WHERE
etc. clauses have been evaluated, and is thus extremely fast.
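Here's a runnable sketch of the same balance calculation, using SQLite via Python's sqlite3 module (window functions need SQLite 3.25+). The sample rows and the top balance of 19985.81 are taken from the table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trx (id INT, account_id INT, value_date TEXT, amount REAL);
    INSERT INTO trx VALUES
      (9997, 1, '2014-03-18',  99.17),
      (9981, 1, '2014-03-16',  71.44),
      (9979, 1, '2014-03-16', -94.60);
""")
rows = conn.execute("""
    SELECT t.id,
           -- The top balance, minus the sum of all amounts "above" this row
           19985.81 - coalesce(SUM(t.amount) OVER (
             PARTITION BY t.account_id
             ORDER BY t.value_date DESC, t.id DESC
             ROWS BETWEEN UNBOUNDED PRECEDING
                      AND 1 PRECEDING
           ), 0) AS balance
    FROM trx t
    ORDER BY t.id DESC
""").fetchall()
print([(i, round(b, 2)) for (i, b) in rows])
# -> [(9997, 19985.81), (9981, 19886.64), (9979, 19815.2)]
```

The coalesce() handles the topmost row, whose frame is empty and whose sum is therefore NULL.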
Intermezzo
Before we move on to all the other awesome tricks, consider this: We've seen

- (Recursive) Common Table Expressions (CTE)
- Window functions

Both of these features are:

- Awesome
- Extremely powerful
- Declarative
- Part of the SQL standard
- Available in most popular RDBMS
- Very important building blocks

4. Finding the Largest Series with no Gaps
Stack Overflow has this very nice feature to motivate people to stay on their website for as long as possible: badges. For example, the "Enthusiast" badge, awarded for visiting the site for 30 consecutive days. Given a raw log of login timestamps, how do we find the largest series of consecutive days?

| LOGIN_TIME          |
|---------------------|
| 2014-03-18 05:37:13 |
| 2014-03-16 08:31:47 |
| 2014-03-16 06:11:17 |
| 2014-03-16 05:59:33 |
| 2014-03-15 11:17:28 |
| 2014-03-15 10:00:11 |
| 2014-03-15 07:45:27 |
| 2014-03-15 07:42:19 |
| 2014-03-14 09:38:12 |

That doesn't help much. Let's remove the hours from the timestamp. That's easy:
SELECT DISTINCT
cast(login_time AS DATE) AS login_date
FROM logins
WHERE user_id = :user_id
| LOGIN_DATE |
|------------|
| 2014-03-18 |
| 2014-03-16 |
| 2014-03-15 |
| 2014-03-14 |

Now that we've learned about window functions, let's just add a simple row number to each of these dates:
SELECT
login_date,
row_number() OVER (ORDER BY login_date)
FROM login_dates
| LOGIN_DATE | RN |
|------------|----|
| 2014-03-18 |  4 |
| 2014-03-16 |  3 |
| 2014-03-15 |  2 |
| 2014-03-14 |  1 |

Still easy. Now, what happens if, instead of selecting these values separately, we subtract them?
SELECT
  login_date,
  row_number() OVER (ORDER BY login_date) AS rn,
  login_date -
  row_number() OVER (ORDER BY login_date) AS grp
FROM login_dates
| LOGIN_DATE | RN | GRP        |
|------------|----|------------|
| 2014-03-18 |  4 | 2014-03-14 |
| 2014-03-16 |  3 | 2014-03-13 |
| 2014-03-15 |  2 | 2014-03-13 |
| 2014-03-14 |  1 | 2014-03-13 |

Wow. Interesting. So, 14 - 1 = 13, 15 - 2 = 13, 16 - 3 = 13, but 18 - 4 = 14. No one can say it better than Doge:

The simple observations here are:

- ROW_NUMBER() never has gaps. That's how it's defined
- Our data, however, does

So, when we subtract the gapless row number from our (possibly gappy) login dates, every group of consecutive dates yields the same, constant date value. We can now simply GROUP BY
this arbitrary date value:
SELECT
min(login_date), max(login_date),
max(login_date) -
min(login_date) + 1 AS length
FROM login_date_groups
GROUP BY grp
ORDER BY length DESC
| MIN        | MAX        | LENGTH |
|------------|------------|--------|
| 2014-03-14 | 2014-03-16 |      3 |
| 2014-03-18 | 2014-03-18 |      1 |

With the full query being:
WITH
login_dates AS (
SELECT DISTINCT cast(login_time AS DATE) login_date
FROM logins WHERE user_id = :user_id
),
login_date_groups AS (
SELECT
login_date,
login_date - row_number() OVER (ORDER BY login_date) AS grp
FROM login_dates
)
SELECT
min(login_date), max(login_date),
max(login_date) - min(login_date) + 1 AS length
FROM login_date_groups
GROUP BY grp
ORDER BY length DESC
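The full query ports to other databases with minor changes. Here's a sketch for SQLite (via Python's sqlite3 module), where date arithmetic has to go through julianday() because dates are stored as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INT, login_time TEXT);
    INSERT INTO logins VALUES
      (1, '2014-03-18 05:37:13'),
      (1, '2014-03-16 08:31:47'),
      (1, '2014-03-16 06:11:17'),
      (1, '2014-03-15 11:17:28'),
      (1, '2014-03-14 09:38:12');
""")
rows = conn.execute("""
    WITH
    login_dates AS (
      SELECT DISTINCT date(login_time) AS login_date
      FROM logins WHERE user_id = 1
    ),
    login_date_groups AS (
      SELECT
        login_date,
        -- Dates are text in SQLite, so subtract via julianday()
        julianday(login_date) - row_number() OVER (ORDER BY login_date) AS grp
      FROM login_dates
    )
    SELECT
      min(login_date), max(login_date),
      CAST(julianday(max(login_date))
         - julianday(min(login_date)) + 1 AS INT) AS length
    FROM login_date_groups
    GROUP BY grp
    ORDER BY length DESC
""").fetchall()
print(rows)
# -> [('2014-03-14', '2014-03-16', 3), ('2014-03-18', '2014-03-18', 1)]
```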

5. Finding the Length of a Series
Previously, we had seen series of consecutive values. That's easy to deal with, as we can abuse the consecutiveness of integers. What if the definition of a "series" is less intuitive and, in addition to that, several series contain the same values? Consider the following data, where LENGTH
is the length of each series that we want to calculate:
| ID   | VALUE_DATE | AMOUNT | LENGTH |
|------|------------|--------|--------|
| 9997 | 2014-03-18 |  99.17 |      2 |
| 9981 | 2014-03-16 |  71.44 |      2 |
| 9979 | 2014-03-16 | -94.60 |      3 |
| 9977 | 2014-03-16 |  -6.96 |      3 |
| 9971 | 2014-03-15 | -65.95 |      3 |
| 9964 | 2014-03-15 |  15.13 |      2 |
| 9962 | 2014-03-15 |  17.47 |      2 |
| 9960 | 2014-03-15 |  -3.55 |      1 |
| 9959 | 2014-03-14 |  32.00 |      1 |

Yes, you've guessed right. A "series" is defined by the fact that consecutive (ordered by ID) rows have the same SIGN(AMOUNT). Check again the data formatted as below:
| ID   | VALUE_DATE | AMOUNT | LENGTH |
|------|------------|--------|--------|
| 9997 | 2014-03-18 | +99.17 |      2 |
| 9981 | 2014-03-16 | +71.44 |      2 |
| 9979 | 2014-03-16 | -94.60 |      3 |
| 9977 | 2014-03-16 | - 6.96 |      3 |
| 9971 | 2014-03-15 | -65.95 |      3 |
| 9964 | 2014-03-15 | +15.13 |      2 |
| 9962 | 2014-03-15 | +17.47 |      2 |
| 9960 | 2014-03-15 | - 3.55 |      1 |
| 9959 | 2014-03-14 | +32.00 |      1 |

How do we do it? "Easy" ;) First, let's get rid of all the noise and add another row number:
SELECT
id, amount,
sign(amount) AS sign,
row_number()
OVER (ORDER BY id DESC) AS rn
FROM trx
| ID   | AMOUNT | SIGN | RN |
|------|--------|------|----|
| 9997 |  99.17 |    1 |  1 |
| 9981 |  71.44 |    1 |  2 |
| 9979 | -94.60 |   -1 |  3 |
| 9977 |  -6.96 |   -1 |  4 |
| 9971 | -65.95 |   -1 |  5 |
| 9964 |  15.13 |    1 |  6 |
| 9962 |  17.47 |    1 |  7 |
| 9960 |  -3.55 |   -1 |  8 |
| 9959 |  32.00 |    1 |  9 |

Now, the next goal is to produce the following table:
| ID   | AMOUNT | SIGN | RN | LO | HI |
|------|--------|------|----|----|----|
| 9997 |  99.17 |    1 |  1 |  1 |    |
| 9981 |  71.44 |    1 |  2 |    |  2 |
| 9979 | -94.60 |   -1 |  3 |  3 |    |
| 9977 |  -6.96 |   -1 |  4 |    |    |
| 9971 | -65.95 |   -1 |  5 |    |  5 |
| 9964 |  15.13 |    1 |  6 |  6 |    |
| 9962 |  17.47 |    1 |  7 |    |  7 |
| 9960 |  -3.55 |   -1 |  8 |  8 |  8 |
| 9959 |  32.00 |    1 |  9 |  9 |  9 |

In this table, we want to copy the row number value into "LO" at the "lower" end of a series, and into "HI" at the "upper" end of a series. For this, we'll be using the magical LEAD() and LAG() functions. LEAD() can access the n-th next row from the current row, whereas LAG() can access the n-th previous row from the current row. For example:
SELECT
lag(v) OVER (ORDER BY v),
v,
lead(v) OVER (ORDER BY v)
FROM (
VALUES (1), (2), (3), (4)
) t(v)
| LAG | V | LEAD |
|-----|---|------|
|     | 1 |    2 |
|   1 | 2 |    3 |
|   2 | 3 |    4 |
|   3 | 4 |      |

Armed with LEAD() and LAG(), we can now write the following query:

SELECT
  trx.*,
  CASE WHEN lag(sign)
       OVER (ORDER BY id DESC) != sign
       THEN rn END AS lo,
  CASE WHEN lead(sign)
       OVER (ORDER BY id DESC) != sign
       THEN rn END AS hi
FROM trx
We compare the "previous" sign (lag(sign)) with the "current" sign (sign). If they're different, we put the row number in "LO", because that's the lower bound of our series.
Then we compare the "next" sign (lead(sign)) with the "current" sign (sign). If they're different, we put the row number in "HI", because that's the upper bound of our series.
Finally, a little boring NULL handling to get everything right, and we’re done:
SELECT -- With NULL handling...
  trx.*,
  CASE WHEN coalesce(lag(sign)
       OVER (ORDER BY id DESC), 0) != sign
       THEN rn END AS lo,
  CASE WHEN coalesce(lead(sign)
       OVER (ORDER BY id DESC), 0) != sign
       THEN rn END AS hi
FROM trx
Next, we want "LO" and "HI" to be filled in for every row of a series, not just at its bounds:

| ID   | AMOUNT | SIGN | RN | LO | HI |
|------|--------|------|----|----|----|
| 9997 |  99.17 |    1 |  1 |  1 |  2 |
| 9981 |  71.44 |    1 |  2 |  1 |  2 |
| 9979 | -94.60 |   -1 |  3 |  3 |  5 |
| 9977 |  -6.96 |   -1 |  4 |  3 |  5 |
| 9971 | -65.95 |   -1 |  5 |  3 |  5 |
| 9964 |  15.13 |    1 |  6 |  6 |  7 |
| 9962 |  17.47 |    1 |  7 |  6 |  7 |
| 9960 |  -3.55 |   -1 |  8 |  8 |  8 |
| 9959 |  32.00 |    1 |  9 |  9 |  9 |

For this, we're using a feature that is available at least in Redshift, Sybase SQL Anywhere, DB2, Oracle: the "IGNORE NULLS" clause that can be passed to some window functions:
SELECT
trx.*,
last_value (lo) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS lo,
first_value(hi) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS hi
FROM trx
From any given "current" row, we look at all the "previous values" (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), but ignoring all the nulls. From those previous values, we take the last value, and that's our new "LO" value. In other words, we take the "closest preceding" "LO" value.
The same with “HI”. From any given “current” row, we look at all the “subsequent values” (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
), but ignoring all the nulls. From the subsequent values, we take the first value, and that’s our new “HI” value. In other words, we take the “closest following” “HI” value.
With a bit of defensive NULL handling, falling back to the current row number:
SELECT -- With NULL handling...
trx.*,
coalesce(last_value (lo) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW), rn) AS lo,
coalesce(first_value(hi) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING), rn) AS hi
FROM trx
The rest is simple arithmetic, as the length of a series is 1 + HI - LO:

SELECT
trx.*,
1 + hi - lo AS length
FROM trx
| ID   | AMOUNT | SIGN | RN | LO | HI | LENGTH |
|------|--------|------|----|----|----|--------|
| 9997 |  99.17 |    1 |  1 |  1 |  2 |      2 |
| 9981 |  71.44 |    1 |  2 |  1 |  2 |      2 |
| 9979 | -94.60 |   -1 |  3 |  3 |  5 |      3 |
| 9977 |  -6.96 |   -1 |  4 |  3 |  5 |      3 |
| 9971 | -65.95 |   -1 |  5 |  3 |  5 |      3 |
| 9964 |  15.13 |    1 |  6 |  6 |  7 |      2 |
| 9962 |  17.47 |    1 |  7 |  6 |  7 |      2 |
| 9960 |  -3.55 |   -1 |  8 |  8 |  8 |      1 |
| 9959 |  32.00 |    1 |  9 |  9 |  9 |      1 |

And the full query here:
WITH
trx1(id, amount, sign, rn) AS (
SELECT id, amount, sign(amount), row_number() OVER (ORDER BY id DESC)
FROM trx
),
trx2(id, amount, sign, rn, lo, hi) AS (
SELECT trx1.*,
CASE WHEN coalesce(lag(sign) OVER (ORDER BY id DESC), 0) != sign
THEN rn END,
CASE WHEN coalesce(lead(sign) OVER (ORDER BY id DESC), 0) != sign
THEN rn END
FROM trx1
)
SELECT
trx2.*, 1
- last_value (lo) IGNORE NULLS OVER (ORDER BY id DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
+ first_value(hi) IGNORE NULLS OVER (ORDER BY id DESC
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM trx2
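IGNORE NULLS isn't available everywhere. But since "LO" and "HI" contain increasing row numbers wherever they're non-NULL, the "closest preceding LO" is simply a running MAX(lo), and the "closest following HI" a running MIN(hi): aggregate functions skip NULLs anyway. A sketch in SQLite (via Python's sqlite3 module), with SIGN() emulated by a CASE expression for older SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trx (id INT, amount REAL);
    INSERT INTO trx VALUES
      (9997,  99.17), (9981,  71.44), (9979, -94.60),
      (9977,  -6.96), (9971, -65.95), (9964,  15.13),
      (9962,  17.47), (9960,  -3.55), (9959,  32.00);
""")
rows = conn.execute("""
    WITH
    trx1(id, amount, sign, rn) AS (
      SELECT id, amount,
             CASE WHEN amount >= 0 THEN 1 ELSE -1 END,  -- SIGN() emulation
             row_number() OVER (ORDER BY id DESC)
      FROM trx
    ),
    trx2(id, amount, sign, rn, lo, hi) AS (
      SELECT trx1.*,
        CASE WHEN coalesce(lag(sign)  OVER (ORDER BY id DESC), 0) != sign
             THEN rn END,
        CASE WHEN coalesce(lead(sign) OVER (ORDER BY id DESC), 0) != sign
             THEN rn END
      FROM trx1
    )
    SELECT id, amount, 1
      - max(lo) OVER (ORDER BY id DESC              -- closest preceding LO
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      + min(hi) OVER (ORDER BY id DESC              -- closest following HI
          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS length
    FROM trx2
    ORDER BY id DESC
""").fetchall()
print([length for (_, _, length) in rows])  # -> [2, 2, 3, 3, 3, 2, 2, 1, 1]
```

This MAX/MIN trick only works because the LO/HI values are monotonically increasing; it's not a general IGNORE NULLS replacement.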

6. The subset sum problem with SQL
This is my favourite! What is the subset sum problem? Find a fun explanation here: https://xkcd.com/287 And a boring one here: https://en.wikipedia.org/wiki/Subset_sum_problem

Essentially, for each of these totals…

| ID | TOTAL |
|----|-------|
|  1 | 25150 |
|  2 | 19800 |
|  3 | 27511 |

… we want to find the "best" (i.e. the closest) sum possible, consisting of any combination of these items:
| ID | ITEM  |
|----|-------|
|  1 |  7120 |
|  2 |  8150 |
|  3 |  8255 |
|  4 |  9051 |
|  5 |  1220 |
|  6 | 12515 |
|  7 | 13555 |
|  8 |  5221 |
|  9 |   812 |
| 10 |  6562 |

As you're all quick with your mental mathemagic processing, you have immediately calculated these to be the best sums:
| TOTAL | BEST  | CALCULATION                     |
|-------|-------|---------------------------------|
| 25150 | 25133 | 7120 + 8150 + 9051 + 812        |
| 19800 | 19768 | 1220 + 12515 + 5221 + 812       |
| 27511 | 27488 | 8150 + 8255 + 9051 + 1220 + 812 |

How to do it with SQL? Easy. Just create a CTE that contains all the 2^N *possible* sums and then find the closest one for each TOTAL
:
-- All the possible 2^N sums
WITH sums(sum, max_id, calc) AS (...)
-- Find the best sum per “TOTAL”
SELECT
totals.total,
something_something(total - sum) AS best,
something_something(total - sum) AS calc
FROM draw_the_rest_of_the_*bleep*_owl
Spelled out, here in Oracle syntax, using the FIRST / LAST KEEP functions to pick the row with the smallest difference:

WITH sums(sum, id, calc) AS (
SELECT item, id, to_char(item) FROM items
UNION ALL
SELECT item + sum, items.id, calc || ' + ' || item
FROM sums JOIN items ON sums.id < items.id
)
SELECT
totals.id,
totals.total,
min (sum) KEEP (
DENSE_RANK FIRST ORDER BY abs(total - sum)
) AS best,
min (calc) KEEP (
DENSE_RANK FIRST ORDER BY abs(total - sum)
) AS calc
FROM totals
CROSS JOIN sums
GROUP BY totals.id, totals.total
We've blogged about this in much more detail here: How to Find the Closest Subset Sum with SQL. Enjoy reading the details, but be sure to come back here for the remaining 4 tricks:
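The recursive sums CTE also works in databases that don't have KEEP; ordering by the absolute difference and taking the first row achieves the same for a single total. A sketch in SQLite via Python's sqlite3 module (beware: this materialises all 2^N - 1 sums, so it's strictly for small N):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INT, item INT);
    INSERT INTO items VALUES
      (1,  7120), (2,  8150), (3,  8255), (4, 9051), (5,  1220),
      (6, 12515), (7, 13555), (8,  5221), (9,  812), (10, 6562);
""")
best = conn.execute("""
    WITH RECURSIVE sums(sum, id, calc) AS (
      SELECT item, id, CAST(item AS TEXT) FROM items
      UNION ALL
      SELECT item + sum, items.id, calc || ' + ' || item
      FROM sums JOIN items ON sums.id < items.id
    )
    SELECT sum FROM sums
    ORDER BY abs(25150 - sum)   -- closest to the first TOTAL
    LIMIT 1
""").fetchone()
print(best[0])  # -> 25133
```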
7. Capping a Running Total
So far, we've seen how to calculate an "ordinary" running total with SQL, using window functions. That was easy. Now, how about if we cap the running total such that it never goes below zero? Essentially, we want to calculate this:

| DATE       | AMOUNT | TOTAL |
|------------|--------|-------|
| 2012-01-01 |    800 |   800 |
| 2012-02-01 |   1900 |  2700 |
| 2012-03-01 |   1750 |  4450 |
| 2012-04-01 | -20000 |     0 |
| 2012-05-01 |    900 |   900 |
| 2012-06-01 |   3900 |  4800 |
| 2012-07-01 |  -2600 |  2200 |
| 2012-08-01 |  -2600 |     0 |
| 2012-09-01 |   2100 |  2100 |
| 2012-10-01 |  -2400 |     0 |
| 2012-11-01 |   1100 |  1100 |
| 2012-12-01 |   1300 |  2400 |

So, when that big negative AMOUNT
of -20000
was subtracted, instead of displaying the real TOTAL
of -15550
, we simply display 0
. In other words (or data sets):
| DATE       | AMOUNT | TOTAL |
|------------|--------|-------|
| 2012-01-01 |    800 |   800 | GREATEST(0, 800)
| 2012-02-01 |   1900 |  2700 | GREATEST(0, 2700)
| 2012-03-01 |   1750 |  4450 | GREATEST(0, 4450)
| 2012-04-01 | -20000 |     0 | GREATEST(0, -15550)
| 2012-05-01 |    900 |   900 | GREATEST(0, 900)
| 2012-06-01 |   3900 |  4800 | GREATEST(0, 4800)
| 2012-07-01 |  -2600 |  2200 | GREATEST(0, 2200)
| 2012-08-01 |  -2600 |     0 | GREATEST(0, -400)
| 2012-09-01 |   2100 |  2100 | GREATEST(0, 2100)
| 2012-10-01 |  -2400 |     0 | GREATEST(0, -300)
| 2012-11-01 |   1100 |  1100 | GREATEST(0, 1100)
| 2012-12-01 |   1300 |  2400 | GREATEST(0, 2400)

How will we do it? Each capped total depends on the previous capped total, so a plain window function won't do. In Oracle, we can use the MODEL
clause instead. Just put MODEL
in your SELECT, and you're opening up a can of awesome SQL worms!
SELECT ... FROM some_table
-- Put this after any table
MODEL ...
Once we put MODEL
there, we can implement spreadsheet logic directly in our SQL statements, just as with Microsoft Excel.
The following three clauses are the most useful and widely used (i.e. 1-2 per year by anyone on this planet):
MODEL
-- The spreadsheet dimensions
DIMENSION BY ...
-- The spreadsheet cell type
MEASURES ...
-- The spreadsheet formulas
RULES ...
The DIMENSION BY
clause specifies the dimensions of your spreadsheet. Unlike in MS Excel, you can have any number of dimensions in Oracle:

The MEASURES
clause specifies the values that are available in each cell of your spreadsheet. Unlike in MS Excel, you can have a whole tuple in each cell in Oracle, not just a single value.

The RULES
clause specifies the formulas that apply to each cell in your spreadsheet. Unlike in MS Excel, these rules / formulas are centralised at a single place, instead of being put inside of each cell:

This design makes MODEL
a bit harder to use than MS Excel, but much more powerful, if you dare. The whole query will then be "trivially":
SELECT *
FROM (
SELECT date, amount, 0 AS total
FROM amounts
)
MODEL
DIMENSION BY (row_number() OVER (ORDER BY date) AS rn)
MEASURES (date, amount, total)
RULES (
total[any] = greatest(0,
coalesce(total[cv(rn) - 1], 0) + amount[cv(rn)])
)
For all the details, read the Oracle whitepaper on the MODEL clause: 10gr1-twp-bi-dw-sqlmodel-131067.pdf
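Outside of Oracle, there's no MODEL, but the capped running total can also be expressed with a recursive CTE: each step takes the previous capped total, adds the amount, and clamps at 0. A sketch in SQLite via Python's sqlite3 module, using the two-argument scalar max() in place of GREATEST():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amounts (d TEXT, amount INT);
    INSERT INTO amounts VALUES
      ('2012-01-01',    800), ('2012-02-01',  1900),
      ('2012-03-01',   1750), ('2012-04-01', -20000),
      ('2012-05-01',    900), ('2012-06-01',  3900);
""")
rows = conn.execute("""
    WITH RECURSIVE
    a(rn, d, amount) AS (
      SELECT row_number() OVER (ORDER BY d), d, amount FROM amounts
    ),
    capped(rn, d, amount, total) AS (
      SELECT rn, d, amount, max(0, amount)       -- clamp the first row
      FROM a WHERE rn = 1
      UNION ALL
      SELECT a.rn, a.d, a.amount,
             max(0, capped.total + a.amount)     -- clamp each next row
      FROM capped JOIN a ON a.rn = capped.rn + 1
    )
    SELECT d, amount, total FROM capped ORDER BY rn
""").fetchall()
print([t for (_, _, t) in rows])  # -> [800, 2700, 4450, 0, 900, 4800]
```

The recursion processes one row per step, so this won't be as fast as MODEL on large data sets, but it's portable.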
8. Time Series Pattern Recognition
If you're into fraud detection or any other field that runs real-time analytics on large data sets, time series pattern recognition is certainly not a new term to you. If we review the "length of a series" data set, we might want to generate triggers on complex events over our time series as such:

| ID   | VALUE_DATE | AMOUNT  | LEN | TRIGGER |
|------|------------|---------|-----|---------|
| 9997 | 2014-03-18 | + 99.17 |   1 |         |
| 9981 | 2014-03-16 | - 71.44 |   4 |         |
| 9979 | 2014-03-16 | - 94.60 |   4 |    x    |
| 9977 | 2014-03-16 | -  6.96 |   4 |         |
| 9971 | 2014-03-15 | - 65.95 |   4 |         |
| 9964 | 2014-03-15 | + 15.13 |   3 |         |
| 9962 | 2014-03-15 | + 17.47 |   3 |         |
| 9960 | 2014-03-15 | +  3.55 |   3 |         |
| 9959 | 2014-03-14 | - 32.00 |   1 |         |

The rule of the above trigger is:

Trigger on the 3rd repetition of an event if the event occurs more than 3 times.

Similar to the previous MODEL
clause, we can do this with an Oracle-specific clause that was added to Oracle 12c:
SELECT ... FROM some_table
-- Put this after any table to pattern-match
-- the table’s contents
MATCH_RECOGNIZE (...)
The simplest possible application of MATCH_RECOGNIZE
includes the following subclauses:
SELECT *
FROM series
MATCH_RECOGNIZE (
-- Pattern matching is done in this order
ORDER BY ...
-- These are the columns produced by matches
MEASURES ...
-- A short specification of what rows are
-- returned from each match
ALL ROWS PER MATCH
-- «Regular expressions» of events to match
PATTERN (...)
-- The definitions of «what is an event»
DEFINE ...
)
Our trigger rule from above can then be implemented as follows:
SELECT *
FROM series
MATCH_RECOGNIZE (
ORDER BY id
MEASURES classifier() AS trg
ALL ROWS PER MATCH
PATTERN (S (R X R+)?)
DEFINE
R AS sign(R.amount) = prev(sign(R.amount)),
X AS sign(X.amount) = prev(sign(X.amount))
)
- We order the table by ID, which is the order in which we want to match events. Easy.
- We then specify the values that we want as a result. We want the "MEASURE" trg, which is defined as the classifier, i.e. the literal that we'll use in the PATTERN afterwards. Plus, we want all the rows from a match.
- We then specify a regular expression-like pattern. The pattern is an event "S" for Start, followed optionally by "R" for Repeat, "X" for our special event X, followed by one or more "R" for repeat again. If the whole pattern matches, we get SRXR or SRXRR or SRXRRR, i.e. X will be at the third position of a series of length >= 4
- Finally, we define R and X as being the same thing: the event when SIGN(AMOUNT) of the current row is the same as SIGN(AMOUNT) of the previous row. We don't have to define "S". "S" is just any other row.
| ID   | VALUE_DATE | AMOUNT  | TRG |
|------|------------|---------|-----|
| 9997 | 2014-03-18 | + 99.17 |  S  |
| 9981 | 2014-03-16 | - 71.44 |  R  |
| 9979 | 2014-03-16 | - 94.60 |  X  |
| 9977 | 2014-03-16 | -  6.96 |  R  |
| 9971 | 2014-03-15 | - 65.95 |  S  |
| 9964 | 2014-03-15 | + 15.13 |  S  |
| 9962 | 2014-03-15 | + 17.47 |  S  |
| 9960 | 2014-03-15 | +  3.55 |  S  |
| 9959 | 2014-03-14 | - 32.00 |  S  |

We can see a single "X" in our event stream. Exactly where we had expected it: at the 3rd repetition of an event (same sign) in a series of length > 3. Boom!

As we don't really care about "S" and "R" events, let's just remove them as such:
SELECT
id, value_date, amount,
CASE trg WHEN 'X' THEN 'X' END trg
FROM series
MATCH_RECOGNIZE (
ORDER BY id
MEASURES classifier() AS trg
ALL ROWS PER MATCH
PATTERN (S (R X R+)?)
DEFINE
R AS sign(R.amount) = prev(sign(R.amount)),
X AS sign(X.amount) = prev(sign(X.amount))
)
| ID   | VALUE_DATE | AMOUNT  | TRG |
|------|------------|---------|-----|
| 9997 | 2014-03-18 | + 99.17 |     |
| 9981 | 2014-03-16 | - 71.44 |     |
| 9979 | 2014-03-16 | - 94.60 |  X  |
| 9977 | 2014-03-16 | -  6.96 |     |
| 9971 | 2014-03-15 | - 65.95 |     |
| 9964 | 2014-03-15 | + 15.13 |     |
| 9962 | 2014-03-15 | + 17.47 |     |
| 9960 | 2014-03-15 | +  3.55 |     |
| 9959 | 2014-03-14 | - 32.00 |     |

Thank you Oracle!
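If your database doesn't have MATCH_RECOGNIZE, the pattern part really is just a regular expression over an event stream, which can be sketched in plain Python: encode each row's SIGN(AMOUNT) as a character (rows ordered by ascending ID, as in the ORDER BY id above), then flag the 3rd row of every run of at least 4 equal signs:

```python
import re

# Amounts ordered by ascending ID: 9959, 9960, 9962, 9964, 9971,
# 9977, 9979, 9981, 9997 (the same data as above)
amounts = [-32.00, 3.55, 17.47, 15.13,
           -65.95, -6.96, -94.60, -71.44,
           99.17]
signs = ''.join('+' if a >= 0 else '-' for a in amounts)

trg = [''] * len(amounts)
for m in re.finditer(r'(.)\1{3,}', signs):  # runs of >= 4 equal signs
    trg[m.start() + 2] = 'X'                # trigger on the 3rd repetition
print(trg)
```

For this data set, the only "X" lands on the row with ID 9979, matching the MATCH_RECOGNIZE result. Note that this flags the 3rd row of maximal same-sign runs, which coincides with PATTERN (S (R X R+)?) here, although the SQL pattern language is far more general.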

9. Pivoting and Unpivoting
If you've read this far, the following will be almost embarrassingly simple. This is our data, i.e. actors, film titles, and film ratings:

| NAME      | TITLE           | RATING |
|-----------|-----------------|--------|
| A. GRANT  | ANNIE IDENTITY  | G      |
| A. GRANT  | DISCIPLE MOTHER | PG     |
| A. GRANT  | GLORY TRACY     | PG-13  |
| A. HUDSON | LEGEND JEDI     | PG     |
| A. CRONYN | IRON MOON       | PG     |
| A. CRONYN | LADY STAGE      | PG     |
| B. WALKEN | SIEGE MADRE     | R      |

This is what we call pivoting:
| NAME      | NC-17 | PG | G  | PG-13 | R |
|-----------|-------|----|----|-------|---|
| A. GRANT  |     3 |  6 |  5 |     3 | 1 |
| A. HUDSON |    12 |  4 |  7 |     9 | 2 |
| A. CRONYN |     6 |  9 |  2 |     6 | 4 |
| B. WALKEN |     8 |  8 |  4 |     7 | 3 |
| B. WILLIS |     5 |  5 | 14 |     3 | 6 |
| C. DENCH  |     6 |  4 |  5 |     4 | 5 |
| C. NEESON |     3 |  8 |  4 |     7 | 3 |

Observe how we kinda grouped by the actors and then "pivoted" the number of films per rating each actor played in. Instead of displaying this in a "relational" way (i.e. each group is a row), we pivoted the whole thing to produce a column per group. We can do this because we know all the possible groups in advance.

Unpivoting is the opposite: from the above, we want to get back to the "row per group" representation:
| NAME      | RATING | COUNT |
|-----------|--------|-------|
| A. GRANT  | NC-17  |     3 |
| A. GRANT  | PG     |     6 |
| A. GRANT  | G      |     5 |
| A. GRANT  | PG-13  |     3 |
| A. GRANT  | R      |     6 |
| A. HUDSON | NC-17  |    12 |
| A. HUDSON | PG     |     4 |

It's actually really easy. This is how we'd do it in PostgreSQL:
SELECT
first_name, last_name,
count(*) FILTER (WHERE rating = 'NC-17') AS "NC-17",
count(*) FILTER (WHERE rating = 'PG' ) AS "PG",
count(*) FILTER (WHERE rating = 'G' ) AS "G",
count(*) FILTER (WHERE rating = 'PG-13') AS "PG-13",
count(*) FILTER (WHERE rating = 'R' ) AS "R"
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
The nice thing here is PostgreSQL's ability to append a FILTER
clause to an aggregate function, in order to count only some of the data.
In all other databases, we’d do it like this:
SELECT
first_name, last_name,
count(CASE rating WHEN 'NC-17' THEN 1 END) AS "NC-17",
count(CASE rating WHEN 'PG' THEN 1 END) AS "PG",
count(CASE rating WHEN 'G' THEN 1 END) AS "G",
count(CASE rating WHEN 'PG-13' THEN 1 END) AS "PG-13",
count(CASE rating WHEN 'R' THEN 1 END) AS "R"
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
The trick is that aggregate functions ignore NULL
values, so if we make all the values NULL
that are not interesting per aggregation, we'll get the same result.
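The CASE-based pivot runs as-is in SQLite, too. A self-contained sketch via Python's sqlite3 module, with a simplified, made-up film_actor table (the real schema joins actor, film_actor and film):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film_actor (name TEXT, title TEXT, rating TEXT);
    INSERT INTO film_actor VALUES
      ('A. GRANT',  'ANNIE IDENTITY',  'G'),
      ('A. GRANT',  'DISCIPLE MOTHER', 'PG'),
      ('A. GRANT',  'GLORY TRACY',     'PG-13'),
      ('A. HUDSON', 'LEGEND JEDI',     'PG');
""")
rows = conn.execute("""
    SELECT
      name,
      -- count() skips the NULLs produced by non-matching CASEs
      count(CASE rating WHEN 'G'     THEN 1 END) AS "G",
      count(CASE rating WHEN 'PG'    THEN 1 END) AS "PG",
      count(CASE rating WHEN 'PG-13' THEN 1 END) AS "PG-13"
    FROM film_actor
    GROUP BY name
    ORDER BY name
""").fetchall()
print(rows)  # -> [('A. GRANT', 1, 1, 1), ('A. HUDSON', 0, 1, 0)]
```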
Now, if you’re using either SQL Server, or Oracle, you can use the built-in PIVOT
or UNPIVOT
clauses instead. Again, as with MATCH_RECOGNIZE
, just append this new keyword after a table and get the same result:
-- PIVOTING
SELECT something, something
FROM some_table
PIVOT (
count(*) FOR rating IN (
'NC-17' AS "NC-17",
'PG' AS "PG",
'G' AS "G",
'PG-13' AS "PG-13",
'R' AS "R"
)
)
-- UNPIVOTING
SELECT something, something
FROM some_table
UNPIVOT (
count FOR rating IN (
"NC-17" AS 'NC-17',
"PG" AS 'PG',
"G" AS 'G',
"PG-13" AS 'PG-13',
"R" AS 'R'
)
)
10. Abusing XML and JSON
First off, a disclaimer:

JSON is just XML with less features and less syntax.

Now, everyone knows that XML is awesome. The corollary is thus:

JSON is less awesome.

Don't use JSON.

Now that we've settled this, we can safely ignore the ongoing JSON-in-the-database hype (which most of you will regret in five years anyway), and move on to the final example. How to do XML in the database. This is what we want to do: given an XML document of actors and the comma-separated lists of films they played in, split it into one row per actor and film, using a query of this form (here in PostgreSQL):

WITH RECURSIVE
x(v) AS (SELECT '...'::xml),
actors(
actor_id, first_name, last_name, films
) AS (...),
films(
actor_id, first_name, last_name,
film_id, film
) AS (...)
SELECT *
FROM films
Here, x contains our XML document:
WITH RECURSIVE
x(v) AS (SELECT '
<actors>
<actor>
<first-name>Bud</first-name>
<last-name>Spencer</last-name>
<films>God Forgives... I Don’t, Double Trouble, They Call Him Bulldozer</films>
</actor>
<actor>
<first-name>Terence</first-name>
<last-name>Hill</last-name>
<films>God Forgives... I Don’t, Double Trouble, Lucky Luke</films>
</actor>
</actors>'::xml),
actors(actor_id, first_name, last_name, films) AS (...),
films(actor_id, first_name, last_name, film_id, film) AS (...)
SELECT *
FROM films
The actors are then extracted from the document, using PostgreSQL's xpath() function:
WITH RECURSIVE
x(v) AS (SELECT '...'::xml),
actors(actor_id, first_name, last_name, films) AS (
SELECT
row_number() OVER (),
(xpath('//first-name/text()', t.v))[1]::TEXT,
(xpath('//last-name/text()' , t.v))[1]::TEXT,
(xpath('//films/text()' , t.v))[1]::TEXT
FROM unnest(xpath('//actor', (SELECT v FROM x))) t(v)
),
films(actor_id, first_name, last_name, film_id, film) AS (...)
SELECT *
FROM films
And finally, the individual films are split out of the comma-separated films column, using recursion and regexp_replace():
WITH RECURSIVE
x(v) AS (SELECT '...'::xml),
actors(actor_id, first_name, last_name, films) AS (...),
films(actor_id, first_name, last_name, film_id, film) AS (
SELECT actor_id, first_name, last_name, 1,
regexp_replace(films, ',.+', '')
FROM actors
UNION ALL
SELECT actor_id, a.first_name, a.last_name, f.film_id + 1,
regexp_replace(a.films, '.*' || f.film || ', ?(.*?)(,.+)?', '\1')
FROM films AS f
JOIN actors AS a USING (actor_id)
WHERE a.films NOT LIKE '%' || f.film
)
SELECT *
FROM films
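The regexp_replace() based splitting above is PostgreSQL-specific, but the underlying idea, peeling one element off the comma-separated list per recursion step, works in any database with recursive CTEs. A sketch via Python's sqlite3, with made-up data, using instr()/substr() instead of regular expressions:

```python
import sqlite3

# Made-up actors table with a comma-separated films column, mirroring
# the shape extracted from the XML above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE actors (actor_id INT, name TEXT, films TEXT)")
con.executemany("INSERT INTO actors VALUES (?, ?, ?)", [
    (1, "Bud Spencer", "Double Trouble, Lucky Luke"),
    (2, "Terence Hill", "Lucky Luke"),
])

# Each recursive step takes the film before the first ', ' separator and
# recurses on the remainder; a trailing ', ' is appended so every film,
# including the last one, is followed by a separator.
rows = con.execute("""
    WITH RECURSIVE split(actor_id, name, film, rest) AS (
      SELECT actor_id, name, '', films || ', ' FROM actors
      UNION ALL
      SELECT actor_id, name,
             substr(rest, 1, instr(rest, ', ') - 1),
             substr(rest, instr(rest, ', ') + 2)
      FROM split
      WHERE rest <> ''
    )
    SELECT actor_id, film FROM split
    WHERE film <> ''
    ORDER BY actor_id, film
""").fetchall()
print(rows)  # [(1, 'Double Trouble'), (1, 'Lucky Luke'), (2, 'Lucky Luke')]
```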

Conclusion
All of what this article has shown was declarative. And relatively easy. Of course, for the fun effect that I’m trying to achieve in this talk, some of the SQL was exaggerated and I expressly called everything “easy”. It’s not at all easy; you have to practice SQL. Like many other languages, but a bit harder, because:
- The syntax is a bit awkward from time to time
- Declarative thinking is not easy. At least, it’s very different
Thanks for your lovely tricks. Could you suggest how to replicate problem 7 in Sybase?
Huh, that is using something so Oracle-specific, I doubt it can be done in any other database in this way. You could use a recursive CTE if you’re using Sybase SQL Anywhere. With Sybase Adaptive Server Enterprise, I don’t think you have a lot of useful options to do that with SQL…
Ok. To replicate Problem 7 in Sybase, I suggest YOU. Happy now?
Hi Lukas!
A somewhat belated answer, but… you can solve 7 (computing a capped running total that resets the sum to 0 whenever it would get negative) with a single (non-recursive) CTE + OVER().
The trick is to just shift the sum whenever it gets negative.
So we need neither recursion nor proprietary extensions nor match_recognize there; a simple subquery does the trick.
You can’t just make this claim and not show off an actual solution, though!
I thought it better not to write the actual answer too close to the claim so that the reader may find the solution by themselves :) , but… here it is.
So, unless I misunderstood what you were doing it can be done with window functions. Of course, we could (as always) also replace the OVER with non-equijoins to stick to plain old SQL without even the window functions, but that was not the point and the OVER version is arguably “better”.
Cheers.
I’m afraid, this doesn’t work. Here, with an additional CTE, so this can be executed right away, e.g. in Oracle:
This yields:
Your solution only works for the first “reset to zero event”, not for subsequent ones. Because your drift is not calculated recursively.
Lukas, I’m sorry, but you did not copy my answer correctly:
SUM(CASE WHEN drift = 0 THEN -drift ELSE 0 END) OVER (ORDER BY d) is an incorrect simplification, but my original solution is correct I believe.
I can provide a full example via pm if you wish, but actually if you just copy paste my query it should work fine. I tested.
The idea is as follows: the column drift can be viewed as a list of resets, one for each time the sum would go below 0.
The second OVER() sums all these resets up to the current row (this is what replaces recursion, actually).
Nope, this was a paste error on my side. I forgot to escape < as &lt;. Fixed my comment. Result is still wrong.
Ah, I see then. You are right. My solution is incorrect, sorry.
No worries, I like your thinking. I just haven’t managed yet to emulate the recursion that would be necessary. I think it’s doable with 3 levels of window functions, though.
Thanks for the idea. But after checking more carefully, even a solution with 3 functions would surprise me: you have rather convinced me now that some form of recursion is needed (it seems it is already needed to find the places at which a reset is performed; at least I found no other way). I will not try to prove that, though, because proving that a recursive SQL query cannot be expressed without recursion can be extremely hard (e.g., Libkin’s ICDT 2001 paper “Expressive Power of SQL”).
Changed my mind again. I believe I have a solution, but with many steps (at least for the time being). Are you interested? Should I post it here, or discuss somewhere else (pad, file transfer)?
In short: one possible solution using just plain SQL (or, say, window functions) is:
1) we can actually compute the dates at which the cap is reached and the running sum will be reset
One possible approach to compute them: for each date x, get the first subsequent date where the running total drops below that of x. This gives too many pairs, but we can remove the ones that are “nested” in another, which yields exactly the set of “reset” dates.
The problem admits a nice geometrical interpretation: if you reverse the curve of the running total, we are looking for points that can “see the horizon” (this would be clearer with a picture).
2) from the list of dates needing resets, we can compute on each date by how much the running total should be offset to compensate for the cap (essentially, subtract the running total at the time of the previous reset).
That approach works with plain SQL, but is not very elegant (I have not managed to summarize to a few lines).
I see this got you going :-) However, I’m not sure if I can follow your description. I’d have to “see” it, indeed…
I sent a solution by mail @datageekery so that I could add a schema…
Hi lukaseder and Benoît,
Is there a solution already? Could you please post it here or send it to my email? I am currently converting Oracle PL/SQL to SQL Server T-SQL; since there is no equivalent function for MATCH_RECOGNIZE in SQL Server, I need to find a way to do the job using plain SQL. Thanks for any suggestions and comments.
I don’t think there’s an easy way to translate match_recognize to SQL Server (or any other RDBMS, for that matter). You could try your luck with a set of nested window functions. Probably 3-4 nesting levels will be required. See this presentation for some examples: https://modern-sql.com/feature/match_recognize
Hi Huang, if you are just looking for a solution to this challenge 7, yes, I remember there is a solution which is correct, and Lukas agreed with me there (though I did not post it). Is that what you want? If so, I can try to dig it out from my computer some time (unless Lukas finds it first).
Hi Benoît, Yes please. You can post here or send it to my email: huangb.616@gmail.com.
Thanks so much!
Hi lukaseder, very good presentation for better understanding of this function.
You were close, there’s a simple solution using a window function:
Calculate the Cumulative Sum, the Cumulative Min of this sum and subtract both values:
WITH
data (d, amount) AS (
  SELECT DATE '2012-01-01', 800 FROM dual UNION ALL
  SELECT DATE '2012-02-01', 1900 FROM dual UNION ALL
  SELECT DATE '2012-03-01', 1750 FROM dual UNION ALL
  SELECT DATE '2012-04-01', -20000 FROM dual UNION ALL
  SELECT DATE '2012-05-01', 900 FROM dual UNION ALL
  SELECT DATE '2012-06-01', 3900 FROM dual UNION ALL
  SELECT DATE '2012-07-01', -2600 FROM dual UNION ALL
  SELECT DATE '2012-08-01', -2600 FROM dual UNION ALL
  SELECT DATE '2012-09-01', 2100 FROM dual UNION ALL
  SELECT DATE '2012-10-01', -2400 FROM dual UNION ALL
  SELECT DATE '2012-11-01', 1100 FROM dual UNION ALL
  SELECT DATE '2012-12-01', 1300 FROM dual
),
cte AS (
  SELECT d, amount,
    SUM(amount) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS cumsum
  FROM data
)
SELECT
  d,
  amount,
  -- cumsum,
  -- LEAST is needed for the first rows only (as long as cumsum is positive)
  -- LEAST(MIN(cumsum) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING), 0) AS adjust,
  cumsum - LEAST(MIN(cumsum) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING), 0)
FROM cte
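The cumulative-sum-minus-cumulative-min trick above is portable to any database with window functions. A quick sketch that runs it on SQLite via Python's sqlite3 (SQLite has no DATE literal, dual, or LEAST, so plain text dates, an ordinary table, and the two-argument min() scalar function are used instead):

```python
import sqlite3

# Same data as in the comment's WITH clause.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (d TEXT, amount INT)")
con.executemany("INSERT INTO data VALUES (?, ?)", [
    ("2012-01-01", 800), ("2012-02-01", 1900), ("2012-03-01", 1750),
    ("2012-04-01", -20000), ("2012-05-01", 900), ("2012-06-01", 3900),
    ("2012-07-01", -2600), ("2012-08-01", -2600), ("2012-09-01", 2100),
    ("2012-10-01", -2400), ("2012-11-01", 1100), ("2012-12-01", 1300),
])

# Running total, running minimum of that total, and their difference.
rows = con.execute("""
    WITH cte AS (
      SELECT d, amount,
        SUM(amount) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS cumsum
      FROM data
    ),
    cte2 AS (
      SELECT d, cumsum,
        MIN(cumsum) OVER (ORDER BY d ROWS UNBOUNDED PRECEDING) AS runmin
      FROM cte
    )
    SELECT d, cumsum - min(runmin, 0) FROM cte2 ORDER BY d
""").fetchall()
print([r[1] for r in rows])
# [800, 2700, 4450, 0, 900, 4800, 2200, 0, 2100, 0, 1100, 2400]
```

The running total indeed resets to 0 whenever it would go negative (April, August, October), without any recursion.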
SQL is a domain-specific language, not a 4th-generation general-purpose language.
Do you know what else is Turing complete? CSS3 !!!
I agree about the success though, SQL is success, unlike CSS :D
CSS3 is Turing complete!? Holy cow, TIL: http://codepen.io/pixelass/pen/HbnCv
Typo in #2 that should be v1 v2 w1 w2, not v1 v2 v3 v4. Otherwise very informative. Recursive CTEs are mind-boggling, yes; I saw a blog post explaining how they can solve the TSP problem.
Yes, of course, you’re right. Thank you very much for pointing this out. Fixed.
Interesting challenge, solving the TSP problem with SQL. Challenge accepted :)
Loved the article and the examples. Thank you very much. That said, I have to take umbrage with a couple things.
>> JSON is just XML with less features and less syntax
…and 1/20 the size, simpler, more human-readable, and (if you’re using JS anyway) already a JavaScript object, so it doesn’t need to be parsed to be used.
Actually, JSON is NOT XML. JSON was designed specifically for use in data structures (and almost never gets used directly for markup) while XML was designed specifically for markup but gets used as a data structure while still lugging around all the extra markup crap you don’t need when using it exclusively for data.
>> Now, everyone knows that XML is awesome.
XML is the camel to JSON’s horse. It is incredibly useful for the 20% of the dev world that is desert, but most of us just want a fast, easy, usable data structure that doesn’t have built in water storage or require us to define a DTD just to make a todo list.
Thank you very much for your thoughtful comments. It’s called humour. Granted, that would have been a bit more obvious in the actual talk. Cheers, Lukas
Reading the article is much less interesting than listening to it during a conference! ;)
I KNOW
Hi Lukas,
Great approach (use Java *and* SQL) and article.
On 4., users of Oracle 12c can leverage the MATCH_RECOGNIZE clause:
Thanks a lot for your feedback. Excellent, indeed
MATCH_RECOGNIZE can be leveraged many times. I didn’t want to use it in 4. already, because it’s already explained in 8. (later on = more sophisticated, and less standard).
Yes, I certainly understand, and tricks 4. and 5. needed to be presented as is. My comments were just an aside directed to those who have MATCH_RECOGNIZE available: once you learn it, you can find many uses for it.
I absolutely agree with you. Thanks for sharing those, very nice to see!
Lukas,
On 5., users of Oracle 12c can leverage the MATCH_RECOGNIZE clause:
I just realized even 7. can be done with MATCH_RECOGNIZE!
I hope the draft proposal gets accepted by the SQL standard committee so more developers can take advantage of this.
Hah, you’re on a roll! :) Excellent. Indeed, this would be a very very nice feature for the standard.
Hello Lukas, very nice tricks.
I’m currently trying to apply your trick 4. (Finding the Largest Series with no Gaps) to my case, but with no success.
I was hoping if you could lend some help on my case problem.
I have a table of something like:
LOG_IN | LOG_OFF
————————————
2017-01-22 08:00:00 | 2017-01-22 09:00:00
2017-01-22 09:01:00 | 2017-01-22 10:00:00
2017-01-22 10:00:00 | 2017-01-22 10:15:00
2017-01-22 10:15:00 | 2017-01-22 10:45:00
2017-01-22 11:00:00 | 2017-01-22 11:29:00
2017-01-22 11:30:00 | 2017-01-22 12:00:00
I would like to select the rows that have a gap of exactly 1 minute to an adjacent row.
Desired result:
———————————————————–
2017-01-22 08:00:00 | 2017-01-22 09:00:00
2017-01-22 09:01:00 | 2017-01-22 10:00:00
2017-01-22 11:00:00 | 2017-01-22 11:29:00
2017-01-22 11:30:00 | 2017-01-22 12:00:00
Thanks in advance.
Thanks for your message. Glad the post was useful. I could offer to quote you my hourly rates, but then again, you could just take this to Stack Overflow and get an answer for free ;-)
Cheers,
Lukas
Cheers!
https://stackoverflow.com/q/47908423/1142394 :)
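For readers following along before clicking through to Stack Overflow, one way to express the requirement is with the LAG()/LEAD() window functions: keep a row when the gap to the previous row’s log_off, or to the next row’s log_in, is exactly 60 seconds. A hedged sketch via Python's sqlite3, reproducing the data from the question (this is an illustration of the window-function approach, not the answer from the linked thread):

```python
import sqlite3

# Made-up sessions table holding the data from the comment above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sessions (log_in TEXT, log_off TEXT)")
con.executemany("INSERT INTO sessions VALUES (?, ?)", [
    ("2017-01-22 08:00:00", "2017-01-22 09:00:00"),
    ("2017-01-22 09:01:00", "2017-01-22 10:00:00"),
    ("2017-01-22 10:00:00", "2017-01-22 10:15:00"),
    ("2017-01-22 10:15:00", "2017-01-22 10:45:00"),
    ("2017-01-22 11:00:00", "2017-01-22 11:29:00"),
    ("2017-01-22 11:30:00", "2017-01-22 12:00:00"),
])

# strftime('%s', ...) converts each timestamp to epoch seconds, so the
# gap to the neighbouring row can be compared to 60 seconds directly.
rows = con.execute("""
    WITH gaps AS (
      SELECT log_in, log_off,
        strftime('%s', log_in)
          - strftime('%s', LAG(log_off) OVER (ORDER BY log_in)) AS gap_before,
        strftime('%s', LEAD(log_in) OVER (ORDER BY log_in))
          - strftime('%s', log_off) AS gap_after
      FROM sessions
    )
    SELECT log_in, log_off FROM gaps
    WHERE gap_before = 60 OR gap_after = 60
    ORDER BY log_in
""").fetchall()
for r in rows:
    print(r)
```

This yields exactly the four rows in the desired result above; the two back-to-back sessions with zero gap are filtered out.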
Awesome stuff Lukas!
Totally enjoyed #4 and #5.
I was searching quite long for a collection of generic algorithms and their implementation in SQL, your article is a great starter.
Let me add the FIFO algorithm and the use of tally tables to your list of tricks, and please write more on algorithms in particular.
One update, as of MySQL 8 (or MariaDB 10.4, I believe) you can do these things. I have run several of the examples using MariaDB 10.4.6.
But of course, what you wrote was true in 2016. :-)
You’re right of course. Thanks a lot for pointing this out. Unbelievable how much MySQL has evolved in terms of SQL syntax support. I would not have thought this possible at the time :) They even have hash joins now, still hoping for support for heap tables, too.
Hi Lukas, I’ve stumbled across your blog and I just can’t stop reading it
I’m a little bit of a newbie when it comes to SQL, and your blog has been beyond amazing.
I just have a question though (probably a very dumb one, but oh well): where can I find the databases to run your SQL code for examples #4 and #5? Is it hosted in some sort of GitHub repo? I really want to be able to follow along with those queries, but I’m not sure how to get hold of those databases. Thank you very much in advance.
Thanks for your nice words! There weren’t actual databases here. Just make something up. Or use the Sakila database, which is always a great source of sample data: https://www.jooq.org/sakila