How can I do this? I have these data in Excel and I want to group / sort / assign / combine …While you could probably pull up a Visual Basic script doing the work or export the data to Java or any other procedural language of choice, why not just use SQL?
The use-case: Counting neighboring colours in a stadium choreographyThis might not be an everyday use-case for many of you, but for our office friends at FanPictor, it is. They’re creating software to draw a fan choreography directly into a stadium. Here’s the use-case on a high level: It’s immediately clear what this fun software does, right?
- You submit a choreography suggestion
- The event organisers choose the best submission
- The event organisers export the choreography as an Excel file
- The Excel file is fed into a print shop, printing red/red, red/white, white/red, white/white panels (or any other colours)
- The event helpers distribute the coloured panels on the appropriate seat
- The fans get all excited
- … whether a consecutive row of identical panels starts or stops
- … how many identical panels there are in such a row
How do we solve this problem?We solve this problem with SQL of course – and with a decent database, that supports window functions, e.g. PostgreSQL, or any commercial database of your choice! (you won’t be finding this sort of feature in MySQL). Here’s the query:
That’s it! Not too hard, is it? Let’s go through a couple of details. We’re using quite a few awesome SQL standard / PostgreSQL concepts, which deserve to be explained: Row value constructor The
with data as ( select d.*, row(sektor, row, scene1, scene2) block from d ) select sektor, row, seat, scene1, scene2, case when lag (block) over(o) is distinct from block and lead(block) over(o) is distinct from block then 'start / stop' when lag (block) over(o) is distinct from block then 'start' when lead(block) over(o) is distinct from block then 'stop' else '' end start_stop, count(*) over( partition by sektor, row, scene1, scene2 ) cnt from data window o as ( order by sektor, row, seat ) order by sektor, row, seat;
ROW()value constructor is a very powerful feature that can be used to combine several columns (or rows) into a single
This type can then be used for row value comparisons, saving you a lot of time comparing column by column. The DISTINCT predicate
row(sektor, row, scene1, scene2) block
The result of the above window function is compared with the previously constructed
lag (block) over(o) is distinct from block
ROWby using the DISTINCT predicate, which is a great way of comparing things “null-safely” in SQL. Remember that SQL NULLs are some of the hardest things in SQL to get right. Window functions Window functions are a very awesome concept. Without any
GROUP BYclause, you can calculate aggregate functions, window functions, ranking functions etc. in the context of a current row while you’re projecting the
SELECTclause. For instance:
The above window function counts all rows that are in the same partition (“group”) as the current row, given the partition criteria. In other words, all the seats that have the same
count(*) over( partition by sektor, row, scene1, scene2 ) cnt
(scene1, scene2)colouring and that are located in the same
(sector, row). The other window functions are
lag, which return a value from a previous or subsequent row, given a specific ordering:
Note also the use of the SQL standard
lag (block) over(o), lead(block) over(o) -- ... window o as ( order by sektor, row, seat )
WINDOWclause, which is supported only by PostgreSQL and Sybase SQL Anywhere. In the above snippet,
blockvalue of the previous row given the ordering
lead()would return the next row’s value for
NULL, in case of which we’re glad that we used the DISTINCT predicate, before. Note that you can also optionally supply an additional numeric parameter, to indicate that you want to access the second, third, fifth, or eighth, row backwards or forward.
SQL is your most powerful and underestimated toolAt Data Geekery, we always say that
SQL is a device whose mystery is only exceeded by its powerIf you’ve been following our blog, you may have noticed that we try to evangelise SQL as a great first-class citizen for Java developers. Most of the above features are supported by jOOQ, and translated to your native SQL dialect, if they’re not available. So, if you haven’t already, listen to Peter Kopfler who was so thrilled after our recent jOOQ/SQL talks in Vienna that he’s now all into studying standards and using PostgreSQL:
5 thoughts on “How can I do This? – With SQL of Course!”
yup, awesome indeed! Thanks a lot.
“All of the above features are supported by jOOQ,” – could you show us how to actually write the code? I would love to use the ‘WITH’ construction in jOOQ but thought, it’s not supported at the moment.
We’re (finally!) implementing common table expression support in jOOQ 3.4. As a matter of fact, I just realised that the
ROW()constructor currently cannot be used in a
SELECTclause like in the PostgreSQL example. This feature is on the roadmap for jOOQ 3.4 as well: https://github.com/jOOQ/jOOQ/issues/2360. Another missing feature is the ability to pass such a
ROW()type to the
LAG()functions. I have registered another feature request for this: https://github.com/jOOQ/jOOQ/issues/3188 (and I will fix the blog post: “Most of the above features…”)
An example of the current state of the integration tests for common table expressions can be seen here:
Inlining common table expression definitions
Using CommonTableExpression types
Let me know if you’re interested in seeing more, at the moment.
That’s great news!
Reading your example, I was a bit struggling to translate it to SQL. Would the first code read as follows in Oracle?
WITH T AS(SELECT 1 AS F1, ‘A’ AS F2 FROM DUAL )
SELECT * FROM T
In this case, yes, maybe another example with real table data would be interesting. Just to reassure me that the expressions within the CTE can be formulated as easy and powerfully as with the usual selects.
Almost. The first example would translate to this:
Unfortunately, as with derived tables in jOOQ, a lot of typesafety is lost when writing common table expressions. I.e. even if you assign a
CommonTableExpressionto a reusable local variable (as in the second example) there is no typesafe way to dereference
T.F1. I think this would only be possible, if there was a compiler plugin or some sort of type-provider to provide a new local type:
But currently, we’re not aware of this capability in Java.
Can you provide me with a sample SQL statement, then?