jOOQ Newsletter: April 30, 2014


Subscribe to this newsletter here

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world. Here are:

Santiago M. Mola who appreciates jOOQ’s affinity to SQL features and its correctness

Vlad Mihalcea who’s reading the jOOQ docs to learn about SQL

Thanks for the shouts, guys! It looks like our attempts to make our users competent and enthusiastic SQL aficionados are effective. If you, our customers, are productive with Java and SQL, then we did our job right. Last but not least:

Tom Bujok who is now a happy SQL aficionado.

Want awesome free jOOQ stickers yourself? Contact us!

Book Promotion – 10% off SQL Performance Explained

This book is a must-read for every SQL developer! Invest a very reasonable amount of money and around six hours of quality reading time, and you’ll stop wasting days and days of confused SQL tuning guesswork, because once you understand indexes, you are able to solve 90% of your performance issues.

Get your copy of SQL Performance Explained now! Use the limited-time “jOOQ” coupon to get a 10% partner discount:

http://sql-performance-explained.com

Community Zone – The jOOQ aficionados have been active!

In last week’s newsletter, we’ve praised Petri Kainulainen’s latest piece of work, his jOOQ / Spring / CRUD tutorial:

But he didn't stop there. He has already promised another tutorial explaining how to do sorting and pagination with jOOQ:

Luckily for us, Markus Winand – our SQL Performance Partner – jumped in and reminded us of the difference between OFFSET pagination and KEYSET pagination. You can find more details about these techniques in his book – see above.

Michael Hughes from codinginthetrenches.com has also been discovering the benefits of using a SQL DSL, which he has explained in his article here. We always like it when jOOQ users realise the power of using jOOQ with stored procedures. You will be infinitely more efficient compared to standard JDBC or JPA.

Krisztian Horvath from SequenceIQ has published a post about using jOOQ with HBase – why not? If you’re restricting yourself to using plain SQL, you can use the jOOQ Open Source Edition with a lot of databases, although you’re going to be missing out on the coolest features and productivity boosters, of course.

Harmeet Singh appreciates the jOOQ code generator in his introductory blog post, an important point to mention. jOOQ’s code generator has helped our customers avoid so many errors already at compile-time, instead of waiting for them to appear in production, at runtime.

SQL Zone – When to use views

There are a couple of great reasons why you should occasionally (or systematically) use views instead of tables. These reasons include:

  • Views provide abstraction over tables. You can add/remove fields easily in a view without modifying your underlying schema.
  • Views can model complex joins easily.
  • Views can hide database-specific stuff from you, e.g. using Oracle’s SYS_CONTEXT for security checks.
  • Views can be useful for managing GRANTS, and thus hiding tables from you.
  • Views can help you with backwards compatibility when you change the underlying schema, but leave the (versioned) views in place.
  • Views can implement an additional security layer, e.g. by using Oracle’s WITH CHECK OPTION directly in the view.

See the relevant Stack Overflow question here.
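
To make a couple of these points more concrete, here's a minimal sketch (table and column names are invented for the example) of a versioned view combined with a WITH CHECK OPTION security layer:

-- Keep old clients working against "version 1" of the schema,
-- even after the underlying table has been refactored
CREATE VIEW customer_v1 AS
SELECT id, first_name, family_name AS last_name
FROM   customer;

-- Expose only active customers, and reject INSERTs / UPDATEs
-- through the view that would violate the predicate
CREATE VIEW active_customer AS
SELECT id, first_name, family_name
FROM   customer
WHERE  active = 1
WITH CHECK OPTION;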

SQL Zone – Calculating a running total

We’ve said it before. Use your database for your calculations, when this is reasonable. And by this, we don’t just mean simple COUNT(*) or SUM(AMOUNT) calls. We cannot stress the fact that you should be using window functions enough.

Imagine you want to do something fancy like calculating a balance value on every bank transaction based on the current balance and each transaction’s amount:

| ID   | VALUE_DATE | AMOUNT |  BALANCE |
|------|------------|--------|----------|
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |

You can do this in SQL! Check out our recent blog post about how to calculate a SQL standard running total directly in SQL!

Upcoming Events

The next weeks are a bit quiet, but we’re going to be active again in May and June! Have you missed any of our previous jOOQ talks? Soon you’ll get another chance to hear us talk about jOOQ or SQL in general in any of these upcoming events:

Stay informed about 2014 events on www.jooq.org/news.

NoSQL? No, SQL! – How to Calculate Running Totals


If anything at all, our jOOQ talks at various JUGs and conferences have revealed mostly one thing:

Java developers don’t know SQL.

And it isn’t even necessarily our fault. We’re just not exposed to SQL nowadays.

But consider this: We developers (or our customers) are paying millions of dollars every year to Oracle, Microsoft, IBM, SAP for their excellent RDBMS, only to ignore 90% of their database features and to perform a little bit of CRUD and a little bit of ACID with ORMs like Hibernate. We have forgotten about why those RDBMS were so expensive in the first place. We haven’t paid attention to the various improvements to the SQL standards, including SQL:1999, SQL:2003, SQL:2008, and the recent SQL:2011, which are all mostly left unsupported by JPA.

If only we Java developers knew how easy and how much fun it can be to replace thousands of lines of erroneous Java code with five lines of SQL. Don’t believe it? Check this out:

Promoting SQL, the Language

Instead of simply promoting jOOQ, we started to help Java developers appreciate actual SQL, regardless of the access pattern that they’re using. Because true SQL can be appreciated through any of these APIs:


How the above APIs can be leveraged in Java 8 can be seen here.

And believe us, most developers were astonished by what was possible in SQL, when they saw our NoSQL? No, SQL! talk:

Calculating a Running Total

So let’s delve into the essence of the talk and calculate a running total with SQL. What’s a running total? It’s easy. Imagine you have these bank account transaction data in your database:

| ID   | VALUE_DATE | AMOUNT |
|------|------------|--------|
| 9997 | 2014-03-18 |  99.17 |
| 9981 | 2014-03-16 |  71.44 |
| 9979 | 2014-03-16 | -94.60 |
| 9977 | 2014-03-16 |  -6.96 |
| 9971 | 2014-03-15 | -65.95 |

You’ll notice immediately, that the balance per account transaction is missing. Yes, we want to calculate that balance as such:

| ID   | VALUE_DATE | AMOUNT |  BALANCE |
|------|------------|--------|----------|
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |

If we’re assuming that we know the current balance on the bank account, we can use each account transaction’s AMOUNT value and subtract it from that current balance. Alternatively, we could assume an initial balance of zero and add up all the AMOUNT value till today. This is illustrated here:

| ID   | VALUE_DATE | AMOUNT |  BALANCE |
|------|------------|--------|----------|
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 | +71.44 |=19886.64 |  <- row n
| 9979 | 2014-03-16 | -94.60 |+19815.20 |  <- row n+1
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |

Each transaction’s balance can be calculated through either one of these formulas:

BALANCE(ROW n)   = BALANCE(ROW n+1) + AMOUNT(ROW n)
BALANCE(ROW n+1) = BALANCE(ROW n)   - AMOUNT(ROW n)
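
For instance, plugging in the sample rows 9981 (row n) and 9979 (row n+1) from above:

BALANCE(9981) = BALANCE(9979) + AMOUNT(9981)
              = 19815.20 + 71.44
              = 19886.64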

So, that’s a running total. Easy, right?

But how can we do it in SQL?

Most of us would probably pull a little Java program out of our sleeves, keeping all the amounts in memory, writing unit tests, fixing all sorts of bugs (we're not mathematicians, after all), wrestling with BigDecimals, etc. Few of us would go through the hassle of doing the same in PL/SQL or T-SQL, or whatever other procedural language you have at your disposal, possibly updating each balance directly in the table when inserting / updating new transactions.

But as you might have guessed so far, the solution we’re looking for here is a solution in SQL. Please bear with us as we’re going through the examples. They’re getting better and better. And if you want to play around with the examples, download Oracle XE and the running-totals.sql script and get up and running!

What we’ve learned from college / SQL-92 would probably involve a…

Using Nested SELECT

Let’s assume that we have a view like v_transactions, which already joins the accounts table to the account transactions table in order to access the current_balance. Here’s how we would write this query, then:

SELECT
  t1.*,
  t1.current_balance - (
    SELECT NVL(SUM(amount), 0)
    FROM v_transactions t2
    WHERE t2.account_id = t1.account_id
    AND  (t2.value_date, t2.id) >
         (t1.value_date, t1.id)
  ) AS balance
FROM     v_transactions t1
WHERE    t1.account_id = 1
ORDER BY t1.value_date DESC, t1.id DESC

Notice how the nested SELECT uses row value expression predicates to express the filtering criteria. If your database doesn't support the SQL standard row value expression predicates (and you're not using jOOQ to emulate them), you can expand them yourself into this equivalent query instead:

SELECT
  t1.*,
  t1.current_balance - (
    SELECT NVL(SUM(amount), 0)
    FROM v_transactions t2
    WHERE t2.account_id = t1.account_id
    AND ((t2.value_date > t1.value_date) OR
         (t2.value_date = t1.value_date AND
          t2.id         > t1.id))
  ) AS balance
FROM     v_transactions t1
WHERE    t1.account_id = 1
ORDER BY t1.value_date DESC, t1.id DESC

So in essence, for any given account transaction, your nested SELECT simply fetches the sum of all AMOUNT values for account transactions that are more recent than the currently projected account transaction.

| ID   | VALUE_DATE |  AMOUNT |  BALANCE |
|------|------------|---------|----------|
| 9997 | 2014-03-18 | -(99.17)|+19985.81 |
| 9981 | 2014-03-16 | -(71.44)| 19886.64 |
| 9979 | 2014-03-16 |-(-94.60)| 19815.20 |
| 9977 | 2014-03-16 |   -6.96 |=19909.80 |
| 9971 | 2014-03-15 |  -65.95 | 19916.76 |

Does it perform?

Execution plan for nested SELECT

Nope. As you can see, for the relatively simple sample data set (only 1101 records filtered from account_id = 1 in line 9), there is an INDEX RANGE SCAN materialising a whopping total of 1212K rows in memory. This looks like O(n²) complexity – in other words, a very naïve algorithm is being applied.

(and don’t think that 770ms is fast for this trivial query!)

While you could probably tune this query slightly, we still feel that Oracle should be able to devise an O(n) algorithm for this simple task.

Using Recursive SQL

No one enjoys writing recursive SQL. No one. Let me convince you.

For simplicity, we’re assuming that we also have a TRANSACTION_NR column enumerating transactions in their sort order, which can be used to simplify recursion:

| ID   | VALUE_DATE | AMOUNT | TRANSACTION_NR |
|------|------------|--------|----------------|
| 9997 | 2014-03-18 |  99.17 |              1 |
| 9981 | 2014-03-16 |  71.44 |              2 |
| 9979 | 2014-03-16 | -94.60 |              3 |
| 9977 | 2014-03-16 |  -6.96 |              4 |
| 9971 | 2014-03-15 | -65.95 |              5 |

Ready? Check out this gorgeous piece of SQL!

WITH ordered_with_balance (
  account_id, value_date, amount, 
  balance, transaction_number
)
AS (
  SELECT t1.account_id, t1.value_date, t1.amount, 
         t1.current_balance, t1.transaction_number
  FROM   v_transactions_by_time t1
  WHERE  t1.transaction_number = 1

  UNION ALL

  SELECT t1.account_id, t1.value_date, t1.amount, 
         t2.balance - t2.amount, t1.transaction_number
  FROM   ordered_with_balance t2
  JOIN   v_transactions_by_time t1
  ON     t1.transaction_number = 
         t2.transaction_number + 1
  AND    t1.account_id = t2.account_id
)
SELECT   *
FROM     ordered_with_balance
WHERE    account_id= 1
ORDER BY transaction_number ASC

Ach… How to read this beauty?

Essentially, we’re self-joining the view (common table expression) that we’re about to declare:

SELECT t1.account_id, t1.value_date, t1.amount, 
       t2.balance - t2.amount, t1.transaction_number
FROM   ordered_with_balance t2
JOIN   v_transactions_by_time t1
ON     t1.transaction_number = 
       t2.transaction_number + 1
AND    t1.account_id = t2.account_id

In the first subselect of the UNION ALL expression, we’re projecting the current_balance of the account, only for the first transaction_number.

In the second subselect of the UNION ALL expression, we're projecting the difference between the previous account transaction's balance and that previous transaction's AMOUNT.

And because we’re recursing into the ordered_with_balance common table expression, this will continue until we reach the “last” transaction.

Now let’s make an educated guess, whether this performs well…

Execution plan for recursive SQL

Well. It doesn’t. We get even more rows in memory, namely 11M rows for what should be at most 1101. Parts of this plan are due to the fact that the TRANSACTION_NUMBER utility column is another calculated column that couldn’t be optimised by Oracle. But the essence here is: recursive SQL is already very hard to get right, and it’s even harder to get fast.

Using Window Functions

So, we’ve suffered enough. Let’s hear some good news.

“There is SQL before window functions, and there is SQL after window functions”
– Dimitri Fontaine in this great post

The best solution for this problem is this one:

SELECT
  t.*,
  t.current_balance - NVL(
    SUM(t.amount) OVER (
      PARTITION BY t.account_id
      ORDER BY     t.value_date DESC,
                   t.id         DESC
      ROWS BETWEEN UNBOUNDED PRECEDING
           AND     1         PRECEDING
    ),
  0) AS balance
FROM     v_transactions t
WHERE    t.account_id = 1
ORDER BY t.value_date DESC,
         t.id         DESC

Essentially, we’re doing exactly the same thing as with the nested SELECT. We’re subtracting the SUM() of all AMOUNT values “over” the subset of rows that is:

  • in the same PARTITION as the current row (i.e. has the same account_id)
  • ordered by the same ordering criteria as the account transactions (from the outer query)
  • positioned strictly before the current row in the sense of the above ordering

Or, again, visually:

| ID   | VALUE_DATE |  AMOUNT |  BALANCE |
|------|------------|---------|----------|
| 9997 | 2014-03-18 | -(99.17)|+19985.81 |
| 9981 | 2014-03-16 | -(71.44)| 19886.64 |
| 9979 | 2014-03-16 |-(-94.60)| 19815.20 |
| 9977 | 2014-03-16 |   -6.96 |=19909.80 |
| 9971 | 2014-03-15 |  -65.95 | 19916.76 |

And now, does this perform?

Execution plan for window functions

Hallelujah!

It couldn’t be much faster! Window functions are probably the most underestimated SQL feature.

Using the Oracle MODEL Clause

Now, this is more of a special treat for those SQL nerds among you who want to piss off your fellow developers with eerie, weird-looking SQL: the MODEL clause (available only in Oracle).

SELECT account_id, value_date, amount, balance
FROM (
  SELECT id, account_id, value_date, amount,
         current_balance AS balance
  FROM   v_transactions
) t
WHERE account_id = 1
MODEL
  PARTITION BY (account_id)
  DIMENSION BY (
    ROW_NUMBER() OVER (
      ORDER BY value_date DESC, id DESC
    ) AS rn
  )
  MEASURES (value_date, amount, balance)
  RULES (
    balance[rn > 1] = balance[cv(rn) - 1] 
                    - amount [cv(rn) - 1]
  )
ORDER BY rn ASC

Now, how to read this beast? We’re taking the sample data and transforming it to:

  • be PARTITION‘ed by the usual criteria
  • be DIMENSION‘ed along the sort order, i.e. the transaction row number
  • be MEASURE‘ed, i.e. to provide calculated values for date, amount, balance (where date and amount remain untouched, original data)
  • be calculated according to RULES, which define the balance of each transaction (except for the first one) to be the balance of the previous transaction minus the amount of the previous transaction

Still too abstract? I know. But think about it this way:

Does it remind you of something?

MS Excel! Whenever you have a problem that your project manager thinks is peanuts to solve with his fancy MS Excel spreadsheets, the MODEL clause is your friend!

And does it perform?

Execution plan for the MODEL clause

… pretty much so (although the above shouldn’t be confused with an actual benchmark).

If you haven’t seen enough, see another great use-case for the MODEL clause here. And for all the details, consider reading Oracle’s official MODEL clause whitepaper.

Do it yourself

Did you like the above examples? Don’t worry. When you see these things for the first time, they can be pretty confusing. But in essence, they’re really not that complicated. And once you have those features in your tool-chain, you will be infinitely more productive than if you had to write out all those algorithms in an imperative programming language.

If you want to play around with the examples, download Oracle XE and the running-totals.sql script and get up and running!

Conclusion

At Data Geekery, we always say:

SQL is a device whose mystery is only exceeded by its power

And sometimes, we also quote Winston Churchill for having said something along the lines of:

SQL is the worst form of database querying, except for all the other forms


Indeed, as the MODEL clause has shown us, SQL can become very extreme. But once you know the tricks and expressions (and most importantly, window functions), you’ll be incredibly more productive with SQL than with many other technologies, if the task at hand is a bulk calculation operation on a simple or complex data set. And your query will often be faster than an equivalent hand-written algorithm, at least when you’re using a decent database.

So let’s put SQL to action in our software!

Are you interested in hosting our NoSQL? No, SQL! talk at your local JUG or as an in-house presentation? Contact us, we’re more than happy to help you improve your SQL skills!

jOOQ Newsletter: April 16, 2014 – Monthly, Yearly, Perpetual licenses now available


Subscribe to this newsletter here

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world. Here are:

Mahmud who cannot wait to make more magic with jOOQ.

Peter Kopfler who, after hearing about jOOQ and SQL in Vienna, is thrilled to take a deep dive into the awesome features of PostgreSQL

Thanks for the shouts, guys!

New license models – now available

We’ve done all the legal work and we’re happy to announce that we’re now ready to offer you a new set of alternative licensing options! For each of the jOOQ Express, jOOQ Professional, and jOOQ Enterprise licenses, you may now purchase any of the following subscriptions:

  • A new monthly subscription for short-running tasks, such as DB migrations
  • The existing yearly subscription for default use-cases
  • A new major release perpetual license for long-running jOOQ 3.x integrations with little need for upgrades

We would like to thank our customers who have been giving us great feedback on our licensing model, and to those of you who have been eagerly waiting for the perpetual license.

Can’t wait? Download your copy of jOOQ now

Are you an existing customer of the jOOQ yearly subscription interested in a switch to other terms? We’ll offer you a 50% refund discount on your existing yearly subscription, should you choose to switch to the perpetual license by the end of April.

Contact sales for a tailor-made license migration discount.

Internet Explorer 8 support on our website

No one loves the old Internet Explorer versions, agreed, but that is not a reason not to support them. We’ve finally re-worked our manual and the rest of our website to also support Internet Explorer 8. We jumped on the HTML5 train prematurely, which is why many of our customers in the banking sector – those who cannot upgrade or switch to Firefox – had to go through hassles to read the jOOQ manual.

We would like to apologise for all the inconvenience this has introduced to some of you! If you encounter any issues with our website, please drop us a note, and we’ll fix it immediately.

Community Zone – Another great article by Petri Kainulainen

It’s hard to believe, but Petri Kainulainen (author of a variety of books and tutorials on Spring) has done it again! And he did it even better than before. We’re very proud to present to you part 3 of his great jOOQ / Spring tutorial. This time:

CRUD is a very important part of your application, and getting it right is essential to save time and money on your development efforts. jOOQ implements an ActiveRecord-like pattern, similar to Ruby’s ActiveRecord. In his article, Petri shows how to tie these ActiveRecords to Spring’s Repository pattern. Convince yourselves! And while you’re at it, don’t miss Petri’s other two tutorials:

SQL Zone – Window Functions – A Must-Have Tool

There is SQL before window functions and SQL after window functions. If you’re fortunate enough to use a commercial database, or PostgreSQL, then you get to enjoy the merits of one of the greatest SQL features that have ever been standardised (into SQL:2003).

We often blog about window functions, and when we go to conferences to talk about jOOQ or about SQL, window functions are all over our slides.

CUME_DIST()

In this blog post, we show you the great CUME_DIST() function, which is essentially the same as ROW_NUMBER() divided by the number of rows. So, if you ever need to indicate the position of your row within the whole result set as a percentage, CUME_DIST() is your weapon of choice.
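
Here’s a minimal sketch (reusing the v_transactions view from the running totals article above) that ranks each transaction’s AMOUNT as a fraction between 0 and 1:

SELECT
  id,
  amount,
  cume_dist() OVER (ORDER BY amount) AS amount_percentile
FROM v_transactions
WHERE account_id = 1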

LEAD() and LAG()

Just yesterday, we were able to solve a very fun data problem for our friends from FanPictor, a startup neighbouring our offices. In an Excel export of their stadium data (see above), they wanted to group blocks of similar colours and create delimiters at the beginning and at the end of each block. Essentially, they wanted to create instructions like “The next five seats are red, the next 2 seats are white, the next 10 seats are red”. This can be done very easily using the awesome LEAD() and LAG() functions.

Upcoming Events

After a great JUG Saxony Day in Dresden and an awesome Java/Scala/jOOQ/SQL talk at VSUG in Vienna, we’re looking forward to a couple of great conferences in May / June.

Have you missed any of our previous jOOQ talks? Soon you’ll get another chance to hear us talk about jOOQ or SQL in general in any of these upcoming events:

Stay informed about 2014 events on www.jooq.org/news.

How can I do This? – With SQL of Course!


Haven’t we all been wondering:

How can I do this? I have these data in Excel and I want to group / sort / assign / combine …

While you could probably pull up a Visual Basic script doing the work or export the data to Java or any other procedural language of choice, why not just use SQL?

The use-case: Counting neighboring colours in a stadium choreography

This might not be an everyday use-case for many of you, but for our office friends at FanPictor, it is. They’re creating software to draw a fan choreography directly into a stadium. Here’s the use-case at a high level:

Draw your fan choreography with FanPictor. In this case, a tribute to Roger Federer

It’s immediately clear what this fun software does, right?

  • You submit a choreography suggestion
  • The event organisers choose the best submission
  • The event organisers export the choreography as an Excel file
  • The Excel file is fed into a print shop, printing red/red, red/white, white/red, white/white panels (or any other colours)
  • The event helpers distribute the coloured panels on the appropriate seat
  • The fans get all excited

Having a look at the Excel spreadsheet

So this is what the Excel spreadsheet looks like:

Print shop instructions

Now, distributing these panels is silly, repetitive work. From experience, our friends at FanPictor wanted to have something along the lines of this, instead:

Print shop instructions for dummies

Notice that there are instructions associated with each panel to indicate:

  • … whether a consecutive row of identical panels starts or stops
  • … how many identical panels there are in such a row

“consecutive” means that within a stadium sector and row, there are adjacent seats with the same (Scene1, Scene2) tuple.

How do we solve this problem?

We solve this problem with SQL, of course – and with a decent database that supports window functions, e.g. PostgreSQL, or any commercial database of your choice! (You won’t find this sort of feature in MySQL.)

Here’s the query:

with data
as (
  select 
    d.*,
    row(sektor, row, scene1, scene2) block
  from d
)
select 
  sektor,
  row,
  seat,
  scene1,
  scene2,
  case 
    when lag (block) over(o) is distinct from block 
     and lead(block) over(o) is distinct from block 
    then 'start / stop'
    when lag (block) over(o) is distinct from block 
    then 'start'
    when lead(block) over(o) is distinct from block 
    then 'stop'
    else ''
  end start_stop,
  count(*) over(
    partition by sektor, row, scene1, scene2
  ) cnt
from data
window o as (
  order by sektor, row, seat
)
order by sektor, row, seat;

That’s it! Not too hard, is it?

Let’s go through a couple of details. We’re using quite a few awesome SQL standard / PostgreSQL concepts, which deserve to be explained:

Row value constructor

The ROW() value constructor is a very powerful feature that can be used to combine several columns (or rows) into a single ROW / RECORD type:

row(sektor, row, scene1, scene2) block

This type can then be used for row value comparisons, saving you a lot of time comparing column by column.
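
For instance, a comparison of two seat rows (sketched here with hypothetical aliases t1 and t2 over the same seating data) collapses from four predicates into one:

-- Column by column...
WHERE t1.sektor = t2.sektor
AND   t1.row    = t2.row
AND   t1.scene1 = t2.scene1
AND   t1.scene2 = t2.scene2

-- ... or as a single row value comparison
WHERE row(t1.sektor, t1.row, t1.scene1, t1.scene2)
    = row(t2.sektor, t2.row, t2.scene1, t2.scene2)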

The DISTINCT predicate

lag (block) over(o) is distinct from block 

The result of the above window function is compared with the previously constructed ROW by using the DISTINCT predicate, which is a great way of comparing things “null-safely” in SQL. Remember that SQL NULLs are some of the hardest things in SQL to get right.
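
A quick illustration of that null-safety (a sketch with made-up literal values, in PostgreSQL syntax):

-- The ordinary comparison yields NULL ("unknown"), not TRUE
SELECT NULL = NULL;

-- The DISTINCT predicate treats two NULLs as "not distinct"
SELECT NULL IS DISTINCT FROM NULL;   -- false
SELECT NULL IS DISTINCT FROM 'red';  -- true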

Window functions

Window functions are a very awesome concept. Without any GROUP BY clause, you can calculate aggregate functions, window functions, ranking functions etc. in the context of a current row while you’re projecting the SELECT clause. For instance:

count(*) over(
  partition by sektor, row, scene1, scene2
) cnt

The above window function counts all rows that are in the same partition (“group”) as the current row, given the partition criteria. In other words, all the seats that have the same (scene1, scene2) colouring and that are located in the same (sector, row).

The other window functions are lead and lag, which return a value from a previous or subsequent row, given a specific ordering:

lag (block) over(o),
lead(block) over(o)
-- ...
window o as (
  order by sektor, row, seat
)

Note also the use of the SQL standard WINDOW clause, which is supported only by PostgreSQL and Sybase SQL Anywhere.

In the above snippet, lag() returns the block value of the previous row given the ordering o, whereas lead() would return the next row’s value for block – or NULL, in case of which we’re glad that we used the DISTINCT predicate, before.

Note that you can also optionally supply an additional numeric parameter to indicate that you want to access the second, third, fifth, or eighth row backwards or forwards.
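
For example (a hypothetical variation of the snippet above), passing 2 as the second argument looks two seats back or ahead instead of one:

lag (block, 2) over(o),  -- the block value two rows back
lead(block, 2) over(o)   -- the block value two rows ahead
-- ...
window o as (
  order by sektor, row, seat
)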

SQL is your most powerful and underestimated tool

At Data Geekery, we always say that

SQL is a device whose mystery is only exceeded by its power

If you’ve been following our blog, you may have noticed that we try to evangelise SQL as a great first-class citizen for Java developers. Most of the above features are supported by jOOQ, and translated to your native SQL dialect, if they’re not available.

So, if you haven’t already, listen to Peter Kopfler who was so thrilled after our recent jOOQ/SQL talks in Vienna that he’s now all into studying standards and using PostgreSQL:

Further reading

There was SQL before window functions and SQL after window functions

We’re Hacking JDBC, so You Don’t Have To


We love working with JDBC

Said no one. Ever.

On a more serious note, JDBC is actually a very awesome API, if you think about it. It is probably also one of the very reasons Java has become the popular platform it is today. Before the JDK 1.1, and before ODBC (and that’s a very long time ago), it was hard to imagine any platform that would standardise database access at all. Heck, SQL itself was hardly even standardised at the time, and along came Java with JDBC, a simple API with only a few items that you have to know in everyday work:

  • Connection: the object that models all your DB interactions
  • PreparedStatement: the object that lets you execute a statement
  • ResultSet: the object that lets you fetch data from the database

That’s it!

Back to reality

That was the theory. In practice, enterprise software operating on top of JDBC quickly evolved towards this:

Hacking JDBC

JDBC is one of the last resorts for Java developers, where they can feel like real hackers, hacking this very stateful, very verbose, very arcane API in many ways. Pretty much everyone operating on JDBC will implement wrappers around the API to handle at least:

  • Common syntax errors
  • Bind variable index mismatches
  • Dynamic SQL construction
  • Edge cases around the usage of LOBs
  • Resource handling and closing
  • Array and UDT management
  • Stored procedure abstraction

… and so much more.

So while everyone is doing the above infrastructure work, they’re not working on their business logic. And pretty much everyone does these things when working with JDBC. Hibernate and JPA do not have most of these problems, but they’re not SQL APIs any longer, either.

Here are a couple of examples that we have been solving inside of jOOQ, so you don’t have to:

How to fetch generated keys in some databases

case DERBY:
case H2:
case MARIADB:
case MYSQL: {
    try {
        listener.executeStart(ctx);
        result = ctx.statement().executeUpdate();
        ctx.rows(result);
        listener.executeEnd(ctx);
    }

    // Yes. Not all warnings may have been consumed yet
    finally {
        consumeWarnings(ctx, listener);
    }

    // Yep. Should be as simple as this. But it isn't.
    rs = ctx.statement().getGeneratedKeys();

    try {
        List<Object> list = new ArrayList<Object>();

        // Some JDBC drivers seem to illegally return null
        // from getGeneratedKeys() sometimes
        if (rs != null) {
            while (rs.next()) {
                list.add(rs.getObject(1));
            }
        }

        // Because most JDBC drivers cannot fetch all
        // columns, only identity columns
        selectReturning(ctx.configuration(), list.toArray());
        return result;
    }
    finally {
        JDBCUtils.safeClose(rs);
    }
}

How to handle BigInteger and BigDecimal

else if (type == BigInteger.class) {
    // The SQLite JDBC driver doesn't support BigDecimals
    if (ctx.configuration().dialect() == SQLDialect.SQLITE) {
        return Convert.convert(rs.getString(index),
                               (Class) BigInteger.class);
    }
    else {
        BigDecimal result = rs.getBigDecimal(index);
        return (T) (result == null ? null :
                    result.toBigInteger());
    }
}
else if (type == BigDecimal.class) {
    // The SQLite JDBC driver doesn't support BigDecimals
    if (ctx.configuration().dialect() == SQLDialect.SQLITE) {
        return Convert.convert(rs.getString(index),
                               (Class) BigDecimal.class);
    }
    else {
        return (T) rs.getBigDecimal(index);
    }
}

How to fetch all exceptions from SQL Server

switch (configuration.dialect().family()) {
    case SQLSERVER:
        consumeLoop: for (;;)
            try {
                if (!stmt.getMoreResults() &&
                     stmt.getUpdateCount() == -1)
                    break consumeLoop;
            }
            catch (SQLException e) {
                previous.setNextException(e);
                previous = e;
            }
}

Convinced?

This is nasty code. And we have more examples of nasty code here, or in our source code.

All of these examples show that when working with JDBC, you’ll write code that you don’t want to / shouldn’t have to write in your application. This is why…

we have been hacking JDBC, so you don’t have to

Java 8 Friday: No More Need for ORMs


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

No More Need for ORMs

Debates about the usefulness of ORM (Object-Relational Mapping) have been going on for the last decade. While many people would agree that Hibernate and JPA solve a lot of problems very well (mostly the persistence of complex object graphs), others may claim that the mapping complexity is mostly overkill for data-centric applications.

JPA solves mapping problems by establishing standardised, declarative mapping rules through hard-wired annotations on the receiving target types. We claim that many data-centric problems should not be limited by the narrow scope of these annotations, but be solved in a much more functional way. Java 8, and the new Streams API finally allow us to do this in a very concise manner!

Let’s start with a simple example, where we’re using H2’s INFORMATION_SCHEMA to collect all tables and their columns. We’ll want to produce an ad-hoc data structure of the type Map<String, List<String>> to contain this information. For simplicity of SQL interaction, we’ll use jOOQ (as always, a shocker on this blog). Here’s how we prepare this:

public static void main(String[] args)
throws Exception {
    Class.forName("org.h2.Driver");
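    // getConnection() below assumes a static import of
    // java.sql.DriverManager.getConnection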
    try (Connection c = getConnection(
            "jdbc:h2:~/sql-goodies-with-mapping", 
            "sa", "")) {

        // This SQL statement produces all table
        // names and column names in the H2 schema
        String sql =
            "select table_name, column_name " +
            "from information_schema.columns " +
            "order by " +
                "table_catalog, " +
                "table_schema, " +
                "table_name, " +
                "ordinal_position";

        // This is jOOQ's way of executing the above
        // statement. Result implements List, which
        // makes subsequent steps much easier
        Result<Record> result =
        DSL.using(c)
           .fetch(sql);
    }
}

Now that we’ve set up this query, let’s see how we can produce the Map<String, List<String>> from the jOOQ Result:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

The above example produces the following output:

FUNCTION_COLUMNS: [ALIAS_CATALOG, ALIAS_SCHEMA, ...]
CONSTANTS: [CONSTANT_CATALOG, CONSTANT_SCHEMA, ...]
SEQUENCES: [SEQUENCE_CATALOG, SEQUENCE_SCHEMA, ...]

How does it work? Let’s go through it step-by-step

DSL.using(c)
   .fetch(sql)

// Here, we transform a List into a Stream
   .stream()

// We're collecting Stream elements into a new
// collection type
   .collect(

// The Collector is a grouping operation, producing
// a Map
            groupingBy(

// The grouping operation's group key is defined by
// the jOOQ Record's TABLE_NAME value
       r -> r.getValue("TABLE_NAME"),

// The grouping operation's group value is generated
// by this mapping expression...
       mapping(

// ... which is essentially mapping each grouped
// jOOQ Record to the Record's COLUMN_NAME value
           r -> r.getValue("COLUMN_NAME"),

// ... and then collecting all those values into a
// java.util.List. Whew
           toList()
       )
   ))

// Once we have this Map<String, List<String>> we can
// simply consume its entries with the following Consumer
// lambda expression
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

Got it? These things are certainly a bit tricky when playing around with them for the first time. The combination of new types, extensive generics, and lambda expressions can be a bit confusing at first. The best thing is to simply practice with these things until you get the hang of it. After all, the whole Streams API is really a revolution compared to previous Java Collections APIs.

The good news is: This API is final and here to stay. Every minute you spend practicing it is an investment into your own future.

Note that the above programme used the following static import:

import static java.util.stream.Collectors.*;

Note also, that the output was no longer ordered as in the database. This is because the groupingBy collector returns a java.util.HashMap. In our case, we might prefer collecting things into a java.util.LinkedHashMap, which preserves insertion / collection order:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),

       // Add this Supplier to the groupingBy
       // method call
       LinkedHashMap::new,
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(...);

We could go on with other means of transforming results. Let’s imagine we would like to generate simplistic DDL from the above schema. It’s very simple. First, we’ll need to select each column’s data type. We’ll simply add it to our SQL query:

String sql =
    "select " +
        "table_name, " +
        "column_name, " +
        "type_name " + // Add the column type
    "from information_schema.columns " +
    "order by " +
        "table_catalog, " +
        "table_schema, " +
        "table_name, " +
        "ordinal_position";

I have also introduced a new local class for the example, to wrap name and type attributes:

class Column {
    final String name;
    final String type;

    Column(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

Now, let’s see how we’ll change our Streams API method calls:

result
    .stream()
    .collect(groupingBy(
        r -> r.getValue("TABLE_NAME"),
        LinkedHashMap::new,
        mapping(

            // We now collect this new wrapper type
            // instead of just the COLUMN_NAME
            r -> new Column(
                r.getValue("COLUMN_NAME", String.class),
                r.getValue("TYPE_NAME", String.class)
            ),
            toList()
        )
    ))
    .forEach(
        (table, columns) -> {

            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

            System.out.println(");");
        }
    );

The output couldn’t be more awesome!

CREATE TABLE CATALOGS(
  CATALOG_NAME VARCHAR
);
CREATE TABLE COLLATIONS(
  NAME VARCHAR,
  KEY VARCHAR
);
CREATE TABLE COLUMNS(
  TABLE_CATALOG VARCHAR,
  TABLE_SCHEMA VARCHAR,
  TABLE_NAME VARCHAR,
  COLUMN_NAME VARCHAR,
  ORDINAL_POSITION INTEGER,
  COLUMN_DEFAULT VARCHAR,
  IS_NULLABLE VARCHAR,
  DATA_TYPE INTEGER,
  CHARACTER_MAXIMUM_LENGTH INTEGER,
  CHARACTER_OCTET_LENGTH INTEGER,
  NUMERIC_PRECISION INTEGER,
  NUMERIC_PRECISION_RADIX INTEGER,
  NUMERIC_SCALE INTEGER,
  CHARACTER_SET_NAME VARCHAR,
  COLLATION_NAME VARCHAR,
  TYPE_NAME VARCHAR,
  NULLABLE INTEGER,
  IS_COMPUTED BOOLEAN,
  SELECTIVITY INTEGER,
  CHECK_CONSTRAINT VARCHAR,
  SEQUENCE_NAME VARCHAR,
  REMARKS VARCHAR,
  SOURCE_DATA_TYPE SMALLINT
);

Excited? The ORM era may have ended just now

This is a strong statement. The ORM era may have ended. Why? Because using functional expressions to transform data sets is one of the most powerful concepts in software engineering. Functional programming is very expressive and very versatile. It is at the core of data and data streams processing. We Java developers already know existing functional languages. Everyone has used SQL before, for instance. Think about it. With SQL, you declare table sources, project / transform them onto new tuple streams, and feed them either as derived tables to other, higher-level SQL statements, or to your Java program.

If you’re using XML, you can declare XML transformation using XSLT and feed results to other XML processing entities, e.g. another XSL stylesheet, using XProc pipelining.

Java 8’s Streams are nothing else. Using SQL and the Streams API is one of the most powerful concepts for data processing. If you add jOOQ to the stack, you can profit from typesafe access to your database records and query APIs. Imagine writing the previous statement using jOOQ’s fluent API, instead of using SQL strings.


The whole method chain could be one single fluent data transformation chain as such:

DSL.using(c)
   .select(
       COLUMNS.TABLE_NAME,
       COLUMNS.COLUMN_NAME,
       COLUMNS.TYPE_NAME
   )
   .from(COLUMNS)
   .orderBy(
       COLUMNS.TABLE_CATALOG,
       COLUMNS.TABLE_SCHEMA,
       COLUMNS.TABLE_NAME,
       COLUMNS.ORDINAL_POSITION
   )
   .fetch()  // jOOQ ends here
   .stream() // Streams start here
   .collect(groupingBy(
       r -> r.getValue(COLUMNS.TABLE_NAME),
       LinkedHashMap::new,
       mapping(
           r -> new Column(
               r.getValue(COLUMNS.COLUMN_NAME),
               r.getValue(COLUMNS.TYPE_NAME)
           ),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> {
            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

           System.out.println(");");
       }
   );

Java 8 is the future, and with jOOQ, Java 8, and the Streams API, you can write powerful data transformation APIs. I hope we got you as excited as we are! Stay tuned for more awesome Java 8 content on this blog.

Java Rocks More Than Ever


On the TIOBE index, Java and C have been sharing the #1 and #2 rank for a long time now, and with the recent GA release of the JDK 8, things are not going to get any worse for our community.

Java simply rocks! And it’s the best platform out there to build almost any of your applications.

But why does Java rock so much? Is it the JVM? Is it the backwards-compatibility? Is it the easy syntax? Or the millions of free and commercial software available to build your software? All of this and much more.

The Top 10 Reasons why Java Rocks More Than Ever

ZeroTurnaround’s RebelLabs often publish awesome blog posts, which we can only recommend. In this case, we’ve discovered a very well-written series of blog posts explaining why Java is so great in 10 steps, by ZeroTurnaround’s Geert Bevin. The articles include:

Part 1: The Java Compiler

The compiler is one of the things we take for granted in any language, without thinking about its great features. In Java, unlike C++, you can simply compile your code without thinking too much about linking, optimisation and all sorts of other usual compiler features. This is partially due to the JIT (Just In Time compiler), which does further compilation work at runtime.

Read the full article here

Part 2: The Core API

The JDK’s core API consists of a very solid, stable and well-understood set of libraries. While many people complain about the lack of functionality in this area (resorting to Google Guava or Apache Commons), people often forget that the core API is still the one that is underneath all those extensions. Again, from a C++ perspective, this is a truly luxurious situation.

Read the full article here

Part 3: Open Source

In this section, ZeroTurnaround’s Geert Bevin‘s mind-set aligns well with our own at Data Geekery when it comes to the spirit of Open Source – no matter whether this is about free-as-in-freedom, or free-as-in-beer, the point is that so many things about Java are “open”. We’re all in this together.

Read the full article here

Part 4: The Java Memory Model

Again, a very interesting point of view from someone with a solid C++ background. We’re taking many things for granted, as Java has had a very good threading and memory model from the beginning, which was corrected only once in the JDK 1.5 in 2004, and which has built solid grounds for newer APIs like actor-based ones, Fork/Join, etc.

Read the full article here

Part 5: High-Performance JVM

The JVM is the most obvious thing to talk about: it has allowed so many languages to work on so many hardware environments, and it runs so fast nowadays!

Read the full article here

Part 6: Bytecode

… and the JVM also rocks because of bytecode, of course. Bytecode is a vendor-independent abstraction of machine code, which is very predictable and can be generated, manipulated, and transformed by various technologies. We’ve recently had a guest post by Dr. Ming-Yee Iu who has shown how bytecode transformations can be used to emulate LINQ in Java. Let’s hear it for bytecode!

Read the full article here

Part 7: Intelligent IDEs

15 years ago, developing software worked quite differently. People could write assembler or C programs with vi or Notepad. But when you’re writing a very complex enterprise-scale Java program, you wouldn’t want to miss out on IDEs nowadays. We’ve blogged about various reasons why SQLJ has died. The lack of proper IDE support was one of them.

Read the full article here

Part 8: Profiling Tools

Remember when Oracle released Java Mission Control for free developer use with the JDK 7u40? Profiling is something very very awesome. With modern profilers, you can know exactly where your bottleneck is by simply measuring every aspect of your JVM. You don’t have to guess, you can know. How powerful is that?

Read the full article here

Part 9: Backwards Compatibility

While backwards-compatibility has its drawbacks, too, it is still very impressive how long the Java language, the JVM, and the JDK have existed so far without introducing any major backwards-compatibility regressions. The only thing that comes to mind is the introduction of keywords like assert and enum.

Could you imagine introducing the Java 8 Streams API, lambda expressions, default methods, generics, enums, and loads of other features without ever breaking anything? That’s just great!

Read the full article here

Part 10: Maturity With Innovation

In fact, this article is a summary of all the others, saying that Java has been a very well-designed and mature platform from the beginning without ever ceasing to innovate. And it’s true. With Java 8, a great next step has been published that will – again – change the way the enterprise perceives software development for good.

Read the full article here

Java Rocks More Than Ever

It does, and it’s a great great platform with a bright future for all its community participants.