Why Most Programmers Get Pagination Wrong


Pagination is one of those things that almost everyone gets wrong for two reasons:

  • User experience
  • Database performance

Here’s why.

What’s wrong with pagination?

Most applications blindly produce pagination like this:

pagination

This is how GMail implements pagination. With my current settings, it displays 100 E-Mails at a time and also shows how many E-Mails there are in total, namely 1094. Those aren’t the total number of E-Mails I’ve ever had, they’re the total number of E-Mails in my “blog-idea” label (I’m using GMail as a TODO list, and yes, this blog won’t run out of articles any time soon).

What’s wrong with this practice?

Bad user experience

As a user, in most cases, I don’t really care about the total number of objects that are in my result set. At least, I don’t care about the exact number. Does it make any difference if I have 1094 or 1093 E-Mails? What about if I had 1067? Or 1000? 1000 would be precise enough for what I’m concerned.

Also, as a user, in most cases, I don’t care that I’m on page 317 of my paginated screen that displays me rows 3170-3179 (assuming 10 rows per page). I really don’t. The page number is absolutely useless in terms of user experience.

Who got it right?

  • Facebook
  • Twitter
  • Reddit

And all the other websites that do timelines. Yes, I want to display only 10 rows at a time (or perhaps 100, but almost never all). So, pagination is important. But I don’t care about the fact that I’ve clicked 317 times on that “next page” button. If I ever browse that many pages (and I hardly ever do), then the only thing that matters is the next 10 rows. Just like when you play Civilization. You don’t care that you’re in turn 317. You just want to play one more turn:

4b665f86cbb10d44e2db6ae4c96fef4050f0ce42878015ab30cf681b84537a30[1]

Moreover, I never ever ever want to jump to page 317 right from the beginning. There’s absolutely no use case out there, where I search for something, and then I say, hey, I believe my search result will be item #3175 in the current sort order. Exactly. Instead, I will do any of these:

  • Refine the search
  • Sort the result

In both cases, I will get a result where the record that I’m looking for is much more likely to appear on page #1 or perhaps #2. Again, when was the last time you googled for SQL and then went to page #18375 to find that particular blog post that you were looking for? No. You searched for “Java 8 SQL” to find jOOQ, the best way to write SQL in Java 8. For instance.

How to implement a timeline with SQL

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”). We’ve described this in past articles here:

The SQL syntax is a bit cumbersome as the pagination criteria becomes an ordinary predicate, but if you’re using jOOQ, you can use the simple synthetic SQL SEEK clause as such:

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15)
   .limit(10)
   .fetch();

The above will fetch the next 10 players after the player with SCORE 949 and ID 15. The pagination really depends on the ORDER BY clause, which is why you have to provide as many values in the pagination as you provided columns in the ORDER BY clause.

Now, that we’ve fixed the user experience let’s also look at …

How OFFSET pagination is bad for performance

The previously linked articles about keyset pagination also mention the poor performance of OFFSET pagination. Which is kind of obvious as OFFSET has to skip a given number of rows after applying all predicates and sorting, etc. So the database has to do all the work and then throw away 3170 records (if you’re jumping to page 317 with 10 rows per page). What a waste.

The following diagram shows very nicely how OFFSET gets slower and slower for large offsets:

Reproduced from use-the-index-luke.com with permission by Markus Winand

That’s the obvious problem, but there’s another one. People always count the total number of rows to calculate the total number of possible pages. Why? To display nonsense like the following:

Page number:
1 2 3 ... 315 316 317 318 319 ... 50193 50194

Wow. OK so we’re on page number 317, which we don’t really care about in the first place, but we could just as well jump to page number 50194. This means that the database needed to run the query across all the rows just to be sure we get exactly 50194 pages in total.

Google something like page number pagination and observe the number of tutorials that show how you can implement the above nonsense. On Google Image search, you’ll find:

pagination-google

At the same time, the Google search itself reveals:

pagination-google-search

As you can see, Google estimates that there are probably at least 10 pages for your search and you can go “next”. Yes, you can skip some pages, but you cannot skip to a page number 50194, because, again:

  • No one wants that
  • It’s costly to predict, even for Google

In fact, Google search implements keyset pagination as well, just like Twitter, Facebook, Reddit. And they don’t display the total number of pages because counting that total can be very costly, depending on your database.

In particular, databases that do not support window functions will require you to run two separate queries:

  1. The actual query with a LIMIT clause
  2. An additional query replacing the SELECT column list with a simple COUNT(*)

Needless to say that this is not the best approach. If your database supports window functions (read about that miraculous SQL feature here on the jOOQ blog), you could produce the total row count in one go as such:

SELECT 
  rental_date, 
  inventory_id,
  COUNT(*) OVER()
FROM rental
WHERE customer_id = 1
ORDER BY rental_date
LIMIT 10

That COUNT(*) OVER() window function is like an ordinary aggregate function, except that it doesn’t group your results. It just counts all the rows of your result and produces that count in each row, prior to limiting the result to 10.

When run against the Sakila database, the above produces:

rental_date          inventory_id  count
2005-05-25 11:30:37          3021     32
2005-05-28 10:35:23          4020     32
2005-06-15 00:54:12          2785     32
2005-06-15 18:02:53          1021     32
2005-06-15 21:08:46          1407     32
2005-06-16 15:18:57           726     32
2005-06-18 08:41:48           197     32
2005-06-18 13:33:59          3497     32
2005-06-21 06:24:45          4566     32
2005-07-08 03:17:05          1443     32

So, we’re displaying the first page with 10 rows and we need to provide navigational links for a total of 4 pages because we have a total of 32 rows.

What happens when we benchmark this query on PostgreSQL? The first run doesn’t calculate this COUNT(*) OVER() value, whereas the second one does:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 1: %', (clock_timestamp() - v_ts); 
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 2: %', (clock_timestamp() - v_ts); 
END$$;

The result clearly indicates that in PostgreSQL, there’s a significant overhead in calculating this value:

INFO:  Statement 1: 00:00:01.041823
INFO:  Statement 2: 00:00:03.57145

Oracle optimises things a bit better when you’re using ROWNUM to paginate:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 5000;
BEGIN
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM (
        SELECT 
          rental.*, 
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 1: ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM (
        SELECT 
          rental.*,  
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 2: ' || (SYSTIMESTAMP - v_ts));
END;
/

Result:

Statement 1: X
Statement 2: X +/- 1%

So, the COUNT(*) seems to be calculated “for free”. Bonus question: Why is that?

Due to Oracle license restrictions, we cannot publish benchmark results here, comparing Oracle with PostgreSQL, sorry, but you can run the above code yourself against the Sakila database:
https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila

Conclusion

TL;DR: OFFSET pagination bad. Keyset pagination good.

no-offset-banner-468x60.white

If you want to paginate in your application, please make sure whether you really, really, really need:

  • Exact page number
  • High page numbers
  • The last page number
  • The total number of rows

Because if you don’t (and in 98% of all UIs, you really don’t), then you can drastically speed up your queries while providing your users a much better experience. If that’s not a win-win situation worth thinking about…?

And don’t forget, jOOQ ships with native keyset pagination support!

jOOQ Tuesdays: Markus Winand is on a Modern SQL Mission


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. markuswinandThis includes people who work with SQL, Java, Open Source, and a variety of other related topics.

We are excited to talk with Markus Winand in this sixth edition. Markus is the author of the popular book SQL Performance Explained and the even more popular website Use The Index, Luke, and we’re thrilled to see that he’s pulling off another stunt:

Hi Markus – You have recently launched modern-sql.com. What is your goal with this website?

My goal for modern-sql.com is to create a textbook and reference about the SQL goodies you didn’t learn in school or university. Interestingly, online manuals about these features are pretty sparse. They come in two fashions: blog posts and vendor documentation. Blog posts are usually one-off events covering a particular feature or use case. There are many great blogs out there – the jOOQ blog being one of them – but there is no one I could recommend to learn all about recent SQL features. Vendor documentation, on the other hand, is mostly a reference about syntax—quite often even a bad one: they often don’t mention standard compliance at all and tend to follow a “proprietary features first” approach.

The consequence is that SQL market is very fragmented: besides SQL-92, there is no obvious base that is common to all databases. This becomes particularly evident on the job market: job offers either require just SQL—meaning good old relational SQL—or they require experience with a specific product. That’s pretty much the norm nowadays and nobody questions it. However, how would you think about this job opening: “Google Chrome Web Developer.” Web developers can’t choose the client’s browser. Many tried, but failed. Remember “optimized for XYZ”? That’s why web developers demanded standard conformance from browsers over the past decades. Just having launched a new website I can say that CSS conformance has improved drastically over the past five years. Ultimately, I’d like the same thing to happen for SQL. I hope that modern-sql.com sparks interest in standard conforming SQL so that developers also start to demand standard conformance from the database vendors. Quite an ambitious goal.

Last year, you’ve gone into battle against the SQL OFFSET clause. Want to shed some light on the background of that campaign?

The most striking problem with OFFSET is that it is generally used for an invalid use case: pagination. In this case, OFFSET is used to skip over a number of rows in the intention to find the rows following the previously selected ones. However, OFFSET does per definition not return the rows following one that was selected earlier, but just discards the first N rows of the result. Coincidentally, OFFSET yields the expected result if the data has not changed in the meanwhile—a case that is pretty common during development. But as soon as rows are added or deleted, discarding a fixed number of rows just doesn’t give the right result. The correct approach is to remember the last row fetched and use this data in a WHERE clause to select the rows following. This approach is explained in detail at http://use-the-index-luke.com/no-offset.

Besides the fact that OFFSET cannot be used to implement correct pagination, OFFSET is also bad for performance. OFFSET is wrong and slow. What else do you need? As a matter of fact, the only valid use case I know for OFFSET it to implement SLEEP in SQL—not that I ever need that. Unfortunately, OFFSET made it into the SQL standard in 2011. I consider this the worst mistake in recent history of SQL because it can’t be corrected. The only good part is that it is an optional feature—vendors don’t need to implement for standard conformance. Nevertheless, Oracle and Microsoft just recently added OFFSET to their SQL databases.

You’ve written a very popular book on SQL Performance called SQL Performance explained. How does it compare to other SQL books and why should our readers buy it?

I’ll start with the second question. First of all you must know that the full content of SQL Performance Explained is available for free at http://use-the-index-luke.com/. Most people I’m asking why they bought the book did so because the like the web site. They bought the book either to support my work on Use The Index, Luke (greatly appreciated!) or, more importantly, to finally read the book from cover to cover. A typical answer I get goes along these lines: “I knew Use The Index, Luke for years and have read many articles there, but I finally wanted to read everything from the beginning to end.”

Now coming to the first question why the world needed another SQL performance tome: it didn’t. Therefore, I wrote a very small book that can be read in less than a day. I focused on the basic concepts, which are the same in most databases, and boldly skipped less common special cases. Its shortness is also most appreciated in the reviews. On the other hand, the book has occasionally being criticized as being incomplete—probably because the sub-title reads “Everything developers need to know about SQL performance”. Personally, I think these critiques somehow proof my point: Obviously, Java, PHP or .NET developers don’t need to know as much about SQL performance as database performance consultants. When writing for such an audience, you must skip a lot.

Where do you see SQL in 10 years from today?

I hope that the temporal features of SQL:2011 (see here) become commonly available—also in free open source databases. At the moment, they are only available in commercial databases—even there the completeness and standard conformance varies. I would also hope that the SQL standard finds a way to cope with the current trend that every database vendor adds its own proprietary set of JSON functions. Unfortunately, it might be too late for that already.

However, my greatest hope is that developers realize that SQL is not stuck in 1992. The standard has added many useful features since than. Most databases offer a good part of these features. It’s really just our perception of SQL that got stuck in 1992.

Learn more about Markus’s work

… Markus is giving his Modern SQL talk at conferences. Learn more about it here:

10 SQL Articles Everyone Must Read


We’ve been blogging about Java and SQL for a while now, on the jOOQ blog. Over the years, while researching interesting blog topics, we’ve discovered a lot of SQL gems in the blogosphere that have inspired our work and our passion for SQL.

Today, we’re presenting to you a list of 10 articles that we think you should absolutely read. At the end of the list, you will agree that either:

  • SQL is awesome
  • SQL is crazy

… or probably both. Here goes, in no particular order:

1. Joe Celko: “Divided We Stand: The SQL of Relational Division”

Relational division is a very powerful concept in relational algebra. It answers questions like:

Give me all the students that have completed a given set of courses

Unfortunately, division doesn’t have any direct equivalent in SQL. We wish there would be a clause like

TABLE_A
  DIVIDE BY TABLE_B 
  ON [ some predicate ]

Nonetheless, you can express a division in SQL in various ways.

Read Joe’s: “Divided We Stand: The SQL of Relational Division

2. Alex Bolenok: “Happy New Year!”

Alek Bolenok (a.k.a. Quassnoi) blogs about various interesting SQL-related things, but one of his top contributions every year are his “happy new year” series. Alek paints “beautiful” (beauty is in the eye of the beholder), and certainly impressive pictures into your SQL console. For instance:

Read Alek’s, “Christmas tree in SQL

3. Markus Winand: “Clustering Data: The Second Power of Indexing”

Markus Winand is the author of the popular book SQL Performance Explained, parts of which you can also read on his blog “Use The Index Luke“. There is an incredible amount of very useful knowledge both in the book and on this page, but one of the most revealing and neat SQL tricks is to know about “covering indexes”, “clustering indexes”, or “index only scans”

Read Markus’s: “Clustering Data: The Second Power of Indexing

4. Dimitri Fontaine: “Understanding Window Functions”

There was SQL before window functions and SQL after window functions

Window functions are some of the most powerful and underused features of SQL. They’re available in all commercial databases, in PostgreSQL, and soon also in Firebird 3.0. We’ve blogged about window functions a couple of times ourselves, but one of the best summaries and explanations about what they really are and how they work has been written by Dimitri Fontaine.

Read Dimitri’s: “Understanding Window Functions

5. Lukas Eder: “10 Common Mistakes Java Developers Make when Writing SQL”

A bit of advertising for our own writing. We’ve collected 10 of the most common mistakes that Java developers make when writing SQL. These mistakes are actually not even specific to Java developers, they could happen to any developer. This article has generated so much traction on our blog, there must be some great truth in it.

Read Lukas’s: “10 Common Mistakes Java Developers Make when Writing SQL

6. András Gábor’s “Techniques for Pagination in SQL”

Up until recently, offset pagination has been rather difficult to implement in commercial databases like Oracle, SQL Server, DB2, Sybase, which didn’t feature the equivalent of MySQL’s / PostgreSQL’s LIMIT .. OFFSET clause. Pagination could be emulated, however, and there are a lot of techniques for doing that. Picking the right technique is essential for performance. If you’re using Oracle 11g or less, you should filter on ROWNUM:

Read András’s “Techniques for Pagination in SQL

On a side-note, you should probably consider not using OFFSET at all. For details about the NO OFFSET movement, read…

7. Markus Windand: “We need tool support for keyset pagination”

If you think about OFFSET pagination, it’s actually a rather dumb thing from a technical perspective, and a useless thing from a business perspective. Here’s why.

From a technical perspective…

… you need to apply filtering, grouping, and ordering on a vast amount of data, skipping and throwing away all the data that appears before the offset until you reach the first row of interest. That is a lot of waste of resources given that…

From a business perspective…

… perhaps, pages 1-3 are interesting, but there is absolutely no meaning in offering users to navigate to page 1337. After a certain offset, the meaning of the offset from a business perspective has vanished. You might as well display random, unordered data samples. The user wouldn’t notice. Proably, when you reach a higher page on Google search results, this is exactly what happens. Random stuff.

Or on reddit. There, you get random stuff already on the first page – such as this popular display of a Stabilized head on green vine snake.

Much better than offset pagination is keyset pagination (which we’ve blogged about as well).

Read Markus’s “We need tool support for keyset pagination

no-offset-banner-468x60.white

8. Josh Berkus “Tag All The Things”

Implementing tagging in a relational database can be a beast from a performance perspective. Should you normalise (one-to-many)? Should you normalise heavily (many-to-many)? Should you use nested collections / arrays / or even JSON data structures?

Josh has written a very interesting write-up on the performance of heavy tagging in PostgreSQL, showing that normalisation isn’t always the best choice.

Read Josh’s “Tag All The Things

9. Alek Bolenok’s “10 things in SQL Server (which don’t work as expected)”

This is again Alek’s (Quassnoi’s) work. A very interesting set of things that happen inside of SQL Server, which you might not have expected when you’re used to using other databases. Whether you’re using SQL Server or not, this is a must-read to re-raise awareness of the subtle little differences between SQL implementations

Read Alek’s “10 things in SQL Server (which don’t work as expected)

10. Aaron Bertrand: “Best approaches for running totals”

Running totals are a very typical use-case for SQL-based reporting. A running total is something that every project manager using Excel knows intuitively how to do. Just drag-and-drop that sweet sweet formula across your spreadsheet and done:

excel-running-total

How to do the same in SQL? There are again tons of ways. Aaron Bertrand has summarised various solutions for SQL Server 2012.

Read Aaron’s “Best approaches for running totals

Many other articles

There are, of course, many other very good articles providing deep insight into useful SQL tricks. If you find you’ve encountered an article that would nicely complement this list, please leave a link and description in the comments section. Future readers will appreciate the additional insight.

jOOQ Newsletter: August 15, 2014 – jOOQ 3.5 Outlook


Subscribe to this newsletter here

jOOQ 3.5 Outlook

We’re working hard on the next release. Already 90 issues for jOOQ 3.5 are closed and counting! Today, we’re going to look at the highlights of what will be implemented in the next, exciting minor release, due for Q4 2014:

  • Support for new databases

    Our customers have been asking us for support of the Informix and Oracle TimesTen databases. While Informix is a very popular (and also old!) database, still widely used in the Industry, Oracle TimesTen is a promising new in-memory database with a very similar syntax to that of Oracle.

    With these two new additions, jOOQ will now support 18 RDBMS!

  • File-based code generation support

    This has been on our roadmap for a very long time, and finally we’re tackling it! If your development workflow prevents you from accessing a database during code generation, you can now also supply database meta information in XML format. We chose XML over any other format as it will be very easy to transform arbitrary pre-existing formats using XSLT (e.g. Hibernate hbm.xml, or ERD tools like Vertabelo‘s export format).

    We’re really looking forward to going live with this awesome feature, and in seeing a variety of community-contributed XSLT pop up, to help you integrate jOOQ with your favourite database schema definition format.

  • TypeProviders

    Sophisticated databases like PostgreSQL ship with a large variety of vendor-specific data types. It’s hard for jOOQ to support them all natively, but why add native support, when we can add another awesome SPI?

    TypeProviders will allow for abstracting over the “<T>” type, jOOQ’s column type. This will go far beyond data type conversion, it will allow you to specify how jOOQ will bind your user type to JDBC completely transparently.

These are just a few major features that we’ll be including in jOOQ 3.5, along with a lot of minor ones – so stay tuned for more jOOQ goodness.

Data Geekery Partner Network

Isaac Newton coined it like no one else:

If I have seen further it is by standing on ye sholders of Giants.

At Data Geekery, we’re looking into seeing further with jOOQ as well as we are now starting to offer and recommend services to the jOOQ ecosystem through our trusted integration partners. Today, we’re happy to recommend to you:


Germany based UWS Software Service (UWS) specialises in custom software development, application modernisation and outsourcing with a distinct focus on the Java Enterprise ecosystem.

UWS has successfully integrated the jOOQ Open-Source Edition with a variety of enterprise software projects. Their offering include custom jOOQ integrations into your system landscape and migration solutions from JDBC and/or JPA to jOOQ. UWS further offers development of custom enterprise applications using jOOQ.


“Almost every performance problem is caused by excessive use of ORM tools or improper indexing.”

Markus Winand specialises in these topics and provides SQL training and tuning services for developers. “It is difficult to tell Java developers to use SQL when Hibernate is not the right tool for a particular query” Winand said, and continued “JDBC is just too cumbersome and dangerous. jOOQ makes SQL in Java simple and safe—now I can show people how to get best of both worlds.”


 

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world and we can hardly catch up with them! Here are:

Majid Azimi, who is writing SQL like a boss with jOOQ

Christoph Henkelmann, who Has found the most awesome of all stacks to build great web applications. And that consists of Ninjaframework, jOOQ, BoneCP – Slim, Fast, Reliable. We couldn’t have said it any better, ourselves.

Nat Pryce, who simply loves doing SQL queries with jOOQ in Java 8.

Thanks for the shouts, guys! You make the jOOQ experience rock!

SQL Zone – Keyset Pagination

Markus Winand, author of Use The Index, Luke! has recently started a promotion against OFFSET pagination, in favour of keyset pagination, which he called #NoOffset.

We’ve blogged about this ourselves, before. Most people make use of OFFSET pagination because it is the default that is supported by almost all RDBMS.

In many cases, however, you do not need to paginate using OFFSETs, which can turn out to be very slow for large results and large offsets. Keyset pagination is very useful when you want to implement infinite scrolling, like Twitter, Facebook, etc.

jOOQ is one of the few APIs, and the only Java API that natively support keyset pagination.

SQL Zone – PIVOT your data

Every now and then, you have one of those fancy reporting problems where SQL just fits in perfectly. We’ve blogged about it: Are You Using PIVOT Yet?

With the Oracle and SQL Server PIVOT clause, it is very easy to flip rows and columns in a table. Imagine you have a table like this:

+------+----------------+-------------------+
| dnId |  propertyName  |   propertyValue   |
+------+----------------+-------------------+
|    1 | objectsid      | S-1-5-32-548      |
|    1 | _objectclass   | group             |
|    1 | cn             | Account Operators |
|    1 | samaccountname | Account Operators |
|    1 | name           | Account Operators |
|    2 | objectsid      | S-1-5-32-544      |
|    2 | _objectclass   | group             |
|    2 | cn             | Administrators    |
|    2 | samaccountname | Administrators    |
|    2 | name           | Administrators    |
|    3 | objectsid      | S-1-5-32-551      |
|    3 | _objectclass   | group             |
|    3 | cn             | Backup Operators  |
|    3 | samaccountname | Backup Operators  |
|    3 | name           | Backup Operators  |
+------+----------------+-------------------+

And now, you’d like to transform this table to the below:

+------+--------------+--------------+-------------------+-----
| dnId |  objectsid   | _objectclass |        cn         | ... 
+------+--------------+--------------+-------------------+-----
|    1 | S-1-5-32-548 | group        | Account Operators | ... 
|    2 | S-1-5-32-544 | group        | Administrators    | ... 
|    3 | S-1-5-32-551 | group        | Backup Operators  | ... 
+------+--------------+--------------+-------------------+-----

This is a piece of cake using PIVOT:

SELECT p.*
FROM (
  SELECT dnId, propertyName, propertyValue
  FROM myTable
) AS t
PIVOT(
  MAX(propertyValue)
  FOR propertyName IN (
    objectsid, 
    _objectclass, 
    cn, 
    samaccountname, 
    name
  )
) AS p;

jOOQ natively supports the PIVOT clause, which is definitely one of those tools to have on every reporting SQL developer’s tool chain. Read more about it here and here (original source on Stack Overflow).

Join the No OFFSET Movement!


Markus Winand from Use The Index, Luke! did it again. He started an exciting battle against one the biggest flaws in the SQL language:

No More OFFSET

We’ve blogged about this before. OFFSET pagination is terribly slow, once you reach higher page numbers. Besides, chances are, that your database doesn’t even implement it correctly, yet (and your emulation is probably wrong, too).

Join Markus’s movement for KEYSET pagination, which isn’t only much faster, but also more intuitive. Popular websites like Reddit, Twitter, Facebook and many more already implement keyset pagination. Why don’t you?

jOOQ is the only Java SQL API that already implements KEYSET pagination natively using the synthetic SEEK clause. Here’s how to do it:

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // This jumps to the tuple (949, 15)
   .limit(10)
   .fetch();

Read more about this new movement here: http://use-the-index-luke.com/no-offset

jOOQ Newsletter: June 18, 2014 – jOOQ goes JavaOne™


jOOQ Goes JavaOne™

We’re super excited to announce that our CEO and Head of R&D Lukas will be heading to San Francisco this fall to talk about jOOQ at JavaOne™! This is not just great for Data Geekery and jOOQ, but also for the whole Java / SQL ecosystem, as we believe that the Java / SQL integration deserves much more focus at conferences, where buzzwords like Big Data and NoSQL dominate the agenda disproportionally.

From our perspective, the JVM is the best platform for general purpose languages, whereas SQL is the best tool for database interaction – with Oracle SQL being a leader in the industry. So…

2014 will be a great year for Java and SQL

Prior to JavaOne™, we have also been talking at the awesome 33rd Degree and GeekOut conferences, the latter having been hosted by our friends over at ZeroTurnaround who have launched XRebel, a very promising tool to help you find rogue SQL statements in your application. Stay tuned as we’ll be trying out XRebel to compare jOOQ with Hibernate on our blog, soon.

Stay informed about 2014 events on www.jooq.org/news.

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world and we can hardly catch up with them! Here are:

Alessio Harri, who is having a great time using OpenJPA (the workhorse) with jOOQ (the artist)

The Vertabelo team over at e-point, who like using jOOQ as a database consumer for their SaaS, collaborative ERD tool.

Petri Kainulainen, who is working around performance problems by getting back in control of his SQL.

Thanks for the shouts, guys! You make the jOOQ experience rock!

Upcoming License Improvements

From our recent negotiations with site license leads, we’ve come to two conclusions that will benefit all of the jOOQ Professional and jOOQ Enterprise customer base.

  • When you buy a car or a TV, you probably don’t run to the supplier every time you encounter a small defect that prevents you from fully enjoying your product. You’ll fix it yourself. We want to do the same in the future. As we trust our customers, and as we already ship our sources, we will soon allow you to implement urgent fixes to jOOQ yourselves, as we believe that this will improve the jOOQ experience for everyone and add further value to your own experience.
  • We understand the requirements of purchasing departments in large organisations. Often, it is easier to purchase a site license from a supplier rather than going through the hassles of adapting workstation-based subscriptions all the time. To respond to this need, we’ll soon publish a discounted, tiered pricing model for large-volume purchases of our perpetual licenses.

Both of these improvements will be deployed to all of our customer base in the beginning of July.

Do you already have any questions regarding what will change / improve? Do not hesitate to contact us.

Community Zone – The jOOQ aficionados have been active!

The jOOQ community has been very active again in the last month. We’re happy to point out these editor’s picks from our radar:

Vlad Mihalcea is a very active blogger on the subject of Hibernate integrations, transaction mangagement and connection pooling performance. We’re looking forward to his future blog posts about how to integrate ORMs with SQL/jOOQ, e.g. by applying emerging architecture patterns such as CQRS. One of his most recent, very interesting blog posts deals precisely with that subject.

Micha Kops has been blogging about a variety of Java tool integrations and has now published this comprehensive and very useful jOOQ tutorial. It is great to see fresh opinions from people just getting to know the platform and blogging about it.

SQL Zone – More common SQL mistakes

Our popular blog series “Top 10 mistakes Java developers make when writing SQL”has been enhanced with yet another must-read article for the Java/SQL community:

Did you know about all those common mistakes? We bet you didn’t. If you haven’t seen enough, see also our previous articles:

SQL Zone – Don’t roll your own OFFSET pagination emulation

One of the great reasons why you should use jOOQ is the fact that jOOQ abstracts away all the hard parts of your SQL dialect. If you’re using Oracle (prior to 12c), SQL Server (prior to 2012), or DB2, you might need to emulate what other databases know as OFFSET pagination. While most people get the simple use-cases right, we’ve tried to outline all of the other issues that may arise when you try to do it yourself in our blog post:

A must-read for all SQL transformation aficionados.

Feedback zone

You’ve read to the end of this newsletter, that’s great! Did you like it? What did we do great? What can we improve? What other subjects would you like us to cover?

We’d love to hear from you, so if you want to reach out to us, just drop a message to contact@datageekery.com. Looking forward to hearing from you!

SQL Server Trick: Circumvent Missing ORDER BY Clause


SQL Server is known to have a very strict interpretation of the SQL standard. For instance, the following expressions or statements are not possible in SQL Server:

-- Get arbitrarily numbered row_numbers
SELECT ROW_NUMBER() OVER ()

-- Skip arbitrary rows
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
OFFSET 3 ROWS

Strictly speaking, that limitation makes sense because the above ROW_NUMBER() or OFFSET expressions are non-deterministic. Two subsequent executions of the same query might produce different results. But then again, any ORDER BY clause is non-deterministic, if you do not order by a strictly UNIQUE expression, such as a primary key.

So, that’s a bit of a pain, because other databases aren’t that strict and after all, you might just not care about explicit ordering for a quick, ad-hoc query, so a “reasonable”, lenient default would be useful.

Constant ORDER BY clauses don’t work

You cannot add a constant ORDER BY clause to window functions either. I.e.:

-- This doesn't work:
SELECT ROW_NUMBER() OVER (ORDER BY 'a')

-- But this does!
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY 'a'
OFFSET 3 ROWS

Note that ORDER BY 'a' uses a constant VARCHAR expression, not a numeric one, as that would be generating column-reference-by-index expressions, which would be non-constant in the second example.

Random column references don’t work

So you’re thinking that you can just add a random column reference? Sometimes you can, but often you cannot:

-- This doesn't work:
SELECT ROW_NUMBER() OVER (
  ORDER BY [no-column-available-here]
)

-- But this does!
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY a
OFFSET 3 ROWS

The above examples show that you do not always have a column reference available in any given SQL expression. There is no useful column that you could refer to from the ROW_NUMBER() function. At the same time, you can write ORDER BY a in the second example, but only if a is a “comparable” value, i.e. not a LOB, such as text or image.

Besides, as we don’t really care about the actual ordering, is it worth ordering the result set by anything at all? Do you happen to have an index on a?

Quasi-constant ORDER BY expressions do work

So, to stay on the safe side, if ever you need a dummy ORDER BY expression in SQL Server, use a quasi-constant expression, like @@version (or @@language, or any of these). The following will always work:

-- This always works:
SELECT ROW_NUMBER() OVER (ORDER BY @@version)

-- So does this:
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY @@version
OFFSET 3 ROWS

From the upcoming jOOQ 3.4, we’ll also generate such synthetic ORDER BY clauses that will help you simplify writing vendor-agnostic SQL in these edge-cases, as we believe that you simply shouldn’t think of these things all the time.