Why Most Programmers Get Pagination Wrong


Pagination is one of those things that almost everyone gets wrong for two reasons:

  • User experience
  • Database performance

Here’s why.

What’s wrong with pagination?

Most applications blindly produce pagination like this:

pagination

This is how GMail implements pagination. With my current settings, it displays 100 E-Mails at a time and also shows how many E-Mails there are in total, namely 1094. Those aren’t the total number of E-Mails I’ve ever had, they’re the total number of E-Mails in my “blog-idea” label (I’m using GMail as a TODO list, and yes, this blog won’t run out of articles any time soon).

What’s wrong with this practice?

Bad user experience

As a user, in most cases, I don’t really care about the total number of objects that are in my result set. At least, I don’t care about the exact number. Does it make any difference if I have 1094 or 1093 E-Mails? What about if I had 1067? Or 1000? 1000 would be precise enough for what I’m concerned.

Also, as a user, in most cases, I don’t care that I’m on page 317 of my paginated screen that displays me rows 3170-3179 (assuming 10 rows per page). I really don’t. The page number is absolutely useless in terms of user experience.

Who got it right?

  • Facebook
  • Twitter
  • Reddit

And all the other websites that do timelines. Yes, I want to display only 10 rows at a time (or perhaps 100, but almost never all). So, pagination is important. But I don’t care about the fact that I’ve clicked 317 times on that “next page” button. If I ever browse that many pages (and I hardly ever do), then the only thing that matters is the next 10 rows. Just like when you play Civilization. You don’t care that you’re in turn 317. You just want to play one more turn:

4b665f86cbb10d44e2db6ae4c96fef4050f0ce42878015ab30cf681b84537a30[1]

Moreover, I never ever ever want to jump to page 317 right from the beginning. There’s absolutely no use case out there, where I search for something, and then I say, hey, I believe my search result will be item #3175 in the current sort order. Exactly. Instead, I will do any of these:

  • Refine the search
  • Sort the result

In both cases, I will get a result where the record that I’m looking for is much more likely to appear on page #1 or perhaps #2. Again, when was the last time you googled for SQL and then went to page #18375 to find that particular blog post that you were looking for? No. You searched for “Java 8 SQL” to find jOOQ, the best way to write SQL in Java 8. For instance.

How to implement a timeline with SQL

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”). We’ve described this in past articles here:

The SQL syntax is a bit cumbersome as the pagination criteria becomes an ordinary predicate, but if you’re using jOOQ, you can use the simple synthetic SQL SEEK clause as such:

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15)
   .limit(10)
   .fetch();

The above will fetch the next 10 players after the player with SCORE 949 and ID 15. The pagination really depends on the ORDER BY clause, which is why you have to provide as many values in the pagination as you provided columns in the ORDER BY clause.

Now, that we’ve fixed the user experience let’s also look at …

How OFFSET pagination is bad for performance

The previously linked articles about keyset pagination also mention the poor performance of OFFSET pagination. Which is kind of obvious as OFFSET has to skip a given number of rows after applying all predicates and sorting, etc. So the database has to do all the work and then throw away 3170 records (if you’re jumping to page 317 with 10 rows per page). What a waste.

The following diagram shows very nicely how OFFSET gets slower and slower for large offsets:

Reproduced from use-the-index-luke.com with permission by Markus Winand

That’s the obvious problem, but there’s another one. People always count the total number of rows to calculate the total number of possible pages. Why? To display nonsense like the following:

Page number:
1 2 3 ... 315 316 317 318 319 ... 50193 50194

Wow. OK so we’re on page number 317, which we don’t really care about in the first place, but we could just as well jump to page number 50194. This means that the database needed to run the query across all the rows just to be sure we get exactly 50194 pages in total.

Google something like page number pagination and observe the number of tutorials that show how you can implement the above nonsense. On Google Image search, you’ll find:

pagination-google

At the same time, the Google search itself reveals:

pagination-google-search

As you can see, Google estimates that there are probably at least 10 pages for your search and you can go “next”. Yes, you can skip some pages, but you cannot skip to a page number 50194, because, again:

  • No one wants that
  • It’s costly to predict, even for Google

In fact, Google search implements keyset pagination as well, just like Twitter, Facebook, Reddit. And they don’t display the total number of pages because counting that total can be very costly, depending on your database.

In particular, databases that do not support window functions will require you to run two separate queries:

  1. The actual query with a LIMIT clause
  2. An additional query replacing the SELECT column list with a simple COUNT(*)

Needless to say that this is not the best approach. If your database supports window functions (read about that miraculous SQL feature here on the jOOQ blog), you could produce the total row count in one go as such:

SELECT 
  rental_date, 
  inventory_id,
  COUNT(*) OVER()
FROM rental
WHERE customer_id = 1
ORDER BY rental_date
LIMIT 10

That COUNT(*) OVER() window function is like an ordinary aggregate function, except that it doesn’t group your results. It just counts all the rows of your result and produces that count in each row, prior to limiting the result to 10.

When run against the Sakila database, the above produces:

rental_date          inventory_id  count
2005-05-25 11:30:37          3021     32
2005-05-28 10:35:23          4020     32
2005-06-15 00:54:12          2785     32
2005-06-15 18:02:53          1021     32
2005-06-15 21:08:46          1407     32
2005-06-16 15:18:57           726     32
2005-06-18 08:41:48           197     32
2005-06-18 13:33:59          3497     32
2005-06-21 06:24:45          4566     32
2005-07-08 03:17:05          1443     32

So, we’re displaying the first page with 10 rows and we need to provide navigational links for a total of 4 pages because we have a total of 32 rows.

What happens when we benchmark this query on PostgreSQL? The first run doesn’t calculate this COUNT(*) OVER() value, whereas the second one does:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 1: %', (clock_timestamp() - v_ts); 
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 2: %', (clock_timestamp() - v_ts); 
END$$;

The result clearly indicates that in PostgreSQL, there’s a significant overhead in calculating this value:

INFO:  Statement 1: 00:00:01.041823
INFO:  Statement 2: 00:00:03.57145

Oracle optimises things a bit better when you’re using ROWNUM to paginate:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 5000;
BEGIN
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM (
        SELECT 
          rental.*, 
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 1: ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM (
        SELECT 
          rental.*,  
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 2: ' || (SYSTIMESTAMP - v_ts));
END;
/

Result:

Statement 1: X
Statement 2: X +/- 1%

So, the COUNT(*) seems to be calculated “for free”. Bonus question: Why is that?

Due to Oracle license restrictions, we cannot publish benchmark results here, comparing Oracle with PostgreSQL, sorry, but you can run the above code yourself against the Sakila database:
https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila

Conclusion

TL;DR: OFFSET pagination bad. Keyset pagination good.

no-offset-banner-468x60.white

If you want to paginate in your application, please make sure whether you really, really, really need:

  • Exact page number
  • High page numbers
  • The last page number
  • The total number of rows

Because if you don’t (and in 98% of all UIs, you really don’t), then you can drastically speed up your queries while providing your users a much better experience. If that’s not a win-win situation worth thinking about…?

And don’t forget, jOOQ ships with native keyset pagination support!

“What Java ORM do You Prefer, and Why?” – SQL of Course!


Catchy headline, yes. But check out this Stack Overflow question by user Mike:

(I’m duplicating it here on the blog, as it might be deleted soon)

It’s a pretty open ended question. I’ll be starting out a new project and am looking at different ORMs to integrate with database access.

Do you have any favorites? Are there any you would advise staying clear of?

And the top voted answer (164 points by David Crawshaw is: “Just use SQL”:

I have stopped using ORMs.

The reason is not any great flaw in the concept. Hibernate works well. Instead, I have found that queries have low overhead and I can fit lots of complex logic into large SQL queries, and shift a lot of my processing into the database.

So consider just using the JDBC package.

The second answer (66 points by user simon) is, again: “Just use SQL”:

None, because having an ORM takes too much control away with small benefits. The time savings gained are easily blown away when you have to debug abnormalities resulting from the use of the ORM. Furthermore, ORMs discourage developers from learning SQL and how relational databases work and using this for their benefit.

The third answer (51 points by myself) is saying, once more: “Use SQL” (and use it with jOOQ).

The best way to write SQL in Java

Only the fourth answer (46 points by Abdullah Jibaly) mentiones Hibernate, the most popular ORM in the Java ecosystem.

The truth is, as we’ve shown numerous times on this blog: Hibernate/JPA/ORMs are good tools to get rid of boring (and complex) CRUD. But that’s just boilerplate logic with little value to your business logic. The interesting stuff – the queries, the batch and bulk processing, the analytics, the reporting, they’re all best done with SQL. Here are some additional articles:

Stay tuned as we’re entering an era of programming where object orientation fades, and functional / declarative programming makes data processing extremely easy and lean again.

With Commercial Licensing, Invest in Innovation, not Protection


When people start creating commercially licensed software (like we did, in 2013 with jOOQ), there is always the big looming question:

What do I do about piracy?

I’ve had numerous discussions with fellow entrepreneurs about this topic, and this fear is omnipresent. There has also been a recent discussion on reddit, titled “prevent sharing of a Java library”. I felt reminded of the early commercial jOOQ days, when I discussed the various options / modalities of the new commercial jOOQ license with the Data Geekery legal counsel – which was clearly the biggest financial investment in early company days.

Build licensing around trust, not fear

One thing that bothered me most about jOOQ’s dual license in its early days is that our paying customers will have less rights than our continued Open Source users. Obviously. If you’re using jOOQ with an Open Source database, you can use the jOOQ Open Source Edition for free under the terms of the very permissive Apache License 2.0. You can do pretty much anything with jOOQ including forking it, rewriting it, creating and distributing a new jOOQ (just don’t name it jOOQ, we own the trademark). The only limitation is: it doesn’t work with commercial databases, but you don’t care about that if you’re using MySQL or PostgreSQL for the next 10 years.

If you’re using jOOQ with a commercial database, however, you need to purchase a jOOQ Professional Edition or jOOQ Enterprise Edition license. Of course, there are costs, but that’s not the problem, because jOOQ is awesome and delivers 50x the value it costs.

The problem is that:

  • Interested developers, architects, etc. now have to go through the hassle of convincing their employer’s legal / purchasing / compliance / … departments and do all sorts of paperwork.
  • Paying customers (at the beginning) could no longer patch jOOQ if they found a bug. They had to wait for us, the vendor, to deliver a fix.

These things were remedied rather quickly in the commercial license text. The commercial jOOQ license now grants access to the commercial source code and allows users to modify jOOQ themselves (of course warranty is then disclaimed), without needing to wait for the vendor to deliver the fix. It is still not allowed to distribute the modification’s source code, although we’re looking into possible legal solutions to allow that as well, such that commercial customers can share patches for commercial parts of jOOQ as well.

In other words: We want our commercial customers feel as if jOOQ were Open Source for their every day job.

Isn’t that crazy?

So, people get the entire source code and can build jOOQ. Aren’t we afraid that our commercial, “cracked” jOOQ distributions wind up on warez sites? Of course we are. And it happens. And we’re maintaining a list of companies that “obviously” don’t comply with our terms (e.g. they’ve been using the free trial in production for 2 years). But they’re only very few. And even if they weren’t few, should we introduce tracking logic in jOOQ to check if customers / trial users are compliant? Should we collect IP addresses? User E-Mails? Count workstations? Shut down jOOQ, if non compliant? Shut it down where, on production servers, perhaps?

The truth is, most companies are honest. We’ve had many customers frequently upgrade their contracts. E.g. every couple of months, they’ve purchased new licenses. We’ve had other customers reduce their contracts. The team started with 5 licenses and now consists only of 1 person doing maintenance work. We’ve had customers not touching their contracts, using jOOQ with e.g. 10 licenses for a long time. Some of these are overlicensed, yes. Some of these are underlicensed. It happens. It’s not good, but it’s also not the end of the world. We’re in constant touch with them to see if their license count is still up to date. In the end, we trust them. They trust us.

The worst thing we could do now is introduce some sort of license checker that might be buggy and accidentally shuts down our honest customers’ production system! For the slight chance that we might be catching someone who “cracked” our software (and who probably manages to “crack” also our license checker anyway).

Even if we’re not shutting down the software but logging messages somewhere, the end user (our customers’ customer) might get a very weird feeling when they see it. We would be indirectly damaging our customers’ reputation for what was probably just an oversight. Or, again, a bug in our license checker.

Is that really worth the trouble? No way!

(an exception is the free trial version, which must be deleted after one month)

Different types of software

The problem is: There are different types of software. Essentially, there are:

  • Tools
  • Libraries
  • Servers
  • SaaS

The difference is:

  • Tools run on premise but are independent of the end user application. The customer doesn’t depend on the tool.
    For instance: IntelliJ IDEA
  • Libraries run on premise and are embedded in the end user application. The customer completely depends on the library.
    For instance: jOOQ
  • Servers run on premise and are linked to the end user application but run independently. The customer can often replace the server, but cannot “open” it. It’s a black box.
    For instance: Oracle Database
  • SaaS runs in the cloud and is completely independent of the end user application. The customer cannot influence it in any way.
    For instance: Salesforce.com

As you can see, tools are “not critical” for the customer (being “just” UIs giving access to “the real system”), and while servers and SaaS are critical, all three of them run independently. It is easy for a vendor to implement license checkers inside of those because they’re in control of running the software. With libraries, it’s different. You cannot make any assumptions about how your library is “run”. Is it part of a single JVM? Or multiple JVMs? Run for a couple of seconds only, or for a long time? Used frequently or only very rarely? Is it allowed to spawn its own threads, collect its own data?

Too many open questions.

Long story short: Invest in innovation, not protection

Unless protection is really important in your area because you’re using extremely complex algorithms that no one should know about, then you shouldn’t worry about piracy too much from a technical perspective. In our case, jOOQ isn’t super secret. Anyone can build their own jOOQ, and there are many (much simpler) competitor frameworks. Our business is maintaining by far the best Java SQL DSL and apparently, no one else wants to compete with us in this niche, so why be afraid of theft.

If protection is important, well then run a SaaS, because then you don’t need to ship any software. For instance: Google, one of the biggest SaaS vendors out there, doesn’t share its search engine algorithms with you.

Once you stop worrying about who is going to steal from you, you can start investing all that time in awesome new features and quality to make those loyal and honest customers happy who happily pay for your software. And who knows. Perhaps some of your “pirate customers” will eventually switch jobs and work for someone who takes compliance more seriously. They have been happy “customers” too, and will also recommend your software to their new peers.

Using Stored Procedures With JPA, JDBC… Meh, Just Use jOOQ


The current edition of the Java magazine has an article about Big Data Best Practices for JDBC and JPA by Josh Juneau:
http://www.javamagazine.mozaicreader.com/MayJune2016

The article shows how to use a stored procedure with JDBC (notice how resources aren’t closed, unfortunately. This is commonly forgotten, even in Java Magazine articles)

// Using JDBC to call upon a database stored
// procedure
CallableStatement cs = null;
try {
    cs = conn.prepareCall("{call DUMMY_PROC(?,?)}");
    cs.setString(1, "This is a test");
    cs.registerOutParameter(2, Types.VARCHAR);
    cs.executeQuery();

    // Do something with result
    String returnStr = cs.getString(2);
} catch (SQLException ex){
    ex.printStackTrace();
}

And with JPA:

// Utilize JPA to call a database stored procedure
// Add @NamedStoredProcedureQuery to entity class
@NamedStoredProcedureQuery(
    name="createEmp", procedureName="CREATE_EMP",
    parameters = {
        @StoredProcedureParameter(
            mode= ParameterMode.IN,
            type=String.class,
            name="first"),
        @StoredProcedureParamter(
            mode = ParameterMode.IN,
            type=String.class,
            name="last")
    })

// Calling upon stored procedure
StoredProcedureQuery qry =
    em.createStoredProcedureQuery("createEmp");
qry.setParameter("first", "JOSH");
qry.setParameter("last","JUNEAU");
qry.execute();

Specifically the latter was also recently discussed in blog posts by Vlad Mihalcea and Thorben Janssen.

Do you like verbosity and complexity?

No? We neither. This is why we give you a third option instead: Just use jOOQ. Here’s the equivalent jOOQ code:

// JDBC example:
String returnStr = Routines.dummyProc(
    config, "This is a test");

// JPA example
Routines.createEmp(config, "JOSH", "JUNEAU");

Yes! That’s it. Don’t waste time manually configuring your bind variables with JDBC API calls, or JPA annotations. No one likes writing annotations for stored procedures. With jOOQ and jOOQ’s code generator, procedure calls are:

  • A one-liner
  • A no-brainer
  • A way to bring back the fun to stored procedures

Learn more about using Oracle stored procedures with nested collections and object types here:
https://blog.jooq.org/2014/11/04/painless-access-from-java-to-plsql-procedures-with-jooq

JSR-308 and the Checker Framework Add Even More Typesafety to jOOQ 3.9


Java 8 introduced JSR-308, which added new annotation capabilities to the Java language. Most importantly: Type annotations. It is now possible to design monsters like the below:

The code displayed in that tweet really compiles. Every type can be annotated now, in order to enhance the type system in any custom way. Why, you may ask? One of the main driving use-cases for this language enhancement is the Checker Framework, an Open Source library that allows you to easily implement arbitrary compiler plugins for sophisticated type checking. The most boring and trivial example would be nullability. Consider the following code:

import org.checkerframework.checker.nullness.qual.Nullable;

class YourClassNameHere {
    void foo(Object nn, @Nullable Object nbl) {
        nn.toString(); // OK
        nbl.toString(); // Fail
        if (nbl != null)
            nbl.toString(); // OK again
    }
}

The above example can be run directly in the Checker Framework live demo console. Compiling the above code with the following annotation processor:

javac -processor org.checkerframework.checker.nullness.NullnessChecker afile.java

Yields:

Error: [dereference.of.nullable] dereference of possibly-null reference nbl:5:9

That’s pretty awesome! It works in quite a similar way as the flow sensitive typing that is implemented in Ceylon or Kotlin, for instance, except that it is much more verbose. But it is also much much more powerful, because the rules that implement your enhanced and annotated Java type system can be implemented directly in Java using annotation processors! Which makes annotations turing-complete, in a way 😉

award

How does this help jOOQ?

jOOQ has shipped with two types of API documentation annotations for quite a while. Those annotations are:

  • @PlainSQL – To indicate that a DSL method accepts a “plain SQL” string which may introduce SQL injection risks
  • @Support – To indicate that a DSL method works either natively with, or can be emulated for a given set of SQLDialect

An example of such a method is the CONNECT BY clause, which is supported by Cubrid, Informix, and Oracle, and it is overloaded to accept also a “plain SQL” predicate, for convenience:

@Support({ CUBRID, INFORMIX, ORACLE })
@PlainSQL
SelectConnectByConditionStep<R> connectBy(String sql);

Thus far, these annotations were there only for documentation purposes. With jOOQ 3.9, not anymore. We’re now introducing two new annotations to the jOOQ API:

  • org.jooq.Allow – to allow for a set of dialects (or for the @PlainSQL annotation) to be used within a given scope
  • org.jooq.Require – to require for a set of dialects to be supported via the @Support annotation within a given scope

This is best explained by example. Let’s look at @PlainSQL first

Restricting access to @PlainSQL

One of the biggest advantages of using the jOOQ API is that SQL injection is pretty much a thing of the past. With jOOQ being an internal domain-specific language, users really define the SQL expression tree directly in their Java code, rather than a stringified version of the statement as with JDBC. The expression tree being compiled in Java, there’s no possibility of injecting any unwanted or unforeseen expressions via user input.

There is one exception though. jOOQ doesn’t support every SQL feature in every database. This is why jOOQ ships with a rich “plain SQL” API where custom SQL strings can be embedded anywhere in the SQL expression tree. For instance, the above CONNECT BY clause:

DSL.using(configuration)
   .select(level())
   .connectBy("level < ?", bindValue)
   .fetch();

The above jOOQ query translates to the following SQL query:

SELECT level
FROM dual
CONNECT BY level < ?

As you can see, it is perfectly possible to “do it wrong” and create a SQL injection risk, just like in JDBC:

DSL.using(configuration)
   .select(level())
   .connectBy("level < " + bindValue)
   .fetch();

The difference is very subtle. With jOOQ 3.9 and the Checker Framework, it is now possible to specify the following Maven compiler configuration:

<plugin>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.3</version>
    <configuration>
        <source>1.8</source>
        <target>1.8</target>
        <fork>true</fork>
        <annotationProcessors>
            <annotationProcessor>org.jooq.checker.PlainSQLChecker</annotationProcessor>
        </annotationProcessors>
        <compilerArgs>
            <arg>-Xbootclasspath/p:1.8</arg>
        </compilerArgs>
    </configuration>
</plugin>

The org.jooq.checker.PlainSQLChecker will ensure that no client code using API annotated with @PlainSQL will compile. The error message we’re getting is something like:

C:\Users\lukas\workspace\jOOQ\jOOQ-examples\jOOQ-checker-framework-example\src\main\java\org\jooq\example\checker\PlainSQLCheckerTests.java:[17,17] error: [Plain SQL usage not allowed at current scope. Use @Allow.PlainSQL.]

If you know-what-you’re-doing™ and you absolutely must use jOOQ’s @PlainSQL API at a very specific location (scope), you can annotate that location (scope) with @Allow.PlainSQL and the code compiles just fine again:

// Scope: Single method.
@Allow.PlainSQL
public List<Integer> iKnowWhatImDoing() {
    return DSL.using(configuration)
              .select(level())
              .connectBy("level < ?", bindValue)
              .fetch(0, int.class);
}

Or even:

// Scope: Entire class.
@Allow.PlainSQL
public class IKnowWhatImDoing {
    public List<Integer> iKnowWhatImDoing() {
        return DSL.using(configuration)
                  .select(level())
                  .connectBy("level < ?", bindValue)
                  .fetch(0, int.class);
    }
}

Or even (but then you might just turn off the checker):

// Scope: entire package (put in package-info.java)
@Allow.PlainSQL
package org.jooq.example.checker;

The benefits are clear, though. If security is very important to you (and it should be), then just enable the org.jooq.checker.PlainSQLChecker on each developer build, or at least in CI builds, and get compilation errors whenever “accidental” @PlainSQL API usage is encountered.

Restricting access to SQLDialect

Now, much more interesting for most users is the ability to check whether jOOQ API that is used in client code really supports your database. For instance, the above CONNECT BY clause is supported only in Oracle (if we ignore the not so popular Cubrid and Informix databases). Let’s assume you do work with Oracle only. You want to make sure that all jOOQ API that you’re using is Oracle-compatible. You can now put the following annotation to all packages that use the jOOQ API:

// Scope: entire package (put in package-info.java)
@Allow(ORACLE)
package org.jooq.example.checker;

Now, simply activate the org.jooq.checker.SQLDialectChecker to type check your code for @Allow compliance and you’re done:

<plugin>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.3</version>
    <configuration>
        <source>1.8</source>
        <target>1.8</target>
        <fork>true</fork>
        <annotationProcessors>
            <annotationProcessor>org.jooq.checker.SQLDialectChecker</annotationProcessor>
        </annotationProcessors>
        <compilerArgs>
            <arg>-Xbootclasspath/p:1.8</arg>
        </compilerArgs>
    </configuration>
</plugin>

From now on, whenever you use any jOOQ API, the above checker will verify that any of the following three yields true:

  • The jOOQ API being used is not annotated with @Support
  • The jOOQ API being used is annotated with @Support, but without any explicit SQLDialect (i.e. “works on all databases”), such as DSLContext.select()
  • The jOOQ API being used is annotated with @Support and with at least one of the SQLDialects referenced from @Allow

Thus, within a package annotated as such…

// Scope: entire package (put in package-info.java)
@Allow(ORACLE)
package org.jooq.example.checker;

… using a method annotated as such is fine:

@Support({ CUBRID, INFORMIX, ORACLE })
@PlainSQL
SelectConnectByConditionStep<R> connectBy(String sql);

… but using a method annotated as such is not:

@Support({ MARIADB, MYSQL, POSTGRES })
SelectOptionStep<R> forShare();

In order to allow for this method to be used, client code could, for instance, allow the MYSQL dialect in addition to the ORACLE dialect:

// Scope: entire package (put in package-info.java)
@Allow({ MYSQL, ORACLE })
package org.jooq.example.checker;

From now on, all code in this package may refer to methods supporting either MySQL and/or Oracle.

The @Allow annotation helps giving access to API on a global level. Multiple @Allow annotations (of potentially different scope) create a disjunction of allowed dialects as illustrated here:

// Scope: class
@Allow(MYSQL)
class MySQLAllowed {

    @Allow(ORACLE)
	void mySQLAndOracleAllowed() {
	    DSL.using(configuration)
		   .select()
		   
		   // Works, because Oracle is allowed
		   .connectBy("...")
		   
		   // Works, because MySQL is allowed
		   .forShare();
	}
}

As can be seen above, allowing for two dialects disjunctively won’t ensure that a given statement will work on either of the databases. So…

What if I want both databases to be supported?

In this case, we’ll resort to using the new @Require annotation. Multiple @Require annotations (of potentially different scope) create a conjunction of required dialects as illustrated here:

// Scope: class
@Allow
@Require({ MYSQL, ORACLE })
class MySQLAndOracleRequired {

    @Require(ORACLE)
	void onlyOracleRequired() {
	    DSL.using(configuration)
		   .select()
		   
		   // Works, because only Oracle is required
		   .connectBy("...")
		   
		   // Doesn't work because Oracle is required
		   .forShare();
	}
}

How to put this in use

Let’s assume your application only requires to work with Oracle. You can now put the following annotation on your package, and you will be prevented from using any MySQL-only API, for instance, because MySQL is not allowed as a dialect in your code:

@Allow(ORACLE)
package org.jooq.example.checker;

Now, as requirements change, you want to start supporting MySQL as well from your application. Just change the package specification to the following and start fixing all compilation errors in your jOOQ usage.

// Both dialects are allowed, no others are
@Allow({ MYSQL, ORACLE })

// Both dialects are also required on each clause
@Require({ MYSQL, ORACLE })
package org.jooq.example.checker;

Defaults

By default, for any scope, the following annotations are assumed by the org.jooq.checker.SQLDialectChecker:

  • Nothing is allowed. Each @Allow annotation adds to the set of allowed dialects.
  • Everything is required. Each @Require annotation removes from the set of required dialects.

See it in action

These features will be an integral part of jOOQ 3.9. They’re available simply by adding the following dependency:

<dependency>
    <!-- Use org.jooq            for the Open Source edition
             org.jooq.pro        for commercial editions, 
             org.jooq.pro-java-6 for commercial editions with Java 6 support,
             org.jooq.trial      for the free trial edition -->
	
    <groupId>org.jooq</groupId>
    <artifactId>jooq-checker</artifactId>
    <version>${org.jooq.version}</version>
</dependency>

… and then choosing the appropriate annotation processors to your compiler plugin.

Cannot wait until jOOQ 3.9? You don’t have to. Just check out the 3.9.0-SNAPSHOT version from GitHub and follow the example project given here:

https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/jOOQ-checker-framework-example

Done! From now on, when using jOOQ, you can be sure that whatever code you write will work on all the databases that you plan to support!

I think that this year’s Annotatiomaniac Champion title should go to the makers of the Checker Framework:

award

Further reading about the Checker Framework:

jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Ming-Yee Iu

We have the pleasure of talking to Ming-Yee Iu in this eighth edition who will be telling us about how different people in our industry have tackled the integration of query systems into general purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.

In a recent discussion on reddit, you’ve mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet, LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may abandoned after a few years. Raymond Chen camp allows me to run 20 year old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and Typescript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30 year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks-in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have now for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features then? Perhaps, query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems are evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ has seen the biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk:

Using jOOQ’s ExecuteListener to Prevent Write Operations on a Connection


Security is important, especially on the data access layer. Most commercial databasese allow for fine-grained privilege control using database access grants. For instance, you would be restricting access from a user to a certain set of tables (or even better: views), via GRANT statements:

GRANT SELECT ON table TO user;

With this fine-grained access control, write operations on certain database objects can be prevented directly in the database.

What if that’s not possible?

Not all databases ship with sophisticated access privilege implementations, or perhaps, your application cannot profit from those features for operational reasons. In that case, you should at least be able to implement security on the client, e.g. by using jOOQ’s ExecuteListener (for coarse grained access control), or by using jOOQ’s VisitListener (for fine grained access control).

An example using an ExecuteListener might look like this:

class ReadOnlyListener extends DefaultExecuteListener {
    @Override
    public void executeStart(ExecuteContext ctx) {
        if (ctx.type() != READ)
            throw new DataAccessException("No privilege to execute " + ctx.sql());
    }
}

If you hook this listener into your jOOQ Configuration, you will no longer be able to execute any write operations on that Configuration. It’s that easy!

For more fine-grained control (e.g. a per-table ACL), a VisitListener will do the trick. An (very much simplified) example implementation that shows what can be done can be seen here:

static class ACLListener extends DefaultVisitListener {

    @Override
    public void visitStart(VisitContext context) {
        if (context.queryPart() instanceof Table
                && Arrays.asList(context.clauses()).contains(INSERT_INSERT_INTO)
                && ((Table<?>) context.queryPart()).getName().equals("AUTHOR"))
            throw new DataAccessException("No privilege to insert into AUTHOR");
    }
}

Essentially, this check prevents a client session from running insert statements into the AUTHOR table. A future version of jOOQ will ship with this kind of ACL VisitListener out of the box, when https://github.com/jOOQ/jOOQ/issues/5197 is implemented.