Should I Implement the Arcane Iterator.remove() Method? Yes You (Probably) Should


An interesting question was asked on reddit’s /r/java recently:

Should Iterators be used to modify a custom Collection?

Paraphrasing the question: The author wondered whether a custom java.util.Iterator that is returned from a mutable Collection.iterator() method should implement the weird Iterator.remove() method.

A totally understandable question.

What does Iterator.remove() do?

Few people ever use this method at all. For instance, if you want to implement a generic way to remove null values from an arbitrary Collection, this would be the most generic approach:

Collection<Integer> collection =
Stream.of(1, 2, null, 3, 4, null, 5, 6)
      .collect(Collectors.toCollection(ArrayList::new));

System.out.println(collection);

Iterator<Integer> it = collection.iterator();
while (it.hasNext())
    if (it.next() == null)
        it.remove();

System.out.println(collection);

The above program will print:

[1, 2, null, 3, 4, null, 5, 6]
[1, 2, 3, 4, 5, 6]

Somehow, this API usage does feel dirty. An Iterator seems to be useful to … well … iterate its backing collection. It’s really weird that it also allows for modifying it. It’s even weirder that it only offers removal. E.g. we cannot add a new element before or after the current element of iteration, or replace it.

Luckily, Java 8 provides us with a much better method on the Collection API directly, namely Collection.removeIf(Predicate).

The above iteration code can be re-written as such:

collection.removeIf(Objects::isNull);

OK, now should I implement remove() on my own iterators?

Yes, you should – if your custom collection is mutable. For a very simple reason. Check out the default implementation of Collection.removeIf():

default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        if (filter.test(each.next())) {
            each.remove();
            removed = true;
        }
    }
    return removed;
}

As I said. The most generic way to remove specific elements from a Collection is precisely to go by its Iterator.remove() method and that’s precisely what the JDK does. Subtypes like ArrayList may of course override this implementation because there’s a more performant alternative, but in general, if you write your own custom, modifiable collection, you should implement this method.
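
For illustration, here's a minimal sketch (mine, not from the Reddit question) of a custom, mutable collection whose iterator implements remove() against a backing list. All names are made up:

import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// A custom, mutable collection wrapping a backing list. The interesting
// part is the iterator's remove() method: without it, the inherited
// default implementation throws UnsupportedOperationException, and
// Collection.removeIf() no longer works on this collection.
public class MyCollection<E> extends AbstractCollection<E> {
    private final List<E> backing = new ArrayList<>();

    @Override
    public boolean add(E e) {
        return backing.add(e);
    }

    @Override
    public int size() {
        return backing.size();
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            int next = 0;          // index of the next element to return
            int lastReturned = -1; // index of the last returned element

            @Override
            public boolean hasNext() {
                return next < backing.size();
            }

            @Override
            public E next() {
                if (!hasNext())
                    throw new NoSuchElementException();

                lastReturned = next;
                return backing.get(next++);
            }

            @Override
            public void remove() {
                // remove() may only be called once per call to next()
                if (lastReturned < 0)
                    throw new IllegalStateException();

                backing.remove(lastReturned);
                next = lastReturned;
                lastReturned = -1;
            }
        };
    }
}

With this remove() implementation in place, the inherited Collection.removeIf() default simply works.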

And enjoy the ride into Java’s peculiar, historic caveats for which we all love the language.

How to Write a Quick and Dirty Converter in jOOQ


One of jOOQ‘s most powerful features is the capability of introducing custom data types, pretending the database actually understands them. For instance, when working with SQL TIMESTAMP types, users mostly want to use the new JSR-310 LocalDateTime, rather than the JDBC java.sql.Timestamp type.

In jOOQ 3.9+, this is a no-brainer, as we’ve finally introduced the <javaTimeTypes> flag to automatically generate JSR-310 types instead of JDBC types. But sometimes, you want some custom conversion behaviour, so you write a Converter.

To the rescue come the new jOOQ 3.9+ converter constructors, which essentially take two lambdas and construct a converter for you. For instance:

Converter<Timestamp, LocalDateTime> converter =
Converter.of(
    Timestamp.class,
    LocalDateTime.class,
    t -> t == null ? null : t.toLocalDateTime(),
    u -> u == null ? null : Timestamp.valueOf(u)
);

And you’re set! Even easier, if you don’t need any special null encoding (as above), just write this equivalent converter, instead:

Converter<Timestamp, LocalDateTime> converter =
Converter.ofNullable(
    Timestamp.class,
    LocalDateTime.class,
    Timestamp::toLocalDateTime,
    Timestamp::valueOf
);

Where is this useful? The code generator needs a concrete converter class, so you cannot use these constructors with the code generator, but there are many other places in the jOOQ API where converters are useful, including when you write plain SQL like this:

DSL.field(
    "my_table.my_timestamp", 
    SQLDataType.TIMESTAMP.asConvertedDataType(
        Converter.ofNullable(...)
));
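
For illustration, a hypothetical end-to-end usage might look like this (table and column names are made up, and ctx is assumed to be a jOOQ DSLContext):

Field<LocalDateTime> ts = DSL.field(
    "my_table.my_timestamp",
    SQLDataType.TIMESTAMP.asConvertedDataType(
        Converter.ofNullable(
            Timestamp.class,
            LocalDateTime.class,
            Timestamp::toLocalDateTime,
            Timestamp::valueOf)));

// jOOQ applies the conversion transparently when fetching:
List<LocalDateTime> timestamps =
ctx.select(ts)
   .from("my_table")
   .fetch(ts);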

How to Prevent JDBC Resource Leaks with JDBC and with jOOQ


In a recent consulting gig, I was analysing a client’s connection pool issue in a production system, where, during some peak loads, all the Java processes involving database interactions just started queueing up until nothing really worked anymore. No exceptions, though, and when the peak load was gone in the evening, everything returned to normal. The database load looked pretty healthy at the time, so no actual database problem was involved – the problem had to be on the client side.

The Weblogic operations teams quickly identified the connection pool as the bottleneck. All the connections were constantly allocated to some client process. The immediate thought was: a resource leak is happening, and it didn’t show before because this was an exceptional situation: around the beginning of the new year, everyone wanted to download their electronic documents from the bank (and some new features had introduced many more document-related database calls).

The obvious problem

That particular system still runs a lot of legacy code in Java 6 style, which means there are tons of code blocks of the following kind:

Connection connection = null;
try {

  // Get the connection from the pool through JNDI
  connection = JDBCHelper.getConnection();
}
finally {

  // Release the connection
  JDBCHelper.close(connection);  
}

While the above code is perfectly fine, and 99% of all database interactions were of the above type, there was an occasional instance of someone badly copy-pasting some code and doing something like this:

Connection connection = JDBCHelper.getConnection();
PreparedStatement stmt = null;

try {
  stmt = connection.prepareStatement("SELECT ...");
}
finally {

  // Release the statement
  JDBCHelper.close(stmt);
}

// But the connection is never released

Sometimes, things were even more subtle, as a utility method expected a connection like this:

// Utility method doesn't have to close the connection:
public void databaseCalls(Connection connection) {
  PreparedStatement stmt = null;
  try {
    stmt = connection.prepareStatement("SELECT ...");
  }
  finally {

    // Release the statement
    JDBCHelper.close(stmt);
  }
}

public void businessLogic() {
  // Oops, subtle connection leak
  databaseCalls(JDBCHelper.getConnection());
}

Thoroughly fixing these things

There’s a quick fix to all these problems: just continue rigorously calling the JDBCHelper.close() method (or connection.close() with appropriate error handling) every time. But apparently, that’s not easy enough, as there will always be a non-vigilant developer (or a junior developer who doesn’t know these things) who will get it wrong and simply forget.

I mean, even the official JDBC tutorial gets it “wrong” on its first page:
https://docs.oracle.com/javase/tutorial/jdbc/overview/index.html

The bad example being:

public void connectToAndQueryDatabase(
    String username, String password) {

    Connection con = DriverManager.getConnection(
                         "jdbc:myDriver:myDatabase",
                         username,
                         password);

    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery(
        "SELECT a, b, c FROM Table1");

    while (rs.next()) {
        int x = rs.getInt("a");
        String s = rs.getString("b");
        float f = rs.getFloat("c");
    }
}

All resources leak in this example!

Of course, it’s just an example, and of course, it’s not a terrible situation, because resources can usually clean up themselves when they go out of scope, i.e. when the GC kicks in. But as software engineers, we shouldn’t rely on that, and as the production issues have shown, there are always edge cases where precisely this lack of vigilance will cause great harm. After all,

It works on my machine

… is simply not an excuse. We should design our software for production use.

Fix #1: Use try-with-resources. Always

If you want to stay on the safe side, always follow this rule:

The scope that acquires the resource, closes the resource

As long as you’re working with JDBC, save yourself the trouble of writing those JDBCUtilities classes that close non-null resources and safely catch exceptions that may arise. Just use try-with-resources, all the time. For instance, take the example from the Oracle JDBC tutorial, which should read:

public void connectToAndQueryDatabase(
     String username, String password)
throws SQLException {

    // All of these resources are allocated in this method. Thus,
    // this method's responsibility is to also close / free all
    // these resources.
    try (Connection con = DriverManager.getConnection(
            "jdbc:myDriver:myDatabase", username, password);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(
            "SELECT a, b, c FROM Table1")) {

        while (rs.next()) {
            int x = rs.getInt("a");
            String s = rs.getString("b");
            float f = rs.getFloat("c");
        }
    }
}

This already feels that much better and cleaner, doesn’t it? All the resources are acquired in the above method, and the try-with-resources block will close all of them when they go out of scope. It’s just syntax sugar for something we’ve been doing manually all the time. But now, we will (hopefully) never again forget!

Of course, you could introduce automatic leak detection in your integration tests, because it’s rather easy to proxy the JDBC DataSource and count all connection acquisitions and closings. An example can be seen in this post:
The best way to detect database connection leaks
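
The core idea is simple enough to sketch here (a hypothetical test helper, not the code from that post):

import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

// A hypothetical test helper: wraps a DataSource and counts how many
// connections were acquired and how many were closed. If the two
// numbers diverge after an integration test, something is leaking.
public class ConnectionLeakCounter {
    private final AtomicInteger acquired = new AtomicInteger();
    private final AtomicInteger closed = new AtomicInteger();
    private final DataSource delegate;

    public ConnectionLeakCounter(DataSource delegate) {
        this.delegate = delegate;
    }

    public Connection getConnection() throws SQLException {
        Connection connection = delegate.getConnection();
        acquired.incrementAndGet();

        // Intercept close() calls with a dynamic proxy:
        return (Connection) Proxy.newProxyInstance(
            Connection.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, args) -> {
                if ("close".equals(method.getName()))
                    closed.incrementAndGet();
                return method.invoke(connection, args);
            });
    }

    public int openedButNotClosed() {
        return acquired.get() - closed.get();
    }
}

An integration test can then assert that openedButNotClosed() is zero after each run.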

Fix #2: Use jOOQ, which manages resources for you

Historically, JDBC works on lazy resources that are kept around for a while. The assumption in 1997 (when JDBC was introduced) was that database interactions were relatively slow and it made sense to fetch and process one record at a time, even for moderately sized result sets.

In fact, it was even common to abort fetching records from a cursor when we’ve had enough results and close it eagerly before consuming all the rows.

Today, these assumptions are (mostly) no longer true, and jOOQ (like other, more modern database APIs) inverts the lazy/eager API default behaviour. In jOOQ, the JDBC types have the following corresponding counterparts:

  • JDBC DataSource / Connection => jOOQ ConnectionProvider:
    jOOQ doesn’t know the concept of an “open connection” like JDBC. jOOQ only has this ConnectionProvider which works in a similar way to JDBC’s / JavaEE’s DataSource. The semantics here is that the connection / session is “managed” and jOOQ will acquire / release it once per statement. This happens automatically, so users don’t have to worry about any connection resource.
  • JDBC Statement (and subtypes) => jOOQ Query:
    While the JDBC statement (especially the PreparedStatement) is a resource that binds some server-side objects, such as an execution plan, for instance, jOOQ again doesn’t have such a resourceful thing. The Query just wraps the SQL string (or AST) and bind variables. All resources are created lazily only when the query is actually executed – and released immediately after execution. Again, users don’t have to worry about any statement resource.
  • JDBC ResultSet => jOOQ Result:
    The JDBC ResultSet corresponds to a server-side cursor, another object that possibly binds quite a few resources, depending on your fetch mode. Again, in jOOQ no resources are bound / exposed, because jOOQ by default eagerly fetches your entire result set – the assumption being that a low-level optimisation here doesn’t add much value for moderately sized result sets.

With the above inverted defaults (from lazy to eager resource allocation / freeing), the jOOQ-ified Oracle JDBC tutorial code would look like this:

Working with a standalone Connection

public void connectToAndQueryDatabase(
    String username, String password)
throws SQLException {

    // If you're using a standalone connection, you can pass that
    // one to jOOQ, but you're still responsible for closing it
    // again:
    try (Connection con = DriverManager.getConnection(
            "jdbc:myDriver:myDatabase", username, password)) {

        // There is no statement resource anymore, and the result
        // is fetched eagerly from the database, so you don't have
        // to worry about it
        for (Record record : DSL.using(con).fetch(
                "SELECT a, b, c FROM Table1")) {
            int x = record.get("a", int.class);
            String s = record.get("b", String.class);
            float f = record.get("c", float.class);
        }
    }
}

Working with a connection pool / DataSource

// You probably have some means of injecting / discovering
// a JDBC DataSource, e.g. from Spring, or from your JavaEE
// container, etc.
@Inject
DataSource ds;

public void connectToAndQueryDatabase(
    String username, String password) {

    // With a DataSource, jOOQ will automatically acquire and
    // close the JDBC Connection for you, so the last remaining
    // resource has also disappeared from your client code.
    for (Record record : DSL
           .using(ds, SQLDialect.ORACLE)
           .fetch("SELECT a, b, c FROM Table1")) {
        int x = record.get("a", int.class);
        String s = record.get("b", String.class);
        float f = record.get("c", float.class);
    }
}

With jOOQ, all resource management is automatic, by default, because by default, you don’t want to worry about this low level stuff. It’s not 1997 anymore. The JDBC API really is too low level for most use-cases.

If you do want to optimise resource management and not fetch everything eagerly, you can, of course. jOOQ will allow you to fetch your results lazily, in two ways:

Using a Cursor

@Inject
DataSource ds;

public void connectToAndQueryDatabase(
    String username, String password) {

    // jOOQ's Cursor type is a resource, just like JDBC's
    // ResultSet. It actually keeps a reference to an open
    // ResultSet, internally. This is an opt-in
    // feature, though, only to be used if desired.
    try (Cursor<Record> cursor = DSL
            .using(ds, SQLDialect.ORACLE)
            .fetchLazy("SELECT a, b, c FROM Table1")) {

        for (Record record : cursor) {
            int x = record.get("a", int.class);
            String s = record.get("b", String.class);
            float f = record.get("c", float.class);
        }
    }
}

Using a Java 8 Stream (lazy, resourceful version)

@Inject
DataSource ds;

public void connectToAndQueryDatabase(
    String username, String password) {

    // This can also work with a stream
    try (Stream<Record> stream = DSL
        .using(ds, SQLDialect.ORACLE)
        .fetchStream("SELECT a, b, c FROM Table1")) {

        stream.forEach(record -> {
            int x = record.get("a", int.class);
            String s = record.get("b", String.class);
            float f = record.get("c", float.class);
        });
    }
}

Unfortunately, there are no auto-closing streams in Java, which is why we have to resort to using the try-with-resources statement, breaking the fluency of jOOQ’s API.

Do note, though, that you can use the Stream API in an eager fashion:

Using a Java 8 Stream (eager version)

@Inject
DataSource ds;

public void connectToAndQueryDatabase(
    String username, String password) {

    // Fetch the jOOQ Result eagerly into memory, then stream it
    // Again, no resource management
    DSL.using(ds, SQLDialect.ORACLE)
       .fetch("SELECT a, b, c FROM Table1")
       .stream()
       .forEach(record -> {
            int x = record.get("a", int.class);
            String s = record.get("b", String.class);
            float f = record.get("c", float.class);
        });
}

Conclusion

Developers, unfortunately, often suffer from

Works on my machine

This leads to problems that can be discovered only in production, under load. When it comes to resources, it is important to constantly remind ourselves that …

The scope that acquires the resource, closes the resource

JDBC (and the JDK’s IO APIs), “unfortunately”, deal with resources on a very low level. In exchange, their default behaviour is very resource-efficient. For instance, when you only need to read a file header, you don’t load the entire file into memory through the InputStream. You can explicitly, manually load only the first few lines.

But in many applications, this default and its low-level nature get in the way of correctness (accidental resource leaks are easy to create) and convenience (a lot of boilerplate code needs to be written).

With database interactions, it’s usually best to migrate your JDBC code towards a more modern API like jOOQ, which abstracts resource handling away in its API and inverts the lazy/eager semantics: Eager by default, lazy on demand.

More information about the differences between jOOQ and JDBC can be seen here, in the manual.

Do You Really Have to Name Everything in Software?


This is one of software engineering’s oldest battles. No, I’m not talking about where to put curly braces, or whether to use tabs or spaces. I mean the eternal battle between nominal typing and structural typing.

This article is inspired by a very vocal blogger who eloquently reminds us to …

[…] Please Avoid Functional Vomit

Read the full article here:
https://dzone.com/articles/using-java-8-please-avoid-functional-vomit

What’s the post really about?

It is about naming things. As we all know:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Now, for some reason, there is a group of people who want constant pain and suffering, explicitly naming everything, including rather abstract concepts and algorithmic components, such as compound predicates. Those people like nominal typing and all the features that are derived from it. What is nominal typing (as opposed to structural typing)?

Structural typing

SQL is a good example to study the two worlds. When you write SQL statements, you’re creating structural row types all the time. For instance, when you write:

SELECT first_name, last_name
FROM customer

… what you’re really doing is you’re creating a new rowtype of the structure (in pseudo-SQL):

TYPE (
  first_name VARCHAR,
  last_name VARCHAR
)

The type has the following properties:

  • It is a tuple or record (as always in SQL)
  • It contains two attributes or columns
  • Those two attributes / columns are called first_name and last_name
  • Their type is VARCHAR

This is a structural type, because the SQL statement that produces the type only declares the type’s structure implicitly, by producing a set of column expressions.

In Java, we know lambda expressions, which are (incomplete) structural types, such as:

// A type that can check for i to be even
i -> i % 2 == 0

Nominal typing

Nominal typing takes things one step further. In SQL, nominal typing is perfectly possible as well, for instance, in the above statement, we selected from a well-known table by name customer. Nominal typing assigns a name to a structural type (and possibly stores the type somewhere, for reuse).

If we want to name our (first_name, last_name) type, we could do things like:

-- By using a derived table:
SELECT *
FROM (
  SELECT first_name, last_name
  FROM customer
) AS people

-- By using a common table expression:
WITH people AS (
  SELECT first_name, last_name
  FROM customer
)
SELECT *
FROM people

-- By using a view
CREATE VIEW people AS
SELECT first_name, last_name
FROM customer

In all cases, we’ve assigned the name people to the structural type (first_name, last_name). The only difference being the scope for which the name (and the corresponding content) is defined.

In Java, we can only use lambda expressions once we assign them to a named type, either by using an assignment, or by passing the expression to a method that takes a named type argument:

// Naming the lambda expression itself
Predicate<Integer> p = i -> i % 2 == 0;

// Passing the lambda expression to a method
Stream.of(1, 2, 3)
      .filter(i -> i % 2 == 0);

Back to the article

The article claims that giving a name to things is always better. For instance, the author proposes giving a name to what we would commonly refer to as a “predicate”:

//original, less clear code
if(barrier.value() > LIMIT && barrier.value() > 0){
//extracted out to helper function. More code, more clear
if(barrierHasPositiveLimitBreach()){

So, the author thinks that extracting a rather trivial predicate into an external function is better because a future reader of such code will better understand what’s going on. At least in the article’s opinion. Let’s refute this claim for the sake of the argument:

  • The proposed name is verbose and requires quite some thinking.
  • What does breach mean?
  • Is breach the same as >= or the same as >?
  • Is LIMIT a constant? From where?
  • Where is barrier? Who owns it?
  • What does the verb “has” mean, here? Does it depend on something outside of barrier? E.g. some shared state?
  • What happens if there’s a negative limit?

By naming the predicate (remember, naming things is hard), the OP has added several layers of cognitive complexity to the reader, while quite possibly introducing subtle bugs, because probably both LIMIT and barrier should be function arguments, rather than global (im)mutable state that is assumed to be there, by the function.

The name introduced several concepts (“to have a breach”, “positive limit”, “breach”) that are not well defined and need some deciphering. How do we decipher it? Probably by looking inside the function and reading the actual code. So what do we gain? Better reuse, perhaps? But is this really reusable?
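
For the sake of the argument, if the predicate really must be extracted, a version that makes its dependencies explicit would at least remove the hidden state. A sketch, with made-up names:

// All inputs are explicit parameters; nothing depends on hidden
// global or shared state:
static boolean hasPositiveLimitBreach(int barrierValue, int limit) {
    return barrierValue > limit && barrierValue > 0;
}

// Call site:
if (hasPositiveLimitBreach(barrier.value(), LIMIT)) {
    // ...
}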

Finally, there is a (very slight) risk of introducing a performance penalty by the additional indirection. If we translate this to SQL, we could have written a stored function and then queried:

SELECT *
FROM orders -- Just an assumption here
WHERE barrier_has_positive_limit_breach(orders.barrier)

If this was some really complicated business logic depending on a huge number of things, perhaps extracting the function might’ve been worthwhile. But in this particular case, is it really better than:

SELECT *
FROM orders
WHERE barrier > :limit AND barrier > 0

or even

SELECT *
FROM orders
WHERE barrier > GREATEST(:limit, 0)

Conclusion

There are some people in our industry who constantly want to see the world in black and white. As soon as they’ve had one small success story (e.g. reusing a very common predicate 4-5 times by extracting it into a function), they conclude with a general rule of this approach being always superior.

They struggle with the notion of “it depends”. Nominal typing and structural typing are both very interesting concepts. Structural typing is extremely powerful, whereas nominal typing helps us humans keep track of complexity. In SQL, we’ve always liked to structure our huge SQL statements, e.g. in nameable views. Likewise, Java programmers structure their code in nameable classes and methods.

But it should be immediately clear to anyone reading the linked article that the author seems to like hyperboles and probably wasn’t really serious, given the silly example he came up with. The message he’s conveying is wrong, because it claims that naming things is always better. It’s not true.

Be pragmatic. Name things where it really helps. Don’t name things where it doesn’t. Or as Leon Bambrick amended Phil Karlton’s quote:

There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors

Here’s my advice to you, dear nominal typing loving blogger. There are only two ways of typing: nominal typing and structural typing. And it depends typing.

SQL, Streams, For Comprehension… It’s All the Same


Recently, at Devoxx, I saw a beautiful slide in a talk by Kevlin Henney.

In his talk, he was displaying a variety of approaches to solve the FizzBuzz “problem”, including a couple of very elegant solutions in completely declarative approaches and languages.

In this particular slide, Kevlin used a notation that is derived from maths. The set builder notation. Here’s an example from Wikipedia:

{ n ∈ ℤ | ∃k ∈ ℤ : n = 2k }

The example reads: for all n in ℤ (the set of all integers), take those for which there exists (∃) another integer k, for which the following equation is satisfied: n = 2k.

Or in plain English: All even integers. (because for even integers, there exists another integer that is half the even integer)

Beautiful, eh? In imperative programming, we’d probably do something like this instead:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i++)
    even.add(i * 2);

Or this:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i = i + 2)
    even.add(i);

But there are several problems with the imperative approach:

  • We have to realistically start somewhere
  • We have to realistically end somewhere
  • We have to store all values in an intermediate collection

Sure, those aren’t severe limitations in every-day use cases, because we’re probably solving a real-world problem where we don’t actually need an infinite number of even integers, and storing them in an intermediate collection won’t consume all of our memory. But still, the declarative, mathematical approach is much leaner: we can answer those questions about where to start and where to end later, and we never need to materialise any intermediate collection before we make those final decisions.

For instance, we can declare X to be that set, and then declare Y to be a set that is derived from X, and finally materialise Z, which is a very tiny set derived from Y. For this, we may have never needed to materialise all the (even) integers.
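
In Java 8, this deferral can be sketched with the Streams API (an illustration of the principle; nothing is computed until the terminal operation):

// X: the even integers (here only the non-negative ones),
// conceptually unbounded, nothing materialised yet
Stream<Integer> x = Stream.iterate(0, i -> i + 2);

// Y: derived from X, still nothing materialised
Stream<Integer> y = x.map(i -> i * i);

// Z: only now do we decide where to stop; only 5 values are ever computed
List<Integer> z = y.limit(5).collect(Collectors.toList());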

How this compares to SQL

Kevlin made a cunning comparison. Of course, all functional programming aficionados will immediately recognise that languages like Scala have something called a “for comprehension”, which models precisely the mathematical set-builder notation.

Java 8 now has the Streams API, which allows us, to some extent, to model something similar (although not as powerful). But Kevlin didn’t use those “modern” languages. He used SQL as a comparison. That “arcane” declarative programming language that has been around forever, and that we love so much. Yes, here’s how we can declare all the even numbers in SQL:

SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

If optimisers were perfect, this semi-self-join between the two references of the integers “table” could be optimised perfectly. In most databases, we’d probably manually transform the above notation to this equivalent one:

SELECT n
FROM integers
WHERE MOD(n, 2) = 0

Yes, indeed. The set-builder notation and the SQL language are very similar beasts. The former prefers using mathematical symbols for brevity and conciseness, the latter prefers using English words to connect the different operators, but it’s the same thing. And if you squint hard enough, you’ll see that Java 8 Streams, for instance, are also pretty much the same thing:

[Image: “everything is a table”]
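
To make the parallel concrete, here’s a sketch of the even-integers query expressed with the Streams API (bounded, because Java offers no infinite integers “table”):

// SELECT n FROM integers WHERE MOD(n, 2) = 0
IntStream.rangeClosed(0, 100)
         .filter(n -> n % 2 == 0)
         .forEach(System.out::println);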

I’ve blogged about this recently where all the Java 8 Streams operations are compared to their SQL clause counterparts:
https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-equivalents-in-java-8-streams

How is this better?

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.

In the imperative approach, we have to refactor constantly, to avoid the overhead of

  • Producing too many integers
  • Storing too many integers (or storing them at all)

In truly declarative languages like SQL, we’re just describing “even integers” with an expression, possibly assigning the expression a name:

CREATE VIEW even_integers AS
SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

So, when we actually use and materialise the even integers, e.g. positive integers less than 100, the optimiser can optimise away the double access to the integer table and produce only the exact number of values that we’re requesting (without materialising them in intermediate collections):

SELECT n
FROM even_integers
WHERE n BETWEEN 0 AND 100

Conclusion

Thinking in terms of sets, in terms of declaring sets, has always been our dream as software engineers. The approach is extremely compelling and elegant. We can delegate a lot of boring algorithmic work to the implementation engine of the declarative programming language. In the case of SQL, it would be a SQL database optimiser, which figures out a great deal of optimisations that we might not have thought of.

The above example is trivial. We can perfectly live in a world where we manually iterate over a local integer variable that goes from 0 to 100:

for (int i = 0; i <= 100; i++)
  doSomething(i);

But stuff gets hairy quite quickly. Compare the two versions of the same algorithm in Mario Fusco’s famous tweet: an imperative one and a declarative, Streams-based one.

This also applies to SQL, and what’s even better in SQL than with Streams: The SQL statement is a declarative expression tree, not a formally ordered set of stream pipeline operations. The optimiser can freely reorder / transform the expression tree into something that it thinks is more optimal. This isn’t just a promise. This works in modern SQL databases every day, for very complex queries, which you can write in a matter of seconds, rather than hours.

Stay tuned for a short series of blog posts on the jOOQ blog illustrating what modern cost-based optimisation can do for you, when you’re using the SQL language.

Prevent SQL Injection with SQL Builders Like jOOQ


As long as we allow ourselves to write string-based dynamic SQL embedded in other programming languages like Java, we will have a certain risk of being vulnerable to SQL injection. That’s a fact. Don’t believe it? Check out this website exposing all vulnerabilities on Stack Overflow for PHP questions:

https://laurent22.github.io/so-injections

In a previous blog post, I’ve shown how fatal such a single vulnerability can be, if discovered. A lot of blog posts out there warn about the potential of attackers injecting a DROP DATABASE statement.


Also, everyone knows this famous xkcd:
https://xkcd.com/327

But in my opinion, a much more important threat is not immediate damage to your system, but data leakage. Using tools like sqlmap, every script kiddie can download your credit card information and other sensitive data from your database.

The best remedy: Avoid string-based embedded SQL

The best remedy is to always avoid string-based embedded SQL, whenever you can. I.e. try not to do things like this too often:

try (PreparedStatement s = c.prepareStatement(
    "SELECT first_name, last_name "
  + "FROM users "
  + "WHERE user_id = ? ")) {

    s.setInt(1, userId);
    try (ResultSet rs = s.executeQuery()) {
        ...
    }
}

Sure, there is currently nothing wrong with the above Java code. There is no vulnerability, as we’re using a bind variable. But the risk of some developer not paying attention and accidentally introducing a vulnerability when adding another predicate is too high.
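
For example, all it takes is one “quick” maintenance change like the following (sortColumn being a hypothetical, possibly user-supplied value):

// An "innocent looking" change: the ORDER BY column is concatenated
// instead of bound, because column names cannot be bind values.
// If sortColumn is user-controlled, this is a textbook SQL injection
// vulnerability.
try (PreparedStatement s = c.prepareStatement(
    "SELECT first_name, last_name "
  + "FROM users "
  + "WHERE user_id = ? "
  + "ORDER BY " + sortColumn)) {

    s.setInt(1, userId);
    try (ResultSet rs = s.executeQuery()) {
        ...
    }
}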

Fine, so we’re not using string-based embedded SQL. But what are the alternatives?

Use a query DSL like jOOQ to build your dynamic SQL

APIs like jOOQ help you build SQL statements in a type safe, composable way as if the Java language actually understood SQL. The previous JDBC prepared statement now translates to the following:

Result<?> result =
ctx.select(USERS.FIRST_NAME, USERS.LAST_NAME)
   .from(USERS)
   // Bind variable embedded in statement
   .where(USERS.USER_ID.eq(userId))
   .fetch();

In jOOQ, the underlying JDBC PreparedStatement is created transparently and the userId bind variable is placed right in the middle of the statement so you don’t have to worry about the boring details. There’s no way you can have any accidental SQL injection vulnerability in such jOOQ API calls, because every SQL clause and expression is part of an expression tree that is managed by jOOQ.

And what’s best: The Java compiler can now type check your SQL statement to a certain degree. This is a huge benefit in terms of productivity and code quality.

Of course, SQL builders aren’t a perfect shield for SQL injection, as they usually expose some API to insert custom SQL strings. For instance, in jOOQ, you can write:

Result<?> result =
ctx.select(USERS.FIRST_NAME, USERS.LAST_NAME)
   .from(USERS)
   // Bind variable embedded in "plain SQL" string
   .where("user_id = ?", userId)
   .fetch();

This alternative API usage is just as convenient as the previous one. The bind variable is located right where it is needed (not applied later on a PreparedStatement), but of course this is again string-based embedded SQL, which is potentially vulnerable if you get it wrong.

The important thing here is that the string-based method is the exception, which is used only very rarely when you need a very advanced SQL feature that is not supported by jOOQ. By default, you’re using the “safe” approach without strings. And since jOOQ 3.9, you can also add a type checker that generates a compilation warning or error as soon as you’re using jOOQ’s “plain SQL” API. Read more about that here:
https://blog.jooq.org/2016/05/09/jsr-308-and-the-checker-framework-add-even-more-typesafety-to-jooq-3-9

Use views and stored procedures

The safest and least intrusive way to prevent such problems, of course, is not to use embedded SQL at all, but to use views and/or stored procedures instead. By moving and storing the SQL statements into the database, you get a few advantages in addition to the lack of SQLi vulnerability:

  • The database can type-check your statement in its entirety
  • You can easily reuse a statement in different applications, e.g. not all written in Java
  • You could even revoke grants to the underlying tables and grant access only to these views and stored procedures, which adds another layer of security to your system

If you’re writing PL/SQL in Oracle, the previous statement would be transformed to something like this:

FOR rec IN (
  SELECT first_name, last_name
  FROM users
  -- Bind variable again embedded in SQL statement
  WHERE user_id = l_user_id
) LOOP
  ...
END LOOP;

Just like with jOOQ, you can easily embed the SQL statement in the “host language”, except that this time, it isn’t Java but PL/SQL. There’s absolutely no way the above statement will ever be vulnerable to SQL injection, as the statement isn’t even a dynamic SQL statement anymore.

Again, you can write dynamic SQL also in PL/SQL (or other database’s stored procedure languages), e.g. by using EXECUTE IMMEDIATE, but the important thing is again that dynamic, string-based, embedded SQL is the exception not the default. That’s the important thing here!

Now, if you do decide to use stored procedures, then jOOQ is again here to help you, as you can easily call that procedure from jOOQ, again in a type safe way. More about that in this blog post:
Painless Access from Java to PL/SQL Procedures with jOOQ

Beware! SQL isn’t the only language vulnerable to injection

Vlad Mihalcea displayed an equally important threat to JPA based applications. Scroll down in his blog post to find him mentioning JPQL injection:
A beginner’s guide to SQL injection and how you should prevent it

Yes, if you’re doing something as silly as this:

public List<Post> getPostsByTitle(String title) {
    return doInJPA(entityManager -> {
        return entityManager.createQuery(
            "select p " +
            "from Post p " +
            "where" +
            "   p.title = '" + title + "'", Post.class)
        .getResultList();
    });
}

… then an attacker can inject any sort of JPQL code into the title variable. The possibilities are a bit more limited than with SQL injection, but it is still perfectly possible to read and dump the entire database (including credit card information, remember?) from such a vulnerability.

Again, if you’re doing this quite often, you should consider switching to a more type safe way to write JPQL (e.g. using the infamous Criteria API), or again switch to SQL and jOOQ.

Read Vlad’s post for more details:
A beginner’s guide to SQL injection and how you should prevent it

Conclusion

There are many external DSLs, like SQL, JPQL, XPath, regular expressions, and what not. Some of them are extremely powerful and they’re used to operate on sensitive data. Which means that if you leak control of the language outside of your application, you’re very vulnerable.

Vulnerability mostly happens when you embed those external DSLs into Java in a string-based form. The best remedies are:

  • Use an internal DSL that models the external DSL instead
  • Keep the external DSL external, e.g. by using views and stored procedures

Both of these approaches work because they both avoid strings.

The Java Ecosystem’s Obsession with NonNull Annotations


I’m not well known for my love of annotations. While I do recognise that they can serve a very limited purpose in some areas (e.g. hinting stuff to the compiler or extending the language where we don’t want new keywords), I certainly don’t think they were ever meant to be used for API design.

“Unfortunately” (but this is a matter of taste), Java 8 introduced type annotations: an entirely new extension to the annotation type system, which allows you to do things like:

@Positive int positive = 1;

Thus far, I’ve seen such type restriction features, in a much more rigid form, only in the Ada or PL/SQL languages, but others may have similar features.

The nice thing about Java 8’s implementation is the fact that the meaning of the type of the above local variable (@Positive int) is unknown to the Java compiler (and to the runtime), until you write and activate a specific compiler plugin to enforce your custom meaning. The easiest way to do that is by using the Checker Framework (and yes, we’re guilty at jOOQ. We have our own checker for SQL dialect validation). You can implement any semantics, for instance:

// This compiles because @Positive int is a subtype of int
int number = positive;

// Doesn't compile, because number might be negative
@Positive int positive2 = number;

// Doesn't compile, because -1 is negative
@Positive int positive3 = -1;

As you can see, using type annotations is a very strategic decision. Either you want to create hundreds of types in this parallel universe, or, in my opinion, you better leave this set of features alone, because probably: YAGNI

Unfortunately, and to the disappointment of Mike Ernst, the author of the Checker Framework (whom I’ve talked to about this some years ago), most people abuse this new JSR-308 feature for boring and simple null checking. For instance, just recently, there had been a feature request on the popular Javaslang library to add support for such annotations that help users and IDEs guarantee that Javaslang API methods return non-null results.

Please no. Don’t use this atomic bomb for boring null checks

Let me make this very clear:

Type annotations are the wrong tool to enforce nullability

– Lukas Eder, timeless

You may quote me on that. The only exception to the above is if you strategically embrace JSR-308 type annotations in every possible way and start adding annotations for all sorts of type restrictions, including the @Positive example that I’ve given, then yes, adding nullability annotations won’t hurt you much anymore, as your types will take 50 lines of code to declare and reference anyway. But frankly, this is an extremely niche approach to type systems that only few general purpose programs, let alone publicly available APIs can profit from. If in doubt, don’t use type annotations.

One important reason why a library like Javaslang shouldn’t add such annotations is the fact that in a library like Javaslang, you can be very sure that you will hardly ever encounter null, because references in Javaslang are mostly one of three things:

  • A collection, which is never null, only (possibly) empty
  • An Option which replaces null (it is in fact a collection of cardinality 0..1)
  • A non-null reference (because in the presence of Option, all references can be expected to be non-null)

Of course, these rules aren’t valid for every API. There are some low quality APIs out there that return “unexpected” null values, or leak “internal” null values (and historically, some of the JDK APIs, unfortunately, are part of these “low quality APIs”). But Javaslang is not one of them, and the APIs you are designing also shouldn’t be one of them.
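
For example, with Javaslang’s Option (a sketch; possiblyNullString() is a hypothetical helper):

import javaslang.control.Option;

// Option.of() wraps a possibly-null reference into Some or None,
// so absence is modelled explicitly and null never escapes:
Option<String> name = Option.of(possiblyNullString());

// Callers must handle absence explicitly; no accidental
// NullPointerException:
String display = name.getOrElse("anonymous");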

So, let go of your null fear. null is not a problem in well-designed software. You can spare yourself the work of adding a @NonNull annotation on 99% of all of your types just to shut up your IDE, in case you turned on those warnings. Focus on writing high-quality software rather than bikeshedding null.

Because: YAGNI.

And, if you haven’t had enough bikeshedding already, consider watching this entertaining talk by Stuart Marks.