JDBC 4.0’s Lesser-known Clob.free() and Blob.free() Methods

When I talk about jOOQ at conferences, I always show this slide containing a bunch of very common JDBC mistakes that people often commit:

Six common JDBC bugs in this image

Can you find the bugs? Some of them are obvious, such as:

  • Line 4: Syntax errors resulting from bad concatenation on line 3
  • Line 7: Syntax errors and SQL injection risk due to variable inlining
  • Line 8: Wrong bind index resulting from a potential mismatch on line 3
  • Line 14: Wrong column name due to sloppy rename
  • Line 18: Bad resource management

But then, there’s another, very subtle bug that most people are unaware of, because fixing it has only been possible since Java 6 / JDBC 4.0. See the solution below:

Solution to the previous six bugs

With JDBC 4.0, the Clob.free() and Blob.free() methods were introduced. While calling them is optional, it may be a very bad idea not to call them as early as possible, as you should not rely on the garbage collector to kick in early enough to free these resources. In fact, in certain databases / JDBC drivers, LOBs can outlive individual statements and/or transactions. They’re beasts of their own. The JDBC tutorial (and also the JDBC specification) says:

Blob, Clob, and NClob Java objects remain valid for at least the duration of the transaction in which they are created. This could potentially result in an application running out of resources during a long running transaction.

The same is true for arrays, which also have an Array.free() method since Java 6 / JDBC 4.0.
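In code, freeing a LOB eagerly might look like the following sketch (the documents table, its description column, and the open stmt are made up / assumed for illustration):

try (ResultSet rs = stmt.executeQuery(
        "SELECT description FROM documents")) {
    while (rs.next()) {
        Clob clob = rs.getClob(1);
        try {
            // Read the LOB contents (Clob positions are 1-based)
            String description =
                clob.getSubString(1, (int) clob.length());
            // ... work with description ...
        }
        finally {
            // Release the LOB eagerly. Don't rely on the garbage
            // collector, or on the end of the transaction, to do it
            clob.free();
        }
    }
}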

So if your application has long-running transactions, do call these free() methods, or make it a habit to always call them. We’ll file an issue with FindBugs, suggesting this as a potential bug pattern.

jOOQ Newsletter: January 22, 2014

Subscribe to the newsletter here

Tweet of the Day

We are dedicating this new section of the newsletter to our followers, users, and customers. Here are two recent highlights:

Jose M. Arranz, who had been planning to build jOOQ himself when he happily discovered that jOOQ already exists:

https://twitter.com/jmarranz/status/417956223468982272

Majid Azimi, who wishes for jOOQ to become the new de facto standard in all languages (we wish for the same, shocker, I know):

https://twitter.com/majidazimi/status/420879296807186432

jOOQ 3.3 Preview

We’re closing in on releasing jOOQ 3.3 towards the beginning of February 2014, an exciting release for both existing and new jOOQ users. Apart from many defect fixes, there are now also

… and much more. jOOQ Open Source Edition users can download a preview from GitHub, while commercial users can request a pre-built download directly from us.

The Data Geekery Business Case at RedHat’s opensource.com

At Data Geekery, we’re in close touch with various communities, among them those around Oracle and RedHat. RedHat has been selling Open Source software as a business model for a long time. In the enterprise, apart from the flagship RHEL, RedHat also provides support for a variety of other stacks, such as the JBoss platform or cloud computing solutions.

We’re thrilled to have been featured on RedHat’s opensource.com platform, where we have published an article about our own commercial Open Source business case:

http://opensource.com/business/14/1/how-to-transition-open-source-to-revenue

This is the first part of a series of blog posts. In the next part, we’re going to talk about the five lessons learned when making a business of Open Source, so stay tuned!

Community Work

In the last few weeks, there have been a couple of excellent blog posts by jOOQ community members, which we do not want to keep from you. In December, Gregor Riegler wrote a witty rant about the Annotation Nightmare we’re increasingly suffering from in the Java ecosystem. Declarative programming at its best.

In January 2014, Petri Kainulainen started writing a series of excellent blog posts and tutorials for new jOOQ users. The first two posts can be seen here:

Petri is a very active blogger, whom we have been following for a while now. We’re eagerly looking forward to future posts about jOOQ from him.

There are more goodies. Johannes Bühler has contributed new functionality around the loader API and JSON, and Darren Shepherd, an active contributor to Apache CloudStack, has been busy on the jOOQ User Group as well, discovering many issues. Thank you very much for all your help, guys!

Upcoming Events

In January, we visited the JUG-HH in Hamburg for our talk about jOOQ. This was a very exciting and welcoming event with a heavily “JPA / JEE”-biased audience. It was thrilling to see how well jOOQ (and SQL!) resonates with this kind of audience!

Here is an overview of our upcoming events. We’re very happy to talk at geecon this year!

Stay informed about 2014 events on www.jooq.org/news.

SQL Zone – DEFAULT values

We never cease to be surprised by the depth and breadth of the SQL language, and we love to share our discoveries with you. Few people know that in SQL (both in the standard and in many SQL dialects), you can use the DEFAULT keyword in INSERT and UPDATE statements. Read more about DEFAULT VALUES on our blog.
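As a quick illustration (the books table and its status column are made up, and connection is assumed to be an open JDBC connection):

try (Statement stmt = connection.createStatement()) {

    // Explicitly ask for the column's DEFAULT value on INSERT
    stmt.executeUpdate(
        "INSERT INTO books (id, title, status)"
      + " VALUES (5, 'SQL Basics', DEFAULT)");

    // ... or reset a column to its DEFAULT value on UPDATE
    stmt.executeUpdate(
        "UPDATE books SET status = DEFAULT WHERE id = 5");
}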

SQL Zone – JDBC Batch

In high-throughput environments, SQL users rely on vendor-specific tools, such as Oracle’s SQL*Loader. In slightly less performance-critical environments, JDBC batch operations are still good enough. But how good are they compared to standalone statements? The answer: very good. Though not in all databases / JDBC drivers. We have found a very interesting benchmark by James Sutherland, which we want to share with you.

What you Didn’t Know About JDBC Batch

In our previous blog post “10 Common Mistakes Java Developers Make When Writing SQL“, we made a point about batching being important when inserting large data sets. In most databases and with most JDBC drivers, you can get a significant performance improvement when running a single prepared statement in batch mode, as follows:

PreparedStatement s = connection.prepareStatement(
    "INSERT INTO author(id, first_name, last_name)"
  + "  VALUES (?, ?, ?)");

s.setInt(1, 1);
s.setString(2, "Erich");
s.setString(3, "Gamma");
s.addBatch();

s.setInt(1, 2);
s.setString(2, "Richard");
s.setString(3, "Helm");
s.addBatch();

s.setInt(1, 3);
s.setString(2, "Ralph");
s.setString(3, "Johnson");
s.addBatch();

s.setInt(1, 4);
s.setString(2, "John");
s.setString(3, "Vlissides");
s.addBatch();

int[] result = s.executeBatch();

Or with jOOQ:

create.batch(
        insertInto(AUTHOR, ID, FIRST_NAME, LAST_NAME)
       .values((Integer) null, null, null))
      .bind(1, "Erich", "Gamma")
      .bind(2, "Richard", "Helm")
      .bind(3, "Ralph", "Johnson")
      .bind(4, "John", "Vlissides")
      .execute();

What you probably didn’t know, however, is how dramatic the improvement really is, that JDBC drivers like MySQL’s don’t really support batching out of the box, and that Derby, H2, and HSQLDB don’t really seem to benefit from batching at all. James Sutherland has assembled a very interesting benchmark on his Java Persistence Performance blog, which can be summarised as follows:

Database      Performance gain when batched
DB2           503%
Derby           7%
H2             20%
HSQLDB         25%
MySQL           5%
MySQL         332% (with rewriteBatchedStatements=true)
Oracle        503%
PostgreSQL    325%
SQL Server    325%

The above table shows the improvement when comparing each database against itself for INSERT, not databases against each other. Regardless of the actual results, it can be said that batching is never worse than not batching for the data set sizes used in the benchmark.
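MySQL’s big jump in the table above hinges on a Connector/J connection property that is disabled by default and that rewrites batched INSERTs into multi-row INSERTs behind the scenes. You can enable it directly in the JDBC URL:

// Without rewriteBatchedStatements, Connector/J sends the batched
// statements to the server one by one:
Connection connection = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/test"
  + "?rewriteBatchedStatements=true",
    "user", "password");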

See the full article for a more detailed interpretation of the above benchmark results, as well as results for UPDATE statements:
http://java-persistence-performance.blogspot.ch/2013/05/batch-writing-and-dynamic-vs.html

Detect JDBC API Misusage with JDBCLint

I’ve recently seen an advertisement for JDBCLint on the H2 User Group. JDBCLint is an Apache-licensed JDBC proxy implementation that performs plausibility checks on the lifecycles of your JDBC objects. For instance, it:

  • Checks if a ResultSet is closed twice
  • Checks if a ResultSet is not closed at all (in the finalizer)
  • Checks if a ResultSet yields unread columns

All of these checks can be disabled by specifying relevant properties on the proxy. And the best thing is that this proxy is so easy to integrate:

import com.maginatics.jdbclint.ConnectionProxy;
...
Connection connection =
    DriverManager.getConnection(...);
connection = ConnectionProxy.newInstance(
    connection, new Properties());
connection.close();
// reports error and optionally throws exception
connection.close();

In addition to static code analysis tools like FindBugs or Alvor, this tool can help you find very subtle memory leaks in your large legacy application.

Certainly a tool to have in your tool chain!

The Code That Made me Cry

A friend of mine recently told me about the kind of problems he’s currently struggling with in the legacy application he’s maintaining. Here’s a sample piece of code to illustrate what I’m talking about:

String q = "select replace('" +
            accountNo +
           "%','- ','-') from dual";
rs = stmt.executeQuery(q);
if (rs.next()) {
    accountNoFormatted = rs.getString(1);
}

It instantly made me cry. As in Code That Made Me Cry, or #CTMMC. If this is just a sample, I can imagine what the rest of the application looks like. As a matter of fact, these problems were the very reason why he thought he needed to sort out a couple of things first, before he could even think about introducing jOOQ or any other new technology into that application. Yes, there’s some serious teaching to be done (or slapping?).

If you’ve read through this article thus far without knowing what I’m talking about, then let me give you some advice. Please follow this advice to keep my friend from jumping out the window:

NEVER send such trivial logic to the database for execution!

I’ve recently blogged about various reasons why you might want to calculate / execute stuff in the database. A simple string replacement is not one of those things! Heck, why risk the database round trip / network latency, connection and/or data transfer timeouts, and all sorts of other stuff for something that could be written in Java as simply as this?

accountNoFormatted = accountNo.replace("- ", "-");

The method even has the same name as the SQL function. Heck, why even go through the hassle of using the horrible JDBC API for this? Please, dear developer. Take an hour and study the entire list of methods available on java.lang.String. It’s such an awesome and completely underestimated class!

NEVER reformat previously formatted data

This is the rule of thumb: Once data is formatted, it is eternally lost and unavailable to computing / data processing. There is only one simple reason why anyone would ever format any data. It is for displaying data to human beings. Humans are not good at deciphering or memorizing things like

a56225e0-45ef-11e3-8f96-0800200c9a66

Humans are good at reading and memorizing things like:

My wife's bank account

So repeat after me. Once data is formatted, it is eternally lost and unavailable to computing / data processing. If the formatting was wrong in any way, then fix the formatting where it is wrong. NEVER re-format the previously formatted data. NEVER.

NEVER format data in the data access layer

Just as humans are incredibly bad at operating on long technical IDs, machines are incredibly bad at operating on formatted data. In fact, there are so few reasons to ever format data in the data access layer that it should probably not even occur to you. One acceptable reason is a very sophisticated, highly tuned report that runs in the database. But you don’t have that, because you considered using the SQL replace() function to remove whitespace from a Java string. That isn’t exactly sophisticated reporting.

So repeat after me. NEVER format data in the data access layer, unless you have a compelling technical reason for it. Your accountNo should remain untouched, technical, and ID-like for as long as possible throughout your application. There is absolutely no need to format it for human consumption before it hits the UI.

OK, to be fair, there’s another exception to this rule. When you sort data for the UI, you might want to sort by the formatted version of the accountNo, as the sorting result will be consumed by a human:

SELECT ..
FROM accounts a
ORDER BY a.account_no_formatted

Be lazy

There is one very simple way to become a better programmer: Be lazy. Be too lazy to write 10 lines of code for a simple replacement of "- " by "-". By being so incredibly lazy, you will always think:

There HAS to be a better way to write this

There’s nothing wrong with not knowing. But there’s everything wrong with taking the apparent path of least resistance and writing 10 lines of code for something as trivial as this. Believe me. Your life will be so much better once trivial stuff can be written in one-liners. You will be able to focus on your business logic again.

Apache Derby About to Adopt the Awesome SQL:2003 MERGE Statement

Apache Derby is one of three popular embeddable Java databases (the others being H2 and HSQLDB). It is very SQL and JDBC standards-compliant, but perhaps a bit behind on more advanced SQL features. Around six years after the ticket was first opened, there has recently been some action on Apache Derby’s DERBY-3155. Rick Hillegas has attached a first, promising draft for the MERGE statement specification, which can be seen here:
https://issues.apache.org/jira/secure/attachment/12597795/MergeStatement.html
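In case you haven’t run MERGE anywhere yet, here’s a rough sketch of what a standard SQL:2003 MERGE statement looks like when executed through JDBC (the author and author_staging tables are made up):

try (Statement stmt = connection.createStatement()) {
    stmt.executeUpdate(
        "MERGE INTO author a"
      + " USING author_staging s"
      + " ON (a.id = s.id)"
      + " WHEN MATCHED THEN UPDATE"
      + "   SET a.first_name = s.first_name,"
      + "       a.last_name = s.last_name"
      + " WHEN NOT MATCHED THEN INSERT (id, first_name, last_name)"
      + "   VALUES (s.id, s.first_name, s.last_name)");
}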

Among the 14 SQL databases supported by jOOQ, Derby would thus be the 8th to support the SQL:2003 MERGE statement, lining up with:

Other databases support proprietary versions of MERGE:

… or other forms of UPSERT:

Visit DERBY-3155 and show the maintainers some love for implementing this awesome and powerful SQL statement!

The Lame Side of Java’s Backwards-Compatibility

Java is a very backwards-compatible language. Very as in very very very. It is so backwards-compatible that we still carry around tons of code that was deprecated as far back as JDK 1.1, for example most of the java.util.Date and java.util.Calendar API. Some may argue that it would’ve been easier to deprecate those classes altogether…

But things don’t get better as we approach Java 8. Please observe with me, with a mixture of intrigue and disgust, what is going to be added to the JDBC 4.2 specs:
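Paraphrased (this is a sketch, not the verbatim JDK source), the additions to java.sql.Statement look roughly like this:

import java.sql.SQLException;

// An excerpt of the new "large" defender methods in JDBC 4.2. The
// method names are real; the bodies are simplified for illustration:
interface StatementAdditions {
    default long getLargeUpdateCount() throws SQLException {
        throw new UnsupportedOperationException();
    }
    default void setLargeMaxRows(long max) throws SQLException {
        throw new UnsupportedOperationException();
    }
    default long getLargeMaxRows() throws SQLException {
        throw new UnsupportedOperationException();
    }
    default long[] executeLargeBatch() throws SQLException {
        throw new UnsupportedOperationException();
    }
    default long executeLargeUpdate(String sql) throws SQLException {
        throw new UnsupportedOperationException();
    }
}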

“large”. As in “We should’ve made that a long instead of an int from the very beginning”. Luckily, Java 8 also introduces defender methods, such that these additions could be made backwards-compatibly.

I wonder how many other places in the JDK will now need duplicate methods with the “large” prefix, because in the beginning, people chose int over long, back when most processors were still 32-bit and it really did make a difference.

Also, I wonder what will happen when we run out of 64-bit space in the year 2139, as mankind reaches the outskirts of the Milky Way. In order to be able to write the occasional planet-migration script, we’ll have to add things like executeHugeUpdate() to the JDBC specs in Java 11 – if we’re optimistic that Java 11 will be shipped by then ;-)

For more info, you can see the up-to-date OpenJDK source code here:
http://hg.openjdk.java.net/lambda/lambda/jdk/file/tip/src/share/classes/java/sql/Statement.java