Free as in Beer has caused Heartbleed (and Much More)

heartbleedHeartbleed is a bit over one month old now. A bug significant enough to have its own Wikipedia page. Today, we’re going to look into how wrong we have been in assuming that Open Source software is more secure than commercial software, because of our thinking that source code is open and that many developers are looking into it.

Free as in Beer

One of the core principles of Open Source software is, well, that it is open and that this openness is free in one way or another. This allows us developers to browse source code of third-party software and libraries for various reasons:

  • To learn from it
  • To copy it (under the terms of the respective license)
  • To modify it (under the terms of the respective license)
  • To verify it

Proprietary software often does not have the above attributes in exchange for warranties. When you read through the millions of lines of unintelligible legal Microsoft text, for instance, you will see that Microsoft gives a couple of guarantees, also with respect to security and how security flaws are remedied.

The warranty that is legally shouted at you by OpenSSL is this one:

THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT “AS IS” AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In fact, you don’t have any warranties. Specifically, there is no liability for direct or indirect loss of data, as has happened because of Heartbleed.

Obviuous, right? Because it’s free as in beer, any consequences resulting from the Heartbleed bug are entirely your own fault, if you’re using OpenSSL, or any other open source software that includes OpenSSL under similar terms.

Yes. You should have known that you were vulnerable, because you actually could verify it. Did you? Of course not. Did we? No way we’re delving through all that C code. Did our “suppliers”? We don’t even know. We’re using WinSCP extensively, and check this out, WinSCP was affected by Heartbleed! The developers over at WinSCP have been nice enough to fix this issue quickly and release a new version. They didn’t have to do this. You know what? Let’s hit that donate button, right now to thank them.

“They” should pay

OK, but again, wasn’t Heartbleed supposed to never even happen? Isn’t this Open Source? Doesn’t anyone (read: “they”. Because, I don’t have time) review such code, especially if it’s that widely used? Isn’t Open Source security much better than closed source security, which is essentially security through obscurity?

Eberhard Wolff, a well-known German freelance consultant and trainer recently replied this to me:

Yes, I’m sorry, Eberhard. But that’s just the case. Security is often neglected in many pieces of software, both commercial and open source. It’s not that the openness helps anyone discover things, security issues are very hard to discover per se. Also in commercial software. Remember GOTO fail? GOTO fail also affected SSL, but only in Apple software.

The issue lies elsewhere, though. Because Microsoft somehow manages to patch all security issues in virtually no time. They fix things so fast that users get frustrated about the sheer frequency of fixes ;-)

Cartoon by http://www.stickycomics.com/computer-update/
Cartoon by www.stickycomics.com

But Microsoft has a lot to lose. Pretty much 90% of desktop OS market share, that’s what they’ve got to lose. They’re paying a high amount of insurance money, invested in security teams that are penetration testing their own software to look for leaks. Yes. The vendor themselves are doing this. And yes, it costs money. And yes, you’ve already paid that when you bought Windows (or Office for that matter).

Money that OpenSSL never had. Read this frustrated section in the following public letter by Steve Marquess:

Thanks to that publicity there has been an outpouring of grassroots support from the OpenSSL user community, roughly two hundred donations this past week[3] along with many messages of support and encouragement[4]. Most were for $5 or $10 and, judging from the E-mail addresses and names, were from all around the world. I haven’t finished entering all of them to get an exact total, but all those donations together come to about US$9,000. Even if those donations continue to arrive at the same rate indefinitely (they won’t), and even though every penny of those funds goes directly to OpenSSL team members, it is nowhere near enough to properly sustain the manpower levels needed to support such a complex and critical software product. While OpenSSL does “belong to the people” it is neither realistic nor appropriate to expect that a few hundred, or even a few thousand, individuals provide all the financial support. The ones who should be contributing real resources are the commercial companies[5] and governments[6] who use OpenSSL extensively and take it for granted.

I can feel with Steve. US$9,000 to improve OpenSSL, a software that has added so much value to all of our servers and clients. That’s hardly enough for even 2-3 bug fixes, depending on whose salaries you’re paying.

But Steve tells off the “commercial companies” and “governments” to not have paid enough. But who are these “commercial companies”?

Meanwhile, at Data Geekery

We know similar stories, of course. When we have transitioned from completely Open Source to dual-licensed, we heard a lot of complaints by users that we have never heard of up to that moment. Users who have never donated or contributed code, bug reports, manual improvements or anything. This is OK. We were giving jOOQ away for free under the terms of the ASL 2.0 as part of a extended market analysis. We didn’t expect donations.

Of course, we have also disappointed one or two people who were hoping that jOOQ would remain “free as in freedom”, who have contributed, who have participated, and who now felt deceived. And we’re genuinely sorry about that. But participation doesn’t pay for our bills, money does. So we reached out for money (luckily, we always owned the code as we paid contributors of larger contributions, and had them all sign a CLA, so we could actually do this step, from a legal perspective).

But let’s again focus on the people looking for free beer. No contributions. No feedback. Free rides. And then, when asked for money, they’re pissed (even 8 months later!).

Fair enough. Our price has risen for that particular user. From free to something. They, too, have a right to be frustrated about this change, which they might not have expected. But they, too, have taken “free” for granted without proper reason. Without verifying.

“They”

We don’t believe that “THEY” should pay for our licenses. We don’t think that “THEY” should pay for enterprise support. There is no “THEY” in free software, there’s only all of us, because who decides which project is really important enough to deserve some funding from “THEM” and which one isn’t? All attempts of “fairly” distributing money will inevitably lead to corruption, misuse, abuse, and eventually, to a lack of innovation. We’ve known that from other industries.

So, let’s simply establish two facts:

  • Free software is lying around in the streets. It is publicly acclaimed to be AS IS software. If we’re using it, it is our own risk. And if it goes wrong, it is our own fault. No one else’s. Not the big companies’ (who are already investing tons of money in Open Source software), and not the governments’ (same thing there). Stop pretending that FOSS is in any way better or less of a legal risk than commercial software. That’s just not true.
  • In fact, there is no such thing as free software. “Free” is a price we’re paying as a down payment (or non-payment). The costs will or might arise later. If we’re lucky, the thing will remain “free”. But if we’re professionals, then we’ll insure ourselves against any risk arising from free “AS IS” software. This means that we’ll give back (= “pay”) in another way. Through donations, through bug fixes, through verification.

Further reading

Bruno Borges from Oracle has expressed his very interesting views and counter measures to the current state and meaning of Open Source software in this blog post.

Java 8 Friday: API Designers, be Careful

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Lean Functional API Design

With Java 8, API design has gotten a whole lot more interesting, but also a bit harder. As a successful API designer, it will no longer suffice to think about all sorts of object-oriented aspects of your API, you will now also need to consider functional aspects of it. In other words, instead of simply providing methods like:

void performAction(Parameter parameter);

// Call the above:
object.performAction(new Parameter(...));

… you should now think about whether your method arguments are better modelled as functions for lazy evaluation:

// Keep the existing method for convenience
// and for backwards compatibility
void performAction(Parameter parameter);

// Overload the existing method with the new
// functional one:
void performAction(Supplier<Parameter> parameter);

// Call the above:
object.performAction(() -> new Parameter(...));

This is great. Your API can be Java-8 ready even before you’re actually targeting Java 8. But if you’re going this way, there are a couple of things to consider.

JDK dependency

The above example makes use of the JDK 8 Supplier type. This type is not available before the JDK 8, so if you’re using it, you’re going to limit your APIs use to the JDK 8. If you want to continue supporting older Java versions, you’ll have to roll your own supplier, or maybe use Callable, which has been available since Java 5:

// Overload the existing method with the new
// functional one:
void performAction(Callable<Parameter> parameter);

// Call the above:
object.performAction(() -> new Parameter(...));

One advantage of using Callable is the fact that your lambda expressions (or “classic” Callable implementations, or nested / inner classes) are allowed to throw checked exceptions. We’ve blogged about another possibility to circumvent this limitation, here.

Overloading

While it is (probably) perfectly fine to overload these two methods

void performAction(Parameter parameter);
void performAction(Supplier<Parameter> parameter);

… you should stay wary when overloading “more similar” methods, like these ones:

void performAction(Supplier<Parameter> parameter);
void performAction(Callable<Parameter> parameter);

If you produce the above API, your API’s client code will not be able to make use of lambda expressions, as there is no way of disambiguating a lambda that is a Supplier from a lambda that is a Callable. We’ve also mentioned this in a previous blog post.

“void-compatible” vs “value-compatible”

I’ve recently (re-)discovered this interesting early JDK 8 compiler bug, where the compiler wasn’t able to disambiguate the following:

void run(Consumer<Integer> consumer);
void run(Function<Integer, Integer> function);

// Remember, the above types are roughly:
interface Consumer<T> {
    void accept(T t);
//  ^^^^ void-compatible
}

interface Function<T, R> {
    R apply(T t);
//  ^ value-compatible
}

The terms “void-compatible” and “value-compatible” are defined in the JLS §15.27.2 for lambda expressions. According to the JLS, the following two calls are not ambiguous:

// Only run(Consumer) is applicable
run(i -> {});

// Only run(Function) is applicable
run(i -> 1);

In other words, it is safe to overload a method to take two “similar” argument types, such as Consumer and Function, as lambda expressions used to express method arguments will not be ambiguous.

This is quite useful, because having an optional return value is very elegant when you’re using lambda expressions. Consider the upcoming jOOQ 3.4 transaction API, which is roughly summarised as such:


// This uses a "void-compatible" lambda
ctx.transaction(c -> {
    DSL.using(c).insertInto(...).execute();
    DSL.using(c).update(...).execute();
});

// This uses a "value-compatible" lambda
Integer result =
ctx.transaction(c -> {
    DSL.using(c).update(...).execute();
    DSL.using(c).delete(...).execute();

    return 42;
});

In the above example, the first call resolves to TransactionalRunnable whereas the second call resolves to TransactionalCallable whose API are like these:

interface TransactionalRunnable {
    void run(Configuration c) throws Exception;
}

interface TransactionalCallable<T> {
    T run(Configuration c) throws Exception;
}

Note, though, that as of JDK 1.8.0_05 and Eclipse Kepler (with the Java 8 support patch), this ambiguity resolution does not yet work because of these bugs:

So, in order to stay on the safe side, maybe you could just simply avoid overloading.

Generic methods are not SAMs

Do note that “SAM” interfaces that contain a single abstract generic method are NOT SAMs in the sense for them to be eligible as lambda expression targets. The following type will never form any lambda expression:

interface NotASAM {
    <T> void run(T t);
}

This is specified in the JLS §15.27.3

A lambda expression is congruent with a function type if all of the following are true:

  • The function type has no type parameters.
  • [ … ]

What do you have to do now?

If you’re an API designer, you should now start writing unit tests / integration tests also in Java 8. Why? For the simple reason that if you don’t you’ll get your API wrong in subtle ways for those users that are actually using it with Java 8. These things are extremely subtle. Getting them right takes a bit of practice and a lot of regression tests. Do you think you’d like to overload a method? Be sure you don’t break client API that is calling the original method with a lambda.

That’s it for today. Stay tuned for more awesome Java 8 content on this blog.

SQL Server Trick: Circumvent Missing ORDER BY Clause

SQL Server is known to have a very strict interpretation of the SQL standard. For instance, the following expressions or statements are not possible in SQL Server:

-- Get arbitrarily numbered row_numbers
SELECT ROW_NUMBER() OVER ()

-- Skip arbitrary rows
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
OFFSET 3 ROWS

Strictly speaking, that limitation makes sense because the above ROW_NUMBER() or OFFSET expressions are non-deterministic. Two subsequent executions of the same query might produce different results. But then again, any ORDER BY clause is non-deterministic, if you do not order by a strictly UNIQUE expression, such as a primary key.

So, that’s a bit of a pain, because other databases aren’t that strict and after all, you might just not care about explicit ordering for a quick, ad-hoc query, so a “reasonable”, lenient default would be useful.

Constant ORDER BY clauses don’t work

You cannot add a constant ORDER BY clause to window functions either. I.e.:

-- This doesn't work:
SELECT ROW_NUMBER() OVER (ORDER BY 'a')

-- But this does!
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY 'a'
OFFSET 3 ROWS

Note that ORDER BY 'a' uses a constant VARCHAR expression, not a numeric one, as that would be generating column-reference-by-index expressions, which would be non-constant in the second example.

Random column references don’t work

So you’re thinking that you can just add a random column reference? Sometimes you can, but often you cannot:

-- This doesn't work:
SELECT ROW_NUMBER() OVER (
  ORDER BY [no-column-available-here]
)

-- But this does!
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY a
OFFSET 3 ROWS

The above examples show that you do not always have a column reference available in any given SQL expression. There is no useful column that you could refer to from the ROW_NUMBER() function. At the same time, you can write ORDER BY a in the second example, but only if a is a “comparable” value, i.e. not a LOB, such as text or image.

Besides, as we don’t really care about the actual ordering, is it worth ordering the result set by anything at all? Do you happen to have an index on a?

Quasi-constant ORDER BY expressions do work

So, to stay on the safe side, if ever you need a dummy ORDER BY expression in SQL Server, use a quasi-constant expression, like @@version (or @@language, or any of these). The following will always work:

-- This always works:
SELECT ROW_NUMBER() OVER (ORDER BY @@version)

-- So does this:
SELECT a
FROM (VALUES (1), (2), (3), (4)) t(a)
ORDER BY @@version
OFFSET 3 ROWS

From the upcoming jOOQ 3.4, we’ll also generate such synthetic ORDER BY clauses that will help you simplify writing vendor-agnostic SQL in these edge-cases, as we believe that you simply shouldn’t think of these things all the time.

The Index You’ve Added is Useless. Why?

Recently, at the office:

Bob: I’ve looked into that slow query you’ve told me about yesterday, Alice. I’ve added the indexes you wanted. Everything should be fine now

Alice: Thanks Bob. I’ll quickly check … Nope Bob, still slow, it didn’t seem to work

Bob: You’re right Alice! It looks like Oracle isn’t picking up the index, for your query even if I add an /*+INDEX(...)*/ hint. I don’t know what went wrong!?

And so, the story continues. Alice is frustrated because her feature doesn’t ship on time, Bob is frustrated because he thinks that Oracle doesn’t work right.

True story!

Bob Forgot about Oracle and NULL

Poor Bob forgot (or didn’t know) that Oracle doesn’t put NULL values in “ordinary” indexes. Think about it this way:

CREATE TABLE person (
  id            NUMBER(38)   NOT NULL PRIMARY KEY,
  first_name    VARCHAR2(50) NOT NULL,
  last_name     VARCHAR2(50) NOT NULL,
  date_of_birth DATE             NULL
);

CREATE INDEX i_person_dob ON person(date_of_birth);

Now, Bob thinks that his index solves all problems, because he verified if the index worked using the following query:

SELECT * 
FROM   person
WHERE  date_of_birth > DATE '1980-01-01';

(of course, you generally shouldn’t SELECT *)

And the execution plan looked alright:

----------------------------------------------------
| Id  | Operation                   | Name         |
----------------------------------------------------
|   0 | SELECT STATEMENT            |              |
|   1 |  TABLE ACCESS BY INDEX ROWID| PERSON       |
|*  2 |   INDEX RANGE SCAN          | I_PERSON_DOB |
----------------------------------------------------

This is because Bob’s predicate doesn’t rely on NULL being part of the I_PERSON_DOB index. Unfortunately, Alice’s query looked more like this (simplified version):

SELECT 1 
FROM   dual
WHERE  DATE '1980-01-01' NOT IN (
  SELECT date_of_birth FROM person
);

So, essentially, Alice’s query checked if anyone had their date of birth at a given date. Her execution plan looked like this:

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|   2 |   FAST DUAL        |        |
|*  3 |   TABLE ACCESS FULL| PERSON |
-------------------------------------

As you can see, her query made a TABLE ACCESS FULL operation, bypassing the index. Why? It’s simple:

Even if our DATE '1980-01-01' value is or is not in the index, we’ll still have to check the whole table to see whether a single NULL value is contained in the date_of_birth column. Because, if there was a NULL value, the NOT IN predicate in Alice’s query would never yield TRUE or FALSE, but NULL.

Alice can solve this issue with NOT EXISTS

Alice can solve it easily herself, by replacing NOT IN through NOT EXISTS, a predicate that doesn’t suffer from SQL’s peculiar three-valued boolean logic.

SELECT 1
FROM   dual
WHERE  NOT EXISTS (
  SELECT 1
  FROM   person
  WHERE  date_of_birth = DATE '1980-01-01'
);

This new query now again yields an optimal plan:

------------------------------------------
| Id  | Operation         | Name         |
------------------------------------------
|   0 | SELECT STATEMENT  |              |
|*  1 |  FILTER           |              |
|   2 |   FAST DUAL       |              |
|*  3 |   INDEX RANGE SCAN| I_PERSON_DOB |
------------------------------------------

But the problem still exists, because what can happen, will happen, and Alice will have to remember this issue for every single query she writes.

Bob should just set the column to NOT NULL

The best solution, however is to simply set the column to NOT NULL:

ALTER TABLE person 
MODIFY date_of_birth DATE NOT NULL;

With this constraint, the NOT IN query is exactly equivalent to the NOT EXISTS query, and Bob and Alice can be friends again.

Takeaway: How to find “bad” columns?

It’s easy. The following useful query lists all indexes that have at least one nullable column in them.

SELECT 
  i.table_name,
  i.index_name,
  LISTAGG(
    LPAD(i.column_position,  2) || ': ' || 
    RPAD(i.column_name    , 30) || ' '  ||
    DECODE(t.nullable, 'Y', '(NULL)', '(NOT NULL)'), 
    ', '
  ) WITHIN GROUP (ORDER BY i.column_position) 
    AS "NULLABLE columns in indexes"
FROM user_ind_columns i
JOIN user_tab_cols t
ON (t.table_name, t.column_name) = 
  ((i.table_name, i.column_name))
WHERE EXISTS (
  SELECT 1
  FROM user_tab_cols t
  WHERE (t.table_name, t.column_name, t.nullable) = 
       ((i.table_name, i.column_name, 'Y'       ))
)
GROUP BY i.table_name, i.index_name
ORDER BY i.index_name ASC;

When run against Bob and Alice’s schema, the above query yields:

TABLE_NAME | INDEX_NAME   | NULLABLE columns in indexes
-----------+--------------+----------------------------
PERSON     | I_PERSON_DOB | 1: DATE_OF_BIRTH (NULL)

Use this query on your own schema now, and go through the results, carefully evaluating if you really need to keep that column nullable. In 50% of the cases, you don’t. By adding a NOT NULL constraint, you can tremendously speed up your application!

Java 8 Friday: Language Design is Subtle

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Language Design is Subtle

It’s been a busy week for us. We have just migrated the jOOQ integration tests to Java 8 for two reasons:

  • We want to be sure that client code compiles with Java 8
  • We started to get bored of writing the same old loops over and over again

The trigger was a loop where we needed to transform a SQLDialect[] into another SQLDialect[] calling .family() on each array element. Consider:

Java 7

SQLDialect[] families = 
    new SQLDialect[dialects.length];
for (int i = 0; i < families.length; i++)
    families[i] = dialects[i].family();

Java 8

SQLDialect[] families = 
Stream.of(dialects)
      .map(d -> d.family())
      .toArray(SQLDialect[]::new);

OK, it turns out that the two solutions are equally verbose, even if the latter feels a bit more elegant. :-)

And this gets us straight into the next topic:

Backwards-compatibility

For backwards-compatibility reasons, arrays and the pre-existing Collections API have not been retrofitted to accommodate all the useful methods that Streams now have. In other words, an array doesn’t have a map() method, just as much as List doesn’t have such a method. Streams and Collections/arrays are orthogonal worlds. We can transform them into each other, but they don’t have a unified API.

This is fine in everyday work. We’ll get used to the Streams API and we’ll love it, no doubt. But because of Java being extremely serious about backwards compatibility, we will have to think about one or two things more deeply.

Recently, we have published a post about The Dark Side of Java 8. It was a bit of a rant, although a mild one in our opinion (and it was about time to place some criticism, after all the praise we’ve been giving Java 8 in our series, before ;-) ). First off, that post triggered a reaction by Edwin Dalorzo from our friends at Informatech. (Edwin has written this awesome post comparing LINQ and Java 8 Streams, before). The criticism in our article evolved around three main aspects:

  • Overloading getting more complicated (see also this compiler bug)
  • Limited support for method modifiers on default methods
  • Primitive type “API overloads” for streams and functional interfaces

A response by Brian Goetz

I then got a personal mail from no one less than Brian Goetz himself (!), who pointed out a couple of things to me that I had not yet thought about in this way:

I still think you’re focusing on the wrong thing. Its not really the syntax you don’t like; its the model — you don’t want “default methods”, you want traits, and the syntax is merely a reminder that you didn’t get the feature you wanted. (But you’d be even more confused about “why can’t they be final” if we dropped the “default” keyword!) But that’s blaming the messenger (where here, the keyword is the messenger.)

Its fair to say “this isn’t the model I wanted”. There were many possible paths in the forest, and it may well be the road not taken was equally good or better.

This is also what Edwin had concluded. Default methods were a necessary means to tackle all the new API needed to make Java 8 useful. If Iterator, Iterable, List, Collection, and all the other pre-existing interfaces had to be adapted to accommodate lambdas and Streams API interaction, the expert group would have needed to break an incredible amount of API. Conversely, without adding these additional utility methods (see the awesome new Map methods, for instance!), Java 8 would have been only half as good.

And that’s it.

Even if maybe, some more class building tools might have been useful, they were not in the center of focus for the expert group who already had a lot to do to get things right. The center of focus was to provide a means for API evolution. Or in Brian Goetz’s own words:

Reaching out to the community

It’s great that Brian Goetz reaches out to the community to help us get the right picture about Java 8. Instead of explaining rationales about expert group decisions in private messages, he then asked me to publicly re-ask my questions again on Stack Overflow (or lambda-dev), such that he can then publicly answer them. For increased publicity and greater community benefit, I chose Stack Overflow. Here are:

The amount of traction these two questions got in no time shows how important these things are to the community, so don’t miss reading through them!

“Uncool”? Maybe. But very stable!

Java may not have the “cool” aura that node.js has. You may think about JavaScript-the-language whatever you want (as long as it contains swear words), but from a platform marketing perspective, Java is being challenged for the first time in a long time – and being “uncool” and backwards-compatible doesn’t help keeping developers interested.

But let’s think long-term, instead of going with trends. Having such a great professional platform like the Java language, the JVM, the JDK, JEE, and much more, is invaluable. Because at the end of the day, the “uncool” backwards-compatibility can also be awesome. As mentioned initially, we have upgraded our integration tests to Java 8. Not a single compilation error, not a single bug. Using Eclipse’s BETA support for Java 8, I could easily transform anonymous classes into lambdas and write awesome things like these upcoming jOOQ 3.4 nested transactions (API not final yet):

ctx.transaction(c1 -> {
    DSL.using(c1)
       .insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
       .values(3, "Doe")
       .execute();

    // Implicit savepoint here
    try {
        DSL.using(c1).transaction(c2 -> {
            DSL.using(c2)
               .update(AUTHOR)
               .set(AUTHOR.FIRST_NAME, "John")
               .where(AUTHOR.ID.eq(3))
               .execute();

            // Rollback to savepoint
            throw new MyRuntimeException("No");
        });
    }

    catch (MyRuntimeException ignore) {}

    return 42;
});

So at the end of the day, Java is great. Java 8 is a tremendous improvement over previous versions, and with great people in the expert groups (and reaching out to the community on social media), I trust that Java 9 will be even better. In particular, I’m looking forward to learning about how these two projects evolve:

Although, again, I am really curious how they will pull these two improvements off from a backwards-compatibility perspective, and what caveats we’ll have to understand, afterwards. ;-)

Anyway, let’s hope the expert groups will continue to provide public feedback on Stack Overflow. Stay tuned for more awesome Java 8 content on this blog.

How to Implement Sort Indirection in SQL

I’ve recently stumbled upon this interesting Stack Overflow question, where the user essentially wanted to ensure that resulting records are delivered in a well-defined order.

They wrote

SELECT name
FROM product
WHERE name IN ('CE367FAACDHCANPH-151556',
               'CE367FAACEX9ANPH-153877',
               'NI564FAACJSFANPH-162605',
               'GE526OTACCD3ANPH-149839')

They got

CE367FAACDHCANPH-151556
CE367FAACEX9ANPH-153877
GE526OTACCD3ANPH-149839
NI564FAACJSFANPH-162605

They wanted

CE367FAACDHCANPH-151556
CE367FAACEX9ANPH-153877
NI564FAACJSFANPH-162605
GE526OTACCD3ANPH-149839

Very often, according to your business rules, sorting orders are not “natural”, as in numeric sorting or in alpha-numeric sorting. Some business rule probably specified that GE526OTACCD3ANPH-149839 needs to appear last in a list. Or the user might have re-arranged product names in their screen with drag and drop, producing new sort order.

We could discuss, of course, if such sorting should be performed in the UI layer or not, but let’s assume that the business case or the performance requirements or the general architecture needed for this sorting to be done in the database. How to do it? Through…

Sort Indirection

In fact, you don’t want to sort by the product name, but by a pre-defined enumeration of such names. In other words, you want a function like this:

CE367FAACDHCANPH-151556 -> 1
CE367FAACEX9ANPH-153877 -> 2
NI564FAACJSFANPH-162605 -> 3
GE526OTACCD3ANPH-149839 -> 4

With plain SQL, there are many ways to do the above. Here are two of them (also seen in my Stack Overflow answer):

By using a CASE expression

You can tell the database the explicit sort indirection easily, using a CASE expression in your ORDER BY clause:

SELECT name
FROM product
WHERE name IN ('CE367FAACDHCANPH-151556',
               'CE367FAACEX9ANPH-153877',
               'NI564FAACJSFANPH-162605',
               'GE526OTACCD3ANPH-149839')
ORDER BY 
  CASE WHEN name = 'CE367FAACDHCANPH-151556' THEN 1
       WHEN name = 'CE367FAACEX9ANPH-153877' THEN 2
       WHEN name = 'NI564FAACJSFANPH-162605' THEN 3
       WHEN name = 'GE526OTACCD3ANPH-149839' THEN 4
  END

Note that I’ve used the CASE WHEN predicate THEN value END syntax, because this is implemented in all SQL dialects. Alternatively (if you’re not using Apache Derby), you could also save some characters when typing, writing:

ORDER BY 
  CASE name WHEN 'CE367FAACDHCANPH-151556' THEN 1
            WHEN 'CE367FAACEX9ANPH-153877' THEN 2
            WHEN 'NI564FAACJSFANPH-162605' THEN 3
            WHEN 'GE526OTACCD3ANPH-149839' THEN 4
  END

Of course, this requires repeating the same values in the predicate and in the sort indirection. This is why, in some cases, you might be more lucky …

By using INNER JOIN

In the following example, the predicate and the sort indirection are taken care of in a simple derived table that is INNER JOIN‘ed to the original query:

SELECT product.name
FROM product
JOIN (
  VALUES('CE367FAACDHCANPH-151556', 1),
        ('CE367FAACEX9ANPH-153877', 2),
        ('NI564FAACJSFANPH-162605', 3),
        ('GE526OTACCD3ANPH-149839', 4)
) AS sort (name, sort)
ON product.name = sort.name
ORDER BY sort.sort

The above example is using PostgreSQL syntax, but you might be able to implement the same in a different way in your database.

Using jOOQ’s sort indirection API

Sort indirection is a bit tedious to write out, which is why jOOQ has a special syntax for this kind of use-case, which is also documented in the manual. Any of the following statements perform the same as the above query:

// jOOQ generates 1, 2, 3, 4 as values in the
// generated CASE expression
DSL.using(configuration)
   .select(PRODUCT.NAME)
   .from(PRODUCT)
   .where(NAME.in(
      "CE367FAACDHCANPH-151556",
      "CE367FAACEX9ANPH-153877",
      "NI564FAACJSFANPH-162605",
      "GE526OTACCD3ANPH-149839"
   ))
   .orderBy(PRODUCT.NAME.sortAsc(
      "CE367FAACDHCANPH-151556",
      "CE367FAACEX9ANPH-153877",
      "NI564FAACJSFANPH-162605",
      "GE526OTACCD3ANPH-149839"
   ))
   .fetch();

// You can choose your own indirection values to
// be generated in the CASE expression
   .orderBy(PRODUCT.NAME.sort(
      new HashMap<String, Integer>() {{
        put("CE367FAACDHCANPH-151556", 2);
        put("CE367FAACEX9ANPH-153877", 3);
        put("NI564FAACJSFANPH-162605", 5);
        put("GE526OTACCD3ANPH-149839", 8);
      }}
   ))

Conclusion

Sort indirection is a nice trick to have up your sleeves every now and then. Never forget that you can put almost arbitrary column expressions in your SQL statement’s ORDER BY clause. Use them!

Three-State Booleans in Java

Every now and then, I miss SQL’s three-valued BOOLEAN semantics in Java. In SQL, we have:

  • TRUE
  • FALSE
  • UNKNOWN (also known as NULL)

Every now and then, I find myself in a situation where I wish I could also express this UNKNOWN or UNINITIALISED semantics in Java, when plain true and false aren’t enough.

Implementing a ResultSetIterator

For instance, when implementing a ResultSetIterator for jOOλ, a simple library modelling SQL streams for Java 8:

SQL.stream(stmt, Unchecked.function(r ->
    new SQLGoodies.Schema(
        r.getString("FIELD_1"),
        r.getBoolean("FIELD_2")
    )
))
.forEach(System.out::println);

In order to implement a Java 8 Stream, we need to construct an Iterator, which we can then pass to the new Spliterators.spliteratorUnknownSize() method:

StreamSupport.stream(
  Spliterators.spliteratorUnknownSize(iterator, 0), 
  false
);

Another example for this can be seen here on Stack Overflow.

When implementing the Iterator interface, we must implement hasNext() and next(). Note that with Java 8, remove() now has a default implementation, so we don’t need to implement it any longer.

While most of the time, a call to next() is preceded by a call to hasNext() exactly once, nothing in the Iterator contract requires this. It is perfectly fine to write:

if (it.hasNext()) {
    // Some stuff

    // Double-check again to be sure
    if (it.hasNext() && it.hasNext()) {

        // Yes, we're paranoid
        if (it.hasNext())
            it.next();
    }
}

How to translate the Iterator calls to backing calls on the JDBC ResultSet? We need to call ResultSet.next().

We could make the following translation:

  • Iterator.hasNext() == !ResultSet.isLast()
  • Iterator.next() == ResultSet.next()

But that translation is:

  • Expensive
  • Not dealing correctly with empty ResultSets
  • Not implemented in all JDBC drivers (Support for the isLast method is optional for ResultSets with a result set type of TYPE_FORWARD_ONLY)

So, we’ll have to maintain a flag, internally, that tells us:

  • If we had already called ResultSet.next()
  • What the result of that call was

Instead of creating a second variable, why not just use a three-valued java.lang.Boolean. Here’s a possible implementation from jOOλ:

class ResultSetIterator<T> implements Iterator<T> {

    final Supplier<? extends ResultSet>  supplier;
    final Function<ResultSet, T>         rowFunction;
    final Consumer<? super SQLException> translator;

    /**
     * Whether the underlying {@link ResultSet} has
     * a next row. This boolean has three states:
     * <ul>
     * <li>null:  it's not known whether there 
     *            is a next row</li>
     * <li>true:  there is a next row, and it
     *            has been pre-fetched</li>
     * <li>false: there aren't any next rows</li>
     * </ul>
     */
    Boolean hasNext;
    ResultSet rs;

    ResultSetIterator(
        Supplier<? extends ResultSet> supplier, 
        Function<ResultSet, T> rowFunction, 
        Consumer<? super SQLException> translator
    ) {
        this.supplier = supplier;
        this.rowFunction = rowFunction;
        this.translator = translator;
    }

    private ResultSet rs() {
        return (rs == null) 
             ? (rs = supplier.get()) 
             :  rs;
    }

    @Override
    public boolean hasNext() {
        try {
            if (hasNext == null) {
                hasNext = rs().next();
            }

            return hasNext;
        }
        catch (SQLException e) {
            translator.accept(e);
            throw new IllegalStateException(e);
        }
    }

    @Override
    public T next() {
        try {
            if (hasNext == null) {
                rs().next();
            }

            return rowFunction.apply(rs());
        }
        catch (SQLException e) {
            translator.accept(e);
            throw new IllegalStateException(e);
        }
        finally {
            hasNext = null;
        }
    }
}

As you can see, the hasNext() method locally caches the hasNext three-valued boolean state only if it was null before. This means that calling hasNext() several times will have no effect until you call next(), which resets the hasNext cached state.

Both hasNext() and next() advance the ResultSet cursor if needed.

Readability?

Some of you may argue that this doesn’t help readability. They’d introduce a new variable like:

boolean hasNext;
boolean hasHasNextBeenCalled;

The trouble with this is the fact that you’re still implementing three-valued boolean state, but distributed to two variables, which are very hard to name in a way that is truly more readable than the actual java.lang.Boolean solution. Besides, there are actually four state values for two boolean variables, so there is a slight increase in the risk of bugs.

Every rule has its exception. Using null for the above semantics is a very good exception to the null-is-bad histeria that has been going on ever since the introduction of Option / Optional

In other words: Which approach is best? There’s no TRUE or FALSE answer, only UNKNOWN ;-)

Be careful with this

However, as we’ve discussed in a previous blog post, you should avoid returning null from API methods if possible. In this case, using null explicitly as a means to model state is fine because this model is encapsulated in our ResultSetIterator. But try to avoid leaking such state to the outside of your API.