
Top 10 Easy Performance Optimisations in Java


There has been a lot of hype about the buzzword “web scale“, and people are going to great lengths to reorganise their application architectures to get their systems to “scale”.

But what is scaling, and how can we make sure that we can scale?

Different aspects of scaling

The hype mentioned above is mostly about scaling load, i.e. to make sure that a system that works for 1 user will also work well for 10 users, or 100 users, or millions. Ideally, your system is as “stateless” as possible, such that the few pieces of state that really remain can be transferred and transformed on any processing unit in your network. When load is your problem, latency is probably not, so it’s OK if individual requests take 50-100ms. This is often also referred to as scaling out.

An entirely different aspect of scaling is about scaling performance, i.e. to make sure that an algorithm that works for 1 piece of information will also work well for 10 pieces, or 100 pieces, or millions. Whether this type of scaling is feasible is best described by Big O Notation. Latency is the killer when scaling performance. You want to do everything possible to keep all calculation on a single machine. This is often also referred to as scaling up.

If there were such a thing as a free lunch (there isn’t), we could combine scaling up and out indefinitely. Anyway, today, we’re going to look at some very easy ways to improve things on the performance side.

Big O Notation

Java 7’s ForkJoinPool as well as Java 8’s parallel Stream help parallelising stuff, which is great when you deploy your Java program onto a multi-core processor machine. The advantage of such parallelism compared to scaling across different machines on your network is the fact that you can almost completely eliminate latency effects, as all cores can access the same memory.
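For illustration, here is a minimal sketch (not from the original article) of such intra-machine parallelism with Java 8 streams, where all worker threads of the common ForkJoinPool operate on the same in-memory array:

import java.util.Arrays;

public class ParallelSum {
    public static void main(String[] args) {
        int[] numbers = new int[1_000_000];
        Arrays.setAll(numbers, i -> i);

        // The work is split across the cores of the common
        // ForkJoinPool; no data leaves this machine's memory.
        long sum = Arrays.stream(numbers)
                         .parallel()
                         .asLongStream()
                         .sum();

        System.out.println(sum);
    }
}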

But don’t be fooled by the effect that parallelism has! Remember the following two things:

  • Parallelism eats up your cores. This is great for batch processing, but a nightmare for asynchronous servers (such as HTTP). There are good reasons why we’ve used the single-thread servlet model in the past decades. So parallelism only helps when scaling up.
  • Parallelism has no effect on your algorithm’s Big O Notation. If your algorithm is O(n log n), and you let that algorithm run on c cores, you will still have an O(n log n / c) algorithm, as c is an insignificant constant in your algorithm’s complexity. You will save wall-clock time, but not reduce complexity!

The best way to improve performance, of course, is by reducing algorithm complexity. The ideal, of course, is to achieve O(1) or quasi-O(1), for instance via a HashMap lookup. But that is not always possible, let alone easy.

If you cannot reduce your complexity, you can still gain a lot of performance if you tweak your algorithm where it really matters, if you can find the right spots. Assume the following visual representation of an algorithm:

[Figure: a tree of nested loops, in which N branches into M (ending in a heavy operation) and into O -> P (ending in an easy operation)]

The overall complexity of the algorithm is O(N³), or O(N x O x P) if we want to deal with individual orders of magnitude. However, when profiling this code, you might find a funny scenario:

  • On your development box, the left branch (N -> M -> Heavy operation) is the only branch that you can see in your profiler, because the values for O and P are small in your development sample data.
  • On production, however, the right branch (N -> O -> P -> Easy operation or also N.O.P.E.) is really causing trouble. Your operations team might have figured this out using AppDynamics, or DynaTrace, or some similar software.

Without production data, you might quickly jump to conclusions and optimise the “heavy operation”. You ship to production and your fix has no effect.

There are no golden rules to optimisation apart from the facts that:

  • A well-designed application is much easier to optimise
  • Premature optimisation will not solve any performance problems, but make your application less well-designed, which in turn makes it harder to be optimised

Enough theory. Let’s assume that you have found the right branch to be the issue. It may well be that a very easy operation is blowing up in production, because it is called lots and lots of times (if N, O, and P are large). Please read this article in the context of there being a problem at the leaf node of an inevitable O(N³) algorithm. These optimisations won’t help you scale. They’ll help you save your customer’s day for now, deferring the difficult improvement of the overall algorithm until later!

Here are the top 10 easy performance optimisations in Java:

1. Use StringBuilder

This should be your default in almost all Java code. Try to avoid the + operator. Sure, you may argue that it is just syntax sugar for a StringBuilder anyway, as in:

String x = "a" + args.length + "b";

… which compiles to

 0  new java.lang.StringBuilder [16]
 3  dup
 4  ldc <String "a"> [18]
 6  invokespecial java.lang.StringBuilder(java.lang.String) [20]
 9  aload_0 [args]
10  arraylength
11  invokevirtual java.lang.StringBuilder.append(int) : java.lang.StringBuilder [23]
14  ldc <String "b"> [27]
16  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [29]
19  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [32]
22  astore_1 [x]

But what happens if, later on, you need to amend your String with optional parts?

String x = "a" + args.length + "b";

if (args.length == 1)
    x = x + args[0];

You will now have a second StringBuilder that just needlessly consumes memory off your heap, putting pressure on your GC. Write this instead:

StringBuilder x = new StringBuilder("a");
x.append(args.length);
x.append("b");

if (args.length == 1)
    x.append(args[0]);

Takeaway

In the above example, it is probably completely irrelevant if you’re using explicit StringBuilder instances, or if you rely on the Java compiler creating implicit instances for you. But remember, we’re in the N.O.P.E. branch. Every CPU cycle that we’re wasting on something as stupid as GC or allocating a StringBuilder‘s default capacity, we’re wasting N x O x P times.

As a rule of thumb, always use a StringBuilder rather than the + operator. And if your String is more complex to build, keep the StringBuilder reference across several methods. This is what jOOQ does when you generate a complex SQL statement. There is only one StringBuilder that “traverses” your whole SQL AST (Abstract Syntax Tree).
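A minimal sketch of that idea, assuming a hypothetical SelectRenderer class (this is not jOOQ’s actual code); the single StringBuilder is threaded through all rendering methods:

// Hypothetical renderer: one StringBuilder traverses all
// parts of the statement, instead of each method
// allocating (and discarding) its own.
class SelectRenderer {
    String render(String table, String[] columns) {
        StringBuilder sql = new StringBuilder("select ");
        renderColumns(sql, columns);
        renderFrom(sql, table);
        return sql.toString();
    }

    private void renderColumns(StringBuilder sql, String[] columns) {
        for (int i = 0; i < columns.length; i++) {
            if (i > 0)
                sql.append(", ");

            sql.append(columns[i]);
        }
    }

    private void renderFrom(StringBuilder sql, String table) {
        sql.append(" from ").append(table);
    }
}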

And for crying out loud, if you still have StringBuffer references, do replace them with StringBuilder. You really hardly ever need to synchronize on a string being created.

2. Avoid regular expressions

Regular expressions are relatively cheap and convenient. But if you’re in the N.O.P.E. branch, they’re about the worst thing you can do. If you absolutely must use regular expressions in computation-intensive code sections, at least cache the Pattern reference instead of compiling it afresh all the time:

static final Pattern HEAVY_REGEX = 
    Pattern.compile("(((X)*Y)*Z)*");

But if your regular expression is really silly like

String[] parts = ipAddress.split("\\.");

… then you’d really better resort to ordinary char[] or index-based manipulation. For example, this utterly unreadable loop does the same thing:

// Assuming an IPv4 address with 4 parts
String[] parts = new String[4];
int length = ipAddress.length();
int offset = 0;
int part = 0;
for (int i = 0; i < length; i++) {
    if (i == length - 1 || 
            ipAddress.charAt(i + 1) == '.') {
        parts[part] = 
            ipAddress.substring(offset, i + 1);
        part++;
        offset = i + 2;
    }
}

… which also shows why you shouldn’t do any premature optimisation. Compared to the split() version, this is unmaintainable.

Challenge: The clever ones among our readers might find even faster algorithms.

Takeaway

Regular expressions are useful, but they come at a price. If you’re deep down in a N.O.P.E. branch, you must avoid regular expressions at all costs. Beware of the variety of JDK String methods that use regular expressions, such as String.replaceAll() or String.split().
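If you do need a regex-based operation in a hot path, at least hoist the Pattern into a constant, as shown above. Here is a sketch of that technique (the Replacer class and its pattern are made up for illustration); String.replaceAll() compiles a fresh Pattern on every call, whereas the cached Pattern is compiled only once:

import java.util.regex.Pattern;

class Replacer {
    // Compiled once. Pattern instances are immutable and
    // thread-safe, so a static constant is safe to share.
    private static final Pattern WHITESPACE =
        Pattern.compile("\\s+");

    // Equivalent to input.replaceAll("\\s+", " "), without
    // re-compiling the pattern on every single call.
    static String normalise(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }
}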

Use a popular library like Apache Commons Lang instead, for your String manipulation.

3. Do not use iterator()

Now, this advice is really not for general use-cases, but only applicable deep down in a N.O.P.E. branch. Nonetheless, you should think about it. Writing Java-5 style foreach loops is convenient. You can just completely forget about looping internals, and write:

for (String value : strings) {
    // Do something useful here
}

However, every time you execute this loop, if strings is an Iterable, you will create a new Iterator instance. If you’re using an ArrayList, this is going to allocate an object with 3 ints on your heap:

private class Itr implements Iterator<E> {
    int cursor;
    int lastRet = -1;
    int expectedModCount = modCount;
    // ...

Instead, you can write the following, equivalent loop and “waste” only a single int value on the stack, which is dirt cheap:

int size = strings.size();
for (int i = 0; i < size; i++) {
    String value = strings.get(i);
    // Do something useful here
}

… or, if your list doesn’t really change, you might even operate on an array version of it:

for (String value : stringArray) {
    // Do something useful here
}

Takeaway

Iterators, Iterable, and the foreach loop are extremely useful from a writeability and readability perspective, as well as from an API design perspective. However, they create a small new instance on the heap for each single iteration. If you run this iteration many many times, you want to make sure to avoid creating this useless instance, and write index-based iterations instead.

Discussion

Some interesting disagreement about parts of the above (in particular replacing Iterator usage by access-by-index) has been discussed on Reddit here.

4. Don’t call that method

Some methods are simply expensive. In our N.O.P.E. branch example, we don’t have such a method at the leaf, but you may well have one. Let’s assume your JDBC driver needs to go through incredible trouble to calculate the value of ResultSet.wasNull(). Your homegrown SQL framework code might look like this:

if (type == Integer.class) {
    result = (T) wasNull(rs, 
        Integer.valueOf(rs.getInt(index)));
}

// And then...
static final <T> T wasNull(ResultSet rs, T value) 
throws SQLException {
    return rs.wasNull() ? null : value;
}

This logic will now call ResultSet.wasNull() every time you get an int from the result set. But the getInt() contract reads:

Returns: the column value; if the value is SQL NULL, the value returned is 0

Thus, a simple, yet possibly drastic improvement to the above would be:

static final <T extends Number> T wasNull(
    ResultSet rs, T value
) 
throws SQLException {
    return (value == null || 
           (value.intValue() == 0 && rs.wasNull())) 
        ? null : value;
}

So, this is a no-brainer:

Takeaway

Don’t call expensive methods in an algorithm’s “leaf nodes”, but cache the call instead, or avoid it if the method contract allows it.

5. Use primitives and the stack

The above example is from jOOQ, which uses a lot of generics, and is thus forced to use wrapper types for byte, short, int, and long – at least until generics become specialisable in Java 10 and Project Valhalla. But you may not have this constraint in your code, so you should take all measures to replace:

// Goes to the heap
Integer i = 817598;

… by this:

// Stays on the stack
int i = 817598;

Things get worse when you’re using arrays. Replace this:

// Three heap objects!
Integer[] i = { 1337, 424242 };

… by this:

// One heap object.
int[] i = { 1337, 424242 };

Takeaway

When you’re deep down in your N.O.P.E. branch, you should be extremely wary of using wrapper types. Chances are that you will create a lot of pressure on your GC, which has to kick in all the time to clean up your mess.

A particularly useful optimisation might be to use some primitive type and create large, one-dimensional arrays of it, and a couple of delimiter variables to indicate where exactly your encoded object is located on the array.
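A sketch of that encoding, assuming hypothetical two-dimensional “point” data, with a stride of 2 ints per encoded object:

int n = 1_000_000;

// Instead of allocating n point objects, encode all points
// in a single int[]: points[2 * i] holds x, and
// points[2 * i + 1] holds y of the i-th point.
int[] points = new int[n * 2];

for (int i = 0; i < n; i++) {
    points[2 * i]     = i;     // x
    points[2 * i + 1] = i * 2; // y
}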

An excellent library for primitive collections, which are a bit more sophisticated than your average int[], is trove4j, which ships under the LGPL.

Exception

There is an exception to this rule: boolean and byte have few enough values to be cached entirely by the JDK. You can write:

Boolean a1 = true; // ... syntax sugar for:
Boolean a2 = Boolean.valueOf(true);

Byte b1 = (byte) 123; // ... syntax sugar for:
Byte b2 = Byte.valueOf((byte) 123);

The same is true for low values of the other integer primitive types, including char, short, int, long.

But only if you’re auto-boxing them, or calling TheType.valueOf(), not when you call the constructor!

Never call the constructor on wrapper types, unless you really want a new instance
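To illustrate (the Integer cache spans -128 to 127 by default, and the upper bound can be raised via -XX:AutoBoxCacheMax):

Integer a = Integer.valueOf(127);
Integer b = Integer.valueOf(127);
System.out.println(a == b); // true: cached by the JDK

Integer c = Integer.valueOf(128);
Integer d = Integer.valueOf(128);
System.out.println(c == d); // false (by default): outside the cache

Integer e = new Integer(127);
Integer f = new Integer(127);
System.out.println(e == f); // false: the constructor always allocates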

This fact can also help you write a sophisticated, trolling April Fool’s joke for your co-workers.

Off heap

Of course, you might also want to experiment with off-heap libraries, although they’re more of a strategic decision, not a local optimisation.

An interesting article on that subject by Peter Lawrey and Ben Cotton is: OpenJDK and HashMap… Safely Teaching an Old Dog New (Off-Heap!) Tricks

6. Avoid recursion

Modern functional programming languages like Scala encourage the use of recursion, as they offer means of optimising tail-recursive algorithms back into iterative ones. If your language supports such optimisations, you might be fine. But even then, the slightest change of algorithm might produce a branch that prevents your recursion from being tail-recursive. Hopefully the compiler will detect this! Otherwise, you might be wasting a lot of stack frames for something that might have been implemented using only a few local variables.
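For illustration, the same trivial aggregation written both ways (Java performs no tail-call optimisation, so the recursive version burns one stack frame per element, while the iterative one needs only a couple of locals):

// Recursive: one stack frame per element. This version isn't
// even tail-recursive, because the addition happens after the
// recursive call returns.
static long sum(int[] values, int i) {
    if (i == values.length)
        return 0;

    return values[i] + sum(values, i + 1);
}

// Iterative: a couple of local variables, no stack growth
static long sum(int[] values) {
    long result = 0;

    for (int value : values)
        result += value;

    return result;
}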

Takeaway

There’s not much to say about this apart from: Always prefer iteration over recursion when you’re deep down the N.O.P.E. branch.

7. Use entrySet()

When you want to iterate through a Map, and you need both keys and values, you must have a very good reason to write the following:

for (K key : map.keySet()) {
    V value = map.get(key);
}

… rather than the following:

for (Entry<K, V> entry : map.entrySet()) {
    K key = entry.getKey();
    V value = entry.getValue();
}

When you’re in the N.O.P.E. branch, you should be wary of maps anyway, because lots and lots of O(1) map access operations are still lots of operations. And the access isn’t free either. But at least, if you cannot do without maps, use entrySet() to iterate them! The Map.Entry instance is there anyway, you only need to access it.

Takeaway

Always use entrySet() when you need both keys and values during map iteration.

8. Use EnumSet or EnumMap

There are some cases where the number of possible keys in a map is known in advance – for instance when using a configuration map. If that number is relatively small, you should really consider using EnumSet or EnumMap instead of regular HashSet or HashMap. This is easily explained by looking at EnumMap.put():

private transient Object[] vals;

public V put(K key, V value) {
    // ...
    int index = key.ordinal();
    vals[index] = maskNull(value);
    // ...
}

The essence of this implementation is the fact that we have an array of indexed values rather than a hash table. When inserting a new value, all we have to do to look up the map entry is ask the enum for its constant ordinal, which is generated by the Java compiler on each enum type. If this is a global configuration map (i.e. only one instance), the increased access speed will help EnumMap heavily outperform HashMap, which may use a bit less heap memory, but which will have to run hashCode() and equals() on each key.
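A minimal usage sketch (the ConfigKey enum is made up for illustration):

import java.util.EnumMap;

enum ConfigKey { HOST, PORT, TIMEOUT }

class Config {
    public static void main(String[] args) {
        EnumMap<ConfigKey, String> config =
            new EnumMap<>(ConfigKey.class);

        config.put(ConfigKey.HOST, "localhost");
        config.put(ConfigKey.PORT, "8080");

        // This lookup is a plain array access on
        // ConfigKey.PORT.ordinal(). No hashCode(), no equals().
        String port = config.get(ConfigKey.PORT);
        System.out.println(port);
    }
}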

Takeaway

Enum and EnumMap are very close friends. Whenever you use enum-like structures as keys, consider actually making those structures enums and using them as keys in EnumMap.

9. Optimise your hashCode() and equals() methods

If you cannot use an EnumMap, at least optimise your hashCode() and equals() methods. A good hashCode() method is essential because it will prevent further calls to the much more expensive equals() as it will produce more distinct hash buckets per set of instances.

In every class hierarchy, you may have popular and simple objects. Let’s have a look at jOOQ’s org.jooq.Table implementations.

The simplest and fastest possible implementation of hashCode() is this one:

// AbstractTable, a common Table base implementation:

@Override
public int hashCode() {

    // [#1938] This is a much more efficient hashCode()
    // implementation compared to that of standard
    // QueryParts
    return name.hashCode();
}

… where name is simply the table name. We don’t even consider the schema or any other property of the table, as the table names are usually distinct enough across a database. Also, the name is a string, so it already has a cached hashCode() value inside.

The comment is important, because AbstractTable extends AbstractQueryPart, which is a common base implementation for any AST (Abstract Syntax Tree) element. The common AST element does not have any properties, so it cannot make any assumptions about an optimised hashCode() implementation. Thus, the overridden method looks like this:

// AbstractQueryPart, a common AST element
// base implementation:

@Override
public int hashCode() {
    // This is a working default implementation. 
    // It should be overridden by concrete subclasses,
    // to improve performance
    return create().renderInlined(this).hashCode();
}

In other words, the whole SQL rendering workflow has to be triggered to calculate the hash code of a common AST element.
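If such an expensive computation cannot be avoided, a common mitigation is to cache the result lazily, much like java.lang.String does internally. A sketch (this is not jOOQ’s code, and it assumes an immutable type):

final class TableName {
    private final String name;

    // 0 means "not yet computed", like String.hash
    private int hash;

    TableName(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        int h = hash;

        if (h == 0) {
            h = name.hashCode();

            // Benign race: concurrent threads would just
            // recompute the same value
            hash = h;
        }

        return h;
    }
}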

Things get more interesting with equals():

// AbstractTable, a common Table base implementation:

@Override
public boolean equals(Object that) {
    if (this == that) {
        return true;
    }

    // [#2144] Non-equality can be decided early, 
    // without executing the rather expensive
    // implementation of AbstractQueryPart.equals()
    if (that instanceof AbstractTable) {
        if (StringUtils.equals(name, 
            (((AbstractTable<?>) that).name))) {
            return super.equals(that);
        }

        return false;
    }

    return false;
}

First thing: Always (not only in a N.O.P.E. branch) abort every equals() method early, if:

  • this == argument
  • the argument is of a type incompatible with this

Note that the latter condition includes argument == null, if you’re using instanceof to check for compatible types. We’ve blogged about this before in 10 Subtle Best Practices when Coding Java.

Now, after aborting comparison early in obvious cases, you might also want to abort comparison early when you can make partial decisions. For instance, the contract of jOOQ’s Table.equals() is that for two tables to be considered equal, they must be of the same name, regardless of the concrete implementation type. For instance, there is no way these two items can be equal:

  • com.example.generated.Tables.MY_TABLE
  • DSL.tableByName("MY_OTHER_TABLE")

If the argument cannot be equal to this, and if we can check that easily, let’s do so and abort if the check fails. If the check succeeds, we can still proceed with the more expensive implementation from super. Given that most objects in the universe are not equal, we’re going to save a lot of CPU time by shortcutting this method.

some objects are more equal than others

In the case of jOOQ, most instances are really tables as generated by the jOOQ source code generator, whose equals() implementation is even further optimised. The dozens of other table types (derived tables, table-valued functions, array tables, joined tables, pivot tables, common table expressions, etc.) can keep their “simple” implementation.

10. Think in sets, not in individual elements

Last but not least, there is a thing that is not Java-related but applies to any language. Besides, we’re leaving the N.O.P.E. branch as this advice might just help you move from O(N³) to O(n log n), or something like that.

Unfortunately, many programmers think in terms of simple, local algorithms. They’re solving a problem step by step, branch by branch, loop by loop, method by method. That’s the imperative and/or functional programming style. While it is increasingly easy to model the “bigger picture” when going from pure imperative to object oriented (still imperative) to functional programming, all these styles lack something that only SQL and R and similar languages have:

Declarative programming.

In SQL (and we love it, as this is the jOOQ blog) you can declare the outcome you want to get from your database, without making any algorithmic implications whatsoever. The database can then take all the meta data available into consideration (e.g. constraints, keys, indexes, etc.) to figure out the best possible algorithm.

In theory, this has been the main idea behind SQL and relational calculus from the beginning. In practice, SQL vendors have implemented highly efficient CBOs (Cost-Based Optimisers) only in the last decade, so stay with us in the 2010s when SQL will finally unleash its full potential (it was about time!).

But you don’t have to do SQL to think in sets. Sets / collections / bags / lists are available in all languages and libraries. The main advantage of using sets is the fact that your algorithms will become much much more concise. It is so much easier to write:

SomeSet INTERSECT SomeOtherSet

rather than:

// Pre-Java 8
Set<Object> result = new HashSet<>();
for (Object candidate : someSet)
    if (someOtherSet.contains(candidate))
        result.add(candidate);

// Even Java 8 doesn't really help
Set<Object> result2 = someSet.stream()
    .filter(someOtherSet::contains)
    .collect(Collectors.toSet());

Some may argue that functional programming and Java 8 will help you write easier, more concise algorithms. That’s not necessarily true. You can translate your imperative Java-7-loop into a functional Java-8 Stream collection, but you’re still writing the very same algorithm. Writing a SQL-esque expression is different. This…

SomeSet INTERSECT SomeOtherSet

… can be implemented in 1000 ways by the implementation engine. As we’ve learned today, perhaps it is wise to transform the two sets into EnumSets automatically, before running the INTERSECT operation. Perhaps we can parallelise this INTERSECT without making low-level calls to Stream.parallel().

Conclusion

In this article, we’ve talked about optimisations done on the N.O.P.E. branch, i.e. deep down in a high-complexity algorithm. In our case, being the jOOQ developers, we have an interest in optimising our SQL generation:

  • Every query is generated only on a single StringBuilder
  • Our templating engine actually parses characters, instead of using regular expressions
  • We use arrays wherever we can, especially when iterating over listeners
  • We stay clear of JDBC methods that we don’t have to call
  • etc…

jOOQ is at the “bottom of the food chain”, because it’s the (second-)last API that is being called by our customers’ applications before the call leaves the JVM to enter the DBMS. Being at the bottom of the food chain means that every line of code that is executed in jOOQ might be called N x O x P times, so we must optimise eagerly.

Your business logic is not deep down in the N.O.P.E. branch. But your own, home-grown infrastructure logic may be (custom SQL frameworks, custom libraries, etc.). Review it according to the rules that we’ve seen today, for instance using Java Mission Control or any other profiler.

Liked this article?

If you can’t go and profile your application right now, you might enjoy reading any of these articles instead:

Still Using Windows 3.1? So why stick to SQL-92?


We’ve been blogging a lot about the merits of modern SQL on the jOOQ blog. Specifically, window functions are one of the most fascinating features. But there are many many others.

Markus Winand, author of the popular book SQL Performance Explained, has recently given a very well-researched talk about modern SQL. We particularly like his headline.

Should this wonderful presentation have convinced you to buy a copy of SQL Performance Explained (our review here), do not forget to enter the “jOOQ” promo code for an exclusive 10% discount!

Top 5 Use-Cases For Nested Types


There has been an interesting discussion on reddit the other day: Static Inner Classes. When is it too much?

First, let’s review a little bit of basic historic Java knowledge. Java-the-language offers four levels of nesting classes, and by “Java-the-language”, I mean that these constructs are mere “syntax sugar”. They don’t exist in the JVM, which only knows ordinary classes.

(Static) Nested classes

class Outer {
    static class Inner {
    }
}

In this case, Inner is completely independent of Outer, except for a common, shared namespace.

Inner classes

class Outer {
    class Inner {
    }
}

In this case, Inner instances have an implicit reference to their enclosing Outer instance. In other words, there can be no Inner instance without an associated Outer instance.

The Java way of creating such an instance is this:

Outer.Inner yikes = new Outer().new Inner();

What looks totally awkward makes a lot of sense. Think about creating an Inner instance somewhere inside of Outer:

class Outer {
    class Inner {
    }

    void somewhereInside() {
        // We're already in the scope of Outer.
        // We don't have to qualify Inner explicitly.
        Inner aaahOK;

        // This is what we're used to writing.
        aaahOK = new Inner();

        // As all other locally scoped methods, we can
        // access the Inner constructor by 
        // dereferencing it from "this". We just
        // hardly ever write "this"
        aaahOK = this.new Inner();
    }
}

Note that much like the public or abstract keywords, the static keyword is implicit for nested interfaces. While the following hypothetical syntax might look familiar at first sight…:

class Outer {
    <non-static> interface Inner {
        default void doSomething() {
            Outer.this.doSomething();
        }
    }

    void doSomething() {}
}

… it is not possible to write the above. Apart from the lack of a <non-static> keyword, there doesn’t seem to be any obvious reason why “inner interfaces” shouldn’t be possible. I’d suspect the usual – there must be some really edge-casey caveat related to backwards-compatibility and/or multiple inheritance that prevents this.

Local classes

class Outer {
    void somewhereInside() {
        class Inner {
        }
    }
}

Local classes are probably one of the least known features in Java, as there is hardly any use for them. Local classes are named types whose scope extends only to the enclosing method. Obvious use-cases are when you want to reuse such a type several times within that method, e.g. to construct several similar listeners in a JavaFX application.
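A self-contained sketch of that use-case, with Runnable standing in for the listener type; the local class is visible only inside the method, yet can be instantiated several times:

import java.util.ArrayList;
import java.util.List;

class LocalClassExample {
    List<Runnable> greeters(String... names) {
        // A named type whose scope extends only to the
        // enclosing method
        class Greeter implements Runnable {
            final String name;

            Greeter(String name) {
                this.name = name;
            }

            @Override
            public void run() {
                System.out.println("Hello, " + name);
            }
        }

        List<Runnable> result = new ArrayList<>();

        for (String name : names)
            result.add(new Greeter(name));

        return result;
    }
}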

Anonymous classes

class Outer {
    Serializable dummy = new Serializable() {};
}

Anonymous classes are subtypes of another type with only one single instance.

Top 5 Use-Cases For Nested Classes

All of anonymous, local, and inner classes keep a reference to their enclosing instance, if they’re not defined in a static context. This may cause a lot of trouble if you let instances of these classes leak outside of their scope. Read more about that trouble in our article: Don’t be Clever: The Double Curly Braces Anti Pattern.

Often, however, you do want to profit from that enclosing instance. It can be quite useful to have some sort of “message” object that you can return without disclosing the actual implementation:

class Outer {

    // This implementation is private ...
    private class Inner implements Message {
        @Override
        public void getMessage() {
            Outer.this.someoneCalledMe();
        }
    }

    // ... but we can return it, being of
    // type Message
    Message hello() {
        return new Inner();
    }

    void someoneCalledMe() {}
}

With (static) nested classes, however, there is no enclosing scope as the Inner instance is completely independent of any Outer instance. So what’s the point of using such a nested class, rather than a top-level type?

1. Association with the outer type

If you want to communicate to the whole world, hey, this (inner) type is totally related to this (outer) type, and doesn’t make sense on its own, then you can nest the types. This has been done with Map and Map.Entry, for instance:

public interface Map<K, V> {
    interface Entry<K, V> {
    }
}

2. Hiding from the outside of the outer type

If package (default) visibility isn’t enough for your types, you can create private static classes that are available only to their enclosing type and to all other nested types of the enclosing type. This is really the main use-case for static nested classes.

class Outer {
    private static class Inner {
    }
}

class Outer2 {
    // Compilation error: Outer.Inner is not visible
    Outer.Inner nope;
}

3. Protected types

This is really a very rare use-case, but sometimes, within a class hierarchy, you need types that you want to make available only to subtypes of a given type. This is a use-case for protected static classes:

class Parent {
    protected static class OnlySubtypesCanSeeMe {
    }

    protected OnlySubtypesCanSeeMe someMethod() {
        return new OnlySubtypesCanSeeMe();
    }
}

class Child extends Parent {
    OnlySubtypesCanSeeMe wow = someMethod();
}

4. To emulate modules

Unlike Ceylon, Java doesn’t have first-class modules. With Maven or OSGi, it is possible to add some modular behaviour to Java’s build (Maven) or runtime (OSGi) environments, but if you want to express modules in code, this isn’t really possible.

However, you can establish modules by convention by using static nested classes. Let’s look at the java.util.stream package. We could consider it a module, and within this module, we have a couple of “sub-modules”, or groups of types, such as the internal java.util.stream.Nodes class, which roughly looks like this:

final class Nodes {
    private Nodes() {}
    private static abstract class AbstractConcNode {}
    static final class ConcNode {
        static final class OfInt {}
        static final class OfLong {}
    }
    private static final class FixedNodeBuilder {}
    // ...
}

Some of this Nodes stuff is available to all of the java.util.stream package, so we might say that the way this is written, we have something like:

  • a synthetic java.util.stream.nodes sub-package, visible only to the java.util.stream “module”
  • a couple of java.util.stream.nodes.* types, visible also only to the java.util.stream “module”
  • a couple of “top-level” functions (static methods) in the synthetic java.util.stream.nodes package

Looks a lot like Ceylon, to me!

5. Cosmetic reasons

The last bit is rather boring. Or some may find it interesting. It’s about taste, or ease of writing things. Some classes are just so small and unimportant, it’s just easier to write them inside of another class. Saves you a .java file. Why not.

Conclusion

In times of Java 8, thinking about the very old features of Java the language might not prove to be extremely exciting. Static nested classes are a well-understood tool for a couple of niche use-cases.

The takeaway from this article, however, is this. Every time you nest a class, be sure to make it static if you don’t absolutely need a reference to the enclosing instance. You never know when that reference will blow up your application in production.

You Will Regret Applying Overloading with Lambdas!


Writing good APIs is hard. Extremely hard. You have to think of an incredible amount of things if you want your users to love your API. You have to find the right balance between:

  1. Usefulness
  2. Usability
  3. Backward compatibility
  4. Forward compatibility

We’ve blogged about this topic before, in our article: How to Design a Good, Regular API. Today, we’re going to look into how…

Java 8 changes the rules

Yes!

Overloading is a nice tool to provide convenience in two dimensions:

  • By providing argument type alternatives
  • By providing argument default values

Examples for the above from the JDK include:

public class Arrays {

    // Argument type alternatives
    public static void sort(int[] a) { ... }
    public static void sort(long[] a) { ... }

    // Argument default values
    public static IntStream stream(int[] array) { ... }
    public static IntStream stream(int[] array, 
        int startInclusive, 
        int endExclusive) { ... }
}

The jOOQ API is obviously full of such convenience. As jOOQ is a DSL for SQL, we might even abuse it a little bit:

public interface DSLContext {
    <T1> SelectSelectStep<Record1<T1>> 
        select(SelectField<T1> field1);

    <T1, T2> SelectSelectStep<Record2<T1, T2>> 
        select(SelectField<T1> field1, 
               SelectField<T2> field2);

    <T1, T2, T3> SelectSelectStep<Record3<T1, T2, T3>>
        select(SelectField<T1> field1, 
               SelectField<T2> field2, 
               SelectField<T3> field3);

    <T1, T2, T3, T4> SelectSelectStep<Record4<T1, T2, T3, T4>> 
        select(SelectField<T1> field1, 
               SelectField<T2> field2, 
               SelectField<T3> field3, 
               SelectField<T4> field4);

    // and so on...
}

Languages like Ceylon take this idea of convenience one step further by claiming that the above is the only reasonable reason why overloading should be used in Java. And thus, the creators of Ceylon have completely removed overloading from their language, replacing the above by union types and actual default values for arguments. E.g.

// Union types
void sort(int[]|long[] a) { ... }

// Default argument values
IntStream stream(int[] array,
    int startInclusive = 0,
    int endExclusive = array.length) { ... }

Read “Top 10 Ceylon Language Features I Wish We Had In Java” for more information about Ceylon.

In Java, unfortunately, we cannot use union types or argument default values. So we have to use overloading to provide our API consumers with convenience methods.

If your method argument is a functional interface, however, things changed drastically between Java 7 and Java 8, with respect to method overloading. An example is given here from JavaFX.

JavaFX’s “unfriendly” ObservableList

JavaFX enhances the JDK collection types by making them “observable”. Not to be confused with Observable, a dinosaur type from the JDK 1.0 and from pre-Swing days.

JavaFX’s own Observable essentially looks like this:

public interface Observable {
  void addListener(InvalidationListener listener);
  void removeListener(InvalidationListener listener);
}

And luckily, this InvalidationListener is a functional interface:

@FunctionalInterface
public interface InvalidationListener {
  void invalidated(Observable observable);
}

This is great, because we can do things like:

Observable awesome = 
    FXCollections.observableArrayList();
awesome.addListener(fantastic -> splendid.cheer());

(notice how I’ve replaced foo/bar/baz with more cheerful terms. We should all do that. Foo and bar are so 1970)

Unfortunately, things get more hairy when we do what we would probably do, instead. I.e. instead of declaring an Observable, we’d like that to be a much more useful ObservableList:

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener(fantastic -> splendid.cheer());

But now, we get a compilation error on the second line:

awesome.addListener(fantastic -> splendid.cheer());
//      ^^^^^^^^^^^ 
// The method addListener(ListChangeListener<? super String>) 
// is ambiguous for the type ObservableList<String>

Because, essentially…

public interface ObservableList<E> 
extends List<E>, Observable {
    void addListener(ListChangeListener<? super E> listener);
}

and…

@FunctionalInterface
public interface ListChangeListener<E> {
    void onChanged(Change<? extends E> c);
}

Now again, before Java 8, the two listener types were completely unambiguously distinguishable, and they still are. You can easily call them by passing a named type. Our original code would still work if we wrote:

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
InvalidationListener hearYe = 
    fantastic -> splendid.cheer();
awesome.addListener(hearYe);

Or…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener((InvalidationListener) 
    fantastic -> splendid.cheer());

Or even…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener((Observable fantastic) -> 
    splendid.cheer());

All of these measures will remove ambiguity. But frankly, lambdas are only half as cool if you have to explicitly type the lambda, or the argument types. We have modern IDEs that can perform autocompletion and help infer types just as much as the compiler itself.

Imagine if we really wanted to call the other addListener() method, the one that takes a ListChangeListener. We’d have to write any of

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// Agh. Remember that we have to repeat "String" here
ListChangeListener<String> hearYe = 
    fantastic -> splendid.cheer();
awesome.addListener(hearYe);

Or…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// Agh. Remember that we have to repeat "String" here
awesome.addListener((ListChangeListener<String>) 
    fantastic -> splendid.cheer());

Or even…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// WTF... "extends" String?? But that's what this thing needs...
awesome.addListener((Change<? extends String> fantastic) -> 
    splendid.cheer());

Overload you shan’t. Be wary you must.

API design is hard. It was hard before, it has gotten harder now. With Java 8, if any of your API methods’ arguments are a functional interface, think twice about overloading that API method. And once you’ve concluded to proceed with overloading, think again, a third time whether this is really a good idea.

Not convinced? Have a close look at the JDK. For instance the java.util.stream.Stream type. How many overloaded methods do you see that have the same number of functional interface arguments, which again take the same number of method arguments (as in our previous addListener() example)?

Zero.

There are overloads where overload argument numbers differ. For instance:

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);

<R, A> R collect(Collector<? super T, A, R> collector);

You will never have any ambiguity when calling collect().

But when the argument numbers do not differ, and neither do the arguments’ own method argument numbers, the method names are different. For instance:

<R> Stream<R> map(Function<? super T, ? extends R> mapper);
IntStream mapToInt(ToIntFunction<? super T> mapper);
LongStream mapToLong(ToLongFunction<? super T> mapper);
DoubleStream mapToDouble(ToDoubleFunction<? super T> mapper);

Now, this is super annoying at the call site, because you have to think in advance what method you have to use based on a variety of involved types.

But it’s really the only solution to this dilemma. So, remember:

You Will Regret Applying Overloading with Lambdas!

Did you like this article? You might also like:

How to Translate SQL GROUP BY and Aggregations to Java 8


I couldn’t resist. I have read this question by Hugo Prudente on Stack Overflow. And I knew there had to be a better way than what the JDK has to offer.

The question reads:

I’m looking for a lambda to refine the data already retrieved. I have a raw resultset, if the user do not change the date I want use java’s lambda to group by the results for then. And I’m new to lambdas with java.

The lambda I’m looking for works simliar to this query.

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

SQL is declarative. Functional programming is not.

Before we go on with this discussion, let’s establish a very important fact. SQL is a completely declarative language. Functional (or “functional-ish”, to keep the Haskell-aficionados at peace) programming languages like Java 8 are not declarative. While expressing data transformation algorithms using functions is much more concise than expressing them using objects, or worse, using imperative instructions, you’re still explicitly expressing the algorithm.

When you write SQL, you don’t write any algorithm. You just describe the result you want to have. The SQL engine’s optimiser will figure out the algorithm for you – e.g. based on the fact that you may have an index on Z but not on W or on (Z, W).

While simple examples like these can easily be implemented using Java 8, you will quickly run into Java’s limitations, once you need to do more complex reporting.

Of course, as we’ve blogged before, the optimum is reached when you combine SQL and functional programming.

How can this be written in Java 8?

There are a variety of ways to do it. The essence is to understand all the participants in such a transformation. And no matter if you find this easy or hard, suitable for Java 8 or inadequate, thinking about the different, lesser-known parts of the new Stream API is certainly worth the exercise.

The main participants here are:

  • Stream: If you’re using JDK 8 libraries, then the new java.util.stream.Stream type will be your first choice.
  • Collector: The JDK provides us with a rather low-level and thus very powerful new API for data aggregation (also known as “reduction”). This API is summarised by the new java.util.stream.Collector type, a new type of which we have heard only little in the blogosphere so far.

Disclaimer

Some of the code displayed here might not work in your favourite IDE. Unfortunately, even as Java 7 reaches its end of life, all major IDEs (Eclipse, IntelliJ, NetBeans), and even the javac compiler, still have quite a few bugs related to the combination of generic type inference and lambda expressions. Stay tuned until those bugs are fixed! And report any bug you discover. We’ll all thank you for it!

Let’s go!

Let’s review our SQL statement:

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

In terms of the Stream API, the table itself is the Stream. Let’s just assume that we have a “table type” A as such:

class A {
    final int w;
    final int x;
    final int y;
    final int z;

    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public String toString() {
        return "A{" +
                "w=" + w +
                ", x=" + x +
                ", y=" + y +
                ", z=" + z +
                '}';
    }
}

You can also add equals() and hashCode() if you must.

We can now easily compose the Stream using Stream.of(), and some sample data:

Stream<A> stream =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5));

Now, the next step is to GROUP BY z, w. The Stream API itself, unfortunately, doesn’t contain such a convenience method. We have to resort to more low-level operations by specifying the more general Stream.collect() operation, and passing a Collector to it that does the grouping. Luckily, a variety of different grouping Collectors are already made available from the Collectors helper class.

So we add that to our stream:

Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(...));

Now the interesting part starts. How do we specify that we want to group by both A.z and A.w? We need to provide this groupingBy method with a function that can extract something like a SQL tuple from the A type. We could write our own tuple, or we simply use that of jOOλ, a library that we have created and open-sourced to improve our jOOQ integration tests.

The Tuple2 type roughly looks like this:

public class Tuple2<T1, T2> {

    public final T1 v1;
    public final T2 v2;

    public T1 v1() {
        return v1;
    }

    public T2 v2() {
        return v2;
    }

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }
}

public interface Tuple {
    static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
        return new Tuple2<>(v1, v2);
    }
}

It has many more useful features, but these ones will be sufficient for this article.

On a side-note

Why the JDK doesn’t ship with built-in tuples like C#’s or Scala’s escapes me.

Functional programming without tuples is like coffee without sugar: A bitter punch in your face.

Anyway… back on track

So we’re grouping by the (A.z, A.w) tuple, as we would in SQL:

Map<Tuple2<Integer, Integer>, List<A>> map =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
    a -> tuple(a.z, a.w)
));

As you can see, this produces a verbose but very descriptive type, a map containing our grouping tuple as its key, and a list of collected table records as its value.

Running the following statement

map.entrySet().forEach(System.out::println);

will yield:

(1, 1)=[A{w=1, x=1, y=1, z=1}, A{w=1, x=2, y=3, z=1}]
(4, 9)=[A{w=9, x=8, y=6, z=4}, A{w=9, x=9, y=7, z=4}]
(5, 2)=[A{w=2, x=3, y=4, z=5}, A{w=2, x=4, y=4, z=5}, A{w=2, x=5, y=5, z=5}]

That’s already quite awesome! In fact, this behaves like the SQL:2011 standard COLLECT() aggregate function, which is also available in Oracle 10g+.

Now, instead of actually collecting the A records, we prefer to aggregate the individual values of x and y. The JDK provides us with a couple of interesting new types, e.g. the java.util.IntSummaryStatistics, which is available for convenience again from the Collectors type via Collectors.summarizingInt().
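Before combining it with grouping, here is Collectors.summarizingInt() in isolation, applied to the A type defined above (a minimal sketch):

IntSummaryStatistics stats = Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4))
.collect(Collectors.summarizingInt(a -> a.x));

// A single pass produces count, sum, min, max, and average
System.out.println(stats.getMin());     // 1
System.out.println(stats.getMax());     // 8
System.out.println(stats.getAverage()); // 3.666...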

On a side note

For my taste, this sledge-hammer data aggregation technique is a bit quirky. The JDK libraries have been left intentionally low level and verbose, perhaps to keep the library footprint small, or to prevent “horrible” consequences when in 5-10 years (after the release of JDK 9 and 10), it becomes obvious that some features may have been added prematurely.

At the same time, there is this all-or-nothing IntSummaryStatistics, that blindly aggregates these popular aggregation values for your collection:

  • COUNT(*)
  • SUM()
  • MIN()
  • MAX()

and obviously, once you have SUM() and COUNT(*), you also have AVG() = SUM() / COUNT(*). So that’s going to be the Java way. IntSummaryStatistics.

In case you were wondering, the SQL:2011 standard specifies these aggregate functions:

AVG, MAX, MIN, SUM, EVERY, ANY, SOME, COUNT, STDDEV_POP, STDDEV_SAMP, VAR_SAMP, VAR_POP, COLLECT, FUSION, INTERSECTION, COVAR_POP, COVAR_SAMP, CORR, REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SYY, REGR_SXY, PERCENTILE_CONT, PERCENTILE_DISC, ARRAY_AGG

And obviously there are many other, vendor-specific aggregate and window functions in SQL. We’ve blogged about them all.

True, MIN, MAX, SUM, COUNT, AVG are certainly the most popular ones. But it would’ve been nicer if they hadn’t been included in these default aggregation types, but made available in a much more composable way.

Anyway… back on track

If you want to stay low-level and use mostly JDK API, you can use the following technique to implement aggregation over two columns:

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map = Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
    a -> tuple(a.z, a.w),
    Collector.of(

        // When collecting, we'll aggregate data
        // into two IntSummaryStatistics for x and y
        () -> tuple(new IntSummaryStatistics(), 
                    new IntSummaryStatistics()),

        // The accumulator will simply take
        // new t = (x, y) values
        (r, t) -> {
            r.v1.accept(t.x);
            r.v2.accept(t.y);
        },

        // The combiner will merge two partial
        // aggregations, in case this is executed
        // in parallel
        (r1, r2) -> {
            r1.v1.combine(r2.v1);
            r1.v2.combine(r2.v2);

            return r1;
        }
    )
));

map.entrySet().forEach(System.out::println);

The above would now print

(1, 1)=(IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2}, 
        IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3})
(4, 9)=(IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9}, 
        IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7})
(5, 2)=(IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5}, 
        IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})

But obviously, no one will want to write that much code. The same thing can be achieved with jOOλ with much less code:

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map =

// Seq is like a Stream, but sequential only,
// and with more features
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))

// Seq.groupBy() is just short for 
// Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),

    // ... because once you have tuples, 
    // why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

What you see above is probably as close as it gets to the original, very simple SQL statement:

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

The interesting part here is the fact that we have what we call “tuple-collectors”, a Collector that collects data into tuples of aggregated results for any degree of the tuple (up to 8). Here’s the code for Tuple.collectors:

// All of these generics... sheesh!
static <T, A1, A2, D1, D2> 
       Collector<T, Tuple2<A1, A2>, Tuple2<D1, D2>> 
collectors(
    Collector<T, A1, D1> collector1
  , Collector<T, A2, D2> collector2
) {
    return Collector.of(
        () -> tuple(
            collector1.supplier().get()
          , collector2.supplier().get()
        ),
        (a, t) -> {
            collector1.accumulator().accept(a.v1, t);
            collector2.accumulator().accept(a.v2, t);
        },
        (a1, a2) -> tuple(
            collector1.combiner().apply(a1.v1, a2.v1)
          , collector2.combiner().apply(a1.v2, a2.v2)
        ),
        a -> tuple(
            collector1.finisher().apply(a.v1)
          , collector2.finisher().apply(a.v2)
        )
    );
}

Where the Tuple2<D1, D2> is the aggregation result type that we derive from collector1 (which provides D1) and from collector2 (which provides D2).

That’s it. We’re done!

Conclusion

Java 8 is a first step towards functional programming in Java. Using Streams and lambda expressions, we can already achieve quite a bit. The JDK APIs, however, are extremely low level, and the experience when using IDEs like Eclipse, IntelliJ, or NetBeans can still be a bit frustrating. While writing this article (and adding the Tuple.collectors() method), I have reported around 10 bugs to the different IDEs. Some javac compiler bugs are not yet fixed as of JDK 1.8.0_40 ea. In other words:

I just keep throwing generic type parameters at the darn thing until the compiler stops bitching at me

But we’re on a good path. I trust that more useful API will ship with JDK 9 and especially with JDK 10, when all of the above will hopefully profit from the new value types and generic type specialization.

And, of course, if you haven’t already, download and contribute to jOOλ here!

We have created jOOλ to add the missing pieces to the JDK libraries. If you want to go all in on functional programming, i.e. when your vocabulary includes hipster terms (couldn’t resist) like monads, monoids, functors, and all that, we suggest you skip the JDK’s Streams and jOOλ entirely, and go download functionaljava by Mark Perry or javaslang by Daniel Dietrich.

jOOQ Newsletter: January 21, 2015 – Groovy and Open Source – jOOQ and the strong Swiss Franc


Subscribe to this newsletter here

Tweet of the Day

Today, we’re very happy to have “spied” on our users, as we can now show you a whole Tweet Conversation of the Day.

RxJooq, or reactive jOOQ. How does that sound!? Yes, jOOQ is growing to become a hype among SQL and fluent API aficionados. A recent discussion on reddit already puts jOOQ on the same level as Hibernate, with more than 10 mentions in answers to the question “Java: What ORM to use”. Our goal has always been for a Java developer to ask themselves at the beginning of a project:

Is this a jOOQ project, or is this a Hibernate project (or both)?

It is too early to announce anything, but at Data Geekery, we’re very interested and thus putting efforts into collaborating with Red Hat to make the jOOQ / Hibernate integration work more seamlessly, so stay tuned for more goodness in that area.

Groovy and Open Source – What it means for us

You may have heard of Pivotal’s recent announcement about their withdrawing sponsorship from the Groovy and Grails ecosystem. This isn’t exactly a surprise to many people, as Pivotal’s main focus shifted towards their PaaS business quite some time ago. The interesting aspect from our perspective is the fact that a whole ecosystem seems to have relied on the benevolence of a single sponsor. Quite a risk!

We think that Open Source should work differently. Open Source is a fine means of offering freemium and (legally) riskless software to potential customers in order to help customers start engaging with a brand. The ultimate vendor goal with Open Source is always upselling. As our valued jOOQ users and jOOQ newsletter and blog readers, we obviously hope that you will eventually understand all the combined SQL value put into jOOQ, and thus upgrade to a commercial jOOQ subscription.

This wasn’t necessarily the case at Pivotal. There is no obvious path from using Groovy (or Grails) to buying Pivotal’s cloud platform solutions. To make things worse, in order to survive, the Groovy platform now depends on a new, arbitrary sponsor whose incentive to sponsor Groovy might be 100% different from Pivotal’s. For the end user, this will not be the same Groovy any more – so it is hard to believe that Groovy will not suffer heavily from any future transition.

We believe that vendors shouldn’t depend on benevolence. We believe that vendors should have a very clear strategy why they’re creating a product, and do everything necessary to satisfy real customer’s needs. So we want to take the opportunity and thank you for being with us, and for making jOOQ (both the Open Source Edition and the Commercial Editions) what it is: A platform valued by both users of Open Source and commercial databases.

More information about our take on Pivotal and Groovy can be found on our blog:

It’s jOONuary! Profit from our 20% Discount Promotion

Speaking of our customers, there has never been a better time to become one!

Your budget for 2015 has been set in stone? You spent too much money on geeky infrastructure during the Holiday Season? Not a problem for your planned jOOQ integration! If you purchase new jOOQ licenses in jOONuary (January 2015), we will offer you a limited-time 20% discount on all price plans. Act quickly!

http://www.jooq.org/joonuary

jOOQ and the Strong Swiss Franc

We’re a Switzerland-based company, and as such are heavily influenced by recent events on the currency exchange markets. The EUR (which is our sale currency) has plummeted almost 20% compared to the CHF (which is our accounting currency).

This affects all of the Swiss export industry, and many companies are starting to take measures. We will not take any measures thus far and continue with our existing EUR-based price model. For our international customers, nothing will change. For our Swiss customers, this means that in addition to the above jOONuary discount, you will now also benefit from a “Euro discount”! Did we say there has never been a better time to become our customer?

jOOQ 3.6 Outlook

The upcoming jOOQ 3.6 will be no less exciting than the previous versions. Here is a quick outline of what we’re going to be doing in the upcoming release:

  • SAP HANA support. We’ve been talking to database vendors in the past, and we continue to do so, maintaining good relationships with the technical and community people at the vendor side. This time, the collaboration initiative came from the vendor directly, and we’ve heard them.

    SAP HANA is an emerging cloud and in-memory SQL platform with a big Java- and Scala-based tool chain, which makes it a perfect match for the jOOQ ecosystem. We’re going to support both HANA’s SQL features and HANA’s SQLScript features in the jOOQ 3.6 Enterprise Edition. If you’re an SAP HANA user and interested in details, or in a free preview of jOOQ 3.6.0, please contact sales right away. We’re more than happy to provide you with more info.

  • Nested records and tables. One of the SQL standard’s most underestimated features is the capability of nesting records and tables. In a true ORDBMS, tables (or MULTISETs) can be nested any level deep. If your SQL database supports these features, it is very easy to materialise a nested object graph directly in the database, instead of relying on the JOIN-based workarounds provided by modern ORMs.

    Nesting of records can also be very useful when reusing common data structures, such as audit columns (creation_date, creation_user, modification_date, modification_user). JPA supports the @Embedded annotation for this (see the first sketch after this list), and we’ll delve into these features as well.

    We believe that true MULTISET support will obsolete our competing products’ most important asset: mapping. Once you can declare all mapping directly in SQL, you will no longer miss JPA after migrating to jOOQ.

  • A new ConverterProvider SPI. Converters are great for supporting custom data types, but having to register them all the time is tedious. What if jOOQ just supported T <-> U conversion right out of the box, for any combination of T and U? We’ll let you register all your favourite converters, and jOOQ will figure out the conversion path through the converter graph (see the second sketch after this list).
  • Even better PL/SQL support. PL/SQL types are ubiquitous, but they are not easily accessible via JDBC, and thus via jOOQ. We’re researching a variety of ways to work around JDBC’s limitations, to let you use your favourite PL/SQL types: BOOLEAN, RECORD types, perhaps even table types (see the third sketch after this list).
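
For illustration, this is roughly how the audit-column reuse mentioned above looks with JPA’s @Embedded annotation – a minimal sketch, with access modifiers and column-naming details omitted:

import java.sql.Timestamp;
import javax.persistence.Embeddable;
import javax.persistence.Embedded;
import javax.persistence.Entity;
import javax.persistence.Id;

// Reusable audit columns, declared once as an embeddable "nested record"
@Embeddable
class Audit {
    Timestamp creationDate;
    String    creationUser;
    Timestamp modificationDate;
    String    modificationUser;
}

@Entity
class Book {
    @Id
    long id;
    String title;

    // Flattened into the BOOK table, but exposed as a nested object in Java
    @Embedded
    Audit audit;
}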

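To make the Converter idea more concrete, here is what a single jOOQ Converter looks like today, based on the existing org.jooq.Converter interface. The envisioned ConverterProvider SPI would relieve you of wiring up each such converter manually:

import org.jooq.Converter;

// Maps database INTEGER columns (0 / 1 / NULL) to Java Boolean values
public class IntegerToBooleanConverter implements Converter<Integer, Boolean> {

    @Override
    public Boolean from(Integer databaseObject) {
        return databaseObject == null ? null : databaseObject != 0;
    }

    @Override
    public Integer to(Boolean userObject) {
        return userObject == null ? null : (userObject ? 1 : 0);
    }

    @Override
    public Class<Integer> fromType() {
        return Integer.class;
    }

    @Override
    public Class<Boolean> toType() {
        return Boolean.class;
    }
}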
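
As for PL/SQL BOOLEAN, one known JDBC-level workaround is to wrap the call in an anonymous PL/SQL block that maps BOOLEAN to INTEGER. A minimal sketch, assuming an open java.sql.Connection named connection; the function my_package.my_boolean_function is hypothetical:

// Imports: java.sql.CallableStatement, java.sql.Types
// JDBC cannot bind PL/SQL BOOLEAN directly, so map it to INTEGER
// inside an anonymous PL/SQL block
try (CallableStatement stmt = connection.prepareCall(
    "BEGIN ? := CASE WHEN my_package.my_boolean_function(?) "
  + "THEN 1 ELSE 0 END; END;"
)) {
    stmt.registerOutParameter(1, Types.INTEGER);
    stmt.setInt(2, 42);
    stmt.execute();
    boolean result = stmt.getInt(1) == 1;
}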

Upcoming jOOQ Events

Have you missed one of our talks and presentations in the recent past? No problem at all, we’re back on the road after a short winter break. Here are all of our upcoming events:

Keep up to date with our own and third-party jOOQ events on our news website: http://www.jooq.org/news.

We’re looking forward to meeting you and to talking about all things Java and SQL!

Open Source Doesn’t Need More Support. It Needs Better Business Models


Jamie Allen, Typesafe’s Director of Global Services, published an interesting point of view on Twitter:

And he’s right, of course. We are constantly reminded of the fact that we should support the FOSS projects on which we depend. Just recently, Wikipedia had this huge banner on top of it, asking for money, and we probably should donate. But do we really depend on them, and can we really make a difference? Let us look at the problem from a business perspective.

There isn’t a second Red Hat

About a year ago, there was an extremely interesting article on TechCrunch by Peter Levine, partner at Andreessen Horowitz, who had just invested $40M in Stack Exchange. The article was about Why There Will Never Be Another RedHat: The Economics Of Open Source. It compared Red Hat’s and VMware’s market capitalisation and revenue with those of Microsoft, Oracle, or Amazon, showing that even Red Hat is a rather insignificant competitor in terms of these size metrics.

Why is this so?

Let’s go back to Groovy: there are probably tens of thousands of Groovy developers out there who have simply downloaded Groovy and then never again interacted with the vendor. It probably wouldn’t even be wrong to say that many developers weren’t aware of Pivotal having been the main sponsor behind Groovy. Sure, Groovy is a strong brand, but it is really “everybody’s brand”, and thus nobody’s brand. Not being a strong commercial brand, it attracted only techies with an interest in languages (and it is a beautiful language to work with, indeed!).

Now, Pivotal has withdrawn their engagement from Groovy, for completely understandable reasons. Let’s review Jamie’s point of view:

Pivotal’s move to end support of Groovy is a stark reminder that enterprises who depend on FOSS projects should help support them.

Would it have mattered if “we” had supported Groovy?

Perhaps. A Groovy Foundation (similar to the Apache Foundation, or the Eclipse Foundation) might have made Groovy a bit less dependent on Pivotal, and this can still happen. Possibly, a couple of larger companies who depend on Groovy or Gradle might chime in and become Silver or Gold or Platinum Sponsors, or something like that. Perhaps Gradleware will seize the opportunity and “buy” Groovy to become THE Groovy company.

But will it work? Does the same work for Typesafe? Can monetising an Open Source language and platform work in times when even C# is now given away for free?

Red Hat can make money off Linux, because Linux is a very complex ecosystem that just requires a support subscription when you’re running it in production. No bank on this planet will ever run a server farm without the vendor promising 24h support with under 1h reaction time. But is the same true for Scala, Groovy? The “critical” work has long been done by the developers. The binaries are built and shipped onto production where operations takes over. Operations couldn’t care less if that binary is built with Groovy, Scala, Java, Ceylon, Kotlin, Fantom, or any of the other gazillion Java alternatives. All operations will ever care about is the JVM, or Weblogic – and the database, of course. And operations is where the long-term subscription money is, not development.

This doesn’t mean that no one should make money from developers. Companies like JetBrains, ZeroTurnaround, or ourselves at Data Geekery show that it works, albeit on a much smaller scale. But if a company is “selling” a programming language that doesn’t immediately help it upsell its customers to its other, significant subscription offerings, you should be wary, as the vendor’s motivation to produce the programming language product is very unclear – and in the case of Pivotal, “unclear” doesn’t even come close to describing that motivation.

Good examples of holistic platform strategies are these, because operations and the end user can immediately drive the decision chain that justifies the language lock-in for the developers:

  • C# -> Visual Studio -> SQL Server -> Azure, etc.
  • Java -> JVM / Weblogic -> Oracle Database -> Oracle Commerce, etc.

OK examples are these, although the upselling potential might not be viable enough to maintain a whole ecosystem. We’ll see how it works:

  • Kotlin -> IntelliJ

Less good examples are these, because the value proposition chain is really not obvious. There is no justification for the language lock-in:

  • Groovy -> Cloud Platform ??
  • Scala -> Reactive Programming ??
  • Ceylon -> RHEL ??

The Business Model

Jamie Allen’s tweet shows a lot about what’s wrong with many Open Source vendors. While he claims that end users depend on OSS products from their vendors, the opposite is true. The end user can simply fork the OSS product and lead it to a graceful end of life before replacing it. But the vendor really depends on the goodwill and benevolence of their FOSS communities. The vendor then tries to leverage that goodwill for weird-sounding upselling between completely unrelated products. This cannot work.

So join us in our endeavours. Make Open Source a business. A viable business, a business driven by the vendor (and by the market, of course). A business that makes sense. A business that involves dual-licensing and reasonable upselling. A business that uses Open Source mainly as a freemium entry point for the actual business.

You can be romantic about F(L)OSS in your heart, that’s OK. But please don’t depend on it. It would be too bad if you didn’t succeed just because you ran out of money from your “sponsors”, having never cared about the business aspect of your product.

Read also: 5 lessons for any open source business transitioning to a revenue-based model

Suis-je Groovy? No! What Pivotal’s Decision Means for Open Source Software


Today there was great news in the JVM ecosystem.

Pivotal, the company that is committed to OSS, has become a bit less committed.

The reactions in the community were largely summarised by the hashtag #jesuisgroovy.

The interesting part in Pivotal’s announcement is this one:

The decision to conclude its sponsorship of Groovy and Grails is part of Pivotal’s larger strategy to concentrate resources on accelerating both commercial and open source projects that support its growing traction in Platform-as-a-Service, Data, and Agile development. Pivotal has determined that the time is right to let further development of Groovy and Grails be led by other interested parties in the open source community who can best serve the goals of those projects.

The official announcement can be read here.

Groovy is not a viable business (for Pivotal)

In other words: Groovy is not a viable business for Pivotal. And it’s hard to disagree here. Groovy was never created with any commercial interest in mind. Like many Open Source projects, Groovy was created in order to make something “better”, mostly for the sake of it being better. Of course it was useful, as it introduced a lot of nice features into the Java ecosystem at a time before all these new JVM languages popped up – and before all these new JVM languages finally had an effect on Java-the-language itself.

On the other hand, the Groovy website’s rather geeky look-and-feel has never made it seem as though anyone had any commercial interest in the language, or in the platform for that matter. I’m not trying to be harsh here; Groovy is an awesome language, created with love. But maintaining an Open Source ecosystem is hard work. It costs a lot of money and effort. And in the case of Groovy, it is just very hard to disagree with the fact that there is probably little money to be made out of it.

How to make money out of Open Source

When we moved on from a purely Open Source jOOQ to a dual-licensed one, we were criticised a lot by people who realised that they might fall into the dual-licensing category and would no longer get to ride for free. This was of course a disappointing evolution for those people. We see it as one step forward for a product that doesn’t just want to implement l’art pour l’art (art for art’s sake). We believe that we’re adding value on a small scale to a select set of customers with real SQL problems, and we want to continue to do so. Thus, commercial interests are now the driving force behind our developments, and dual-licensing is the easiest way to achieve that on our own small scale.

Many of those who had criticised us claimed that we should create a support-based Open Source business model instead (like Pivotal!). In other words: Let “them” pay for support – whoever “they” are. But that is not a viable model in the long run.

We create fishing poles. We don’t want to compete with our customers, the fishermen.

In software, the vendor of some product shouldn’t commoditize the main driver for innovation: The product. They should sell the product and create an ecosystem and a market for consultants that will be much better at applying the product to some concrete customer’s business. It is a win-win-win situation for everyone:

  • Vendors get money from licenses
  • Consultants get money from their specialist knowledge
  • End-users get a better, cheaper solution with a lower cost of ownership

Admittedly, the consultant’s work is the part that gets commoditized in the long run, as demand for the product increases and more consultants pop up trying to make money from their consulting business.

Joel Spolsky has written an extremely interesting Strategy Letter on the idea of commoditizing a complementary product (support, in this case) to increase the demand for the primary product (license, in this case). In the case of a PaaS company like Pivotal, however, we can only guess that even the commoditization of a whole programming language and ecosystem is no longer sustainable enough to increase demand for their PaaS offerings.

If that is the truth, then other platforms like Spring are at stake as well! If Groovy is not sustainable, why should Spring be?

Am I Groovy?

Suis-je Groovy? To get back to the original claim:

No, I’m not Groovy. Groovy and every other piece of Open Source software that does not in any direct way produce commercial value for both the vendor and the consumer is doomed to fail in the long run. Open Source software has created a tremendous amount of value in our industry. Companies like ourselves wouldn’t be possible if we still had to pay millions for an operating system license. The question is not whether software is free as in beer or as in freedom. The question is whether anyone has any viable commercial interest in making a particular piece of software cheap or even free. If they don’t, well, the joke will be on you, as the vendor might just stop doing it. Open Source or not.

Using Java 8 to Prevent Excessively Wide Logs


Some logs are there to be consumed by machines and kept forever.

Other logs are there just for debugging, to be consumed by humans. In the latter case, you often want to make sure that you don’t produce too many logs, and especially not too wide logs, as many editors and other tools have problems once line lengths exceed a certain size (e.g. this Eclipse bug).

String manipulation used to be a major pain in Java, with lots of tedious-to-write loops and branches, etc. No longer, with Java 8!

The following truncate method will truncate all lines within a string to a certain length:

// Using jOOλ's Seq and Apache Commons Lang's StringUtils:
// import org.jooq.lambda.Seq;
// import org.apache.commons.lang3.StringUtils;

public String truncate(String string) {
    return truncate(string, 80);
}

public String truncate(String string, int length) {
    return Seq.of(string.split("\n"))
              .map(s -> StringUtils.abbreviate(s, length))
              .join("\n");
}

The above example uses jOOλ 0.9.4 and Apache Commons Lang, but you can achieve the same using vanilla Java 8, of course:

// import java.util.stream.Collectors;
// import java.util.stream.Stream;

public String truncate(String string) {
    return truncate(string, 80);
}

public String truncate(String string, int length) {
    return Stream.of(string.split("\n"))
                 .map(s -> s.substring(0, Math.min(s.length(), length)))
                 .collect(Collectors.joining("\n"));
}

When truncating lines to a length of 20 using the abbreviate-based variant, the above program will produce:

Input

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Output

Lorem ipsum dolor...
incididunt ut lab...
nostrud exercitat...
Duis aute irure d...
fugiat nulla pari...
culpa qui officia...

Happy logging!

Infinite Loops. Or: Anything that Can Possibly Go Wrong, Does.


A wise man once said:

Anything that can possibly go wrong, does

Murphy

Some programmers are wise men, thus a wise programmer once said:

A good programmer is someone who looks both ways before crossing a one-way street.

Doug Linder

In a perfect world, things work as expected, and you may think that it is a good idea to keep consuming things until the end. So the following pattern is found all over every code base:

Java

for (;;) {
    // something
}

C

while (1) {
    // something
}

BASIC

10 something
20 GOTO 10

Want to see proof? Search GitHub for while(true) and check out the number of matches:

https://github.com/search?q=while+true&type=Code

Never use possibly infinite loops

There is a very interesting discussion in computer science around the topic of the “Halting Problem”. The essence of the halting problem, as proven by Alan Turing a long time ago, is the fact that it is undecidable in general: no program can determine, for every possible program and input, whether that program will eventually stop. While humans can quickly assess that the following program will never stop:

for (;;) continue;

… and that the following program will always stop:

for (;;) break;

… computers cannot decide on such things, and even very experienced humans might not immediately be able to do so when looking at a more complex algorithm.
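
A classic illustration of this difficulty is the Collatz iteration: nobody has yet proved that the following loop terminates for every positive starting value, even though it has stopped for every value ever tried:

// Collatz iteration: termination for every positive n is an open problem.
// For n = 27, this loop happens to stop after 111 steps.
long n = 27;
while (n != 1)
    n = (n % 2 == 0) ? n / 2 : 3 * n + 1;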

Learning by doing

In jOOQ, we have recently learned about the halting problem the hard way: By doing.

Before fixing issue #3696, we worked around a bug (or flaw) in SQL Server’s JDBC driver. The bug resulted in SQLException chains not being reported correctly, e.g. when the following trigger raises several errors:

CREATE TRIGGER Employee_Upd_2  ON  EMPLOYEE FOR UPDATE
AS 
BEGIN

    Raiserror('Employee_Upd_2 Trigger called...',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...1',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...2',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...3',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...4',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...5',16,-1)

END
GO

So, we explicitly consumed those SQLExceptions, such that jOOQ users got the same behaviour for all databases:

// Consume all results and exception chains, until the driver
// signals that there are no more results
consumeLoop: for (;;)
    try {
        if (!stmt.getMoreResults() && 
             stmt.getUpdateCount() == -1)
            break consumeLoop;
    }
    catch (SQLException e) {
        previous.setNextException(e);
        previous = e;
    }

This has worked for most of our customers, as the chain of exceptions thus reported is probably finite, and also probably rather small. Even the trigger example above is not a real-world one, so the number of actual errors reported might be between 1 and 5.

Did I just say … “probably” ?

As our initial wise men said: the number might be between 1 and 5. But it might just as well be 1000. Or 1000000. Or, worse, infinite – as in the case of issue #3696, when a customer used jOOQ with SQL Azure. In a perfect world, there cannot be an infinite number of SQLExceptions reported, but this isn’t a perfect world, and SQL Azure also had a bug (probably still does) that reported the same error again and again. This eventually led to an OutOfMemoryError, as jOOQ created a huge SQLException chain – which is probably better than looping infinitely: at least the exception was easy to detect and work around. Had the loop run infinitely, the server might have been completely blocked for all users of our customer.

The fix is now essentially this one:

// The same consumption loop, now with a hard upper bound of 256
// iterations, to guard against infinite exception chains
consumeLoop: for (int i = 0; i < 256; i++)
    try {
        if (!stmt.getMoreResults() && 
             stmt.getUpdateCount() == -1)
            break consumeLoop;
    }
    catch (SQLException e) {
        previous.setNextException(e);
        previous = e;
    }

True to the popular saying:

640 KB ought to be enough for anybody

The only exception

So, as we’ve seen before, this embarrassing example shows that anything that can possibly go wrong, does. In the context of possibly infinite loops, beware that this kind of bug can take entire servers down.

The Jet Propulsion Laboratory at the California Institute of Technology has made this an essential rule for their coding standards:

Rule 3 (loop bounds)

All loops shall have a statically determinable upper-bound on the maximum number of loop iterations. It shall be possible for a static compliance checking tool to affirm the existence of the bound. An exception is allowed for the use of a single non-terminating loop per task or thread where requests are received and processed. Such a server loop shall be annotated with the C comment: /* @non-terminating@ */.

So, apart from very few exceptions, you should never expose your code to the risk of infinite loops by omitting upper bounds on loop iterations (the same can be said about recursion, by the way). A simple way to retrofit such a bound is sketched below.
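
As a minimal sketch (the tryConnect() helper is hypothetical), bounding a possibly infinite retry loop can be as simple as replacing the unbounded form with a counted one:

// Before: potentially infinite, if the service never becomes available
// while (true) {
//     if (tryConnect())
//         break;
// }

// After: a statically determinable upper bound, as required by the JPL rule
for (int attempt = 0; attempt < 100; attempt++)
    if (tryConnect())
        break;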

Conclusion

Go over your code base today and look for any possible while (true), for (;;), do {} while (true); and other statements. Review those statements closely and see if they can halt – e.g. using break, or throw, or return, or continue (an outer loop).

Chances are, that you or someone before you who wrote that code was as naive as we were, believing that…

… oh come on, this will never happen

Because, you know what happens when you think that nothing will happen.
