
jOOQ vs. Slick – Pros and Cons of Each Approach


Every framework introduces a new compromise, a compromise that arises because the framework makes assumptions about how you’d like to interact with your software infrastructure.

An example of where this compromise has struck users recently is the discussion “Are Slick queries generally isomorphic to the SQL queries?”. And, of course, the answer is: No. What appears to be a simple Slick query:

val salesJoin = sales 
      join purchasers 
      join products 
      join suppliers on {
  case (((sale, purchaser), product), supplier) =>
    sale.productId === product.id &&
    sale.purchaserId === purchaser.id &&
    product.supplierId === supplier.id
}

… turns into a rather large monster with tons of derived tables that are totally unnecessary, given the original query (formatting is mine):

select x2.x3, x4.x5, x2.x6, x2.x7 
from (
    select x8.x9 as x10, 
           x8.x11 as x12, 
           x8.x13 as x14, 
           x8.x15 as x7, 
           x8.x16 as x17, 
           x8.x18 as x3, 
           x8.x19 as x20, 
           x21.x22 as x23, 
           x21.x24 as x25, 
           x21.x26 as x6 
    from (
        select x27.x28 as x9,
               x27.x29 as x11, 
               x27.x30 as x13, 
               x27.x31 as x15, 
               x32.x33 as x16, 
               x32.x34 as x18, 
               x32.x35 as x19 
        from (
            select x36."id" as x28, 
                   x36."purchaser_id" as x29, 
                   x36."product_id" as x30, 
                   x36."total" as x31 
            from "sale" x36
        ) x27 
        inner join (
            select x37."id" as x33, 
                   x37."name" as x34, 
                   x37."address" as x35 
	    from "purchaser" x37
        ) x32 
        on 1=1
    ) x8 
    inner join (
        select x38."id" as x22, 
               x38."supplier_id" as x24, 
               x38."name" as x26 
        from "product" x38
    ) x21
    on 1=1
) x2 
inner join (
    select x39."id" as x40, 
           x39."name" as x5, 
           x39."address" as x41 
    from "supplier" x39
) x4 
on ((x2.x14 = x2.x23) 
and (x2.x12 = x2.x17)) 
and (x2.x25 = x4.x40) 
where x2.x7 >= ?

Christopher Vogt, a former Slick maintainer and still actively involved member of the Slick community, explains the above in the following words:

This means that Slick relies on your database’s query optimizer to be able to execute the sql query that Slick produced efficiently. Currently that is not always the case in MySQL

One of the main ideas behind Slick, according to Christopher, is:

Slick is not a DSL that allows you to build exactly specified SQL strings. Slick’s Scala query translation allows for re-use and composition and using Scala as the language to write your queries. It does not allow you to predict the exact sql query, only the semantics and the rough structure.

Slick vs. jOOQ

Since Christopher later on also compared Slick with jOOQ, I allowed myself to chime in and to add my two cents:

From a high level (without actual Slick experience) I’d say that Slick and jOOQ embrace compositionality equally well. I’ve seen crazy queries of several 100s of lines of [jOOQ] SQL in customer code, composed over several methods. You can do that with both APIs.

On the other hand, as Chris said: Slick has a focus on Scala collections, jOOQ on SQL tables.

  • From a conceptual perspective (= in theory), this focus shouldn’t matter.
  • From a type safety perspective, Scala collections are easier to type-check than SQL tables and queries, because SQL as a language is rather hard to type-check: the semantics of various advanced SQL clauses alter type configurations rather implicitly (e.g. outer joins, grouping sets, pivot clauses, unions, group by, etc.).
  • From a practical perspective, SQL itself is only an approximation of the original relational theories and has attained a life of its own. This may or may not matter to you.

I guess in the end it really boils down to whether you want to reason about Scala collections (queries are better integrated / more idiomatic with your client code) or about SQL tables (queries are better integrated / more idiomatic with your database).
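For comparison, here’s a hedged sketch of how the above query might look in jOOQ, where the rendered SQL follows the written structure one to one (SALE, PURCHASER, PRODUCT, and SUPPLIER being hypothetical generated table classes, not code from the linked discussion):

// A sketch only, assuming jOOQ-generated table classes and a configuration
DSL.using(configuration)
   .select(SALE.TOTAL, PURCHASER.NAME, PRODUCT.NAME, SUPPLIER.NAME)
   .from(SALE)
   .join(PURCHASER).on(SALE.PURCHASER_ID.eq(PURCHASER.ID))
   .join(PRODUCT).on(SALE.PRODUCT_ID.eq(PRODUCT.ID))
   .join(SUPPLIER).on(PRODUCT.SUPPLIER_ID.eq(SUPPLIER.ID))
   .fetch();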

At this point, I’d like to add another two cents to the discussion. Customers don’t buy the product that you’re selling. They never do. In the case of Hibernate, customers and users were hoping to be able to forget SQL forever. The opposite is true. As Gavin King himself (the creator of Hibernate) had told me:

[Image: a quote by Gavin King]

Because customers and users had never listened to Gavin (and to other ORM creators), we now have what many call the object-relational impedance mismatch. A lot of unjustified criticism has been expressed against Hibernate and JPA, APIs which are simply too popular for the limited scope they really cover.

With Slick (or C#’s LINQ, for that matter), a similar mismatch is impeding integrations, if users abuse these tools for what they believe to be a replacement for SQL. Slick does a great job of modelling the relational model directly in the Scala language. This is wonderful if you want to reason about relations just like you reason about collections. But it is not a SQL API. To see how difficult it is to overcome these limitations, you can browse the issue tracker or user group.

We’ll simply call this:

The Functional-Relational Impedance Mismatch

SQL is much more

Markus Winand (the author of the popular SQL Performance Explained) has recently published a very good presentation about “modern SQL”, an idea that we fully embrace at jOOQ.

We believe that APIs that have been trying to hide the SQL language from general purpose languages like Java, Scala, C# are missing out on a lot of the very nice features that can add tremendous value to your application. jOOQ is an API that fully embraces the SQL language, with all its awesome features (and with all its quirks). You obviously may or may not agree with that.

We’ll leave this article open ended, hoping you’ll chime in to discuss the benefits and caveats of each approach. Of staying close to Scala vs. staying close to SQL.

As a small teaser, however, I’d like to announce a follow-up article showing that there is no such thing as an object-relational impedance mismatch. You (and your ORM) are just not using SQL correctly. Stay tuned!

Thou Shalt Not Name Thy Method “Equals”


(unless you really override Object.equals(), of course).

I’ve stumbled upon a rather curious Stack Overflow question by user Frank:

Why does Java’s Area#equals method not override Object#equals?

Interestingly, there is an Area.equals(Area) method which takes an Area argument, instead of an Object argument as declared in Object.equals(). This leads to rather nasty behaviour, as discovered by Frank:

@org.junit.Test
public void testEquals() {
    java.awt.geom.Area a = new java.awt.geom.Area();
    java.awt.geom.Area b = new java.awt.geom.Area();
    assertTrue(a.equals(b)); // -> true

    java.lang.Object o = b;
    assertTrue(a.equals(o)); // -> false, the assertion fails!
}

Technically, it is correct for AWT’s Area to have been implemented this way (as hashCode() isn’t overridden either). But given how Java resolves overloaded methods, and how programmers read code like the above, it is really a terrible idea to overload the equals method.
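To see what’s going on, remember that overloads are resolved statically, at compile time, from the declared type of the argument, whereas overrides are resolved dynamically, at runtime. A minimal sketch with a hypothetical class (not the actual AWT code):

class Point {

    // An overload, NOT an override. It is chosen only when the
    // compile-time type of the argument is Point
    public boolean equals(Point other) {
        return true;
    }

    public static void main(String[] args) {
        Point p = new Point();
        Object o = new Point();

        System.out.println(p.equals(new Point())); // true:  equals(Point)
        System.out.println(p.equals(o));           // false: Object.equals(Object)
    }
}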

No static equals, either

These rules also hold true for static equals() methods, such as Apache Commons Lang’s

ObjectUtils.equals(Object o1, Object o2)

The confusion here arises from the fact that you cannot static-import this equals method:

import static org.apache.commons.lang.ObjectUtils.equals;

When you now type the following:

equals(obj1, obj2);

You will get a compiler error:

The method equals(Object) in the type Object is not applicable for the arguments (…, …)

The reason for this is that methods that are in the scope of the current class and its super types will always shadow anything that you import this way. The following doesn’t work either:

import static org.apache.commons.lang.ObjectUtils.defaultIfNull;

public class Test {
  void test() {
    defaultIfNull(null, null);
    // ^^ compilation error here
  }

  void defaultIfNull() {
  }
}

Details in this Stack Overflow question.

Conclusion

The conclusion is simple: never overload any of the methods declared in Object (overriding is fine, of course). This includes:

  • clone()
  • equals()
  • finalize()
  • getClass()
  • hashCode()
  • notify()
  • notifyAll()
  • toString()
  • wait()

Of course, it would be great if those methods hadn’t been declared in Object in the first place, but that ship sailed 20 years ago.
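For reference, a correct implementation overrides (never overloads) the Object signatures. A minimal sketch:

class Name {
    final String value;

    Name(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof Name))
            return false;

        return value.equals(((Name) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}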

Top 10 Easy Performance Optimisations in Java


There has been a lot of hype about the buzzword “web scale”, and people are going to great lengths to reorganise their application architecture to get their systems to “scale”.

But what is scaling, and how can we make sure that we can scale?

Different aspects of scaling

The hype mentioned above is mostly about scaling load, i.e. making sure that a system that works for 1 user will also work well for 10 users, or 100 users, or millions. Ideally, your system is as “stateless” as possible, such that the few pieces of state that really remain can be transferred and transformed on any processing unit in your network. When load is your problem, latency is probably not, so it’s OK if individual requests take 50-100ms. This is often also referred to as scaling out.

An entirely different aspect of scaling is about scaling performance, i.e. making sure that an algorithm that works for 1 piece of information will also work well for 10 pieces, or 100 pieces, or millions. Whether this type of scaling is feasible is best described by Big O Notation. Latency is the killer when scaling performance. You want to do everything possible to keep all calculation on a single machine. This is often also referred to as scaling up.

If there were such a thing as a free lunch (there isn’t), we could combine scaling up and scaling out indefinitely. Anyway, today, we’re going to look at some very easy ways to improve things on the performance side.

Big O Notation

Java 7’s ForkJoinPool as well as Java 8’s parallel Stream help parallelising stuff, which is great when you deploy your Java program onto a multi-core processor machine. The advantage of such parallelism compared to scaling across different machines on your network is the fact that you can almost completely eliminate latency effects, as all cores can access the same memory.

But don’t be fooled by the effect that parallelism has! Remember the following two things:

  • Parallelism eats up your cores. This is great for batch processing, but a nightmare for asynchronous servers (such as HTTP). There are good reasons why we’ve used the single-thread servlet model in the past decades. So parallelism only helps when scaling up.
  • Parallelism has no effect on your algorithm’s Big O Notation. If your algorithm is O(n log n), and you let that algorithm run on c cores, you will still have an O(n log n / c) algorithm, as c is an insignificant constant in your algorithm’s complexity. You will save wall-clock time, but not reduce complexity!

The best way to improve performance, of course, is by reducing algorithmic complexity. The ideal is to achieve O(1) or quasi-O(1), for instance with a HashMap lookup. But that is not always possible, let alone easy.

If you cannot reduce your complexity, you can still gain a lot of performance if you tweak your algorithm where it really matters, if you can find the right spots. Assume the following visual representation of an algorithm:

[Diagram: an algorithm call tree with two branches: N -> M -> “heavy operation”, and N -> O -> P -> “easy operation”]

The overall complexity of the algorithm is O(N³), or O(N x O x P) if we want to deal with the individual orders of magnitude. However, when profiling this code, you might find a funny scenario:

  • On your development box, the left branch (N -> M -> Heavy operation) is the only branch that you can see in your profiler, because the values for O and P are small in your development sample data.
  • On production, however, the right branch (N -> O -> P -> Easy operation or also N.O.P.E.) is really causing trouble. Your operations team might have figured this out using AppDynamics, or DynaTrace, or some similar software.

Without production data, you might quickly jump to conclusions and optimise the “heavy operation”. You ship to production and your fix has no effect.

There are no golden rules to optimisation apart from the facts that:

  • A well-designed application is much easier to optimise
  • Premature optimisation will not solve any performance problems, but make your application less well-designed, which in turn makes it harder to be optimised

Enough theory. Let’s assume that you have found the right branch to be the issue. It may well be that a very easy operation is blowing up in production, because it is called lots and lots of times (if N, O, and P are large). Please read this article in the context of there being a problem at the leaf node of an inevitable O(N³) algorithm. These optimisations won’t help you scale. They’ll help you save your customer’s day for now, deferring the difficult improvement of the overall algorithm until later!

Here are the top 10 easy performance optimisations in Java:

1. Use StringBuilder

This should be your default in almost all Java code. Try to avoid the + operator. Sure, you may argue that it is just syntax sugar for a StringBuilder anyway, as in:

String x = "a" + args.length + "b";

… which compiles to

 0  new java.lang.StringBuilder [16]
 3  dup
 4  ldc <String "a"> [18]
 6  invokespecial java.lang.StringBuilder(java.lang.String) [20]
 9  aload_0 [args]
10  arraylength
11  invokevirtual java.lang.StringBuilder.append(int) : java.lang.StringBuilder [23]
14  ldc <String "b"> [27]
16  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [29]
19  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [32]
22  astore_1 [x]

But what happens, if later on, you need to amend your String with optional parts?

String x = "a" + args.length + "b";

if (args.length == 1)
    x = x + args[0];

You will now have a second StringBuilder that just needlessly consumes memory off your heap, putting pressure on your GC. Write this instead:

StringBuilder x = new StringBuilder("a");
x.append(args.length);
x.append("b");

if (args.length == 1)
    x.append(args[0]);

Takeaway

In the above example, it is probably completely irrelevant if you’re using explicit StringBuilder instances, or if you rely on the Java compiler creating implicit instances for you. But remember, we’re in the N.O.P.E. branch. Every CPU cycle that we’re wasting on something as stupid as GC or allocating a StringBuilder‘s default capacity, we’re wasting N x O x P times.

As a rule of thumb, always use a StringBuilder rather than the + operator. And if you can, keep the StringBuilder reference across several methods when your String is more complex to build. This is what jOOQ does when you generate a complex SQL statement: there is only one StringBuilder that “traverses” your whole SQL AST (Abstract Syntax Tree).

And for crying out loud, if you still have StringBuffer references, do replace them with StringBuilder. You hardly ever need to synchronise on a string being created.
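Here’s a minimal sketch of that “one StringBuilder traverses everything” pattern, with hypothetical method names (not actual jOOQ code):

// The same StringBuilder is passed through all rendering methods
void renderSelect(StringBuilder sb) {
    sb.append("select ");
    renderColumns(sb);
    sb.append(" from ");
    renderTables(sb);
}

void renderColumns(StringBuilder sb) {
    sb.append("a, b");
}

void renderTables(StringBuilder sb) {
    sb.append("t");
}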

2. Avoid regular expressions

Regular expressions are relatively cheap and convenient. But if you’re in the N.O.P.E. branch, they’re about the worst thing you can do. If you absolutely must use regular expressions in computation-intensive code sections, at least cache the Pattern reference instead of compiling it afresh all the time:

static final Pattern HEAVY_REGEX = 
    Pattern.compile("(((X)*Y)*Z)*");

But if your regular expression is really silly like

String[] parts = ipAddress.split("\\.");

… then you really are better off resorting to ordinary char[] or index-based manipulation. For example, this utterly unreadable loop does the same thing:

String[] parts = new String[4]; // the 4 parts of an IPv4 address
int length = ipAddress.length();
int offset = 0;
int part = 0;
for (int i = 0; i < length; i++) {
    if (i == length - 1 || 
            ipAddress.charAt(i + 1) == '.') {
        parts[part] = 
            ipAddress.substring(offset, i + 1);
        part++;
        offset = i + 2;
    }
}

… which also shows why you shouldn’t do any premature optimisation. Compared to the split() version, this is unmaintainable.

Challenge: The clever ones among our readers might find even faster algorithms.

Takeaway

Regular expressions are useful, but they come at a price. If you’re deep down in a N.O.P.E. branch, you must avoid regular expressions at all costs. Beware of the variety of JDK String methods that use regular expressions, such as String.replaceAll() or String.split().

Use a popular library like Apache Commons Lang instead, for your String manipulation.

3. Do not use iterator()

Now, this advice is really not for general use-cases, but only applicable deep down in a N.O.P.E. branch. Nonetheless, you should think about it. Writing Java-5 style foreach loops is convenient. You can just completely forget about looping internals, and write:

for (String value : strings) {
    // Do something useful here
}

However, every time you run this loop, if strings is an Iterable, you will create a new Iterator instance. If you’re using an ArrayList, this is going to allocate an object with 3 ints on your heap:

private class Itr implements Iterator<E> {
    int cursor;
    int lastRet = -1;
    int expectedModCount = modCount;
    // ...

Instead, you can write the following, equivalent loop and “waste” only a single int value on the stack, which is dirt cheap:

int size = strings.size();
for (int i = 0; i < size; i++) {
    String value = strings.get(i);
    // Do something useful here
}

… or, if your list doesn’t really change, you might even operate on an array version of it:

for (String value : stringArray) {
    // Do something useful here
}

Takeaway

Iterators, Iterable, and the foreach loop are extremely useful from a writability and readability perspective, as well as from an API design perspective. However, they create a small new instance on the heap for each single iteration. If you run this iteration many, many times, you want to make sure to avoid creating this useless instance, and write index-based iterations instead.

Discussion

Some interesting disagreement about parts of the above (in particular replacing Iterator usage by access-by-index) has been discussed on Reddit here.

4. Don’t call that method

Some methods are simply expensive. In our N.O.P.E. branch example, we don’t have such a method at the leaf, but you may well have one. Let’s assume your JDBC driver needs to go through incredible trouble to calculate the value of ResultSet.wasNull(). Your homegrown SQL framework code might look like this:

if (type == Integer.class) {
    result = (T) wasNull(rs, 
        Integer.valueOf(rs.getInt(index)));
}

// And then...
static final <T> T wasNull(ResultSet rs, T value) 
throws SQLException {
    return rs.wasNull() ? null : value;
}

This logic will now call ResultSet.wasNull() every time you get an int from the result set. But the getInt() contract reads:

Returns: the column value; if the value is SQL NULL, the value returned is 0

Thus, a simple, yet possibly drastic improvement to the above would be:

static final <T extends Number> T wasNull(
    ResultSet rs, T value
) 
throws SQLException {
    return (value == null || 
           (value.intValue() == 0 && rs.wasNull())) 
        ? null : value;
}

So, this is a no-brainer:

Takeaway

Don’t call expensive methods in an algorithm’s “leaf nodes”, but cache the call instead, or avoid it if the method contract allows it.

5. Use primitives and the stack

The above example is from jOOQ, which uses a lot of generics and is thus forced to use wrapper types for byte, short, int, and long – at least until generics become specialisable, with Java 10 and project Valhalla. But you may not have this constraint in your code, so you should take all measures to replace this:

// Goes to the heap
Integer i = 817598;

… by this:

// Stays on the stack
int i = 817598;

Things get worse when you’re using arrays, where you should replace this:

// Three heap objects!
Integer[] i = { 1337, 424242 };

… by this:

// One heap object.
int[] i = { 1337, 424242 };

Takeaway

When you’re deep down in your N.O.P.E. branch, you should be extremely wary of using wrapper types. Chances are that you will create a lot of pressure on your GC, which has to kick in all the time to clean up your mess.

A particularly useful optimisation might be to use some primitive type and create large, one-dimensional arrays of it, plus a couple of delimiter variables to indicate where exactly your encoded object is located in the array.
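As a hedged sketch of that idea, here’s how (x, y) points could be encoded in a single int[] rather than as one object per point:

// A sketch: 2 ints per "point"; point i lives at offsets
// 2 * i and 2 * i + 1 of one single heap object
class Points {
    final int[] coords = new int[2 * 1000];

    void set(int i, int x, int y) {
        coords[2 * i]     = x;
        coords[2 * i + 1] = y;
    }

    int x(int i) { return coords[2 * i];     }
    int y(int i) { return coords[2 * i + 1]; }
}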

An excellent library for primitive collections, which are a bit more sophisticated than your average int[], is trove4j, which is licensed under the LGPL.

Exception

There is an exception to this rule: boolean and byte have few enough values to be cached entirely by the JDK. You can write:

Boolean a1 = true; // ... syntax sugar for:
Boolean a2 = Boolean.valueOf(true);

Byte b1 = (byte) 123; // ... syntax sugar for:
Byte b2 = Byte.valueOf((byte) 123);

The same is true for low values of the other integer primitive types, including char, short, int, long.

But only if you’re auto-boxing them, or calling TheType.valueOf(), not when you call the constructor!

Never call the constructor on wrapper types, unless you really want a new instance

This fact can also help you write a sophisticated, trolling April Fool’s joke for your co-workers.

Off heap

Of course, you might also want to experiment with off-heap libraries, although they’re more of a strategic decision, not a local optimisation.

An interesting article on that subject by Peter Lawrey and Ben Cotton is: OpenJDK and HashMap… Safely Teaching an Old Dog New (Off-Heap!) Tricks

6. Avoid recursion

Modern functional programming languages like Scala encourage the use of recursion, as they offer means of optimising tail-recursive algorithms back into iterative ones. If your language supports such optimisations, you might be fine. But even then, the slightest change to the algorithm might produce a branch that prevents your recursion from being tail-recursive. Hopefully the compiler will detect this! Otherwise, you might be wasting a lot of stack frames for something that could have been implemented using only a few local variables.
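As a trivial sketch of the difference, here is a recursive sum and its iterative equivalent, which gets by with two local variables instead of one stack frame per element:

// Recursive: one stack frame per element
int sum(int[] a, int i) {
    return i == a.length ? 0 : a[i] + sum(a, i + 1);
}

// Iterative: a couple of local variables, no extra frames
int sum(int[] a) {
    int result = 0;
    for (int i = 0; i < a.length; i++)
        result += a[i];
    return result;
}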

Takeaway

There’s not much to say about this apart from: always prefer iteration over recursion when you’re deep down in the N.O.P.E. branch.

7. Use entrySet()

When you want to iterate through a Map, and you need both keys and values, you must have a very good reason to write the following:

for (K key : map.keySet()) {
    V value = map.get(key);
}

… rather than the following:

for (Entry<K, V> entry : map.entrySet()) {
    K key = entry.getKey();
    V value = entry.getValue();
}

When you’re in the N.O.P.E. branch, you should be wary of maps anyway, because lots and lots of O(1) map access operations are still lots of operations. And the access isn’t free either. But at least, if you cannot do without maps, use entrySet() to iterate them! The Map.Entry instance is there anyway, you only need to access it.

Takeaway

Always use entrySet() when you need both keys and values during map iteration.

8. Use EnumSet or EnumMap

There are some cases where the number of possible keys in a map is known in advance – for instance when using a configuration map. If that number is relatively small, you should really consider using EnumSet or EnumMap instead of regular HashSet or HashMap. This is easily explained by looking at EnumMap.put():

private transient Object[] vals;

public V put(K key, V value) {
    // ...
    int index = key.ordinal();
    vals[index] = maskNull(value);
    // ...
}

The essence of this implementation is the fact that we have an array of indexed values rather than a hash table. When inserting a new value, all we have to do to look up the map entry is ask the enum for its constant ordinal, which is generated by the Java compiler on each enum type. If this is a global configuration map (i.e. only one instance), the increased access speed will help EnumMap heavily outperform HashMap, which may use a bit less heap memory, but which will have to run hashCode() and equals() on each key.

Takeaway

Enum and EnumMap are very close friends. Whenever you use enum-like structures as keys, consider actually making those structures enums and using them as keys in EnumMap.
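A minimal sketch of that advice, with a hypothetical configuration enum:

enum ConfigKey { HOST, PORT, TIMEOUT }

// Array-backed: access by ordinal, no hashCode() / equals() involved
EnumMap<ConfigKey, String> config = new EnumMap<>(ConfigKey.class);

config.put(ConfigKey.HOST, "localhost");
config.put(ConfigKey.PORT, "8080");

String host = config.get(ConfigKey.HOST);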

9. Optimise your hashCode() and equals() methods

If you cannot use an EnumMap, at least optimise your hashCode() and equals() methods. A good hashCode() method is essential because it will prevent further calls to the much more expensive equals() as it will produce more distinct hash buckets per set of instances.

In every class hierarchy, you may have popular and simple objects. Let’s have a look at jOOQ’s org.jooq.Table implementations.

The simplest and fastest possible implementation of hashCode() is this one:

// AbstractTable, a common Table base implementation:

@Override
public int hashCode() {

    // [#1938] This is a much more efficient hashCode()
    // implementation compared to that of standard
    // QueryParts
    return name.hashCode();
}

… where name is simply the table name. We don’t even consider the schema or any other property of the table, as table names are usually distinct enough across a database. Also, the name is a string, so it already has a cached hashCode() value inside.

The comment is important, because AbstractTable extends AbstractQueryPart, which is a common base implementation for any AST (Abstract Syntax Tree) element. The common AST element does not have any properties, so it cannot make any assumptions that would allow for an optimised hashCode() implementation. Thus, the overridden method looks like this:

// AbstractQueryPart, a common AST element
// base implementation:

@Override
public int hashCode() {
    // This is a working default implementation. 
    // It should be overridden by concrete subclasses,
    // to improve performance
    return create().renderInlined(this).hashCode();
}

In other words, the whole SQL rendering workflow has to be triggered to calculate the hash code of a common AST element.

Things get more interesting with equals():

// AbstractTable, a common Table base implementation:

@Override
public boolean equals(Object that) {
    if (this == that) {
        return true;
    }

    // [#2144] Non-equality can be decided early, 
    // without executing the rather expensive
    // implementation of AbstractQueryPart.equals()
    if (that instanceof AbstractTable) {
        if (StringUtils.equals(name, 
            (((AbstractTable<?>) that).name))) {
            return super.equals(that);
        }

        return false;
    }

    return false;
}

First thing: Always (not only in a N.O.P.E. branch) abort every equals() method early, if:

  • this == argument
  • the argument is of a type incompatible with this

Note that the latter condition includes argument == null, if you’re using instanceof to check for compatible types. We’ve blogged about this before in 10 Subtle Best Practices when Coding Java.

Now, after aborting comparison early in obvious cases, you might also want to abort comparison early when you can make partial decisions. For instance, the contract of jOOQ’s Table.equals() is that for two tables to be considered equal, they must be of the same name, regardless of the concrete implementation type. For instance, there is no way these two items can be equal:

  • com.example.generated.Tables.MY_TABLE
  • DSL.tableByName("MY_OTHER_TABLE")

If the argument cannot be equal to this, and if we can check that easily, let’s do so and abort if the check fails. If the check succeeds, we can still proceed with the more expensive implementation from super. Given that most objects in the universe are not equal, we’re going to save a lot of CPU time by shortcutting this method.

some objects are more equal than others

In the case of jOOQ, most instances are really tables as generated by the jOOQ source code generator, whose equals() implementation is even further optimised. The dozens of other table types (derived tables, table-valued functions, array tables, joined tables, pivot tables, common table expressions, etc.) can keep their “simple” implementation.

10. Think in sets, not in individual elements

Last but not least, there is a thing that is not Java-related but applies to any language. Besides, we’re leaving the N.O.P.E. branch, as this advice might just help you move from O(N³) to O(n log n), or something like that.

Unfortunately, many programmers think in terms of simple, local algorithms. They’re solving a problem step by step, branch by branch, loop by loop, method by method. That’s the imperative and/or functional programming style. While it is increasingly easy to model the “bigger picture” when going from pure imperative to object oriented (still imperative) to functional programming, all these styles lack something that only SQL and R and similar languages have:

Declarative programming.

In SQL (and we love it, as this is the jOOQ blog) you can declare the outcome you want to get from your database, without making any algorithmic implications whatsoever. The database can then take all the meta data available into consideration (e.g. constraints, keys, indexes, etc.) to figure out the best possible algorithm.

In theory, this has been the main idea behind SQL and relational calculus from the beginning. In practice, SQL vendors have implemented highly efficient CBOs (Cost-Based Optimisers) only in the last decade, so stay with us in the 2010s, when SQL will finally unleash its full potential (it was about time!)

But you don’t have to do SQL to think in sets. Sets / collections / bags / lists are available in all languages and libraries. The main advantage of using sets is the fact that your algorithms will become much, much more concise. It is so much easier to write:

SomeSet INTERSECT SomeOtherSet

rather than:

// Pre-Java 8
Set result = new HashSet();
for (Object candidate : someSet)
    if (someOtherSet.contains(candidate))
        result.add(candidate);

// Even Java 8 doesn't really help
Set<Object> result2 = someSet
    .stream()
    .filter(someOtherSet::contains)
    .collect(Collectors.toSet());

Some may argue that functional programming and Java 8 will help you write easier, more concise algorithms. That’s not necessarily true. You can translate your imperative Java-7-loop into a functional Java-8 Stream collection, but you’re still writing the very same algorithm. Writing a SQL-esque expression is different. This…

SomeSet INTERSECT SomeOtherSet

… can be implemented in 1000 ways by the implementation engine. As we’ve learned today, perhaps it is wise to transform the two sets into EnumSet automatically, before running the INTERSECT operation. Perhaps we can parallelise this INTERSECT without making low-level calls to Stream.parallel().
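For instance, if someSet and someOtherSet happened to contain enum constants (say, the hypothetical ConfigKey from above), a sketch of such an optimised INTERSECT would be:

// EnumSet is backed by a long bit mask, so retainAll()
// boils down to a few bitwise ANDs
EnumSet<ConfigKey> intersection = EnumSet.copyOf(someSet);
intersection.retainAll(someOtherSet);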

Conclusion

In this article, we’ve talked about optimisations done on the N.O.P.E. branch, i.e. deep down in a high-complexity algorithm. In our case, being the jOOQ developers, we’re most interested in optimising our SQL generation:

  • Every query is generated only on a single StringBuilder
  • Our templating engine actually parses characters, instead of using regular expressions
  • We use arrays wherever we can, especially when iterating over listeners
  • We stay clear of JDBC methods that we don’t have to call
  • etc…

jOOQ is at the “bottom of the food chain”, because it’s the (second-)last API that is being called by our customers’ applications before the call leaves the JVM to enter the DBMS. Being at the bottom of the food chain means that every line of code that is executed in jOOQ might be called N x O x P times, so we must optimise eagerly.

Your business logic is not deep down in the N.O.P.E. branch. But your own, home-grown infrastructure logic may well be (custom SQL frameworks, custom libraries, etc.). Those should be reviewed according to the rules that we’ve seen today, using Java Mission Control or any other profiler, for instance.

Liked this article?

If you can’t go and profile your application right now, you might enjoy reading one of our related articles instead.

Top 5 Use-Cases For Nested Types


There has been an interesting discussion on reddit the other day: Static Inner Classes. When is it too much?

First, let’s review a little bit of basic historic Java knowledge. Java-the-language offers four levels of nesting classes, and by “Java-the-language”, I mean that these constructs are mere “syntax sugar”. They don’t exist in the JVM, which only knows ordinary classes.

(Static) Nested classes

class Outer {
    static class Inner {
    }
}

In this case, Inner is completely independent of Outer, except for a common, shared namespace.

Inner classes

class Outer {
    class Inner {
    }
}

In this case, Inner instances have an implicit reference to their enclosing Outer instance. In other words, there can be no Inner instance without an associated Outer instance.

The Java way of creating such an instance is this:

Outer.Inner yikes = new Outer().new Inner();

What looks totally awkward makes a lot of sense. Think about creating an Inner instance somewhere inside of Outer:

class Outer {
    class Inner {
    }

    void somewhereInside() {
        // We're already in the scope of Outer.
        // We don't have to qualify Inner explicitly.
        Inner aaahOK;

        // This is what we're used to writing.
        aaahOK = new Inner();

        // As all other locally scoped methods, we can
        // access the Inner constructor by 
        // dereferencing it from "this". We just
        // hardly ever write "this"
        aaahOK = this.new Inner();
    }
}

Note that much like the public or abstract keywords, the static keyword is implicit for nested interfaces. While the following hypothetical syntax might look familiar at first sight…:

class Outer {
    <non-static> interface Inner {
        default void doSomething() {
            Outer.this.doSomething();
        }
    }

    void doSomething() {}
}

… it is not possible to write the above. Apart from the lack of a <non-static> keyword, there doesn’t seem to be any obvious reason why “inner interfaces” shouldn’t be possible. I’d suspect the usual – there must be some really edge-casey caveat related to backwards-compatibility and/or multiple inheritance that prevents this.

Local classes

class Outer {
    void somewhereInside() {
        class Inner {
        }
    }
}

Local classes are probably one of the least-known features in Java, as there is hardly any use for them. Local classes are named types whose scope extends only to the enclosing method. Obvious use-cases are when you want to reuse such a type several times within that method, e.g. to construct several similar listeners in a JavaFX application.
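A quick sketch of such reuse, with a hypothetical locally scoped type that is instantiated twice:

class Outer {
    void attachListeners() {

        // Local class, visible only inside this method
        class LoggingRunnable implements Runnable {
            final String name;

            LoggingRunnable(String name) {
                this.name = name;
            }

            @Override
            public void run() {
                System.out.println("Running: " + name);
            }
        }

        Runnable r1 = new LoggingRunnable("first");
        Runnable r2 = new LoggingRunnable("second");
        r1.run();
        r2.run();
    }
}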

Anonymous classes

class Outer {
    Serializable dummy = new Serializable() {};
}

Anonymous classes are subtypes of another type with only one single instance.

Top 5 Use-Cases For Nested Classes

All of anonymous, local, and inner classes keep a reference to their enclosing instance, if they’re not defined in a static context. This may cause a lot of trouble if you let instances of these classes leak outside of their scope. Read more about that trouble in our article: Don’t be Clever: The Double Curly Braces Anti Pattern.

Often, however, you do want to profit from that enclosing instance. It can be quite useful to have some sort of “message” object that you can return without disclosing the actual implementation:

class Outer {

    // This implementation is private ...
    private class Inner implements Message {
        @Override
        public void getMessage() {
            Outer.this.someoneCalledMe();
        }
    }

    // ... but we can return it, being of
    // type Message
    Message hello() {
        return new Inner();
    }

    void someoneCalledMe() {}
}

With (static) nested classes, however, there is no enclosing scope as the Inner instance is completely independent of any Outer instance. So what’s the point of using such a nested class, rather than a top-level type?

1. Association with the outer type

If you want to communicate to the whole world, hey, this (inner) type is totally related to this (outer) type, and doesn’t make sense on its own, then you can nest the types. This has been done with Map and Map.Entry, for instance:

public interface Map<K, V> {
    interface Entry<K, V> {
    }
}

2. Hiding from the outside of the outer type

If package (default) visibility isn’t enough for your types, you can create private static classes that are available only to their enclosing type and to all other nested types of the enclosing type. This is really the main use-case for static nested classes.

class Outer {
    private static class Inner {
    }
}

class Outer2 {
    Outer.Inner nope; // Compilation error: Outer.Inner is not visible
}

3. Protected types

This is really a very rare use-case, but sometimes, within a class hierarchy, you need types that you want to make available only to subtypes of a given type. This is a use-case for protected static classes:

class Parent {
    protected static class OnlySubtypesCanSeeMe {
    }

    protected OnlySubtypesCanSeeMe someMethod() {
        return new OnlySubtypesCanSeeMe();
    }
}

class Child extends Parent {
    OnlySubtypesCanSeeMe wow = someMethod();
}

4. To emulate modules

Unlike Ceylon, Java doesn’t have first-class modules. With Maven or OSGi, it is possible to add some modular behaviour to Java’s build (Maven) or runtime (OSGi) environments, but if you want to express modules in code, this isn’t really possible.

However, you can establish modules by convention by using static nested classes. Let’s look at the java.util.stream package. We could consider it a module, and within this module, we have a couple of “sub-modules”, or groups of types, such as the internal java.util.stream.Nodes class, which roughly looks like this:

final class Nodes {
    private Nodes() {}
    private static abstract class AbstractConcNode {}
    static final class ConcNode {
        static final class OfInt {}
        static final class OfLong {}
    }
    private static final class FixedNodeBuilder {}
    // ...
}

Some of this Nodes stuff is available to all of the java.util.stream package, so we might say that the way this is written, we have something like:

  • a synthetic java.util.stream.nodes sub-package, visible only to the java.util.stream “module”
  • a couple of java.util.stream.nodes.* types, visible also only to the java.util.stream “module”
  • a couple of “top-level” functions (static methods) in the synthetic java.util.stream.nodes package

Looks a lot like Ceylon, to me!

5. Cosmetic reasons

The last bit is rather boring. Or some may find it interesting. It’s about taste, or ease of writing things. Some classes are just so small and unimportant, it’s just easier to write them inside of another class. Saves you a .java file. Why not.

Conclusion

In times of Java 8, thinking about the very old features of Java the language might not prove to be extremely exciting. Static nested classes are a well-understood tool for a couple of niche use-cases.

The takeaway from this article, however, is this: every time you nest a class, be sure to make it static if you don’t absolutely need a reference to the enclosing instance. You never know when that reference will blow up your application in production.

You Will Regret Applying Overloading with Lambdas!


Writing good APIs is hard. Extremely hard. You have to think of an incredible amount of things if you want your users to love your API. You have to find the right balance between:

  1. Usefulness
  2. Usability
  3. Backward compatibility
  4. Forward compatibility

We’ve blogged about this topic before, in our article: How to Design a Good, Regular API. Today, we’re going to look into how…

Java 8 changes the rules

Yes!

Overloading is a nice tool to provide convenience in two dimensions:

  • By providing argument type alternatives
  • By providing argument default values

Examples for the above from the JDK include:

public class Arrays {

    // Argument type alternatives
    public static void sort(int[] a) { ... }
    public static void sort(long[] a) { ... }

    // Argument default values
    public static IntStream stream(int[] array) { ... }
    public static IntStream stream(int[] array, 
        int startInclusive, 
        int endExclusive) { ... }
}

The jOOQ API is obviously full of such convenience. As jOOQ is a DSL for SQL, we might even abuse this a little bit:

public interface DSLContext {
    <T1> SelectSelectStep<Record1<T1>> 
        select(SelectField<T1> field1);

    <T1, T2> SelectSelectStep<Record2<T1, T2>> 
        select(SelectField<T1> field1, 
               SelectField<T2> field2);

    <T1, T2, T3> SelectSelectStep<Record3<T1, T2, T3>>
        select(SelectField<T1> field1, 
               SelectField<T2> field2, 
               SelectField<T3> field3);

    <T1, T2, T3, T4> SelectSelectStep<Record4<T1, T2, T3, T4>> 
        select(SelectField<T1> field1, 
               SelectField<T2> field2, 
               SelectField<T3> field3, 
               SelectField<T4> field4);

    // and so on...
}

Languages like Ceylon take this idea of convenience one step further by claiming that the above is the only reasonable use of overloading in Java. And thus, the creators of Ceylon have completely removed overloading from their language, replacing the above with union types and actual default values for arguments. E.g.

// Union types
void sort(int[]|long[] a) { ... }

// Default argument values
IntStream stream(int[] array,
    int startInclusive = 0,
    int endExclusive = array.length) { ... }

Read “Top 10 Ceylon Language Features I Wish We Had In Java” for more information about Ceylon.

In Java, unfortunately, we cannot use union types or argument default values. So we have to use overloading to provide our API consumers with convenience methods.
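The standard Java workaround for default argument values is plain delegation between overloads, as this sketch modelled after Arrays.stream() shows:

// The "short" overload supplies the defaults and delegates
public static IntStream stream(int[] array) {
    return stream(array, 0, array.length);
}

public static IntStream stream(int[] array,
    int startInclusive,
    int endExclusive) { ... }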

If your method argument is a functional interface, however, things changed drastically between Java 7 and Java 8, with respect to method overloading. An example is given here from JavaFX.

JavaFX’s “unfriendly” ObservableList

JavaFX enhances the JDK collection types by making them “observable”. Not to be confused with Observable, a dinosaur type from the JDK 1.0 and from pre-Swing days.

JavaFX’s own Observable essentially looks like this:

public interface Observable {
  void addListener(InvalidationListener listener);
  void removeListener(InvalidationListener listener);
}

And luckily, this InvalidationListener is a functional interface:

@FunctionalInterface
public interface InvalidationListener {
  void invalidated(Observable observable);
}

This is great, because we can do things like:

Observable awesome = 
    FXCollections.observableArrayList();
awesome.addListener(fantastic -> splendid.cheer());

(notice how I’ve replaced foo/bar/baz with more cheerful terms. We should all do that. Foo and bar are so 1970)

Unfortunately, things get more hairy when we do what we would probably do, instead. I.e. instead of declaring an Observable, we’d like that to be a much more useful ObservableList:

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener(fantastic -> splendid.cheer());

But now, we get a compilation error on the second line:

awesome.addListener(fantastic -> splendid.cheer());
//      ^^^^^^^^^^^ 
// The method addListener(ListChangeListener<? super String>) 
// is ambiguous for the type ObservableList<String>

Because, essentially…

public interface ObservableList<E> 
extends List<E>, Observable {
    void addListener(ListChangeListener<? super E> listener);
}

and…

@FunctionalInterface
public interface ListChangeListener<E> {
    void onChanged(Change<? extends E> c);
}

Now again, before Java 8, the two listener types were completely unambiguously distinguishable, and they still are. You can easily call them by passing a named type. Our original code would still work if we wrote:

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
InvalidationListener hearYe = 
    fantastic -> splendid.cheer();
awesome.addListener(hearYe);

Or…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener((InvalidationListener) 
    fantastic -> splendid.cheer());

Or even…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();
awesome.addListener((Observable fantastic) -> 
    splendid.cheer());

All of these measures will remove ambiguity. But frankly, lambdas are only half as cool if you have to explicitly type the lambda, or the argument types. We have modern IDEs that can perform autocompletion and help infer types just as much as the compiler itself.

Imagine if we really wanted to call the other addListener() method, the one that takes a ListChangeListener. We’d have to write any of

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// Agh. Remember that we have to repeat "String" here
ListChangeListener<String> hearYe = 
    fantastic -> splendid.cheer();
awesome.addListener(hearYe);

Or…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// Agh. Remember that we have to repeat "String" here
awesome.addListener((ListChangeListener<String>) 
    fantastic -> splendid.cheer());

Or even…

ObservableList<String> awesome = 
    FXCollections.observableArrayList();

// WTF... "extends" String?? But that's what this thing needs...
awesome.addListener((Change<? extends String> fantastic) -> 
    splendid.cheer());

Overload you shan’t. Be wary you must.

API design is hard. It was hard before, and it has gotten even harder now. With Java 8, if any of your API methods’ arguments are a functional interface, think twice about overloading that API method. And once you’ve decided to proceed with overloading, think again, a third time, whether this is really a good idea.

Not convinced? Have a close look at the JDK. For instance the java.util.stream.Stream type. How many overloaded methods do you see that have the same number of functional interface arguments, which again take the same number of method arguments (as in our previous addListener() example)?

Zero.

There are overloads whose argument numbers differ. For instance:

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);

<R, A> R collect(Collector<? super T, A, R> collector);

You will never have any ambiguity when calling collect().
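For example, both of the following calls resolve without any casts or explicit lambda parameter types, simply because their argument counts differ:

// Three arguments: supplier, accumulator, combiner
ArrayList<String> a = Stream.of("x", "y")
    .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);

// One argument: a Collector
List<String> b = Stream.of("x", "y")
    .collect(Collectors.toList());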

But when the argument numbers do not differ, and neither do the arguments’ own method argument numbers, the method names are different. For instance:

<R> Stream<R> map(Function<? super T, ? extends R> mapper);
IntStream mapToInt(ToIntFunction<? super T> mapper);
LongStream mapToLong(ToLongFunction<? super T> mapper);
DoubleStream mapToDouble(ToDoubleFunction<? super T> mapper);

Now, this is super annoying at the call site, because you have to think in advance about which method you have to use, based on a variety of involved types.

But it’s really the only solution to this dilemma. So, remember:

You Will Regret Applying Overloading with Lambdas!

Did you like this article? You might also like:

How to Translate SQL GROUP BY and Aggregations to Java 8


I couldn’t resist. I have read this question by Hugo Prudente on Stack Overflow. And I knew there had to be a better way than what the JDK has to offer.

The question reads:

I’m looking for a lambda to refine the data already retrieved. I have a raw resultset, if the user do not change the date I want use java’s lambda to group by the results for then. And I’m new to lambdas with java.

The lambda I’m looking for works simliar to this query.

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

SQL is declarative. Functional programming is not.

Before we go on with this discussion, let’s establish a very important fact. SQL is a completely declarative language. Functional (or “functional-ish”, to keep the Haskell-aficionados at peace) programming languages like Java 8 are not declarative. While expressing data transformation algorithms using functions is much more concise than expressing them using objects, or worse, using imperative instructions, you’re still explicitly expressing the algorithm.

When you write SQL, you don’t write any algorithm. You just describe the result you want to have. The SQL engine’s optimiser will figure out the algorithm for you – e.g. based on the fact that you may have an index on Z but not on W or on (Z, W).

While simple examples like these can easily be implemented using Java 8, you will quickly run into Java’s limitations, once you need to do more complex reporting.

Of course, as we’ve blogged before, the optimum is reached when you combine SQL and functional programming.

How can this be written in Java 8?

There are a variety of ways to do it. The essence is to understand all the participants in such a transformation. And no matter if you find this easy or hard, suitable for Java 8 or inadequate, thinking about the different, lesser-known parts of the new Stream API is certainly worth the exercise.

The main participants here are:

  • Stream: If you’re using JDK 8 libraries, then the new java.util.stream.Stream type will be your first choice.
  • Collector: The JDK provides us with a rather low-level and thus very powerful new API for data aggregation (also known as “reduction”). This API is summarised by the new java.util.stream.Collector type, a new type of which we have heard only little so far in the blogosphere.

Disclaimer

Some of the code displayed here might not work in your favourite IDE. Unfortunately, even as Java 7 reaches its end of life, all major IDEs (Eclipse, IntelliJ, NetBeans), and even the javac compiler, still have quite a few bugs related to the combination of generic type inference and lambda expressions. Stay tuned until those bugs are fixed! And report any bug you discover. We’ll all thank you for it!

Let’s go!

Let’s review our SQL statement:

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

In terms of the Stream API, the table itself is the Stream. Let’s just assume that we have a “table type” A as such:

class A {
    final int w;
    final int x;
    final int y;
    final int z;

    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public String toString() {
        return "A{" +
                "w=" + w +
                ", x=" + x +
                ", y=" + y +
                ", z=" + z +
                '}';
    }
}

You can also add equals() and hashCode() if you must.

We can now easily compose the Stream using Stream.of(), and some sample data:

Stream<A> stream =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5));

Now, the next step is to GROUP BY z, w. The Stream API itself, unfortunately, doesn’t contain such a convenience method. We have to resort to more low-level operations by specifying the more general Stream.collect() operation, and passing a Collector to it that does the grouping. Luckily, a variety of different grouping Collectors are already made available from the Collectors helper class.

So we add that to our stream

Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(...));

Now the interesting part starts. How do we specify that we want to group by both A.z and A.w? We need to provide this groupingBy method with a function that can extract something like a SQL tuple from the A type. We could write our own tuple, or we simply use that of jOOλ, a library that we have created and open-sourced to improve our jOOQ integration tests.

The Tuple2 type roughly looks like this:

public class Tuple2<T1, T2> {

    public final T1 v1;
    public final T2 v2;

    public T1 v1() {
        return v1;
    }

    public T2 v2() {
        return v2;
    }

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }
}

public interface Tuple {
    static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
        return new Tuple2<>(v1, v2);
    }
}

It has many more useful features, but these ones will be sufficient for this article.

On a side-note

Why the JDK doesn’t ship with built-in tuples like C#’s or Scala’s escapes me.

Functional programming without tuples is like coffee without sugar: A bitter punch in your face.

Anyway… back on track

So we’re grouping by the (A.z, A.w) tuple, as we would in SQL

Map<Tuple2<Integer, Integer>, List<A>> map =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
    a -> tuple(a.z, a.w)
));

As you can see, this produces a verbose but very descriptive type, a map containing our grouping tuple as its key, and a list of collected table records as its value.

Running the following statement

map.entrySet().forEach(System.out::println);

will yield:

(1, 1)=[A{w=1, x=1, y=1, z=1}, A{w=1, x=2, y=3, z=1}]
(4, 9)=[A{w=9, x=8, y=6, z=4}, A{w=9, x=9, y=7, z=4}]
(5, 2)=[A{w=2, x=3, y=4, z=5}, A{w=2, x=4, y=4, z=5}, A{w=2, x=5, y=5, z=5}]

That’s already quite awesome! In fact, this behaves like the SQL:2011 standard COLLECT() aggregate function, which is also available in Oracle 10g+.

Now, instead of actually collecting the A records, we prefer to aggregate the individual values of x and y. The JDK provides us with a couple of interesting new types, e.g. the java.util.IntSummaryStatistics, which is available for convenience again from the Collectors type via Collectors.summarizingInt().
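In isolation, that collector works as follows (a quick sketch):

IntSummaryStatistics stats = Stream
    .of(1, 2, 3)
    .collect(Collectors.summarizingInt(i -> i));

// stats.getCount() == 3, stats.getSum() == 6,
// stats.getMin() == 1, stats.getMax() == 3,
// stats.getAverage() == 2.0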

On a side note

For my taste, this sledge-hammer data aggregation technique is a bit quirky. The JDK libraries have been left intentionally low level and verbose, perhaps to keep the library footprint small, or to prevent “horrible” consequences when in 5-10 years (after the release of JDK 9 and 10), it becomes obvious that some features may have been added prematurely.

At the same time, there is this all-or-nothing IntSummaryStatistics, that blindly aggregates these popular aggregation values for your collection:

  • COUNT(*)
  • SUM()
  • MIN()
  • MAX()

and obviously, once you have SUM() and COUNT(*), you also have AVG() = SUM() / COUNT(*). So that’s going to be the Java way. IntSummaryStatistics.

In case you were wondering, the SQL:2011 standard specifies these aggregate functions:

AVG, MAX, MIN, SUM, EVERY, ANY, SOME, COUNT, STDDEV_POP, STDDEV_SAMP, VAR_SAMP, VAR_POP, COLLECT, FUSION, INTERSECTION, COVAR_POP, COVAR_SAMP, CORR, REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SYY, REGR_SXY, PERCENTILE_CONT, PERCENTILE_DISC, ARRAY_AGG

And obviously, there are many other, vendor-specific aggregate and window functions in SQL. We’ve blogged about them all before.

True, MIN, MAX, SUM, COUNT, AVG are certainly the most popular ones. But it would’ve been nicer if they hadn’t been included in these default aggregation types, but made available in a much more composable way.

Anyway… back on track

If you want to stay low-level and use mostly JDK API, you can use the following technique to implement aggregation over two columns:

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map = Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
    a -> tuple(a.z, a.w),
    Collector.of(

        // When collecting, we'll aggregate data
        // into two IntSummaryStatistics for x and y
        () -> tuple(new IntSummaryStatistics(), 
                    new IntSummaryStatistics()),

        // The accumulator will simply take
        // new t = (x, y) values
        (r, t) -> {
            r.v1.accept(t.x);
            r.v2.accept(t.y);
        },

        // The combiner will merge two partial
        // aggregations, in case this is executed
        // in parallel
        (r1, r2) -> {
            r1.v1.combine(r2.v1);
            r1.v2.combine(r2.v2);

            return r1;
        }
    )
));

map.entrySet().forEach(System.out::println);

The above would now print:

(1, 1)=(IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2}, 
        IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3})
(4, 9)=(IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9}, 
        IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7})
(5, 2)=(IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5}, 
        IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})

But obviously, no one will want to write that much code. The same thing can be achieved with jOOλ with much less code:

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map =

// Seq is like a Stream, but sequential only,
// and with more features
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))

// Seq.groupBy() is just short for 
// Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),

    // ... because once you have tuples, 
    // why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

What you see above is probably as close as it gets to the original, very simple SQL statement:

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM table 
GROUP BY z, w;

The interesting part here is the fact that we have what we call “tuple-collectors”, a Collector that collects data into tuples of aggregated results for any degree of the tuple (up to 8). Here’s the code for Tuple.collectors:

// All of these generics... sheesh!
static <T, A1, A2, D1, D2> 
       Collector<T, Tuple2<A1, A2>, Tuple2<D1, D2>> 
collectors(
    Collector<T, A1, D1> collector1
  , Collector<T, A2, D2> collector2
) {
    return Collector.of(
        () -> tuple(
            collector1.supplier().get()
          , collector2.supplier().get()
        ),
        (a, t) -> {
            collector1.accumulator().accept(a.v1, t);
            collector2.accumulator().accept(a.v2, t);
        },
        (a1, a2) -> tuple(
            collector1.combiner().apply(a1.v1, a2.v1)
          , collector2.combiner().apply(a1.v2, a2.v2)
        ),
        a -> tuple(
            collector1.finisher().apply(a.v1)
          , collector2.finisher().apply(a.v2)
        )
    );
}

Where the Tuple2<D1, D2> is the aggregation result type that we derive from collector1 (which provides D1) and from collector2 (which provides D2).
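
Such a tuple-collector can of course also be used on its own, outside of groupBy(). A small usage sketch:

// Count the elements and summarise them at the same time
Tuple2<Long, IntSummaryStatistics> result =
Stream.of(1, 2, 3, 4)
      .collect(Tuple.collectors(
          Collectors.counting(),
          Collectors.summarizingInt(i -> i)
      ));

// (4, IntSummaryStatistics{count=4, sum=10, min=1, average=2.500000, max=4})
System.out.println(result);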

That’s it. We’re done!

Conclusion

Java 8 is a first step towards functional programming in Java. Using Streams and lambda expressions, we can already achieve quite a bit. However, the JDK APIs are extremely low-level, and the experience when using IDEs like Eclipse, IntelliJ, or NetBeans can still be a bit frustrating. While writing this article (and adding the Tuple.collectors() method), I reported around 10 bugs to the different IDEs. Some javac compiler bugs were not yet fixed prior to JDK 1.8.0_40 ea. In other words:

I just keep throwing generic type parameters at the darn thing until the compiler stops bitching at me

But we’re on a good path. I trust that more useful API will ship with JDK 9 and especially with JDK 10, when all of the above will hopefully profit from the new value types and generic type specialization.

And, of course, if you haven’t already, download and contribute to jOOλ here!

We have created jOOλ to add the missing pieces to the JDK libraries. If you want to go all in on functional programming, i.e. when your vocabulary includes hipster terms (couldn’t resist) like monads, monoids, functors, and all that, we suggest you skip the JDK’s Streams and jOOλ entirely, and go download functionaljava by Mark Perry or javaslang by Daniel Dietrich.

Using Java 8 to Prevent Excessively Wide Logs


Some logs are there to be consumed by machines and kept forever.

Other logs are there just for debugging, to be consumed by humans. In the latter case, you often want to make sure that you don’t produce too many logs, and especially no excessively wide logs, as many editors and other tools have problems once line lengths exceed a certain size (e.g. this Eclipse bug).

String manipulation used to be a major pain in Java, with lots of tedious-to-write loops and branches, etc. No longer with Java 8!

The following truncate method will truncate all lines within a string to a certain length:

public String truncate(String string) {
    return truncate(string, 80);
}

public String truncate(String string, int length) {
    return Seq.of(string.split("\n"))
              .map(s -> StringUtils.abbreviate(s, length))
              .join("\n");
}

The above example uses jOOλ 0.9.4 and Apache Commons Lang, but you can achieve the same using vanilla Java 8, of course:

public String truncate(String string) {
    return truncate(string, 80);
}

public String truncate(String string, int length) {
    return Stream.of(string.split("\n"))
                 .map(s -> s.substring(0, Math.min(s.length(), length)))
                 .collect(Collectors.joining("\n"));
}

When truncating lines to a length of 20, the Commons Lang version above will produce:

Input

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Output

Lorem ipsum dolor...
incididunt ut lab...
nostrud exercitat...
Duis aute irure d...
fugiat nulla pari...
culpa qui officia...

Happy logging!

Infinite Loops. Or: Anything that Can Possibly Go Wrong, Does.


A wise man once said:

Anything that can possibly go wrong, does

Murphy

Some programmers are wise men, thus a wise programmer once said:

A good programmer is someone who looks both ways before crossing a one-way street.

Doug Linder

In a perfect world, things work as expected and you may think that it is a good idea to keep consuming things until the end. So the following pattern is found all over every code base:

Java

for (;;) {
    // something
}

C

while (1) {
    // something
}

BASIC

10 something
20 GOTO 10

Want to see proof? Search github for while(true) and check out the number of matches:

https://github.com/search?q=while+true&type=Code

Never use possibly infinite loops

There is a very interesting discussion in computer science around the topic of the “Halting Problem”. The essence of the halting problem, as proved by Alan Turing a long time ago, is the fact that it is generally undecidable whether a given program will ever halt. While humans can quickly assess that the following program will never stop:

for (;;) continue;

… and that the following program will always stop:

for (;;) break;

… computers cannot decide on such things, and even very experienced humans might not immediately be able to do so when looking at a more complex algorithm.
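
The classic illustration is the Collatz iteration. The loop body is trivial, and yet, whether the following program terminates for every positive n is an open mathematical problem to this day:

// Nobody has proved whether this halts for all n > 0
long n = 27;

while (n != 1)
    n = (n % 2 == 0) ? n / 2 : 3 * n + 1;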

Learning by doing

In jOOQ, we have recently learned about the halting problem the hard way: by doing.

Before fixing issue #3696, we worked around a bug (or flaw) in SQL Server’s JDBC driver. The bug resulted in SQLException chains not being reported correctly, e.g. when the following trigger raises several errors:

CREATE TRIGGER Employee_Upd_2  ON  EMPLOYEE FOR UPDATE
AS 
BEGIN

    Raiserror('Employee_Upd_2 Trigger called...',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...1',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...2',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...3',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...4',16,-1)
    Raiserror('Employee_Upd_2 Trigger called...5',16,-1)

END
GO

So, we explicitly consumed those SQLExceptions, such that jOOQ users got the same behaviour for all databases:

consumeLoop: for (;;)
    try {
        if (!stmt.getMoreResults() && 
             stmt.getUpdateCount() == -1)
            break consumeLoop;
    }
    catch (SQLException e) {
        previous.setNextException(e);
        previous = e;
    }

This has worked for most of our customers, as the chain of exceptions thus reported is probably finite, and also probably rather small. Even the trigger example above is not a real-world one, so the number of actual errors reported might be between 1 and 5.

Did I just say… “probably”?

As our initial wise men said: the number might be between 1 and 5. But it might just as well be 1000. Or 1000000. Or, worse, infinite, as in the case of issue #3696, when a customer used jOOQ with SQL Azure. In a perfect world, there cannot be an infinite number of SQLExceptions reported, but this isn’t a perfect world, and SQL Azure also had a bug (and probably still does) that reported the same error again and again. This eventually led to an OutOfMemoryError, as jOOQ created a huge SQLException chain. That is probably still better than looping infinitely: at least the exception was easy to detect and work around. If the loop had run infinitely, the server might have been completely blocked for all users of our customer.

The fix is now essentially this one:

consumeLoop: for (int i = 0; i < 256; i++)
    try {
        if (!stmt.getMoreResults() && 
             stmt.getUpdateCount() == -1)
            break consumeLoop;
    }
    catch (SQLException e) {
        previous.setNextException(e);
        previous = e;
    }

True to the popular saying:

640 KB ought to be enough for anybody

The only exception

So, as we’ve seen before, this embarrassing example shows that anything that can possibly go wrong, does. In the context of possibly infinite loops, beware that this kind of bug can take entire servers down.

The Jet Propulsion Laboratory at the California Institute of Technology has made this an essential rule for their coding standards:

Rule 3 (loop bounds)

All loops shall have a statically determinable upper-bound on the maximum number of loop iterations. It shall be possible for a static compliance checking tool to affirm the existence of the bound. An exception is allowed for the use of a single non-terminating loop per task or thread where requests are received and processed. Such a server loop shall be annotated with the C comment: /* @non-terminating@ */.

So, apart from very few exceptions, you should never expose your code to the risk of infinite loops by not providing upper bounds to loop iterations (the same can be said about recursion, by the way).
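
In Java terms, the single permitted server loop from the JPL rule might look like the following sketch, where queue and process() are hypothetical placeholders:

/* @non-terminating@ */
for (;;) {
    Request request = queue.take(); // block until a request arrives
    process(request);               // handle it, then wait for the next one
}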

Conclusion

Go over your code base today and look for any possible while (true), for (;;), do {} while (true); and other statements. Review those statements closely and see if they can halt – e.g. using break, or throw, or return, or continue (an outer loop).

Chances are, that you or someone before you who wrote that code was as naive as we were, believing that…

… oh come on, this will never happen

Because, you know what happens when you think that nothing will happen.

Leaky Abstractions, or How to Bind Oracle DATE Correctly with Hibernate


We’ve recently published an article about how to bind the Oracle DATE type correctly in SQL / JDBC, and jOOQ. This article got a bit of traction on reddit with an interesting remark by Vlad Mihalcea, who frequently blogs about Hibernate, JPA, transaction management and connection pooling on his blog. Vlad pointed out that this problem can also be solved with Hibernate, and we’re going to look into this shortly.

What is the problem with Oracle DATE?

The problem presented in the previous article deals with the fact that if a query filters on Oracle DATE columns:

// rental_date is of type DATE and there's an index
PreparedStatement stmt = connection.prepareStatement(
    "SELECT * " + 
    "FROM rentals " +
    "WHERE rental_date > ? AND rental_date < ?");

… and we’re using java.sql.Timestamp for our bind values:

stmt.setTimestamp(1, start);
stmt.setTimestamp(2, end);

… then the execution plan will turn very bad with a FULL TABLE SCAN or perhaps an INDEX FULL SCAN, even though we should have gotten a regular INDEX RANGE SCAN.

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|*  2 |   TABLE ACCESS FULL| RENTAL |
-------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   2 - filter((INTERNAL_FUNCTION("RENTAL_DATE")>=:1 AND 
              INTERNAL_FUNCTION("RENTAL_DATE")<=:2))

This is because the database column is widened from Oracle DATE to Oracle TIMESTAMP via this INTERNAL_FUNCTION(), rather than truncating the java.sql.Timestamp value to Oracle DATE.

More details about the problem itself can be seen in the previous article.
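
As a quick recap, one fix shown there was to make the truncation explicit in plain JDBC by casting the bind variables to DATE:

// Same query as above, but with explicit casts, so Oracle truncates
// the java.sql.Timestamp bind values instead of widening the column
PreparedStatement stmt = connection.prepareStatement(
    "SELECT * " +
    "FROM rentals " +
    "WHERE rental_date > CAST(? AS DATE) " +
    "AND rental_date < CAST(? AS DATE)");

stmt.setTimestamp(1, start);
stmt.setTimestamp(2, end);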

Preventing this INTERNAL_FUNCTION() with Hibernate

You can fix this with Hibernate’s proprietary API, using a org.hibernate.usertype.UserType.

Assuming that we have the following entity:

@Entity
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    public Timestamp rentalDate;
}

And now, let’s run this query here (I’m using Hibernate API, not JPA, for the example):

List<Rental> rentals =
session.createQuery("from Rental r where r.rentalDate between :from and :to")
       .setParameter("from", Timestamp.valueOf("2000-01-01 00:00:00.0"))
       .setParameter("to", Timestamp.valueOf("2000-10-01 00:00:00.0"))
       .list();

The execution plan that we’re now getting is again inefficient:

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|*  2 |   TABLE ACCESS FULL| RENTAL |
-------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   2 - filter((INTERNAL_FUNCTION("RENTAL0_"."RENTAL_DATE")>=:1 AND 
              INTERNAL_FUNCTION("RENTAL0_"."RENTAL_DATE")<=:2))

The solution is to add this @Type annotation to all relevant columns…

@Entity
@TypeDefs(
    value = @TypeDef(
        name = "oracle_date", 
        typeClass = OracleDate.class
    )
)
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    @Type(type = "oracle_date")
    public Timestamp rentalDate;
}

and register the following, simplified UserType:

import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.util.Objects;

import oracle.sql.DATE;

import org.hibernate.engine.spi.SessionImplementor;
import org.hibernate.usertype.UserType;

public class OracleDate implements UserType {

    @Override
    public int[] sqlTypes() {
        return new int[] { Types.TIMESTAMP };
    }

    @Override
    public Class<?> returnedClass() {
        return Timestamp.class;
    }

    @Override
    public Object nullSafeGet(
        ResultSet rs, 
        String[] names, 
        SessionImplementor session, 
        Object owner
    )
    throws SQLException {
        return rs.getTimestamp(names[0]);
    }

    @Override
    public void nullSafeSet(
        PreparedStatement st, 
        Object value, 
        int index, 
        SessionImplementor session
    )
    throws SQLException {
        // The magic is here: oracle.sql.DATE!
        st.setObject(index, new DATE(value));
    }

    // The other method implementations are omitted
}

This will work because using the vendor-specific oracle.sql.DATE type will have the same effect on your execution plan as explicitly casting the bind variable in your SQL statement, as shown in the previous article: CAST(? AS DATE). The execution plan is now the desired one:

------------------------------------------------------
| Id  | Operation                    | Name          |
------------------------------------------------------
|   0 | SELECT STATEMENT             |               |
|*  1 |  FILTER                      |               |
|   2 |   TABLE ACCESS BY INDEX ROWID| RENTAL        |
|*  3 |    INDEX RANGE SCAN          | IDX_RENTAL_UQ |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   3 - access("RENTAL0_"."RENTAL_DATE">=:1 
          AND "RENTAL0_"."RENTAL_DATE"<=:2)

If you want to reproduce this issue, just query any Oracle DATE column with a java.sql.Timestamp bind value through JPA / Hibernate, and get the execution plan as indicated here.

Don’t forget to flush shared pools and buffer caches to enforce the calculation of new plans between executions, because the generated SQL is the same each time.
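
For instance, here’s a sketch of the flushing part, assuming a connection with sufficient privileges on a test database (never do this in production):

// Flush Oracle's caches, so the next execution calculates a fresh plan
try (Statement s = connection.createStatement()) {
    s.execute("ALTER SYSTEM FLUSH SHARED_POOL");
    s.execute("ALTER SYSTEM FLUSH BUFFER_CACHE");
}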

Can I do it with JPA 2.1?

At first sight, it looks like the new converter feature in JPA 2.1 (which works just like jOOQ’s converter feature) should be able to do the trick. We should be able to write:

import java.sql.Timestamp;

import javax.persistence.AttributeConverter;
import javax.persistence.Converter;

import oracle.sql.DATE;

@Converter
public class OracleDateConverter 
implements AttributeConverter<Timestamp, DATE>{

    @Override
    public DATE convertToDatabaseColumn(Timestamp attribute) {
        return attribute == null ? null : new DATE(attribute);
    }

    @Override
    public Timestamp convertToEntityAttribute(DATE dbData) {
        return dbData == null ? null : dbData.timestampValue();
    }
}

This converter can then be used with our entity:

import java.sql.Timestamp;

import javax.persistence.Column;
import javax.persistence.Convert;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    @Convert(converter = OracleDateConverter.class)
    public Timestamp rentalDate;
}

But unfortunately, this doesn’t work out of the box as Hibernate 4.3.7 will think that you’re about to bind a variable of type VARBINARY:

// From org.hibernate.type.descriptor.sql.SqlTypeDescriptorRegistry

    public <X> ValueBinder<X> getBinder(JavaTypeDescriptor<X> javaTypeDescriptor) {
        if ( Serializable.class.isAssignableFrom( javaTypeDescriptor.getJavaTypeClass() ) ) {
            return VarbinaryTypeDescriptor.INSTANCE.getBinder( javaTypeDescriptor );
        }

        return new BasicBinder<X>( javaTypeDescriptor, this ) {
            @Override
            protected void doBind(PreparedStatement st, X value, int index, WrapperOptions options)
                    throws SQLException {
                st.setObject( index, value, jdbcTypeCode );
            }
        };
    }

Of course, we can probably somehow tweak this SqlTypeDescriptorRegistry to create our own “binder”, but then we’re back to Hibernate-specific API. This particular implementation is probably a “bug” on the Hibernate side, which has been registered here, for the record:

https://hibernate.atlassian.net/browse/HHH-9553

Conclusion

Abstractions are leaky on all levels, even if they are deemed a “standard” by the JCP. Standards are often a means of justifying an industry de facto standard in hindsight (with some politics involved, of course). Let’s not forget that Hibernate didn’t start as a standard, and that it massively revolutionised the way the standard-ish J2EE folks tended to think about persistence, 14 years ago.

In this case we have:

  • Oracle SQL, the actual implementation
  • The SQL standard, which specifies DATE quite differently from Oracle
  • ojdbc, which extends JDBC to allow for accessing Oracle features
  • JDBC, which follows the SQL standard with respect to temporal types
  • Hibernate, which offers proprietary API in order to access Oracle SQL and ojdbc features when binding variables
  • JPA, which again follows the SQL standard and JDBC with respect to temporal types
  • Your entity model

As you can see, the actual implementation (Oracle SQL) leaked up right into your own entity model, either via Hibernate’s UserType, or via JPA’s Converter. From then on, it will hopefully be shielded off from your application (until it won’t), allowing you to forget about this nasty little Oracle SQL detail.

Any way you turn it, if you want to solve real customer problems (i.e. the significant performance issue at hand), then you will need to resort to vendor-specific API from Oracle SQL, ojdbc, and Hibernate – instead of pretending that the SQL, JDBC, and JPA standards are the bottom line.

But that’s probably alright. For most projects, the resulting implementation lock-in is totally acceptable.

Really Too Bad that Java 8 Doesn’t Have Iterable.stream()


This is one of the more interesting recent Stack Overflow questions:

Why does Iterable not provide stream() and parallelStream() methods?

At first, it might seem intuitive to make it straightforward to convert an Iterable into a Stream, because the two are really more or less the same thing for 90% of all use-cases.

Granted, the expert group had a strong focus on making the Stream API parallel-capable, but anyone who works with Java every day will notice immediately that Stream is most useful in its sequential form. And an Iterable is just that: a sequential stream with no guarantees with respect to parallelisation. So, it would only be intuitive if we could simply write:

iterable.stream();

In fact, subtypes of Iterable do have such methods, e.g.

collection.stream();

Brian Goetz himself gave an answer to the above Stack Overflow question. The reasons for this omission are rooted in the fact that some Iterables might prefer to return an IntStream instead of a Stream. This really seems to be a very remote reason for a design decision, but as always, omission today doesn’t mean omission forever. On the other hand, if they had introduced Iterable.stream() today, and it turned out to be a mistake, they couldn’t have removed it again.

Well, primitive types in Java are a pain. They did all sorts of bad things to generics in the first place, and now to Stream as well, as we have to write the following in order to turn an Iterable into a Stream:

Stream s = StreamSupport.stream(iterable.spliterator(), false);

Brian Goetz argues that this is “easy”, but I would disagree. As an API consumer, I experience a lot of friction in productivity because of:

  • Having to remember this otherwise useless StreamSupport type. This method could very well have been put into the Stream interface, because we already have Stream construction methods in there, such as Stream.of().
  • Having to remember the subtle difference between Iterator and Spliterator in the context of what I believe has nothing to do with parallelisation. It may well be that Spliterators will become popular eventually, though, so this doubt is for the magic 8 ball to address.
  • In fact, I have to repeat the information that there is nothing to be parallelised, via the boolean argument false.

Parallelisation really has such a big weight in this new API, even if it will cover only around 5%-10% of all functional collection manipulation operations. While sequential processing was not the main design goal of the JDK 8 APIs, it is really the main benefit for all of us, and the friction around APIs related to sequential processing should be as low as possible.

The method above should have just been called:

Stream s = Stream.stream(iterable);

It could be implemented like this:

public static<T> Stream<T> stream(Iterable<T> i) {
    return StreamSupport.stream(i.spliterator(), false);
}

Obviously with convenience overloads that allow for the additional specialisations, like parallelisation, or passing a Spliterator.
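
Such overloads could have looked like this sketch:

// Hypothetical convenience overloads for the suggested Stream.stream()
public static <T> Stream<T> stream(Iterable<T> i, boolean parallel) {
    return StreamSupport.stream(i.spliterator(), parallel);
}

public static <T> Stream<T> stream(Spliterator<T> s) {
    return StreamSupport.stream(s, false);
}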

But again, if Iterable had its own stream() default method, an incredible number of APIs would be so much better integrated with Java 8 out of the box, without even supporting Java 8 explicitly!
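
Such a default method would be a one-liner, mirroring what Collection.stream() already does. Obviously not actual JDK code, just a sketch:

public interface Iterable<T> {

    // ... the existing iterator() and forEach() methods ...

    default Stream<T> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
}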

Take jOOQ for instance. jOOQ still supports Java 6, so a direct dependency is not possible. However, jOOQ’s ResultQuery type is an Iterable. This allows you to use such queries directly inline in foreach loops, as if you were writing PL/SQL:

PL/SQL

FOR book IN (
  SELECT * FROM books ORDER BY books.title
)
LOOP
  -- Do things with book
END LOOP;

Java

for (BookRecord book : 
  ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
) {
  // Do things with book
}

Now imagine the same thing in Java 8:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .stream()
   .map / reduce / findAny, etc...

Unfortunately, the above is currently not possible. You could, of course, eagerly fetch all the results into a jOOQ Result, which extends List:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .fetch()
   .stream()
   .map / reduce / findAny, etc...

But it’s one more method to call (every time), and the actual stream semantics are broken, because the fetch is done eagerly.
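
In the meantime, the StreamSupport workaround from above can be applied to any jOOQ query, since ResultQuery is an Iterable. No claims about lazy fetching here, as that depends on how the Iterable is implemented:

Stream<BookRecord> s = StreamSupport.stream(
    ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE).spliterator(), false);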

Complaining on a high level

This is, of course, complaining on a high level, but it would really be great if a future version of Java, e.g. Java 9, would add this missing method to the Iterable API. Again, 99% of all use-cases will want the Stream type to be returned, not the IntStream type. And if they do want that for whatever obscure reason (much more obscure than many evil things from old legacy Java APIs, looking at you, Calendar), then why shouldn’t they just declare an intStream() method? After all, if someone is crazy enough to write Iterable<Integer> when they’re really operating on int primitive types, they’ll probably accept a little workaround.
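
For the record, such a specialised method could live on a dedicated subtype. A purely hypothetical sketch:

// Hypothetical: a subtype that knows it iterates over int values
interface IntIterable extends Iterable<Integer> {
    default IntStream intStream() {
        return StreamSupport.stream(spliterator(), false)
                            .mapToInt(Integer::intValue);
    }
}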
