Benchmarking JDK String.replace() vs Apache Commons StringUtils.replace()

What’s better? Using the JDK’s String.replace() or something like Apache Commons Lang’s Apache Commons Lang’s StringUtils.replace()?

In this article, I’ll compare the two, first in a profiling session using Java Mission Control (JMC), then in a benchmark using JMH, and we’ll see that Java 9 heavily improved things in this area.

Profiling using JMC

In a recent profiling session where I checked for any “obvious” bottlenecks in jOOQ, I’ve discovered this nasty regular expression pattern instantiation:

Tons of int[] instances were allocated by a regular expression pattern. That’s weird, because in general, inside of jOOQ’s internals, special care is always taken to pre-compile any regular expressions that are needed in static members, e.g.:

private static final Pattern TYPE_NAME_PATTERN = 
  Pattern.compile("\\([^\\)]*\\)");

This allows for using the Pattern in a far more optimal way, than e.g. by using String.replaceAll():

// Much better, pattern is pre-compiled
TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("")

// Much worse, pattern is compiled *every time*
castTypeName.replaceAll("\\([^\\)]*\\)", "")

That should be clear to everyone. The price to pay for this is the fact that the pattern is stored “far away” in some static member, rather than being visible right where it is used, which is a bit less readable. At least in my opinion.

SIDENOTE: People tend to get all angry about premature optimisation and such. Yes, these optimisations are micro optimisations and aren’t always worth the trouble. But this article is about jOOQ, a library that does a lot of expression tree transformations, and it is important for jOOQ to eliminate even 1% “bottlenecks”, as they make a difference. So, please read this article in this context.

Consider also our previous post about this subject: Top 10 Easy Performance Optimisations in Java

What was the problem in jOOQ?

Now, what appears to be obvious when using regular expressions seems less obvious when using ordinary, constant string replacements, such as when calling String.replace(CharSequence), as was done in the linked jOOQ issue #6672. The relevant piece of code was escaping all inline strings that are sent to the SQL database, to prevent syntax errors and, of course, SQL injection:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = result.replace("\\", "\\\\");

    return result.replace("'", "''");
}

We’re always escaping apostrophes by doubling them, and in some databases (e.g. MySQL), we often have to escape backslashes as well (unfortunately, not all ORMs seem to do this or even be aware of this MySQL “feature”).

Unfortunately as well, despite heavy use of Apache Commons Lang’s StringUtils.replace() in jOOQ’s internals, every now and then a String.replace(CharSequence) sneaks in, because it’s just so convenient to write.

Meh, does it matter?

Usually, in ordinary business logic, it shouldn’t (again – don’t optimise prematurely), but in jOOQ, which is essentially a SQL string manipulation library, it can get quite costly if a single replace call is done excessively (for good reasons, of course), and it is slower than it should be. And it is, prior to Java 9, when this method was optimised. I’ve done the profiling with Java 8, where internally, String.replace() uses a literal regex pattern (i.e. a pattern with a “literal” flag that is faster, but it is a pattern, nonetheless).

Not only does the method appear as a major offender in the GC allocation view, it also triggers quite some action in the “hot methods” view of JMC:

Those are quite a few Pattern methods. The percentages have to be understood in the context of a benchmark, running millions of queries against an H2 in-memory database, so the overhead is significant!

Using Apache Commons Lang’s StringUtils

A simple fix is to use Apache Commons Lang’s StringUtils instead:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = StringUtils.replace(result, "\\", "\\\\");

    return StringUtils.replace(result, "'", "''");
}

Now, the pressure has changed significantly. The int[] allocation is barely noticeable in comparison:

And much fewer Pattern calls are made, overall.

Benchmarking using JMH

Profiling can be very useful to spot bottlenecks, but it needs to be read with care. It introduces some artefacts and slight overheads and it is not 100% accurate when sampling call stacks, which might lead the wrong conclusions at times. This is why it is sometimes important to back claims by running an actual benchmark. And when benchmarking, please, don’t just loop 1 million times in a main() method. That will be very very inaccurate, except for very obvious, order-of-magnitude scale differences.

I’m using JMH here, running the following simple benchmark:

package org.jooq.test.benchmark;

import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5)
@Measurement(iterations = 7)
public class StringReplaceBenchmark {

    private static final String SHORT_STRING_NO_MATCH = "abc";
    private static final String SHORT_STRING_ONE_MATCH = "a'bc";
    private static final String SHORT_STRING_SEVERAL_MATCHES = "'a'b'c'";
    private static final String LONG_STRING_NO_MATCH = 
      "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_ONE_MATCH = 
      "abcabcabcabcabcabcabcabcabcabcabca'bcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_SEVERAL_MATCHES = 
      "abcabca'bcabcabcabcabcabc'abcabcabca'bcabcabcabcabcabca'bcabcabcabcabcabcabc";

    @Benchmark
    public void testStringReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_SEVERAL_MATCHES, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_SEVERAL_MATCHES, "'", "''"));
    }
}

Notice that I tried to run 2 x 3 different string replacement scenarios:

  • The string is “short”
  • The string is “long”

Cross joining (there, finally some SQL in this post!) the above with:

  • No match is found
  • One match is found
  • Several matches are found

That’s important because different optimisations can be implemented for those different cases, and probably, in jOOQ’s case, there is mostly no match in this particular case.

I ran this benchmark once on Java 8:

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

And on Java 9:

$ java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

As Tagir Valeev was kind enough to remind me that this issue was supposed to be fixed in Java 9:

The results are:

Java 8

testStringReplaceLongStringNoMatch               thrpt   21    4809343.940 ▒  66443.628  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   25063493.793 ▒ 660657.256  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    1406989.855 ▒  43051.008  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    6961669.111 ▒ 141504.827  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    1103323.491 ▒  17047.449  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3899108.777 ▒  41854.636  ops/s

testStringReplaceShortStringNoMatch              thrpt   21    5936992.874 ▒  68115.030  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  171660973.829 ▒ 377711.864  ops/s

testStringReplaceShortStringOneMatch             thrpt   21    3267435.957 ▒ 240198.763  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21    9943846.428 ▒ 270821.641  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    2313713.015 ▒  28806.738  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5447065.933 ▒ 139525.472  ops/s

As can be seen, the difference is “catastrophic”. Apache Commons Lang’s StringUtils drastically outpeforms the JDK’s String.replace() in every discipline, especially when no match is found in a short string! That’s because the library optimises for this particular case:

...
int end = searchText.indexOf(searchString, start);
if (end == INDEX_NOT_FOUND) {
    return text;
}

Java 9

Things look a bit differently for Java 9:

testStringReplaceLongStringNoMatch               thrpt   21   55528132.674 ▒  479721.812  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   55767541.806 ▒  754862.755  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    4806322.839 ▒  217538.714  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    8366539.616 ▒  142757.888  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    2685134.029 ▒   78108.171  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3923819.576 ▒  351103.020  ops/s

testStringReplaceShortStringNoMatch              thrpt   21  122398496.629 ▒ 1350086.256  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  121139633.453 ▒ 2756892.669  ops/s

testStringReplaceShortStringOneMatch             thrpt   21   18070522.151 ▒  498663.835  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21   11367395.622 ▒  153377.552  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    7548407.681 ▒  168950.209  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5045065.948 ▒  175251.545  ops/s

Java 9’s implementation is now similar to that of Apache Commons, with the same optimisation for non-matches:

public String replace(CharSequence target, CharSequence replacement) {
    String tgtStr = target.toString();
    String replStr = replacement.toString();
    int j = indexOf(tgtStr);
    if (j < 0) {
        return this;
    }
    ...

It is still quite slower for matches in long strings, but faster for matches in short strings. The tradeoff for jOOQ will be to still prefer Apache Commons because:

  • Most people are still on Java 8 or less, currently
  • Most replacements won’t match and both implementations fare equally well for that in Java 9, but Apache Commons is much faster for this category in Java 8
  • If there’s a match and thus a replacement, the speed depends on the string length, where the faster implementation is currently undecided

Conclusion

This micro optimisation stuff matters in jOOQ because jOOQ is a library that does a lot of SQL string manipulation. Every allocation and every CPU cycle that is wasted when manipulating SQL strings slows down the library, and thus impacts all of its users. In a situation like this, it is definitely worth considering not using these useful JDK String methods, and opting for the much faster Apache Commons implementations instead.

Things have improved a lot in Java 9, in case of which this can mostly be ignored. But if you still need to support Java 8 (we still support Java 6 in our commercial distributions!), then this has to be considered.

Are Java 8 Streams Truly Lazy? Not Completely!

In a recent article, I’ve shown that programmers should always apply a filter first, map later strategy with streams. The example I made there was this one:

hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());

In this case, the limit() operation implements the filtering, which should take place before the mapping.

Several readers correctly mentioned that in this case, it doesn’t matter what order we’re putting the limit() and map() operations, because most operations are evaluated lazily in the Java 8 Stream API.

Or rather: The collect() terminal operation pulls values from the stream lazily, and as the limit(5) operation reaches the end, it will no longer produce new values, regardless whether map() came before or after. This can be proven easily as follows:

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The output of the above is:

Map: 1
Map: 2
Map: 3
Map: 4
Map: 5


Map: 1
Map: 2
Map: 3
Map: 4
Map: 5

But this isn’t always the case!

This optimisation is an implementation detail, and in general, it is not unwise to really apply the filter first, map later rule thoroughly, not relying on such an optimisation. In particular, the Java 8 implementation of flatMap() is not lazy. Consider the following logic, where we put a flatMap() operation in the middle of the stream:

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The result is now:

Map: 1
Map: 1
Map: 1
Map: 1
Map: 2
Map: 2
Map: 2
Map: 2


Map: 1
Map: 1
Map: 1
Map: 1
Map: 2

So, the first Stream pipeline will map all the 8 flatmapped values prior to applying the limit, whereas the second Stream pipeline really limits the stream to 5 elements first, and then maps only those.

The reason for this is in the flatMap() implementation:

// In ReferencePipeline.flatMap()
try (Stream<? extends R> result = mapper.apply(u)) {
    if (result != null)
        result.sequential().forEach(downstream);
}

As you can see, the result of the flatMap() operation is consumed eagerly with a terminal forEach() operation, which will always produce all the four values in our case and send them to the next operation. So, flatMap() isn’t lazy, and thus the next operation after it will get all of its results. This is true for Java 8. Future Java versions might improve this, of course.

We better filter them first. And map later.

A Nice API Design Gem: Strategy Pattern With Lambdas

With Java 8 lambdas being available to us as a programming tool, there is a “new” and elegant way of constructing objects. I put “new” in quotes, because it’s not new. It used to be called the strategy pattern, but as I’ve written on this blog before, many GoF patterns will no longer be implemented in their classic OO way, now that we have lambdas.

A simple example from jOOQ

jOOQ knows a simple type called Converter. It’s a simple SPI, which allows users to implement custom data types and inject data type conversion into jOOQ’s type system. The interface looks like this:

public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

Users will have to implement 4 methods:

  • Conversion from a database (JDBC) type T to the user type U
  • Conversion from the user type U to the database (JDBC) type T
  • Two methods providing a Class reference, to work around generic type erasure

Now, an implementation that converts hex strings (database) to integers (user type):

public class HexConverter implements Converter<String, Integer> {

    @Override
    public Integer from(String hexString) {
        return hexString == null 
            ? null 
            : Integer.parseInt(hexString, 16);
    }

    @Override
    public String to(Integer number) {
        return number == null 
            ? null 
            : Integer.toHexString(number);
    }

    @Override
    public Class<String> fromType() {
        return String.class;
    }

    @Override
    public Class<Integer> toType() {
        return Integer.class;
    }
}

That wasn’t difficult to write, but it’s quite boring to write this much boilerplate:

  • Why do we need to give this class a name?
  • Why do we need to override methods?
  • Why do we need to handle nulls ourselves?

Now, we could write some object oriented libraries, e.g. abstract base classes that take care at least of the fromType() and toType() methods, but much better: The API designer can provide a “constructor API”, which allows users to provide “strategies”, which is just a fancy name for “function”. One function (i.e. lambda) for each of the four methods. For example:

public interface Converter<T, U> {
    ...

    static <T, U> Converter<T, U> of(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return new Converter<T, U>() { ... boring code here ... }
    }

    static <T, U> Converter<T, U> ofNullable(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return of(
            fromType,
            toType,

            // Boring null handling code here
            t -> t == null ? null : from.apply(t),
            u -> u == null ? null : to.apply(u)
        );
    }
}

From now on, we can easily write converters in a functional way. For example, our HexConverter would become:

Converter<String, Integer> converter =
Converter.ofNullable(
    String.class,
    Integer.class,
    s -> Integer.parseInt(s, 16),
    Integer::toHexString
);

Wow! This is really nice, isn’t it? This is the pure essence of what it means to write a Converter. No more overriding, null handling, type juggling, just the bidirectional conversion logic.

Other examples

A more famous example is the JDK 8 Collector.of() constructor, without which it would be much more tedious to implement a collector. For example, if we want to find the second largest element in a stream… easy!

for (int i : Stream.of(1, 8, 3, 5, 6, 2, 4, 7)
                   .collect(Collector.of(
    () -> new int[] { Integer.MIN_VALUE, Integer.MIN_VALUE },
    (a, t) -> {
        if (a[0] < t) {
            a[1] = a[0];
            a[0] = t;
        }
        else if (a[1] < t)
            a[1] = t;
    },
    (a1, a2) -> {
        throw new UnsupportedOperationException(
            "Say no to parallel streams");
    }
)))
    System.out.println(i);

Run this, and you get:

8
7

Bonus exercise: Make the collector parallel capable by implementing the combiner correctly. In a sequential-only scenario, we don’t need it (until we do, of course…).

Conclusion

The concrete examples are nice examples of API usage, but the key message is this:

If you have an interface of the form:

interface MyInterface {
    void myMethod1();
    String myMethod2();
    void myMethod3(String value);
    String myMethod4(String value);
}

Then, just add a convenience constructor to the interface, accepting Java 8 functional interfaces like this:

// You write this boring stuff
interface MyInterface {
    static MyInterface of(
        Runnable function1,
        Supplier<String> function2,
        Consumer<String> function3,
        Function<String, String> function4
    ) {
        return new MyInterface() {
            @Override
            public void myMethod1() {
                function1.run();
            }

            @Override
            public String myMethod2() {
                return function2.get();
            }

            @Override
            public void myMethod3(String value) {
                function3.accept(value);
            }

            @Override
            public String myMethod4(String value) {
                return function4.apply(value);
            }
        }
    }
}

As an API designer, you write this boilerplate only once. And your users can then easily write things like these:

// Your users write this awesome stuff
MyInterface.of(
    () -> { ... },
    () -> "hello",
    v -> { ... },
    v -> "world"
);

Easy! And your users will love you forever for this.

Should I Implement the Arcane Iterator.remove() Method? Yes You (Probably) Should

An interesting question was asked on reddit’s /r/java recently:

Should Iterators be used to modify a custom Collection?

Paraphrasing the question: The author wondered whether a custom java.util.Iterator that is returned from a mutable Collection.iterator() method should implement the weird Iterator.remove() method.

A totally understandable question.

What does Iterator.remove() do?

Few people ever use this method at all. For instance, if you want to implement a generic way to remove null values from an arbitrary Collection, this would be the most generic approach:

Collection<Integer> collection =
Stream.of(1, 2, null, 3, 4, null, 5, 6)
      .collect(Collectors.toCollection(ArrayList::new));

System.out.println(collection);

Iterator<Integer> it = collection.iterator();
while (it.hasNext())
    if (it.next() == null)
        it.remove();

System.out.println(collection);

The above program will print:

[1, 2, null, 3, 4, null, 5, 6]
[1, 2, 3, 4, 5, 6]

Somehow, this API usage does feel dirty. An Iterator seems to be useful to … well … iterate its backing collection. It’s really weird that it also allows for modifying it. It’s even weirder that it only offers removal. E.g. we cannot add a new element before or after the current element of iteration, or replace it.

Luckily, Java 8 provides us with a much better method on the Collection API directly, namely Collection.removeIf(Predicate).

The above iteration code can be re-written as such:

collection.removeIf(Objects::isNull);

OK, now should I implement remove() on my own iterators?

Yes, you should – if your custom collection is mutable. For a very simple reason. Check out the default implementation of Collection.removeIf():

default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        if (filter.test(each.next())) {
            each.remove();
            removed = true;
        }
    }
    return removed;
}

As I said. The most generic way to remove specific elements from a Collection is precisely to go by its Iterator.remove() method and that’s precisely what the JDK does. Subtypes like ArrayList may of course override this implementation because there’s a more performant alternative, but in general, if you write your own custom, modifiable collection, you should implement this method.

And enjoy the ride into Java’s peculiar, historic caveats for which we all love the language.

A Hidden jOOQ Gem: Foreach Loop Over ResultQuery

A recent question on Stack Overflow about jOOQ caught my attention. The question essentially asked:

Why do both of these loops work?

// With fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetch()) { // fetch() here

    doThingsWithRecord(rec);
}

// Without fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)) { // No fetch() here

    doThingsWithRecord(rec);
}

And indeed, just like in PL/SQL, you can use any jOOQ ResultQuery as a Java 5 Iterable, because that’s what it is. An Iterable<R> where R extends Record.

The semantics is simple. When Iterable.iterator() is invoked, the query is executed and the Result.iterator() is returned. So, the result is materialised in the client memory just like if I called fetch(). Unsurprisingly, this is the implementation of AbstractResultQuery.iterator():

@Override
public final Iterator<R> iterator() {
    return fetch().iterator();
}

No magic. But it’s great that this works like PL/SQL:

FOR rec IN (SELECT * FROM my_table ORDER BY my_table.column)
LOOP
  doThingsWithRecord(rec);
END LOOP;

Note, unfortunately, there’s no easy way to manage resources through Iterable, i.e. there’s no AutoCloseableIterable returning an AutoCloseableIterator, which could be used in an auto-closing try-with-resources style loop. This is why the entire result set needs to be fetched at the beginning of the loop. For lazy fetching, you can still use ResultQuery.fetchLazy()

try (Cursor<MyTableRecord> cursor = DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetchLazy()) {

    for (MyTableRecord rec : cursor)
        doThingsWithRecord(rec);
}

Happy coding!

Using jOOλ to Combine Several Java 8 Collectors into One

With Java 8 being mainstream now, people start using Streams for everything, even in cases where that’s a bit exaggerated (a.k.a. completely nuts, if you were expecting a hyperbole here). For instance, take mykong’s article here, showing how to collect a Map’s entry set stream into a list of keys and a list of values:

http://www.mkyong.com/java8/java-8-convert-map-to-list

The code posted on mykong.com does it in two steps:

package com.mkyong.example;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ConvertMapToList {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(10, "apple");
        map.put(20, "orange");
        map.put(30, "banana");
        map.put(40, "watermelon");
        map.put(50, "dragonfruit");

        System.out.println("\n1. Export Map Key to List...");

        List<Integer> result = map.entrySet().stream()
                .map(x -> x.getKey())
                .collect(Collectors.toList());

        result.forEach(System.out::println);

        System.out.println("\n2. Export Map Value to List...");

        List<String> result2 = map.entrySet().stream()
                .map(x -> x.getValue())
                .collect(Collectors.toList());

        result2.forEach(System.out::println);
    }
}

This is probably not what you should do in your own code. First off, if you’re OK with iterating the map twice, the simplest way to collect a map’s keys and values would be this:

List<Integer> result1 = new ArrayList<>(map.keySet());
List<String> result2 = new ArrayList<>(map.values());

There’s absolutely no need to resort to Java 8 streams for this particular example. The above is about as simple and speedy as it gets.

Don’t shoehorn Java 8 Streams into every problem

But if you really want to use streams, then I would personally prefer a solution where you do it in one go. There’s no need to iterate the Map twice in this particular case. For instance, you could do it by using jOOλ’s Tuple.collectors() method, a method that combines two collectors into a new collector that returns a tuple of the individual collections.

Code speaks for itself more clearly than the above description. Mykong.com’s code could be replaced by this:

Tuple2<List<Integer>, List<String>> result = 
map.entrySet()
    .stream()
    .collect(Tuple.collectors(
        Collectors.mapping(Entry::getKey, Collectors.toList()),
        Collectors.mapping(Entry::getValue, Collectors.toList())
    ));

The only jOOλ code put in place here is the call to Tuple.collectors(), which combines the standard JDK collectors that apply mapping on the Map entries before collecting keys and values into lists.

When printing the above result, you’ll get:

([50, 20, 40, 10, 30], [dragonfruit, orange, watermelon, apple, banana])

i.e. a tuple containing the two resulting lists.

Even simpler, don’t use the Java 8 Stream API, use jOOλ’s Seq (for sequential stream) and write this shorty instead:

Tuple2<List<Integer>, List<String>> result = 
Seq.seq(map)
   .collect(
        Collectors.mapping(Tuple2::v1, Collectors.toList()),
        Collectors.mapping(Tuple2::v2, Collectors.toList())
   );

Where Collectable.collect(Collector, Collector) provides awesome syntax sugar over the previous example

Convinced? Get jOOλ here: https://github.com/jOOQ/jOOL

The Java JIT Compiler is Darn Good at Optimization

“Challenge accepted” said Tagir Valeev when I recently asked the readers of the jOOQ blog to show if the Java JIT (Just-In-Time compilation) can optimise away a for loop.

Tagir is the author of StreamEx, very useful Java 8 Stream extension library that adds additional parallelism features on top of standard streams. He’s a speaker at conferences, and has contributed a dozen of patches into OpenJDK Stream API (including bug fixes, performance optimizations and new features). He’s interested in static code analysis and works on a new Java bytecode analyzer.

I’m very happy to publish Tagir’s guest post here on the jOOQ blog.

tagir-valeev

The Java JIT Compiler

In recent article Lukas wondered whether JIT could optimize a code like this to remove an unnecessary iteration:

// ... than this, where we "know" the list
// only contains one value
for (Object object : Collections.singletonList("abc")) {
    doSomethingWith(object);
}

Here’s my answer: JIT can do even better. Let’s consider this simple method which calculates total length of all the strings of supplied list:

static int testIterator(List<String> list) {
    int sum = 0;
    for (String s : list) {
        sum += s.length();
    }
    return sum;
}

As you might know this code is equivalent to the following:

static int testIterator(List<String> list) {
    int sum = 0;
    Iterator<String> it = list.iterator();
    while(it.hasNext()) {
        String s = it.next();
        sum += s.length();
    }
    return sum;
}

Of course in general case the list could be anything, so when creating an iterator, calling hasNext and next methods JIT must emit honest virtual calls which is not very fast. However what will happen if you always supply the singletonList here? Let’s create some simple test:

public class Test {
    static int res = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 100000; i++) {
            res += testIterator(Collections.singletonList("x"));
        }
        System.out.println(res);
    }
}

We are calling our testIterator in a loop so it’s called enough times to be JIT-compiled with C2 JIT compiler. As you might know, in HotSpot JVM there are two JIT-compilers, namely C1 (client) compiler and C2 (server) compiler. In 64-bit Java 8 they work together. First method is compiled with C1 and special instructions are added to gather some statistics (which is called profiling). Among it there is type statistics. JVM will carefully check which exact types our list variable has. And in our case it will discover that in 100% of cases it’s singleton list and nothing else. When method is called quite often, it gets recompiled by better C2 compiler which can use this information. Thus when C2 compiles it can assume that in future singleton list will also appear quite often.

You may ask JIT compiler to output the assembly generated for methods. To do this you should install hsdis on your system. After that you may use convenient tools like JITWatch or write a JMH benchmark and use -perfasm option. Here we will not use third-party tools and simply launch the JVM with the following command line options:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintAssembly Test >output.txt

This will generate quite huge output which may scare the children. The assembly generated by C2 compiler for ourtestIterator method looks like this (on Intel x64 platform):

  # {method} {0x0000000055120518} 
  # 'testIterator' '(Ljava/util/List;)I' in 'Test'
  # parm0:    rdx:rdx   = 'java/util/List'
  #           [sp+0x20]  (sp of caller)
  0x00000000028e7560: mov    %eax,-0x6000(%rsp)
  0x00000000028e7567: push   %rbp

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  0x00000000028e7568: sub    $0x10,%rsp         
                                                
  ; implicit exception: dispatches to 0x00000000028e75bd
  0x00000000028e756c: mov    0x8(%rdx),%r10d    

  ;   {metadata('java/util/Collections$SingletonList')}
  0x00000000028e7570: cmp    $0x14d66a20,%r10d  

  ;*synchronization entry
  ; - java.util.Collections::singletonIterator@-1
  ; - java.util.Collections$SingletonList::iterator@4
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7577: jne    0x00000000028e75a0 

  ;*getfield element
  ; - java.util.Collections$SingletonList::iterator@1
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7579: mov    0x10(%rdx),%ebp    

  ; implicit exception: dispatches to 0x00000000028e75c9
  0x00000000028e757c: mov    0x8(%rbp),%r11d    

  ;   {metadata('java/lang/String')}
  0x00000000028e7580: cmp    $0x14d216d0,%r11d  
  0x00000000028e7587: jne    0x00000000028e75b1

  ;*checkcast
  ; - Test::testIterator@24 (line 16)
  0x00000000028e7589: mov    %rbp,%r10          
                                                
  ;*getfield value
  ; - java.lang.String::length@1
  ; - Test::testIterator@30 (line 17)
  0x00000000028e758c: mov    0xc(%r10),%r10d    

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  ; implicit exception: dispatches to 0x00000000028e75d5
  0x00000000028e7590: mov    0xc(%r10),%eax     
                                                
                                               
  0x00000000028e7594: add    $0x10,%rsp
  0x00000000028e7598: pop    %rbp

  # 0x0000000000130000
  0x00000000028e7599: test   %eax,-0x27b759f(%rip)        
         
  ;   {poll_return}                                       
  0x00000000028e759f: retq   
  ... // slow paths follow

What you can notice is that it’s surpisingly short. I’ll took the liberty to annotate what happens here:

// Standard stack frame: every method has such prolog
mov    %eax,-0x6000(%rsp)
push   %rbp
sub    $0x10,%rsp         
// Load class identificator from list argument (which is stored in rdx 
// register) like list.getClass() This also does implicit null-check: if 
// null is supplied, CPU will trigger a hardware exception. The exception
// will be caught by JVM and translated into NullPointerException
mov    0x8(%rdx),%r10d
// Compare list.getClass() with class ID of Collections$SingletonList class 
// which is constant and known to JIT
cmp    $0x14d66a20,%r10d
// If list is not singleton list, jump out to the slow path
jne    0x00000000028e75a0
// Read Collections$SingletonList.element private field into rbp register
mov    0x10(%rdx),%ebp
// Read its class identificator and check whether it's actually String
mov    0x8(%rbp),%r11d
cmp    $0x14d216d0,%r11d
// Jump out to the exceptional path if not (this will create and throw
// ClassCastException)
jne    0x00000000028e75b1
// Read private field String.value into r10 which is char[] array containing
//  String content
mov    %rbp,%r10
mov    0xc(%r10),%r10d
// Read the array length field into eax register (by default method returns
// its value via eax/rax)
mov    0xc(%r10),%eax
// Standard method epilog
add    $0x10,%rsp
pop    %rbp
// Safe-point check (so JVM can take the control if necessary, for example,
// to perform garbage collection)
test   %eax,-0x27b759f(%rip)
// Return
retq   

If it’s still hard to understand, let’s rewrite it via pseudo-code:

if (list.class != Collections$SingletonList) {
  goto SLOW_PATH;
}
str = ((Collections$SingletonList)list).element;
if (str.class != String) {
  goto EXCEPTIONAL_PATH;
}
return ((String)str).value.length;

So for the hot path we have no iterator allocated and no loop, just several dereferences and two quick checks (which are always false, so CPU branch predictor will predict them nicely). Iterator object is evaporated completely, though originally it has additional bookkeeping like tracking whether it was already called and throwing NoSuchElementException in this case. JIT-compiler statically proved that these parts of code are unnecessary and removed them. The sum variable is also evaporated. Nevertheless the method is correct: if it happens in future that it will be called with something different from singleton list, it will handle this situation on the SLOW_PATH (which is of course much longer). Other cases like list == null or list element is not String are also handled.

What will occur if your program pattern changes? Imagine that at some point you are no longer using singleton lists and pass different list implementations here. When JIT discovers that SLOW_PATH is hit too often, it will recompile the method to remove special handling of singleton list. This is different from pre-compiled applications: JIT can change your code following the behavioral changes of your program.