Tag Archive | Java 8

Really Too Bad that Java 8 Doesn’t Have Iterable.stream()


This is one of the more interesting recent Stack Overflow questions:

Why does Iterable not provide stream() and parallelStream() methods?

At first, it might seem intuitive to make it straight-forward to convert an Iterable into a Stream, because the two are really more or less the same thing for 90% of all use-cases.

Granted, the expert group had a strong focus on making the Stream API parallel capable, but anyone who works with Java every day will notice immediately, that Stream is most useful in its sequential form. And an Iterable is just that. A sequential stream with no guarantees with respect to parallelisation. So, it would only be intuitive if we could simply write:

iterable.stream();

In fact, subtypes of Iterable do have such methods, e.g.

collection.stream();

Brian Goetz himself gave an answer to the above Stack Overflow question. The reasons for this omittance are rooted in the fact that some Iterables might prefer to return an IntStream instead of a Stream. This really seems to be a very remote reason for a design decision, but as always, omittance today doesn’t mean omittance forever. On the other hand, if they had introduced Iterable.stream() today, and it turned out to be a mistake, they couldn’t have removed it again.

Well, primitive types in Java are a pain and they did all sorts of bad things to generics in the first place, and now to Stream as well, as we have to write the following, in order to turn an Iterable into a Stream:

Stream s = StreamSupport.stream(iterable.spliterator(), false);

Brian Goetz argues that this is “easy”, but I would disagree. As an API consumer, I experience a lot of friction in productivity because of:

  • Having to remember this otherwise useless StreamSupport type. This method could very well have been put into the Stream interface, because we already have Stream construction methods in there, such as Stream.of().
  • Having to remember the subtle difference between Iterator and Spliterator in the context of what I believe has nothing to do with parallelisation. It may well be that Spliterators will become popular eventually, though, so this doubt is for the magic 8 ball to address.
  • In fact, I have to repeat the information that there is nothing to be parallelised via the boolean argument false

Parallelisation really has such a big weight in this new API, even if it will cover only around 5%-10% of all functional collection manipulation operations. While sequential processing was not the main design goal of the JDK 8 APIs, it is really the main benefit for all of us, and the friction around APIs related to sequential processing should be as low as possible.

The method above should have just been called

Stream s = Stream.stream(iterable);

It could be implemented like this:

public static<T> Stream<T> stream(Iterable<T> i) {
    return StreamSupport.stream(i.spliterator(), false);
}

Obviously with convenience overloads that allow for the additional specialisations, like parallelisation, or passing a Spliterator

But again, if Iterable had its own stream() default method, an incredible number of APIs would be so much better integrated with Java 8 out of the box, without even supporting Java 8 explicitly!

Take jOOQ for instance. jOOQ still supports Java 6, so a direct dependency is not possible. However, jOOQ’s ResultQuery type is an Iterable. This allows you to use such queries directly inline in foreach loops, as if you were writing PL/SQL:

PL/SQL

FOR book IN (
  SELECT * FROM books ORDER BY books.title
)
LOOP
  -- Do things with book
END LOOP;

Java

for (BookRecord book : 
  ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
) {
  // Do things with book
}

Now imagine the same thing in Java 8:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .stream()
   .map / reduce / findAny, etc...

Unfortunately, the above is currently not possible. You could, of course, eagerly fetch all the results into a jOOQ Result, which extends List:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .fetch()
   .stream()
   .map / reduce / findAny, etc...

But it’s one more method to call (every time), and the actual stream semantics is broken, because the fetch is done eagerly.

Complaining on a high level

This is, of course, complaining on a high level, but it would really be great if a future version of Java, e.g. Java 9, would add this missing method to the Iterable API. Again, 99% of all use-cases will want the Stream type to be returned, not the IntStream type. And if they do want that for whatever obscure reason (much more obscure than many evil things from old legacy Java APIs, looking at you Calendar), then why shouldn’t they just declare an intStream() method. After all, if someone is crazy enough to write Iterable<Integer> when they’re really operating on int primitive types, they’ll probably accept a little workaround.

Let’s Stream a Map in Java 8 with jOOλ


I wanted to find an easy way to stream a Map in Java 8. Guess what? There isn’t!

What I would’ve expected for convenience is the following method:

public interface Map<K, V> {

    default Stream<Entry<K, V>> stream() {
        return entrySet().stream();
    }    
}

But there’s no such method. There are probably a variety of reasons why such a method shouldn’t exist, e.g.:

  • There’s no “clear” preference for entrySet() being chosen over keySet() or values(), as a stream source
  • Map isn’t really a collection. It’s not even an Iterable
  • That wasn’t the design goal
  • The EG didn’t have enough time

Well, there is a very compelling reason for Map to have been retrofitted to provide both an entrySet().stream() and to finally implement Iterable<Entry<K, V>>. And that reason is the fact that we now have Map.forEach():

default void forEach(
        BiConsumer<? super K, ? super V> action) {
    Objects.requireNonNull(action);
    for (Map.Entry<K, V> entry : entrySet()) {
        K k;
        V v;
        try {
            k = entry.getKey();
            v = entry.getValue();
        } catch(IllegalStateException ise) {
            // this usually means the entry is no longer in the map.
            throw new ConcurrentModificationException(ise);
        }
        action.accept(k, v);
    }
}

forEach() in this case accepts a BiConsumer that really consumes entries in the map. If you search through JDK source code, there are really very few references to the BiConsumer type outside of Map.forEach() and perhaps a couple of CompletableFuture methods and a couple of streams collection methods.

So, one could almost assume that BiConsumer was strongly driven by the needs of this forEach() method, which would be a strong case for making Map.Entry a more important type throughout the collections API (we would have preferred the type Tuple2, of course).

Let’s continue this line of thought. There is also Iterable.forEach():

public interface Iterable<T> {
    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }
}

Both Map.forEach() and Iterable.forEach() intuitively iterate the “entries” of their respective collection model, although there is a subtle difference:

  • Iterable.forEach() expects a Consumer taking a single value
  • Map.forEach() expects a BiConsumer taking two values: the key and the value (NOT a Map.Entry!)

Think about it this way:

This makes the two methods incompatible in a “duck typing sense”, which makes the two types even more different

Bummer!

Improving Map with jOOλ

We find that quirky and counter-intuitive. forEach() is really not the only use-case of Map traversal and transformation. We’d love to have a Stream<Entry<K, V>>, or even better, a Stream<Tuple2<T1, T2>>. So we implemented that in jOOλ, a library which we’ve developed for our integration tests at jOOQ. With jOOλ, you can now wrap a Map in a Seq type (“Seq” for sequential stream, a stream with many more functional features):

Map<Integer, String> map = new LinkedHashMap<>();
map.put(1, "a");
map.put(2, "b");
map.put(3, "c");

assertEquals(
  Arrays.asList(
    tuple(1, "a"), 
    tuple(2, "b"), 
    tuple(3, "c")
  ),

  Seq.seq(map).toList()
);

What you can do with it? How about creating a new Map, swapping keys and values in one go:

System.out.println(
  Seq.seq(map)
     .map(Tuple2::swap)
     .toMap(Tuple2::v1, Tuple2::v2)
);

System.out.println(
  Seq.seq(map)
     .toMap(Tuple2::v2, Tuple2::v1)
);

Both of the above will yield:

{a=1, b=2, c=3}

Just for the record, here’s how to swap keys and values with standard JDK API:

System.out.println(
  map.entrySet()
     .stream()
     .collect(Collectors.toMap(
         Map.Entry::getValue, 
         Map.Entry::getKey
     ))
);

It can be done, but the every day verbosity of standard Java API makes things a bit hard to read / write

Don’t Miss out on Writing Java 8 SQL One-Liners with jOOλ or jOOQ


More and more people are catching up with the latest update to our platform by adopting functional programming also for their businesses.

At Data Geekery, we’re using Java 8 for our jOOQ integration tests, as using the new Streams API with lambda expressions makes generating ad-hoc test data so much easier.

However, we don’t feel that the JDK offers as much as it could, which is why we have also implemented and open-sourced jOOλ, a small utility library that patches those short-comings.

Note, we don’t aim to replace more sophisticated libraries like functionaljava. jOOλ is really just patching short-comings.

Putting lambdas to work with jOOλ or jOOQ

I’ve recently encountered this Stack Overflow question, which asked for streaming a result set with all columns into a single list. For example:

Input

+----+------------+------------+
| ID | FIRST_NAME | LAST_NAME  |
+----+------------+------------+
|  1 | Joslyn     | Vanderford |
|  2 | Rudolf     | Hux        |
+----+------------+------------+

Output

1
Joslyn
Vanderford
2
Rudolf
Hux

This is a typical school-book example for using functional programming rather than an iterative solution:

Iterative solution

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list = new ArrayList<>();

while (rs.next()) {
    for (int i = 0; i < meta.getColumnCount(); i++) {
        list.add(rs.getObject(i + 1));
    }
}

Truth is, the iterative solution isn’t all that bad, but let’s learn how this could be done with functional programming.

Using jOOλ

We’re using jOOλ for this example for a couple of reasons:

  • JDBC didn’t really adopt the new features. There is no simple ResultSet to Stream conversion, even if there should be.
  • Unfortunately, the new functional interfaces do not allow for throwing checked exceptions. The try .. catch blocks inside lambdas don’t exactly look nice
  • Interestingly, there is no way of generating a finite stream without also implementing an Iterator or Spliterator

So, here’s the plain code:

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list =
Seq.generate()
   .limitWhile(Unchecked.predicate(v -> rs.next()))
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))
   .toList()

So far, this looks about as verbose (or a bit more) than the iterative solution. As you can see, a couple of jOOλ extensions were needed here:

// This generate is a shortcut to generate an
// infinite stream with unspecified content
Seq.generate()

// This predicate-based stream termination
// unfortunately doesn't exist in the JDK
// Besides, the checked exception is wrapped in a
// RuntimeException by calling Unchecked.wrapper(...)
   .limitWhile(Unchecked.predicate(v -> rs.next()))

// Standard JDK flatmapping, producing a "nested"
// stream of column values for the "outer" stream
// of database rows
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))

// This is another convenience method that is more
// verbose to write with standard JDK code
   .toList()

Using jOOQ

jOOQ has even more convenience API to operate on result records of your SQL statement. Consider the following piece of logic:

ResultSet rs = ...;

List<Object> list =
DSL.using(connection)
   .fetch(rs)
   .stream()
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .collect(Collectors.toList());

Note that the above example is using standard JDK API, without resorting to jOOλ for convenience. If you want to use jOOλ with jOOQ, you could even write:

ResultSet rs = ...;

List<Object> list = 
Seq.seq(DSL.using(connection).fetch(rs))
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .toList();

Easy? I would say so! Let’s remember that this example:

  • Fetches a JDBC ResultSet into a Java Collection
  • Transforms each record in the result set into an array of column values
  • Transforms each array into a stream
  • Flattens that stream into a stream of streams
  • Collects all values into a single list

Whew!

Conclusion

We’re heading towards exciting times! It will take a while until all Java 8 idioms and functional thinking will feel “natural” to Java developers, also in the enterprise.

The idea of having a sort of data source that can be configured with pipelined data transformations expressed as lambda expressions to be evaluated lazily is very compelling, though. jOOQ is an API that encapsulates SQL data sources in a very fluent and intuitive way, but it doesn’t stop there. jOOQ produces regular JDK collections of records, which can be transformed out-of-the-box via the new streams API.

We believe that this will drastically change the way the Java ecosystem will think about data transformation. Stay tuned for more examples on this blog!

How Nashorn Impacts API Evolution on a New Level


Following our previous article about how to use jOOQ with Java 8 and Nashorn, one of our users discovered a flaw in using the jOOQ API as discussed here on the user group. In essence, the flaw can be summarised like so:

Java code

package org.jooq.nashorn.test;

public class API {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

JavaScript code

var API = Java.type("org.jooq.nashorn.test.API");
API.test(1); // This will fail with RuntimeException

After some investigation and the kind help of Attila Szegedi, as well as Jim Laskey (both Nashorn developers from Oracle), it became clear that Nashorn disambiguates overloaded methods and varargs differently than what an old Java developer might expect. Quoting Attila:

Nashorn’s overload method resolution mimics Java Language Specification (JLS) as much as possible, but allows for JavaScript-specific conversions too. JLS says that when selecting a method to invoke for an overloaded name, variable arity methods can be considered for invocation only when there is no applicable fixed arity method.

I agree that variable arity methods can be considered only when there is no applicable fixed arity method. But the whole notion of “applicable” itself is completely changed as type promotion (or coercion / conversion) using ToString, ToNumber, ToBoolean is preferred over what intuitively appear to be “exact” matches with varargs methods!

Let this sink in!

Given that we now know how Nashorn resolves overloading, we can see that any of the following are valid workarounds:

Explicitly calling the test(Integer[]) method using an array argument:

This is the simplest approach, where you ignore the fact that varargs exist and simply create an explicit array

var API = Java.type("org.jooq.nashorn.test.API");
API.test([1]);

Explicitly calling the test(Integer[]) method by saying so:

This is certainly the safest approach, as you’re removing all ambiguity from the method call

var API = Java.type("org.jooq.nashorn.test.API");
API["test(Integer[])"](1);

Removing the overload:

public class AlternativeAPI1 {
    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

Removing the varargs:

public class AlternativeAPI3 {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        System.out.println("OK");
    }
}

Providing an exact option:

public class AlternativeAPI4 {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        test(new Integer[] { args });
    }

    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

Replacing String by CharSequence (or any other “similar type”):

Now, this is interesting:

public class AlternativeAPI5 {
    public static void test(CharSequence string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        System.out.println("OK");
    }
}

Specifically, the distinction between CharSequence and String types appears to be very random from a Java perspective in my opinion.

Agreed, implementing overloaded method resolution in a dynamically typed language is very hard, if even possible. Any solution is a compromise that will introduce flaws at some ends. Or as Attila put it:

As you can see, no matter what we do, something else would suffer; overloaded method selection is in a tight spot between Java and JS type systems and very sensitive to even small changes in the logic.

True! But not only is overload method selection very sensitive to even small changes. Using Nashorn with Java interoperability is, too! As an API designer, over the years, I have grown used to semantic versioning, and the many subtle rules to follow when keeping an API source compatible, behavior compatible – and if ever possible – to a large degree also binary compatible.

Forget about that when your clients are using Nashorn. They’re on their own. A newly introduced overload in your Java API can break your Nashorn clients quite badly. But then again, that’s JavaScript, the language that tells you at runtime that:

['10','10','10','10'].map(parseInt)

… yields

[10, NaN, 2, 3]

… and where

++[[]][+[]]+[+[]] === "10"

yields true! (sources here)

For more information about JavaScript, please visit this introductory tutorial

When the Java 8 Streams API is not Enough


Java 8 was – as always – a release of compromises and backwards-compatibility. A release where the JSR-335 expert group might not have agreed upon scope or feasibility of certain features with some of the audience. See some concrete explanations by Brian Goetz about why …

But today we’re going to focus on the Streams API’s “short-comings”, or as Brian Goetz would probably put it: things out of scope given the design goals.

Parallel Streams?

Parallel computing is hard, and it used to be a pain. People didn’t exactly love the new (now old) Fork / Join API, when it was first shipped with Java 7. Conversely, and clearly, the conciseness of calling Stream.parallel() is unbeatable.

But many people don’t actually need parallel computing (not to be confused with multi-threading!). In 95% of all cases, people would have probably preferred a more powerful Streams API, or perhaps a generally more powerful Collections API with lots of awesome methods on various Iterable subtypes.

Changing Iterable is dangerous, though. Even a no-brainer as transforming an Iterable into a Stream via a potential Iterable.stream() method seems to risk opening pandora’s box!.

Sequential Streams!

So if the JDK doesn’t ship it, we create it ourselves!

Streams are quite awesome per se. They’re potentially infinite, and that’s a cool feature. Mostly – and especially with functional programming – the size of a collection doesn’t really matter that much, as we transform element by element using functions.

If we admit Streams to be purely sequential, then we could have any of these pretty cool methods as well (some of which would also be possible with parallel Streams):

  • cycle() – a guaranteed way to make every stream infinite
  • duplicate() – duplicate a stream into two equivalent streams
  • foldLeft() – a sequential and non-associative alternative to reduce()
  • foldRight() – a sequential and non-associative alternative to reduce()
  • limitUntil() – limit the stream to those records before the first one to satisfy a predicate
  • limitWhile() – limit the stream to those records before the first one not to satisfy a predicate
  • maxBy() – reduce the stream to the maximum mapped value
  • minBy() – reduce the stream to the minimum mapped value
  • partition() – partition a stream into two streams, one satisfying a predicate and the other not satisfying the same predicate
  • reverse() – produce a new stream in inverse order
  • skipUntil() – skip records until a predicate is satisified
  • skipWhile() – skip records as long as a predicate is satisfied
  • slice() – take a slice of the stream, i.e. combine skip() and limit()
  • splitAt() – split a stream into two streams at a given position
  • unzip() – split a stream of pairs into two streams
  • zip() – merge two streams into a single stream of pairs
  • zipWithIndex() – merge a stream with its corresponding stream of indexes into a single stream of pairs

jOOλ’s new Seq type does all that

All of the above is part of jOOλ. jOOλ (pronounced “jewel”, or “dju-lambda”, also written jOOL in URLs and such) is an ASL 2.0 licensed library that emerged from our own development needs when implementing jOOQ integration tests with Java 8. Java 8 is exceptionally well-suited for writing tests that reason about sets, tuples, records, and all things SQL.

But the Streams API just slightly feels insufficient, so we have wrapped JDK’s Streams into our own Seq type (Seq for sequence / sequential Stream):

// Wrap a stream in a sequence
Seq<Integer> seq1 = seq(Stream.of(1, 2, 3));

// Or create a sequence directly from values
Seq<Integer> seq2 = Seq.of(1, 2, 3);

We’ve made Seq a new interface that extends the JDK Stream interface, so you can use Seq fully interoperably with other Java APIs – leaving the existing methods unchanged:

public interface Seq<T> extends Stream<T> {

    /**
     * The underlying {@link Stream} implementation.
     */
    Stream<T> stream();
	
	// [...]
}

Now, functional programming is only half the fun if you don’t have tuples. Unfortunately, Java doesn’t have built-in tuples and while it is easy to create a tuple library using generics, tuples are still second-class syntactic citizens when comparing Java to Scala, for instance, or C# and even VB.NET.

Nonetheless…

jOOλ also has tuples

We’ve run a code-generator to produce tuples of degree 1-8 (we might add more in the future, e.g. to match Scala’s and jOOQ’s “magical” degree 22).

And if a library has such tuples, the library also needs corresponding functions. The essence of these TupleN and FunctionN types is summarised as follows:

public class Tuple3<T1, T2, T3>
implements 
    Tuple, 
	Comparable<Tuple3<T1, T2, T3>>, 
	Serializable, Cloneable {
    
    public final T1 v1;
    public final T2 v2;
    public final T3 v3;
	
	// [...]
}

and

@FunctionalInterface
public interface Function3<T1, T2, T3, R> {

    default R apply(Tuple3<T1, T2, T3> args) {
        return apply(args.v1, args.v2, args.v3);
    }

    R apply(T1 v1, T2 v2, T3 v3);
}

There are many more features in Tuple types, but let’s leave them out for today.

On a side note, I’ve recently had an interesting discussion with Gavin King (the creator of Hibernate) on reddit. From an ORM perspective, Java classes seem like a suitable implementation for SQL / relational tuples, and they are indeed. From an ORM perspective.

But classes and tuples are fundamentally different, which is a very subtle issue with most ORMs – e.g. as explained here by Vlad Mihalcea.

Besides, SQL’s notion of row value expressions (i.e. tuples) is quite different from what can be modelled with Java classes. This topic will be covered in a subsequent blog post.

Some jOOλ examples

With the aforementioned goals in mind, let’s see how the above API can be put to work by example:

zipping

// (tuple(1, "a"), tuple(2, "b"), tuple(3, "c"))
Seq.of(1, 2, 3).zip(Seq.of("a", "b", "c"));

// ("1:a", "2:b", "3:c")
Seq.of(1, 2, 3).zip(
    Seq.of("a", "b", "c"), 
    (x, y) -> x + ":" + y
);

// (tuple("a", 0), tuple("b", 1), tuple("c", 2))
Seq.of("a", "b", "c").zipWithIndex();

// tuple((1, 2, 3), (a, b, c))
Seq.unzip(Seq.of(
    tuple(1, "a"),
    tuple(2, "b"),
    tuple(3, "c")
));

This is already a case where tuples have become very handy. When we “zip” two streams into one, we want a wrapper value type that combines both values. Classically, people might’ve used Object[] for quick-and-dirty solutions, but an array doesn’t indicate attribute types or degree.

Unfortunately, the Java compiler cannot reason about the effective bound of the <T> type in Seq<T>. This is why we can only have a static unzip() method (instead of an instance one), whose signature looks like this:

// This works
static <T1, T2> Tuple2<Seq<T1>, Seq<T2>> 
    unzip(Stream<Tuple2<T1, T2>> stream) { ... }
	
// This doesn't work:
interface Seq<T> extends Stream<T> {
    Tuple2<Seq<???>, Seq<???>> unzip();
}

Skipping and limiting

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipWhile(i -> i < 3);

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipUntil(i -> i == 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitWhile(i -> i < 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitUntil(i -> i == 3);

Other functional libraries probably use different terms than skip (e.g. drop) and limit (e.g. take). It doesn’t really matter in the end. We opted for the terms that are already present in the existing Stream API: Stream.skip() and Stream.limit()

Folding

// "abc"
Seq.of("a", "b", "c").foldLeft("", (u, t) -> t + u);

// "cba"
Seq.of("a", "b", "c").foldRight("", (t, u) -> t + u);

The Stream.reduce() operations are designed for parallelisation. This means that the functions passed to it must have these important attributes:

But sometimes, you really want to “reduce” a stream with functions that do not have the above attributes, and consequently, you probably don’t care about the reduction being parallelisable. This is where “folding” comes in.

A nice explanation about the various differences between reducing and folding (in Scala) can be seen here.

Splitting

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// tuple((1, 3, 5), (2, 4, 6))
Seq.of(1, 2, 3, 4, 5, 6).partition(i -> i % 2 != 0)

// tuple((1, 2), (3, 4, 5))
Seq.of(1, 2, 3, 4, 5).splitAt(2);

The above functions all have one thing in common: They operate on a single stream in order to produce two new streams, that can be consumed independently.

Obviously, this means that internally, some memory must be consumed to keep buffers of partially consumed streams. E.g.

  • duplication needs to keep track of all values that have been consumed in one stream, but not in the other
  • partitioning needs to fast forward to the next value that satisfies (or doesn’t satisfy) the predicate, without losing all the dropped values
  • splitting might need to fast forward to the split index

For some real functional fun, let’s have a look at a possible splitAt() implementation:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    return seq(stream)
          .zipWithIndex()
          .partition(t -> t.v2 < position)
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

… or with comments:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    // Add jOOλ functionality to the stream
    // -> local Type: Seq<T>
    return seq(stream)
	
    // Keep track of stream positions
    // with each element in the stream
    // -> local Type: Seq<Tuple2<T, Long>>
          .zipWithIndex()
	  
    // Split the streams at position
    // -> local Type: Tuple2<Seq<Tuple2<T, Long>>,
    //                       Seq<Tuple2<T, Long>>>
          .partition(t -> t.v2 < position)
		  
    // Remove the indexes from zipWithIndex again
    // -> local Type: Tuple2<Seq<T>, Seq<T>>
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

Nice, isn’t it? A possible implementation for partition(), on the other hand, is a bit more complex. Here trivially with Iterator instead of the new Spliterator:

static <T> Tuple2<Seq<T>, Seq<T>> partition(
        Stream<T> stream, 
        Predicate<? super T> predicate
) {
    final Iterator<T> it = stream.iterator();
    final LinkedList<T> buffer1 = new LinkedList<>();
    final LinkedList<T> buffer2 = new LinkedList<>();

    class Partition implements Iterator<T> {

        final boolean b;

        Partition(boolean b) {
            this.b = b;
        }

        void fetch() {
            while (buffer(b).isEmpty() && it.hasNext()) {
                T next = it.next();
                buffer(predicate.test(next)).offer(next);
            }
        }

        LinkedList<T> buffer(boolean test) {
            return test ? buffer1 : buffer2;
        }

        @Override
        public boolean hasNext() {
            fetch();
            return !buffer(b).isEmpty();
        }

        @Override
        public T next() {
            return buffer(b).poll();
        }
    }

    return tuple(
        seq(new Partition(true)), 
        seq(new Partition(false))
    );
}

I’ll let you do the exercise and verify the above code.

Get and contribute to jOOλ, now!

All of the above is part of jOOλ, available for free from GitHub. There is already a partially Java-8-ready, full-blown library called functionaljava, which goes much further than jOOλ.

Yet, we believe that all what’s missing from Java 8’s Streams API is really just a couple of methods that are very useful for sequential streams.

In a previous post, we’ve shown how we can bring lambdas to String-based SQL using a simple wrapper for JDBC (of course, we still believe that you should use jOOQ instead).

Today, we’ve shown how we can write awesome functional and sequential Stream processing very easily, with jOOλ.

Stay tuned for even more jOOλ goodness in the near future (and pull requests are very welcome, of course!)

The 10 Most Annoying Things Coming Back to Java After Some Days of Scala


So, I’m experimenting with Scala because I want to write a parser, and the Scala Parsers API seems like a really good fit. After all, I can implement the parser in Scala and wrap it behind a Java interface, so apart from an additional runtime dependency, there shouldn’t be any interoperability issues.

After a few days of getting really really used to the awesomeness of Scala syntax, here are the top 10 things I’m missing the most when going back to writing Java:

1. Multiline strings

That is my personal favourite, and a really awesome feature that should be in any language. Even PHP has it: Multiline strings. As easy as writing:

println ("""Dear reader,

If we had this feature in Java,
wouldn't that be great?

Yours Sincerely,
Lukas""")

Where is this useful? With SQL, of course! Here’s how you can run a plain SQL statement with jOOQ and Scala:

println(
  DSL.using(configuration)
     .fetch("""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            ORDER BY a.id, b.id
            """)
)

And this isn’t only good for static strings. With string interpolation, you can easily inject variables into such strings:

val predicate =
  if (someCondition)
    "AND a.id = 1"
  else
    ""

println(
  DSL.using(configuration)
      // Observe this little "s"
     .fetch(s"""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            -- This predicate is the referencing the
            -- above Scala local variable. Neat!
            WHERE 1 = 1 $predicate
            ORDER BY a.id, b.id
            """)
)

That’s pretty awesome, isn’t it? For SQL, there is a lot of potential in Scala.

jOOQ: The Best Way to Write SQL in Scala

2. Semicolons

I sincerely haven’t missed them one bit. The way I structure code (and probably the way most people structure code), Scala seems not to need semicolons at all. In JavaScript, I wouldn’t say the same thing. The interpreted and non-typesafe nature of JavaScript seems to indicate that leaving away optional syntax elements is a guarantee to shoot yourself in the foot. But not with Scala.

val a = thisIs.soMuchBetter()
val b = no.semiColons()
val c = at.theEndOfALine()

This is probably due to Scala’s type safety, which would make the compiler complain in one of those rare ambiguous situations, but that’s just an educated guess.

3. Parentheses

This is a minefield and leaving away parentheses seems dangerous in many cases. In fact, you can also leave away the dots when calling a method:

myObject method myArgument

Because of the amount of ambiguities this can generate, especially when chaining more method calls, I think that this technique should be best avoided. But in some situations, it’s just convenient to “forget” about the parens. E.g.

val s = myObject.toString

4. Type inference

This one is really annoying in Java, and it seems that many other languages have gotten it right, in the meantime. Java only has limited type inference capabilities, and things aren’t as bright as they could be.

In Scala, I could simply write

val s = myObject.toString

… and not care about the fact that s is of type String. Sometimes, but only sometimes I like to explicitly specify the type of my reference. In that case, I can still do it:

val s : String = myObject.toString

5. Case classes

I think I’d fancy writing another POJO with 40 attributes, constructors, getters, setters, equals, hashCode, and toString

— Said no one. Ever

Scala has case classes. Simple immutable pojos written in one-liners. Take the Person case class for instance:

case class Person(firstName: String, lastName: String)

I do have to write down the attributes once, agreed. But everything else should be automatic.

And how do you create an instance of such a case class? Easily, you don’t even need the new operator (in fact, it completely escapes my imagination why new is really needed in the first place):

Person("George", "Orwell")

That’s it. What else do you want to write to be Enterprise-compliant?

Side-note

OK, some people will now argue to use project lombok. Annotation-based code generation is nonsense and should be best avoided. In fact, many annotations in the Java ecosystem are simple proof of the fact that the Java language is – and will forever be – very limited in its evolution capabilities. Take @Override for instance. This should be a keyword, not an annotation. You may think it’s a cosmetic difference, but I say that Scala has proven that annotations are pretty much always the wrong tool. Or have you seen heavily annotated Scala code, recently?

6. Methods (functions!) everywhere

This one is really one of the most useful features in any language, in my opinion. Why do we always have to link a method to a specific class? Why can’t we simply have methods in any scope level? Because we can, with Scala:

// "Top-level", i.e. associated with the package
def m1(i : Int) = i + 1

object Test {

    // "Static" method in the Test instance
    def m2(i : Int) = i + 2
    
    def main(args: Array[String]): Unit = {

        // Local method in the main method
        def m3(i : Int) = i + 3
        
        println(m1(1))
        println(m2(1))
        println(m3(1))
    }
}

Right? Why shouldn’t I be able to define a local method in another method? I can do that with classes in Java:

public void method() {
    class LocalClass {}

    System.out.println(new LocalClass());
}

A local class is an inner class that is local to a method. This is hardly ever useful, but what would be really useful is are local methods.

These are also supported in JavaScript or Ceylon, by the way.

7. The REPL

Because of various language features (such as 6. Methods everywhere), Scala is a language that can easily run in a REPL. This is awesome for testing out a small algorithm or concept outside of the scope of your application.

In Java, we usually tend to do this:

public class SomeRandomClass {

    // [...]
  
    public static void main(String[] args) {
        System.out.println(SomeOtherClass.testMethod());
    }

    // [...]
}

In Scala, I would’ve just written this in the REPL:

println(SomeOtherClass.testMethod)

Notice also the always available println method. Pure gold in terms of efficient debugging.

8. Arrays are NOT (that much of) a special case

In Java, apart from primitive types, there are also those weird things we call arrays. Arrays originate from an entirely separate universe, where we have to remember quirky rules originating from the ages of Capt Kirk (or so)

array

Yes, rules like:

// Compiles but fails at runtime
Object[] arrrrr = new String[1];
arrrrr[0] = new Object();

// This works
Object[] arrrr2 = new Integer[1];
arrrr2[0] = 1; // Autoboxing

// This doesn't work
Object[] arrrr3 = new int[];

// This works
Object[] arr4[] = new Object[1][];

// So does this (initialisation):
Object[][] arr5 = { { } };

// Or this (puzzle: Why does it work?):
Object[][] arr6 = { { new int[1] } };

// But this doesn't work (assignment)
arr5 = { { } };

Yes, the list could go on. With Scala, arrays are less of a special case, syntactically speaking:

val a = new Array[String](3);
a(0) = "A"
a(1) = "B"
a(2) = "C"
a.map(v => v + ":")

// output Array(A:, B:, C:)

As you can see, arrays behave much like other collections including all the useful methods that can be used on them.

9. Symbolic method names

Now, this topic is one that is more controversial, as it reminds us of the perils of operator overloading. But every once in a while, we’d wish to have something similar. Something that allows us to write

val x = BigDecimal(3);
val y = BigDecimal(4);
val z = x * y

Very intuitively, the value of z should be BigDecimal(12). That cannot be too hard, can it? I don’t care if the implementation of * is really a method called multiply() or whatever. When writing down the method, I’d like to use what looks like a very common operator for multiplication.

By the way, I’d also like to do that with SQL. Here’s an example:

select ( 
  AUTHOR.FIRST_NAME || " " || AUTHOR.LAST_NAME,
  AUTHOR.AGE - 10
)
from AUTHOR
where AUTHOR.ID > 10
fetch

Doesn’t that make sense? We know that || means concat (in some databases). We know what the meaning of - (minus) and > (greater than) is. Why not just write it?

The above is a compiling example of jOOQ in Scala, btw.

jOOQ: The Best Way to Write SQL in Scala

Attention: Caveat

There’s always a flip side to allowing something like operator overloading or symbolic method names. It can (and will be) abused. By libraries as much as by the Scala language itself.

10. Tuples

Being a SQL person, this is again one of the features I miss most in other languages. In SQL, everything is either a TABLE or a ROW. few people actually know that, and few databases actually support this way of thinking.

Scala doesn’t have ROW types (which are really records), but at least, there are anonymous tuple types. Think of rows as tuples with named attributes, whereas case classes would be named rows:

  • Tuple: Anonymous type with typed and indexed elements
  • Row: Anonymous type with typed, named, and indexed elements
  • case class: Named type with typed and named elements

In Scala, I can just write:

// A tuple with two values
val t1 = (1, "A")

// A nested tuple
val t2 = (1, "A", (2, "B"))

In Java, a similar thing can be done, but you’ll have to write the library yourself, and you have no language support:

class Tuple2<T1, T2> {
    // Lots of bloat, see missing case classes
}

class Tuple3<T1, T2, T3> {
    // Bloat bloat bloat
}

And then:

// Yikes, no type inference...
Tuple2<Integer, String> t1 = new Tuple2<>(1, "A");

// OK, this will certainly not look nice
Tuple3<Integer, String, Tuple2<Integer, String>> t2 =
    new Tuple3<>(1, "A", new Tuple2<>(2, "B"));

jOOQ makes extensive use of the above technique to bring you SQL’s row value expressions to Java, and surprisingly, in most cases, you can do without the missing type inference as jOOQ is a fluent API where you never really assign values to local variables… An example:

DSL.using(configuration)
   .select(T1.SOME_VALUE)
   .from(T1)
   .where(
      // This ROW constructor is completely type safe
      row(T1.COL1, T1.COL2)
      .in(select(T2.A, T2.B).from(T2))
   )
   .fetch();

Conclusion

This was certainly a pro-Scala and slightly contra-Java article. Don’t get me wrong. By no means, I’d like to migrate entirely to Scala. I think that the Scala language goes way beyond what is reasonable in any useful software. There are lots of little features and gimmicks that seem nice to have, but will inevitably blow up in your face, such as:

  • implicit conversion. This is not only very hard to manage, it also slows down compilation horribly. Besides, it’s probably utterly impossible to implement semantic versioning reasonably using implicit, as it is probably not possible to foresee all possible client code breakage through accidental backwards-incompatibility.
  • local imports seem great at first, but their power quickly makes code unintelligible when people start partially importing or renaming types for a local scope.
  • symbolic method names are most often abused. Take the parser API for instance, which features method names like ^^, ^^^, ^?, or ~!

Nonetheless, I think that the advantages of Scala over Java listed in this article could all be implemented in Java as well:

  • with little risk of breaking backwards-compatibility
  • with (probably) not too big of an effort, JLS-wise
  • with a huge impact on developer productivity
  • with a huge impact on Java’s competitiveness

In any case, Java 9 will be another promising release, with hot topics like value types, declaration-site variance, specialisation (very interesting!) or ClassDynamic

With these huge changes, let’s hope there’s also some room for any of the above little improvements, that would add more immediate value to every day work.

Using Oracle AQ in Java Won’t Get Any Easier Than This


As recently announced in our newsletter, the upcoming jOOQ 3.5 will include an awesome new feature for those of you using the Oracle database: Native support for Oracle AQ! And your client code will be so easy to write, you’ll be putting those AQs all over your database immediately.

How does it work?

jOOQ rationale

The biggest reason why many of our users love jOOQ is our code generator. It generates a Java representation of your database schema, with all the relevant objects that you need when writing SQL. So far, this has included tables, sequences, user-defined-types, packages, procedures.

What’s new is that AQ objects are now also generated and associated with the generated object type.

A simple schema

Let’s consider writing this simple schema (all sources available on GitHub)

CREATE OR REPLACE TYPE book_t 
  AS OBJECT (
  ID         NUMBER(7),
  title      VARCHAR2(100 CHAR),
  language   VARCHAR2(2 CHAR)
)
/

CREATE OR REPLACE TYPE books_t 
  AS VARRAY(32) OF book_t
/

CREATE OR REPLACE TYPE author_t 
  AS OBJECT (
  ID         NUMBER(7),
  first_name VARCHAR2(100 CHAR),
  last_name  VARCHAR2(100 CHAR),
  books      books_t
)
/

CREATE OR REPLACE TYPE authors_t 
  AS VARRAY(32) OF author_t
/

BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table => 'new_author_aq_t',
    queue_payload_type => 'author_t'
  );

  DBMS_AQADM.CREATE_QUEUE(
    queue_name => 'new_author_aq',
    queue_table => 'new_author_aq_t'
  );

  DBMS_AQADM.START_QUEUE(
    queue_name => 'new_author_aq'
  );
  COMMIT;
END;
/

So, essentially, we have both OBJECT and VARRAY types for books and authors. You might prefer using TABLE types rather than VARRAY types, but for the sake of simplicity, we stick with VARRAY (as it isn’t so easy to use nested TABLE types with AQs in Oracle).

We have also created a queue that notifies listeners every time a new author is added to the database – along with their books. Imagine enqueue operations being done in a trigger on either the author or the book table.

jOOQ-generated code

When you run the jOOQ codegenerator (version 3.5 upwards) against the above schema, you’ll get a new Queues.java file, which contains:

public class Queues {
    public static final Queue<AuthorT> NEW_AUTHOR_AQ 
      = new QueueImpl<AuthorT>(
         "NEW_AUTHOR_AQ", SP, AUTHOR_T);
}

Obviously, also the previously shown OBJECT and VARRAY types are also generated by jOOQ, just like lables.

(of course, the actual naming patterns for generated Java code are completely configurable)

Using the generated artefacts

The above code is not really nicely formatted on this blog, but you don’t see any of this in your every day work. Because when you want to enqueue a message to this queue, you can simply write:

// Create a new OBJECT type with nested
// VARRAY type
AuthorT author = new AuthorT(
    1,
    "George",
    "Orwell",
    new BooksT(
        new BookT(1, "1984", "en"),
        new BookT(2, "Animal Farm", "en")
    )
);

// ... and simply enqueue that on NEW_AUTHOR_AQ
DBMS_AQ.enqueue(configuration, NEW_AUTHOR_AQ, author);

Seriously? That easy? Yes!

Compare the above to anything you’ve written before through JDBC, or using Oracle’s native APIs. You’ll find a couple of examples about how to serialise / deserialise RAW types, but frankly, queues are awesome because you can send OBJECT types through the database, and we don’t see those examples from Oracle. In fact, trust us, you don’t want to serialise OBJECT, VARRAY, or TABLE types through JDBC. You don’t. That’s our job. We’re hacking JDBC so you don’t have to.

Of course, you can also pass MESSAGE_PROPERTIES_T, ENQUEUE_OPTIONS_T, and DEQUEUE_OPTIONS_T types as arguments to the enqueue() and dequeue() methods.

Dequeuing is just as easy. The following will generate a blocking call and wait for the next AUTHOR_T message to arrive:

AuthorT author =
  DBMS_AQ.dequeue(configuration, NEW_AUTHOR_AQ);

That’s it. Can’t be that hard, can it?

jOOQ: The best way to use Oracle AQ in Java

Goodie: Java 8 and Oracle AQ

With the above simple API and Java 8, we can do what Oracle must’ve known long ago, when they renamed Oracle AQ’s marketing name to Oracle Streams. Let’s create a Java 8 Stream of AQ-produced OBJECT types with jOOQ. Easy as pie. Just write:

static <R extends UDTRecord<R>> Stream<R> stream(
    Configuration c, 
    Queue<R> queue
) {
    return Stream.generate(() -> 
        DBMS_AQ.dequeue(c, queue)
    );
}

And now, use this beauty like so:

stream(configuration, NEW_AUTHOR_AQ)
    .limit(10)
    .forEach(author -> {
        System.out.println(
            author.getFirstName() + " " +
            author.getLastName());
    });

The above statement takes the next 10 messages dequeued this way and prints them to the console.

Java 8 Friday: More Functional Relational Transformation


In the past, we’ve been providing you with a new article every Friday about what’s new in Java 8. It has been a very exciting blog series, but we would like to focus again more on our core content, which is Java and SQL. We will still be occasionally blogging about Java 8, but no longer every Friday (as some of you have already notice).

In this last, short post of the Java 8 Friday series, we’d like to re-iterate the fact that we believe that the future belongs to functional relational data transformation (as opposed to ORM). We’ve spent about 20 years now using the object-oriented software development paradigm. Many of us have been very dogmatic about it. In the last 10 years, however, a “new” paradigm has started to get increasing traction in programming communities: Functional programming.

Functional programming is not that new, however. Lisp has been a very early functional programming language. XSLT and SQL are also somewhat functional (and declarative!). As we’re big fans of SQL’s functional (and declarative!) nature, we’re quite excited about the fact that we now have sophisticated tools in Java to transform tabular data that has been extracted from SQL databases. Streams!

SQL ResultSets are very similar to Streams

As we’ve pointed out before, JDBC ResultSets and Java 8 Streams are quite similar. This is even more true when you’re using jOOQ, which replaces the JDBC ResultSet by an org.jooq.Result, which extends java.util.List, and thus automatically inherits all Streams functionality. Consider the following query that allows fetching a one-to-many relationship between BOOK and AUTHOR records:

Map<Record2<String, String>, 
    List<Record2<Integer, String>>> booksByAuthor =

// This work is performed in the database
// --------------------------------------
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()

// This work is performed in Java memory
// -------------------------------------
   .stream()

   // Group BOOKs by AUTHOR
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(AUTHOR.FIRST_NAME, 
                    AUTHOR.LAST_NAME),

        // This is the target data structure
        LinkedHashMap::new,

        // This is the value to be produced for each
        // group: A list of BOOK
        mapping(
            r -> r.into(BOOK.ID, BOOK.TITLE),
            toList()
        )
    ));

The fluency of the Java 8 Streams API is very idiomatic to someone who has been used to writing SQL with jOOQ. Obviously, you can also use something other than jOOQ, e.g. Spring’s JdbcTemplate, or Apache Commons DbUtils, or just wrap the JDBC ResultSet in an Iterator…

What’s very nice about this approach, compared to ORM is the fact that there is no magic happening at all. Every piece of mapping logic is explicit and, thanks to Java generics, fully typesafe. The type of the booksByAuthor output is complex, and a bit hard to read / write, in this example, but it is also fully descriptive and useful.

The same functional transformation with POJOs

If you aren’t too happy with using jOOQ’s Record2 tuple types, no problem. You can specify your own data transfer objects like so:

class Book {
    public int id;
    public String title;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

static class Author {
    public String firstName;
    public String lastName;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

With the above DTO, you can now leverage jOOQ’s built-in POJO mapping to transform the jOOQ records into your own domain classes:

Map<Author, List<Book>> booksByAuthor =
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(Author.class),
        LinkedHashMap::new,

        // This is the grouping value list
        mapping(
            r -> r.into(Book.class),
            toList()
        )
    ));

Explicitness vs. implicitness

At Data Geekery, we believe that a new time has started for Java developers. A time where Annotatiomania™ (finally!) ends and people stop assuming all that implicit behaviour through annotation magic. ORMs depend on a huge amount of specification to explain how each annotation works with each other annotation. It is hard to reverse-engineer (or debug!) this kind of not-so-well-understood annotation-language that JPA has brought to us.

On the flip side, SQL is pretty well understood. Tables are an easy-to-handle data structure, and if you need to transform those tables into something more object-oriented, or more hierarchically structured, you can simply apply functions to those tables and group values yourself! By grouping those values explicitly, you stay in full control of your mapping, just as with jOOQ, you stay in full control of your SQL.

This is why we believe that in the next 5 years, ORMs will lose relevance and people start embracing explicit, stateless and magicless data transformation techniques again, using Java 8 Streams.

Java 8 Friday: The Best Java 8 Resources – Your Weekend is Booked


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, method references, default methods, the Streams API, and other great stuff. You’ll find the source code on GitHub.

The Best Java 8 Resources – Your Weekend is Booked

We’re obviously not the only ones writing about Java 8. Ever since this great language update’s go live, there had been blogs all around the world appearing with great content and different perspectives on the subject. In this edition of the Java 8 Friday series, we’d like to summarise some of the best content that has been going on on that subject.

1. Brian Goetz’s Answers on Stack Overflow

Brian Goetz was the spec lead for JSR 335. Together with his Expert Group team, he has worked very hard to help Java 8 succeed. However, now that JSR 335 has shipped, his work is far from being over. Brian has had the courtesy of giving authoritative answers to questions from the Java community on Stack Overflow. Here are some of the most interesting questions:

Thumbs up to this great community effort. It cannot get any better than hearing authoritative answers from the spec lead himself.

2. Baeldung.com’s Collection of Java 8 Resources

This list of resources wouldn’t be complete without the very useful list of Java 8 resources (mostly authoritative links to specifications) from the guys over at Baeldung.com. Here is:

http://www.baeldung.com/java8

3. The jOOQ Blog’s Java 8 Friday Series

Yay, that’s us! :-)

Yes, we’ve worked hard to bring you the latest from our experience when integrating jOOQ with Java 8. Here are some of our most popular articles from the recent months:

4. ZeroTurnaround’s RebelLabs Blog

As part of the ZeroTurnaround content marketing strategy, ZeroTurnaround has launched RebelLabs quite a while ago where various writers publish interesting articles around the topic of Java, which aren’t necessarily related to JRebel and other ZT products. There is some great Java 8 related content having been published over there. Here are our favourite gems:

5. The Takipi Blog

Just like ZeroTurnaround and ourselves, our friends over at Takipi provide you with some awesome Java 8 content on their blog.

6. Benji Weber’s Fun Experiments with Java 8

This blog series we found particularly fun to read. Benji Weber really thinks outside of the box and does some crazy things with default methods, method references and all that. Things that Java developers could only dream of, so far. Here are:

7. The Geeks from Paradise Blog’s Java 8 Musings

Edwin Dalorzo from Informatech has been treating us with a variety of well-founded comparisons between Java 8 and .NET. This is particularly interesting when comparing Streams with LINQ. Here are some of his best writings:

Is this list complete?

No, it is missing many other, very interesting blog series. Do you have a series to share? We’re more than happy to update this post, just let us know (in the comments section)

Java 8 Friday: 10 Subtle Mistakes When Using the Streams API


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

10 Subtle Mistakes When Using the Streams API

We’ve done all the SQL mistakes lists:

But we haven’t done a top 10 mistakes list with Java 8 yet! For today’s occasion (it’s Friday the 13th), we’ll catch up with what will go wrong in YOUR application when you’re working with Java 8. (it won’t happen to us, as we’re stuck with Java 6 for another while)

1. Accidentally reusing streams

Wanna bet, this will happen to everyone at least once. Like the existing “streams” (e.g. InputStream), you can consume streams only once. The following code won’t work:

IntStream stream = IntStream.of(1, 2);
stream.forEach(System.out::println);

// That was fun! Let's do it again!
stream.forEach(System.out::println);

You’ll get a

java.lang.IllegalStateException: 
  stream has already been operated upon or closed

So be careful when consuming your stream. It can be done only once

2. Accidentally creating “infinite” streams

You can create infinite streams quite easily without noticing. Take the following example:

// Will run indefinitely
IntStream.iterate(0, i -> i + 1)
         .forEach(System.out::println);

The whole point of streams is the fact that they can be infinite, if you design them to be. The only problem is, that you might not have wanted that. So, be sure to always put proper limits:

// That's better
IntStream.iterate(0, i -> i + 1)
         .limit(10)
         .forEach(System.out::println);

3. Accidentally creating “subtle” infinite streams

We can’t say this enough. You WILL eventually create an infinite stream, accidentally. Take the following stream, for instance:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .distinct()
         .limit(10)
         .forEach(System.out::println);

So…

  • we generate alternating 0’s and 1’s
  • then we keep only distinct values, i.e. a single 0 and a single 1
  • then we limit the stream to a size of 10
  • then we consume it

Well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. It might expect more than that. So it’ll forever consume new values from the stream, and the limit(10) will never be reached. Tough luck, your application stalls.

4. Accidentally creating “subtle” parallel infinite streams

We really need to insist that you might accidentally try to consume an infinite stream. Let’s assume you believe that the distinct() operation should be performed in parallel. You might be writing this:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .parallel()
         .distinct()
         .limit(10)
         .forEach(System.out::println);

Now, we’ve already seen that this will turn forever. But previously, at least, you only consumed one CPU on your machine. Now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. That’s pretty bad. You can probably hard-reboot your server / development machine after that. Have a last look at what my laptop looked like prior to exploding:

If I were a laptop, this is how I'd like to go.

If I were a laptop, this is how I’d like to go.

5. Mixing up the order of operations

So, why did we insist on your definitely accidentally creating infinite streams? It’s simple. Because you may just accidentally do it. The above stream can be perfectly consumed if you switch the order of limit() and distinct():

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .limit(10)
         .distinct()
         .forEach(System.out::println);

This now yields:

0
1

Why? Because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1).

Of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). No one really wants 10 random values, and only then reduce them to be distinct.

If you’re coming from a SQL background, you might not expect such differences. Take SQL Server 2012, for instance. The following two SQL statements are the same:

-- Using TOP
SELECT DISTINCT TOP 10 *
FROM i
ORDER BY ..

-- Using FETCH
SELECT *
FROM i
ORDER BY ..
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY

So, as a SQL person, you might not be as aware of the importance of the order of streams operations.

jOOQ, the best way to write SQL in Java

6. Mixing up the order of operations (again)

Speaking of SQL, if you’re a MySQL or PostgreSQL person, you might be used to the LIMIT .. OFFSET clause. SQL is full of subtle quirks, and this is one of them. The OFFSET clause is applied FIRST, as suggested in SQL Server 2012’s (i.e. the SQL:2008 standard’s) syntax.

If you translate MySQL / PostgreSQL’s dialect directly to streams, you’ll probably get it wrong:

IntStream.iterate(0, i -> i + 1)
         .limit(10) // LIMIT
         .skip(5)   // OFFSET
         .forEach(System.out::println);

The above yields

5
6
7
8
9

Yes. It doesn’t continue after 9, because the limit() is now applied first, producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). Not what you may have intended.

BEWARE of the LIMIT .. OFFSET vs. "OFFSET .. LIMIT" trap!

7. Walking the file system with filters

We’ve blogged about this before. What appears to be a good idea is to walk the file system using filters:

Files.walk(Paths.get("."))
     .filter(p -> !p.toFile().getName().startsWith("."))
     .forEach(System.out::println);

The above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. Unfortunately, you’ve again made mistake #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. Lazily, though, but logically containing all sub-paths. Now, the filter will correctly filter out paths whose names start with a dot “.”. E.g. .git or .idea will not be part of the resulting stream. But these paths will be: .\.git\refs, or .\.idea\libraries. Not what you intended.

Now, don’t fix this by writing the following:

Files.walk(Paths.get("."))
     .filter(p -> !p.toString().contains(File.separator + "."))
     .forEach(System.out::println);

While that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories.

I guess you’ll have to resort to good old JDK 1.0 File.list() again. The good news is, FilenameFilter and FileFilter are both functional interfaces.

8. Modifying the backing collection of a stream

While you’re iterating a List, you must not modify that same list in the iteration body. That was true before Java 8, but it might become more tricky with Java 8 streams. Consider the following list from 0..9:

// Of course, we create this list using streams:
List<Integer> list = 
IntStream.range(0, 10)
         .boxed()
         .collect(toCollection(ArrayList::new));

Now, let’s assume that we want to remove each element while consuming it:

list.stream()
    // remove(Object), not remove(int)!
    .peek(list::remove)
    .forEach(System.out::println);

Interestingly enough, this will work for some of the elements! The output you might get is this one:

0
2
4
6
8
null
null
null
null
null
java.util.ConcurrentModificationException

If we introspect the list after catching that exception, there’s a funny finding. We’ll get:

[1, 3, 5, 7, 9]

Heh, it “worked” for all the odd numbers. Is this a bug? No, it looks like a feature. If you’re delving into the JDK code, you’ll find this comment in ArrayList.ArraListSpliterator:

/*
 * If ArrayLists were immutable, or structurally immutable (no
 * adds, removes, etc), we could implement their spliterators
 * with Arrays.spliterator. Instead we detect as much
 * interference during traversal as practical without
 * sacrificing much performance. We rely primarily on
 * modCounts. These are not guaranteed to detect concurrency
 * violations, and are sometimes overly conservative about
 * within-thread interference, but detect enough problems to
 * be worthwhile in practice. To carry this out, we (1) lazily
 * initialize fence and expectedModCount until the latest
 * point that we need to commit to the state we are checking
 * against; thus improving precision.  (This doesn't apply to
 * SubLists, that create spliterators with current non-lazy
 * values).  (2) We perform only a single
 * ConcurrentModificationException check at the end of forEach
 * (the most performance-sensitive method). When using forEach
 * (as opposed to iterators), we can normally only detect
 * interference after actions, not before. Further
 * CME-triggering checks apply to all other possible
 * violations of assumptions for example null or too-small
 * elementData array given its size(), that could only have
 * occurred due to interference.  This allows the inner loop
 * of forEach to run without any further checks, and
 * simplifies lambda-resolution. While this does entail a
 * number of checks, note that in the common case of
 * list.stream().forEach(a), no checks or other computation
 * occur anywhere other than inside forEach itself.  The other
 * less-often-used methods cannot take advantage of most of
 * these streamlinings.
 */

Now, check out what happens when we tell the stream to produce sorted() results:

list.stream()
    .sorted()
    .peek(list::remove)
    .forEach(System.out::println);

This will now produce the following, “expected” output

0
1
2
3
4
5
6
7
8
9

And the list after stream consumption? It is empty:

[]

So, all elements are consumed, and removed correctly. The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state. It is now “safe” to remove elements from the list!

Well… can we really? Let’s proceed with parallel(), sorted() removal:

list.stream()
    .sorted()
    .parallel()
    .peek(list::remove)
    .forEach(System.out::println);

This now yields:

7
6
2
5
8
4
1
0
9
3

And the list contains

[8]

Eek. We didn’t remove all elements!? Free beers (and jOOQ stickers) go to anyone who solves this streams puzzler!

This all appears quite random and subtle, we can only suggest that you never actually do modify a backing collection while consuming a stream. It just doesn’t work.

9. Forgetting to actually consume the stream

What do you think the following stream does?

IntStream.range(1, 5)
         .peek(System.out::println)
         .peek(i -> { 
              if (i == 5) 
                  throw new RuntimeException("bang");
          });

When you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. But that’s not correct. It won’t do anything. The stream just sits there, never having been consumed.

As with any fluent API or DSL, you might actually forget to call the “terminal” operation. This might be particularly true when you use peek(), as peek() is an aweful lot similar to forEach().

This can happen with jOOQ just the same, when you forget to call execute() or fetch():

DSL.using(configuration)
   .update(TABLE)
   .set(TABLE.COL1, 1)
   .set(TABLE.COL2, "abc")
   .where(TABLE.ID.eq(3));

Oops. No execute()

jOOQ, the best way to write SQL in Java

Yes, the “best” way – with 1-2 caveats ;-)

10. Parallel stream deadlock

This is now a real goodie for the end!

All concurrent systems can run into deadlocks, if you don’t properly synchronise things. While finding a real-world example isn’t obvious, finding a forced example is. The following parallel() stream is guaranteed to run into a deadlock:

Object[] locks = { new Object(), new Object() };

IntStream
    .range(1, 5)
    .parallel()
    .peek(Unchecked.intConsumer(i -> {
        synchronized (locks[i % locks.length]) {
            Thread.sleep(100);

            synchronized (locks[(i + 1) % locks.length]) {
                Thread.sleep(50);
            }
        }
    }))
    .forEach(System.out::println);

Note the use of Unchecked.intConsumer(), which transforms the functional IntConsumer interface into a org.jooq.lambda.fi.util.function.CheckedIntConsumer, which is allowed to throw checked exceptions.

Well. Tough luck for your machine. Those threads will be blocked forever :-)

The good news is, it has never been easier to produce a schoolbook example of a deadlock in Java!

For more details, see also Brian Goetz’s answer to this question on Stack Overflow.

Conclusion

With streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. Few of these bugs can be prevented, except through practice and staying focused. You have to think about how to order your operations. You have to think about whether your streams may be infinite.

Streams (and lambdas) are a very powerful tool. But a tool which we need to get a hang of, first.

Stay tuned for more exciting Java 8 articles on this blog.

Follow

Get every new post delivered to your Inbox.

Join 2,190 other followers

%d bloggers like this: