
Oracle LONG and LONG RAW Causing “Stream has already been closed” Exception


Like many old databases, Oracle has legacy data types, which are rather nasty to work with in everyday SQL. Usually, you don’t run into wild encounters of the LONG and LONG RAW data types anymore, but when you’re working with an old database, or with the dictionary views, you might just have to deal with LONG.

These data types are pretty much the same thing as the “newer” LOB representations:

  • LONG and CLOB are somewhat the same thing, except they aren’t
  • LONG RAW and BLOB are somewhat the same thing, except they aren’t

Reading LONG or LONG RAW from JDBC causes a “Stream has already been closed” exception

When you have the following schema:

CREATE TABLE t_long_raw_and_blob (
  id        NUMBER(7),
  blob1     BLOB,
  longx     LONG RAW,
  blob2     BLOB,

  CONSTRAINT pk_t_long_raw_and_blob PRIMARY KEY (id)
);

CREATE TABLE t_long_and_clob (
  id        NUMBER(7),
  clob1     CLOB,
  longx     LONG,
  clob2     CLOB,

  CONSTRAINT pk_t_long_and_clob PRIMARY KEY (id)
);

… you cannot simply select all columns from JDBC (or other APIs) like this:

try (PreparedStatement s = con.prepareStatement(
        "SELECT * FROM t_long_raw_and_blob");
     ResultSet rs = s.executeQuery()) {

    while (rs.next()) {
        System.out.println();
        System.out.println("ID    = " + rs.getInt(1));
        System.out.println("BLOB1 = " + rs.getBytes(2));
        System.out.println("LONGX = " + rs.getBytes(3));
        System.out.println("BLOB2 = " + rs.getBytes(4));
    }
}

If you’re doing the above, you’ll run into something along the lines of:

Caused by: java.sql.SQLException: Stream has already been closed
    at oracle.jdbc.driver.LongRawAccessor.getBytes(LongRawAccessor.java:162)
    at oracle.jdbc.driver.OracleResultSetImpl.getBytes(OracleResultSetImpl.java:708)
    ... 33 more

The “correct” solution would be to run the following instead:

try (PreparedStatement s = con.prepareStatement(
        "SELECT * FROM t_long_raw_and_blob");
     ResultSet rs = s.executeQuery()) {

    while (rs.next()) {
        byte[] longx = rs.getBytes(3);

        System.out.println();
        System.out.println("ID    = " + rs.getInt(1));
        System.out.println("BLOB1 = " + rs.getBytes(2));
        System.out.println("LONGX = " + longx);
        System.out.println("BLOB2 = " + rs.getBytes(4));
    }
}

In short: All LONG or LONG RAW columns have to be retrieved from the ResultSet prior to all the other columns.
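If you don’t know the column layout in advance, the workaround can be generalized. The following is just a sketch of a hypothetical helper, which prefetches all LONG and LONG RAW values of the current row via ResultSetMetaData, relying on their standard JDBC type constants, before any other column is accessed:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: call this right after rs.next(), before reading
// any other column. Oracle reports LONG as Types.LONGVARCHAR and
// LONG RAW as Types.LONGVARBINARY.
static Map<Integer, Object> prefetchLongColumns(ResultSet rs)
throws SQLException {
    ResultSetMetaData meta = rs.getMetaData();
    Map<Integer, Object> cache = new HashMap<>();

    for (int i = 1; i <= meta.getColumnCount(); i++) {
        switch (meta.getColumnType(i)) {
            case Types.LONGVARCHAR:   // LONG
                cache.put(i, rs.getString(i));
                break;
            case Types.LONGVARBINARY: // LONG RAW
                cache.put(i, rs.getBytes(i));
                break;
        }
    }

    return cache;
}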

That’s nasty

Indeed! Some sort of low level Oracle protocol flaw has leaked outside of the JDBC API, which is very unfortunate. We don’t care about these details. We should be able to fetch resources in any order.

In jOOQ, we’ve fixed issue #4820, so you can run your statement and order the columns in whatever order you want them to be:

DSL.using(configuration)
   .select(
       T_LONG_RAW_AND_BLOB.ID,
       T_LONG_RAW_AND_BLOB.BLOB1,
       T_LONG_RAW_AND_BLOB.LONGX,
       T_LONG_RAW_AND_BLOB.BLOB2
   )
   .from(T_LONG_RAW_AND_BLOB)
   .fetch();

jOOQ will internally reorder the columns when fetching them from the ResultSet, transparently.

Please, Java. Do Finally Support Multiline String Literals


I understand that Java-the-language is rather hard to maintain in a backwards-compatible way. I understand that JDK APIs, such as the collections, are rather tough to evolve without breaking them. Yes.

I don’t understand why Java still doesn’t have multiline string literals.

How often do you write JDBC code (or whatever other external language or markup, say, JSON or XML you want to embed in Java) like this?

try (PreparedStatement s = connection.prepareStatement(
    "SELECT * "
  + "FROM my_table "
  + "WHERE a = b "
)) {
    ...
}

What’s the issue?

  • Syntax correctness, i.e. don’t forget the whitespace character at the end of each line (a Java 8 workaround is sketched below)
  • Style in the host language vs. style in the external language: sure, the above code looks “nicely” formatted in Java, but it’s not formatted for the consuming server side
  • SQL injection: didn’t we teach our juniors not to perform this kind of string concatenation in SQL, to prevent SQL injection? Sure, the above is still safe, but what keeps a less experienced maintainer from accidentally embedding user input?
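At least the first of these issues can be mitigated in Java 8 with String.join(), which inserts the delimiter between the lines, so the trailing whitespace can no longer be forgotten. This is merely a sketch of a workaround, reusing the connection from the example above:

try (PreparedStatement s = connection.prepareStatement(
    String.join(" ",
        "SELECT *",
        "FROM my_table",
        "WHERE a = b")
)) {
    ...
}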

Today, I was working with some code written in Xtend, a very interesting language that compiles into Java source code. Xtend is exceptionally useful for templating (e.g. for generating jOOQ’s Record1 through Record22 API). I noticed another very nice feature of multiline strings:

The lack of need for escaping!

Multiline strings in Xtend are delimited by triple apostrophes. E.g.

// Xtend
val regex = '''import java\.lang\.AutoCloseable;'''

Yes, the above is a valid Java regular expression. I’m escaping the dots when matching imports of the AutoCloseable type. I don’t have to do the tedious double-escaping that ordinary strings require to tell the Java compiler that the backslash is really a backslash, not a Java escape of the following character:

// Java
String regex = "import java\\.lang\\.AutoCloseable;";

So… Translated to our original SQL example, I would really like to write this, instead:

try (PreparedStatement s = connection.prepareStatement(
    '''SELECT *
       FROM my_table
       WHERE a = b'''
)) {
    ...
}

With a big nice-to-have plus: String interpolation (even PHP has it)!

String tableName = "my_table";
int b = 1;
try (PreparedStatement s = connection.prepareStatement(
    '''SELECT *
       FROM ${tableName}
       WHERE a = ${b}'''
)) {
    ...
}

Small but very effective improvement

This would be a very small (in terms of language complexity budget: Just one new token) but very effective improvement for all of us out there who are embedding an external language (SQL, XML, XPath, Regex, you name it) in Java. We do that a lot. And we hate it.

It doesn’t have to be as powerful as Xtend’s multiline string literals (which really rock with their whitespace management for formatting, and templating expressions). But it would be a start.

Please, make this a New Year’s resolution! :)

JEP 277 “Enhanced Deprecation” is Nice. But Here’s a Much Better Alternative


Maintaining APIs is hard.

We’re maintaining the jOOQ API which is extremely complex. But we are following relatively relaxed rules as far as semantic versioning is concerned.

When I read comments by Brian Goetz and others about maintaining backwards-compatibility in the JDK, I can but show a lot of respect for their work. Obviously, we all wish that things like Vector, Stack, Hashtable were finally removed, but there are backwards-compatibility related edge cases around the collections API that ordinary mortals will never think of. For instance: Why aren’t Java Collections remove methods generic?

Better Deprecation

Stuart Marks aka Dr Deprecator

With Java 9, Jigsaw, and modularity, one of the main driving goals for the new features is to be able to “cut off” parts of the JDK and gently deprecate and remove them over the next releases. And as a part of this improvement, Stuart Marks AKA Dr Deprecator has suggested JEP 277: “Enhanced Deprecation”

The idea is to enhance the @Deprecated annotation with some additional info, such as:

  • UNSPECIFIED. This API has been deprecated without any reason having been given. This is the default value; everything that’s deprecated today implicitly has a deprecation reason of UNSPECIFIED.
  • CONDEMNED. This API is earmarked for removal in a future JDK release. Note, the use of the word “condemned” here is used in the sense of a structure that is intended to be torn down. The term is not meant to imply any moral censure.
  • DANGEROUS. Use of this API can lead to data loss, deadlock, security vulnerability, incorrect results, or loss of JVM integrity.
  • OBSOLETE. This API is no longer necessary, and usages should be removed. No replacement API exists. Note that OBSOLETE APIs might or might not be marked CONDEMNED.
  • SUPERSEDED. This API has been replaced by a newer API, and usages should be migrated away from this API to the newer API. Note that SUPERSEDED APIs might or might not be marked CONDEMNED.
  • UNIMPLEMENTED. Calling this has no effect or will unconditionally throw an exception.
  • EXPERIMENTAL. This API is not a stable part of the specification, and it may change incompatibly or disappear at any time.

When deprecating stuff, it’s important to be able to communicate the intent of the deprecation. This can be achieved as well via the @deprecated Javadoc tag, where any sort of text can be generated.

An alternative, much better solution

The above proposition suffers from the following problems:

  • It’s not extensible. The above may be enough for JDK library designers, but we as third party API providers will want to have many more elements in the enum, other than CONDEMNED, DANGEROUS, etc.
  • Still no plain text info. There is still redundancy between this annotation and the Javadoc tag, as we still cannot formally provide any text to the annotation that clarifies, e.g., the motivation of why something is “DANGEROUS”.
  • “Deprecated” is wrong. The idea of marking something UNIMPLEMENTED or EXPERIMENTAL as “deprecated” shows the workaround-y nature of this JEP, which tries to shoehorn some new functionality into existing names.

I have a feeling that the JEP is just too afraid to touch too many parts. Yet, there would be an extremely simple alternative that is much, much better for everyone:

public @interface Warning {
    String name() default "warning";
    String description() default "";
} 

There’s no need to constrain the number of possible warning types to a limited list of constants. Instead, we can have a @Warning annotation that takes any string!

Of course, the JDK could have a set of well-known string values, such as:

public interface ResultSet {

    @Deprecated
    @Warning(name="OBSOLETE")
    InputStream getUnicodeStream(int columnIndex);

}

or…

public interface Collection<E> {

    @Warning(name="OPTIONAL")
    boolean remove(Object o);
}

Notice that while JDBC’s ResultSet.getUnicodeStream() is really deprecated in the sense of being “OBSOLETE”, we could also add a hint to the Collection.remove() method, which applies only to the Collection type, not to many of its subtypes.

Now, the interesting thing with such an approach is that we could also enhance the useful @SuppressWarnings annotation, because sometimes, we simply KnowWhatWeAreDoing™, e.g. when writing things like:

Collection<Integer> collection = new ArrayList<>();

// Compiler!! Stop bitching
@SuppressWarnings("OPTIONAL")
boolean ok = collection.remove(1);

This approach would solve many problems in one go:

  • The JDK maintainers have what they want. Nice tooling for gently deprecating JDK stuff
  • The not-so-well documented mess around what’s possible to do with @SuppressWarnings would finally be a bit more clean and formal
  • We could emit tons of custom warnings to our users, depending on a variety of use-cases
  • Users could mute warnings on a very fine-grained level

For instance: A motivation for jOOQ would be to disambiguate the DSL equal() method from the unfortunate Object.equals() method:

public interface Field<T> {

    /**
     * <code>this = value</code>.
     */
    Condition equal(T value);

    /**
     * <strong>Watch out! This is 
     * {@link Object#equals(Object)}, 
     * not a jOOQ DSL feature!</strong>
     */
    @Override
    @Warning(
        name = "ACCIDENTAL_EQUALS",
        description = "Did you mean Field.equal?"
    )
    boolean equals(Object other);
}

The background of this use-case is described here:
https://github.com/jOOQ/jOOQ/issues/4763

Conclusion

JEP 277 is useful, no doubt. But it is also very limited in scope (probably not to further delay Jigsaw?). Yet, I wish this topic of generating these kinds of compiler warnings were dealt with more thoroughly by the JDK maintainers. This is a great opportunity to DoTheRightThing™.

I don’t think the above “spec” is complete. It’s just a rough idea. But I had wished for such a mechanism many, many times as an API designer, to be able to give users a hint about potential API misuse, which they can mute either via:

  • @SuppressWarnings, directly in the code.
  • Easy to implement IDE settings. It would be really simple for Eclipse, NetBeans, and IntelliJ to implement custom warning handling for these things.

Once we do have a @Warning annotation, we can perhaps finally deprecate the not-so-useful @Deprecated:

@Warning(name = "OBSOLETE")
public @interface Deprecated {
}


How to Fill Sparse Data With the Previous Non-Empty Value in SQL


The following is a very common problem in all data related technologies and we’re going to look into two very lean, SQL-based solutions for it:

How do I fill the cells of a sparse data set with the “previous non-empty value”?

The problem

The problem is really simple and I’m reusing the example provided by Stack Overflow user aljassi in this question:

We have a table containing “sparse” data:

Col1  Col2  Col3  Col4
----------------------
A     0     1     5
B     0     4     0
C     2     0     0
D     0     0     0
E     3     5     0
F     0     3     0
G     0     3     1
H     0     1     5
I     3     5     0

The above data set contains a couple of interesting data points that are non-zero, and some gaps modelled by the value zero. In other examples, we could replace zero by NULL, but it would still be the same problem. The desired result is the following:

Col1  Col2  Col3  Col4
----------------------
A     0     1     5
B     0     4     5
C     2     4     5
D     2     4     5
E     3     5     5
F     3     3     5
G     3     3     1
H     3     1     5
I     3     5     5

Note that each generated value corresponds to the most recent preceding non-zero value in its column.

How to do it with SQL? We’ll be looking at two solutions:

A solution using window functions

This is the solution you should be looking for. There are two answers in the linked Stack Overflow question that both make use of window functions, and they are roughly equivalent. Here’s how they work (using Oracle syntax):

WITH t(col1, col2, col3, col4) AS (
  SELECT 'A', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 'B', 0, 4, 0 FROM DUAL UNION ALL
  SELECT 'C', 2, 0, 0 FROM DUAL UNION ALL
  SELECT 'D', 0, 0, 0 FROM DUAL UNION ALL
  SELECT 'E', 3, 5, 0 FROM DUAL UNION ALL
  SELECT 'F', 0, 3, 0 FROM DUAL UNION ALL
  SELECT 'G', 0, 3, 1 FROM DUAL UNION ALL
  SELECT 'H', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 'I', 3, 5, 0 FROM DUAL
)
SELECT
  col1,

  nvl(last_value(nullif(col2, 0)) 
      IGNORE NULLS OVER (ORDER BY col1), 0) col2,

  nvl(last_value(nullif(col3, 0)) 
      IGNORE NULLS OVER (ORDER BY col1), 0) col3,

  nvl(last_value(nullif(col4, 0)) 
      IGNORE NULLS OVER (ORDER BY col1), 0) col4
FROM t

Now, let’s decompose these window functions:

NULLIF(colx, 0)

This is just an easy way of producing NULL values whenever we have what is an accepted “empty” value in our data set. So, instead of zeros, we just get NULL. Applying this function to our data, we’re getting:

Col1  Col2  Col3  Col4
----------------------
A     NULL  1     5
B     NULL  4     NULL
C     2     NULL  NULL
D     NULL  NULL  NULL
E     3     5     NULL
F     NULL  3     NULL
G     NULL  3     1
H     NULL  1     5
I     3     5     NULL

We’re doing this because now, we can make use of the useful IGNORE NULLS clause that is available to some window functions, specifically LAST_VALUE() or LAG(). We can now write:

last_value(...) IGNORE NULLS OVER (ORDER BY col1)

Where we take the last non-NULL value that precedes the current row when ordering rows by col1:

  • If the current row contains a non-NULL value, we’re taking that value.
  • If the current row contains a NULL value, we’re going “up” until we reach a non-NULL value
  • If we’re going “up” and we haven’t reached any non-NULL value, well, we get NULL

This leads to the following result:

Col1  Col2  Col3  Col4
----------------------
A     NULL  1     5
B     NULL  4     5
C     2     4     5
D     2     4     5
E     3     5     5
F     3     3     5
G     3     3     1
H     3     1     5
I     3     5     5

Note that with most window functions, once you specify an ORDER BY clause, then the following frame clause is taken as a default:

last_value(...) IGNORE NULLS OVER (
  ORDER BY col1
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

That’s a lot of keywords, but their meaning is not really that obscure once you get the hang of window functions. We suggest reading our previous blog posts about window functions to learn more about them.

Finally, because we don’t want those NULL values to remain in our results, we simply remove them using NVL() (or COALESCE() in other databases):

nvl(last_value(...) IGNORE NULLS OVER (...), 0)

Easy, isn’t it? Note that in this particular case, LAG() and LAST_VALUE() will have the same effect.
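As an aside, if you’re building this query with jOOQ, the window function version might be expressed roughly as follows. This is only a sketch, assuming a DSLContext called ctx, a generated table T with fields COL1 through COL4, and static imports from org.jooq.impl.DSL:

// A sketch only. nvl(), lastValue(), ignoreNulls() and nullif()
// render the Oracle syntax shown above.
ctx.select(
       T.COL1,
       nvl(lastValue(nullif(T.COL2, 0)).ignoreNulls()
           .over().orderBy(T.COL1), 0).as("col2"),
       nvl(lastValue(nullif(T.COL3, 0)).ignoreNulls()
           .over().orderBy(T.COL1), 0).as("col3"),
       nvl(lastValue(nullif(T.COL4, 0)).ignoreNulls()
           .over().orderBy(T.COL1), 0).as("col4"))
   .from(T)
   .fetch();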

A solution using the MODEL clause

Whenever you have a problem in (Oracle) SQL that starts getting hard to solve with window functions, the Oracle MODEL clause might offer an “easy” solution to it. I’m putting “easy” in quotes, because the syntax is a bit hard to remember, but the essence of it is really not that hard.

The MODEL clause is nothing other than an Oracle-specific dialect for implementing spreadsheet-like logic in the database. I highly recommend reading the relevant whitepaper by Oracle, which explains the functionality very well:

http://www.oracle.com/technetwork/middleware/bi-foundation/10gr1-twp-bi-dw-sqlmodel-131067.pdf

Here’s how you could tackle the problem with MODEL (and bear with me):

WITH t(col1, col2, col3, col4) AS (
  SELECT 'A', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 'B', 0, 4, 0 FROM DUAL UNION ALL
  SELECT 'C', 2, 0, 0 FROM DUAL UNION ALL
  SELECT 'D', 0, 0, 0 FROM DUAL UNION ALL
  SELECT 'E', 3, 5, 0 FROM DUAL UNION ALL
  SELECT 'F', 0, 3, 0 FROM DUAL UNION ALL
  SELECT 'G', 0, 3, 1 FROM DUAL UNION ALL
  SELECT 'H', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 'I', 3, 5, 0 FROM DUAL
)
SELECT * FROM t
MODEL
  DIMENSION BY (row_number() OVER (ORDER BY col1) rn)
  MEASURES (col1, col2, col3, col4)
  RULES (
    col2[any] = DECODE(col2[cv(rn)], 0, NVL(col2[cv(rn) - 1], 0), col2[cv(rn)]),
    col3[any] = DECODE(col3[cv(rn)], 0, NVL(col3[cv(rn) - 1], 0), col3[cv(rn)]),
    col4[any] = DECODE(col4[cv(rn)], 0, NVL(col4[cv(rn) - 1], 0), col4[cv(rn)])
  )

There are three clauses that are of interest here:

The DIMENSION BY clause

Like in a Microsoft Excel spreadsheet, the DIMENSION corresponds to the consecutive, distinct index of each spreadsheet cell, by which we want to access the cell. In Excel, there are always two dimensions: one written with letters (A..Z, AA..ZZ, …), the other written with numbers (1..infinity).

Using MODEL, you can specify as many dimensions as you want. In our example, we’ll only use one, the row number of each row, ordered by col1 (another use case for a window function).

The MEASURES clause

The MEASURES clause specifies the values that are available in each “cell”. In Microsoft Excel, a cell can have only one value. In Oracle’s MODEL clause, we can operate on many values at once, within a “cell”.

In this case, we’ll just make all the columns our cells.

The RULES clause

This is the really interesting part in the MODEL clause. Here, we specify by what rules we want to calculate the values of each individual cell. The syntax is simple:

RULES (
  <rule 1>,
  <rule 2>,
  ...,
  <rule N>
)

Each individual rule can implement an assignment of the form:

RULES (
  cell[dimension(s)] = rule
)

In our case, we’ll repeat the same rule for cells col2, col3, and col4, and for any value of the dimension rn (for row number). So, the left-hand side of the assignment is

RULES (
  col2[any] = rule,
  col3[any] = rule,
  col4[any] = rule
)

The right-hand side is a trivial (but not trivial-looking) expression:

DECODE(col2[cv(rn)], 0, NVL(col2[cv(rn) - 1], 0), col2[cv(rn)])

Let’s decompose again.

DECODE

DECODE is a simple and useful Oracle function that takes a first argument, compares it with the second argument, and if they’re the same, returns the third argument, otherwise the fourth. It works like a CASE expression, which is a bit more verbose:

DECODE(A, B, C, D)

-- The same as:

CASE A WHEN B THEN C ELSE D END

cv(rn)

cv() is a MODEL specific “function” that means “current value”. On the left-hand side of the assignment, we used "any" as the dimension specifier, so we’re applying this rule for “any” value of rn. In order to access a specific rn value, we’ll simply write cv(rn), or the “current value of rn”.

Recursiveness

The RULES of the MODEL clause are allowed to span a recursive tree (although not a graph, so no cycles are allowed), where each cell can be defined based on a previous cell, which is again defined based on its predecessor. We’re doing this via col2[cv(rn) - 1], where cv(rn) - 1 means the “current row number minus one”.
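For instance, take row D and col2: col2[4] is 0, so the rule evaluates NVL(col2[3], 0), i.e. row C’s value 2, which is exactly what we see in the result. Had row C contained a zero as well, its cell would itself have been computed by the same rule, walking further up until a non-zero value (or the top of the table) is reached.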

Easy, right? Granted, the syntax isn’t straightforward, and we’re only scratching the surface of what’s possible with MODEL.

Conclusion

SQL provides cool ways of implementing data-driven, declarative specifications of what your data should be like. The MODEL clause is a bit eerie, but at the same time extremely powerful. Much easier and also a bit faster are window functions, a tool that should be in the tool chain of every developer working with SQL.

In this article, we’ve shown how to fill gaps in sparse data using window functions or MODEL. A similar use-case is calculating running totals. If this article has triggered your interest, I suggest reading about the different approaches of calculating a running total in SQL.

Reactive Database Access – Part 2 – Actors


We’re very happy to continue our guest post series on the jOOQ blog by Manuel Bernhardt. In this blog series, Manuel will explain the motivation behind so-called reactive technologies, and after introducing the concepts of Futures and Actors, use them in order to access a relational database in combination with jOOQ.

Manuel Bernhardt is an independent software consultant with a passion for building web-based systems, both back-end and front-end. He is the author of “Reactive Web Applications” (Manning) and he started working with Scala, Akka and the Play Framework in 2010 after spending a long time with Java. He lives in Vienna, where he is co-organiser of the local Scala User Group. He is enthusiastic about the Scala-based technologies and the vibrant community and is looking for ways to spread its usage in the industry. He has also been scuba-diving since age 6, and can’t quite get used to the lack of sea in Austria.

This series is split in three parts, which we’ll publish over the next month.

Introduction

In our last post we introduced the concept of reactive applications, explained the merits of asynchronous programming and introduced Futures, a tool for expressing and manipulating asynchronous values.

In this post we will look into another tool for building asynchronous programs based on the concept of message-driven communication: actors.

The Actor-based concurrency model was popularized by the Erlang programming language and its most popular implementation on the JVM is the Akka concurrency toolkit.

In one way, the Actor model is object-orientation done “right”: the state of an actor can be mutable, but it is never exposed directly to the outside world. Instead, actors communicate with each other on the basis of asynchronous message-passing in which the messages themselves are immutable. An actor can only do one of three things:

  • send and receive any number of messages
  • change its behaviour or state in response to a message arriving
  • start new child actors

It is always in the hands of an actor to decide what state it is ready to share, and when to mutate it. This model therefore makes it much easier for us humans to write concurrent programs that are not riddled with race-conditions or deadlocks that we may have introduced by accidentally reading or writing outdated state or using locks as a means to avoid the latter.

In what follows we are going to see how Actors work and how to combine them with Futures.

Actor fundamentals

Actors are lightweight objects that communicate with each other by sending and receiving messages. Each actor has a mailbox in which incoming messages are queued before they get processed.

Two actors talking to each other

Actors have different states: they can be started, resumed, stopped and restarted. Resuming or restarting an actor is useful when an actor crashes, as we will see later on.

Actors also have an actor reference, which is a means for one actor to reach another. Like a phone number, the actor reference is a pointer to an actor, and if the actor were to be restarted and replaced by a new incarnation in case of a crash, it would make no difference to other actors attempting to send messages to it, since the only thing they know about the actor is its reference, not the identity of one particular incarnation.

Sending and receiving messages

Let’s start by creating a simple actor:

import akka.actor._

class Luke extends Actor {
  def receive = {
    case _ => // do nothing
  }
}

This is really all it takes to create an actor. But that’s not very interesting. Let’s spice things up a little and define a reaction to a given message:

import akka.actor._

case object RevelationOfFathership

class Luke extends Actor {
  def receive = {
    case RevelationOfFathership =>
      System.err.println("Noooooooooo")
  }
}   

Here we go! RevelationOfFathership is a case object, i.e. an immutable message. This last detail is rather important: your messages should always be self-contained and should not reference the internal state of any actor, since this would effectively leak that state to the outside, hence breaking the guarantee that only an actor can change its internal state. This guarantee is paramount for actors to offer a better, more human-friendly concurrency model without any surprises.

Now that Luke knows how to appropriately respond to the inconvenient truth that Darth Vader is his father, all we need is the dark lord himself.

import akka.actor._

class Vader extends Actor {

  override def preStart(): Unit =
    context.actorSelection("akka://application/user/luke") ! RevelationOfFathership

  def receive = {
    case _ => // ...
  }
}

The Vader actor uses the preStart lifecycle method in order to trigger sending the message to his son when he gets started up. We’re using the actor’s context in order to send a message to Luke.

The entire sequence for running this example would look as follows:

import akka.actor._

val system = ActorSystem("application")
val luke = system.actorOf(Props[Luke], name = "luke")
val vader = system.actorOf(Props[Vader], name = "vader")

The Props are a means to describe how to obtain an instance of an actor. Since they are immutable, they can be freely shared, for example across different JVMs running on different machines (which is useful when operating an Akka cluster).

Actor supervision

Actors do not merely exist in the wild, but instead are part of an actor hierarchy, and each actor has a parent. Actors that we create are supervised by the User Guardian of the application’s ActorSystem, which is a special actor provided by Akka, responsible for supervising all actors in user space. The role of a supervising actor is to decide how to deal with the failure of a child actor and to act accordingly.

The User Guardian itself is supervised by the Root Guardian (which also supervises another special actor internal to Akka), and is itself supervised by a special actor reference. Legend says that this reference was there before all other actor references came into existence and is called “the one who walks the bubbles of space-time” (if you don’t believe me, check the official Akka documentation).

Organizing actors in hierarchies offers the advantage of encoding error handling right into the hierarchy. Each parent is responsible for the actions of their children. Should something go wrong and a child crash, the parent would have the opportunity to restart it.

Vader, for example, has a few storm troopers:

import akka.actor._
import akka.routing._

class Vader extends Actor {

  val troopers: ActorRef = context.actorOf(
    RoundRobinPool(8).props(Props[StormTrooper])
  )
}

The RoundRobinPool is a means of expressing the fact that messages sent to troopers will be sent to each trooper child one after the other. Routers encode strategies for sending messages to several actors at once; Akka provides many predefined routers.

Crashed Stormtroopers

Ultimately, actors can crash, and it is then the job of the supervisor to decide what to do. The decision-making mechanism is represented by a so-called supervision strategy. For example, Vader could decide to retry restarting a storm trooper 3 times before giving up and stopping it:

import akka.actor._
import akka.routing._

class Vader extends Actor with ActorLogging {

  val troopers: ActorRef = context.actorOf(
    RoundRobinPool(8).props(Props[StormTrooper])
  )

  override def supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3) {
      case t: Throwable =>
        log.error(t, "StormTrooper down!")
        SupervisorStrategy.Restart
    }
}

This supervision strategy is rather crude since it deals with all types of Throwable in the same fashion. We will see in our next post that supervision strategies are an effective means of reacting to different types of failures in different ways.

Combining Futures and Actors

There is one golden rule of working with actors: you should not perform any blocking operations, such as a blocking network call. The reason is simple: if the actor blocks, it cannot process incoming messages, which may lead to a full mailbox (or rather, since the default mailbox used by actors is unbounded, to an OutOfMemoryError).

This is why it may be useful to be able to use Futures within actors. The pipe pattern is designed to do just that: it sends the result of a Future to an actor:

import akka.actor._
import akka.pattern.pipe
import scala.concurrent.Future

class Luke extends Actor {
  import context.dispatcher // execution context for pipeTo

  def receive = {
    case RevelationOfFathership =>
      sendTweet("Nooooooo") pipeTo self
    case tsr: TweetSendingResult =>
      // ...
  }

  def sendTweet(msg: String): Future[TweetSendingResult] = ...
}

In this example, we invoke sendTweet() upon reception of the RevelationOfFathership message and use the pipeTo method to indicate that we would like the result of the Future to be sent to ourselves.

There is just one problem with the code above: if the Future were to fail, we would receive the failed Throwable in a rather inconvenient format, wrapped in a message of type akka.actor.Status.Failure, without any useful context. This is why it may be more appropriate to recover from failures before piping the result:

import akka.actor._
import akka.pattern.pipe
import scala.concurrent.Future
import scala.util.control.NonFatal

class Luke extends Actor {
  import context.dispatcher // execution context for recover and pipeTo

  def receive = {
    case RevelationOfFathership =>
      val message = "Nooooooo"
      sendTweet(message) recover {
        case NonFatal(t) => TweetSendingFailure(message, t)
      } pipeTo self
    case tsr: TweetSendingResult =>
      // ...
    case tsf: TweetSendingFailure =>
      // ...
  }

  def sendTweet(msg: String): Future[TweetSendingResult] = ...
}

With this failure handling in place, we now know which message failed to be sent to Twitter and can take appropriate action (e.g. retry sending it).

That’s it for this short introduction to Actors. In the next and last post of this series we will see how to use Futures and Actors in combination for reactive database access.

Read on

Stay tuned as we’ll publish Part 3 shortly as a part of this series.

jOOQ Tuesdays: Rafael Winterhalter is Wrestling Byte Code with Byte Buddy


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Rafael Winterhalter

We have the pleasure of talking to Rafael Winterhalter in this seventh edition, who will be telling us about the depths of Java byte code, and about his library Byte Buddy, which makes working with byte code extremely easy.

Note that Byte Buddy won the 2015 Duke’s Choice award – congratulations on that from our side!

Hi Rafael – You’re the author of the popular Byte Buddy library. What does Byte Buddy do?

Byte Buddy is a code generation and manipulation library. It offers APIs for creating new Java classes at runtime and for changing existing classes before or after they were loaded.

At first glance, this might sound like a very esoteric thing to do, but runtime code generation is used in a large number of Java projects. Code generation tools are often used by library developers to implement aspect-oriented programming. For example, the mocking library Mockito adopted Byte Buddy to create subclasses of mocked classes at runtime. In order to implement a mock, Mockito overrides all methods of a class such that the user’s original code is not invoked when a method is called in a test. And there are plenty of other well-known users of code generation. Spring, for example, uses code generation to implement its annotation aspects such as security or transactions. And Hibernate uses code-generation to lazily load properties from getter methods by overriding those getters to query the database only if they are invoked.

Why is there a need for Byte Buddy when there are alternatives like ASM, CGLIB, AspectJ or Javassist?

Before I started working on Byte Buddy, I was involved in several other open-source projects as a contributor. As mentioned before, code generation is a typical requirement for implementing many libraries and so I got used to working with mostly CGLIB and Javassist. However, I became constantly frustrated with those libraries’ limitations and I wanted to resolve the problems I had discovered. Eventually, I started to write an alternative library that I later published as Byte Buddy.

To understand the limitations of alternative libraries, mocks are a good example use case. Mocks in Mockito were previously created using CGLIB. CGLIB is a rather mature library. It has been around for over 15 years, and when it was originally developed, the library’s developers did of course not anticipate features such as annotations, generic types or defender methods. Annotations did, however, become an important part of many APIs, which would not accept a mock instance because any annotations of overridden methods were lost. In Java, annotations on methods are never inherited when they are overridden. And annotations on types are only inherited if they are explicitly declared to be. To overcome this, Byte Buddy allows copying any annotation to a subclass, which is now a feature in Mockito 2.

In contrast, Javassist allows copying annotations, but I do not personally like the approach of the library. In Javassist, all generated code is represented as Java code contained in strings. As a result, Javassist code evolves to be just as unstructured as Java code that only describes SQL as concatenated strings. Besides creating code that is difficult to maintain, this approach also opens up vulnerabilities such as Java code injection, similar to SQL injection. It is sometimes possible to attack Javassist code by letting it compile arbitrary code, which can cause severe damage to an application.

AspectJ is a powerful tool when manipulating existing code. However, Byte Buddy lets you do anything that AspectJ is capable of, but in plain and simple Java. This way, developers do not need to learn a new syntax or programming metaphor, or install tools for their build process and IDEs. Furthermore, I do not find the join-point and point-cut terminology intuitive and decided to avoid it altogether. Instead, I decided to mimic terminology that developers already know from the Java programming language to ease the first steps with Byte Buddy.

ASM, on the other hand, is the basis on top of which Byte Buddy is implemented. ASM is a byte code parser rather than a code generation library. ASM processes single class files and does not consider type hierarchies. ASM has neither a concept of class loading nor higher-level concepts on top of byte code instructions. Byte Buddy does, however, offer an adapter that exposes the ASM API to users that require the generation of very specific code.

How does one become so involved with low-level Java?

In the beginning, I set myself the goal of only creating a version of CGLIB with annotation support, which was what I originally needed. But I quickly found out that a lot of developers were looking for the solution that Byte Buddy has become today. Therefore, I started planning to make the full feature set of the Java virtual machine accessible. For doing so, learning all the gory details and corner cases of the class file format has become a necessity to implement these features. To be fair, the class file format is fairly trivial once you get the hang of it, and I really enjoy seeing my library mature.

Between Java byte code (2GL language) and SQL (4GL language), there are many levels of programmatic abstraction. Where do you feel at home the most?

I would want to use the right tool for the right job. Obviously, I enjoy working with byte code, but I would avoid handcrafting byte code when working in a production project. In the end, this is what higher-level abstractions such as Byte Buddy are made for.

Looking at the common use cases, Byte Buddy is however often used for implementing custom features by changing code based on annotations on methods. In a way, Byte Buddy enables developers to implement their own 4GL abstraction. Declarative programming is a great abstraction for certain tasks, SQL being one of them.

You’ve become a famous speaker and domain expert in a very short time. What’s your most exciting story, being an influencer?

Mainly, I find it exciting to meet users of my library. I have met folks who implemented internal frameworks with large teams based on my software, and obviously, it makes me proud that Byte Buddy proves to be that useful.

Thank you very much, Rafael!

If you want to learn more about Rafael’s work, about byte code or about Byte Buddy, check out his talk at JavaZone.

SQL GROUP BY and Functional Dependencies: A Very Useful Feature


Relational databases define the term “Functional Dependency” as such (from Wikipedia):

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

In SQL, functional dependencies appear whenever there is a unique constraint (e.g. a primary key constraint). Let’s assume the following:

CREATE TABLE actor (
  actor_id BIGINT NOT NULL PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL
);

It can be said that FIRST_NAME and LAST_NAME each have a functional dependency on the ACTOR_ID column.

Nice. So what?

This isn’t just some mathematical statement that can be applied to unique constraints. It’s extremely useful for SQL. It means that for every ACTOR_ID value, there can be only one (functionally dependent) FIRST_NAME and LAST_NAME value. The other way round, this isn’t true. For any given FIRST_NAME and/or LAST_NAME value, we can have multiple ACTOR_ID values, as we can have multiple actors by the same names.

Because there can be only one corresponding FIRST_NAME and LAST_NAME value for any given ACTOR_ID value, we can omit those columns in the GROUP BY clause. Let’s assume also:

CREATE TABLE film_actor (
  actor_id BIGINT NOT NULL,
  film_id BIGINT NOT NULL,
  
  PRIMARY KEY (actor_id, film_id),
  FOREIGN KEY (actor_id) REFERENCES actor (actor_id),
  FOREIGN KEY (film_id) REFERENCES film (film_id)
);

Now, if we want to count the number of films per actor, we can write:

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id
ORDER BY COUNT(*) DESC

This is extremely useful as it saves us from a lot of typing. In fact, the way GROUP BY semantics is defined, we can put all sorts of column references in the SELECT clause, which are any of:

  • Column expressions that appear in the GROUP BY clause
  • Column expressions that are functionally dependent on the set of column expressions in the GROUP BY clause
  • Aggregate functions

Unfortunately, not everyone supports this

If you’re using Oracle, for instance, you can’t make use of the above. You’ll need to write the classic, equivalent version, where all the non-aggregate column expressions appearing in the SELECT clause must also appear in the GROUP BY clause:

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id, first_name, last_name
--                 ^^^^^^^^^^  ^^^^^^^^^ unnecessary
ORDER BY COUNT(*) DESC
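As an aside, with jOOQ, the portable version could be written along these lines. This is just a sketch, assuming generated classes ACTOR and FILM_ACTOR, a DSLContext called ctx, and a static import of org.jooq.impl.DSL.count:

// A sketch only. Listing the functionally dependent columns in
// groupBy() keeps the query portable across databases.
ctx.select(
       ACTOR.ACTOR_ID,
       ACTOR.FIRST_NAME,
       ACTOR.LAST_NAME,
       count())
   .from(ACTOR)
   .join(FILM_ACTOR).using(ACTOR.ACTOR_ID)
   .groupBy(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
   .orderBy(count().desc())
   .fetch();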


3 Reasons why You Shouldn’t Replace Your for-loops by Stream.forEach()


Awesome! We’re migrating our code base to Java 8. We’ll replace everything by functions. Throw out design patterns. Remove object orientation. Right! Let’s go!

Wait a minute

Java 8 has been out for over a year now, and the thrill has gone back to day-to-day business.

A non-representative study executed by baeldung.com in May 2015 finds that 38% of their readers have adopted Java 8. Prior to that, a late 2014 study by Typesafe had claimed 27% Java 8 adoption among their users.

What does it mean for your code-base?

Some Java 7 -> Java 8 migration refactorings are no-brainers. For instance, when passing a Callable to an ExecutorService:

ExecutorService s = ...

// Java 7 - meh...
Future<String> f = s.submit(
    new Callable<String>() {
        @Override
        public String call() {
            return "Hello World";
        }
    }
);

// Java 8 - of course!
Future<String> f = s.submit(() -> "Hello World");

The anonymous class style really doesn’t add any value here.

Apart from these no-brainers, there are other, less obvious topics. E.g. whether to use an external vs. an internal iterator. See also this interesting read from 2007 by Neil Gafter on the timeless topic:
http://gafter.blogspot.ch/2007/07/internal-versus-external-iterators.html

The result of the following two pieces of logic is the same:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    System.out.println(i);

// "Modern"
list.forEach(System.out::println);

I claim that the “modern” approach should be used with extreme care, i.e. only if you truly benefit from the internal, functional iteration (e.g. when chaining a set of operations via Stream’s map(), flatMap() and other operations).

Here’s a short list of cons of the “modern” approach compared to the classic one:

1. Performance – you will lose on it

Angelika Langer has wrapped up this topic well enough in her article and the related talk that she’s giving at conferences:
https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html

In many cases, performance is not critical, and you shouldn’t do any premature optimisation – so you may claim that this argument is not really an argument per se. But I will counter this attitude in this case, saying that the overhead of Stream.forEach() compared to an ordinary for loop is so significant in general that using it by default will just pile up a lot of useless CPU cycles across all of your application. If we’re talking about 10%-20% more CPU consumption just based on the choice of loop style, then we did something fundamentally wrong. Yes – individual loops don’t matter, but the load on the overall system could have been avoided.

Here’s Angelika’s benchmark result on an ordinary loop, finding the max value in a list of boxed ints:

ArrayList, for-loop : 6.55 ms
ArrayList, seq. stream: 8.33 ms

In other cases, when we’re performing relatively easy calculations on primitive data types, we absolutely SHOULD fall back to the classic for loop (and preferably to arrays, rather than collections).

Here’s Angelika’s benchmark result on an ordinary loop, finding the max value in an array of primitive ints:

int-array, for-loop : 0.36 ms
int-array, seq. stream: 5.35 ms
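To put these numbers into context, the two competing implementations have roughly the following shape, given some int[] ints (a sketch, not Angelika’s actual benchmark code):

// Classic for loop over a primitive int array
int max = Integer.MIN_VALUE;
for (int i : ints)
    if (i > max)
        max = i;

// Sequential IntStream (java.util.stream.IntStream)
int max2 = IntStream.of(ints).max().orElse(Integer.MIN_VALUE);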

Premature optimisation is not good, but cargo-culting the avoidance of premature optimisation is even worse. It’s important to reflect on what context we’re in, and to make the right decisions in such a context. We’ve blogged about performance before; see our article Top 10 Easy Performance Optimisations in Java.

2. Readability – for most people, at least

We’re software engineers. We’ll always discuss the style of our code as if it really mattered. For instance, whitespace, or curly braces.

The reason why we do so is because maintenance of software is hard. Especially of code written by someone else. A long time ago. Who probably wrote only C code before switching to Java.

Sure, in the example we’ve had so far, we don’t really have a readability issue; the two versions are probably equivalent:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    System.out.println(i);

// "Modern"
list.forEach(System.out::println);

But what happens here:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    for (int j = 0; j < i; j++)
        System.out.println(i * j);

// "Modern"
list.forEach(i -> {
    IntStream.range(0, i).forEach(j -> {
        System.out.println(i * j);
    });
});

Things start getting a bit more interesting and unusual. I’m not saying “worse”. It’s a matter of practice and of habit. And there isn’t a black/white answer to the problem. But if the rest of the code base is imperative (and it probably is), then nesting range declarations and forEach() calls, and lambdas is certainly unusual, generating cognitive friction in the team.

You can certainly construct examples where an imperative approach really feels more awkward than the equivalent functional one.

But in many situations, that’s not true, and writing the functional equivalent of something that is relatively easy to write imperatively is rather hard (and, again, inefficient). An example could be seen on this blog in a previous post:
http://blog.jooq.org/2015/09/09/how-to-use-java-8-functional-programming-to-generate-an-alphabetic-sequence/

In that post, we generated a sequence of characters:

A, B, ..., Z, AA, AB, ..., ZZ, AAA

… similar to the columns in MS Excel:

MS Excel column names

The imperative approach (originally by an unnamed user on Stack Overflow):

import static java.lang.Math.*;
 
private static String getString(int n) {
    char[] buf = new char[(int) floor(log(25 * (n + 1)) / log(26))];
    for (int i = buf.length - 1; i >= 0; i--) {
        n--;
        buf[i] = (char) ('A' + n % 26);
        n /= 26;
    }
    return new String(buf);
}

… probably outshines the functional one on a conciseness level:

import java.util.List;
 
import org.jooq.lambda.Seq;
 
public class Test {
    public static void main(String[] args) {
        int max = 3;
 
        List<String> alphabet = Seq
            .rangeClosed('A', 'Z')
            .map(Object::toString)
            .toList();
 
        Seq.rangeClosed(1, max)
           .flatMap(length ->
               Seq.rangeClosed(1, length - 1)
                  .foldLeft(Seq.seq(alphabet), (s, i) -> 
                      s.crossJoin(Seq.seq(alphabet))
                       .map(t -> t.v1 + t.v2)))
           .forEach(System.out::println);
    }
}

And this is already using jOOλ, to simplify writing functional Java.
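(For the record: getString(1) returns "A", getString(26) returns "Z", and getString(27) returns "AA". With max = 3, the functional version prints the complete sequence A..Z, AA..ZZ, AAA..ZZZ.)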

3. Maintainability

Let’s think again of our previous example. Instead of multiplying values, we divide them now.

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    for (int j = 0; j < i; j++)
        System.out.println(i / j);

// "Modern"
list.forEach(i -> {
    IntStream.range(0, i).forEach(j -> {
        System.out.println(i / j);
    });
});

Obviously, this is asking for trouble, and we can immediately see the trouble in an exception stack trace.

Old school

Exception in thread "main" java.lang.ArithmeticException: / by zero
	at Test.main(Test.java:13)

Modern

Exception in thread "main" java.lang.ArithmeticException: / by zero
	at Test.lambda$1(Test.java:18)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:557)
	at Test.lambda$0(Test.java:17)
	at java.util.Arrays$ArrayList.forEach(Arrays.java:3880)
	at Test.main(Test.java:16)

Wow. Were we just…? Yes. These are the same reasons why we’ve had performance issues in item #1 in the first place. Internal iteration is just a lot more work for the JVM and the libraries. And this is an extremely easy use-case; we could’ve displayed the same thing with the generation of the AA, AB, .., ZZ series.

From a maintenance perspective, a functional programming style can be much harder than imperative programming – especially when you blindly mix the two styles in legacy code.

Conclusion

This is usually a pro-functional programming, pro-declarative programming blog. We love lambdas. We love SQL. And combined, they can produce miracles.

But when you migrate to Java 8 and contemplate using a more functional style in your code, beware that FP is not always better – for various reasons. In fact, it is never “better”, it is just different and allows us to reason about problems differently.

We Java developers will need to practice, and come up with an intuitive understanding of when to use FP, and when to stick with OO/imperative. With the right amount of practice, combining both will help us improve our software.

Or, to put it in Uncle Bob’s terms:

The bottom, bottom line here is simply this. OO programming is good, when you know what it is. Functional programming is good when you know what it is. And functional OO programming is also good once you know what it is.

http://blog.cleancoder.com/uncle-bob/2014/11/24/FPvsOO.html

Reactive Database Access – Part 1 – Why “Async”


We’re very happy to announce a guest post series on the jOOQ blog by Manuel Bernhardt. In this blog series, Manuel will explain the motivation behind so-called reactive technologies, and after introducing the concepts of Futures and Actors, use them in order to access a relational database in combination with jOOQ.

Manuel Bernhardt is an independent software consultant with a passion for building web-based systems, both back-end and front-end. He is the author of “Reactive Web Applications” (Manning) and he started working with Scala, Akka and the Play Framework in 2010 after spending a long time with Java. He lives in Vienna, where he is co-organiser of the local Scala User Group. He is enthusiastic about the Scala-based technologies and the vibrant community and is looking for ways to spread its usage in the industry. He has also been scuba-diving since age 6, and can’t quite get used to the lack of sea in Austria.

This series is split in three parts, which we’ll publish over the next month.

Reactive?

The concept of Reactive Applications is getting increasingly popular these days, and chances are that you have already heard of it someplace on the Internet. If not, you could read the Reactive Manifesto, or we could perhaps agree to the following simple summary thereof: in a nutshell, Reactive Applications are applications that:

  • make optimal use of computational resources (in terms of CPU and memory usage) by leveraging asynchronous programming techniques
  • know how to deal with failure, degrading gracefully instead of, well, just crashing and becoming unavailable to their users
  • can adapt to intensive workloads, scaling out on several machines / nodes as load increases (and scaling back in)

Reactive Applications do not exist merely in a wild, pristine green field. At some point they will need to store and access data in order to do something meaningful and chances are that the data happens to live in a relational database.

Latency and database access

When an application talks to a database, more often than not the database server is not going to be running on the same server as the application. If you’re unlucky, it might even be that the server (or set of servers) hosting the application lives in a different data centre than the database server. Here is what this means in terms of latency:

Latency numbers every programmer should know

Say that you have an application that runs a simple SELECT query on its front page (let’s not debate whether or not this is a good idea here). If your application and database servers live in the same data centre, you are looking at a latency of the order of 500 µs (depending on how much data comes back). Now compare this to all that your CPU could do during that time (all those green and black squares on the figure above) and keep this in mind – we’ll come back to it in just a minute.

The cost of threads

Let’s suppose that you run your welcome page query in a synchronous manner (which is what JDBC does) and wait for the result to come back from the database. During all of this time, you will be monopolizing a thread that waits for the result to come back. A Java thread that just exists (without doing anything at all) can take up to 1 MB of memory, so if you use a threaded server that allocates one thread per user (I’m looking at you, Tomcat), then it is in your best interest to have quite a bit of memory available for your application in order for it to still work when featured on Hacker News (1 MB / concurrent user).
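To put a number on that: at 10,000 concurrent users, this back-of-the-envelope figure amounts to roughly 10 GB of memory spent on threads alone.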

Reactive applications such as the ones built with the Play Framework make use of a server that follows the evented server model: instead of following the “one user, one thread” mantra it will treat requests as a set of events (accessing the database would be one of these events) and run it through an event loop:

Evented server and its event loop

Such a server will not use many threads. For example, the default configuration of the Play Framework is to create one thread per CPU core, with a maximum of 24 threads in the pool. And yet, this type of server model can deal with many more concurrent requests than the threaded model given the same hardware. The trick, as it turns out, is to hand over the thread to other events when a task needs to do some waiting – or in other words: to program in an asynchronous fashion.

Painful asynchronous programming

Asynchronous programming is not really new and programming paradigms for dealing with it have been around since the 70s and have quietly evolved since then. And yet, asynchronous programming is not necessarily something that brings back happy memories to most developers. Let’s look at a few of the typical tools and their drawbacks.

Callbacks

Some languages (I’m looking at you, Javascript) have been stuck in the 70s with callbacks as their only tool for asynchronous programming up until recently (ECMAScript 6 introduced Promises). This is also known as Christmas tree programming:

Christmas tree of hell

Ho ho ho.

Threads

As a Java developer, the word asynchronous may not necessarily have a very positive connotation, and is often associated with the infamous synchronized keyword.

Working with threads in Java is hard, especially when using mutable state – it is so much more convenient to let an underlying application server abstract all of the asynchronous stuff away and not worry about it, right? Unfortunately, as we have just seen, this comes at quite a hefty cost in terms of performance.

And I mean, just look at this stack trace:

Abstraction is the mother of all trees

In one way, threaded servers are to asynchronous programming what Hibernate is to SQL – a leaky abstraction that will cost you dearly in the long run. And once you realize it, it is often too late: you are trapped in your abstraction, fighting it by all means in order to increase performance. Whilst for database access it is relatively easy to let go of the abstraction (just use plain SQL, or even better, jOOQ), for asynchronous programming the better tooling is only starting to gain in popularity.

Let’s turn to a programming model that finds its roots in functional programming: Futures.

Futures: the SQL of asynchronous programming

Futures as they can be found in Scala leverage functional programming techniques that have been around for decades in order to make asynchronous programming enjoyable again.

Future fundamentals

A scala.concurrent.Future[T] can be seen as a box that will eventually contain a value of type T if it succeeds. If it fails, the Throwable at the origin of the failure will be kept. A Future is said to have succeeded once the computation it is waiting for has yielded a result, or failed if there was an error during the computation. In either case, once the Future is done computing, it is said to be completed.

Welcome to the Futures

As soon as a Future is declared, it will start running, which means that the computation it tries to achieve will be executed asynchronously. For example, we can use the WS library of the Play Framework in order to execute a GET request against the Play Framework website:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

This call returns immediately and lets us continue to do other things. At some point in the future, the call may have been executed, in which case we can access the result and do something with it. Unlike Java’s java.util.concurrent.Future<V>, which only lets one check whether a Future is done or block while retrieving it with the get() method, Scala’s Future makes it possible to specify what to do with the result of an execution.
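
For comparison, this is roughly what the pre-Java-8 API offers – fetchBody being a hypothetical helper that performs a blocking HTTP GET:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService executor = Executors.newSingleThreadExecutor();

// fetchBody is a hypothetical helper performing a blocking HTTP GET
Future<String> response = executor.submit(
    () -> fetchBody("http://www.playframework.com"));

// We can poll for completion...
boolean done = response.isDone();

// ... or block the calling thread until the result arrives:
String body = response.get();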

Transforming Futures

Manipulating what’s inside the box is easy as well, and we do not need to wait for the result to be available in order to do so:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

val siteOnline: Future[Boolean] = 
  response.map { r =>
    r.status == 200
  }

siteOnline.foreach { isOnline =>
  if(isOnline) {
    println("The Play site is up")
  } else {
    println("The Play site is down")
  }
}

In this example, we turn our Future[WSResponse] into a Future[Boolean] by checking the status of the response. It is important to understand that this code will not block at any point: only once the response is available will a thread be made available to process it and execute the code inside the map function.

Recovering failed Futures

Failure recovery is quite convenient as well:

val response: Future[WSResponse] =
  WS.url("http://www.playframework.com").get()

val siteAvailable: Future[Option[Boolean]] = 
  response.map { r =>
    Some(r.status == 200)
  } recover {
    case ce: java.net.ConnectException => None
  }

At the very end of the chain we call the recover method, which deals with a certain type of exception and limits the damage. In this example we only handle the unfortunate case of a java.net.ConnectException by returning a None value.

Composing Futures

The killer feature of Futures is their composability. A very typical use-case when building asynchronous programming workflows is to combine the results of several concurrent operations. Futures (and Scala) make this rather easy:

def siteAvailable(url: String): Future[Boolean] =
  WS.url(url).get().map { r =>
    r.status == 200
  }

val playSiteAvailable =
  siteAvailable("http://www.playframework.com")

val playGithubAvailable =
  siteAvailable("https://github.com/playframework")

val allSitesAvailable: Future[Boolean] = for {
  siteAvailable <- playSiteAvailable
  githubAvailable <- playGithubAvailable
} yield (siteAvailable && githubAvailable)

The allSitesAvailable Future is built using a for comprehension, which waits until both Futures have completed. The two Futures playSiteAvailable and playGithubAvailable start running as soon as they are declared, and the for comprehension composes them together. And if one of those Futures were to fail, the resulting Future[Boolean] would fail immediately as a result (without waiting for the other Future to complete).
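
For Java readers: since Java 8, CompletableFuture supports a similar composition. A rough equivalent of the above, assuming a hypothetical siteAvailable method that returns a CompletableFuture<Boolean>:

import java.util.concurrent.CompletableFuture;

CompletableFuture<Boolean> play =
    siteAvailable("http://www.playframework.com");
CompletableFuture<Boolean> github =
    siteAvailable("https://github.com/playframework");

// Completes with true only if both checks yield true, and fails
// if either of the two futures fails.
CompletableFuture<Boolean> allSitesAvailable =
    play.thenCombine(github, (p, g) -> p && g);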

This is it for the first part of this series. In the next post we will look at another tool for reactive programming and then finally at how to use those tools in combination in order to access a relational database in a reactive fashion.

Read on

Stay tuned: we’ll publish Parts 2 and 3 shortly as part of this series.

A Subtle AutoCloseable Contract Change Between Java 7 and Java 8


A nice feature of the Java 7 try-with-resources statement, and of the AutoCloseable type that was introduced to work with it, is the fact that static code analysis tools can detect resource leaks. For instance, Eclipse:

Eclipse’s “potential resource leak” warning configuration

When you have the above configuration and you try running the following program, you’ll get three warnings:

public static void main(String[] args) 
throws Exception {
    Connection c = DriverManager.getConnection(
         "jdbc:h2:~/test", "sa", "");
    Statement s = c.createStatement();
    ResultSet r = s.executeQuery("SELECT 1 + 1");
    r.next();
    System.out.println(r.getInt(1));
}

The output is, trivially:

2

The warnings are issued on all of c, s, r. A quick fix (don’t do this!) is to suppress the warning using an Eclipse-specific SuppressWarnings parameter:

@SuppressWarnings("resource")
public static void main(String[] args) 
throws Exception {
    ...
}

After all, WeKnowWhatWeReDoing™ and this is just a simple example, right?

Wrong!

The right way to fix this, even for simple examples (at least on Java 7 and later), is to use the effortless try-with-resources statement:

public static void main(String[] args) 
throws Exception {
    try (Connection c = DriverManager.getConnection(
             "jdbc:h2:~/test", "sa", "");
         Statement s = c.createStatement();
         ResultSet r = s.executeQuery("SELECT 1 + 1")) {

        r.next();
        System.out.println(r.getInt(1));
    }
}

In fact, it would be great if Eclipse could auto-fix this warning and wrap all the individual statements in a try-with-resources statement. Upvote this feature request, please!

Great, we know this. What’s the deal with Java 8?

In Java 8, the contract on AutoCloseable has changed very subtly (or bluntly, depending on your point of view).

Java 7 version

A resource that must be closed when it is no longer needed.

Note the word "must".

Java 8 version

An object that may hold resources (such as file or socket handles) until it is closed. The close() method of an AutoCloseable object is called automatically when exiting a try-with-resources block for which the object has been declared in the resource specification header. This construction ensures prompt release, avoiding resource exhaustion exceptions and errors that may otherwise occur.

API Note:

It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.

In short, from Java 8 onwards, AutoCloseable is more of a hint saying that you might be using a resource that needs to be closed, but this isn’t necessarily the case.

This is similar to the Iterable contract, which doesn’t say whether you can iterate over the Iterable only once or several times; it merely imposes the contract that is required for the foreach loop.
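
A minimal sketch of that latitude: both of the following are valid Iterables, but only the first can be traversed more than once.

import java.util.Arrays;
import java.util.stream.Stream;

// A collection-backed Iterable can be iterated any number of times:
Iterable<String> reusable = Arrays.asList("a", "b");

// A Stream is single-use, so this Iterable works exactly once; a
// second for-each over it throws IllegalStateException:
Iterable<String> oneShot = Stream.of("a", "b")::iterator;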

When do we have “optionally closeable” resources?

Take jOOQ for instance. Unlike in JDBC, a jOOQ Query (which was made AutoCloseable in jOOQ 3.7) may or may not represent a resource, depending on how you execute it. By default, it is not a resource:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "")) {

    // No new resources created here:
    ResultQuery<Record> query =
        DSL.using(c).resultQuery("SELECT 1 + 1");

    // Resources created and closed immediately
    System.out.println(query.fetch());
}

The output is again:

+----+
|   2|
+----+
|   2|
+----+

But now we again have an Eclipse warning on the query variable, saying that there is a resource that needs to be closed, even though, when using jOOQ this way, we know that this isn’t true. The only resource in the above code is the JDBC Connection, and it is properly handled. The jOOQ-internal PreparedStatement and ResultSet are completely handled and eagerly closed by jOOQ.

Then, why implement AutoCloseable in the first place?

jOOQ inverts JDBC’s default behaviour.

  • In JDBC, everything is done lazily by default, and resources have to be closed explicitly.
  • In jOOQ, everything is done eagerly by default, and optionally, resources can be kept alive explicitly.

For instance, the following code will keep an open PreparedStatement and ResultSet:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "");

     // We "keep" the statement open in the ResultQuery
     ResultQuery<Record> query =
         DSL.using(c)
            .resultQuery("SELECT 1 + 1")
            .keepStatement(true)) {

    // We keep the ResultSet open in the Cursor
    try (Cursor<Record> cursor = query.fetchLazy()) {
        System.out.println(cursor.fetchOne());
    }
}

With this version, we no longer get any warnings in Eclipse, but code like the above is really the exception when using the jOOQ API.

The same thing is true for Java 8’s Stream API. Interestingly, Eclipse doesn’t issue any warnings here:

Stream<Integer> stream = Arrays.asList(1, 2, 3).stream();
stream.forEach(System.out::println);
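
By contrast, an I/O-based Stream does hold a resource, and the API note quoted above recommends try-with-resources in exactly that case. For instance, reading an arbitrary example file:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Files.lines() keeps the underlying file open until the Stream is
// closed, so here the try-with-resources statement is appropriate:
try (Stream<String> lines = Files.lines(Paths.get("/etc/hosts"))) {
    lines.forEach(System.out::println);
}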

Conclusion

Resource leak detection seems like a nice IDE / compiler feature at first. But avoiding false positives is hard. Specifically, because Java 8 changed the contract on AutoCloseable, implementors are now allowed to implement AutoCloseable for mere convenience, not as a clear indicator of a resource being present that MUST be closed.

This makes it very hard, if not impossible, for an IDE to detect resource leaks in third-party (non-JDK) APIs, whose contracts aren’t generally well known. The solution is, as so often with static code analysis tools, to simply turn off potential resource leak detection:

Eclipse’s resource leak detection switched off

For more insight, see also this Stack Overflow answer by Stuart Marks, linking to the EG’s discussions on lambda-dev.
