We’re Taking Bets: This Annotation Will Soon Show up in the JDK


This recent Stack Overflow question by Yahor has intrigued me: How to ensure at Java 8 compile time that a method signature “implements” a functional interface. It’s a very good question. Let’s assume the following nominal type:

@FunctionalInterface
interface LongHasher {
    int hash(long x);
}

The type imposes a crystal clear contract. Implementors must provide a single method named hash(), taking a long argument and returning an int value. When using lambdas or method references, the hash() method name is no longer relevant; the structural type long -> int is sufficient.

In his question, Yahor wants to enforce the above type upon three static methods (example modified by me):

class LongHashes {

    // OK
    static int xorHash(long x) {
        return (int)(x ^ (x >>> 32));
    }

    // OK
    static int continuingHash(long x) {
        return (int)(x + (x >>> 32));
    }

    // Yikes
    static int randomHash(NotLong x) {
         return xorHash(x * 0x5DEECE66DL + 0xBL);
    }
}

And he would like the Java compiler to complain in the third case, as randomHash() does not “conform” to LongHasher.

A compilation error is easy to produce, of course, by actually assigning the static methods in their functional notation (method references) to a LongHasher instance:

// OK
LongHasher good = LongHashes::xorHash;
LongHasher alsoGood = LongHashes::continuingHash;

// Yikes
LongHasher ouch = LongHashes::randomHash;

But that’s not as concise as it could / should be. The type constraint should be imposed directly on the static method.

And what’s the Java way of doing that?

With annotations, of course!

I’m going to take bets that the following pattern will show up by JDK 10:

class LongHashes {

    // Compiles
    @ReferenceableAs(LongHasher.class)
    static int xorHash(long x) {
        return (int)(x ^ (x >>> 32));
    }

    // Compiles
    @ReferenceableAs(LongHasher.class)
    static int continuingHash(long x) {
        return (int)(x + (x >>> 32));
    }

    // Doesn't compile
    @ReferenceableAs(LongHasher.class)
    static int randomHash(NotLong x) {
         return xorHash(x * 0x5DEECE66DL + 0xBL);
    }
}

In fact, you could already implement such an annotation today, and write your own annotation processor (or JSR-308 checker) to validate these methods. Looking forward to yet another great annotation!
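For what it’s worth, here’s a minimal sketch of what such an annotation declaration might look like (the retention and target choices are my assumptions; the actual validation would live in the processor):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: a processor would verify that the
// annotated method is referenceable as the given functional interface
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.METHOD)
public @interface ReferenceableAs {
    Class<?> value();
}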

So, who’s in for the bet that we’ll have this annotation by JDK 10?

Type Safe Queries for JPA’s Native Query API


When you’re using JPA – sometimes – JPQL won’t do the trick and you’ll have to resort to native SQL. From the very beginning, ORMs like Hibernate kept an open “backdoor” for these cases, offering an API similar to Spring’s JdbcTemplate, Apache DbUtils, or jOOQ’s plain SQL API. This is useful, as you can continue using your ORM as your single point of entry for database interaction.

However, writing complex, dynamic SQL using string concatenation is tedious and error-prone, and an open door for SQL injection vulnerabilities. Using a type safe API like jOOQ would be very useful, but you may find it hard to maintain two different connection, transaction, and session models within the same application, just for 10-15 native queries.

But the truth is:

You can use jOOQ for your JPA native queries!

That’s true! There are several ways to achieve this.

Fetching tuples (i.e. Object[])

The simplest way will not make use of any of JPA’s advanced features and simply fetch tuples in JPA’s native Object[] form for you. Assuming this simple utility method:

public static List<Object[]> nativeQuery(
    EntityManager em, 
    org.jooq.Query query
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(query.getSQL());

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    return result.getResultList();
}

Using the API

This is all you need to bridge the two APIs in their simplest form to run “complex” queries via an EntityManager:

List<Object[]> books =
nativeQuery(em, DSL.using(configuration)
    .select(
        AUTHOR.FIRST_NAME, 
        AUTHOR.LAST_NAME, 
        BOOK.TITLE
    )
    .from(AUTHOR)
    .join(BOOK)
        .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
    .orderBy(BOOK.ID));

books.forEach((Object[] book) -> 
    System.out.println(book[0] + " " + 
                       book[1] + " wrote " + 
                       book[2]));

Agreed, not a lot of type safety in the results – as we’re only getting an Object[]. We’re looking forward to a future Java that supports tuple (or even record) types, as Scala or Ceylon do.

So a better solution might be the following:

Fetching entities

Let’s assume you have the following, very simple entities:

@Entity
@Table(name = "book")
public class Book {

    @Id
    public int id;

    @Column(name = "title")
    public String title;

    @ManyToOne
    public Author author;
}

@Entity
@Table(name = "author")
public class Author {

    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    @OneToMany(mappedBy = "author")
    public Set<Book> books;
}

And let’s assume we’ll add an additional utility method that also passes a Class reference to the EntityManager:

public static <E> List<E> nativeQuery(
    EntityManager em, 
    org.jooq.Query query,
    Class<E> type
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(
        query.getSQL(), type);

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    // There's an unsafe cast here, but we can be sure
    // that we'll get the right type from JPA
    return result.getResultList();
}

Using the API

This is now rather slick: just put your jOOQ query into that API and get JPA entities back from it – the best of both worlds, as you can easily add/remove nested collections from the fetched entities, as if you had fetched them via JPQL:

List<Author> authors =
nativeQuery(em,
    DSL.using(configuration)
       .select()
       .from(AUTHOR)
       .orderBy(AUTHOR.ID)
, Author.class); // This is our entity class here

authors.forEach(author -> {
    System.out.println(author.firstName + " " + 
                       author.lastName + " wrote");
    
    author.books.forEach(book -> {
        System.out.println("  " + book.title);

        // Manipulate the entities here. Your
        // changes will be persistent!
    });
});

Fetching EntityResults

If you’re extra daring and have a strange affection for annotations, or if you just want to play a prank on your coworkers just before you leave on vacation, you can also resort to using JPA’s javax.persistence.SqlResultSetMapping. Imagine the following mapping declaration:

@SqlResultSetMapping(
    name = "bookmapping",
    entities = {
        @EntityResult(
            entityClass = Book.class,
            fields = {
                @FieldResult(name = "id", column = "b_id"),
                @FieldResult(name = "title", column = "b_title"),
                @FieldResult(name = "author", column = "b_author_id")
            }
        ),
        @EntityResult(
            entityClass = Author.class,
            fields = {
                @FieldResult(name = "id", column = "a_id"),
                @FieldResult(name = "firstName", column = "a_first_name"),
                @FieldResult(name = "lastName", column = "a_last_name")
            }
        )
    }
)

Essentially, the above declaration maps database columns (@SqlResultSetMapping -> entities -> @EntityResult -> fields -> @FieldResult -> column) onto entities and their corresponding attributes. With this powerful technique, you can generate entity results from any sort of SQL query result.

Again, we’ll be creating a small utility method:

public static <E> List<E> nativeQuery(
    EntityManager em, 
    org.jooq.Query query,
    String resultSetMapping
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(
        query.getSQL(), resultSetMapping);

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    // This implicit cast is a lie, but let's risk it
    return result.getResultList();
}

Note that the above API makes use of an anti-pattern (a generic method whose type parameter <E> isn’t backed by any actual type evidence), which is OK in this case, because JPA is not a type safe API in the first place.

Using the API

Now, again, you can pass your type safe jOOQ query to the EntityManager via the above API, passing the name of the SqlResultSetMapping along like so:

List<Object[]> result =
nativeQuery(em,
    DSL.using(configuration)
       .select(
           AUTHOR.ID.as("a_id"),
           AUTHOR.FIRST_NAME.as("a_first_name"),
           AUTHOR.LAST_NAME.as("a_last_name"),
           BOOK.ID.as("b_id"),
           BOOK.AUTHOR_ID.as("b_author_id"),
           BOOK.TITLE.as("b_title")
       )
       .from(AUTHOR)
       .join(BOOK).on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
       .orderBy(BOOK.ID),
    "bookmapping" // The name of the SqlResultSetMapping
);

result.forEach((Object[] entities) -> {
    Author author = (Author) entities[1];
    Book book = (Book) entities[0];

    System.out.println(author.firstName + " " + 
                       author.lastName + " wrote " + 
                       book.title);
});

The result in this case is again an Object[], but this time, the Object[] doesn’t represent a tuple with individual columns, but it represents the entities as declared by the SqlResultSetMapping annotation.

This approach is intriguing and probably has its use when you need to map arbitrary results from queries but still want managed entities. If you want to know more, we can only recommend Thorben Janssen’s interesting blog series about these advanced JPA features.

Conclusion

Choosing between an ORM and SQL (or between Hibernate and jOOQ, in particular) isn’t always easy.

  • ORMs shine when it comes to applying object graph persistence, i.e. when you have a lot of complex CRUD, involving complex locking and transaction strategies.
  • SQL shines when it comes to running bulk SQL, both for read and write operations, e.g. when running analytics and reporting.

When you’re “lucky” (as in – the job is easy), your application is only on one side of the fence, and you can make a choice between ORM and SQL. When you’re “lucky” (as in – ooooh, this is an interesting problem), you will have to use both. (See also Mike Hadlow’s interesting article on the subject)

The message here is: You can! Using JPA’s native query API, you can run complex queries leveraging the full power of your RDBMS, and still map results to JPA entities. You’re not restricted to using JPQL.

Side-note

While we’ve been critical of some aspects of JPA in the past (read How JPA 2.1 has become the new EJB 2.0 for details), our criticism has mainly focused on JPA’s (ab-)use of annotations. When you’re using a type safe API like jOOQ, you can easily provide the compiler with all the type information required to construct results. We’re convinced that a future version of JPA will engage more heavily in using Java’s type system, allowing a more fluent integration of SQL, JPQL, and entity persistence.

Do Not Make This Mistake When Developing an SPI


Most of your code is private, internal, proprietary, and will never be exposed to the public. If that’s the case, you can relax – you can refactor all of your mistakes, including those that incur breaking API changes.

If you’re maintaining a public API, however, that’s not the case. If you’re maintaining a public SPI (Service Provider Interface), things get even worse.

The H2 Trigger SPI

In a recent Stack Overflow question about how to implement an H2 database trigger with jOOQ, I encountered the org.h2.api.Trigger SPI again – a simple and easy-to-implement SPI that models trigger semantics. Here’s how triggers work in the H2 database:

Use the trigger

CREATE TRIGGER my_trigger
BEFORE UPDATE
ON my_table
FOR EACH ROW
CALL "com.example.MyTrigger"

Implement the trigger

public class MyTrigger implements Trigger {

    @Override
    public void init(
        Connection conn, 
        String schemaName,
        String triggerName, 
        String tableName, 
        boolean before, 
        int type
    )
    throws SQLException {}

    @Override
    public void fire(
        Connection conn, 
        Object[] oldRow, 
        Object[] newRow
    )
    throws SQLException {
        // Using jOOQ inside of the trigger, of course
        DSL.using(conn)
           .insertInto(LOG, LOG.FIELD1, LOG.FIELD2, ..)
           .values(newRow[0], newRow[1], ..)
           .execute();
    }

    @Override
    public void close() throws SQLException {}

    @Override
    public void remove() throws SQLException {}
}

The whole H2 Trigger SPI is actually rather elegant, and usually you only need to implement the fire() method.

So, how is this SPI wrong?

It is wrong very subtly. Consider the init() method. It has a boolean flag to indicate whether the trigger should fire before or after the triggering event, i.e. the UPDATE. What if suddenly, H2 were to also support INSTEAD OF triggers? Ideally, this flag would then be replaced by an enum:

public enum TriggerTiming {
    BEFORE,
    AFTER,
    INSTEAD_OF
}

But we can’t simply introduce this new enum type because the init() method shouldn’t be changed incompatibly, breaking all implementing code! With Java 8, we could at least declare an overload like this:

    default void init(
        Connection conn, 
        String schemaName,
        String triggerName, 
        String tableName, 
        TriggerTiming timing, 
        int type
    )
    throws SQLException {
        // New feature isn't supported by default
        if (timing == TriggerTiming.INSTEAD_OF)
            throw new SQLFeatureNotSupportedException();

        // Call through to old feature by default
        init(conn, schemaName, triggerName,
             tableName, timing == TriggerTiming.BEFORE, type);
    }

This would allow new implementations to handle INSTEAD_OF triggers while old implementations would still work. But it feels hairy, doesn’t it?

Now imagine we’d also support ENABLE / DISABLE clauses, and we want to pass those values to the init() method. Or maybe we want to handle FOR EACH ROW. There’s currently no way to do that with this SPI. So we’re going to get more and more of these overloads, which are very hard to implement. And effectively, this has happened already, as there is also org.h2.tools.TriggerAdapter, which is redundant with (but subtly different from) Trigger.

What would be a better approach, then?

The ideal approach for an SPI provider is to provide “argument objects”, like these:

public interface Trigger {
    default void init(InitArguments args)
        throws SQLException {}
    default void fire(FireArguments args)
        throws SQLException {}
    default void close(CloseArguments args)
        throws SQLException {}
    default void remove(RemoveArguments args)
        throws SQLException {}

    final class InitArguments {
        public Connection connection() { ... }
        public String schemaName() { ... }
        public String triggerName() { ... }
        public String tableName() { ... }
        /** use #timing() instead */
        @Deprecated
        public boolean before() { ... }
        public TriggerTiming timing() { ... }
        public int type() { ... }
    }

    final class FireArguments {
        public Connection connection() { ... }
        public Object[] oldRow() { ... }
        public Object[] newRow() { ... }
    }

    // These currently don't have any properties
    final class CloseArguments {}
    final class RemoveArguments {}
}

As you can see in the above example, Trigger.InitArguments has been successfully evolved with appropriate deprecation warnings. No client code was broken, and the new functionality is ready to be used, if needed. Also, close() and remove() are ready for future evolutions, even if we don’t need any arguments yet.

The overhead of this solution is at most one object allocation per method call, which shouldn’t hurt too much.
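To illustrate, here’s a sketch of what an implementation against such an SPI could look like (using the hypothetical argument classes from above, and the same imaginary LOG table as before):

public class MyTrigger implements Trigger {

    // Thanks to the default methods, only the method we
    // actually care about needs to be overridden:
    @Override
    public void fire(FireArguments args) throws SQLException {
        DSL.using(args.connection())
           .insertInto(LOG, LOG.FIELD1, LOG.FIELD2)
           .values(args.newRow()[0], args.newRow()[1])
           .execute();
    }
}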

Another example: Hibernate’s UserType

Unfortunately, this mistake happens way too often. Another prominent example is Hibernate’s hard-to-implement org.hibernate.usertype.UserType SPI:

public interface UserType {
    int[] sqlTypes();
    Class returnedClass();
    boolean equals(Object x, Object y);
    int hashCode(Object x);

    Object nullSafeGet(
        ResultSet rs, 
        String[] names, 
        SessionImplementor session, 
        Object owner
    ) throws SQLException;

    void nullSafeSet(
        PreparedStatement st, 
        Object value, 
        int index, 
        SessionImplementor session
    ) throws SQLException;

    Object deepCopy(Object value);
    boolean isMutable();
    Serializable disassemble(Object value);
    Object assemble(
        Serializable cached, 
        Object owner
    );
    Object replace(
        Object original, 
        Object target, 
        Object owner
    );
}

The SPI looks rather difficult to implement. You can probably get something working rather quickly, but will you feel at ease? Will you think that you got it right? Some examples:

  • Is there never a case where you need the owner reference also in nullSafeSet()?
  • What if your JDBC driver doesn’t support fetching values by name from ResultSet?
  • What if you need to use your user type in a CallableStatement for a stored procedure?

Another important aspect of such SPIs is the way implementors can provide values back to the framework. It is generally a bad idea to have non-void methods in SPIs, as you will never be able to change the return type of a method again. Ideally, you should have argument types that accept “outcomes”. A lot of the above methods could be replaced by a single configure() method like this:

public interface UserType {
    default void configure(ConfigureArgs args) {}

    final class ConfigureArgs {
        public void sqlTypes(int[] types) { ... }
        public void returnedClass(Class<?> clazz) { ... }
        public void mutable(boolean mutable) { ... }
    }

    // ...
}
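An implementor’s side of this hypothetical API might then look as simple as this (MyType is a made-up user type, and the remaining SPI methods are omitted):

public class MyUserType implements UserType {

    @Override
    public void configure(ConfigureArgs args) {
        // Push the "outcomes" into the argument object,
        // instead of returning them from individual methods:
        args.sqlTypes(new int[] { java.sql.Types.VARCHAR });
        args.returnedClass(MyType.class);
        args.mutable(false);
    }

    // ...
}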

Another example: a SAX ContentHandler

Have a look at this example here:

public interface ContentHandler {
    void setDocumentLocator (Locator locator);
    void startDocument ();
    void endDocument();
    void startPrefixMapping (String prefix, String uri);
    void endPrefixMapping (String prefix);
    void startElement (String uri, String localName,
                       String qName, Attributes atts);
    void endElement (String uri, String localName,
                     String qName);
    void characters (char ch[], int start, int length);
    void ignorableWhitespace (char ch[], int start, int length);
    void processingInstruction (String target, String data);
    void skippedEntity (String name);
}

Some examples for drawbacks of this SPI:

  • What if you need the attributes of an element at the endElement() event? You’ll have to remember them yourself (see the sketch after this list).
  • What if you’d like to know the prefix mapping uri at the endPrefixMapping() event? Or at any other event?
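Again, argument objects could have helped. Here’s a hypothetical sketch addressing the first drawback (EndElementArguments is my invention, not part of SAX):

public interface ContentHandler {

    default void endElement(EndElementArguments args) {}

    final class EndElementArguments {
        public String uri() { ... }
        public String localName() { ... }
        public String qName() { ... }

        // The parser could retain the start tag's attributes
        // and hand them back at the end event:
        public Attributes attributes() { ... }
    }

    // ...
}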

Clearly, SAX was optimised for speed, at a time when the JIT and the GC were still weak. Nonetheless, implementing a SAX handler is not trivial, and part of that is due to the SPI being hard to implement.

We don’t know the future

As API or SPI providers, we simply do not know the future. Right now, we may think that a given SPI is sufficient, but we’ll already break it in the next minor release. Or we won’t break it, and we’ll have to tell our users that we cannot implement these new features.

With the above tricks, we can continue evolving our SPI without incurring any breaking changes:

  • Always pass exactly one argument object to the methods.
  • Always return void. Let implementors interact with SPI state via the argument object.
  • Use Java 8’s default methods, or provide an “empty” default implementation.


How to Access a Method’s Result Value From the Finally Block


While the JVM is a stack-based machine, the Java language doesn’t really offer you any way to access that stack. Even if, on rare occasions, it would be very useful.

An example

Method result values are put on the stack. Consider the following example:

public int method() {
    if (something)
        return 1;

    ...
    if (somethingElse)
        return 2;

    ...
    return 0;
}

If we ignore the halting problem, error handling, and other academic discussions, we can say that the above method will “certainly” return one of the values 1, 2, or 0. And that value is put on the stack prior to jumping out of the method.

Now, sometimes, there may be a use-case for taking some action only when a given result value is returned. People might then be lured into starting the old flame-war discussion about whether multiple return statements are EVIL™ and whether the whole method should have been phrased like this, instead:

public int method() {
    int result = 0;

    if (something)
        result = 1;

    ...
    if (somethingElse)
        result = 2;

    ...
    // Important action here prior to return
    if (result == 1337)
        log.info("hehehe ;-)");

    return result;
}

Of course, the above example is wrong, because, previously, the if (something) return 1 and if (somethingElse) return 2 statements immediately aborted method execution. In order to achieve the same with the “single-return-statement” technique, we’ll have to rewrite our code like this:

public int method() {
    int result = 0;

    if (something)
        result = 1;
    else {

        ...
        if (somethingElse)
            result = 2;
        else {
            ...
        }
    }

    // Important action here prior to return
    if (result == 1337)
        log.info("hehehe ;-)");

    return result;
}

… and, of course, we can continue bike-shedding and flame-warring about the use of curly braces and/or indentation levels, which shows we haven’t gained anything.

Accessing the return value from the stack

What we really wanted to do in our original implementation is a check just before returning to see what value is on the stack, i.e. what value will be returned. Here’s some pseudo-Java:

public int method() {
    try {
        if (something)
            return 1;

        ...
        if (somethingElse)
            return 2;

        ...
        return 0;
    }

    // Important action here prior to return
    finally {
        if (reflectionMagic.methodResult == 1337)
            log.info("hehehe ;-)");
    }
}

The good news is: Yes we can! Here’s a simple trick that can be done to achieve the above:

public int method() {
    int result = 0;

    try {
        if (something)
            return result = 1;

        ...
        if (somethingElse)
            return result = 2;

        ...
        return result = 0;
    }

    // Important action here prior to return
    finally {
        if (result == 1337)
            log.info("hehehe ;-)");
    }
}

The less good news is: You must never forget to explicitly assign the result. But every once in a while, this technique can be very useful to “access the method stack” when the Java language doesn’t really allow you to.
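The pitfall is easy to illustrate with a sketch. If a plain return sneaks in, the finally block silently observes a stale value:

    try {
        if (something)
            // Oops, result hasn't been assigned here...
            return 1337;

        ...
        return result = 0;
    }

    finally {
        // ... so this check can never fire:
        if (result == 1337)
            log.info("hehehe ;-)");
    }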

Of course…

Of course you could also just resort to this boring solution here:

public int method() {
    int result = actualMethod();

    if (result == 1337)
        log.info("hehehe ;-)");

    return result;
}

public int actualMethod() {
    if (something)
        return 1;

    ...
    if (somethingElse)
        return 2;

    ...
    return 0;
}

… and probably, most often, this technique is indeed better (because it is slightly more readable). But sometimes, you want to do more than just logging in that finally block, or you want to access more than just the result value, and you don’t want to refactor the method.

Other approaches?

Now it’s your turn. What would be your preferred, alternative approach (with code examples)? E.g. using a Try monad? Or aspects?

Use This Preference to Speed up Your Eclipse m2e Configuration


Who doesn’t know them: the good old JFace dialogs in Eclipse that give you a visual representation of what is really a rather simple XML or properties file. In the case of m2e, it looks like this:

(Screenshot: the default m2e POM editor overview page)

Unfortunately, this screen is a bit slow to load, and it doesn’t offer much value beyond checking version numbers and some other stuff that you’ll never change again. If you’re deep into using Maven, you’ll put plugins all over the place, and there is no way to visually manage the plugins in this screen. There can’t be, because plugins may contain “schemaless” XML, for which it would be impossible to design a non-XML editor.

So, most Eclipse/Maven/m2e users will immediately jump to the pom.xml representation, i.e. the tab on the far right, to edit the XML sources of their pom.xml files.

Luckily, you can change your workspace settings and make opening the XML tab the default for your pom.xml files:

(Screenshot: the workspace preference that makes the XML tab the default page of the POM editor)

This is a setting which, I believe, should be the default in new installations of Eclipse anyway. If you agree, show this issue some love:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=465882

Functional Programming in Java 8 with Javaslang


We’re very happy to announce a guest post on the jOOQ Blog written by Daniel Dietrich, Senior Software Engineer at HSH Nordbank, husband and father of three. He currently creates a pricing framework for financial products as project lead and lead developer.

(Photo: Daniel Dietrich)

Besides his work, he is interested in programming languages, efficient algorithms, and data structures. Daniel wrote the short book Play Framework Starter on building web applications with the Play Framework for Java and Scala – and has recently been creating Javaslang, a functional component library for Java 8, which has triggered our interest in particular.


It was a really exciting moment when I heard that Java would get lambdas. The fundamental idea of using functions as a means of abstraction has its origin in the ‘lambda calculus’, 80 years ago. Now, Java developers are able to pass behavior using functions.

List<Integer> list = Arrays.asList(2, 3, 1);

// passing the comparator as lambda expression
Collections.sort(list, (i1, i2) -> i1 - i2);

Lambda expressions reduce the verbosity of Java a lot. The new Stream API closes the gap between lambdas and the Java collection library. Taking a closer look shows that parallel Streams are used rarely, or at least with caution. A Stream cannot be reused, and it is annoying that collections have to be converted back and forth.
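The reuse restriction, for instance, is easy to demonstrate (a small sketch; the second consumption fails at runtime):

Stream<Integer> stream = Stream.of(2, 3, 1);

// OK, the stream is consumed once:
stream.sorted().collect(Collectors.toList());

// Throws IllegalStateException:
// "stream has already been operated upon or closed"
stream.forEach(System.out::println);

And as for the conversions, just look at the many ways of sorting a list of integers: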

// stream a list, sort it and collect results
Arrays.asList(2, 3, 1)
  .stream()
  .sorted()
  .collect(Collectors.toList());
        
// a little bit shorter
Stream.of(2, 3, 1)
  .sorted()
  .collect(Collectors.toList());

// or better use an IntStream?
IntStream.of(2, 3, 1)
  .sorted()
  .collect(ArrayList::new, List::add, List::addAll);

// slightly simplified
IntStream.of(2, 3, 1)
  .sorted()
  .boxed()
  .collect(Collectors.toList());

Wow! These are quite some variants for sorting a list of integers. Generally we want to focus on the what rather than wrapping our heads around the how. This extra dimension of complexity isn’t necessary. Here is how to achieve the same result with Javaslang:

List.of(2, 3, 1).sort();

Typically, every object-oriented language has an imperative core, and so does Java. We control the flow of our applications using conditional statements and loops.

String getContent(String location) throws IOException {
    try {
        final URL url = new URL(location);
        if (!"http".equals(url.getProtocol())) {
            throw new UnsupportedOperationException(
                "Protocol is not http");
        }
        final URLConnection con = url.openConnection();
        final InputStream in = con.getInputStream();
        return readAndClose(in);
    } catch(Exception x) {
        throw new IOException(
            "Error loading location " + location, x);
    }
}

Functional languages have expressions instead of statements; we think in values. Lambda expressions help us transform values. Here is one example, using Javaslang’s Try:

Try<String> getContent(String location) {
    return Try
        .of(() -> new URL(location))
        .filter(url -> "http".equals(url.getProtocol()))
        .flatMap(url -> Try.of(url::openConnection))
        .flatMap(con -> Try.of(con::getInputStream))
        .map(this::readAndClose);
}

The result is either a Success containing the content or a Failure containing an exception. In general, this style is more concise than the imperative one, and it leads to robust programs we are able to reason about.
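And because a Try is just another value, the caller can keep composing on the result, using the same combinators shown above (a small sketch):

// If any step failed, the Failure is passed through unchanged,
// no try-catch needed at the call site:
Try<Integer> contentLength = getContent("http://example.com")
    .map(String::length);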

I hope this brief introduction has piqued your interest in javaslang.com! Please visit the site to learn more about functional programming with Java 8 and Javaslang.

Use this Neat Window Function Trick to Calculate Time Differences in a Time Series


Whenever you feel that itch…

Can’t I calculate this with SQL?

The answer is: Yes you can! And you should! Let’s see how…

Calculating time differences between rows

Let’s consider the following table containing timestamps (e.g. in a log database). We’re using PostgreSQL syntax for this:

CREATE TABLE timestamps (
  ts timestamp
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0'),
  ('2015-05-01 12:15:24.0'),
  ('2015-05-01 12:15:27.0'),
  ('2015-05-01 12:15:31.0'),
  ('2015-05-01 12:15:40.0'),
  ('2015-05-01 12:15:55.0'),
  ('2015-05-01 12:16:01.0'),
  ('2015-05-01 12:16:03.0'),
  ('2015-05-01 12:16:04.0'),
  ('2015-05-01 12:16:04.0');

Obviously, you’ll be adding constraints and indexes, etc. Now, let’s assume that each individual timestamp represents an event in your system, and you’d like to keep track of how long ago the previous event happened. I.e. you’d like the following result:

ts                   delta
-------------------------------
2015-05-01 12:15:23
2015-05-01 12:15:24  00:00:01
2015-05-01 12:15:27  00:00:03
2015-05-01 12:15:31  00:00:04
2015-05-01 12:15:40  00:00:09
2015-05-01 12:15:55  00:00:15
2015-05-01 12:16:01  00:00:06
2015-05-01 12:16:03  00:00:02
2015-05-01 12:16:04  00:00:01
2015-05-01 12:16:04  00:00:00

In other words

  • ts1 (12:15:23) + delta (00:00:01) = ts2 (12:15:24)
  • ts2 (12:15:24) + delta (00:00:03) = ts3 (12:15:27)

This can be achieved very easily with the LAG() window function:

SELECT 
  ts, 
  ts - lag(ts, 1) OVER (ORDER BY ts) delta
FROM timestamps
ORDER BY ts;

The above reads simply:

Give me the difference between the ts value of the current row and the ts value of the row that “lags” behind this row by one, with rows ordered by ts.

Easy, right? With LAG() you can actually access any row from another row within a “sliding window” by simply specifying the lag index.
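For instance, here’s a small sketch against the same table, lagging by two rows instead of one:

SELECT 
  ts, 
  ts - lag(ts, 2) OVER (ORDER BY ts) delta2
FROM timestamps
ORDER BY ts;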

We’ve already described this wonderful window function in a previous blog post.

Bonus: A running total interval

In addition to the difference between this timestamp and the previous one, we might be interested in the total difference between this timestamp and the first timestamp. This may sound like a running total (see our previous article about running totals using SQL), but it can be calculated much more easily using FIRST_VALUE() – a “cousin” of LAG():

SELECT 
  ts, 
  ts - lag(ts, 1) OVER w delta,
  ts - first_value(ts) OVER w total
FROM timestamps
WINDOW w AS (ORDER BY ts)
ORDER BY ts;

… the above query then yields

ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:04
2015-05-01 12:15:31  00:00:04  00:00:08
2015-05-01 12:15:40  00:00:09  00:00:17
2015-05-01 12:15:55  00:00:15  00:00:32
2015-05-01 12:16:01  00:00:06  00:00:38
2015-05-01 12:16:03  00:00:02  00:00:40
2015-05-01 12:16:04  00:00:01  00:00:41
2015-05-01 12:16:04  00:00:00  00:00:41

Extra bonus: The total since a “reset” event

We can take this as far as we want. Let’s assume that we want to reset the total from time to time:

CREATE TABLE timestamps (
  ts timestamp,
  event varchar(50)
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0', null),
  ('2015-05-01 12:15:24.0', null),
  ('2015-05-01 12:15:27.0', 'reset'),
  ('2015-05-01 12:15:31.0', null),
  ('2015-05-01 12:15:40.0', null),
  ('2015-05-01 12:15:55.0', 'reset'),
  ('2015-05-01 12:16:01.0', null),
  ('2015-05-01 12:16:03.0', null),
  ('2015-05-01 12:16:04.0', null),
  ('2015-05-01 12:16:04.0', null);

We can now run the following query:

SELECT
  ts, 
  ts - lag(ts, 1) 
       OVER (ORDER BY ts) delta,
  ts - first_value(ts) 
       OVER (PARTITION BY c ORDER BY ts) total
FROM (
  SELECT 
    COUNT(*) FILTER (WHERE EVENT = 'reset') 
             OVER (ORDER BY ts) c,
    ts
  FROM timestamps
) timestamps
ORDER BY ts;

… to produce

ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:00 <-- reset
2015-05-01 12:15:31  00:00:04  00:00:04
2015-05-01 12:15:40  00:00:09  00:00:13
2015-05-01 12:15:55  00:00:15  00:00:00 <-- reset
2015-05-01 12:16:01  00:00:06  00:00:06
2015-05-01 12:16:03  00:00:02  00:00:08
2015-05-01 12:16:04  00:00:01  00:00:09
2015-05-01 12:16:04  00:00:00  00:00:09

The beautiful part is in the derived table:

  SELECT 
    COUNT(*) FILTER (WHERE EVENT = 'reset') 
             OVER (ORDER BY ts) c,
    ts
  FROM timestamps

This derived table just assigns a partition number c to each timestamp, based on the number of “reset” events that have occurred up to (and including) that timestamp. The result of the above subquery is:

c  ts
----------------------
0  2015-05-01 12:15:23
0  2015-05-01 12:15:24
1  2015-05-01 12:15:27 <-- reset
1  2015-05-01 12:15:31
1  2015-05-01 12:15:40
2  2015-05-01 12:15:55 <-- reset
2  2015-05-01 12:16:01
2  2015-05-01 12:16:03
2  2015-05-01 12:16:04
2  2015-05-01 12:16:04

As you can see, the COUNT(*) window function counts all the previous “reset” events, ordered by timestamp. This information can then be used as the PARTITION for the FIRST_VALUE() window function in order to find the first timestamp in each partition, i.e. the timestamp of the most recent “reset” event:

  ts - first_value(ts) 
       OVER (PARTITION BY c ORDER BY ts) total

Conclusion

It’s almost a running gag on this blog to say that…

There was SQL before window functions and SQL after window functions

Window functions are extremely powerful and they’re a part of the SQL standard, supported in most commercial databases, in PostgreSQL, in Firebird 3.0, and in CUBRID. If you aren’t using them already, start using them today!

If you’ve liked this article, you can find out more about window functions in our other articles on this blog.