
Reactive Database Access – Part 1 – Why “Async”


We’re very happy to announce a guest post series on the jOOQ blog by Manuel Bernhardt. In this blog series, Manuel will explain the motivation behind so-called reactive technologies and, after introducing the concepts of Futures and Actors, use them in order to access a relational database in combination with jOOQ.

Manuel Bernhardt is an independent software consultant with a passion for building web-based systems, both back-end and front-end. He is the author of “Reactive Web Applications” (Manning) and he started working with Scala, Akka and the Play Framework in 2010 after spending a long time with Java. He lives in Vienna, where he is co-organiser of the local Scala User Group. He is enthusiastic about Scala-based technologies and the vibrant community and is looking for ways to spread their usage in the industry. He has also been scuba diving since the age of 6, and can’t quite get used to the lack of sea in Austria.

This series is split into three parts, which we’ll publish over the next month.

Reactive?

The concept of Reactive Applications is getting increasingly popular these days and chances are that you have already heard of it somewhere on the Internet. If not, you could read the Reactive Manifesto, or you could settle for the following simple summary: in a nutshell, Reactive Applications are applications that:

  • make optimal use of computational resources (in terms of CPU and memory usage) by leveraging asynchronous programming techniques
  • know how to deal with failure, degrading gracefully instead of, well, just crashing and becoming unavailable to their users
  • can adapt to intensive workloads, scaling out on several machines / nodes as load increases (and scaling back in)

Reactive Applications do not exist in a pristine green field, however. At some point they will need to store and access data in order to do something meaningful, and chances are that this data lives in a relational database.

Latency and database access

When an application talks to a database, more often than not the database server is not running on the same machine as the application. If you’re unlucky, the servers hosting the application might even live in a different data centre than the database server. Here is what this means in terms of latency:

Latency numbers every programmer should know

Say that you have an application that runs a simple SELECT query on its front page (let’s not debate whether or not this is a good idea here). If your application and database servers live in the same data centre, you are looking at a latency on the order of 500 µs (depending on how much data comes back) – at roughly one clock cycle per nanosecond, that is on the order of 500,000 CPU cycles spent waiting. Now compare this to all that your CPU could do during that time (all those green and black squares in the figure above) and keep this in mind – we’ll come back to it in just a minute.

The cost of threads

Let’s suppose that you run your welcome page query in a synchronous manner (which is what JDBC does) and wait for the result to come back from the database. During all of this time you will be monopolizing a thread that does nothing but wait for the result. A Java thread that just exists (without doing anything at all) can take up to 1 MB of memory for its stack alone, so if you use a threaded server that allocates one thread per user (I’m looking at you, Tomcat), then it is in your best interest to have quite a bit of memory available for your application in order for it to still work when featured on Hacker News: at 1 MB per concurrent user, 10,000 concurrent users already account for roughly 10 GB of stack memory.

Reactive applications such as the ones built with the Play Framework make use of a server that follows the evented server model: instead of following the “one user, one thread” mantra, it treats requests as a set of events (accessing the database would be one of these events) and runs them through an event loop:

Evented server and its event loop

Such a server will not use many threads. For example, the Play Framework’s default configuration is to create one thread per CPU core, with a maximum of 24 threads in the pool. And yet, this type of server model can deal with many more concurrent requests than the threaded model on the same hardware. The trick, as it turns out, is to hand the thread over to other events when a task needs to do some waiting – or in other words: to program in an asynchronous fashion.

Painful asynchronous programming

Asynchronous programming is not really new: programming paradigms for dealing with it have been around since the 70s and have quietly evolved ever since. And yet, asynchronous programming does not necessarily bring back happy memories to most developers. Let’s look at a few of the typical tools and their drawbacks.

Callbacks

Some languages (I’m looking at you, JavaScript) were stuck in the 70s with callbacks as their only tool for asynchronous programming until recently (ECMAScript 6 introduced Promises). This is also known as christmas tree programming:

Christmas tree of hell

Ho ho ho.

Threads

As a Java developer, the word asynchronous may not necessarily have a very positive connotation; it is often associated with the infamous synchronized keyword. Think of code like this minimal sketch of a counter guarded by intrinsic locks:
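
class Counter {
    private int count;

    // Both methods acquire the instance's intrinsic lock,
    // so concurrent increments cannot be lost
    synchronized void increment() {
        count++;
    }

    synchronized int current() {
        return count;
    }
}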

Working with threads in Java is hard, especially when using mutable state – it is so much more convenient to let an underlying application server abstract all of the asynchronous stuff away and not worry about it, right? Unfortunately, as we have just seen, this comes at quite a hefty cost in terms of performance.

And I mean, just look at this stack trace:

Abstraction is the mother of all trees

In a way, threaded servers are to asynchronous programming what Hibernate is to SQL – a leaky abstraction that will cost you dearly in the long run. And once you realize it, it is often too late: you are trapped in your abstraction, fighting it by all means in order to increase performance. Whilst for database access it is relatively easy to let go of the abstraction (just use plain SQL, or even better, jOOQ), for asynchronous programming the better tooling is only starting to gain in popularity.

Let’s turn to a programming model that finds its roots in functional programming: Futures.

Futures: the SQL of asynchronous programming

Futures as they can be found in Scala leverage functional programming techniques that have been around for decades in order to make asynchronous programming enjoyable again.

Future fundamentals

A scala.concurrent.Future[T] can be seen as a box that will eventually contain a value of type T if it succeeds. If it fails, the Throwable at the origin of the failure will be kept. A Future is said to have succeeded once the computation it is waiting for has yielded a result, or failed if there was an error during the computation. In either case, once the Future is done computing, it is said to be completed.

Welcome to the Futures

As soon as a Future is declared, it will start running, which means that the computation it tries to achieve will be executed asynchronously. For example, we can use the WS library of the Play Framework in order to execute a GET request against the Play Framework website:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

This call will return immediately and lets us continue to do other things. At some point in the future, the call may have been executed, in which case we could access the result to do something with it. Unlike Java’s java.util.concurrent.Future<V>, which only lets one check whether a Future is done or block while retrieving it with the get() method, Scala’s Future makes it possible to specify what to do with the result of an execution.
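
Java 8 readers can get a rough feel for this with java.util.concurrent.CompletableFuture – a sketch, not part of the original Scala examples, in which fetchBody() is a hypothetical blocking HTTP call:

// fetchBody() is a hypothetical helper standing in
// for a real blocking HTTP client call
CompletableFuture<String> response =
    CompletableFuture.supplyAsync(
        () -> fetchBody("http://www.playframework.com"));

// Register what to do with the result,
// instead of blocking on get():
response.thenAccept(body ->
    System.out.println(body.length()));

// Only for this demo: keep the JVM alive
// until the asynchronous call has completed
response.join();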

Transforming Futures

Manipulating what’s inside of the box is easy as well and we do not need to wait for the result to be available in order to do so:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

val siteOnline: Future[Boolean] = 
  response.map { r =>
    r.status == 200
  }

siteOnline.foreach { isOnline =>
  if(isOnline) {
    println("The Play site is up")
  } else {
    println("The Play site is down")
  }
}

In this example, we turn our Future[WSResponse] into a Future[Boolean] by checking the status of the response. It is important to understand that this code will not block at any point: only once the response is available will a thread be made available to process it and execute the code inside the map function.

Recovering failed Futures

Failure recovery is quite convenient as well:

val response: Future[WSResponse] =
  WS.url("http://www.playframework.com").get()

val siteAvailable: Future[Option[Boolean]] = 
  response.map { r =>
    Some(r.status == 200)
  } recover {
    case ce: java.net.ConnectException => None
  }

At the very end of the Future we call the recover method, which will deal with a certain type of exception and limit the damage. In this example we are only handling the unfortunate case of a java.net.ConnectException by returning a None value.

Composing Futures

The killer feature of Futures is their composability. A very typical use-case when building asynchronous programming workflows is to combine the results of several concurrent operations. Futures (and Scala) make this rather easy:

def siteAvailable(url: String): Future[Boolean] =
  WS.url(url).get().map { r =>
    r.status == 200
  }

val playSiteAvailable =
  siteAvailable("http://www.playframework.com")

val playGithubAvailable =
  siteAvailable("https://github.com/playframework")

val allSitesAvailable: Future[Boolean] = for {
  siteAvailable <- playSiteAvailable
  githubAvailable <- playGithubAvailable
} yield (siteAvailable && githubAvailable)

The allSitesAvailable Future is built using a for comprehension, which will wait until both Futures have completed. The two Futures playSiteAvailable and playGithubAvailable start running as soon as they are declared, and the for comprehension composes them together. If one of those Futures were to fail, the resulting Future[Boolean] would fail immediately as a result (without waiting for the other Future to complete).

This is it for the first part of this series. In the next post we will look at another tool for reactive programming and then finally at how to use those tools in combination in order to access a relational database in a reactive fashion.

Read on

Stay tuned as we’ll publish Parts 2 and 3 shortly as part of this series.

A Subtle AutoCloseable Contract Change Between Java 7 and Java 8


A nice feature of the Java 7 try-with-resources statement and the AutoCloseable type that was introduced to work with this statement is the fact that static code analysis tools can detect resource leaks. For instance, Eclipse:

Eclipse’s resource leak detection configuration

When you have the above configuration and you try running the following program, you’ll get three warnings:

public static void main(String[] args) 
throws Exception {
    Connection c = DriverManager.getConnection(
         "jdbc:h2:~/test", "sa", "");
    Statement s = c.createStatement();
    ResultSet r = s.executeQuery("SELECT 1 + 1");
    r.next();
    System.out.println(r.getInt(1));
}

The output is, trivially:

2

The warnings are issued on all of c, s, r. A quick fix (don’t do this!) is to suppress the warning using an Eclipse-specific SuppressWarnings parameter:

@SuppressWarnings("resource")
public static void main(String[] args) 
throws Exception {
    ...
}

After all, WeKnowWhatWeReDoing™ and this is just a simple example, right?

Wrong!

The right way to fix this, even for simple examples (at least since Java 7), is to use the effortless try-with-resources statement:

public static void main(String[] args) 
throws Exception {
    try (Connection c = DriverManager.getConnection(
             "jdbc:h2:~/test", "sa", "");
         Statement s = c.createStatement();
         ResultSet r = s.executeQuery("SELECT 1 + 1")) {

        r.next();
        System.out.println(r.getInt(1));
    }
}

In fact, it would be great if Eclipse could auto-fix this warning and wrap all the individual statements in a try-with-resources statement. Upvote this feature request, please!

Great, we know this. What’s the deal with Java 8?

In Java 8, the contract on AutoCloseable has changed very subtly (or bluntly, depending on your point of view).

Java 7 version

A resource that must be closed when it is no longer needed.

Note the word "must".

Java 8 version

An object that may hold resources (such as file or socket handles) until it is closed. The close() method of an AutoCloseable object is called automatically when exiting a try-with-resources block for which the object has been declared in the resource specification header. This construction ensures prompt release, avoiding resource exhaustion exceptions and errors that may otherwise occur.

API Note:

It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.

In short, from Java 8 onwards, AutoCloseable is more of a hint saying that you might be using a resource that needs to be closed, but this isn’t necessarily the case.

This is similar to the Iterable contract, which doesn’t specify whether you can iterate over an Iterable only once or several times, but which imposes the contract that is required for the foreach loop. The Stream API gives a handy illustration – consider this sketch of a one-shot Iterable backed by a Stream:
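
Stream<String> stream = Stream.of("a", "b");
Iterable<String> oneShot = stream::iterator;

// The foreach loop works...
for (String s : oneShot)
    System.out.println(s);

// ... but a second traversal would throw an
// IllegalStateException, as the underlying stream
// has already been operated upon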

When do we have “optionally closeable” resources?

Take jOOQ for instance. Unlike in JDBC, a jOOQ Query (which was made AutoCloseable in jOOQ 3.7) may or may not represent a resource, depending on how you execute it. By default, it is not a resource:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "")) {

    // No new resources created here:
    ResultQuery<Record> query =
        DSL.using(c).resultQuery("SELECT 1 + 1");

    // Resources created and closed immediately
    System.out.println(query.fetch());
}

The output is again:

+----+
|   2|
+----+
|   2|
+----+

But now we again get an Eclipse warning on the query variable, saying that there is a resource that needs to be closed, even though, when using jOOQ this way, we know that this isn’t true. The only resource in the above code is the JDBC Connection, and it is properly handled. The jOOQ-internal PreparedStatement and ResultSet are completely handled and eagerly closed by jOOQ.

Then, why implement AutoCloseable in the first place?

jOOQ inverts JDBC’s default behaviour.

  • In JDBC, everything is done lazily by default, and resources have to be closed explicitly.
  • In jOOQ, everything is done eagerly by default, and optionally, resources can be kept alive explicitly.

For instance, the following code will keep an open PreparedStatement and ResultSet:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "");

     // We "keep" the statement open in the ResultQuery
     ResultQuery<Record> query =
         DSL.using(c)
            .resultQuery("SELECT 1 + 1")
            .keepStatement(true)) {

    // We keep the ResultSet open in the Cursor
    try (Cursor<Record> cursor = query.fetchLazy()) {
        System.out.println(cursor.fetchOne());
    }
}

With this version, we no longer have any warnings in Eclipse, but the above version is really the exception when using the jOOQ API.

The same thing is true for Java 8’s Stream API. Interestingly, Eclipse doesn’t issue any warnings here:

Stream<Integer> stream = Arrays.asList(1, 2, 3).stream();
stream.forEach(System.out::println);

Conclusion

Resource leak detection seems like a nice IDE / compiler feature at first. But avoiding false positives is hard. Specifically, because Java 8 changed the contract on AutoCloseable, implementors are allowed to implement AutoCloseable for mere convenience, not as a clear indicator of a resource being present that MUST be closed.

This makes it very hard, if not impossible, for an IDE to detect resource leaks of third-party APIs (non-JDK APIs), whose contracts aren’t generally well-known. The solution is, as so often with static code analysis tools, to simply turn off potential resource leak detection:

Turning off potential resource leak detection in Eclipse

For more insight, see also this Stack Overflow answer by Stuart Marks, linking to the EG’s discussions on lambda-dev.

Beware of Functional Programming in Java!


This isn’t going to be a rant about functional programming, which is awesome. This is a warning about some practices that you are very likely going to apply to your code, which are terribly wrong!

Higher order functions are essential to functional programming, and thus, talking about them will help you be the center of attention at parties.

If you’re writing JavaScript, you’re doing it all the time. For instance:

setTimeout(function() {
    alert('10 Seconds passed');
}, 10000);

The above setTimeout() function is a higher-order function. It is a function that takes an anonymous function as an argument. After 10 seconds, it will call the function passed as an argument.

We can write another simple higher-order function that produces the above function as its result:

var message = function(text) {
    return function() {
        alert(text);
    }
};

setTimeout(message('10 Seconds passed'), 10000);

If you execute the above, message() will be executed, returning an anonymous function, which alerts the argument text you have passed to message().

In functional programming, the above is common practice. A function returned from a higher-order function will capture the outer scope and is able to act on this scope when called.

Why is this practice treacherous in Java?

For the same reasons. A “function” (lambda) returned from a higher-order “function” (method) will capture the outer scope and is able to act on this scope when called.

The most trivial example is given here:

class Test {
    public static void main(String[] args) {
        Runnable runnable = runnable();
        runnable.run(); // Breakpoint here
    }

    static Runnable runnable() {
        return () -> {
            System.out.println("Hello");
        };
    }
}

In the above logic, if you put a breakpoint right where the runnable.run() call is made, you can see the harmless lambda instance on the stack. A simple generated class, backing the functional interface implementation:

A harmless lambda instance on the stack

Now let’s translate this example to your average Enterprise™ application (notice the annotations), which we’ve greatly simplified to fit this blog post:

class Test {
    public static void main(String[] args) {
        Runnable runnable = new EnterpriseBean()
            .runnable();
        runnable.run(); // Breakpoint here
    }
}

@ImportantDeclaration
@NoMoreXML({
    @CoolNewValidationStuff("Annotations"),
    @CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
    Object[] enterpriseStateObject = 
        new Object[100_000_000];

    Runnable runnable() {
        return () -> {
            System.out.println("Hello");
        };
    }
}

The breakpoint is still at the same spot. What do we see on the stack?

Still a harmless little lambda instance:

Again, a harmless lambda instance on the stack

Fine. Of course. Let’s add some additional logging, just for debugging:

class Test {
    public static void main(String[] args) {
        Runnable runnable = new EnterpriseBean()
            .runnable();
        runnable.run(); // Breakpoint here
    }
}

@ImportantDeclaration
@NoMoreXML({
    @CoolNewValidationStuff("Annotations"),
    @CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
    Object[] enterpriseStateObject = 
        new Object[100_000_000];

    Runnable runnable() {
        return () -> {
            // Some harmless debugging here
            System.out.println("Hello from: " + this);
        };
    }
}

Ooops!

Suddenly, the “harmless” little this reference forced the Java compiler to capture the enclosing EnterpriseBean™ instance in the returned Runnable class:

A treacherous lambda with an enclosing instance

And with it, that heavy enterpriseStateObject came along, which can no longer be garbage collected until the call site releases the harmless little Runnable.

Sure. Just be careful, know what you’re doing, and don’t reference “this” from a lambda

… you say?

How about a more subtle version?

class EnterpriseBean {
    Object[] enterpriseStateObject = 
        new Object[100_000_000];

    Runnable runnable() {
        return () -> log(); // implicit this.log()
    }

    void log() {
        // Some harmless debugging here
        System.out.println("Hello");
    }
}

OK, this is nothing new now, is it?

Indeed, it isn’t. Java 8 doesn’t have first-class functions, and that’s OK. The idea of backing lambda expressions by nominal SAM types is quite cunning, as it allowed upgrading and lambda-y-fying all existing libraries in the Java ecosystem without changing them.

Also, with an anonymous class, this whole story would not have been surprising. The following coding style has leaked internal state via anonymous classes since good old Swing 1.0-style ActionListener et al.:

class Test {
    public static void main(String[] args) {
        Runnable runnable = new EnterpriseBean()
            .runnable();
        runnable.run();
    }
}

@ImportantDeclaration
@NoMoreXML({
    @CoolNewValidationStuff("Annotations"),
    @CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
    Object[] enterpriseStateObject = 
        new Object[100_000_000];

    Runnable runnable() {
        return new Runnable() {
            @Override
            public void run() {
                System.out.println("Hello from " + EnterpriseBean.this);
            }
        };
    }
}

What’s new? The lambda style will encourage using higher-order functions in Java, all over the place. Which is generally good – but only when the higher-order function is a static method, whose resulting types will not enclose any state.

With the above examples, however, we can see that we’re going to be debugging through a couple of memory leaks and problems in the near future, when we start embracing Java 8’s functional style programming.

So, be careful, and follow this rule:

(“Pure”) Higher order functions MUST be static methods in Java!
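
Applied to the example above, following this rule might look like the sketch below: once runnable() is static, the returned lambda can no longer capture the enclosing EnterpriseBean instance implicitly, and any accidental this reference becomes a compile error.

class EnterpriseBean {
    Object[] enterpriseStateObject = 
        new Object[100_000_000];

    // Static higher-order function: the returned lambda
    // cannot capture an EnterpriseBean instance
    static Runnable runnable() {
        return () -> System.out.println("Hello");
    }
}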

Further reading

Enclosing instances have caused issues before. Read about how the dreaded double curly braces anti-pattern has caused pain and suffering among Java developers for the last two decades.

The Danger of Subtype Polymorphism Applied to Tuples


Java 8 has lambdas and streams, but no tuples, which is a shame. This is why we have implemented tuples in jOOλ – Java 8’s missing parts. Tuples are really boring value type containers. Essentially, they’re just an enumeration of types like these:

public class Tuple2<T1, T2> {
    public final T1 v1;
    public final T2 v2;

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    // [...]
}


public class Tuple3<T1, T2, T3> {
    public final T1 v1;
    public final T2 v2;
    public final T3 v3;

    public Tuple3(T1 v1, T2 v2, T3 v3) {
        this.v1 = v1;
        this.v2 = v2;
        this.v3 = v3;
    }

    // [...]
}

Writing tuple classes is a very boring task, and it’s best done using a source code generator.

Tuples in other languages and APIs

jOOλ‘s current version features tuples of degrees 0 – 16. C# and other .NET languages have tuple types of degrees 1 – 8. There’s a special library just for tuples called Javatuples with tuples of degrees 1 to 10, and the authors went the extra mile and gave the tuples individual English names:

Unit<A> // (1 element)
Pair<A,B> // (2 elements)
Triplet<A,B,C> // (3 elements)
Quartet<A,B,C,D> // (4 elements)
Quintet<A,B,C,D,E> // (5 elements)
Sextet<A,B,C,D,E,F> // (6 elements)
Septet<A,B,C,D,E,F,G> // (7 elements)
Octet<A,B,C,D,E,F,G,H> // (8 elements)
Ennead<A,B,C,D,E,F,G,H,I> // (9 elements)
Decade<A,B,C,D,E,F,G,H,I,J> // (10 elements)

Why?

because Ennead really rings that sweet bell when I see it

Last, but not least, jOOQ also has a built-in tuple-like type, the org.jooq.Record, which serves as a base type for nice subtypes like Record7<T1, T2, T3, T4, T5, T6, T7>. jOOQ follows Scala and defines records up to a degree of 22.

Watch out when defining tuple type hierarchies

As we have seen in the previous example, Tuple3 has much code in common with Tuple2.

As we’re all massively brain-damaged by decades of object orientation and polymorphic design anti-patterns, we might think that it would be a good idea to let Tuple3<T1, T2, T3> extend Tuple2<T1, T2>, as Tuple3 just adds one more attribute to the right of Tuple2, right? So…

public class Tuple3<T1, T2, T3> extends Tuple2<T1, T2> {
    public final T3 v3;

    public Tuple3(T1 v1, T2 v2, T3 v3) {
        super(v1, v2);
        this.v3 = v3;
    }

    // [...]
}

The truth is: That’s about the worst thing you could do, for several reasons. First off, yes. Both Tuple2 and Tuple3 are tuples, so they do have some common features. It’s not a bad idea to group those features in a common super type, such as:

public class Tuple2<T1, T2> implements Tuple {
    // [...]
}

But the degree is not one of those things. Here’s why:

Permutations

Think about all the possible tuples that you can form. If you let tuples extend each other, then a Tuple5 would also be assignment-compatible with a Tuple2, for instance. The following would compile perfectly:

Tuple2<String, Integer> t2 = tuple("A", 1, 2, 3, "B");

When letting Tuple3 extend Tuple2, it may have seemed like a good default choice to just drop the right-most attribute from the tuple in the extension chain.

But in the above example, why wouldn’t I want to keep (v2, v4), such that the result is (1, 3), or maybe (v1, v3), such that the result is ("A", 2)?

There is a tremendous number of permutations of possible attributes that could be of interest when “reducing” a higher-degree tuple to a lower-degree one. There is no way that a default of dropping the right-most attribute will be sufficiently general for all use-cases.

Type systems

Much worse than the above, there would be drastic implications for the type system, if Tuple3 extended Tuple2. Check out the jOOQ API, for instance. In jOOQ, you can safely assume the following:

// Compiles:
TABLE1.COL1.in(select(TABLE2.COL1).from(TABLE2))

// Must not compile:
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

The first IN predicate is correct. The left hand side of the predicate has a single column (as opposed to being a row value expression). This means that the right hand side of the predicate must also operate on single-column expressions, e.g. a SELECT subquery that selects a single column (of the same type).

The second example selects too many columns, and the jOOQ API will tell the Java compiler that this is wrong.

This is guaranteed by jOOQ via the Field.in(Select) method, whose signature reads:

public interface Field<T> {
    ...
    Condition in(Select<? extends Record1<T>> select);
    ...
}

So, you can provide a SELECT statement that produces any subtype of the Record1<T> type.

Luckily, Record2 does not extend Record1

If now Record2 extended Record1, which might have seemed like a good idea at first, the second query would suddenly compile:

// This would now compile
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

… even though it forms an invalid SQL statement. It would compile because it would generate a Select<Record2<Type1, Type2>> type, which would be a subtype of the expected Select<? extends Record1<Type1>> from the Field.in(Select) method.

Conclusion

Tuple2 and Tuple5 types are fundamentally incompatible types. In strong type systems, you mustn’t be lured into thinking that similar or related types should also be compatible types.

Type hierarchies are something very object-oriented, and by object-oriented, I mean the flawed and over-engineered notion of object orientation that we’ve been suffering from since the 90s. Even in “the Enterprise”, most people have learned to favour Composition over Inheritance. Composition in the case of tuples means that you can easily transform a Tuple5 into a Tuple2. But you cannot assign it.

In jOOλ, such a transformation can be done very easily as follows:

// Produces (1, 3)
Tuple2<String, Integer> t2_4 = 
    tuple("A", 1, 2, 3, "B")
    .map((v1, v2, v3, v4, v5) -> tuple(v2, v4));

// Produces ("A", 2)
Tuple2<String, Integer> t1_3 = 
    tuple("A", 1, 2, 3, "B")
    .map((v1, v2, v3, v4, v5) -> tuple(v1, v3));

The idea is that you operate on immutable values, and that you can easily extract parts of those values and map / recombine them to new values.

Read more

If you’ve enjoyed reading this article, you might also like to learn why recursive generics are a terrible idea (in many situations).

A Beginner’s Guide to Using Java EE with jOOQ


Java EE ships with its own persistence API: JPA. JPA is most powerful when you want to map your RDBMS entities (tables / relations) to Java entities (classes), mostly following a 1:1 mapping strategy. The idea behind this is that business logic often isn’t really set-oriented, as relational algebra or SQL is, but record-oriented, meaning that business rules and business logic are applied to individual records.

In other words, where SQL and relational algebra are about values (tuples), JPA is about identity and state (of individual records). And this is where JPA shines, because:

Life is too short to write CRUD with SQL

But as Gavin King always said:

Gavin King on Hibernate

RDBMS are not just about CRUD

Gavin King was well aware of the OLAP hype that was going on at the time he started working on Hibernate, the most popular JPA implementation. Business intelligence, or data science as it is called nowadays, relies on much more advanced functionality than simple CRUD – functionality that has never been targeted by the JPA specification, or by its implementations.

In fact, you don’t necessarily have to do OLAP to benefit from native SQL; simpler use-cases in more ordinary OLTP environments arise as well, such as:

  • Reporting
  • Batch and bulk data processing
  • Queries with complex business rules

While JPA offers JPQL and Criteria API, which will help you express some amount of complexity in your queries, you will eventually be limited by the features offered in those languages and APIs, as Michael Simons has recently documented in an interesting Criteria API to jOOQ comparison.

For this reason, all JPA implementations offer a way to query the database using “native SQL”. In a previous blog post, we’ve shown how you can leverage jOOQ’s type safe DSL API to run SQL queries via JPA’s native query API, and then fetch results…

In the above cases, jOOQ is used only as a SQL query builder, while query execution is left to JPA.

Do all database querying with jOOQ, in Java EE

Remember jOOQ’s philosophy:

jOOQ is essentially type safe JDBC. Nothing more.

Even if you can use JPA to execute native SQL, you don’t have to. You can operate directly on a JDBC level, something that is often required with JPA, e.g. when working…

  • … with vendor-specific data types
  • … with non-trivial stored procedures
  • … with statement batches
  • … with updatable cursors

When you run your application on an application server, you can pick the features that you want and need, and use proprietary APIs (such as jOOQ, which runs on top of JDBC) for the rest. For instance, you can use:

  • EJB for session and scope management
  • CDI for dependency injection
  • jOOQ for your database interaction

(you could also add JTA to the stack – for simplicity reasons, we’ll skip that for now)

The procedure is simple: just inject a javax.sql.DataSource into your session bean using the @Resource annotation:

@Stateless
public class LibraryEJB {

    @Resource(lookup="java:data-source-configuration")
    private DataSource ds;
}

… and start working with it using JDBC:

public List<Author> fetchAuthors() 
throws SQLException {
    List<Author> result = new ArrayList<>();

    // Get a Connection from the injected DataSource
    try(Connection con = ds.getConnection();
        PreparedStatement stmt = con.prepareStatement(
            "SELECT * FROM AUTHOR ORDER BY ID");
        ResultSet rs = stmt.executeQuery()
    ) {
        while (rs.next()) {
            result.add(new Author(
                rs.getInt("ID"),
                rs.getString("FIRST_NAME"),
                rs.getString("LAST_NAME")
            ));
        }
    }

    return result;
}

… or using jOOQ:

public Result<AuthorRecord> fetchAuthors() {

    // Pass the injected DataSource to jOOQ
    return DSL.using(ds, H2)
              .selectFrom(AUTHOR)
              .orderBy(AUTHOR.ID)
              .fetch();
}

Notice how jOOQ – by default – fetches all results eagerly into memory, closing resources like the JDBC Connection, PreparedStatement, and ResultSet eagerly, such that you’re not required to deal with the hassle of resource management yourself.

Again:

jOOQ is essentially type safe JDBC. Nothing more.

JDBC has always been an important part of Java EE applications, for all sorts of reasons, including access to vendor-specific features. jOOQ adds compile-time type safety on top of JDBC. Nothing more. Whatever works with JDBC will work with jOOQ.

In particular, jOOQ will never interfere with your transaction or session model, regardless of the choice you make. All that is needed by jOOQ is a JDBC Connection or DataSource.

Running an example in JBoss WildFly

The above example can be checked out from GitHub and run directly in WildFly, for example – or, with only a few adaptations, in any other Java EE application server:

https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/jOOQ-javaee-example

The example was created for WildFly in the context of a Webinar with Arun Gupta. The webinar answers the following questions:

  • What is jOOQ?
  • Why jOOQ when there is JDBC and JPA?
  • How does it fit with Java EE apps? Does it use the underlying JPA persistence provider or some other connection?
  • Pros/cons over JPA? Pure Hibernate?
  • How well does it scale?
  • Show code sample in a Java EE application
  • jOOQ for CRUD-based or domain-rich applications?
  • How could all the work in jOOQ eventually be integrated into JPA and be standardized? Or would it be more of JDBC?

The full webinar can be seen on YouTube, here:

It is all about the JDBC Basics


We’re very happy to announce a guest post by Marco Behler, who has been blogging about jOOQ in the past.

Marco started out in programming (reverse-engineering, actually) and now mainly programs on the JVM in his day-to-day work. He also always had a sweet tooth for strategy and marketing. Marco Behler GmbH is the result of that hybrid role.

It is all about the JDBC Basics

It is one of those days.

You are reading the Spring documentation’s @Transactional section and still don’t understand the difference between logical and physical transaction scopes. Simultaneously your app throws a LazyInitializationException and you have no idea why. To top it off you see spontaneous database deadlocks in production and you suspect your connection pool is leaking connections… somehow.

Know what most likely would have helped instead of banging your head against the wall? Spending a couple (literally) of hours on learning the JDBC basics. Let’s find out why:

What are the JDBC basics?

The basics are opening and closing database connections, and then working with transactions. Also understanding how deadlocks, pessimistic and optimistic locking work on a plain JDBC level. A bit of isolation levels and savepoints, and then directly on to connection pools and JDBC driver logging. That’s it. Seriously.

Why are the basics so important?

Everything you will encounter in frameworks like Spring, Hibernate, jOOQ etc. builds on these basics. For example, there are a gazillion topics on the internet regarding Hibernate’s LazyInitializationException, and I was scared of that particular exception myself many years ago. But what else would you expect when trying to query the database without having an open connection to it (which is basically all this exception is)?

The same with Spring’s “transaction framework”. There is so much content, or shall we say (F)ear/(U)ncertainty/(D)oubt, out there on how to open up transactions with Spring, be it programmatically, with annotations or XML. But what if you knew that under the hood, there is only one way (and actually one line of code) to open up transactions in the JDBC world?
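
For the record, here is a sketch of that one line (assuming an open JDBC connection named con): switching off auto-commit is what effectively starts a transaction.

// The "one line": starts transactional mode on this connection
con.setAutoCommit(false);
try {
    // ... execute your statements here ...
    con.commit();
} catch (SQLException e) {
    con.rollback();
    throw e;
}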

Let me not even get started on the various (mis)configurations of connection pools you see in production. Or the unawareness of JDBC (driver) logging, which usually leads to debugging in the dark. All basics, which you can master in a couple of hours and which will help you for a lifetime!

Why do people not just learn the basics?

In every medium-sized project there is a ton of technologies involved, and there usually is no clear-cut path on how to learn all of them or how they all work together. It simply takes a lot of time and effort to dig through everything.

There are JPA sessions and JDBC connections, and then Spring somehow provides those transactional proxies in 5 different ways, and then some other colleague just put jOOQ into the mix – but then somehow my session doesn’t flush and my objects don’t get persisted and the HibernateTransactionManager is not working as expected.

With all of this, I would also hope for my database transactions just to commit – god forbid what happens on rollback :)

But in the end, every technology mentioned is just a layer on top of JDBC. If you understand transactions or deadlocks or savepoints on the basic level, then Spring or Hibernate or jOOQ will not throw you off.

So what do you recommend?

If you want to get miles ahead in your day-to-day database programming, you have to start with the basics – step by step. And then you will see most of your problems automatically evaporate.

Out of my extensive database programming experience, I created an ebook with a ton of ready-to-run exercises, which will take you from Java database novice to expert. At your own pace. You can literally copy the source code of every chapter into your IDE, run it and (hopefully) learn from it. It covers plain JDBC, Spring, Hibernate, jOOQ (soon) and also distributed transactions.

You can read the whole book for free online here, and I would love to get your feedback! I would really like to let the community feedback flow back into future editions of the book. In addition, if you like what you see and the exercises help you, you can also show your support by getting a paid digital version (pdf, epub, mobi).

In any case…

…learn your JDBC basics – and you will profit from them for a lifetime!

Comparing Imperative and Functional Algorithms in Java 8


Mario Fusco’s popular tweet impressively shows what the main difference between imperative and functional approaches to similar algorithms really is:

Both algorithms do the same thing and they’re probably equally fast and reasonable. Yet, one of the algorithms is much easier to write and read than the other. The difference lies in the fact that in imperative programming, different algorithmic requirements are spread throughout the code block, whereas in functional programming, each requirement has its own little line of code. Compare:

  • Green: Error handling
  • Blue: Stop criteria
  • Red: IO operations
  • Yellow: “Business logic”

Functional programming doesn’t always beat imperative programming, as displayed in other examples on the jOOQ blog.

But here’s an example from Stack Overflow by user Aurora_Titanium, where the difference is as clear as in Mario Fusco’s example:

Calculating the Duplicate Values in an Array

The idea is to calculate the sum of all those values that are duplicated in a set of values. For instance, the following array:

int[] list = new int[]{1,2,3,4,5,6,7,8,8,8,9,10};

… should yield as a result something like:

Duplicate: 8. Sum of all duplicate values: 24

The imperative approach

One of the answers, by user Volkan Ozkan, takes an imperative approach and calculates the sum as follows:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

int sum = 0;
for (int j = 0; j < array.length; j++)
{
    for (int k = j + 1; k < array.length; k++) 
    {
        if (k != j && array[k] == array[j])
        {
            sum = sum + array[k];
            System.out.println(
                "Duplicate found: " 
              + array[k]
              + " " 
              + "Sum of the duplicate value is " + sum);
        }
    }
}

The approach works only for sorted arrays where duplicates appear right after one another. In that case, however, it is probably an optimal solution in terms of performance, if performance really matters to this algorithm.

The functional approach

If a slight decrease in performance is acceptable to you (boxing ints, collecting them into maps), and it probably is, you can replace the above difficult-to-read code with the following bit of functional Java-8-style logic, which communicates much more clearly what it does:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

IntStream.of(array)
         .boxed()
         .collect(groupingBy(i -> i))
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

or, with explanations:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

// Create a Stream<Integer> from your data
IntStream.of(array)
         .boxed()

// Group values into a Map<Integer, List<Integer>>
         .collect(groupingBy(i -> i))

// Filter out those map values that have only 
// 1 element in their group
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)

// Print the sum for the remaining groups
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

(Note that the functional approach calculates a sum for each duplicate value, not an overall sum like the imperative approach. From the original question, this requirement wasn’t very clear.)

As we’ve stated in a previous article on our blog, the power of functional programming via an API like the Java 8 Stream API is the fact that we’re approaching the expressive power of SQL-style declarative programming. We’re no longer concerned with remembering individual array indexes, how to calculate them, and how to store intermediate results in some buffers. We can now focus on the really interesting logic, such as: “what’s a duplicate?” or “what sum am I interested in?”

Read on about how SQL compares to Java 8 Streams:

Common SQL clauses and their equivalents in Java 8 Streams

How to use Java 8 Functional Programming to Generate an Alphabetic Sequence


I’ve stumbled upon an interesting Stack Overflow question by user “mip”. The question was:

I’m looking for a way of generating an alphabetic sequence:

A, B, C, ..., Z, AA, AB, AC, ..., ZZ.

This can be quickly recognised as the headings of an Excel spreadsheet, which does precisely that:

Excel’s column headings

So far, none of the answers employed any Java 8 functional programming, which I accepted as a challenge. We’re going to use jOOλ, because the Java 8 Stream API does not offer enough functionality for this task. (I stand corrected – thank you, Sebastian, for this interesting answer.)

But first, let’s decompose the algorithm in a functional way. What we need are these components:

  1. A (reproducible) representation of the alphabet
  2. An upper bound, i.e. how many letters we want to produce. The requested sequence goes to ZZ, which means the upper bound would be 2
  3. A way to combine each letter of the alphabet with the previously generated combined letters in a Cartesian product

Let’s look into some code:

1. Generating the alphabet

We could write the alphabet down like this:

List<String> alphabet = Arrays.asList("A", "B", ..., "Z");

but that would be lame. Let’s generate it instead, using jOOλ:

List<String> alphabet = Seq
    .rangeClosed('A', 'Z')
    .map(Object::toString)
    .toList();

The above generates a “closed” range (Java-8-Stream-speak for a range with inclusive upper bound) of characters between A and Z, maps characters to strings and collects them into a list.

So far so good. Now:

2. Using an upper bound

The requested sequence of characters includes:

A .. Z, AA, AB, .. ZZ

But we could easily imagine extending this requirement to produce the following, or even more:

A .. Z, AA, AB, .. ZZ, AAA, AAB, .. ZZZ

For this, we’ll use rangeClosed() again:

// 1 = A .. Z, 2 = AA .. ZZ, 3 = AAA .. ZZZ
Seq.rangeClosed(1, 2)
   .flatMap(length -> ...)
   .forEach(System.out::println);

The idea here is to produce a new stream for each individual length in the range [1 .. 2], and to flatten those streams into one single stream. flatMap() is essentially the same as a nested loop in imperative programming.
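
In imperative terms, the above corresponds roughly to this sketch, where lettersOfLength() is a hypothetical helper producing all combinations of a given length:

// Outer loop: the lengths; inner loop: the flattened stream
for (int length = 1; length <= 2; length++) {
    // lettersOfLength() is hypothetical, for illustration only
    for (String combination : lettersOfLength(length)) {
        System.out.println(combination);
    }
}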

3. Combine letters in a Cartesian product

This is the trickiest part: we need to combine each letter with the previously generated combinations, length - 1 times. For this, we’ll use the following stream:

Seq.rangeClosed(1, length - 1)
   .foldLeft(Seq.seq(alphabet), (s, i) ->
       s.crossJoin(Seq.seq(alphabet))
        .map(t -> t.v1 + t.v2));

We’re again using rangeClosed() to produce values in the range [1 .. length-1]. foldLeft() is the same as reduce(), except that foldLeft() is guaranteed to go from “left to right” in a stream, without requiring the folding function to be associative. Whew.

In other, more understandable words: foldLeft() is nothing else but an imperative loop. The “seed” of the loop, i.e. the loop’s initial value, is the complete alphabet (Seq.seq(alphabet)). Now, for every value in the range [1 .. length-1], we produce a Cartesian product (crossJoin()) between the letters “folded” so far and a new alphabet, and we concatenate each combination into a single new string (t.v1 and t.v2).
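
Spelled out as such an imperative loop, the foldLeft() above corresponds roughly to this sketch (with length as in the surrounding flatMap()):

// Seed: the complete alphabet
Seq<String> folded = Seq.seq(alphabet);

// One Cartesian product per value in [1 .. length-1]
for (int i = 1; i <= length - 1; i++) {
    folded = folded
        .crossJoin(Seq.seq(alphabet))
        .map(t -> t.v1 + t.v2);
}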

That’s it!

Combining everything

The following simple program prints all the values from A .. Z, AA .. ZZ, AAA .. ZZZ to the console:

import java.util.List;

import org.jooq.lambda.Seq;

public class Test {
    public static void main(String[] args) {
        int max = 3;

        List<String> alphabet = Seq
            .rangeClosed('A', 'Z')
            .map(Object::toString)
            .toList();

        Seq.rangeClosed(1, max)
           .flatMap(length ->
               Seq.rangeClosed(1, length - 1)
                  .foldLeft(Seq.seq(alphabet), (s, i) -> 
                      s.crossJoin(Seq.seq(alphabet))
                       .map(t -> t.v1 + t.v2)))
           .forEach(System.out::println);
    }
}

Disclaimer

This is certainly not the most optimal algorithm for this particular case. One of the best implementations has been given by an unnamed user on Stack Overflow:

import static java.lang.Math.*;

private static String getString(int n) {
    char[] buf = new char[(int) floor(log(25 * (n + 1)) / log(26))];
    for (int i = buf.length - 1; i >= 0; i--) {
        n--;
        buf[i] = (char) ('A' + n % 26);
        n /= 26;
    }
    return new String(buf);
}

Needless to say, the latter runs much, much faster than the previous functional algorithm.

Divided we Stand: Optional


Our recent article “NULL is Not The Billion Dollar Mistake. A Counter-Rant” got us a lot of reads, controversial comments, and a 50/50 upvote / downvote ratio pretty much everywhere a blog post can be posted and voted on. This was expected.

Objectively, NULL is just a “special” value that has been implemented in a variety of languages and type systems, and in a variety of ways – including perhaps the set of natural numbers (a.k.a. “zero”, the original null – them Romans sure didn’t like that idea).

Or, as Charles Roth has put it adequately in the comments:

Chuckle. Occasionally a mathematics background comes in handy. Now we could argue about whether NULL was “invented” or “discovered”…

Now, Java’s null is a particularly obnoxious implementation of that “special value” for reasons like:

Compile-time typing vs. runtime typing

// We can assign null to any reference type
Object s = null;

// Yet, null is of no type at all
if (null instanceof Object)
    throw new Error("Will never happen");

The null literal is even more special

// Nothing can be put in this list, right?
List<?> list = new ArrayList<Void>();

// Yet, null can:
list.add(null);

Methods are present on the null literal

// This compiles, but does it run?
((Object) null).getClass();

Java 8’s Optional

The introduction of Optional might have changed everything. Many functional programmers love it so much because the type clearly communicates the cardinality of an attribute. In a way:

// Cardinality 1:
Type t1;

// Cardinality 0-1:
Optional<Type> t01;

// Cardinality 0..n:
Iterable<Type> tn;

A lot of Java 8’s Optional‘s interesting history has been dug out by Nicolai Parlog on his excellent blog.

Be sure to check it out:
http://blog.codefx.org/tag/optional

In the Java 8 expert groups, Optional wasn’t an easy decision:

[…] There has been a lot of discussion about [Optional] here and there over the years. I think they mainly amount to two technical problems, plus at least one style/usage issue:

  1. Some collections allow null elements, which means that you cannot unambiguously use null in its otherwise only reasonable sense of “there’s nothing there”.
  2. If/when some of these APIs are extended to primitives, there is no value to return in the case of nothing there. The alternative to Optional is to return boxed types, which some people would prefer not to do.
  3. Some people like the idea of using Optional to allow more fluent APIs.
    As in
    x = s.findFirst().or(valueIfEmpty)
    vs
    if ((x = s.findFirst()) == null) x = valueIfEmpty;
    Some people are happy to create an object for the sake of being able to do this. Although sometimes less happy when they realize that Optionalism then starts propagating through their designs, leading to Set<Optional<T>>’s and so on.

It’s hard to win here.

Doug Lea

Arguably, the main true reason for the JDK to have introduced Optional is the lack of availability of Project Valhalla’s generic specialization in Java 8, which meant that a performant primitive-type stream (such as IntStream) needed some new type like OptionalInt to encode absent values as returned from IntStream.findAny(), for instance. For API consistency, such an OptionalInt from the IntStream type had to be matched by a “similar” Optional from the Stream type.
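
For example, an IntStream cannot return null to signal “nothing found”, and boxing into an Optional<Integer> would defeat the purpose of a primitive-type stream – hence OptionalInt (a quick sketch):

OptionalInt any = IntStream.of(1, 2, 3)
                           .filter(i -> i > 10)
                           .findAny();

// No element matched, so the OptionalInt is empty
System.out.println(any.orElse(-1)); // prints -1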

Can Optional be introduced late in a platform?

While Doug’s concerns are certainly valid, there are some other, more significant arguments that make me wary of Optional (in Java). While Scala developers embrace their awesome Option type as they have no alternative and hardly ever see any null reference or NullPointerException – except when working with some Java libraries – this is not true for Java developers. We have our legacy collections API, which (ab-)uses null all over the place. Take java.util.Map, for instance. Map.get()‘s Javadoc reads:

Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.

[…]

If this map permits null values, then a return value of null does not necessarily indicate that the map contains no mapping for the key; it’s also possible that the map explicitly maps the key to null. The containsKey operation may be used to distinguish these two cases.

This is how much of the pre-Java 8 collection API worked, and we’re still using it actively with Java 8, with new APIs such as the Streams API, which makes extensive use of Optional.

A contrived (and obviously wrong) example:

Map<Integer, List<Integer>> map =
    Stream.of(1, 1, 2, 3, 5, 8)
          .collect(Collectors.groupingBy(n -> n % 5));

IntStream.range(0, 5)
         .mapToObj(map::get)
         .map(List::size)
         .forEach(System.out::println);

Boom, NullPointerException. Can you spot it?

The map contains remainders of a modulo-5 operation as keys, and the associated, collected dividends as a value.

We then go through all numbers from 0 to 4 (the only possible remainders), extract the list of associated dividends, List::size them… wait. Oh. Map.get may return null.

You’re getting used to the fluent style of Java 8’s new APIs, you’re getting used to the functional and monadic programming style where streams and optionals behave similarly, and you may quickly be surprised that anything passed through a Stream.map() pipeline can be null.

In fact, if APIs were allowed to be retrofitted, then the Map.get method might look like this:

public interface Map<K,V> {
    Optional<V> get(Object key);
}

(it probably still wouldn’t because most maps allow for null values or even keys, which is hard to retrofit)

If we had such a retrofitting, the compiler would complain that we have to unwrap Optional before calling List::size. We’d fix it and write:

IntStream.range(0, 5)
         .mapToObj(map::get)
         .map(l -> l.orElse(Collections.emptyList()))
         .map(List::size)
         .forEach(System.out::println);

Java’s Crux – Backwards compatibility

Backwards compatibility will lead to a mediocre adoption of Optional. Some parts of the JDK API make use of it, others use null to encode the absent value. You can never be sure, and you always have to keep both possibilities in mind, because you cannot trust a non-Optional type to be truly “@NotNull“.

If you prefer using Optional over null in your business logic, that’s fine. But you will have to make very sure to apply this strategy thoroughly. Take the following blog post, for instance, which has gotten lots of upvotes on reddit:
http://shekhargulati.com/2015/07/28/day-4-lets-write-null-free-java-code

It inadvertently introduces a new anti-pattern:

public class User {
 
    private final String username;
    private Optional<String> fullname;
 
    public User(String username) {
        this.username = username;
        this.fullname = Optional.empty();
    }
 
    public String getUsername() {
        return username;
    }
 
    public Optional<String> getFullname() {
        return fullname;
    }

    //      good--------^^^
    // vvvv--------bad
 
    public void setFullname(String fullname) {
        this.fullname = Optional.of(fullname);
    }
}

The domain object establishes an “Optional opt-in” contract without opting out of null entirely. While getFullname() forces API consumers to reason about the possible absence of a full name, setFullname() doesn’t accept such an Optional argument type, but a nullable one. What was meant as a clever convenience will result only in confusion at the call site.

The anti-pattern is repeated by Stephen Colebourne (who brought us Joda Time and JSR-310) on his blog, calling this a “pragmatic” approach:

public class Address {
    private final String addressLine;  // never null
    private final String city;         // never null
    private final String postcode;     // optional, thus may be null

    // constructor ensures non-null fields really are non-null
    // optional field can just be stored directly, as null means optional
    public Address(String addressLine, String city, String postcode) {
      this.addressLine = Preconditions.checkNotNull(addressLine);
      this.city = Preconditions.checkNotNull(city);
      this.postcode = postcode;
    }

    // normal getters
    public String getAddressLine() { return addressLine; }
    public String getCity() { return city; }

    // special getter for optional field
    public Optional<String> getPostcode() {
      return Optional.ofNullable(postcode);
    }

    // return optional instead of null for business logic methods that may not find a result
    public static Optional<Address> findAddress(String userInput) {
      return ... // find the address, returning Optional.empty() if not found
    }
}

See the full article here:
http://blog.joda.org/2015/08/java-se-8-optional-pragmatic-approach.html

Choose your poison

We cannot change the JDK. The JDK APIs are a mix of nullable and Optional. But we can change our own business logic. Think carefully before introducing Optional, as this new optional type – unlike what its name suggests – is an all-or-nothing type. Remember that by introducing Optional into your code-base, you implicitly assume the following:

// Cardinality 1:
Type t1;

// Cardinality 0..1:
Optional<Type> t01;

// Cardinality 0..n:
Iterable<Type> tn;

From there on, your code-base should no longer use the simple non-Optional Type type for 0-1 cardinalities. Ever.

Common SQL Clauses and Their Equivalents in Java 8 Streams


Functional programming allows for quasi-declarative programming in a general purpose language. By using powerful fluent APIs like Java 8’s Stream API, or jOOλ’s sequential Stream extension Seq, or more sophisticated libraries like javaslang or functionaljava, we can express data transformation algorithms in an extremely concise way. Mario Fusco has illustrated this nicely by comparing imperative and functional versions of the same algorithm.

Using such APIs, functional programming certainly feels like true declarative programming.

The most popular truly declarative programming language is SQL. When you join two tables, you don’t tell the RDBMS how to implement that join. It may decide at its discretion whether a nested loop, merge join, hash join, or some other algorithm is the most suitable in the context of the complete query and of all the available meta information. This is extremely powerful because the performance assumptions that are valid for a simple join may no longer be valid for a complex one, where a different algorithm would outperform the original one. Thanks to this abstraction, you can easily modify a query in 30 seconds, without worrying about low-level details like algorithms or performance.

When an API allows you to combine both (e.g. jOOQ and Streams), you will get the best of both worlds – and those worlds aren’t too different.
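For instance, a hedged sketch of such a combination, assuming a jOOQ DSLContext ctx and a generated AUTHOR table (jOOQ’s Result implements List, so fetched records can be streamed directly):

ctx.select(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
   .from(AUTHOR)
   .fetch()   // declarative part: optimised and executed by the RDBMS
   .stream()  // functional part: transformed in the JVM
   .map(r -> r.value1() + " " + r.value2())
   .forEach(System.out::println);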

In the following sections, we’ll compare common SQL constructs with their equivalent expressions written in Java 8 using Streams and jOOλ, in case the Stream API doesn’t offer enough functionality.

Tuples

For the sake of this article, we’re going to assume that SQL rows / records have an equivalent representation in Java. For this, we’ll be using jOOλ’s Tuple type, which is essentially:

public class Tuple2<T1, T2> {

    public final T1 v1;
    public final T2 v2;

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }
}

… plus a lot of useful gimmicks like Tuple being Comparable, etc.

Note that we’re assuming the following imports in this and all subsequent examples.

import static org.jooq.lambda.Seq.*;
import static org.jooq.lambda.tuple.Tuple.*;

import java.util.*;
import java.util.function.*;
import java.util.stream.*;

import org.jooq.lambda.*;

Much like SQL rows, a tuple is a “value-based” type, meaning that it doesn’t really have an identity. Two tuples (1, 'A') and (1, 'A') can be considered exactly equivalent. Removing identity from the game makes SQL and functional programming with immutable data structures extremely elegant.
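For instance, assuming the imports above:

// Two tuples with the same attribute values are equal...
System.out.println(tuple(1, "A").equals(tuple(1, "A"))); // true

// ... and can thus serve as value-based keys, e.g. in a HashSet
Set<Tuple2<Integer, String>> set = new HashSet<>();
set.add(tuple(1, "A"));
set.add(tuple(1, "A"));
System.out.println(set.size()); // 1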

FROM = of(), stream(), etc.

In SQL, the FROM clause logically (but not syntactically) precedes all the other clauses. It is used to produce a set of tuples from at least one table, possibly multiple joined tables. A single-table FROM clause can be trivially mapped to Stream.of(), for instance, or to any other method that simply produces a stream:

SQL

SELECT *
FROM (
  VALUES(1, 1),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  1 |
|  2 |  2 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2)
).forEach(System.out::println);

yielding

(1, 1)
(2, 2)

CROSS JOIN = flatMap()

Selecting from multiple tables is already more interesting. The easiest way to combine two tables in SQL is by producing a cartesian product, either via a table list or using a CROSS JOIN. The following two are equivalent SQL statements:

SQL

-- Table list syntax
SELECT *
FROM (VALUES( 1 ), ( 2 )) t1(v1), 
     (VALUES('A'), ('B')) t2(v2)

-- CROSS JOIN syntax
SELECT *
FROM       (VALUES( 1 ), ( 2 )) t1(v1)
CROSS JOIN (VALUES('A'), ('B')) t2(v2)

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  A |
|  1 |  B |
|  2 |  A |
|  2 |  B |
+----+----+

In a cross join (or cartesian product), every value from t1 is combined with every value from t2 producing size(t1) * size(t2) rows in total.

Java

In functional programming using Java 8’s Stream, the Stream.flatMap() method corresponds to SQL CROSS JOIN as can be seen in the following example:

Stream<Integer> s1 = Stream.of(1, 2);
Supplier<Stream<String>> s2 = () -> Stream.of("A", "B");

s1.flatMap(v1 -> s2.get()
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

yielding

(1, A)
(1, B)
(2, A)
(2, B)

Note how we have to wrap the second stream in a Supplier because streams can be consumed only once, but the above algorithm is really implementing a nested loop, combining all elements of stream s2 with each element from stream s1. An alternative would be not to use streams but lists (which we will do in subsequent examples, for simplicity):

List<Integer> s1 = Arrays.asList(1, 2);
List<String> s2 = Arrays.asList("A", "B");

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

In fact, CROSS JOIN can be chained easily both in SQL and in Java:

SQL

-- Table list syntax
SELECT *
FROM (VALUES( 1 ), ( 2 )) t1(v1), 
     (VALUES('A'), ('B')) t2(v2), 
     (VALUES('X'), ('Y')) t3(v3)

-- CROSS JOIN syntax
SELECT *
FROM       (VALUES( 1 ), ( 2 )) t1(v1)
CROSS JOIN (VALUES('A'), ('B')) t2(v2)
CROSS JOIN (VALUES('X'), ('Y')) t3(v3)

yielding

+----+----+----+
| v1 | v2 | v3 |
+----+----+----+
|  1 |  A |  X |
|  1 |  A |  Y |
|  1 |  B |  X |
|  1 |  B |  Y |
|  2 |  A |  X |
|  2 |  A |  Y |
|  2 |  B |  X |
|  2 |  B |  Y |
+----+----+----+

Java

List<Integer> s1 = Arrays.asList(1, 2);
List<String> s2 = Arrays.asList("A", "B");
List<String> s3 = Arrays.asList("X", "Y");

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .flatMap(v12-> s3.stream()
                   .map(v3 -> tuple(v12.v1, v12.v2, v3)))
  .forEach(System.out::println);

yielding

(1, A, X)
(1, A, Y)
(1, B, X)
(1, B, Y)
(2, A, X)
(2, A, Y)
(2, B, X)
(2, B, Y)

Note how we explicitly unnested the tuples from the first CROSS JOIN operation to form “flat” tuples in the second operation. This is optional, of course.

Java with jOOλ’s crossJoin()

Us jOOQ developers, we’re a very SQL-oriented people, so it is only natural to have added a crossJoin() convenience method for the above use-case. So our triple-cross join can be written like this:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<String> s2 = Seq.of("A", "B");
Seq<String> s3 = Seq.of("X", "Y");

s1.crossJoin(s2)
  .crossJoin(s3)
  .forEach(System.out::println);

yielding

((1, A), X)
((1, A), Y)
((1, B), X)
((1, B), Y)
((2, A), X)
((2, A), Y)
((2, B), X)
((2, B), Y)

In this case, we didn’t unnest the tuple produced in the first cross join. From a purely relational perspective, this doesn’t matter. Nested tuples are the same thing as flat tuples. In SQL, we just don’t see the nesting. Of course, we could still unnest by adding a single additional mapping:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<String> s2 = Seq.of("A", "B");
Seq<String> s3 = Seq.of("X", "Y");

s1.crossJoin(s2)
  .crossJoin(s3)
  .map(t -> tuple(t.v1.v1, t.v1.v2, t.v2))
  .forEach(System.out::println);

yielding, again

(1, A, X)
(1, A, Y)
(1, B, X)
(1, B, Y)
(2, A, X)
(2, A, Y)
(2, B, X)
(2, B, Y)

(You may have noticed that map() corresponds to SELECT as we’ll see again later on)

INNER JOIN = flatMap() with filter()

The SQL INNER JOIN is essentially just syntactic sugar for a SQL CROSS JOIN with a predicate that reduces the tuple set after cross-joining. In SQL, the following two ways of inner joining are equivalent:

SQL

-- Table list syntax
SELECT *
FROM (VALUES(1), (2)) t1(v1), 
     (VALUES(1), (3)) t2(v2)
WHERE t1.v1 = t2.v2

-- INNER JOIN syntax
SELECT *
FROM       (VALUES(1), (2)) t1(v1)
INNER JOIN (VALUES(1), (3)) t2(v2)
ON t1.v1 = t2.v2

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  1 |
+----+----+

(note that the keyword INNER is optional).

So, the value 2 from t1 and the value 3 from t2 are “thrown away”, as they do not produce any rows for which the join predicate yields true.

The same can be expressed easily, yet more verbosely, in Java:

Java (inefficient solution!)

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .filter(t -> Objects.equals(t.v1, t.v2))
  .forEach(System.out::println);

The above correctly yields

(1, 1)

But beware that you’re attaining this result after producing a cartesian product, the nightmare of every DBA! As mentioned at the beginning of this article, unlike in declarative programming, in functional programming you instruct your program to execute operations in exactly the order you specify. In other words:

In functional programming, you define the exact “execution plan” of your query.

In declarative programming, an optimiser may reorganise your “program”

There is no optimiser to transform the above into the much more efficient:

Java (more efficient)

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .filter(v2 -> Objects.equals(v1, v2))
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

The above also yields

(1, 1)

Notice how the join predicate has moved from the “outer” stream into the “inner” stream that is produced in the function passed to flatMap().

Java (optimal)

As mentioned previously, functional programming doesn’t automatically rewrite your algorithms based on knowledge of the actual data. The join implementations presented above always run a nested loop join from the first stream to the second. If you join more than two streams, or if the second stream is very large, this approach can be terribly inefficient. A sophisticated RDBMS would never blindly apply nested loop joins like that, but would consider constraints, indexes, and histograms on the actual data.

Going deeper into that topic would be out of scope for this article, though.
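Still, just to hint at what a data-driven rewrite could look like, here is a minimal hand-written hash join sketch (an illustration only, not part of any API discussed here; it assumes the imports above):

// Equi-join via a hash index: build a Map over s2 once, then probe
// it for each element of s1, i.e. O(n + m) instead of O(n * m)
static <T> Stream<Tuple2<T, T>> hashJoin(List<T> s1, List<T> s2) {
    Map<T, List<T>> index = s2.stream()
        .collect(Collectors.groupingBy(v -> v));

    return s1.stream()
        .flatMap(v1 -> index.getOrDefault(v1, Collections.emptyList())
                            .stream()
                            .map(v2 -> tuple(v1, v2)));
}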

Java with jOOλ’s innerJoin()

Again, inspired by our work on jOOQ we’ve also added an innerJoin() convenience method for the above use-case:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.innerJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)

… because after all, when joining two streams, the only really interesting operation is the join Predicate. All else (flatmapping, etc.) is just boilerplate.

LEFT OUTER JOIN = flatMap() with filter() and a “default”

SQL’s OUTER JOIN works like INNER JOIN, except that additional “default” rows are produced for tuples that find no join partner, i.e. when the JOIN predicate yields false for all candidate pairs. In terms of set theory / relational algebra, this can be expressed as such:

R ⟕ S = (R ⋈ S) ∪ ((R − πR(R ⋈ S)) × {(null, …, null)})

Or in a SQL-esque dialect:

R LEFT OUTER JOIN S ::=

R INNER JOIN S
UNION (
  (R EXCEPT (SELECT R.* FROM R INNER JOIN S))
  CROSS JOIN
  (null, null, ..., null)
)

This simply means that when left outer joining S to R, there will be at least one row in the result for each row in R, with possibly an empty value for S.

Inversely, when right outer joining S to R, there will be at least one row in the result for each row in S, with possibly an empty value for R.

And finally, when full outer joining S to R, there will be at least one row in the result for each row in R with possibly an empty value for S AND for each row in S with possibly an empty value for R.

Let us look at LEFT OUTER JOIN, which is used most often in SQL.

SQL

-- Table list, Oracle syntax (don't use this!)
SELECT *
FROM (SELECT 1 v1 FROM DUAL
      UNION ALL 
      SELECT 2 v1 FROM DUAL) t1, 
     (SELECT 1 v2 FROM DUAL
      UNION ALL
      SELECT 3 v2 FROM DUAL) t2
WHERE t1.v1 = t2.v2 (+)

-- OUTER JOIN syntax
SELECT *
FROM            (VALUES(1), (2)) t1(v1)
LEFT OUTER JOIN (VALUES(1), (3)) t2(v2)
ON t1.v1 = t2.v2

yielding

+----+------+
| v1 |   v2 |
+----+------+
|  1 |    1 |
|  2 | null |
+----+------+

(note that the keyword OUTER is optional).

Java

Unfortunately, the JDK’s Stream API doesn’t provide us with an easy way to produce “at least” one value from a stream, in case the stream is empty. We could write a utility function as explained by Stuart Marks on Stack Overflow:

static <T> Stream<T> defaultIfEmpty(
    Stream<T> stream, Supplier<T> supplier) {
    Iterator<T> iterator = stream.iterator();

    if (iterator.hasNext()) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(
                iterator, 0
            ), false);
    } else {
        return Stream.of(supplier.get());
    }
}
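With that helper in place, the left outer join can be written with plain JDK streams (a sketch, reusing the lists from the earlier examples):

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

s1.stream()
  .flatMap(v1 -> defaultIfEmpty(
          s2.stream()
            .filter(v2 -> Objects.equals(v1, v2)),
          () -> null)
      .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

yielding (1, 1) and (2, null) as well.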

Or, we just use jOOλ’s Seq.onEmpty()

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

seq(s1)
    .flatMap(v1 -> seq(s2)
        .filter(v2 -> Objects.equals(v1, v2))
        .onEmpty(null)
        .map(v2 -> tuple(v1, v2)))
    .forEach(System.out::println);

(Notice that we’re putting null into a stream. This might not always be a good idea – we’ll follow up on that in a future blog post.)

The above also yields

(1, 1)
(2, null)

How to read the implicit left outer join?

  • We’ll take each value v1 from the left stream s1
  • For each such value v1, we flatmap the right stream s2 to produce a tuple (v1, v2) (a cartesian product, cross join)
  • We’ll apply the join predicate for each such tuple (v1, v2)
  • If the join predicate leaves no tuples for any value v2, we’ll generate a single tuple containing the value of the left stream v1 and null

Java with jOOλ

For convenience, jOOλ also supports leftOuterJoin() which works as described above:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.leftOuterJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)
(2, null)

RIGHT OUTER JOIN = inverse LEFT OUTER JOIN

Trivially, a RIGHT OUTER JOIN is just the inverse of the previous LEFT OUTER JOIN. The jOOλ implementation of rightOuterJoin() looks like this:

default <U> Seq<Tuple2<T, U>> rightOuterJoin(
    Stream<U> other, BiPredicate<T, U> predicate) {
    return seq(other)
          .leftOuterJoin(this, (u, t) -> predicate.test(t, u))
          .map(t -> tuple(t.v2, t.v1));
}

As you can see, the RIGHT OUTER JOIN simply inverts the results of a LEFT OUTER JOIN, that’s it. For example:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.rightOuterJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)
(null, 3)
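For completeness: the FULL OUTER JOIN semantics from the definition earlier can be emulated by concatenating a left outer join with the unmatched rows of a right outer join. A sketch (re-creating the Seqs each time, since streams can be consumed only once):

Seq.concat(
       Seq.of(1, 2).leftOuterJoin(Seq.of(1, 3), Objects::equals),
       Seq.of(1, 2).rightOuterJoin(Seq.of(1, 3), Objects::equals)
          .filter(t -> t.v1 == null))
   .forEach(System.out::println);

yielding

(1, 1)
(2, null)
(null, 3)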

WHERE = filter()

The most straightforward mapping is probably SQL’s WHERE clause, which has an exact equivalent in the Stream API: Stream.filter().

SQL

SELECT *
FROM (VALUES(1), (2), (3)) t(v)
WHERE v % 2 = 0

yielding

+---+
| v |
+---+
| 2 |
+---+

Java

Stream<Integer> s = Stream.of(1, 2, 3);

s.filter(v -> v % 2 == 0)
 .forEach(System.out::println);

yielding

2

The interesting thing about filter() and the Stream API in general is that the operation can be applied at any place in the call chain, unlike the WHERE clause, which must be placed right after the FROM clause – even if SQL’s JOIN .. ON or HAVING clauses are semantically similar.

GROUP BY = collect()

The least straightforward mapping is GROUP BY vs. Stream.collect().

First off, SQL’s GROUP BY may be a bit tricky to fully understand. It is really part of the FROM clause, transforming the set of tuples produced by FROM .. JOIN .. WHERE into groups of tuples, where each group has an associated set of aggregatable tuples, which can be aggregated in the HAVING, SELECT, and ORDER BY clauses. Things get even more interesting when you use OLAP features like GROUPING SETS, which allow for duplicating tuples according to several grouping combinations.

In most SQL implementations that don’t support ARRAY or MULTISET, the aggregatable tuples are not available as such (i.e. as nested collections) in the SELECT. Here, the Stream API’s feature set excels. On the other hand, the Stream API can group values only as a terminal operation, whereas in SQL, GROUP BY is applied purely declaratively (and thus, lazily). The execution planner may choose not to execute the GROUP BY at all if it is not needed. For instance:

SELECT *
FROM some_table
WHERE EXISTS (
    SELECT x, sum(y)
    FROM other_table
    GROUP BY x
)

The above query is semantically equivalent to

SELECT *
FROM some_table
WHERE EXISTS (
    SELECT 1
    FROM other_table
)

The grouping in the subquery was unnecessary. Someone may have copy-pasted that subquery in there from somewhere else, or refactored the query as a whole. In Java, using the Stream API, each operation is always executed.

For the sake of simplicity, we’ll stick to the simplest examples here.

Aggregation without GROUP BY

A special case is when we do not specify any GROUP BY clause. In that case, we can apply aggregations to all columns of the FROM clause, always producing exactly one record. For instance:

SQL

SELECT sum(v)
FROM (VALUES(1), (2), (3)) t(v)

yielding

+-----+
| sum |
+-----+
|   6 |
+-----+

Java

Stream<Integer> s = Stream.of(1, 2, 3);

int sum = s.collect(Collectors.summingInt(i -> i));
System.out.println(sum);

yielding

6
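For a single aggregation like this, an equivalent shortcut is to go through a primitive IntStream:

Stream<Integer> s = Stream.of(1, 2, 3);

// Same aggregation, via a primitive stream
int sum = s.mapToInt(Integer::intValue).sum();
System.out.println(sum); // 6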

Aggregation with GROUP BY

A more common case of aggregation in SQL is to specify an explicit GROUP BY clause as explained before. For instance, we may want to group by even and odd numbers:

SQL

SELECT v % 2, count(v), sum(v)
FROM (VALUES(1), (2), (3)) t(v)
GROUP BY v % 2

yielding

+-------+-------+-----+
| v % 2 | count | sum |
+-------+-------+-----+
|     0 |     1 |   2 |
|     1 |     2 |   4 |
+-------+-------+-----+

Java

For this simple grouping / collection use-case, luckily, the JDK offers a utility method called Collectors.groupingBy(), which produces a collector that generates a Map<K, List<V>> type like this:

Stream<Integer> s = Stream.of(1, 2, 3);

Map<Integer, List<Integer>> map = s.collect(
    Collectors.groupingBy(v -> v % 2)
);

System.out.println(map);

yielding

{0=[2], 1=[1, 3]}

This certainly takes care of the grouping. Now we want to produce aggregations for each group. The slightly awkward JDK way to do this would be:

Stream<Integer> s = Stream.of(1, 2, 3);

Map<Integer, IntSummaryStatistics> map = s.collect(
    Collectors.groupingBy(
        v -> v % 2,
        Collectors.summarizingInt(i -> i)
    )
);

System.out.println(map);

we’ll now get:

{0=IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2},
 1=IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}}

As you can see, the count() and sum() values have been calculated along the way.
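If we want to reproduce only the count and sum columns of the SQL result, we can simply read them off the statistics objects:

map.forEach((k, stats) -> System.out.println(
    k + ": count = " + stats.getCount() + ", sum = " + stats.getSum()));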

More sophisticated GROUP BY

When doing multiple aggregations with Java 8’s Stream API, you will quickly be forced to wrestle with low-level APIs, implementing complicated collectors and accumulators yourself. This is tedious and unnecessary. Consider the following SQL statement:

SQL

CREATE TABLE t (
  w INT,
  x INT,
  y INT,
  z INT
);

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM t
GROUP BY z, w;

In one go, we want to:

  • Group by several values
  • Aggregate from several values

Java

In a previous article, we’ve explained in detail how this can be achieved using convenience API from jOOλ via Seq.groupBy():

class A {
    final int w;
    final int x;
    final int y;
    final int z;
 
    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map =
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
 
// Seq.groupBy() is just short for 
// Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),
 
    // ... because once you have tuples, 
    // why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

System.out.println(map);

The above yields

{(1, 1)=(IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2},
         IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}),
 (4, 9)=(IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9},
         IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7}),
 (5, 2)=(IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5},
         IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})}

For more details, read the full article here.

Notice how using Stream.collect() or Seq.groupBy() already makes for an implicit SELECT clause, which we no longer need to obtain via map() (see below).

HAVING = filter(), again

As mentioned before, there aren’t really different ways of applying predicates with the Stream API, there is only Stream.filter(). In SQL, HAVING is a “special” predicate clause that is syntactically put after the GROUP BY clause. For instance:

SQL

SELECT v % 2, count(v)
FROM (VALUES(1), (2), (3)) t(v)
GROUP BY v % 2
HAVING count(v) > 1

yielding

+-------+-------+
| v % 2 | count |
+-------+-------+
|     1 |     2 |
+-------+-------+

Java

Unfortunately, as we have seen before, collect() is a terminal operation in the Stream API, which means that it eagerly produces a Map, instead of transforming the Stream<T> into a Stream<Tuple2<K, Stream<V>>>, which would compose much better in complex Stream pipelines. This means that any operation that we’d like to apply right after collecting will have to be implemented on a new stream produced from the output Map:

Stream<Integer> s = Stream.of(1, 2, 3);

s.collect(Collectors.groupingBy(
      v -> v % 2,
      Collectors.summarizingInt(i -> i)
  ))
  .entrySet()
  .stream()
  .filter(e -> e.getValue().getCount() > 1)
  .forEach(System.out::println);

yielding

1=IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}

As you can see, the type transformation that is applied is:

  • Map<Integer, IntSummaryStatistics>
  • Set<Entry<Integer, IntSummaryStatistics>>
  • Stream<Entry<Integer, IntSummaryStatistics>>

SELECT = map()

The SELECT clause in SQL is nothing more than a tuple transformation function that takes the cartesian product of tuples produced by the FROM clause and transforms it into a new tuple expression, which is fed either to the client, or to some higher-level query if this is a nested SELECT. An illustration:

FROM output

+------+------+------+------+------+
| T1.A | T1.B | T1.C | T2.A | T2.D |
+------+------+------+------+------+
|    1 |    A |    a |    1 |    X |
|    1 |    B |    b |    1 |    Y |
|    2 |    C |    c |    2 |    X |
|    2 |    D |    d |    2 |    Y |
+------+------+------+------+------+

Applying SELECT

SELECT t1.a, t1.c, t1.b || t2.d

+------+------+--------------+
| T1.A | T1.C | T1.B || T2.D |
+------+------+--------------+
|    1 |    a |           AX |
|    1 |    b |           BY |
|    2 |    c |           CX |
|    2 |    d |           DY |
+------+------+--------------+

Using Java 8 Streams, SELECT can be achieved very simply by using Stream.map(), as we’ve already seen in previous examples, where we unnested tuples using map(). The following examples are functionally equivalent:

SQL

SELECT t.v1 * 3, t.v2 + 5
FROM (
  VALUES(1, 1),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| c1 | c2 |
+----+----+
|  3 |  6 |
|  6 |  7 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2)
).map(t -> tuple(t.v1 * 3, t.v2 + 5))
 .forEach(System.out::println);

yielding

(3, 6)
(6, 7)

DISTINCT = distinct()

The DISTINCT keyword that can be supplied with the SELECT clause simply removes duplicate tuples right after they have been produced by the SELECT clause. An illustration:

FROM output

+------+------+------+------+------+
| T1.A | T1.B | T1.C | T2.A | T2.D |
+------+------+------+------+------+
|    1 |    A |    a |    1 |    X |
|    1 |    B |    b |    1 |    Y |
|    2 |    C |    c |    2 |    X |
|    2 |    D |    d |    2 |    Y |
+------+------+------+------+------+

Applying SELECT DISTINCT

SELECT DISTINCT t1.a

+------+
| T1.A |
+------+
|    1 |
|    2 |
+------+

Using Java 8 Streams, SELECT DISTINCT can be achieved very simply by using Stream.distinct() right after Stream.map(). The following examples are functionally equivalent:

SQL

SELECT DISTINCT t.v1 * 3, t.v2 + 5
FROM (
  VALUES(1, 1),
        (2, 2),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| c1 | c2 |
+----+----+
|  3 |  6 |
|  6 |  7 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2),
  tuple(2, 2)
).map(t -> tuple(t.v1 * 3, t.v2 + 5))
 .distinct()
 .forEach(System.out::println);

yielding

(3, 6)
(6, 7)

UNION ALL = concat()

Set operations are powerful both in SQL and using the Stream API. The UNION ALL operation maps to Stream.concat(), as can be seen below:

SQL

SELECT *
FROM (VALUES(1), (2)) t(v)
UNION ALL
SELECT *
FROM (VALUES(1), (3)) t(v)

yielding

+---+
| v |
+---+
| 1 |
| 2 |
| 1 |
| 3 |
+---+

Java

Stream<Integer> s1 = Stream.of(1, 2);
Stream<Integer> s2 = Stream.of(1, 3);

Stream.concat(s1, s2)
      .forEach(System.out::println);

yielding

1
2
1
3
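By the way, for a UNION ALL over more than two streams, nesting Stream.concat() calls gets clumsy. A flatMap-based sketch (assuming the imports above):

Stream<Integer> s1 = Stream.of(1, 2);
Stream<Integer> s2 = Stream.of(1, 3);
Stream<Integer> s3 = Stream.of(5);

// Flatten a stream of streams to concatenate them all
Stream.of(s1, s2, s3)
      .flatMap(Function.identity())
      .forEach(System.out::println);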

Java (using jOOλ)

Unfortunately, concat() exists on Stream only as a static method, while jOOλ’s Seq.concat() is also available on instances.

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.concat(s2)
  .forEach(System.out::println);

UNION = concat() and distinct()

In SQL, UNION is defined to remove duplicates after concatenating the two sets via UNION ALL. The following two statements are equivalent:

SELECT * FROM t
UNION
SELECT * FROM u;

-- equivalent

SELECT DISTINCT *
FROM (
  SELECT * FROM t
  UNION ALL
  SELECT * FROM u
);

Let’s put this in action:

SQL

SELECT *
FROM (VALUES(1), (2)) t(v)
UNION
SELECT *
FROM (VALUES(1), (3)) t(v)

yielding

+---+
| v |
+---+
| 1 |
| 2 |
| 3 |
+---+

Java

Stream<Integer> s1 = Stream.of(1, 2);
Stream<Integer> s2 = Stream.of(1, 3);

Stream.concat(s1, s2)
      .distinct()
      .forEach(System.out::println);

ORDER BY = sorted()

The ORDER BY mapping is trivial

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
ORDER BY v

yielding

+---+
| v |
+---+
| 1 |
| 3 |
| 4 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.sorted()
 .forEach(System.out::println);

yielding

1
3
4
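SQL’s ORDER BY v DESC, or ordering by several columns, maps to sorted() with an explicit Comparator:

Stream<Integer> s = Stream.of(1, 4, 3);

// ORDER BY v DESC
s.sorted(Comparator.reverseOrder())
 .forEach(System.out::println);

yielding

4
3
1

(multiple sort keys would chain comparators via Comparator.comparing(...).thenComparing(...))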

LIMIT = limit()

The LIMIT mapping is even more trivial

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
LIMIT 2

yielding

+---+
| v |
+---+
| 1 |
| 4 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.limit(2)
 .forEach(System.out::println);

yielding

1
4

OFFSET = skip()

The OFFSET mapping is trivial as well

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
OFFSET 1

yielding

+---+
| v |
+---+
| 4 |
| 3 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.skip(1)
 .forEach(System.out::println);

yielding

4
3
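Finally, SQL’s combined OFFSET .. LIMIT pagination maps to skip() followed by limit(). Note that the order matters: limit() before skip() would retain different elements.

Stream<Integer> s = Stream.of(1, 4, 3);

// OFFSET 1, then LIMIT 1
s.skip(1)
 .limit(1)
 .forEach(System.out::println);

yielding

4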

Conclusion

In the above article, we’ve seen pretty much all the useful SQL SELECT query clauses and how they can be mapped to the Java 8 Stream API, or to jOOλ’s Seq API, in case Stream doesn’t offer sufficient functionality.

The article shows that SQL’s declarative world is not that much different from Java 8’s functional world. SQL clauses can compose ad-hoc queries just as well as Stream methods can be used to compose functional transformation pipelines. But there is a fundamental difference.

While SQL is truly declarative, functional programming is still very instructive. The Stream API does not make optimisation decisions based on constraints, indexes, histograms and other meta information about the data that you’re transforming. Using the Stream API is like using all possible optimisation hints in SQL to force the SQL engine to choose one particular execution plan over another. However, while SQL is a higher level algorithm abstraction, the Stream API may allow you to implement more customisable algorithms.
