Java 8 Friday: Java 8 Will Revolutionize Database Access

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. For our Java 8 series, we’re honoured to host a very relevant guest post by Dr. Ming-Yee Iu.
Dr. Ming-Yee Iu completed a PhD on Database Queries in Java at EPFL. He has created the open source project Jinq to demonstrate some new techniques for supporting database queries in Java.
Our editorial note: Ever since Erik Meijer has introduced LINQ to the .NET ecosystem, us Java folks have been wondering whether we could have the same. We’ve blogged about this topic before, a couple of times: While most LINQesque APIs in the Java ecosystem operate as internal domain-specific languages like jOOQ, some try to tackle the integration on a bytecode level, like JaQu. JINQ formalises runtime bytecode transformations through what Dr. Ming-Yee Iu calls symbolic execution. We find this very interesting to a point that we wonder if we should start building a JINQ-to-jOOQ JINQ provider, where the expressive power of the Java 8 Streams API could be combined with our great SQL standardisation and transformation features…? Convince yourselves:

Java 8 Goodie: Java 8 Will Revolutionize Database Access

Java 8 is finally here! After years of waiting, Java programmers will finally get support for functional programming in Java. Functional programming support helps streamline existing code while providing powerful new capabilities to the Java language. One area that will be disrupted by these new features is how programmers work with databases in Java. Functional programming support opens up exciting new possibilities for simpler yet more powerful database APIs. Java 8 will enable new ways to access databases that are competitive with those of other programming languages such as C#’s LINQ.

The Functional Way of Working With Data

Java 8 not only adds functional-support to the Java language, but it extends the Java collection classes with new functional ways of working with data. Traditionally, working with large amounts of data in Java requires a lot of loops and iterators. For example, suppose you have a collection of Customer objects:

Collection<Customer> customers;

If you were only interested in the customers from Belgium, you would have to iterate over all the customers and save the ones you wanted.

Collection<Customer> belgians = new ArrayList<>();
for (Customer c : customers) {
    if (c.getCountry().equals("Belgium"))
        belgians.add(c);
}

This takes five lines of code. It is also poorly abstracted. What happens if you have 10 million customers, and you want to speed up the code by filtering it in parallel using two threads? You would have to rewrite everything to use futures and a lot of hairy multi-threaded code. With Java 8, you can write the same code in one line. With its support for functional programming, Java 8 lets you write a function saying which customers you are interested in (those from Belgium) and then to filter collections using that function. Java 8 has a new Streams API that lets you do this.

customers.stream().filter(
    c -> c.getCountry().equals("Belgium")
);

Not only is the Java 8 version of the code shorter, but the code is easier to understand as well. There is almost no boilerplate. The code calls the method filter(), so it’s clear that this code is used for filtering customers. You don’t have to spend your time trying to decipher the code in a loop to understand what it is doing with its data. And what happens if you want to run the code in parallel? You just have to use a different type of stream.

customers.parallelStream().filter(
    c -> c.getCountry().equals("Belgium")
);

What’s even more exciting is that this functional-style of code works with databases as well!

The Functional Way of Working with Databases

Traditionally, programmers have needed to use special database query languages to access the data in databases. For example, below is some JDBC code for finding all the customers from Belgium:

PreparedStatement s = con.prepareStatement(
      "SELECT * "
    + "FROM Customer C "
    + "WHERE C.Country = ? ");
s.setString(1, "Belgium");
ResultSet rs = s.executeQuery();

Much of the code is in the form a string, which the compiler can’t check for errors and which can lead to security problems due to sloppy coding. There is also a lot of boilerplate code that makes writing database access code quite tedious. Tools such as jOOQ solve the problem of error-checking and security by providing a database query language that can be written using special Java libraries. Or you can use tools such as object-relational mappers to hide a lot of boring database code for common access patterns, but if you need to write non-trivial database queries, you will still need to use a special database query language again. With Java 8, it’s possible to write database queries using the same functional-style used when working with the Streams API. For example, Jinq is an open source project that explores how future database APIs can make use of functional programming. Here is a database query written using Jinq:

customers.where(
    c -> c.getCountry().equals("Belgium")
);

This code is almost identical to the code using the Streams API. In fact, future versions of Jinq will let you write queries directly using the Streams API. When the code is run, Jinq will automatically translate the code into a database query like the JDBC query shown before. So without having to learn a new database query language, you can write efficient database queries. You can use the same style of code you would use for Java collections. You also don’t need a special Java compiler or virtual machine. All of this code compiles and runs using the normal Java 8 JDK. If there are errors in your code, the compiler will find them and report them to you, just like normal Java code. Jinq supports queries that can be as complicated as SQL92. Selection, projection, joins, and subqueries are all supported. The algorithm for translating Java code into database queries is also very flexible in what code it will accept and translate. For example, Jinq has no problem translating the code below into a database query, despite its complexity.

customers
    .where( c -> c.getCountry().equals("Belgium") )
    .where( c -> {
        if (c.getSalary() < 100000)
            return c.getSalary() < c.getDebt();
        else
            return c.getSalary() < 2 * c.getDebt();
        } );

As you can see, the functional programming support in Java 8 is well-suited for writing database queries. The queries are compact, and complex queries are supported.

Inner Workings

But how does this all work? How can a normal Java compiler translate Java code into database queries? Is there something special about Java 8 that makes this possible? The key to supporting these new functional-style database APIs is a type of bytecode analysis called symbolic execution. Although your code is compiled by a normal Java compiler and run in a normal Java virtual machine, Jinq is able to analyze your compiled Java code when it is run and construct database queries from them. Symbolic execution works best when analyzing small functions, which are common when using the Java 8 Streams API. The easiest way to understand how this symbolic execution works is with an example. Let’s examine how the following query is converted by Jinq into the SQL query language:

customers
    .where( c -> c.getCountry().equals("Belgium") )

Initially, the customers variable is a collection that represents this database query

SELECT *
  FROM Customers C

Then, the where() method is called, and a function is passed to it. In this where() method, Jinq opens the .class file of the function and gets the compiled bytecode for the function to analyze. In this example, instead of using real bytecode, let’s just use some simple instructions to represent the bytecode of the function:
  1. d = c.getCountry()
  2. e = “Belgium”
  3. e = d.equals(e)
  4. return e
Here, we pretend that the function has been compiled by the Java compiler into four instructions. This is what Jinq sees when the where() method is called. How can Jinq make sense of this code? Jinq analyzes the code by executing it. Jinq doesn’t run the code directly though. It runs the code ‘abstractly’. Instead of using real variables and real values, Jinq uses symbols to represent all values when executing the code. This is why the analysis is called symbolic execution. Jinq executes each instruction and keeps track of all the side-effects or all the things that the code changes in the state of the program. Below is a diagram showing all the side-effects that Jinq finds when it executes the four lines of code using symbolic execution.
Symbolic execution example
Symbolic execution example
In the diagram, you can see how after the first instruction runs, Jinq finds two side-effects: the variable d has changed and the method Customer.getCountry() has been called. With symbolic execution, the variable d is not given a real value like “USA” or “Denmark”. It is assigned the symbolic value of c.getCountry(). After all the instructions have been executed symbolically, Jinq prunes the side-effects. Since the variables d and e are local variables, any changes to them are discarded after the function exits, so those side-effects can be ignored. Jinq also knows that the methods Customer.getCountry() and String.equals() do not modify any variables or show any output, so those method calls can also be ignored. From this, Jinq can conclude that executing the function produces only one effect: it returns c.getCountry().equals("Belgium"). Once Jinq has understood what the function passed to it in the where() method does, it can then merge this knowledge with the database query underlying the customers collection to create a new database query.
Generating a database query
Generating a database query
And that’s how Jinq generates database queries from your code. The use of symbolic execution means that this approach is quite robust to the different code patterns outputted by different Java compilers. If Jinq ever encounters code with side-effects that can’t be emulated using a database query, Jinq will leave your code untouched. Since everything is written using normal Java code, Jinq can just run that code directly instead, and your code will produce the expected results. This simple translation example should have given you an idea of how the query translation works. You should feel confident that these algorithms can correctly generate database queries from your code.

An Exciting Future

I hope I have given you a taste for how Java 8 enables new ways of working with databases in Java. The functional programming support in Java 8 allows you write database code in a similar way to writing code for working with Java collections. Hopefully, existing database APIs will soon be extended to support these styles of queries. To play with a prototype for these new types of queries, you can visit http://www.jinq.org

22 thoughts on “Java 8 Friday: Java 8 Will Revolutionize Database Access

  1. It is true that Jinq enable to write SQL queries in a sexy way :)

    However, I am wondering :
    – will all the “symbolic execution” make the code harder to debug ?
    – did we have to wait java 8 to use bytecode ? I mean that is what lambdaj does; couldn’t Jinq have done the same thing mixing lambdaj and SQL ?

    That is what I love about jOOQ (not Jinq), there are no bad surprises : you have control over what you do, there is no exotic magic that do things you don’t even suspect.

    1. will all the “symbolic execution” make the code harder to debug ?

      Good point. I would imagine so, that’s an interesting question to ask to Ming. I could imagine that the byte-code rewriting would need to somehow retain the original line numbers and state to at least be able to debug and step into those methods that are not translated to SQL.

      On the other hand, Java 8 will open up a couple of questions with respect to debugging anyway. With all the lambdas and fluent APIs, people start putting lots of code and various levels of scope on a single line. It is hard to set breakpoints on such code.

      did we have to wait java 8 to use bytecode ?

      Previous versions of JINQ were able to rewrite “regular” code. You can read this up in Ming’s PhD thesis, which is available here. In principle, it’s even possible to rewrite imperative code (for loops, if branches).

      I mean that is what lambdaj does

      I think, lambdaj uses proxies, right?

      couldn’t Jinq have done the same thing mixing lambdaj and SQL ?

      You probably mean something like what Benji Weber attempted here

      That is what I love about jOOQ (not Jinq), there are no bad surprises : you have control over what you do, there is no exotic magic that do things you don’t even suspect.

      Yes, these things need to be well-understood from an API consumer perspective. Similar to Hibernate and its usage of CGLIB.

      1. Good point. I would imagine so, that’s an interesting question to ask to Ming. I could imagine that the byte-code rewriting would need to somehow retain the original line numbers and state to at least be able to debug and step into those methods that are not translated to SQL.

        That is part of the problem.
        When you dive into a library code that use bytecode or intense reflection (eg hibernate and OGNL), it is really hard to understand what is being done.
        Usually when an Exception is raised in these libraries, either you find a quick solution on Google or you spend a day debugging the whole thing.

        I think, lambdaj uses proxies, right?

        Yes, maybe lambdaj wasn’t the best example here :)

        I also have one more question, how does Jinq work with external method ?
        For example, if I write something like :

        customers.where( 
          c -> distance(storeLocation, c.getLatitude(), c.getLongitude()) < 1000 
        );
        

        With distance() a function that does something like that https://stackoverflow.com/questions/3715521/how-can-i-calculate-the-distance-between-two-gps-points-in-java
        Will Jinq be able to generate a SQL code which does what I want ?

        1. Will Jinq be able to generate a SQL code which does what I want ?

          According to Ming, yes. The distance function is recognised as “non-SQL” and thus executed in Java. c -> distance < 1000 can then generate SQL such as ? < ?. Obviuosly, this predicate might run against stale, cached versions of c.latitude and c.longitude

          1. Thank you for the explanation.

            That means that if you want to know for sure what SQL will be generated, you will have to keep using jOOQ ;)

  2. Oh finally. After so many years Java has copied Linq from dotNET.
    Unfortunately Java is still light years behind. :-(

    But why the PR guys call this revolution … it is so stupid and boooooring. After Steve Jobs died every company is revolutionaizing absolutely every shit.

  3. Writing a LINQ-like SQL queries in functional way is already implemented in Slick for Scala (a JVM based language) and Slick is already a couple years old. Why do you say that java8 will revolutionize database access?

    1. Thanks for your interest. Only a small niche of JVM-based developers have access to Scala. For the rest of us, Java 8 will be the way forward. I agree, though, that JINQ is very similar to Slick. You may also be interested in commenting on jOOQ, which does not really have any equivalent in other platforms.

    1. And Slick is a not-so-mature-yet copy of LINQ by Erik Meijer (which is what inspired JINQ in the first place)…

      Yes, Java 8 is still not as “advanced” as certain other platforms, but these claims have hardly ever led to constructive discussions ;-)

  4. Hi Lukas,
    As far as I can see – Jinq further explores the ideas in Dr. Ming-Yee doctoral thesis. The thesis explores three styles of representing SQL in Java. The symbolic execution part seems the main point though – this is Java bytecode analysis to reconstruct some representation of the initial source since that gets lost during compilation as contrasted with LINQ – .NET’s compiler retains an AST of the code that is available at runtime.
    I am not sure whether this is the best way to tackle all this but the idea has merits plus it has the great advantage that it functions like a self contained library and no change in the normal toolchain is necessary. An additional plus is that analysis of bytecode may be easier than whole Java source code analysis.
    The bytecode rewriting seems a bit harsh – class generation and delegation seems cleaner at least to me – then you get clear demarcation where the magic stops and where it ends.
    With external sources like RDBMS you always get some phase of code generation – for example table with columns becomes a bean, seems for compiled languages type providers are not so hot. Thus it seems to me that there can be a post compile phase to generate the necessary queries and cut down on analysis during runtime.

    1. Hi Alexander, thanks for joining the discussion!

      Yes, when Erik Meijer conceived LINQ, he quite heavily transformed the whole .NET ecosystem in order to be able to do actual AST transformation – which is inevitably much more reliable and powerful than reconstructing things from byte code. I guess, Ming’s ideas will first have to be challenged in a larger application.

      I agree that class generation and delegation is better. I suspect that this is actually done, no?

      Anyway, I’m always wondering if the mix of “server-side” with “database-side” logic will be largely manageable in a bigger project setup. While this mix is certainly very expressive, it is also harder to understand how to tune a query. Tuning SQL itself can already be quite a beast.

      1. I agree that class generation… this is actually done, no?

        I am not sure, I based what I wrote mainly on the thesis. ASM definitely allows to do bytecode rewriting – I have used that library to fix bugs in already compiled Java and it did a great job.

        which is inevitably .. reliable and powerful than … from byte code

        I am really not sure about this part especially since we have concrete context where we do things. JVM’s bytecode model is far simpler than the Java language.

        Ming’s ideas will first have to be challenged in a larger application

        That is very true and I wish Ming the very best with his prototype. If he is reading the blog – Go, Ming, GO, GO! Two things would be needed:

        * Support for SQL dialects in different DB. While SQL has standards, the real action is with production RDBMS
        * Support for extensions – a way to add expressions and functions to the library

        1. Support for SQL dialects in different DB. While SQL has standards, the real action is with production RDBMS

          … which is why the best implementation is JINQ-to-jOOQ :-)
          We’ve gone through so many hassles already. I don’t think anyone else will ever build another SQL transformer as good as jOOQ. We got it all! An example:

          https://blog.jooq.org/2013/05/03/sql-query-transformation-fun-predicates-with-row-value-expressions/

          Support for extensions – a way to add expressions and functions to the library

          I think that should be quite easy…

  5. From a beginner point of view, what sofwares do i need to exececute thes codes : java8, Jinq, jOOQ Api, …?

  6. Hi,
    I know a similar implementation to JINQ is jOOλ. Overall, how do we know which one is better to use ? And how do we compare their advantages and disadvantages ?

  7. My use-case is just to retrieve data from database (MySQL, Postgresql, etc) using Hibernate. JINQ does help me to query rows a lot more efficiently than JDBC’s PrepareStatement using complex string to execute queries.
    About jOOλ, I remember somewhere an answer said that it is able to use either JINQ or jOOλ (java 8’s features) to execute SQL. So I wonder if they have the same functionalities and can be comparable. I also read about these links https://blog.jooq.org/tag/joo-lambda/, https://blog.jooq.org/tag/joo%CE%BB/

Leave a Reply