jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Ming-Yee Iu

We have the pleasure of talking to Ming-Yee Iu in this eighth edition who will be telling us about how different people in our industry have tackled the integration of query systems into general purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.

In a recent discussion on reddit, you’ve mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet, LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may abandoned after a few years. Raymond Chen camp allows me to run 20 year old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and Typescript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30 year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks-in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have now for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features then? Perhaps, query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems are evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ has seen the biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk:

jOOQ Newsletter: July 2, 2014 – jOOLY 20% Discount Offering

Subscribe for this newsletter here

jOOLY 2014 20% Discount Offering

Have you been evaluating jOOQ for a while now, still hesitating to purchase licenses? Or are you an existing customer and looking into licensing more workstations for your team? This is your chance!

Get 20% off all your jOOQ purchases in the month of jOOLY 2014!

Google has their summer of code, we have our month of jOOLY, where you get 20% off all your purchases for jOOQ Professional and Enterprise Edition licenses. Just enter the “jOOLY” discount code with your next purchase and start coding awesome Java / SQL code! Click here to download and buy jOOQ now! The discount code is only valid until the end of jOOLY (July 31, 2014), so act quickly!

In fact, if you purchase jOOQ in the month of jOOLY (July), we’ll offer you a free PDF e-book copy of SQL Performance Explained with your purchase to help you get even more out of your jOOQ experience.

This discount cannot be combined with other discounts. Only the first period of a monthly or yearly subscription is discounted. Existing subscriptions are not eligible for this discount

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ with the world and we can hardly catch up with them! Here are:

“Henk”, who challenges the JavaEE folks to add jOOQ to the standards. It’s about time!

Roland Tepp, who sees the added value of jOOQ in “database-first” applications. That’s very true

Chris Raastad, who … well. Who puts it bluntly and is looking forward to use the “last missing features” from .NET also in Java. We’re honoured to be compared to LINQ. Thanks Chris!

Thanks for the shouts, guys! You make the jOOQ experience rock!

Community Zone – The jOOQ aficionados have been active!

The jOOQ community has been very active again in the last month. We’re happy to point out these editor’s picks from our radar:

Speaking of LINQ, you may have heard of JINQ. JINQ now also has a jOOQ integration, allowing you to query the awesome the Java 8 Streams API as if it were a database. With jOOQ being the backing SQL implementation, you get all the advantages like typesafety, SQL transformation and standardisation, etc. So, let’s hear it for Dr. Ming-Yee Iu, from whom we’ll certainly hear more in the near future.

Instil Software from Belfast is hosting this great jOOQ and Flyway workshop on July 16, 2014. “Unfortunately,” it is already sold out, but we’re sure that this killer productivity combination will be interesting for numerous future workshops. If you’re missing out on it, read this blog post to get an overview of jOOQ and Flyway. If you’re hosting such a workshop yourself, let us know. We’ll be more than happy to advertise it in our events section of the jOOQ website.

If you haven’t seen it already, consider reading the ZeroTurnaround Java Tools and Technologies Landscape for 2014, a survey of 2164 Java professionals (slightly biased towards ZeroTurnaround’s RebelLabs audience). jOOQ is listed and compared to other Java / SQL integration patterns. We’re clearly catching up with well-established brands like MyBatis – so stay tuned for more community news as we keep growing.

Feedback zone

You’ve read to the end of this newsletter, that’s great! Did you like it? What did we do great? What can we improve? What other subjects would you like us to cover?

We’d love to hear from you, so if you want to reach out to us, just drop a message to contact@datageekery.com. Looking forward to hearing from you!

Java 8 Friday: Let’s Deprecate Those Legacy Libs

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

For the last two Fridays, we’ve been off for our Easter break, but now we’re back with another fun article:

Let’s Deprecate Those Legacy Libs

d8938bef47ea2f62ed0543dd9e35a483Apart from Lambdas and extension methods, the JDK has also been enhanced with a lot of new library code, e.g. the Streams API and much more. This means that we can critically review our stacks and – to the great joy of Doctor Deprecator – throw out all the garbage that we no longer need.

Here are a couple of them, just to name a few:

LINQ-style libraries

There are lots of libraries that try to emulate LINQ (i.e. the LINQ-to-Collections part). We’ve already made our point before, because we now have the awesome Java 8 Streams API. 5 years from today, no Java developer will be missing LINQ any longer, and we’ll all be Streams-masters with Oracle Certified Streams Developer certifications hanging up our walls.

Don’t get me wrong. This isn’t about LINQ or Streams being better. They’re pretty much the same. But since we now have Streams in the JDK, why worry about LINQ? Besides, the SQLesque syntax for collection querying was misleading anyway. SQL itself is much more than Streams will ever be (or needs to be).

So let’s list a couple of LINQesque APIs, which we’ll no longer need:

LambdaJ

This was a fun attempt at emulating closures in Java through arcane and nasty tricks like ThreadLocal. Consider the following code snippet (taken from here):

// This lets you "close over" the
// System.out.println method
Closure println = closure(); { 
  of(System.out).println(var(String.class));
}

// in order to use it like so:
println.apply("one");
println.each("one", "two", "three");

Nice idea, although that semi-colon after closure(); and before that pseudo-closure-implementation block, which is not really a closure body… all of that seems quite quirky ;-)

Now, we’ll write:

Consumer<String> println = System.out::println;

println.accept("one");
Stream.of("one", "two", "three").forEach(println);

No magic here, just plain Java 8.

Let’s hear it one last time for Mario Fusco and Lambdaj.

Linq4j

Apparently, this is still being developed actively… Why? Do note that the roadmap also has a LINQ-to-SQL implementation in it, including:

Parser support. Either modify a Java parser (e.g. OpenJDK), or write a pre-processor. Generate Java code that includes expression trees.

Yes, we’d like to have such a parser for jOOQ as well. It would allow us to truly embed SQL in Java, similar to SQLJ, but typesafe. But if we have the Streams API, why not implement something like Streams-to-SQL?

We cannot say farewell to Julian Hyde‘s Linq4j just yet, as he’s still continuing work. But we believe that he’s investing in the wrong corner.

Coolection

This is a library with a fun name, and it allows for doing things like…

from(animals).where("name", eq("Lion"))
             .and("age", eq(2))
             .all();

from(animals).where("name", eq("Dog"))
             .or("age", eq(5))
             .all();

But why do it this way, when you can write:

animals.stream()
       .filter(a -> a.name.equals("Lion")
                 && a.age == 2)
       .collect(toList());

animals.stream()
       .filter(a -> a.name.equals("Dog")
                 || a.age == 5)
       .collect(toList());

Let’s hear it for Wagner Andrade. And then off to the bin

Half of Guava

Guava has been pretty much a dump for all sorts of logic that should have been in the JDK in the first place. Take com.google.guava.base.Joiner for instance. It is used for string-joining:

Joiner joiner = Joiner.on("; ").skipNulls();
. . .
return joiner.join("Harry", null, "Ron", "Hermione");

No need, any more. We can now write:

Stream.of("Harry", null, "Ron", "Hermione")
      .filter(s -> s != null)
      .collect(joining("; "));

Note also that the skipNulls flag and all sorts of other nice-to-have utilities are no longer necessary as the Streams API along with lambda expressions allows you to decouple the joining task from the filtering task very nicely.

Convinced? No?

What about:

And then, there’s the whole set of Functional stuff that can be thrown to the bin as well:

https://code.google.com/p/guava-libraries/wiki/FunctionalExplained

Of course, once you’ve settled on using Guava throughout your application, you won’t remove its usage quickly. But on the other hand, let’s hope that parts of Guava will be deprecated soon, in favour of an integration with Java 8.

JodaTime

Now, this one is a no-brainer, as the popular JodaTime library got standardised into the java.time packages. This is great news.

Let’s hear it for “Joda” Stephen Colebourne and his great work for the JSR-310.

Apache commons-io

The java.nio packages got even better with new methods that nicely integrate with the Streams API (or not). One of the main reasons why anyone would have ever used Apache Commons IO was the fact that it is horribly tedious to read files prior to Java 7 / 8. I mean, who would’ve enjoyed this piece of code (from here):

try (RandomAccessFile file = 
     new RandomAccessFile(filePath, "r")) {
    byte[] bytes = new byte[size];
    file.read(bytes);
    return new String(bytes); // encoding?? ouch!
}

Over this one?

List<String> lines = FileUtils.readLines(file);

But forget the latter. You can now use the new methods in java.nio.file.Files, e.g.

List<String> lines = Files.readAllLines(path);

No need for third-party libraries any longer!

Serialisation

Throw it all out, for there is JEP 154 deprecating serialisation. Well, it wasn’t accepted, but we could’ve surely removed about 10% of our legacy codebase.

A variety of concurrency APIs and helpers

With JEP 155, there had been a variety of improvements to concurrent APIs, e.g. to ConcurrentHashMaps (we’ve blogged about it before), but also the awesome LongAdders, about which you can read a nice article over at the Takipi blog.

Haven’t I seen a whole com.google.common.util.concurrent package over at Guava, recently? Probably not needed anymore.

JEP 154 (Serialisation) wasn’t real

It was an April Fools’ joke, of course…

Base64 encoders

How could this take so long?? In 2003, we’ve had RFC 3548, specifying Base16, Base32, and Base64 data encodings, which was in fact based upon base 64 encoding specified in RFC 1521, from 1993, or RFC 2045 from 1996, and if we’re willing to dig further into the past, I’m sure we’ll find earlier references to this simple idea of encoding binary data in text form.

Now, in 2014, we finally have JEP 135 as a part of the JavaSE8, and thus (you wouldn’t believe it): java.util.Base64.

Off to the trash can with all of these libraries!

… gee, it seems like everyone and their dog worked around this limitation, prior to the JDK 8…

More?

Provide your suggestions in the comments! We’re curious to hear your thoughts (with examples!)

Conclusion

As any Java major release, there is a lot of new stuff that we have to learn, and that allows us to remove third-party libraries. This is great, because many good concepts have been consolidated into the JDK, available on every JVM without external dependencies.

Disclaimer: Not everything in this article was meant seriously. Many people have created great pieces of work in the past. They have been very useful, even if they are somewhat deprecated now. Keep innovating, guys! :-)

Want to delve more into the many new things Java 8 offers? Go have a look over at the Baeldung blog, where this excellent list of Java 8 resources is featured:

http://www.baeldung.com/java8

… and stay tuned for our next Java 8 Friday blog post, next week!

Java 8 Friday: Java 8 Will Revolutionize Database Access

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. For our Java 8 series, we’re honoured to host a very relevant guest post by Dr. Ming-Yee Iu.

Dr. Ming-Yee Iu completed a PhD on Database Queries in Java at EPFL. He has created the open source project Jinq to demonstrate some new techniques for supporting database queries in Java.

Our editorial note:

Ever since Erik Meijer has introduced LINQ to the .NET ecosystem, us Java folks have been wondering whether we could have the same. We’ve blogged about this topic before, a couple of times:

While most LINQesque APIs in the Java ecosystem operate as internal domain-specific languages like jOOQ, some try to tackle the integration on a bytecode level, like JaQu.

JINQ formalises runtime bytecode transformations through what Dr. Ming-Yee Iu calls symbolic execution. We find this very interesting to a point that we wonder if we should start building a JINQ-to-jOOQ JINQ provider, where the expressive power of the Java 8 Streams API could be combined with our great SQL standardisation and transformation features…?

Convince yourselves:

Java 8 Goodie: Java 8 Will Revolutionize Database Access

Java 8 is finally here! After years of waiting, Java programmers will finally get support for functional programming in Java. Functional programming support helps streamline existing code while providing powerful new capabilities to the Java language. One area that will be disrupted by these new features is how programmers work with databases in Java. Functional programming support opens up exciting new possibilities for simpler yet more powerful database APIs. Java 8 will enable new ways to access databases that are competitive with those of other programming languages such as C#’s LINQ.

The Functional Way of Working With Data

Java 8 not only adds functional-support to the Java language, but it extends the Java collection classes with new functional ways of working with data. Traditionally, working with large amounts of data in Java requires a lot of loops and iterators.

For example, suppose you have a collection of Customer objects:

Collection<Customer> customers;

If you were only interested in the customers from Belgium, you would have to iterate over all the customers and save the ones you wanted.

Collection<Customer> belgians = new ArrayList<>();
for (Customer c : customers) {
    if (c.getCountry().equals("Belgium"))
        belgians.add(c);
}

This takes five lines of code. It is also poorly abstracted. What happens if you have 10 million customers, and you want to speed up the code by filtering it in parallel using two threads? You would have to rewrite everything to use futures and a lot of hairy multi-threaded code.

With Java 8, you can write the same code in one line. With its support for functional programming, Java 8 lets you write a function saying which customers you are interested in (those from Belgium) and then to filter collections using that function. Java 8 has a new Streams API that lets you do this.

customers.stream().filter(
    c -> c.getCountry().equals("Belgium")
);

Not only is the Java 8 version of the code shorter, but the code is easier to understand as well. There is almost no boilerplate. The code calls the method filter(), so it’s clear that this code is used for filtering customers. You don’t have to spend your time trying to decipher the code in a loop to understand what it is doing with its data.

And what happens if you want to run the code in parallel? You just have to use a different type of stream.

customers.parallelStream().filter(
    c -> c.getCountry().equals("Belgium")
);

What’s even more exciting is that this functional-style of code works with databases as well!

The Functional Way of Working with Databases

Traditionally, programmers have needed to use special database query languages to access the data in databases. For example, below is some JDBC code for finding all the customers from Belgium:

PreparedStatement s = con.prepareStatement(
      "SELECT * "
    + "FROM Customer C "
    + "WHERE C.Country = ? ");
s.setString(1, "Belgium");
ResultSet rs = s.executeQuery();

Much of the code is in the form a string, which the compiler can’t check for errors and which can lead to security problems due to sloppy coding. There is also a lot of boilerplate code that makes writing database access code quite tedious. Tools such as jOOQ solve the problem of error-checking and security by providing a database query language that can be written using special Java libraries. Or you can use tools such as object-relational mappers to hide a lot of boring database code for common access patterns, but if you need to write non-trivial database queries, you will still need to use a special database query language again.

With Java 8, it’s possible to write database queries using the same functional-style used when working with the Streams API. For example, Jinq is an open source project that explores how future database APIs can make use of functional programming. Here is a database query written using Jinq:

customers.where(
    c -> c.getCountry().equals("Belgium")
);

This code is almost identical to the code using the Streams API. In fact, future versions of Jinq will let you write queries directly using the Streams API. When the code is run, Jinq will automatically translate the code into a database query like the JDBC query shown before.

So without having to learn a new database query language, you can write efficient database queries. You can use the same style of code you would use for Java collections. You also don’t need a special Java compiler or virtual machine. All of this code compiles and runs using the normal Java 8 JDK. If there are errors in your code, the compiler will find them and report them to you, just like normal Java code.

Jinq supports queries that can be as complicated as SQL92. Selection, projection, joins, and subqueries are all supported. The algorithm for translating Java code into database queries is also very flexible in what code it will accept and translate. For example, Jinq has no problem translating the code below into a database query, despite its complexity.

customers
    .where( c -> c.getCountry().equals("Belgium") )
    .where( c -> {
        if (c.getSalary() < 100000)
            return c.getSalary() < c.getDebt();
        else
            return c.getSalary() < 2 * c.getDebt();
        } );

As you can see, the functional programming support in Java 8 is well-suited for writing database queries. The queries are compact, and complex queries are supported.

Inner Workings

But how does this all work? How can a normal Java compiler translate Java code into database queries? Is there something special about Java 8 that makes this possible?

The key to supporting these new functional-style database APIs is a type of bytecode analysis called symbolic execution. Although your code is compiled by a normal Java compiler and run in a normal Java virtual machine, Jinq is able to analyze your compiled Java code when it is run and construct database queries from them. Symbolic execution works best when analyzing small functions, which are common when using the Java 8 Streams API.

The easiest way to understand how this symbolic execution works is with an example. Let’s examine how the following query is converted by Jinq into the SQL query language:

customers
    .where( c -> c.getCountry().equals("Belgium") )

Initially, the customers variable is a collection that represents this database query

SELECT *
  FROM Customers C

Then, the where() method is called, and a function is passed to it. In this where() method, Jinq opens the .class file of the function and gets the compiled bytecode for the function to analyze. In this example, instead of using real bytecode, let’s just use some simple instructions to represent the bytecode of the function:

  1. d = c.getCountry()
  2. e = “Belgium”
  3. e = d.equals(e)
  4. return e

Here, we pretend that the function has been compiled by the Java compiler into four instructions. This is what Jinq sees when the where() method is called. How can Jinq make sense of this code?

Jinq analyzes the code by executing it. Jinq doesn’t run the code directly though. It runs the code ‘abstractly’. Instead of using real variables and real values, Jinq uses symbols to represent all values when executing the code. This is why the analysis is called symbolic execution.

Jinq executes each instruction and keeps track of all the side-effects or all the things that the code changes in the state of the program. Below is a diagram showing all the side-effects that Jinq finds when it executes the four lines of code using symbolic execution.

Symbolic execution example

Symbolic execution example

In the diagram, you can see how after the first instruction runs, Jinq finds two side-effects: the variable d has changed and the method Customer.getCountry() has been called. With symbolic execution, the variable d is not given a real value like “USA” or “Denmark”. It is assigned the symbolic value of c.getCountry().

After all the instructions have been executed symbolically, Jinq prunes the side-effects. Since the variables d and e are local variables, any changes to them are discarded after the function exits, so those side-effects can be ignored. Jinq also knows that the methods Customer.getCountry() and String.equals() do not modify any variables or show any output, so those method calls can also be ignored. From this, Jinq can conclude that executing the function produces only one effect: it returns c.getCountry().equals("Belgium").

Once Jinq has understood what the function passed to it in the where() method does, it can then merge this knowledge with the database query underlying the customers collection to create a new database query.

Generating a database query

Generating a database query

And that’s how Jinq generates database queries from your code. The use of symbolic execution means that this approach is quite robust to the different code patterns outputted by different Java compilers. If Jinq ever encounters code with side-effects that can’t be emulated using a database query, Jinq will leave your code untouched. Since everything is written using normal Java code, Jinq can just run that code directly instead, and your code will produce the expected results.

This simple translation example should have given you an idea of how the query translation works. You should feel confident that these algorithms can correctly generate database queries from your code.

An Exciting Future

I hope I have given you a taste for how Java 8 enables new ways of working with databases in Java. The functional programming support in Java 8 allows you write database code in a similar way to writing code for working with Java collections. Hopefully, existing database APIs will soon be extended to support these styles of queries.

To play with a prototype for these new types of queries, you can visit http://www.jinq.org

Ecto a Slick Query DSL for the Elixir Language

Elixir? What on earth is Elixir? It is a programming language that somewhat reminds me of Ruby. Here is some example Elixir code:

defmodule Hello do
  IO.puts "Defining the function world"

  def world do
    IO.puts "Hello World"
  end

  IO.puts "Function world defined"
end

And it has a Stack Overflow community with around 50 tagged questions as of December 2013. And it has a query DSL library called Ecto, which describes itself as such:

Ecto is a domain specific language for writing queries and interacting with databases in Elixir.

Example queries can be seen on the project’s GitHub readme page:

A simple SELECT

use Ecto.Query

query = from w in Weather,
      where: w.prcp > 0 or w.prcp == nil,
     select: w

Repo.all(query)

Using JOINs

query = from p in Post,
      where: p.id == 42,
  left_join: c in p.comments,
     select: assoc(p, c)

[post] = Repo.all(query)

post.comments.to_list

Looks like a slick (but not typesafe?) querying DSL mix between LINQ and Clojure’s sqlkorma.

For those brave ones among you using the Elixir language, this might be your one (and only) choice to access a SQL database! Good luck!

Does Java 8 Still Need LINQ? Or is it Better than LINQ?

Photo by Ade Oshineye

Erik Meijer, Tye Dye Expert. Photo by Ade Oshineye. Licensed under CC-BY-SA

LINQ was one of the best things that happened to the .NET software engineering ecosystem in a long time. With its introduction of lambda expressions and monads in Visual Studio 2008, it had catapulted the C# language way ahead of Java, which was at version 6 at the time, still discussing the pros and cons of generic type erasure. This achievement was mostly due to and accredited to Erik Meijer, a Dutch computer scientist and tye-dye expert who is now off to entirely other projects.

Where is Java now?

With the imminent release of Java 8 and JSR-355, do we still need LINQ? Many attempts of bringing LINQ goodness to Java have been made since the middle of the last decade. At the time, Quaere and Lambdaj seemed to be a promising implementation on the library level (not the language level). In fact, a huge amount of popular Stack Overflow questions hints at how many Java folks were (and still are!) actually looking for something equivalent:

Interestingly, “LINQ” has even made it into EL 3.0!

But do we really need LINQ?

LINQ has one major flaw, which is advertised as a feature, but in our opinion, will inevitably lead to the “next big impedance mismatch”. LINQ is inspired by SQL and this is not at all a good thing. LINQ is most popular for LINQ-to-Objects, which is a fine way of querying collections in .NET. The success of Haskell or Scala, however, has shown that the true functional nature of “collection querying” tends to employ entirely other terms than SELECT, WHERE, GROUP BY, or HAVING.  They use terms like “fold”, “map”, “flatMap”, “reduce”, and many many more. LINQ, on the other hand, employs a mixture of GROUP BY and terms like “skip”, “take” (instead of OFFSET and FETCH).

In fact, nothing could be further from the functional truth than a good old SQL partitioned outer join, grouping set, or framed window function. These constructs are mere declarations of what a SQL developer would like to see as a result. They’re not self-contained functions, which actually contain the logic to be executed in any given context. Moreover, window functions can be used only in SELECT and ORDER BY clauses, which is obvious when thinking in a declarative way, but which is also very weird if you don’t have the SQL context. Specifically, a window function in a SELECT clause influences the whole execution plan and the way indexes are employed to pre-fetch the right data.

Conversely, true functional programming can do so much more to in-memory collections than SQL ever can. Using a SQLesque API for collection querying was a cunning decision to trick “traditional” folks into adopting functional programming. But the hopes that collection and SQL table querying could be interfused were disappointed, as such constructs will not produce the wanted SQL execution plans.

But what if I am doing SQL?

It’s simple. When you do SQL, you have two essential choices.

  • Do it “top-down”, putting most focus on your Java domain model. In that case, use Hibernate / JPA for querying and transform Hibernate results using the Java 8 Streams API.
  • Do it “bottom-up”, putting most focus on your SQL / relational domain model. In that case, use JDBC or jOOQ and again, transform your results using the Java 8 Streams API.

This is illustrated more in detail here: http://www.hibernate-alternative.com

Don’t look back. Embrace the future!

While .NET was “ahead” of Java for a while, this was not due to LINQ itself. This was mainly due to the introduction of lambda expressions and the impact lambdas had on *ALL* APIs. LINQ is just one example of how such APIs could be constructed, although LINQ got most of the credit.

But I’m much more excited about Java 8’s new Streams API, and how it will embrace some functional programming in the Java ecosystem. A very good blog post by Informatech illustrates, how common LINQ expressions translate to Java 8 Streams API expressions.

So, don’t look back. Stop envying .NET developers. With Java 8, we will not need LINQ or any API that tries to imitate LINQ on the grounds of “unified querying”, which is a better sounding name for what is really the “query target impedance mismatch”. We need true SQL for relational database querying, and we need the Java 8 Streams API for functional transformations of in-memory collections. That’s it. Go Java 8!

LINQ and Java

LINQ has been quite a successful, but also controversial addition to the .NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves:

Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases.

Declarative programming means that programmers do not explicitly phrase out their algorithms. Instead, they describe the result they would like to obtain, leaving algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s metadata matters, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows to embed declarative programming aspects into .NET languages, such as C#, or ASP. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL, pgplsql or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORM did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This gets more clear when looking at some examples given by the LINQ-to-SQL documentation. For example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();

Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear if the .Count() function is transformed into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or if it will be applied only after executing the query, in the application memory. The latter would be prohibitive, if a large number or records would need to be transferred from the database to memory. Depending on the transaction model, they would even need to be read-locked!

Another example is given here where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely different from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. This does not directly map to SQL, though, and must be used by LINQ internally, to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#!

Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control, which parts are really going to be translated to SQL, and which parts will execute in my application, after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: Will the compiler warn me that it couldn’t figure out how to generate a C#/SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome, and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:

  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.

Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off, using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8’s Stream API) for in-memory post-processing

LINQ-like libraries modelling SQL in Java, Scala

LINQ-like libraries abstracting SQL syntax and data stores in Java, Scala