jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Ming-Yee Iu

We have the pleasure of talking to Ming-Yee Iu in this eighth edition who will be telling us about how different people in our industry have tackled the integration of query systems into general purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.

In a recent discussion on reddit, you’ve mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet, LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may abandoned after a few years. Raymond Chen camp allows me to run 20 year old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and Typescript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30 year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks-in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have now for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features then? Perhaps, query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems are evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ has seen the biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk:

4 thoughts on “jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying

  1. “Because functional-style code has no side effects, it’s easier to analyze.” Does this mean it’s easier to analyze from a code perspective? Since it has no side effects, wouldn’t that mean it’s harder to analyze debugging?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s