jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Ming-Yee Iu

We have the pleasure of talking to Ming-Yee Iu in this eighth edition who will be telling us about how different people in our industry have tackled the integration of query systems into general purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.

In a recent discussion on reddit, you’ve mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet, LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may abandoned after a few years. Raymond Chen camp allows me to run 20 year old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and Typescript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30 year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks-in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have now for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features then? Perhaps, query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems are evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ has seen the biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk:

Why Everyone Hates Operator Overloading

… no, don’t tell me you like Perl. Because you don’t. You never did. It does horrible things. It makes your code look like…

Designing Perl 6
Designing Perl 6 as found on the awesome Perl Humour page

Perl made heavy use of operator overloading and used operators for a variety of things. A similar tendency can be seen in C++ and Scala. See also people comparing the two. So what’s wrong with operator overloading?

People never agreed whether Scala got operator overloading right or wrong:

Usually, people then cite the usual suspects, such as complex numbers (getting things right):

class Complex(val real:Int, 
              val imaginary:Int) {
    def +(operand:Complex):Complex = {
        new Complex(real + operand.real, 
                    imaginary + operand.imaginary)
    }
 
    def *(operand:Complex):Complex = {
        new Complex(real * operand.real - 
                    imaginary * operand.imaginary,
            real * operand.imaginary + 
            imaginary * operand.real)
    }
}

The above will now allow for adding and multiplying complex numbers, and there’s absolutely nothing wrong with that:

val c1 = new Complex(1, 2)
val c2 = new Complex(2, -3)
val c3 = c1 + c2
 
val res = c1 + c2 * c3

But then, there are these weirdo punctuation things that make average programmers simply go mad:

 ->
 ||=
 ++=
 <=
 _._
 ::
 :+=

Don’t believe it? Check out this graph library!

To the above, we say:

Operator Overloading? Meh

How operator overloading should be

Operator overloading can be good, but mostly isn’t. In Java, we’re all missing better ways to interact with BigDecimal and similar types:

// How it is:
bigdecimal1.add(bigdecimal2.multiply(bigdecimal3));

// How it should be:
bigdecimal1 + bigdecimal2 * bigdecimal3

Of course, operator precedence would take place as expected. Unlike C++ or Scala, ideal operator overloading would simply map common operators to common method names. Nothing more. No one really wants API developers to come up with fancy ##-%>> operators.

While Ceylon, Groovy, and Xtend implemented this in a somewhat predictable and useful way, Kotlin is probably the language that has implemented the best standard operator overloading mechanism into their language. Their documentation states:

Binary operations

Expression Translated to
a + b a.plus(b)
a – b a.minus(b)
a * b a.times(b)
a / b a.div(b)
a % b a.mod(b)
a..b a.rangeTo(b)

That looks pretty straightforward. Now check this out:

“Array” access

Symbol Translated to
a[i] a.get(i)
a[i, j] a.get(i, j)
a[i_1, …, i_n] a.get(i_1, …, i_n)
a[i] = b a.set(i, b)
a[i, j] = b a.set(i, j, b)
a[i_1, …, i_n] = b a.set(i_1, …, i_n, b)

Now, I really don’t see a single argument against the above. This goes on, and unfortunately, Java 8 has missed this train, as method references cannot be assigned to variables and invoked like JavaScript functions (although, that’s not too late for Java 9+):

Method calls

Symbol Translated to
a(i) a.invoke(i)
a(i, j) a.invoke(i, j)
a(i_1, …, i_n) a.invoke(i_1, …, i_n)

Simply beautiful!

Conclusion

We’ve recently blogged about Ceylon’s awesome language features. But the above Kotlin features are definitely a killer and would remove any other sorts of desires to introduce operator overloading in Java for good.

Let’s hope future Java versions take inspiration from Kotlin, a language that got operator overloading right.

ORM vs. SQL, compared to C vs. ASM

History is repeating itself. This is nothing new, but it takes wisdom (and Elephant memory) to remember when and how things had already happened in a similar way. When you feel that the whole SQL versus ORM debate is a bit boring and you may have seen it before, you’re probably right. It’s another religious war that can be compared to the war between C and Assembler some 40 years back. Think about it this way:

ORM vs. SQL

How many ORMs do you know that support:

  • Window functions
  • MERGE statement
  • Recursive queries
  • Common table expressions
  • Ordered aggregate functions
  • Statistical functions
  • Stored procedures (And I mean not just scalar ones!)
  • SQL-based XML
  • Temporary tables
  • … I could go on and on

ORMs are not just “higher-level” they are an entirely different paradigm, put on top of SQL. So, saying SQL is “lower-level” is not entirely accurate. It’s on a “different level”. ORMs are pretending that SQL is just a little “low-level” thing.

C vs. ASM

(citing reddit user henk53)

Which C compilers at the time made use of:

  • Advanced addressing modes
  • Opcode tricks
  • Special instructions (intrinsics came much later)
  • Jump tables (yeah, switch is close, but not quite it)
  • Build-in high level functions on some architectures (like string copy)
  • Page zero tricks
  • Segments (ugly things when all you have is small ones, but quite powerful when used fine-grained for ‘objects’ and structures)
  • Delay slot shadowing
  • Knowledge of fitting computations exactly in registers and L1.
  • … I could go on and on

C wasn’t just higher level, it was another paradigm, and yes, C compilers pretending that assembly is just a little “low-level” thing did raise our hackles back then. It threw away all the nice subtle intricacies of each specific CPU architecture and made them all equal by using only the common denominator of functionality that each CPU had.

It took a while, but eventually C compilers got better at optimizing than humans and could actually take advantage of advanced instructions in a CPU.

Original discussion

See an extract of the original thread here:
http://www.reddit.com/r/java/comments/sk25o/forget_hibernate_jooq_is_byfar_the_best_database/#c4f6z1w