Does Java 8 Still Need LINQ? Or is it Better than LINQ?

Photo by Ade Oshineye
Erik Meijer, Tye Dye Expert. Photo by Ade Oshineye. Licensed under CC-BY-SA

LINQ was one of the best things that happened to the .NET software engineering ecosystem in a long time. With its introduction of lambda expressions and monads in Visual Studio 2008, it had catapulted the C# language way ahead of Java, which was at version 6 at the time, still discussing the pros and cons of generic type erasure. This achievement was mostly due to and accredited to Erik Meijer, a Dutch computer scientist and tye-dye expert who is now off to entirely other projects.

Where is Java now?

With the imminent release of Java 8 and JSR-355, do we still need LINQ? Many attempts of bringing LINQ goodness to Java have been made since the middle of the last decade. At the time, Quaere and Lambdaj seemed to be a promising implementation on the library level (not the language level). In fact, a huge amount of popular Stack Overflow questions hints at how many Java folks were (and still are!) actually looking for something equivalent:

Interestingly, “LINQ” has even made it into EL 3.0!

But do we really need LINQ?

LINQ has one major flaw, which is advertised as a feature, but in our opinion, will inevitably lead to the “next big impedance mismatch”. LINQ is inspired by SQL and this is not at all a good thing. LINQ is most popular for LINQ-to-Objects, which is a fine way of querying collections in .NET. The success of Haskell or Scala, however, has shown that the true functional nature of “collection querying” tends to employ entirely other terms than SELECT, WHERE, GROUP BY, or HAVING.  They use terms like “fold”, “map”, “flatMap”, “reduce”, and many many more. LINQ, on the other hand, employs a mixture of GROUP BY and terms like “skip”, “take” (instead of OFFSET and FETCH).

In fact, nothing could be further from the functional truth than a good old SQL partitioned outer join, grouping set, or framed window function. These constructs are mere declarations of what a SQL developer would like to see as a result. They’re not self-contained functions, which actually contain the logic to be executed in any given context. Moreover, window functions can be used only in SELECT and ORDER BY clauses, which is obvious when thinking in a declarative way, but which is also very weird if you don’t have the SQL context. Specifically, a window function in a SELECT clause influences the whole execution plan and the way indexes are employed to pre-fetch the right data.

Conversely, true functional programming can do so much more to in-memory collections than SQL ever can. Using a SQLesque API for collection querying was a cunning decision to trick “traditional” folks into adopting functional programming. But the hopes that collection and SQL table querying could be interfused were disappointed, as such constructs will not produce the wanted SQL execution plans.

But what if I am doing SQL?

It’s simple. When you do SQL, you have two essential choices.

  • Do it “top-down”, putting most focus on your Java domain model. In that case, use Hibernate / JPA for querying and transform Hibernate results using the Java 8 Streams API.
  • Do it “bottom-up”, putting most focus on your SQL / relational domain model. In that case, use JDBC or jOOQ and again, transform your results using the Java 8 Streams API.

This is illustrated more in detail here: http://www.hibernate-alternative.com

Don’t look back. Embrace the future!

While .NET was “ahead” of Java for a while, this was not due to LINQ itself. This was mainly due to the introduction of lambda expressions and the impact lambdas had on *ALL* APIs. LINQ is just one example of how such APIs could be constructed, although LINQ got most of the credit.

But I’m much more excited about Java 8’s new Streams API, and how it will embrace some functional programming in the Java ecosystem. A very good blog post by Informatech illustrates, how common LINQ expressions translate to Java 8 Streams API expressions.

So, don’t look back. Stop envying .NET developers. With Java 8, we will not need LINQ or any API that tries to imitate LINQ on the grounds of “unified querying”, which is a better sounding name for what is really the “query target impedance mismatch”. We need true SQL for relational database querying, and we need the Java 8 Streams API for functional transformations of in-memory collections. That’s it. Go Java 8!

LINQ and Java

LINQ has been quite a successful, but also controversial addition to the .NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves:

Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases.

Declarative programming means that programmers do not explicitly phrase out their algorithms. Instead, they describe the result they would like to obtain, leaving algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s metadata matters, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows to embed declarative programming aspects into .NET languages, such as C#, or ASP. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL, pgplsql or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORM did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This gets more clear when looking at some examples given by the LINQ-to-SQL documentation. For example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();

Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear if the .Count() function is transformed into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or if it will be applied only after executing the query, in the application memory. The latter would be prohibitive, if a large number or records would need to be transferred from the database to memory. Depending on the transaction model, they would even need to be read-locked!

Another example is given here where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely different from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. This does not directly map to SQL, though, and must be used by LINQ internally, to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#!

Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control, which parts are really going to be translated to SQL, and which parts will execute in my application, after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: Will the compiler warn me that it couldn’t figure out how to generate a C#/SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome, and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:

  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.

Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off, using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8’s Stream API) for in-memory post-processing

LINQ-like libraries modelling SQL in Java, Scala

LINQ-like libraries abstracting SQL syntax and data stores in Java, Scala

SLICK, integrating SQL into Scala

Now it’s official – even if version numbers are still preceded by a “zero” major release: SLICK has been publicly announced by Typesafe:

SLICK stands for Scala Language-Integrated Connection Kit, which is more or less the Scala equivalent for LINQ-to-SQL. Note that I say LINQ-to-SQL, not LINQ in general, as Scala already has sufficient means of querying collections using the Scala language itself.

Here’s a sample of what SLICK code will look like (taken from the SLICK website):

object Coffees extends Table[(String, Int, Double)]("COFFEES") {
  def name = column[String]("COF_NAME", O.PrimaryKey)
  def supID = column[Int]("SUP_ID")
  def price = column[Double]("PRICE")
  def * = name ~ supID ~ price
}

Coffees.insertAll(
  ("Colombian",         101, 7.99),
  ("Colombian_Decaf",   101, 8.99),
  ("French_Roast_Decaf", 49, 9.99)
)

val q = for {
  c <- Coffees if c.supID === 101
  //                       ^ comparing Rep[Int] to Rep[Int]!
} yield (c.name, c.price)

println(q.selectStatement)

q.foreach { case (n, p) => println(n + ": " + p) }

As you can see, SLICK neatly integrates with Scala’s own syntax. As with LINQ-to-SQL, SLICK’s goal is to

“write your database queries in Scala instead of SQL”

This is quite orthogonal to what jOOQ is aiming for:

“SQL was never meant to be anything other than… SQL!”

As a reminder, see my previous blog post about how jOOQ integrates with Scala, and how you can write almost-SQL queries in Scala using jOOQ against 13 popular databases. It would be interesting to compare the two approaches side-by-side in an independent evaluation, to see the pro’s and con’s of each one, in terms of

  • Developer productivity
  • Maintainability
  • Performance
  • Feature scope
  • etc.

I think it’s time someone made that evaluation. An example can be seen here:

http://stackoverflow.com/questions/10537766/closest-equivalent-to-sqlalchemy-for-java-scala

When will we have LINQ in Java?

LINQ is one of Microsoft’s .NET Framework’s most distinct language features. When it was first introduced to languages such as C#, it required heavy changes to the language specification. Yet, this addition was extremely powerful and probably unequalled by other languages / platforms, such as Java, Scala, etc. Granted, Scala has integrated XML in a similar fashion into its language from the beginning, but that is hardly the same accomplishment. Nowadays, Typesafe developers are developing SLICK – Scala Language Integrated Connection Kit, which has similar ambitions, although the effort spent on it is hardly comparable: one “official” Scala developer against a big Microsoft team. Let alone the potential of getting into patent wars with Microsoft, should SLICK ever become popular.

What does Java have to offer?

There are many attempts of bringing LINQ-like API’s to the Java world, as the following Stack Overflow question shows:

http://stackoverflow.com/questions/1217228/what-is-the-java-equivalent-for-linq

Here’s another newcomer project by Julian Hyde, that I’ve recently discovered:

https://github.com/julianhyde/linq4j

He tried his luck on the lambda-dev mailing list, without any response so far. We’re all eagerly awaiting Java 8 and project lambda with its lambda expressions and extension methods. But when will we be able to catch up with Microsoft’s LINQ? After all, jOOQ, linq4j are “internal domain specific languages”, which are all limited by the expressivity of their host language (see my previous blog post about building domain specific languages in Java).

Java 9 maybe? We can only hope!