Java 8 Friday: No More Need for ORMs

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

No More Need for ORMs

Debates about the usefulness of ORM (Object-Relational Mapping) have been going on for the last decade. While many people would agree that Hibernate and JPA solve a lot of problems very well (mostly the persistence of complex object graphs), others may claim that the mapping complexity is mostly overkill for data-centric applications. JPA solves mapping problems by establishing standardised, declarative mapping rules through hard-wired annotations on the receiving target types. We claim that many data-centric problems should not be limited by the narrow scope of these annotations, but be solved in a much more functional way. Java 8, and the new Streams API finally allow us to do this in a very concise manner! Let’s start with a simple example, where we’re using H2’s INFORMATION_SCHEMA to collect all tables and their columns. We’ll want to produce an ad-hoc data structure of the type Map<String, List<String>> to contain this information. For simplicity of SQL interaction, we’ll use jOOQ (as always, a shocker on this blog). Here’s how we prepare this:

public static void main(String[] args)
throws Exception {
    Class.forName("org.h2.Driver");
    try (Connection c = getConnection(
            "jdbc:h2:~/sql-goodies-with-mapping", 
            "sa", "")) {

        // This SQL statement produces all table
        // names and column names in the H2 schema
        String sql =
            "select table_name, column_name " +
            "from information_schema.columns " +
            "order by " +
                "table_catalog, " +
                "table_schema, " +
                "table_name, " +
                "ordinal_position";

        // This is jOOQ's way of executing the above
        // statement. Result implements List, which
        // makes subsequent steps much easier
        Result<Record> result =
        DSL.using(c)
           .fetch(sql)
    }
}

Now that we’ve set up this query, let’s see how we can produce the Map<String, List<String>> from the jOOQ Result:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

The above example produces the following output:
FUNCTION_COLUMNS: [ALIAS_CATALOG, ALIAS_SCHEMA, ...]
CONSTANTS: [CONSTANT_CATALOG, CONSTANT_SCHEMA, ...]
SEQUENCES: [SEQUENCE_CATALOG, SEQUENCE_SCHEMA, ...]
How does it work? Let’s go through it step-by-step

DSL.using(c)
   .fetch(sql)

// Here, we transform a List into a Stream
   .stream()

// We're collecting Stream elements into a new
// collection type
   .collect(

// The Collector is a grouping operation, producing
// a Map
            groupingBy(

// The grouping operation's group key is defined by
// the jOOQ Record's TABLE_NAME value
       r -> r.getValue("TABLE_NAME"),

// The grouping operation's group value is generated
// by this mapping expression...
       mapping(

// ... which is essentially mapping each grouped
// jOOQ Record to the Record's COLUMN_NAME value
           r -> r.getValue("COLUMN_NAME"),

// ... and then collecting all those values into a
// java.util.List. Whew
           toList()
       )
   ))

// Once we have this Map<String, List<String>> we can
// simply consume its entries with the following Consumer
// lambda expression
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

Got it? These things are certainly a bit tricky when playing around with it for the first time. The combination of new types, extensive generics, lambda expressions can be a bit confusing at first. The best thing is to simply practice with these things until you get a hang of it. After all, the whole Streams API is really a revolution compared to previous Java Collections APIs. The good news is: This API is final and here to stay. Every minute you spend practicing it is an investment into your own future. Note that the above programme used the following static import:

import static java.util.stream.Collectors.*;

Note also, that the output was no longer ordered as in the database. This is because the groupingBy collector returns a java.util.HashMap. In our case, we might prefer collecting things into a java.util.LinkedHashMap, which preserves insertion / collection order:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),

       // Add this Supplier to the groupingBy
       // method call
       LinkedHashMap::new,
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(...);

We could go on with other means of transforming results. Let’s imagine, we would like to generate simplistic DDL from the above schema. It’s very simple. First, we’ll need to select column’s data type. We’ll simply add it to our SQL query:

String sql =
    "select " +
        "table_name, " +
        "column_name, " +
        "type_name " + // Add the column type
    "from information_schema.columns " +
    "order by " +
        "table_catalog, " +
        "table_schema, " +
        "table_name, " +
        "ordinal_position";

I have also introduced a new local class for the example, to wrap name and type attributes:

class Column {
    final String name;
    final String type;

    Column(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

Now, let’s see how we’ll change our Streams API method calls:

result
    .stream()
    .collect(groupingBy(
        r -> r.getValue("TABLE_NAME"),
        LinkedHashMap::new,
        mapping(

            // We now collect this new wrapper type
            // instead of just the COLUMN_NAME
            r -> new Column(
                r.getValue("COLUMN_NAME", String.class),
                r.getValue("TYPE_NAME", String.class)
            ),
            toList()
        )
    ))
    .forEach(
        (table, columns) -> {

            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

            System.out.println(");");
        }
    );

The output couldn’t be more awesome!
CREATE TABLE CATALOGS(
  CATALOG_NAME VARCHAR
);
CREATE TABLE COLLATIONS(
  NAME VARCHAR,
  KEY VARCHAR
);
CREATE TABLE COLUMNS(
  TABLE_CATALOG VARCHAR,
  TABLE_SCHEMA VARCHAR,
  TABLE_NAME VARCHAR,
  COLUMN_NAME VARCHAR,
  ORDINAL_POSITION INTEGER,
  COLUMN_DEFAULT VARCHAR,
  IS_NULLABLE VARCHAR,
  DATA_TYPE INTEGER,
  CHARACTER_MAXIMUM_LENGTH INTEGER,
  CHARACTER_OCTET_LENGTH INTEGER,
  NUMERIC_PRECISION INTEGER,
  NUMERIC_PRECISION_RADIX INTEGER,
  NUMERIC_SCALE INTEGER,
  CHARACTER_SET_NAME VARCHAR,
  COLLATION_NAME VARCHAR,
  TYPE_NAME VARCHAR,
  NULLABLE INTEGER,
  IS_COMPUTED BOOLEAN,
  SELECTIVITY INTEGER,
  CHECK_CONSTRAINT VARCHAR,
  SEQUENCE_NAME VARCHAR,
  REMARKS VARCHAR,
  SOURCE_DATA_TYPE SMALLINT
);

Excited? The ORM era may have ended just now

This is a strong statement. The ORM era may have ended. Why? Because using functional expressions to transform data sets is one of the most powerful concepts in software engineering. Functional programming is very expressive and very versatile. It is at the core of data and data streams processing. We Java developers already know existing functional languages. Everyone has used SQL before, for instance. Think about it. With SQL, you declare table sources, project / transform them onto new tuple streams, and feed them either as derived tables to other, higher-level SQL statements, or to your Java program. If you’re using XML, you can declare XML transformation using XSLT and feed results to other XML processing entities, e.g. another XSL stylesheet, using XProc pipelining. Java 8’s Streams are nothing else. Using SQL and the Streams API is one of the most powerful concepts for data processing. If you add jOOQ to the stack, you can profit from typesafe access to your database records and query APIs. Imagine writing the previous statement using jOOQ’s fluent API, instead of using SQL strings. jooq-the-best-way-to-write-sql-in-java The whole method chain could be one single fluent data transformation chain as such:

DSL.using(c)
   .select(
       COLUMNS.TABLE_NAME,
       COLUMNS.COLUMN_NAME,
       COLUMNS.TYPE_NAME
   )
   .from(COLUMNS)
   .orderBy(
       COLUMNS.TABLE_CATALOG,
       COLUMNS.TABLE_SCHEMA,
       COLUMNS.TABLE_NAME,
       COLUMNS.ORDINAL_POSITION
   )
   .fetch()  // jOOQ ends here
   .stream() // Streams start here
   .collect(groupingBy(
       r -> r.getValue(COLUMNS.TABLE_NAME),
       LinkedHashMap::new,
       mapping(
           r -> new Column(
               r.getValue(COLUMNS.COLUMN_NAME),
               r.getValue(COLUMNS.TYPE_NAME)
           ),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> {
            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

           System.out.println(");");
       }
   );

Java 8 is the future, and with jOOQ, Java 8, and the Streams API, you can write powerful data transformation APIs. I hope we got you as excited as we are! Stay tuned for more awesome Java 8 content on this blog.

Typesafe’s Slick is Not About SQL

We have stumbled upon an interesting thread on the Typesafe SLICK user group where Slick was compared to jOOQ. In that thread, Christopher Vogt has made a couple of interesting statements.

But let us have a look at the broader context, first.

Unifying Stuff

Ever since the proclamation of UDDI or RUP, we may think that the U for Unified is a clear and unmistakable indicator for what Joel Spolsky would call architecture astronautitis. In case you’ve missed those hilarious posts, here they are:

Today, many software vendors are again trying to unify database query languages. Erik Meijer’s LINQ was the most successful attempt at doing so, so far. But even LINQ doesn’t compare to Codd’s visions, which were about replacing the whole data model first by a rock-solid mathematical theory, and only then, thinking about appropriate languages to query such data models.

Flexible vs. rigid abstractions

We believe that unifying query languages to query RDBMS, XML, Objects, and NoSQL is a bad idea because such a unification is subject to either:

  • being a flexible abstraction
  • being a rigid abstraction

If an abstraction is flexible, then the heterogeneous implementation details of the abstracted data stores will inevitably leak into the query language and into your application. You don’t gain too much, for the price of adding more layers and boiler-plate.

Geek and Poke's Footprints - Licensed CC-BY 2.0
Geek and Poke’s Footprints – Licensed CC-BY 2.0

If an abstraction is rigid, then the unified query language (LINQ, JPQL, etc.) may be concise, but it will inevitably abstract away 80% of all useful features of the underlying data store. LINQ cannot meet the expressivity of SQL. Neither can it match the power of XPath/XQuery/XSLT/XProc, which is the most appropriate tool chain for XML. Maybe, it cannot even match what Java 8 calls the Streams API, which is very likely to become the most appropriate tool chain for objects and collections in Java.

Typesafe’s SLICK is Not About SQL

We’ve already compared Slick with jOOQ in our manual’s preface. Now, Christopher Vogt has made a clear statement about what SLICK is supposed to be and what SQL is:

There are understandable mistakes when your mind is (still) set on SQL. […]

Good luck with jooq and check back if you are ever annoyed by SQL semantics and want Scala back :).

That is only an extract of what Christopher said, of course, and there’s certainly quite a bit of goodness in SLICK. SLICK’s mission is to provide Scala collection semantics when querying databases. That might be a desireable thing to have in the Scala platform, specifically when comparing SLICK with LINQ.

But we’ve mentioned it before, on our blog. SQL is not an undesirable language or technology. Like any legacy technology, SQL has its ways. We’ve blogged about that, too, lots of times. SQL is a standard that is constantly evolving and that is here to stay. In our opinion, any technology operating on RDBMS but at the same time aiming for hiding SQL or abstracting it away completely is against the inevitable trend imposed by the big elephants who will not let go of their best-selling technologies.

SQL is about 10 years ahead of alternative RDBMS querying methods – most specifically Java, Scala, C# collection-based ones. T-SQL has now entered the TIOBE Top 10 and is considered by TIOBE to be the language of the year 2013, PL/SQL isn’t too far behind. Don’t fight SQL any longer, embrace it. Or in Christopher Vogt’s words:

Check back with SQL/jOOQ, if you are ever annoyed by the increasing amount of leaky or rigid abstraction created by modern language architects!

Further reading: “Don’t Jump the SQL Ship Just Yet”.

jOOQ Newsletter November 14, 2013

subscribe to the newsletter here

jOOQ Blog License now CC-BY-SA

Next to providing you with the best Java / SQL integration on the market, we’re also passionate bloggers on the matter of Java, SQL and Open Source. We think that with our experience around jOOQ, we should be major influencers on those subjects in general.

Our blog at blog.jooq.org will have reached the 200k hits threshold by the end of the week and we’ll most certainly celebrate that. Our topics and insights are increasingly appreciated by a wider and wider audience outside of the jOOQ user base, also on our syndication partners DZone (where we’ve had around 800k readers so far), JCG(readers unknown) and Tech.Pro (100k reads so far). The recent success shows that our marketing efforts pay off. Here are some stats from the jOOQ blog:

Because our blog is reaching far beyond our user base, we have decided to license its content under the terms of the CC-BY-SA 3.0 license, a permissive license that reflects our Open Source spirit. You may thus freely use our content for commercial purposes, if you attribute authorship to us. Please contact us, if you’re not sure how to create appropriate attribution.

Dual Licensing. An Experience Report

A month ago, we started dual-licensing jOOQ. We are happy to see that our competitors follow our lead in offering commercial services around their software. This is a strong indicator for having done something right. Here’s a little review from Data Geekery about the recent events around our new licensing model.

We have to admit that switching over from a very permissive Open Source license to more restrictive dual-licensing wasn’t exactly a walk in the park. Getting legal aspects right wasn’t easy. How many Open Source products out there do you think are neglecting due diligence with respect to copyright? Our estimate: 95%.

Yet, removing commercial database support from the jOOQ Open Source Edition has had only little impact on the number of downloads, nonetheless. After a short break in August / September (no jOOQ 3.1 patch releases), jOOQ 3.2 is almost as strong as ever as can be seen in this chart originating from oss.sonatype.org:

This doesn’t even count the number of downloads from www.jooq.org/download, or from SourceForge, before we removed the SourceForge download channel. The same effect can be seen on Stack Overflow and on GitHub, where jOOQ has had a significant increase of traction in the last 2-3 months!

Furthermore, with our recent discussions with the Apache GORA and Apache CloudStack guys, we’re positive that dual-licensing won’t keep jOOQ out of the professional Open Source world.

At the same time, sales talks around tailor-made agreements with medium and large customers are ongoing. We’re considering our work of the last 4 months a great success and we’re positive to be able to provide you with a much better jOOQ in the near future by creating professional Open Source software built on solid financial grounds, which everyone can greatly profit from.

Upcoming Events

As mentioned in the October newsletter, Lukas is going to be present at a number of events in the near future, talking about jOOQ and other database related stuff. ThejOOQ presentation at Topconf in beautiful Tallinn, Estonia has had around 35 attendants – well, it was hard to compete with the Google Glass presentation :-)

Upcoming events include

Stay informed about 2014 events on www.jooq.org/news.

SQL Zone – ORM (Un)Popularity

We’re personally thrilled by the fact that the ORM debate is far from over, even more than half a decade after the vietnam of computer science was first recognised. We firmly believe that ORMs are a very leaky abstraction, which is fine in “top-down” engineering approaches where the relational database is a second-class citizen.

But many companies don’t think that should be the case. Many companies want their data to be the first-class citizen, processed by more volatile entities, such as Java programs. It often just doesn’t make sense to have the data abide by the rules of the ORM. It is thus not surprising that Charles Humble from InfoQ has again detected increasing discomfort with ORMs at QCon and other conferences.

In our opinion, there’s a simple reason for this. SQL is constantly evolving, but JPA isn’t. Most importantly, JPA doesn’t do SQL as understood by the ISO / IEC standards. And it doesn’t look as though that’s going to change. Read our blog post on that subject.

SQL Zone – The History of NoSQL

A witty remark about what NoSQL really is has recently been made at the O’Reilly Strata Conference in London, where Mark Madsen, a popular researcher and analyst was walking around with a geeky T-Shirt depicting the History of NoSQL.

We sincerely hope that this awesome piece of humour will go viral. See for yourself:
https://twitter.com/edd/status/400190499585544192/photo/1

Clearly, betting on the “SQL horse” isn’t such a bad bet after all.

People Managing to Correctly Spell “Moron” in a Blog Comment

The notorious ORM pro / con discussion heavily amuses me. I always find it very funny when people have passionate discussions about which solution is better, rather than discussing about which solution is better suited for the problem at hand. In the case of ORMs vs. plain SQL, obviously, no solution is simply better as both techniques have their merits. When comparing ORMs with jOOQ, I think that this page summarises it pretty well:
http://www.hibernate-alternative.com

Now, this article and most specifically, one answer is hilarious:
http://java.dzone.com/articles/defense-hand-coded-sql

While the article’s author is already asking for trouble, check out this one particular answer. I love it when people manage to correctly spell “moron”:

People who handwrite SQL are invariably morons.

Here’s what you miss out when using a good ORM with generated mappings:

– Automatic first and second level caching

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

– True vendor independence. Yes, I’m switching between six different db’s in our products with zero issues.

– I work with objects, not relation sets. That kinda makes sense in an oop language.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

– Any decent ORM understands and injects vendor specific query hints better than you.

Also, get a clue.

Here’s my adequate reply to the above:

OK, now this was amusing :-)

– Automatic first and second level caching

This, obviously, is utterly impossible outside the world of ORMs.

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

True. No one has ever written a code generator before it was added to Hibernate.

– I work with objects, not relation sets. That kinda makes sense in an oop language

… which your DBA will probably always agree with. Remember to remind your manager why he bought that 1M$ Oracle license, when you run N+1 selects for fetching your OOP objects.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

Of course, there is always a black / white answer to “productivity”- questions. Like, how productively you can express a SQL:2003 MERGE statement with HQL. Or, how productively you can calculate a running total involving window functions, or maybe, recursive SQL with HQL.

– Any decent ORM understands and injects vendor specific query hints better than you.

That is indeed an amazing theory, which I was utterly unaware of.

The eternal debate between ORM lovers and haters. Mankind has always been this stupid.  Like the AC vs. DC discussion between Nikola Tesla and Thomas Edison

And, Eclipse will totally win over IntelliJ! ;-)

Use ModelMapper and jOOQ to Regain Control of your Domain Model

One of the things that Hibernate is quite good at is CRUD, i.e. persisting object graphs to the database. This is particularly true if your application runs in a Java domain-model-driven context. Your models are required to adhere to the standards set by JPA/Hibernate, of course. The same applies to mapping relational-model-driven data onto complex object graphs in memory. Again, you’ll have to adhere to the standards set by JPA/Hibernate. If you’re operating on rather complex relational models, mapping data onto rather complex domain models, then you might want to get back in control of the mapping process, as auto-mapping will cause more headaches than it solves problems. An interesting approach has been shown recently on the ModelMapper website in an example integration with jOOQ. (note, there is also an example integration with JDBI). With the permission of the author Jonathan Halterman, I’m citing this interesting example:

jOOQ Integration

ModelMapper’s jOOQ integration allows you to map a jOOQ Record to a JavaBean.

Setup

To get started, add the modelmapper-jooq Maven dependency to your project:

<dependency>
  <groupId>org.modelmapper</groupId>
  <artifactId>modelmapper</artifactId>
  <version>0.6.1</version>
</dependency>
<dependency>
  <groupId>org.modelmapper</groupId>
  <artifactId>modelmapper-jooq</artifactId>
  <version>0.6.1</version>
</dependency>

Next, configure ModelMapper to support the RecordValueReader, which allows for values to be read and mapped from a jOOQ Record:

modelMapper.getConfiguration()
           .addValueReader(new RecordValueReader());

Example Mapping

Now let’s see an example mapping of a jOOQ record to a JavaBean. Consider the following record representing an order:
order_id customer_id customer_address_street customer_address_city
345 678 123 Main Street SF
We may need to map this to a more complex object model:

// Assume getters and setters are present

public class Order {
  private int id;
  private Customer customer;
}

public class Customer {
  private Address address;
}

public class Address {
  private String street;
  private String city;
}

Since the source Record’s fields in this example uses an underscore naming convention, we’ll need to configure ModelMapper to tokenize source property names by underscore:

modelMapper
  .getConfiguration()
  .setSourceNameTokenizer(NameTokenizers.UNDERSCORE);

With that set, mapping an order Record to an Order object is simple:

Order order =
  modelMapper.map(orderRecord, Order.class);

And we can assert that values are mapped as expected:

assertEquals(456, order.getId());
assertEquals(789, order.getCustomer().getId());
assertEquals("123 Main Street",
             order.getCustomer()
                  .getAddress()
                  .getStreet());
assertEquals("SF",
             order.getCustomer()
                  .getAddress()
                  .getCity());

Explicit Mapping

While ModelMapper will do its best to implicitly match Record values to destination properties, sometimes you may need to explicitly define mappings between properties. Let’s map our Record’s customer_address_street to Order.customer.address.street:

PropertyMap<Record, Order> orderMap =
  new PropertyMap<Record, Order>() {
  protected void configure() {
    map(source("customer_address_street"))
        .getCustomer()
        .getAddress()
        .setStreet(null);
  }
};

Then we can add the mapping to our ModelMapper instance for the orderRecord:

modelMapper.createTypeMap(orderRecord, Order.class)
           .addMappings(orderMap);

(see the ModelMapper manual pages for more details about property mapping)

Things to Note

ModelMapper maintains a TypeMap for each source and destination type, containing the mappings bewteen the two types. For “generic” types such as Record this can be problematic since the structure of a Record can vary. In order to distinguish structurally different Records that map to the same destination type, we can provide a type map name to ModelMapper. Continuing with the example above, let’s map another order Record, this one with a different structure, to the same Order class:
order_id order_customer_id order_customer_address_street order_customer_address_city
444 777 123 Main Street LA
Mapping this Record to an order is simple, but we’ll need to provide a type map name to distinguish this Record to Order mapping from the previous unnamed mapping:

Order order = modelMapper.map(
    longOrderRecord, Order.class, "long");

Example taken from: http://modelmapper.org/user-manual/jooq-integration/

More Examples

When choosing ModelMapper, you’re not just chosing an API to map relational data to your domain model. ModelMapper is designed for arbitrary model transformation, which can make it a strategic choice for your stack. Check out this marvelous Open Source gem on the ModelMapper website.

High Complexity and Low Throughput. Reasons for Using an ORM.

I’ve recently stumbled upon an interesting blog post about when to use an ORM. I found it to be well-written and quite objective, specifically with respect to its model complexity and throughput diagram:

Original image taken from this blog post: http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html
Original image taken from this blog post: http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

The ORM or not ORM topic will probably never stop showing up on blogs. Some of them are more black and white, such as Jeff Atwood’s Object-Relational Mapping is the Vietnam of Computer Science others are more “50 shades of data access”, such as Martin Fowler’s ORM Hate.

I’m personally impressed by the work ORMs have done for us in times when repetitive SQL started to get boring and CRUD was not yet established. But ORMs do have their caveats as they are indeed leaky abstractions.

The aforementioned article shows in what situations ORMs can pull their weight, and in what situations you better keep operating on a SQL level, using tools like jOOQ, MyBatis, Apache DbUtils, or just simply JDBC.

Read the original blog post here:
http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

Other related articles:

LINQ and Java

LINQ has been quite a successful, but also controversial addition to the .NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves: Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases. Declarative programming means that programmers do not explicitly phrase out their algorithms. Instead, they describe the result they would like to obtain, leaving algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s metadata matters, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows to embed declarative programming aspects into .NET languages, such as C#, or ASP. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL, pgplsql or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORM did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This gets more clear when looking at some examples given by the LINQ-to-SQL documentation. For example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();

Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear if the .Count() function is transformed into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or if it will be applied only after executing the query, in the application memory. The latter would be prohibitive, if a large number or records would need to be transferred from the database to memory. Depending on the transaction model, they would even need to be read-locked! Another example is given here where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely different from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. This does not directly map to SQL, though, and must be used by LINQ internally, to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#! Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control, which parts are really going to be translated to SQL, and which parts will execute in my application, after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: Will the compiler warn me that it couldn’t figure out how to generate a C#/SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome, and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:
  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.
Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off, using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8’s Stream API) for in-memory post-processing

LINQ-like libraries modelling SQL in Java, Scala

LINQ-like libraries abstracting SQL syntax and data stores in Java, Scala

Martin Fowler on “The Vietnam of Computer Science”

It couldn’t be a better match by a more suited person for jOOQ. Martin Fowler expresses his feelings about SQL, NoSQL, Object-relational mapping on his blog post, which is copied on DZone:

http://java.dzone.com/articles/martin-fowler-orm-hate

Think about the following two slogans:

“The Vietnam of Computer Science” – “A Peace Treaty Between SQL and Java”

Swell, no? :-)

ORM vs. SQL, compared to C vs. ASM

History is repeating itself. This is nothing new, but it takes wisdom (and Elephant memory) to remember when and how things had already happened in a similar way. When you feel that the whole SQL versus ORM debate is a bit boring and you may have seen it before, you’re probably right. It’s another religious war that can be compared to the war between C and Assembler some 40 years back. Think about it this way:

ORM vs. SQL

How many ORMs do you know that support:

  • Window functions
  • MERGE statement
  • Recursive queries
  • Common table expressions
  • Ordered aggregate functions
  • Statistical functions
  • Stored procedures (And I mean not just scalar ones!)
  • SQL-based XML
  • Temporary tables
  • … I could go on and on

ORMs are not just “higher-level” they are an entirely different paradigm, put on top of SQL. So, saying SQL is “lower-level” is not entirely accurate. It’s on a “different level”. ORMs are pretending that SQL is just a little “low-level” thing.

C vs. ASM

(citing reddit user henk53)

Which C compilers at the time made use of:

  • Advanced addressing modes
  • Opcode tricks
  • Special instructions (intrinsics came much later)
  • Jump tables (yeah, switch is close, but not quite it)
  • Build-in high level functions on some architectures (like string copy)
  • Page zero tricks
  • Segments (ugly things when all you have is small ones, but quite powerful when used fine-grained for ‘objects’ and structures)
  • Delay slot shadowing
  • Knowledge of fitting computations exactly in registers and L1.
  • … I could go on and on

C wasn’t just higher level, it was another paradigm, and yes, C compilers pretending that assembly is just a little “low-level” thing did raise our hackles back then. It threw away all the nice subtle intricacies of each specific CPU architecture and made them all equal by using only the common denominator of functionality that each CPU had.

It took a while, but eventually C compilers got better at optimizing than humans and could actually take advantage of advanced instructions in a CPU.

Original discussion

See an extract of the original thread here:
http://www.reddit.com/r/java/comments/sk25o/forget_hibernate_jooq_is_byfar_the_best_database/#c4f6z1w

MentaBean, the ORM/SQL Builder of Mentawai

I ran across an incredible discovery just now. I’m always closely observing jOOQ-related topics on Stack Overflow, so I immediately saw these somewhat advertising answers to a dedicated jOOQ user’s question:

MentaBean is an ORM/SQL Builder that has recently been made independent from Mentawai, a library building on top of the Servlet specification to simplify the lives of thousands of brazilian developers (counting the number of messages in the forum).

I’ve had a Stack Overflow chat with Sergio Oliveira Jr., one of the developers behind MentaBean, as I always find it interesting to chat with other people suffering from the heaviness and complexity of the Hibernate/JPA stack. His maxim is remarkable, if I may quote him quoting St. Exupéry: “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” This is a very poetic way of saying the same thing in eXtreme Programming lingo: Refactor Mercilessly. With this being your primary paradigm, I believe great, fun software can evolve. No need to say that I started to like Sergio :-)

While it doesn’t look at first as though Mentawai (and its associated MentaBean) will outperform the currently hyped Play ! Framework, or the well-established Wicket library, I still find it nice to see how much effort is put in OSS, worldwide.

See the Mentawai homepage for more details:

http://www.mentaframework.org/quick-start.jsp