Leaky Abstractions, or How to Bind Oracle DATE Correctly with Hibernate

We’ve recently published an article about how to bind the Oracle DATE type correctly in SQL / JDBC, and jOOQ. This article got a bit of traction on reddit with an interesting remark by Vlad Mihalcea, who is frequently blogging about Hibernate, JPA, transaction management and connection pooling on his blog. Vlad pointed out that this problem can also be solved with Hibernate, and we’re going to look into this, shortly.

What is the problem with Oracle DATE?

The problem that was presented in the previous article is dealing with the fact that if a query uses filters on Oracle DATE columns:

// execute_at is of type DATE and there's an index
PreparedStatement stmt = connection.prepareStatement(
    "SELECT * " + 
    "FROM rentals " +
    "WHERE rental_date > ? AND rental_date < ?");

… and we’re using java.sql.Timestamp for our bind values:

stmt.setTimestamp(1, start);
stmt.setTimestamp(2, end);

… then the execution plan will turn very bad with a FULL TABLE SCAN or perhaps an INDEX FULL SCAN, even if we should have gotten a regular INDEX RANGE SCAN.

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|*  2 |   TABLE ACCESS FULL| RENTAL |
-------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   2 - filter((INTERNAL_FUNCTION("RENTAL_DATE")>=:1 AND 
              INTERNAL_FUNCTION("RENTAL_DATE")<=:2))

This is because the database column is widened from Oracle DATE to Oracle TIMESTAMP via this INTERNAL_FUNCTION(), rather than truncating the java.sql.Timestamp value to Oracle DATE.

More details about the problem itself can be seen in the previous article

Preventing this INTERNAL_FUNCTION() with Hibernate

You can fix this with Hibernate’s proprietary API, using a org.hibernate.usertype.UserType.

Assuming that we have the following entity:

@Entity
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    public Timestamp rentalDate;
}

And now, let’s run this query here (I’m using Hibernate API, not JPA, for the example):

List<Rental> rentals =
session.createQuery("from Rental r where r.rentalDate between :from and :to")
       .setParameter("from", Timestamp.valueOf("2000-01-01 00:00:00.0"))
       .setParameter("to", Timestamp.valueOf("2000-10-01 00:00:00.0"))
       .list();

The execution plan that we’re now getting is again inefficient:

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|*  2 |   TABLE ACCESS FULL| RENTAL |
-------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   2 - filter((INTERNAL_FUNCTION("RENTAL0_"."RENTAL_DATE")>=:1 AND 
              INTERNAL_FUNCTION("RENTAL0_"."RENTAL_DATE")<=:2))

The solution is to add this @Type annotation to all relevant columns…

@Entity
@TypeDefs(
    value = @TypeDef(
        name = "oracle_date", 
        typeClass = OracleDate.class
    )
)
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    @Type(type = "oracle_date")
    public Timestamp rentalDate;
}

and register the following, simplified UserType:

import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.util.Objects;

import oracle.sql.DATE;

import org.hibernate.engine.spi.SessionImplementor;
import org.hibernate.usertype.UserType;

public class OracleDate implements UserType {

    @Override
    public int[] sqlTypes() {
        return new int[] { Types.TIMESTAMP };
    }

    @Override
    public Class<?> returnedClass() {
        return Timestamp.class;
    }

    @Override
    public Object nullSafeGet(
        ResultSet rs, 
        String[] names, 
        SessionImplementor session, 
        Object owner
    )
    throws SQLException {
        return rs.getTimestamp(names[0]);
    }

    @Override
    public void nullSafeSet(
        PreparedStatement st, 
        Object value, 
        int index, 
        SessionImplementor session
    )
    throws SQLException {
        // The magic is here: oracle.sql.DATE!
        st.setObject(index, new DATE(value));
    }

    // The other method implementations are omitted
}

This will work because using the vendor-specific oracle.sql.DATE type will have the same effect on your execution plan as explicitly casting the bind variable in your SQL statement, as shown in the previous article: CAST(? AS DATE). The execution plan is now the desired one:

------------------------------------------------------
| Id  | Operation                    | Name          |
------------------------------------------------------
|   0 | SELECT STATEMENT             |               |
|*  1 |  FILTER                      |               |
|   2 |   TABLE ACCESS BY INDEX ROWID| RENTAL        |
|*  3 |    INDEX RANGE SCAN          | IDX_RENTAL_UQ |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(:1<=:2)
   3 - access("RENTAL0_"."RENTAL_DATE">=:1 
          AND "RENTAL0_"."RENTAL_DATE"<=:2)

If you want to reproduce this issue, just query any Oracle DATE column with a java.sql.Timestamp bind value through JPA / Hibernate, and get the execution plan as indicated here.

Don’t forget to flush shared pools and buffer caches to enforce the calculation of new plans between executions, because the generated SQL is the same each time.

Can I do it with JPA 2.1?

At first sight, it looks like the new converter feature in JPA 2.1 (which works just like jOOQ’s converter feature) should be able to do the trick. We should be able to write:

import java.sql.Timestamp;

import javax.persistence.AttributeConverter;
import javax.persistence.Converter;

import oracle.sql.DATE;

@Converter
public class OracleDateConverter 
implements AttributeConverter<Timestamp, DATE>{

    @Override
    public DATE convertToDatabaseColumn(Timestamp attribute) {
        return attribute == null ? null : new DATE(attribute);
    }

    @Override
    public Timestamp convertToEntityAttribute(DATE dbData) {
        return dbData == null ? null : dbData.timestampValue();
    }
}

This converter can then be used with our entity:

import java.sql.Timestamp;

import javax.persistence.Column;
import javax.persistence.Convert;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Rental {

    @Id
    @Column(name = "rental_id")
    public Long rentalId;

    @Column(name = "rental_date")
    @Convert(converter = OracleDateConverter.class)
    public Timestamp rentalDate;
}

But unfortunately, this doesn’t work out of the box as Hibernate 4.3.7 will think that you’re about to bind a variable of type VARBINARY:

// From org.hibernate.type.descriptor.sql.SqlTypeDescriptorRegistry

    public <X> ValueBinder<X> getBinder(JavaTypeDescriptor<X> javaTypeDescriptor) {
        if ( Serializable.class.isAssignableFrom( javaTypeDescriptor.getJavaTypeClass() ) ) {
            return VarbinaryTypeDescriptor.INSTANCE.getBinder( javaTypeDescriptor );
        }

        return new BasicBinder<X>( javaTypeDescriptor, this ) {
            @Override
            protected void doBind(PreparedStatement st, X value, int index, WrapperOptions options)
                    throws SQLException {
                st.setObject( index, value, jdbcTypeCode );
            }
        };
    }

Of course, we can probably somehow tweak this SqlTypeDescriptorRegistry to create our own “binder”, but then we’re back to Hibernate-specific API. This particular implementation is probably a “bug” at the Hibernate side, which has been registered here, for the record:

https://hibernate.atlassian.net/browse/HHH-9553

Conclusion

Abstractions are leaky on all levels, even if they are deemed a “standard” by the JCP. Standards are often a means of justifying an industry de-facto standard in hindsight (with some politics involved, of course). Let’s not forget that Hibernate didn’t start as a standard and massively revolutionised the way the standard-ish J2EE folks tended to think about persistence, 14 years ago.

In this case we have:

  • Oracle SQL, the actual implementation
  • The SQL standard, which specifies DATE quite differently from Oracle
  • ojdbc, which extends JDBC to allow for accessing Oracle features
  • JDBC, which follows the SQL standard with respect to temporal types
  • Hibernate, which offers proprietary API in order to access Oracle SQL and ojdbc features when binding variables
  • JPA, which again follows the SQL standard and JDBC with respect to temporal types
  • Your entity model

As you can see, the actual implementation (Oracle SQL) leaked up right into your own entity model, either via Hibernate’s UserType, or via JPA’s Converter. From then on, it will hopefully be shielded off from your application (until it won’t), allowing you to forget about this nasty little Oracle SQL detail.

Any way you turn it, if you want to solve real customer problems (i.e. the significant performance issue at hand), then you will need to resort to vendor-specific API from Oracle SQL, ojdbc, and Hibernate – instead of pretending that the SQL, JDBC, and JPA standards are the bottom line.

But that’s probably alright. For most projects, the resulting implementation-lockin is totally acceptable.

jOOQ Newsletter: November 28, 2014 – Black Friday jOOQ Sale – Only Today!

Subscribe to this newsletter here

Tweet of the Day and Webinar with Arun Gupta from Red Hat

Today, we have a very special Tweet of the Day by Oliver Hubaut who expresses what we believe is a general feeling in the industry. He says:

There is a lot of truth in his statement, albeit perhaps not the one he intended. Many people have misinterpreted JPA in the past, believing that it will be a full replacement for SQL. This couldn’t be farther from the truth.

Gavin King, the creator of Hibernate has told us the following:

… and this is also the point we’re trying to make. Join us next week on Wednesday, December 3 when we meet with Arun Gupta from Red Hat for his Webinar about JPA and jOOQ. If you have any questions that you’d like us to talk about, ask them here:

https://github.com/javaee-samples/webinars/issues/4

Black Friday Sale: Get 20% off any jOOQ purchase, today!

We’re participating in the Black Friday sale tradition and give you an incredible 20% off your purchase of any jOOQ license that you order today, Black Friday, November 28, 2014.

Ask your manager today to treat you to a wonderful pre-christmas gift! Don’t waste time, act quickly and order jOOQ licenses right now:

http://www.jooq.org/black-friday

jOOQ 3.5 released

If you’ve been following the jOOQ User Group, you’ve heard it already. Last Friday, we’ve shipped the awesome jOOQ 3.5 with loads of new features!

The new Binding SPI

The main improvement of this exciting release is the new org.jooq.Binding SPI which can be used to fully control all aspects of a user-type’s JDBC interaction. This goes much further than the existing org.jooq.Converter SPI that can be used to map standard JDBC types to user-types. With the new Binding SPI, virtually *ALL* vendor-specific types can be supported now. Examples include PostgreSQL’s JSON or HSTORE types, or Oracle’s DATE type – which is really incorrectly represented via java.sql.Timestamp, which is why we have retrofitted the existing <dateAsTimestamp/> feature to use such bindings, now.

Stored procedures are everywhere

Stored procedure support was generally improved in this release. This includes lots of new little features and conveniences for use with PL/SQL or Transact-SQL. For instance, jOOQ 3.5.0 now supports cross-schema references of PL/SQL OBJECT and TABLE types, which allows for binding directly to Oracle Spatial. We’ve blogged about this exciting improvement here:
https://blog.jooq.org/2014/11/04/painless-access-from-java-to-plsql-procedures-with-jooq/

And while we were at it, we’ve also added basic support for Oracle AQ, which integrates very nicely with our OBJECT type support!

In Transact-SQL and MySQL, we now support fetching arbitrary numbers of Results from stored procedures, and we’ve also implemented support for Firebird PSQL, including Firebird’s very interesting syntax for table-valued functions.

By the way, we support user-defined aggregate functions for a variety of databases, including Oracle, PostgreSQL, and HSQLDB. Definitely something you should look into!

SQL improvements

In this release, we’ve finally got support for UNION, INTERSECT, and EXCEPT right with respect to nesting such operations, as well as combining them with ORDER BY and LIMIT .. OFFSET.

Let’s talk some more DDL

We’ve continued to add support for DDL statements, including

  • CREATE TABLE
  • CREATE TABLE AS SELECT
  • CREATE VIEW and DROP VIEW
  • CREATE INDEX and DROP INDEX
  • CREATE SEQUENCE and DROP SEQUENCE
  • DROP .. IF EXISTS

We’ll continue to add support for more DDL statements also in the future.

Code generation improvements

We’ve added support for the new XMLDatabase, a code generation configuration that allows to read meta information from XML formats, e.g. from a standard INFORMATION_SCHEMA.xml, or from Vertabelo’s XML export format:
https://blog.jooq.org/2014/09/05/importing-your-erd-export-into-jooq/

Future versions of jOOQ will include built-in support for a variety of XML formats.

We’ve had an awesome contribution by Etienne Studer from Gradleware to help our users integrate the jOOQ code generation with their Gradle builds.

Last but not least: Informix!

Oh, and by the way, we now also support IBM’s second most popular database: Informix. Support for this database will be included in the jOOQ Enterprise Edition.

More information can be found here:
http://www.jooq.org/notes

jOOQ 3.2 End of Life

While 3.5 is out, 3.2 is now more than one year old, which means that it has reached its end of life. We’ll be shipping a last patch update 3.2.7 in early December. After that, we advise all our customers and users to upgrade to a newer minor release.

Do you want to stay on the 3.2 release? No problem, contact our sales team and we’ll find a solution for you.

The “Free”, “Standard”, “Open” Software Heresy

There are those people that have a strong, dogmatic belief in what they call “Free” or “Standard” or “Open” software. One of those individuals is Jimmie (let’s call him Jimmie in this article) who has responded to an article about Java persistence by Marco Behler on TheServerSide.

Let me cite Jimmie’s response here:

JPA is difficult but complete. It has a learning curve, and you’ll have surprises if you try to shortcut its complexities. But they mostly are there for a reason. Difficult stuff is difficult using JPA, that’s true.

JOOQ is quick to learn. And is proprietary stuff. Not free. Only one implementation. No public review, only one body involved in its evolution. SQL-oriented, not OO (ok, they say it’s a feature).
As a serious professional, learn JPA. Fully. There is no excuse for not knowing which sql queries are generated in your production app. Replacing it with a more basic framework is no solution.

Let’s not go deeply into the concrete difference between JPA and jOOQ / SQL. That topic has been discussed already in lengths on Reddit. Let’s consider the essence of the comparison as perceived by Jimmie. Because, Jimmie would probably say exactly the same thing when comparing

  • JSF with Ext.JS or ZK
  • PostgreSQL with Oracle
  • MS Office or Google Docs (probably OK cause “gratis”) with LibreOffice
  • Linux with Windows or MacOSX (although he might perform some doublethink as a Mac user)

Software not being free

Jimmie, Is YOUR software free and “not proprietary”? If so, how do you finance it? How do you earn a living? And why are you doing it? What really motivates you? What really motivates your customers and why?

Only one implementation

How many people actually do use alternatives to Hibernate and why? Are they using EclipseLink mainly because they used to use TopLink for the last 20 years and the learning curve (or benefit) to switch to Hibernate is too high? How often do you actually switch implementations? What keeps you from implementing the jOOQ API, and open-source its implementation?

And most importantly: Do you always adhere to the JPA API, even if Hibernate has lots of awesome, proprietary extensions that just happen to work so much better / easier?

No public review

Who exactly is “public”, and what are their main interests? Did you know that one of the major driving force for the JDK is Credit Suisse, being a large customer for Oracle in the Java environment, for instance? What is your stake and relation with Credit Suisse as your “public” representative?

Only one body involved in its evolution

Do you say that to YOUR customers also, about your own software as well?

SQL-oriented vs “a serious professional”

What’s not serious about SQL? In fact, SQL is reviewed by more entities than the JLS, let alone the JPA specs. Have you ever thought about that?

More basic

Fair enough. But don’t forget: You probably replaced your sophisticated EJB 2.0 framework (still a standard!) from the early 2000’s by a more basic one, which was (at the time) proprietary, had only one implementation, had no public review, nor multiple bodies involved in its evolution. It was, at the time, called Hibernate. And let me take the opportunity to cite Gavin King (creator of Hibernate) about when to use Hibernate:

gavin-king-on-hibernate

My reply to you, Jimmie

According to you, JPA has to be learned fully. So I challenge you to also FULLY learn SQL, including all the SQL:2011 clauses, including

  • window functions
  • grouping sets
  • common table expressions
  • distinct/match/type/submultiset/unique predicates
  • time periods
  • partitioned outer joins
  • lateral joins
  • standard OFFSET pagination
  • contextually typed value specifications
  • quantified comparison predicates

… and of course all the details of interoperation between SQL and XQuery, one of the most popular aspects of the SQL:2011 standard!

And please, learn this FULLY, regardless of whether these things are part of your specific implementation. Because as a serious professional, you shall fully learn SQL. And while you’re at that, learn also everything about execution plans, and join, fetch, buffer caching, cursor caching and all other sorts of algorithms. Because there is no excuse for not knowing which SQL transformations are generated by your database’s CBO.

I know you like standards, Jimmie. But beware of the fact that there are some people out there who cannot wait for a standard to evolve to solve their problems. They may have more immediate problems. More specific problems. Simpler problems. Problems that might be solved only by proprietary software, so far. Or problems that are solved by proprietary software, that can be put into production with much less effort than your standards, Jimmie.

Lower time-to-market is what your customer might consider “professional”. Not whether this or that tech is used.

Someone always invents something proprietary at some time. It might just evolve into a standard. It might have been a bad idea and not evolve into anything. Or it might evolve into a standard and then be the worst standard ever. See again: EJB 2.0. I think we all agree on that, today.

No, Jimmie, the world isn’t black and white. It isn’t just about standards vs. proprietary. About free (libre) vs. commercial. About free (gratis) vs. “closed”. It’s about creating value for your customer.

Oh, and Jimmie. I sincerely hope you’re neither a Windows, nor a Mac user, because that wouldn’t be free, and there is only one implementation of each OS, and no public review, and only one body involved in their evolutions. And yet, the whole world runs on one of them.

Thanks for your attention, Jimmie.

QueryDSL vs. jOOQ. Feature Completeness vs. Now More Than Ever

This week, Timo Westkämper from QueryDSL has announced feature completeness on the QueryDSL user group, along with his call for contributions and increased focus on bugfixes and documentation.

Timo and us, we have always been in close contact, observing each other’s products. In the beginning of jOOQ in 2009, QueryDSL was ahead of us.

But we learned quickly and removed all of our shortcomings such that jOOQ and QueryDSL were quickly at eye level by 2011. Ever since, we have been taking inspiration from one another, as in the end, we have had similar goals. Today, whenever someone is looking for a querying DSL, people tend to recommend either of our tools:

QueryDSL is often a good choice in JPA-based environments, while jOOQ is mostly the best choice in SQL-based environments, although jOOQ has already been given some credit in JPA-based environments as well:

Anyway, today, we’d like to congratulate Timo to his new job, and to QueryDSL’s feature completeness.

jOOQ, on the other hand, is far from feature complete.

jOOQ is what SQLJ should have been from the beginning.

We’re only at the beginning. Java and SQL are the two platforms that are used by most of the developers on this planet. According to db-engines.com, almost every popular DBMS is a SQL-based relational DBMS. According to TIOBE, Java currently ranks #2 among all languages.

We strongly believe that all of these developers are in dire need for better SQL integration into the Java language. While ORMs and JPA are very well integrated, SQL is not, and that is what we are working on. jOOQ will be feature complete when the Java compiler can natively compile actual SQL code and SQL code fragments into jOOQ, which will serve as its backing AST model for further SQL transformation.

Until we reach that goal, we’ll be adding support for more SQL goodness. A small selection of things that we already support, beyond QueryDSL’s “feature completeness”:

  • Table-valued functions
  • PIVOT tables
  • DDL (with jOOQ 3.4)
  • MERGE statement
  • Derived tables and derived column lists
  • Row value expressions
  • Flashback query
  • Window functions
  • Ordered aggregate functions
  • Common table expressions (with jOOQ 3.4)
  • Object-oriented PL/SQL
  • User-defined types
  • Hierarchical SQL
  • Custom SQL transformation
  • 16 supported RDBMS (even MS Access!)
  • … you name it

Our roadmap is full of great ideas. There’s plenty of work, so let’s get going! Join us, your partner for…

jOOQ is the best way to write SQL in Java

Popular ORMs Don’t do SQL

I’m contemplating about what has happened in the ISO / IEC SQL standard during the last 15 years. We’ve had quite a few new features added to our beloved SQL language. Check this out:

  • With the ISO/IEC SQL:1999 standard, we could take advantage of grouping sets and (recursive) common table expressions.
  • With the ISO/IEC SQL:2003 standard, we’ve had the very sophisticated window functions and the MERGE statement.
  • With the ISO/IEC SQL:2008 standard, we could perform partitioned JOINs.
  • With the ISO/IEC SQL:2011 standard, we can now interoperate with temporal databases (implemented so far in IBM DB2 and Oracle).

And obviously, there’s much more goodness hiding in the almost unreadable 1423-page-long documents.

But JPA…

Now, does any of these awesome features appear in JPA? Nope. Will the next SQL standard introduce new awesome features? I’m sure it will! I could imagine that the Oracle / CUBRID CONNECT BY clause, or the Oracle / SQL Server PIVOT / UNPIVOT clauses are good candiadates for standardisation. I’d go absolutely nuts if Oracle’s crazy MODEL clause would make it, too.

While exciting things happen at these ends, the ORM impedance mismatch will further deepen and confirm Charles Humble’s recent findings from the QCon, where he’s observed an increasing number of people who are unhappy with popular ORMs’ ever increasing complexity. A complexity example: NamedEntityGraph!

@NamedEntityGraph(
    name="ExecutiveProjects"
    attributeNodes={
        @NamedAttributeNode("address"),
        @NamedAttributeNode(
            value="projects",
            subgraph="projects"
        )
    },
    subgraphs={
        @NamedSubgraph(
            name="projects",
            attributeNodes={
                @NamedAttributeNode("properties")
            }
        ),
        @NamedSubgraph(
            name="projects",
            type=LargeProject.class,
            attributeNodes={
                @NamedAttributeNode("executive")
            }
        )
    }
)

Man, did this really have to be added to JPA? Stack Overflow cannot display that many annotations on a single screen! Well, if this is JEE’s answer to SQL’s recent evolutions, then I’m glad I don’t do too much JEE these days. I’m doing SQL, which is an awesome language if let running loose.

Does Java 8 Still Need LINQ? Or is it Better than LINQ?

Photo by Ade Oshineye
Erik Meijer, Tye Dye Expert. Photo by Ade Oshineye. Licensed under CC-BY-SA

LINQ was one of the best things that happened to the .NET software engineering ecosystem in a long time. With its introduction of lambda expressions and monads in Visual Studio 2008, it had catapulted the C# language way ahead of Java, which was at version 6 at the time, still discussing the pros and cons of generic type erasure. This achievement was mostly due to and accredited to Erik Meijer, a Dutch computer scientist and tye-dye expert who is now off to entirely other projects.

Where is Java now?

With the imminent release of Java 8 and JSR-355, do we still need LINQ? Many attempts of bringing LINQ goodness to Java have been made since the middle of the last decade. At the time, Quaere and Lambdaj seemed to be a promising implementation on the library level (not the language level). In fact, a huge amount of popular Stack Overflow questions hints at how many Java folks were (and still are!) actually looking for something equivalent:

Interestingly, “LINQ” has even made it into EL 3.0!

But do we really need LINQ?

LINQ has one major flaw, which is advertised as a feature, but in our opinion, will inevitably lead to the “next big impedance mismatch”. LINQ is inspired by SQL and this is not at all a good thing. LINQ is most popular for LINQ-to-Objects, which is a fine way of querying collections in .NET. The success of Haskell or Scala, however, has shown that the true functional nature of “collection querying” tends to employ entirely other terms than SELECT, WHERE, GROUP BY, or HAVING.  They use terms like “fold”, “map”, “flatMap”, “reduce”, and many many more. LINQ, on the other hand, employs a mixture of GROUP BY and terms like “skip”, “take” (instead of OFFSET and FETCH).

In fact, nothing could be further from the functional truth than a good old SQL partitioned outer join, grouping set, or framed window function. These constructs are mere declarations of what a SQL developer would like to see as a result. They’re not self-contained functions, which actually contain the logic to be executed in any given context. Moreover, window functions can be used only in SELECT and ORDER BY clauses, which is obvious when thinking in a declarative way, but which is also very weird if you don’t have the SQL context. Specifically, a window function in a SELECT clause influences the whole execution plan and the way indexes are employed to pre-fetch the right data.

Conversely, true functional programming can do so much more to in-memory collections than SQL ever can. Using a SQLesque API for collection querying was a cunning decision to trick “traditional” folks into adopting functional programming. But the hopes that collection and SQL table querying could be interfused were disappointed, as such constructs will not produce the wanted SQL execution plans.

But what if I am doing SQL?

It’s simple. When you do SQL, you have two essential choices.

  • Do it “top-down”, putting most focus on your Java domain model. In that case, use Hibernate / JPA for querying and transform Hibernate results using the Java 8 Streams API.
  • Do it “bottom-up”, putting most focus on your SQL / relational domain model. In that case, use JDBC or jOOQ and again, transform your results using the Java 8 Streams API.

This is illustrated more in detail here: http://www.hibernate-alternative.com

Don’t look back. Embrace the future!

While .NET was “ahead” of Java for a while, this was not due to LINQ itself. This was mainly due to the introduction of lambda expressions and the impact lambdas had on *ALL* APIs. LINQ is just one example of how such APIs could be constructed, although LINQ got most of the credit.

But I’m much more excited about Java 8’s new Streams API, and how it will embrace some functional programming in the Java ecosystem. A very good blog post by Informatech illustrates, how common LINQ expressions translate to Java 8 Streams API expressions.

So, don’t look back. Stop envying .NET developers. With Java 8, we will not need LINQ or any API that tries to imitate LINQ on the grounds of “unified querying”, which is a better sounding name for what is really the “query target impedance mismatch”. We need true SQL for relational database querying, and we need the Java 8 Streams API for functional transformations of in-memory collections. That’s it. Go Java 8!

Annotatiomania™ 2.1

It’s been a while since I have last ranted about annotations and the increasing Annotatiomania™ in the JEE ecosystem. I haven’t been exposed to much JEE either, until yesterday, when I was confronted with an awesome talk by Adam Bien at ch-open, where I’ve also held a talk about jOOQ. JEE 7 ships with lots of goodies.

But then, I saw this!

Let me treat you to a introductory example of how to use a stored procedure through JPA 2.1, which our friends from EclipseLink have had the courtesy to share with us:

    @NamedStoredProcedureQuery(
       name="ReadUsingMultipleResultSetMappings",
       procedureName="Read_Multiple_Result_Sets",
       resultSetMappings={
           "EmployeeResultSetMapping", 
           "AddressResultSetMapping", 
           "ProjectResultSetMapping", 
           "EmployeeConstructorResultSetMapping"
       }
   )
   
   @SqlResultSetMappings({
       @SqlResultSetMapping(
           name = "EmployeeResultSetMapping",
           entities = {
           @EntityResult(entityClass=Employee.class)
           }
       ),
       @SqlResultSetMapping(
           name="EmployeeConstructorResultSetMapping",
           classes = { 
               @ConstructorResult(
                   targetClass = EmployeeDetails.class,
                   columns = {
                       @ColumnResult(
                           name="EMP_ID", 
                           type=Integer.class
                       ),
                       @ColumnResult(
                           name="F_NAME", 
                           type=String.class
                       ),
                       @ColumnResult(
                           name="L_NAME", 
                           type=String.class
                       ),
                       @ColumnResult(
                           name="R_COUNT", 
                           type=Integer.class
                       )
                   }
               )
           }
       )
   })
   public Employee(){
       ....
   }

Obviously, Eclipse Copernicus (or what’s Kepler’s successor?) will ship with an Enterprise-licensed source code formatter built by Nobel Prize mathematicians to actually display the above. With that license, you also get a 50% discount coupon on the latest 67″ Samsung flat screen for an Enterprise coding experience. Awesome!

Also, whenever I hear “ReadUsingMultipleResultSetMappings”, I immediately think J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource, too. Some powerful permutation mathematics and random natural language processing is thus involved.

Not just did JPAnnotatiomania™ 2.1 treat us with stored procedure support, named fetch graphs are now also part of the game. Do note that with only a few lines of code, we will finally be safe from writing tedious SQL! Behold:

   @NamedEntityGraph(
       name="ExecutiveProjects"
       attributeNodes={
           @NamedAttributeNode("address"),
           @NamedAttributeNode(
               value="projects", 
               subgraph="projects"
           )
       },
       subgraphs={
           @NamedSubgraph(
               name="projects",
               attributeNodes={
                   @NamedAttributeNode("properties")
               }
           ),
           @NamedSubgraph(
               name="projects",
               type=LargeProject.class,
               attributeNodes={
                   @NamedAttributeNode("executive")
               }
           )
       }
   )

Another 5000$ spent on formatting licenses to format the above.

Conclusion

No longer shall effective developers be paid by the lines of code, but they should be paid by the lines of @-signs they write. And I’m surely looking forward to Adam Bien’s next talk, about JEE 7 Best Practices!

High Complexity and Low Throughput. Reasons for Using an ORM.

I’ve recently stumbled upon an interesting blog post about when to use an ORM. I found it to be well-written and quite objective, specifically with respect to its model complexity and throughput diagram:

Original image taken from this blog post: http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html
Original image taken from this blog post: http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

The ORM or not ORM topic will probably never stop showing up on blogs. Some of them are more black and white, such as Jeff Atwood’s Object-Relational Mapping is the Vietnam of Computer Science others are more “50 shades of data access”, such as Martin Fowler’s ORM Hate.

I’m personally impressed by the work ORMs have done for us in times when repetitive SQL started to get boring and CRUD was not yet established. But ORMs do have their caveats as they are indeed leaky abstractions.

The aforementioned article shows in what situations ORMs can pull their weight, and in what situations you better keep operating on a SQL level, using tools like jOOQ, MyBatis, Apache DbUtils, or just simply JDBC.

Read the original blog post here:
http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

Other related articles:

LINQ and Java

LINQ has been quite a successful, but also controversial addition to the .NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves:

Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases.

Declarative programming means that programmers do not explicitly phrase out their algorithms. Instead, they describe the result they would like to obtain, leaving algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s metadata matters, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows to embed declarative programming aspects into .NET languages, such as C#, or ASP. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL, pgplsql or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORM did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This gets more clear when looking at some examples given by the LINQ-to-SQL documentation. For example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();

Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear if the .Count() function is transformed into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or if it will be applied only after executing the query, in the application memory. The latter would be prohibitive, if a large number or records would need to be transferred from the database to memory. Depending on the transaction model, they would even need to be read-locked!

Another example is given here where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely different from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. This does not directly map to SQL, though, and must be used by LINQ internally, to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#!

Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control, which parts are really going to be translated to SQL, and which parts will execute in my application, after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: Will the compiler warn me that it couldn’t figure out how to generate a C#/SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome, and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:

  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.

Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off, using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8’s Stream API) for in-memory post-processing

LINQ-like libraries modelling SQL in Java, Scala

LINQ-like libraries abstracting SQL syntax and data stores in Java, Scala

A Typesafety Comparison of SQL Access APIs

SQL is a very expressive and distinct language. It is one of the few declarative languages which are used by a broad audience in everyday work. As a declarative language, SQL allows to specify what we’re expecting as output, not how this output should be produced. As a side-effect of this, ad-hoc record data types are created by every statement. An example:

-- A (id: integer, title: varchar) type is created
SELECT id, title
FROM book;

The above statement generates a cursor whose records have a well-defined record type with these properties:

  • The degree of the record is 2
  • The column names are id and title
  • The column types are integer and varchar
  • The column id can be accessed at index 1. The column title can be accessed at index 2

In other words, SQL records combine features from records (access by name) and tuples (access by index). They can be seen like typesafe associative “map-arrays”, where map keys are formally bound to array indexes and their associated key/index type.

Another, more complex example shows how these ad-hoc record types can be reused within a SQL statement!

-- A (c: bigint) type is created
SELECT count(*) c
FROM book

-- A (..: integer, ..: integer) type is created and compared with...
WHERE (author_id, language_id) IN (

  -- ... another, compatible (..: integer, ..: integer) type
  SELECT a.id, a.language_id
  FROM author a
)

This query counts books written by authors in their native language.

In the above example, the projected record type is a bit simpler. It contains only one column. The interesting part is the row value expression IN comparison predicate, which compares two compatible (integer, integer) types. In SQL, you can typesafely create ad-hoc record types and immediately compare them with other ad-hoc record types. In these comparisons, column names are not important, but column indexes (and associated column types) are.

Comparing various SQL access APIs

The previous examples show how SQL allows for the formal declaration of record types including record degree, column names, column indexes, column types. While SQL is very expressive in that matter, many client languages accessing SQL are less expressive. When comparing expressiveness and typesafety, two features should be taken into consideration:

  1. Are the records produced into the client language typesafe?
  2. Are the SQL statements produced from the client language typesafe and syntax-safe?

Let’s have a look at various accessing techniques, and how expressive they are in terms of the above typesafety requirements:

JDBC: Least typesafety

JDBC offers the least expressiveness and typesafety. This isn’t surprising, as JDBC is a very low-level API. It offers:

  1. No typesafety whatsoever when accessing result records.
  2. No typesafety or syntax-safety whatsoever when producing SQL statements.

Here is an example:

PreparedStatement stmt = null;
ResultSet rs = null;

try {

  // SQL statements are just strings. Constructing them is not
  // typesafe or syntax-safe
  stmt = connection.prepareStatement(
    "SELECT id, title FROM book WHERE id = ?");

  // Bind values are set by index. There is no typesafety or
  // "index safety"
  stmt.setInt(1, 15);

  rs = stmt.executeQuery();
  while (rs.next()) {

    // There is no typesafety or "index safety" when accessing
    // result record values
    System.out.println(
      "ID: " + rs.getInt(1) + ", TITLE: " + rs.getString(2));
  }
}
finally {
  closeSafely(stmt, rs);
}

Now, this wasn’t surprising. JDBC makes up for the lack of typesafety by being absolutely general. It is possible to implement a JDBC driver for any type of relational database, no matter what kinds of SQL and JDBC features they really support.

JPA: Some typesafety

JPA has implemented quite a bit of typesafety mostly on top of JPQL, but also slightly on top of SQL. With JPA, you can have:

  1. Some typesafety when accessing records.
  2. Some typesafety and syntax-safety when producing JPQL statements through the CriteriaQuery API (not SQL statements).

Record access typesafety can be guaranteed when you project the outcome of your statements onto your JPA-annotated entities. While the mapping itself isn’t really typesafe, the outcome is, as a Java class is the closest match to a SQL record. A Java class, much like a SQL record, has:

  • A degree, expressed in the number of properties
  • Column names, expressed as property names
  • Column types, expressed as property types
  • But: No column indexes. Properties have no explicit order

JPA record mapping has additional features that exceed the expressiveness of SQL, as “flat”, tabular result sets can be mapped onto object hierarchies. In any case, you will have to create one record / entity type per query to profit from this typesafety. If you’re not projecting all columns from every table, but ad-hoc records (including values derived from functions), you will lose this typesafety again.

When it comes to statement typesafety, JPA offers the CriteriaQuery API to produce typesafe JPQL statements. The CriteriaQuery API is often criticised for its verboseness and for the fact that resulting client code is hard to read. Here is an example taken from the CriteriaQuery API docs:

CriteriaQuery<String> q = cb.createQuery(String.class);
Root<Order> order = q.from(Order.class);
q.select(order.get("shippingAddress").<String>get("state"));
 
CriteriaQuery<Product> q2 = cb.createQuery(Product.class);
q2.select(q2.from(Order.class)
            .join("items")
            .<Item,Product>join("product"));

It can be seen that there is only a limited amount of typesafety in the above query construction:

  • Columns are accessed by string literals, such as "shippingAddress".
  • Generic entity types are not really checked. The <Item,Product> generic parameters might as well be wrong.

Of course, there are more typesafe API parts in JPA’s CriteriaQuery API. Using those API parts quickly lead to the aforementioned verbosity, though, as can be seen in this Stack Overflow question, or in the Java EE 6 Tutorials.

LINQ: Much typesafety (in .NET)

LINQ goes very far in offering typesafety in both dimensions:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing LINQ-to-SQL statements (not SQL statements).

As LINQ is formally integrated into various .NET languages, it has the advantage of being able to produce formally defined record types, directly into the target language (e.g. C#). Not only can typesafe records be produced, the LINQ-to-SQL statement is formally verified by the compiler as well. An example

// Typesafe renaming (aliasing with "AS" in SQL)
From p In db.Products
// Typesafe (named!) variable binding
Where p.UnitsInStock <= ReorderLevel AndAlso Not p.Discontinued
// The typesafe projection will produce a Products record
Select p

Another example from Stack Overflow can be seen here:

// Producing a C# tuple
var r = from u in db.Users
        join s in db.Staffs on u.Id equals s.UserId
        select new Tuple<User, Staff>(u, s);

// Producing an anonymous record type
var r = from u in db.Users
    select new { u.Name, 
                 u.Address,
                 ...,
                 (from s in db.Staffs 
                  select s.Password where u.Id == s.UserId) 
               };

LINQ has many obvious advantages when it comes to typesafety. In the case of LINQ, this comes at the price of losing actual SQL expressivity and syntax, as LINQ-to-SQL is not really SQL (just as JPQL is not really SQL either). The SQL querying API is partially shared with other, heterogeneous querying targets, such as LINQ-to-Entities, LINQ-to-Collections, LINQ-to-XML. This will reduce LINQ’s feature scope (see also a previous blog post, and I will soon blog about this again).

But C# offers all typesafety aspects that a SQL record offers as well: degree, column name (anonymous types), column index (tuples), column types (both types and tuples).

SLICK: Much typesafety (in Scala)

SLICK has been inspired by LINQ, and can thus offer a lot of typesafety as well. It offers:

  1. Much typesafety when accessing tuples (not records).
  2. Much typesafety when producing SLICK statements (not SQL statements).

SLICK takes advantage of Scala’s integrated tuple expressions. This is best shown by example:

// "for" is the "entry-point" to the DSL
val q = for {

    // FROM clause   WHERE clause
    c <- Coffees     if c.supID === 101

// SELECT clause and projection to a tuple
} yield (c.name, c.price)

The above example shows that the projection onto a (String, Int) tuple is done typesafely by the yield method. At the same time, the whole query expression is formally validated by the compiler, as SLICK makes heavy use of Scala’s language features in order to introduce an internal DSL for querying. Much more than LINQ, SLICK has a unique syntax that doesn’t remind of SQL any more. It is not obvious how subqueries, complex joins, grouping and aggregation can be expressed.

jOOQ: Much typesafety

jOOQ is mainly inspired by SQL itself and embraces all the features that SQL offers. It has thus:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing SQL statements.

jOOQ offers similar capabilities as JPA when it comes to mapping SQL result sets onto records, although JPA’s mapping type hierarchies are not supported by jOOQ. But jOOQ also allows for typesafe tuple access, the way SLICK has implemented it. Ad-hoc records produced by arbitrary query projections will maintain their various column types through generic Record1<T1>, Record2<T1, T2>, Record3<T1, T2, T3>, … record types. Unlike in Java, this can be leveraged extensively in Scala, where these typesafe Record[N] types can be used just like Scala’s tuples.

On the other hand, just like LINQ-to-SQL, which has formally integrated querying as a first-class citizen into .NET languages, jOOQ allows for heavy type-checking and syntax-checking, when writing SQL statements in Java.

In SQL, you can typesafely write things like:

SELECT * FROM t WHERE (t.a, t.b) = (1, 2)
SELECT * FROM t WHERE (t.a, t.b) OVERLAPS (date1, date2)
SELECT * FROM t WHERE (t.a, t.b) IN (SELECT x, y FROM t2)
UPDATE t SET (a, b) = (SELECT x, y FROM t2 WHERE ...)
INSERT INTO t (a, b) VALUES (1, 2)

In jOOQ 3.0, you can (also typesafely!) write

select().from(t).where(row(t.a, t.b).eq(1, 2));
// Type-check here: ----------------->  ^^^^
 
select().from(t).where(row(t.a, t.b).overlaps(date1, date2));
// Type-check here: ------------------------> ^^^^^^^^^^^^
 
select().from(t).where(row(t.a, t.b).in(select(t2.x, t2.y).from(t2)));
// Type-check here: -------------------------> ^^^^^^^^^^
 
update(t).set(row(t.a, t.b), select(t2.x, t2.y).where(...));
// Type-check here: --------------> ^^^^^^^^^^

insertInto(t, t.a, t.b).values(1, 2);
// Type-check here: ---------> ^^^^

This also applies for existing API, which doesn’t involve row value expressions:

select().from(t).where(t.a.eq(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^
 
select().from(t).where(t.a.eq(any(select(t2.x).from(t2)));
// Type-check here: -------------------> ^^^^
 
select().from(t).where(t.a.in(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^

select(t1.a, t1.b).from(t1).union(select(t2.a, t2.b).from(t2));
// Type-check here: -------------------> ^^^^^^^^^^

jOOQ is not SQL, but unlike other attempts of introducing SQL as an internal domain-specific language into host languages like Java, Scala, C#, jOOQ looks very much like SQL thanks to its unique fluent API technique, which informally follows an underlying BNF notation.

Even if Java offers less expressiveness than other languages like C# or Scala, jOOQ probably comes closest to both result record typesafety and SQL syntax safety in the Java world.