There is no Such Thing as Object-Relational Impedance Mismatch

Much of the ORM criticism of the last decade has missed the point, and much of it has been inaccurate. By the end of this article, we will arrive at the following conclusion:

There is no significant difference between the relational (data) model and object-oriented models

How do we come to this conclusion? Read on!

How we came to believe in this fallacy

Many popular bloggers and opinion leaders have missed no chance to bash ORMs for their “obvious” impedance mismatch with the relational world. N+1 queries, inefficient SQL, library complexity, leaky abstractions – all sorts of buzzwords have been employed to dismiss ORMs. The criticism often contains a lot of truth, albeit without providing a viable alternative.

But are these articles really criticising the right thing?

Few of the above articles recognise a central fact, which has been spelled out eloquently and humorously by Erik Meijer and Gavin Bierman in their very interesting paper “A co-Relational Model of Data for Large Shared Data Banks“, subtitled:

Contrary to popular belief, SQL and noSQL are really just two sides of the same coin.

Or in other words: The “hierarchical” object world and the “relational” database world model the exact same thing. The only difference is the direction of the arrows that you draw in your diagrams.

Let this sink in.

  • In the relational model, children point to their parent.
  • In the hierarchical model, parents point to their children.

That’s all there is to it.

[Figure: the hierarchical model vs. the relational model – only the direction of the arrows differs]

What is an ORM?

ORMs bridge the gap between the two worlds. They’re the inverters of arrows, if you will. They make sure that every “relation” in your RDBMS can be materialised as an “aggregation” or “composition” in your “hierarchical” world (this works for objects, XML, JSON, and any other format). They make sure that such materialisation is properly transacted, and that changes to individual attributes or to relational (aggregational, compositional) attributes are properly tracked and flushed back into the master model, the database – where the model is persisted. Individual ORMs differ in the features they offer and in how much mapping logic they provide beyond mapping individual entities to individual types.

  • Some ORMs may help you implement locking
  • Some may help you to patch model mismatches
  • Some may focus merely on a 1:1 mapping between your classes and tables

But all ORMs do one very simple thing. Ultimately, they take rows from your tables and materialise them as objects in your class model and vice-versa.

By the way, a very nice overview of different ORMs has recently been compiled on the Vertabelo blog.

Tables and classes are the same thing

Give or take 1-2 implementation details, an RDBMS’s table and an OO language’s class are the same thing: a specification of a set of grouped attributes, each with their associated type. Consider the following example, using SQL and Java:

SQL

CREATE TABLE author (
  first_name VARCHAR(50),
  last_name VARCHAR(50)
);

Java

class Author {
  String firstName;
  String lastName;
}

There is absolutely no conceptual difference between the two – the mapping is straightforward. The mapping remains straightforward even when you consider “relations” / “compositions” between different entities / types:

SQL (let’s leave out constraints for simplicity)

CREATE TABLE author (
  id BIGINT,
  first_name VARCHAR(50),
  last_name VARCHAR(50)
);

CREATE TABLE book (
  id BIGINT,
  author_id BIGINT,
  title VARCHAR(50)
);

Java

class Author {
  Long id;
  String firstName;
  String lastName;
  Set<Book> books;
}

class Book {
  Long id;
  Author author;
  String title;
}

The implementation details are omitted here (and they probably account for half of the criticism). But omitting further details allows for a straightforward 1:1 mapping of individual rows from your database to your Java model, without any surprises. Most ORMs – in the Java ecosystem, Hibernate in particular – have managed to implement this idea very well, hiding away all the technical details of actually performing such a model transfer between the RDBMS and Java.
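
To appreciate what is being hidden, here is a minimal sketch of such a model transfer written by hand with plain JDBC – the kind of plumbing an ORM generates and executes for you (the Author class from above is reused; an open Connection is assumed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Manually materialising one row of the author table as an Author object -
// the basic transfer that an ORM automates for the entire model.
static Author findAuthor(Connection con, long id) throws SQLException {
    try (PreparedStatement stmt = con.prepareStatement(
            "SELECT id, first_name, last_name FROM author WHERE id = ?")) {
        stmt.setLong(1, id);

        try (ResultSet rs = stmt.executeQuery()) {
            if (!rs.next())
                return null;

            Author author = new Author();
            author.id = rs.getLong("id");
            author.firstName = rs.getString("first_name");
            author.lastName = rs.getString("last_name");
            return author;
        }
    }
}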

In other words:

There is absolutely nothing wrong with this mapping approach!

Yet: There *IS* an impedance mismatch, somewhere

The “problems” that many bloggers criticise arise not from the non-existing mismatch between the two model representations (“relational” vs. “hierarchical”). The problems arise from SQL, which is a decent implementation of relational algebra.

In fact, the very same mismatch that everyone criticises is also present between the relational model itself and relational algebra – and thus SQL.

Relational algebra has been defined in order to be able to query relations and to form new ad-hoc relations as the output of such queries. Depending on the operations and transformations that are applied, the resulting tuples may have absolutely nothing to do with the individual entities involved in the query. In other, ORM-y words: the product of relational algebra, and in particular of SQL, is of no use to the ORM, as it can no longer be further processed, let alone persisted back into the database.

To make things “worse”, SQL today is a large superset of the features offered by relational algebra. It has become much more useful than when it was conceived.

Why this mismatch still affects modern ORMs

The previous paragraphs outlined the single main reason why ORMs are really criticised, even if such criticism often doesn’t mention this exact reason:

SQL / relational algebra is not really appropriate for partially materialising relations into a client, or for storing changes back into the database. Yet, most RDBMS offer only SQL for that job.

Back to the author / book example. When you want to load and display an author and their books to a web application’s user, you’d like to simply fetch that author and their books, call simple methods like author.add(book) as well as author.remove(book) and let some magic flush your data back into the storage system.
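
With JPA, for instance, the client code we would like to write looks roughly like the following sketch. It assumes that the Author and Book classes above have been turned into mapped entities with the usual annotations, accessors, and a cascading books collection, and that an EntityManagerFactory emf is available:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

class AuthorCrud {

    // The CRUD "magic" we expect: mutate the object graph, commit,
    // and let the ORM work out the SQL.
    static void addBook(EntityManagerFactory emf) {
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();

        Author author = em.find(Author.class, 1L);

        Book book = new Book();
        book.setTitle("Book of Longing");
        book.setAuthor(author);
        author.getBooks().add(book);   // essentially author.add(book)

        em.getTransaction().commit();  // the ORM flushes the INSERT for us
        em.close();
    }
}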

Thinking about the amount of SQL code to be written for such a simple CRUD task makes everyone squeal.

Life’s too short to spend time on CRUD

Perhaps QUEL might have been a better language for CRUD, but that ship has sailed. And unfortunately, because SQL is an inappropriate language for this job, you cannot ignore that “magic” but have to know well what happens behind the scenes, e.g. by tweaking Hibernate’s fetching strategies.

Translated to SQL, this may be implemented in several ways:

1. Fetching with JOIN

Using outer joins, all the involved entities can be queried in one go:

SELECT author.*, book.*
FROM author
LEFT JOIN book ON author.id = book.author_id
WHERE author.id = ?

Advantages:

  • A single query can be issued and all the data can be transferred at once

Disadvantages:

  • The author attributes are repeated in every tuple. The client (ORM) has to de-duplicate authors first, before populating the author-book relationship (see the sketch after this list). This can be particularly bad when you have many nested relations that should be fetched at once.
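
A minimal sketch of that de-duplication, done by hand at the JDBC level – this assumes the projection above is spelled out with unambiguous column aliases, e.g. author.id AS a_id and book.id AS b_id:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// De-duplicating the denormalised JOIN result into an object graph -
// the work an ORM performs behind the scenes for this fetch strategy.
static Map<Long, Author> deduplicate(ResultSet rs) throws SQLException {
    Map<Long, Author> authors = new LinkedHashMap<>();

    while (rs.next()) {
        long authorId = rs.getLong("a_id");
        Author author = authors.get(authorId);

        // The author columns repeat on every joined row: create each author only once.
        if (author == null) {
            author = new Author();
            author.id = authorId;
            author.firstName = rs.getString("first_name");
            author.lastName = rs.getString("last_name");
            author.books = new LinkedHashSet<>();
            authors.put(authorId, author);
        }

        // The LEFT JOIN also produces a row for authors without any books.
        long bookId = rs.getLong("b_id");
        if (!rs.wasNull()) {
            Book book = new Book();
            book.id = bookId;
            book.author = author;
            book.title = rs.getString("title");
            author.books.add(book);
        }
    }

    return authors;
}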

2. Fetching with SELECT

A single query is issued for each entity:

SELECT *
FROM author
WHERE id = ?

SELECT *
FROM book
WHERE author_id = ?

Advantages:

  • The amount of data to be transferred is minimal: Each row is transferred exactly once.

Disadvantages:

  • The number of queries that are issued may explode into the well-known N+1 problem (sketched below).
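
For illustration, the N+1 pattern boils down to the following kind of loop – one initial query for the authors (not shown), plus one book query per author. With an ORM, the loop is simply hidden behind lazy collection loading (plain JDBC is used here; author.books is assumed to be initialised):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// The N+1 anti-pattern: N additional queries for the books of N authors.
static void loadBooks(Connection con, List<Author> authors) throws SQLException {
    for (Author author : authors) {
        try (PreparedStatement stmt = con.prepareStatement(
                "SELECT id, title FROM book WHERE author_id = ?")) {
            stmt.setLong(1, author.id);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    Book book = new Book();
                    book.id = rs.getLong("id");
                    book.title = rs.getString("title");
                    book.author = author;
                    author.books.add(book);
                }
            }
        }
    }
}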

Hibernate in particular offers further fetch strategies, although they are essentially variants / optimisations of one of the above.

Why not use SQL MULTISET?

The ideal way to fetch all the data in this case, using advanced SQL, would be with MULTISET:

SELECT author.*, MULTISET (
  SELECT book.*
  FROM book
  WHERE book.author_id = author.id
) AS books
FROM author
WHERE id = ?

The above will essentially create a nested collection for each author:

first_name  last_name   books (nested collection)
--------------------------------------------------

Leonard     Cohen       title
                        --------------------------
                        Book of Mercy
                        Stranger Music
                        Book of Longing

Ernest      Hemingway   title
                        --------------------------
                        For Whom the Bell Tolls
                        The Old Man and the Sea

If you add another nested entity, it is easy to see how another MULTISET could allow for additionally nested data:

SELECT author.*, MULTISET (
  SELECT book.*, MULTISET (
    SELECT language.*
    FROM language
    JOIN book_language AS bl
    ON language.id = bl.language_id
    AND book.id = bl.book_id
  ) AS languages
  FROM book
  WHERE book.author_id = author.id
) AS books
FROM author
WHERE id = ?

The outcome would now be along the lines of:

first_name  last_name   books
-----------------------------------------------------

Leonard     Cohen       title            languages
                        -----------------------------
                        Book of Mercy    language
                                         ------------
                                         en

                        Stranger Music   language
                                         ------------
                                         en
                                         de

                        Book of Longing  language
                                         ------------
                                         en
                                         fr
                                         es

Advantages:

  • A single query can materialise all eager-loaded rows with minimal bandwidth usage.

Disadvantages:

  • None.

Unfortunately, MULTISET is poorly supported by most RDBMS.

MULTISET (along with arrays and other collection types) was formally introduced into the SQL standard with SQL:2003, as part of an initiative to embed OO features into the SQL language. Oracle, for instance, has implemented much of it, as did Informix and the lesser-known CUBRID (albeit using vendor-specific syntax).

Other databases like PostgreSQL allow for aggregating nested rows into typed arrays, which works the same way although with a bit more syntactic effort.
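
As a rough sketch of that approach, here is the author / book example again, aggregating only the book titles into a PostgreSQL text[] array for simplicity, and reading it back through JDBC (an open Connection is assumed):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Emulating the MULTISET example with PostgreSQL's array_agg():
// each author row carries its book titles as a text[] array.
static void printAuthorsWithTitles(Connection con) throws SQLException {
    try (PreparedStatement stmt = con.prepareStatement(
            "SELECT a.first_name, a.last_name, array_agg(b.title) AS books "
          + "FROM author a "
          + "JOIN book b ON b.author_id = a.id "
          + "GROUP BY a.id, a.first_name, a.last_name");
         ResultSet rs = stmt.executeQuery()) {

        while (rs.next()) {
            String[] titles = (String[]) rs.getArray("books").getArray();
            System.out.println(rs.getString("first_name") + " "
                + rs.getString("last_name") + ": " + String.join(", ", titles));
        }
    }
}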

MULTISET and other ORDBMS SQL features are the perfect compromise: they combine the best of the “relational” model with the best of the “hierarchical” model. They combine CRUD operations with querying in one go and remove the need for sophisticated ORMs, as the SQL language can be used directly to map all your data from your (relational) database to your (hierarchical) client representation with no friction.

Conclusion and call to action!

We’re living through exciting times in our industry. The elephant (SQL) in the room is still here, learning new tricks all the time. The relational model has served us well, and has been enriched with hierarchical models in various implementations. Functional programming is gaining traction, complementing object orientation in very useful ways.

Think of the glue that puts all these great technological concepts together, allowing for:

  • Storing data in the relational model
  • Materialising data in the hierarchical model
  • Processing data using functional programming

That awesome combination of techniques is hard to beat – we’ve shown how SQL and functional programming can work with jOOQ. All that’s missing – in our opinion – is better support for MULTISET and other ORDBMS features from RDBMS vendors.

Thus, we urge you, PostgreSQL developers: you’re creating one of the most innovative databases out there. Yet, you’re missing out on one of the most awesome SQL feature sets: the ability to construct nested collections (not just arrays) and to query them efficiently. Oracle is ahead of you in this area – but their implementation is too strongly tied to PL/SQL, which makes it clumsy. If you lead the way, other RDBMS will follow.

And we can finally stop wasting time talking about the object-relational impedance non-mismatch.

Why Staying in Control of Your SQL is so Important

Lots of blog posts and research papers are written about the topics of scaling up and scaling out. This interesting blog post, for instance, sheds some light on the two strategies with respect to physical maintenance costs, such as cooling and electricity consumption. Certainly non-negligible aspects for very large systems.

But before solving problems at a very big scale, consider much simpler SQL tuning mechanisms. Very often, when you have a bottleneck in your application, it is at the database layer. This fact is used by many NoSQL evangelists to promote their products, claiming that scaling out is much easier with NoSQL databases. This might be true, but ask yourself: do you really need a system that works under heavy load? Or do you simply have a performance problem?

In other words: Do you have a 5’000’000-concurrent-users problem? Or do you have a request-takes-more-than-3-seconds problem? Because if you suffer from the latter, you probably need to scale neither out nor up. Your “traditional” architecture is probably quite fine, but your database / SQL queries aren’t. The popular use-the-index-luke.com website features an interesting article about the two top performance problems caused by ORM tools. They are:

  • The infamous N+1 selects problem
  • The hardly-known Index-Only Scan

Both problems result from the fact that popular ORMs generate SQL code in a way that can hardly be influenced by developers. People often claim that tools like Hibernate generate better SQL than the average developer. This is true to the extent that the average developer might never care to actually learn to write better SQL – which in turn leads to the above problems.

Hibernate is very good at generating 70% of your application’s boring CRUD SQL. At the same time, Hibernate never claimed to be a replacement for SQL, as you will have to resort to native SQL for the remaining 30%. Should you then use Hibernate’s native SQL API? Or an alternative like jOOQ?

The important thing is to get back in control of your SQL when performance matters. Know your indexes. Know your database meta data. And use a tool that allows you to write precisely the SQL statement you want. Learning better SQL will help you save lots of money on operations costs as you might not need to scale out nor up.
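
As a sketch of what that can look like in Java, here is a jOOQ query using plain identifiers rather than generated classes (an open JDBC Connection is assumed, and the dialect is just an example):

import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;
import static org.jooq.impl.DSL.using;

import java.sql.Connection;
import org.jooq.Record;
import org.jooq.Result;
import org.jooq.SQLDialect;

// Writing precisely the SQL statement you want, while staying in Java.
static Result<Record> booksOfAuthor(Connection con, long authorId) {
    return using(con, SQLDialect.POSTGRES)
        .select(field("book.id"), field("book.title"))
        .from(table("book"))
        .where(field("book.author_id").eq(authorId))
        .orderBy(field("book.title"))
        .fetch();
}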

People Managing to Correctly Spell “Moron” in a Blog Comment

The notorious ORM pro / con discussion greatly amuses me. I always find it very funny when people have passionate discussions about which solution is better, rather than discussing which solution is better suited for the problem at hand. In the case of ORMs vs. plain SQL, obviously, neither solution is simply better, as both techniques have their merits. When comparing ORMs with jOOQ, I think that this page summarises it pretty well:
http://www.hibernate-alternative.com

Now, this article, and most specifically one answer to it, is hilarious:
http://java.dzone.com/articles/defense-hand-coded-sql

While the article’s author is already asking for trouble, check out this one particular answer. I love it when people manage to correctly spell “moron”:

People who handwrite SQL are invariably morons.

Here’s what you miss out when using a good ORM with generated mappings:

– Automatic first and second level caching

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

– True vendor independence. Yes, I’m switching between six different db’s in our products with zero issues.

– I work with objects, not relation sets. That kinda makes sense in an oop language.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

– Any decent ORM understands and injects vendor specific query hints better than you.

Also, get a clue.

Here’s my adequate reply to the above:

OK, now this was amusing 🙂

– Automatic first and second level caching

This, obviously, is utterly impossible outside the world of ORMs.

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

True. No one has ever written a code generator before it was added to Hibernate.

– I work with objects, not relation sets. That kinda makes sense in an oop language

… which your DBA will probably always agree with. Remember to remind your manager why he bought that 1M$ Oracle license, when you run N+1 selects for fetching your OOP objects.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

Of course, there is always a black / white answer to “productivity” questions. Like, how productively you can express a SQL:2003 MERGE statement with HQL. Or, how productively you can calculate a running total involving window functions, or maybe, recursive SQL with HQL.

– Any decent ORM understands and injects vendor specific query hints better than you.

That is indeed an amazing theory, which I was utterly unaware of.

The eternal debate between ORM lovers and haters. Mankind has always been this stupid. It’s like the AC vs. DC discussion between Nikola Tesla and Thomas Edison.

And, Eclipse will totally win over IntelliJ! 😉

High Complexity and Low Throughput. Reasons for Using an ORM.

I’ve recently stumbled upon an interesting blog post about when to use an ORM. I found it to be well-written and quite objective, specifically with respect to its model complexity vs. throughput diagram.

The ORM-or-not-ORM topic will probably never stop showing up on blogs. Some posts are more black and white, such as Jeff Atwood’s “Object-Relational Mapping is the Vietnam of Computer Science”, while others are more “50 shades of data access”, such as Martin Fowler’s “ORM Hate”.

I’m personally impressed by the work ORMs have done for us in times when repetitive SQL started to get boring and CRUD was not yet established. But ORMs do have their caveats as they are indeed leaky abstractions.

The aforementioned article shows in what situations ORMs can pull their weight, and in what situations you better keep operating on a SQL level, using tools like jOOQ, MyBatis, Apache DbUtils, or just simply JDBC.

Read the original blog post here:
http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

Other related articles:

LINQ and Java

LINQ has been quite a successful, but also controversial, addition to the .NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves:

Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases.

Declarative programming means that programmers do not explicitly spell out their algorithms. Instead, they describe the result they would like to obtain, leaving the algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s “Metadata Matters”, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows embedding declarative programming aspects into .NET languages, such as C# or VB.NET. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL or PL/pgSQL, or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORMs did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This becomes clearer when looking at some examples given in the LINQ-to-SQL documentation, for example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();

Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear whether the .Count() function is translated into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or whether it is applied only after executing the query, in application memory. The latter would be prohibitive if a large number of records had to be transferred from the database to memory. Depending on the transaction model, they might even need to be read-locked!
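
For comparison, with plain SQL there is no such ambiguity: the aggregation runs in the database and a single integer is transferred. A sketch, assuming the Products table maps to a products table with a boolean discontinued column and that an open JDBC Connection is available:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// count(*) is computed by the database; only one value travels over the wire.
static int countNotDiscontinued(Connection con) throws SQLException {
    try (PreparedStatement stmt = con.prepareStatement(
            "SELECT count(*) FROM products WHERE discontinued = false");
         ResultSet rs = stmt.executeQuery()) {
        rs.next();
        return rs.getInt(1);
    }
}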

Another example is given in the documentation, where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely differently from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. It does not directly map to SQL, though, and must be handled by LINQ internally to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#!
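
For reference, the SQL shape that this LINQ query presumably translates to looks something like the following – the LINQ where on the grouping ends up as a HAVING clause (a sketch; table and column names are assumed, and the actual SQL generated by LINQ-to-SQL may differ):

// Presumed SQL equivalent of the LINQ grouping above:
String sql =
    "SELECT category_id, count(*) AS product_count "
  + "FROM products "
  + "GROUP BY category_id "
  + "HAVING count(*) >= 10";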

Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control which parts are really going to be translated to SQL, and which parts will execute in my application after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: will the compiler warn me that it couldn’t figure out how to generate a C# / SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:

  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.

Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8’s Stream API) for in-memory post-processing.
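
A small sketch of that division of labour – the books list stands in for whatever a JDBC / jOOQ / MyBatis query returned, and each Book is assumed to carry its Author:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PostProcessing {

    // SQL fetches the flat data; the Stream API does the in-memory shaping.
    static Map<String, List<String>> titlesByAuthor(List<Book> books) {
        return books.stream()
            .collect(Collectors.groupingBy(
                b -> b.author.lastName,
                Collectors.mapping(b -> b.title, Collectors.toList())));
    }
}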


Martin Fowler on “The Vietnam of Computer Science”

It couldn’t be a better match for jOOQ, by a more suited person. Martin Fowler expresses his feelings about SQL, NoSQL, and object-relational mapping in his blog post, which is copied on DZone:

http://java.dzone.com/articles/martin-fowler-orm-hate

Think about the following two slogans:

“The Vietnam of Computer Science” – “A Peace Treaty Between SQL and Java”

Swell, no? 🙂

ORM vs. SQL, compared to C vs. ASM

History is repeating itself. This is nothing new, but it takes wisdom (and an elephant’s memory) to remember when and how things have already happened in a similar way. When you feel that the whole SQL versus ORM debate is a bit boring and you may have seen it before, you’re probably right. It’s another religious war that can be compared to the war between C and Assembler some 40 years back. Think about it this way:

ORM vs. SQL

How many ORMs do you know that support:

  • Window functions
  • MERGE statement
  • Recursive queries
  • Common table expressions
  • Ordered aggregate functions
  • Statistical functions
  • Stored procedures (And I mean not just scalar ones!)
  • SQL-based XML
  • Temporary tables
  • … I could go on and on

ORMs are not just “higher-level”; they are an entirely different paradigm, put on top of SQL. So, saying SQL is “lower-level” is not entirely accurate. It’s on a “different level”. ORMs are pretending that SQL is just a little “low-level” thing.

C vs. ASM

(citing reddit user henk53)

Which C compilers at the time made use of:

  • Advanced addressing modes
  • Opcode tricks
  • Special instructions (intrinsics came much later)
  • Jump tables (yeah, switch is close, but not quite it)
  • Build-in high level functions on some architectures (like string copy)
  • Page zero tricks
  • Segments (ugly things when all you have is small ones, but quite powerful when used fine-grained for ‘objects’ and structures)
  • Delay slot shadowing
  • Knowledge of fitting computations exactly in registers and L1.
  • … I could go on and on

C wasn’t just higher level, it was another paradigm, and yes, C compilers pretending that assembly is just a little “low-level” thing did raise our hackles back then. It threw away all the nice subtle intricacies of each specific CPU architecture and made them all equal by using only the common denominator of functionality that each CPU had.

It took a while, but eventually C compilers got better at optimizing than humans and could actually take advantage of advanced instructions in a CPU.

Original discussion

See an extract of the original thread here:
http://www.reddit.com/r/java/comments/sk25o/forget_hibernate_jooq_is_byfar_the_best_database/#c4f6z1w