In a recent debate about NoSQL vs. SQL on Hackernews, I was made aware of a quite amusing paper by Erik Meijer and Gavin Bierman. Remember, Erik Meijer has brought LINQ to the .NET universe, a formidable unified query DSL whose main purpose was to unify typesafe querying against XML, SQL, and object-oriented data structures.
It is important to note that LINQ surfaced shortly before the NoSQL hype really caught on, so NoSQL data structures (e.g. key / value stores, document stores, graph databases) were not yet in scope for LINQ, and LINQ providers might have had to tweak the odd implementation detail to fully match the LINQ API.
What Erik Meijer and Gavin Bierman are claiming in their article (which was also discussed intensively on Hackernews) is the fact that SQL and NoSQL are duals of each other, i.e. two sides of the same coin. In their quest to unify all query languages, the LINQ people would obviously love to simplify things to such a level. To us, who are focusing on SQL only via jOOQ, this seems more like a plea for LINQ than anything else. It would be all too nice if things were as easy as a simple duality, specifically given the fact that Erik Meijer has now also created a company called Applied Duality Inc… What we found very interesting, though, is Erik’s and Gavin’s hint about the relational model and other models inversing arrows between relationships. When child records point to parent records in the relational model, parent objects point to child objects in object-oriented design, or in XML.
But history will teach us where these things go. I currently don’t see a second E.F. Codd to solve the complexity introduced with the new abundance of NoSQL data stores – yet. But maybe, we’ll eventually remember Erik Meijer as the new E.F. Codd, bringing NoSQL to full circle.
Relational algebra has its treats. One of the most academic features is the relational division. It is hardly ever used, but comes in handy every now and then. And when you need it, you’ll probably hate yourself for having slept during the relevant classes at the university.
What is relational division?
Relational division is the inverse of a cross join operation. The following is an approximate definition of a relational division:
Assume the following cross join / cartesian product C = A × B Then it can be said that A = C ÷ B B = C ÷ A
What does it mean, typically?
Let’s have a look at the sample provided on Wikipedia:
This looks sensible. The division of Completed ÷ DBProject leads to a list of students that have completed all projects.
Now how to phrase that in SQL??
That’s not so simple as it looks. The most commonly documented solution involves a doubly-nested select statement using anti-joins. In human language (using double negative), it is like Fred and Sarah saying “there is no DBProject that we have not Completed“. Or in SQL:
SELECT DISTINCT "c1".Student FROM Completed "c1" WHERE NOT EXISTS ( SELECT 1 FROM DBProject WHERE NOT EXISTS ( SELECT 1 FROM Completed "c2" WHERE "c2".Student = "c1".Student AND "c2".Task = DBProject.Task ) )
Now, no one sane wants to remember this just for the 1-2 times in a SQL developer’s life that they actually need it. So they use jOOQ, which wraps up the above monster in a concise syntax:
create.select().from( Completed.divideBy(DBProject) .on(Completed.Task.equal(DBProject.Task)) .returning(Completed.Student) );
Note that from the above SQL statement, it is immediately clear that proper indexing is of the essence. Be sure to have indexes on all columns referenced from the on(…) and returning(…) clauses.
For more information about relational division and some nice, real-life examples, see