When you’re spoiled with Oracle’s fabulous query transformation capabilities and its really well-done cost-based optimiser, then you might forget how difficult SQL query tuning used to be in the “old days” or with those less sophisticated databases. Here’s a really nice explanation of the various means of implementing an ANTI-JOIN in MySQL:
Relational algebra has its treats. One of the most academic features is the relational division. It is hardly ever used, but comes in handy every now and then. And when you need it, you’ll probably hate yourself for having slept during the relevant classes at the university.
What is relational division?
Relational division is the inverse of a cross join operation. The following is an approximate definition of a relational division:
Assume the following cross join / cartesian product C = A × B Then it can be said that A = C ÷ B B = C ÷ A
What does it mean, typically?
Let’s have a look at the sample provided on Wikipedia:
This looks sensible. The division of Completed ÷ DBProject leads to a list of students that have completed all projects.
Now how to phrase that in SQL??
That’s not so simple as it looks. The most commonly documented solution involves a doubly-nested select statement using anti-joins. In human language (using double negative), it is like Fred and Sarah saying “there is no DBProject that we have not Completed“. Or in SQL:
SELECT DISTINCT "c1".Student FROM Completed "c1" WHERE NOT EXISTS ( SELECT 1 FROM DBProject WHERE NOT EXISTS ( SELECT 1 FROM Completed "c2" WHERE "c2".Student = "c1".Student AND "c2".Task = DBProject.Task ) )
Now, no one sane wants to remember this just for the 1-2 times in a SQL developer’s life that they actually need it. So they use jOOQ, which wraps up the above monster in a concise syntax:
create.select().from( Completed.divideBy(DBProject) .on(Completed.Task.equal(DBProject.Task)) .returning(Completed.Student) );
Note that from the above SQL statement, it is immediately clear that proper indexing is of the essence. Be sure to have indexes on all columns referenced from the on(…) and returning(…) clauses.
For more information about relational division and some nice, real-life examples, see