When people talk about SQL JOIN, they often use Venn Diagrams to illustrate inclusion and exclusion of the two joined sets:
While these Venn diagrams are certainly useful to understand (and remember) SQL JOIN syntax, they’re not entirely accurate, because SQL JOIN is a special type of a cartesian product, the CROSS JOIN.

In a cartesian product between two sets A and B, the result is the multiplication of each set, meaning that each element a ∈ A
is combined with each element b ∈ B
to form a set of tuples (a, b)
.
Ordinary SQL JOINs do precisely this. When you join BOOK to AUTHOR, you will probably get a combination of every author ∈ AUTHOR
with each book ∈ BOOK
, such that for each combination (author, book)
, the author actually wrote the book.
The true meaning of Venn diagrams
The true meaning of Venn diagrams is much better described by the operations
UNION
INTERSECT
EXCEPT
(orMINUS
in Oracle)
In the following sections, we’ll see that these operations match exactly the semantics of operations that can be illustrated by Venn diagrams, even if you will be able to “abuse” JOIN
operations to achieve the same result.
UNION
The UNION
operation is the most well-known among these set operations. It is often also referred to as “concatenation” of two sets of tuples, where the result is the concatenation of a set B
to a set A
.
In the following example, we’ll see that we might be interested in all the different people from our database, given their first and last names, regardless if they’re customer
or staff
:
The original Venn diagrams used FULL OUTER JOIN
to model the “same” concept, although the two things are not strictly same. Consider the following query, which we’ll run against the Sakila database:
SELECT first_name, last_name FROM customer UNION SELECT first_name, last_name FROM staff ORDER BY first_name, last_name
The result looks like:
first_name last_name ------------------------------------ AARON SELBY ADAM GOOCH ADRIAN CLARY AGNES BISHOP ALAN KAHN ALBERT CROUSE ALBERTO HENNING ALEX GRESHAM ALEXANDER FENNELL ALFRED CASILLAS ALFREDO MCADAMS ALICE STEWART ALICIA MILLS ...
Now, run the following “equivalent” query (it isn’t actually equivalent with respect to duplicates):
SELECT first_name, last_name FROM customer FULL OUTER JOIN staff USING (first_name, last_name) ORDER BY first_name, last_name
The result will again yield:
first_name last_name ------------------------------------ AARON SELBY ADAM GOOCH ADRIAN CLARY AGNES BISHOP ALAN KAHN ALBERT CROUSE ALBERTO HENNING ...
This only works because we’re using the USING
clause, which not every database supports natively. If we did our JOIN with the more commonly used ON clause, we’d have to write the more tedious:
SELECT COALESCE(c.first_name, s.first_name) AS first_name, COALESCE(c.last_name, s.last_name) AS last_name FROM customer c FULL OUTER JOIN staff s ON (c.first_name, c.last_name) = (s.first_name, s.last_name) ORDER BY first_name, last_name
In this case, most people probably default to using UNION
already, as it is a much better known operation than FULL OUTER JOIN
.
All of jOOQ’s currently supported RDBMS support UNION
and UNION ALL
(the latter doesn’t remove duplicates).
In the following, we’ll see that equivalent comparisons can be made with other set operations:
INTERSECT
The INTERSECT
operation is really useful when you want to keep only those tuples that are present in both sets that are combined using INTERSECT
:
As you can see, we may want to retain only those customers that are also actors. Let’s run this query:
SELECT first_name, last_name FROM customer INTERSECT SELECT first_name, last_name FROM actor
first_name last_name ------------------------------------ JENNIFER DAVIS
One of our customers is also an actor. The same query could have been written with an INNER JOIN
as such:
SELECT first_name, last_name FROM customer INNER JOIN actor USING (first_name, last_name)
… or with the ON
syntax
SELECT c.first_name, c.last_name FROM customer c INNER JOIN actor a ON (c.first_name, c.last_name) = (a.first_name, a.last_name)
This time, no COALESCE
is needed, as INNER JOIN
retains only those tuples from the cartesian product, which are present on “both sides” of the JOIN
, so we can pick any of the tables to prefix our columns.
You may even decide to use a semi-join instead, which would yield the same results:
SELECT first_name, last_name FROM customer WHERE (first_name, last_name) IN ( SELECT first_name, last_name FROM actor )
or, using the more verbose, yet equivalent EXISTS
predicate:
SELECT first_name, last_name FROM customer c WHERE EXISTS ( SELECT 1 FROM actor a WHERE (c.first_name, c.last_name) = (a.first_name, a.last_name) )
All of the above, again, yield:
first_name last_name ------------------------------------ JENNIFER DAVIS
EXCEPT
The EXCEPT
operation is useful when you want to keep only those tuples that are present in one set, but not in another:
Running this query:
SELECT first_name, last_name FROM customer EXCEPT SELECT first_name, last_name FROM staff ORDER BY first_name, last_name
… will yield:
first_name last_name ------------------------------------ AARON SELBY ADAM GOOCH ADRIAN CLARY AGNES BISHOP ALAN KAHN ALBERT CROUSE ALBERTO HENNING ...
According to the original Venn diagrams, this can be tweaked using LEFT JOIN
and a IS NULL
predicate:
SELECT first_name, last_name FROM customer LEFT JOIN staff USING (first_name, last_name) WHERE staff_id IS NULL ORDER BY first_name, last_name
or with an ON
clause:
SELECT c.first_name, c.last_name FROM customer c LEFT JOIN staff s ON (c.first_name, c.last_name) = (s.first_name, s.last_name) WHERE staff_id IS NULL ORDER BY first_name, last_name
This is completely unreadable and doesn’t communicate the fact that we’re removing tuples from a set CUSTOMER
, given their presence in another set STAFF
.
An equivalent version using anti-join might be more readable (watch out for NULLs
in NOT IN
predicates, though!):
SELECT c.first_name, c.last_name FROM customer c WHERE (first_name, last_name) NOT IN ( SELECT first_name, last_name FROM staff ) ORDER BY first_name, last_name
… or, using NOT EXISTS
:
SELECT c.first_name, c.last_name FROM customer c WHERE NOT EXISTS ( SELECT 1 FROM staff s WHERE (c.first_name, c.last_name) = (s.first_name, s.last_name) ) ORDER BY first_name, last_name
Conclusion
UNION
, INTERSECT
, and EXCEPT
are very simple, yet very useful operations that can add a lot of value every now and then in your daily SQL tasks. While JOIN
operations are much more versatile, they are also more complex for the simple tasks that can be solved by UNION
, INTERSECT
, and EXCEPT
Did you like this article? It’s part of the Data Geekery SQL Training – a 1-day workshop helping you to get the most out of the awesome SQL language.
Read more articles about awesome SQL here:
- SQL incompatibilities: NOT IN and NULL values
- How to use SQL PIVOT to Compare Two Tables in Your Database
- Common SQL Clauses and Their Equivalents in Java 8 Streams
- The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions
- Do You Really Understand SQL’s GROUP BY and HAVING clauses?
- Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()
Excellent explanation – thank you :)
Is there any difference regarding performance?
It always depends (mostly on the database vendor). In “heavy” queries, JOINs tend to outperform set operations – especially in SQL Server, perhaps a bit less in Oracle. However, I recommend you do measurements yourself on your actual production environment to see the difference.
Nice points, wish you would point out the oft-ignored huge difference between UNION and UNION ALL
I didn’t want to summon the wrath of C. J. Date and the war between relational sets and sql multisets in this article :)
Great blog post! Really enjoyed seeing the different versions of each query.
Nice article but I resultsets from a UNION and a FULL OUTER join are not “equivalent” – it can be in some data constellations but in general it is not. These two behave differently regarding duplicates.
Thanks for your comment. “Obviously” the parentheses meant exactly that. ;-). I’ll fix that with an explicit comment.
I suggest you avoid the use of ordinal references in ORDER BY. Suppose some well-meaning developer is tasked with adding c.Middle_Name between c.first_name and c.last_name. Perhaps they are asked to simply change the order of the existing columns in the SELECT. If they do not also change the ORDER BY clause the query will attain an entirely different meaning. Explicit column references avoid that problem.
Thanks for your suggestion, Bryant. Sure, they’re not necessary in this post, I’ve replaced them by column name references.
The sad thing is that in general, ordinal references tend to work better across all dialects than referencing aliases, especially when expressions are projected, or unions are involved, see e.g. https://stackoverflow.com/q/25387951/521799