PLAIN WRONG!Venn Diagrams are perfect to illustrate … actual set operations! SQL knows three of them:
EXCEPTare more exotic, but do come in handy every now and then. The point here is: these set operations operate on sets of elements (tuples), which are all of the same type. As in the examples above, all elements are people with first and last names. This is also why
EXCEPTare more exotic, because they’re usually not very useful.
JOINis much more useful. For instance, you want to combine the set of actors with their corresponding set of films. A
JOINis really a cartesian product (also cross product) with a filter. Here’s a nice illustration of a cartesian product:
So, what’s a better way to illustrate JOIN operations?
JOINdiagrams! Let’s look at
CROSS JOINfirst, because all other
JOINtypes can be derived from
CROSS JOIN: Remember, in a cross join (in SQL also written with a comma separated table list, historically) is just taking every item on the left side, and combines it with every item on the right side. When you
CROSS JOINa table of 3 rows with a table of 4 rows, you will get 3×4=12 result rows. See, I’m using an “x” character to write the multiplication. I.e. a “cross”.
INNER JOINAll other joins are still based on cross joins, but with additional filters, and perhaps unions. Here’s an explanation of each individual
JOINtype. In plain text, an
INNER JOINis a
CROSS JOINin which only those combinations are retained which fulfil a given predicate. For instance:
-- "Classic" ANSI JOIN syntax SELECT * FROM author a JOIN book b ON a.author_id = b.author_id -- "Nice" ANSI JOIN syntax SELECT * FROM author a JOIN book b USING (author_id) -- "Old" syntax using a "CROSS JOIN" SELECT * FROM author a, book b WHERE a.author_id = b.author_id
OUTER JOINtypes help where we want to retain those rows from either the
LEFTside or the
RIGHTor both (
FULL) sides, for which there was no matching row where the predicate yielded true. A
LEFT OUTER JOINin relational algebra is defined as such: Or more verbosely in SQL:
This will produce all the authors and their books, but if an author doesn’t have any book, we still want to get the author with NULL as their only book value. So, it’s the same as writing:
SELECT * FROM author a LEFT JOIN book b USING (author_id)
But no one wants to write that much SQL, so
SELECT * FROM author a JOIN book b USING (author_id) UNION SELECT a.*, NULL, NULL, NULL, ..., NULL FROM ( SELECT a.* FROM author a EXCEPT SELECT a.* FROM author a JOIN book b USING (author_id) ) a
OUTER JOINwas implemented.
Conclusion: Say NO to Venn Diagrams
JOINs are relatively easy to understand intuitively. And they’re relatively easy to explain using Venn Diagrams. But whenever you do that, remember, that you’re making a wrong analogy. A
JOINis not strictly a set operation that can be described with Venn Diagrams. A
JOINis always a cross product with a predicate, and possibly a
UNIONto add additional rows to the
OUTER JOINresult. So, if in doubt, please use
JOINdiagrams rather than Venn Diagrams. They’re more accurate and visually more useful. (Remember, all of these slides are taken from our Data Geekery SQL Training, do get in touch, if you’re interested)