Say NO to Venn Diagrams When Explaining JOINs

In recent times, there have been a couple of tremendously popular blog posts explaining JOINs using Venn Diagrams. After all, relational algebra and SQL are set oriented theories and languages, so it only makes sense to illustrate set operations like JOINs using Venn Diagrams. Right?

Google seems to say so:

venn-google

Everyone uses Venn Diagrams to explain JOINs. But that’s…

PLAIN WRONG!

Venn Diagrams are perfect to illustrate … actual set operations! SQL knows three of them:

  • UNION
  • INTERSECT
  • EXCEPT

And they can be explained as such:

venn-union

venn-intersection

venn-difference

(all of these slides are taken from our Data Geekery SQL Training, do check it out!)

Most of you use UNION occasionally. INTERSECT and EXCEPT are more exotic, but do come in handy every now and then.

The point here is: these set operations operate on sets of elements (tuples), which are all of the same type. As in the examples above, all elements are people with first and last names. This is also why INTERSECT and EXCEPT are more exotic, because they’re usually not very useful. JOIN is much more useful. For instance, you want to combine the set of actors with their corresponding set of films.

A JOIN is really a cartesian product (also cross product) with a filter. Here’s a nice illustration of a cartesian product:

venn-cross-product

So, what’s a better way to illustrate JOIN operations?

JOIN diagrams! Let’s look at CROSS JOIN first, because all other JOIN types can be derived from CROSS JOIN:

venn-cross-join

Remember, in a cross join (in SQL also written with a comma separated table list, historically) is just taking every item on the left side, and combines it with every item on the right side. When you CROSS JOIN a table of 3 rows with a table of 4 rows, you will get 3×4=12 result rows. See, I’m using an “x” character to write the multiplication. I.e. a “cross”.

INNER JOIN

All other joins are still based on cross joins, but with additional filters, and perhaps unions. Here’s an explanation of each individual JOIN type.

venn-join

In plain text, an INNER JOIN is a CROSS JOIN in which only those combinations are retained which fulfil a given predicate. For instance:

-- "Classic" ANSI JOIN syntax
SELECT *
FROM author a
JOIN book b ON a.author_id = b.author_id

-- "Nice" ANSI JOIN syntax
SELECT *
FROM author a
JOIN book b USING (author_id)

-- "Old" syntax using a "CROSS JOIN"
SELECT *
FROM author a, book b
WHERE a.author_id = b.author_id

OUTER JOIN

OUTER JOIN types help where we want to retain those rows from either the LEFT side or the RIGHT or both (FULL) sides, for which there was no matching row where the predicate yielded true.

A LEFT OUTER JOIN in relational algebra is defined as such:

dd81ee1373d922122ce1b3e0da74cb28

Or more verbosely in SQL:

SELECT *
FROM author a
LEFT JOIN book b USING (author_id)

This will produce all the authors and their books, but if an author doesn’t have any book, we still want to get the author with NULL as their only book value. So, it’s the same as writing:

SELECT *
FROM author a
JOIN book b USING (author_id)

UNION

SELECT a.*, NULL, NULL, NULL, ..., NULL
FROM (
  SELECT a.*
  FROM author a
  
  EXCEPT
  
  SELECT a.*
  FROM author a
  JOIN book b USING (author_id)
) a

But no one wants to write that much SQL, so OUTER JOIN was implemented.

Conclusion: Say NO to Venn Diagrams

JOINs are relatively easy to understand intuitively. And they’re relatively easy to explain using Venn Diagrams. But whenever you do that, remember, that you’re making a wrong analogy. A JOIN is not strictly a set operation that can be described with Venn Diagrams. A JOIN is always a cross product with a predicate, and possibly a UNION to add additional rows to the OUTER JOIN result.

So, if in doubt, please use JOIN diagrams rather than Venn Diagrams. They’re more accurate and visually more useful.

venn-google-say-no

(Remember, all of these slides are taken from our Data Geekery SQL Training, do get in touch, if you’re interested)

34 thoughts on “Say NO to Venn Diagrams When Explaining JOINs

    • Oops, thanks for the hint! Haven’t thought of that. And apparently, neither have the creators of Microsoft Powerpoint (where I took the screenshots from) :-/

      • Great post, especially that set notation for the outer join 🙂

        I don’t think though that the labels are helpful. You are really implying a natural join here, and the alphabetic characters matched against numbers tends to negate one of your main points about a join just being a Cartesian products with a selection predicate acting as a “filter”.

        • Yes, colours in the second diagram seem to represent subrow values for equality tests of eqiJOIN ON, JOIN USING or NATURAL JOIN but not arbitrary JOINs. Given that limitation, rows would be better illustrated as boxes partitioned into don’t-care non-match column subrows and coloured match column subrows. Preferably with non-match column subrows of different lengths, and match column subrows of the same length, in left & right inputs. As to illustrating arbitrary match conditions, that requires indicating *matches* in CROSS JOIN output rows, not colouring of subrow values. Regardless of whether you illustrate arbitrary JOINs: For the colourblind, something should be 1:1 with colours, and right now it seems to be distance of numerals/letters from 1/A, and it’s not clear why both numerals & letters are used. Using numerals to identify colours & dropping letters, the first diagram’s ABC would be be 453 and the second’s 234. Although the first diagram’s inputs should be the same as the second diagram’s, to illustrate that non-CROSS JOINs are filtered from intermediate CROSS JOINs. Although the inputs should be something like 123 & 233 to also illustrate JOIN involving duplicate rows. Although the inputs should be something like 12N(ULL) & 22N to also illustrate JOIN involving rows with NULL(s) and NULL not matching anything. And probably all-NULL subrows shouldn’t be represented by *nothing*. Lukas: Try writing the *key* and *legend* for your diagrams. Currently the key involves things like, numerals and letters identify (left & right?) row values, and distances of numerals/letters from 1/A identify colours, while the legends limit the illustrations to equi/USING/NATURAL JOINs. Complicated.

  1. The point of using a Venn Diagram is for illustrative purposes. If I’m using a Venn Diagram when I’m showing someone Joins, its probably their first exposure to joins (or nearly first). As soon as I say cartesian product I’ve lost them, and the graphic is Orders of magnitude harder to under stand than the venn diagram. While it may be imperfect, for someone new it is a lot clearer and less likely to push them away or scare them off from SQL.

    • That inaccuracy causes so much damage later on. I’ll follow up on this post with an explanation why so many people misuse JOIN when they should be using SEMI JOIN. It’s precisely because of the lack of understanding of JOIN being a “fancy cartesian product”.

    • But Venn diagrams *do not illustrate join*. You will see this for yourself if you actually try to give a clear *legend* for what the various parts of the diagram denote. See my comment on the main post.

  2. If you want to dig into the results a bit more, I can see where this is helpful. The nice thing about the Venn diagram is the uniformity of the illustration. I don’t have to connect boxes to get a feel for what is included in the result. Interesting thought.

    • But this way of thinking works for equi join only. An INNER JOIN, for instance, may still yield a cartesian product, if not joining by a primary key / foreign key relationship. This will then be completely unexpected, when following only the Venn Diagram approach.

  3. I also hate using Venn diagrams for explaning SQL joins. The set-bag discrepancy is one of the reasons. The other one is multiple joins. Venn diagrams completely break for anything above 3 sets/tables (2 joins) and may confuse you more than actually explain things.

  4. I don’t disagree, but I have one of the Venn diagrams on my desk for the quick reference. I know I can’t treat is as law, but it helps me out when I can’t remember if I want a left or right join.

  5. Disagree, the Venn diagrams are less confusing and more descriptive visually to this visually-oriented thinker, at least. Maybe if you’re a math expert it’s both clearer and preferable to you, but I’m not by any stretch, so I find the Venn diagrams easier to digest in describing joins.

  6. I face palm when I see all the replies saying um yeah what you said but Venn diagrams are much easier to understand!… First of all, who are you explaining joins to.. Some developer right? Is it too much to ask that people who are paid professionals building systems for $$$ should understand how these tools exactly work?

  7. Nice article, venn diagrams never really made me visualize the result completely, although they helped. But Join diagrams make more sense, always good for a quick reference on syntax.

  8. You made a good point, and are right about JOIN being cartesian product with filters.

    But you should also notice that in all of your examples, the join conditions (filters, in your terms) are based on author_id. We are joining heterogeneous tuples based on identifiers of the exactly same type – the foreign key.

    And that’s where the Venn diagram comes in.

    What it represents is how joins work on two sets of foreign keys, and considering that we almost always want to join over foreign keys (and how exotic unfiltered CROSS JOINs are), it is an acceptable generalization for me.

  9. I think your diagrams are very good but can be improved.

    Please add an arrow between the joined “rows” (I guess colored boxes in your diagram)

    A two headed arrow for an inner join.

    A left headed arrow for a left join.

    A right headed arrow for a right join

    No arrow for full join.

    Not only does this make the diagram clearer it ALSO explains the names.

    • Interesting idea, thanks for suggesting. I guess it is a matter of style. But how about this: You copy the image, adapt it accordingly, and re-publish it as a blog post with comments on your improvements. This way, we can “beat” Venn diagrams on Google searches 🙂

  10. Although Venn diagrams are unsuitable for explaining inner join, *one* diagram under a suitable interpretation is useful for comparing inner, left outer, right outer & full outer join: a left and right circle that are the tuples returned from left and right outer joins respectively. See my comments on various questions & answers at http://stackoverflow.com/questions/38549/difference-between-inner-and-outer-joins and http://stackoverflow.com/questions/17759687/cross-join-vs-inner-join-in-sql-server-2008/25957600#25957600 . PS I came here via http://www.dbdebunk.com/2016/07/this-week_10.html#more .

  11. PS 1 And of course Venn diagrams are appropriate for UNION/INTERSECT/EXCEPT.
    PS 2 And of course Venn diagrams only work for SQL tables that are *sets*. (The way to give SQL semantics–*bag* semantics–for JOINs correctly & clearly (visually or not) is via INNER JOIN output rows as 1:1 with matches of a row from the left input & a row from the right input, and via OUTER JOIN additional output rows as 1:1 with NULL-extended unmatched rows from left/right/both inputs per LEFT/RIGHT/FULL.)
    PS 3 The wikipedia entries for Venn diagram & Euler diagram address their use & abuse.
    PS TL;DR Those who think that their Venn (which are often not Venn but Euler) diagrams illustrate how a JOIN’s output is based on its inputs (for sets or bags) should (try to) write the *key* by which to read them and mechanically convert between a diagram and its code. Those people will find that their conversions & diagrams are not straightforward and not illustrating.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s