Impress Your Coworkers With the Incredible NATURAL FULL OUTER JOIN!

There are already only very few real-world use-cases for FULL [ OUTER ] JOIN, but maybe, you have run into this beast in the past. But when was the last time you’ve seen a NATURAL JOIN? Right.

A quick reminder from our article about JOINs:

FULL JOIN

A FULL JOIN is a type of OUTER JOIN that retains data from both sides of the JOIN operation, regardless if the JOIN predicate matches or not. For example, querying the Sakila database:

SELECT first_name, last_name, title
FROM actor
FULL JOIN film_actor USING (actor_id)
FULL JOIN film USING (film_id)

The result will look something like this:

first_name   last_name   title
--[ LEFT JOIN ] -------------------------
Jon          Doe                          <-- Actors that didn't play in any film
Claire       Miller                      
--[ INNER JOIN ] ------------------------
Samuel       Jackson     Django Unchained <-- Actors played in many films
Samuel       Jackson     Pulp Fiction
John         Travolta    Pulp Fiction     <-- Films featured several actors
--[ RIGHT JOIN ]-------------------------
                         Meh              <-- Films that don't have actors

Note the special “markers” that delimit the parts that would have also been contributed by LEFT JOIN, INNER JOIN, RIGHT JOIN. In a way, a FULL JOIN combines them all.

Handy, occasionally

NATURAL JOIN

This one is less obviously useful. It looks useful at first, because if we carefully design our schemas as in the Sakila database used above, then we can simplify our joins as such:

SELECT first_name, last_name, title
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film

Which is just syntax sugar for this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (<all common columns>)
JOIN film USING (<all common columns>)

So, a NATURAL JOIN will join two tables using all the columns they have in common. You’d think that this would be the useful outcome:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

Unfortunately, the outcome looks more like this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id, last_update)
JOIN film USING (film_id, last_update)

In other words: It makes no sense at all. In real world situations (and the Sakila database is already close enough to the real world), we’ll always have accidentally conflicting column names that will be considered for NATURAL JOIN, when they hsould not be.

OK, so why NATURAL FULL JOIN?

… you’re asking. One is very rarely needed, and the other is pretty useless in the real world. Not so fast!

I recently helped a customer who had two very similar tables that both contained payments. Let’s call them: PAYMENT and STANDING_ORDER, the latter being a PAYMENT template which generates actual PAYMENTs periodically.

Now, there was some PL/SQL logic that operated on payments, which after some change request, needed to operate both on payments and/or on standing orders. If this was Java, we’d just (ab)use subtype polymorphism for the quick win, or factor out common logic into common component types for the thorough solution. For instance, these kinds of columns certainly appear in both cases:

“System” columns

  • ID – The surrogate key
  • CREATED_AT – Audit info
  • MODIFIED_AT – Audit info
  • DELETED – Soft deletion

“Business” columns

  • AMOUNT
  • CURRENCY

There are many more common columns. There are other columns that are specific to one or the other type:

Payment only

  • STANDING_ORDER_ID – Only payments reference the standing order that may have created them.

Standing order only

  • PERIODICITY – This is a standing order’s core feature.

Now, in order to refactor that PL/SQL logic, I needed a row type that combines all these columns easily. I didn’t want to manually track the exact columns of both tables, I wanted this to always work, even if the maintainers of the tables add/remove/rename columns.

NATURAL FULL JOIN to the rescue

I created a view like this:

CREATE VIEW like_a_payment
SELECT *
FROM payment
NATURAL FULL JOIN standing_order
WHERE 1 = 0

This view itself doesn’t do anything. It just sits there and produces the row type I need. Why? Because the NATURAL JOIN will join by ALL the common columns, namely ID, CREATED_AT, MODIFIED_AT, DELETED, AMOUNT, CURRENCY and then adds the columns from the left table, namely STANDING_ORDER_ID, and then from the right table, namely PERIODICITY.

I can now use this type in PL/SQL:

DECLARE
  paym like_a_payment%ROWTYPE;
BEGIN
  -- Do things...
END;

But why FULL?

True, I didn’t need FULL yet, because I’m not producing any rows anyway. Correct. But observe how I will populate the PAYM local variable in the PL/SQL block:

DECLARE
  l_paym_id like_a_payment.id%TYPE;
  l_paym    like_a_payment%ROWTYPE;
BEGIN
  l_paym_id := 1;

  SELECT *
  INTO l_paym
  FROM (
    SELECT * FROM payment WHERE id = l_paym_id
  )
  NATURAL FULL JOIN (
    SELECT * FROM standing_order WHERE id = l_paym_id
  );
END;

And I’m done! This could not have been done any more easily!

  • I’m using NATURAL because now I have to use it all the time to populate the record (or select individual columns, but meh)
  • I’m using FULL because only one of the two queries will produce a result, and the remaining columns will be NULL

Excellent!

Some observations

Of course, this is kind of a hack. Please don’t abuse this too often. Here’s some hints to follow:

  • The refactoring would have been substantial. I needed a quick win.
  • The logic actually needs almost all the columns from the payments. Otherwise, you might want to review that SELECT * usage.
  • The derived tables are necessary in Oracle for performance reasons.
  • The actual code contained two distinct queries, each one joining “one side” with an empty substitute query, as we know that the ID is either a payment or a standing order.

Having said so, now you know one of the very few real world use-cases of NATURAL FULL JOIN. Put this in your code base and impress your coworkers!

Leave a Reply