There are already only very few real-world use-cases for FULL [ OUTER ] JOIN
, but maybe, you have run into this beast in the past. But when was the last time you’ve seen a NATURAL JOIN
? Right.
A quick reminder from our article about JOINs:
FULL JOIN
A FULL JOIN
is a type of OUTER JOIN
that retains data from both sides of the JOIN
operation, regardless if the JOIN
predicate matches or not. For example, querying the Sakila database:
SELECT first_name, last_name, title
FROM actor
FULL JOIN film_actor USING (actor_id)
FULL JOIN film USING (film_id)
The result will look something like this:
first_name last_name title --[ LEFT JOIN ] ------------------------- Jon Doe <-- Actors that didn't play in any film Claire Miller --[ INNER JOIN ] ------------------------ Samuel Jackson Django Unchained <-- Actors played in many films Samuel Jackson Pulp Fiction John Travolta Pulp Fiction <-- Films featured several actors --[ RIGHT JOIN ]------------------------- Meh <-- Films that don't have actors
Note the special “markers” that delimit the parts that would have also been contributed by LEFT JOIN
, INNER JOIN
, RIGHT JOIN
. In a way, a FULL JOIN
combines them all.
Handy, occasionally
NATURAL JOIN
This one is less obviously useful. It looks useful at first, because if we carefully design our schemas as in the Sakila database used above, then we can simplify our joins as such:
SELECT first_name, last_name, title
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film
Which is just syntax sugar for this:
SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (<all common columns>)
JOIN film USING (<all common columns>)
So, a NATURAL JOIN
will join two tables using all the columns they have in common. You’d think that this would be the useful outcome:
SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)
Unfortunately, the outcome looks more like this:
SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id, last_update)
JOIN film USING (film_id, last_update)
In other words: It makes no sense at all. In real world situations (and the Sakila database is already close enough to the real world), we’ll always have accidentally conflicting column names that will be considered for NATURAL JOIN
, when they hsould not be.
OK, so why NATURAL FULL JOIN?
… you’re asking. One is very rarely needed, and the other is pretty useless in the real world. Not so fast!
I recently helped a customer who had two very similar tables that both contained payments. Let’s call them: PAYMENT
and STANDING_ORDER
, the latter being a PAYMENT
template which generates actual PAYMENTs
periodically.
Now, there was some PL/SQL logic that operated on payments, which after some change request, needed to operate both on payments and/or on standing orders. If this was Java, we’d just (ab)use subtype polymorphism for the quick win, or factor out common logic into common component types for the thorough solution. For instance, these kinds of columns certainly appear in both cases:
“System” columns
ID
– The surrogate keyCREATED_AT
– Audit infoMODIFIED_AT
– Audit infoDELETED
– Soft deletion
“Business” columns
AMOUNT
CURRENCY
There are many more common columns. There are other columns that are specific to one or the other type:
Payment only
STANDING_ORDER_ID
– Only payments reference the standing order that may have created them.
Standing order only
PERIODICITY
– This is a standing order’s core feature.
Now, in order to refactor that PL/SQL logic, I needed a row type that combines all these columns easily. I didn’t want to manually track the exact columns of both tables, I wanted this to always work, even if the maintainers of the tables add/remove/rename columns.
NATURAL FULL JOIN to the rescue
I created a view like this:
CREATE VIEW like_a_payment
SELECT *
FROM payment
NATURAL FULL JOIN standing_order
WHERE 1 = 0
This view itself doesn’t do anything. It just sits there and produces the row type I need. Why? Because the NATURAL JOIN
will join by ALL the common columns, namely ID, CREATED_AT, MODIFIED_AT, DELETED, AMOUNT, CURRENCY
and then adds the columns from the left table, namely STANDING_ORDER_ID
, and then from the right table, namely PERIODICITY
.
I can now use this type in PL/SQL:
DECLARE
paym like_a_payment%ROWTYPE;
BEGIN
-- Do things...
END;
But why FULL?
True, I didn’t need FULL
yet, because I’m not producing any rows anyway. Correct. But observe how I will populate the PAYM
local variable in the PL/SQL block:
DECLARE
l_paym_id like_a_payment.id%TYPE;
l_paym like_a_payment%ROWTYPE;
BEGIN
l_paym_id := 1;
SELECT *
INTO l_paym
FROM (
SELECT * FROM payment WHERE id = l_paym_id
)
NATURAL FULL JOIN (
SELECT * FROM standing_order WHERE id = l_paym_id
);
END;
And I’m done! This could not have been done any more easily!
- I’m using
NATURAL
because now I have to use it all the time to populate the record (or select individual columns, but meh) - I’m using
FULL
because only one of the two queries will produce a result, and the remaining columns will beNULL
Excellent!
Some observations
Of course, this is kind of a hack. Please don’t abuse this too often. Here’s some hints to follow:
- The refactoring would have been substantial. I needed a quick win.
- The logic actually needs almost all the columns from the payments. Otherwise, you might want to review that
SELECT *
usage. - The derived tables are necessary in Oracle for performance reasons.
- The actual code contained two distinct queries, each one joining “one side” with an empty substitute query, as we know that the ID is either a payment or a standing order.
Having said so, now you know one of the very few real world use-cases of NATURAL FULL JOIN
. Put this in your code base and impress your coworkers!