Use NATURAL FULL JOIN to compare two tables in SQL

There are a few ways to compare two similar tables in SQL. Assuming PostgreSQL syntax, we might have this schema:

CREATE TABLE t1 (a INT, b INT, c INT);
CREATE TABLE t2 (a INT, b INT, c INT);
INSERT INTO t1 VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);
INSERT INTO t2 VALUES            (4, 5, 6), (7, 8, 9), (10, 11, 12);

It is now possible to use UNION and EXCEPT as suggested by Chris Saxon:

In PostgreSQL, we can write:

(TABLE t1 EXCEPT TABLE t2) 
UNION
(TABLE t2 EXCEPT TABLE t1)
ORDER BY a, b, c

Notice how TABLE x is just standard SQL, and PostgreSQL, syntax sugar for SELECT * FROM x.

And we’ll get:

a |b |c |
--|--|--|
 1| 2| 3|
10|11|12|

Unfortunately, this requires two accesses to each table. Can we do it with a single access?

Using NATURAL FULL JOIN

Yes! Using NATURAL FULL JOIN, another rare use-case for this esoteric operator.

Assuming there are no NULL values, we can write this:

SELECT *
FROM (
  SELECT 't1' AS t1, t1.* FROM t1
) t1 NATURAL FULL JOIN (
  SELECT 't2' AS t2, t2.* FROM t2
) t2 
WHERE NOT (t1, t2) IS NOT NULL;

This produces:

a |b |c |t1|t2|
--|--|--|--|--|
 1| 2| 3|t1|  |
10|11|12|  |t2|

Why? Because a NATURAL JOIN is syntax sugar for joining using all the shared column names of the two tables, and the FULL JOIN makes sure we can retrieve also the columns that are not matched by the join predicate. Another way to write this is:

-- Use JOIN .. USING, instead of NATURAL JOIN
SELECT *
FROM (
  SELECT 't1' AS t1, t1.* FROM t1
) t1 FULL JOIN (
  SELECT 't2' AS t2, t2.* FROM t2
) t2 USING (a, b, c)
WHERE NOT (t1, t2) IS NOT NULL;

Or:

-- Use JOIN .. ON, instead of JOIN .. USING
SELECT 
  coalesce(t1.a, t2.a) AS a,
  coalesce(t1.b, t2.b) AS b,
  coalesce(t1.c, t2.c) AS c,
  t1.t1,
  t2.t2
FROM (
  SELECT 't1' AS t1, t1.* FROM t1
) t1 FULL JOIN (
  SELECT 't2' AS t2, t2.* FROM t2
) t2 ON (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c) 
WHERE NOT (t1, t2) IS NOT NULL;

Unfortunately, as of PostgreSQL 12, this produces an error:

ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

Pros and cons

Pros and cons compared to the set operator solution using UNION and EXCEPT:

Pros

  • Each table is accessed only once
  • Comparison is now name based, not column index based, i.e. it can still work if only parts of the columns are the shared

Cons

  • If index based column comparison was desired (because the tables are the same structurally, but do not share the exact same column names), then we’d have to rename each individual column to a common column name.
  • If there’s duplicate data, there’s going to be a cartesian product, which might make this solution quite slower
  • UNION and EXCEPT treat NULL values as “not distinct”. This isn’t the case with NATURAL JOIN. See workaround below

When there are NULL values in the data

In the presence of NULL values, we can no longer use NATURAL JOIN or JOIN .. USING. We could use the DISTINCT predicate:

SELECT 
  coalesce(t1.a, t2.a) AS a,
  coalesce(t1.b, t2.b) AS b,
  coalesce(t1.c, t2.c) AS c,
  t1.t1,
  t2.t2
FROM (
  SELECT 't1' AS t1, t1.* FROM t1
) t1 FULL JOIN (
  SELECT 't2' AS t2, t2.* FROM t2
) t2 ON (t1.a, t1.b, t1.c) IS NOT DISTINCT FROM (t2.a, t2.b, t2.c) 
WHERE NOT (t1, t2) IS NOT NULL;

Row value expression NULL predicate

Observe the usage of the esoteric NULL predicate for row value expressions, which uses the following truth table:

+-----------------------+-----------+---------------+---------------+-------------------+
| Expression            | R IS NULL | R IS NOT NULL | NOT R IS NULL | NOT R IS NOT NULL |
+-----------------------+-----------+---------------+---------------+-------------------+
| degree 1: null        | true      | false         | false         |  true             |
| degree 1: not null    | false     | true          | true          |  false            |
| degree > 1: all null  | true      | false         | false         |  true             |
| degree > 1: some null | false     | false         | true          |  true             |
| degree > 1: none null | false     | true          | true          |  false            |
+-----------------------+-----------+---------------+---------------+-------------------+

Yes. R IS NULL and NOT R IS NOT NULL are not the same thing in SQL…

It’s just another way of writing:

SELECT *
FROM (
  SELECT 't1' AS t1, t1.* FROM t1
) t1 NATURAL FULL JOIN (
  SELECT 't2' AS t2, t2.* FROM t2
) t2 
WHERE t1 IS NULL
OR t2 IS NULL;

4 thoughts on “Use NATURAL FULL JOIN to compare two tables in SQL

  1. Outstanding query :) I allowed myself to link to it on SO: https://stackoverflow.com/questions/4602083/sql-compare-data-from-two-tables

    One thing I would just add one note that it is not exactly the same as

    (TABLE t1 EXCEPT TABLE t2) 
    UNION
    (TABLE t2 EXCEPT TABLE t1)
    ORDER BY a, b, c
    

    when NULL is involved.

    For instance if we add to both tables (13,14,NULL) the output will be different because NULL = NULL yields UKNOWN.

    https://dbfiddle.uk/?rdbms=postgres_13&fiddle=30b7428040e974c5260e14ae53e24378

  2. Great stuff Lukas, never realized you could do this :)

    Just one thing, the where clause for the final query (NOT (t1, t2) IS NOT NULL equivalent) should be:

    WHERE ( t1 IS NULL OR t2 IS NULL )

    i.e. remove the NOTs, right?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.