How to Use SQL INTERSECT to Work Around SQL’s NULL Logic

ANOTHER SQL Post this week? I got nerd-sniped:

Oooooh, challenge accepted!

So, let’s assume we have a table T with columns (A, B, C) like this:

WITH t(a, b, c) AS (
  SELECT 'a', 'b', null FROM dual UNION ALL
  SELECT 'a', null, 'c' FROM dual UNION ALL
  SELECT 'a', 'b', 'c'  FROM dual
)
SELECT * FROM t

As expected, this yields:

A       B       C
-----------------
a       b
a               c
a       b       c

Truly exciting.

Now we want to find all those rows that “match” either ('a', 'b', NULL) or ('a', NULL, 'b'). Clearly, this should produce the first two rows, right?

A       B       C
-----------------
a       b
a               c

Yes. Now the canonical solution would be to tediously write out the entire predicate as such:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a = 'a' AND b = 'b' AND c IS NULL)
OR (a = 'a' AND b IS NULL AND c = 'c')

That’s really boring. Sure, we could have factored out the first, common predicate:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE a = 'a' AND (
     (b = 'b' AND c IS NULL)
  OR (b IS NULL AND c = 'c')
)

That’s certainly better from a performance perspective, but Rafael had a nifty idea. Let’s use row value expressions (tuples) in our predicates:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a, b, c) IN (('a', 'b', NULL), ('a', NULL, 'c'))

Unfortunately this doesn’t yield any results, because nothing is equal to NULL in SQL (not even NULL itself). The above query is the same as this one:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a = 'a' AND b = 'b' AND c = NULL /* oops */)
OR (a = 'a' AND b = NULL /* oops */ AND c = 'c')

D’oh.

Solutions

The lame one

The canonical solution then would be a really lame (but perfectly valid) one. Encode NULL to be some “impossible” string value. Rafael suggested yolo. Fair enough.

This works:

WITH t(a, b, c) AS (...)
SELECT * FROM t
WHERE (a, NVL(b, 'yolo'), NVL(c, 'yolo')) 
  IN (('a', 'b', 'yolo'), ('a', 'yolo', 'c'))

All we have to do now is to always remember the term yolo which means “NULL, but not NULL thank you SQL”

The hipster one

But wait! SQL is painstakingly inconsistent when it comes to NULL. See, NULL really means UNKNOWN in three-valued logic, and this means, we never know if SQL abides to its own rules.

Come in INTERSECT. Like UNION or EXCEPT (MINUS) in Oracle, as well as SELECT DISTINCT, these set operations handle two NULL values as NOT DISTINCT. Yes, they’re not equal but also not distinct. Whatever. Just remember: That’s how it is 🙂

So, we can write this hipster solution to Rafael’s problem:

WITH t(a, b, c) AS (...)
SELECT *
FROM t
WHERE EXISTS (
  SELECT a, b, c FROM dual
  INTERSECT (
    SELECT 'a', 'b', null FROM dual
    UNION ALL
    SELECT 'a', null, 'c' FROM dual
  )
)

We create an intersection of the tuple (a, b, c), the left side of Rafael’s IN predicate, and the desired values on the right side of the IN predicate, and we’re done.

Clearly less tedious than writing the original predicates, right? (We won’t look into performance this time)

Cheers, and a happy weekend.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s