SQL incompatibilities: NOT IN and NULL values

This is something where many hours of debugging have been spent in the lives of many SQL developers. The various situations where you can have NULL values in NOT IN predicates and anti-joins. Here’s a typical situation:

with data as (
  select 1 as id from dual union all
  select 2 as id from dual union all
  select 3 as id from dual
)
select * from data
where id not in (1, null)

What do you think this will return? Well, since “dual” indicates an Oracle database, you might say: “an empty result set”. And you would be right for Oracle. In fact, you would be right for any of these databases:

  • DB2
  • Derby
  • H2
  • Ingres
  • Oracle
  • Postgres
  • SQL Server
  • SQLite
  • Sybase

BUT! You would be wrong for any of these ones:

  • HSQLDB
  • MySQL
  • Sybase ASE

Why the discrepancy?

Intuitively, you’d say that all the big ones treat NULL specially in NOT IN predicates, and it is easy to understand, why:

-- This predicate here...
id not in (1, null)

-- Could be seen as equivalent to this one:
id != 1 and id != null

There’s no id that fulfills the above predicate id != null (not even null itself), hence an empty result set. MySQL is known for some strong abuse of SQL standards compliance, so it’s not surprising that they tweaked this syntax as well.

But wait!

HSQLDB 2.0 is one of the most standards-compliant databases out there, could they really have gotten it wrong? Let’s consider the standard: SQL 1992, chapter 8.4 <in predicate>:

<in predicate> ::=
   <row value constructor>
      [ NOT ] IN <in predicate value>

<in predicate value> ::=
   <table subquery>
      | <left paren> <in value list> <right paren>

<in value list> ::=
   <value expression> { <comma> <value expression> }...

And then, further down:

2) Let RVC be the <row value constructor> and 
   let IPV be the <in predicate value>.

3) The expression
     RVC NOT IN IPV

   is equivalent to
     NOT ( RVC IN IPV )

4) The expression
     RVC IN IPV

   is equivalent to
     RVC = ANY IPV

So in fact, this can be said:

ID NOT IN (1, NULL) is equivalent to
NOT (ID IN (1, NULL)), equivalent to
NOT (ID = ANY(1, NULL)), equivalent to
NOT (ID = 1 OR ID = NULL), equivalent to
NOT (ID = 1) AND NOT (ID = NULL), which is always UNKNOWN

Conclusion

It looks for once, that HSQLDB 2.0 is not standards-compliant in that evaluating the expression inside NOT() before applying NOT() has a different outcome from transforming NOT() into a normalised boolean expression, and then evaluating the expression. For SQL developers, all of this can just mean:

Keep NULL out of NOT IN predicates or be doomed!

2 thoughts on “SQL incompatibilities: NOT IN and NULL values

Leave a Reply