This is something where many hours of debugging have been spent in the lives of many SQL developers. The various situations where you can have NULL values in NOT IN predicates and anti-joins. Here’s a typical situation:
with data as (
select 1 as id from dual union all
select 2 as id from dual union all
select 3 as id from dual
)
select * from data
where id not in (1, null)
What do you think this will return? Well, since “dual” indicates an Oracle database, you might say: “an empty result set”. And you would be right for Oracle. In fact, you would be right for any of these databases:
- DB2
- Derby
- H2
- Ingres
- Oracle
- Postgres
- SQL Server
- SQLite
- Sybase
BUT! You would be wrong for any of these ones:
- HSQLDB
- MySQL
- Sybase ASE
Why the discrepancy?
Intuitively, you’d say that all the big ones treat NULL specially in NOT IN predicates, and it is easy to understand, why:
-- This predicate here...
id not in (1, null)
-- Could be seen as equivalent to this one:
id != 1 and id != null
There’s no id that fulfills the above predicate id != null (not even null itself), hence an empty result set. MySQL is known for some strong abuse of SQL standards compliance, so it’s not surprising that they tweaked this syntax as well.
But wait!
HSQLDB 2.0 is one of the most standards-compliant databases out there, could they really have gotten it wrong? Let’s consider the standard: SQL 1992, chapter 8.4 <in predicate>:
<in predicate> ::= <row value constructor> [ NOT ] IN <in predicate value> <in predicate value> ::= <table subquery> | <left paren> <in value list> <right paren> <in value list> ::= <value expression> { <comma> <value expression> }...
And then, further down:
2) Let RVC be the <row value constructor> and let IPV be the <in predicate value>. 3) The expression RVC NOT IN IPV is equivalent to NOT ( RVC IN IPV ) 4) The expression RVC IN IPV is equivalent to RVC = ANY IPV
So in fact, this can be said:
ID NOT IN (1, NULL) is equivalent to NOT (ID IN (1, NULL)), equivalent to NOT (ID = ANY(1, NULL)), equivalent to NOT (ID = 1 OR ID = NULL), equivalent to NOT (ID = 1) AND NOT (ID = NULL), which is always UNKNOWN
Conclusion
It looks for once, that HSQLDB 2.0 is not standards-compliant in that evaluating the expression inside NOT() before applying NOT() has a different outcome from transforming NOT() into a normalised boolean expression, and then evaluating the expression. For SQL developers, all of this can just mean:
Keep NULL out of NOT IN predicates or be doomed!
A little nit-pick: NOT (ID = 1) AND NOT (ID = NULL) is UNKNOWN where NOT (ID=1) and FALSE where (ID = 1)
Thanks for your comment. Not sure what you mean, though?