Have You Ever Wondered About the Difference Between NOT NULL and DEFAULT?

When writing DDL in SQL, you can specify several constraints on columns, such as NOT NULL or DEFAULT. Some people might wonder whether the two are actually redundant, i.e. is it still necessary to specify a NOT NULL constraint if there is already a DEFAULT clause?

The answer is: Yes!

Yes, you should still specify that NOT NULL constraint. And no, the two constraints are not redundant. The answer I gave here on Stack Overflow wraps it up by example, which I’m going to repeat here on our blog:

DEFAULT is the value that will be inserted in the absence of an explicit value in an INSERT / UPDATE statement. Let’s assume your DDL did not have the NOT NULL constraint:

ALTER TABLE tbl 
  ADD COLUMN col VARCHAR(20) 
    DEFAULT 'MyDefault'

Then you could issue these statements:

-- 1. This will insert "MyDefault" 
--    into tbl.col
INSERT INTO tbl (A, B) 
  VALUES (NULL, NULL);

-- 2. This will insert "MyDefault" 
--    into tbl.col
INSERT INTO tbl (A, B, col) 
  VALUES (NULL, NULL, DEFAULT);

-- 3. This will insert "MyDefault"
--    into tbl.col (DEFAULT VALUES
--    takes no column list)
INSERT INTO tbl
  DEFAULT VALUES;

-- 4. This will insert NULL
--    into tbl.col
INSERT INTO tbl (A, B, col)
  VALUES (NULL, NULL, NULL);

Alternatively, you can also use DEFAULT in UPDATE statements, according to the SQL-1992 standard:

-- 5. This will set tbl.col
--    to "MyDefault"
UPDATE tbl SET col = DEFAULT;

-- 6. This will set tbl.col
--    to NULL
UPDATE tbl SET col = NULL;

Note that not all databases support all of these standard SQL syntaxes. Adding the NOT NULL constraint will cause an error with statements 4 and 6, while statements 1-3 and 5 remain valid. So to answer your question:

No, NOT NULL and DEFAULT are not redundant
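To make the difference tangible, here is a minimal sketch that runs the idea against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here, and the column names a, nullable_col and strict_col are made up for this illustration. SQLite supports neither VALUES (..., DEFAULT) nor UPDATE ... SET col = DEFAULT, so only the "omit the column" and "explicit NULL" cases are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two columns with the same DEFAULT; only one of them is also NOT NULL.
# (Hypothetical column names, chosen for this sketch.)
conn.execute("""
    CREATE TABLE tbl (
        a            INTEGER,
        nullable_col VARCHAR(20) DEFAULT 'MyDefault',
        strict_col   VARCHAR(20) DEFAULT 'MyDefault' NOT NULL
    )
""")

# Omitting both columns: DEFAULT applies to each of them.
conn.execute("INSERT INTO tbl (a) VALUES (1)")

# An explicit NULL bypasses DEFAULT; the nullable column accepts it.
conn.execute("INSERT INTO tbl (a, nullable_col) VALUES (2, NULL)")

# The same explicit NULL is rejected by the NOT NULL column,
# even though that column has a DEFAULT.
try:
    conn.execute("INSERT INTO tbl (a, strict_col) VALUES (3, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT a, nullable_col, strict_col FROM tbl ORDER BY a"
).fetchall()

print(rows)      # [(1, 'MyDefault', 'MyDefault'), (2, None, 'MyDefault')]
print(rejected)  # True
```

Row 2 ends up with a NULL in nullable_col despite the DEFAULT, which is exactly the gap that NOT NULL closes.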

That’s already quite interesting: the DEFAULT constraint really only interacts with DML statements and how they specify the various columns that they’re updating. The NOT NULL constraint is a much more universal guarantee that constrains a column’s content also “outside” of the manipulating DML statements.

For instance, if you have a set of data and then you add a DEFAULT constraint, this will not affect your existing data, only new data being inserted.

If, however, you have a set of data and then you add a NOT NULL constraint, you can actually only do so if the constraint is valid – i.e. when there are no NULL values in your column. Otherwise, an error will be raised.
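The following sketch illustrates that second point under a loud assumption: SQLite cannot add a NOT NULL constraint to an existing column, so the sketch emulates the step by copying the rows into a second table that carries the constraint. The table tbl_strict and the helper try_copy are hypothetical names used only here. Other databases would typically use an ALTER TABLE statement instead, but the principle is the same: the existing data has to satisfy the constraint first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, col VARCHAR(20))")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, "x"), (2, None)])

# Stand-in for "adding NOT NULL": a copy of the table with the constraint.
conn.execute("CREATE TABLE tbl_strict (id INTEGER, col VARCHAR(20) NOT NULL)")


def try_copy():
    """Return True if all existing rows satisfy the NOT NULL constraint."""
    try:
        # A failing statement is backed out as a whole by SQLite,
        # so no partial copy is left behind.
        conn.execute("INSERT INTO tbl_strict SELECT id, col FROM tbl")
        return True
    except sqlite3.IntegrityError:
        return False


first_attempt = try_copy()   # row 2 still has col = NULL

# Clean up the existing data, then try again.
conn.execute("UPDATE tbl SET col = 'MyDefault' WHERE col IS NULL")
second_attempt = try_copy()

copied = conn.execute("SELECT COUNT(*) FROM tbl_strict").fetchone()[0]

print(first_attempt, second_attempt, copied)  # False True 2
```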

Query performance

Another very interesting use case that applies only to NOT NULL constraints is their usefulness for query optimisers and query execution plans. Assume that you have such a constraint on your column and then you’re using a NOT IN predicate:

SELECT *
FROM table
WHERE value NOT IN (
  SELECT not_nullable
  FROM other_table
)

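In particular, when you’re using Oracle, the above query will be much faster when the not_nullable column has an index AND that particular constraint, because unfortunately, NULL values are not included in Oracle’s regular (single-column B-tree) indexes. Without the constraint, the index alone cannot prove that the column contains no NULL values.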

Read more about NULL and NOT IN predicates here.

SQL incompatibilities: NOT IN and NULL values

Many hours of debugging have been spent on this in the lives of many SQL developers: the various situations where you can have NULL values in NOT IN predicates and anti-joins. Here’s a typical situation:

with data as (
  select 1 as id from dual union all
  select 2 as id from dual union all
  select 3 as id from dual
)
select * from data
where id not in (1, null)

What do you think this will return? Well, since “dual” indicates an Oracle database, you might say: “an empty result set”. And you would be right for Oracle. In fact, you would be right for any of these databases:

  • DB2
  • Derby
  • H2
  • Ingres
  • Oracle
  • Postgres
  • SQL Server
  • SQLite
  • Sybase

BUT! You would be wrong for any of these ones:

  • HSQLDB
  • MySQL
  • Sybase ASE

Why the discrepancy?

Intuitively, you’d say that all the big ones treat NULL specially in NOT IN predicates, and it is easy to understand why:

-- This predicate here...
id not in (1, null)

-- Could be seen as equivalent to this one:
id != 1 and id != null

There’s no id that fulfills the above predicate id != null (not even null itself), hence an empty result set. MySQL is known for some strong abuse of SQL standards compliance, so it’s not surprising that they tweaked this syntax as well.
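The empty result is easy to reproduce. Below is a small sketch using Python's sqlite3 module, since SQLite is one of the databases listed above as returning an empty result set. SQLite has no DUAL table, so the FROM clause is omitted; everything else follows the query from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same data as in the Oracle example, minus "from dual".
base_query = """
    WITH data AS (
        SELECT 1 AS id UNION ALL
        SELECT 2 AS id UNION ALL
        SELECT 3 AS id
    )
    SELECT id FROM data
    WHERE id NOT IN 
"""

# Without NULL in the list, the predicate behaves as expected.
without_null = conn.execute(base_query + "(1) ORDER BY id").fetchall()

# With NULL in the list, no row can ever satisfy the predicate.
with_null = conn.execute(base_query + "(1, NULL) ORDER BY id").fetchall()

print(without_null)  # [(2,), (3,)]
print(with_null)     # []
```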

But wait!

HSQLDB 2.0 is one of the most standards-compliant databases out there. Could they really have gotten it wrong? Let’s consider the standard, SQL 1992, chapter 8.4 <in predicate>:

<in predicate> ::=
   <row value constructor>
      [ NOT ] IN <in predicate value>

<in predicate value> ::=
   <table subquery>
      | <left paren> <in value list> <right paren>

<in value list> ::=
   <value expression> { <comma> <value expression> }...

 

And then, further down:

2) Let RVC be the <row value constructor> and 
   let IPV be the <in predicate value>.

3) The expression
     RVC NOT IN IPV

   is equivalent to
     NOT ( RVC IN IPV )

4) The expression
     RVC IN IPV

   is equivalent to
     RVC = ANY IPV

 

So in fact, this can be said:

ID NOT IN (1, NULL) is equivalent to
NOT (ID IN (1, NULL)), equivalent to
NOT (ID = ANY(1, NULL)), equivalent to
NOT (ID = 1 OR ID = NULL), equivalent to
NOT (ID = 1) AND NOT (ID = NULL), which is never TRUE (it is FALSE for ID = 1 and UNKNOWN otherwise)
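This chain of equivalences can be checked mechanically. The sketch below models SQL's three-valued logic in plain Python, using None as a stand-in for UNKNOWN / NULL. The helper names (sql_eq, sql_not, sql_and, sql_or) are made up for this illustration and are not part of any library. It evaluates both the form with the outer NOT() and the form where NOT() has been pushed inside.

```python
# None stands in for SQL's UNKNOWN (and for NULL as a value).

def sql_eq(a, b):
    """a = b: UNKNOWN as soon as either side is NULL."""
    return None if a is None or b is None else a == b


def sql_not(x):
    """NOT x: UNKNOWN stays UNKNOWN."""
    return None if x is None else not x


def sql_and(x, y):
    """x AND y: FALSE wins, then UNKNOWN, else TRUE."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_or(x, y):
    """x OR y: TRUE wins, then UNKNOWN, else FALSE."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def not_in_outer(i):
    # NOT (ID = 1 OR ID = NULL)
    return sql_not(sql_or(sql_eq(i, 1), sql_eq(i, None)))


def not_in_pushed(i):
    # NOT (ID = 1) AND NOT (ID = NULL)
    return sql_and(sql_not(sql_eq(i, 1)), sql_not(sql_eq(i, None)))


results = {i: (not_in_outer(i), not_in_pushed(i)) for i in (1, 2, 3, None)}
for i, pair in results.items():
    print(i, pair)
```

Under these rules, both forms agree for every id, and neither ever evaluates to True: the pair is (False, False) for id 1 and (None, None) for ids 2, 3 and NULL. A WHERE clause only keeps rows whose predicate is TRUE, hence the empty result set.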

Conclusion

It looks as if, for once, HSQLDB 2.0 is not standards-compliant: evaluating the expression inside NOT() before applying NOT() apparently has a different outcome there from transforming NOT() into a normalised boolean expression and then evaluating that expression. For SQL developers, all of this can just mean:

Keep NULL out of NOT IN predicates or be doomed!
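One way to follow that advice when the values come from a subquery on a nullable column is to filter the NULL values out explicitly. The sketch below shows this with Python's sqlite3 module and an in-memory SQLite database. The tables data and other_table and the column nullable_col are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER)")
conn.execute("CREATE TABLE other_table (nullable_col INTEGER)")
conn.executemany("INSERT INTO data VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO other_table VALUES (?)", [(1,), (None,)])

# The single NULL in other_table silently empties the result.
naive = conn.execute("""
    SELECT id FROM data
    WHERE id NOT IN (SELECT nullable_col FROM other_table)
    ORDER BY id
""").fetchall()

# Keeping NULL out of the NOT IN predicate restores the expected rows.
filtered = conn.execute("""
    SELECT id FROM data
    WHERE id NOT IN (
        SELECT nullable_col FROM other_table
        WHERE nullable_col IS NOT NULL
    )
    ORDER BY id
""").fetchall()

print(naive)     # []
print(filtered)  # [(2,), (3,)]
```

A NOT NULL constraint on the column gives the same guarantee at the schema level, without having to remember the extra filter in every query.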