Subtle SQL differences: IDENTITY columns

As I’m mostly using Oracle, IDENTITY columns were not so important to me up until I started to support them in jOOQ. Then, I found that yet again, there are many differences between various databases in how they handle IDENTITY columns in DDL and in DML.

In SQL, there are essentially three orthogonal concepts of how to identify a record (Please correct me if I may be missing more vendor-specific concepts):

  1. Primary Key: This is the most well-known concept. A primary key is the “main unique key” of a table, which means it is a constraint imposed on at least one column. It can be imposed on several columns, too. The primary key has its origin in the relational model
  2. Identity: Identities / identity columns are attributed to at most one column per table. They generate a unique ID from a sequence at record insertion – unlike primary keys, which are expected to be inserted correctly by client DML. The identity often coincides with the primary key, but it does not have to. It does not always have to be unique, either
  3. ROW ID / OID: Many databases know an internal ID that is usually not used in the logical representation of the database schema. This row id has mostly technical purposes and is also present in tables without primary key or identity columns

SQL 2008 standard

While primary key constraints are defined pretty much in the same way in every SQL dialect, and ROWID’s are very vendor-specific things, identities are something relatively new. The SQL 1992 standard does not yet define identity columns. They were formally introduced in SQL:2003, only. Here’s a part of the specification:

<column definition> ::=
  <column name> [ <data type or domain name> ]
  [ <default clause> | <identity column specification> ]
  [ ... ]

<identity column specification> ::=
  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
  [ <left paren> <common sequence generator options> <right paren> ]

 

This looks neat. In fact, in addition to column constraints, columns can specify an identity generation expression based on a sequence. What does reality look like?

Reality

DB2, Derby, HSQLDB, Ingres

These SQL dialects implement the standard very neatly.


id INTEGER GENERATED BY DEFAULT AS IDENTITY
id INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1)

H2, MySQL, Postgres, SQL Server, Sybase ASE, Sybase SQL Anywhere

These SQL dialects implement identites, but the DDL syntax doesn’t follow the standard


-- H2 mimicks MySQL's and SQL Server's syntax
ID INTEGER IDENTITY(1,1)
ID INTEGER AUTO_INCREMENT
-- MySQL
ID INTEGER NOT NULL AUTO_INCREMENT

-- Postgres serials implicitly create a sequence
-- Postgres also allows for selecting from custom sequences
-- That way, sequences can be shared among tables
id SERIAL NOT NULL

-- SQL Server
ID INTEGER IDENTITY(1,1) NOT NULL
-- Sybase ASE
id INTEGER IDENTITY NOT NULL
-- Sybase SQL Anywhere
id INTEGER NOT NULL IDENTITY

Oracle

Oracle does not know any identity columns at all. Instead, you will have to use a trigger and update the ID column yourself, using a custom sequence. Something along these lines:

CREATE OR REPLACE TRIGGER my_trigger
BEFORE INSERT
ON my_table
REFERENCING NEW AS new
FOR EACH ROW
BEGIN
  SELECT my_sequence.nextval
  INTO :new.id
  FROM dual;
END my_trigger;

Note, that this approach can be employed in most databases supporting sequences and triggers! It is a lot more flexible than standard identities

SQLite

SQLite does not know any identity columns either. It seems to have MySQL’s AUTO_INCREMENT clause, but in fact this just sets some rules on the ROWID generation, which is not a true identity column according to the SQL standard

Conclusion

As many things in SQL, identities are a major pain to get correctly when switching databases, or when writing SQL DDL and DML that works correctly across databases.

jOOQ supports DML executed against tables with identity columns and/or trigger-generated ID values. See a previous blog post that further elaborates on the issue:

https://blog.jooq.org/postgres-insert-returning-clause-and-how-this-can-be-simulated-in-other-rdbms/

Leave a Reply