Subtle SQL differences: IDENTITY columns

As I’m mostly using Oracle, IDENTITY columns were not so important to me up until I started to support them in jOOQ. Then, I found that yet again, there are many differences between various databases in how they handle IDENTITY columns in DDL and in DML.

In SQL, there are essentially three orthogonal concepts of how to identify a record (Please correct me if I may be missing more vendor-specific concepts):

  1. Primary Key: This is the most well-known concept. A primary key is the “main unique key” of a table, which means it is a constraint imposed on at least one column. It can be imposed on several columns, too. The primary key has its origin in the relational model
  2. Identity: Identities / identity columns are attributed to at most one column per table. They generate a unique ID from a sequence at record insertion – unlike primary keys, which are expected to be inserted correctly by client DML. The identity often coincides with the primary key, but it does not have to. It does not always have to be unique, either
  3. ROW ID / OID: Many databases know an internal ID that is usually not used in the logical representation of the database schema. This row id has mostly technical purposes and is also present in tables without primary key or identity columns

SQL 2008 standard

While primary key constraints are defined pretty much in the same way in every SQL dialect, and ROWID’s are very vendor-specific things, identities are something relatively new. The SQL 1992 standard does not yet define identity columns. They were formally introduced in SQL:2003, only. Here’s a part of the specification:

<column definition> ::=
  <column name> [ <data type or domain name> ]
  [ <default clause> | <identity column specification> ]
  [ ... ]

<identity column specification> ::=
  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
  [ <left paren> <common sequence generator options> <right paren> ]

 

This looks neat. In fact, in addition to column constraints, columns can specify an identity generation expression based on a sequence. What does reality look like?

Reality

DB2, Derby, HSQLDB, Ingres

These SQL dialects implement the standard very neatly.


id INTEGER GENERATED BY DEFAULT AS IDENTITY
id INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1)

H2, MySQL, Postgres, SQL Server, Sybase ASE, Sybase SQL Anywhere

These SQL dialects implement identites, but the DDL syntax doesn’t follow the standard


-- H2 mimicks MySQL's and SQL Server's syntax
ID INTEGER IDENTITY(1,1)
ID INTEGER AUTO_INCREMENT
-- MySQL
ID INTEGER NOT NULL AUTO_INCREMENT

-- Postgres serials implicitly create a sequence
-- Postgres also allows for selecting from custom sequences
-- That way, sequences can be shared among tables
id SERIAL NOT NULL

-- SQL Server
ID INTEGER IDENTITY(1,1) NOT NULL
-- Sybase ASE
id INTEGER IDENTITY NOT NULL
-- Sybase SQL Anywhere
id INTEGER NOT NULL IDENTITY

Oracle

Oracle does not know any identity columns at all. Instead, you will have to use a trigger and update the ID column yourself, using a custom sequence. Something along these lines:

CREATE OR REPLACE TRIGGER my_trigger
BEFORE INSERT
ON my_table
REFERENCING NEW AS new
FOR EACH ROW
BEGIN
  SELECT my_sequence.nextval
  INTO :new.id
  FROM dual;
END my_trigger;

Note, that this approach can be employed in most databases supporting sequences and triggers! It is a lot more flexible than standard identities

SQLite

SQLite does not know any identity columns either. It seems to have MySQL’s AUTO_INCREMENT clause, but in fact this just sets some rules on the ROWID generation, which is not a true identity column according to the SQL standard

Conclusion

As many things in SQL, identities are a major pain to get correctly when switching databases, or when writing SQL DDL and DML that works correctly across databases.

jOOQ supports DML executed against tables with identity columns and/or trigger-generated ID values. See a previous blog post that further elaborates on the issue:

https://lukaseder.wordpress.com/2011/08/29/postgres-insert-returning-clause-and-how-this-can-be-simulated-in-other-rdbms/

Binary data in SQL, more trivia

Yesterday, I blogged about inlining boolean literals in SQL. This goes on with other data types. Let’s look at BLOBs and BINARY data in general. This is also defined in SQL standards, not in SQL 1992, though:

<binary string literal> ::=
  X <quote> [ <space>... ] 
  [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote>

 
But is that supported as such in all databases? Of course, as usual, the answer goes “NO”. Here’s a little summary to see how binary literals are handled in which database:

SQL Server, Sybase ASE, Sybase SQL Anywhere

-- T-SQL databases treat binary literals as other languages
-- treat hexadecimal numbers
INSERT INTO lob_table VALUES (0x01FF);

DB2

-- DB2 follows the standard, but you may need to use
-- a blob constructor. This is not needed for VARCHAR
-- FOR BIT DATA types
INSERT INTO lob_table VALUES (blob(X'01FF'));

Derby, H2, HSQLDB, Ingres, MySQL, SQLite

-- Most databases follow the standard, especially
-- the Java databases
INSERT INTO lob_table VALUES (X'01FF');

Oracle

-- Oracle Omit's the standard's "X"
INSERT INTO lob_table VALUES (hextoraw('01FF'));

Postgres

-- For some odd reason Postgres, is really off this time
-- using octal representation of bytes in literals.
-- There is also hex encoding but I didn't get that to work
-- The explicit cast is important, though
INSERT INTO lob_table VALUES (E'\\001\\377'::bytea);

SQL and booleans, some trivia

Some trivia about SQL and booleans:

SQL 1992 defines three values for a boolean:

<truth value> ::=
        TRUE
      | FALSE
      | UNKNOWN

True booleans aren’t always supported, though. Here’s the truth table of boolean support:

SQL Dialect Supports booleans
DB2 0 (use 1/0 instead)
Derby true (you can safely use true/false)
H2 true
HSQLDB true
Ingres true
MySQL true
Oracle 0
Postgres true
SQL Server 0
SQLite 0
Sybase ASE 0
Sybase SQL Anywhere 0

Trivia… but nice to know.

The depths of Java: API leak exposed through covariance

Java can be very tricky some times, especially in API design. Let’s have a look at a very interesting showcase. jOOQ strongly separates API from implementation. All API is in the org.jooq package, and public. Most implementation is in the org.jooq.impl package and package-private. Only factories and some dedicated base implementations are public. This allows for very powerful package-level encapsulation, exposing mostly only interfaces to jOOQ users.

A simplified example of package-level encapsulation

Here’s roughly how jOOQ models SQL tables. The (overly simplified) API:

package org.jooq;

/**
 * A table in a database
 */
public interface Table {

  /**
   * Join two tables
   */
  Table join(Table table);
}

And two (overly simplified) implementation classes:

package org.jooq.impl;

import org.jooq.Table;

/**
 * Base implementation
 */
abstract class AbstractTable implements Table {

  @Override
  public Table join(Table table) {
    return null;
  }
}

/**
 * Custom implementation, publicly exposed to client code
 */
public class CustomTable extends AbstractTable {
}

How the internal API is exposed

Let’s assume that the internal API does some tricks with covariance:

abstract class AbstractTable implements Table, InteralStuff {

  // Note, this method returns AbstractTable, as it might
  // prove to be convenient to expose some internal API
  // facts within the internal API itself
  @Override
  public AbstractTable join(Table table) {
    return null;
  }

  /**
   * Some internal API method, also package private
   */
  void doThings() {}
  void doMoreThings() {

    // Use the internal API
    join(this).doThings();
  }
}

This looks all safe at the first sight, but is it? AbstractTable is package-private, but CustomTable extends it and inherits all of its API, including the covariant method override of “AbstractTable join(Table)”. What does that result in? Check out the following piece of client code

package org.jooq.test;

import org.jooq.Table;
import org.jooq.impl.CustomTable;

public class Test {
  public static void main(String[] args) {
    Table joined = new CustomTable();

    // This works, no knowledge of AbstractTable exposed to the compiler
    Table table1 = new CustomTable();
    Table join1 = table1.join(joined);

    // This works, even if join exposes AbstractTable
    CustomTable table2 = new CustomTable();
    Table join2 = table2.join(joined);

    // This doesn't work. The type AbstractTable is not visible
    Table join3 = table2.join(joined).join(joined);
    //            ^^^^^^^^^^^^^^^^^^^ This cannot be dereferenced

    // ... so hide these implementation details again
    // The API flaw can be circumvented with casting
    Table join4 = ((Table) table2.join(joined)).join(joined);
  }
}

Conclusion

Tampering with visibilities in class hierarchies can be dangerous. Beware of the fact that API methods declared in interfaces are always public, regardless of any covariant implementations that involve non-public artefacts. This can be quite annoying for API users when not properly dealt with by API designers.

Fixed in the next version of jOOQ 🙂

The truth about IN and EXISTS in SQL

Very nice article, finally getting rid of some doubts… (at least for Oracle)

http://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/