PostgreSQL Syntax is a Mystery Only Exceeded by its Power

I just ran across this rather powerful PostgreSQL example statement from the manual. It reads

Increment the sales count of the salesperson who manages the account for Acme Corporation, and record the whole updated row along with current time in a log table:

WITH upd AS (
  UPDATE employees 
  SET sales_count = sales_count + 1 
  WHERE id = (
    SELECT sales_person 
    FROM accounts 
    WHERE name = 'Acme Corporation'
  )
  RETURNING *
)
INSERT INTO employees_log 
SELECT *, current_timestamp 
FROM upd;

This database keeps amazing me. A single statement allows for updating one table, taking the updated record(s) including all generated values as input for an insert into another table. But at the same time, PostgreSQL does not yet implement the SQL:2003 MERGE statement.

Crazy database…

Apache Derby About to Adopt the Awesome SQL:2003 MERGE Statement

Apache Derby is one out of three popular Java embeddable databases (apart from H2 and HSQLDB). It is very SQL and JDBC standards-compliant, but maybe a bit behind on developments of more advanced SQL features. Around 6 years after its first submission, there has now been some action on the Apache Derby DERBY-3155 ticket, recently. Rick Hillegas has attached a first, promising draft for the MERGE statement specification, which can be seen here:
https://issues.apache.org/jira/secure/attachment/12597795/MergeStatement.html

Among all 14 of the SQL databases supported by jOOQ, Derby would thus be the 8th to support the SQL:2003 MERGE statement, lining up with:

Other databases support proprietary versions of MERGE:

… or other forms of UPSERT:

Visit DERBY-3155 and show the maintainers some love for implementing this awesome and powerful SQL statement!

How to simulate MySQL’s INSERT statement extensions

I have previously posted about the SQL MERGE statement, and how powerful it is here:
https://blog.jooq.org/2011/11/29/arcane-magic-with-the-sql2003-merge-statement/

Unfortunately, not all databases support this statement. Also, very often it is quite a bit of overkill, when what you want to do is to simply INSERT or UPDATE a single record, depending on whether it already exists.

MySQL’s nice INSERT statement extensions

MySQL has a couple of very nice extensions to the INSERT statement. Some of them are also supported by the CUBRID database. Let’s have a look at two features of the (simplified) syntax definition:

INSERT [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    VALUES (expr,...),(...),...
    [ ON DUPLICATE KEY UPDATE
      col_name=expr
        [, col_name=expr] ... ]

 
Essentially, we’re looking at two keywords / clauses:

  1. IGNORE: This will cause the INSERT statement to silently fail with a warning, if we’re about to insert a duplicate key
  2. ON DUPLICATE KEY UPDATE: This will cause the INSERT statement to perform an update on all affected records, if we’re about to insert a duplicate key

The full documentation can be seen here:
http://dev.mysql.com/doc/refman/5.5/en/insert.html

How jOOQ models these statements in other dialects

jOOQ’s API conveniently allows for using those MySQL features directly in insert statements. An example

// Insert an author who has recently married and thus
// changed their last name. If we already have that author, then
// update their last name:

create.insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
      .values(3, "Smith")
      .onDuplicateKeyUpdate()
      .set(AUTHOR.LAST_NAME, "Smith")
      .execute();

// Alternatively, we may not care if the author already
// exists, we just want to insert them without failure:

create.insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
      .values(3, "Smith")
      .onDuplicateKeyIgnore()
      .execute();

Note how jOOQ doesn’t model the IGNORE keyword as a keyword between INSERT [IGNORE] INTO. This keeps the public API a bit more concise, while at the same time being more consistent with pre-existing functionality, both in the INSERT syntax, as well as in the Loader API:

https://blog.jooq.org/2011/08/23/loading-csv-data-with-jooq-2/

Now, databases that do not support these syntax elements can simulate them using MERGE. The above statements will render something like this on an Oracle database:

-- ON DUPLICATE KEY UPDATE statement
merge into "AUTHOR" 
using (select 1 from dual) 
on ("AUTHOR"."ID" = 3) 
when matched then update set "LAST_NAME" = 'Smith' 
when not matched then insert ("ID", "LAST_NAME") values (3, 'Smith')

-- ON DUPLICATE KEY IGNORE statement
merge into "AUTHOR" 
using (select 1 from dual) 
on ("AUTHOR"."ID" = 3) 
when not matched then insert ("ID", "LAST_NAME") values (3, 'Smith')

Conclusion

These practices allow you to write short and concise code using the jOOQ API while still being able to target several backend databases optimally using efficient SQL statements. This will help you increase both your development as well as your runtime performance!

Arcane magic with the SQL:2003 MERGE statement

Every now and then, we feel awkward about having to distinguish INSERT from UPDATE for any of the following reasons:

  • We have to issue at least two statements
  • We have to think about performance
  • We have to think about race conditions
  • We have to choose between [UPDATE; IF UPDATE_COUNT = 0 THEN INSERT] and [INSERT; IF EXCEPTION THEN UPDATE]
  • We have to do those statements once per updated / inserted record

All in all, this is a big source of error and frustration. When at the same time, it could’ve been so easy with the SQL MERGE statement!

A typical situation for MERGE

Among many other use-cases, the MERGE statement may come in handy when handling many-to-many relationships. Let’s say we have this schema:

CREATE TABLE documents (
  id NUMBER(7) NOT NULL,
  CONSTRAINT docu_id PRIMARY KEY (id)
);

CREATE TABLE persons (
  id NUMBER(7) NOT NULL,
  CONSTRAINT pers_id PRIMARY KEY (id)
);

CREATE TABLE document_person (
  docu_id NUMBER(7) NOT NULL,
  pers_id NUMBER(7) NOT NULL,
  flag NUMBER(1) NULL,

  CONSTRAINT docu_pers_pk PRIMARY KEY (docu_id, pers_id),
  CONSTRAINT docu_pers_fk_docu
    FOREIGN KEY (docu_id) REFERENCES documents(id),
  CONSTRAINT docu_pers_fk_pers
    FOREIGN KEY (pers_id) REFERENCES persons(id)
);

The above tables are used to model which person has read (flag=1) / deleted (flag=2) what document. To make things simple, the “document_person” entity is usually OUTER JOINed to “documents”, such that the presence or absence of a “document-person” record may have the same semantics: “flag IS NULL” means the document is unread.

Now when you want to mark a document as read, you have to decide whether you INSERT a new “document_person”, or whether to UPDATE the existing one. Same with deletion. Same with marking all documents as read, or deleting all documents.

Use MERGE instead

You can do it all in one statement! Let’s say, you want to INSERT/UPDATE one record, in order to mark one document as read for a person:

-- The target table
MERGE INTO document_person dst

-- The data source. In this case, just a dummy record
USING (
  SELECT :docu_id as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM DUAL
) src

-- The merge condition (if true, then update, else insert)
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The update action
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag

-- The insert action
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

This looks quite similar, yet incredibly more verbose than MySQL’s INSERT .. ON DUPLICATE KEY UPDATE statement, which is a bit more concise.

Taking it to the extreme

But you can go further! As I said previously, you may also want to mark ALL documents as read, for a given person. No problem with MERGE. The following statement does the same as the previous one, if you specify :docu_id. If you leave it null, it will just mark all documents as :flag:

MERGE INTO document_person dst

-- The data source is now all "documents" (or just :docu_id) left outer
-- joined with the "document_person" mapping
USING (
  SELECT d.id     as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM documents d
  LEFT OUTER JOIN document_person d_p
  ON d.id = d_p.docu_id AND d_p.pers_id = :pers_id
  -- If :docu_id is set, select only that document
  WHERE (:docu_id IS NOT NULL AND d.id = :docu_id)
  -- Otherwise, select all documents
     OR (:docu_id IS NULL)
) src

-- If the mapping already exists, update. Else, insert
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The rest stays the same
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

MERGE support in jOOQ

MERGE is also fully supported in jOOQ. See the manual for more details (scroll to the bottom):

http://www.jooq.org/manual/JOOQ/Query/

Happy merging! :-)