PostgreSQL Syntax is a Mystery Only Exceeded by its Power

I just ran across this rather powerful PostgreSQL example statement from the manual. It reads:

"Increment the sales count of the salesperson who manages the account for Acme Corporation, and record the whole updated row along with current time in a log table:"

WITH upd AS (
  UPDATE employees
  SET sales_count = sales_count + 1
  WHERE id = (
    SELECT sales_person
    FROM accounts
    WHERE name = 'Acme Corporation'
  )
  RETURNING *
)
INSERT INTO employees_log
SELECT *, current_timestamp
FROM upd;

This database keeps amazing me. A single statement allows for updating one table, taking the updated record(s), including all generated values, as input for an insert into another table. But at the same time, PostgreSQL does not yet implement the SQL:2003 MERGE statement. Crazy database…
UPSERTing, a.k.a. MERGE, is a complicated subject for truly ACID-compliant databases. The databases that implement MERGE generally sacrifice something (performance or consistency). Thus, blindly using MERGE just because the database supports it is not necessarily a good idea.
However, Postgres could offer some solution that fits maybe 80% of cases. That said, I see the use cases of upserting as very divided: stats that matter, like money, where you will sacrifice speed, versus stats that don't matter, like Facebook likes.
That being said, “upserting” is very useful for real-time analytics, and I have run into Postgres issues with transaction/lock performance. I think this is where databases like VoltDB or NoSQL options like Redis really shine (they can do it, generally because they are in-memory and synchronous).
Yeah, a powerful MERGE execution touching millions of records is certainly not a good idea. But if you know you’ll operate only on one record (if it exists), then there is not that big of a performance loss. I particularly like the H2 MERGE syntax, which is really a very simple UPSERT.
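For reference, a single-row upsert in H2's MERGE syntax looks roughly like this (the table and column names here are made up for illustration):

```sql
-- H2 syntax (not PostgreSQL): inserts the row if no row with the
-- given KEY value exists, otherwise updates the existing row.
MERGE INTO page_views (page_id, view_count)
KEY (page_id)
VALUES (42, 1);
```

Compared to the full SQL:2003 MERGE, there is no separate source table and no WHEN MATCHED / WHEN NOT MATCHED branches, which is why it reads like a plain single-row UPSERT.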
Do you have any experience with VoltDB or other of Stonebraker’s NewSQL stores?
The complex part of UPSERT is retaining ACID while still having good performance and concurrency. These are the concerns which prevented UPSERT from being included in 9.4.
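The concurrency problem is visible in the classic pre-9.5 workaround: the PostgreSQL manual suggested emulating UPSERT with a PL/pgSQL retry loop, precisely because a naive UPDATE-then-INSERT can race with concurrent transactions. A sketch along those lines, reusing the employees table from the post above (the function name and columns are illustrative):

```sql
-- Illustrative PL/pgSQL sketch of the classic pre-9.5 upsert loop.
-- If the UPDATE matches no row, try to INSERT; if a concurrent
-- transaction inserted the same key first, the unique_violation
-- is caught and the loop retries the UPDATE.
CREATE OR REPLACE FUNCTION upsert_sales(p_id int, p_count int)
RETURNS void AS $$
BEGIN
  LOOP
    UPDATE employees SET sales_count = p_count WHERE id = p_id;
    IF FOUND THEN
      RETURN;
    END IF;
    BEGIN
      INSERT INTO employees (id, sales_count) VALUES (p_id, p_count);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- Another transaction won the race; loop and UPDATE again.
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Getting this pattern both correct under concurrency and fast is exactly the hard part that kept a native UPSERT out of 9.4.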
Interesting. I wasn’t aware this was considered for PostgreSQL 9.4. Can you cite a reference where this decision was made?
In pragmatic terms, how would one update/insert into a 100M-record table in chunks, loading the data from an external non-RDBMS system?
There is no prior knowledge of which records should be inserted and which updated.
I tried the following:
1. Create a staging table, truncate it, and load the data into it using INSERT statements.
2. Insert into the permanent table where not exists, and update the permanent table where exists, from the staging table. Result: deadlocks, as the process was trying to handle 1024-record chunks with a couple of parallel threads.
With such a huge merge operation, a good idea is to use a “loader table”, or staging table as you called it, i.e. a table in which you first load all those 100M records, e.g. via the PostgreSQL COPY command (http://www.postgresql.org/docs/current/static/sql-copy.html), and then transfer the records to the target table 1000 records at a time (using LIMIT), before committing. The rule here is:
If you’re running parallel transfer threads, you should partition your data to avoid contention as much as possible. There’s always an attribute by which you can partition the work across your threads.
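A chunked transfer along those lines could look roughly like this. This is a sketch, not a drop-in solution: the table names, columns, file path, and chunk size are all illustrative, and each chunk statement would be committed separately by the driving application:

```sql
-- 1. Bulk-load everything into the staging table first:
COPY staging_accounts FROM '/path/to/data.csv' WITH (FORMAT csv);

-- 2. Per committed chunk: pull up to 1000 rows out of the staging
--    table, update the target rows that already exist, and insert
--    the rest. Data-modifying CTEs keep this in one statement.
WITH chunk AS (
  DELETE FROM staging_accounts
  WHERE id IN (SELECT id FROM staging_accounts LIMIT 1000)
  RETURNING *
), updated AS (
  UPDATE accounts a
  SET name = c.name
  FROM chunk c
  WHERE a.id = c.id
  RETURNING a.id
)
INSERT INTO accounts (id, name)
SELECT id, name FROM chunk
WHERE id NOT IN (SELECT id FROM updated);
```

Deleting the processed rows out of the staging table doubles as the chunking cursor, so the loop simply repeats until the staging table is empty.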
Also, consider upgrading to PostgreSQL 9.5, which finally supports UPSERT.
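With 9.5's INSERT … ON CONFLICT, the per-row update-or-insert decision collapses into a single statement, which also removes the deadlock-prone two-step logic (table and column names here are illustrative):

```sql
-- PostgreSQL 9.5+ UPSERT: insert new rows, update on duplicate key.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO accounts (id, name)
SELECT id, name FROM staging_accounts
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name;
```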
Hi Lukas,
is there a way to express the above SQL statement via jOOQ's DSL? For example, something like:
I’m asking because “with(…).as(Select)” only takes a Select object, and I can’t find a way to convert the UPDATE statement into a Select.
Regards
Not yet:
– https://github.com/jOOQ/jOOQ/issues/4474
– https://github.com/jOOQ/jOOQ/issues/5689
Thanks for the fast response.