Standard SQL is a beautiful language. Vendor specific implementations, however, have their warts. In Oracle, for example, it’s not possible to update any columns in a MERGE statement, which have been referenced by the ON clause. For example:
CREATE TABLE person (
id NUMBER(18) NOT NULL PRIMARY KEY,
user_name VARCHAR2(50) NOT NULL UNIQUE,
score NUMBER(18)
);
Now, in MySQL, we can run a non-standard
INSERT .. ON DUPLICATE KEY UPDATE
statement like this:
INSERT INTO person (id, user_name, score)
VALUES (1, 'foo', 100)
ON DUPLICATE KEY UPDATE
SET user_name = 'foo', score = 100
Behind the scenes, MySQL will check
all unique constraints for duplicates and reject the insert, replacing it by the update statement instead. It’s debatable whether this is really useful (ideally, we want to check only a single unique constraint for duplicates), but that’s what MySQL offers.
In case we want to run the same behaviour by Oracle, we could use the
MERGE
statement:
MERGE INTO person t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
ON (t.id = s.id OR t.user_name = s.user_name)
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score)
That looks reasonable, but it doesn’t work. We’ll get:
SQL-Fehler: ORA-38104: Columns referenced in the ON Clause cannot be updated: “T”.”USER_NAME”
Obviously, this is some protection against the situation where such an update would suddenly move a row from the matched to the not matched group. In this particular example, it might not look like something that could cause problems, but if vendor specific extensions such as the
WHERE
or
DELETE
clause would be used, things might look different.
However, the parser is not very smart, in fact, it is almost not smart at all. While it detects extremely silly attempts at circumventing this limitation, such as this:
MERGE INTO person t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
-- Circumvention attempt here: NVL()
ON (t.id = s.id OR nvl(t.user_name, null) = s.user_name)
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score)
It does not detect any of these attempts:
Using row value expressions
MERGE INTO person t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
ON (t.id = s.id OR
-- Circumvention attempt here: row value expressions
(t.user_name, 'dummy') = ((s.user_name, 'dummy')))
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score)
Seemingly without any penalty on the execution plan. Both indexes are being used:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 |
| 1 | MERGE | PERSON | |
| 2 | VIEW | | |
| 3 | NESTED LOOPS OUTER | | 1 |
| 4 | FAST DUAL | | 1 |
| 5 | VIEW | VW_LAT_8626BD41 | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| PERSON | 1 |
| 7 | BITMAP CONVERSION TO ROWIDS | | |
| 8 | BITMAP OR | | |
| 9 | BITMAP CONVERSION FROM ROWIDS | | |
|* 10 | INDEX RANGE SCAN | SYS_C00106110 | |
| 11 | BITMAP CONVERSION FROM ROWIDS | | |
|* 12 | INDEX RANGE SCAN | SYS_C00106111 | |
---------------------------------------------------------------------------
Correlated subquery
MERGE INTO person t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
ON (t.id = s.id OR
-- Circumvention attempt here: correlated subquery
(SELECT t.user_name FROM dual) = s.user_name)
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score)
This seems to prevent any index usage, and should thus be avoided:
----------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 |
| 1 | MERGE | PERSON | |
| 2 | VIEW | | |
| 3 | NESTED LOOPS OUTER | | 1 |
| 4 | FAST DUAL | | 1 |
| 5 | VIEW | VW_LAT_1846A928 | 1 |
|* 6 | FILTER | | |
| 7 | TABLE ACCESS FULL| PERSON | 1 |
| 8 | FAST DUAL | | 1 |
----------------------------------------------------------
Using NVL() and updating a view instead
Just plain simple usage of NVL() inside of the ON clause didn’t work before. The parser was smart enough to detect that. But it isn’t smart enough to detect NVL() inside of a view / derived table.
MERGE INTO (
SELECT id, user_name, nvl(user_name, null) n, score
FROM person
) t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
-- Circumvention attempt here: renamed column
ON (t.id = s.id OR t.n = s.user_name)
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score)
Notice that both
USER_NAME
and
N
columns are the same thing, but the parser doesn’t notice this and thinks we’re fine.
The execution plan is still optimal, as Oracle seems to have a way to
optimise NVL() expressions (but not coalesce and others!):
---------------------------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 |
| 1 | MERGE | PERSON | |
| 2 | VIEW | | |
| 3 | NESTED LOOPS OUTER | | 1 |
| 4 | FAST DUAL | | 1 |
| 5 | VIEW | VW_LAT_46651921 | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| PERSON | 1 |
| 7 | BITMAP CONVERSION TO ROWIDS | | |
| 8 | BITMAP OR | | |
| 9 | BITMAP CONVERSION FROM ROWIDS | | |
|* 10 | INDEX RANGE SCAN | SYS_C00106110 | |
| 11 | BITMAP CONVERSION FROM ROWIDS | | |
|* 12 | INDEX RANGE SCAN | SYS_C00106111 | |
---------------------------------------------------------------------------
Using the WHERE clause
If we hadn’t had an
OR
predicate in our
ON
clause, but a
AND
predicate, then we could have used the
WHERE
clause in Oracle. This works:
-- NOT the same query as the original one!
MERGE INTO person t
USING (
SELECT 1 id, 'foo' user_name, 100 score
FROM dual
) s
ON (t.id = s.id)
WHEN MATCHED THEN UPDATE
SET t.user_name = s.user_name, t.score = 100
WHERE t.user_name = s.user_name
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
VALUES (s.id, s.user_name, s.score);
This is not the same query as the original one. I just listed it here for completeness’ sake. Also to remind readers of the fact that this approach as well doesn’t seem to use indexes optimally. Only the primary key index (from the
ON
clause) seems to be used. The unique key is not being used:
----------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 |
| 1 | MERGE | PERSON | |
| 2 | VIEW | | |
| 3 | NESTED LOOPS OUTER | | 1 |
| 4 | VIEW | | 1 |
| 5 | FAST DUAL | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID| PERSON | 1 |
|* 7 | INDEX UNIQUE SCAN | SYS_C00106110 | 1 |
----------------------------------------------------------------
Careful
Be careful when applying the above workarounds. Assuming that ORA-38104 is a good thing (i.e. that Oracle still thinks it should be enforced), then the above workarounds simply expose bugs in the parser, which should detect such cases. The above behaviour has been observed in Oracle 12c and 18c.
I personally believe that ORA-38104 should be abandoned entirely, and the root cause for this restriction should be removed. But it is certainly worth exploring alternative options rather than relying on the above workarounds in production code, apart from the occasional one-shot migration query, where such loop holes are always nice tools to exploit.
Like this:
Like Loading...
Superb! I updated my answer on SO: https://stackoverflow.com/questions/52001089/update-on-seemingly-key-preserving-view-in-oracle-raises-ora-01779/52561721#52561721 with subquery directly placed in
ON
clause because the idea is the same.I like the ideas like you proposed. Similar in SQL server
SELECT * FROM tab WHERE 1=1
is marked as trivial butSELECT * FROM tab WHERE 1 = (SELECT 1)
is not trivial anymoreIn SQL Server it is sometimes needed to force specific execution plan by using
TOP (100) PERCENT
/ORDER BY 1 OFFSET 0 ROWS
https://feedback.azure.com/forums/908035-sql-server/suggestions/36215128-select-vs-insert-into-select-different-number-of-r
That is why I am so interested in “ezoteric” SQL transformations :)
That’s a different story, though. Yours is about “hinting” to use a specific execution plan because of some “knowledge” (or rather empirical observation) about the optimiser’s internals. The behaviour of your query shouldn’t be different, apart from different performance characteristics.
This blog post is about outsmarting the parser’s semantic checker. So, the behaviour is different.
Cool, I forgot about my own question :) However, I would definitely prefer the solution that uses a row value expression, because of the execution plan.
Lukas, thanks for the write up on working around ORA-38104. It’s just what I was looking for. However, when trying out your solutions with my own code I found that your “row value expressions” option only works when the row value expression is part of an OR condition. When changing the OR to an AND the ORA-38104 error returns.
Further testing on my part shows that the row value expression is not needed in your query, instead all that is needed is an OR to help mask the column being updated and at least one other predicate ANDed with the column to be updated: ON (1=2 OR (t.ID = s.ID AND t.user_name = s.user_name)).
With this understanding, the minimal ON clause that circumvents the ORA-38104 error becomes: ON (1=2 or (1=1 and t.user_name = s.user_name)).
Thanks for your comment. All of this is a hack to outsmart the parser logic that produces ORA-38104 errors. Eventually, your suggestion will also not work, as optimiser folks will continue improving this part (rather than fixing the actual problem…) Good to know, though that this is now the status quo :-)
Agreed twice. To me, it sounds like two pretty strange bugs.
When ORA-38104 should exist, then it shouldn’t rely on the parser, but on the occurrence of the column anywhere in the conditions tree.
But it shouldn’t exist. Whatever problems may occur with vendor specific extensions, they won’t get any better due to this restriction.
It doesn’t matter, but in your Oracle examples, you left out the
score = 100
update.Well, it does, in a way. But I would imagine that the almost 50 year old Oracle database doesn’t have very well designed internal expression tree navigation APIs, so finding that occurrence might just be a major PITA :) (it’s hard enough already in jOOQ). Ultimately, SQL has quite a few scoping and aliasing rules, so what appears to be the same column, might in fact be a different one.
I’m assuming that there are some edge cases that result in corrupt data, and the vendor definitely wants to avoid that at all costs. Lack of syntactic support is just a “minor” inconvenience. Corrupt data is a critical issue for any database system that claims ACID-ity.
Thanks, fixed
That is absolutely brilliant. The NVL() worked perfectly and I don’t think I’d have ever though of it.