How to Work Around ORA-38104: Columns referenced in the ON Clause cannot be updated

Standard SQL is a beautiful language. Vendor specific implementations, however, have their warts. In Oracle, for example, it’s not possible to update any columns in a MERGE statement, which have been referenced by the ON clause. For example:

CREATE TABLE person (
  id NUMBER(18) NOT NULL PRIMARY KEY,
  user_name VARCHAR2(50) NOT NULL UNIQUE,
  score NUMBER(18)
);

Now, in MySQL, we can run a non-standard INSERT .. ON DUPLICATE KEY UPDATE statement like this:

INSERT INTO person (id, user_name, score)
VALUES (1, 'foo', 100)
ON DUPLICATE KEY UPDATE
  SET user_name = 'foo', score = 100

Behind the scenes, MySQL will check all unique constraints for duplicates and reject the insert, replacing it by the update statement instead. It’s debatable whether this is really useful (ideally, we want to check only a single unique constraint for duplicates), but that’s what MySQL offers.

In case we want to run the same behaviour by Oracle, we could use the MERGE statement:

MERGE INTO person t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
ON (t.id = s.id OR t.user_name = s.user_name)
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score)

That looks reasonable, but it doesn’t work. We’ll get:

SQL-Fehler: ORA-38104: Columns referenced in the ON Clause cannot be updated: “T”.”USER_NAME”

Obviously, this is some protection against the situation where such an update would suddenly move a row from the matched to the not matched group. In this particular example, it might not look like something that could cause problems, but if vendor specific extensions such as the WHERE or DELETE clause would be used, things might look different.

However, the parser is not very smart, in fact, it is almost not smart at all. While it detects extremely silly attempts at circumventing this limitation, such as this:

MERGE INTO person t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
-- Circumvention attempt here: NVL()
ON (t.id = s.id OR nvl(t.user_name, null) = s.user_name)
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score)

It does not detect any of these attempts:

Using row value expressions

MERGE INTO person t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
ON (t.id = s.id OR 
-- Circumvention attempt here: row value expressions
  (t.user_name, 'dummy') = ((s.user_name, 'dummy')))
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score)

Seemingly without any penalty on the execution plan. Both indexes are being used:

---------------------------------------------------------------------------
| Id  | Operation                               | Name            | Rows  |
---------------------------------------------------------------------------
|   0 | MERGE STATEMENT                         |                 |     1 |
|   1 |  MERGE                                  | PERSON          |       |
|   2 |   VIEW                                  |                 |       |
|   3 |    NESTED LOOPS OUTER                   |                 |     1 |
|   4 |     FAST DUAL                           |                 |     1 |
|   5 |     VIEW                                | VW_LAT_8626BD41 |     1 |
|   6 |      TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |     1 |
|   7 |       BITMAP CONVERSION TO ROWIDS       |                 |       |
|   8 |        BITMAP OR                        |                 |       |
|   9 |         BITMAP CONVERSION FROM ROWIDS   |                 |       |
|* 10 |          INDEX RANGE SCAN               | SYS_C00106110   |       |
|  11 |         BITMAP CONVERSION FROM ROWIDS   |                 |       |
|* 12 |          INDEX RANGE SCAN               | SYS_C00106111   |       |
---------------------------------------------------------------------------

Correlated subquery

MERGE INTO person t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
ON (t.id = s.id OR 
-- Circumvention attempt here: correlated subquery
  (SELECT t.user_name FROM dual) = s.user_name)
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score)

This seems to prevent any index usage, and should thus be avoided:

----------------------------------------------------------
| Id  | Operation              | Name            | Rows  |
----------------------------------------------------------
|   0 | MERGE STATEMENT        |                 |     1 |
|   1 |  MERGE                 | PERSON          |       |
|   2 |   VIEW                 |                 |       |
|   3 |    NESTED LOOPS OUTER  |                 |     1 |
|   4 |     FAST DUAL          |                 |     1 |
|   5 |     VIEW               | VW_LAT_1846A928 |     1 |
|*  6 |      FILTER            |                 |       |
|   7 |       TABLE ACCESS FULL| PERSON          |     1 |
|   8 |       FAST DUAL        |                 |     1 |
----------------------------------------------------------

Using NVL() and updating a view instead

Just plain simple usage of NVL() inside of the ON clause didn’t work before. The parser was smart enough to detect that. But it isn’t smart enough to detect NVL() inside of a view / derived table.

MERGE INTO (
  SELECT id, user_name, nvl(user_name, null) n, score
  FROM person
) t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
-- Circumvention attempt here: renamed column
ON (t.id = s.id OR t.n = s.user_name)
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score)

Notice that both USER_NAME and N columns are the same thing, but the parser doesn’t notice this and thinks we’re fine.

The execution plan is still optimal, as Oracle seems to have a way to optimise NVL() expressions (but not coalesce and others!):

---------------------------------------------------------------------------
| Id  | Operation                               | Name            | Rows  |
---------------------------------------------------------------------------
|   0 | MERGE STATEMENT                         |                 |     1 |
|   1 |  MERGE                                  | PERSON          |       |
|   2 |   VIEW                                  |                 |       |
|   3 |    NESTED LOOPS OUTER                   |                 |     1 |
|   4 |     FAST DUAL                           |                 |     1 |
|   5 |     VIEW                                | VW_LAT_46651921 |     1 |
|   6 |      TABLE ACCESS BY INDEX ROWID BATCHED| PERSON          |     1 |
|   7 |       BITMAP CONVERSION TO ROWIDS       |                 |       |
|   8 |        BITMAP OR                        |                 |       |
|   9 |         BITMAP CONVERSION FROM ROWIDS   |                 |       |
|* 10 |          INDEX RANGE SCAN               | SYS_C00106110   |       |
|  11 |         BITMAP CONVERSION FROM ROWIDS   |                 |       |
|* 12 |          INDEX RANGE SCAN               | SYS_C00106111   |       |
---------------------------------------------------------------------------

Using the WHERE clause

If we hadn’t had an OR predicate in our ON clause, but a AND predicate, then we could have used the WHERE clause in Oracle. This works:

-- NOT the same query as the original one!
MERGE INTO person t
USING (
  SELECT 1 id, 'foo' user_name, 100 score
  FROM dual
) s
ON (t.id = s.id)
WHEN MATCHED THEN UPDATE
  SET t.user_name = s.user_name, t.score = 100
  WHERE t.user_name = s.user_name
WHEN NOT MATCHED THEN INSERT (id, user_name, score)
  VALUES (s.id, s.user_name, s.score);

This is not the same query as the original one. I just listed it here for completeness’ sake. Also to remind readers of the fact that this approach as well doesn’t seem to use indexes optimally. Only the primary key index (from the ON clause) seems to be used. The unique key is not being used:

----------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  |
----------------------------------------------------------------
|   0 | MERGE STATEMENT                |               |     1 |
|   1 |  MERGE                         | PERSON        |       |
|   2 |   VIEW                         |               |       |
|   3 |    NESTED LOOPS OUTER          |               |     1 |
|   4 |     VIEW                       |               |     1 |
|   5 |      FAST DUAL                 |               |     1 |
|   6 |     TABLE ACCESS BY INDEX ROWID| PERSON        |     1 |
|*  7 |      INDEX UNIQUE SCAN         | SYS_C00106110 |     1 |
----------------------------------------------------------------

Careful

Be careful when applying the above workarounds. Assuming that ORA-38104 is a good thing (i.e. that Oracle still thinks it should be enforced), then the above workarounds simply expose bugs in the parser, which should detect such cases. The above behaviour has been observed in Oracle 12c and 18c.

I personally believe that ORA-38104 should be abandoned entirely, and the root cause for this restriction should be removed. But it is certainly worth exploring alternative options rather than relying on the above workarounds in production code, apart from the occasional one-shot migration query, where such loop holes are always nice tools to exploit.

How Adding a UNIQUE Constraint on a OneToOne Relationship Helps Performance

A lot of people use SQL constraints mainly to enforce data integrity, and that’s already a very good thing. A UNIQUE constraint, for instance, makes sure that there is at most one instance of any possible value (or tuple, in the case of a composite constraint) in a table. For instance:

CREATE TABLE x (
  a NUMBER(10),
  UNIQUE (a)
);

-- This works:
INSERT INTO x VALUES (1);

-- This fails:
INSERT INTO x VALUES (1);

Constraints are also good for (SELECT) performance

One thing that people often do not think about, though, is the fact that constraints can also be used as very valuable meta information by the database optimiser to make a better decision when finding the most optimal execution plan. It is a big difference for an optimiser…

  • To be unaware of how many different values there are in any given column (worst case)
  • To be able to estimate the different values in any given column (statistics are present)
  • To know that each value can appear at most once

In the last case, a UNIQUE (or PRIMARY KEY) constraint tells the optimiser exactly that. Once the optimiser knows that a certain operation returns at most one row (and not 10, 100, or even 10M rows), a nested loop suddenly becomes extremely cheap, as it can abort as soon as one row was found.

A Java analogy

Compare this with the following Java code:

// This is much "harder" ...
List<Object> objects = unknownSize();
for (Object object : objects) {
    doSomethingWith(object);
}

// ... than this, where we "know" the list
// only contains one value
for (Object object : Collections.singletonList("abc")) {
    doSomethingWith(object);
}

If Java had such optimisation capabilities (bonus question: can the JIT do it?), then the second loop could be optimised as such:

// The loop can be avoided:
doSomethingWith("abc");

Let’s look at Oracle

Let’s look at a concrete example with Oracle:

CREATE TABLE x1 (
  a NUMBER(10) NOT NULL,
  data VARCHAR2(100),
  CONSTRAINT pk_y1 PRIMARY KEY (a)
);

CREATE TABLE y1 (
  a NUMBER(10) NOT NULL, 
  optional_data VARCHAR2(100),
  CONSTRAINT fk_y1 FOREIGN KEY (a) REFERENCES x1(a)
);

CREATE INDEX i1 ON y1(a);
CREATE INDEX i1data ON x1(data);

INSERT INTO x1
SELECT level, dbms_random.string('a', 100)
FROM dual
CONNECT BY level <= 10000;

INSERT INTO y1
SELECT level, dbms_random.string('a', 100)
FROM dual
CONNECT BY level <= 5000;

This is a typical one-to-many relationship between x1 and y1. With the constraints in place, there can be between 0 and N rows in y1 for each row in x1. As a user, we know that there is only one value in y1 for any value in x1, but we don’t enforce this knowledge with a constraint.

Let’s look at the following query:

SELECT count(*)
FROM x1
JOIN y1 USING (a)
WHERE data LIKE 'a%';

What we’re doing here is we want all values in x1 whose data starts with the letter ‘a’, and for which we also have any optional_data. The execution plan is

-----------------------------------------------------
| Operation                | Name   | Rows  | Cost  |
-----------------------------------------------------
| SELECT STATEMENT         |        |       |    29 |
|  SORT AGGREGATE          |        |     1 |       |
|   HASH JOIN              |        |   176 |    29 |
|    VIEW                  |        |   176 |    24 |
|     HASH JOIN            |        |       |       |
|      INDEX RANGE SCAN    | I1DATA |   176 |     4 |
|      INDEX FAST FULL SCAN| PK_Y1  |   176 |    24 |
|    INDEX FAST FULL SCAN  | I1     |  5000 |     5 |
-----------------------------------------------------

As you can see, Oracle chooses to run a hash join operation, which means that all the values from x1 starting with ‘a’ are fetched (around 176), and joined in a hashmap with the entire set of values in y1, fetched from the index i1 (5000 values).

How does this compare with using a UNIQUE constraint?

We’ll create almost the exact same schema as follows:

CREATE TABLE x2 (
  a NUMBER(10) NOT NULL,
  data VARCHAR2(100),
  CONSTRAINT pk_x2 PRIMARY KEY (a)
);

CREATE TABLE y2 (
  a NUMBER(10) NOT NULL, 
  optional_data VARCHAR2(100),
  CONSTRAINT uk_y2 UNIQUE (a),
  CONSTRAINT fk_y2 FOREIGN KEY (a) REFERENCES x2(a)
);

CREATE INDEX i2data ON x2(data);

INSERT INTO x2
SELECT * FROM x1;

INSERT INTO y2
SELECT * FROM y1;

BEGIN
  dbms_stats.gather_table_stats('TEST', 'X2');
  dbms_stats.gather_table_stats('TEST', 'Y2');
END;
/

The data is exactly the same, but now we enforce a UNIQUE constraint on y2’s foreign key, making this effectively a one-to-one relationship. Check out what happens when we run the exact same query…

SELECT count(*)
FROM x2
JOIN y2 USING (a)
WHERE data LIKE 'a%';

Its execution plan is now:

-----------------------------------------------------
| Operation                | Name   | Rows  | Cost  |
-----------------------------------------------------
| SELECT STATEMENT         |        |       |    25 |
|  SORT AGGREGATE          |        |     1 |       |
|   NESTED LOOPS           |        |   176 |    25 |
|    VIEW                  |        |   176 |    25 |
|     HASH JOIN            |        |       |       |
|      INDEX RANGE SCAN    | I2DATA |   176 |     5 |
|      INDEX FAST FULL SCAN| PK_X2  |   176 |    24 |
|    INDEX UNIQUE SCAN     | UK_Y2  |     1 |     0 |
-----------------------------------------------------

As you can see, the overall cost has decreased from 29 to 25 as we’re no longer using a hash join, but a nested loop join operation, which is probably faster if our statistics are not way off, as we only have to look up the single value in y2 corresponding to x2 for each of x2’s estimated 176 rows that start with the letter ‘a’.

But let’s not trust the execution plan, let’s benchmark things (as discussed in a previous article). Run the following code in SQL Developer, for instance:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 1000;
BEGIN
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT count(*)
      FROM x1
      JOIN y1 USING (a)
      WHERE data LIKE 'a%'
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Without UNIQUE constraint: '
    || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT count(*)
      FROM x2
      JOIN y2 USING (a)
      WHERE data LIKE 'a%'
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('With    UNIQUE constraint: '
    || (SYSTIMESTAMP - v_ts));
END;
/

The above benchmark repeats each individual query 1000 times. The results speak for themselves:

Without UNIQUE constraint: +000000000 00:00:04.250000000
With    UNIQUE constraint: +000000000 00:00:02.847000000

Remark

The lack of a UNIQUE constraint may happen in situations where you prefer using a surrogate primary key in the referencing table (which I omitted in the examples for brevity). If you’re “sharing” the primary key or use natural keys, this issue won’t happen to you, of course.

Conclusion

The execution planner can make a more informed decision if it has formal knowledge about your data via an additional UNIQUE constraint. This formal knowledge is by far more powerful than any statistics that might indicate the same thing. In the absence of a formal UNIQUE constraint, the database will always have to make sure there is not another row once it has found one. With a formal UNIQUE constraint, it can stop looking as soon as that unique row was found. This can drastically speed up queries. As we’ve seen in the above example, this improves things by a factor of 1.5, so the second query is 50% faster!

Always tell your database as much as you can. Your SELECT performance will greatly increase, at the small cast of a little overhead when inserting data.