How to Quickly Rename all Primary Keys in Oracle

Are you working with someone else’s schema and they haven’t declared nice names for all their constraints?

Unfortunately, it is all too easy to create a table like this:

CREATE TABLE order1 (
  order_id NUMBER(18) NOT NULL PRIMARY KEY
);

Or like this:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  PRIMARY KEY (order_id)
);

Sure, you get a little convenience when writing the table. But from now on, you’re stuck with weird, system generated names both for the constraint and for the backing index. For instance, when doing execution plan analyses:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|   1 |  NESTED LOOPS      |              |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0042007 |
|*  3 |   INDEX UNIQUE SCAN| SYS_C0042005 |
-------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

So, I got these system generated index names called SYS_C0042007 and SYS_C0042005. What do they mean? I can derive the actual meaning perhaps from the predicate information, as SYS_C0042007 is accessed in operation #2, which uses an access predicate on ORDER2. Fine. But do I really need to look these things up all the time?

Just name your constraints. Always!

Don’t be fooled into this convenience. It’ll hurt you time and again, not just when doing analyses. You might not be able to easily import / export your schema to some other database, because another database might already occupy these generated names.

So, do this instead:

CREATE TABLE order2 (
  order_id NUMBER(18) NOT NULL,

  CONSTRAINT pk_order2 PRIMARY KEY (order_id)
);

Find a naming schema (any naming scheme), like for instance PK_[table name]. If you’re cleaning up an existing database, this might help:

SET SERVEROUTPUT ON
BEGIN
  FOR stmts IN (
    SELECT 
      'ALTER TABLE ' || table_name || 
      ' RENAME CONSTRAINT ' || constraint_name || 
      ' TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE constraint_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
  
  FOR stmts IN (
    SELECT 
      'ALTER INDEX ' || index_name || 
      ' RENAME TO PK_' || table_name AS stmt
    FROM user_constraints
    WHERE index_name LIKE 'SYS%'
    AND constraint_type = 'P'
  ) LOOP
    dbms_output.put_line(stmts.stmt);
    EXECUTE IMMEDIATE stmts.stmt;
  END LOOP;
END;
/

The above yields (and runs)

ALTER TABLE ORDER1 RENAME CONSTRAINT SYS_C0042005 TO PK_ORDER1
ALTER TABLE ORDER2 RENAME CONSTRAINT SYS_C0042007 TO PK_ORDER2
ALTER INDEX SYS_C0042005 RENAME TO PK_ORDER1
ALTER INDEX SYS_C0042007 RENAME TO PK_ORDER2

You can of course repeat the exercise for unique constraints, etc. I omit the example here because the naming scheme might be a bit more complicated there. Now re-calculate the execution plan and check this out:

EXPLAIN PLAN FOR
SELECT *
FROM order1
JOIN order2 USING (order_id)
WHERE order_id = 1;

SELECT * FROM TABLE (dbms_xplan.display);

The simplified execution plan (output of the above queries) is this:

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  NESTED LOOPS      |           |
|*  2 |   INDEX UNIQUE SCAN| PK_ORDER2 |
|*  3 |   INDEX UNIQUE SCAN| PK_ORDER1 |
----------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ORDER2"."ORDER_ID"=1)
   3 - access("ORDER1"."ORDER_ID"=1)

There! That’s much more like it.

SQL GROUP BY and Functional Dependencies: A Very Useful Feature

Relational databases define the term “Functional Dependency” as such (from Wikipedia):

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

In SQL, functional dependencies appear whenever there is a unique constraint (e.g. a primary key constraint). Let’s assume the following:

CREATE TABLE actor (
  actor_id BIGINT NOT NULL PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL
);

It can be said that both FIRST_NAME and LAST_NAME each have a functional dependency on the ACTOR_ID column.

Nice. So what?

This isn’t just some mathematical statement that can be applied to unique constraints. It’s extremely useful for SQL. It means that for every ACTOR_ID value, there can be only one (functionally dependent) FIRST_NAME and LAST_NAME value. The other way round, this isn’t true. For any given FIRST_NAME and/or LAST_NAME value, we can have multiple ACTOR_ID values, as we can have multiple actors by the same names.

Because there can be only one corresponding FIRST_NAME and LAST_NAME value for any given ACTOR_ID value, we can omit those columns in the GROUP BY clause. Let’s assume also:

CREATE TABLE film_actor (
  actor_id BIGINT NOT NULL,
  film_id BIGINT NOT NULL,
  
  PRIMARY KEY (actor_id, film_id),
  FOREIGN KEY (actor_id) REFERENCS actor (actor_id),
  FOREIGN KEY (film_id) REFERENCS film (film_id)
);

Now, if we want to count the number of films per actor, we can write:

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id
ORDER BY COUNT(*) DESC

This is extremely useful as it saves us from a lot of typing. In fact, the way GROUP BY semantics is defined, we can put all sorts of column references in the SELECT clause, which are any of:

  • Column expressions that appear in the GROUP BY clause
  • Column expressions that are functionally dependent on the set of column expressions in the GROUP BY clause
  • Aggregate functions

Unfortunately, not everyone supports this

If you’re using Oracle, for instance, you can’t make use of the above. You’ll need to write the classic, equivalent version where all the non-aggregate column expressions appearing in the SELECT clause must also appear in the GROUP BY clause

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id, first_name, last_name
--                 ^^^^^^^^^^  ^^^^^^^^^ unnecessary
ORDER BY COUNT(*) DESC

Further reading:

Subtle SQL differences: IDENTITY columns

As I’m mostly using Oracle, IDENTITY columns were not so important to me up until I started to support them in jOOQ. Then, I found that yet again, there are many differences between various databases in how they handle IDENTITY columns in DDL and in DML.

In SQL, there are essentially three orthogonal concepts of how to identify a record (Please correct me if I may be missing more vendor-specific concepts):

  1. Primary Key: This is the most well-known concept. A primary key is the “main unique key” of a table, which means it is a constraint imposed on at least one column. It can be imposed on several columns, too. The primary key has its origin in the relational model
  2. Identity: Identities / identity columns are attributed to at most one column per table. They generate a unique ID from a sequence at record insertion – unlike primary keys, which are expected to be inserted correctly by client DML. The identity often coincides with the primary key, but it does not have to. It does not always have to be unique, either
  3. ROW ID / OID: Many databases know an internal ID that is usually not used in the logical representation of the database schema. This row id has mostly technical purposes and is also present in tables without primary key or identity columns

SQL 2008 standard

While primary key constraints are defined pretty much in the same way in every SQL dialect, and ROWID’s are very vendor-specific things, identities are something relatively new. The SQL 1992 standard does not yet define identity columns. They were formally introduced in SQL:2003, only. Here’s a part of the specification:

<column definition> ::=
  <column name> [ <data type or domain name> ]
  [ <default clause> | <identity column specification> ]
  [ ... ]

<identity column specification> ::=
  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
  [ <left paren> <common sequence generator options> <right paren> ]

 

This looks neat. In fact, in addition to column constraints, columns can specify an identity generation expression based on a sequence. What does reality look like?

Reality

DB2, Derby, HSQLDB, Ingres

These SQL dialects implement the standard very neatly.


id INTEGER GENERATED BY DEFAULT AS IDENTITY
id INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1)

H2, MySQL, Postgres, SQL Server, Sybase ASE, Sybase SQL Anywhere

These SQL dialects implement identites, but the DDL syntax doesn’t follow the standard


-- H2 mimicks MySQL's and SQL Server's syntax
ID INTEGER IDENTITY(1,1)
ID INTEGER AUTO_INCREMENT
-- MySQL
ID INTEGER NOT NULL AUTO_INCREMENT

-- Postgres serials implicitly create a sequence
-- Postgres also allows for selecting from custom sequences
-- That way, sequences can be shared among tables
id SERIAL NOT NULL

-- SQL Server
ID INTEGER IDENTITY(1,1) NOT NULL
-- Sybase ASE
id INTEGER IDENTITY NOT NULL
-- Sybase SQL Anywhere
id INTEGER NOT NULL IDENTITY

Oracle

Oracle does not know any identity columns at all. Instead, you will have to use a trigger and update the ID column yourself, using a custom sequence. Something along these lines:

CREATE OR REPLACE TRIGGER my_trigger
BEFORE INSERT
ON my_table
REFERENCING NEW AS new
FOR EACH ROW
BEGIN
  SELECT my_sequence.nextval
  INTO :new.id
  FROM dual;
END my_trigger;

Note, that this approach can be employed in most databases supporting sequences and triggers! It is a lot more flexible than standard identities

SQLite

SQLite does not know any identity columns either. It seems to have MySQL’s AUTO_INCREMENT clause, but in fact this just sets some rules on the ROWID generation, which is not a true identity column according to the SQL standard

Conclusion

As many things in SQL, identities are a major pain to get correctly when switching databases, or when writing SQL DDL and DML that works correctly across databases.

jOOQ supports DML executed against tables with identity columns and/or trigger-generated ID values. See a previous blog post that further elaborates on the issue:

https://lukaseder.wordpress.com/2011/08/29/postgres-insert-returning-clause-and-how-this-can-be-simulated-in-other-rdbms/

Subtle SQL differences: Constraint names

The various SQL product vendors implement subtle differences in the way they interpret SQL. In this case, I’ve been examining the reuse of constraint names within a schema / database (which is yet another story: what’s a schema, what’s a database?). Here’s the summary:

Constraint names are unique within a schema

  • Derby
  • H2
  • HSQLDB
  • Ingres
  • MySQL
  • Oracle
  • SQL Server

Constraint names are unique within a table

  • DB2
  • SQLite
  • Sybase SQL Anywhere

The “weird ones”

  • Postgres: Foreign Key names can be reused. Unique / Primary Key names cannot
  • Sybase ASE: Unique / Primary Key names can be reused. Foreign Key names cannot

For most compatibility across databases, it is never a good idea to re-use names. Keep your constraint names unique across a schema.