How to Emulate Partial Indexes in Oracle

A very interesting feature of the SQL Server and PostgreSQL databases (and some others, including SQLite) is the partial index (sometimes also called “filtered index”). That’s an index that contains only “parts” of the table data. For instance, we can write the following index in SQL Server and PostgreSQL:

CREATE INDEX i ON message WHERE deleted = 1;

Let’s imagine you have a house keeping job that periodically removes deleted messages. Now, let’s assume you have discovered, that only 0.1% of all messages are really deleted, so an index on the DELETED column is very selective if you’re looking for deleted messages.

On the other hand, it’s not selective at all if you’re looking for non-deleted messages, as such a query would return 99.9% of all messages, in case of which a full table scan is more efficient.

So, since the index is never useful for non-deleted messages, why index those messages at all? If we can avoid indexing non-deleted messages, then we can:

  • Save a lot of disk space, as the index will be much smaller
  • Save a lot of time inserting into the messages table, since we don’t have to update the index all the time
  • Save a lot of time scanning the index, since it will contain a lot less blocks

Unfortunately, Oracle doesn’t support this feature

… but “luckily”, Oracle has another controversial “feature”. In Oracle, all indexes are partial indexes, because they don’t contain NULL values. This is probably due to an ancient optimisation (remember, partial indexes are smaller), which occasionally gets into your way, performance wise, if you do want to query for NULL values.

But in this case, it’s useful. Check this out:

CREATE TABLE message(deleted number(1));

CREATE INDEX i ON message (
  CASE WHEN deleted > 0 THEN deleted END
);

The above index is a function-based index, i.e. an index that contains not the value of the deleted column itself, but an expression based on it. Concretely, it contains only deleted values that are strictly greater than zero, because if the value is zero, then it is turned to NULL by the CASE expression, and Oracle doesn’t index NULL values. Check this out:

INSERT INTO message
SELECT DECODE(LEVEL, 1, 1, 0)
FROM dual
CONNECT BY LEVEL < 100000;

The above query is inserting a single row containing a deleted value of 1, and almost 100k rows containing a value of 0. The insert is very quick, because only one row has to be added to the index. The other almost 100k rows are skipped:

EXEC dbms_stats.gather_table_stats('SANDBOX', 'MESSAGE');

SELECT NUM_ROWS, BLOCKS
FROM SYS.ALL_TAB_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

SELECT NUM_ROWS, LEAF_BLOCKS
FROM SYS.ALL_IND_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

The result is:

NUM_ROWS       BLOCKS
---------------------
   99999          152 <-- table size

NUM_ROWS  LEAF_BLOCKS
---------------------
       1            1 <-- index size

The “trouble” with this kind of emulation is: It’s a function-based index. We can use this index only if we really reproduce the same “function” (or in this case, expression) as in the index itself. So, in order to fetch all the deleted messages, we must not write the following query:

SELECT *
FROM message
WHERE deleted = 1;

But this one, instead:

SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

Check out execution plans:

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE deleted = 1;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

SELECT * FROM TABLE(dbms_xplan.display);

The output being:

------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         | 50000 |   146K|    44   (3)|
|*  1 |  TABLE ACCESS FULL| MESSAGE | 50000 |   146K|    44   (3)|
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("DELETED"=1)

And

----------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     3 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| MESSAGE |     1 |     3 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | I       |     1 |       |     1   (0)|
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(CASE  WHEN "DELETED">0 THEN "DELETED" END =1)

As you can see, the first query runs a full table scan, estimating to retrieve 50% of all the rows, when the actual result is only 1 row as can be seen in the second execution plan!

Insertion speed

What’s even more impressive is the difference in insertion speed. Consider the following code block, which measures the time it takes to insert 1 million times 0 and one million times 1:

SET SERVEROUTPUT ON
DECLARE
  ts TIMESTAMP;
BEGIN
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 0 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
  
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 1 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
END;
/

The result being:

+000000000 00:00:01.501000000
+000000000 00:00:08.162000000

The insertion is much faster if we don’t have to modify the index!

Conclusion

Partial indexes are a very neat trick in cases where your data is highly skewed and some values in a column are extremely rare and very frequently queried. They may drastically reduce the index size, which greatly improves performance in some situations, including inserting into the table, and querying the index.

In Oracle, they can be emulated using function-based indexes, which means you have to use the exact function expression from the index also in queries, in order to profit. But it may well be worth the trouble!