A very interesting feature of the SQL Server and PostgreSQL databases (and some others, including SQLite) is the partial index (sometimes also called “filtered index”). That’s an index that contains only “parts” of the table data. For instance, we can write the following index in SQL Server and PostgreSQL:
CREATE INDEX i ON message WHERE deleted = 1;
Let’s imagine you have a house keeping job that periodically removes deleted messages. Now, let’s assume you have discovered, that only 0.1% of all messages are really deleted, so an index on the DELETED
column is very selective if you’re looking for deleted messages.
On the other hand, it’s not selective at all if you’re looking for non-deleted messages, as such a query would return 99.9% of all messages, in case of which a full table scan is more efficient.
So, since the index is never useful for non-deleted messages, why index those messages at all? If we can avoid indexing non-deleted messages, then we can:
- Save a lot of disk space, as the index will be much smaller
- Save a lot of time inserting into the messages table, since we don’t have to update the index all the time
- Save a lot of time scanning the index, since it will contain a lot less blocks
Unfortunately, Oracle doesn’t support this feature
… but “luckily”, Oracle has another controversial “feature”. In Oracle, all indexes are partial indexes, because they don’t contain NULL
values. This is probably due to an ancient optimisation (remember, partial indexes are smaller), which occasionally gets into your way, performance wise, if you do want to query for NULL
values.
But in this case, it’s useful. Check this out:
CREATE TABLE message(deleted number(1)); CREATE INDEX i ON message ( CASE WHEN deleted > 0 THEN deleted END );
The above index is a function-based index, i.e. an index that contains not the value of the deleted column itself, but an expression based on it. Concretely, it contains only deleted values that are strictly greater than zero, because if the value is zero, then it is turned to NULL
by the CASE
expression, and Oracle doesn’t index NULL
values. Check this out:
INSERT INTO message SELECT DECODE(LEVEL, 1, 1, 0) FROM dual CONNECT BY LEVEL < 100000;
The above query is inserting a single row containing a deleted value of 1, and almost 100k rows containing a value of 0. The insert is very quick, because only one row has to be added to the index. The other almost 100k rows are skipped:
EXEC dbms_stats.gather_table_stats('SANDBOX', 'MESSAGE'); SELECT NUM_ROWS, BLOCKS FROM SYS.ALL_TAB_STATISTICS WHERE TABLE_NAME = 'MESSAGE'; SELECT NUM_ROWS, LEAF_BLOCKS FROM SYS.ALL_IND_STATISTICS WHERE TABLE_NAME = 'MESSAGE';
The result is:
NUM_ROWS BLOCKS --------------------- 99999 152 <-- table size NUM_ROWS LEAF_BLOCKS --------------------- 1 1 <-- index size
The “trouble” with this kind of emulation is: It’s a function-based index. We can use this index only if we really reproduce the same “function” (or in this case, expression) as in the index itself. So, in order to fetch all the deleted messages, we must not write the following query:
SELECT * FROM message WHERE deleted = 1;
But this one, instead:
SELECT * FROM message WHERE CASE WHEN deleted > 0 THEN deleted END = 1;
Check out execution plans:
EXPLAIN PLAN FOR SELECT * FROM message WHERE deleted = 1; SELECT * FROM TABLE(dbms_xplan.display); EXPLAIN PLAN FOR SELECT * FROM message WHERE CASE WHEN deleted > 0 THEN deleted END = 1; SELECT * FROM TABLE(dbms_xplan.display);
The output being:
------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 50000 | 146K| 44 (3)| |* 1 | TABLE ACCESS FULL| MESSAGE | 50000 | 146K| 44 (3)| ------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("DELETED"=1)
And
---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID| MESSAGE | 1 | 3 | 2 (0)| |* 2 | INDEX RANGE SCAN | I | 1 | | 1 (0)| ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(CASE WHEN "DELETED">0 THEN "DELETED" END =1)
As you can see, the first query runs a full table scan, estimating to retrieve 50% of all the rows, when the actual result is only 1 row as can be seen in the second execution plan!
Insertion speed
What’s even more impressive is the difference in insertion speed. Consider the following code block, which measures the time it takes to insert 1 million times 0 and one million times 1:
SET SERVEROUTPUT ON DECLARE ts TIMESTAMP; BEGIN ts := SYSTIMESTAMP; INSERT INTO message SELECT 0 FROM dual CONNECT BY level <= 1000000; dbms_output.put_line(SYSTIMESTAMP - ts); ts := SYSTIMESTAMP; INSERT INTO message SELECT 1 FROM dual CONNECT BY level <= 1000000; dbms_output.put_line(SYSTIMESTAMP - ts); END; /
The result being:
+000000000 00:00:01.501000000 +000000000 00:00:08.162000000
The insertion is much faster if we don’t have to modify the index!
Conclusion
Partial indexes are a very neat trick in cases where your data is highly skewed and some values in a column are extremely rare and very frequently queried. They may drastically reduce the index size, which greatly improves performance in some situations, including inserting into the table, and querying the index.
In Oracle, they can be emulated using function-based indexes, which means you have to use the exact function expression from the index also in queries, in order to profit. But it may well be worth the trouble!
It’s really cool. With just one mentioned caveat. You have to alway remember to write it this way.
And it also becomes more ugly with multicolumn indexes.
Yes, sure. Those are significant caveats, unfortunately…
As of version 11.2, define a virtual column and index that column. As of version 12c, optionally make that column “invisible”.
alter table message add is_deleted invisible as (case when deleted > 0 then 1 end);
create index i on message(is_deleted);
select * from message where is_deleted = 1;
Interesting, thanks for your hint. I hadn’t thought of virtual columns. Will investigate!