How to Emulate Partial Indexes in Oracle

A very interesting feature of the SQL Server and PostgreSQL databases (and some others, including SQLite) is the partial index (sometimes also called “filtered index”). That’s an index that contains only “parts” of the table data. For instance, we can write the following index in SQL Server and PostgreSQL:

CREATE INDEX i ON message WHERE deleted = 1;

Let’s imagine you have a house keeping job that periodically removes deleted messages. Now, let’s assume you have discovered, that only 0.1% of all messages are really deleted, so an index on the DELETED column is very selective if you’re looking for deleted messages.

On the other hand, it’s not selective at all if you’re looking for non-deleted messages, as such a query would return 99.9% of all messages, in case of which a full table scan is more efficient.

So, since the index is never useful for non-deleted messages, why index those messages at all? If we can avoid indexing non-deleted messages, then we can:

  • Save a lot of disk space, as the index will be much smaller
  • Save a lot of time inserting into the messages table, since we don’t have to update the index all the time
  • Save a lot of time scanning the index, since it will contain a lot less blocks

Unfortunately, Oracle doesn’t support this feature

… but “luckily”, Oracle has another controversial “feature”. In Oracle, all indexes are partial indexes, because they don’t contain NULL values. This is probably due to an ancient optimisation (remember, partial indexes are smaller), which occasionally gets into your way, performance wise, if you do want to query for NULL values.

But in this case, it’s useful. Check this out:

CREATE TABLE message(deleted number(1));

CREATE INDEX i ON message (
  CASE WHEN deleted > 0 THEN deleted END
);

The above index is a function-based index, i.e. an index that contains not the value of the deleted column itself, but an expression based on it. Concretely, it contains only deleted values that are strictly greater than zero, because if the value is zero, then it is turned to NULL by the CASE expression, and Oracle doesn’t index NULL values. Check this out:

INSERT INTO message
SELECT DECODE(LEVEL, 1, 1, 0)
FROM dual
CONNECT BY LEVEL < 100000;

The above query is inserting a single row containing a deleted value of 1, and almost 100k rows containing a value of 0. The insert is very quick, because only one row has to be added to the index. The other almost 100k rows are skipped:

EXEC dbms_stats.gather_table_stats('SANDBOX', 'MESSAGE');

SELECT NUM_ROWS, BLOCKS
FROM SYS.ALL_TAB_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

SELECT NUM_ROWS, LEAF_BLOCKS
FROM SYS.ALL_IND_STATISTICS
WHERE TABLE_NAME = 'MESSAGE';

The result is:

NUM_ROWS       BLOCKS
---------------------
   99999          152 <-- table size

NUM_ROWS  LEAF_BLOCKS
---------------------
       1            1 <-- index size

The “trouble” with this kind of emulation is: It’s a function-based index. We can use this index only if we really reproduce the same “function” (or in this case, expression) as in the index itself. So, in order to fetch all the deleted messages, we must not write the following query:

SELECT *
FROM message
WHERE deleted = 1;

But this one, instead:

SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

Check out execution plans:

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE deleted = 1;

SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
SELECT *
FROM message
WHERE CASE WHEN deleted > 0 THEN deleted END = 1;

SELECT * FROM TABLE(dbms_xplan.display);

The output being:

------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         | 50000 |   146K|    44   (3)|
|*  1 |  TABLE ACCESS FULL| MESSAGE | 50000 |   146K|    44   (3)|
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("DELETED"=1)

And

----------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     3 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| MESSAGE |     1 |     3 |     2   (0)|
|*  2 |   INDEX RANGE SCAN          | I       |     1 |       |     1   (0)|
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(CASE  WHEN "DELETED">0 THEN "DELETED" END =1)

As you can see, the first query runs a full table scan, estimating to retrieve 50% of all the rows, when the actual result is only 1 row as can be seen in the second execution plan!

Insertion speed

What’s even more impressive is the difference in insertion speed. Consider the following code block, which measures the time it takes to insert 1 million times 0 and one million times 1:

SET SERVEROUTPUT ON
DECLARE
  ts TIMESTAMP;
BEGIN
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 0 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
  
  ts := SYSTIMESTAMP;
  INSERT INTO message 
  SELECT 1 FROM dual CONNECT BY level <= 1000000;
  dbms_output.put_line(SYSTIMESTAMP - ts);
END;
/

The result being:

+000000000 00:00:01.501000000
+000000000 00:00:08.162000000

The insertion is much faster if we don’t have to modify the index!

Conclusion

Partial indexes are a very neat trick in cases where your data is highly skewed and some values in a column are extremely rare and very frequently queried. They may drastically reduce the index size, which greatly improves performance in some situations, including inserting into the table, and querying the index.

In Oracle, they can be emulated using function-based indexes, which means you have to use the exact function expression from the index also in queries, in order to profit. But it may well be worth the trouble!

jOOQ 3.10 Supports SQL Server’s Table Valued Parameters

SQL Server has this nice feature called table-valued parameters (TVP), where users can pass table variables to a stored procedure for bulk data processing. This is particularly nice when the stored procedure is an inline table valued function, i.e. a function that returns a table as well. For instance:

CREATE TYPE numbers AS TABLE (i INTEGER);

CREATE FUNCTION cross_multiply (
  @numbers numbers READONLY
)
RETURNS @result TABLE (
  i1 INTEGER,
  i2 INTEGER,
  product INTEGER
)
AS
BEGIN
  INSERT INTO @result
  SELECT n1.i, n2.i, n1.i * n2.i
  FROM @numbers n1
  CROSS JOIN @numbers n2

  RETURN
END

The above function creates a cross product of a table with itself, and multiplies each possible combination. So, when calling this with the following table argument:

DECLARE @arg NUMBERS;
INSERT INTO @arg VALUES (1),(2),(3),(4);
SELECT * FROM cross_multiply(@arg);

We’re getting the following, nice result:

i1	i2	product
-----------------------
1	1	1
2	1	2
3	1	3
4	1	4
1	2	2
2	2	4
3	2	6
4	2	8
1	3	3
2	3	6
3	3	9
4	3	12
1	4	4
2	4	8
3	4	12
4	4	16

Easy, eh?

Call the above from Java with JDBC

The SQL Server JDBC driver (since recently) supports TVPs if you’re ready to use vendor specific API. If you want to run this T-SQL batch:

DECLARE @arg NUMBERS;
INSERT INTO @arg VALUES (1),(2),(3),(4);
SELECT * FROM cross_multiply(@arg);

In Java, you’d write something along the lines of this:

SQLServerDataTable table = new SQLServerDataTable();
table.addColumnMetadata("i" ,java.sql.Types.INTEGER);
table.addRow(1);
table.addRow(2);
table.addRow(3);
table.addRow(4); 

try (SQLServerPreparedStatement stmt=
    (SQLServerPreparedStatement) connection.prepareStatement(
       "SELECT * FROM cross_multiply(?)")) {

    // Magic here:
    stmt.setStructured(1, "dbo.numbers", table);  

    try (ResultSet rs = stmt.executeQuery()) {
        ...
    }
}

This is a bit tedious as you have to work through all this API and remember:

  • type names
  • column names
  • column positions

But it works.

Now, call the above from Java, with jOOQ

No problem with jOOQ 3.10. Don’t worry about the boring JDBC data type binding details, as the jOOQ code generator has you covered. As always, all routines are generated classes / methods, and this time, the TABLE type is also a generated type. Let the code speak for itself. Instead of this SQL statement:

DECLARE @arg NUMBERS;
INSERT INTO @arg VALUES (1),(2),(3),(4);
SELECT * FROM cross_multiply(@arg);

You can write the following with jOOQ:

Numbers numbers = new NumbersRecord(
    new NumbersElementTypeRecord(1),
    new NumbersElementTypeRecord(2),
    new NumbersElementTypeRecord(3),
    new NumbersElementTypeRecord(4)
);

// Standalone function call:
Result<CrossMultiplyRecord> r1 =
    crossMultiply(configuration, numbers);

// Embedded table-valued function call, with predicate
Result<CrossMultiplyRecord> r2 =
DSL.using(configuration)
   .selectFrom(crossMultiply(numbers))
   .where(F_CROSS_MULTIPLY.PRODUCT.gt(5))
   .fetch();

System.out.println(r1);
System.out.println(r2);

And the nice printed output will be:

+----+----+-------+
|  i1|  i2|product|
+----+----+-------+
|   1|   1|      1|
|   2|   1|      2|
|   3|   1|      3|
|   4|   1|      4|
|   1|   2|      2|
|   2|   2|      4|
|   3|   2|      6|
|   4|   2|      8|
|   1|   3|      3|
|   2|   3|      6|
|   3|   3|      9|
|   4|   3|     12|
|   1|   4|      4|
|   2|   4|      8|
|   3|   4|     12|
|   4|   4|     16|
+----+----+-------+

+----+----+-------+
|  i1|  i2|product|
+----+----+-------+
|   3|   2|      6|
|   4|   2|      8|
|   2|   3|      6|
|   3|   3|      9|
|   4|   3|     12|
|   2|   4|      8|
|   3|   4|     12|
|   4|   4|     16|
+----+----+-------+

Not only does jOOQ understand table-valued parameters, since jOOQ 3.5, we have also supported table-valued functions, which can be used like any ordinary table:

Result<CrossMultiplyRecord> r2 =
DSL.using(configuration)
   .selectFrom(crossMultiply(numbers))
   .where(F_CROSS_MULTIPLY.PRODUCT.gt(5))
   .fetch();

As you can see, the function call can be embedded in the from clause, it even returns safely-typed CrossMultiplyRecord elements (if you’re not using any projection), and you can form predicates on table columns (i.e. function return values), you can join the table, etc.

Excellent! Let’s start using table-valued parameters!

jOOQ Newsletter: April 2, 2014

subscribe to this newsletter here

Tweet of the Day

Our customers, users, and followers are sharing their love for jOOQ to the world. Here are:

Arturo Tena who simply loves jOOQ 3.3, and expresses this with a creative transformation of our version numbering scheme:

Florin T.Pătraşcu who cannot stop integrating jOOQ with MicroMVC, because he discovers more and more new features all the time:

Thanks for the shouts, guys!

New license models – available soon

In the early days of migrating towards dual-licensing, we’ve discussed many alternative licensing models with our long-term users and early adopters. Now, we’re almost ready to publish the new, additional license terms, which are due for next week. They essentially include:

  • The existing yearly subscription for default use-cases
  • A new monthly subscription for short-running tasks, such as DB migrations
  • A new major release perpetual license for long-running jOOQ integrations with little need for upgrades

With these options, we believe that we will be able to cover even more jOOQ integration use-cases from a legal perspective, helping to further improve your jOOQ experience.

Are you an existing customer of the jOOQ yearly subscription interested in a switch to other terms? Do not hesitate to contact sales for a tailor-made migration offer.

Java Zone – Java 8 is out

Java 8 has finally been released by Oracle, a moment we’ve been waiting for quite a while now. Unsurprisingly, blogs in all corners of the web have started publishing Java and Java 8 related articles. We absolutely agree with Craig Buckler claiming Java to be the best programming language to learn in 2014.

Want to stay up to date with examples, tutorials, insights on Java 8? Follow ourJava 8 Friday blog series, then. Every Friday, we’re publishing an insightful article on a specific area of Java that will be affected by Java 8. One of the most interesting articles that has even caught the attention of Erik Meijer is Dr. Ming-Yee Iu’s guest post about JINQ and JINQ-to-jOOQ, which you should be looking out for in the next 6 months.

For more great resources, see also our blogging partner Baeldung.com’s Java 8 resources collection.

SQL Zone – In-Memory Computing

Do you know your fastest way around in-memory computing with Oracle? It’s possible, but it’s not so trivial.

If you’re used to SQL Server, you would simply create a T-SQL temporary table, which is a typesafe in-memory table for use with procedural T-SQL.

In Oracle, you have two choices to do the same:

  • Using SQL TABLE OF OBJECT types
  • Using GLOBAL TEMPORARY TABLEs

In almost all cases, GLOBAL TEMPORARY TABLEs will outperform TABLE OF OBJECT types for a very simple reason: You can perform all operations in the SQL engine only, whereas with TABLE OF OBJECT types, you will have to resort to the PL/SQL engine to keep the table in memory. With significant amounts of data, this can become quite a problem.

Want to know more? Visit our recent Stack Overflow question on the subject.

Upcoming Events

Have you missed any of our previous jOOQ talks? Soon you’ll get another chance to hear us talk about jOOQ or SQL in general in any of these upcoming events:

Stay informed about 2014 events on www.jooq.org/news.

The DBMS of the Year 2013

We have recently blogged about the DB-Engines Ranking and how MongoDB was the only NoSQL store to make it into that ranking’s top 10. Today, this marketing platform offered by solid IT has announced MongoDB to be the DBMS of the year 2013, with PostgreSQL being a close runner-up, followed by Cassandra.

solid IT as a company is slightly biased towards NoSQL, so it’s not surprising that two NoSQL databases are in their top ranking, and the only successful ORDBMS in the market is number two. As we ourselves are “slightly” biased towards SQL, we would like to announce our own DBMS of the year 2013:

SQL Server is the DBMS of the year 2013

… because its SQL dialect Transact-SQL (which Microsoft “shares” with Sybase), is the first SQL-based programming language to make it into TIOBE’s top 10 programming languages.

Congratulations to SQL Server from the jOOQ team!

Add LATERAL Joins or CROSS APPLY to Your SQL Tool Chain

The T-SQL dialect has known the powerful CROSS APPLY and OUTER APPLY JOIN syntaxes for ages. The SQL:1999 standard had introduced almost equivalent “lateral derived tables”, which are finally supported with PostgreSQL 9.3, or Oracle 12c, which has adopted both the SQL standard LATERAL syntax and the T-SQL vendor-specific CROSS APPLY and OUTER APPLY syntaxes.

But what are we even talking about?

SQL features have a unique trait that few other languages have. They are obscure to those who don’t know them, as every language feature introduces a new syntax with new keywords. In this case: APPLY and LATERAL. But it really isn’t so hard to understand. All you do with a CROSS APPLY is a CROSS JOIN between two tables where the right-hand side of the join expression can reference columns from the left-hand side of the join expression. Consider the following example by Martin Smith on Stack Overflow:

Reusing column aliases

SELECT number,
       doubled_number,
       doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (
  SELECT 2 * CAST(number AS BIGINT)
) CA1(doubled_number)
CROSS APPLY (
  SELECT doubled_number + 1
) CA2(doubled_number_plus_one)

See a SQLFiddle of the above example

In this example, we’re selecting numbers from a system table and cross apply a scalar subselect multiplying each number by two. Then to the whole table product, we cross apply another scalar subselect, adding one to the last number.

This particular example could also be implemented using subqueries in the SELECT clause. But as you can see in the above example, doubled_number_plus_one can be calculated from a previously calculated column in one go. That wouldn’t be so “simple” with subqueries.

Applying table-valued functions to each record

SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)

This example may be even more useful when you want to join a table-valued function to each record of another table.

PostgreSQL’s LATERAL derived tabels

In PostgreSQL, this can be done somewhat magically by put-ting table-valued functions in the SELECT clause:

SELECT x, GENERATE_SERIES(0, x)
FROM (VALUES(0), (1), (2)) t(x)

See a SQLFiddle of the above example

The above yields

| X | GENERATE_SERIES |
|---|-----------------|
| 0 |               0 |
| 1 |               0 |
| 1 |               1 |
| 2 |               0 |
| 2 |               1 |
| 2 |               2 |

Alternatively, since PostgreSQL 9.3, you can use an explicit lateral derived table as such:

SELECT x, y
FROM (VALUES(0), (1), (2)) t(x),
LATERAL GENERATE_SERIES(0, t.x) u(y)

See a SQLFiddle of the above example

Yielding again

| X | Y |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 1 | 1 |
| 2 | 0 |
| 2 | 1 |
| 2 | 2 |

CROSS APPLY and OUTER APPLY in jOOQ 3.3

The above clauses will also be supported in the upcoming editions of jOOQ 3.3 where you can write queries like this one here:

DSL.using(configuration)
   .select()
   .from(AUTHOR)
   .crossApply(
        select(count().as("c"))
       .from(BOOK)
       .where(BOOK.AUTHOR_ID.eq(AUTHOR.ID)))
   .fetch();

Or lateral joins:

DSL.using(configuration)
   .select()
   .from(
        values(row(0), row(1), row(2))
            .as("t", "x"),
        lateral(generateSeries(0,
                fieldByName("t", "x"))
            .as("u", "y")))
   .fetch();

No matter if you’re using jOOQ or native SQL, lateral derived tables or CROSS APPLY should definitely be part of your awesome SQL tool chain!

Oracle 12c Goodies: { CROSS | OUTER } APPLY

I can’t believe my eyes. Has this been openly communicated by Oracle? I haven’t seen too many blog posts on that matter. Apart from introducing the awesome SQL Standard OFFSET .. FETCH clause (just like SQL Server 2012), there seems to be now also { CROSS | OUTER } APPLY in Oracle 12c! Check out the documentation:

http://docs.oracle.com/cd/E16655_01/server.121/e17209/statements_10002.htm#BABJHDDA

It’s about time jOOQ hops on the 12c train and supports all of these goodies! Expect to see APPLY support in jOOQ, soon, along with emulation thereof (if that’s possible and easy…)

jOOQ as a PL/Java language

Some people who get in touch with PL/SQL, PL/pgSQL, T-SQL, or any other proprietary procedural language for SQL interaction are probably missing out on a couple of language integration features in the Java world. Most Java APIs see SQL as an external domain-specific language that is “best” dealt with using string concatenation. Such APIs include:

Other APIs aim to abstract SQL away, in favour of a “higher-level” mapping to objects. These, again, include

As can be seen quickly, a lot of tool vendors and developers have travelled down similar ORM roads to try to tackle the “mapping problem” from a slightly (never fundamentally) different approach.

But not all people want ORM. Many people want SQL. A nice, general opinion about the old ORM vs. SQL discussion was phrased by Ken Downs a while ago:
http://database-programmer.blogspot.ch/2010/12/historical-perspective-of-orm-and.html

SQL as an internal domain-specific language

We can all agree that SQL itself is a domain-specific language, a language specific to the domain of database querying and database manipulation. As mentioned before, SQL is enhanced on some platforms by proprietary, procedural extensions, some of which even made it into the SQL standard (although barely implemented in the standard form, apart from HSQLDB).

The main advantage of such procedural SQL language extensions is the fact that imperative control flow can be combined with declarative SQL statement execution. Both language paradigms have their place. One is ideal to model control flows, the other is ideal to model queries, abstracting boring querying algorithms.

But imperative programming is quite limited itself. It is difficult to profit from advantages offered by object-oriented or functional paradigms, implemented by popular languages like Java or Scala. Those who have tried Oracle PL/SQL’s “object-oriented” extensions may know what I mean. Furthermore, each procedural extension is vendor-specific and has its own learning curve.

jOOQ models SQL as an internal domain-specific language in Java, and can thus be seen as enhancing Java with some procedural aspects. This has been shown previously on this blog, through an example using H2 database triggers, written in Java/jOOQ. What was meant to be a proof of concept and a nice idea was now re-created by Ronny Guillaume, who wrote an interesting article (in French) about using jOOQ as PL/Java within a Postgres database! The article can be seen here:

http://ronnyguillaume.developpez.com/introduction-pl-java

In essence, you can use another third-party tool called pljava, compile and wrap jOOQ code into a jar file and deploy that jar file into your Postgres database before using it in regular Postgres SQL, or as a trigger. Similar things can be done in Java databases, such as Derby, H2, and HSQLDB, and even in the Oracle database (for the brave among you).

Looking forward to finding more interesting articles about using jOOQ for PL/Java in the wild!