The Index You’ve Added is Useless. Why?

Recently, at the office:

Bob: I’ve looked into that slow query you’ve told me about yesterday, Alice. I’ve added the indexes you wanted. Everything should be fine now

Alice: Thanks Bob. I’ll quickly check … Nope Bob, still slow, it didn’t seem to work

Bob: You’re right Alice! It looks like Oracle isn’t picking up the index, for your query even if I add an /*+INDEX(...)*/ hint. I don’t know what went wrong!?

And so, the story continues. Alice is frustrated because her feature doesn’t ship on time, Bob is frustrated because he thinks that Oracle doesn’t work right.

True story!

Bob Forgot about Oracle and NULL

Poor Bob forgot (or didn’t know) that Oracle doesn’t put NULL values in “ordinary” indexes. Think about it this way:

CREATE TABLE person (
  id            NUMBER(38)   NOT NULL PRIMARY KEY,
  first_name    VARCHAR2(50) NOT NULL,
  last_name     VARCHAR2(50) NOT NULL,
  date_of_birth DATE             NULL
);

CREATE INDEX i_person_dob ON person(date_of_birth);

Now, Bob thinks that his index solves all problems, because he verified if the index worked using the following query:

SELECT * 
FROM   person
WHERE  date_of_birth > DATE '1980-01-01';

(of course, you generally shouldn’t SELECT *)

And the execution plan looked alright:

----------------------------------------------------
| Id  | Operation                   | Name         |
----------------------------------------------------
|   0 | SELECT STATEMENT            |              |
|   1 |  TABLE ACCESS BY INDEX ROWID| PERSON       |
|*  2 |   INDEX RANGE SCAN          | I_PERSON_DOB |
----------------------------------------------------

This is because Bob’s predicate doesn’t rely on NULL being part of the I_PERSON_DOB index. Unfortunately, Alice’s query looked more like this (simplified version):

SELECT 1 
FROM   dual
WHERE  DATE '1980-01-01' NOT IN (
  SELECT date_of_birth FROM person
);

So, essentially, Alice’s query checked if anyone had their date of birth at a given date. Her execution plan looked like this:

-------------------------------------
| Id  | Operation          | Name   |
-------------------------------------
|   0 | SELECT STATEMENT   |        |
|*  1 |  FILTER            |        |
|   2 |   FAST DUAL        |        |
|*  3 |   TABLE ACCESS FULL| PERSON |
-------------------------------------

As you can see, her query made a TABLE ACCESS FULL operation, bypassing the index. Why? It’s simple:

Even if our DATE '1980-01-01' value is or is not in the index, we’ll still have to check the whole table to see whether a single NULL value is contained in the date_of_birth column. Because, if there was a NULL value, the NOT IN predicate in Alice’s query would never yield TRUE or FALSE, but NULL.

Alice can solve this issue with NOT EXISTS

Alice can solve it easily herself, by replacing NOT IN through NOT EXISTS, a predicate that doesn’t suffer from SQL’s peculiar three-valued boolean logic.

SELECT 1
FROM   dual
WHERE  NOT EXISTS (
  SELECT 1
  FROM   person
  WHERE  date_of_birth = DATE '1980-01-01'
);

This new query now again yields an optimal plan:

------------------------------------------
| Id  | Operation         | Name         |
------------------------------------------
|   0 | SELECT STATEMENT  |              |
|*  1 |  FILTER           |              |
|   2 |   FAST DUAL       |              |
|*  3 |   INDEX RANGE SCAN| I_PERSON_DOB |
------------------------------------------

But the problem still exists, because what can happen, will happen, and Alice will have to remember this issue for every single query she writes.

Bob should just set the column to NOT NULL

The best solution, however is to simply set the column to NOT NULL:

ALTER TABLE person 
MODIFY date_of_birth DATE NOT NULL;

With this constraint, the NOT IN query is exactly equivalent to the NOT EXISTS query, and Bob and Alice can be friends again.

Takeaway: How to find “bad” columns?

It’s easy. The following useful query lists all indexes that have at least one nullable column in them.

SELECT 
  i.table_name,
  i.index_name,
  LISTAGG(
    LPAD(i.column_position,  2) || ': ' || 
    RPAD(i.column_name    , 30) || ' '  ||
    DECODE(t.nullable, 'Y', '(NULL)', '(NOT NULL)'), 
    ', '
  ) WITHIN GROUP (ORDER BY i.column_position) 
    AS "NULLABLE columns in indexes"
FROM user_ind_columns i
JOIN user_tab_cols t
ON (t.table_name, t.column_name) = 
  ((i.table_name, i.column_name))
WHERE EXISTS (
  SELECT 1
  FROM user_tab_cols t
  WHERE (t.table_name, t.column_name, t.nullable) = 
       ((i.table_name, i.column_name, 'Y'       ))
)
GROUP BY i.table_name, i.index_name
ORDER BY i.index_name ASC;

When run against Bob and Alice’s schema, the above query yields:

TABLE_NAME | INDEX_NAME   | NULLABLE columns in indexes
-----------+--------------+----------------------------
PERSON     | I_PERSON_DOB | 1: DATE_OF_BIRTH (NULL)

Use this query on your own schema now, and go through the results, carefully evaluating if you really need to keep that column nullable. In 50% of the cases, you don’t. By adding a NOT NULL constraint, you can tremendously speed up your application!

jOOQ Newsletter December 13, 2013

subscribe to the newsletter here

A jOOQ Runtime Only Distribution

Several of our customers have made us aware of the fact that they don’t really need the jOOQ code generator, only the SQL builder / query DSL API and possibly the SQL execution functionality. In the next weeks, we’ll be working towards a new “jOOQ Runtime Only Distribution” that will ship at a lower price. Concretely, we’ll be giving those customers a 25% discount on the regular distributions for any of jOOQ Express, jOOQ Professional, and jOOQ Enterprise.

Obviously, we recommend to use jOOQ’s code generator to profit from the full feature scope that we offer.

License Amendments

We have updated our commercial license to improve your legal relationship with Data Geekery. Essentially, these things have been amended:

  • Definitions: We have added the missing definition of what is a “Minor Defect”
  • 6.2 Distribution Right: The distribution right section now grants you a perpetual license to distribute the software, instead of a timely limited one. This will allow you to continue to distribute, embed and use jOOQ in your End-user Application even if you terminate your Developer Workstation License Agreement with us. You will, however, still need to license jOOQ in order to maintain your End-user Application.
  • 7.1.1 Remedial Services: The remedial services section now formally defines how Minor Defects are remedied

As this updated license grants new rights to our customers, it shall be in effect immediately also for existing jOOQ 3.2 customers.

Upcoming Events

We have recently presented jOOQ at the Java2Days conference in Sofia, Bulgaria as well as at the Java User Group Berlin-Brandenburg. Both talks have enjoyed a high attendance with lots of SQL developers challenging us with interesting questions. Of course, jOOQ responds to most questions already, as Manuel Bernhardt, another speaker has observed:

If you want us to talk about jOOQ or SQL near you, do not hesitate to contact us.

Here is an overview of other, upcoming events:

Stay informed about 2014 events on www.jooq.org/news. If you’re looking for the slides, they’re available for free under a CC BY-SA 3.0 license on SlideShare.

Using jOOQ with Hibernate

While jOOQ can be seen as a popular alternative to Hibernate, there is no stopping you from using both frameworks in the same application. While Hibernate heavily improves every day CRUD operations, jOOQ heavily improves writing SQL in Java. The two goals can be orthogonal and thus it may make perfect sense to combine the two.

Vlad Mihalcea, an enthusiastic jOOQ user, is writing up a series of blog posts on his blog about how to use jOOQ with Hibernate:

Gavin King himself chimed in on Google+ and confirmed once again that it was never his intention for Hibernate to be used for everything.

If you want to share your own experience of your jOOQ / Spring, jOOQ / Hibernate, jOOQ / Anything integration, let us know!

SQL Zone – “Lightning Fast SQL with Proper Indexing”

Understanding SQL and the history of SQL is of the essence when you want to get the best out of your relational database. This has recently been nicely explained by Markus Winand at the Oredev conference 2013. Luckily, his presentation has been recorded for free review.

More about the history of SQL can be seen in this interesting article about Codd’s Relational Vision – Has NoSQL Come Full Circle? It proves what we have been evangelising for a while, ourselves. NoSQL is not a new technology. It is more of a return to pre-Codd times when people did not have a powerful relational model to abstract their storage implementation away from their application. This article claims that Codd’s greatest achievement was the fact that he surpassed the deficiencies of early databases:

  • Access dependencies
  • Order dependencies
  • Index dependencies

Read the full article for a very interesting historic insight into the relational model, and why NoSQL might not be a good answer for most general problems.