Auto-Creation of Indexes in RDBMS

[…] generally speaking, I’m also surprised to see that in 2013 we’re creating our indexes manually.

Interesting thought! Has this thought ever occurred to you?

How this comment came about

Hackernews is very predictable. Our latest pro-SQL marketing campaign for jOOQ got quite a bit of traction as expected. It is easy to trigger love and hate for NoSQL databases with a little bit of humour, such as with Mark Madsen’s “history of databases in no-tation”.

A much more interesting and more serious blog post is Doug Turnbull’s “Codd’s Relational Vision – Has NoSQL Come Full Circle?”, which we are going to blog about soon, in a separate post. We’ve also put the latter on Hackernews and on Reddit, both of which generated tremendous traction for the subject. Comparing the current state of “NoSQL” with pre-Codd, pre-relational, pre-SQL times is clever and matches Michael Stonebraker’s and Joseph M. Hellerstein’s observations in “What Goes Around Comes Around”.

NoSQL is a movement that emerged out of necessity, when SQL databases did not evolve fast enough to keep up with what keen observers and Gartner now call “Webscale”, a new buzzword to name old things. But as history has shown, the old elephants can be taught new tricks, and SQL databases will eventually catch up.

Auto-creation of indexes in RDBMS

In the middle of the above Hackernews discussion, MehdiEG made this interesting observation about creating indexes manually being tedious. Indeed, why do we have to maintain all of our indexes manually? While platforms like Use-The-Index-Luke.com profit from teaching people how to do proper indexing, I wonder if a highly sophisticated database couldn’t gather statistics about a productive system and then generate suggestions for index additions / removals. Even more so, if the database is “absolutely sure”, it could also create/drop or at least activate/deactivate relevant indexes.

What does “absolutely sure” mean?

The Oracle database for instance, is already quite good at gathering relevant statistics and giving DBA hints about potentially effective new indexes, as it can simulate execution plans in case indexes were added. Some more information can be seen in this Stack Overflow question.

But wouldn’t it be great if Oracle (or SQL Server, DB2, any other database) had an auto-index-creation feature? On a productive system, the database could gather statistics for the longest-running queries, analyse their execution plans, simulate alternative execution plans in case potentially useful indexes were added to improve SELECT statements, or removed to improve INSERT, UPDATE, DELETE, MERGE statements. This wouldn’t be a simple task, as all available (or at least the 100 most executed) execution plans would have to be re-calculated to see how the newly added or removed index would impact the productive system.

There are a couple of things to note here:

  1. Fine-tuning indexing is easiest on a productive system. If you’re tuning your development environment, you will get most of the cases right. But only the productive system will show all those weird edge-cases that you simply cannot foresee
  2. Analysing the productive system is hard and is usually performed by the devops team or the DBA team. They’re often not the same people as the ones who developed the application / database. Since they often cannot access the DML or DDL of the application, it’s always good if they have some automatic tuning features such as the existing cost-based optimiser
  3. Blindly adding indexes without measuring is bad practice. If you know that a table is mostly-read-only, then you’re mostly-on-the-safe-side. But what happens if a table is often bulk updated? If a batch job creates large transactions with long UNDO / REDO logs? Each unnecessary index will only slow down the batch job, increasing the risk of race conditions, rollbacks or even deadlocks.

Automatic index creation or deletion could greatly improve the productive experience with commercial databases that already have many very useful tuning features. Let us hope that Oracle, IBM, Microsoft will hear us and build such a feature into their future databases!

5 thoughts on “Auto-Creation of Indexes in RDBMS

    • After all what different about this than the JVM auto-optimizing the “hot spots”? Absolutely nothing.

      As I mentioned in the article, the difficulty lies in the fact that a potentially very large number of query execution plans will have to be recalculated, given the possibility of a new / removed index. Unlike the JVM, which can locally shuffle some bytecode (I assume?), a new index can impact hundreds if not thousands of queries – not necessarily in a good way.

      So I imagine that auto-creation of indexes is a bit more complex.

      We’ve challenged this with Tom Kyte and Tanel Poder on Twitter. Let’s see if they have an answer:
      https://twitter.com/JavaOOQ/status/412500036103073793

  1. The biggest problem with this is that there is a cost to adding indexes (increasing insert and update time for example) and what you want indexed really requires designer input. For a given application certain queries really need to be fast for usability, and other queries really do not matter.

    That being said, the state of the tools for finding such hot spots WITH designer involvement is surprisingly primitive given how long this industry has been around.

    • Yes, there’s cost to adding indexes. But a good, million-$ database could gather statistics over the weeks / months to assess where the cost / effect ratio of a new index would be most beneficial. It could simulate post-index-addition execution plans for selects and compare them to the simulated post-index-addition execution plans for insert and update, and calculate a coefficient.

      It’s probably not an easy task, but again, they’re probably not cheap databases. It would appear to me like quite a killer feature.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s