Auto-Creation of Indexes in RDBMS

[…] generally speaking, I’m also surprised to see that in 2013 we’re creating our indexes manually.

Interesting thought! Has this thought ever occurred to you?

How this comment came about

Hackernews is very predictable. Our latest pro-SQL marketing campaign for jOOQ got quite a bit of traction as expected. It is easy to trigger love and hate for NoSQL databases with a little bit of humour, such as with Mark Madsen’s “history of databases in no-tation”.

A much more interesting and more serious blog post is Doug Turnbull’s “Codd’s Relational Vision – Has NoSQL Come Full Circle?”, which we are going to blog about soon, in a separate post. We’ve also put the latter on Hackernews and on Reddit, both of which generated tremendous traction for the subject. Comparing the current state of “NoSQL” with pre-Codd, pre-relational, pre-SQL times is clever and matches Michael Stonebraker’s and Joseph M. Hellerstein’s observations in “What Goes Around Comes Around”.

NoSQL is a movement that emerged out of necessity, when SQL databases did not evolve fast enough to keep up with what keen observers and Gartner now call “Webscale”, a new buzzword to name old things. But as history has shown, the old elephants can be taught new tricks, and SQL databases will eventually catch up.

Auto-creation of indexes in RDBMS

In the middle of the above Hackernews discussion, MehdiEG made this interesting observation about creating indexes manually being tedious. Indeed, why do we have to maintain all of our indexes manually? While platforms like Use-The-Index-Luke.com profit from teaching people how to do proper indexing, I wonder if a highly sophisticated database couldn’t gather statistics about a productive system and then generate suggestions for index additions / removals. Even more so, if the database is “absolutely sure”, it could also create/drop or at least activate/deactivate relevant indexes.

What does “absolutely sure” mean?

The Oracle database for instance, is already quite good at gathering relevant statistics and giving DBA hints about potentially effective new indexes, as it can simulate execution plans in case indexes were added. Some more information can be seen in this Stack Overflow question.

But wouldn’t it be great if Oracle (or SQL Server, DB2, any other database) had an auto-index-creation feature? On a productive system, the database could gather statistics for the longest-running queries, analyse their execution plans, simulate alternative execution plans in case potentially useful indexes were added to improve SELECT statements, or removed to improve INSERT, UPDATE, DELETE, MERGE statements. This wouldn’t be a simple task, as all available (or at least the 100 most executed) execution plans would have to be re-calculated to see how the newly added or removed index would impact the productive system.

There are a couple of things to note here:

  1. Fine-tuning indexing is easiest on a productive system. If you’re tuning your development environment, you will get most of the cases right. But only the productive system will show all those weird edge-cases that you simply cannot foresee
  2. Analysing the productive system is hard and is usually performed by the devops team or the DBA team. They’re often not the same people as the ones who developed the application / database. Since they often cannot access the DML or DDL of the application, it’s always good if they have some automatic tuning features such as the existing cost-based optimiser
  3. Blindly adding indexes without measuring is bad practice. If you know that a table is mostly-read-only, then you’re mostly-on-the-safe-side. But what happens if a table is often bulk updated? If a batch job creates large transactions with long UNDO / REDO logs? Each unnecessary index will only slow down the batch job, increasing the risk of race conditions, rollbacks or even deadlocks.

Automatic index creation or deletion could greatly improve the productive experience with commercial databases that already have many very useful tuning features. Let us hope that Oracle, IBM, Microsoft will hear us and build such a feature into their future databases!

SQL is the new NoNoSQL

You have probably been wondering about what SQL is. Some also refer to it as NoNoSQL. See a comprehensive comparison about SQL vs. NoSQL on our new website here:

http://www.nosql-vs-sql.com

jOOQ Newsletter December 13, 2013

subscribe to the newsletter here

A jOOQ Runtime Only Distribution

Several of our customers have made us aware of the fact that they don’t really need the jOOQ code generator, only the SQL builder / query DSL API and possibly the SQL execution functionality. In the next weeks, we’ll be working towards a new “jOOQ Runtime Only Distribution” that will ship at a lower price. Concretely, we’ll be giving those customers a 25% discount on the regular distributions for any of jOOQ Express, jOOQ Professional, and jOOQ Enterprise.

Obviously, we recommend to use jOOQ’s code generator to profit from the full feature scope that we offer.

License Amendments

We have updated our commercial license to improve your legal relationship with Data Geekery. Essentially, these things have been amended:

  • Definitions: We have added the missing definition of what is a “Minor Defect”
  • 6.2 Distribution Right: The distribution right section now grants you a perpetual license to distribute the software, instead of a timely limited one. This will allow you to continue to distribute, embed and use jOOQ in your End-user Application even if you terminate your Developer Workstation License Agreement with us. You will, however, still need to license jOOQ in order to maintain your End-user Application.
  • 7.1.1 Remedial Services: The remedial services section now formally defines how Minor Defects are remedied

As this updated license grants new rights to our customers, it shall be in effect immediately also for existing jOOQ 3.2 customers.

Upcoming Events

We have recently presented jOOQ at the Java2Days conference in Sofia, Bulgaria as well as at the Java User Group Berlin-Brandenburg. Both talks have enjoyed a high attendance with lots of SQL developers challenging us with interesting questions. Of course, jOOQ responds to most questions already, as Manuel Bernhardt, another speaker has observed:

If you want us to talk about jOOQ or SQL near you, do not hesitate to contact us.

Here is an overview of other, upcoming events:

Stay informed about 2014 events on www.jooq.org/news. If you’re looking for the slides, they’re available for free under a CC BY-SA 3.0 license on SlideShare.

Using jOOQ with Hibernate

While jOOQ can be seen as a popular alternative to Hibernate, there is no stopping you from using both frameworks in the same application. While Hibernate heavily improves every day CRUD operations, jOOQ heavily improves writing SQL in Java. The two goals can be orthogonal and thus it may make perfect sense to combine the two.

Vlad Mihalcea, an enthusiastic jOOQ user, is writing up a series of blog posts on his blog about how to use jOOQ with Hibernate:

Gavin King himself chimed in on Google+ and confirmed once again that it was never his intention for Hibernate to be used for everything.

If you want to share your own experience of your jOOQ / Spring, jOOQ / Hibernate, jOOQ / Anything integration, let us know!

SQL Zone – “Lightning Fast SQL with Proper Indexing”

Understanding SQL and the history of SQL is of the essence when you want to get the best out of your relational database. This has recently been nicely explained by Markus Winand at the Oredev conference 2013. Luckily, his presentation has been recorded for free review.

More about the history of SQL can be seen in this interesting article about Codd’s Relational Vision – Has NoSQL Come Full Circle? It proves what we have been evangelising for a while, ourselves. NoSQL is not a new technology. It is more of a return to pre-Codd times when people did not have a powerful relational model to abstract their storage implementation away from their application. This article claims that Codd’s greatest achievement was the fact that he surpassed the deficiencies of early databases:

  • Access dependencies
  • Order dependencies
  • Index dependencies

Read the full article for a very interesting historic insight into the relational model, and why NoSQL might not be a good answer for most general problems.

A History of Databases in “No-tation”

We’re heading towards very exciting times in the field of databases!

At Topconf in beautiful Tallin, Estonia, Nikita Ivanov (founder and CEO of GridGain Systems) was talking about how the ever crumbling price of DRAM gets in-memory computing and thus in-memory databases within the reach of being affordable by even small and medium enterprises. Nikita claims that 99% of all companies have less than 10TB of transactional data. While this has been completely impossible ten years ago, nowadays, you can store that much data in memory for less than 15000 USD! Compared to the Oracle license that you might buy with the server, that’s almost nothing. Imagine that you can scale up several orders of magnitude without changing your “legacy” architecture. Without switching to something like NoSQL.

A day before, Christoph Engelbert presented Hazelcast, a competitor product of GridGain Systems. Unfortunately, I couldn’t attend his talk but I was lucky enough to spend a couple of hours with Christoph on the flight back home. He’s a very interesting and fun guy to talk to and gave me quite some insight about what his company is evangelising in the context of “Big Data”. Essentially, modern data processing involves moving computation towards data, instead of moving data towards computation. While Hazelcast solves this through their own storage mechanisms, this paradigm has been equally true for “legacy” OLAP systems based on relational databases. Using PL/SQL, or T-SQL, or any other procedural language, you can execute complex algorithms right where the data is: In your database.

For those of you frequently following my blog, you will not be surprised that I am very thrilled about the above evolutions in data computing. The ever increasing consternation with ORMs and the big amount of confusion about the future of “NoSQL” have lead to a recent revival of SQL as a language.

Back to the roots.

This seems to have culminated at the recent O’Reilly Strata Conference, where Mark Madsen, a popular researcher and analyst was walking around with a geeky T-Shirt showing the History of NoSQL. I’ve had a brief chat with him on Twitter. He might be selling this T-Shirt, if it goes viral.

History of NoSQL by Mark Madsen. Picture by Ed Dumbill

History of NoSQL by Mark Madsen. Picture published by Edd Dumbill

So apparently, SQL is back, and strong as ever!

Why Staying in Control of Your SQL is so Important

Lots of blog posts and research papers are written about the topics of scaling up and scaling out. This interesting blog post, for instance, sheds some light on the two strategies with respect to physical maintenance costs, such as cooling and electricity consumption. Certainly non-negligible aspects for very large systems.

But before solving problems at a very big scale, consider much simpler SQL tuning mechanisms. Very often, when you have a bottleneck in your application, it is at the database layer. This fact is used by many NoSQL evangelists to promote their products, claiming that scaling out is much easier with NoSQL databases. This might be true, but ask yourself: Do you need a system that works under heavy load? Or is your bottleneck a performance bottleneck?

In other words: Do you have a 5’000’000-concurrent-users problem? Or do you have a request-takes-more-than-3-seconds problem? Because if you suffer from the latter, you probably do not need to scale out nor up. Your “traditional” architecture is probably quite fine, but your database / SQL queries aren’t. The popular use-the-index-luke.com website features an interesting article about the two top performance problems caused by ORM tools. They are:

  • The infamous N+1 selects problem
  • The hardly-known Index-Only Scan

Both problems result from the fact that popular ORMs generate SQL code in a way that can hardly be influenced by developers. People often claim that tools like Hibernate generate better SQL than the average developer. This is true to an extent that the average developer might never care to actually learn to write better SQL. Which in turn leads to the above problems.

Hibernate is very good at generating 70% of your application’s boring CRUD SQL. At the same time, Hibernate never claimed to be a replacement for SQL as you will have to resort to native SQL in 30% of the time. Should you then use Hibernate’s native SQL API? Or an alternative like jOOQ?

The important thing is to get back in control of your SQL when performance matters. Know your indexes. Know your database meta data. And use a tool that allows you to write precisely the SQL statement you want. Learning better SQL will help you save lots of money on operations costs as you might not need to scale out nor up.

jOOQ Newsletter October 10, 2013

Subscribe to this newsletter here.

jOOQ 3.2 Released

After a bit of time, jOOQ 3.2 has finally been released. This interesting release mainly includes two new SPIs (Service Provider Interfaces), which allow for:

  • Injecting pre and post CRUD operation behaviour, which is useful for global ID generators.
  • Injecting behaviour into the SQL rendering lifecycle allowing for arbitrary SQL transformations. This is very useful for things like shared-schema multi-tenancy.

Besides, the code generator has seen lots of improvements. More details can be seen in the release notes.

jOOQ Licensing and Support

With the new jOOQ 3.2, apart from introducing great new features, we are changing quite a few things on how we operate. At Data Geekery GmbH, we believe in Open Source. But we also believe in the creative power enabled by commercial software. This is why we have chosen to implement a dual-licensing strategy. Read more about this strategy here:

https://blog.jooq.org/2013/10/09/jooq-3-2-offering-commercial-licensing-and-support

SQL2jOOQ

A very interesting open source tool is being developed by a third-party vendor called GUDU Soft in cooperation with Data Geekery: SQL2jOOQ. This tool is based on GUDU Soft’s SQL Parser application, which can parse a variety of text-based SQL strings, transform ASTs and render new dialects. One of these output dialects is jOOQ. For jOOQ developers migrating large amounts of legacy SQL to jOOQ, this will be an invaluable tool in the tool chain.

MongoDB and NoSQL Heat

MongoDB was in the news big time last week when they announced their recent raise of $150 million in a venture-funding round. While cynics claimed that they failed to write the transaction to disk, this is still a very important milestone for 10gen. After all, MongoDB is the only non-relational DBMS figuring in a quite objective top 10 rating considering 194 systems.

To SQL or not to SQL? Our take from the jOOQ perspective is clear. Any competitor in the market dominated by Oracle and SQL Server is good, even (or maybe because) it is a non-SQL vendor:

https://blog.jooq.org/2013/10/02/the-premature-return-to-sql/

On MongoDB’s Success. Or Do Not Let Cynicism Kill Your Spirit

Success is a strange beast. On the bright side and true to entrepreneurial spirit, people admire those who are obviously successful. In the theory of our mostly capitalist societey, successful people worked hard for their success, and thus deserve it. In practice, there may be other mechanisms of success than working hard, but that’s another story which is quite off-topic on this blog.

MongoDB is successful. Extremely successful, as they have just raised $150M from a venture-funding round. They must be doing something right, at least from a marketing and sales perspective. As I admire this success, and as I am hoping to be similarly successful with jOOQ, I wanted to throw in the story at reddit for discussion. I’m curious what other business-oriented people think of this. But as always, when talking about MongoDB, cynicism arose. Heavily. See the following examples

Cynicism about the tech quality

Cynicism on the tech quality

Cynicism about the tech quality

Cynicism about getting rich

Cynicism about getting rich

Cynicism about getting rich

This was really disappointing to me. Is this just reddit being full of trolls? Or are people heavily envious if someone stands out and is successful? You may think about MongoDB whatever you want. But there is no doubt that this valuation and venture-funding round is a proof of MongoDB’s continuous success over the past years. Apparently, they are responding to a real market value with real market expectations. Raising this money will hopefully help them meet those expectations even more, in the future.

Do not let cynicism kill your spirit

No matter if this is just reddit or the “IT community” in general, it is not worth delving into such cynical discussions. I congratulate 10gen for being the only NoSQL database vendor in the DBMS popularity ranking that I’ve previously posted:

DBMS ranking. Reproduced with permission of DB-Engines.com

DBMS ranking. Reproduced with permission of DB-Engines.com

And I most certainly congratulate 10gen to this incredible milestone in their company history.