Do Not Think That One Second is Fast for Query Execution

I keep encountering situations where RDBMS users think that one second for query execution is anything near fast. Most recently, in this Stack Overflow question:

Hibernate SQL In clause making CPU usage to 100%

The poster’s original question was why a similar query executes in one second when executed in SQL Server Management Studio whereas the (seemingly) same query executes in 60 seconds when executed from Hibernate. The query looks similar to this:

select Student_Id 
from Student_Table 
where Roll_No in ('A101','A102','A103',.....'A250');

There might be many reasons for this difference. But the most striking message here is:

Please do not believe that 1 second is fast.

Databases are incredibly fast and simple queries like the one above should execute in virtually no time, even on mediocre servers. Even on your laptop! Markus Winand has made it a point to tell you that in 80% of all performance issues, all you have to do is to add that missing index. And that’s the case here as well!

The poster’s original table contains only two indexes:

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 
         UNIQUE  (Class_Id, Roll_No)
);

There is an index to implement the PRIMARY KEY, and there’s another index for the UNIQUE constraint, but both indexes aren’t really very useful, as the query predicate filters on Roll_No, which is only the second column of the UNIQUE constraint. When executing the above query on roughly 8 million rows, I’m getting an index scan on the UNIQUE index and the query runs in three seconds:

plan-1

This “Index Scan” operation is not good at all. I’m actually running through all of the index to find all the applicable Roll_No values in each index leaf node. This is explained well in Use The Index Luke’s page about concatenated indexes

The solution

But the good news is, SQL Server Management Studio gives you immediate tuning advice. Just right click on the execution plan and choose “Missing Index Details…” to get the following advice:

/*
Missing Index Details from SQLQuery1.sql -
    LUKAS-ENVY\SQLEXPRESS.test
The Query Processor estimates that implementing the 
    following index could improve the query cost 
    by 87.5035%.
*/

/*
USE [test]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[student_table] ([Roll_No])
INCLUDE ([Student_Id])
GO
*/

This doesn’t necessarily mean that the above index is the optimal choice for all your queries, but the fact that you’re querying using a predicate on Roll_No should be a strong-enough indicator that you should have an index on at least this Roll_No column. The simplest possible index here is simply:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No);

With that index in place, we’ll now get an “Index Seek” operation, which runs instantly:

plan-2

Covering index

In this particular case, a “covering index” as suggested by Vlad Mihalcea in his answer might be appropriate. For instance:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No, Student_Id);

The advantage of a covering index is that all the information needed for the query execution is already contained in the index. This is true in this particular case, but it can also be dangerous as:

  • The covering index needs more space, and if the table is already large, space might become an issue
  • The covering index only adds value as long as the query doesn’t also use additional columns (e.g. for projections, calculations, additional filtering, sorting, etc.). This is probably not the case in this simple example, which might change quickly in the near future

Thus, a covering index shouldn’t be your default choice in such cases. Better be conservative and add only those columns in the index, that add immediate value for filtering.

Conclusion

I’d like to repeat this time and again:

Do NOT think that one second is fast tweet this

In fact:

Do NOT think that anything beyond 2-3ms is fast!

Unless you’re doing heavy reporting or batch processing, where processing time obviously might take a bit longer, simple queries like these should never be anything slower than instant. And most of the time, you can achieve this speed by adding an index.

On a side-note

The poster of the aforementioned question obviously has other issues as well. None of the above explain the execution speed difference between Hibernate and “plain SQL”. But again, the message here is that if your “plain SQL” already takes more than one second, you have a very low-hanging fruit to fix.

When will we have LINQ in Java?

LINQ is one of Microsoft’s .NET Framework’s most distinct language features. When it was first introduced to languages such as C#, it required heavy changes to the language specification. Yet, this addition was extremely powerful and probably unequalled by other languages / platforms, such as Java, Scala, etc. Granted, Scala has integrated XML in a similar fashion into its language from the beginning, but that is hardly the same accomplishment. Nowadays, Typesafe developers are developing SLICK – Scala Language Integrated Connection Kit, which has similar ambitions, although the effort spent on it is hardly comparable: one “official” Scala developer against a big Microsoft team. Let alone the potential of getting into patent wars with Microsoft, should SLICK ever become popular.

What does Java have to offer?

There are many attempts of bringing LINQ-like API’s to the Java world, as the following Stack Overflow question shows:

http://stackoverflow.com/questions/1217228/what-is-the-java-equivalent-for-linq

Here’s another newcomer project by Julian Hyde, that I’ve recently discovered:

https://github.com/julianhyde/linq4j

He tried his luck on the lambda-dev mailing list, without any response so far. We’re all eagerly awaiting Java 8 and project lambda with its lambda expressions and extension methods. But when will we be able to catch up with Microsoft’s LINQ? After all, jOOQ, linq4j are “internal domain specific languages”, which are all limited by the expressivity of their host language (see my previous blog post about building domain specific languages in Java).

Java 9 maybe? We can only hope!

SQL in Scala, where jOOQ could go

I have recently blogged about how simple it is to integrate jOOQ into Scala. See the full blog post here:

https://lukaseder.wordpress.com/2011/12/11/the-ultimate-sql-dsl-jooq-in-scala/

I’m more and more thrilled by that option, as Scala is one of the fastest emerging JVM languages nowadays. The plain integration of a Java library in Scala leaves some open questions. jOOQ knows the “val()” operator or method, to create bind values. See the manual here:

http://www.jooq.org/manual/JOOQ/BindValues/

This operator cannot be used in Scala, as Scala declares “val” as a reserved word. I’ve had similar issues in Java before, when trying to use “case” or “else” in the API, which is not possible either. The path of least resistance is to overload or re-name those methods. “val” was overloaded as “value” in jOOQ 2.0.1. “case” and “else” were re-named a long time ago as “decode” (from Oracle’s DECODE function), and “otherwise” (as in XSL).

So for a full-fledged integration in Scala, jOOQ should be wrapped in a new API called scOOQ ;-). This new API should take Scala’s language features into account in order to make working with jOOQ a lot easier. This could be the chance to re-engineer some of the API and make all API methods uppercase, as is usual with SQL. With Scala’s ability of omitting syntax elements, such as “.” and “()”, the API could then declare one-word methods, such as in this Java example:

MERGE().INTO(MY_TABLE)
       .USING(SOURCE_TABLE)
       .ON(MY_TABLE.ID.equal(SOURCE_TABLE.ID))
       .WHEN().MATCHED().THEN().UPDATE()
         .SET(ID, 1)
         .SET(DATA, "Data")
       .WHEN().NOT().MATCHED().THEN().INSERT(MY_TABLE.ID, MY_TABLE.DATA)
       .VALUES(1, "Data")

While in Java, this looks quite nasty and verbose, in Scala it could be very nice! The below statement should compile in Scala if the API was declared as such:

MERGE INTO MY_TABLE
      USING (SOURCE_TABLE)
      ON (MY_TABLE.ID equal SOURCE_TABLE.ID)
      WHEN MATCHED THEN UPDATE
        SET (ID, 1)
        SET (DATA, "Data")
      WHEN NOT MATCHED THEN INSERT (MY_TABLE.ID, MY_TABLE.DATA)
      VALUES (1, "Data")

Convinced? Contributions very welcome! :-)