SQL Tip of the Day: Be Wary of SELECT COUNT(*)

Recently, I’ve encountered this sort of query all over the place at a customer site:

DECLARE
  v_var NUMBER(10);
BEGIN
  SELECT COUNT(*)
  INTO   v_var
  FROM   table1
  JOIN   table2 ON table1.t1_id = table2.t1_id
  JOIN   table3 ON table2.t2_id = table3.t2_id
  ...
  WHERE  some_predicate;

  IF (v_var = 0) THEN
    do_something
  ELSE
    do_something_else
  END IF;
END;

Unfortunately, COUNT(*) is often the first solution that comes to mind when we want to check our relations for some predicate. But COUNT() is expensive, especially if all we’re doing is checking our relations for existence. Does the word ring a bell? Yes, we should use the EXISTS predicate, because if we don’t care about the exact number of records that return true for a given predicate, we shouldn’t go through the complete data set to actually count the exact number. The above PL/SQL block can be rewritten trivially to this one:

DECLARE
  v_var NUMBER(10);
BEGIN
  SELECT CASE WHEN EXISTS (
    SELECT 1
    FROM   table1
    JOIN   table2 ON table1.t1_id = table2.t1_id
    JOIN   table3 ON table2.t2_id = table3.t2_id
    ...
    WHERE  some_predicate
  ) THEN 1 ELSE 0 END
  INTO   v_var
  FROM   dual;

  IF (v_var = 0) THEN
    do_something
  ELSE
    do_something_else
  END IF;
END;

Let’s measure!

Query 1 yields this execution plan:

-----------------------------------------------
| Id  | Operation           | E-Rows | A-Rows |
-----------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |
|   1 |  SORT AGGREGATE     |      1 |      1 |
|*  2 |   HASH JOIN         |      4 |      4 |
|*  3 |    TABLE ACCESS FULL|      2 |      2 |
|*  4 |    TABLE ACCESS FULL|      6 |      6 |
-----------------------------------------------

Query 2 yields this execution plan:

----------------------------------------------
| Id  | Operation          | E-Rows | A-Rows |
----------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |
|   1 |  NESTED LOOPS      |      4 |      1 |
|*  2 |   TABLE ACCESS FULL|      2 |      1 |
|*  3 |   TABLE ACCESS FULL|      2 |      1 |
|   4 |  FAST DUAL         |      1 |      1 |
----------------------------------------------

You can ignore the TABLE ACCESS FULL operations; the actual query was executed on a trivial database with no indexes.

What’s essential, however, are the much improved E-Rows values (E = Estimated) and even more importantly the optimal A-Rows values (A = Actual). As you can see, the EXISTS predicate could be aborted early, as soon as the first record that matches the predicate is encountered – in this case immediately.
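The early abort has a close analogue outside of SQL. As a rough illustration only (an in-memory Java sketch, not a model of how the database executes either query), counting all matches has to inspect every row, whereas an existence check can stop at the first hit. The class name, the data, and the predicate are made up for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ExistsVsCount {

    // Hypothetical "table" with 1000 rows: the numbers 1 to 1000
    static final List<Integer> ROWS =
        IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());

    // Hypothetical predicate that also records every row it inspects
    static boolean matches(int row, AtomicInteger inspected) {
        inspected.incrementAndGet();
        return row % 2 == 0;
    }

    // COUNT(*)-style check: returns the number of rows inspected
    static int inspectedByCount() {
        AtomicInteger inspected = new AtomicInteger();
        long count = ROWS.stream().filter(r -> matches(r, inspected)).count();
        boolean found = count > 0; // all we actually wanted to know
        return inspected.get();
    }

    // EXISTS-style check: returns the number of rows inspected
    static int inspectedByExists() {
        AtomicInteger inspected = new AtomicInteger();
        boolean found = ROWS.stream().anyMatch(r -> matches(r, inspected));
        return inspected.get();
    }

    public static void main(String[] args) {
        System.out.println("count-style inspected:  " + inspectedByCount());
        System.out.println("exists-style inspected: " + inspectedByExists());
    }
}
```

With this data, the count-style check inspects all 1000 rows, while the exists-style check stops after the second row.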

See this post for more details on how to collect Oracle execution plans.

The same is true for SQL Server

Bruce Gordon from Webucator’s SQL Training team picked up the topic and created a short video demo showing that the same kind of difference can be observed in SQL Server as well.

Conclusion

Whenever you encounter a COUNT(*) operation, you should ask yourself if it is really needed. Do you really need to know the exact number of records that match a predicate? Or are you already happy knowing that any record matches the predicate?

Answer: It’s probably the latter.

Join the No OFFSET Movement!

Markus Winand from Use The Index, Luke! did it again. He started an exciting battle against one of the biggest flaws in the SQL language:

No More OFFSET

We’ve blogged about this before. OFFSET pagination is terribly slow once you reach higher page numbers. Besides, chances are that your database doesn’t even implement it correctly yet (and your emulation is probably wrong, too).

Join Markus’s movement for KEYSET pagination, which isn’t only much faster, but also more intuitive. Popular websites like Reddit, Twitter, Facebook and many more already implement keyset pagination. Why don’t you?

jOOQ is the only Java SQL API that already implements KEYSET pagination natively using the synthetic SEEK clause. Here’s how to do it:

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // This jumps to the tuple (949, 15)
   .limit(10)
   .fetch();
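What the seek step means can be sketched without a database: it skips to the rows that come strictly after the tuple (949, 15) in the given ORDER BY order. The in-memory Java sketch below expresses that condition for SCORE DESC, PLAYER_ID ASC. The Player class, the sample rows, and the nextPage helper are hypothetical names used only for illustration; they are not part of jOOQ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class KeysetPage {

    // Hypothetical in-memory stand-in for a PLAYERS row
    static final class Player {
        final int playerId;
        final int score;

        Player(int playerId, int score) {
            this.playerId = playerId;
            this.score = score;
        }
    }

    // Rows strictly after (lastScore, lastPlayerId) when ordering by
    // score DESC, playerId ASC. Roughly the predicate
    //   score < ? OR (score = ? AND player_id > ?)
    static List<Player> nextPage(List<Player> players,
                                 int lastScore, int lastPlayerId, int limit) {
        return players.stream()
            .filter(p -> p.score < lastScore
                      || (p.score == lastScore && p.playerId > lastPlayerId))
            .sorted(Comparator.comparingInt((Player p) -> p.score).reversed()
                              .thenComparingInt(p -> p.playerId))
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Player> players = List.of(
            new Player(14, 949), new Player(15, 949), new Player(16, 949),
            new Player(3, 950), new Player(7, 940));

        for (Player p : nextPage(players, 949, 15, 10)) {
            System.out.println(p.playerId + " " + p.score);
        }
    }
}
```

The important property is that the next page is found from the last row seen, not from a row count, so the cost does not grow with the page number as long as the database can use an index on the ordering columns.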

Read more about this new movement here: http://use-the-index-luke.com/no-offset

Are You Using SQL PIVOT Yet? You Should!

Every once in a while, we run into these rare SQL issues where we’d like to do something that seems out of the ordinary. One of these things is pivoting rows to columns.

A recent question on Stack Overflow by Valiante asked for precisely this. Going from this table:

+------+------------+----------------+-------------------+
| dnId | propNameId |  propertyName  |   propertyValue   |
+------+------------+----------------+-------------------+
|    1 |         10 | objectsid      | S-1-5-32-548      |
|    1 |         19 | _objectclass   | group             |
|    1 |         80 | cn             | Account Operators |
|    1 |         82 | samaccountname | Account Operators |
|    1 |         85 | name           | Account Operators |
|    2 |         10 | objectsid      | S-1-5-32-544      |
|    2 |         19 | _objectclass   | group             |
|    2 |         80 | cn             | Administrators    |
|    2 |         82 | samaccountname | Administrators    |
|    2 |         85 | name           | Administrators    |
|    3 |         10 | objectsid      | S-1-5-32-551      |
|    3 |         19 | _objectclass   | group             |
|    3 |         80 | cn             | Backup Operators  |
|    3 |         82 | samaccountname | Backup Operators  |
|    3 |         85 | name           | Backup Operators  |
+------+------------+----------------+-------------------+

… we’d like to transform rows into columns as such:

+------+--------------+--------------+-------------------+-------------------+-------------------+
| dnId |  objectsid   | _objectclass |        cn         |  samaccountname   |       name        |
+------+--------------+--------------+-------------------+-------------------+-------------------+
|    1 | S-1-5-32-548 | group        | Account Operators | Account Operators | Account Operators |
|    2 | S-1-5-32-544 | group        | Administrators    | Administrators    | Administrators    |
|    3 | S-1-5-32-551 | group        | Backup Operators  | Backup Operators  | Backup Operators  |
+------+--------------+--------------+-------------------+-------------------+-------------------+

The idea is that we only want one row per distinct dnId, and then we’d like to transform the property-name-value pairs into columns, one column per property name.

Using Oracle or SQL Server PIVOT

The above transformation is actually quite easy with Oracle and SQL Server, which both support the PIVOT keyword on table expressions.

Here is how the desired result can be produced with SQL Server:

SELECT p.*
FROM (
  SELECT dnId, propertyName, propertyValue
  FROM myTable
) AS t
PIVOT(
  MAX(propertyValue)
  FOR propertyName IN (
    objectsid, 
    _objectclass, 
    cn, 
    samaccountname, 
    name
  )
) AS p;

(SQLFiddle here)

And the same query with a slightly different syntax in Oracle:

SELECT p.*
FROM (
  SELECT dnId, propertyName, propertyValue
  FROM myTable
) t
PIVOT(
  MAX(propertyValue)
  FOR propertyName IN (
    'objectsid'      as "objectsid", 
    '_objectclass'   as "_objectclass", 
    'cn'             as "cn", 
    'samaccountname' as "samaccountname", 
    'name'           as "name"
  )
) p;

(SQLFiddle here)

How does it work?

It is important to understand that PIVOT (much like JOIN) is a keyword that is applied to a table reference in order to transform it. In the above example, we’re essentially transforming the derived table t to form the pivot table p. We could take this further and join p to another derived table like so:

SELECT *
FROM (
  SELECT dnId, propertyName, propertyValue
  FROM myTable
) t
PIVOT(
  MAX(propertyValue)
  FOR propertyName IN (
    'objectsid'      as "objectsid", 
    '_objectclass'   as "_objectclass", 
    'cn'             as "cn", 
    'samaccountname' as "samaccountname", 
    'name'           as "name"
  )
) p
JOIN (
  SELECT dnId, COUNT(*) availableAttributes
  FROM myTable
  GROUP BY dnId
) q USING (dnId);

The above query now allows for finding those rows for which there isn’t a name / value pair in every column. If we remove one of the entries from the original table, the query might return:

| DNID |    OBJECTSID | _OBJECTCLASS |                CN |    SAMACCOUNTNAME |              NAME | AVAILABLEATTRIBUTES |
|------|--------------|--------------|-------------------|-------------------|-------------------|---------------------|
|    1 | S-1-5-32-548 |        group | Account Operators | Account Operators | Account Operators |                   5 |
|    2 | S-1-5-32-544 |        group |    Administrators |            (null) |    Administrators |                   4 |
|    3 | S-1-5-32-551 |        group |  Backup Operators |  Backup Operators |  Backup Operators |                   5 |

jOOQ also supports the SQL PIVOT clause through its API.

What if I don’t have PIVOT?

In simple PIVOT scenarios, users of databases other than Oracle or SQL Server can write an equivalent query that uses GROUP BY and MAX(CASE ...) expressions as documented in this answer here.
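As a sketch of what such an emulation looks like, the following Java helper generates the GROUP BY / MAX(CASE ...) query for a fixed list of property names. The class and method names are made up for this example, and it assumes that all arguments are trusted constants, since they are concatenated into the SQL text without escaping. Whether double-quoted column aliases are accepted depends on the database.

```java
import java.util.List;
import java.util.stream.Collectors;

class PivotEmulation {

    // Builds a GROUP BY / MAX(CASE ...) query, one output column per name.
    // Assumption: all arguments are trusted constants (no escaping is done).
    static String pivotSql(String table, String key, String nameCol,
                           String valueCol, List<String> names) {
        String columns = names.stream()
            .map(n -> "  MAX(CASE WHEN " + nameCol + " = '" + n + "' THEN "
                    + valueCol + " END) AS \"" + n + "\"")
            .collect(Collectors.joining(",\n"));

        return "SELECT " + key + ",\n" + columns
             + "\nFROM " + table + "\nGROUP BY " + key;
    }

    public static void main(String[] args) {
        System.out.println(pivotSql(
            "myTable", "dnId", "propertyName", "propertyValue",
            List.of("objectsid", "_objectclass", "cn", "samaccountname", "name")));
    }
}
```

Each CASE expression yields the property value only for rows with the matching property name and NULL otherwise, so MAX picks the single non-NULL value per dnId group, just like MAX(propertyValue) inside PIVOT.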

The 10 Most Annoying Things Coming Back to Java After Some Days of Scala

So, I’m experimenting with Scala because I want to write a parser, and the Scala Parsers API seems like a really good fit. After all, I can implement the parser in Scala and wrap it behind a Java interface, so apart from an additional runtime dependency, there shouldn’t be any interoperability issues.

After a few days of getting really really used to the awesomeness of Scala syntax, here are the top 10 things I’m missing the most when going back to writing Java:

1. Multiline strings

That is my personal favourite, and a really awesome feature that should be in any language. Even PHP has it: Multiline strings. As easy as writing:

println ("""Dear reader,

If we had this feature in Java,
wouldn't that be great?

Yours Sincerely,
Lukas""")

Where is this useful? With SQL, of course! Here’s how you can run a plain SQL statement with jOOQ and Scala:

println(
  DSL.using(configuration)
     .fetch("""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            ORDER BY a.id, b.id
            """)
)

And this isn’t only good for static strings. With string interpolation, you can easily inject variables into such strings:

val predicate =
  if (someCondition)
    "AND a.id = 1"
  else
    ""

println(
  DSL.using(configuration)
      // Observe this little "s"
     .fetch(s"""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            -- This predicate is referencing the
            -- above Scala local variable. Neat!
            WHERE 1 = 1 $predicate
            ORDER BY a.id, b.id
            """)
)

That’s pretty awesome, isn’t it? For SQL, there is a lot of potential in Scala.
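For comparison, here is a sketch of what the same static SQL string costs in Java when no multiline literal is available: every line needs its own quotes, an explicit newline, and a concatenation operator. The class name is made up for this example.

```java
class ConcatenatedSql {

    // The same query as above, assembled line by line
    static final String SQL =
          "SELECT a.first_name, a.last_name, b.title\n"
        + "FROM author a\n"
        + "JOIN book b ON a.id = b.author_id\n"
        + "ORDER BY a.id, b.id";

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```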

jOOQ: The Best Way to Write SQL in Scala

2. Semicolons

I sincerely haven’t missed them one bit. The way I structure code (and probably the way most people structure code), Scala seems not to need semicolons at all. In JavaScript, I wouldn’t say the same thing. The interpreted and non-typesafe nature of JavaScript seems to indicate that leaving away optional syntax elements is a guarantee to shoot yourself in the foot. But not with Scala.

val a = thisIs.soMuchBetter()
val b = no.semiColons()
val c = at.theEndOfALine()

This is probably due to Scala’s type safety, which would make the compiler complain in one of those rare ambiguous situations, but that’s just an educated guess.

3. Parentheses

This is a minefield and leaving away parentheses seems dangerous in many cases. In fact, you can also leave away the dots when calling a method:

myObject method myArgument

Because of the number of ambiguities this can generate, especially when chaining more method calls, I think that this technique is best avoided. But in some situations, it’s just convenient to “forget” about the parens. E.g.

val s = myObject.toString

4. Type inference

This one is really annoying in Java, and it seems that many other languages have gotten it right, in the meantime. Java only has limited type inference capabilities, and things aren’t as bright as they could be.

In Scala, I could simply write

val s = myObject.toString

… and not care about the fact that s is of type String. Sometimes, but only sometimes I like to explicitly specify the type of my reference. In that case, I can still do it:

val s : String = myObject.toString

5. Case classes

I think I’d fancy writing another POJO with 40 attributes, constructors, getters, setters, equals, hashCode, and toString

— Said no one. Ever

Scala has case classes. Simple immutable POJOs written in one-liners. Take the Person case class for instance:

case class Person(firstName: String, lastName: String)

I do have to write down the attributes once, agreed. But everything else should be automatic.

And how do you create an instance of such a case class? Easily, you don’t even need the new operator (in fact, it completely escapes my imagination why new is really needed in the first place):

Person("George", "Orwell")

That’s it. What else do you want to write to be Enterprise-compliant?

Side-note

OK, some people will now argue to use Project Lombok. Annotation-based code generation is nonsense and is best avoided. In fact, many annotations in the Java ecosystem are simple proof of the fact that the Java language is – and will forever be – very limited in its evolution capabilities. Take @Override for instance. This should be a keyword, not an annotation. You may think it’s a cosmetic difference, but I say that Scala has proven that annotations are pretty much always the wrong tool. Or have you seen heavily annotated Scala code, recently?

6. Methods (functions!) everywhere

This one is really one of the most useful features in any language, in my opinion. Why do we always have to link a method to a specific class? Why can’t we simply have methods in any scope level? Because we can, with Scala:

// "Top-level", i.e. associated with the package
def m1(i : Int) = i + 1

object Test {

    // "Static" method in the Test instance
    def m2(i : Int) = i + 2
    
    def main(args: Array[String]): Unit = {

        // Local method in the main method
        def m3(i : Int) = i + 3
        
        println(m1(1))
        println(m2(1))
        println(m3(1))
    }
}

Right? Why shouldn’t I be able to define a local method in another method? I can do that with classes in Java:

public void method() {
    class LocalClass {}

    System.out.println(new LocalClass());
}

A local class is an inner class that is local to a method. This is hardly ever useful, but what would be really useful is local methods.

These are also supported in JavaScript or Ceylon, by the way.
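The closest I can get in Java is a workaround, not a real local method: a lambda stored in a local variable. The following is only a sketch of that workaround, with made-up class and method names, mirroring the m3 example from the Scala code above.

```java
import java.util.function.IntUnaryOperator;

class LocalFunction {

    static int compute(int input) {
        // Not a local method, just a lambda assigned to a local variable
        IntUnaryOperator m3 = i -> i + 3;

        // Calling it requires the functional interface's method name
        return m3.applyAsInt(input);
    }

    public static void main(String[] args) {
        System.out.println(compute(1)); // prints 4
    }
}
```

Note how the call site has to go through applyAsInt(), which is exactly the kind of ceremony a true local method would avoid.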

7. The REPL

Because of various language features (such as 6. Methods everywhere), Scala is a language that can easily run in a REPL. This is awesome for testing out a small algorithm or concept outside of the scope of your application.

In Java, we usually tend to do this:

public class SomeRandomClass {

    // [...]
  
    public static void main(String[] args) {
        System.out.println(SomeOtherClass.testMethod());
    }

    // [...]
}

In Scala, I would’ve just written this in the REPL:

println(SomeOtherClass.testMethod)

Notice also the always available println method. Pure gold in terms of efficient debugging.

8. Arrays are NOT (that much of) a special case

In Java, apart from primitive types, there are also those weird things we call arrays. Arrays originate from an entirely separate universe, where we have to remember quirky rules originating from the ages of Capt Kirk (or so).

Yes, rules like:

// Compiles but fails at runtime
Object[] arrrrr = new String[1];
arrrrr[0] = new Object();

// This works
Object[] arrrr2 = new Integer[1];
arrrr2[0] = 1; // Autoboxing

// This doesn't work
Object[] arrrr3 = new int[1];

// This works
Object[] arr4[] = new Object[1][];

// So does this (initialisation):
Object[][] arr5 = { { } };

// Or this (puzzle: Why does it work?):
Object[][] arr6 = { { new int[1] } };

// But this doesn't work (assignment)
arr5 = { { } };
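The first of these rules can be made runnable. The sketch below (class and method names are made up) shows that the assignment compiles because arrays are covariant, and that the store is rejected only at runtime with an ArrayStoreException.

```java
class ArrayCovariance {

    // Returns true if storing a plain Object into a String[] fails at runtime
    static boolean storeFails() {
        Object[] array = new String[1]; // allowed: String[] is an Object[]
        try {
            array[0] = new Object();    // compiles, but the runtime type check rejects it
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // prints true
    }
}
```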

Yes, the list could go on. With Scala, arrays are less of a special case, syntactically speaking:

val a = new Array[String](3);
a(0) = "A"
a(1) = "B"
a(2) = "C"
a.map(v => v + ":")

// output Array(A:, B:, C:)

As you can see, arrays behave much like other collections including all the useful methods that can be used on them.

9. Symbolic method names

Now, this topic is one that is more controversial, as it reminds us of the perils of operator overloading. But every once in a while, we’d wish to have something similar. Something that allows us to write

val x = BigDecimal(3);
val y = BigDecimal(4);
val z = x * y

Very intuitively, the value of z should be BigDecimal(12). That cannot be too hard, can it? I don’t care if the implementation of * is really a method called multiply() or whatever. When writing down the method, I’d like to use what looks like a very common operator for multiplication.
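For contrast, this is roughly what the same three lines look like in Java, where the multiplication has to be spelled out as a method call. The class and method names are made up for this sketch.

```java
import java.math.BigDecimal;

class MultiplyInJava {

    static BigDecimal product() {
        BigDecimal x = new BigDecimal(3);
        BigDecimal y = new BigDecimal(4);

        // No operator available, so we call multiply() explicitly
        return x.multiply(y);
    }

    public static void main(String[] args) {
        System.out.println(product()); // prints 12
    }
}
```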

By the way, I’d also like to do that with SQL. Here’s an example:

select ( 
  AUTHOR.FIRST_NAME || " " || AUTHOR.LAST_NAME,
  AUTHOR.AGE - 10
)
from AUTHOR
where AUTHOR.ID > 10
fetch

Doesn’t that make sense? We know that || means concat (in some databases). We know what the meaning of - (minus) and > (greater than) is. Why not just write it?

The above is a compiling example of jOOQ in Scala, btw.

Attention: Caveat

There’s always a flip side to allowing something like operator overloading or symbolic method names. It can (and will be) abused. By libraries as much as by the Scala language itself.

10. Tuples

Being a SQL person, this is again one of the features I miss most in other languages. In SQL, everything is either a TABLE or a ROW. Few people actually know that, and few databases actually support this way of thinking.

Scala doesn’t have ROW types (which are really records), but at least, there are anonymous tuple types. Think of rows as tuples with named attributes, whereas case classes would be named rows:

  • Tuple: Anonymous type with typed and indexed elements
  • Row: Anonymous type with typed, named, and indexed elements
  • case class: Named type with typed and named elements

In Scala, I can just write:

// A tuple with two values
val t1 = (1, "A")

// A nested tuple
val t2 = (1, "A", (2, "B"))

In Java, a similar thing can be done, but you’ll have to write the library yourself, and you have no language support:

class Tuple2<T1, T2> {
    // Lots of bloat, see missing case classes
}

class Tuple3<T1, T2, T3> {
    // Bloat bloat bloat
}

And then:

// Yikes, no type inference...
Tuple2<Integer, String> t1 = new Tuple2<>(1, "A");

// OK, this will certainly not look nice
Tuple3<Integer, String, Tuple2<Integer, String>> t2 =
    new Tuple3<>(1, "A", new Tuple2<>(2, "B"));
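To give an idea of the bloat hidden behind those comments, here is a minimal sketch of what a hand-written Tuple2 might look like once it gets the equals, hashCode, and toString that a Scala tuple provides for free. The field names v1 and v2 are my own choice for this example, not a reference to any particular library.

```java
import java.util.Objects;

final class Tuple2<T1, T2> {

    final T1 v1;
    final T2 v2;

    Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tuple2)) return false;
        Tuple2<?, ?> that = (Tuple2<?, ?>) o;
        return Objects.equals(v1, that.v1) && Objects.equals(v2, that.v2);
    }

    @Override
    public int hashCode() {
        return Objects.hash(v1, v2);
    }

    @Override
    public String toString() {
        return "(" + v1 + ", " + v2 + ")";
    }

    public static void main(String[] args) {
        Tuple2<Integer, String> t1 = new Tuple2<>(1, "A");
        System.out.println(t1);                               // prints (1, A)
        System.out.println(t1.equals(new Tuple2<>(1, "A")));  // prints true
    }
}
```

And that is only degree 2. Every additional degree (Tuple3, Tuple4, and so on) needs the same code again with one more type parameter.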

jOOQ makes extensive use of the above technique to bring you SQL’s row value expressions to Java, and surprisingly, in most cases, you can do without the missing type inference as jOOQ is a fluent API where you never really assign values to local variables… An example:

DSL.using(configuration)
   .select(T1.SOME_VALUE)
   .from(T1)
   .where(
      // This ROW constructor is completely type safe
      row(T1.COL1, T1.COL2)
      .in(select(T2.A, T2.B).from(T2))
   )
   .fetch();

Conclusion

This was certainly a pro-Scala and slightly contra-Java article. Don’t get me wrong. By no means would I like to migrate entirely to Scala. I think that the Scala language goes way beyond what is reasonable in any useful software. There are lots of little features and gimmicks that seem nice to have, but will inevitably blow up in your face, such as:

  • implicit conversion. This is not only very hard to manage, it also slows down compilation horribly. Besides, it’s probably utterly impossible to implement semantic versioning reasonably using implicit, as it is probably not possible to foresee all possible client code breakage through accidental backwards-incompatibility.
  • local imports seem great at first, but their power quickly makes code unintelligible when people start partially importing or renaming types for a local scope.
  • symbolic method names are most often abused. Take the parser API for instance, which features method names like ^^, ^^^, ^?, or ~!

Nonetheless, I think that the advantages of Scala over Java listed in this article could all be implemented in Java as well:

  • with little risk of breaking backwards-compatibility
  • with (probably) not too big of an effort, JLS-wise
  • with a huge impact on developer productivity
  • with a huge impact on Java’s competitiveness

In any case, Java 9 will be another promising release, with hot topics like value types, declaration-site variance, specialisation (very interesting!) or ClassDynamic.

With these huge changes, let’s hope there’s also some room for any of the above little improvements that would add more immediate value to everyday work.