Oracle’s BINARY_DOUBLE Can Be Much Faster Than NUMBER

Using the right data type for a calculation sounds like obvious advice.

There are many blogs about using temporal data types for temporal data, instead of strings. An obvious reason is data integrity and correctness. We gain nothing by storing dates as 2019-09-10 in one record, and as Sep 10, 2019 in the next one. Performance is also important in this case – for example, because the optimiser’s statistics are more accurate (an often overlooked side effect of unfit data types).

But at a customer site, I’ve recently discovered a surprising (though not surprising in hindsight) performance issue when using NUMBER instead of BINARY_DOUBLE in an Oracle database.

NUMBER for monetary amounts

The NUMBER type (DECIMAL or NUMERIC in other RDBMS) is perfectly suited for decimal numbers that require exact precision and rounding. I mean, if you ever encounter this kind of display in an invoice, such as when I purchase video games on Steam:

… your trust level for the platform immediately goes down by a factor of 1.0e+1 (even if, technically, it does not matter in this case). So the default for monetary amounts, e.g. in banking systems, is always to use NUMBER or an equivalent type, such as java.math.BigDecimal.
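To illustrate why that exactness matters, here is a quick sketch (the values are made up; NUMBER uses exact decimal arithmetic, whereas BINARY_DOUBLE uses binary floating point):

SELECT CAST(0.1 AS NUMBER(20, 10)) * 3 AS exact_result,  -- 0.3, exactly
       TO_BINARY_DOUBLE(0.1) * 3 AS binary_result        -- roughly 0.30000000000000004
FROM dual;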

Calculations on numbers

But now, let’s assume we want to do some statistics on these amounts: some aggregations of various numeric values, maybe even of their logarithms. We can establish that these two expressions are equal:

ln(a * b) = ln(a) + ln(b)

Or in other words, for positive numbers:

a * b = exp(ln(a) + ln(b))

We’ve already blogged about how this approach can be used to emulate a PRODUCT() aggregate function in SQL, which is very useful in some cases, even though none of the SQL databases jOOQ supports has built-in support for it yet. Notice that the blog post also takes care of zero and negative numbers.
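For strictly positive values, a minimal sketch of that emulation looks like this (the table and column names are hypothetical; the linked post shows how to handle zero and negative numbers):

SELECT exp(sum(ln(amount))) AS product
FROM payments;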

But which number type to choose?

Now, we might be tempted to just calculate the LN(SOME_NUMBER) value, and sum that up using SUM(LN(SOME_NUMBER)) for this use case. This turns out to be terribly slow in Oracle. At first, we suspected bad indexes, and even doubted aggregation in general, until I suggested we try using BINARY_DOUBLE instead. In our case, we didn’t care about the exact numeric value of the amount. An IEEE 754 floating point number with double precision was going to be good enough.

The results were surprising. In a simple benchmark, we compared 10 approaches to calculating this sum of logarithms:

  1. Using NUMBER(20, 10) and SUM(LN(the_number))
  2. Using NUMBER(20, 10) and SUM(LN(CAST(the_number AS BINARY_DOUBLE)))
  3. Using NUMBER(20, 10) and SUM(LN(TO_BINARY_DOUBLE(the_number)))
  4. Using NUMBER(20, 10), containing a pre-calculated LN value
  5. Using DOUBLE PRECISION and SUM(LN(the_double))
  6. Using DOUBLE PRECISION and SUM(LN(CAST(the_double AS BINARY_DOUBLE)))
  7. Using DOUBLE PRECISION and SUM(LN(TO_BINARY_DOUBLE(the_double)))
  8. Using DOUBLE PRECISION, containing a pre-calculated LN value
  9. Using BINARY_DOUBLE and SUM(LN(the_binary))
  10. Using BINARY_DOUBLE, containing a pre-calculated LN value

These were the thoughts:

  • We tried the three numeric data types above, expecting BINARY_DOUBLE to be the fastest
  • We tried pre-calculating the LN() value for the benchmark, to see how much effort goes into summing, and how much goes into the LN() calculation with each type. While such precalculation is impractical in this system in general, we still wanted a benchmark comparison, in case a materialised view or another technique would be feasible.
  • We tried casting and converting each type to BINARY_DOUBLE prior to passing the value to the LN() function. Instead of migrating all the data (with possible side effects), we wanted to see if we could solve this to a reasonable extent “on the fly”, as sketched below
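Concretely, the “on the fly” workaround corresponds to approaches 2 and 3 from the list above, shown here against the data table defined below:

SELECT sum(ln(CAST(n1 AS BINARY_DOUBLE))) FROM data;  -- approach 2: explicit CAST
SELECT sum(ln(TO_BINARY_DOUBLE(n1))) FROM data;       -- approach 3: conversion function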

The benchmark that I’m posting here uses this table and example data (full benchmark code at the end of the article):

CREATE TABLE data (
  n1 NUMBER(20, 10),   -- raw value
  n2 NUMBER(20, 10),   -- pre-calculated LN(n1)
  d1 DOUBLE PRECISION, -- raw value
  d2 DOUBLE PRECISION, -- pre-calculated LN(d1)
  b1 BINARY_DOUBLE,    -- raw value
  b2 BINARY_DOUBLE     -- pre-calculated LN(b1)
);

-- Generate 100000 rows using the CONNECT BY LEVEL trick
INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

So, we have 100000 records whose SUM(LN(x)) we want to calculate in the above 10 different ways. Each *1 column contains the raw numeric value, and each *2 column contains the pre-calculated LN() of its *1 counterpart.

The benchmark technique is described here. Do note that it has a lot of caveats and is only useful for a limited number of verifications. Please always be careful when running such benchmarks: they often do not test production-like situations. And never use such benchmarks to compare different RDBMS products directly; they are only useful to compare two approaches on the same RDBMS product.

The results as run on Oracle 18c XE in Docker on development hardware are below. Times are compared relative to the fastest run, as actual time spent in each execution is not interesting for comparison. We did have similar results on production-like hardware, though, and I’m sure you can produce similar results in other RDBMS:

NUMBER(20, 10)
-------------------------------------------------
Run 3, Statement 1 : 280.34143  (avg : 280.75347)
Run 3, Statement 2 : 7.99402    (avg : 8.03506)
Run 3, Statement 3 : 7.71383    (avg : 7.73017)
Run 3, Statement 4 : 1.05456    (avg : 1.11735)

DOUBLE PRECISION
------------------------------------------------
Run 3, Statement 5 : 278.89476 (avg : 279.72981)
Run 3, Statement 6 : 8.06512   (avg : 8.07033)
Run 3, Statement 7 : 7.81873   (avg : 7.80063)
Run 3, Statement 8 : 1.5315    (avg : 1.54347)

BINARY_DOUBLE
------------------------------------------------
Run 3, Statement 9 : 2.4963    (avg : 2.57184)
Run 3, Statement 10: 1         (avg : 1.02943)

How to read these results?

  • Statement 10 is the fastest one, unsurprisingly, as it aggregates pre-calculated LN(binary_double) values. The precalculation means that all the work had already been done before the report, and the data type is the one we expected to perform best in general
  • Statements 4 and 8 are almost as fast (precalculated LN() values). Being off by only small factors, we can attribute the difference to the usual benchmark flaws, although it’s interesting to see that DOUBLE PRECISION seems 1.5x slower to sum than both BINARY_DOUBLE and NUMBER
  • Statements 1, 5, and 9 are the ones where no data type conversion is applied and SUM(LN(the_value)) is calculated directly. It is staggering how much slower both NUMBER and DOUBLE PRECISION are than BINARY_DOUBLE: statements 1 and 5 are a factor of 112x slower than statement 9!
  • Statements 2-3 and 6-7 prove that converting the NUMBER or DOUBLE PRECISION values to BINARY_DOUBLE on the fly provides a sufficiently performant workaround, making those statements only about 3x slower than statement 9
  • Statements 2-3 and 6-7 also show that CAST and TO_BINARY_DOUBLE perform about the same

Note that we found similar results with other functions, such as EXP().

Analysis

The order of magnitude may seem surprising at first, but thinking about it, it is not. We would never do CPU-intensive computation with java.math.BigDecimal in Java. The BigDecimal type is there for numeric accuracy, e.g. when it really matters what the monetary amount is, exactly. When we run analytics on monetary amounts, using double is sufficient in Java as well.

If our data is BigDecimal, and we cannot reasonably change that, it might still be better to use the BigDecimal::doubleValue conversion prior to further processing with e.g. Math::log. This translates directly to SQL, whose LN() implementations are data type specific. After all, IEEE 754 was designed precisely for this purpose.
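You can observe this data type specificity directly in Oracle. A quick sketch (per the Oracle docs, LN() returns BINARY_DOUBLE for binary floating point arguments, and NUMBER otherwise):

-- Uses the NUMBER implementation of LN():
SELECT ln(2) FROM dual;

-- Uses the (much faster) BINARY_DOUBLE implementation of LN():
SELECT ln(to_binary_double(2)) FROM dual;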

When doing CPU-intensive computations, whether in Java or in the database, we should always evaluate our various options of:

  • Quick-fixing our data sets for the report (ad-hoc conversion prior to calculation)
  • Thoroughly fixing our data sets in the schema (migration towards a better data type)
  • Preprocessing our data sets (precalculating some very commonly used computations), as sketched below
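For example, if a materialised view turned out to be feasible, the preprocessing option could look something like the following sketch (with hypothetical names, not something we actually deployed):

CREATE MATERIALIZED VIEW data_ln AS
SELECT b1, ln(b1) AS ln_b1
FROM data;

-- The report then merely sums the precalculated values, as in statement 10:
SELECT sum(ln_b1) FROM data_ln;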

Benchmark logic

Just run the below on your own hardware. I’m curious to see your results:

-- Copyright Data Geekery GmbH
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- This version displays relative execution times (fastest execution = 1)
SET SERVEROUTPUT ON

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 10;
  v_stmt NUMBER;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
    v_stmt := 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(n1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    -- The DATE arithmetic trick converts the elapsed INTERVAL into a plain NUMBER of seconds
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(n1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(n1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(n2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(d1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(d1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(d1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(d2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(b1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(b2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) / MIN(elapsed) OVER() AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE data;
DROP TABLE results;

Do Not Think That One Second is Fast for Query Execution

I keep encountering situations where RDBMS users think that one second for query execution is anything near fast. Most recently, in this Stack Overflow question:

Hibernate SQL In clause making CPU usage to 100%

The poster’s original question was why a query executes in one second when run in SQL Server Management Studio, whereas the (seemingly) same query takes 60 seconds when run from Hibernate. The query looks similar to this:

select Student_Id 
from Student_Table 
where Roll_No in ('A101','A102','A103',.....'A250');

There might be many reasons for this difference. But the most striking message here is:

Please do not believe that 1 second is fast.

Databases are incredibly fast and simple queries like the one above should execute in virtually no time, even on mediocre servers. Even on your laptop! Markus Winand has made it a point to tell you that in 80% of all performance issues, all you have to do is to add that missing index. And that’s the case here as well!

The poster’s original table contains only two indexes:

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 
         UNIQUE  (Class_Id, Roll_No)
);

There is an index implementing the PRIMARY KEY, and there’s another index for the UNIQUE constraint, but neither index is really useful here, as the query predicate filters on Roll_No, which is only the second column of the UNIQUE constraint. When executing the above query on roughly 8 million rows, I’m getting an index scan on the UNIQUE index, and the query runs in three seconds:

[Execution plan screenshot: plan-1]

This “Index Scan” operation is not good at all. I’m actually scanning through the entire index to find all the applicable Roll_No values in each index leaf node. This is explained well in Use The Index, Luke’s page about concatenated indexes.
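The gist of it: a concatenated index supports a seek only if the predicate constrains the index’s leading column(s). A quick sketch with made-up values:

-- Can seek on the UNIQUE (Class_Id, Roll_No) index:
SELECT Student_Id FROM student_table
WHERE Class_Id = 42 AND Roll_No = 'A101';

-- Cannot seek with that index, as the leading Class_Id column is unconstrained:
SELECT Student_Id FROM student_table
WHERE Roll_No = 'A101';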

The solution

But the good news is, SQL Server Management Studio gives you immediate tuning advice. Just right click on the execution plan and choose “Missing Index Details…” to get the following advice:

/*
Missing Index Details from SQLQuery1.sql -
    LUKAS-ENVY\SQLEXPRESS.test
The Query Processor estimates that implementing the 
    following index could improve the query cost 
    by 87.5035%.
*/

/*
USE [test]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[student_table] ([Roll_No])
INCLUDE ([Student_Id])
GO
*/

This doesn’t necessarily mean that the above index is the optimal choice for all your queries, but the fact that you’re querying with a predicate on Roll_No should be a strong enough indicator that you should have an index on at least this Roll_No column. The simplest possible index here is simply:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No);

With that index in place, we’ll now get an “Index Seek” operation, which runs instantly:

[Execution plan screenshot: plan-2]

Covering index

In this particular case, a “covering index” as suggested by Vlad Mihalcea in his answer might be appropriate. For instance:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No, Student_Id);

The advantage of a covering index is that all the information needed for the query execution is already contained in the index. This is true in this particular case, but it can also be dangerous:

  • The covering index needs more space, and if the table is already large, space might become an issue
  • The covering index only adds value as long as the query doesn’t also use additional columns (e.g. for projections, calculations, additional filtering, sorting, etc.). That’s not an issue in this simple example, but it might change quickly in the near future, as sketched below
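For instance, as soon as the projection needs a column that is not in the index, the engine has to look the remaining columns up in the table again, and the index no longer “covers” the query. A sketch:

-- The (Roll_No, Student_Id) index no longer covers this query, because
-- Student_First_Name has to be fetched from the table:
SELECT Student_Id, Student_First_Name
FROM student_table
WHERE Roll_No = 'A101';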

Thus, a covering index shouldn’t be your default choice in such cases. Better be conservative and add only those columns to the index that add immediate value for filtering.

Conclusion

I’d like to repeat this time and again:

Do NOT think that one second is fast

In fact:

Do NOT think that anything beyond 2-3ms is fast!

Unless you’re doing heavy reporting or batch processing, where processing time obviously might take a bit longer, simple queries like these should never be anything but instant. And most of the time, you can achieve this speed by adding an index.

On a side note

The poster of the aforementioned question obviously has other issues as well. None of the above explains the execution speed difference between Hibernate and “plain SQL”. But again, the message here is that if your “plain SQL” already takes more than one second, you have some very low-hanging fruit to pick.

When will we have LINQ in Java?

LINQ is one of the most distinctive language features of Microsoft’s .NET Framework. When it was first introduced into languages such as C#, it required heavy changes to the language specification. Yet, this addition was extremely powerful and probably unequalled by other languages / platforms, such as Java, Scala, etc. Granted, Scala has integrated XML into its language in a similar fashion from the beginning, but that is hardly the same accomplishment. Nowadays, Typesafe developers are developing SLICK – the Scala Language Integrated Connection Kit – which has similar ambitions, although the effort spent on it is hardly comparable: one “official” Scala developer against a big Microsoft team. Let alone the potential of getting into patent wars with Microsoft, should SLICK ever become popular.

What does Java have to offer?

There are many attempts at bringing LINQ-like APIs to the Java world, as the following Stack Overflow question shows:

http://stackoverflow.com/questions/1217228/what-is-the-java-equivalent-for-linq

Here’s another newcomer project by Julian Hyde that I’ve recently discovered:

https://github.com/julianhyde/linq4j

He tried his luck on the lambda-dev mailing list, without any response so far. We’re all eagerly awaiting Java 8 and Project Lambda with its lambda expressions and extension methods. But when will we be able to catch up with Microsoft’s LINQ? After all, jOOQ and linq4j are “internal domain specific languages”, which are all limited by the expressiveness of their host language (see my previous blog post about building domain specific languages in Java).

Java 9 maybe? We can only hope!

SQL in Scala, where jOOQ could go

I have recently blogged about how simple it is to integrate jOOQ into Scala. See the full blog post here:

https://lukaseder.wordpress.com/2011/12/11/the-ultimate-sql-dsl-jooq-in-scala/

I’m more and more thrilled by that option, as Scala is one of the fastest-emerging JVM languages nowadays. The plain integration of a Java library in Scala leaves some open questions, though. jOOQ has a “val()” method to create bind values. See the manual here:

http://www.jooq.org/manual/JOOQ/BindValues/

This method cannot be used in Scala, as Scala declares “val” a reserved word. I’ve had similar issues in Java before, when trying to use “case” or “else” in the API, which is not possible either. The path of least resistance is to overload or rename such methods. “val” was overloaded as “value” in jOOQ 2.0.1. “case” and “else” were renamed a long time ago as “decode” (from Oracle’s DECODE function) and “otherwise” (as in XSL).

So for a full-fledged integration with Scala, jOOQ should be wrapped in a new API called scOOQ ;-). This new API should take Scala’s language features into account in order to make working with jOOQ a lot easier. This could be the chance to re-engineer some of the API and make all API methods uppercase, as is usual in SQL. With Scala’s ability to omit syntax elements such as “.” and “()”, the API could then declare one-word methods, as in this Java example:

MERGE().INTO(MY_TABLE)
       .USING(SOURCE_TABLE)
       .ON(MY_TABLE.ID.equal(SOURCE_TABLE.ID))
       .WHEN().MATCHED().THEN().UPDATE()
         .SET(ID, 1)
         .SET(DATA, "Data")
       .WHEN().NOT().MATCHED().THEN().INSERT(MY_TABLE.ID, MY_TABLE.DATA)
       .VALUES(1, "Data")

While in Java this looks quite nasty and verbose, in Scala it could be very nice! The below statement should compile in Scala, if the API were declared accordingly:

MERGE INTO MY_TABLE
      USING (SOURCE_TABLE)
      ON (MY_TABLE.ID equal SOURCE_TABLE.ID)
      WHEN MATCHED THEN UPDATE
        SET (ID, 1)
        SET (DATA, "Data")
      WHEN NOT MATCHED THEN INSERT (MY_TABLE.ID, MY_TABLE.DATA)
      VALUES (1, "Data")

Convinced? Contributions very welcome! :-)