A DBA’s point of view of Hibernate


A very nice little rant on Hibernate:

http://jeffkemponoracle.com/2011/11/25/3-reasons-to-hate-hibernate

While I don’t agree 100% (e.g. CRUD/OLTP really is different from OLAP, and Hibernate is a strong CRUD tool), I certainly share most of his feelings.

Arcane magic with the SQL:2003 MERGE statement


Every now and then, we feel awkward about having to distinguish INSERT from UPDATE for any of the following reasons:

  • We have to issue at least two statements
  • We have to think about performance
  • We have to think about race conditions
  • We have to choose between [UPDATE; IF UPDATE_COUNT = 0 THEN INSERT] and [INSERT; IF EXCEPTION THEN UPDATE]
  • We have to do those statements once per updated / inserted record

All in all, this is a big source of errors and frustration, when it could’ve been so easy with the SQL MERGE statement!
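For illustration, this is what the [UPDATE; IF UPDATE_COUNT = 0 THEN INSERT] approach typically looks like with plain JDBC (a minimal sketch; table and column names are placeholders, and resource handling is omitted):

// Uses java.sql.Connection / java.sql.PreparedStatement.
// Try the UPDATE first, then INSERT if no row was affected.
// Note the race condition: another session may insert the same key
// between the two statements.
PreparedStatement update = connection.prepareStatement(
    "UPDATE some_table SET value = ? WHERE id = ?");
update.setString(1, value);
update.setLong(2, id);

if (update.executeUpdate() == 0) {
    PreparedStatement insert = connection.prepareStatement(
        "INSERT INTO some_table (id, value) VALUES (?, ?)");
    insert.setLong(1, id);
    insert.setString(2, value);
    insert.executeUpdate();
}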

A typical situation for MERGE

Among many other use-cases, the MERGE statement may come in handy when handling many-to-many relationships. Let’s say we have this schema:

CREATE TABLE documents (
  id NUMBER(7) NOT NULL,
  CONSTRAINT docu_id PRIMARY KEY (id)
);

CREATE TABLE persons (
  id NUMBER(7) NOT NULL,
  CONSTRAINT pers_id PRIMARY KEY (id)
);

CREATE TABLE document_person (
  docu_id NUMBER(7) NOT NULL,
  pers_id NUMBER(7) NOT NULL,
  flag NUMBER(1) NULL,

  CONSTRAINT docu_pers_pk PRIMARY KEY (docu_id, pers_id),
  CONSTRAINT docu_pers_fk_docu
    FOREIGN KEY (docu_id) REFERENCES documents(id),
  CONSTRAINT docu_pers_fk_pers
    FOREIGN KEY (pers_id) REFERENCES persons(id)
);

The above tables are used to model which person has read (flag=1) / deleted (flag=2) which document. To keep things simple, the “document_person” entity is usually OUTER JOINed to “documents”, such that an absent “document_person” record and a record with “flag IS NULL” have the same semantics: the document is unread.

Now when you want to mark a document as read, you have to decide whether to INSERT a new “document_person” record or to UPDATE the existing one. The same applies to deleting a document, to marking all documents as read, and to deleting all documents.

Use MERGE instead

You can do it all in one statement! Let’s say you want to INSERT/UPDATE one record in order to mark one document as read for a given person:

-- The target table
MERGE INTO document_person dst

-- The data source. In this case, just a dummy record
USING (
  SELECT :docu_id as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM DUAL
) src

-- The merge condition (if true, then update, else insert)
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The update action
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag

-- The insert action
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

This achieves the same as MySQL’s INSERT .. ON DUPLICATE KEY UPDATE statement, albeit in a considerably more verbose way.
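For reference, this is roughly what that MySQL-dialect upsert would look like when issued through plain JDBC (a sketch, assuming an equivalent MySQL version of the document_person table):

// MySQL's INSERT .. ON DUPLICATE KEY UPDATE, issued via plain JDBC.
// The upsert key is the primary key on (docu_id, pers_id).
PreparedStatement upsert = connection.prepareStatement(
    "INSERT INTO document_person (docu_id, pers_id, flag) " +
    "VALUES (?, ?, ?) " +
    "ON DUPLICATE KEY UPDATE flag = VALUES(flag)");
upsert.setLong(1, docuId);
upsert.setLong(2, persId);
upsert.setInt(3, flag);
upsert.executeUpdate();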

Taking it to the extreme

But you can go further! As mentioned previously, you may also want to mark ALL documents as read for a given person. No problem with MERGE. The following statement does the same as the previous one if you specify :docu_id. If you leave it NULL, it will set :flag on all of the person’s documents:

MERGE INTO document_person dst

-- The data source is now all "documents" (or just :docu_id) left outer
-- joined with the "document_person" mapping
USING (
  SELECT d.id     as docu_id,
         :pers_id as pers_id,
         :flag    as flag
  FROM documents d
  LEFT OUTER JOIN document_person d_p
  ON d.id = d_p.docu_id AND d_p.pers_id = :pers_id
  -- If :docu_id is set, select only that document
  WHERE (:docu_id IS NOT NULL AND d.id = :docu_id)
  -- Otherwise, select all documents
     OR (:docu_id IS NULL)
) src

-- If the mapping already exists, update. Else, insert
ON (dst.docu_id = src.docu_id AND dst.pers_id = src.pers_id)

-- The rest stays the same
WHEN MATCHED THEN UPDATE SET
  dst.flag = src.flag
WHEN NOT MATCHED THEN INSERT (
  dst.docu_id,
  dst.pers_id,
  dst.flag
)
VALUES (
  src.docu_id,
  src.pers_id,
  src.flag
)

MERGE support in jOOQ

MERGE is also fully supported in jOOQ. See the manual for more details (scroll to the bottom):

http://www.jooq.org/manual/JOOQ/Query/
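To give a rough idea, the first MERGE statement above might look like this with jOOQ’s DSL (a sketch only, assuming generated meta model classes and bind values docuId, persId and flag; the exact fluent API may differ slightly between jOOQ versions):

// Merge into DOCUMENT_PERSON, using DUAL as a dummy source
create.mergeInto(DOCUMENT_PERSON)
      .usingDual()
      .on(DOCUMENT_PERSON.DOCU_ID.equal(docuId)
          .and(DOCUMENT_PERSON.PERS_ID.equal(persId)))
      .whenMatchedThenUpdate()
      .set(DOCUMENT_PERSON.FLAG, flag)
      .whenNotMatchedThenInsert(
          DOCUMENT_PERSON.DOCU_ID,
          DOCUMENT_PERSON.PERS_ID,
          DOCUMENT_PERSON.FLAG)
      .values(docuId, persId, flag)
      .execute();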

Happy merging! 🙂

GROUP BY ROLLUP / CUBE


Every now and then, you come across a requirement that pushes you to the limits of your SQL skills. Many of us probably give up early and calculate things in Java (or whatever language we happen to use), when it might’ve been so easy and fast to do in SQL. If you’re working with an advanced database such as DB2, Oracle, SQL Server, or Sybase SQL Anywhere (and MySQL in this case, which supports the WITH ROLLUP clause), you can take advantage of the ROLLUP / CUBE / GROUPING SETS grouping functions.

Let’s have a look at my fictional salary progression compared to that of a fictional friend who has chosen a different career path (observe the salary boost in 2011):

select 'Lukas'      as employee,
       'SoftSkills' as company,
       80000        as salary,
       2007         as year
from dual
union all select 'Lukas', 'SoftSkills', 80000,  2008 from dual
union all select 'Lukas', 'SmartSoft',  90000,  2009 from dual
union all select 'Lukas', 'SmartSoft',  95000,  2010 from dual
union all select 'Lukas', 'jOOQ',       200000, 2011 from dual
union all select 'Lukas', 'jOOQ',       250000, 2012 from dual
union all select 'Tom',   'SoftSkills', 89000,  2007 from dual
union all select 'Tom',   'SoftSkills', 90000,  2008 from dual
union all select 'Tom',   'SoftSkills', 91000,  2009 from dual
union all select 'Tom',   'SmartSoft',  92000,  2010 from dual
union all select 'Tom',   'SmartSoft',  93000,  2011 from dual
union all select 'Tom',   'SmartSoft',  94000,  2012 from dual

Now we’re used to gathering statistics using simple grouping and simple aggregate functions. For instance, let’s calculate how much Lukas and Tom earned on average over the past few years:

with data as ([above select])
select employee, avg(salary)
from data
group by employee

This will show that Lukas has earned more:

+--------+-----------+
|EMPLOYEE|AVG(SALARY)|
+--------+-----------+
|Lukas   |     132500|
|Tom     |      91500|
+--------+-----------+

So it’s probably interesting to find out what they have earned on average at each company:

with data as (...)
select company, employee, avg(salary)
from data
group by company, employee
order by company, employee

And immediately, it becomes clear where the big bucks are and that Tom has made a bad decision 😉

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SmartSoft |Tom     |      93000|
|SoftSkills|Lukas   |      80000|
|SoftSkills|Tom     |      90000|
+----------+--------+-----------+

ROLLUP

By adding grouping fields, we “lose” some aggregation information. In the above examples, the overall average salary per employee is no longer available directly from the result. That’s obvious, considering the grouping algorithm. But in nice-looking reports, we often want to display those grouping headers as well. This is where ROLLUP, CUBE (and GROUPING SETS) come into play. Consider the following query:

with data as (...)
select company, employee, avg(salary)
from data
group by rollup(company), employee

The above rollup function will now add additional rows to the grouping result set, holding useful aggregated values. In this case, when we “roll up” the company column, we additionally get averages over the remaining grouping field, i.e. the average per employee:

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
+----------+--------+-----------+

Note how these additional rows hold the same information as the result of the first query, where we were grouping by employee only. This becomes even more interesting when we put more grouping fields into the rollup function:

with data as (...)
select company, employee, avg(salary)
from data
group by rollup(employee, company)

As you can see, the order of grouping fields is important in the rollup function. The result of this query now also contains the overall average salary paid to all employees in all companies:

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
|{null}    |{null}  |     112000|
+----------+--------+-----------+

In order to identify the totals rows for reporting, you can use the GROUPING() function in DB2, Oracle, SQL Server and Sybase SQL Anywhere. In Oracle and SQL Server, there’s the even more useful GROUPING_ID() function:

with data as (...)
select grouping_id(employee, company) id, company, employee, avg(salary)
from data
group by rollup(employee, company)

It documents at which “grouping level” of the rollup function the current row was produced: each argument contributes one bit (the leftmost argument being the most significant), and a bit is set whenever the corresponding column was rolled up, i.e. appears as NULL in the result:

+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Tom     |      91500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
|   1|{null}    |Lukas   |     132500|
|   3|{null}    |{null}  |     112000|
+----+----------+--------+-----------+

CUBE

The cube function works similarly, except that the order of the cube grouping fields becomes irrelevant, as all combinations of groupings are formed. This is a bit tricky to put in words, so let’s see it in action:

with data as (...)
select grouping_id(employee, company) id, company, employee, avg(salary)
from data
group by cube(employee, company)

In the following result, you will get:

  • GROUPING_ID() = 0: Average per company and employee. This is the normal grouping result
  • GROUPING_ID() = 1: Average per employee
  • GROUPING_ID() = 2: Average per company
  • GROUPING_ID() = 3: Overall average

+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   3|{null}    |{null}  |     112000|
|   2|jOOQ      |{null}  |     225000|
|   2|SmartSoft |{null}  |      92800|
|   2|SoftSkills|{null}  |      86000|
|   1|{null}    |Tom     |      91500|
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Lukas   |     132500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
+----+----------+--------+-----------+

In other words, using the CUBE() function, you will get grouping results for every possible combination of the supplied grouping fields, which results in 2^n distinct GROUPING_ID() values for n “cubed” grouping fields.

Support in jOOQ

jOOQ 2.0 introduces support for these functions. If you want to translate the last select into jOOQ, you’d roughly get this Java code:

// assuming that DATA is an actual table...
create.select(
         groupingId(DATA.EMPLOYEE, DATA.COMPANY).as("id"),
         DATA.COMPANY, DATA.EMPLOYEE, avg(DATA.SALARY))
      .from(DATA)
      .groupBy(cube(DATA.EMPLOYEE, DATA.COMPANY));
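
The earlier ROLLUP query could be written in much the same way (again only a sketch, based on the same assumed DATA table):

// The same query, but with ROLLUP instead of CUBE, producing the
// per-employee subtotals and the grand total from the earlier examples
create.select(
         groupingId(DATA.EMPLOYEE, DATA.COMPANY).as("id"),
         DATA.COMPANY, DATA.EMPLOYEE, avg(DATA.SALARY))
      .from(DATA)
      .groupBy(rollup(DATA.EMPLOYEE, DATA.COMPANY));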

With this powerful tool, you’re ready for all of those fancy reports and data overviews. For more details, read about the ROLLUP(), CUBE(), and GROUPING SETS() functions in the SQL Server documentation, which explains them quite nicely:

http://msdn.microsoft.com/en-us/library/bb522495.aspx

Next stop on the Annotatiomania™ Train: FetchGroups


Here’s how an Annotatiomaniac™ can conveniently optimise his second-level cached JPA queries by adding yet more annotations (what else?) to his simple and plain POJOs. Remember that EJB 3.0 was about removing dependencies on javax.ejb.EJBObject, javax.ejb.EJBHome and similar interfaces? Well… check out this example:

import org.apache.openjpa.persistence.*;

@Entity
@FetchGroups({
    @FetchGroup(name="detail", attributes={
        @FetchAttribute(name="publisher"),
        @FetchAttribute(name="articles")
    }),
    ...
})
public class Magazine {

   @ManyToOne(fetch=FetchType.LAZY)
   @LoadFetchGroup("detail")
   private Publisher publisher;

   ...
}

Straightforward. But what does it do? Here’s the definition:

Fetch groups are sets of fields that load together. They can be used to pool together associated fields in order to provide performance improvements over standard data fetching. Specifying fetch groups allows for tuning of lazy loading and eager fetching behavior.

I see. I can hear your DBA screaming at the indecipherable, out-of-control SQL that must result from all of that. Or can anyone show me an example where this does NOT get out of control? Here are Fetch Groups as documented by Oracle Fusion Middleware, a tool to simplify your life when working with JPA / JDO:

http://docs.oracle.com/cd/E16764_01/apirefs.1111/e13946/ref_guide_fetch.html

</rant>

Join the “dark side” with jOOQ


jOOQ is now also associated with the “dark side” and “SciFi”:

http://blog.pdark.de/2011/11/23/peace-between-java-and-sql/

So, join the dark side now 🙂

Java’s missing unsigned integer types


This is a topic that has been discussed many times before: Java’s lack of unsigned byte/short/int/long types. The main reasons why the Java language designers omitted those types were:

  1. They’re hardly ever really useful
  2. They’re a bit more difficult to implement
  3. They’re a bit more difficult to understand
  4. They would lead to more primitive types that have to be treated separately from the existing ones
  5. … and probably, there are more reasons

Nevertheless, these types are sometimes useful for cryptography, image processing, binary protocols, and everything else related to binary data (why is byte signed, after all??), and the list of rants in the following ticket at Sun/Oracle is long:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4504839
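
For illustration, this is the usual workaround in plain Java: widening to the next larger signed type and masking by hand:

// Treating a byte / an int as unsigned requires manual masking:
byte b = (byte) 0xF0;               // -16 as a signed byte
int unsignedByte = b & 0xFF;        // 240, the intended unsigned value

int i = 0xFFFFFFF0;                 // -16 as a signed int
long unsignedInt = i & 0xFFFFFFFFL; // 4294967280, the intended unsigned value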

For jOOQ, unsigned number types would be useful, as some databases support them (e.g. MySQL, Postgres), and they’re not necessarily trivial to map to Java. So I was looking for a good solution, the best one being wrapper classes extending java.lang.Number. I have raised the question on Stack Overflow to find such a library:

http://stackoverflow.com/questions/8193031/is-there-a-java-library-for-unsigned-number-type-wrappers

Incredibly, no one seems to have done this, except for some partial implementations in some large libraries. So I’m launching a new OSS project called jOOU (the U is for Unsigned). Check out this small library for Java unsigned number wrappers:

http://code.google.com/p/joou/