This Common API Technique is Actually an Anti-Pattern

I admit, we’ve been lured into using this technique as well. It’s just so convenient, as it allows you to avoid a seemingly unnecessary cast. It’s the following technique:

interface SomeWrapper {
  <T> T get();
}

Now you can “type safely” assign anything from the wrapper to any reference type:

SomeWrapper wrapper = ...

// Obviously
Object a = wrapper.get();

// Well...
Number b = wrapper.get();

// Risky
String[][] c = wrapper.get();

// Improbable
javax.persistence.SqlResultSetMapping d = 
    wrapper.get();
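How can a single method keep all of these promises? It can’t. Under the hood, such a method can only be implemented with an unchecked cast. Here’s a minimal, hypothetical sketch (not jOOR’s actual implementation) of such a wrapper, showing where things blow up:

public class UnsafeGet {

    interface SomeWrapper {
        <T> T get();
    }

    static class ObjectWrapper implements SomeWrapper {
        private final Object value;

        ObjectWrapper(Object value) {
            this.value = value;
        }

        @SuppressWarnings("unchecked")
        @Override
        public <T> T get() {

            // T is erased, so this cast always "succeeds" here...
            return (T) value;
        }
    }

    public static void main(String[] args) {
        SomeWrapper wrapper = new ObjectWrapper("I am a String");

        // Compiles and works:
        String fine = wrapper.get();

        // Compiles, too - and blows up with a ClassCastException
        // at the call site:
        Number boom = wrapper.get();
    }
}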

This is actually the API exposed by jOOR, our reflection library, which we’ve written and open-sourced to improve our integration tests. With jOOR, you can write things like:

Employee[] employees = on(department)
    .call("getEmployees").get();
 
for (Employee employee : employees) {
    Street street = on(employee)
        .call("getAddress")
        .call("getStreet")
        .get();
    System.out.println(street);
}

The API is rather simple. The on() method wraps an Object or a Class. The call() methods then call a method on that object using reflection (but without requiring exact signatures, without requiring the method to be public, and without throwing any checked exceptions). And without the need for casting, you can then call get() to assign the result to any arbitrary reference type.

This is probably OK with a reflection library like jOOR, because the whole library is not really type safe. It can’t be, because it’s reflection.

But the “dirty” feeling remains: the feeling of giving the call site a promise with respect to the resulting type – a promise that cannot be kept and that will result in a ClassCastException, a thing of the past that junior developers who started after Java 5 and generics hardly know anymore.

But the JDK libraries also do that…

Yes, they do. But very seldom, and only if the generic type parameter is really irrelevant. For instance, when getting Collections.emptyList(), whose implementation looks like this:

@SuppressWarnings("unchecked")
public static final <T> List<T> emptyList() {
    return (List<T>) EMPTY_LIST;
}

It’s true that the EMPTY_LIST is cast unsafely from List to List<T>, but from a semantic perspective, this is a safe cast. You cannot modify this List reference, and because it’s empty, there is no method in List<T> that will ever give you an instance of T or T[] that does not correspond to your target type. So, all of these are valid:

// perfectly fine
List<?> a = emptyList();

// yep
List<Object> b = emptyList();

// alright
List<Number> c = emptyList();

// no problem
List<String[][]> d = emptyList();

// if you must
List<javax.persistence.SqlResultSetMapping> e 
    = emptyList();
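And because the returned list is also immutable, the shared EMPTY_LIST instance can never leak an element of the wrong type. A quick sketch of what happens if you try anyway:

import java.util.Collections;
import java.util.List;

public class EmptyListDemo {
    public static void main(String[] args) {
        List<String> strings = Collections.emptyList();

        // There is no T in this list, and you cannot put one in either:
        strings.add("boom"); // throws UnsupportedOperationException
    }
}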

So, as always (or mostly), the JDK library designers have taken great care not to make any false promises about the generic type that you might get. This means that you will often get an Object type where you know that another type would be more suitable.

But even if YOU know this, the compiler won’t. Erasure comes at a price, and the price is paid when your wrapper or collection is empty. There is no way of knowing the contained type of such an expression, so don’t pretend you do. In other words:

Do not use the just-to-avoid-casting generic method anti-pattern
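If you really need such a generic accessor, a less deceitful alternative (a hedged sketch, not part of jOOR’s API) is to let the caller pass an explicit type token, so the promise can be checked right at the wrapper boundary:

interface SaferWrapper {
    <T> T get(Class<T> type);
}

class SaferObjectWrapper implements SaferWrapper {
    private final Object value;

    SaferObjectWrapper(Object value) {
        this.value = value;
    }

    @Override
    public <T> T get(Class<T> type) {

        // Class.cast() fails fast with a clear ClassCastException
        // at the wrapper boundary, not at some distant call site
        return type.cast(value);
    }
}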

Do Not Think That One Second is Fast for Query Execution

I keep encountering situations where RDBMS users think that one second for query execution is anything near fast. Most recently, in this Stack Overflow question:

Hibernate SQL In clause making CPU usage to 100%

The poster’s original question was why a query that executes in one second in SQL Server Management Studio takes 60 seconds when the (seemingly) same query is executed from Hibernate. The query looks similar to this:

select Student_Id 
from Student_Table 
where Roll_No in ('A101','A102','A103',.....'A250');

There might be many reasons for this difference. But the most striking message here is:

Please do not believe that 1 second is fast.

Databases are incredibly fast, and simple queries like the one above should execute in virtually no time, even on mediocre servers. Even on your laptop! Markus Winand has made it a point to tell you that in 80% of all performance issues, all you have to do is add that missing index. And that’s the case here as well!

The poster’s original table contains only two indexes:

CREATE TABLE student_table (
       Student_Id BIGINT NOT NULL IDENTITY
     , Class_Id BIGINT NOT NULL
     , Student_First_Name VARCHAR(100) NOT NULL
     , Student_Last_Name VARCHAR(100)
     , Roll_No VARCHAR(100) NOT NULL
     , PRIMARY KEY (Student_Id)
     , CONSTRAINT UK_StudentUnique_1 
         UNIQUE  (Class_Id, Roll_No)
);

There is an index to implement the PRIMARY KEY, and there’s another index for the UNIQUE constraint, but neither index is really useful here, as the query predicate filters on Roll_No, which is only the second column of the UNIQUE constraint. When executing the above query on roughly 8 million rows, I’m getting an index scan on the UNIQUE index, and the query runs in three seconds:

[Execution plan showing an Index Scan on the UNIQUE index]

This “Index Scan” operation is not good at all. I’m actually running through the entire index to find all the applicable Roll_No values in each index leaf node. This is explained well on Use The Index, Luke’s page about concatenated indexes.

The solution

But the good news is, SQL Server Management Studio gives you immediate tuning advice. Just right-click on the execution plan and choose “Missing Index Details…” to get the following advice:

/*
Missing Index Details from SQLQuery1.sql -
    LUKAS-ENVY\SQLEXPRESS.test
The Query Processor estimates that implementing the 
    following index could improve the query cost 
    by 87.5035%.
*/

/*
USE [test]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[student_table] ([Roll_No])
INCLUDE ([Student_Id])
GO
*/

This doesn’t necessarily mean that the above index is the optimal choice for all your queries, but the fact that you’re querying with a predicate on Roll_No should be a strong enough indicator that you should have an index on at least this Roll_No column. The simplest possible index here is:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No);

With that index in place, we’ll now get an “Index Seek” operation, which runs instantly:

[Execution plan showing an Index Seek]

Covering index

In this particular case, a “covering index” as suggested by Vlad Mihalcea in his answer might be appropriate. For instance:

CREATE INDEX i_student_roll_no
ON student_table (Roll_No, Student_Id);

The advantage of a covering index is that all the information needed for the query execution is already contained in the index. This is true in this particular case, but it can also be dangerous as:

  • The covering index needs more space, and if the table is already large, space might become an issue
  • The covering index only adds value as long as the query doesn’t also use additional columns (e.g. for projections, calculations, additional filtering, sorting, etc.). That is not the case in this simple example, but it might change quickly in the near future

Thus, a covering index shouldn’t be your default choice in such cases. Better be conservative and add only those columns to the index that add immediate value for filtering.

Conclusion

I’d like to repeat this time and again:

Do NOT think that one second is fast

In fact:

Do NOT think that anything beyond 2-3ms is fast!

Unless you’re doing heavy reporting or batch processing, where processing times may obviously be longer, simple queries like these should never be anything but instant. And most of the time, you can achieve this speed by adding an index.

On a side-note

The poster of the aforementioned question obviously has other issues as well. None of the above explains the execution speed difference between Hibernate and “plain SQL”. But again, the message here is that if your “plain SQL” already takes more than one second, you have a very low-hanging fruit to fix.

Let’s Review How to Insert Clob or Blob via JDBC

LOBs are a PITA in all databases, as well as in JDBC. Handling them correctly takes a couple of lines of code, and you can be sure that you’ll get it wrong eventually. Because you have to think of a couple of things:

  • Foremost, LOBs are heavy resources that need special lifecycle management. Once you’ve allocated a LOB, you better “free” it as well to decrease the pressure on your GC. This article shows more about why you need to free LOBs
  • The time when you allocate and free a LOB is crucial. It might have a longer life span than any of your ResultSet, PreparedStatement, or Connection / transaction. Each database manages such life spans individually, and you might have to read up on the specifications in edge cases
  • While you may use String instead of Clob, or byte[] instead of Blob for small to medium-sized LOBs, this may not always work, and may even lead to some nasty errors, like Oracle’s dreaded ORA-01461: can bind a LONG value only for insert into a LONG column

So, if you’re working on a low level using JDBC (instead of abstracting JDBC via Hibernate or jOOQ), you better write a small utility that takes care of proper LOB handling.

We’ve recently re-discovered our own utility that we’re using for jOOQ integration testing, at least in some databases, and thought this might be very useful to a couple of our readers who operate directly with JDBC. Consider the following class:

public class LOB implements AutoCloseable {

    private final Connection connection;
    private final List<Blob> blobs;
    private final List<Clob> clobs;

    public LOB(Connection connection) {
        this.connection = connection;
        this.blobs = new ArrayList<>();
        this.clobs = new ArrayList<>();
    }

    public final Blob blob(byte[] bytes) 
    throws SQLException {
        Blob blob;

        // You may write more robust dialect 
        // detection here
        if (connection.getMetaData()
                      .getDatabaseProductName()
                      .toLowerCase()
                      .contains("oracle")) {
            blob = BLOB.createTemporary(connection, 
                       false, BLOB.DURATION_SESSION);
        }
        else {
            blob = connection.createBlob();
        }

        blob.setBytes(1, bytes);
        blobs.add(blob);
        return blob;
    }

    public final Clob clob(String string) 
    throws SQLException {
        Clob clob;

        if (connection.getMetaData()
                      .getDatabaseProductName()
                      .toLowerCase()
                      .contains("oracle")) {
            clob = CLOB.createTemporary(connection, 
                       false, CLOB.DURATION_SESSION);
        }
        else {
            clob = connection.createClob();
        }

        clob.setString(1, string);
        clobs.add(clob);
        return clob;
    }

    @Override
    public final void close() throws Exception {
        blobs.forEach(JDBCUtils::safeFree);
        clobs.forEach(JDBCUtils::safeFree);
    }
}

This simple class has some nice traits:

  • It’s AutoCloseable, so you can free your lobs with the try-with-resources statement
  • It abstracts over the creation of LOBs across SQL dialects. No need to remember the Oracle way

To use this class, simply write something like the following:

try (
    LOB lob = new LOB(connection);
    PreparedStatement stmt = connection.prepareStatement(
        "insert into lobs (id, lob) values (?, ?)")
) {
    stmt.setInt(1, 1);
    stmt.setClob(2, lob.clob("abc"));
    stmt.executeUpdate();
}

That’s it! No need to keep references to the LOB, safely freeing it if it’s not null, correctly recovering from exceptions, etc. Just put the LOB container in the try-with-resources statement, along with the PreparedStatement, and you’re done.
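The same works for BLOBs, of course. A quick sketch, assuming the same hypothetical lobs table as above, plus an import of java.nio.charset.StandardCharsets:

try (
    LOB lob = new LOB(connection);
    PreparedStatement stmt = connection.prepareStatement(
        "insert into lobs (id, lob) values (?, ?)")
) {
    stmt.setInt(1, 2);
    stmt.setBlob(2, lob.blob(
        "abc".getBytes(StandardCharsets.UTF_8)));
    stmt.executeUpdate();
}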

If you’re interested in why you have to call Clob.free() or Blob.free() in the first place, read our article about it. It’ll spare you one or two OutOfMemoryErrors.

Is Your Eclipse Running a Bit Slow? Just Use This Simple Trick!

You wouldn’t believe it until you try it yourself. I’ve been using the Eclipse Mars developer milestones lately, and I’ve been having some issues with slow compilation. I always thought it was because of the m2e integration, which has never been famous for working perfectly. But then, it dawned upon me when I added a JPA persistence.xml file to run some jOOQ + Hibernate tests… I ran into this issue, and googled it to learn that many people are complaining about JPA validation running forever in their Eclipses.

So I searched for how to deactivate that, and boom!

All of my Eclipse got much, much faster

In fact, I didn’t just deactivate JPA validation, but all validation:

[Screenshot: deactivate all validation in your Eclipse to boost performance]

I don’t remember the last time I ever needed validation, or thought that it was a useful feature in the first place. If you want to help your whole team, you can also check in the following .settings/org.eclipse.wst.validation.prefs file in each of your projects:

DELEGATES_PREFERENCE=delegateValidatorList
USER_BUILD_PREFERENCE=enabledBuildValidatorListorg.eclipse.wst.wsi.ui.internal.WSIMessageValidator;
USER_MANUAL_PREFERENCE=enabledManualValidatorListorg.eclipse.wst.wsi.ui.internal.WSIMessageValidator;
USER_PREFERENCE=overrideGlobalPreferencestruedisableAllValidationtrueversion1.2.600.v201501211647
eclipse.preferences.version=1
override=true
suspend=true
vf.version=3

This has the same effect, but can be checked into version control.

Found this tip useful? See also our list of Top 5 Useful Hidden Eclipse Features.

How JPA 2.1 has become the new EJB 2.0

Beauty lies in the eye of the beholder. So does “ease”.

Thorben writes very good and useful articles about JPA, and he’s recently started an excellent series about JPA 2.1’s new features. Among them: result set mapping. You may know result set mapping from websites like CTMMC or annotatiomania.com. We can summarise this mapping procedure as follows:

a) define the mapping

@SqlResultSetMapping(
    name = "BookAuthorMapping",
    entities = {
        @EntityResult(
            entityClass = Book.class,
            fields = {
                @FieldResult(name = "id", column = "id"),
                @FieldResult(name = "title", column = "title"),
                @FieldResult(name = "author", column = "author_id"),
                @FieldResult(name = "version", column = "version")}),
        @EntityResult(
            entityClass = Author.class,
            fields = {
                @FieldResult(name = "id", column = "authorId"),
                @FieldResult(name = "firstName", column = "firstName"),
                @FieldResult(name = "lastName", column = "lastName"),
                @FieldResult(name = "version", column = "authorVersion")})})

The above mapping is rather straightforward. It specifies how database columns should be mapped to entity fields and to entities as a whole. You then give this mapping a name ("BookAuthorMapping"), which you can reuse across your application, e.g. with native JPA queries.

I specifically like the fact that Thorben then writes:

If you don’t like to add such a huge block of annotations to your entity, you can also define the mapping in an XML file

… So, we’re back to replacing huge blocks of annotations with huge blocks of XML – the very thing many of us had hoped to avoid by using annotations… 🙂

b) apply the mapping

Once the mapping has been statically defined on some Java type, you can then fetch those entities by applying the above BookAuthorMapping:

List<Object[]> results = this.em.createNativeQuery(
    "SELECT b.id, b.title, b.author_id, b.version, " +
    "       a.id as authorId, a.firstName, a.lastName, " + 
    "       a.version as authorVersion " + 
    "FROM Book b " +
    "JOIN Author a ON b.author_id = a.id", 
    "BookAuthorMapping"
).getResultList();

results.stream().forEach((record) -> {
    Book book = (Book)record[0];
    Author author = (Author)record[1];
});

Notice how you still have to remember the Book and Author types and cast explicitly, as no verifiable type information is really attached to anything.

The definition of “complex”

Now, the article claims that this is “complex” mapping, and no doubt, I would agree. This very simple query with only a simple join already triggers such an annotation mess if you really want to map your entities via JPA. You don’t want to see Thorben’s mapping annotations once the queries get a little more complex. And remember, @SqlResultSetMapping is about mapping (native!) SQL results, so we’re no longer in object-graph-persistence land; we’re in SQL land, where bulk fetching, denormalising, aggregating, and other “fancy” SQL stuff is king.

The problem is here:

Java 5 introduced annotations. Annotations were originally intended to be used as “artificial modifiers”, i.e. things like static, final, protected (interestingly enough, Ceylon only knows annotations, no modifiers). This makes sense. Java language designers could introduce new modifiers / “keywords” without breaking existing code – because “real” keywords are reserved words, which are hard to introduce in a language. Remember enum?

So, good use cases for annotations (and there are only a few) are:

  • @Override
  • @Deprecated (although, a comment attribute would’ve been fancy)
  • @FunctionalInterface

JPA (and other Java EE APIs, as well as Spring) has gone completely wacko in its use of annotations. Repeat after me:

No language @Before or @After Java ever abused annotations as much as Java

(the @Before / @After idea was lennoff’s, on reddit)

I get a strong sense of déjà vu when reading the above. Do you remember the following?

No language before or after Java ever abused checked exceptions as much as Java

We will all deeply regret Java annotations by 2020.

Annotations are a big wart in the Java type system. They have an extremely limited number of justified uses, and what we Java Enterprise developers are doing these days is absolutely not within the limits of “justified”. We’re abusing them as configuration for things that we should really be writing code for.

Here’s how you’d run the same query with jOOQ (or any other API that leverages generics and type safety for SQL):

Book b = BOOK.as("b");
Author a = AUTHOR.as("a");

DSL.using(configuration)
   .select(b.ID, b.TITLE, b.AUTHOR_ID, b.VERSION,
           a.ID, a.FIRST_NAME, a.LAST_NAME,
           a.VERSION)
   .from(b)
   .join(a).on(b.AUTHOR_ID.eq(a.ID))
   .fetch()
   .forEach(record -> {
       BookRecord book = record.into(b);
       AuthorRecord author = record.into(a);
   });

This example combines both JPA 2.1’s annotations AND querying. All the meta information about projected “entities” is already contained in the query and thus in the Result that is produced by the fetch() method. But it doesn’t really matter; the point here is that this lambda expression …

record -> {
    BookRecord book = record.into(b);
    AuthorRecord author = record.into(a);
}

… can be anything you want! Like the more sophisticated examples we’ve shown in previous blog posts.

Mapping can be defined ad hoc, on the fly, using functions. Functions are the ideal mappers, because they take an input, produce an output, and are completely stateless. And the best thing about functions in Java 8 is that they’re compiled by the Java compiler and can be used to type-check your mapping. And you can assign functions to variables, which allows you to reuse a function whenever a given mapping algorithm can be applied several times.

In fact, the SQL SELECT clause itself is such a function. A function that transforms input tuples / rows into output tuples / rows, and you can adapt that function on the fly using additional expressions.
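To make that point concrete, here’s a hedged sketch of such a reusable mapping function, using jOOQ’s RecordMapper type. The Book POJO, its constructor, and the column types are made up for illustration:

// A reusable, stateless mapping function from a typed record to a POJO
RecordMapper<Record2<Integer, String>, Book> toBook =
    r -> new Book(r.value1(), r.value2());

List<Book> books =
DSL.using(configuration)
   .select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .fetch(toBook);

Because toBook is just an object, you can pass it to any query that projects the same two columns, and the compiler will verify the record type for you.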

There is absolutely no way to type-check anything in the previous JPA 2.1 native SQL statement and @SqlResultSetMapping example. Imagine changing a column name:

List<Object[]> results = this.em.createNativeQuery(
    "SELECT b.id, b.title as book_title, " +
    "       b.author_id, b.version, " +
    "       a.id as authorId, a.firstName, a.lastName, " + 
    "       a.version as authorVersion " + 
    "FROM Book b " +
    "JOIN Author a ON b.author_id = a.id", 
    "BookAuthorMapping"
).getResultList();

Did you notice the difference? The b.title column was renamed to book_title. In a SQL string. Which blows up at run time! How would you remember that you also have to adapt

@FieldResult(name = "title", column = "title")

… to be

@FieldResult(name = "title", column = "book_title")

Conversely, how would you remember that once you rename the column in your @FieldResult, you’ll also have to check wherever this "BookAuthorMapping" is used, and change the column names in those queries accordingly.

@SqlResultSetMapping(
    name = "BookAuthorMapping",
    ...
)

Annotations are evil

You may or may not agree with some of the above. You may or may not like jOOQ as an alternative to JPA – that’s perfectly fine. But it is really hard to disagree with the fact that:

  • Java 5 introduced very useful annotations
  • Java EE / Spring heavily abused those annotations to replace XML
  • We now have a parallel universe type system in Java
  • This parallel universe type system is completely useless because the compiler cannot introspect it
  • Java SE 8 introduces functional programming and lots of type inference
  • Java SE 9-10 will introduce more awesome language features
  • It now becomes clear that what was configuration (XML or annotations) should have been code in the first place
  • JPA 2.1 has become the new EJB 2.0: Obsolete

As I said. Hard to disagree. Or in other words:

Code is much better at expressing algorithms than configuration

I’ve met Thorben personally on a number of occasions at conferences. This rant here wasn’t meant personally, Thorben 🙂 Your articles about JPA are very interesting. If you, the readers of this article, are using JPA, please check out Thorben’s blog: http://www.thoughts-on-java.org.

In the meantime, I would love to nominate Thorben for the respected title of “The Annotatiomaniac of the Year 2015”.

jOOQ Tuesdays: Vlad Mihalcea Gives Deep Insight into SQL and Hibernate

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

[Photo: Vlad Mihalcea]

In this third edition, we have the pleasure of talking to Vlad Mihalcea, who will be telling us about the skills developers need to acquire when working with Java, SQL, and Hibernate.

Hi Vlad – Your blog explodes with excellent posts about Hibernate. It looks like you love digging deep into the most popular persistence API in the market, right?

I really mean it when I say that “teaching is my way of learning”, and to master a certain technology, you have to go beyond the reference documentation. Hibernate has been around for 10 years now, and there’s a plethora of projects built on top of it. The Hibernate Master Class focuses on some proven ORM design patterns, like concurrency control, caching, and batching.

You’ve recently told me about your realisation of the lack of SQL insight in our industry. How did that come to be?

The Object-Relational mismatch is only the tip of the iceberg when it comes to accessing data. The biggest problem we face in enterprise systems is the Enterprise-Database developer mismatch.

A developer knows about programming languages, design patterns, and application architecture, but database skills are always attributed to the Database Administrator role. This is a very dangerous assumption.

It’s as if we developed on Linux without ever wanting to learn how the operating system works, relying solely on the System Administrator’s knowledge. If you develop enterprise applications, you have no escape from learning how a database works. Reading the excellent “SQL Performance Explained” book made me realize how little I knew about the inner workings of relational database systems. This book is meant for developers, and it’s a must-read for every enterprise development professional.

What can we do to improve the situation for our industry? Is there a chance for a tighter integration of JPA and SQL? Or specifically, of Hibernate and jOOQ?

First, it’s the mindset that needs to change. We need to acknowledge that there’s no such thing as a one-size-fits-all framework, and that applies to database access as well. When I write unit tests, I don’t limit myself to JUnit; I also use Mockito and Hamcrest – a whole testing stack is a better alternative than any single tool.

JPA excels when writing data, because the INSERT/UPDATE statements are automatically adjusted whenever the persistence model changes. The implicit and explicit locking allows us to protect against lost updates, especially in long conversation workflows.

But while abstracting the SQL write statements is a doable task, when it comes to reading data, nothing can beat native SQL. The most commonly used RDBMS have implemented non-standard data access techniques (window functions, Common Table Expressions, PIVOT), and the SQL-92 JPA abstraction layer can only focus on common functionality. That’s why native querying is unavoidable in almost any enterprise system.

jOOQ has done a very good job promoting SQL knowledge in the Java ecosystem. Java rules enterprise software development, and SQL skills have always been the Achilles’ heel of most enterprise development teams.

While you can fire native queries from JPA, there’s no support for dynamic native query building. jOOQ allows you to build type-safe dynamic native queries, strengthening your application against SQL injection attacks. jOOQ can be integrated with JPA, as I have already proven on my blog, and the JPA-jOOQ combo can provide a solid data access stack.

Tell us a little bit about your Hibernate Master Class, and your personal blogging strategy.

The Hibernate Master Class blog series is actually a book in the making. Because I work a full-time job, it’s difficult to commit to a fixed writing schedule, so I can only write as much as my spare time allows.

Once all topics are covered, I’ll turn all this info into a book that I’m going to self-publish, following the “SQL Performance Explained” example.

[ Edit ] The book has been finished and is available here:

https://leanpub.com/high-performance-java-persistence

Where will you be in 5 years?

I enjoy software architecture as well as writing about it. I will continue on this journey and see where the wind carries me.

It’s the Little Things: The PL/SQL NULL Statement, and why Every Language Should have One

Syntax is one of those topics. One of those emotional topics that lead to very, very, very important discussions. I personally like PL/SQL. It is extremely verbose and precise. It forces you to adhere to a very strong and rigid type system, slowing you down, which is likely to help you avoid mistakes.

There is one thing in PL/SQL that I like in particular. There is no such thing as an empty block.

While in Java, you could write:

// Just an empty block:
{}

// An empty block with a label:
label1: {}

// Or, in fact, the empty statement:
;
label2: ;

The problem with the above from a mere syntactic perspective is that an empty block may have been left unintentionally empty. An empty statement may not even be visible at all. Consider this in the context of an if statement:

if (something) {

}

if (somethingElse) ;
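That second form is a classic trap. A small sketch of how the stray semicolon silently changes behaviour:

boolean somethingElse = false;

if (somethingElse);            // the semicolon ends the if statement
{
    // This looks like the if body, but it is just a free-standing
    // block that runs unconditionally
    System.out.println("Oops, this always prints");
}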

In PL/SQL, this isn’t possible. There is no such thing as an empty block. The following doesn’t compile:

BEGIN
END;

Nope:

ORA-06550: line 2, column 1:
PLS-00103: Encountered the symbol "END" when expecting ...

Or, take the IF statement. No emptiness allowed here, either:

IF 1 = 1 THEN
END IF;

Doesn’t work.

If you DO want to create empty blocks (e.g. as placeholders for code you’ll write later on), you’ll have to explicitly put a dummy statement there. PL/SQL has such a statement. The NULL statement:

BEGIN
  NULL;
END;

IF 1 = 1 THEN
  NULL;
END IF;

That makes sense. You immediately see: OK, nothing going on here.

Conclusion

Verbosity helps decrease the number of bugs in your code. The less people have to be concerned with syntax (see again, the discussion about very, very, very important topics), the more they can focus on what they really intend to write. Let’s swallow our pride. When we get used to a language, we’ll accept ANY language. They’re all flawed and quirky. It doesn’t matter. But at least the language should keep us from arguing about different ways to style it.

Now, let me go reformat all that moron’s lower-case PL/SQL code. Who the hell would write lower-case begin and end!??