Strategy: Stop Using Linked-Lists


While using java.util.LinkedHashMap every now and then, when I feel that the insertion order is relevant to subsequent entrySet iterations, I do not recall having used a LinkedList any time, recently. Of course, I understand its purpose and since Java 6, I apreciate the notion of a Deque type. But the LinkedList implementation of the List type hasn’t occurred to be useful to me very often.

Now, here’s an interesting summary about why linked lists can be very bad for your performance:
http://highscalability.com/blog/2013/5/22/strategy-stop-using-linked-lists.html

This summary is referring another, original article:
http://www.futurechips.org/thoughts-for-researchers/quick-post-linked-lists.html#more-818

While the “academic” advantage of a linked list is obvious to everyone (big-O advantage of insertion, removal operations), “real life” disadvantages related to hardware, memory, heap, quickly turn against you when using linked lists. What holds true in the C universe is probably not so wrong in the Java universe as well…


Pinterest and SQL vs. NoSQL


I’ve recently discovered a very interesting read about Pinterest‘s architecture experimentation. One of the key messages is the fact that SQL and NoSQL data storage systems can coexist with each of them having their place. Here’s the full article:

http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html

This reminds me of a previous article about Instagram successfully displaying how they implemented large-scale sharding on Postgres:

http://blog.jooq.org/2011/12/10/a-success-story-of-sql-scaling-horizontally/


Subtle Changes in Java 8: Repeatable Annotations


Apart from the “big stuff”, related to extension methods, lambda, and the streams API, Java 8 also has a couple of minor, very subtle changes. One of them is the fact that you can now annotate an object several times with the same annotation!

An example taken from the tutorial:

@Alert(role="Manager")
@Alert(role="Administrator")
public class UnauthorizedAccessException { ... }

For this to work, your @Alert annotation has to be meta-annotated with java.lang.annotation.Repeatable.

Tools, which heavily rely on annotation processing may need to review their code to get ready for Java 8. Existing methods, such as AnnotatedElement.getAnnotations(), have not been retrofitted to return repeatable annotations. Instead, other, new methods, such as AnnotatedElement.getAnnotationsByType(Class) and AnnotatedElement.getDeclaredAnnotationsByType(Class) have been added to the JDK, in order to be able to discover repeated annotations on any element.

More authoritative information can be found here:


SQL Query Transformation Fun: Predicates with Row Value Expressions


Recently, I’ve blogged about how well jOOQ’s supported databases implement row value expressions and predicates formed from them. Some sample articles:

Row value expressions (or records, tuples) are useful to express more complex predicates, such as this one:

SELECT *
FROM t
WHERE (t.t1, t.t2) IN (
    SELECT u.u1, u.u2 FROM u
)

The above statement semi-joins u to t based on a tuple comparison, not just a single column comparison. This is useful, for example, when you want to select those users from a table whose first AND last name are also contained in another table (without any formal foreign key relationship, of course):

SELECT *
FROM users u
WHERE (u.first_name, u.last_name) IN (
    SELECT a.first_name, a.last_name FROM addresses a
)

Now, not all databases really support row value expression predicates. In fact, only very few really do. Here is a non-exhaustive list of databases, that will support some form of the above:

  • DB2
  • HSQLDB
  • MySQL
  • Oracle
  • Postgres

And these databases pretend they implement row value expression predicates, but get it wrong:

  • CUBRID (confusing them with sets)
  • H2 (confusing them with arrays)

A feature comparison matrix was listed here:
http://blog.jooq.org/2012/12/26/row-value-expressions-and-the-between-predicate

Can the above query be simulated?

Yes, it can! And it will be with jOOQ 3.1. Here’s how to transform the above query into an equivalent one, which doesn’t use row value expressions. So, here’s the original query, again:

SELECT *
FROM t
WHERE (t.t1, t.t2) IN (
    SELECT u.u1, u.u2 FROM u
)

The above can be transformed into the following query, using an EXISTS predicate

SELECT *
FROM t
WHERE EXISTS (
    SELECT * FROM u
    WHERE t.t1 = u.u1 AND t.t2 = u.u2
)

Now, in the above simple transformation, we have modified the subselect by changing the projection and by adding a predicate. This can be difficult for more complex subselects, so lets avoid touching it, by introducing another derived table:

SELECT *
FROM t
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT u.u1, u.u2 FROM u -- untouched
    ) v(v1, v2)                  -- derived table
    WHERE t.t1 = v.v1 AND t.t2 = v.v2
)

That’s better. Many databases require renaming derived tables, which is why a derived column list v(v1, v2) was introduced. Not all databases support derived column lists, though, as can be seen in a previous blog post. So lets go on transforming the above into an equivalent query:

SELECT *
FROM t
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT null v1, null v2  -- renaming
        FROM dual                -- if necessary
        WHERE 1 = 0              -- without new rows
        UNION ALL
        SELECT u.u1, u.u2 FROM u -- untouched
    ) v                          -- derived table
    WHERE t.t1 = v.v1 AND t.t2 = v.v2
)

Now, we’ve reached our goal. The above query will run on all databases, retaining the original semantics of a row value expression predicate using a subselect! Note, you possibly have to replace the dual with something more appropriate, of course.

Does this apply to all comparison predicates?

In principle, yes. Let’s look at a few examples.

NOT IN

SELECT *
FROM t
WHERE (t.t1, t.t2) NOT IN (
    SELECT u.u1, u.u2 FROM u
)

-- transforms into
SELECT *
FROM t
WHERE NOT EXISTS (
    SELECT *
    FROM (
        SELECT null v1, null v2
        FROM dual
        WHERE 1 = 0
        UNION ALL
        SELECT u.u1, u.u2 FROM u
    ) v
    WHERE t.t1 = v.v1 AND t.t2 = v.v2
)

Equality and non-equality

Equality and non-equality work the same way as IN and NOT IN IFF you are operating on scalar subselects. While actual comparison predicates will raise an error if subselects return more than one row, the EXISTS predicate will not. Beware!

Ordering

Just as with equality and non-equality, beware of non-scalar subselects!

SELECT *
FROM t
WHERE (t.t1, t.t2) > (
    SELECT u.u1, u.u2 FROM u
)

-- transforms into
SELECT *
FROM t
-- EXISTS is not formally correct,
-- if the subselect is a non-scalar one
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT null v1, null v2
        FROM dual
        WHERE 1 = 0
        UNION ALL
        SELECT u.u1, u.u2 FROM u
    ) v
    WHERE (t.t1 > v.v1)
    OR    (t.t1 = v.v1 AND t.t2 > v.v2)
)

See the previously cited blog post about the BETWEEN predicate to learn how to simulate “ordering” comparison predicates with row value expressions.

Quantified comparison predicates

Quantifiers now become quite useful. The ANY quantifier removes the need for having scalar subselects, as in the previous example:

SELECT *
FROM t
WHERE (t.t1, t.t2) > ANY (
    SELECT u.u1, u.u2 FROM u
)

-- transforms into
SELECT *
FROM t
-- EXISTS is now formally correct
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT null v1, null v2
        FROM dual
        WHERE 1 = 0
        UNION ALL
        SELECT u.u1, u.u2 FROM u
    ) v
    WHERE (t.t1 > v.v1)
    OR    (t.t1 = v.v1 AND t.t2 > v.v2)
)

The ALL quantifier, on the other hand, can be expressed with its inverse ANY quantifier, i.e.

SELECT *
FROM t
WHERE (t.t1, t.t2) > ALL (
    SELECT u.u1, u.u2 FROM u
)

-- first transforms into
SELECT *
FROM t
WHERE (t.t1, t.t2) <= ANY (
    SELECT u.u1, u.u2 FROM u
)

-- and then transforms into
SELECT *
FROM t
-- EXISTS is now formally correct
WHERE EXISTS (
    SELECT *
    FROM (
        SELECT null v1, null v2
        FROM dual
        WHERE 1 = 0
        UNION ALL
        SELECT u.u1, u.u2 FROM u
    ) v
    WHERE (t.t1 < v.v1)
    OR    (t.t1 = v.v1 AND t.t2 <= v.v2)
)

Conclusion

Happy transforming, and keep an eye out for jOOQ 3.1, conveniently implementing all of the above behind a type-safe Java API!

Disclaimer

Yes, NULLs. The above transformation deliberately left out edge cases where NULLs are involved.


Rare Uses of a “ControlFlowException”


Control flows are a “relict” from imperative programming, which has leaked into various other programming paradigms, including Java’s object oriented paradigm. Apart from the useful and ubiquitous branch and loop structures, there are also primitives (e.g. GOTO) and non-locals (e.g. exceptions). Let’s have a closer look at these controversial control flow techniques.

GOTO

goto is a reserved word in the Java language. goto is also a valid instruction in JVM bytecode. Yet, in Java, it isn’t easily possible to peform goto operations. One example taken from this Stack Overflow question can be seen here:

Jumping forward

label: {
    // do stuff
    if (check) break label;
    // do more stuff
}

In bytecode:

    2  iload_1 [check]
    3  ifeq 6          // Jumping forward
    6  ..

Jumping backward

label: do {
    // do stuff
    if (check) continue label;
    // do more stuff
    break label;
} while(true);

In bytecode:

     2  iload_1 [check]
     3  ifeq 9
     6  goto 2          // Jumping backward
     9  ..

Of course, these tricks are useful only in very very rare occasions, and even then, you might want to re-consider. Because we all know what happens when we use goto in our code:

One little goto...

Drawing taken from xkcd: http://xkcd.com/292/

Breaking out of control flows with exceptions

Exceptions are a good tool to break out of a control flow structure in the event of an error or failure. But regular jumping downwards (without error or failure) can also be done using exceptions:

try {
    // Do stuff
    if (check) throw new Exception();
    // Do more stuff
}
catch (Exception notReallyAnException) {}

This feels just as kludgy as the tricks involving labels, mentioned before.

Legitimate uses of exceptions for control flow:

However, there are some other very rare occasions, where exceptions are a good tool to break out of a complex, nested control flow (without error or failure). This may be the case when you’re parsing an XML document using a SAXParser. Maybe, your logic is going to test the occurrence of at least three <check/> elements, in case of which you may want to skip parsing the rest of the document. Here is how to implement the above:

Create a ControlFlowException:

package com.example;

public class ControlFlowException 
extends SAXException {}

Note that usually, you might prefer a RuntimeException for this, but the SAX contracts require handler implementations to throw SAXException instead.

Use that ControlFlowException in a SAX handler:

package com.example;

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class Parse {
  public static void main(String[] args) 
  throws Exception {
    SAXParser parser = SAXParserFactory
        .newInstance()
        .newSAXParser();

    try {
      parser.parse(new File("test.xml"),
          new Handler());
      System.out.println(
          "Less than 3 <check/> elements found.");
    } catch (ControlFlowException e) {
      System.out.println(
          "3 or more <check/> elements found.");
    }
  }

  private static class Handler 
  extends DefaultHandler {

    int count;

    @Override
    public void startElement(
        String uri, 
        String localName, 
        String qName,
        Attributes attributes) {
      
      if ("check".equals(qName) && ++count >= 3)
        throw new ControlFlowException();
    }
  }
}

When to use exceptions for control flow:

The above practice seems reasonable with SAX, as SAX contracts expect such exceptions to happen, even if in this case, they’re not exceptions but regular control flow. Here are some indications about when to use the above practice in real world examples:

  • You want to break out of a complex algorithm (as opposed to a simple block).
  • You can implement “handlers” to introduce behaviour into complex algorithms.
  • Those “handlers” explicitly allow throwing exceptions in their contracts.
  • Your use case does not pull the weight of actually refactoring the complex algorithm.

A real-world example: Batch querying with jOOQ

In jOOQ, it is possible to “batch store” a collection of records. Instead of running a single SQL statement for every record, jOOQ collects all SQL statements and executes a JDBC batch operation to store them all at once.

As each record encapsulates its generated SQL rendering and execution for a given store() call in an object-oriented way, it would be quite tricky to extract the SQL rendering algorithm in a reusable way, without breaking (or exposing) too many things. Instead, jOOQ’s batch operation implements this simple pseudo-algorithm:

// Pseudo-code attaching a "handler" that will
// prevent query execution and throw exceptions
// instead:
context.attachQueryCollector();

// Collect the SQL for every store operation
for (int i = 0; i < records.length; i++) {
  try {
    records[i].store();
  }

  // The attached handler will result in this
  // exception being thrown rather than actually
  // storing records to the database
  catch (QueryCollectorException e) {

    // The exception is thrown after the rendered
    // SQL statement is available
    queries.add(e.query());                
  }
}

A real-world example: Exceptionally changing behaviour

Another example from jOOQ shows how this technique can be useful to introduce exceptional behaviour that is applicable only in rare cases. As explained in issue #1520, some databases have a limitation regarding the number of possible bind values per statement. These are:

  • SQLite: 999
  • Ingres 10.1.0: 1024
  • Sybase ASE 15.5: 2000
  • SQL Server 2008: 2100

In order to circumvent this limitation, it will be necessary for jOOQ to inline all bind values, once the maximum has been reached. As jOOQ’s query model heavily encapsulates SQL rendering and variable binding behaviour by applying the composite pattern, it is not possible to know the number of bind values before traversing a query model tree. For more details about jOOQ’s query model architecture, consider this previous blog post:

http://blog.jooq.org/2012/04/10/the-visitor-pattern-re-visited

So the solution is to render the SQL statement and count bind values that are effectively going to be rendered. A canonical implementation would be this pseudo code:

String sql;

query.renderWith(countRenderer);
if (countRenderer.bindValueCount() > maxBindValues) {
  sql = query.renderWithInlinedBindValues();
}
else {
  sql = query.render();
}

As can be seen, a canonical implementation will need to render the SQL statement twice. The first rendering is used only to count the number of bind values, whereas the second rendering will generate the true SQL statement. The problem here is that the exceptional behaviour should only be put in place, once the exceptional event (too many bind values) occurs. A much better solution is to introduce a “handler” that counts bind values in a regular “rendering attempt”, throwing a ControlFlowException for those few exceptional “attempts” where the number of bind values exceeds the maximum:

// Pseudo-code attaching a "handler" that will
// abort query rendering once the maximum number
// of bind values was exceeded:
context.attachBindValueCounter();
String sql;
try {

  // In most cases, this will succeed:
  sql = query.render();
}
catch (ReRenderWithInlinedVariables e) {
  sql = query.renderWithInlinedBindValues();
}

The second solution is better, because:

  • We only re-render the query in the exceptional case.
  • We don’t finish rendering the query to calculate the actual count, but abort early for re-rendering. I.e. we don’t care if we have 2000, 5000, or 100000 bind values.

Conclusion

As with all exceptional techniques, remember to use them in the right moment. If in doubt, think again.


jOOQ as a PL/Java language


Some people who get in touch with PL/SQL, PL/pgSQL, T-SQL, or any other proprietary procedural language for SQL interaction are probably missing out on a couple of language integration features in the Java world. Most Java APIs see SQL as an external domain-specific language that is “best” dealt with using string concatenation. Such APIs include:

Other APIs aim to abstract SQL away, in favour of a “higher-level” mapping to objects. These, again, include

As can be seen quickly, a lot of tool vendors and developers have travelled down similar ORM roads to try to tackle the “mapping problem” from a slightly (never fundamentally) different approach.

But not all people want ORM. Many people want SQL. A nice, general opinion about the old ORM vs. SQL discussion was phrased by Ken Downs a while ago:

http://database-programmer.blogspot.ch/2010/12/historical-perspective-of-orm-and.html

SQL as an internal domain-specific language

We can all agree that SQL itself is a domain-specific language, a language specific to the domain of database querying and database manipulation. As mentioned before, SQL is enhanced on some platforms by proprietary, procedural extensions, some of which even made it into the SQL standard (although barely implemented in the standard form, apart from HSQLDB).

The main advantage of such procedural SQL language extensions is the fact that imperative control flow can be combined with declarative SQL statement execution. Both language paradigms have their place. One is ideal to model control flows, the other is ideal to model queries, abstracting boring querying algorithms.

But imperative programming is quite limited itself. It is difficult to profit from advantages offered by object-oriented or functional paradigms, implemented by popular languages like Java or Scala. Those who have tried Oracle PL/SQL’s “object-oriented” extensions may know what I mean. Furthermore, each procedural extension is vendor-specific and has its own learning curve.

jOOQ models SQL as an internal domain-specific language in Java, and can thus be seen as enhancing Java with some procedural aspects. This has been shown previously on this blog, through an example using H2 database triggers, written in Java/jOOQ. What was meant to be a proof of concept and a nice idea was now re-created by Ronny Guillaume, who wrote an interesting article (in French) about using jOOQ as PL/Java within a Postgres database! The article can be seen here:

http://ronnyguillaume.developpez.com/introduction-pl-java

In essence, you can use another third-party tool called pljava, compile and wrap jOOQ code into a jar file and deploy that jar file into your Postgres database before using it in regular Postgres SQL, or as a trigger. Similar things can be done in Java databases, such as Derby, H2, and HSQLDB, and even in the Oracle database (for the brave among you).

Looking forward to finding more interesting articles about using jOOQ for PL/Java in the wild!


On Java 8′s introduction of Optional


I had recently discovered the JDK 8′s addition of the Optional type. The Optional type is a way to avoid NullPointerException, as API consumers that get Optional return values from methods are “forced” to perform “presence” checks in order to consume their actual return value. More details can be seen in the Javadoc.

A very interesting further read can be seen here in this blog post, which compares the general notion of null and how null is handled in Java, SML, and Ceylon:

http://blog.informatech.cr/2013/04/10/java-optional-objects

“blank” and “initial” states were already known to Turing . One could also argue that the “neutral” or “zero” state was required in the Babbage Engine, which dates back to Ada of Lovelace in the 1800′s.

On the other hand, mathematicians also prefer to distinguish “nothing” from “the empty set”, which is “a set with nothing inside”. This compares well with “NONE” and “SOME”, as illustrated by the aforementioned Informatech blog post, and as implemented by Scala, for instance.

Anyway, I’ve given Java’s Optional some thought. I’m really not sure if I’m going to like it, even if Java 9 would eventually add some syntactic sugar to the JLS, which would resemble that of Ceylon to leverage Optional on a language level. Since Java is so incredibly backwards-compatible, none of the existing APIs will be retrofitted to return Optional, e.g, the following isn’t going to surface the JDK 8:

public interface List<E> {
    Optional<E> get(int index);
    [...]
}

Not only can we assign null to an Optional variable, but the absence of “Optional” doesn’t guarantee the semantics of “SOME”, as lists will still return “naked” null values. When we mix the two ways of thinking, we will wind up with two checks, instead of one

Optional<T> optional = // [...]
T nonOptional = list.get(index);

// If we're paranoid, we'll double-check!
if (optional != null && optional.isPresent()) {
    // do stuff
}

// Here we probably can't trust the value
if (nonOptional != null) {
    // do stuff
}

Hence…

-1 from me to Java’s solution

Further reading

Of course, this has been discussed millions of times before. So here are a couple of links:


Follow

Get every new post delivered to your Inbox.

Join 221 other followers