Author Archives: lukaseder

Inadvertent Recursion Protection with Java ThreadLocals


Now here’s a little trick for those of you hacking around with third-party tools, trying to extend them without fully understanding them (yet!). Assume the following situation:

  • You want to extend a library that exposes a hierarchical data model (let’s assume you want to extend Apache Jackrabbit)
  • That library internally checks access rights before accessing any nodes of the content repository
  • You want to implement your own access control algorithm
  • Your access control algorithm will access other nodes of the content repository
  • … which in turn will again trigger access control
  • … which in turn will again access other nodes of the content repository

… Infinite recursion, possibly resulting in a StackOverflowError, if you’re not recursing breadth-first.

Now, you have two options:

  1. Take the time, sit down, understand the internals, and do it right. You probably shouldn’t recurse into your access control once you’ve reached your own extension. In the case of extending Jackrabbit, this would be done by using a System Session to further access nodes within your access control algorithm. A System Session usually bypasses access control.
  2. Be impatient, wanting to get results quickly, and prevent recursion with a trick

Of course, you really should opt for option 1. But who has the time to understand everything? ;-)

Here’s how to implement that trick.

/**
 * This thread local indicates whether you've
 * already started recursing with level 1
 */
static final ThreadLocal<Boolean> RECURSION_CONTROL 
    = new ThreadLocal<Boolean>();

/**
 * This method executes a delegate in a "protected"
 * mode, preventing recursion. If a inadvertent 
 * recursion occurred, return a default instead
 */
public static <T> T protect(
        T resultOnRecursion, 
        Protectable<T> delegate) 
throws Exception {

    // Not recursing yet, allow a single level of
    // recursion and execute the delegate once
    if (RECURSION_CONTROL.get() == null) {
        try {
            RECURSION_CONTROL.set(true);
            return delegate.call();
        }
        finally {
            RECURSION_CONTROL.remove();
        }
    }

    // Abort recursion and return early
    else {
        return resultOnRecursion;
    }
}

/**
 * An API to wrap your code with
 */
public interface Protectable<T> {
    T call() throws Exception;
}

This works easily as can be seen in this usage example:

public static void main(String[] args) 
throws Exception {
    protect(null, new Protectable<Void>() {
        @Override
        public Void call() throws Exception {

            // Recurse infinitely
            System.out.println("Recursing?");
            main(null);

            System.out.println("No!");
            return null;
        }
    });
}

The recursive call to the main() method will be aborted by the protect method, and return early, instead of executing call().

This idea can also be further elaborated by using a Map of ThreadLocals instead, allowing for specifying various keys or contexts for which to prevent recursion. Then, you could also put an Integer into the ThreadLocal, incrementing it on recursion, allowing for at most N levels of recursion.

static final ThreadLocal<Integer> RECURSION_CONTROL 
    = new ThreadLocal<Integer>();

public static <T> T protect(
        T resultOnRecursion, 
        Protectable<T> delegate)
throws Exception {
    Integer level = RECURSION_CONTROL.get();
    level = (level == null) ? 0 : level;

    if (level < 5) {
        try {
            RECURSION_CONTROL.set(level + 1);
            return delegate.call();
        }
        finally {
            if (level > 0)
                RECURSION_CONTROL.set(level - 1);
            else
                RECURSION_CONTROL.remove();
        }
    }
    else {
        return resultOnRecursion;
    }
}

But again. Maybe you should just take a couple of minutes more and learn about how the internals of your host library really work, and get things right from the beginning… As always, when applying tricks and hacks! :-)


Java Streams Preview vs .Net LINQ


I’ve started following this very promising blog by the “Geeks From Paradise”. Apart from the fact that I’m a bit envious of geeks living in Costa Rica, this comparison of the upcoming Java 8 Streams API with various of .NET’s LINQ API capabilities is a very interesting read.

A preview of what you’ll find there (just one of 19 examples):

LINQ

List<string> nameList1 = new List(){ 
  "Anders", "David", "James",
  "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));

Java Streams

List<String> nameList1 = asList(
  "Anders", "David", "James",
  "Jeff", "Joe", "Erik");
nameList1.stream()
     .map(c -> "Hello! " + c)
     .forEach(System.out::println);

Read the full blog post here:

http://blog.informatech.cr/2013/03/24/java-streams-preview-vs-net-linq/


Will Java add LINQ to EL 3.0 in JSR-341?


This fact has somehow slipped by me unnoticed so far: As the JSR-341 websites claim, Java is going to add full .NET-Style LINQ support to its expression language 3.0!

While the JSR-341 website doesn’t explicitly mention these feature additions to the expression language, a lot of details can be seen here:

http://java.net/projects/el-spec/pages/CollectionOperations

This is very interesting, and raises a lot of questions:

  • Is Microsoft on board – i.e. are they part of the expert group?
  • Will Microsoft surrender / sell their LINQ-related patents to Oracle?
  • … or will Oracle challenge those patents?
  • Why would LINQ be added to the EL before it is added to the platform?

Anyway, the mailing lists don’t seem very active and the draft specs is still in a rather early stage. We’ll see where this goes, but this is clearly an addition that I will follow closely in the future!

See also my previous blog post on the subject


Java Collections API Quirks


So we tend to think we’ve seen it all, when it comes to the Java Collections API. We know our ways around Lists, Sets, Maps, Iterables, Iterators. We’re ready for Java 8′s Collections API enhancements.

But then, every once in a while, we stumble upon one of these weird quirks that originate from the depths of the JDK and its long history of backwards-compatibility. Let’s have a look at unmodifiable collections

Unmodifiable Collections

Whether a collection is modifiable or not is not reflected by the Collections API. There is no immutable List, Set or Collection base type, which mutable subtypes could extend. So, the following API doesn’t exist in the JDK:

// Immutable part of the Collection API
public interface Collection {
  boolean  contains(Object o);
  boolean  containsAll(Collection<?> c);
  boolean  isEmpty();
  int      size();
  Object[] toArray();
  <T> T[]  toArray(T[] array);
}

// Mutable part of the Collection API
public interface MutableCollection 
extends Collection {
  boolean  add(E e);
  boolean  addAll(Collection<? extends E> c);
  void     clear();
  boolean  remove(Object o);
  boolean  removeAll(Collection<?> c);
  boolean  retainAll(Collection<?> c);
}

Now, there are probably reasons, why things hadn’t been implemented this way in the early days of Java. Most likely, mutability wasn’t seen as a feature worthy of occupying its own type in the type hierarchy. So, along came the Collections helper class, with useful methods such as unmodifiableList(), unmodifiableSet(), unmodifiableCollection(), and others. But beware when using unmodifiable collections! There is a very strange thing mentioned in the Javadoc:

The returned collection does not pass the hashCode and equals operations through to the backing collection, but relies on Object’s equals and hashCode methods. This is necessary to preserve the contracts of these operations in the case that the backing collection is a set or a list.

“To preserve the contracts of these operations”. That’s quite vague. What’s the reasoning behind it? A nice explanation is given in this Stack Overflow answer here:

An UnmodifiableList is an UnmodifiableCollection, but the same is not true in reverse — an UnmodifiableCollection that wraps a List is not an UnmodifiableList. So if you compare an UnmodifiableCollection that wraps a List a with an UnmodifiableList that wraps the same List a, the two wrappers should not be equal. If you just passed through to the wrapped list, they would be equal.

While this reasoning is correct, the implications may be rather unexpected.

The bottom line

The bottom line is that you cannot rely on Collection.equals(). While List.equals() and Set.equals() are well-defined, don’t trust Collection.equals(). It may not behave meaningfully. Keep this in mind, when accepting a Collection in a method signature:

public class MyClass {
  public void doStuff(Collection<?> collection) {
    // Don't rely on collection.equals() here!
  }
}

Hibernate, and Querying 50k Records. Too Much?


An interesting read, showing that Hibernate can quickly get to its limits when it comes to querying 50k records – a relatively small number of records for a sophisticated database:

http://koenserneels.blogspot.ch/2013/03/bulk-fetching-with-hibernate.html

Of course, Hibernate can generally deal with such situations, but you have to start tuning Hibernate, digging into its more advanced features. Makes one think whether second-level caching, flushing, evicting, and all that stuff should really be the default behaviour…?


JUnit’s Evolving Structure


This is an interesting read:

http://edmundkirwan.com/general/junit.html

It’s an analysis about how JUnit gradually and organically evolved into a heavily entangled set of inter-dependent modules


How to Behave on Mailing Lists


I recently stumbled upon the following document:
http://www.apache.org/dev/contrib-email-tips.html

It’s a useful list of rules, ideas about how to behave on open source mailing lists. Of course, these rules include both committers and users ;-)


How to Execute Something Multiple Times in Java


When writing unit / integration tests, you often want to execute something multiple times, with different configurations / parameters / arguments every time. For instance, if you want to pass a “limit” or “timeout” or any other argument value of 1, 10, and 100, you could do this:

@Test
public void test() {
    runCode(1);
    runCode(10);
    runCode(100);
}

private void runCode(int argument) {

    // Run the actual test
    assertNotNull(new MyObject(argument).execute());
}

Exracting methods is the most obvious approach, but it can quickly get nasty, as these extracted methods are hardly re-usable outside of that single test-case and thus don’t really deserve being put in their own methods. Instead, just use this little trick:

@Test
public void test() {

    // Repeat the contents 3 times, for values 1, 10, 100
    for (int argument : new int[] { 1, 10, 100 }) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }

    // Alternatively, use Arrays.asList(), which has a similar effect:
    for (Integer argument : Arrays.asList(1, 10, 100)) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }
}

The Golden Rules of Code Documentation


Here’s another topic that is highly subjective, that leads to heated discussions, to religious wars and yet, there’s no objective right or wrong.

A previous post on my blog was reblogged to my blogging partner JavaCodeGeeks. The amount of polarised ranting this blog provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. etc.

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.”. I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

Best avoid the above. If you cannot avoid such methods, keep things private. These methods are not reusable and thus not worth documenting in an API.

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists, that “refuse to add null elements” there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature only – unless you refuse to create a concise signature.

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only in forUpdate queries and only for the CUBRID and SQLSERVER dialects. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who has long ago left the company did some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (fixed spelling, fixed indentation, renamed local variable, etc.)? Then create a ticket and document this change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!

Version numbering is documentation

A simple and concise version numbering system will help your users understand, which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly, to communicate the change scope between your released versions.

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, reference ticket IDs from your code / commits

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t, don’t document it, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.


A Typesafety Comparison of SQL Access APIs


SQL is a very expressive and distinct language. It is one of the few declarative languages which are used by a broad audience in everyday work. As a declarative language, SQL allows to specify what we’re expecting as output, not how this output should be produced. As a side-effect of this, ad-hoc record data types are created by every statement. An example:

-- A (id: integer, title: varchar) type is created
SELECT id, title
FROM book;

The above statement generates a cursor whose records have a well-defined record type with these properties:

  • The degree of the record is 2
  • The column names are id and title
  • The column types are integer and varchar
  • The column id can be accessed at index 1. The column title can be accessed at index 2

In other words, SQL records combine features from records (access by name) and tuples (access by index). They can be seen like typesafe associative “map-arrays”, where map keys are formally bound to array indexes and their associated key/index type.

Another, more complex example shows how these ad-hoc record types can be reused within a SQL statement!

-- A (c: bigint) type is created
SELECT count(*) c
FROM book

-- A (..: integer, ..: integer) type is created and compared with...
WHERE (author_id, language_id) IN (

  -- ... another, compatible (..: integer, ..: integer) type
  SELECT a.id, a.language_id
  FROM author a
)

This query counts books written by authors in their native language.

In the above example, the projected record type is a bit simpler. It contains only one column. The interesting part is the row value expression IN comparison predicate, which compares two compatible (integer, integer) types. In SQL, you can typesafely create ad-hoc record types and immediately compare them with other ad-hoc record types. In these comparisons, column names are not important, but column indexes (and associated column types) are.

Comparing various SQL access APIs

The previous examples show how SQL allows for the formal declaration of record types including record degree, column names, column indexes, column types. While SQL is very expressive in that matter, many client languages accessing SQL are less expressive. When comparing expressiveness and typesafety, two features should be taken into consideration:

  1. Are the records produced into the client language typesafe?
  2. Are the SQL statements produced from the client language typesafe and syntax-safe?

Let’s have a look at various accessing techniques, and how expressive they are in terms of the above typesafety requirements:

JDBC: Least typesafety

JDBC offers the least expressiveness and typesafety. This isn’t surprising, as JDBC is a very low-level API. It offers:

  1. No typesafety whatsoever when accessing result records.
  2. No typesafety or syntax-safety whatsoever when producing SQL statements.

Here is an example:

PreparedStatement stmt = null;
ResultSet rs = null;

try {

  // SQL statements are just strings. Constructing them is not
  // typesafe or syntax-safe
  stmt = connection.prepareStatement(
    "SELECT id, title FROM book WHERE id = ?");

  // Bind values are set by index. There is no typesafety or
  // "index safety"
  stmt.setInt(1, 15);

  rs = stmt.executeQuery();
  while (rs.next()) {

    // There is no typesafety or "index safety" when accessing
    // result record values
    System.out.println(
      "ID: " + rs.getInt(1) + ", TITLE: " + rs.getString(2));
  }
}
finally {
  closeSafely(stmt, rs);
}

Now, this wasn’t surprising. JDBC makes up for the lack of typesafety by being absolutely general. It is possible to implement a JDBC driver for any type of relational database, no matter what kinds of SQL and JDBC features they really support.

JPA: Some typesafety

JPA has implemented quite a bit of typesafety mostly on top of JPQL, but also slightly on top of SQL. With JPA, you can have:

  1. Some typesafety when accessing records.
  2. Some typesafety and syntax-safety when producing JPQL statements through the CriteriaQuery API (not SQL statements).

Record access typesafety can be guaranteed when you project the outcome of your statements onto your JPA-annotated entities. While the mapping itself isn’t really typesafe, the outcome is, as a Java class is the closest match to a SQL record. A Java class, much like a SQL record, has:

  • A degree, expressed in the number of properties
  • Column names, expressed as property names
  • Column types, expressed as property types
  • But: No column indexes. Properties have no explicit order

JPA record mapping has additional features that exceed the expressiveness of SQL, as “flat”, tabular result sets can be mapped onto object hierarchies. In any case, you will have to create one record / entity type per query to profit from this typesafety. If you’re not projecting all columns from every table, but ad-hoc records (including values derived from functions), you will lose this typesafety again.

When it comes to statement typesafety, JPA offers the CriteriaQuery API to produce typesafe JPQL statements. The CriteriaQuery API is often criticised for its verboseness and for the fact that resulting client code is hard to read. Here is an example taken from the CriteriaQuery API docs:

CriteriaQuery<String> q = cb.createQuery(String.class);
Root<Order> order = q.from(Order.class);
q.select(order.get("shippingAddress").<String>get("state"));
 
CriteriaQuery<Product> q2 = cb.createQuery(Product.class);
q2.select(q2.from(Order.class)
            .join("items")
            .<Item,Product>join("product"));

It can be seen that there is only a limited amount of typesafety in the above query construction:

  • Columns are accessed by string literals, such as "shippingAddress".
  • Generic entity types are not really checked. The <Item,Product> generic parameters might as well be wrong.

Of course, there are more typesafe API parts in JPA’s CriteriaQuery API. Using those API parts quickly lead to the aforementioned verbosity, though, as can be seen in this Stack Overflow question, or in the Java EE 6 Tutorials.

LINQ: Much typesafety (in .NET)

LINQ goes very far in offering typesafety in both dimensions:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing LINQ-to-SQL statements (not SQL statements).

As LINQ is formally integrated into various .NET languages, it has the advantage of being able to produce formally defined record types, directly into the target language (e.g. C#). Not only can typesafe records be produced, the LINQ-to-SQL statement is formally verified by the compiler as well. An example

// Typesafe renaming (aliasing with "AS" in SQL)
From p In db.Products
// Typesafe (named!) variable binding
Where p.UnitsInStock <= ReorderLevel AndAlso Not p.Discontinued
// The typesafe projection will produce a Products record
Select p

Another example from Stack Overflow can be seen here:

// Producing a C# tuple
var r = from u in db.Users
        join s in db.Staffs on u.Id equals s.UserId
        select new Tuple<User, Staff>(u, s);

// Producing an anonymous record type
var r = from u in db.Users
    select new { u.Name, 
                 u.Address,
                 ...,
                 (from s in db.Staffs 
                  select s.Password where u.Id == s.UserId) 
               };

LINQ has many obvious advantages when it comes to typesafety. In the case of LINQ, this comes at the price of losing actual SQL expressivity and syntax, as LINQ-to-SQL is not really SQL (just as JPQL is not really SQL either). The SQL querying API is partially shared with other, heterogeneous querying targets, such as LINQ-to-Entities, LINQ-to-Collections, LINQ-to-XML. This will reduce LINQ’s feature scope (see also a previous blog post, and I will soon blog about this again).

But C# offers all typesafety aspects that a SQL record offers as well: degree, column name (anonymous types), column index (tuples), column types (both types and tuples).

SLICK: Much typesafety (in Scala)

SLICK has been inspired by LINQ, and can thus offer a lot of typesafety as well. It offers:

  1. Much typesafety when accessing tuples (not records).
  2. Much typesafety when producing SLICK statements (not SQL statements).

SLICK takes advantage of Scala’s integrated tuple expressions. This is best shown by example:

// "for" is the "entry-point" to the DSL
val q = for {

    // FROM clause   WHERE clause
    c <- Coffees     if c.supID === 101

// SELECT clause and projection to a tuple
} yield (c.name, c.price)

The above example shows that the projection onto a (String, Int) tuple is done typesafely by the yield method. At the same time, the whole query expression is formally validated by the compiler, as SLICK makes heavy use of Scala’s language features in order to introduce an internal DSL for querying. Much more than LINQ, SLICK has a unique syntax that doesn’t remind of SQL any more. It is not obvious how subqueries, complex joins, grouping and aggregation can be expressed.

jOOQ: Much typesafety

jOOQ is mainly inspired by SQL itself and embraces all the features that SQL offers. It has thus:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing SQL statements.

jOOQ offers similar capabilities as JPA when it comes to mapping SQL result sets onto records, although JPA’s mapping type hierarchies are not supported by jOOQ. But jOOQ also allows for typesafe tuple access, the way SLICK has implemented it. Ad-hoc records produced by arbitrary query projections will maintain their various column types through generic Record1<T1>, Record2<T1, T2>, Record3<T1, T2, T3>, … record types. Unlike in Java, this can be leveraged extensively in Scala, where these typesafe Record[N] types can be used just like Scala’s tuples.

On the other hand, just like LINQ-to-SQL, which has formally integrated querying as a first-class citizen into .NET languages, jOOQ allows for heavy type-checking and syntax-checking, when writing SQL statements in Java.

In SQL, you can typesafely write things like:

SELECT * FROM t WHERE (t.a, t.b) = (1, 2)
SELECT * FROM t WHERE (t.a, t.b) OVERLAPS (date1, date2)
SELECT * FROM t WHERE (t.a, t.b) IN (SELECT x, y FROM t2)
UPDATE t SET (a, b) = (SELECT x, y FROM t2 WHERE ...)
INSERT INTO t (a, b) VALUES (1, 2)

In jOOQ 3.0, you can (also typesafely!) write

select().from(t).where(row(t.a, t.b).eq(1, 2));
// Type-check here: ----------------->  ^^^^
 
select().from(t).where(row(t.a, t.b).overlaps(date1, date2));
// Type-check here: ------------------------> ^^^^^^^^^^^^
 
select().from(t).where(row(t.a, t.b).in(select(t2.x, t2.y).from(t2)));
// Type-check here: -------------------------> ^^^^^^^^^^
 
update(t).set(row(t.a, t.b), select(t2.x, t2.y).where(...));
// Type-check here: --------------> ^^^^^^^^^^

insertInto(t, t.a, t.b).values(1, 2);
// Type-check here: ---------> ^^^^

This also applies for existing API, which doesn’t involve row value expressions:

select().from(t).where(t.a.eq(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^
 
select().from(t).where(t.a.eq(any(select(t2.x).from(t2)));
// Type-check here: -------------------> ^^^^
 
select().from(t).where(t.a.in(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^

select(t1.a, t1.b).from(t1).union(select(t2.a, t2.b).from(t2));
// Type-check here: -------------------> ^^^^^^^^^^

jOOQ is not SQL, but unlike other attempts of introducing SQL as an internal domain-specific language into host languages like Java, Scala, C#, jOOQ looks very much like SQL thanks to its unique fluent API technique, which informally follows an underlying BNF notation.

Even if Java offers less expressiveness than other languages like C# or Scala, jOOQ probably comes closest to both result record typesafety and SQL syntax safety in the Java world.


Follow

Get every new post delivered to your Inbox.

Join 217 other followers