Category Archives: java

Hibernate, and Querying 50k Records. Too Much?


An interesting read, showing that Hibernate can quickly get to its limits when it comes to querying 50k records – a relatively small number of records for a sophisticated database:

http://koenserneels.blogspot.ch/2013/03/bulk-fetching-with-hibernate.html

Of course, Hibernate can generally deal with such situations, but you have to start tuning Hibernate, digging into its more advanced features. Makes one think whether second-level caching, flushing, evicting, and all that stuff should really be the default behaviour…?


JUnit’s Evolving Structure


This is an interesting read:

http://edmundkirwan.com/general/junit.html

It’s an analysis about how JUnit gradually and organically evolved into a heavily entangled set of inter-dependent modules


How to Behave on Mailing Lists


I recently stumbled upon the following document:
http://www.apache.org/dev/contrib-email-tips.html

It’s a useful list of rules, ideas about how to behave on open source mailing lists. Of course, these rules include both committers and users ;-)


How to Execute Something Multiple Times in Java


When writing unit / integration tests, you often want to execute something multiple times, with different configurations / parameters / arguments every time. For instance, if you want to pass a “limit” or “timeout” or any other argument value of 1, 10, and 100, you could do this:

@Test
public void test() {
    runCode(1);
    runCode(10);
    runCode(100);
}

private void runCode(int argument) {

    // Run the actual test
    assertNotNull(new MyObject(argument).execute());
}

Exracting methods is the most obvious approach, but it can quickly get nasty, as these extracted methods are hardly re-usable outside of that single test-case and thus don’t really deserve being put in their own methods. Instead, just use this little trick:

@Test
public void test() {

    // Repeat the contents 3 times, for values 1, 10, 100
    for (int argument : new int[] { 1, 10, 100 }) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }

    // Alternatively, use Arrays.asList(), which has a similar effect:
    for (Integer argument : Arrays.asList(1, 10, 100)) {

        // Run the actual test
        assertNotNull(new MyObject(argument).execute());
    }
}

The Golden Rules of Code Documentation


Here’s another topic that is highly subjective, that leads to heated discussions, to religious wars and yet, there’s no objective right or wrong.

A previous post on my blog was reblogged to my blogging partner JavaCodeGeeks. The amount of polarised ranting this blog provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. etc.

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.”. I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

Best avoid the above. If you cannot avoid such methods, keep things private. These methods are not reusable and thus not worth documenting in an API.

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists, that “refuse to add null elements” there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature only – unless you refuse to create a concise signature.

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only in forUpdate queries and only for the CUBRID and SQLSERVER dialects. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who has long ago left the company did some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (fixed spelling, fixed indentation, renamed local variable, etc.)? Then create a ticket and document this change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!

Version numbering is documentation

A simple and concise version numbering system will help your users understand, which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly, to communicate the change scope between your released versions.

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, reference ticket IDs from your code / commits

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t, don’t document it, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.


A Typesafety Comparison of SQL Access APIs


SQL is a very expressive and distinct language. It is one of the few declarative languages which are used by a broad audience in everyday work. As a declarative language, SQL allows to specify what we’re expecting as output, not how this output should be produced. As a side-effect of this, ad-hoc record data types are created by every statement. An example:

-- A (id: integer, title: varchar) type is created
SELECT id, title
FROM book;

The above statement generates a cursor whose records have a well-defined record type with these properties:

  • The degree of the record is 2
  • The column names are id and title
  • The column types are integer and varchar
  • The column id can be accessed at index 1. The column title can be accessed at index 2

In other words, SQL records combine features from records (access by name) and tuples (access by index). They can be seen like typesafe associative “map-arrays”, where map keys are formally bound to array indexes and their associated key/index type.

Another, more complex example shows how these ad-hoc record types can be reused within a SQL statement!

-- A (c: bigint) type is created
SELECT count(*) c
FROM book

-- A (..: integer, ..: integer) type is created and compared with...
WHERE (author_id, language_id) IN (

  -- ... another, compatible (..: integer, ..: integer) type
  SELECT a.id, a.language_id
  FROM author a
)

This query counts books written by authors in their native language.

In the above example, the projected record type is a bit simpler. It contains only one column. The interesting part is the row value expression IN comparison predicate, which compares two compatible (integer, integer) types. In SQL, you can typesafely create ad-hoc record types and immediately compare them with other ad-hoc record types. In these comparisons, column names are not important, but column indexes (and associated column types) are.

Comparing various SQL access APIs

The previous examples show how SQL allows for the formal declaration of record types including record degree, column names, column indexes, column types. While SQL is very expressive in that matter, many client languages accessing SQL are less expressive. When comparing expressiveness and typesafety, two features should be taken into consideration:

  1. Are the records produced into the client language typesafe?
  2. Are the SQL statements produced from the client language typesafe and syntax-safe?

Let’s have a look at various accessing techniques, and how expressive they are in terms of the above typesafety requirements:

JDBC: Least typesafety

JDBC offers the least expressiveness and typesafety. This isn’t surprising, as JDBC is a very low-level API. It offers:

  1. No typesafety whatsoever when accessing result records.
  2. No typesafety or syntax-safety whatsoever when producing SQL statements.

Here is an example:

PreparedStatement stmt = null;
ResultSet rs = null;

try {

  // SQL statements are just strings. Constructing them is not
  // typesafe or syntax-safe
  stmt = connection.prepareStatement(
    "SELECT id, title FROM book WHERE id = ?");

  // Bind values are set by index. There is no typesafety or
  // "index safety"
  stmt.setInt(1, 15);

  rs = stmt.executeQuery();
  while (rs.next()) {

    // There is no typesafety or "index safety" when accessing
    // result record values
    System.out.println(
      "ID: " + rs.getInt(1) + ", TITLE: " + rs.getString(2));
  }
}
finally {
  closeSafely(stmt, rs);
}

Now, this wasn’t surprising. JDBC makes up for the lack of typesafety by being absolutely general. It is possible to implement a JDBC driver for any type of relational database, no matter what kinds of SQL and JDBC features they really support.

JPA: Some typesafety

JPA has implemented quite a bit of typesafety mostly on top of JPQL, but also slightly on top of SQL. With JPA, you can have:

  1. Some typesafety when accessing records.
  2. Some typesafety and syntax-safety when producing JPQL statements through the CriteriaQuery API (not SQL statements).

Record access typesafety can be guaranteed when you project the outcome of your statements onto your JPA-annotated entities. While the mapping itself isn’t really typesafe, the outcome is, as a Java class is the closest match to a SQL record. A Java class, much like a SQL record, has:

  • A degree, expressed in the number of properties
  • Column names, expressed as property names
  • Column types, expressed as property types
  • But: No column indexes. Properties have no explicit order

JPA record mapping has additional features that exceed the expressiveness of SQL, as “flat”, tabular result sets can be mapped onto object hierarchies. In any case, you will have to create one record / entity type per query to profit from this typesafety. If you’re not projecting all columns from every table, but ad-hoc records (including values derived from functions), you will lose this typesafety again.

When it comes to statement typesafety, JPA offers the CriteriaQuery API to produce typesafe JPQL statements. The CriteriaQuery API is often criticised for its verboseness and for the fact that resulting client code is hard to read. Here is an example taken from the CriteriaQuery API docs:

CriteriaQuery<String> q = cb.createQuery(String.class);
Root<Order> order = q.from(Order.class);
q.select(order.get("shippingAddress").<String>get("state"));
 
CriteriaQuery<Product> q2 = cb.createQuery(Product.class);
q2.select(q2.from(Order.class)
            .join("items")
            .<Item,Product>join("product"));

It can be seen that there is only a limited amount of typesafety in the above query construction:

  • Columns are accessed by string literals, such as "shippingAddress".
  • Generic entity types are not really checked. The <Item,Product> generic parameters might as well be wrong.

Of course, there are more typesafe API parts in JPA’s CriteriaQuery API. Using those API parts quickly lead to the aforementioned verbosity, though, as can be seen in this Stack Overflow question, or in the Java EE 6 Tutorials.

LINQ: Much typesafety (in .NET)

LINQ goes very far in offering typesafety in both dimensions:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing LINQ-to-SQL statements (not SQL statements).

As LINQ is formally integrated into various .NET languages, it has the advantage of being able to produce formally defined record types, directly into the target language (e.g. C#). Not only can typesafe records be produced, the LINQ-to-SQL statement is formally verified by the compiler as well. An example

// Typesafe renaming (aliasing with "AS" in SQL)
From p In db.Products
// Typesafe (named!) variable binding
Where p.UnitsInStock <= ReorderLevel AndAlso Not p.Discontinued
// The typesafe projection will produce a Products record
Select p

Another example from Stack Overflow can be seen here:

// Producing a C# tuple
var r = from u in db.Users
        join s in db.Staffs on u.Id equals s.UserId
        select new Tuple<User, Staff>(u, s);

// Producing an anonymous record type
var r = from u in db.Users
    select new { u.Name, 
                 u.Address,
                 ...,
                 (from s in db.Staffs 
                  select s.Password where u.Id == s.UserId) 
               };

LINQ has many obvious advantages when it comes to typesafety. In the case of LINQ, this comes at the price of losing actual SQL expressivity and syntax, as LINQ-to-SQL is not really SQL (just as JPQL is not really SQL either). The SQL querying API is partially shared with other, heterogeneous querying targets, such as LINQ-to-Entities, LINQ-to-Collections, LINQ-to-XML. This will reduce LINQ’s feature scope (see also a previous blog post, and I will soon blog about this again).

But C# offers all typesafety aspects that a SQL record offers as well: degree, column name (anonymous types), column index (tuples), column types (both types and tuples).

SLICK: Much typesafety (in Scala)

SLICK has been inspired by LINQ, and can thus offer a lot of typesafety as well. It offers:

  1. Much typesafety when accessing tuples (not records).
  2. Much typesafety when producing SLICK statements (not SQL statements).

SLICK takes advantage of Scala’s integrated tuple expressions. This is best shown by example:

// "for" is the "entry-point" to the DSL
val q = for {

    // FROM clause   WHERE clause
    c <- Coffees     if c.supID === 101

// SELECT clause and projection to a tuple
} yield (c.name, c.price)

The above example shows that the projection onto a (String, Int) tuple is done typesafely by the yield method. At the same time, the whole query expression is formally validated by the compiler, as SLICK makes heavy use of Scala’s language features in order to introduce an internal DSL for querying. Much more than LINQ, SLICK has a unique syntax that doesn’t remind of SQL any more. It is not obvious how subqueries, complex joins, grouping and aggregation can be expressed.

jOOQ: Much typesafety

jOOQ is mainly inspired by SQL itself and embraces all the features that SQL offers. It has thus:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing SQL statements.

jOOQ offers similar capabilities as JPA when it comes to mapping SQL result sets onto records, although JPA’s mapping type hierarchies are not supported by jOOQ. But jOOQ also allows for typesafe tuple access, the way SLICK has implemented it. Ad-hoc records produced by arbitrary query projections will maintain their various column types through generic Record1<T1>, Record2<T1, T2>, Record3<T1, T2, T3>, … record types. Unlike in Java, this can be leveraged extensively in Scala, where these typesafe Record[N] types can be used just like Scala’s tuples.

On the other hand, just like LINQ-to-SQL, which has formally integrated querying as a first-class citizen into .NET languages, jOOQ allows for heavy type-checking and syntax-checking, when writing SQL statements in Java.

In SQL, you can typesafely write things like:

SELECT * FROM t WHERE (t.a, t.b) = (1, 2)
SELECT * FROM t WHERE (t.a, t.b) OVERLAPS (date1, date2)
SELECT * FROM t WHERE (t.a, t.b) IN (SELECT x, y FROM t2)
UPDATE t SET (a, b) = (SELECT x, y FROM t2 WHERE ...)
INSERT INTO t (a, b) VALUES (1, 2)

In jOOQ 3.0, you can (also typesafely!) write

select().from(t).where(row(t.a, t.b).eq(1, 2));
// Type-check here: ----------------->  ^^^^
 
select().from(t).where(row(t.a, t.b).overlaps(date1, date2));
// Type-check here: ------------------------> ^^^^^^^^^^^^
 
select().from(t).where(row(t.a, t.b).in(select(t2.x, t2.y).from(t2)));
// Type-check here: -------------------------> ^^^^^^^^^^
 
update(t).set(row(t.a, t.b), select(t2.x, t2.y).where(...));
// Type-check here: --------------> ^^^^^^^^^^

insertInto(t, t.a, t.b).values(1, 2);
// Type-check here: ---------> ^^^^

This also applies for existing API, which doesn’t involve row value expressions:

select().from(t).where(t.a.eq(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^
 
select().from(t).where(t.a.eq(any(select(t2.x).from(t2)));
// Type-check here: -------------------> ^^^^
 
select().from(t).where(t.a.in(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^

select(t1.a, t1.b).from(t1).union(select(t2.a, t2.b).from(t2));
// Type-check here: -------------------> ^^^^^^^^^^

jOOQ is not SQL, but unlike other attempts of introducing SQL as an internal domain-specific language into host languages like Java, Scala, C#, jOOQ looks very much like SQL thanks to its unique fluent API technique, which informally follows an underlying BNF notation.

Even if Java offers less expressiveness than other languages like C# or Scala, jOOQ probably comes closest to both result record typesafety and SQL syntax safety in the Java world.


Java, if this were a better world


Just a little dreaming about a better world, where some old blunders in the Java platform would’ve been corrected and some awesome missing features would’ve been implemented. Don’t get me wrong. I think Java is awesome. But it still has some issues, like any other platform.

Without any particular order, without claiming to be anything near exhaustive, and most importantly, without claiming to be well thought-through and entirely correct, I wish for these things:

Serialisability

Within an object, serialisability is the default. If you don’t want a member to be serialisable, you mark it “transient”. Why on earth do we have to add this silly marker interface “Serializable” to all of our classes?

All objects should be Serializable by default. Non-serialisability should be the “feature” that is marked explicitly

Of course, serialisability itself has a lot of weird details that I won’t go into, here

Cloning

Since all objects should be Serializable by default,

All objects should also be Cloneable by default. Non-cloneability should be the “feature” that is marked explicitly

Besides, shallow cloning is hardly ever useful. Hence

All objects should deep-clone themselves by default. Shallow cloning can be implemented explicitly

Note, the clone method should be some native method in java.lang.System or some other utility. It shouldn’t be on java.lang.Object, allowing client code to implement their proper interpretation of cloning, without any accidental name clashes.

Alternatively, similar private callback methods could be implemented, the same way that this is done for serialisation, if cloning should be customised.

Unsigned numbers

Why isn’t this part of Java?

There should be an unsigned version of all integer primitives, as well as java.lang.Number wrappers

Primitives

Primitives are a pain to support in APIs. int and Integer should be the same from a syntax perspective. int[] and Integer[] should be, too

Primitives and their wrappers should be better integrated in the language and in the JVM

This one is probably not really resolveable without giving up on the performance advantage that true primitives offer. See Scala…

Properties

Getters and setters aren’t really state-of-the-art.

Properties should be supported more formally

See also a recent article and its comments on this blog:

http://blog.jooq.org/2013/01/12/bloated-javabeans-part-ii-or-dont-add-getters-to-your-api/

Collections

The collection API should be better integrated with the language. Like in many other languages, it should be possible to dereference collection contents using square brackets and curly braces. The JSON syntax would be an obvious choice. It should be possible to write:

// Translates to new ArrayList<>(...);
List<Integer> list = [ 1, 2, 3 ];

// Translates to list.get(0);
Integer value = list[0];

// Translates to list.set(0, 3);
list[0] = 3;

// Translates to list.add(4);
list[] = 4;

// Translates to new LinkedHashMap<>(...);
Map<String, Integer> map = { "A": 1, "B": 2 }; 

// Translates to map.get(0);
Integer value = map["A"]

// Translates to map.put("C", 3);
map["C"] = 3;

ThreadLocal

ThreadLocal can be a nice thing in some contexts. Probably, the concept of ThreadLocal isn’t 100% sound, as it can cause memory leaks. But assuming that there were no problems,

threadlocal should be a keyword, like volatile and transient

If transient deserves to be a keyword, then threadlocal should be, too. This would work as follows:

class Foo {
  threadlocal Integer bar;

  void baz() {
    bar = 1;           // Corresponds to ThreadLocal.set()
    Integer baz = bar; // Corresponds to ThreadLocal.get()
    bar = null;        // Corresponds to ThreadLocal.remove()
  }
}

Of course, such a keyword could be applied to primitives as well

References

References are something weird in Java. They’re implemented as Java objects in the java.lang.ref package, but treated very specially by the JVM and the GC.

Just like for threadlocal, there should be keywords to denote a reference

Of course, with the introduction of generics, there is only little gain in adding such a keyword. But it still feels smelly that some classes are “very special” within the JVM, but not language syntax features.

Reflection

Please! Why on earth does it have to be so verbose?? Why can’t Java (Java-the-language) be much more dynamic? I’m not asking for a Smalltalk-kind of dynamic, but couldn’t reflection be built into the language somehow, as syntactic sugar?

The Java language should allow for a special syntax for reflection

Some pain-easing can be achieved on a library-level, of course. jOOR is one example. There are many others.

Interfaces

Interfaces in Java always feel very weird. Specifically, with Java 8′s extension methods, they start losing their right to exist, as they move closer to abstract classes. Of course, even with Java 8, the main difference is that classes do not allow multiple inheritance. Interfaces do – at least, they allow for multiple inheritance of specification (abstract methods) and behaviour (default methods), not for state.

But they still feel weird, mainly because their syntax diverges from classes, while their features converge. Why did the lambda expert group decide to introduce a default keyword?? If interfaces allow for abstract methods (as today) and concrete methods (defender methods, extension methods), why can’t interfaces have the same syntax as classes? I’ve asked the expert group with no luck:

http://mail.openjdk.java.net/pipermail/lambda-dev/2012-August/005393.html

Still, I’d wish that…

Interface syntax should be exactly the same as class syntax, wherever appropriate

This includes static methods, final methods, private methods, package-private methods, protected methods, etc.

Default visibility

Default visibility shouldn’t be specified by the absence of a private/protected/public keyword. First of all, this absence isn’t dealt with the same way in classes and interfaces. Then, it is not very readable.

Default visibility should be specified by a “package” or “local” or similar keyword

Literals

This would be an awesome addition in everyday work.

There should be list, map, regex, tuple, record, string (improved), range literals

I’ve blogged about this before:

http://blog.jooq.org/2012/06/01/array-list-set-map-tuple-record-literals-in-java/

Some ideas mentioned by Brian Goetz on the lambda-dev mailing list were found here:

http://mail.openjdk.java.net/pipermail/lambda-dev/2012-May/004979.html

#[ 1, 2, 3 ]                          // Array, list, set
#{ "foo" : "bar", "blah" : "wooga" }  // Map literals
#/(\d+)$/                             // Regex
#(a, b)                               // Tuple
#(a: 3, b: 4)                         // Record
#"There are {foo.size()} foos"        // String literal

I’ll add

#(1..10)                              // Range (producing a List)

Final

Methods, attributes, parameters, local variables, they can all be declared as “final”. Immutability is a good thing in many ways, and should be encouraged (I’ll blog about this, soon). Other languages, such as Scala, distinguish the “val” and “var” keywords. In addition to those other languages’ impressive type inference capabilities, in most cases, val is preferred to var. If one wants to express a modifiable variable, they can still use “var”

Final should be the default behaviour for members, parameters and local variables

Override

It is dangerous to accidentally override a method. Other languages have solved this by causing compilation errors on overrides

An override keyword should be introduced to explicitly override a method

Some Java compilers (e.g. the Eclipse compiler) can be configured to emit a warning / error on the absence of the java.lang.Override annotation. However, this really should be a keyword, not an annotation.

Modules

Dependency management is a nightmare in Java. There is one other language, that builds compilation units in terms of modules: Fantom. Stephen Colebourne (the JodaTime guy) is a big fan of Fantom, and has held a speech at Devoxx. He also blogs about Fantom, from time to time:
http://blog.joda.org/search/label/fantom

A compilation unit should be expressed in the form of a “module” / jar file

This would of course make Maven obsolete, as the Java compiler could already handle dependencies much better.

Varargs and generics

Come on. @SafeVarargs?? Of course, this can never be resolved entirely correctly, due to generic type erasure. But still

There should be no generic type erasure

Tuples and Records

I really think that this is something missing from Java

There should be language support for tuples and records

Scala has integrated tuples up to a degree of 22, .NET supports tuples up to a degree of 8. This would be a nice feature in the Java language as well. Specifically, records (or structs) would be a nice thing to have. As mentioned before, there should be literals for tuples and records, too. Something along these lines:

#(a, b)                               // Tuple
#(a: 3, b: 4)                         // Record

Compiler

A compiler API that goes far beyond adding some annotation processing would be nice. I’d love to be able to extend Java-the-language itself. I’d like to embed SQL statements directly into Java code, similar to SQL being embeddable in PL/SQL. Of course, such SQL code would be backed by a library like jOOQ.

The compiler API should allow for arbitrary language extension

Of course, this improved compiler API should be done in a way that auto-completion, syntax highlighting and other features work automatically in IDEs like Eclipse, as the compiler extensions would be able to expose necessary artefacts to IDEs.

OK, I agree, this improvement is a lot of dreaming :-)

Type inference

If unambiguous, couldn’t type inference just be as powerful as Scala’s? I don’t want to write down the complete type of every local variable.

Scala’s local type inference should be supported

Operator overloading

OK, this is a highly religious topic. Many of you won’t agree. But I just like it

Java should support operator overloading

Some library operations are just better expressed using operators, rather than methods. Think of BigInteger and BigDecimal’s horribly verbose API.

Any other ideas? Add comments!

Of course, lambdas and extension methods are missing and generics are erased. While the latter will never be fixed, the first will be in Java 8. So lets forgive Sun and Oracle for making us wait so long for lambdas


Java trivia: the double-checked locking pattern


Some Java trivia:

In most cases, it is sufficient to simply mark a lazy initialising method as synchronized. The following example can be found in the Wikipedia article about double-checked locking:

// Correct but possibly expensive multithreaded version
class Foo {
    private Helper helper = null;
    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }
 
    // other functions and members...
}

Sometimes, however, you want to avoid obtaining a lock on the Foo instance every time you access Foo’s Helper. Then you may choose to apply double-checked locking (only with Java 1.5+. A good article why it didn’t work prior to Java 1.5 can be found here). If you forget how it works, or doubt whether it works at all, consider looking at the source code of java.io.File.toPath():

// "volatile" is of the essence, here:
private volatile transient Path filePath;

public Path toPath() {
    Path result = filePath;
    if (result == null) {
        synchronized (this) {
            result = filePath;
            if (result == null) {
                result = FileSystems.getDefault().getPath(path);
                filePath = result;
            }
        }
    }
    return result;
}

Defensive API evolution with Java interfaces


API evolution is something absolutely non-trivial. Something that only few have to deal with. Most of us work on internal, proprietary APIs every day. Modern IDEs ship with awesome tooling to factor out, rename, pull up, push down, indirect, delegate, infer, generalise our code artefacts. These tools make refactoring our internal APIs a piece of cake.

But some of us work on public APIs, where the rules change drastically. Public APIs, if done properly, are versioned. Every change – compatible or incompatible – should be published in a new API version. Most people will agree that API evolution should be done in major and minor releases, similar to what is specified in semantic versioning. In short: Incompatible API changes are published in major releases (1.0, 2.0, 3.0), whereas compatible API changes / enhancements are published in minor releases (1.0, 1.1, 1.2).

If you’re planning ahead, you’re going to foresee most of your incompatible changes a long time before actually publishing the next major release. A good tool in Java to announce such a change early is deprecation.

Interface API evolution

Now, deprecation is a good tool to indicate that you’re about to remove a type or member from your API. What if you’re going to add a method, or a type to an interface’s type hierarchy? This means that all client code implementing your interface will break – at least as long as Java 8′s defender methods aren’t introduced yet. There are several techniques to circumvent / work around this problem:

1. Don’t care about it

Yes, that’s an option too. Your API is public, but maybe not so much used. Let’s face it: Not all of us work on the JDK / Eclipse / Apache / etc codebases.

If you’re friendly, you’re at least going to wait for a major release to introduce new methods. But you can break the rules of semantic versioning if you really have to – if you can deal with the consequences of getting a mob of angry users.

Note, though, that other platforms aren’t as backwards-compatible as the Java universe (often by language design, or by language complexity). E.g. with Scala’s various ways of declaring things as implicit, your API can’t always be perfect.

2. Do it the Java way

The “Java” way is not to evolve interfaces at all. Most API types in the JDK have been the way they are today forever. Of course, this makes APIs feel quite “dinosaury” and adds a lot of redundancy between various similar types, such as StringBuffer and StringBuilder, or Hashtable and HashMap.

Note that some parts of Java don’t adhere to the “Java” way. Most specifically, this is the case for the JDBC API, which evolves according to the rules of section #1: “Don’t care about it”.

3. Do it the Eclipse way

Eclipse’s internals contain huge APIs. There are a lot of guidelines how to evolve your own APIs (i.e. public parts of your plugin), when developing for / within Eclipse. One example about how the Eclipse guys extend interfaces is the IAnnotationHover type. By Javadoc contract, it allows implementations to also implement IAnnotationHoverExtension and IAnnotationHoverExtension2. Obviously, in the long run, such an evolved API is quite hard to maintain, test, and document, and ultimately, hard to use! (consider ICompletionProposal and its 6 (!) extension types)

4. Wait for Java 8

In Java 8, you will be able to make use of defender methods. This means that you can provide a sensible default implementation for your new interface methods as can be seen in Java 1.8′s java.util.Iterator (an extract):

public interface Iterator<E> {

    // These methods are kept the same:
    boolean hasNext();
    E next();

    // This method is now made "optional" (finally!)
    public default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    // This method has been added compatibly in Java 1.8
    default void forEach(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        while (hasNext())
            consumer.accept(next());
    }
}

Of course, you don’t always want to provide a default implementation. Often, your interface is a contract that has to be implemented entirely by client code.

5. Provide public default implementations

In many cases, it is wise to tell the client code that they may implement an interface at their own risk (due to API evolution), and they should better extend a supplied abstract or default implementation, instead. A good example for this is java.util.List, which can be a pain to implement correctly. For simple, not performance-critical custom lists, most users probably choose to extend java.util.AbstractList instead. The only methods left to implement are then get(int) and size(), The behaviour of all other methods can be derived from these two:

class EmptyList<E> extends AbstractList<E> {
    @Override
    public E get(int index) {
        throw new IndexOutOfBoundsException("No elements here");
    }

    @Override
    public int size() {
        return 0;
    }
}

A good convention to follow is to name your default implementation AbstractXXX if it is abstract, or DefaultXXX if it is concrete

6. Make your API very hard to implement

Now, this isn’t really a good technique, but just a probable fact. If your API is very hard to implement (you have 100s of methods in an interface), then users are probably not going to do it. Note: probably. Never underestimate the crazy user. An example of this is jOOQ’s org.jooq.Field type, which represents a database field / column. In fact, this type is part of jOOQ’s internal domain specific language, offering all sorts of operations and functions that can be performed upon a database column.

Of course, having so many methods is an exception and – if you’re not designing a DSL – is probably a sign of a bad overall design.

7. Add compiler and IDE tricks

Last but not least, there are some nifty tricks that you can apply to your API, to help people understand what they ought to do in order to correctly implement your interface-based API. Here’s a tough example, that slaps the API designer’s intention straight into your face. Consider this extract of the org.hamcrest.Matcher API:

public interface Matcher<T> extends SelfDescribing {

    // This is what a Matcher really does.
    boolean matches(Object item);
    void describeMismatch(Object item, Description mismatchDescription);

    // Now check out this method here:

    /**
     * This method simply acts a friendly reminder not to implement 
     * Matcher directly and instead extend BaseMatcher. It's easy to 
     * ignore JavaDoc, but a bit harder to ignore compile errors .
     *
     * @see Matcher for reasons why.
     * @see BaseMatcher
     * @deprecated to make
     */
    @Deprecated
    void _dont_implement_Matcher___instead_extend_BaseMatcher_();
}

“Friendly reminder”, come on. ;-)

Other ways

I’m sure there are dozens of other ways to evolve an interface-based API. I’m curious to hear your thoughts!


A dirt-ugly hack to modify private final fields in Java


We all use reflection from time to time. We may even tamper with visibility through Java’s Field.setAccessible() and similar methods. But this post here takes things to the extreme and shows how to modify private (static) final fields in Java. Think twice, when choosing this tool ;-)

http://zarnekow.blogspot.ch/2013/01/java-hacks-changing-final-fields.html


Follow

Get every new post delivered to your Inbox.

Join 217 other followers