How to Eliminate Bugs Through High Cohesion

Intuition tells us that methods like this one suffer from a distinct code smell:

CompilationTask getTask(
    Writer out,
    JavaFileManager fileManager,
    DiagnosticListener<? super JavaFileObject> 
        diagnosticListener,
    Iterable<String> options,
    Iterable<String> classes,
    Iterable<? extends JavaFileObject> 
        compilationUnits
);

Why is that so? Let’s delve into this intuition. Here is an example from the JavaCompiler Javadoc:

Iterable<? extends JavaFileObject> compilationUnits1 =
    fileManager.getJavaFileObjectsFromFiles(
        Arrays.asList(files1));

compiler.getTask(null, fileManager, null, 
                 null, null, compilationUnits1)
        .call();

So what’s wrong here? We have a lot of very specific, unrelated parameter types, most of which are likely to be passed as null. This decreases the above method’s reusability, or, in the terms of the JArchitect guys, we’re probably in the “Zone of Pain”, as we have a low level of stability coupled with a low level of abstractness.

  • Low stability: it is very likely that we’re going to need another very specific argument in a future version of the JavaCompiler, e.g. another Iterable of something. This will make for an incompatible API enhancement
  • Low abstractness: Even if the above is an interface method, there is very little chance of this method being implemented more than once, as it is quite hard to fulfil the above contract in a useful way.

A common way to circumvent this issue for single methods is to use the builder pattern, as Petri Kainulainen has nicely described.
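For illustration, here is a hedged sketch of what such a builder could look like for getTask(). The CompilationTaskBuilder, its methods and the diagnostics variable are hypothetical, not part of the JDK; the point is merely that optional arguments no longer need to be passed as null:

// Hypothetical builder API, not part of javax.tools
CompilationTask task =
    new CompilationTaskBuilder(compiler, fileManager)
        .diagnosticListener(diagnostics)      // optional
        .option("-g")                         // optional, repeatable
        .compilationUnits(compilationUnits1)  // required
        .build();

task.call();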

High cohesion instead of “Zone of Pain”

You may think that this isn’t too important for a compiler API. But the biggest value of “high cohesion”, i.e. of an ideal stability / abstractness balance, is the fact that you have highly reusable code. This isn’t just good because your developers spend less time implementing a specific task; it also means that your code is extremely error-resistant. For example, check out the data type conversion logic from the internals of jOOQ:

jOOQ’s data type conversion call hierarchy

The above is just an extract of the call hierarchy leading towards a single data type conversion API that is indirectly used in the whole framework. Everything leads through there, so if there is any data type conversion bug, it is either

  • Extremely local to a single method / single leaf of the above tree representation
  • Extremely global to the whole tree

In other words, any bug related to data type conversion is either merely cosmetic, or completely catastrophic. Which basically means that there is almost no possibility for a regression in that area, as any data type conversion regression will immediately break hundreds of unit and integration tests. This is a major benefit of having high cohesion in your code.
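To make this more tangible, here is a minimal sketch of the idea (this is not jOOQ’s actual implementation): a single, central conversion entry point that every caller funnels through, so that every conversion rule lives, and breaks, in exactly one place:

final class Convert {

    // The one and only conversion routine. Every data type conversion
    // in the (sketched) framework is delegated to this method.
    static <T> T convert(Object from, Class<T> to) {
        if (from == null)
            return null;
        if (to.isInstance(from))
            return to.cast(from);
        if (to == String.class)
            return to.cast(from.toString());
        if (to == Integer.class && from instanceof Number)
            return to.cast(((Number) from).intValue());

        // ... one place to add (or fix) a rule for any other type pair
        throw new IllegalArgumentException(
            "Cannot convert " + from.getClass() + " to " + to);
    }
}

Any regression introduced here immediately affects every caller, and is thus immediately caught by their tests.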

How to attain high cohesion

It’s simple: by refactoring mercilessly. You should never introduce a new feature only locally. For instance, let’s consider the fix for [#3023] “DefaultRecordMapper does not map nested UDTs onto nested POJOs”. So we want the jOOQ RecordMapperProvider feature to be applied to nested records. Why? Imagine we have a PERSON table with Oracle OBJECT types for its ADDRESS and STREET properties. Yes, you could also just normalise this data, but imagine we are using UDTs:

CREATE TYPE street_type AS OBJECT (
  street VARCHAR2(100),
  no VARCHAR2(30)
);

CREATE TYPE address_type AS OBJECT (
  street street_type,
  zip VARCHAR2(50),
  city VARCHAR2(50)
);

And now, we would like to recursively map this data onto custom nested POJOs:

public class Street {
    public String street;
    public String number;
}

public class Address {
    public Street street;
    public String city;
    public String country;
}

public class Person {
    public String firstName;
    public String lastName;
    public Address address;
}

And the mapping should be available through:

// The configuration object contains the
// Mapping algorithm implementation
Person person = DSL.using(configuration)
                   .selectFrom(PERSON)
                   .where(PERSON.ID.eq(1))

// We want to make the mapping algorithm recursive
// to automatically map Address and Street as well
                   .fetchOneInto(Person.class);

Mapping of a Record onto a POJO is already implemented, but recursion is not. And when we do implement recursion, we want to respect the existing, aforementioned customisable mapping SPI that was introduced in jOOQ 3.1. It’s very simple: we just have a single implementation point at the top, in the ConvertAll type.
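To make the notion of “recursion” concrete, here is a heavily simplified sketch of the idea (this is not jOOQ’s actual DefaultRecordMapper; the helper methods, the naive name matching and the restriction to public fields are simplifications for illustration only):

// Whenever a column value is itself a Record (e.g. an Oracle UDT),
// the very same mapping algorithm is applied to the nested value.
static <T> T mapInto(org.jooq.Record record, Class<T> type)
throws Exception {
    T result = type.getDeclaredConstructor().newInstance();

    for (org.jooq.Field<?> field : record.fields()) {
        Object value = record.getValue(field);
        java.lang.reflect.Field member =
            findMember(type, field.getName());

        if (member == null)
            continue;

        member.set(result, value instanceof org.jooq.Record
            ? mapInto((org.jooq.Record) value, member.getType())
            : value);
    }

    return result;
}

// Naive name matching for brevity, e.g. FIRST_NAME -> firstName
static java.lang.reflect.Field findMember(
        Class<?> type, String columnName) {
    String normalised = columnName.replace("_", "").toLowerCase();

    for (java.lang.reflect.Field member : type.getFields())
        if (member.getName().toLowerCase().equals(normalised))
            return member;

    return null;
}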

Implementing this in a highly cohesive code base means that:

  • We have to implement this new feature only once
  • Implementing this new feature costs less effort than writing this blog post
  • Nesting of record mapping and conversion will work for all use-cases in one go
  • We have only slightly increased complexity (low risk of bugs) while adding an awesome new feature

Do you refactor mercilessly?

The perfect design cannot be foreseen. It grows, slowly. Today, we know so many things about Java and collections, and yet it took a while for the new Streams API to surface. No one would have implemented such a great new API in JDK 1.2 from scratch, even though, from today’s perspective, the collections were already pretty good at the time.

This mainly means two things for you:

  • For your essential core code, it is important to get it to a state where you attain high cohesion. If you’re an E-Banking vendor, your payment and brokerage logic should be exactly as above, with a balanced stability / abstractness ratio
  • For your non-essential code (e.g. UI / DB-access), you should rely on third-party software, because someone else will spend a lot more time getting their code to a high level of quality (UI: such as Vaadin or ZK; DB-access: such as Hibernate, jOOQ, Spring Data, just to name a few)

… and if you request a new feature from a highly cohesive framework, it might just be that the only thing that needs to be done is these four lines of code.

Usability vs. Reusability

This blog post from 2009 has a nice way of looking at the problem of comparing ease of use with the ability to reuse. It claims that usability and reusability are always a tradeoff between building

  • a heavyweight, coarse-grained software component with few dependencies (very usable)
  • a lightweight, fine-grained software component with complex dependencies (very reusable)

The following picture nicely depicts this relationship:

Image taken from: http://techdistrict.kirkk.com/2009/07/08/reuse-is-the-dream-dead

Indeed, coarse-grained components are hard to reuse because they try to solve too many problems in the context of the coarse overall point of view. But I’m not sure if they’re necessarily easier to “use”. Being coarse, and thus complex, they may have solved the problem in the wrong way. And since they’re so complex, they cannot be changed easily to fit a slightly different problem domain. Since time changes the original problem setup (and it will in any project), heavyweight, coarse-grained components often can neither be used nor reused for their “original” purpose. In other words, a coarse-grained solution that is developed in a project over several years may never really be finished, because it can no longer be changed by the time those years have passed.

And fine-grained components aren’t necessarily hard to use. It is possible to create components with very few dependencies, such that they do not introduce a lot of complexity. jOOQ is one example of such a component, which has no dependencies itself, apart from the JDK. But jOOQ is a library and not a business module, i.e. it implements horizontal reusability, not vertical reusability.

So, let’s hope the original post was not entirely correct and that there is a good middle way! See for yourself:
http://techdistrict.kirkk.com/2009/07/08/reuse-is-the-dream-dead

The Dangers of Correlating Subtype Polymorphism with Generic Polymorphism

Java 5 introduced generic polymorphism to the Java ecosystem. This has been a great addition to the Java language, even if we’re all aware of the numerous caveats due to generic type erasure and the consequences thereof. Generic polymorphism (also known as parametric polymorphism) is usually maintained orthogonally to possibly pre-existing subtype polymorphism. A simple example of this is the collections API:

List<? extends Number> c = new ArrayList<Integer>();

In the above example, the subtype ArrayList is assigned to a variable of the super type List. At the same time ArrayList is parameterised with the type Integer, which can be assigned to the compatible parameter supertype ? extends Number. This usage of subtype polymorphism in the context of generic polymorphism is also called covariance, although covariance can also be achieved in non-generic contexts, of course.
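For instance, here is what such a covariant view allows and forbids:

List<? extends Number> numbers = Arrays.asList(1, 2.0, 3L);

Number n = numbers.get(0); // Reading as Number is always safe

// numbers.add(4);         // Compilation error: nothing (except null)
                           // can be added through the covariant view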

Covariance with Generic Polymorphism

Covariance is important with generics. It allows for creating complex type systems. Easy examples involve using covariance with generic methods:

<E extends Serializable> void serialize(
    Collection<E> collection) {}

The above example accepts any Collection type, which can be subtyped at the call-site with types such as List, ArrayList, Set, and many more. At the same time, the generic type argument at the call site is only required to be a subtype of Serializable. I.e. it could be a List<Integer> or an ArrayList<String>, etc.
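At the call site, this plays out as follows:

serialize(new ArrayList<String>());  // E is inferred as String
serialize(new HashSet<Integer>());   // E is inferred as Integer

// Does not compile: Object is not Serializable
// serialize(new ArrayList<Object>());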

Correlating Subtype Polymorphism with Generic Polymorphism

People are then often lured into correlating the two orthogonal types of polymorphism. A simple example of such a correlation would be to specialise an IntegerList or StringSet as such:

class IntegerList extends ArrayList<Integer> {}
class StringSet extends HashSet<String> {}

It is easy to see that the number of explicit types will explode, if you start to span the cartesian product of the subtype and generic type hierarchies, wanting to specialise more precisely by creating things like IntegerArrayList, IntegerAbstractList, IntegerLinkedList etc.
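To spell that explosion out, here is a sketch of where such “specialisation” would lead:

// One class per (element type x collection type) combination...
class IntegerArrayList  extends ArrayList<Integer>  {}
class IntegerLinkedList extends LinkedList<Integer> {}
class StringArrayList   extends ArrayList<String>   {}
class StringLinkedList  extends LinkedList<String>  {}
// ... and so on, for every new element type and collection type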

Making the Correlation Generic

As seen above, such correlations will often remove the genericity from the type hierarchy, although they are not required to do so. This can be seen in the following, more general example:

// AnyContainer can contain AnyObject
class AnyContainer<E extends AnyObject> {}
class AnyObject {}

// PhysicalContainer contains only PhysicalObjects
class PhysicalContainer<E extends PhysicalObject>
  extends AnyContainer<E> {}
class PhysicalObject extends AnyObject {}

// FruitContainer contains only Fruit,
// which in turn are PhysicalObjects
class FruitContainer<E extends Fruit>
  extends PhysicalContainer<E> {}
class Fruit extends PhysicalObject {}

The above example is a typical one, where the API designer was lured into correlating subtype polymorphism (Fruit extends PhysicalObject extends AnyObject) with generic polymorphism (<E>), while keeping it generic, allowing further subtypes to be added below FruitContainer. This gets more interesting when AnyObject should know its own subtype, generically. This can be achieved with a recursive generic parameter. Let’s fix the previous example:

// AnyContainer can contain AnyObject
class AnyContainer<E extends AnyObject<E>> {}
class AnyObject<O extends AnyObject<O>> {}

// PhysicalContainer contains only PhysicalObjects
class PhysicalContainer<E extends PhysicalObject<E>>
  extends AnyContainer<E> {}
class PhysicalObject<O extends PhysicalObject<O>>
  extends AnyObject<O> {}

// FruitContainer contains only Fruit,
// which in turn are PhysicalObjects
class FruitContainer<E extends Fruit<E>>
  extends PhysicalContainer<E> {}
class Fruit<O extends Fruit<O>>
  extends PhysicalObject<O> {}

The interesting part here is no longer the containers, but the AnyObject type hierarchy, which correlates subtype polymorphism with generic polymorphism on its own type! This is also done with java.lang.Enum:

public abstract class Enum<E extends Enum<E>>
implements Comparable<E>, Serializable {
  public final int compareTo(E other) { ... }
  public final Class<E> getDeclaringClass() { ... }
}

enum MyEnum {}

// Which is syntactic sugar for:
final class MyEnum extends Enum<MyEnum> {}

Where Lies the Danger?

The subtle difference between enums and our custom AnyObject hierarchy is the fact that MyEnum terminates the recursive self-correlation of the two orthogonal typing techniques by being final! AnyObject subtypes, on the other hand, should not be allowed to remove the generic type parameter, unless they are made final as well. An example:

// "Dangerous"
class Apple extends Fruit<Apple> {}

// "Safe"
final class Apple extends Fruit<Apple> {}

Why is final so important, or in other words, why must AnyObject subtypes be careful when terminating the recursive self-correlation, as Apple did before? It’s simple. Let’s assume the following addition:

class AnyObject<O extends AnyObject<O>>
  implements Comparable<O> {

  @Override
  public int compareTo(O other) { ... }
  public AnyContainer<O> container() { ... }
}

The above contract on AnyObject.compareTo() implies that any subtype of AnyObject can only ever be compared to the same subtype. The following is not possible:

Fruit<?> fruit = // ...
Vegetable<?> vegetable = // ...

// Compilation error!
fruit.compareTo(vegetable);

The only currently comparable type in the hierarchy is Apple:

Apple a1 = new Apple();
Apple a2 = new Apple();

a1.compareTo(a2);

But what if we wanted to add GoldenDelicious and Gala apples?

class GoldenDelicious extends Apple {}
class Gala extends Apple {}

We can now compare them!

GoldenDelicious g1 = new GoldenDelicious();
Gala g2 = new Gala();

g1.compareTo(g2);

This was not the intention of the author of AnyObject!

The same applies to the container() method. Subtypes are allowed to covariantly specialise the AnyContainer type, which is fine:

class Fruit<O extends Fruit<O>>
  extends PhysicalObject<O> {

  @Override
  public FruitContainer<O> container() { ... }
}

But what happens to the container() method in GoldenDelicious and Gala?

GoldenDelicious g = new GoldenDelicious();
FruitContainer<Apple> c = g.container();

Yes, it will return an Apple container, not a GoldenDelicious container as intended by the AnyObject designer.
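Both problems disappear if intermediate types like Apple keep carrying the recursive bound, and only final leaf types terminate it. A hedged sketch of that safer variant:

// Keep the recursive bound open on the intermediate type...
class Apple<O extends Apple<O>> extends Fruit<O> {}

// ... and terminate it only in final leaf types
final class GoldenDelicious extends Apple<GoldenDelicious> {}
final class Gala extends Apple<Gala> {}

// The unintended comparison no longer compiles:
// new GoldenDelicious().compareTo(new Gala());

// And container() is specialised correctly for each leaf type:
// FruitContainer<GoldenDelicious> c =
//     new GoldenDelicious().container();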

Conclusion

Subtype polymorphism and generic polymorphism span orthogonal type axes. Making them correlate can be a design smell in your type system. Making them correlate on the same type is dangerous, as it is hard to get right. Users will try to terminate the recursive generic type definition on a subtype of your base type. The reason for this termination is the fact that base types with recursive self-bounds are hard to use. But the termination often goes wrong, as it should only be done on final classes, not regular classes or interfaces.

In other words, if you think you need a recursive generic type definition on a common base type, think again very carefully about whether you really need it, and whether your type’s users can correctly terminate the recursive generic type definition in a final class.

The Visitor Pattern Re-visited

The visitor pattern is one of the most overrated and yet underestimated patterns in object-oriented design. Overrated, because it is often chosen too quickly (possibly by an architecture astronaut), and then bloats an otherwise very simple design, when added in the wrong way. Underestimated, because it can be very powerful, if you don’t follow the school-book example. Let’s have a look in detail.

Problem #1: The naming

Its biggest flaw (in my opinion) is its naming itself. The “visitor” pattern. When we google it, we most likely find ourselves on the related Wikipedia article, showing funny images like this one:

Wikipedia Visitor Pattern example

Right. For the 98% of us thinking in wheels and engines and bodies in their everyday software engineering work, this is immediately clear, because we know that the mechanic billing us several thousand dollars for mending our car will first visit the wheels, then the engine, before eventually visiting our wallet and accepting our cash. If we’re unfortunate, he’ll also visit our wife while we’re at work, but she’ll never accept, that faithful soul.

But what about the 2% that solve other problems in their work life? Like when we code complex data structures for E-Banking systems, stock exchange clients, intranet portals, etc. etc. Why not apply a visitor pattern to a truly hierarchical data structure? Like folders and files? (OK, not so complex after all)

OK, so we’ll “visit” folders, and every folder is going to let its files “accept” a “visitor”, and then we’ll let the visitor “visit” the files, too. What?? The car lets its parts accept the visitor, and then each part lets the visitor visit it? The terms are misleading. They’re generic and good for the design pattern. But they will kill your real-life design, because no one thinks in terms of “accepting” and “visiting” when, in fact, you read/write/delete/modify your file system.

Problem #2: The polymorphism

This is the part that causes even more headaches than the naming, when applied to the wrong situation. Why on earth does the visitor know everyone else? Why does the visitor need a method for every involved element in the hierarchy? Polymorphism and encapsulation claim that the implementation should be hidden behind an API. The API (of our data structure) probably implements the composite pattern in some way, i.e. its parts inherit from a common interface. OK, of course, a wheel is not a car, neither is my wife a mechanic. But when we take the folder/file structure, aren’t they all java.io.File objects?

Understanding the problem

The actual problem is not the naming and horrible API verbosity of visiting code, but the misunderstanding of the pattern. It’s not a pattern that is best suited for visiting large and complex data structures with lots of objects of different types. It’s the pattern that is best suited for visiting simple data structures with few different types, but visiting them with hundreds of visitors. Take files and folders. That’s a simple data structure. You have two types. One can contain the other, and both share some properties. Various visitors could be:

  • CalculateSizeVisitor
  • FindOldestFileVisitor
  • DeleteAllVisitor
  • FindFilesByContentVisitor
  • ScanForVirusesVisitor
  • … you name it

I still dislike the naming, but the pattern works perfectly in this paradigm.
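For the record, here is a minimal, hypothetical sketch of the pattern in this friendly setup: two types, arbitrarily many visitors:

interface FileSystemVisitor {
    void visit(File file);
    void visit(Folder folder);
}

interface FileSystemNode {
    void accept(FileSystemVisitor visitor);
}

class File implements FileSystemNode {
    long size;

    @Override
    public void accept(FileSystemVisitor visitor) {
        visitor.visit(this);
    }
}

class Folder implements FileSystemNode {
    java.util.List<FileSystemNode> children =
        new java.util.ArrayList<>();

    @Override
    public void accept(FileSystemVisitor visitor) {
        visitor.visit(this);

        // Let all children accept the visitor, too
        for (FileSystemNode child : children)
            child.accept(visitor);
    }
}

// One of the arbitrarily many visitors
class CalculateSizeVisitor implements FileSystemVisitor {
    long total;

    @Override
    public void visit(File file) {
        total += file.size;
    }

    @Override
    public void visit(Folder folder) {
        // Folders add no size of their own
    }
}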

So when is the visitor pattern “wrong”?

I’d like to give the jOOQ QueryPart structure as an example. There are a great many of them, modelling various SQL query constructs, allowing jOOQ to build and execute SQL queries of arbitrary complexity. Let’s name a few examples:

  • Condition
    • CombinedCondition
    • NotCondition
    • InCondition
    • BetweenCondition
  • Field
    • TableField
    • Function
    • AggregateFunction
    • BindValue
  • FieldList

There are many more. Each one of them must be able to perform two actions: render SQL and bind variables. That would make two visitors, each knowing more than… 40-50 types…? Maybe, in the faraway future, jOOQ queries will be able to render JPQL or some other query type. That would make three visitors against 40-50 types. Clearly, here, the classic visitor pattern is a bad choice. But I still want to “visit” the QueryParts, delegating rendering and binding to lower levels of abstraction.
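Just to see the bloat, here is a hedged sketch of what a single such visitor interface would look like. The types are stand-ins; the real list would contain 40-50 methods, repeated for every rendering concern:

// Stand-ins for a handful of the 40-50 real QueryPart types
class CombinedConditionPart {}
class NotConditionPart {}
class InConditionPart {}
class BetweenConditionPart {}
class TableFieldPart {}
class BindValuePart {}

interface QueryPartVisitor {
    void visit(CombinedConditionPart part);
    void visit(NotConditionPart part);
    void visit(InConditionPart part);
    void visit(BetweenConditionPart part);
    void visit(TableFieldPart part);
    void visit(BindValuePart part);
    // ... dozens more, and the whole list again for every
    // new concern (SQL rendering, variable binding, JPQL, ...)
}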

How to implement this, then?

It’s simple: stick with the composite pattern! It allows you to add API elements to your data structure that every element has to implement.

So, by intuition, step 1 would be this:

interface QueryPart {
  // Let the QueryPart return its SQL
  String getSQL();

  // Let the QueryPart bind variables to a prepared
  // statement, given the next bind index, returning
  // the last bind index
  int bind(PreparedStatement statement, int nextIndex);
}

With this API, we can easily abstract a SQL query and delegate the responsibilities to lower-level artefacts. Take the BetweenCondition, for instance. It takes care of correctly ordering the parts of a [field] BETWEEN [lower] AND [upper] condition, rendering syntactically correct SQL, and delegating parts of the task to its child QueryParts:

class BetweenCondition implements QueryPart {
  Field field;
  Field lower;
  Field upper;

  public String getSQL() {
    return field.getSQL() + " between " +
           lower.getSQL() + " and " +
           upper.getSQL();
  }

  public int bind(PreparedStatement statement, int nextIndex) {
    int result = nextIndex;

    result = field.bind(statement, result);
    result = lower.bind(statement, result);
    result = upper.bind(statement, result);

    return result;
  }
}

BindValue, on the other hand, would mainly take care of variable binding:

class BindValue implements QueryPart {
  Object value;

  public String getSQL() {
    return "?";
  }

  public int bind(PreparedStatement statement, int nextIndex) {
    statement.setObject(nextIndex, value);
    return nextIndex + 1;
  }
}

Combined, we can now easily create conditions of this form: ? BETWEEN ? AND ?. When more QueryParts are implemented, we could also imagine things like MY_TABLE.MY_FIELD BETWEEN ? AND (SELECT ? FROM DUAL), once appropriate Field implementations are available. That’s what makes the composite pattern so powerful: a common API and many components encapsulating behaviour, delegating parts of that behaviour to sub-components.

Step 2 takes care of API evolution

The composite pattern that we’ve seen so far is pretty intuitive, and yet very powerful. But sooner or later, we will need more parameters, as we find out that we want to pass state from parent QueryParts to their children. For instance, we want to be able to inline some bind values for some clauses. Maybe some SQL dialects do not allow bind values in the BETWEEN clause. How do we handle that with the current API? Extend it, adding a “boolean inline” parameter? No! That’s one of the reasons why the visitor pattern was invented: to keep the API of the composite structure elements simple (they only have to implement “accept”). But in this case, much better than implementing a true visitor pattern is to replace the parameters with a “context”:

interface QueryPart {
  // The QueryPart now renders its SQL to the context
  void toSQL(RenderContext context);

  // The QueryPart now binds its variables to the context
  void bind(BindContext context);
}

The above contexts would contain properties like these (setters and render methods return the context itself, to allow for method chaining):

interface RenderContext {
  // Whether we're inlining bind variables
  boolean inline();
  RenderContext inline(boolean inline);

  // Whether fields should be rendered as a field declaration
  // (as opposed to a field reference). This is used for aliased fields
  boolean declareFields();
  RenderContext declareFields(boolean declare);

  // Whether tables should be rendered as a table declaration
  // (as opposed to a table reference). This is used for aliased tables
  boolean declareTables();
  RenderContext declareTables(boolean declare);

  // Whether we should cast bind variables
  boolean cast();

  // Render methods
  RenderContext sql(String sql);
  RenderContext sql(char sql);
  RenderContext keyword(String keyword);
  RenderContext literal(String literal);

  // The context's "visit" method
  RenderContext sql(QueryPart sql);
}

The same goes for the BindContext. As you can see, this API is quite extensible: new properties can be added, and other common means of rendering SQL can be added, too. But the BetweenCondition does not have to surrender its encapsulated knowledge about how to render its SQL, and whether bind variables are allowed or not. It’ll keep that knowledge to itself:

class BetweenCondition implements QueryPart {
  Field field;
  Field lower;
  Field upper;

  // The QueryPart now renders its SQL to the context
  public void toSQL(RenderContext context) {
    context.sql(field).keyword(" between ")
           .sql(lower).keyword(" and ")
           .sql(upper);
  }

  // The QueryPart now binds its variables to the context
  public void bind(BindContext context) {
    context.bind(field).bind(lower).bind(upper);
  }
}

BindValue, on the other hand, would again mainly take care of variable binding:

class BindValue implements QueryPart {
  Object value;

  public void toSQL(RenderContext context) {
    context.sql("?");
  }

  public void bind(BindContext context) {
    context.statement().setObject(context.nextIndex(), value);
  }
}
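The BindContext is not spelled out in this post. Here is a hedged sketch of what it might expose, with method names inferred from the usage above:

interface BindContext {
  // The JDBC statement that all variables are bound to
  PreparedStatement statement();

  // The next bind index, advancing as values are bound
  int nextIndex();

  // The context's "visit" method, delegating to the part's bind()
  BindContext bind(QueryPart part);
}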

Conclusion: Name it Context-Pattern, not Visitor-Pattern

Be careful when jumping quickly to the visitor pattern. In many many cases, you’re going to bloat your design, making it utterly unreadable and difficult to debug. Here are the rules to remember, summed up:

  1. If you have many many visitors and a relatively simple data structure (few types), the visitor pattern is probably OK.
  2. If you have many many types and a relatively small set of visitors (few behaviours), the visitor pattern is overkill; stick with the composite pattern
  3. To allow for simple API evolution, design your composite objects to have methods taking a single context parameter.
  4. All of a sudden, you will find yourself with an “almost-visitor” pattern again, where context=visitor, “visit” and “accept”=”your proprietary method names”

The “Context Pattern” is at the same time as intuitive as the “Composite Pattern” and as powerful as the “Visitor Pattern”, combining the best of both worlds.