The Visitor Pattern Re-visited

The visitor pattern is one of the most overrated and yet underestimated patterns in object-oriented design. Overrated, because it is often chosen too quickly (possibly by an architecture astronaut), and then bloats an otherwise very simple design, when added in the wrong way. Underestimated, because it can be very powerful, if you don’t follow the school-book example. Let’s have a look in detail.

Problem #1: The naming

Its biggest flaw (in my opinion) is its naming itself. The “visitor” pattern. When we google it, we most likely find ourselves on the related Wikipedia article, showing funny images like this one:

Right. For the 98% of us thinking in wheels and engines and bodies in their every day software engineering work, this is immediately clear, because we know that the mechanic billing us several 1000$ for mending our car will first visit the wheels, then the engine, before eventually visiting our wallet and accepting our cash. If we’re unfortunate, he’ll also visit our wife while we’re at work, but she’ll never accept, that faithful soul. But what about the 2% that solve other problems in their worklife? Like when we code complex data structures for E-Banking systems, stock exchange clients, intranet portals, etc. etc. Why not apply a visitor pattern to a truly hierarchical data structure? Like folders and files? (ok, not so complex after all) OK, so we’ll “visit” folders and every folder is going to let its files “accept” a “visitor” and then we’ll let the visitor “visit” the files, too. What?? The car lets its parts accept the visitor and then let the visitor visit itself? The terms are misleading. They’re generic and good for the design pattern. But they will kill your real-life design, because no one thinks in terms of “accepting” and “visiting”, when in fact, you read/write/delete/modify your file system.

Problem #2: The polymorphism

This is the part that causes even more headaches than the naming, when applied to the wrong situation. Why on earth does the visitor know everyone else? Why does the visitor need a method for every involved element in the hierarchy? Polymorphism and encapsulation claim that the implementation should be hidden behind an API. The API (of our data structure) probably implements the composite pattern in some way, i.e. its parts inherit from a common interface. OK, of course, a wheel is not a car, neither is my wife a mechanic. But when we take the folder/file structure, aren’t they all java.io.File objects?

Understanding the problem

The actual problem is not the naming and the horrible API verbosity of visiting code, but the misunderstanding of the pattern. It’s not a pattern that is best suited for visiting large and complex data structures with lots of objects of different types. It’s the pattern that is best suited for visiting simple data structures with few different types, but visiting them with hundreds of visitors. Take files and folders. That’s a simple data structure. You have two types. One can contain the other, both share some properties. Various visitors could be:

CalculateSizeVisitor
FindOldestFileVisitor
DeleteAllVisitor
FindFilesByContentVisitor
ScanForVirusesVisitor
… you name it

I still dislike the naming, but the pattern works perfectly in this paradigm.
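To make that concrete, here is a minimal sketch of the file/folder case. All type names (Node, NodeVisitor, FileNode, FolderNode, CountFilesVisitor) are hypothetical and chosen for illustration only; just CalculateSizeVisitor is taken from the list above. The point is that the data structure has exactly two types, so every new visitor only ever needs two methods:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-type hierarchy: a node is either a file or a folder.
interface Node {
    <R> R accept(NodeVisitor<R> visitor);
}

// One method per type. With only two types, this stays small forever.
interface NodeVisitor<R> {
    R visitFile(FileNode file);
    R visitFolder(FolderNode folder);
}

final class FileNode implements Node {
    final String name;
    final long size;

    FileNode(String name, long size) {
        this.name = name;
        this.size = size;
    }

    public <R> R accept(NodeVisitor<R> visitor) {
        return visitor.visitFile(this);
    }
}

final class FolderNode implements Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    FolderNode(String name) {
        this.name = name;
    }

    // Returns this folder to allow chained construction
    FolderNode add(Node child) {
        children.add(child);
        return this;
    }

    public <R> R accept(NodeVisitor<R> visitor) {
        return visitor.visitFolder(this);
    }
}

// Sums up the sizes of all files below the visited node
final class CalculateSizeVisitor implements NodeVisitor<Long> {
    public Long visitFile(FileNode file) {
        return file.size;
    }

    public Long visitFolder(FolderNode folder) {
        long total = 0;
        for (Node child : folder.children) {
            total += child.accept(this);
        }
        return total;
    }
}

// A second behaviour, added without touching FileNode or FolderNode
final class CountFilesVisitor implements NodeVisitor<Integer> {
    public Integer visitFile(FileNode file) {
        return 1;
    }

    public Integer visitFolder(FolderNode folder) {
        int count = 0;
        for (Node child : folder.children) {
            count += child.accept(this);
        }
        return count;
    }
}
```

Calling root.accept(new CalculateSizeVisitor()) on a folder tree then yields the total size, and each further visitor from the list above would be another small class of the same shape.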

So when is the visitor pattern “wrong”?

I’d like to give the jOOQ QueryPart structure as an example. There are a great many of them, modelling various SQL query constructs, allowing jOOQ to build and execute SQL queries of arbitrary complexity. Let’s name a few examples:

Condition
- CombinedCondition
- NotCondition
- InCondition
- BetweenCondition
Field
- TableField
- Function
- AggregateFunction
- BindValue
FieldList

There are many more. Each one of them must be able to perform two actions: render SQL and bind variables. That would make two visitors each one knowing more than… 40-50 types…? Maybe in the faraway future, jOOQ queries will be able to render JPQL or some other query type. That would make 3 visitors against 40-50 types. Clearly, here, the classic visitor pattern is a bad choice. But I still want to “visit” the QueryParts, delegating rendering and binding to lower levels of abstraction.
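For comparison, this is roughly what the school-book visitor would look like for just three of those types. It is a sketch, not jOOQ code: QueryPartVisitor, RenderVisitor and the accept/visit signatures are assumptions made up for this illustration. Note how the knowledge about rendering a BETWEEN condition moves out of BetweenCondition and into the visitor, and how the visitor interface grows by one method per type:

```java
// Hypothetical school-book visitor over a handful of QueryPart types
interface QueryPartVisitor {
    void visit(NotCondition part);
    void visit(BetweenCondition part);
    void visit(BindValue part);
    // ...plus one more overload for every other QueryPart type
}

interface QueryPart {
    void accept(QueryPartVisitor visitor);
}

final class BindValue implements QueryPart {
    final Object value;

    BindValue(Object value) {
        this.value = value;
    }

    public void accept(QueryPartVisitor visitor) {
        visitor.visit(this);
    }
}

final class BetweenCondition implements QueryPart {
    final QueryPart field;
    final QueryPart lower;
    final QueryPart upper;

    BetweenCondition(QueryPart field, QueryPart lower, QueryPart upper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
    }

    public void accept(QueryPartVisitor visitor) {
        visitor.visit(this);
    }
}

final class NotCondition implements QueryPart {
    final QueryPart condition;

    NotCondition(QueryPart condition) {
        this.condition = condition;
    }

    public void accept(QueryPartVisitor visitor) {
        visitor.visit(this);
    }
}

// The visitor has to know how to render every single type
final class RenderVisitor implements QueryPartVisitor {
    private final StringBuilder sql = new StringBuilder();

    public void visit(NotCondition part) {
        sql.append("not (");
        part.condition.accept(this);
        sql.append(")");
    }

    public void visit(BetweenCondition part) {
        part.field.accept(this);
        sql.append(" between ");
        part.lower.accept(this);
        sql.append(" and ");
        part.upper.accept(this);
    }

    public void visit(BindValue part) {
        sql.append("?");
    }

    String sql() {
        return sql.toString();
    }
}
```

A second visitor for binding variables would repeat the same list of methods, and adding a new QueryPart type would force a change in every visitor.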

How to implement this, then?

It’s simple: Stick with the composite pattern! It allows you to add some API elements to your data structure that everyone has to implement. So by intuition, step 1 would be this:


interface QueryPart {
  // Let the QueryPart return its SQL
  String getSQL();

  // Let the QueryPart bind variables to a prepared
  // statement, given the next bind index, returning
  // the last bind index
  int bind(PreparedStatement statement, int nextIndex);
}

With this API, we can easily abstract a SQL query and delegate the responsibilities to lower-level artefacts. A BetweenCondition for instance. It takes care of correctly ordering the parts of a [field] BETWEEN [lower] AND [upper] condition, rendering syntactically correct SQL, delegating parts of the tasks to its child-QueryParts:


class BetweenCondition implements QueryPart {
  Field field;
  Field lower;
  Field upper;

  public String getSQL() {
    return field.getSQL() + " between " +
           lower.getSQL() + " and " +
           upper.getSQL();
  }

  public int bind(PreparedStatement statement, int nextIndex) {
    int result = nextIndex;

    result = field.bind(statement, result);
    result = lower.bind(statement, result);
    result = upper.bind(statement, result);

    return result;
  }
}

BindValue, on the other hand, would mainly take care of variable binding:


class BindValue implements QueryPart {
  Object value;

  public String getSQL() {
    return "?";
  }

  public int bind(PreparedStatement statement, int nextIndex) {
    statement.setObject(nextIndex, value);
    return nextIndex + 1;
  }
}
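To see the two classes cooperate, here is a self-contained sketch that can be run without a database. It makes a few assumptions that are not part of the snippets above: Field is declared as a plain sub-interface of QueryPart, the classes get constructors, bind() declares SQLException (because PreparedStatement.setObject does), and RecordingStatement is a made-up helper that uses java.lang.reflect.Proxy to stand in for a real PreparedStatement:

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

interface QueryPart {
    String getSQL();

    // Binds variables starting at nextIndex, returns the next free index
    int bind(PreparedStatement statement, int nextIndex) throws SQLException;
}

// Assumption: Field is simply a QueryPart in this sketch
interface Field extends QueryPart {
}

final class BindValue implements Field {
    private final Object value;

    BindValue(Object value) {
        this.value = value;
    }

    public String getSQL() {
        return "?";
    }

    public int bind(PreparedStatement statement, int nextIndex) throws SQLException {
        statement.setObject(nextIndex, value);
        return nextIndex + 1;
    }
}

final class BetweenCondition implements QueryPart {
    private final Field field;
    private final Field lower;
    private final Field upper;

    BetweenCondition(Field field, Field lower, Field upper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
    }

    public String getSQL() {
        return field.getSQL() + " between " + lower.getSQL() + " and " + upper.getSQL();
    }

    public int bind(PreparedStatement statement, int nextIndex) throws SQLException {
        int result = nextIndex;
        result = field.bind(statement, result);
        result = lower.bind(statement, result);
        result = upper.bind(statement, result);
        return result;
    }
}

// Hypothetical test helper: a PreparedStatement stand-in that only records
// setObject(index, value) calls as "index=value". Every other method is a
// no-op returning null, so it is only good enough for this sketch.
final class RecordingStatement {
    static PreparedStatement create(List<String> log) {
        return (PreparedStatement) Proxy.newProxyInstance(
            RecordingStatement.class.getClassLoader(),
            new Class<?>[] { PreparedStatement.class },
            (proxy, method, args) -> {
                if (method.getName().equals("setObject")) {
                    log.add(args[0] + "=" + args[1]);
                }
                return null;
            });
    }
}
```

Rendering a BetweenCondition built from three BindValue objects gives "? between ? and ?", and binding it from index 1 hands the indexes 1, 2 and 3 to the three values in order, returning 4 as the next free index.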

Combined, we can now easily create conditions of this form: ? BETWEEN ? AND ?. When more QueryParts are implemented, we could also imagine things like MY_TABLE.MY_FIELD BETWEEN ? AND (SELECT ? FROM DUAL), when appropriate Field implementations are available. That’s what makes the composite pattern so powerful, a common API and many components encapsulating behaviour, delegating parts of the behaviour to sub-components.

Step 2 takes care of API evolution

The composite pattern that we’ve seen so far is pretty intuitive, and yet very powerful. But sooner or later, we will need more parameters, as we find out that we want to pass state from parent QueryParts to their children. For instance, we want to be able to inline some bind values for some clauses. Maybe, some SQL dialects do not allow bind values in the BETWEEN clause. How to handle that with the current API? Extend it, adding a “boolean inline” parameter? No! That’s one of the reasons why the visitor pattern was invented. To keep the API of the composite structure elements simple (they only have to implement “accept”). But in this case, much better than implementing a true visitor pattern is to replace parameters by a “context”:


interface QueryPart {
  // The QueryPart now renders its SQL to the context
  void toSQL(RenderContext context);

  // The QueryPart now binds its variables to the context
  void bind(BindContext context);
}

The above contexts would contain properties like these (setters and render methods return the context itself, to allow for method chaining):


interface RenderContext {
  // Whether we're inlining bind variables
  boolean inline();
  RenderContext inline(boolean inline);

  // Whether fields should be rendered as a field declaration
  // (as opposed to a field reference). This is used for aliased fields
  boolean declareFields();
  RenderContext declareFields(boolean declare);

  // Whether tables should be rendered as a table declaration
  // (as opposed to a table reference). This is used for aliased tables
  boolean declareTables();
  RenderContext declareTables(boolean declare);

  // Whether we should cast bind variables
  boolean cast();

  // Render methods
  RenderContext sql(String sql);
  RenderContext sql(char sql);
  RenderContext keyword(String keyword);
  RenderContext literal(String literal);

  // The context's "visit" method
  RenderContext sql(QueryPart sql);
}

The same goes for the BindContext. As you can see, this API is quite extensible: new properties can be added, and other common means of rendering SQL can be added, too. But the BetweenCondition does not have to surrender its encapsulated knowledge about how to render its SQL, and whether bind variables are allowed or not. It’ll keep that knowledge to itself:


class BetweenCondition implements QueryPart {
  Field field;
  Field lower;
  Field upper;

  // The QueryPart now renders its SQL to the context
  public void toSQL(RenderContext context) {
    context.sql(field).keyword(" between ")
           .sql(lower).keyword(" and ")
           .sql(upper);
  }

  // The QueryPart now binds its variables to the context
  public void bind(BindContext context) {
    context.bind(field).bind(lower).bind(upper);
  }
}

BindValue, on the other hand, would mainly take care of variable binding:


class BindValue implements QueryPart {
  Object value;

  public void toSQL(RenderContext context) {
    context.sql("?");
  }

  public void bind(BindContext context) {
    context.statement().setObject(context.nextIndex(), value);
  }
}
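Here is a runnable sketch of the rendering half of this idea. It is a simplification under stated assumptions, not jOOQ’s implementation: RenderContext is a small concrete class instead of an interface, only the inline flag is modelled, binding is left out, and the forceInline constructor argument and the render() and literal() methods are made up for the example. It shows a parent passing state to its children through the context, without any change to the QueryPart API:

```java
interface QueryPart {
    void toSQL(RenderContext context);
}

// Simplified, hypothetical context: an SQL buffer plus the inline flag
final class RenderContext {
    private final StringBuilder sql = new StringBuilder();
    private boolean inline;

    boolean inline() {
        return inline;
    }

    RenderContext inline(boolean inline) {
        this.inline = inline;
        return this;
    }

    RenderContext sql(String sql) {
        this.sql.append(sql);
        return this;
    }

    RenderContext keyword(String keyword) {
        sql.append(keyword);
        return this;
    }

    // The context's "visit" method: delegate back to the QueryPart
    RenderContext sql(QueryPart part) {
        part.toSQL(this);
        return this;
    }

    String render() {
        return sql.toString();
    }
}

final class BindValue implements QueryPart {
    private final Object value;

    BindValue(Object value) {
        this.value = value;
    }

    public void toSQL(RenderContext context) {
        // The child decides how to render itself, based on context state
        context.sql(context.inline() ? literal() : "?");
    }

    // Naive literal rendering, good enough for this sketch only
    private String literal() {
        if (value instanceof String) {
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return String.valueOf(value);
    }
}

final class BetweenCondition implements QueryPart {
    private final QueryPart field;
    private final QueryPart lower;
    private final QueryPart upper;
    private final boolean forceInline; // assumption: set for dialects without bind values here

    BetweenCondition(QueryPart field, QueryPart lower, QueryPart upper, boolean forceInline) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
        this.forceInline = forceInline;
    }

    public void toSQL(RenderContext context) {
        // Change the flag for the children only, then restore it
        boolean previous = context.inline();
        context.inline(previous || forceInline)
               .sql(field).keyword(" between ")
               .sql(lower).keyword(" and ")
               .sql(upper)
               .inline(previous);
    }
}
```

With forceInline set to false, a BetweenCondition over the values 5, 1 and 10 renders as "? between ? and ?"; with forceInline set to true, the same object renders as "5 between 1 and 10". New flags can be added to the context later without touching the signature of toSQL().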

Conclusion: Name it Context-Pattern, not Visitor-Pattern

Be careful when jumping quickly to the visitor pattern. In many, many cases, you’re going to bloat your design, making it utterly unreadable and difficult to debug. Here are the rules to remember, summed up:

If you have many, many visitors and a relatively simple data structure (few types), the visitor pattern is probably OK.
If you have many, many types and a relatively small set of visitors (few behaviours), the visitor pattern is overkill; stick with the composite pattern.
To allow for simple API evolution, design your composite objects to have methods taking a single context parameter.
All of a sudden, you will find yourself with an “almost-visitor” pattern again, where context = visitor, and “visit” and “accept” = “your proprietary method names”.

The “Context Pattern” is at the same time as intuitive as the “Composite Pattern” and as powerful as the “Visitor Pattern”, combining the best of both worlds.

22 thoughts on “The Visitor Pattern Re-visited”

Frisian says:

April 16, 2012 at 08:38

Very insightful article, especially the rules of thumb for choosing one pattern over the other.
But would a BetweenCondition be allowed to change the “inline” flag on the RenderContext? If not, it would make sense to split up the RenderContext into two interfaces, RenderSettings (immutable flags and values) and RenderContext (mutable part).


1. lukaseder says:
  
  April 16, 2012 at 10:46
  
  Thanks for the feedback. Yes, the inline flag may be changed (carefully) by any QueryPart. The reason for this specific case is that bind values are always “child QueryParts” of a QueryPart that has knowledge about whether its children must be inlined or not. An example for this is the LIMIT clause, which renders LIMIT/FETCH/TOP/ROW_NUMBER() paging constructs, depending on the SQL dialect. DB2, Ingres, SQL Server, Sybase ASE, and Sybase SQL Anywhere don’t allow bind values in these clauses. This knowledge is encapsulated in the Limit QueryPart. A similar rule applies to Oracle’s PIVOT clause, which doesn’t accept bind values either.
  
  Generally, you’re right of course. The immutable flags simply don’t have a setter. For example: the underlying JDBC Connection, or the configured SQL dialect. It doesn’t make sense to switch from the Oracle dialect to SQL Server in a given QueryPart
  
  
Eric Schwarzenbach says:

April 27, 2012 at 16:39

Nice!

This reminds me of another rant about the visitor pattern http://etymon.blogspot.com/2006/04/visitor-pattern-and-trees-considered.html and its followup http://etymon.blogspot.com/2006/04/more-on-visitor-pattern.html, which you may find interesting.

The visitor pattern is often described as a solution to what some have dubbed “the extensibility problem” and others “the expression” problem. It’s a sticky problem and you can find lots of writeups out there offering solutions to it with various cutting edge functional language features. The visitor pattern is a cure I usually find worse than the problem.


1. lukaseder says:
  
  April 27, 2012 at 17:27
  
  Thanks for those links. Yes, I can see how this pattern is even worse when applied to something like an AST, which has by definition a gazillion number of node types, constantly changing as the underlying language evolves…
  
  Too bad, I’m not such a crack in compiler business to understand the full meaning of this rant ;-)
  
  
idugic says:

October 24, 2012 at 15:18

Nice article!


1. lukaseder says:
  
  October 24, 2012 at 17:07
  
  Hehe, glad you liked it!
  
  
Fred says:

July 29, 2013 at 08:26

“most overrated and yet underestimated” – that’s a self-contradiction. Pick one…


1. lukaseder says:
  
  July 29, 2013 at 08:31
  
  Feel free to read on. You might see what I’m trying to say…
  
  
Marcelo says:

March 2, 2014 at 15:09

Interesting article. One question though. Let’s suppose you have a small data structure and many visitors (for example, 6 – 8 class types, and 6 – 8 visitors), and I don’t want to mess up my hierarchy classes with these 6 – 8 operations. It seems the visitor pattern is a good fit. But what if you need to add some arguments for some visitors? It is desirable that the “accept” and “visit” interface remain untouched. How would you solve it? Great article by the way.


1. lukaseder says:
  
  March 2, 2014 at 17:30
  Thanks for the feedback. Yes, that use-case justifies the use of the visitor pattern, as you don’t want to implement all of those 8 visitors’ features in each one of the 8 hierarchy types.
  
  In jOOQ, a visitor is called a “Context”, and it contains a Map<Object, Object> for all specific arguments that are not worth adding to the API. You’ll lose on typesafety, yes. But these are very specific, remote, and hardly reusable parameters, such that any change to the type hierarchy / visitor API doesn’t seem to pull its weight.
  
  In fact, this Map can even be used for inter-hierarchy-type communication. There are currently several such maps, each with a different lifecycle:
  - Global, i.e. reusable between several traversals
  - Local, i.e. tied to a single traversal
  - Local to a subtree, i.e. tied to the current hierarchy elements and all sub-elements
  1. Marcelo says:
    
    March 2, 2014 at 18:57
    Ok, so if I understood correctly, before the visitor is ready to be used, you should instantiate it with the map containing the arguments. For example:
    
    public MyVisitor(Map args) { this.args = args; }
    
    And then every “visit” method will have access to the arguments.
    
    I can think of a scenario where a visitor could need lots of arguments to work, and this instantiation could be a little complex, but as you rightly said before, these scenarios don’t happen too often. Thanks for the reply.
    
    1. lukaseder says:
      
      March 2, 2014 at 19:14
      
      Yes, that’s one way of doing it.
      
      
Aykut Kilic says:

August 26, 2014 at 00:03

Hello, thanks for the good article.

I think one attribute of visitor pattern is extending without modifying the existing code. If you write a library and your classes will be extended with new operations then this becomes significant. In addition maybe a designer also should take object algebra pattern into account.

https://www.cs.utexas.edu/~wcook/Drafts/2012/ecoop2012.pdf


1. lukaseder says:
  
  August 26, 2014 at 07:48
  
  Yes, that is indeed one attribute of the visitor pattern, although I’ve often seen it being used when late extension was a very rare use-case. Again – as mentioned in the conclusion – the visitor pattern is OK if you have many visitors (e.g. by “late” extension) with a simple data structure.
  
  I have only skimmed through your linked paper, thanks for referencing it. Maybe, the author is correlating subtype polymorphism with generic polymorphism, occasionally? I.e. subtyping VInt implements Value rather than generifying Value<Integer>. If Java primitive types are not absolutely of the essence, then I think that subtyping would be the wrong tool there.
  
  
Gabor Hornyak says:

February 11, 2016 at 21:21

Sorry, I don’t really get it. Is it the RenderContext implementations who ‘know’ the native sql, or who? So, let’s say, there is a OracleRenderContext, DerbyRenderContext, etc. ?


1. lukaseder says:
  
  February 11, 2016 at 22:31
  
  Au contraire.
  
  While many people who implement the visitor pattern would indeed create specialised visitors like OracleRenderContext, etc. the claim here is that in many cases, the logic should be put inside of the AST element, i.e. the BetweenCondition, the BindValue, etc. So, inside of BetweenCondition, for instance, you can choose between an imperative approach (if oracle then this, else if derby then this), or you specialise the BetweenCondition in an OO way (OracleBetweenCondition, DerbyBetweenCondition).
  
  This rule is valid if you have few visitors and many many more composite elements.
  

Published by lukaseder
