Should I Implement the Arcane Iterator.remove() Method? Yes, You (Probably) Should


An interesting question was asked on reddit’s /r/java recently:

Should Iterators be used to modify a custom Collection?

Paraphrasing the question: The author wondered whether a custom java.util.Iterator that is returned from a mutable Collection.iterator() method should implement the weird Iterator.remove() method.

A totally understandable question.

What does Iterator.remove() do?

Few people ever use this method at all. But it has its use: if you want to remove null values from an arbitrary Collection in a generic way, this is the most universal approach:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Collection<Integer> collection =
    Stream.of(1, 2, null, 3, 4, null, 5, 6)
          .collect(Collectors.toCollection(ArrayList::new));

System.out.println(collection);

// Remove all null values through the collection's Iterator
Iterator<Integer> it = collection.iterator();
while (it.hasNext())
    if (it.next() == null)
        it.remove();

System.out.println(collection);

The above program will print:

[1, 2, null, 3, 4, null, 5, 6]
[1, 2, 3, 4, 5, 6]

Somehow, this API usage does feel dirty. An Iterator seems to be useful to … well … iterate its backing collection. It’s really weird that it also allows for modifying it. It’s even weirder that it only offers removal. E.g. we cannot add a new element before or after the current element of iteration, or replace it (ListIterator offers add() and set(), but that’s available only for lists, not for collections in general).

Luckily, Java 8 provides us with a much better method on the Collection API directly, namely Collection.removeIf(Predicate).

The above iteration code can be re-written as such:

collection.removeIf(Objects::isNull);

OK, now should I implement remove() on my own iterators?

Yes, you should – if your custom collection is mutable. For a very simple reason. Check out the default implementation of Collection.removeIf():

default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        if (filter.test(each.next())) {
            each.remove();
            removed = true;
        }
    }
    return removed;
}

As I said. The most generic way to remove specific elements from a Collection is precisely to go by its Iterator.remove() method and that’s precisely what the JDK does. Subtypes like ArrayList may of course override this implementation because there’s a more performant alternative, but in general, if you write your own custom, modifiable collection, you should implement this method.
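To make this concrete, here’s a minimal sketch of such a custom, modifiable collection (the Bag class name is hypothetical, and it simply delegates to an ArrayList to keep the example short). Because its iterator implements remove(), the inherited removeIf() default implementation works out of the box:

import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Bag<E> extends AbstractCollection<E> {
    private final List<E> elements = new ArrayList<>();

    @Override
    public boolean add(E e) {
        return elements.add(e);
    }

    @Override
    public int size() {
        return elements.size();
    }

    @Override
    public Iterator<E> iterator() {
        Iterator<E> delegate = elements.iterator();
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }

            @Override
            public E next() {
                return delegate.next();
            }

            // Delegating remove() is all it takes to make removeIf()
            // (and every other generic removal algorithm) work
            @Override
            public void remove() {
                delegate.remove();
            }
        };
    }
}

With that in place, a Bag<Integer> supports bag.removeIf(Objects::isNull) just like any JDK collection.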

And enjoy the ride into Java’s peculiar, historic caveats for which we all love the language.

What we Need is Standardised Non-OSS Licenses


If you’ve followed the recent (fake) news, you’ve probably already heard it: Oracle is “massively ramping up audits of Java customers it claims are in breach of its licences”.

After a quick check on the source (The Register), here’s a more realistic, probably more accurate version of that headline:

Oracle is thinking about auditing 1-2 companies that massively ran the commercial Java extensions in production without paying

There, fixed. Also:

But there is a deeper problem to this discussion

Of course, all sorts of (ex) Red Hat and/or Pivotal employees quickly jumped to conclusions of the kind: Hey, this wouldn’t happen with us – the good guys – the OSS guys.

For example:

That’s not surprising, of course. What’s also not surprising is that people who are already strongly opinionated will see their opinions reinforced. Another random example:

If you want more examples, just search Twitter for the article URL. There are tons of reactions.

The latter case is not very interesting. The former, however, is. Aleksey Shipilëv obviously has a good point.

use products with unambiguous licenses, like OSS

… and, of course, he’s not right at all. 🙂 There are some very ambiguous licenses in the OSS field, including many copyleft licenses. Take, for instance, LGPL 2.1, which is a very long license and contains ridiculous things like:

If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

(emphasis mine). Ten lines of code. What’s a line? Everything between two \n characters? On Windows, does a line have to end in \r\n for this clause to apply? What if I remove formatting and have 10000-character lines? Such functions aren’t small, but they’re certainly less than 10 lines long. Right? RIGHT?

Hmm…

Not to mention that this single ambiguity (there are more) infects the entire rest of the license text, because it introduces unrestricted use into an otherwise rather restrictive license. Think that’s nuts? Go check Hibernate’s license. Most of it (and thus YOUR application, if you patched Hibernate) is affected.

Licensing = restricting

At the end of the day, pretty much every license will restrict rights in some way (except for the public domain “license”). The problem with commercial licenses, however, is that they’re very unique, whereas OSS licenses are almost always the same (mostly some [X]GPL or ASL, MIT, BSD). In other words, OSS licenses are standardised and thus pretty well understood. And thus: much less risky.

That’s not the case with commercial licenses. Take the jOOQ license for instance. As of the end of 2016, it’s 23 pages strong (including the annex containing pricing). What does the license mean to our customers? Here’s a TL;DR version (obviously, if in doubt: the actual license will apply, not this TL;DR version):

  • Developer workstations need a time-limited or perpetual license
  • All server workstations are licensed for free, perpetually
  • Object code may be distributed and sublicensed
  • Source code may be used (e.g. for maintenance), but not distributed

And, of course, there are different price plans, but those aren’t really part of the license. So, jOOQ feels like Open Source: source code is shipped, may be used for documentation purposes, may be patched and recompiled, but not distributed, i.e. it isn’t free as in freedom (of course not, that would be the end of our business).

But what does it mean that the source code may be used? The license explicitly allows “modification”, but what does that mean? Are you also allowed to document such modification, just not ship it? E.g. in a public GitHub issue? Such that other users who are affected may profit from your fix?

If in doubt, the best way forward is to ask the vendor. In our case, we’re very open minded and quick to answer – and also quick to improve the license when it is not clear.

In Oracle’s case, a bit less so. Oracle is a huge company, after all. Who are you even going to ask? Who will take the time to answer an individual question? It’s simply not possible.

The solution: Standardised commercial licenses

There aren’t too many business models with software. First off, there are a few different categories of software, e.g.:

  • SaaS: This is still the wild west. But essentially, you don’t license the software, you rent an access point.
  • Servers: Databases, programming environments, operating systems, they all fall into this category. These are systems that run your software (and/or data).
  • Libraries: Things like jOOQ, Hibernate. These are programs that are embedded in other programs (e.g. SaaS or Servers)
  • Tools: Things like IntelliJ, JRebel. These are programs to create and manipulate data, but they aren’t needed to run it. They can be easily removed.

Each category works entirely differently. For instance, copyleft doesn’t really affect SaaS and tools categories (unless you want to protect your trade secrets, of course), whereas it’s a killer for libraries.

SaaS, libraries and tools are usually licensed per seat, whereas servers are usually licensed per core – i.e. whatever scales better for both the vendor and the customer.

This is an extremely simplified overview of commercial licensing, but imagine: what if all vendors in each of the above categories could just pick a couple of yes/no answers to a standardised set of questions (e.g. what may be distributed? what may be modified? what may be run?), using only well-understood, standard wording for these concepts? Everything would be much clearer.

Back to the original Oracle auditing story

In the linked article, Oracle allegedly starts auditing Java users. The OracleJDK obviously isn’t “free” (as in freedom), but it is partially “free” (as in beer), because there are a variety of use-cases where you don’t pay. However, there are some features that are “commercial” (i.e. non-free-as-in-beer), such as JMC and the Flight Recorder.

The interesting thing is that both of these features (and some others) ship with the “free” (as in beer) OracleJDK, but they’re part of the “COMMERCIAL FEATURES” (legal yelling) and those features must even be documented in YOUR LICENSE using this notice, such that YOUR end users may also not use them for free:

Use of the Commercial Features for any commercial or production purpose requires a separate license from Oracle. “Commercial Features” means those features identified in Table 1-1 (Commercial Features In Java SE Product Editions) of the Java SE documentation accessible at http://www.oracle.com/technetwork/java/javase/documentation/index.html

(Did you know that? If you’re using OracleJDK in your application, you have to embed the above in your own EULA).

But do note, outside of these cryptic licenses, I’ve found several references to

Java Mission Control is available free of charge for development

E.g. here: http://download.oracle.com/technology/products/missioncontrol/updatesites/base/5.2.0/eclipse

Of course, this has absolutely no legal value; it might have been true at some point but is now outdated. But that’s how I remember it: I can use Java Mission Control for free for development (not for productive use). Now, we’re back to this discussion: What’s productive use?

  • Can I profile a simple test program for free? Probably yes.
  • Can I profile my entire program (e.g. jOOQ) for free? Probably yes.
  • Can I run the profiler in a CI environment to detect regressions, for free? Hmmm.

And how is that understanding of “free” encoded in the actual license?

Standardised wording

Oracle has a long tradition of giving away software free-as-in-beer to developers. Back in the day (before OSS, when there was only Oracle and IBM), that was a cunning move, because the money is not in development. It’s in operations. So, if developers get top-notch software for free, they become evangelists. They’ll love the products, and convince the end users.

But again. Who are developers? When do they stop developing and start operating? When they test? When they ship?

We’ll never know for sure – as every vendor writes their own, unique license.

What we need is a standardised set of well-understood commercial licenses, just like the OSS folks have their standardised set of well-understood OSS licenses. For our industry as a whole, this would be of immense value, because the little fish (like ourselves) could compete much better with the big ones without having to give away all of our IP for free under the terms of an OSS license. Our customers would no longer run into legal issues. All risks from weird license texts would be removed.

And hopefully, this would put pressure on the big ones, and prevent articles like the one from The Register.

Do You Really Have to Name Everything in Software?


This is one of software engineering’s oldest battles. No, I’m not talking about where to put curly braces, or whether to use tabs or spaces. I mean the eternal battle between nominal typing and structural typing.

This article is inspired by a very vocal blogger who eloquently reminds us to …

[…] Please Avoid Functional Vomit

Read the full article here:
https://dzone.com/articles/using-java-8-please-avoid-functional-vomit

What’s the post really about?

It is about naming things. As we all know:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Now, for some reason, there is a group of people who wants constant pain and suffering by explicitly naming everything, including rather abstract concepts and algorithmic components, such as compound predicates. Those people like nominal typing and all the features that are derived from it. What is nominal typing (as opposed to structural typing)?

Structural typing

SQL is a good example to study the two worlds. When you write SQL statements, you’re creating structural row types all the time. For instance, when you write:

SELECT first_name, last_name
FROM customer

… what you’re really doing is you’re creating a new rowtype of the structure (in pseudo-SQL):

TYPE (
  first_name VARCHAR,
  last_name VARCHAR
)

The type has the following properties:

  • It is a tuple or record (as always in SQL)
  • It contains two attributes or columns
  • Those two attributes / columns are called first_name and last_name
  • Their type is VARCHAR

This is a structural type, because the SQL statement that produces the type only declares the type’s structure implicitly, by producing a set of column expressions.

In Java, we know lambda expressions, which are (incomplete) structural types, such as:

// A type that can check for i to be even
i -> i % 2 == 0

Nominal typing

Nominal typing takes things one step further. In SQL, nominal typing is perfectly possible as well. For instance, in the above statement, we selected from a well-known table by the name of customer. Nominal typing assigns a name to a structural type (and possibly stores the type somewhere, for reuse).

If we want to name our (first_name, last_name) type, we could do things like:

-- By using a derived table:
SELECT *
FROM (
  SELECT first_name, last_name
  FROM customer
) AS people

-- By using a common table expression:
WITH people AS (
  SELECT first_name, last_name
  FROM customer
)
SELECT *
FROM people

-- By using a view
CREATE VIEW people AS
SELECT first_name, last_name
FROM customer

In all cases, we’ve assigned the name people to the structural type (first_name, last_name). The only difference being the scope for which the name (and the corresponding content) is defined.

In Java, we can use lambda expressions only once we assign them to a typed name, either by using an assignment, or by passing the expression to a method that takes a named type argument:

// Naming the lambda expression itself
Predicate<Integer> p = i -> i % 2 == 0;

// Passing the lambda expression to a method
Stream.of(1, 2, 3)
      .filter(i -> i % 2 == 0);

Back to the article

The article claims that giving a name to things is always better. For instance, the author proposes giving a name to what we would commonly refer to as a “predicate”:

// original, less clear code
if (barrier.value() > LIMIT && barrier.value() > 0) {

// extracted out to helper function. More code, more clear
if (barrierHasPositiveLimitBreach()) {

So, the author thinks that extracting a rather trivial predicate into an external function is better because a future reader of such code will better understand what’s going on. At least in the article’s opinion. Let’s refute this claim for the sake of the argument:

  • The proposed name is verbose and requires quite some thinking.
  • What does breach mean?
  • Is breach the same as >= or the same as >?
  • Is LIMIT a constant? From where?
  • Where is barrier? Who owns it?
  • What does the verb “has” mean, here? Does it depend on something outside of barrier? E.g. some shared state?
  • What happens if there’s a negative limit?

By naming the predicate (remember, naming things is hard), the OP has added several layers of cognitive complexity for the reader, while quite possibly introducing subtle bugs, because probably both LIMIT and barrier should be function arguments, rather than global (im)mutable state that the function assumes to be there.

The name introduced several concepts (“to have a breach”, “positive limit”, “breach”) that are not well defined and need some deciphering. How do we decipher it? Probably by looking inside the function and reading the actual code. So what do we gain? Better reuse, perhaps? But is this really reusable?
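If the predicate really were worth naming, a more defensible version would at least make its inputs explicit rather than reading shared state. A minimal sketch (the method name is just as hypothetical as the original’s):

// The predicate as a pure function of its inputs, rather than a helper
// that reads LIMIT and barrier from the surrounding scope
static boolean exceedsPositiveLimit(double barrier, double limit) {
    return barrier > limit && barrier > 0;
}

At least this version can be tested and reused independently of any shared state, though all the naming questions above still apply.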

Finally, there is a (very slight) risk of introducing a performance penalty by the additional indirection. If we translate this to SQL, we could have written a stored function and then queried:

SELECT *
FROM orders -- Just an assumption here
WHERE barrier_has_positive_limit_breach(orders.barrier)

If this was some really complicated business logic depending on a huge number of things, perhaps extracting the function might’ve been worthwhile. But in this particular case, is it really better than:

SELECT *
FROM orders
WHERE barrier > :limit AND barrier > 0

or even

SELECT *
FROM orders
WHERE barrier > GREATEST(:limit, 0)

Conclusion

There are some people in our industry who constantly want to see the world in black and white. As soon as they’ve had one small success story (e.g. reusing a very common predicate 4-5 times by extracting it into a function), they conclude with a general rule that this approach is always superior.

They struggle with the notion of “it depends”. Nominal typing and structural typing are both very interesting concepts. Structural typing is extremely powerful, whereas nominal typing helps us humans keep track of complexity. In SQL, we’ve always liked to structure our huge SQL statements, e.g. in nameable views. Likewise, Java programmers structure their code in nameable classes and methods.

But it should be immediately clear to anyone reading the linked article that the author seems to like hyperboles and probably wasn’t really serious, given the silly example he came up with. The message he’s conveying is wrong, because it claims that naming things is always better. It’s not true.

Be pragmatic. Name things where it really helps. Don’t name things where it doesn’t. Or as Leon Bambrick amended Phil Karlton’s quote:

There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors

Here’s my advice to you, dear nominal-typing-loving blogger. There are only two ways of typing: nominal typing and structural typing. And it-depends typing.

SQL, Streams, For Comprehension… It’s All the Same


Recently, at Devoxx, I saw a beautiful slide in a talk by Kevlin Henney.

In his talk, he was displaying a variety of approaches to solve the FizzBuzz “problem”, including a couple of very elegant solutions in completely declarative approaches and languages.

In this particular slide, Kevlin used a notation that is derived from maths. The set builder notation. Here’s an example from Wikipedia:

{ n ∈ ℤ | ∃ k ∈ ℤ : n = 2k }

The example reads: for all n in ℤ (the set of all integers), take those for which there exists (∃) another integer k, for which the following equation is satisfied: n = 2k.

Or in plain English: all even integers (because for every even integer, there exists another integer that is half of it).

Beautiful, eh? In imperative programming, we’d probably do something like this instead:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i++)
    even.add(i * 2);

Or this:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i = i + 2)
    even.add(i);

But there are several problems with the imperative approach:

  • We have to realistically start somewhere
  • We have to realistically end somewhere
  • We have to store all values in an intermediate collection

Sure, those aren’t severe limitations in everyday use-cases, because we’re probably solving a real-world problem where we don’t actually need an infinite number of even integers, and storing them in an intermediate collection won’t consume all of our memory. But still, the declarative, mathematical approach is much leaner: we can answer the questions about where to start and where to end later, and we never need to materialise any intermediate collection before we make those final decisions.

For instance, we can declare X to be that set, and then declare Y to be a set that is derived from X, and finally materialise Z, which is a very tiny set derived from Y. For this, we may have never needed to materialise all the (even) integers.
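Java 8 Streams can model this laziness to some extent. Here’s a minimal sketch (the concrete sets X, Y and Z are just illustrative choices, and unlike the mathematical set, the stream omits negative integers):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazySets {
    public static void main(String[] args) {
        // X: all non-negative even integers, as a lazy, infinite stream
        Stream<Integer> x = Stream.iterate(0, i -> i + 2);

        // Y: derived from X; still lazy, nothing materialised yet
        Stream<Integer> y = x.map(i -> i * i);

        // Z: only now do we decide where to stop, and materialise
        List<Integer> z = y.limit(5).collect(Collectors.toList());

        System.out.println(z); // [0, 4, 16, 36, 64]
    }
}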

How this compares to SQL

Kevlin made a cunning comparison. Of course, all functional programming aficionados will immediately recognise that languages like Scala have something called a “for comprehension”, which models precisely the mathematical set-builder notation.

Java 8 now has the Streams API, which allows us, to some extent, to model something similar (although not as powerful). But Kevlin didn’t use those “modern” languages. He used SQL as a comparison. That “arcane” declarative programming language that has been around forever, and that we love so much. Yes, here’s how we can declare all the even numbers in SQL:

SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

If optimisers were perfect, this semi-self-join between the two references of the integers “table” could be optimised perfectly. In most databases, we’d probably manually transform the above notation to this equivalent one:

SELECT n
FROM integers
WHERE MOD(n, 2) = 0

Yes, indeed. The set-builder notation and the SQL language are very similar beasts. The former prefers using mathematical symbols for brevity and conciseness, the latter prefers using English words to connect the different operators, but it’s the same thing. And if you squint hard enough, you’ll see that Java 8 Streams, for instance, are also pretty much the same thing:

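As a rough sketch of that similarity (the infinite source is truncated with limit() to stay finite), the Stream pipeline mirrors the SQL clauses almost one to one:

import java.util.stream.IntStream;

public class EvenIntegers {
    public static void main(String[] args) {
        IntStream.iterate(0, n -> n + 1)  // FROM integers (non-negative only)
                 .filter(n -> n % 2 == 0) // WHERE MOD(n, 2) = 0
                 .limit(10)               // stop somewhere, to stay finite
                 .forEach(System.out::println);
    }
}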

I’ve blogged about this recently where all the Java 8 Streams operations are compared to their SQL clause counterparts:
https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-equivalents-in-java-8-streams

How is this better?

It’s simple. Both the set-builder notation and the SQL language (and, in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed into other, more complex expressions, without necessarily being executed.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse it to list all positive even integers? Or all positive even integers less than 100? Etc.

In the imperative approach, we have to refactor constantly, to avoid the overhead of

  • Producing too many integers
  • Storing too many integers (or storing them at all)

In truly declarative languages like SQL, we’re just describing “even integers” with an expression, possibly assigning the expression a name:

CREATE VIEW even_integers AS
SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

So, when we actually use and materialise the even integers, e.g. positive integers less than 100, the optimiser can optimise away the double access to the integer table and produce only the exact number of values that we’re requesting (without materialising them in intermediate collections):

SELECT n
FROM even_integers
WHERE n BETWEEN 0 AND 100

Conclusion

Thinking in terms of sets, in terms of declaring sets, has always been our dream as software engineers. The approach is extremely compelling and elegant. We can delegate a lot of boring algorithmic work to the implementation engine of the declarative programming language. In the case of SQL, it would be the SQL database optimiser, which figures out a great deal of optimisations that we might not have thought of.

The above example is trivial. We can perfectly live in a world where we manually iterate over a local integer variable that goes from 0 to 100:

for (int i = 0; i <= 100; i++)
  doSomething(i);

But stuff gets hairy quite quickly. Compare the two versions of the same algorithm in Mario Fusco’s famous tweet.

This also applies to SQL, and what’s even better in SQL than with Streams: The SQL statement is a declarative expression tree, not a formally ordered set of stream pipeline operations. The optimiser can freely reorder / transform the expression tree into something that it thinks is more optimal. This isn’t just a promise. This works in modern SQL databases every day, for very complex queries, which you can write in a matter of seconds, rather than hours.

Stay tuned for a short series of blog posts on the jOOQ blog illustrating what modern cost-based optimisation can do for you, when you’re using the SQL language.

The Java Ecosystem’s Obsession with NonNull Annotations


I’m not well known for my love of annotations. While I do recognise that they can serve a very limited purpose in some areas (e.g. hinting stuff to the compiler or extending the language where we don’t want new keywords), I certainly don’t think they were ever meant to be used for API design.

“Unfortunately” (but this is a matter of taste), Java 8 introduced type annotations. This is an entirely new extension to the annotation type system, which allows you to do things like:

@Positive int positive = 1;

Thus far, I’ve seen such type restriction features only in languages like Ada or PL/SQL, in a much more rigid form, but other languages may have similar features.

The nice thing about Java 8’s implementation is the fact that the meaning of the type of the above local variable (@Positive int) is unknown to the Java compiler (and to the runtime), until you write and activate a specific compiler plugin to enforce your custom meaning. The easiest way to do that is by using the Checker Framework (and yes, we’re guilty at jOOQ. We have our own checker for SQL dialect validation). You can implement any semantics, for instance:

// This compiles because @Positive int is a subtype of int
int number = positive;

// Doesn't compile, because number might be negative
@Positive int positive2 = number;

// Doesn't compile, because -1 is negative
@Positive int positive3 = -1;

As you can see, using type annotations is a very strategic decision. Either you want to create hundreds of types in this parallel universe, or, in my opinion, you had better leave this set of features alone, because probably: YAGNI

Unfortunately, and to the disappointment of Mike Ernst, the author of the Checker Framework (whom I’ve talked to about this some years ago), most people abuse this new JSR-308 feature for boring and simple null checking. For instance, just recently, there was a feature request on the popular Javaslang library to add support for such annotations that help users and IDEs guarantee that Javaslang API methods return non-null results.

Please no. Don’t use this atomic bomb for boring null checks

Let me make this very clear:

Type annotations are the wrong tool to enforce nullability

– Lukas Eder, timeless

You may quote me on that. The only exception to the above is if you strategically embrace JSR-308 type annotations in every possible way and start adding annotations for all sorts of type restrictions, including the @Positive example that I’ve given. Then yes, adding nullability annotations won’t hurt you much anymore, as your types will take 50 lines of code to declare and reference anyway. But frankly, this is an extremely niche approach to type systems that only few general-purpose programs, let alone publicly available APIs, can profit from. If in doubt, don’t use type annotations.

One important reason why a library like Javaslang shouldn’t add such annotations is that you can be very sure you will hardly ever encounter null in it, because references in Javaslang are mostly one of three things (see the sketch after this list):

  • A collection, which is never null, only empty
  • An Option which replaces null (it is in fact a collection of cardinality 0..1)
  • A non-null reference (because in the presence of Option, all references can be expected to be non-null)
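Here’s a minimal sketch of that design principle, using the JDK’s Optional in place of Javaslang’s Option (the repository and Customer types are hypothetical):

import java.util.List;
import java.util.Optional;

class Customer {}

// An API designed this way never needs @NonNull annotations:
interface CustomerRepository {
    List<Customer> findAll();             // never null, at most empty
    Optional<Customer> findById(long id); // absence modelled without null
}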

Of course, these rules aren’t valid for every API. There are some low quality APIs out there that return “unexpected” null values, or leak “internal” null values (and historically, some of the JDK APIs, unfortunately, are part of these “low quality APIs”). But Javaslang is not one of them, and the APIs you are designing also shouldn’t be one of them.

So, let go of your null fear. null is not a problem in well-designed software. You can spare yourself the work of adding a @NonNull annotation on 99% of all of your types just to shut up your IDE, in case you turned on those warnings. Focus on writing high-quality software rather than bikeshedding null.

Because: YAGNI.

And, if you haven’t had enough bikeshedding already, consider watching this entertaining talk by Stuart Marks.

Applying Queueing Theory to Dynamic Connection Pool Sizing with FlexyPool


I’m very happy to have another interesting blog post by Vlad Mihalcea on the jOOQ blog, this time about his open source library FlexyPool. Read his previous jOOQ Tuesdays post on Hibernate here.

Vlad is a Hibernate developer advocate and the author of the popular book High-Performance Java Persistence, and he knows a thing or two about connection pooling.


Introduction

Back in 2014, I was working as a software architect, and our team was building a real-estate platform which was composed of multiple nodes, as depicted in the following diagram:

[figure: database integration points]

This is a classic enterprise architecture layout. The database is replicated to provide better throughput and availability in case of node failures. There are front-end nodes that deliver the website content, and there are many back-end nodes as well, like email schedulers or data import batch processors.

All these nodes require database connectivity, either to the Master node for read-write transactions, or to the Slave nodes for read-only transactions.

Because acquiring database connections is an expensive process, each system node uses its own connection pool. By reusing physical database connections, the connection acquisition is very fast, therefore reducing the overall transaction response time.

Not only can a connection pool reduce transaction response time, it can also level out traffic spikes. Without a connection pool, during a traffic spike, the front-end nodes might acquire all database connections, leaving the back-end processors with no database connectivity.

Because the connection pool has a maximum number of database connections, connection requests queue up whenever a traffic spike happens. Therefore, during a traffic spike, the transaction response time will increase due to the queuing mechanism, but this is way better than taking down the whole system.
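As a toy sketch of this queueing behaviour (all names hypothetical; a semaphore stands in for real connection management):

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class ToyPool {
    // At most 10 "connections"; excess callers queue on the semaphore
    private final Semaphore permits = new Semaphore(10);

    void withConnection(Runnable work) throws InterruptedException {
        // During a spike, callers wait here instead of overwhelming
        // the database; they fail only after a timeout
        if (!permits.tryAcquire(50, TimeUnit.MILLISECONDS))
            throw new IllegalStateException("Connection acquisition timeout");
        try {
            work.run(); // use the "connection"
        } finally {
            permits.release();
        }
    }
}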

For these two reasons, the connection pool is a very good choice in many enterprise systems.

Based on the underlying hardware resources, a relational database can only offer a limited number of connections. For this reason, we must be very careful when choosing the pool size for each particular system node.

Connection pool sizing

I was the lucky person who got the task of figuring out how many connections we should allocate for each system node in our real-estate platform. Since I graduated in Electronics and Telecommunications, I remembered that we had learned about a similar problem when provisioning telecommunications networks. Agner Krarup Erlang invented queueing theory to solve this problem, and I was curious whether we could also find the right pool size by applying Erlang’s queueing models.

I was not the only one trying to apply queueing theory principles to software systems. Percona has a very interesting study: Forecasting MySQL Scalability with the Universal Scalability Law. However, it is hard to determine the actual service time in a system that is affected by a myriad of variables.

In the end, I realized that the best way to tackle this problem is constant measuring and adjusting. For this reason, I needed a tool to capture database connection metrics, as well as a way to adjust a given connection pool while the enterprise system is running.

And, that’s how FlexyPool was born.

Basically, FlexyPool is a DataSource Proxy that stands in front of the actual JDBC DataSource or other proxies (e.g. statement logging).

[figure: DataSource proxy architecture]

FlexyPool supports a great variety of stand-alone connection pools, and it collects the following metrics:

  • concurrent connections histogram
  • concurrent connection requests histogram
  • data source connection acquiring time histogram
  • connection lease time histogram
  • maximum pool size histogram
  • total connection acquiring time histogram
  • overflow pool size histogram
  • retry attempts histogram

For instance, the concurrent connection count metric gives you an insight into how many connections are required by a certain application under a given traffic load:

[figure: concurrent connection count histogram]

The connection acquisition metric tells you how much time it takes to obtain a database connection from the pool:

[figure: connection acquisition time histogram]

The connection lease time allows you to spot long-running transactions, which are undesirable in high-performance OLTP applications:

[figure: connection lease time histogram]

For the stand-alone connection pools, FlexyPool can increment the pool size beyond the maximum capacity, as it offers an overflow buffer. The benefit of this overflow buffer is that it allows you to increase the pool size only when the incoming traffic causes a certain connection acquisition timeout.
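For illustration, here’s roughly what wiring FlexyPool in front of a HikariCP pool looks like, going by the FlexyPool documentation (class names quoted from memory and possibly version-dependent; treat this as a sketch rather than canonical API):

import com.vladmihalcea.flexypool.FlexyPoolDataSource;
import com.vladmihalcea.flexypool.adaptor.HikariCPPoolAdapter;
import com.vladmihalcea.flexypool.config.Configuration;
import com.vladmihalcea.flexypool.strategy.IncrementPoolOnTimeoutConnectionAcquiringStrategy;
import com.vladmihalcea.flexypool.strategy.RetryConnectionAcquiringStrategy;
import com.zaxxer.hikari.HikariDataSource;

HikariDataSource dataSource = new HikariDataSource();

Configuration<HikariDataSource> configuration =
    new Configuration.Builder<>("unique-pool-name", dataSource, HikariCPPoolAdapter.FACTORY)
        .build();

FlexyPoolDataSource<HikariDataSource> flexyPoolDataSource =
    new FlexyPoolDataSource<>(configuration,
        // grow the pool by up to 5 extra connections on acquisition timeout
        new IncrementPoolOnTimeoutConnectionAcquiringStrategy.Factory<>(5),
        // retry a failed connection acquisition up to 2 times
        new RetryConnectionAcquiringStrategy.Factory<>(2));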

Although FlexyPool can also monitor Java EE connection pools, it cannot increase the pool size in Java EE environments since the DataSource is an application server managed resource.

Conclusion

As enterprise systems evolve, so do the underlying data access patterns. For this reason, the underlying database connection usage is an important thing to monitor on a regular basis. FlexyPool builds on top of Dropwizard Metrics (formerly Coda Hale Metrics), so you can easily integrate it with well-known application performance monitoring tools, such as Graphite or Grafana.

FlexyPool is open source, licensed under the Apache License 2.0. You can find the project repository on GitHub, and all the released dependencies are available on Maven Central, so it’s very easy to integrate into your own project.

FlexyPool is powering many enterprise systems, like Etuovi, Mitch&Mates, and ScentBird. If you decide to use it in your current enterprise system, and you are willing to provide a testimonial, you can win a free copy of my High-Performance Java Persistence book.

jOOQ Tuesdays: Daniel Dietrich Explains the Benefits of Object-Functional Programming


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Daniel Dietrich, whose popular library JΛVΛSLΛNG is picking up a lot of momentum among functional programming aficionados working with Java.

Daniel, you created JΛVΛSLΛNG – Object-Functional Programming in Java, a library that is becoming more and more popular among functional programmers. Why is Javaslang so popular?

Thank you Lukas for giving me the opportunity to share my thoughts.

I think that many users were disappointed about Java 8 in the whole, especially those who are already familiar with more advanced languages. The Java language architects did an awesome job. Java 8 brought groundbreaking new features like Lambdas, the new Stream API and CompletableFuture. But the new abstractions were only poorly integrated into the language from an API perspective.

There is already an increasing number of write-ups about the disadvantages of Java 8, starting with the drawbacks of the Optional type. We read that we have to take care when using parallel Streams. These are self-made problems that keep us busy, stealing our expensive time. Javaslang provides us with alternatives.

There is no reason to reinvent the wheel. My vision is to bring as much as possible of the Scala goodness to Java. In fact, Scala emerged from Java in the form of the Pizza language. Back in 2001 it had features like generics, function pointers (aka lambdas), case classes (aka value types) and pattern matching. In 2004 Java got generics, in 2014 came lambdas, and hopefully Java 10 will include value types. Scala left Java far behind. It used the last 15 years to evolve.

Object-functional programming is nothing new. It is the best of both worlds, object-oriented programming and functional programming. Scala is one of the better choices to do it on the JVM. Java’s Lambdas are an enabling feature. They allowed us to create a Javaslang API that is similar to Scala.

Java developers who get their hands on Javaslang often react in a way that I call the nice-effect: “Wow that’s nice, it feels like Scala”.

You have published a guest post on the jOOQ blog about Javaslang more than one year ago. Since then, Javaslang has moved forward quite a bit and you’ve recently published the roadmap for version 3.0. What have you done since then and where are you going?

Yes, that is true, it has changed a lot since then. We released Javaslang 1.2.2 two weeks before the first jOOQ guest post went online. Besides enriched functions, that release offered popular Scala features like Option for null-safety, Try for performing computations headache-free in the presence of exceptions, and a fluent pattern matching DSL. Also notably, we shipped two common persistent collections: an eagerly evaluated linked List and its lazy form, also called Stream.
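For readers unfamiliar with those features, here’s a hedged sketch of what they look like (package and method names as of Javaslang 2.x; check the current docs before relying on them):

import javaslang.collection.List;
import javaslang.control.Option;
import javaslang.control.Try;

public class JavaslangSketch {
    public static void main(String[] args) {
        // Option: null-safety without explicit null checks
        Option<String> user = Option.of(System.getenv("USER"));
        System.out.println(user.getOrElse("anonymous"));

        // Try: computations in the presence of exceptions, headache-free
        Try<Integer> parsed = Try.of(() -> Integer.parseInt("42"));
        System.out.println(parsed.getOrElse(0)); // 42

        // A persistent, eagerly evaluated linked List
        List<Integer> doubled = List.of(1, 2, 3).map(i -> i * 2);
        System.out.println(doubled); // List(2, 4, 6)
    }
}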

Roughly one year later, we released Javaslang 2.0.0. We hardened the existing features and, most notably, included Future and Promise for concurrent programming as well as a full-fledged, Scala-like persistent collection library. Besides that, we replaced the pattern matching DSL with a more powerful pattern matching API that allows us to recursively match arbitrary object trees.

I spent a significant amount of time and energy abstracting on the type level over the mentioned features, as far as this is possible in Java. For Java developers it is not important to call things monads, sum-types or products. For example, we do not need to know group theory in order to calculate 1 + 1. My duty as a library developer is to make it as simple as possible for users of Javaslang to reach their goals. The need to learn new APIs and DSLs should be reduced to a minimum. This is the main reason for aligning Javaslang to Scala.

Our efforts for the next release concentrate on adding more syntactic sugar and missing persistent collections beyond those of Scala. It will be sufficient to add one import to reach 90% of Javaslang’s API. There will be new persistent collections: BitSet, several MultiMaps, and a PriorityQueue. We are improving the performance of our collections, most notably our persistent Vector. It will be faster than Java’s Stream for some operations and have a smaller memory footprint than Java’s ArrayList for primitive elements.

Beyond library features, we pay special attention to three things: backward compatibility, controlled growth and integration aspects. The web is important. Our Jackson module ensures that all Javaslang types can be sent over the wire as serialized JSON. The next release will include a GWT module; first tests already run Javaslang in the browser. However, the Javaslang core will stay thin. It will not depend on any libraries other than the JDK.

Towards the next major release 3.0.0 I’m starting to adjust the roadmap I sketched in a previous blog post. I’ve learned that it is most important to our users that they can rely on backward compatibility. Major releases should not appear often, following the 2.x line is a better strategy. We will start to deprecate a few APIs that will be removed in a future major release. Also I keep an eye on some interesting developments that will influence the next major release. For example a new major Scala release is in the works and there are new interesting Java features that will appear in Java 10.

Looking at the current issues I don’t have to be an oracle to foresee that the next minor release 2.1.0 will take some more time. I understand that users want to start using the new Javaslang features but we need the time and the flexibility to get things right. Therefore we target a first beta release of 2.1.0 in Q4 2016.

In the meantime, there is a variety of functional(-ish) libraries for Java 8, like our own jOOλ, StreamEx, Cyclops, or the much older FunctionalJλvλ. How do all these libraries compare and how is yours different?

This question goes a little bit in the philosophical direction, maybe it is also political. These are my subjective thoughts, please treat them as such.

Humans have the ability to abstract over things. They express themselves in various ways, e.g. with painting and music. These areas split into different fields. For example in literature things are expressed in manifold ways like rhythmic prose and poetry. Furthermore different styles can be applied within these fields, like the iambic trimeter in poetry. The styles across different areas are often embossed by outer circumstances, bound to time, like an epoch.

In the area of mathematics there are also several fields, like algebra and mathematical analysis. Both have a notion of functions. Which field should I take when I want to express myself in a functional style?

Personally, I’m not able to afford the time to write non-trivial applications in each of the mentioned libraries. But I took a look at the source code and followed discussions. I see that nearly all libraries are embossed by the outer circumstance that lambdas finally made it to all curly-braces languages, especially to Java in our case. Library designers are keen to modernize their APIs in order to keep pace. But library designers are also interested in staying independent of 3rd party libraries for reasons like stability and progression.

The field of jOOQ is SQL in Java, the field of Cyclops is asynchronous systems. Both libraries are similar in the way that they adapted the new Java Lambda feature. I already mentioned that the new Java features are only poorly integrated into the language. This is the reason why we see a variety of new libraries that try to close this gap.

jOOQ needs jOOλ in order to stay independent. On the technical level, StreamEx is similar to jOOλ in that both sit on top of Java’s Stream. They augment it with additional functionality that can be accessed using a fluent API. The biggest difference between them is that StreamEx supports parallel computations, while jOOλ concentrates on sequential computations only. Looking at the SQL-ish method names, it is clear that jOOλ is tailored to be used with jOOQ.

Cyclops claims to be the answer to the cambrian explosion of functional(-ish) libraries. It offers a facade that is backed by one of several integration modules. From the developer perspective, I view this with skepticism. The one-size-fits-all approach did not work well for me in the past because it does not cover all features of the backing libraries. An abstraction layer adds another source of errors, which is unnecessary.

Many names of Cyclops look unfamiliar to me, maybe because of the huge amount of types. Looking at the API, the library seems to be a black hole, a cambrian implosion of reactive and functional features. John McClean did a great job abstracting over all the different libraries and providing a common API but I prefer to use a library directly.

FunctionalJλvλ is different. It existed long before the other libraries and has the noble goal of purely functional programming: If it does compile, it is correct. FunctionalJλvλ was originally driven by people well known from the Scala community, more specifically from the Scalaz community. Scalaz is highly influenced by Haskell, a purely functional language.

Haskell and Scala are much more expressive than Java. Porting the algebraic abstractions from Scalaz to Java turned out to be awkward. Java’s type system isn’t powerful enough; it does not allow us to reach that goal in a practical way. The committers seem disillusioned to me. Some state that Java is not the right language for functional programming.

Javaslang is a fresh take on porting Scala functionality to Java. At its core it is not as highly influenced by Scalaz and Haskell as FunctionalJλvλ is. However, for purely functional abstractions it offers an algebra module that depends on the core. The relation algebra/core can be compared to Scalaz/Scala.

Javaslang is similar to StreamEx in the way that it is not bound to a specific domain, in contrast to jOOλ and Cyclops. It is different from StreamEx in the sense that it does not build on top of Java’s Stream. I understand Javaslang as language addition that integrates well with existing Java features.

You have never spoken at conferences; you let other people do that for you. What’s your secret? 🙂

In fact I never attended a conference at all. My secret is to delegate the real work to others.

Joking aside, I feel more comfortable spending my time on the Javaslang source code than preparing conferences and travelling. Currently I am working on Javaslang beside my job but I’m still looking for opportunities to do it full-time.

It is awesome to see other people jumping on the Javaslang train. We receive help from all over the world. Beside IntelliJ and YourKit we recently got TouK as new sponsor and produced Javaslang stickers that are handed out at conferences.

Because of the increasing popularity of Javaslang there is also an increasing amount of questions and pull requests. Beside the conception and development I concentrate on code-reviews, discussions and managing the project.

Where do you see Java’s future with projects like Valhalla?

Java stands for stability and safety. New language features are moderately added, like salt to a soup. This is what we can expect from a future Java.

In his recent mission statement Brian Goetz gives us a great overview about the goals of Project Valhalla. From the developer point of view I really love to see that the Java language architects attach great importance to improve the expressiveness of Java. Value types for example will reduce a lot of redundant code and ceremony we are currently confronted with. It is also nice to see that value types will be immutable.

Another feature I’m really looking forward to is the extension of generics. It will allow us to remove several specializations that exist only for primitive types and void. Popular functional interfaces like Predicate, Consumer, Supplier and Runnable will be equivalent to Function. In Javaslang we currently provide additional API for performing side-effects. Having extended generics that API can be reduced to the general case, like it should have been from the beginning.

There are two more features I’m really interested in: local variable type inference, that will come to Java, and reified generics, that might come. Reified generics are needed when we want to get the type of a generic parameter at runtime. We already have type inference for lambdas. Extending it to local variables will increase conciseness and readability of method and lambda bodies while preserving type-safety. I think it is a good idea that we will still have to specify the return type of methods. It is a clear documentation of the API of an application.

I’m deeply impressed how Java and the JVM evolve over time without breaking backward compatibility. It is a safe platform we can rely on. The gap between Java and other, more modern languages is getting smaller but Java is still behind. Some popular features might never come and most probably outdated API will not get a complete refresh or a replacement. This is a field where libraries such as Javaslang can help.