Open Source Completely Underestimates Contributor License Agreements


Reddit’s /r/ProgrammerHumor has recently treated us to this politically incorrect and quite childish little Open Source rant

Obviously, like most “discussions” on reddit, and specifically those about Open Source, things quickly got very serious, with people referring to Richard Stallman and explaining how these critiques are childish, immature, what’s-wrong-with-our-industry™, etc.

Let’s not delve into useless polemics but let’s have another look at a real problem in Open Source:

Two types of Open Source

There are essentially two types of Open Source:

  • Hobbyist’s Open Source
  • Professional Open Source

Hobbyist’s Open Source

Hobbyist’s Open Source projects are geeky side-projects by engineers / hackers / script kiddies / etc. who have fun experimenting with a thing or two and who hope that “the public” will comment on, use, or profit from their work. Most often, they have no interest in money / fame / rewards. They’re just doing it for the fun of it. Most often, they also choose funny licenses, like the Beer License. There’s nothing wrong with that.

Professional Open Source

Professional Open Source may evolve from the above (as in our case), or it may be conceived as professional Open Source from the beginning (as with most Apache, Red Hat, or Oracle projects). When doing professional Open Source, choosing the right license is of the essence, as it is almost impossible to change that license later on. Why? Because all contributors effectively retain the copyright to their contributions, under the terms of the original license or, worse, under their own terms.

This is less of a problem if you’re choosing a license like the ASL 2.0, which also manages contributions (including any trademark or patent claims accompanying a contribution) in section 5:

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

However, you still may not change the license from ASL 2.0 to anything else (like the GPL, or a commercial license) without the express permission of all your contributors. At least not for the contributed code.

Managing contribution restrictions with CLAs

At Data Geekery, we want to stay in full control of both source code maintenance and copyright. We take copyright very seriously, which is why we have bought / internalised all essential contributions by our contributors through a CLA (Contributor License Agreement). This allows us to continue publishing our jOOQ code under the terms of both the ASL 2.0 and our more restrictive commercial license.

Few Open Source projects / platforms actually do this. Among the most famous ones that do are the Apache Software Foundation and the Eclipse Foundation, whose CLA process we’ll come back to below.

If you’re serious about your Open Source project, do take due diligence seriously as well. Copyright is not optional; you have to get this right, also in the best interest of your customers / users. If you do not agree on contributor license terms with your contributors (e.g. by blindly merging all sorts of pull requests into your GitHub repository), you will:

  • Restrict yourself to the current license forever
  • … this includes not being able to dual-license your software
  • … this includes not being able to commercially license your software
  • … this may include the risk of having to pay royalties later on (not with ASL 2.0)
  • … this may include the risk of your users having to pay royalties later on (not with ASL 2.0)

CLAs on GitHub

GitHub has become the number one hosting service for Open Source projects, world-wide. Yet, many projects are not getting licensing right. While GitHub offers a simple way to specify a popular license for your repository, there is currently no easy way to have your contributors sign CLAs.

I’ve recently had a short discussion on Twitter with Assaf Arkin and Stefan Tilkov:

We all agreed that this is currently not handled correctly, in most cases. I have thus sent off a feature request to GitHub:

I would like to enable a checkbox which all contributors have to check (after reading a document), to confirm that they’re complying with the contribution terms. You already have a similar feature with licenses when creating a new repository, which is great. But many people inevitably run into due diligence cases, because they just blindly merge all pull requests offered by anyone. So it would be useful to have a couple of default texts to choose from, and a possibility to create our own.

Right now, I’m sending off PDFs by E-Mail for a signature. I hadn’t thought of Google Docs yet – good idea.

One source of inspiration could be the Eclipse Foundation, which has a fully automated CLA process, integrated into BugZilla. If a user submits a patch, you can immediately see if they have already signed the Eclipse Foundation CLA.

I think that this would be a killer feature for GitHub, at least for the more professional OSS repositories.

This feature request was well received by GitHub support. If you think that this is a good idea, send them some love as well. It would really be great to finally get this right.

Conclusion

If you’re a user of Open Source software, beware of the above. Don’t just integrate any geeky script / tool that you happen to have found on the internet into your corporate enterprise software. You are putting your employer at great legal risk if you do so. Open Source is not an excuse to pretend everything is free (of charge and of obligations). It is as well-defined a business as anything else. We cannot say this enough:

Choose wisely when integrating third-party software, even if it is Open Source.

Java 8 Friday: Optional Will Remain an Option in Java


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Optional: A new Option in Java

So far, we’ve been pretty thrilled with all the additions to Java 8. All in all, this is more of a revolution than anything that came before. But there are also one or two sore spots. One of them is how Java will never really get rid of

Null: The billion dollar mistake

In a previous blog post, we explained the merits of NULL handling in the Ceylon language, which has found one of the best ways to tackle this issue – at least on the JVM, which is doomed to support the null pointer forever. In Ceylon, nullability is a flag that can be added to every type by appending a question mark to the type name. An example:

void hello() {
    String? name = process.arguments.first;
    String greeting;
    if (exists name) {
        greeting = "Hello, ``name``!";
    }
    else {
        greeting = "Hello, World!";
    }
    print(greeting);
}

That’s pretty slick. Combined with flow-sensitive typing, you will never run into the dreaded NullPointerException again:

Recently in the Operating Room. By Geek and Poke

Other languages have introduced the Option type before, most prominently Scala. Java 8 now also introduces the Optional type (as well as the OptionalInt, OptionalLong, and OptionalDouble types – more about those later on).

How does Optional work?

The main point behind Optional is to wrap an Object and to provide a convenience API for handling nullability in a fluent manner. This goes well with Java 8 lambda expressions, which allow for lazy execution of operations. An example:

Optional<String> stringOrNot = Optional.of("123");

// This String reference will never be null
String alwaysAString =
    stringOrNot.orElse("");

// This Integer reference will be wrapped again
Optional<Integer> integerOrNot = 
    stringOrNot.map(Integer::parseInt);

// This int reference will never be null
int alwaysAnInt = stringOrNot
        .map(s -> Integer.parseInt(s))
        .orElse(0);

There are certain merits to the above in fluent APIs, specifically in the new Java 8 Streams API, which makes extensive use of Optional. For example:

Arrays.asList(1, 2, 3)
      .stream()
      .findAny()
      .ifPresent(System.out::println);

The above piece of code will print any number from the Stream onto the console, but only if such a number exists.

Old API is not retrofitted

For obvious backwards-compatibility reasons, the “old API” is not retrofitted. In other words, unlike Scala, Java 8 doesn’t use Optional all over the JDK. In fact, the only place where Optional is used is in the Streams API. As you can see in the Javadoc, usage is very scarce:

http://docs.oracle.com/javase/8/docs/api/java/util/class-use/Optional.html

This makes Optional a bit difficult to use. We’ve already blogged about this topic before. Concretely, the absence of an Optional type in the API is no guarantee of non-nullability. This is particularly nasty if you convert Streams into collections and collections into streams.
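
To illustrate with a minimal, self-contained sketch (the domain is made up): an Optional shows up at the Stream boundary, but after collecting back into a List, nothing in the types tells you that elements may be null:

import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class OptionalBoundary {
    public static void main(String[] args) {
        // A list that happens to contain null - the List type says nothing about it
        List<String> names = Arrays.asList("Alice", null, "Bob");

        // findFirst() wraps its result in an Optional...
        Optional<String> first = names.stream()
                                      .filter(n -> n != null)
                                      .findFirst();

        // ... but collecting back into a List drops that information again
        List<String> collected = names.stream()
                                      .collect(Collectors.toList());

        System.out.println(first.orElse("n/a")); // Alice
        System.out.println(collected.get(1));    // null - and no Optional warned us
    }
}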

The Java 8 Optional type is treacherous

Parametric polymorphism

The worst implication of Optional on its “infected” API is parametric polymorphism, or simply: generics. When you reason about types, you will quickly understand that:

// This is a reference to a simple type:
Number s;

// This is a reference to a collection of
// the above simple type:
Collection<Number> c;

Generics are often used for what is generally accepted as composition. We have a Collection of Number. With Optional, these compositional semantics are slightly abused (both in Scala and Java) to “wrap” a potentially nullable value. We now have:

// This is a reference to a nullable simple type:
Optional<Number> s;

// This is a reference to a collection of 
// possibly nullable simple types
Collection<Optional<Number>> c;

So far so good. We can substitute types to get the following:

// This is a reference to a simple type:
T s;

// This is a reference to a collection of
// the above simple type:
Collection<T> c;

But now enter wildcards and use-site variance. We can write

// No variance can be applied to simple types:
T s;

// Variance can be applied to collections of
// simple types:
Collection<? extends T> source;
Collection<? super T> target;

What do the above types mean in the context of Optional? Intuitively, we would like this to be about things like Optional<? extends Number> or Optional<? super Number>. In the above example we can write:

// Read a T-value from the source
T s = source.iterator().next();

// ... and put it into the target
target.add(s);

But this doesn’t work any longer with Optional:

Collection<Optional<? extends T>> source;
Collection<Optional<? super T>> target;

// Read a value from the source
Optional<? extends T> s = source.iterator().next();

// ... cannot put it into the target
target.add(s); // Nope

… and there is no other way to reason about use-site variance when we have Optional and subtly more complex APIs.

If you add generic type erasure to the discussion, things get even worse. Now we don’t just erase the component type of the above Collection, we also erase the type of virtually any reference. From a runtime / reflection perspective, this is almost like using Object all over the place!
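
A minimal sketch of what erasure means here: at runtime, all Optional instances share a single raw class, so the wrapper adds no type information that reflection could recover:

import java.util.Optional;

public class ErasureDemo {
    public static void main(String[] args) {
        Optional<String>  s = Optional.of("x");
        Optional<Integer> i = Optional.of(1);

        // Both references erase to the same raw type at runtime
        System.out.println(s.getClass() == i.getClass()); // true
        System.out.println(s.getClass().getName());       // java.util.Optional
    }

    // Overloads on the wrapped type do not compile either, because both
    // methods erase to f(Optional):
    //
    // void f(Optional<String>  o) {}
    // void f(Optional<Integer> o) {}  // compilation error: same erasure
}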

Generic type systems are incredibly complex even for simple use-cases. Optional makes things only worse. It is quite hard to blend Optional with traditional collections API or other APIs. Compared to the ease of use of Ceylon’s flow-sensitive typing, or even Groovy’s elvis operator, Optional is like a sledge-hammer in your face.

Be careful when you apply it to your API!

Primitive types

One of the main reasons why Optional is still a very useful addition is that the “object stream” and the “primitive streams” share a “unified API”, thanks to the additional OptionalInt, OptionalLong, and OptionalDouble types.

In other words, if you’re operating on primitive types, you can just switch the stream construction and reuse the rest of your stream API code in almost the same way. Compare these two chains:

// Stream and Optional
Optional<Integer> anyInteger = 
Arrays.asList(1, 2, 3)
      .stream()
      .filter(i -> i % 2 == 0)
      .findAny();
anyInteger.ifPresent(System.out::println);

// IntStream and OptionalInt
OptionalInt anyInt =
Arrays.stream(new int[] {1, 2, 3})
      .filter(i -> i % 2 == 0)
      .findAny();
anyInt.ifPresent(System.out::println);

In other words: given the scarce usage of these new types in the JDK API, the dubious usefulness of such a type in general (when retrofitted into a very backwards-compatible environment), and the implications that generics erasure has on Optional, we dare say that

The only reason why this type was really added is to provide a more unified Streams API for both reference and primitive types

That’s tough. And it makes us wonder whether we should finally get rid of primitive types altogether.

Oh, and…

Optional isn’t Serializable.

Nope. Not Serializable. Unlike ArrayList, for instance. For the usual reason:

Making something in the JDK serializable makes a dramatic increase in our maintenance costs, because it means that the representation is frozen for all time. This constrains our ability to evolve implementations in the future, and the number of cases where we are unable to easily fix a bug or provide an enhancement, which would otherwise be simple, is enormous. So, while it may look like a simple matter of “implements Serializable” to you, it is more than that. The amount of effort consumed by working around an earlier choice to make something serializable is staggering.

Citing Brian Goetz, from:

http://mail.openjdk.java.net/pipermail/jdk8-dev/2013-September/003276.html
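
If you want to verify this yourself, here is a minimal sketch. The ArrayList serialises just fine, while the Optional blows up:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Optional;

public class NotSerializableDemo {
    public static void main(String[] args) throws Exception {
        ObjectOutputStream out =
            new ObjectOutputStream(new ByteArrayOutputStream());

        // Fine: ArrayList implements Serializable
        out.writeObject(new ArrayList<String>());

        // Throws java.io.NotSerializableException: java.util.Optional
        out.writeObject(Optional.of("123"));
    }
}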

Want to discuss Optional? Read these threads on reddit:

Stay tuned for more exciting Java 8 stuff published in this blog series.

More on Java 8

In the meantime, have a look at Eugen Paraschiv’s awesome Java 8 resources page.

CUME_DIST(), a Lesser-Known SQL Gem


When doing reporting or statistics with SQL, you had better know your window functions. There are many of them, yet few SQL developers know about them.

CUME_DIST() is one such function. We’ve recently re-discovered it on Stack Overflow. The following query yields two times the same result for fraction1 and fraction2:

SELECT 
  ename, 
  CUME_DIST() OVER (ORDER BY ename) fraction1,
  ROWNUM / (MAX(ROWNUM) OVER()) fraction2
FROM emp
ORDER BY ename

The above query then yields:

|  ENAME | FRACTION1 | FRACTION2 |
|--------|-----------|-----------|
|  ALLEN |      0.08 |      0.08 |
|  BLAKE |      0.17 |      0.17 |
|  CLARK |      0.25 |      0.25 |
|   FORD |      0.33 |      0.33 |
|  JAMES |      0.42 |      0.42 |
|  JONES |       0.5 |       0.5 |
|   KING |      0.58 |      0.58 |
| MARTIN |      0.67 |      0.67 |
| MILLER |      0.75 |      0.75 |
|  SMITH |      0.83 |      0.83 |
| TURNER |      0.92 |      0.92 |
|   WARD |         1 |         1 |

… as can be seen in this SQLFiddle. In plain English, the CUME_DIST() (or cumulative distribution) of a value within a group of values helps you see how far “advanced” a value is in the ordering of the whole result set – or of a partition thereof.

The second expression using ROWNUM informally explains this with an equivalent expression. The value is always strictly greater than zero and smaller or equal to 1:

0 < CUME_DIST() OVER(ORDER BY ename) <= 1
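
To pin the definition down: the CUME_DIST() of a row is the number of rows preceding or tied with it (in the given ordering), divided by the total number of rows. Here is a little self-contained Java sketch of that definition, using the ename values from the result above:

import java.util.Arrays;
import java.util.List;

public class CumeDist {

    // The SQL standard definition: rows preceding or peer with the
    // current row, divided by the total number of rows
    static double cumeDist(List<String> rows, String value) {
        long precedingOrPeer = rows.stream()
                                   .filter(s -> s.compareTo(value) <= 0)
                                   .count();
        return (double) precedingOrPeer / rows.size();
    }

    public static void main(String[] args) {
        List<String> enames = Arrays.asList(
            "ALLEN", "BLAKE", "CLARK", "FORD", "JAMES", "JONES",
            "KING", "MARTIN", "MILLER", "SMITH", "TURNER", "WARD");

        // Prints 0.08 for ALLEN up to 1.00 for WARD, matching the query above
        enames.forEach(e ->
            System.out.printf("%-6s %.2f%n", e, cumeDist(enames, e)));
    }
}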

Note that Oracle (and the SQL standard) also support CUME_DIST() as an “ordered aggregate function”, “ordered set function” or “hypothetical set function”:

SELECT 
  ename, 
  CUME_DIST(ename)
    WITHIN GROUP (ORDER BY ename) fraction
FROM emp
GROUP BY ename
ORDER BY ename

The standard specifies the above as:

<hypothetical set function> ::=
    <rank function type> <left paren>
        <hypothetical set function value expression list> 
            <right paren>
        <within group specification>

<within group specification> ::=
    WITHIN GROUP <left paren> 
        ORDER BY <sort specification list> 
            <right paren>

jOOQ also supports the cumeDist() window function, and the upcoming jOOQ 3.4 will also support the ordered aggregate function.
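
With jOOQ, the window function version might look something like this sketch (assuming a generated EMP table and static imports from org.jooq.impl.DSL):

// A jOOQ sketch of the window function query above; EMP is assumed to be
// a generated table, and create is a DSLContext
create.select(
          EMP.ENAME,
          cumeDist().over().orderBy(EMP.ENAME).as("fraction1"))
      .from(EMP)
      .orderBy(EMP.ENAME)
      .fetch();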

… and you should definitely make this nice function a part of your SQL vocabulary.

Java 8 Friday: Java 8 Will Revolutionize Database Access


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. For our Java 8 series, we’re honoured to host a very relevant guest post by Dr. Ming-Yee Iu.

Dr. Ming-Yee Iu completed a PhD on Database Queries in Java at EPFL. He has created the open source project Jinq to demonstrate some new techniques for supporting database queries in Java.

Our editorial note:

Ever since Erik Meijer introduced LINQ to the .NET ecosystem, we Java folks have been wondering whether we could have the same. We’ve blogged about this topic before, a couple of times:

While most LINQesque APIs in the Java ecosystem operate as internal domain-specific languages like jOOQ, some try to tackle the integration on a bytecode level, like JaQu.

JINQ formalises runtime bytecode transformations through what Dr. Ming-Yee Iu calls symbolic execution. We find this so interesting that we wonder whether we should start building a JINQ-to-jOOQ provider, where the expressive power of the Java 8 Streams API could be combined with our great SQL standardisation and transformation features…?

Convince yourselves:

Java 8 Goodie: Java 8 Will Revolutionize Database Access

Java 8 is finally here! After years of waiting, Java programmers will finally get support for functional programming in Java. Functional programming support helps streamline existing code while providing powerful new capabilities to the Java language. One area that will be disrupted by these new features is how programmers work with databases in Java. Functional programming support opens up exciting new possibilities for simpler yet more powerful database APIs. Java 8 will enable new ways to access databases that are competitive with those of other programming languages such as C#’s LINQ.

The Functional Way of Working With Data

Java 8 not only adds functional programming support to the Java language, it also extends the Java collection classes with new functional ways of working with data. Traditionally, working with large amounts of data in Java requires a lot of loops and iterators.

For example, suppose you have a collection of Customer objects:

Collection<Customer> customers;

If you were only interested in the customers from Belgium, you would have to iterate over all the customers and save the ones you wanted.

Collection<Customer> belgians = new ArrayList<>();
for (Customer c : customers) {
    if (c.getCountry().equals("Belgium"))
        belgians.add(c);
}

This takes five lines of code. It is also poorly abstracted. What happens if you have 10 million customers, and you want to speed up the code by filtering it in parallel using two threads? You would have to rewrite everything to use futures and a lot of hairy multi-threaded code.

With Java 8, you can write the same code in a single statement. With its support for functional programming, Java 8 lets you write a function saying which customers you are interested in (those from Belgium) and then filter collections using that function. Java 8 has a new Streams API that lets you do this (note that a terminal operation such as collect() is needed to actually run the lazy filter):

Collection<Customer> belgians = customers
    .stream()
    .filter(c -> c.getCountry().equals("Belgium"))
    .collect(Collectors.toList());

Not only is the Java 8 version of the code shorter, but the code is easier to understand as well. There is almost no boilerplate. The code calls the method filter(), so it’s clear that this code is used for filtering customers. You don’t have to spend your time trying to decipher the code in a loop to understand what it is doing with its data.

And what happens if you want to run the code in parallel? You just have to use a different type of stream.

Collection<Customer> belgians = customers
    .parallelStream()
    .filter(c -> c.getCountry().equals("Belgium"))
    .collect(Collectors.toList());

What’s even more exciting is that this functional-style of code works with databases as well!

The Functional Way of Working with Databases

Traditionally, programmers have needed to use special database query languages to access the data in databases. For example, below is some JDBC code for finding all the customers from Belgium:

PreparedStatement s = con.prepareStatement(
      "SELECT * "
    + "FROM Customer C "
    + "WHERE C.Country = ? ");
s.setString(1, "Belgium");
ResultSet rs = s.executeQuery();

Much of the code is in the form of a string, which the compiler can’t check for errors, and which can lead to security problems due to sloppy coding. There is also a lot of boilerplate code that makes writing database access code quite tedious. Tools such as jOOQ solve the problems of error-checking and security by providing a database query language that can be written using special Java libraries. Or you can use tools such as object-relational mappers to hide a lot of boring database code for common access patterns, but if you need to write non-trivial database queries, you will still need a special database query language.

With Java 8, it’s possible to write database queries using the same functional-style used when working with the Streams API. For example, Jinq is an open source project that explores how future database APIs can make use of functional programming. Here is a database query written using Jinq:

customers.where(
    c -> c.getCountry().equals("Belgium")
);

This code is almost identical to the code using the Streams API. In fact, future versions of Jinq will let you write queries directly using the Streams API. When the code is run, Jinq will automatically translate the code into a database query like the JDBC query shown before.

So without having to learn a new database query language, you can write efficient database queries. You can use the same style of code you would use for Java collections. You also don’t need a special Java compiler or virtual machine. All of this code compiles and runs using the normal Java 8 JDK. If there are errors in your code, the compiler will find them and report them to you, just like normal Java code.

Jinq supports queries that can be as complicated as SQL92. Selection, projection, joins, and subqueries are all supported. The algorithm for translating Java code into database queries is also very flexible in what code it will accept and translate. For example, Jinq has no problem translating the code below into a database query, despite its complexity.

customers
    .where( c -> c.getCountry().equals("Belgium") )
    .where( c -> {
        if (c.getSalary() < 100000)
            return c.getSalary() < c.getDebt();
        else
            return c.getSalary() < 2 * c.getDebt();
        } );

As you can see, the functional programming support in Java 8 is well-suited for writing database queries. The queries are compact, and complex queries are supported.

Inner Workings

But how does this all work? How can a normal Java compiler translate Java code into database queries? Is there something special about Java 8 that makes this possible?

The key to supporting these new functional-style database APIs is a type of bytecode analysis called symbolic execution. Although your code is compiled by a normal Java compiler and run in a normal Java virtual machine, Jinq is able to analyze your compiled Java code when it is run and construct database queries from them. Symbolic execution works best when analyzing small functions, which are common when using the Java 8 Streams API.

The easiest way to understand how this symbolic execution works is with an example. Let’s examine how the following query is converted by Jinq into the SQL query language:

customers
    .where( c -> c.getCountry().equals("Belgium") )

Initially, the customers variable is a collection that represents this database query

SELECT *
  FROM Customers C

Then, the where() method is called, and a function is passed to it. In this where() method, Jinq opens the .class file of the function and gets the compiled bytecode for the function to analyze. In this example, instead of using real bytecode, let’s just use some simple instructions to represent the bytecode of the function:

  1. d = c.getCountry()
  2. e = “Belgium”
  3. e = d.equals(e)
  4. return e

Here, we pretend that the function has been compiled by the Java compiler into four instructions. This is what Jinq sees when the where() method is called. How can Jinq make sense of this code?

Jinq analyzes the code by executing it. Jinq doesn’t run the code directly though. It runs the code ‘abstractly’. Instead of using real variables and real values, Jinq uses symbols to represent all values when executing the code. This is why the analysis is called symbolic execution.

Jinq executes each instruction and keeps track of all the side-effects or all the things that the code changes in the state of the program. Below is a diagram showing all the side-effects that Jinq finds when it executes the four lines of code using symbolic execution.

Symbolic execution example

In the diagram, you can see how after the first instruction runs, Jinq finds two side-effects: the variable d has changed and the method Customer.getCountry() has been called. With symbolic execution, the variable d is not given a real value like “USA” or “Denmark”. It is assigned the symbolic value of c.getCountry().

After all the instructions have been executed symbolically, Jinq prunes the side-effects. Since the variables d and e are local variables, any changes to them are discarded after the function exits, so those side-effects can be ignored. Jinq also knows that the methods Customer.getCountry() and String.equals() do not modify any variables or show any output, so those method calls can also be ignored. From this, Jinq can conclude that executing the function produces only one effect: it returns c.getCountry().equals("Belgium").
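
To make this a bit more concrete, here is a deliberately naive sketch of “executing with symbols”. This is a toy, not Jinq’s actual implementation: values are expression strings rather than real data, and the four pseudo-instructions above are replayed one by one:

import java.util.LinkedHashMap;
import java.util.Map;

public class SymbolicExecutionSketch {
    public static void main(String[] args) {
        // Variable environment: names map to symbolic expressions, not values
        Map<String, String> vars = new LinkedHashMap<>();

        // 1. d = c.getCountry()
        vars.put("d", "c.getCountry()");

        // 2. e = "Belgium"
        vars.put("e", "\"Belgium\"");

        // 3. e = d.equals(e)
        vars.put("e", vars.get("d") + ".equals(" + vars.get("e") + ")");

        // 4. return e
        String returned = vars.get("e");

        // d and e are locals, so their side-effects are pruned on exit;
        // only the returned expression survives:
        System.out.println(returned); // c.getCountry().equals("Belgium")
    }
}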

Once Jinq has understood what the function passed to it in the where() method does, it can then merge this knowledge with the database query underlying the customers collection to create a new database query.

Generating a database query

And that’s how Jinq generates database queries from your code. The use of symbolic execution means that this approach is quite robust to the different code patterns outputted by different Java compilers. If Jinq ever encounters code with side-effects that can’t be emulated using a database query, Jinq will leave your code untouched. Since everything is written using normal Java code, Jinq can just run that code directly instead, and your code will produce the expected results.

This simple translation example should have given you an idea of how the query translation works. You should feel confident that these algorithms can correctly generate database queries from your code.

An Exciting Future

I hope I have given you a taste of how Java 8 enables new ways of working with databases in Java. The functional programming support in Java 8 allows you to write database code in a similar way to writing code for working with Java collections. Hopefully, existing database APIs will soon be extended to support these styles of queries.

To play with a prototype for these new types of queries, you can visit http://www.jinq.org

A Better Query Language than SQL


Leland Richardson, founder of Tech.Pro, has recently published a very interesting article about BQL, his vision of a better query language (than SQL). The distinguishing feature of his new language proposal is the fact that it is really a superset of SQL itself.

SQL is a very rich and expressive language for querying relational databases. But it is awkward in many aspects, and a lot of people perceive it to be evolving only slowly – even if that is not true, considering the pace of the SQL standard. But the standard is one thing, the implementations another – especially in the enterprise. When we blog about SQL, we’re constantly surprised ourselves at how awesome the PostgreSQL dialect is. But often, PostgreSQL actually just implements the standard. So there is hope that we’re getting somewhere.

Nonetheless, in Leland’s article, there are a couple of ideas worth picking up. From our point of view, these are mainly:

Flexibility in ordering the SELECT clause and the table expression

In SQL, SELECT is always the first keyword. It must be expressed before the table expression. We’ve shown in a previous article that this is quite confusing for many SQL users. While the existing syntax should continue to exist, it would be good to be able to invert the order of the SELECT clause and the table expression.

FROM table
WHERE predicate
GROUP BY columns
SELECT columns

Remember, the table expression contains FROM, WHERE, GROUP BY clauses, as well as vendor-specific CONNECT BY clauses and others:

<query specification> ::=
  SELECT [ <set quantifier> ] 
    <select list> <table expression>

This language feature is already available in LINQ, by the way.

Implicit KEY JOINs

This feature is also available in jOOQ, using the ON KEY clause. Note that Sybase also supports ON KEY joins:

from post 
key join user
key join comment
select *

Named projections

This is one of the features we really wish the SQL language had. However, we wouldn’t count on specifying projections in a dedicated syntax. We would rather use an extension to the table expression syntax, allowing a table to produce “side-tables” as such:

from dbo.users
with projection as (
  firstName, lastName, phoneNumber, email
)
select projection.*

In the above example, projection is really nothing else than another table expression that is derived from the users table. From a SQL syntax semantics, this would be extremely powerful, because such projections would inherit all syntactic features of a regular table. We’ve blogged about this before, when we called that feature “common column expressions”.

Conclusion

Leland has lots of other ideas. He’s just at the beginning of a project that will still need a lot of refinement. The feedback he got on reddit, however, is rather good. Clearly, there is a lot of potential in “BQL”.

Let’s see where this endeavour leads. We’ll certainly be keeping an eye out for BQL’s next steps.

The Power of Spreadsheets in a Reactive, RESTful API


Being mostly a techie, I’ve recently and admittedly been deceived by my own Dilbertesque attitude when I stumbled upon this buzzword-filled TechCrunch article about Espresso Logic. Ever concerned about my social media reputation (e.g. reddit and hackernews karma), I thought it would be witty to put a link on those platforms titled:

Just found this article on TechCrunch. Reads like a markov-chain-generated series of buzzwords.

With such a catchy headline, the post quickly skyrocketed – and like many other redditors, my thoughts were with Geek and Poke:

But like a few other redditors, I couldn’t resist clicking through to the actual product that claims to implement “reactive programming” through a REST and JSON API. And I’m frankly impressed by the ideas behind this product. For once, the buzzwords are backed by software implementing them very nicely! Let’s first delve into…

Reactive Programming

Reactive programming is a term that has gained quite some traction recently around Typesafe, the company behind Akka. It has also gained additional traction since Erik Meijer (creator of LINQ) left Microsoft to fully dedicate his time to his new company, Applied Duality. With those brilliant minds sharply on the topic, we’ll certainly hear more about the Reactive Manifesto in the near future.

But in fact, every manager already knows the merits of “reactive programming”, as they’re working with the most reactive and probably the most awesome software on the planet: Microsoft Excel, a device whose mystery is only exceeded by its power. Think about how awesome Excel is. You have hundreds of rules, formulas, cell-interdependencies. And any time you change a value, the whole spreadsheet magically updates itself. That’s Reactive Programming.

The power of reactive programming lies in its expressiveness. With only very little expressive logic, you can express what otherwise needs dozens of lines of SQL, or hundreds of lines of Java.

Espresso Logic

With this in mind, I started to delve into Espresso Logic’s free trial. Note that I’m the kind of impatient person who wants quick results without reading the docs. In case you work the other way round, there are some interesting resources to get you started:

Anyway, the demo ships with a pre-installed MySQL database containing what looks like a typical E-Commerce schema, with customer, employee, lineitem, product, purchaseorder, and purchaseorder_audit tables:

The schema browsing view in Espresso Logic

So I get schema navigation information (such as parent / child relationships) and an overview of rules. These rules look like triggers calculating sums or validating things. We’ll get to these rules later on.

Live API

So far, things are as expected. The UI is maybe a bit rough around the edges, as the product has only existed since late 2013. But what struck me as quite interesting is what Espresso Logic calls the Live API. With a couple of clicks, I can assemble a REST Resource tree structure from various types of resources, such as database tables. The Espresso Designer will then almost automatically join tables to produce trees like this one:

The Resource Tree view of Espresso Logic

Notice how I can connect child entities to their parents quite easily. Now, this API is still a bit limited. For instance, I couldn’t figure out how to drag-and-drop a reporting relationship where I calculate the order amount per customer and product. However, I can switch the Resource Type from “Normal” to “SQL” to achieve just that with a plain old GROUP BY and aggregate function.

I started to grasp that I’m actually managing and developing a RESTful API based on the available database resources! A little further down the menu, I then found the “Quick Ref” item, which helped me understand how to call this API:

A quick API reference

So, each of the previously defined resources is exposed through a URL, as I’d expect from any RESTful API. What looks really nice is that I have built-in API versioning and an API key. Note: from an OWASP point of view, it is strongly discouraged to pass API keys around in GET requests. This is just a use-case for a quick-start demo and for the odd developer test. Do not use this in production!

Anyway, I called the URL in my browser with the API key as parameter (going against my own rules):

https://eval.espressologic.com/rest/[my-user]/demo/v1/AllCustomers?auth=[my-key]:1

And I got a JSON document like this:

[
  {
    "@metadata": {
      "href": "https://eval.espressologic.com/rest/[my-user]/demo/v1/Customers/Alpha%20and%20Sons",
      "checksum": "A:cf1f4fb79e8e7142"
    },
    "Name": "Alpha and Sons",
    "Balance": 105,
    "CreditLimit": 900,
    "links": [
    ],
    "Orders": [
      {
        "@metadata": {
          "href": "https://eval.espressologic.com/rest/[my-user]/demo/v1/Customers.Orders/6",
          "checksum": "A:0bf14e2d58cc97b5"
        },
        "OrderNumber": 6,
        "TotalAmount": 70,
        "Paid": false,
        "Notes": "Pack with care - fragile merchandise",
        "links": [
        ], ...

Notice how each resource has a link and a checksum. The checksum is needed for optimistic locking, which is built-in, should you choose to concurrently update any of the above resources. Notice also, how the nested resource Orders is referenced as Customers.Orders. I can also access it directly by calling the above URL.
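
Concretely, an optimistic-locking update would send the previously read checksum back along with the change. The following is a hypothetical sketch only, modelled on the PUT body shown further below (the [my-user] / [my-key] placeholders need replacing):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OptimisticPutSketch {
    public static void main(String[] args) throws Exception {
        // Send back the checksum we read earlier; a concurrent change to the
        // same resource would make it stale, so the update could be rejected
        String body = "{ \"@metadata\": { \"checksum\": \"A:cf1f4fb79e8e7142\" },"
                    + "  \"CreditLimit\": 1000 }";

        URL url = new URL(
            "https://eval.espressologic.com/rest/[my-user]/demo/v1"
            + "/Customers/Alpha%20and%20Sons?auth=[my-key]:1");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("PUT");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = con.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(con.getResponseCode());
    }
}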

Live Logic / Reactive Programming

So far so good. Similar things have been implemented in a variety of software. For instance, Adobe Experience Manager / Apache Sling intuitively exposes the JCR repository through REST as well. But where the idea behind Espresso Logic really started fascinating me is when I clicked on “Live Logic”, and I was exposed to a preconfigured set of rules that are applied to the data:

The rules view

I’ve quickly skimmed through the manual to see if I understood correctly. These rules actually resemble the kind of rules that I can enter in any spreadsheet software. For instance, it appears as though the customer.balance column is calculated as the sum of all purchaseorder.amount_total having a paid value of false, and so on.

So, if I continue through this rule-chain I’ll wind up with lineitem.product_price being the shared dependency of all other calculated values. When changing that value, a whole set of updates should run through my rule set to finally change the customer.balance:

changing lineitem.product_price
-> changes lineitem.amount
  -> changes purchaseorder.amount_total
    -> changes customer.balance
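
As a mental model, this is exactly the spreadsheet pattern: derived values follow their dependencies around. Here is a toy Java sketch of the rule chain above (it has nothing to do with how Espresso Logic is actually implemented; derived values are simply recomputed on read):

import java.util.ArrayList;
import java.util.List;

public class RuleChainSketch {
    static class LineItem {
        double productPrice;
        int quantity;
        LineItem(double p, int q) { productPrice = p; quantity = q; }
        double amount() { return productPrice * quantity; }        // rule 1
    }

    static class PurchaseOrder {
        boolean paid;
        List<LineItem> lineItems = new ArrayList<>();
        double amountTotal() {                                     // rule 2
            return lineItems.stream().mapToDouble(LineItem::amount).sum();
        }
    }

    static class Customer {
        List<PurchaseOrder> orders = new ArrayList<>();
        double balance() {                                         // rule 3
            return orders.stream()
                         .filter(o -> !o.paid)
                         .mapToDouble(PurchaseOrder::amountTotal)
                         .sum();
        }
    }

    public static void main(String[] args) {
        LineItem item = new LineItem(25, 2);
        PurchaseOrder order = new PurchaseOrder();
        order.lineItems.add(item);
        Customer customer = new Customer();
        customer.orders.add(order);

        System.out.println(customer.balance()); // 50.0
        item.productPrice = 30;                 // change the shared dependency
        System.out.println(customer.balance()); // 60.0 - the chain "reacted"
    }
}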

Depending on how much of a console hacker you are, you might want to write your own PUT call using curl, or you can leverage the REST Lab from the Espresso Designer, which helps you get all the parameters right. So, assuming we want to change a line item from the previous call:

{
  "@metadata": {
    "href": "https://eval.espressologic.com/rest/[my_user]/demo/v1/Customers.Orders.LineItems/11",
    "checksum": "A:2e3d8cb0bff42763"
  },
  "lineitem_id": 11,
  "ProductNumber": 2,
  "OrderNumber": 6,
  "Quantity": 2,
  "Price": 25,
  ...

Let’s just try to update that to have a price of 30:

Using the REST lab to execute PUT requests

And you can see in the response, there is a transaction summary, which shows that the Customers.Orders.TotalAmount has changed from 50 to 60, the Customers.Balance has changed from 105 to 95, and an audit record has been written. The audit record itself is also defined by a rule like any other rule. But there’s also an ordinary log file that shows what really happened when I ran this PUT request:

The log view showing all the INSERTs and UPDATEs executed

Imagine having to put all those INSERT and UPDATE statements into the correct order yourself, and having to correctly manage caching and transactions! Instead, all we have done is define some rules. For a complete overview of the available rule types, consider this page of the Live Logic manual.

Out of scope features for this post

… So far, we’ve had a look at the most obvious features of Espresso Logic. There are more, though. A couple of examples:

Server-side JavaScript

If rules cannot express it, JavaScript can. There are various points of the application where you can inject your JavaScript snippets, e.g. for validation, more complex rule expressions, request and response transformation, etc. Although we haven’t tried it, it reads like row-based triggers written in JavaScript.

Stored procedure support

The people behind Espresso Logic are “legacy-embracing” people, just like us at Data Geekery. Their target audience might already have thousands of complex stored procedures with lots of business logic in them. Those should not be rewritten in JavaScript. But just like tables, views, and REST resources, they are exposed through the REST API, taking GET parameters for IN parameters and returning JSON for OUT parameters and cursors.

From a jOOQ perspective, it’s pretty awesome to see that someone else is taking stored procedures as seriously as we do.

Row / column level security

There is a built-in user and role management module that allows you to provide centrally-managed, fine-grained access control to your data. Not many databases support row-level security like the Oracle database, for instance. So having this kind of feature in your platform really adds value to many RDBMS integrations. Some further resources on that topic:

Conclusion: Querying vs. updating vs. rule-based persistence

On our jOOQ blog and our marketing websites (e.g. hibernate-alternative.com), we always advocate two main use-cases when operating on databases:

  • Querying: You have very complex queries to calculate things like reports. For this, SQL (e.g. through jOOQ) is perfect
  • Updating: You have a very complex domain model with lots of items and deltas that you want to persist in one go. For this, Hibernate / ORMs are perfect

But today, Espresso Logic has shown us that there is yet another use-case. One that is covered by reactive programming (or “spreadsheet programming”) techniques. And that’s:

  • Rule-based persistence: You have a very complex domain model with lots of items and lots of rules which you want to validate, calculate, and keep in sync all the time. For this, both SQL and ORMs are solutions at the wrong level of abstraction.

This “new” use-case is actually quite common in a lot of enterprise applications where complex business rules are currently spelled out in millions of lines of imperative code that is very hard to decipher and even harder to validate / modify. How can you reverse-engineer your business rules from millions of lines of legacy code, written in COBOL?

At Data Geekery, we’re always looking out for brand new tech. Espresso Logic is a young startup with a new product. Yet, as originally mentioned, they’re a startup with seed funding, a very compelling and innovative idea, and a huge market of legacy COBOL applications that wants to start delving into “sexy” new technologies, such as RESTful APIs, JSON, and reactive programming. It might just work! If you haven’t seen enough, go work through this tutorial, which covers advanced examples such as a “bill of materials price rollup”, “bill of materials kit explosion”, “budget rollup”, “audit salary changes”, and more.

We’ll certainly keep an eye out for future improvements to the Espresso Logic platform!

A SQL query DSL for Scala by ScalikeJDBC


There are a tremendous number of SQL APIs natively written in Scala. Manuel Bernhardt has summarised a nice collection in a post of his. Another collection of Scala SQL APIs can be seen in this Stack Overflow question.

One API that we want to focus on in particular is ScalikeJDBC (licensed ASL 2.0), which has recently published a SQL query DSL API similar to that of jOOQ. See the full documentation here:

http://scalikejdbc.org/documentation/query-dsl.html

A couple of examples:

val orders: List[Order] = withSQL {
  select
    .from(Order as o)
    .innerJoin(Product as p).on(o.productId, p.id)
    .leftJoin(Account as a).on(o.accountId, a.id)
    .where.eq(o.productId, 123)
    .orderBy(o.id).desc
    .limit(4)
    .offset(0)
  }.map(Order(o, p, a)).list.apply()

The above example looks very similar to jOOQ code, except that the SELECT DSL seems to be a bit more rigid than jOOQ’s. For instance, it is not immediately obvious how to connect several complex predicates in that WHERE clause, or whether complex predicates are available at all.

What’s really nice, however, is their way of leveraging Scala language features to provide a very fluent way of constructing dynamic SQL, as can be seen in this example:

def findOrder(id: Long, accountRequired: Boolean) = 
withSQL {
  select
    .from[Order](Order as o)
    .innerJoin(Product as p).on(o.productId, p.id)
    .map { sql =>
      if (accountRequired) 
        sql.leftJoin(Account as a)
           .on(o.accountId, a.id)
      else 
        sql
    }.where.eq(o.id, 13)
  }.map { rs =>
    if (accountRequired) 
      Order(o, p, a)(rs) 
    else 
      Order(o, p)(rs)
  }.single.apply()

From how we understand things, the map method that is invoked in the middle of the SQL statement (between innerJoin and where) can transform the intermediate DSL state using a lambda expression that allows for appending a leftJoin if needed. Obviously, this can be done in a more procedural fashion as well, by assigning that intermediate DSL state to a local variable.

The need for SQL query DSLs

We’ve blogged about many of these similar SQL query DSLs in the past. The fact that they constantly pop up in various APIs is no coincidence. SQL is a very typesafe and composable language that is hard to use dynamically through string-based APIs such as JDBC, ODBC, etc.

Having a typesafe internal domain-specific language to model SQL in a host language like Java or Scala brings great advantages. But the disadvantages may shine through quickly when the DSL is not carefully crafted in a completely foreseeable way. Take the following ScalikeJDBC QueryDSL example, for instance:

val ids = withSQL {
  select(o.result.id).from(Order as o)
    .where(sqls.toAndConditionOpt(
      productId.map(id => sqls.eq(o.productId, id)),
      accountId.map(id => sqls.eq(o.accountId, id))
    ))
    .orderBy(o.id)
}.map(_.int(1)).list.apply()

This toAndConditionOpt method is really unexpected and doesn’t follow the principle of least astonishment.

This is why jOOQ’s API design is based on a formal BNF that closely mimics SQL itself. Read more about that here.
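
For comparison, here is a rough jOOQ sketch of the first ScalikeJDBC query above, assuming generated ORDER, PRODUCT, and ACCOUNT tables (all identifiers are hypothetical, and create is a DSLContext):

Result<Record> orders =
create.select()
      .from(ORDER)
      .join(PRODUCT).on(ORDER.PRODUCT_ID.eq(PRODUCT.ID))
      .leftOuterJoin(ACCOUNT).on(ORDER.ACCOUNT_ID.eq(ACCOUNT.ID))
      .where(ORDER.PRODUCT_ID.eq(123))
      .orderBy(ORDER.ID.desc())
      .limit(4)
      .offset(0)
      .fetch();

Arbitrarily complex predicates are then just a matter of composing Condition objects with and() / or() inside that single where() call, which is one way around the rigidity observed above.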