Lesser Known jOOλ Features: Useful Collectors

jOOλ is our second most popular library. It implements a set of useful extensions to the JDK’s Stream API, which are useful especially when streams are sequential only, which according to our assumptions is how most people use streams in Java.

Such extensions include:

// (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, ...)
Seq.of(1, 2, 3).cycle();

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// (1, 0, 2, 0, 3, 0, 4)
Seq.of(1, 2, 3, 4).intersperse(0);

// (4, 3, 2, 1)
Seq.of(1, 2, 3, 4).reverse();

… and many more.

Collectors

But that’s not the only thing jOOλ offers. It also ships with a set of useful Collectors, which can be used both with JDK streams, as well as with jOOλ’s Seq type. Most of them are available from the org.jooq.lambda.Agg type, where Agg stands for aggregations.

Just like the rest of jOOλ, these collectors are inspired by SQL, and you will find quite a few SQL aggregate functions represented in this class.

Here are some of these collectors:

Counting

While the JDK has Collectors.counting(), jOOλ also has a way to count distinct values, just like SQL:

// A simple wrapper for two values:
class A {
    final String s;
    final long l;
    A(String s, long l) {
        this.s = s;
        this.l = l;
    }

    static A A(String s, long l) {
        return new A(s, l);
    }
}

@Test
public void testCount() {
    assertEquals(7L, (long) 
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.count()));
    assertEquals(5L, (long) 
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.countDistinct()));
    assertEquals(5L, (long) 
        Stream.of(A("a", 1), 
                  A("b", 2), 
                  A("c", 3), 
                  A("d", 3), 
                  A("e", 4), 
                  A("f", 4), 
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.l)));
    assertEquals(7L, (long) 
        Stream.of(A("a", 1),
                  A("b", 2), 
                  A("c", 3), 
                  A("d", 3), 
                  A("e", 4), 
                  A("f", 4), 
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.s)));
}

These are pretty self explanatory, I think.

Percentiles

Just recently, I’ve blogged about the usefulness of SQL’s percentile functions, and how to emulate them if they’re unavailable.

Percentiles can also be nicely calculated on streams. Why not? As soon as a Stream’s contents implements Comparable, or if you supply your custom Comparator, percentiles are easy to calculate:

// Assuming a static import of Agg.percentile:
assertEquals(
    Optional.empty(), 
    Stream.<Integer> of().collect(percentile(0.25)));
assertEquals(
    Optional.of(1), 
    Stream.of(1).collect(percentile(0.25)));
assertEquals(
    Optional.of(1), 
    Stream.of(1, 2).collect(percentile(0.25)));
assertEquals(
    Optional.of(1), 
    Stream.of(1, 2, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(1), 
    Stream.of(1, 2, 3, 4).collect(percentile(0.25)));
assertEquals(
    Optional.of(2), 
    Stream.of(1, 2, 3, 4, 10).collect(percentile(0.25)));
assertEquals(
    Optional.of(2), 
    Stream.of(1, 2, 3, 4, 10, 9).collect(percentile(0.25)));
assertEquals(
    Optional.of(2), 
    Stream.of(1, 2, 3, 4, 10, 9, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(2), 
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(3), 
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20).collect(percentile(0.25)));
assertEquals(
    Optional.of(3), 
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21).collect(percentile(0.25)));
assertEquals(
    Optional.of(3), 
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21, 22).collect(percentile(0.25)));

Notice that jOOλ implements SQL’s percentile_disc semantics. Also, there are 3 “special” percentiles that deserve their own names:

A variety of overloads allows for calculating:

  • The percentile of the values contained in the stream
  • The percentile of the values contained in the stream, if sorted by another value mapped by a function
  • The percentile of the values mapped to another value by a function

Mode

Speaking of statistics. What about the mode? I.e. the value that appears the most often in a stream? Easy, with Agg.mode()

assertEquals(
    Optional.of(1), 
    Stream.of(1, 1, 1, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(1), 
    Stream.of(1, 1, 2, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(2), 
    Stream.of(1, 1, 2, 2, 2, 4).collect(Agg.mode()));

Other useful collectors

Other collectors that can be useful occasionally are:

Combine the aggregations

And one last important feature when working with jOOλ is the capability of combining aggregations, just like in SQL. Following the examples above, I can easily calculate several percentiles in one go:

// Unfortunately, Java's type inference might need
// a little help here
var percentiles =
Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(
  Tuple.collectors(
    Agg.<Integer>percentile(0.0),
    Agg.<Integer>percentile(0.25),
    Agg.<Integer>percentile(0.5),
    Agg.<Integer>percentile(0.75),
    Agg.<Integer>percentile(1.0)
  )
);

System.out.println(percentiles);

The result being:

(Optional[1], Optional[2], Optional[3], Optional[4], Optional[10])

Free as in Beer has caused Heartbleed (and Much More)

heartbleedHeartbleed is a bit over one month old now. A bug significant enough to have its own Wikipedia page. Today, we’re going to look into how wrong we have been in assuming that Open Source software is more secure than commercial software, because of our thinking that source code is open and that many developers are looking into it.

Free as in Beer

One of the core principles of Open Source software is, well, that it is open and that this openness is free in one way or another. This allows us developers to browse source code of third-party software and libraries for various reasons:

  • To learn from it
  • To copy it (under the terms of the respective license)
  • To modify it (under the terms of the respective license)
  • To verify it

Proprietary software often does not have the above attributes in exchange for warranties. When you read through the millions of lines of unintelligible legal Microsoft text, for instance, you will see that Microsoft gives a couple of guarantees, also with respect to security and how security flaws are remedied.

The warranty that is legally shouted at you by OpenSSL is this one:

THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT “AS IS” AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In fact, you don’t have any warranties. Specifically, there is no liability for direct or indirect loss of data, as has happened because of Heartbleed.

Obviuous, right? Because it’s free as in beer, any consequences resulting from the Heartbleed bug are entirely your own fault, if you’re using OpenSSL, or any other open source software that includes OpenSSL under similar terms.

Yes. You should have known that you were vulnerable, because you actually could verify it. Did you? Of course not. Did we? No way we’re delving through all that C code. Did our “suppliers”? We don’t even know. We’re using WinSCP extensively, and check this out, WinSCP was affected by Heartbleed! The developers over at WinSCP have been nice enough to fix this issue quickly and release a new version. They didn’t have to do this. You know what? Let’s hit that donate button, right now to thank them.

“They” should pay

OK, but again, wasn’t Heartbleed supposed to never even happen? Isn’t this Open Source? Doesn’t anyone (read: “they”. Because, I don’t have time) review such code, especially if it’s that widely used? Isn’t Open Source security much better than closed source security, which is essentially security through obscurity?

Eberhard Wolff, a well-known German freelance consultant and trainer recently replied this to me:

Yes, I’m sorry, Eberhard. But that’s just the case. Security is often neglected in many pieces of software, both commercial and open source. It’s not that the openness helps anyone discover things, security issues are very hard to discover per se. Also in commercial software. Remember GOTO fail? GOTO fail also affected SSL, but only in Apple software.

The issue lies elsewhere, though. Because Microsoft somehow manages to patch all security issues in virtually no time. They fix things so fast that users get frustrated about the sheer frequency of fixes ;-)

But Microsoft has a lot to lose. Pretty much 90% of desktop OS market share, that’s what they’ve got to lose. They’re paying a high amount of insurance money, invested in security teams that are penetration testing their own software to look for leaks. Yes. The vendor themselves are doing this. And yes, it costs money. And yes, you’ve already paid that when you bought Windows (or Office for that matter).

Money that OpenSSL never had. Read this frustrated section in the following public letter by Steve Marquess:

Thanks to that publicity there has been an outpouring of grassroots support from the OpenSSL user community, roughly two hundred donations this past week[3] along with many messages of support and encouragement[4]. Most were for $5 or $10 and, judging from the E-mail addresses and names, were from all around the world. I haven’t finished entering all of them to get an exact total, but all those donations together come to about US$9,000. Even if those donations continue to arrive at the same rate indefinitely (they won’t), and even though every penny of those funds goes directly to OpenSSL team members, it is nowhere near enough to properly sustain the manpower levels needed to support such a complex and critical software product. While OpenSSL does “belong to the people” it is neither realistic nor appropriate to expect that a few hundred, or even a few thousand, individuals provide all the financial support. The ones who should be contributing real resources are the commercial companies[5] and governments[6] who use OpenSSL extensively and take it for granted.

I can feel with Steve. US$9,000 to improve OpenSSL, a software that has added so much value to all of our servers and clients. That’s hardly enough for even 2-3 bug fixes, depending on whose salaries you’re paying.

But Steve tells off the “commercial companies” and “governments” to not have paid enough. But who are these “commercial companies”?

Meanwhile, at Data Geekery

We know similar stories, of course. When we have transitioned from completely Open Source to dual-licensed, we heard a lot of complaints by users that we have never heard of up to that moment. Users who have never donated or contributed code, bug reports, manual improvements or anything. This is OK. We were giving jOOQ away for free under the terms of the ASL 2.0 as part of a extended market analysis. We didn’t expect donations.

Of course, we have also disappointed one or two people who were hoping that jOOQ would remain “free as in freedom”, who have contributed, who have participated, and who now felt deceived. And we’re genuinely sorry about that. But participation doesn’t pay for our bills, money does. So we reached out for money (luckily, we always owned the code as we paid contributors of larger contributions, and had them all sign a CLA, so we could actually do this step, from a legal perspective).

But let’s again focus on the people looking for free beer. No contributions. No feedback. Free rides. And then, when asked for money, they’re pissed (even 8 months later!).

Fair enough. Our price has risen for that particular user. From free to something. They, too, have a right to be frustrated about this change, which they might not have expected. But they, too, have taken “free” for granted without proper reason. Without verifying.

“They”

We don’t believe that “THEY” should pay for our licenses. We don’t think that “THEY” should pay for enterprise support. There is no “THEY” in free software, there’s only all of us, because who decides which project is really important enough to deserve some funding from “THEM” and which one isn’t? All attempts of “fairly” distributing money will inevitably lead to corruption, misuse, abuse, and eventually, to a lack of innovation. We’ve known that from other industries.

So, let’s simply establish two facts:

  • Free software is lying around in the streets. It is publicly acclaimed to be AS IS software. If we’re using it, it is our own risk. And if it goes wrong, it is our own fault. No one else’s. Not the big companies’ (who are already investing tons of money in Open Source software), and not the governments’ (same thing there). Stop pretending that FOSS is in any way better or less of a legal risk than commercial software. That’s just not true.
  • In fact, there is no such thing as free software. “Free” is a price we’re paying as a down payment (or non-payment). The costs will or might arise later. If we’re lucky, the thing will remain “free”. But if we’re professionals, then we’ll insure ourselves against any risk arising from free “AS IS” software. This means that we’ll give back (= “pay”) in another way. Through donations, through bug fixes, through verification.

Further reading

Bruno Borges from Oracle has expressed his very interesting views and counter measures to the current state and meaning of Open Source software in this blog post.

Java 8 Friday Goodies: Map Enhancements

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. We have blogged a couple of times about some nice Java 8 goodies, and now we feel it’s time to start a new blog series, the…

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub. tweet this

Java 8 Goodie: Map Enhancements

In previous posts, we’ve already dealt with a couple of new Streams features, for instance when sorting. Most API improvements are indeed part of the new Streams API. But a few nice methods were also added to java.util.List and most importantly, to java.util.Map. If you want a quick overview of feature additions, go to the JDK8 Javadoc and click on the new “Default Methods” tab:

java.util.Map default methods

java.util.Map default methods

For backwards-compatibility reasons, all new methods added to Java interfaces are in fact default methods. So we have a couple of exciting new additions!

compute() methods

Often, we fetch a value from a map, make some calculations on it and put it back into the map. This can be verbose and hard to get right if concurrency is involved. With Java 8, we can pass a BiFunction to the new compute(), computeIfAbsent(), or computeIfPresent() methods and have the Map implementation handle the semantics of replacing a value.

The following example shows how this works:

// We'll be using this simple map
// Unfortunately, still no map literals in Java 8..
Map<String, Integer> map = new HashMap<>();
map.put("A", 1);
map.put("B", 2);
map.put("C", 3);

// Compute a new value for the existing key
System.out.println(map.compute("A", 
    (k, v) -> v == null ? 42 : v + 41));
System.out.println(map);

// This will add a new (key, value) pair
System.out.println(map.compute("X", 
    (k, v) -> v == null ? 42 : v + 41));
System.out.println(map);

The output of the above program is this:

42
{A=42, B=2, C=3}
42
{A=42, B=2, C=3, X=42}

This is really useful for ConcurrentHashMap, which ships with the following guarantee:

The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

forEach() method

This is a really nice goodie which lets you pass a method reference or a lambda to receive (key, value) pairs one by one. A trivial example would be this:

map.forEach((k, v) -> 
    System.out.println(k + "=" + v));

Its output being:

A=1
B=2
C=3

merge() method

Now this one is really not so easy to understand. The Javadoc uses this example here:

map.merge(key, msg, String::concat)

Given the following contract:

If the specified key is not already associated with a value or is associated with null, associates it with the given value. Otherwise, replaces the value with the results of the given remapping function, or removes if the result is null.

So, the above code translates to the following atomic operation:

String value = map.get(key);
if (value == null)
    map.put(key, msg);
else
    map.put(key, value.concat(msg));

This is certainly not an everyday functionality and might just have leaked from an implementation to the top-level API. Additionally, if the map already contains null (so, null values are OK), and your remappingFunction returns null, then the entry is removed. That’s quite unexpected. Consider the following program:

map.put("X", null);
System.out.println(map.merge(
    "X", null, (v1, v2) -> null));
System.out.println(map);

Its output is:

null
{A=1, B=2, C=3}

Update: I first wrote the above code first with JDK 8 build 116. With build 129, things have changed completely again. First off, the value passed to merge() is not allowed to be null. Secondly. nullvalues are treated by merge() just like absent values. To produce the same output, we’ll write:

map.put("X", 1);
System.out.println(map.merge(
    "X", 1, (v1, v2) -> null));
System.out.println(map);

This merge() operation has thus removed a value from the map. That’s probably OK because the semantics of “merge” is often a combination of INSERT, UPDATE, and DELETE if we’re using SQL-speak. And a somewhat reasonable way to indicate that a value should be removed is to return null from such a function.

But the map is allowed to contain null values, which can never be inserted into the map using merge(). tweet this

getOrDefault()

This is a no-brainer. Right? Right! Wrong!

Unfortunately, there are two types of Maps. Those supporting null keys and/or values and those who don’t support nulls. While the previous merge() method didn’t distinguish between a map not containing a key and a map containing a key with a null value, this new getOrDefault() only returns the default when the key is not contained. It won’t protect you from a NullPointerException:

map.put("X", null);
try {
  System.out.println(map.getOrDefault("X", 21) + 21);
}
catch (NullPointerException nope) {
  nope.printStackTrace();
}

That’s quite a bummer. In general, it can be said the Map API has become even more complex with respect to nulls. tweet this

Trivial additions

There are a few more methods, like putIfAbsent() (pulled up from ConcurrentHashMap, remove() (with key and value arguments), replace().

Conclusion

All in all, it can be said that a lot of atomic operations have made it to the top-level Map API, which is good. But then again, the pre-existing confusion related to the semantics of null in maps has deepened. The terminologies “present” vs. “absent”, “contains”, “default” don’t necessarily help clarifying these things, which is surprisingly against the rules of keeping an API consistent and most importantly, regular. Thus as a consumer of this API, ideally, you should keep null out of maps, both as keys and as values!

Next week in this blog series, we’re going to look at how Java 8 will allow you to define local transactional scope very easily, so stay tuned!

More on Java 8

In the mean time, have a look at Eugen Paraschiv’s awesome Java 8 resources page

JDBC 4.0’s Lesser-known Clob.free() and Blob.free() Methods

When I talk about jOOQ at conferences, I always show this slide containing a bunch of very common JDBC mistakes that people often commit:

Six common JDBC bugs in this image

Six common JDBC bugs in this image

Can you find the bugs? Some of them are obvious, such as:

  • Line 4: Syntax errors resulting from bad concatenation on line 3
  • Line 7: Syntax errors and SQL injection risk due to variable inlining
  • Line 8: Wrong bind index resulting from a potential mismatch on line 3
  • Line 14: Wrong column name due to sloppy rename
  • Line 18: Bad resource management

But then, there’s another very subtle bug that most people are unaware of because the fix was only possible since the upgrade in Java 6 / JDBC 4.0. See the solution, below:

Solution to the previous six bugs

Solution to the previous six bugs

With JDBC 4.0, the Clob.free() and the Blob.free() methods were introduced. While calling them is optional, it may be a very bad idea not to call them as early as possible, as you should not rely on the garbage collector to kick in early enough to free these resources. In fact, in certain databases / JDBC drivers, LOBs can outlive individual statements and/or transactions. They’re beasts of their own. If you’re reading through the JDBC tutorial (and also in the JDBC specification), it says:

Blob, Clob, and NClob Java objects remain valid for at least the duration of the transaction in which they are created. This could potentially result in an application running out of resources during a long running transaction.

The same is true for arrays, which also have an Array.free() method since Java 6 / JDBC 4.0.

So if your application has long-running transactions, do call these free() methods, or make it a habit to always call them. We’ll file an issue to FindBugs to make this a potential bug pattern.

Top 10 Ceylon Language Features I Wish We Had In Java

What does one do when Hibernate is “finished” and feature complete and one needs new challenges? Right. One creates a new JVM language called Ceylon.

On November 12, 2013, Ceylon 1.0.0 was finally released and we congratulate the whole team at Red Hat for their achievements in what looks like a very promising new JVM language. While it will be a slight challenge for Ceylon to compete with Scala, there are lots of very interesting features that distinguish it.

In fact, this language has so many interesting features, it’ll be hard to write up a blog post about the 10 most interesting ones. Which ones to choose? On Google Plus, I’ve had a short chat with Gavin King who also brought us Hibernate, Ross Tate who is also involved with JetBrains’ Kotlin, and Lukas Rytz who was a PhD student and committer for EPFL’s Scala and now works at Google Dart. I wanted those language Uberdesigners to help me find the 10 most thrilling language features that they have and we Java developers don’t. Now I have 20 interesting ones. I’ll certainly write a follow-up post to this one.

I have observed Gavin King and the other guys to be very enthusiastic and knowledgeable. I’ve already had this impression before when I first heard about Ceylon from Stéphane Épardaud at the JUGS in Berne, Switzerland in February 2013, another one of RedHat’s passionate engineers (see his presentation’s slides here).

Anyway, enough of the who’s who. Here’s our personal Top 10 List of Ceylon Language Features I Wish We Had In Java:

1. Modules

In Java, Jigsaw has been postponed about 34 times and we’re only now closing in on Java 8 GA! Yes, we have OSGi and Maven, and both work very well to manage dependencies at runtime (OSGi) or at compile-time (Maven). But compare this black magic Maven/OSGi configuration using Apache Felix

<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <version>2.1.0</version>
  <extensions>true</extensions>
  <executions>
    <execution>
      <id>bundle-manifest</id>
      <phase>process-classes</phase>
      <goals>
        <goal>manifest</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <supportedProjectTypes>
      <supportedProjectType>
        jar
      </supportedProjectType>
    </supportedProjectTypes>
    <instructions>
      <Bundle-SymbolicName>
        org.jooq
      </Bundle-SymbolicName>
      <Export-Package>*</Export-Package>
      <Import-Package>
        javax.persistence;resolution:=optional,
        org.apache.log4j;resolution:=optional,
        *
      </Import-Package>
      <_versionpolicy>
        [$(version;==;$(@)),$(version;+;$(@)))
      </_versionpolicy>
    </instructions>
  </configuration>
</plugin>

… with this one by Ceylon:

"The second best ever ORM solution!"
license "http://www.gnu.org/licenses/lgpl.html"
module org.hibernate "3.0.0.beta" {
    import ceylon.collection "1.0.0";
    import java.base "7";
    shared import java.jdbc "7";
}

Finally, things can be controlled on a jar-level, including visibility of packages. With only few lines of code. Please, Java, integrate Ceylon’s powerful module support.

It may be worth mentioning that Fantom is another language with integrated module support. See JodaTime’s Stephen Colebourne’s talk at Devoxx 2011: “Is Fantom Light Years Ahead of Scala?”. Stephen has also brought us ElSql, a new external SQL DSL for Java templating.

2. Sequences

This is the first time I’ve seen this kind of first class support for sequences in a typesafe language. Not only does Ceylon ship with all sorts of collection literals, it also knows types for these constructs. Concretely, you can declare an Iterable as such:

{String+} words = { "hello", "world" };

Notice the notation of the literal. It is of type {String+}, meaning that it contains at least one element. The type is assignment-compatible with {String*}, which represents a possibly empty sequence. Very interesting.

This goes on by supporting array literals as such:

String[] operators = [ "+", "-", "*", "/" ];
String? plus = operators[0];
String[] multiplicative = operators[2..3];

… or tuple literals:

[Float,Float,String] point = [0.0, 0.0, "origin"];

Notice also the range literal 2..3 which allows for extracting sub-arrays from the original array. So much sequence goodness in Ceylon!

Notice also the question mark in String?, which is Ceylon’s way of declaring …

3. Nullable types

While Scala knows the Option type and Haskell knows the Maybe type and Java 8 tries to compete by adding the new, unenforceable Optional type, Ceylon has a very simple notion of something that is nullable. If there’s a question mark behind a type, it’s nullable. Otherwise, it’s not null. Always.

In order to convert a nullable type into a not nullable type, you have to explicitly check:

void hello() {
    String? name = process.arguments.first;
    String greeting;
    if (exists name) {
        greeting = "Hello, ``name``!";
    }
    else {
        greeting = "Hello, World!";
    }
    print(greeting);
}

Notice the exists operator. It defines a new scope within which the name variable is known to be not null, i.e. it is promoted from String? to String. This locally scoped type promotion is commonly referred to as flow-sensitive typing, which has already been observed in the Whiley language, according to Lukas Rytz.

If you omit the exists check, you’d get a compilation error on that string interpolation there. There are also other useful constructs to perform ad-hoc type conversions:

String greeting = "Hello, " + (name else "World");

The else clause acts like a SQL COALESCE() function and can even be chained. Read more about Ceylon’s nullable goodness.

4. Defaulted parameters

OMG, how I wish we had that in Java. Every time we overload methods, we think, why not just support defaulted parameters like PL/SQL, for instance??

void hello(String name="World") {
    print("Hello, ``name``!");
}

I cannot think of a single good reason why languages wouldn’t have named and defaultable parameters like PL/SQL:

-- One of the parameters is optional
CREATE PROCEDURE MY_PROCEDURE (
  P1 IN NUMBER,
  P2 IN VARCHAR2 := 'ABC',
  P3 IN VARCHAR2
);

-- Calling the procedure
MY_PROCEDURE(
  P1 => 1,
  P3 => 'XYZ'
);

So this is one way to circumvent method overloading in most common cases. Method overloading is still tedious when we want to deal with alternative, incompatible types. But not in Ceylon, as Ceylon knows …

5. Union types

OK, this is a bit esoteric. The creators of Ceylon really really wanted to get rid of method overloading, partially because Ceylon also compiles to JavaScript, and JavaScript does not know function overloading. In fact, it is not possible to overload methods in Ceylon at all. To be able to interoperate with Java, however, union types needed to be introduced. A union type String|Integer can be either a String or an Integer. There’s method overloading right there!

void printType(String|Integer|Float val) { ... }
 
printType("hello");
printType(69);
printType(-1.0);

In order to “untangle” the union type, you can again take advantage of flow-sensitive typing for the val parameter by performing type-checks similar to Java’s instanceof

void printType(String|Integer|Float val) {
    switch (val)
    case (is String) { print("String: ``val``"); }
    case (is Integer) { print("Integer: ``val``"); }
    case (is Float) { print("Float: ``val``"); }
}

Within that scope, val is known to the compiler to be of type String, for example. This goes on to allowing crazy stuff like enumerated types where a type can be one or another thing, simultaneously:

abstract class Point()
        of Polar | Cartesian {
    // ...
}

Note that this is very different from multiple inheritance where such a Point would be both Polar and Cartesian. But that’s not all. Ceylon also has …

6. Intersection types

Now, as you may have guessed, that’s the exact inverse of a union type, and this is actually also supported by Java’s generics. In Java, you can write:

class X<E extends Serializable & Comparable<E>> {}

In the above example, X accepts only type parameters that are both Serializable and Comparable. This is much crazier in Ceylon where you can assign values to a locally declared intersection type. And that’s not it! In our chat, Gavin has pointed out this incredible language feature to me, where union / intersection types can interact with flow-sensitive typing to form the following (due for Ceylon 1.2):

value x = X();
//x has type X
if (something) {
    x = Y();
    //x has type Y
}
//x has type X|Y

Makes sense, right? So I asked him, if I will be able to intersect that type again with Z and Gavin said, yes! The following can be done:

value x = X();
//x has type X
if (something) {
    x = Y();
    //x has type Y
}
//x has type X|Y
if (is Z x) {
    //x has type <X|Y>&Z
}

And this goes on, because type intersections also interact with generics in a very interesting way. Under certain circumstances, X<A>&X<B> can be the same as X<A&B>. In other words, intersections (and unions) are distributive with generics, just like additions are with multiplications (in an informal understanding of “just like”). If you’re willing to delve into the language spec for this, see §3.7.2 Principal instantiation inheritance.

Now, union and intersection types can get quite nasty und hard to reuse. This is why Ceylon has …

7. Type aliases

Is there any other programming language that ever thought of this awesome feature?? This is so useful, even if you’re not supporting union and/or intersection types. Think about Java’s generics. With the advent of generics, people started writing stuff like:

Map<String, List<Map<Integer, String>>> map = // ...

Two things can be said:

  • Generics are extremely useful to the Java libraries
  • Generics become extremely verbose when doing the above

Here’s where type aliases come into play. Check out this example:

interface People => Set<Person>;

The point here is that even if some verbose types are reused very often, you don’t often want to create an explicit subtype for the above. In other words, you don’t want to abuse subtype polymorphism as a shortcut to “simplify” generic polymorphism.

Think of aliases as an expandable macro, which is mutually assignment-compatible. In other words, you can write:

People?      p1 = null;
Set<Person>? p2 = p1;
People?      p3 = p2;

So as the term “alias” suggests, you’re not creating a new type. You’re just giving a complex type a simpler name. But even better than type aliasing is …

8. Type inference

Many other languages have this and so does Java to a certain extent, at least as far as generics are involved. Java 8 goes one step further in allowing type inference with generics. But Java is far away from what languages like Scala or Ceylon can do with local variables:

interface Foo {}
interface Bar {}
object foobar satisfies Foo&Bar {}
//inferred type Basic&Foo&Bar
value fb = foobar; 
//inferred type {Basic&Foo&Bar+}
value fbs = { foobar, foobar };

So, this example shows a lot of features combined, including type constraints, sequence types, union types. With such a rich type system it is very important to support this level of type inference where a value keyword indicates that you don’t want to (or you cannot) explicitly declare a type. This, I’d really love to see in Java 9!

Read more about Ceylon’s awesome type inference capabilities.

9. Declaration-site variance

Now, this feature might be a bit harder to understand, as Java’s generics are already quite difficult to understand. I’ve recently read a very interesting paper by Ross Tate, Alan Leung and Sorin Lerner about the challenges brought to Java generics through wildcards: Taming Wildcards in Java’s Type System. Generics are still a very active research topic neither researchers nor language designers completely agree on whether use-site variance (as in Java) or declaration-site variance (as in C#, Scala, or Ceylon) is really better for mainstream programmers. Older languages talking about variance are Eiffel and OCaml.

Microsoft has introduced declaration-site variance in C#. I’ll cite the example from Wikipedia, which is very easy to understand. In C#, the IEnumerator interface has a covariant generic type parameter:

interface IEnumerator<out T>
{
    T Current { get; }
    bool MoveNext();
}

This simply means that the following will work:

IEnumerator<Cat> cats = ...
IEnumerator<Animal> animals = cats;

This is quite different from Java’s use-site variance, where the above wouldn’t compile, but the following would:

Iterator<Cat> cats = ...
Iterator<? extends Animal> animals = cats;

The main reason for declaration-site covariance is the simple fact that verbosity is greatly reduced at the use-site. Wildcards are a major pain to Java developers and they lead to numerous Stack Overflow questions as this one, which is about locally scoped wild-cards:

// Given this interface:
public interface X<E> {
    E get();
    E set(E e);
}

// This does not compile:
public void foo(X<?> x) {
    x.set(x.get());
}

As can be seen in the Ceylon language tour, Ceylon generics support declaration-site variance, just like C# and Scala. It will be interesting to see how these things evolve, as both types of variance support have their pros and cons, while at the same time, Ross Tate advocates mixed-site variance, which would really be a great addition for the Java language!

Now this was a bit complex, so let’s have a look at a simpler, yet awesome feature to round things up …

10. Functions and methods

One of the main things outlined by Stéphane Épardaud was the fact that the Ceylon language is a very regular language. This is particularly apparent when considering how Ceylon treats functions (and methods, which are type member functions). I can put a function everywhere. Consider this example:

Integer f1() => 1;
class C() {
    shared Integer f2() {
        Integer f3() => 2;
        return f3();
    }
}

print(f1());
print(C().f2());

In the above example,

  • f1() is a package-level function (much like a “global” static function in Java)
  • f2() is a regular method on the C class
  • f3() is a local function within the f2() method

With Java 8’s support for lambda expressions, these things get a bit better, but isn’t it awesome to be able to declare functions anywhere, in almost the same syntax?

Conclusion: Play around with Ceylon

That’s it for now. We might be publishing a follow-up article about the more esoteric language features in Ceylon, some time soon. In any case, you can download this interesting JVM language for free with first-class IDE support in Eclipse. You can also visit the Ceylon documentation website and have their website compile Ceylon code into JavaScript for execution in your browser.

Visit the Community and interact with the language designers from RedHat and Serli, and when you’re done, share this post on our jOOQ blog and help the JCP recognise that this wonderful language has a couple of very interesting features to put on the Java 9 or 10 roadmap!

10 More Common Mistakes Java Developers Make when Writing SQL

I was positively surprised to see how popular my recent listing about 10 Common Mistakes Java Developers Make when Writing SQL was, both on my own blog and on my syndication partner DZone. The popularity shows a couple of things:

jOOQ is the best way to write SQL in Java

Anyway, the common mistakes I listed previously are far from complete, so I will treat you to a sequel of 10 subtly less common, yet equally interesting mistakes Java developers make when writing SQL.

1. Not using PreparedStatements

Interestingly, this mistake or misbelief still surfaces blogs, forums and mailing lists many years after the appearance of JDBC, even if it is about a very simple thing to remember and to understand. It appears that some developers refrain from using PreparedStatements for any of these reasons:

  • They don’t know about PreparedStatements
  • They think that PreparedStatements are slower
  • They think that writing a PreparedStatement takes more effort

First off, let’s bust the above myths. In 96% of the cases, you’re better off writing a PreparedStatement rather than a static statement. Why? For simple reasons:

  • You can omit syntax errors originating from bad string concatenation when inlining bind values.
  • You can omit SQL injection vulnerabilities from bad string concatenation when inlining bind values.
  • You can avoid edge-cases when inlining more “sophisticated” data types, such as TIMESTAMP, binary data, and others.
  • You can keep open PreparedStatements around for a while, reusing them with new bind values instead of closing them immediately (useful in Postgres, for instance).
  • You can make use of adaptive cursor sharing (Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.

Convinced? Yes. Note, there are some rare cases when you actually want to inline bind values in order to give your database’s cost-based optimiser some heads-up about what kind of data is really going to be affected by the query. Typically, this results in “constant” predicates such as:

  • DELETED = 1
  • STATUS = 42

But it shouldn’t result in “variable” predicates such as:

  • FIRST_NAME LIKE “Jon%”
  • AMOUNT > 19.95

Note that modern databases implement bind-variable peeking. Hence, by default, you might as well use bind values for all your query parameters. Note also that higher-level APIs such as JPA CriteriaQuery or jOOQ will help you generate PreparedStatements and bind values very easily and transparently when writing embedded JPQL or embedded SQL.

More background info:

The Cure:

By default, always use PreparedStatements instead of static statements. By default, never inline bind values into your SQL.

2. Returning too many columns

This mistake is quite frequent and can lead to very bad effects both in your database’s execution plan and in your Java application. Let’s look at the second effect first:

Bad effects on the Java application:

If you’re selecting * (star) or a “default” set of 50 columns, which you’re reusing among various DAOs, you’re transferring lots of data from the database into a JDBC ResultSet. Even if you’re not reading the data from the ResultSet, it has been transferred over the wire and loaded into your memory by the JDBC driver. That’s quite a waste of IO and memory if you know that you’re only going to need 2-3 of those columns.

This was obvious, but beware also of…

Bad effects on the database execution plan:

These effects may actually be much worse than the effects on the Java application. Sophisticated databases perform a lot of SQL transformation when calculating the best execution plan for your query. It may well be that some parts of your query can be “transformed away”, knowing that they won’t contribute to the projection (SELECT clause) or to the filtering predicates. I’ve recently blogged about this in the context of schema meta data:
How schema meta data impacts Oracle query transformations

Now, this is quite a beast. Think about a sophisticated SELECT that will join two views:

SELECT *
FROM   customer_view c
JOIN   order_view o
  ON   c.cust_id = o.cust_id

Each of the views that are joined to the above joined table reference might again join data from dozens of tables, such as CUSTOMER_ADDRESS, ORDER_HISTORY, ORDER_SETTLEMENT, etc. Given the SELECT * projection, your database has no choice but to fully perform the loading of all those joined tables, when in fact, the only thing that you were interested in was this:

SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c
JOIN   order_view o
  ON   c.cust_id = o.cust_id

A good database will transform your SQL in a way that most of the “hidden” joins can be removed, which results in much less IO and memory consumption within the database.

The Cure:

Never execute SELECT *. Never reuse the same projection for various queries. Always try to reduce the projection to the data that you really need.

Note that this can be quite hard to achieve with ORMs.

3. Thinking that JOIN is a SELECT clause

This isn’t a mistake with a lot of impact on performance or SQL correctness, but nevertheless, SQL developers should be aware of the fact that the JOIN clause is not part of the SELECT statement per se. The SQL standard 1992 defines a table reference as such:

6.3 <table reference>

<table reference> ::=
    <table name> [ [ AS ] <correlation name>
      [ <left paren> <derived column list> <right paren> ] ]
  | <derived table> [ AS ] <correlation name>
      [ <left paren> <derived column list> <right paren> ]
  | <joined table>

The FROM clause and also the joined table can then make use of such table references:

7.4 <from clause>

<from clause> ::= 
    FROM <table reference> [ { <comma> <table reference> }... ]

7.5 <joined table>

<joined table> ::=
    <cross join>
  | <qualified join>
  | <left paren> <joined table> <right paren>

<cross join> ::=
    <table reference> CROSS JOIN <table reference>

<qualified join> ::=
    <table reference> [ NATURAL ] [ <join type> ] JOIN
      <table reference> [ <join specification> ]

Relational databases are very table-centric. Many operations are performed on physical, joined or derived tables in one way or another. To write SQL effectively, it is important to understand that the SELECT .. FROM clause expects a comma-separated list of table references in whatever form they may be provided.

Depending on the complexity of the table reference, some databases also accept sophisticated table references in other statements, such as INSERT, UPDATE, DELETE, MERGE. See Oracle’s manuals for instance, explaining how to create updatable views.

The Cure:

Always think of your FROM clause to expect a table reference as a whole. If you write a JOIN clause, think of this JOIN clause to be part of a complex table reference:

SELECT c.first_name, c.last_name, o.amount
FROM

    customer_view c
      JOIN order_view o
      ON c.cust_id = o.cust_id

4. Using pre-ANSI 92 JOIN syntax

Now that we’ve clarified how table references work (see the previous point), it should become a bit more obvious that the pre-ANSI 92 JOIN syntax should be avoided at all costs. To execution plans, it usually makes no difference if you specify join predicates in the JOIN .. ON clause or in the WHERE clause. But from a readability and maintenance perspective, using the WHERE clause for both filtering predicates and join predicates is a major quagmire. Consider this simple example:

SELECT c.first_name, c.last_name, o.amount
FROM   customer_view c,
       order_view o
WHERE  o.amount > 100
AND    c.cust_id = o.cust_id
AND    c.language = 'en'

Can you spot the join predicate? What if we joined dozens of tables? This gets much worse when applying proprietary syntaxes for outer join, such as Oracle’s (+) syntax.

The Cure:

Always use the ANSI 92 JOIN syntax. Never put JOIN predicates into the WHERE clause. There is absolutely no advantage to using the pre-ANSI 92 JOIN syntax.

5. Forgetting to escape input to the LIKE predicate

The SQL standard 1992 specifies the like predicate as such:

8.5 <like predicate>

<like predicate> ::=
    <match value> [ NOT ] LIKE <pattern>
      [ ESCAPE <escape character> ]

The ESCAPE keyword should be used almost always when allowing for user input to be used in your SQL queries. While it may be rare that the percent sign (%) is actually supposed to be part of the data, the underscore (_) might well be:

SELECT *
FROM   t
WHERE  t.x LIKE 'some!_prefix%' ESCAPE '!'

The Cure:

Always think of proper escaping when using the LIKE predicate.

6. Thinking that NOT (A IN (X, Y)) is the boolean inverse of A IN (X, Y)

This one is subtle but very important with respect to NULLs! Let’s review what A IN (X, Y) really means:

                  A IN (X, Y)
is the same as    A = ANY (X, Y)
is the same as    A = X OR A = Y

When at the same time, NOT (A IN (X, Y)) really means:

                  NOT (A IN (X, Y))
is the same as    A NOT IN (X, Y)
is the same as    A != ANY (X, Y)
is the same as    A != X AND A != Y

That looks like the boolean inverse of the previous predicate, but it isn’t! If any of X or Y is NULL, the NOT IN predicate will result in UNKNOWN whereas the IN predicate might still return a boolean value.

Or in other words, when A IN (X, Y) yields TRUE or FALSE, NOT(A IN (X, Y)) may still yield UNKNOWN instead of FALSE or TRUE. Note, that this is also true if the right-hand side of the IN predicate is a subquery.

Don’t believe it? See this SQL Fiddle for yourself. It shows that the following query yields no result:

SELECT 1
WHERE     1 IN (NULL)
UNION ALL
SELECT 2
WHERE NOT(1 IN (NULL))

More details can be seen in my previous blog post on that subject, which also shows some SQL dialect incompatibilities in that area.

The Cure:

Beware of the NOT IN predicate when nullable columns are involved!

7. Thinking that NOT (A IS NULL) is the same as A IS NOT NULL

Right, so we remembered that SQL implements three-valued logic when it comes to handling NULL. That’s why we can use the NULL predicate to check for NULL values. Right? Right.

But even the NULL predicate is subtle. Beware that the two following predicates are only equivalent for row value expressions of degree 1:

                   NOT (A IS NULL)
is not the same as A IS NOT NULL

If A is a row value expression with a degree of more than 1, then the truth table is transformed such that:

  • A IS NULL yields true only if all values in A are NULL
  • NOT(A IS NULL) yields false only if all values in A are NULL
  • A IS NOT NULL yields true only if all values in A are NOT NULL
  • NOT(A IS NOT NULL) yields false only if all values in A are NOT NULL

See more details in my previous blog post on that subject.

The Cure:

When using row value expressions, beware of the NULL predicate, which might not work as expected.

8. Not using row value expressions where they are supported

Row value expressions are an awesome SQL feature. When SQL is a very table-centric language, tables are also very row-centric. Row value expressions let you describe complex predicates much more easily, by creating local ad-hoc rows that can be compared with other rows of the same degree and row type. A simple example is to query customers for first names and last names at the same time.

SELECT c.address
FROM   customer c,
WHERE (c.first_name, c.last_name) = (?, ?)

As can be seen, this syntax is slightly more concise than the equivalent syntax where each column from the predicate’s left-hand side is compared with the corresponding column on the right-hand side. This is particularly true if many independent predicates are combined with AND. Using row value expressions allows you to combine correlated predicates into one. This is most useful for join expressions on composite foreign keys:

SELECT c.first_name, c.last_name, a.street
FROM   customer c
JOIN   address a
  ON  (c.id, c.tenant_id) = (a.id, a.tenant_id)

Unfortunately, not all databases support row value expressions in the same way. But the SQL standard had defined them already in 1992, and if you use them, sophisticated databases like Oracle or Postgres can use them for calculating better execution plans. This is explained on the popular Use The Index, Luke page.

The Cure:

Use row value expressions whenever you can. They will make your SQL more concise and possibly even faster.

9. Not defining enough constraints

So, I’m going to cite Tom Kyte and Use The Index, Luke again. You cannot have enough constraints in your meta data. First off, constraints help you keep your data from corrupting, which is already very useful. But to me, more importantly, constraints will help the database perform SQL transformations, as the database can decide that

  • Some values are equivalent
  • Some clauses are redundant
  • Some clauses are “void” (i.e. they will not return any values)

Some developers may think that constraints are slow. The opposite is the case, unless you insert lots and lots of data, in case of which you can either disable constraints for a large operation, or use a temporary “load table” without constraints, transferring data offline to the real table.

The Cure:

Define as many constraints as you can. They will help your database to perform better when querying.

10. Thinking that 50ms is fast query execution

The NoSQL hype is still ongoing, and many companies still think they’re Twitter or Facebook in dire need of faster, more scalable solutions, escaping ACID and relational models to scale horizontally. Some may succeed (e.g. Twitter or Facebook), others may run into this:

For the others who are forced (or chose) to stick with proven relational databases, don’t be tricked into thinking that modern databases are slow. They’re hyper fast. In fact, they’re so fast, they can parse your 20kb query text, calculate 2000-line execution plans, and actually execute that monster in less than a millisecond, if you and your DBA get along well and tune your database to the max.

They may be slow because of your application misusing a popular ORM, or because that ORM won’t be able to produce fast SQL for your complex querying logic. In that case, you may want to chose a more SQL-centric API like JDBC, jOOQ or MyBatis that will let you get back in control of your SQL.

So, don’t think that a query execution of 50ms is fast or even acceptable. It’s not. If you get these speeds at development time, make sure you investigate execution plans. Those monsters might explode in production, where you have more complex contexts and data.

Conclusion

SQL is a lot of fun, but also very subtle in various ways. It’s not easy to get it right as my previous blog post about 10 common mistakes has shown. But SQL can be mastered and it’s worth the trouble. Data is your most valuable asset. Treat data with respect and write better SQL.

jOOQ is the best way to write SQL in Java