## Implementing a generic REDUCE aggregate function with SQL

So, @rotnroll666 nerd sniped me again. Apparently, the Neo4j Cypher query language supports arbitrary reductions, just like any functional collection API, such as the JDK Stream API:

```
Stream.of(2, 4, 3, 1, 6, 5)
    .reduce((i, j) -> i * j)
    .ifPresent(System.out::println); // Prints 720
```

SQL doesn’t have this, yet it would occasionally be very useful. An arbitrary reduction can be implemented “easily” in SQL. Let’s look at the above multiplication reduction. In PostgreSQL, you’d write it like this:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
(
  with recursive
    u(i, o) as (
      select i, o
      from unnest(array_agg(t.i)) with ordinality as u(i, o)
    ),
    r(i, o) as (
      select u.i, u.o from u where o = 1
      union all
      select r.i * u.i, u.o from u join r on u.o = r.o + 1
      --     ^^^^^^^^^ reduction
    )
  select i from r
  order by o desc
  limit 1
)
from t;
```

Woah. That’s a bit of a syntactic beast. Let’s decompose it.

### The aggregate function

First off, if we were summing the values, we’d use the built-in `SUM` function, like this:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select sum(i)
from t;
```

That would produce 21. If you’re willing to lose precision, you could emulate a `PRODUCT()` function using logarithms. But what if we had a hypothetical `REDUCE()` function, written like this:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select reduce(
  t1.i * t2.i referencing accumulated as t1, accumulating as t2
)
from t;
```

This is SQL, so the lambda expression would obviously use a ton of keywords, completely novel and unique to this particular function, and you’d need jOOQ to make it composable 😁. Essentially, we’d have some sort of reduction expression based on two pseudo tables:

• The accumulated table containing the result
• The accumulating table (or rather row)

A reduction is a generic aggregate function that operates on groups. So, we will have to re-use some SQL aggregate function mechanism to achieve the desired behaviour.
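In JDK terms, the behaviour we’re after is what `Collectors.groupingBy()` combined with `Collectors.reducing()` already does. A minimal sketch of the semantics we want to emulate in SQL (class and method names are illustrative):

```java
import java.util.*;
import java.util.stream.*;

public class GroupedReduction {

    // Multiply the values of each group, i.e. a per-group REDUCE()
    static Map<Integer, Optional<Integer>> productsByParity(List<Integer> values) {
        return values.stream().collect(Collectors.groupingBy(
            i -> i % 2,                        // the GROUP BY expression
            Collectors.reducing((i, j) -> i * j)));
    }

    public static void main(String[] args) {
        // {0=Optional[48], 1=Optional[15]}
        System.out.println(productsByParity(List.of(2, 4, 3, 1, 6, 5)));
    }
}
```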

### Using ARRAY_AGG() to get the aggregation effect

First off, let’s do some aggregation. PostgreSQL’s `ARRAY_AGG()` is perfect for this job, because it

• Aggregates
• Yet kinda leaves the data untouched, unlike e.g. `SUM()`

In a way, it’s a collection like `Stream.collect()`, not a reduction.
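To illustrate the distinction in JDK terms (helper names are made up for this sketch): a collection keeps the elements around, a reduction collapses them.

```java
import java.util.*;
import java.util.stream.*;

public class CollectVsReduce {

    // A collection keeps the elements intact, like ARRAY_AGG()
    static List<Integer> collectToList(Stream<Integer> s) {
        return s.collect(Collectors.toList());
    }

    // A reduction collapses them into a single value, like SUM()
    static Optional<Integer> reduceToProduct(Stream<Integer> s) {
        return s.reduce((i, j) -> i * j);
    }

    public static void main(String[] args) {
        System.out.println(collectToList(Stream.of(2, 4, 3, 1, 6, 5)));   // [2, 4, 3, 1, 6, 5]
        System.out.println(reduceToProduct(Stream.of(2, 4, 3, 1, 6, 5))); // Optional[720]
    }
}
```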

If we use `ARRAY_AGG()` in a correlated subquery, we’ll still get the aggregation effect, but we can unnest the array again to a table, in order to operate on it. You can see this in the following example:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
(
  select string_agg(i::text, ', ')
  from unnest(array_agg(t.i)) as u(i)
)
from t;
```

This yields:

`2, 4, 3, 1, 6, 5`

Not a very useful thing to do, aggregate, unnest, and aggregate again, but it shows the power of nesting an aggregate function in a correlated subquery’s `FROM` clause. If your RDBMS doesn’t have arrays, maybe you can do the same thing using `JSON_ARRAYAGG` and `JSON_TABLE`, or `XMLAGG` and `XMLTABLE`.

Disclaimer: PostgreSQL often Does The Right Thing™. I think you’d be hard pressed to juggle SQL syntax as elegantly in most other RDBMS, so this approach isn’t portable.

### Next step, generate row numbers

There are mainly two ways to generate row numbers in our example:

• Using the `ROW_NUMBER()` window function
• Using `WITH ORDINALITY` when unnesting the array

Adapting our previous example for some visualisation:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
(
  select string_agg(row(i, o)::text, ', ')
  from unnest(array_agg(t.i)) with ordinality as u(i, o)
)
from t;
```

(Awesome, that row constructor!)

This produces:

`(2,1), (4,2), (3,3), (1,4), (6,5), (5,6)`

Doesn’t look fancy, but imagine we group by even numbers:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  i % 2,
  (
    select string_agg(row(i, o)::text, ', ')
    from unnest(array_agg(t.i)) with ordinality as u(i, o)
  )
from t
group by i % 2;
```

The result is now:

`0 | (2,1), (4,2), (6,3)`
`1 | (3,1), (1,2), (5,3)`

It’s a bit weird, right? We `GROUP BY` in the outer query, and the entire correlated subquery is the aggregate function, based on the fact that its `FROM` clause contains `ARRAY_AGG()`. This isn’t so different from this query:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select 1 + sum(i) + 2
from t;
```

We’re used to building scalar expressions from aggregate functions all the time. This is nothing fancy. We can easily also just wrap the function in another subquery:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select (select 1 + sum(i) + 2)
from t;
```

From here, it’s not far-fetched to extend the aggregate-function-in-scalar-subquery approach to the `FROM` clause, and then to unnest the aggregation again. This may not “click” immediately. The `GROUP BY` clause in SQL is a bit weird, syntactically.

Remark: Regrettably, PostgreSQL doesn’t allow using an aggregate function in the `FROM` clause of the same query level, the way it does in a correlated subquery. I was going to show a fancy `LATERAL` version, but this doesn’t work (yet).

### Now, recurse

The final bit is the recursion with the `r` table:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
(
  with recursive
    u(i, o) as (
      select i, o
      from unnest(array_agg(t.i)) with ordinality as u(i, o)
    ),
    r(i, o) as (
      select u.i, u.o from u where o = 1
      union all
      select r.i * u.i, u.o from u join r on u.o = r.o + 1
      --     ^^^^^^^^^ reduction
    )
  select i from r
  order by o desc
  limit 1
)
from t;
```

We simply recurse on the ordinality. The first subquery of the `UNION ALL` produces the first row of our data, namely `(2, 1)`. The subsequent iterations always multiply the accumulated `r.i` by the value `u.i` of the next row by ordinality. This is probably best shown step by step:

`(2, 1), (8, 2), (24, 3), (24, 4), (144, 5), (720, 6)`

Finally, we don’t care about SQL’s set-based way of working, i.e. we don’t care about the whole set of multiplications computed above. We only care about the last row, ordered by the ordinality, which contains our result in `r.i`.

Done!
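The recursion is really just an ordinary left fold over the rows ordered by ordinality. A plain Java sketch of what the recursive CTE computes (method name is illustrative):

```java
import java.util.List;

public class RecursiveReduction {

    // Mirrors the recursive CTE: seed with the row at ordinality 1,
    // then combine the accumulated value with each following row
    static int reduce(List<Integer> rows) {
        int acc = rows.get(0);                 // select ... where o = 1
        for (int o = 1; o < rows.size(); o++)  // join ... on u.o = r.o + 1
            acc = acc * rows.get(o);           // r.i * u.i

        return acc;                            // order by o desc limit 1
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(2, 4, 3, 1, 6, 5))); // 720
    }
}
```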

### Using group by

Just as shown before, we can easily add a `GROUP BY` clause to the outer query. E.g. let’s multiply odd and even numbers separately:

```
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  i % 2,
  (
    with recursive
      u(i, o) as (
        select i, o
        from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
      )
    select i from r
    order by o desc
    limit 1
  ),
  string_agg(i::text, ' * ')
from t
group by i % 2;
```

I’ve added another aggregate function, `STRING_AGG()`, for good measure, to get:

`0 | 48 | 2 * 4 * 6`
`1 | 15 | 3 * 1 * 5`

Wonderful, isn’t it? Now, I wasn’t able to just add an `OVER()` clause right there. That produced:

`SQL Error [42P20]: ERROR: window functions are not allowed in functions in FROM`

Maybe that will work as well, in the near future? Or, I might come up with another hack to make it work, in case of which I’ll update this post.

### jOOQ support

Obviously, this will be supported in jOOQ soon: https://github.com/jOOQ/jOOQ/issues/11385. The syntax will again be much more bearable:

```
ctx.select(T.I.mod(inline(2)), reduce(T.I, (i1, i2) -> i1.times(i2)))
   .from(T)
   .groupBy(T.I.mod(inline(2)))
   .fetch();
```

Other emulations using actual `CREATE AGGREGATE FUNCTION` will be investigated as well, in the near future.

## Lesser Known jOOλ Features: Useful Collectors

jOOλ is our second most popular library. It implements a set of useful extensions to the JDK’s Stream API, which are especially useful when streams are sequential only, which, according to our assumptions, is how most people use streams in Java. Such extensions include:
```
// (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, ...)
Seq.of(1, 2, 3).cycle();

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// (1, 0, 2, 0, 3, 0, 4)
Seq.of(1, 2, 3, 4).intersperse(0);

// (4, 3, 2, 1)
Seq.of(1, 2, 3, 4).reverse();

```
… and many more.

### Collectors

But that’s not the only thing jOOλ offers. It also ships with a set of useful Collectors, which can be used both with JDK streams, as well as with jOOλ’s Seq type. Most of them are available from the `org.jooq.lambda.Agg` type, where `Agg` stands for aggregations. Just like the rest of jOOλ, these collectors are inspired by SQL, and you will find quite a few SQL aggregate functions represented in this class. Here are some of these collectors:

### Counting

While the JDK has `Collectors.counting()`, jOOλ also has a way to count distinct values, just like SQL:
```
// A simple wrapper for two values:
class A {
    final String s;
    final long l;

    A(String s, long l) {
        this.s = s;
        this.l = l;
    }

    static A A(String s, long l) {
        return new A(s, l);
    }
}

@Test
public void testCount() {
    assertEquals(7L, (long)
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.count()));
    assertEquals(5L, (long)
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.countDistinct()));
    assertEquals(5L, (long)
        Stream.of(A("a", 1),
                  A("b", 2),
                  A("c", 3),
                  A("d", 3),
                  A("e", 4),
                  A("f", 4),
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.l)));
    assertEquals(7L, (long)
        Stream.of(A("a", 1),
                  A("b", 2),
                  A("c", 3),
                  A("d", 3),
                  A("e", 4),
                  A("f", 4),
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.s)));
}

```
These are pretty self-explanatory, I think.

### Percentiles

Just recently, I’ve blogged about the usefulness of SQL’s percentile functions, and how to emulate them if they’re unavailable. Percentiles can also be nicely calculated on streams. Why not? As soon as a Stream’s contents implement `Comparable`, or if you supply a custom `Comparator`, percentiles are easy to calculate:
```
// Assuming a static import of Agg.percentile:
assertEquals(
    Optional.empty(),
    Stream.<Integer> of().collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2, 3, 4).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21, 22).collect(percentile(0.25)));

```
Notice that jOOλ implements SQL’s `percentile_disc` semantics. Also, there are three “special” percentiles that deserve their own names:

• `Agg.min()`: the 0.0 percentile
• `Agg.median()`: the 0.5 percentile
• `Agg.max()`: the 1.0 percentile

A variety of overloads allows for calculating:

• The percentile of the values contained in the stream
• The percentile of the values contained in the stream, if sorted by another value mapped by a function
• The percentile of the values mapped to another value by a function
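The `percentile_disc` semantics boil down to picking the first sorted value whose cumulative position reaches the requested fraction. A minimal plain-Java sketch of those semantics (not jOOλ’s actual implementation; the method name is made up):

```java
import java.util.*;

public class PercentileDisc {

    // percentile_disc: sort, then take the value at the first position
    // whose cumulative frequency is >= the requested percentile p
    static <T extends Comparable<T>> Optional<T> percentile(
            List<T> values, double p) {
        if (values.isEmpty())
            return Optional.empty();

        List<T> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int index = Math.max(0, (int) Math.ceil(p * sorted.size()) - 1);
        return Optional.of(sorted.get(index));
    }

    public static void main(String[] args) {
        // Matches the Agg.percentile(0.25) examples above
        System.out.println(percentile(List.of(1, 2, 3, 4, 10), 0.25)); // Optional[2]
    }
}
```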

### Mode

Speaking of statistics, what about the mode, i.e. the value that appears most often in a stream? Easy, with `Agg.mode()`:
```
assertEquals(
    Optional.of(1),
    Stream.of(1, 1, 1, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(1),
    Stream.of(1, 1, 2, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(2),
    Stream.of(1, 1, 2, 2, 2, 4).collect(Agg.mode()));

```
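A mode is just a frequency count. Here is a hedged sketch of the semantics shown above (on ties, this version keeps the value encountered first, which is consistent with the examples; it is not jOOλ’s actual implementation):

```java
import java.util.*;

public class Mode {

    // The most frequent value; on ties, the first value encountered wins
    static <T> Optional<T> mode(List<T> values) {
        // LinkedHashMap preserves encounter order for tie-breaking
        Map<T, Long> counts = new LinkedHashMap<>();
        for (T t : values)
            counts.merge(t, 1L, Long::sum);

        T best = null;
        long bestCount = 0;
        for (Map.Entry<T, Long> e : counts.entrySet())
            if (e.getValue() > bestCount) { // strict: first tie wins
                best = e.getKey();
                bestCount = e.getValue();
            }

        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(mode(List.of(1, 1, 2, 2, 3, 4))); // Optional[1]
        System.out.println(mode(List.of(1, 1, 2, 2, 2, 4))); // Optional[2]
    }
}
```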

### Other useful collectors

Other collectors that can be useful occasionally are:

### Combine the aggregations

And one last important feature when working with jOOλ is the capability of combining aggregations, just like in SQL. Following the examples above, I can easily calculate several percentiles in one go:
```
// Unfortunately, Java's type inference might need
// a little help here
var percentiles =
Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(
    Tuple.collectors(
        Agg.<Integer>percentile(0.0),
        Agg.<Integer>percentile(0.25),
        Agg.<Integer>percentile(0.5),
        Agg.<Integer>percentile(0.75),
        Agg.<Integer>percentile(1.0)
    )
);

System.out.println(percentiles);

```
The result being:
```
(Optional[1], Optional[2], Optional[3], Optional[4], Optional[10])
```

## Imperative Loop or Functional Stream Pipeline? Beware of the Performance Impact!

I like weird, yet concise language constructs and API usages. Yes, I am guilty. Evil? Don’t know. But guilty. I heavily use and abuse the `java.lang.Boolean` type to implement three-valued logic in Java:
• Boolean.TRUE means true (duh)
• Boolean.FALSE means false
• null can mean anything like “unknown” or “uninitialised”, etc.
I know – a lot of enterprise developers will bikeshed and cargo cult the old saying:

> Code is read more often than it is written
But as with everything, there is a tradeoff. For instance, in algorithm-heavy, micro-optimised library code, it is usually more important to have code that really performs well, rather than code that apparently doesn’t need comments because the author has written it in such a clear and beautiful way. I don’t think it matters much in the case of the boolean type (where I’m just too lazy to encode every three-valued situation in an enum). But here’s a more interesting example from that same Twitter thread. The code is simple:
```
woot:
if (something) {
    for (Object o : list)
        if (something(o))
            break woot;

    throw new E();
}
```
Yes. You can break out of “labeled ifs”. Because in Java, any statement can be labeled, and if the statement is a compound statement (observe the curly braces following the if), then it may make sense to break out of it. Even if you’ve never seen that idiom, I think it’s quite immediately clear what it does. Gasp! If Java were a bit more classic, it might have supported this syntax:
```
if (something) {
    for (Object o : list)
        if (something(o))
            goto woot;

    throw new E();
}
woot:
```
Nicolai suggested that the main reason I hadn’t written the following, equivalent, and arguably more elegant logic, is because jOOQ still supports Java 6:
```
if (something && list.stream().noneMatch(this::something))
    throw new E();

```
It’s more concise! So, it’s better, right? Everything new is always better. A third option would have been the less concise solution that essentially just replaces `break` by `return`:
```
if (something && noneMatchSomething(list))
    throw new E();

// And then:
private boolean noneMatchSomething(List<?> list) {
    for (Object o : list)
        if (something(o))
            return false;

    return true;
}

```
There’s an otherwise useless method that has been extracted. The main benefit is that people are not used to breaking out of labeled statements (other than loops, and even then it’s rare), so this is again about some subjective “readability”. I personally find this particular example less readable, because the extracted method is no longer local. I have to jump around in the class and interrupt my train of thoughts. But of course, YMMV with respect to the two imperative alternatives.

### Back to objectivity: Performance

When I tweet about Java these days, I’m mostly tweeting about my experience writing jOOQ. A library. A library that has been tuned so much over the past years that the big client-side bottleneck (apart from the obvious database call) is the internal `StringBuilder` that is used to generate dynamic SQL. And compared to most database queries, you will not even notice that. But sometimes you do. E.g. if you’re using an in-memory H2 database and run some rather trivial queries, then jOOQ’s overhead can become measurable again.

Yes. There are some use-cases, which I do want to take seriously as well, where the difference between an imperative loop and a stream pipeline is measurable. In the above examples, let’s remove the throw statement and replace it with something simpler (because exceptions have their own significant overhead). I’ve created a JMH benchmark, which compares the 3 approaches:
• Imperative with break
• Imperative with return
• Stream
Here’s the benchmark:
```
package org.jooq.test.benchmark;

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.*;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5, time = 3)
@Measurement(iterations = 7, time = 3)
public class ImperativeVsStream {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        boolean something = true;

        @Param({ "2", "8" })
        int listSize;

        List<Integer> list = new ArrayList<>();

        boolean something() {
            return something;
        }

        boolean something(Integer o) {
            return o > 2;
        }

        @Setup(Level.Trial)
        public void setup() throws Exception {
            for (int i = 0; i < listSize; i++)
                list.add(i);
        }

        @TearDown(Level.Trial)
        public void teardown() throws Exception {
            list = null;
        }
    }

    @Benchmark
    public Object testImperativeWithBreak(BenchmarkState state) {
        woot:
        if (state.something()) {
            for (Integer o : state.list)
                if (state.something(o))
                    break woot;

            return 1;
        }

        return 0;
    }

    @Benchmark
    public Object testImperativeWithReturn(BenchmarkState state) {
        if (state.something() && woot(state))
            return 1;

        return 0;
    }

    private boolean woot(BenchmarkState state) {
        for (Integer o : state.list)
            if (state.something(o))
                return false;

        return true;
    }

    @Benchmark
    public Object testStreamNoneMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().noneMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAnyMatch(BenchmarkState state) {
        if (state.something() && !state.list.stream().anyMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAllMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().allMatch(s -> !state.something(s)))
            return 1;

        return 0;
    }
}

```
The results are pretty clear:
```
Benchmark                                    (listSize)   Mode  Cnt         Score          Error  Units
ImperativeVsStream.testImperativeWithBreak            2  thrpt   14  86513288.062 ± 11950020.875  ops/s
ImperativeVsStream.testImperativeWithBreak            8  thrpt   14  74147172.906 ± 10089521.354  ops/s
ImperativeVsStream.testImperativeWithReturn           2  thrpt   14  97740974.281 ± 14593214.683  ops/s
ImperativeVsStream.testImperativeWithReturn           8  thrpt   14  81457864.875 ±  7376337.062  ops/s
ImperativeVsStream.testStreamAllMatch                 2  thrpt   14  14924513.929 ±  5446744.593  ops/s
ImperativeVsStream.testStreamAllMatch                 8  thrpt   14  12325486.891 ±  1365682.871  ops/s
ImperativeVsStream.testStreamAnyMatch                 2  thrpt   14  15729363.399 ±  2295020.470  ops/s
ImperativeVsStream.testStreamAnyMatch                 8  thrpt   14  13696297.091 ±   829121.255  ops/s
ImperativeVsStream.testStreamNoneMatch                2  thrpt   14  18991796.562 ±   147748.129  ops/s
ImperativeVsStream.testStreamNoneMatch                8  thrpt   14  15131005.381 ±   389830.419  ops/s
```
With this simple example, break or return don’t matter. At some point, adding additional methods might start getting in the way of inlining (because of stacks getting too deep), but not creating additional methods might get in the way of inlining as well (because of method bodies getting too large). I don’t want to bet on either approach here at this level, nor is jOOQ tuned that much. Like most similar libraries, the traversal of the jOOQ expression tree generates stacks that are too deep to completely inline anyway.

But the very obvious loser here is the Stream approach, which is roughly 6.5x slower in this benchmark than the imperative approaches. This isn’t surprising. The stream pipeline has to be set up every single time to represent something as trivial as the above imperative loop. I’ve already blogged about this in the past, where I compared replacing simple for loops by `Stream.forEach()`.

### Meh, does it matter?

In your business logic? Probably not. Your business logic is I/O bound, mostly because of the database. Wasting a few CPU cycles on a client-side loop is not the main issue. Even if it is, the waste probably happens because your loop shouldn’t even be on the client side in the first place, but should be moved into the database as well. I’m currently touring conferences with a talk about that topic.
In your infrastructure logic? Maybe! If you’re writing a library, or if you’re using a library like jOOQ, then yes. Chances are that a lot of your logic is CPU bound. You should occasionally profile your application and spot such bottlenecks, both in your code and in third party libraries. E.g. in most of jOOQ’s internals, using a stream pipeline might be a very bad choice, because ultimately, jOOQ is something that might be invoked from within your loops, thus adding significant overhead to your application if your queries are not heavy (e.g. again, when run against an H2 in-memory database). So, given that you’re clearly “micro-losing” on the performance side by using the Stream API, you may need to evaluate the readability tradeoff more carefully. When business logic is complex, readability is very important compared to micro optimisations. With infrastructure logic, it is much less likely so, in my opinion. And I’m not alone.
Note: there’s that other cargo cult of premature optimisation going around. Yes, you shouldn’t worry about these details too early in your application implementation. But you should still know when to worry about them, and be aware of the tradeoffs. And while you’re still debating what name to give to that extracted method, I’ve written 5 new labeled if statements! ;-)

## A Nice API Design Gem: Strategy Pattern With Lambdas

With Java 8 lambdas being available to us as a programming tool, there is a “new” and elegant way of constructing objects. I put “new” in quotes, because it’s not new. It used to be called the strategy pattern, but as I’ve written on this blog before, many GoF patterns will no longer be implemented in their classic OO way, now that we have lambdas.

### A simple example from jOOQ

jOOQ knows a simple type called `Converter`. It’s a simple SPI, which allows users to implement custom data types and inject data type conversion into jOOQ’s type system. The interface looks like this:
```
public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

```
Users will have to implement 4 methods:
• Conversion from a database (JDBC) type `T` to the user type `U`
• Conversion from the user type `U` to the database (JDBC) type `T`
• Two methods providing a `Class` reference, to work around generic type erasure
Now, an implementation that converts hex strings (database) to integers (user type):
```
public class HexConverter implements Converter<String, Integer> {

    @Override
    public Integer from(String hexString) {
        return hexString == null
            ? null
            : Integer.parseInt(hexString, 16);
    }

    @Override
    public String to(Integer number) {
        return number == null
            ? null
            : Integer.toHexString(number);
    }

    @Override
    public Class<String> fromType() {
        return String.class;
    }

    @Override
    public Class<Integer> toType() {
        return Integer.class;
    }
}

```
That wasn’t difficult to write, but it’s quite boring to write this much boilerplate:
• Why do we need to give this class a name?
• Why do we need to override methods?
• Why do we need to handle nulls ourselves?
Now, we could write some object-oriented libraries, e.g. abstract base classes that take care of at least the `fromType()` and `toType()` methods, but much better: the API designer can provide a “constructor API”, which allows users to provide “strategies”, which is just a fancy name for “function”. One function (i.e. lambda) for each of the four methods. For example:
```
public interface Converter<T, U> {
    ...

    static <T, U> Converter<T, U> of(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return new Converter<T, U>() { ... boring code here ... };
    }

    static <T, U> Converter<T, U> ofNullable(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return of(
            fromType,
            toType,

            // Boring null handling code here
            t -> t == null ? null : from.apply(t),
            u -> u == null ? null : to.apply(u)
        );
    }
}

```
From now on, we can easily write converters in a functional way. For example, our `HexConverter` would become:
```
Converter<String, Integer> converter =
    Converter.ofNullable(
        String.class,
        Integer.class,
        s -> Integer.parseInt(s, 16),
        Integer::toHexString
    );

```
Wow! This is really nice, isn’t it? This is the pure essence of what it means to write a `Converter`. No more overriding, null handling, type juggling, just the bidirectional conversion logic.

### Other examples

A more famous example is the JDK 8 `Collector.of()` constructor, without which it would be much more tedious to implement a collector. For example, if we want to find the second largest element in a stream… easy!
```
for (int i : Stream.of(1, 8, 3, 5, 6, 2, 4, 7)
    .collect(Collector.of(
        () -> new int[] { Integer.MIN_VALUE, Integer.MIN_VALUE },
        (a, t) -> {
            if (a[0] < t) {
                a[1] = a[0];
                a[0] = t;
            }
            else if (a[1] < t)
                a[1] = t;
        },
        (a1, a2) -> {
            throw new UnsupportedOperationException(
                "Say no to parallel streams");
        }
    )))
    System.out.println(i);
```
Run this, and you get:
```
8
7
```
Bonus exercise: Make the collector parallel capable by implementing the combiner correctly. In a sequential-only scenario, we don’t need it (until we do, of course…).
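For the bonus exercise, a correct combiner just merges the two intermediate top-2 arrays. One possible sketch (helper method names are made up; this is one way to do it, not the only one):

```java
import java.util.Arrays;
import java.util.stream.*;

public class SecondLargest {

    // Accumulate a value into the "two largest so far" array, a[0] >= a[1]
    static void accumulate(int[] a, int t) {
        if (a[0] < t) {
            a[1] = a[0];
            a[0] = t;
        }
        else if (a[1] < t)
            a[1] = t;
    }

    // Combiner: fold one partial top-2 result into the other
    static int[] combine(int[] a1, int[] a2) {
        accumulate(a1, a2[0]);
        accumulate(a1, a2[1]);
        return a1;
    }

    public static void main(String[] args) {
        int[] result = IntStream.rangeClosed(1, 8).boxed().parallel()
            .collect(Collector.of(
                () -> new int[] { Integer.MIN_VALUE, Integer.MIN_VALUE },
                SecondLargest::accumulate,
                SecondLargest::combine));

        System.out.println(Arrays.toString(result)); // [8, 7]
    }
}
```

Merging the `Integer.MIN_VALUE` placeholders is harmless, because `accumulate()` simply ignores values that are not larger than what’s already there.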

### Conclusion

The concrete examples are nice examples of API usage, but the key message is this: If you have an interface of the form:
```
interface MyInterface {
    void myMethod1();
    String myMethod2();
    void myMethod3(String value);
    String myMethod4(String value);
}

```
Then, just add a convenience constructor to the interface, accepting Java 8 functional interfaces like this:
```
// You write this boring stuff
interface MyInterface {
    static MyInterface of(
        Runnable function1,
        Supplier<String> function2,
        Consumer<String> function3,
        Function<String, String> function4
    ) {
        return new MyInterface() {
            @Override
            public void myMethod1() {
                function1.run();
            }

            @Override
            public String myMethod2() {
                return function2.get();
            }

            @Override
            public void myMethod3(String value) {
                function3.accept(value);
            }

            @Override
            public String myMethod4(String value) {
                return function4.apply(value);
            }
        };
    }
}

```
As an API designer, you write this boilerplate only once. And your users can then easily write things like these:
```
// Your users write this awesome stuff
MyInterface.of(
    () -> { ... },
    () -> "hello",
    v -> { ... },
    v -> "world"
);

```
Easy! And your users will love you forever for this.

## jOOQ Tuesdays: Mario Fusco Talks About Functional and Declarative Programming

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Mario Fusco, author of Lambdaj, who works on Red Hat’s drools, is a Java Champion, and frequently speaks at Java conferences on topics related to functional programming.

### Mario, a long time ago I stumbled upon your name when looking up the author of Lambdaj – a library that went to the extreme of bringing lambdas to Java 5 and earlier. How does it work? And what’s the most peculiar hack you implemented to make it work?

When I started developing Lambdaj in 2007, I thought of it just as a proof-of-concept to check how far I could push Java 5. I never expected that it could become something that somebody other than myself might actually want to use. In reality, given the limited, or I should say non-existent, capabilities of Java 5 as a functional language, Lambdaj was entirely a big hack. Despite this, people started using and somewhat loving it, and this made me (and possibly somebody else) realize that Java developers, or at least part of them, were tired of the purely imperative paradigm imposed by the language and ready to experiment with something more functional.

The main feature of Lambdaj, and what made its DSL quite nice to use, was the possibility of referencing the method of a class in a static and type safe way and passing it to another method. In this way you could, for example, sort a list of persons by their age by doing something like:

```
sort(persons, on(Person.class).getAge());
```

As anticipated, what happened under the hood was a big hack: the on() method created a proxy of the Person class, so you could safely call the getAge() method on it. The proxy didn’t do anything useful other than registering the method call. However, it had to return something of the same type as the value returned by the actual method, to avoid a ClassCastException. For this purpose, it had a mechanism to generate a reasonably unique instance of that type, an int in my example. Before returning that value, it also associated it, using a WeakHashMap, with the invoked method. In this way, the sort() method was actually invoked with a list and the value generated by my proxy. It then retrieved from the map the Java method associated with that value, and invoked it on all the items of the list, performing the operation, a sorting in this case, that it was supposed to execute.
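The recording-proxy idea can be sketched with a plain `java.lang.reflect.Proxy`. This is a heavily simplified illustration, not Lambdaj’s code: `Proxy` only works on interfaces (Lambdaj had to proxy concrete classes via bytecode generation), and the real library generated unique placeholder values instead of a constant.

```java
import java.lang.reflect.*;
import java.util.concurrent.atomic.AtomicReference;

public class RecordingProxy {

    // Hypothetical interface for illustration; Lambdaj proxied classes
    interface Person {
        int getAge();
    }

    static final AtomicReference<Method> LAST_CALL = new AtomicReference<>();

    @SuppressWarnings("unchecked")
    static <T> T on(Class<T> type) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] { type },
            (proxy, method, args) -> {
                LAST_CALL.set(method); // register the call, like on() did
                return 0;              // placeholder "unique" return value
            });
    }

    public static void main(String[] args) {
        on(Person.class).getAge();
        System.out.println(LAST_CALL.get().getName()); // getAge
    }
}
```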

### That’s crazy :) I’m sure you’re happy that a lot of Lambdaj features are now deprecated. You’re now touring the world with your functional programming talks. What makes you so excited about this topic?

The whole Lambdaj project is now deprecated and abandoned. The new functional features introduced in Java 8 simply made it obsolete. Nevertheless, it not only had the merit of making developers curious and interested about functional programming, but also of experimenting with new patterns and ideas that in the end also influenced the Java 8 syntax. Take for instance how you can sort a list of persons by age using a method reference:

```
persons.sort(comparing(Person::getAge));
```

It seems evident that method references were at least inspired by Lambdaj’s on() method.

There is a number of things that I love about functional programming:

1. The readability: a snippet of code written in functional style reads like a story, while too often the equivalent code in imperative style resembles a puzzle.
2. The declarative nature: in functional programming, it is enough to declare the result that you want to achieve, rather than specifying the steps to obtain it. You only care about the what, without getting lost in the details of the how.
3. The possibility of treating data and behaviours uniformly: functional programming allows you to pass to a method both data (the list of persons to be sorted) and computation (the function to be applied to each person in the list). This idea is fundamental for many algorithms, like for example map/reduce: since data and computation are the same thing, and the second is typically orders of magnitude smaller, you are free to send the computation to the machine holding the data, instead of the opposite.
4. The higher level of abstraction: the possibility of encapsulating computations in functions and passing them around to other functions allows both a dramatic reduction of code duplication and the design of more generic and expressive APIs.
5. Immutability and referential transparency: using immutable values and avoiding side effects makes it far easier to reason about your code, test it, and ensure its correctness.
6. The parallelism friendliness: all the features listed above also enable the parallelisation of your software in a simpler and more reliable way. It is no coincidence that functional programming started becoming more popular around 10 years ago, which is also when multicore CPUs began to be available on commodity hardware.

### Our readers love SQL (or at least, they use it frequently). How does functional programming compare to SQL?

The most evident thing that FP and SQL have in common is their declarative paradigm. To some extent, SQL, or at least its data selection part, can be seen as a functional language specialised in manipulating data in tabular format.

The data modification part is a totally different story though. Most SQL users normally change data in a destructive way, overwriting or even deleting existing data. This is clearly in contrast with the immutability mantra of functional programming. However, this is only how SQL is most commonly used; nothing dictates that it couldn't also be employed in a non-destructive, append-only way. I wish to see SQL used more often in this way in the future.

### In your day job, you’re working for Red Hat, on Drools. Business rules sound enterprisey. How does that get along with your fondness of functional programming?

From a user's point of view, a rule engine in general, and Drools in particular, is the extreme form of declarative programming, second only to Prolog. For this reason, developers who are only familiar with the imperative paradigm struggle to use it, because they try to force it to work in an imperative way. Conversely, programmers used to thinking in functional (and thus declarative) terms are more often able to use it correctly when they approach it for the first time.

As for me, my work as a developer of both the core engine and the compiler of Drools allows me to experiment every day in the fields of language design and algorithmic invention and optimization. In short, it's a challenging job and there's a lot of fun in it: don't tell my employer, but I cannot stop being surprised that they allow me to play with this every day and even pay me for it.

### You’re also on the board of VoxxedDays Ticino, Zurich, and CERN (wow, how geeky is that? A large hadron collider Java conference!). Why is Voxxed such a big success for you?

I must admit that, before being involved in this, I didn't imagine the amount of work that organizing a conference requires. However, this effort is totally rewarded. In particular, the great advantage of VoxxedDays is that they are local, one-day events made by developers for developers, which practically anybody can afford.

I remember that the most common feedback I received after the first VoxxedDays Ticino, which we did 2 years ago, was something like: “This has been the very first conference I have attended in my life and I didn't imagine it could be such an amazing experience, both from a technical and, even more, from a social point of view. Thanks a lot for that; I eagerly look forward to attending next year”. Can you imagine something more rewarding for a conference organizer?

The other important thing for me is giving speakers who aren't rock stars (yet) the possibility to talk in public and share their experience with a competent audience. I know that for at least some of them this is only the first step to letting themselves and others discover their capabilities as public speakers, launching them toward bigger conferences like Devoxx.

### Thank you very much Mario

If you want to learn more about Mario’s insights on functional programming, do visit his interesting talks at Devoxx from the recent past.

## jOOQ Tuesdays: Daniel Dietrich Explains the Benefits of Object-Functional Programming

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Daniel Dietrich, whose popular library vavr is picking up a lot of momentum among functional programming aficionados working with Java.

### Daniel, you created vavr – Object-Functional Programming in Java, a library that is becoming more and more popular among functional programmers. Why is vavr so popular?

Thank you Lukas for giving me the opportunity to share my thoughts.

I think that many users were disappointed by Java 8 as a whole, especially those who are already familiar with more advanced languages. The Java language architects did an awesome job. Java 8 brought groundbreaking new features like Lambdas, the new Stream API and CompletableFuture. But the new abstractions were only poorly integrated into the language from an API perspective.

There is already an increasing amount of write-ups about the disadvantages of Java 8, starting with the drawbacks of the Optional type. We read that we have to take care when using parallel Streams. These are self-made problems that keep us busy, stealing our expensive time. vavr provides us with alternatives.

There is no reason to reinvent the wheel. My vision is to bring as much as possible of the Scala goodness to Java. In fact, Scala emerged from Java in the form of the Pizza language. Back in 2001 it had features like generics, function pointers (aka lambdas), case classes (aka value types) and pattern matching. In 2004 Java got generics, in 2014 came lambdas, and hopefully Java 10 will include value types. Scala left Java far behind. It used the last 15 years to evolve.

Object-functional programming is nothing new. It is the best of both worlds, object-oriented programming and functional programming. Scala is one of the better choices to do it on the JVM. Java’s Lambdas are an enabling feature. They allowed us to create a vavr API that is similar to Scala.

Java developers who get their hands on vavr often react in a way that I call the nice-effect: “Wow that’s nice, it feels like Scala”.

### You have published a guest post on the jOOQ blog about vavr more than one year ago. Since then, vavr has moved forward quite a bit and you’ve recently published the roadmap for version 3.0. What have you done since then and where are you going?

Yes, that is true, it has changed a lot since then. We released vavr 1.2.2 two weeks before the first jOOQ guest post went online. Besides enriched functions, that release offered popular Scala features like Option for null-safety, Try for performing computations headache-free in the presence of exceptions, and a fluent pattern matching DSL. Also notably, we shipped two common persistent collections: an eagerly evaluated linked List and the lazy form of it, also called Stream.

Roughly one year later we released vavr 2.0.0. We hardened the existing features and most notably included Future and Promise for concurrent programming and a full-fledged, Scala-like persistent collection library. Besides that, we replaced the pattern matching DSL with a more powerful pattern matching API that allows us to recursively match arbitrary object trees.

I spent a significant amount of time and energy abstracting on the type level over the mentioned features, as far as this is possible in Java. For Java developers it is not important to call things monads, sum-types or products. For example we do not need to know group theory in order to calculate 1 + 1. My duty as library developer is to make it as simple as possible for users of vavr to reach their goals. The need to learn new APIs and DSLs should be reduced to the minimum. This is the main reason for aligning vavr to Scala.

Our efforts for the next release concentrate on adding more syntactic sugar and missing persistent collections beyond those of Scala. It will be sufficient to add one import to reach 90% of vavr’s API. There will be new persistent collections BitSet, several MultiMaps and a PriorityQueue. We are improving the performance of our collections, most notably our persistent Vector. It will be faster than Java’s Stream for some operations and have a smaller memory footprint than Java’s ArrayList for primitive elements.

Beyond library features we pay special attention to three things: backward compatibility, controlled growth and integration aspects. The web is important. Our Jackson module ensures that all vavr types can be sent over the wire as serialized JSON. The next release will include a GWT module; first tests already run vavr in the browser. However, the vavr core will stay thin. It will not depend on any libraries other than the JDK.

Towards the next major release 3.0.0 I’m starting to adjust the roadmap I sketched in a previous blog post. I’ve learned that it is most important to our users that they can rely on backward compatibility. Major releases should not appear often, following the 2.x line is a better strategy. We will start to deprecate a few APIs that will be removed in a future major release. Also I keep an eye on some interesting developments that will influence the next major release. For example a new major Scala release is in the works and there are new interesting Java features that will appear in Java 10.

Looking at the current issues I don’t have to be an oracle to foresee that the next minor release 2.1.0 will take some more time. I understand that users want to start using the new vavr features but we need the time and the flexibility to get things right. Therefore we target a first beta release of 2.1.0 in Q4 2016.

### In the meantime, there is a variety of functional(-ish) libraries for Java 8, like our own jOOλ, StreamEx, Cyclops, or the much older FunctionalJava. How do all these libraries compare and how is yours different?

This question goes a little bit in the philosophical direction, maybe it is also political. These are my subjective thoughts, please treat them as such.

Humans have the ability to abstract over things. They express themselves in various ways, e.g. with painting and music. These areas split into different fields. For example, in literature things are expressed in manifold ways, like rhythmic prose and poetry. Furthermore, different styles can be applied within these fields, like the iambic trimeter in poetry. The styles across different areas are often shaped by outer circumstances, bound to a time, like an epoch.

In the area of mathematics there are also several fields, like algebra and mathematical analysis. Both have a notion of functions. Which field should I take when I want to express myself in a functional style?

Personally, I’m not able to afford the time to write non-trivial applications in each of the mentioned libraries. But I took a look at the source code and followed discussions. I see that nearly all libraries are shaped by the outer circumstance that lambdas finally made it into all curly-braces languages, especially into Java in our case. Library designers are keen to modernize their APIs in order to keep pace. But library designers are also interested in staying independent of 3rd party libraries, for reasons like stability and progression.

The field of jOOQ is SQL in Java, the field of Cyclops is asynchronous systems. Both libraries are similar in the way that they adapted the new Java Lambda feature. I already mentioned that the new Java features are only poorly integrated into the language. This is the reason why we see a variety of new libraries that try to close this gap.

jOOQ needs jOOλ in order to stay independent. On the technical level, StreamEx is similar to jOOλ in that both sit on top of Java’s Stream. They augment it with additional functionality that can be accessed using a fluent API. The biggest difference between them is that StreamEx supports parallel computations while jOOλ concentrates on sequential computations only. Looking at the SQL-ish method names, it is clear that jOOλ is tailored to be used with jOOQ.

Cyclops claims to be the answer to the Cambrian explosion of functional(-ish) libraries. It offers a facade that is backed by one of several integration modules. From a developer's perspective, I view this with skepticism. The one-size-fits-all approach did not work well for me in the past because it does not cover all features of the backing libraries. An abstraction layer adds another source of errors, which is unnecessary.

Many names of Cyclops look unfamiliar to me, maybe because of the huge amount of types. Looking at the API, the library seems to be a black hole, a Cambrian implosion of reactive and functional features. John McClean did a great job abstracting over all the different libraries and providing a common API, but I prefer to use a library directly.

FunctionalJava is different. It existed long before the other libraries and has the noble goal of purely functional programming: if it compiles, it is correct. FunctionalJava was originally driven by people well known from the Scala community, more specifically from the Scalaz community. Scalaz is highly influenced by Haskell, a purely functional language.

Haskell and Scala are much more expressive than Java. Porting the algebraic abstractions from Scalaz to Java turned out to be awkward. Java’s type system isn’t powerful enough; it does not allow us to reach that goal in a practical way. The committers seem disillusioned to me. Some state that Java is not the right language for functional programming.

vavr is a fresh take on porting Scala functionality to Java. At its core it is not as highly influenced by Scalaz and Haskell as FunctionalJava is. However, for purely functional abstractions it offers an algebra module that depends on the core. The relation algebra/core can be compared to Scalaz/Scala.

vavr is similar to StreamEx in the way that it is not bound to a specific domain, in contrast to jOOλ and Cyclops. It is different from StreamEx in the sense that it does not build on top of Java’s Stream. I understand vavr as language addition that integrates well with existing Java features.

### You have never spoken at conferences, you let other people do that for you. What’s your secret? :)

In fact I never attended a conference at all. My secret is to delegate the real work to others.

Joking aside, I feel more comfortable spending my time on the vavr source code than preparing conferences and travelling. Currently I am working on vavr beside my job but I’m still looking for opportunities to do it full-time.

It is awesome to see other people jumping on the vavr train. We receive help from all over the world. Besides IntelliJ and YourKit, we recently got TouK as a new sponsor, and we produced vavr stickers that are handed out at conferences.

Because of the increasing popularity of vavr there is also an increasing amount of questions and pull requests. Besides conception and development, I concentrate on code reviews, discussions and managing the project.

### Where do you see Java’s future with projects like Valhalla?

Java stands for stability and safety. New language features are moderately added, like salt to a soup. This is what we can expect from a future Java.

In his recent mission statement Brian Goetz gives us a great overview about the goals of Project Valhalla. From the developer point of view I really love to see that the Java language architects attach great importance to improve the expressiveness of Java. Value types for example will reduce a lot of redundant code and ceremony we are currently confronted with. It is also nice to see that value types will be immutable.

Another feature I’m really looking forward to is the extension of generics. It will allow us to remove several specializations that exist only for primitive types and void. Popular functional interfaces like Predicate, Consumer, Supplier and Runnable will be equivalent to Function. In vavr we currently provide additional API for performing side-effects. With extended generics, that API can be reduced to the general case, as it should have been from the beginning.

There are two more features I’m really interested in: local variable type inference, which will come to Java, and reified generics, which might come. Reified generics are needed when we want to get the type of a generic parameter at runtime. We already have type inference for lambdas. Extending it to local variables will increase the conciseness and readability of method and lambda bodies while preserving type-safety. I think it is a good idea that we will still have to specify the return type of methods. It is a clear documentation of the API of an application.

I’m deeply impressed how Java and the JVM evolve over time without breaking backward compatibility. It is a safe platform we can rely on. The gap between Java and other, more modern languages is getting smaller but Java is still behind. Some popular features might never come and most probably outdated API will not get a complete refresh or a replacement. This is a field where libraries such as vavr can help.

## How Functional Programming will (Finally) do Away With the GoF Patterns

A recent article about various ways to implement structural pattern matching in Java has triggered my interest: http://blog.higher-order.com/blog/2009/08/21/structural-pattern-matching-in-java The article mentions a Scala example where a tree data structure can be traversed very easily and neatly using Scala’s `match` keyword, along with using algebraic data types (more specifically, a sum type):
```
def depth(t: Tree): Int = t match {
  case Empty      => 0
  case Leaf(n)    => 1
  case Node(l, r) => 1 + max(depth(l), depth(r))
}
```
Even if you’re not used to the syntax, it is relatively easy to understand what it does:
• There’s a function `depth` that calculates the (maximum) depth of a tree structure
• It does so by checking if the input argument is empty, a leaf node, or any other node
• If it is any other node, then it adds 1 to the maximum of the remaining tree, recursively
The elegant thing here is that the Scala type system helps the author of the above code get this right from a formal point of view, by offering formal type checking. The closest we can get in Java, as illustrated by the article, is this:
```
public static int depth(Tree t) {
    if (t instanceof Empty)
        return 0;
    if (t instanceof Leaf)
        return 1;
    if (t instanceof Node)
        return 1 + max(depth(((Node) t).left), depth(((Node) t).right));
    throw new RuntimeException("Inexhaustive pattern match on Tree.");
}
```
But these instanceof checks do smell kind of fishy… For more details, read the full article here, highly recommended: http://blog.higher-order.com/blog/2009/08/21/structural-pattern-matching-in-java

## How does this compare to the GoF design patterns?

In our object-orientation-brainwashed Java ecosystem (which inherited the OO brainwash from C++), the above `instanceof` logic would most likely be refactored into an implementation using the visitor pattern from the GoF design patterns book. This refactoring would be done by The Team Architect™ himself, as they are supervising the object oriented quality of your software.

The 7 lines of code using `instanceof` would quickly bloat up to roughly 200 lines of weird interfaces, abstract classes, and cryptic `accept()` and `visit()` methods. When in fact, the functional programming approach was so much leaner, even in its imperfect Java `instanceof` form!

A lot of the GoF design patterns stem from a time when EVERYTHING needed to be an object. Object orientation was the new holy grail, and people even wanted to push objects down into databases. Object databases were invented (luckily, they’re all dead) and the SQL standard was enhanced with ORDBMS features (only really implemented in Oracle, PostgreSQL, and Informix, and maybe some other minor DBs), most of which, also luckily, were never widely adopted.

Since Java 8, finally, we’re starting to recover from the damage that was done in the early days of object orientation in the 90s, and we can move back to a more data-centric, functional, immutable programming model where data processing languages like SQL are appreciated rather than avoided. Hopefully, Java will see more and more of these patterns.

If you’re not convinced by the above visitor pattern vs. pattern matching example, do read this very interesting series of articles by Mario Fusco. You will see that with functional programming, many patterns lose their meaning as you’re just starting to pass around functions, making code very simple and easy to understand. As a wrap up, Mario also presented this content at Voxxed Days Ticino.
Happy functional programming!

## 10 Easy Steps to a Complete Understanding of SQL

Too many programmers think SQL is a bit of a beast. It is one of the few declarative languages out there, and as such, behaves in an entirely different way from imperative, object-oriented, or even functional languages (although, some say that SQL is also somewhat functional).

As a SQL trainer (do visit our training, it’s great!) I’m writing SQL every day and embracing SQL with our company’s Open Source library jOOQ.

I thus feel compelled to bring the beauty of SQL a bit closer to those of you still struggling with it. The following tutorial is destined for

• readers who have already worked with SQL but never completely understood it
• readers who know SQL well but have never really thought about its syntax
• readers who want to teach SQL to others

This tutorial will focus on SELECT statements only. Other DML statements will be covered in another tutorial.

Note: This tutorial was previously published exclusively on Tech.Pro (see a historic version here). Unfortunately, Tech.Pro went offline. With the permission of Tech.Pro, we’re re-publishing this content again on the jOOQ blog.

Here are…

# 10 Easy Steps to a Complete Understanding of SQL.

## 1. SQL is declarative

Get this into your head first. Declarative. The only paradigm where you “just” declare the nature of the results that you would like to get. Not how your computer shall compute those results. Isn’t that wonderful?

```
SELECT first_name, last_name
FROM employees
WHERE salary > 100000
```

Easy to understand. You don’t care where employee records physically come from. You just want those that have a decent salary.

### What do we learn from this?

So if this is so simple, what’s the problem? The problem is that most of us intuitively think in terms of imperative programming. As in: “machine, do this, and then do that, but before, run a check and fail if this-and-that”. This includes storing temporary results in variables, writing loops, iterating, calling functions, etc. etc.

Forget about all that. Think about how to declare things. Not about how to tell the machine to compute things.

## 2. SQL syntax is not “well-ordered”

A common source of confusion is the simple fact that SQL syntax elements are not ordered in the way they are executed. The lexical ordering is:

• SELECT [ DISTINCT ]
• FROM
• WHERE
• GROUP BY
• HAVING
• UNION
• ORDER BY

For simplicity, not all SQL clauses are listed. This lexical ordering differs fundamentally from the logical order (which may again differ from the order of execution, depending on the optimiser choices):

• FROM
• WHERE
• GROUP BY
• HAVING
• SELECT
• DISTINCT
• UNION
• ORDER BY

There are three things to note:

1. FROM is the first clause, not SELECT. The first thing that happens is loading data from the disk into memory, in order to operate on such data.
2. SELECT is executed after most other clauses. Most importantly, after FROM and GROUP BY. This is important to understand when you think you can reference stuff that you declare in the SELECT clause from the WHERE clause. The following is not possible:

```
SELECT A.x + A.y AS z
FROM A
WHERE z = 10 -- z is not available here!
```

If you wanted to reuse `z`, you have two options. Either repeat the expression:

```
SELECT A.x + A.y AS z
FROM A
WHERE (A.x + A.y) = 10
```

… or you resort to derived tables, common table expressions, or views to avoid code repetition. See examples further down.
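Both options are easy to see in action; a quick sketch using SQLite via Python's `sqlite3` (the table `A` and its two rows are made up, following the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (x INTEGER, y INTEGER)")
con.executemany("INSERT INTO A VALUES (?, ?)", [(4, 6), (1, 2)])

# Option 1: repeat the expression in the WHERE clause
rows1 = con.execute(
    "SELECT A.x + A.y AS z FROM A WHERE (A.x + A.y) = 10").fetchall()
print(rows1)  # [(10,)]

# Option 2: a derived table makes the alias visible to the outer WHERE
rows2 = con.execute(
    "SELECT z FROM (SELECT A.x + A.y AS z FROM A) WHERE z = 10").fetchall()
print(rows2)  # [(10,)]
```

(Note that SQLite is one of the lenient dialects mentioned below that would also accept the alias directly in WHERE; the derived-table form is the portable one.)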

3. UNION is placed before ORDER BY in both lexical and logical ordering. Many people think that each UNION subselect can be ordered, but according to the SQL standard and most SQL dialects, that is not true. While some dialects allow for ordering subqueries or derived tables, there is no guarantee that such ordering will be retained after a UNION operation.

Note, not all databases implement things the same way. Rule number 2, for instance, does not apply exactly in the above way to MySQL, PostgreSQL, and SQLite.

### What do we learn from this?

Always remember both the lexical order and the logical order of SQL clauses to avoid very common mistakes. If you understand that distinction, it will become very obvious why some things work and others don’t.

Of course, it would have been nice if the language was designed in a way that the lexical order actually reflected the logical order, as it is implemented in Microsoft’s LINQ.

## 3. SQL is about table references

Because of the difference between lexical ordering and logical ordering, most beginners are probably tricked into thinking that column values are the first-class citizens in SQL. They are not. The most important things are table references.

The SQL standard defines the FROM clause as such:

```
<from clause> ::=
    FROM <table reference>
    [ { <comma> <table reference> }... ]
```

The “output” of the FROM clause is a combined table reference of the combined degree of all table references. Let’s digest this, slowly.

```
FROM a, b
```

The above produces a combined table reference of the degree of `a` + the degree of `b`. If `a` has 3 columns and `b` has 5 columns, then the “output table” will have 8 (`3 + 5`) columns.

The records contained in this combined table reference are those of the cross product / cartesian product of `a x b`. In other words, each record of `a` is paired with each record of `b`. If `a` has 3 records and `b` has 5 records, then the above combined table reference will produce 15 records (`3 x 5`).
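Both claims are easy to verify; a sketch using SQLite via Python's `sqlite3` (the 3-column table `a` and 5-column table `b` are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (a1, a2, a3)")          # degree 3
con.execute("CREATE TABLE b (b1, b2, b3, b4, b5)")  # degree 5
con.executemany("INSERT INTO a VALUES (?, ?, ?)",
                [(i, i, i) for i in range(3)])       # 3 records
con.executemany("INSERT INTO b VALUES (?, ?, ?, ?, ?)",
                [(i,) * 5 for i in range(5)])        # 5 records

cur = con.execute("SELECT * FROM a, b")
rows = cur.fetchall()
print(len(cur.description))  # 8 columns: degree 3 + degree 5
print(len(rows))             # 15 records: 3 x 5
```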

This “output” is “fed” / “piped” into the GROUP BY clause (after filtering in the WHERE clause), where it is transformed into a new “output”. We’ll deal with that later on.

If we’re looking at these things from a relational algebra / set theory perspective, a SQL table is a relation or a set of tuples. And each SQL clause will transform one or several relations in order to produce new relations.

### What do we learn from this?

Always think in terms of table references to understand how data is “pipelined” through your SQL clauses.

## 4. SQL table references can be rather powerful

A table reference is something rather powerful. A simple example of their power is the JOIN keyword, which is actually not part of the SELECT statement, but part of a “special” table reference. The joined table, as defined in the SQL standard (simplified):

```
<table reference> ::=
    <table name>
  | <derived table>
  | <joined table>
```

If we take again the example from before:

```
FROM a, b
```

`a` can be a joined table as such:

```
a1 JOIN a2 ON a1.id = a2.id
```

Expanding this into the previous expression, we’d get:

```
FROM a1 JOIN a2 ON a1.id = a2.id, b
```

While it is discouraged to combine the comma-separated list of table references syntax with the joined table syntax, you can most certainly do this. The resulting, combined table reference will now have a degree of `a1+a2+b`.
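A quick sketch using SQLite via Python's `sqlite3` confirms the combined degree (the tables `a1`, `a2`, `b` and their columns are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a1 (id, v)")     # degree 2
con.execute("CREATE TABLE a2 (id, w)")     # degree 2
con.execute("CREATE TABLE b  (x, y, z)")   # degree 3
con.execute("INSERT INTO a1 VALUES (1, 'v')")
con.execute("INSERT INTO a2 VALUES (1, 'w')")
con.execute("INSERT INTO b  VALUES (1, 2, 3)")

# A joined table used as one element of a comma-separated table list
cur = con.execute("SELECT * FROM a1 JOIN a2 ON a1.id = a2.id, b")
rows = cur.fetchall()
print(len(cur.description))  # 7 columns: 2 (a1) + 2 (a2) + 3 (b)
print(len(rows))             # 1 record: one join match x one row of b
```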

Derived tables are even more powerful than joined tables. We’ll get to that.

### What do we learn from this?

Always, always think in terms of table references. Not only is this important to understand how data is “pipelined” through your SQL clauses (see previous section), it will also help you understand how complex table references are constructed.

And, importantly, understand that JOIN is a keyword for constructing joined tables, not a part of the SELECT statement. Some databases allow for using JOIN in INSERT, UPDATE, and DELETE statements as well.

## 5. SQL JOIN tables should be used rather than comma-separated tables

Before, we’ve seen this clause:

```
FROM a, b
```

Advanced SQL developers will probably tell you that it is discouraged to use the comma-separated list at all, and always fully express your JOINs. This will help you improve readability of your SQL statement, and thus prevent mistakes.

One very common mistake is to forget a JOIN predicate somewhere. Think about the following:

```
FROM a, b, c, d, e, f, g, h
WHERE a.a1 = b.bx
AND a.a2 = c.c1
AND d.d1 = b.bc
-- etc...
```

The join table syntax is both

• Safer, as you can place join predicates close to the joined tables, thus preventing mistakes.
• More expressive, as you can distinguish between OUTER JOIN, INNER JOIN, etc.
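The danger is easy to demonstrate; a sketch using SQLite via Python's `sqlite3` (the three ten-row tables are made up). With comma-separated tables, a forgotten predicate silently degenerates into a cross join; with explicit JOIN syntax, each ON clause sits next to the table it joins, so the omission is much easier to spot:

```python
import sqlite3

con = sqlite3.connect(":memory:")
for t in ("a", "b", "c"):
    con.execute(f"CREATE TABLE {t} (id INTEGER)")
    con.executemany(f"INSERT INTO {t} VALUES (?)",
                    [(i,) for i in range(10)])  # 10 rows each

# The predicate joining c was forgotten. Instead of failing, the query
# silently pairs each of the 10 a/b matches with all 10 rows of c.
rows = con.execute("SELECT * FROM a, b, c WHERE a.id = b.id").fetchall()
print(len(rows))  # 100
```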

### What do we learn from this?

Always use JOIN. Never use comma-separated table references in your FROM clauses.

## 6. SQL’s different JOIN operations

JOIN operations essentially come with five flavours:

• EQUI JOIN
• SEMI JOIN
• ANTI JOIN
• CROSS JOIN
• DIVISION

These terms are commonly used in relational algebra. SQL uses different terms for the above concepts, if they exist at all. Let’s have a closer look:

### EQUI JOIN

This is the most common JOIN operation. It has two sub-flavours:

• INNER JOIN (or just JOIN)
• OUTER JOIN (further sub-flavoured as LEFT, RIGHT, FULL OUTER JOIN)

The difference is best explained by example:

```
-- This table reference contains authors and their books.
-- There is one record for each book and its author.
-- Authors without books are NOT included.
author JOIN book ON author.id = book.author_id

-- This table reference contains authors and their books.
-- There is one record for each book and its author,
-- OR there is an "empty" record for authors without books
-- ("empty" meaning that all book columns are NULL).
author LEFT OUTER JOIN book ON author.id = book.author_id
```
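The difference shows up immediately when run; a sketch using SQLite via Python's `sqlite3` (the author and book rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE author (id INTEGER, name TEXT)")
con.execute("CREATE TABLE book (id INTEGER, author_id INTEGER, title TEXT)")
con.executemany("INSERT INTO author VALUES (?, ?)",
                [(1, "Orwell"), (2, "Kafka"), (3, "Unpublished")])
con.executemany("INSERT INTO book VALUES (?, ?, ?)",
                [(1, 1, "1984"), (2, 1, "Animal Farm"), (3, 2, "The Trial")])

# INNER JOIN: the author without books does not appear
inner = con.execute("""
    SELECT author.name, book.title
    FROM author JOIN book ON author.id = book.author_id""").fetchall()
print(len(inner))  # 3

# LEFT OUTER JOIN: "Unpublished" appears with NULL book columns
outer = con.execute("""
    SELECT author.name, book.title
    FROM author LEFT OUTER JOIN book ON author.id = book.author_id
    ORDER BY author.id""").fetchall()
print(len(outer))  # 4
print(outer[-1])   # ('Unpublished', None)
```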

### SEMI JOIN

This relational concept can be expressed in two ways in SQL: using an IN predicate, or using an EXISTS predicate. “Semi” means “half” in Latin. This type of join is used to join only “half” of a table reference. What does that mean? Consider again the above joining of author and book. Let’s imagine that we don’t want author/book combinations, but just those authors who actually also have books. Then we can write:

```
-- Using IN
FROM author
WHERE author.id IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)
```
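Both forms produce each matching author exactly once, with no duplicates; a sketch using SQLite via Python's `sqlite3` (the sample rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE author (id INTEGER, name TEXT)")
con.execute("CREATE TABLE book (author_id INTEGER, title TEXT)")
con.executemany("INSERT INTO author VALUES (?, ?)",
                [(1, "Orwell"), (2, "Kafka"), (3, "Unpublished")])
con.executemany("INSERT INTO book VALUES (?, ?)",
                [(1, "1984"), (1, "Animal Farm"), (2, "The Trial")])

semi_in = con.execute("""
    SELECT name FROM author
    WHERE author.id IN (SELECT book.author_id FROM book)
    ORDER BY id""").fetchall()

semi_exists = con.execute("""
    SELECT name FROM author
    WHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)
    ORDER BY id""").fetchall()

# Orwell appears once, even though he has two books.
print(semi_in)                 # [('Orwell',), ('Kafka',)]
print(semi_in == semi_exists)  # True
```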

While there is no general rule as to whether you should prefer IN or EXISTS, these things can be said:

• IN predicates tend to be more readable than EXISTS predicates
• EXISTS predicates tend to be more expressive than IN predicates (i.e. it is easier to express a very complex SEMI JOIN)
• There is no formal difference in performance. There may, however, be a huge performance difference on some databases.

Because INNER JOIN also produces only those authors that actually have books, many beginners may think that they can then remove duplicates using DISTINCT. They think they can express a SEMI JOIN like this:

```
-- Find only those authors who also have books
SELECT DISTINCT first_name, last_name
FROM author
JOIN book ON author.id = book.author_id
```

This is very bad practice for two reasons:

• It is very slow, as the database has to load a lot of data into memory, just to remove duplicates again.
• It is not entirely correct, even if it produces the correct result in this simple example. But as soon as you JOIN more table references, you will have a very hard time correctly removing duplicates from your results.

### ANTI JOIN

This relational concept is just the opposite of a SEMI JOIN. You can produce it simply by adding a NOT keyword to the IN or EXISTS predicates. An example, where we’ll select those authors who do not have any books:

```
-- Using IN
FROM author
WHERE author.id NOT IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE NOT EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)
```

The same rules with respect to performance, readability, expressivity apply. However, there is a small caveat with respect to NULLs when using NOT IN, which is a bit out of scope for this tutorial.
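The NULL caveat is worth seeing once, because it bites silently; a sketch using SQLite via Python's `sqlite3`, where a single NULL `author_id` in the subquery makes NOT IN return no rows at all (the sample rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE author (id INTEGER, name TEXT)")
con.execute("CREATE TABLE book (author_id INTEGER, title TEXT)")
con.executemany("INSERT INTO author VALUES (?, ?)",
                [(1, "Orwell"), (2, "Kafka"), (3, "Unpublished")])
# One book row has a NULL author_id
con.executemany("INSERT INTO book VALUES (?, ?)",
                [(1, "1984"), (2, "The Trial"), (None, "Anonymous")])

anti_in = con.execute("""
    SELECT name FROM author
    WHERE author.id NOT IN (SELECT book.author_id FROM book)""").fetchall()

anti_exists = con.execute("""
    SELECT name FROM author
    WHERE NOT EXISTS (SELECT 1 FROM book
                      WHERE book.author_id = author.id)""").fetchall()

print(anti_in)      # [] -- the NULL makes every NOT IN comparison UNKNOWN
print(anti_exists)  # [('Unpublished',)] -- NOT EXISTS is unaffected
```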

### CROSS JOIN

This produces a cross product of the two joined table references, combining every record of the first table reference with every record of the second table reference. We have seen before, that this can be achieved with comma-separated table references in the FROM clause. In the rare cases where this is really desired, you can also write a CROSS JOIN explicitly, in most SQL dialects:

``````-- Combine every author with every book
author CROSS JOIN book
``````
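
A minimal sketch with Python's sqlite3 module (made-up tables) confirms the cardinality: the cross product of 3 authors and 2 books has 3 × 2 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    INSERT INTO author VALUES (1), (2), (3);
    INSERT INTO book VALUES (1), (2);
""")

# 3 authors x 2 books = 6 combinations
rows = conn.execute("SELECT * FROM author CROSS JOIN book").fetchall()
print(len(rows))
```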

### DIVISION

The relational division is really a beast of its own breed. In short, if JOIN is multiplication, division is the inverse of JOIN. Relational divisions are very tough to express in SQL. As this is a beginners’ tutorial, explaining it is out of scope. For the brave among you, read on about it here, here, and here.

### What do we learn from this?

A lot. Again, let’s hammer this into our heads. SQL is about table references. Joined tables are quite sophisticated table references. But there is a difference between relational-speak and SQL-speak. Not all relational join operations are also formal SQL join operations. With a bit of practice and knowledge about relational theory, you will always be able to choose the right type of relational JOIN and be able to translate it to the correct SQL.

## 7. SQL’s derived tables are like table variables

Before, we’ve learned that SQL is a declarative language, and as such, variables do not have a place (they do in some SQL dialects, though). But you can write something like variables. And those beasts are called derived tables.

A derived table is nothing but a subquery wrapped in parentheses.

``````-- A derived table
FROM (SELECT * FROM author)
``````

Note that some SQL dialects require derived tables to have a correlation name (also known as alias).

``````-- A derived table with an alias
FROM (SELECT * FROM author) a
``````

Derived tables are awesome when you want to circumvent the problems caused by the logical ordering of SQL clauses. For instance, if you want to reuse a column expression in both the SELECT and the WHERE clause, just write (Oracle dialect):

``````-- Get authors' first and last names, and their age in days
SELECT first_name, last_name, age
FROM (
SELECT first_name, last_name, current_date - date_of_birth age
FROM author
)
-- If the age is greater than 10000 days
WHERE age > 10000
``````
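
The same idea, sketched with Python's sqlite3 module (data made up; SQLite has no direct date subtraction, so `julianday()` stands in for Oracle's `current_date - date_of_birth`): the derived table makes the computed `age` column available to the outer WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (first_name TEXT, last_name TEXT, date_of_birth TEXT);
    INSERT INTO author VALUES
        ('George', 'Orwell', '1903-06-25'),
        ('Young', 'Author', '2020-01-01');
""")

# The derived table makes the "age" column expression reusable in WHERE
rows = conn.execute("""
    SELECT first_name, last_name, age
    FROM (
        SELECT first_name, last_name,
               julianday('now') - julianday(date_of_birth) AS age
        FROM author
    )
    WHERE age > 10000
""").fetchall()
print(rows)
```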

Note that some databases, and the SQL:1999 standard, have taken derived tables to the next level, introducing common table expressions. This will allow you to reuse the same derived table several times within a single SQL SELECT statement. The above query would then translate to the (almost) equivalent:

``````WITH a AS (
SELECT first_name, last_name, current_date - date_of_birth age
FROM author
)
SELECT *
FROM a
WHERE age > 10000
``````

Obviously, you could also externalise “a” into a standalone view for even broader reuse of common SQL subselects. Read more about views here.

### What do we learn from this?

Again, again, again. SQL is mostly about table references, not columns. Make use of them. Don’t be afraid of writing derived tables or other complex table references.

## 8. SQL GROUP BY transforms previous table references

Let’s reconsider our previous FROM clause:

``````FROM a, b
``````

And now, let’s apply a GROUP BY clause to the above combined table reference:

``````GROUP BY A.x, A.y, B.z
``````

The above produces a new table reference with only three remaining columns (!). Let’s digest this again. If you apply GROUP BY, then you reduce the number of available columns in all subsequent logical clauses – including SELECT. This is the syntactical reason why you can only reference columns from the GROUP BY clause in the SELECT clause.

• Note that other columns may still be available as arguments of aggregate functions:

``````SELECT A.x, A.y, SUM(A.z)
FROM A
GROUP BY A.x, A.y
``````
• Note that MySQL, unfortunately, doesn’t adhere to this standard, causing nothing but confusion. Don’t fall for MySQL’s tricks. GROUP BY transforms table references. You can thus only reference columns also referenced in the GROUP BY clause.
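
A minimal sketch of the rule, using Python's sqlite3 module (table and column names made up): after GROUP BY x, y, the projection may only use x, y, and aggregates of the remaining columns. (Note that SQLite, like MySQL, is lax about actually enforcing this.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x TEXT, y TEXT, z INTEGER);
    INSERT INTO a VALUES ('a', 'b', 1), ('a', 'b', 2), ('c', 'd', 3);
""")

# After GROUP BY x, y, the projection may only use x, y, and aggregates
rows = conn.execute("""
    SELECT x, y, SUM(z)
    FROM a
    GROUP BY x, y
    ORDER BY x
""").fetchall()
print(rows)
```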

### What do we learn from this?

GROUP BY, again, operates on table references, transforming them into a new form.

## 9. SQL SELECT is called projection in relational algebra

I personally like the term “projection”, as it is used in relational algebra. Once you’ve generated your table reference, filtered it, transformed it, you can step to projecting it to another form. The SELECT clause is like a projector: a table function making use of a row value expression to transform each record from the previously constructed table reference into the final outcome.

Within the SELECT clause, you can finally operate on columns, creating complex column expressions as parts of the record / row.

There are a lot of special rules with respect to the nature of available expressions, functions, etc. Most importantly, you should remember these:

1. You can only use column references that can be produced from the “output” table reference
2. If you have a GROUP BY clause, you may only reference columns from that clause, or aggregate functions.
3. You can use window functions instead of aggregate functions, when you don’t have a GROUP BY clause.
4. If you don’t have a GROUP BY clause, you must not combine aggregate functions with non-aggregate functions.
5. There are some rules with respect to wrapping regular functions in aggregate functions and vice-versa.
6. There are …

Well, there are lots of complex rules. They could fill yet another tutorial. For instance, the reason why you cannot combine aggregate functions with non-aggregate functions in the projection of a SELECT statement without GROUP BY clause (rule number 4) is this:

1. It doesn’t make sense. Intuitively.
2. If intuition doesn’t help (it hardly does, with a SQL beginner), then syntax rules do. SQL:1999 introduced GROUPING SETS, and SQL:2003 introduced empty grouping sets: GROUP BY (). Whenever an aggregate function is present, and there is no explicit GROUP BY clause, an implicit, empty GROUPING SET is applied (rule number 2). Hence, the original rules about logical ordering aren’t exactly true anymore, and the projection (SELECT) influences the outcome of a logically preceding, yet lexically succeeding clause (GROUP BY).
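
The implicit empty grouping set can be observed directly. In this sketch with Python's sqlite3 module, an aggregate query without GROUP BY produces exactly one row, even over an empty table, while a plain SELECT over the same table produces zero rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER)")

# An aggregate function without GROUP BY implies GROUP BY ():
# exactly one group, even over an empty table
aggregated = conn.execute("SELECT COUNT(*) FROM t").fetchall()

# A plain SELECT over the same empty table produces zero rows
plain = conn.execute("SELECT i FROM t").fetchall()

print(aggregated, plain)
```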

Confused? Yes. Me too. Let’s get back to simpler things.

### What do we learn from this?

The SELECT clause may be one of the most complex clauses in SQL, even if it appears so simple. All other clauses just “pipe” table references from one to another. The SELECT clause messes up the beauty of these table references, by completely transforming them, applying some rules to them retroactively.

In order to understand SQL, it is important to understand everything else first, before trying to tackle SELECT. Even if SELECT is the first clause in lexical ordering, it should be the last.

## 10. SQL DISTINCT, UNION, ORDER BY, and OFFSET are simple again

After the complicated SELECT, we can get back to simple things again:

• Set operations (DISTINCT and UNION)
• Ordering operations (ORDER BY, OFFSET .. FETCH)

### Set operations

Set operations operate on “sets”, which are actually nothing other than… tables. Well, almost. Conceptually, they’re easy to understand.

• DISTINCT removes duplicates after the projection.
• UNION concatenates two subselects and removes duplicates.
• UNION ALL concatenates two subselects, retaining duplicates.
• EXCEPT removes records from the first subselect that are also contained in the second subselect (and then removes duplicates).
• INTERSECT retains only records contained in both subselects (and then removes duplicates).

All of this duplicate removal is usually nonsense. Most often, you should just use UNION ALL, when you want to concatenate subselects.
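
A quick demonstration of UNION vs. UNION ALL, sketched with Python's sqlite3 module: UNION removes the duplicate 1, UNION ALL keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION removes the duplicate 1, UNION ALL keeps it
# (sorted() makes the comparison independent of result ordering)
union = sorted(conn.execute(
    "SELECT 1 UNION SELECT 1 UNION SELECT 2").fetchall())
union_all = sorted(conn.execute(
    "SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 2").fetchall())

print(union, union_all)
```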

### Ordering operations

Ordering is not a relational feature. It is a SQL-only feature. It is applied at the very end of both lexical ordering and logical ordering of your SQL statement. Using ORDER BY and OFFSET .. FETCH is the only way to guarantee that records can be accessed by index in a reliable way. All other ordering is always arbitrary and random, even if it may appear to be reproducible.

OFFSET .. FETCH is only one syntax variant. Other variants include MySQL’s and PostgreSQL’s LIMIT .. OFFSET, or SQL Server’s and Sybase’s TOP .. START AT. A good overview of various ways to implement OFFSET .. FETCH can be seen here.
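
A small paging sketch with Python's sqlite3 module (using the LIMIT .. OFFSET variant, data made up): only the combination of ORDER BY and LIMIT .. OFFSET yields a deterministic "page" of results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (i INTEGER);
    INSERT INTO t VALUES (3), (1), (4), (1), (5), (9);
""")

# ORDER BY plus LIMIT .. OFFSET is the only reliable way to fetch "page 2"
page2 = conn.execute("""
    SELECT i FROM t
    ORDER BY i
    LIMIT 2 OFFSET 2
""").fetchall()
print(page2)
```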

# Let’s get to work

As with every language, SQL takes a lot of practice to master. The above 10 simple steps will help you make more sense of the everyday SQL that you’re writing. On the other hand, it is also good to learn from common mistakes. The following two articles discuss common mistakes that Java (and other) developers make:

## Beware of Functional Programming in Java!

This isn’t going to be a rant about functional programming, which is awesome. This is a warning about some practices that you are very likely going to apply to your code, which are terribly wrong! Higher-order functions are essential to functional programming, and thus, talking about them will help you be the center of attention at parties. If you’re writing JavaScript, you’re using them all the time. For instance:
```
setTimeout(function() {
alert('10 seconds passed');
}, 10000);

```
The above `setTimeout()` function is a higher-order function. It is a function that takes an anonymous function as an argument. After 10 seconds, it will call the function passed as an argument. We can write another easy higher-order function that produces such a function as a result:
```
var message = function(text) {
return function() {
alert(text);
}
};

setTimeout(message('10 Seconds passed'), 10000);

```
If you execute the above, `message()` will be executed, returning an anonymous function, which alerts the argument text you have passed to `message()`. In functional programming, the above is common practice. A function returned from a higher-order function will capture the outer scope and is able to act on this scope when called.

## Why is this practice treacherous in Java?

For the same reasons. A “function” (lambda) returned from a higher-order “function” (method) will capture the outer scope and is able to act on this scope when called. The most trivial example is given here:
```
class Test {
public static void main(String[] args) {
Runnable runnable = runnable();
runnable.run(); // Breakpoint here
}

static Runnable runnable() {
return () -> {
System.out.println("Hello");
};
}
}

```
In the above logic, if you put a breakpoint right where the `runnable.run()` call is made, you can see the harmless lambda instance on the stack: a simple generated class, backing the functional interface implementation. Now let’s translate this example to your average Enterprise™ application (notice the annotations), which we’ve greatly simplified to fit this blog post:
```
class Test {
public static void main(String[] args) {
Runnable runnable = new EnterpriseBean()
.runnable();
runnable.run(); // Breakpoint here
}
}

@ImportantDeclaration
@NoMoreXML({
@CoolNewValidationStuff("Annotations"),
@CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
Object[] enterpriseStateObject =
new Object[100_000_000];

Runnable runnable() {
return () -> {
System.out.println("Hello");
};
}
}

```
The breakpoint is still at the same spot. What do we see on the stack? Still a harmless little lambda instance. Fine. Of course. Let’s add some additional logging, just for debugging:
```
class Test {
public static void main(String[] args) {
Runnable runnable = new EnterpriseBean()
.runnable();
runnable.run(); // Breakpoint here
}
}

@ImportantDeclaration
@NoMoreXML({
@CoolNewValidationStuff("Annotations"),
@CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
Object[] enterpriseStateObject =
new Object[100_000_000];

Runnable runnable() {
return () -> {
// Some harmless debugging here
System.out.println("Hello from: " + this);
};
}
}

```
Ooops! Suddenly, the “harmless” little `this` reference forced the Java compiler to enclose the enclosing instance of the `EnterpriseBean™` in the returned `Runnable` class. And with it, that heavy `enterpriseStateObject` came along, which can now no longer be garbage collected until the call site releases the harmless little `Runnable`.
Sure. Just be careful, know what you’re doing, and don’t reference `this` from a lambda…
… you say? How about a more subtle version?
```
class EnterpriseBean {
Object[] enterpriseStateObject =
new Object[100_000_000];

Runnable runnable() {
return () -> log(); // implicit this.log()
}

void log() {
// Some harmless debugging here
System.out.println("Hello");
}
}

```
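
For the sake of illustration, the same capture mechanic can be reproduced outside Java. The following Python sketch (all names made up; it relies on CPython's reference counting) shows how a returned closure keeps the heavy enclosing object alive until the closure itself is dropped:

```python
import weakref

class EnterpriseBean:
    def __init__(self):
        # Stand-in for the heavy enterpriseStateObject
        self.state = [0] * 100_000

    def log(self):
        return "Hello"

    def runnable(self):
        # Like the Java lambda calling log(), this closure captures `self`
        return lambda: self.log()

bean = EnterpriseBean()
probe = weakref.ref(bean)
r = bean.runnable()

del bean
leaked = probe() is not None    # the closure still keeps the bean alive

del r
released = probe() is None      # dropping the closure finally frees it

print(leaked, released)
```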

## OK, this is nothing new now, is it?

Indeed, it isn’t. Java 8 doesn’t have first-class functions, and that’s OK. The idea of backing lambda expressions by nominal SAM types is quite cunning, as it allowed upgrading and lambda-y-fying all existing libraries in the Java ecosystem without changing them. Also, with an anonymous class, this whole story would not have been surprising. The following coding style has leaked internal state via anonymous classes since good old Swing 1.0 style `ActionListener` et al.:
```
class Test {
public static void main(String[] args) {
Runnable runnable = new EnterpriseBean()
.runnable();
runnable.run();
}
}

@ImportantDeclaration
@NoMoreXML({
@CoolNewValidationStuff("Annotations"),
@CoolNewValidationStuff("Rock")
})
class EnterpriseBean {
Object[] enterpriseStateObject =
new Object[100_000_000];

Runnable runnable() {
return new Runnable() {
@Override
public void run() {
System.out.println("Hello from " + EnterpriseBean.this);
}
};
}
}

```
What’s new? The lambda style will encourage using higher-order functions in Java, all over the place. Which is generally good. But only when the higher-order function is a static method, whose resulting types will not enclose any state. With the above examples, however, we can see that we’re going to be debugging through a couple of memory leaks and problems in the near future, when we start embracing Java 8’s functional style programming. So, be careful, and follow this rule:
(“Pure”) Higher order functions MUST be static methods in Java!

Enclosing instances have caused issues before. Read about how the dreaded double curly braces anti-pattern has caused pain and suffering among Java developers for the last two decades.

## The Danger of Subtype Polymorphism Applied to Tuples

Java 8 has lambdas and streams, but no tuples, which is a shame. This is why we have implemented tuples in jOOλ – Java 8’s missing parts. Tuples are really boring value type containers. Essentially, they’re just an enumeration of types like these:
```
public class Tuple2<T1, T2> {
public final T1 v1;
public final T2 v2;

public Tuple2(T1 v1, T2 v2) {
this.v1 = v1;
this.v2 = v2;
}

// [...]
}

public class Tuple3<T1, T2, T3> {
public final T1 v1;
public final T2 v2;
public final T3 v3;

public Tuple3(T1 v1, T2 v2, T3 v3) {
this.v1 = v1;
this.v2 = v2;
this.v3 = v3;
}

// [...]
}

```
Writing tuple classes is a very boring task, and it’s best done using a source code generator.

### Tuples in other languages and APIs

jOOλ‘s current version features tuples of degrees 0 – 16. C# and other .NET languages have tuple types of degrees 1 – 8. There’s a special library just for tuples called Javatuples, with tuples of degrees 1 to 10, and the authors went the extra mile and gave the tuples individual English names:
```
Unit<A> // (1 element)
Pair<A,B> // (2 elements)
Triplet<A,B,C> // (3 elements)
Quartet<A,B,C,D> // (4 elements)
Quintet<A,B,C,D,E> // (5 elements)
Sextet<A,B,C,D,E,F> // (6 elements)
Septet<A,B,C,D,E,F,G> // (7 elements)
Octet<A,B,C,D,E,F,G,H> // (8 elements)
Ennead<A,B,C,D,E,F,G,H,I> // (9 elements)
Decade<A,B,C,D,E,F,G,H,I,J> // (10 elements)

```
Why? Because Ennead really rings that sweet bell when I see it.
Last, but not least, jOOQ also has a built-in tuple-like type, the `org.jooq.Record`, which serves as a base type for nice subtypes like `Record7<T1, T2, T3, T4, T5, T6, T7>`. jOOQ follows Scala and defines records up to a degree of 22.

## Watch out when defining tuple type hierarchies

As we have seen in the previous example, `Tuple3` has much code in common with `Tuple2`. As we’re all massively brain-damaged by decades of object orientation and polymorphic design anti-patterns, we might think that it would be a good idea to let `Tuple3<T1, T2, T3>` extend `Tuple2<T1, T2>`, as `Tuple3` just adds one more attribute to the right of `Tuple2`, right? So…
```
public class Tuple3<T1, T2, T3> extends Tuple2<T1, T2> {
public final T3 v3;

public Tuple3(T1 v1, T2 v2, T3 v3) {
super(v1, v2);
this.v3 = v3;
}

// [...]
}

```
The truth is: That’s about the worst thing you could do, for several reasons. First off, yes. Both `Tuple2` and `Tuple3` are tuples, so they do have some common features. It’s not a bad idea to group those features in a common super type, such as:
```
public class Tuple2<T1, T2> implements Tuple {
// [...]
}

```
But the degree is not one of those things. Here’s why:

### Permutations

Think about all the possible tuples that you can form. If you let tuples extend each other, then a `Tuple5` would also be assignment-compatible with a `Tuple2`, for instance. The following would compile perfectly:
```
Tuple2<String, Integer> t2 = tuple("A", 1, 2, 3, "B");

```
When letting `Tuple3` extend `Tuple2`, it may have seemed like a good default choice to just drop the right-most attribute from the tuple in the extension chain. But in the above example, why wouldn’t I want to re-assign `(v2, v4)`, such that the result is `(1, 3)`, or maybe `(v1, v3)`, such that the result is `("A", 2)`? There is a tremendous number of permutations of possible attributes that could be of interest when “reducing” a higher-degree tuple to a lower-degree one. No way a default of dropping the right-most attribute will be sufficiently general for all use-cases.
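
Python's plain tuples make the permutation point concrete. Here is a small sketch with a made-up 5-tuple, showing three equally plausible "reductions" to degree 2, only one of which a subtyping chain could ever give you:

```python
# A 5-tuple, as in the Java example above
t5 = ("A", 1, 2, 3, "B")

# Three equally plausible ways to "reduce" it to degree 2
drop_right = t5[:2]         # what extension-based dropping would force on you
v2_v4 = (t5[1], t5[3])
v1_v3 = (t5[0], t5[2])

print(drop_right, v2_v4, v1_v3)
```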

### Type systems

Much worse than the above, there would be drastic implications for the type system, if `Tuple3` extended `Tuple2`. Check out the jOOQ API, for instance. In jOOQ, you can safely assume the following:
```
// Compiles:
TABLE1.COL1.in(select(TABLE2.COL1).from(TABLE2))

// Must not compile:
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

```
The first `IN` predicate is correct. The left hand side of the predicate has a single column (as opposed to being a row value expression). This means that the right hand side of the predicate must also operate on single-column expressions, e.g. a `SELECT` subquery that selects a single column (of the same type). The second example selects too many columns, and the jOOQ API will tell the Java compiler that this is wrong. This is guaranteed by jOOQ via the `Field.in(Select)` method, whose signature reads:
```
public interface Field<T> {
...
Condition in(Select<? extends Record1<T>> select);
...
}

```
So, you can provide a `SELECT` statement that produces any subtype of the `Record1<T>` type. Luckily, `Record2` does not extend `Record1`. If `Record2` now extended `Record1`, which might have seemed like a good idea at first, the second query would suddenly compile:
```
// This would now compile
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

```
… even if it forms an invalid SQL statement. It would compile because it would generate a `Select<Record2<Type1, Type2>>` type, which would be a subtype of the expected `Select<Record1<Type1>>` from the `Field.in(Select)` method.

## Conclusion

`Tuple2` and `Tuple5` types are fundamentally incompatible types. In strong type systems, you mustn’t be lured into thinking that similar types, or related types, should also be compatible types. Type hierarchies are something very object-oriented, and by object-oriented, I mean the flawed and over-engineered notion of object orientation that we have been suffering from since the 90s. Even in “the Enterprise”, most people have learned to favour composition over inheritance. Composition in the case of tuples means that you can easily transform a `Tuple5` to a `Tuple2`. But you cannot assign it. In jOOλ, such a transformation can be done very easily as follows:
```
// Produces (1, 3)
Tuple2<String, Integer> t2_4 =
tuple("A", 1, 2, 3, "B")
.map((v1, v2, v3, v4, v5) -> tuple(v2, v4));

// Produces ("A", 2)
Tuple2<String, Integer> t1_3 =
tuple("A", 1, 2, 3, "B")
.map((v1, v2, v3, v4, v5) -> tuple(v1, v3));

```
The idea is that you operate on immutable values, and that you can easily extract parts of those values and map / recombine them to new values.