## Implementing a generic REDUCE aggregate function with SQL

So, @rotnroll666 nerd sniped me again. Apparently, the Neo4j Cypher query language supports arbitrary reductions, just like any functional collection API, oh say, the JDK Stream API:

```java
Stream.of(2, 4, 3, 1, 6, 5)
      .reduce((i, j) -> i * j)
      .ifPresent(System.out::println); // Prints 720
```

SQL doesn’t have this, yet it would be very useful to be able to occasionally do that. An arbitrary reduction can be implemented “easily” in SQL. Let’s look at the above multiplication reduction. In PostgreSQL, you’d write it like this:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  (
    with recursive
      u(i, o) as (
        select i, o
        from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
        --     ^^^^^^^^^ reduction
      )
    select i from r
    order by o desc
    limit 1
  )
from t;
```

Woah. That’s a bit of a syntactic beast. Let’s decompose it.

### The aggregate function

First off, if we were summing the values, we’d use the built-in `SUM` function, like this:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select sum(i)
from t;
```

That would produce 21. If you’re willing to lose precision, you could emulate a `PRODUCT()` aggregate function using logarithms. But what we want is a hypothetical `REDUCE()` function, which might look like this:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select reduce(
  t1.i * t2.i referencing accumulated as t1, accumulating as t2
)
from t;
```
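As an aside, the logarithm trick mentioned above could be sketched like this in PostgreSQL. Note that this only works for strictly positive numbers, and it is subject to floating point error:

```sql
-- Emulating PRODUCT() via exp(sum(ln(i))).
-- Works only if all i > 0, and loses precision for large products.
with t(i) as (values (2), (4), (3), (1), (6), (5))
select exp(sum(ln(i))) as product
from t;
-- yields approximately 720
```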

This is SQL, so the lambda expression would obviously use a ton of keywords, completely novel and unique to this particular function, and you’d need jOOQ to make it composable 😁. Essentially, we’d have some sort of reduction expression based on two pseudo tables:

• The accumulated table containing the result
• The accumulating table (or rather row)

A reduction is a generic aggregate function that operates on groups. So, we will have to re-use some SQL aggregate function mechanism to achieve the desired behaviour.

### Using ARRAY_AGG() to get the aggregation effect

First off, let’s do some aggregation. PostgreSQL’s `ARRAY_AGG()` is perfect for this job, because it

• Aggregates
• Yet kinda leaves the data untouched, unlike e.g. `SUM()`

In a way, it’s a collection like `Stream.collect()`, not a reduction.
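The analogy can be sketched with the JDK Stream API (this is just an illustration of the collect-versus-reduce distinction, not jOOQ API):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectVsReduce {
    public static void main(String[] args) {
        // ARRAY_AGG() is closer to collecting: the values are gathered
        // into a container, but otherwise left untouched.
        List<Integer> collected = Stream.of(2, 4, 3, 1, 6, 5)
            .collect(Collectors.toList());
        System.out.println(collected); // [2, 4, 3, 1, 6, 5]

        // SUM() is a reduction: the values are folded into a single result.
        int reduced = Stream.of(2, 4, 3, 1, 6, 5)
            .reduce(0, Integer::sum);
        System.out.println(reduced); // 21
    }
}
```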

If we use `ARRAY_AGG()` in a correlated subquery, we’ll still get the aggregation effect, but we can unnest the array again to a table, in order to operate on it. You can see this in the following example:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  (
    select string_agg(i::text, ', ')
    from unnest(array_agg(t.i)) as u(i)
  )
from t;
```

This yields:

`2, 4, 3, 1, 6, 5`

Not a very useful thing to do, aggregate, unnest, and aggregate again, but it shows the power of nesting an aggregate function in a correlated subquery’s `FROM` clause. If your RDBMS doesn’t have arrays, maybe you can do the same thing using `JSON_ARRAYAGG` and `JSON_TABLE`, or `XMLAGG` and `XMLTABLE`.

Disclaimer: PostgreSQL often Does The Right Thing™. I think you’d be hard pressed to juggle SQL syntax this elegantly in most other RDBMS, so this approach isn’t portable.

### Next step, generate row numbers

There are mainly two ways to generate row numbers in our example: the `ROW_NUMBER()` window function, or the `WITH ORDINALITY` clause when unnesting an array. We’ll use the latter.
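The window function variant would look something like this (just a sketch; we’ll stick with `WITH ORDINALITY` in what follows):

```sql
-- Generating row numbers with ROW_NUMBER() instead of WITH ORDINALITY.
-- Without an ORDER BY in the OVER() clause, the numbering order is not
-- guaranteed, which is one reason WITH ORDINALITY is a better fit here.
with t(i) as (values (2), (4), (3), (1), (6), (5))
select i, row_number() over () as o
from t;
```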

Adapting our previous example for some visualisation:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  (
    select string_agg(row(i, o)::text, ', ')
    from unnest(array_agg(t.i)) with ordinality as u(i, o)
  )
from t;
```
```

(Awesome, that row constructor!)

This produces:

`(2,1), (4,2), (3,3), (1,4), (6,5), (5,6)`

Doesn’t look fancy, but imagine we group by even numbers:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  i % 2,
  (
    select string_agg(row(i, o)::text, ', ')
    from unnest(array_agg(t.i)) with ordinality as u(i, o)
  )
from t
group by i % 2;
```

The result is now:

• `i % 2 = 0`: `(2,1), (4,2), (6,3)`
• `i % 2 = 1`: `(3,1), (1,2), (5,3)`

It’s a bit weird, right? We `GROUP BY` in the outer query, and the entire correlated subquery is the aggregate function based on the fact that its `FROM` clause contains `ARRAY_AGG()`. This isn’t so much different from this query:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select 1 + sum(i) + 2
from t;
```

We’re used to building scalar expressions from aggregate functions all the time. This is nothing fancy. We can just as easily wrap the function in another subquery:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select (select 1 + sum(i) + 2)
from t;
```

From here, it’s not far-fetched to extend the aggregate-function-in-scalar-subquery approach to the `FROM` clause, and then to unnest the aggregation again. This may not “click” immediately, because the `GROUP BY` clause in SQL is a bit weird, syntactically.

Remark: Regrettably, PostgreSQL doesn’t allow using aggregate functions in the `FROM` clause on the same query level the way it does in a correlated subquery. I was going to show a fancy `LATERAL` version, but this doesn’t work (yet).

### Now, recurse

The final bit is the recursion with the `r` table:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  (
    with recursive
      u(i, o) as (
        select i, o
        from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
        --     ^^^^^^^^^ reduction
      )
    select i from r
    order by o desc
    limit 1
  )
from t;
```

We simply recurse on the ordinality. The first subquery of the `UNION ALL` produces the first row of our data, namely `(2, 1)`. Each subsequent iteration multiplies the previous result `r.i` by the value `u.i` of the next row by ordinality. Step by step, the rows of `r` are:

• `(2, 1)`: the initial row
• `(8, 2)`: `2 * 4`
• `(24, 3)`: `8 * 3`
• `(24, 4)`: `24 * 1`
• `(144, 5)`: `24 * 6`
• `(720, 6)`: `144 * 5`

Finally, we don’t care about SQL’s set-based way of working, i.e. we don’t care about the whole set of multiplications shown above. We only care about the last row, ordered by the ordinality descending, which contains our result in `r.i`.

Done!

### Using group by

Just as shown before, we can easily add a `GROUP BY` clause to the outer query. E.g. let’s multiply odd and even numbers separately:

```sql
with t(i) as (values (2), (4), (3), (1), (6), (5))
select
  i % 2,
  (
    with recursive
      u(i, o) as (
        select i, o
        from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
      )
    select i from r
    order by o desc
    limit 1
  ),
  string_agg(i::text, ' * ')
from t
group by i % 2;
```

I’ve added another aggregate function, `STRING_AGG()`, for good measure, to get:

• `i % 2 = 0`: `48` from `2 * 4 * 6`
• `i % 2 = 1`: `15` from `3 * 1 * 5`

Wonderful, isn’t it? Now, I wasn’t able to just add an `OVER()` clause right there. That produced:

`SQL Error [42P20]: ERROR: window functions are not allowed in functions in FROM`

Maybe that will work as well, in the near future? Or, I might come up with another hack to make it work, in case of which I’ll update this post.

### jOOQ support

Obviously, this will be supported in jOOQ soon: https://github.com/jOOQ/jOOQ/issues/11385. The syntax will be again much more bearable:

```java
ctx.select(T.I.mod(inline(2)), reduce(T.I, (i1, i2) -> i1.times(i2)))
   .from(T)
   .groupBy(T.I.mod(inline(2)))
   .fetch();
```

Other emulations using actual `CREATE AGGREGATE FUNCTION` will be investigated as well, in the near future.
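For instance, in PostgreSQL, a `PRODUCT()` aggregate can be defined directly. The following is just a sketch using PostgreSQL’s built-in `numeric_mul` function as the state transition function, not necessarily the emulation jOOQ will use:

```sql
-- A sketch of a user-defined PRODUCT() aggregate in PostgreSQL
create aggregate product(numeric) (
  sfunc = numeric_mul,  -- state transition: multiply state by next value
  stype = numeric       -- state type
);

-- Hypothetical usage:
-- select product(i::numeric) from t;
```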

## Map Reducing a Set of Values Into a Dynamic SQL UNION Query

Sounds fancy, right? But it’s a really nice and reasonable approach to doing dynamic SQL with jOOQ. This blog post is inspired by a Stack Overflow question, where a user wanted to turn a set of values into a dynamic UNION query like this:
```sql
SELECT T.COL1
FROM T
WHERE T.COL2 = 'V1'
UNION
SELECT T.COL1
FROM T
WHERE T.COL2 = 'V2'
...
UNION
SELECT T.COL1
FROM T
WHERE T.COL2 = 'VN'

```
Note that both the Stack Overflow user and I are well aware of the possibility of using `IN` predicates :-). Let’s just assume, for the sake of argument, that the `UNION` query indeed outperforms the `IN` predicate in the user’s particular MySQL version and database. If that cannot be accepted, just imagine a more complex use case. The solution in Java is really very simple:
```java
import static org.jooq.impl.DSL.*;

import java.util.*;

import org.jooq.*;

public class Unions {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("V1", "V2", "V3", "V4");

        System.out.println(
            list.stream()
                .map(Unions::query)
                .reduce(Select::union));
    }

    // Dynamically construct a query from an input string
    private static Select<Record1<String>> query(String s) {
        return select(T.COL1).from(T).where(T.COL2.eq(s));
    }
}
```
The output is:
```
Optional[(
  select T.COL1
  from T
  where T.COL2 = 'V1'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V2'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V3'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V4'
)]
```
If you’re using JDK 9+ (which has `Optional.stream()`), you can further proceed to running the query fluently as follows:
```java
List<String> list = Arrays.asList("V1", "V2", "V3", "V4");

try (Stream<Record1<String>> stream = list.stream()
        .map(Unions::query)
        .reduce(Select::union)
        .stream() // Optional.stream()!
        .flatMap(Select::fetchStream)) {
    ...
}
```
This way, if the list is empty, `reduce()` will return an empty `Optional`. Streaming that empty `Optional` will result in not fetching any results from the database.
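The empty-list behaviour can be sketched in plain Java, without jOOQ (the query strings below are just stand-ins for `Unions::query`):

```java
import java.util.List;

public class EmptyUnion {
    public static void main(String[] args) {
        List<String> list = List.of(); // empty input

        // reduce() on an empty stream yields Optional.empty();
        // Optional.stream() (JDK 9+) turns that into an empty Stream,
        // so the flatMap'd fetch would never run.
        long results = list.stream()
            .map(s -> "select T.COL1 from T where T.COL2 = '" + s + "'")
            .reduce((a, b) -> a + " union " + b)
            .stream()
            .count();

        System.out.println(results); // 0
    }
}
```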

## “NoSQL” should be called “SQL with alternative storage models”

Time and again, you’ll find blog posts like this one here telling you the same “truths” about SQL vs. NoSQL:

http://onewebsql.com/blog/no-sql-do-i-really-need-it
(OneWebSQL being a competitor of jOOQ, see a previous article for a comparison)

Usually, those blog posts make the same arguments:

• Performance (“SQL” can “never” scale as much as “NoSQL”)
• ACID (you don’t always need it)
• Schemalessness (just store any data)

For some funny reason, all of these ideas have led to the misleading term “NoSQL”, which is interpreted by some as being “no SQL”, by others as being “not only SQL”. But SQL really just means “Structured Query Language”, and it is extremely powerful in terms of expressing relational context. It is well-designed for ad-hoc creation of tuples, records, tables, sets and for mapping them to other projections, reducing them to custom aggregations, etc. Note the terms “map/reduce”, which are often employed by NoSQL evangelists.

For good reasons, the Facebook Query Language (FQL), one of the leading NoSQL query languages, closely resembles SQL, although it operates on a completely different data model. Oracle, too, has jumped on the “NoSQL” train and sells its own product. It won’t be very long until the two types of data storage merge and can be queried by an ISO/IEC standardised SQL:2015 (or so). Because the true spirit of “NoSQL” does not consist in the way data is queried. It consists in the way data is stored. NoSQL is all about data storage. So, sooner or later, you will just create “traditional” tables along with “graph tables” and “hashmap tables” in the same database and join them in single SQL queries without thinking much about today’s hype.

“NoSQL” should be called “SQL with alternative storage models” and queried with pure SQL!