Occasionally, you want to write a SQL query and fetch a hierarchy of data, whose flat representation may look like this:
SELECT id, parent_id, label
FROM t_directory;
The result might be:
|id |parent_id|label | |---|---------|-------------------| |1 | |C: | |2 |1 |eclipse | |3 |2 |configuration | |4 |2 |dropins | |5 |2 |features | |7 |2 |plugins | |8 |2 |readme | |9 |8 |readme_eclipse.html| |10 |2 |src | |11 |2 |eclipse.exe |
Get the hierarchy with SQL
Now, you could run a recursive PostgreSQL query like the below monster to turn that into a JSON document:
WITH RECURSIVE
d1 (id, parent_id, name) as (
SELECT id, parent_id, label
FROM t_directory
),
d2 AS (
SELECT d1.*, 0 AS level
FROM d1
WHERE parent_id IS NULL
UNION ALL
SELECT d1.*, d2.level + 1
FROM d1
JOIN d2 ON d2.id = d1.parent_id
),
d3 AS (
SELECT d2.*, null::jsonb children
FROM d2
WHERE level = (SELECT max(level) FROM d2)
UNION (
SELECT
(branch_parent).*,
jsonb_strip_nulls(
jsonb_agg(branch_child - 'parent_id' - 'level'
ORDER BY branch_child->>'name'
) FILTER (
WHERE branch_child->>'parent_id' = (branch_parent).id::text
)
)
FROM (
SELECT
branch_parent,
to_jsonb(branch_child) AS branch_child
FROM d2 branch_parent
JOIN d3 branch_child
ON branch_child.level = branch_parent.level + 1
) branch
GROUP BY branch_parent
)
)
SELECT
jsonb_pretty(jsonb_agg(to_jsonb(d3) - 'parent_id' - 'level')) AS tree
FROM d3
WHERE level = 0;
I’ve given this query also as an answer to this Stack Overflow question. Some inspiration for the query in this blog post.
And behold, we have a JSON tree:
[
{
"id": 1,
"name": "C:",
"children": [
{
"id": 2,
"name": "eclipse",
"children": [
{
"id": 3,
"name": "configuration"
},
{
"id": 4,
"name": "dropins"
},
{
"id": 11,
"name": "eclipse.exe"
},
{
"id": 5,
"name": "features"
},
{
"id": 7,
"name": "plugins"
},
{
"id": 8,
"name": "readme",
"children": [
{
"id": 9,
"name": "readme_eclipse.html"
}
]
},
{
"id": 10,
"name": "src"
}
]
}
]
}
]
But that’s quite a beast of a SQL query, and perhaps, you don’t need to do this with SQL in the first place.
Doing this with jOOQ 3.19
In fact, starting from jOOQ 3.19 and #12341, you can do this entirely with jOOQ, using a Collector.
Assuming you have this client side representation for your data:
record File(int id, String name, List<File> children) {}
Now, you can write:
List<File> result =
ctx.select(T_DIRECTORY.ID, T_DIRECTORY.PARENT_ID, T_DIRECTORY.LABEL)
.from(T_DIRECTORY)
.orderBy(T_DIRECTORY.ID)
.collect(Records.intoHierarchy(
r -> r.value1(),
r -> r.value2(),
r -> new File(r.value1(), r.value3(), new ArrayList<>()),
(p, c) -> p.children().add(c)
));
That’s it! When you print the result, you’ll get:
[
File[id=1, name=C:, children=[
File[id=2, name=eclipse, children=[
File[id=3, name=configuration, children=[]],
File[id=4, name=dropins, children=[]],
File[id=5, name=features, children=[]],
File[id=7, name=plugins, children=[]],
File[id=8, name=readme, children=[
File[id=9, name=readme_eclipse.html, children=[]]
]],
File[id=10, name=src, children=[]],
File[id=11, name=eclipse.exe, children=[]]
]]
]]
]
Or, if you prefer JSON output, just use Jackson, or whatever, to serialise your data as follows:
new ObjectMapper()
.writerWithDefaultPrettyPrinter()
.writeValue(System.out, result);
And now, you’re getting:
[ {
"id" : 1,
"name" : "C:",
"children" : [ {
"id" : 2,
"name" : "eclipse",
"children" : [ {
"id" : 3,
"name" : "configuration"
}, {
"id" : 4,
"name" : "dropins"
}, {
"id" : 5,
"name" : "features"
}, {
"id" : 7,
"name" : "plugins"
}, {
"id" : 8,
"name" : "readme",
"children" : [ {
"id" : 9,
"name" : "readme_eclipse.html"
} ]
}, {
"id" : 10,
"name" : "src"
}, {
"id" : 11,
"name" : "eclipse.exe"
} ]
} ]
} ]
Very cool, huh?
Don’t use jOOQ? No problem, just copy this Collector:
The above isn’t really jOOQ specific magic. You can just copy the following Collector from jOOQ to achieve the same thing with your pure Java code:
public static final <K, E, R extends Record>
Collector<R, ?, List<E>> intoHierarchy(
Function<? super R, ? extends K> keyMapper,
Function<? super R, ? extends K> parentKeyMapper,
Function<? super R, ? extends E> nodeMapper,
BiConsumer<? super E, ? super E> parentChildAppender
) {
return Collectors.collectingAndThen(
Collectors.toMap(keyMapper, r -> new SimpleImmutableEntry<R, E>(
r, nodeMapper.apply(r)
)),
m -> {
List<E> r = new ArrayList<>();
m.forEach((k, v) -> {
Entry<R, E> parent = m.get(
parentKeyMapper.apply(v.getKey())
);
if (parent != null)
parentChildAppender.accept(
parent.getValue(), v.getValue()
);
else
r.add(v.getValue());
});
return r;
}
);
}
With this collector, and the following types / data:
record Flat(int id, int parentId, String name) {}
record Hierarchical(int id, String name, List<Hierarchical> children) {}
List<Flat> data = List.of(
new Flat(1, 0, "C:"),
new Flat(2, 1, "eclipse"),
new Flat(3, 2, "configuration"),
new Flat(4, 2, "dropins"),
new Flat(5, 2, "features"),
new Flat(7, 2, "plugins"),
new Flat(8, 2, "readme"),
new Flat(9, 8, "readme_eclipse.html"),
new Flat(10, 2, "src"),
new Flat(11, 2, "eclipse.exe")
);
You can now create the same hierarchy again, using the Collector directly on the list:
List<Hierarchical> result =
data.stream().collect(intoHierarchy(
e -> e.id(),
e -> e.parentId(),
e -> new Hierarchical(e.id(), e.name(), new ArrayList<>()),
(p, c) -> p.children().add(c)
));
An alternative API
A previous version of this blog post used an alternative API design for the Collector:
public static final <K, E, R> Collector<R, ?, List<E>> intoHierarchy(
Function<? super R, ? extends K> keyMapper,
Function<? super R, ? extends K> parentKeyMapper,
BiFunction<? super R, ? super List<E>, ? extends E> recordMapper
) {
record Tuple3<T1, T2, T3>(T1 t1, T2 t2, T3 t3) {}
return Collectors.collectingAndThen(
Collectors.toMap(keyMapper, r -> {
List<E> e = new ArrayList<>();
return new Tuple3<R, C, E>(r, e, recordMapper.apply(r, e));
}),
m -> {
List<E> r = new ArrayList<>();
m.forEach((k, v) -> {
K parent = parentKeyMapper.apply(v.t1());
E child = v.t3();
if (m.containsKey(parent))
m.get(parent).t2().add(child);
else
r.add(child);
});
return r;
}
);
}
This can lead to more compact usages in client code:
List<Hierarchical> result =
data.stream().collect(intoHierarchy(
e -> e.id(),
e -> e.parentId(),
(e, l) -> new Hierarchical(e.id(), e.name(), l)
));
However, it relies on type inference of the target type (see JEP 101). As soon as you don’t hint the target type anymore, inference falls appart, so this won’t compile:
List<?> result =
data.stream().collect(intoHierarchy(
e -> e.id(),
e -> e.parentId(),
(e, l) -> new Hierarchical(e.id(), e.name(), l)
));
This design would be quite impractical for users, especially when writing complex jOOQ queries, so it was rejected.
A more complex jOOQ example
In jOOQ, all results, including nested collections (e.g. those produced by MULTISET) can be collected, so if you have a nested hierarchy, such as comments on a blog post, just collect them with jOOQ.
Assuming this schema:
CREATE TABLE post (
id INT PRIMARY KEY,
title TEXT
);
CREATE TABLE comment (
id INT PRIMARY KEY,
parent_id INT REFERENCES comment,
post_id INT REFERENCES post,
text TEXT
);
INSERT INTO post
VALUES
(1, 'Helo'),
(2, 'World');
INSERT INTO comment
VALUES
(1, NULL, 1, 'You misspelled "Hello"'),
(2, 1, 1, 'Thanks, will fix soon'),
(3, 2, 1, 'Still not fixed'),
(4, NULL, 2, 'Impeccable blog post, thanks');
You could write a query like this:
record Post(int id, String title, List<Comment> comments) {}
record Comment(int id, String text, List<Comment> replies) {}
List<Post> result =
ctx.select(
POST.ID,
POST.TITLE,
multiset(
select(COMMENT.ID, COMMENT.PARENT_ID, COMMENT.TEXT)
.from(COMMENT)
.where(COMMENT.POST_ID.eq(POST.ID))
).convertFrom(r -> r.collect(intoHierarchy(
r -> r.value1(),
r -> r.value2(),
r -> new Comment(r.value1(), r.value3(), new ArrayList<>()),
(p, c) -> p.replies().add(c)
)))
)
.from(POST)
.orderBy(POST.ID)
.fetch(mapping(Post::new));
All of this is type-safe, as always with jOOQ!
Now, check out what this prints, when serialised with Jackson:
[ {
"id" : 1,
"title" : "Helo",
"comments" : [ {
"id" : 1,
"text" : "You misspelled \"Hello\"",
"replies" : [ {
"id" : 2,
"text" : "Thanks, will fix soon",
"replies" : [ {
"id" : 3,
"text" : "Still not fixed"
} ]
} ]
} ]
}, {
"id" : 2,
"title" : "World",
"comments" : [ {
"id" : 4,
"text" : "Impeccable blog post, thanks"
} ]
} ]
Note, if you only want to show a subtree, or a tree up until a certain depth, you can still run a hierarchical query in your
MULTISETsubquery usingWITH RECURSIVEorCONNECT BY.
Conclusion
Collector is a much underrated API in the JDK. Any JDK Collection can be turned into a Stream and its elements can be collected. In jOOQ, a ResultQuery is an Iterable, which also offers a convenient collect() method (it just executes the query, streams results, and collects records into your Collector).
Our functional library jOOλ has many additional collectors in its Agg class, e.g. for:
- Bitwise aggregation
- Statistical aggregation, like standard deviation, correlation, percentiles, etc.
Collecting things into a hierarchy isn’t really that special. It’s just another collector, which I’m sure, you’ll be using much more frequently from now on!
