How to Turn a List of Flat Elements into a Hierarchy in Java, SQL, or jOOQ

lukaseder

3 years ago

Occasionally, you want to write a SQL query and fetch a hierarchy of data, whose flat representation may look like this:

SELECT id, parent_id, label
FROM t_directory;

The result might be:

|id |parent_id|label              |
|---|---------|-------------------|
|1  |         |C:                 |
|2  |1        |eclipse            |
|3  |2        |configuration      |
|4  |2        |dropins            |
|5  |2        |features           |
|7  |2        |plugins            |
|8  |2        |readme             |
|9  |8        |readme_eclipse.html|
|10 |2        |src                |
|11 |2        |eclipse.exe        |

Get the hierarchy with SQL

Now, you could run a recursive PostgreSQL query like the below monster to turn that into a JSON document:

WITH RECURSIVE
  d1 (id, parent_id, name) as (
    SELECT id, parent_id, label
    FROM t_directory
  ),
  d2 AS (
    SELECT d1.*, 0 AS level
    FROM d1
    WHERE parent_id IS NULL
    UNION ALL
    SELECT d1.*, d2.level + 1
    FROM d1
    JOIN d2 ON d2.id = d1.parent_id
  ),
  d3 AS (
    SELECT d2.*, null::jsonb children
    FROM d2
    WHERE level = (SELECT max(level) FROM d2)
    UNION (
      SELECT 
        (branch_parent).*, 
        jsonb_strip_nulls(
          jsonb_agg(branch_child - 'parent_id' - 'level' 
            ORDER BY branch_child->>'name'
          ) FILTER (
            WHERE branch_child->>'parent_id' = (branch_parent).id::text
          )
        )
      FROM (
        SELECT
          branch_parent,
          to_jsonb(branch_child) AS branch_child
        FROM d2 branch_parent
        JOIN d3 branch_child 
          ON branch_child.level = branch_parent.level + 1
      ) branch
      GROUP BY branch_parent
    )
  )
SELECT 
  jsonb_pretty(jsonb_agg(to_jsonb(d3) - 'parent_id' - 'level')) AS tree
FROM d3
WHERE level = 0;

I’ve given this query also as an answer to this Stack Overflow question. Some inspiration for the query in this blog post.

And behold, we have a JSON tree:

[
    {
        "id": 1,
        "name": "C:",
        "children": [
            {
                "id": 2,
                "name": "eclipse",
                "children": [
                    {
                        "id": 3,
                        "name": "configuration"
                    },
                    {
                        "id": 4,
                        "name": "dropins"
                    },
                    {
                        "id": 11,
                        "name": "eclipse.exe"
                    },
                    {
                        "id": 5,
                        "name": "features"
                    },
                    {
                        "id": 7,
                        "name": "plugins"
                    },
                    {
                        "id": 8,
                        "name": "readme",
                        "children": [
                            {
                                "id": 9,
                                "name": "readme_eclipse.html"
                            }
                        ]
                    },
                    {
                        "id": 10,
                        "name": "src"
                    }
                ]
            }
        ]
    }
]

But that’s quite a beast of a SQL query, and perhaps, you don’t need to do this with SQL in the first place.

Doing this with jOOQ 3.19

In fact, starting from jOOQ 3.19 and #12341, you can do this entirely with jOOQ, using a Collector.

Assuming you have this client side representation for your data:

record File(int id, String name, List<File> children) {}

Now, you can write:

List<File> result =
ctx.select(T_DIRECTORY.ID, T_DIRECTORY.PARENT_ID, T_DIRECTORY.LABEL)
   .from(T_DIRECTORY)
   .orderBy(T_DIRECTORY.ID)
   .collect(Records.intoHierarchy(
       r -> r.value1(),
       r -> r.value2(),
       r -> new File(r.value1(), r.value3(), new ArrayList<>()),
       (p, c) -> p.children().add(c)
   ));

That’s it! When you print the result, you’ll get:

[
  File[id=1, name=C:, children=[
    File[id=2, name=eclipse, children=[
      File[id=3, name=configuration, children=[]], 
      File[id=4, name=dropins, children=[]], 
      File[id=5, name=features, children=[]], 
      File[id=7, name=plugins, children=[]], 
      File[id=8, name=readme, children=[
        File[id=9, name=readme_eclipse.html, children=[]]
      ]], 
      File[id=10, name=src, children=[]], 
      File[id=11, name=eclipse.exe, children=[]]
    ]]
  ]]
]

Or, if you prefer JSON output, just use Jackson, or whatever, to serialise your data as follows:

new ObjectMapper()
    .writerWithDefaultPrettyPrinter()
    .writeValue(System.out, result);

And now, you’re getting:

[ {
  "id" : 1,
  "name" : "C:",
  "children" : [ {
    "id" : 2,
    "name" : "eclipse",
    "children" : [ {
      "id" : 3,
      "name" : "configuration"
    }, {
      "id" : 4,
      "name" : "dropins"
    }, {
      "id" : 5,
      "name" : "features"
    }, {
      "id" : 7,
      "name" : "plugins"
    }, {
      "id" : 8,
      "name" : "readme",
      "children" : [ {
        "id" : 9,
        "name" : "readme_eclipse.html"
      } ]
    }, {
      "id" : 10,
      "name" : "src"
    }, {
      "id" : 11,
      "name" : "eclipse.exe"
    } ]
  } ]
} ]

Very cool, huh?

Don’t use jOOQ? No problem, just copy this Collector:

The above isn’t really jOOQ specific magic. You can just copy the following Collector from jOOQ to achieve the same thing with your pure Java code:

public static final <K, E, R extends Record>
Collector<R, ?, List<E>> intoHierarchy(
    Function<? super R, ? extends K> keyMapper,
    Function<? super R, ? extends K> parentKeyMapper,
    Function<? super R, ? extends E> nodeMapper,
    BiConsumer<? super E, ? super E> parentChildAppender
) {
    return Collectors.collectingAndThen(
        Collectors.toMap(keyMapper, r -> new SimpleImmutableEntry<R, E>(
            r, nodeMapper.apply(r)
        )),
        m -> {
            List<E> r = new ArrayList<>();

            m.forEach((k, v) -> {
                Entry<R, E> parent = m.get(
                    parentKeyMapper.apply(v.getKey())
                );

                if (parent != null)
                    parentChildAppender.accept(
                        parent.getValue(), v.getValue()
                    );
                else
                    r.add(v.getValue());
            });

            return r;
        }
    );
}

With this collector, and the following types / data:

record Flat(int id, int parentId, String name) {}
record Hierarchical(int id, String name, List<Hierarchical> children) {}

List<Flat> data = List.of(
    new Flat(1, 0, "C:"),
    new Flat(2, 1, "eclipse"),
    new Flat(3, 2, "configuration"),
    new Flat(4, 2, "dropins"),
    new Flat(5, 2, "features"),
    new Flat(7, 2, "plugins"),
    new Flat(8, 2, "readme"),
    new Flat(9, 8, "readme_eclipse.html"),
    new Flat(10, 2, "src"),
    new Flat(11, 2, "eclipse.exe")
);

You can now create the same hierarchy again, using the Collector directly on the list:

List<Hierarchical> result =
data.stream().collect(intoHierarchy(
    e -> e.id(),
    e -> e.parentId(),
    e -> new Hierarchical(e.id(), e.name(), new ArrayList<>()),
    (p, c) -> p.children().add(c)
));

An alternative API

A previous version of this blog post used an alternative API design for the Collector:

public static final <K, E, R> Collector<R, ?, List<E>> intoHierarchy(
    Function<? super R, ? extends K> keyMapper,
    Function<? super R, ? extends K> parentKeyMapper,
    BiFunction<? super R, ? super List<E>, ? extends E> recordMapper
) {
    record Tuple3<T1, T2, T3>(T1 t1, T2 t2, T3 t3) {}
    return Collectors.collectingAndThen(
        Collectors.toMap(keyMapper, r -> {
            List<E> e = new ArrayList<>();
            return new Tuple3<R, C, E>(r, e, recordMapper.apply(r, e));
        }),
        m -> {
            List<E> r = new ArrayList<>();

            m.forEach((k, v) -> {
                K parent = parentKeyMapper.apply(v.t1());
                E child = v.t3();

                if (m.containsKey(parent))
                    m.get(parent).t2().add(child);
                else
                    r.add(child);
            });

            return r;
        }
    );
}

This can lead to more compact usages in client code:

List<Hierarchical> result =
data.stream().collect(intoHierarchy(
    e -> e.id(),
    e -> e.parentId(),
    (e, l) -> new Hierarchical(e.id(), e.name(), l)
));

However, it relies on type inference of the target type (see JEP 101). As soon as you don’t hint the target type anymore, inference falls appart, so this won’t compile:

List<?> result =
data.stream().collect(intoHierarchy(
    e -> e.id(),
    e -> e.parentId(),
    (e, l) -> new Hierarchical(e.id(), e.name(), l)
));

This design would be quite impractical for users, especially when writing complex jOOQ queries, so it was rejected.

A more complex jOOQ example

In jOOQ, all results, including nested collections (e.g. those produced by MULTISET) can be collected, so if you have a nested hierarchy, such as comments on a blog post, just collect them with jOOQ.

Assuming this schema:

CREATE TABLE post (
  id INT PRIMARY KEY,
  title TEXT
);

CREATE TABLE comment (
  id INT PRIMARY KEY,
  parent_id INT REFERENCES comment,
  post_id INT REFERENCES post,
  text TEXT
);

INSERT INTO post 
VALUES
  (1, 'Helo'),
  (2, 'World');
  
INSERT INTO comment
VALUES 
  (1, NULL, 1, 'You misspelled "Hello"'),
  (2, 1, 1, 'Thanks, will fix soon'),
  (3, 2, 1, 'Still not fixed'),
  (4, NULL, 2, 'Impeccable blog post, thanks');

You could write a query like this:

record Post(int id, String title, List<Comment> comments) {}
record Comment(int id, String text, List<Comment> replies) {}

List<Post> result =
ctx.select(
       POST.ID, 
       POST.TITLE,
       multiset(
           select(COMMENT.ID, COMMENT.PARENT_ID, COMMENT.TEXT)
           .from(COMMENT)
           .where(COMMENT.POST_ID.eq(POST.ID))
       ).convertFrom(r -> r.collect(intoHierarchy(
           r -> r.value1(),
           r -> r.value2(),
           r -> new Comment(r.value1(), r.value3(), new ArrayList<>()),
           (p, c) -> p.replies().add(c)
       )))
   )
   .from(POST)
   .orderBy(POST.ID)
   .fetch(mapping(Post::new));

All of this is type-safe, as always with jOOQ!

Now, check out what this prints, when serialised with Jackson:

[ {
  "id" : 1,
  "title" : "Helo",
  "comments" : [ {
    "id" : 1,
    "text" : "You misspelled \"Hello\"",
    "replies" : [ {
      "id" : 2,
      "text" : "Thanks, will fix soon",
      "replies" : [ {
        "id" : 3,
        "text" : "Still not fixed"
      } ]
    } ]
  } ]
}, {
  "id" : 2,
  "title" : "World",
  "comments" : [ {
    "id" : 4,
    "text" : "Impeccable blog post, thanks"
  } ]
} ]

Note, if you only want to show a subtree, or a tree up until a certain depth, you can still run a hierarchical query in your MULTISET subquery using WITH RECURSIVE or CONNECT BY.

Conclusion

Collector is a much underrated API in the JDK. Any JDK Collection can be turned into a Stream and its elements can be collected. In jOOQ, a ResultQuery is an Iterable, which also offers a convenient collect() method (it just executes the query, streams results, and collects records into your Collector).

Our functional library jOOλ has many additional collectors in its Agg class, e.g. for:

Bitwise aggregation
Statistical aggregation, like standard deviation, correlation, percentiles, etc.

Collecting things into a hierarchy isn’t really that special. It’s just another collector, which I’m sure, you’ll be using much more frequently from now on!