Using jOOQ’s DiagnosticsConnection to detect N+1 Queries

N+1 queries are a popular problem in many applications that run SQL queries. The problem can be described easily as follows:

  • 1 query fetching a parent value is run
  • N queries fetching each individual child values are run

This problem isn’t limited to SQL, it can happen with any poorly designed API that does not allow for batch and/or bulk processing (even stored procedures). But with SQL, it’s particularly painful, because in many cases, running tons of logic in a single query would be totally possible, especially with jOOQ’s MULTISET and SQL/XML or SQL/JSON support.

In the worst case, the N+1 problem is caused by a third party ORM – or rather, its poor implementation / configuration, but some ORMs make it really easy to shoot oneself in the foot with N+1 problems…

An example

Let’s stick with JDBC for now to illustrate how N+1 queries happen.

try (Statement stmt = conn.createStatement()) {

    // The parent query, fetching actors
    try (ResultSet r1 = stmt.executeQuery(
        """
        SELECT actor_id, first_name, last_name
        FROM actor
        LIMIT 5
        """
    )) {
        while (r1.next()) {
            System.out.println();
            System.out.println(
                "Actor: " + r1.getString(2) + " " + r1.getString(2));

            // The child query, fetching films per actor
            try (PreparedStatement pstmt = conn.prepareStatement(
                """
                SELECT count(*) FROM film_actor WHERE actor_id = ?
                """
            )) {
                pstmt.setInt(1, r1.getInt(1));

                try (ResultSet r2 = pstmt.executeQuery()) {
                    while (r2.next()) {
                        System.out.println("Films: " + r2.getInt(1));
                    }
                }
            }
        }
    }
}

When run against the sakila database, the above prints:

Actor: PENELOPE PENELOPE
Films: 19

Actor: NICK NICK
Films: 25

Actor: ED ED
Films: 22

Actor: JENNIFER JENNIFER
Films: 22

Actor: JOHNNY JOHNNY
Films: 29

Obviously correct, but we could have easily run this in a single query:

SELECT
  a.first_name,
  a.last_name,
  count(fa.film_id)
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
GROUP BY a.actor_id

Given that we have 200 actors in total, which do you prefer? Running 1+200 queries or just 1 query? If you’re in control of your SQL, this mistake is much less likely to happen, but what if you’re not in (complete) control, because the SQL is generated based on eager/lazy loading configurations and complex entity graph annotations, then you’ll be happy you can just plug in jOOQ’s DiagnosticsConnection’s repeated statements diagnostic into your integration test environment (not necessarily in production, as there’s some overhead to parse and normalise all the SQL strings).

Applied to the above JDBC example:

DSLContext ctx = DSL.using(connection);
ctx.configuration().set(new DefaultDiagnosticsListener() {
    @Override
    public void repeatedStatements(DiagnosticsContext c) {

        // Custom callback, may also throw exceptions, etc.
        System.out.println(
            "Repeated statement: " + c.normalisedStatement());
    }
});

Connection conn = ctx.diagnosticsConnection();

You’re now getting the following output:

Actor: PENELOPE PENELOPE
Films: 19

Actor: NICK NICK
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 25

Actor: ED ED
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 22

Actor: JENNIFER JENNIFER
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 22

Actor: JOHNNY JOHNNY
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 29

As you can see, the diagnostics connection starts logging after the first repetition of the statement, the assumption being that within a transaction, it is generally unnecessary to ever repeat a statement more than once, because there’s almost always a better way.

Using this with JPA / Hibernate

You probably don’t write JDBC statements manually like this, but it doesn’t matter who calls JDBC (you, jOOQ, JdbcTemplate, Hibernate, etc.). If you proxy your connection (or DataSource) with jOOQ’s DiagnosticsConnection or DiagnosticsDataSource, then you can intercept such events easily, no matter the cause.

Future versions of jOOQ will add a lot more diagnostics via https://github.com/jOOQ/jOOQ/issues/7527.

To see what’s available in jOOQ already, refer to the manual.