Simulating Latency with SQL / JDBC

I’ve run across a fun little trick to simulate latency in your development environments when testing some SQL queries. Possible use-cases including to validate that backend latency won’t bring down your frontend, or that your UX is still bearable, etc.

The solution is PostgreSQL and Hibernate specific, though to doesn’t have to be. Besides, it uses a stored function to work around the limitations of a VOID function in PostgreSQL, but that can be worked around differently as well, without storing anything auxiliary to the catalog.

To remove the Hibernate dependency, you can just use the pg_sleep function directly using a NULL predicate, but don’t try it like this!

select 1
from t_book
-- Don't do this!
where pg_sleep(1) is not null;

This will sleep 1 second per row (!). As can be seen in the explain plan. Let’s limit to 3 rows to see:

explain analyze
select 1
from t_book
where pg_sleep(1) is not null
limit 3;

And the result is:

Limit  (cost=0.00..1.54 rows=3 width=4) (actual time=1002.142..3005.374 rows=3 loops=1)
   ->  Seq Scan on t_book  (cost=0.00..2.05 rows=4 width=4) (actual time=1002.140..3005.366 rows=3 loops=1)
         Filter: (pg_sleep('1'::double precision) IS NOT NULL)
 Planning Time: 2.036 ms
 Execution Time: 3005.401 ms

As you can see, the whole query took about 3 seconds for 3 rows. In fact, this is also what happens in Gunnar’s example from the tweet, except that he was filtering by ID, which “helps” hide this effect.

We can use what Oracle calls scalar subquery caching, the fact that a scalar subquery can be reasonably expected to be side-effect free (despite the obvious side-effect of pg_sleep), meaning that some RDBMS cache its result per query execution.

explain analyze
select 1
from t_book
where (select pg_sleep(1)) is not null
limit 3;

The result is now:

Limit  (cost=0.01..1.54 rows=3 width=4) (actual time=1001.177..1001.178 rows=3 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=1001.148..1001.148 rows=1 loops=1)
   ->  Result  (cost=0.00..2.04 rows=4 width=4) (actual time=1001.175..1001.176 rows=3 loops=1)
         One-Time Filter: ($0 IS NOT NULL)
         ->  Seq Scan on t_book  (cost=0.00..2.04 rows=4 width=0) (actual time=0.020..0.021 rows=3 loops=1)
 Planning Time: 0.094 ms
 Execution Time: 1001.223 ms

We’re now getting the desired one-time filter. However, I don’t really like this hack, because it depends on an optimisation, which is optional, not a formal guarantee. This may be good enough for a quick simulation of latency, but don’t depend on this kind of optimisation in production lightheartedly.

Another approach that seems to guarantee this behaviour would be to use a MATERIALIZED CTE:

explain
with s (x) as materialized (select pg_sleep(1))
select *
from t_book
where (select x from s) is not null;

I’m now again using a scalar subquery, because I somehow need to access the CTE, and I don’t want to place it in the FROM clause, where it would impact my projection.

The plan being:

Result  (cost=0.03..2.07 rows=4 width=943) (actual time=1001.289..1001.292 rows=4 loops=1)
   One-Time Filter: ($1 IS NOT NULL)
   CTE s
     ->  Result  (...) (actual time=1001.262..1001.263 rows=1 loops=1)
   InitPlan 2 (returns $1)
     ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=4) (actual time=1001.267..1001.268 rows=1 loops=1)
   ->  Seq Scan on t_book  (cost=0.03..2.07 rows=4 width=943) (actual time=0.015..0.016 rows=4 loops=1)
 Planning Time: 0.049 ms
 Execution Time: 1001.308 ms

Again, containing a one-time filter, which is what we want here.

Using a JDBC based approach

If your application is JDBC based, you don’t have to simulate the latency by tweaking the query. You can simply proxy JDBC in one way or another. Let’s look at this little program:

try (Connection c1 = db.getConnection()) {

    // A Connection proxy that intercepts preparedStatement() calls
    Connection c2 = new DefaultConnection(c1) {
        @Override
        public PreparedStatement prepareStatement(String sql) 
        throws SQLException {
            sleep(1000L);
            return super.prepareStatement(sql);
        }
    };

    long time = System.nanoTime();
    String sql = "SELECT id FROM book";

    // This call now has a 1 second "latency"
    try (PreparedStatement s = c2.prepareStatement(sql);
        ResultSet rs = s.executeQuery()) {
        while (rs.next())
            System.out.println(rs.getInt(1));
    }

    System.out.println("Time taken: " + 
       (System.nanoTime() - time) / 1_000_000L + "ms");
}

Where:

public static void sleep(long time) {
    try {
        Thread.sleep(time);
    }
    catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

For simplicity reasons, this uses jOOQ’s DefaultConnection which acts as a proxy, conveniently delegating all the methods to some delegate connection, allowing for overriding only specific methods. The output of the program is:

1
2
3
4
Time taken: 1021ms

This simulates the latency on the prepareStatement() event. Obviously, you’d be extracting the proxying into some utility in order not to clutter your code. You could even proxy all your queries in development and enable the sleep call only based on a system property.

Alternatively, we could also simulate it on the executeQuery() event:

try (Connection c = db.getConnection()) {
    long time = System.nanoTime();

    // A PreparedStatement proxy intercepting executeQuery() calls
    try (PreparedStatement s = new DefaultPreparedStatement(
        c.prepareStatement("SELECT id FROM t_book")
    ) {
        @Override
        public ResultSet executeQuery() throws SQLException {
            sleep(1000L);
            return super.executeQuery();
        };
    };

        // This call now has a 1 second "latency"
        ResultSet rs = s.executeQuery()) {
        while (rs.next())
            System.out.println(rs.getInt(1));
    }

    System.out.println("Time taken: " +
        (System.nanoTime() - time) / 1_000_000L + "ms");
}

This is now using the jOOQ convenience class DefaultPreparedStatement. If you need these, just add the jOOQ Open Source Edition dependency (there’s nothing RDBMS specific in these classes), with any JDBC based application, including Hibernate:

<dependency>
  <groupId>org.jooq</groupId>
  <artifactId>jooq</artifactId>
</dependency>

Alternatively, just copy the sources of the classes DefaultConnection or DefaultPreparedStatement if you don’t need the entire dependency, or you just proxy the JDBC API yourself.

A jOOQ based solution

If you’re already using jOOQ (and you should be!), you can do this even more easily, by implementing an ExecuteListener. Our program would now look like this:

try (Connection c = db.getConnection()) {
    DSLContext ctx = DSL.using(new DefaultConfiguration()
        .set(c)
        .set(new CallbackExecuteListener()
            .onExecuteStart(x -> sleep(1000L))
        )
    );

    long time = System.nanoTime();
    System.out.println(ctx.fetch("SELECT id FROM t_book"));
    System.out.println("Time taken: " +
        (System.nanoTime() - time) / 1_000_000L + "ms");
}

Still the same result:

+----+
|id  |
+----+
|1   |
|2   |
|3   |
|4   |
+----+
Time taken: 1025ms

The difference is that with a single intercepting callback, we can now add this sleep to all types of statements, including prepared statements, static statements, statements returning result sets, or update counts, or both.

How to Detect Slow Queries with jOOQ

Just now, we implemented a nice little feature in jOOQ’s code generator:
https://github.com/jOOQ/jOOQ/issues/4974

It detects whenever the jOOQ code generator runs a slow query to reverse engineer schema meta information. Why?

In our development and integration test environment, we don’t have huge schemas with all the different performance edge cases put in place. For instance, we don’t have 5000 Oracle synonyms. Or 10000 procedures with 500 parameters each. We do cover some common edge cases, but not in all databases.

Users on the other hand tend to accept the status quo after a while. The code generator is slow-ish? Sure, because we have a huge schema. This lazy acceptance is an impediment to our product quality. We had rather our users report all sorts of issues they encounter, so we want to encourage them.

And we did

In the upcoming jOOQ version 3.8 (and patch releases for 3.5.5, 3.6.5, and 3.7.3), we’ve added a nice little ExecuteListener to jOOQ-meta, which looks roughly like this:

class PerformanceListener 
    extends DefaultExecuteListener {

    StopWatch watch;
    class SQLPerformanceWarning 
        extends Exception {}

    @Override
    public void executeStart(ExecuteContext ctx) {
        super.executeStart(ctx);
        watch = new StopWatch();
    }

    @Override
    public void executeEnd(ExecuteContext ctx) {
        super.executeEnd(ctx);
        if (watch.split() > 5_000_000_000L)
            log.warn(
                "Slow SQL",
                "jOOQ Meta executed a slow query"
              + "\n\n"
              + "Please report this bug here: "
              + "https://github.com/jOOQ/jOOQ/issues/new\n\n"
              + formatted(ctx.query()),
                new SQLPerformanceWarning());
    }
}

It’s very simple. Every time we start executing a query, a “stop watch” is started. Every time we end execution, we check if the watch elapsed more than 5 seconds. If so, we log a warning, a link to our issue tracker, a formatted version of the SQL query, and a stack trace to help find the exact spot where the slow statement was executed.

Let’s run this

The reason why we did this is because we’ve seen ourselves that the PostgreSQL code generator runs a slow query to fetch all the stored procedures (and to generate overload indexes). The produced error message is:

[WARNING] Slow SQL                 : jOOQ Meta executed a slow query (slower than 5 seconds)

Please report this bug here: https://github.com/jOOQ/jOOQ/issues/new

select
  "r1"."routine_schema",
  "r1"."routine_name",
  "r1"."specific_name",
  case when exists (
        select 1 as "one"
        from "information_schema"."parameters"
        where (
          "information_schema"."parameters"."specific_schema" = "r1"."specific_schema"
          and "information_schema"."parameters"."specific_name" = "r1"."specific_name"
          and upper("information_schema"."parameters"."parameter_mode")  'IN'
        )
      ) then 'void'
       else "r1"."data_type"
  end as "data_type",
  "r1"."character_maximum_length",
  "r1"."numeric_precision",
  "r1"."numeric_scale",
  "r1"."type_udt_schema",
  "r1"."type_udt_name",
  case when exists (
        select 1 as "one"
        from "information_schema"."routines" as "r2"
        where (
          "r2"."routine_schema" in (
            'public', 'multi_schema', 'pg_catalog'
          )
          and "r2"."routine_schema" = "r1"."routine_schema"
          and "r2"."routine_name" = "r1"."routine_name"
          and "r2"."specific_name"  "r1"."specific_name"
        )
      ) then (
        select count(*)
        from "information_schema"."routines" as "r2"
        where (
          "r2"."routine_schema" in (
            'public', 'multi_schema', 'pg_catalog'
          )
          and "r2"."routine_schema" = "r1"."routine_schema"
          and "r2"."routine_name" = "r1"."routine_name"
          and "r2"."specific_name" <= "r1"."specific_name"
        )
      ) end as "overload",
  "pg_catalog"."pg_proc"."proisagg"
from "information_schema"."routines" as "r1"
  join "pg_catalog"."pg_namespace"
  on "pg_catalog"."pg_namespace"."nspname" = "r1"."specific_schema"
  join "pg_catalog"."pg_proc"
  on (
    "pg_catalog"."pg_proc"."pronamespace" = "pg_catalog"."pg_namespace".oid
    and (("pg_catalog"."pg_proc"."proname" || '_') || cast("pg_catalog"."pg_proc".oid as varchar)) = "r1"."specific_name"
  )
where (
  "r1"."routine_schema" in (
    'public', 'multi_schema', 'pg_catalog'
  )
  and not("pg_catalog"."pg_proc"."proretset")
)
order by
  "r1"."routine_schema" asc,
  "r1"."routine_name" asc,
  "overload" asc
org.jooq.util.AbstractDatabase$1$SQLPerformanceWarning
        at org.jooq.util.AbstractDatabase$1.executeEnd(AbstractDatabase.java:230)
        at org.jooq.impl.ExecuteListeners.executeEnd(ExecuteListeners.java:163)
        at org.jooq.impl.AbstractResultQuery.execute(AbstractResultQuery.java:269)
        at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:346)
        at org.jooq.impl.AbstractResultQuery.fetch(AbstractResultQuery.java:308)
        at org.jooq.impl.SelectImpl.fetch(SelectImpl.java:2703)
        at org.jooq.util.postgres.PostgresDatabase.getRoutines0(PostgresDatabase.java:707)
        at org.jooq.util.AbstractDatabase.getRoutines(AbstractDatabase.java:1131)
        at org.jooq.util.JavaGenerator.generate(JavaGenerator.java:417)
        at org.jooq.util.JavaGenerator.generate(JavaGenerator.java:314)
        at org.jooq.util.JavaGenerator.generate(JavaGenerator.java:279)
        at org.jooq.util.GenerationTool.run(GenerationTool.java:490)
        at org.jooq.util.GenerationTool.generate(GenerationTool.java:193)
        at org.jooq.util.maven.Plugin.execute(Plugin.java:131)
        ...

We can now proceed with fixing the query, easily.

You can do the same!

The implementation of the ExecuteListener was straight forward. You can do the same, very easily. Just hook a simple execute listener into your jOOQ Configuration, measuring execution speeds and logging warnings after a threshold, done.

Happy debugging!

Further reading

Coincidentally, a very similar approach has been documented by the engineering team at Square – The Query Sniper:

https://corner.squareup.com/2016/01/query-sniper.html