Vendor Agnostic, Dynamic Procedural Logic with jOOQ

One of the strengths of modern RDBMS is the capability to mix the powerful SQL language with procedural code.

SQL is a 4th generation programming language (4GL), and as such, extremely well suited for querying and bulk data manipulation. Its functional-declarative nature allows for it to be optimised in highly efficient ways using cost-based optimisation, but also statically as we’ve blogged before.

Sometimes, however, an imperative 3GL is better suited for a given task. That’s where stored procedures shine, or more specifically, procedural languages of RDBMS.

Among the ones that jOOQ supports, at least these ones support procedures:

  • BigQuery
  • Db2
  • Exasol
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • MariaDB
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • Vertica

Others may do, as well, but jOOQ isn’t supporting their dialects yet.

Many have implemented their own procedural languages, some according to the ISO/IEC 9075-4 Persistent stored modules (SQL/PSM) standard, others have their own.

jOOQ support for procedural logic

Since jOOQ 3.12, our commercial distributions have supported anonymous blocks and the procedural statements they contain, such as the IF statement, LOOP statements, etc. Starting with jOOQ 3.15, we also support 3 types of statements to manage storing procedural logic in the catalog:

Using these statements via jOOQ may not be your every day use-case. You may prefer managing that logic via the native syntax, which is still more powerful than what jOOQ 3.15 supports (especially when you’re using Oracle’s PL/SQL), in case of which you’ll u se jOOQ purely to call your procedure from Java in the usual type safe manner.

But maybe, you have one of these use-cases?

  • You’re a product vendor, and you profit from procedural logic being vendor agnostic in order to support multiple of your clients’ RDBMS
  • Your procedural logic is dynamic, just like your SQL logic (and what other than jOOQ to use for that?)
  • You don’t have the necessary privileges to create procedures, functions, or triggers in your schema

In all of those cases, jOOQ is here to help you.

How does it work?

The first building block is the anonymous block, which isn’t supported by all of the above dialects, regrettably. jOOQ can emulate it on MySQL as discussed here, but not currently in other dialects.

Here’s a simple, empty anonymous block:

-- Db2
BEGIN
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
END

-- MariaDB
BEGIN NOT ATOMIC
END;

-- Oracle
BEGIN
  NULL;
END;

-- PostgreSQL
DO $$
BEGIN
  NULL;
END;
$$

It doesn’t really do much, but you can try executing it as follows, with jOOQ:

ctx.begin().execute();

Now, let’s do something more interesting, such as:

// Assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

// Then write
Variable<Integer> i = variable(unquotedName("i"), INTEGER);
Table<?> t = table(unquotedName("t"));
Field<Integer> col = field(unquotedName("col"), INTEGER);

ctx.begin(
    declare(i).set(1),

    while_(i.le(10)).loop(
        insertInto(t).columns(c).values(i),
        i.set(i.plus(1))
    )
).execute();

The above block executes:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- FIREBIRD
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
DECLARE
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
DO $$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

Easy as pie. Perhaps you prefer a FOR loop, instead? Try this:

ctx.begin(
    for_(i).in(1, 10).loop(
        insertInto(t).columns(c).values(i)
    )
).execute();

It produces the necessary emulations, if required, because regrettably, not all dialects support FOR:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- Firebird
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  FOR i IN 1 .. 10 DO
    INSERT INTO t (c)
    VALUES (i);
  END FOR;
END;

-- Oracle
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;

-- PostgreSQL
DO $$
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  BEGIN
    SET @i = 1;
    WHILE @i <= 10 BEGIN
      INSERT INTO t (c)
      VALUES (@i);
      SET @i = (@i + 1);
    END;
  END;
END;

SQL vs procedures

Of course, this particular SQL statement would be better implemented using a single bulk insertion statement, purely with SQL, not with procedural logic

ctx.insertInto(t, c)
   .select(selectFrom(generateSeries(1, 10)))
   .execute();

Which translates to:

-- Db2
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH
    generate_series(generate_series) AS (
      SELECT 1
      FROM SYSIBM.DUAL
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- Firebird
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      FROM RDB$DATABASE
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- MariaDB
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) AS generate_series;

-- Oracle
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT (level + (1 - 1)) generate_series
  FROM DUAL
  CONNECT BY level <= ((10 + 1) - 1)
) generate_series;

-- PostgreSQL
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM generate_series(1, 10);

-- SQL Server
WITH
  generate_series(generate_series) AS (
    SELECT 1
    UNION ALL
    SELECT (generate_series + 1)
    FROM generate_series
    WHERE generate_series < 10
  )
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT generate_series
  FROM generate_series
) generate_series

… but you get the point.

Storing the procedural logic

If you have the necessary privileges, and your procedural logic isn’t super dynamic, you may choose to store your logic in a procedure or function directly in your database. In some databases, this means a compiler will be able to eagerly translate the logic to something very efficient (e.g. machine code), instead of interpreting the logic on the fly.

Take the above WHILE loop, for example. You may want to store that as a procedure P:

Name p = unquotedName("p");

ctx.createProcedure(p)
   .modifiesSQLData()
   .as(
        declare(i).set(1),

        while_(i.le(10)).loop(
            insertInto(t).columns(c).values(i),
            i.set(i.plus(1))
        )
   )
   .execute();

This produces the following statements:

-- Db2
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Firebird
CREATE PROCEDURE p()
AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
CREATE PROCEDURE p
AS
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
CREATE PROCEDURE p()
LANGUAGE plpgsql
AS
$$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
CREATE PROCEDURE p
AS
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

And now, what better way to call this procedure than, again, an anonymous block?

ctx.begin(call(unquotedName("p"))).execute();

Producing:

-- Db2
BEGIN
  CALL p();
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
  EXECUTE PROCEDURE p;
END

-- MariaDB
BEGIN NOT ATOMIC
  CALL p();
END;

-- Oracle
BEGIN
  p();
END;

-- PostgreSQL
DO $$
BEGIN
  CALL p();
END;
$$

-- SQL Server
BEGIN
  EXEC p ;
END;

If you’re using jOOQ in Flyway or Liquibase to generate procedures during your database migrations, you can obviously generate jOOQ procedure stubs to call in a more type safe manner, instead of the above dynamic procedure call.

Parsing procedural logic

This jOOQ feature is not really exceptional. You can play around with our parser / translator here: https://www.jooq.org/translate. It can definitely help you translate your (simpler) stored procedures between dialects, such as PL/SQL, T-SQL, PL/pgSQL, etc.

Conclusion

As a rule of thumb, if you can do it with SQL (the 4GL), do it with SQL alone. But sometimes, you can’t. A 3GL is a better choice for an algorithm. When using jOOQ, you’ll naturally think of using Java to implement that 3GL algorithm. But wait, you could move the logic to the server for (drastically) increased performance!

Thanks to jOOQ, you can generate procedural logic that is:

  • Dynamic
  • Vendor agnostic
  • Anonymous or stored

Just like you’re used to, from jOOQ, for SQL

SQL, Streams, For Comprehension… It’s All the Same

Recently, at Devoxx, I’ve seen this beautiful slide in a talk by Kevlin Henney
In his talk, he was displaying a variety of approaches to solve the FizzBuzz “problem”, including a couple of very elegant solutions in completely declarative approaches and languages. In this particular slide, Kevlin used a notation that is derived from maths. The set builder notation. Here’s an example from Wikipedia: even-numbers The example reads: For all n in (the set of all integer numbers), take those for which there exists () another integer k, for which the following equation is satisfied: n = 2k. Or in plain English: All even integers. (because for even integers, there exists another integer that is half the even integer) Beautiful, eh? In imperative programming, we’d probably do something like this instead:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i++)
    even.add(i * 2);

Or this:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i = i + 2)
    even.add(i);

But there are several problems with the imperative approach:
  • We have to realistically start somewhere
  • We have to realistically end somewhere
  • We have to store all values in an intermediate collection
Sure, those aren’t severe limitations in every day use-cases, because we’re probably solving a real world problem where we don’t actually need an infinite number of even integers, and storing them in an intermediate collection doesn’t consume all of our memory, but still, the declarative, mathematical approach is much leaner, because we can still answer those questions about where to start and where to end later, and we never need to materialise any intermediate collection before we make those final decisions. For instance, we can declare X to be that set, and then declare Y to be a set that is derived from X, and finally materialise Z, which is a very tiny set derived from Y. For this, we may have never needed to materialise all the (even) integers.

How this compares to SQL

Kevlin made a cunning comparison. Of course, all functional programming aficionados will immediately recognise that languages like Scala have something called a “for comprehension”, which models precisely the mathematical set-builder notation. Java 8 now has the Streams API, which allows us, to some extent, model something similar (although not as powerful). But Kevlin didn’t use those “modern” languages. He used SQL as a comparison. That “arcane” declarative programming language that has been around forever, and that we love so much. Yes, here’s how we can declare all the even numbers in SQL:

SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

If optimisers were perfect, this semi-self-join between the two references of the integers “table” could be optimised perfectly. In most databases, we’d probably manually transform the above notation to this equivalent one:

SELECT n
FROM integers
WHERE MOD(n, 2) = 0

Yes, indeed. The set-builder notation and the SQL language are very similar beasts. The former prefers using mathematical symbols for brevity and conciseness, the latter prefers using English words to connect the different operators, but it’s the same thing. And if you squint hard enough, you’ll see that Java 8 Streams, for instance, are also pretty much the same thing: everything-is-a-table I’ve blogged about this recently where all the Java 8 Streams operations are compared to their SQL clause counterparts: https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-equivalents-in-java-8-streams

How is this better?

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them. Remember the imperative approach? We tell the machine exactly what to do:
  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection
What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc. In the imperative approach, we have to refactor constantly, to avoid the overhead of
  • Producing too many integers
  • Storing too many integers (or storing them at all)
In truly declarative languages like SQL, we’re just describing “even integers” with an expression, possibly assigning the expression a name:

CREATE VIEW even_integers AS
SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE k = 2 * n
)

So, when we actually use and materialise the even integers, e.g. positive integers less than 100, the optimiser can optimise away the double access to the integer table and produce only the exact number of values that we’re requesting (without materialising them in intermediate collections):

SELECT n
FROM even_integers
WHERE n BETWEEN 0 AND 100

Conclusion

Thinking in terms of sets, in terms of declaring sets, has always been our dream as software engineers. The approach is extremely compelling and elegant. We can delegate a lot of boring algorithmic work to the implementation engine of the declarative programming language. In the case of SQL, it would be a SQL database optimiser, which figures out a great lot of optimisations that we might not have thought of. The above example is trivial. We can perfectly live in a world where we manually iterate over a local integer variable that goes from 0 to 100:

for (int i = 0; i <= 100; i++)
  doSomething(i);

But stuff gets hairy quite quickly. Compare Mario Fusco‘s famous tweet’s two versions of the same algorithm:
This also applies to SQL, and what’s even better in SQL than with Streams: The SQL statement is a declarative expression tree, not a formally ordered set of stream pipeline operations. The optimiser can freely reorder / transform the expression tree into something that it thinks is more optimal. This isn’t just a promise. This works in modern SQL databases every day, for very complex queries, which you can write in a matter of seconds, rather than hours. Stay tuned for a short series of blog posts on the jOOQ blog illustrating what modern cost-based optimisation can do for you, when you’re using the SQL language.

Warning: Don’t oversimplify

This article just illustrates the roots of the SQL mindset in mathematics and functional programming. Do note that modern SQL is vastly more sophisticated than its roots, and has moved away from this original paradigm to embrace other paradigms for practical reasons. Don’t limit your SQL usage to what for comprehensions offer. There’s much more to SQL!