Translating Stored Procedures Between Dialects

In the past years, we’ve invested a lot of effort into improving our procedural language capabilities in jOOQ. What started with a simple internal API to support the emulations of DDL clauses like these:

-- Some dialect that supports this
create table if not exists t (i varchar(10));

-- Db2
begin
  declare continue handler for sqlstate '42710' begin end;
  execute immediate 'create table T (I varchar(10))';
end

-- Oracle
begin
  execute immediate 'create table T (I varchar2(10))';
exception
  when others then
    if sqlerrm like 'ORA-00955%' then null;
    else raise;
    end if;
end;

-- SQL Server
begin try
  create table T (I varchar(10))
end try
begin catch
  if error_number() != 2714 throw;
end catch

… evolved into a full fledged API for all sorts of procedural logic executed in your database server.

Anonymous blocks

The above examples show what most RDBMS call “anonymous blocks”, similar to Java’s anonymous classes, i.e. elements of procedural logic that do not have a name.

Depending on the database, these blocks are interpreted on the fly, or compiled and cached like ordinary SQL statements. They can be very useful for a variety of reasons:

  • To create atomic ad-hoc units of code, executed in a single server round trip, similar to the above DDL scripts with integrated exception handling
  • To create dynamic procedural code. This may be esoteric for many, but exactly the right thing to do for others. All of jOOQ is about dynamic SQL, so why not also dynamic PL/SQL, T-SQL, pgplsql, etc?
  • To work around limitations imposed by Conway’s Law, when there is no way you can get the necessary GRANT or other bureaucratic token to deploy your procedure in production at your discretion. I mean, this is still a thing in a lot of companies.
  • A lesser case of Conway’s Law may be when you’re a product vendor, and you don’t know if you can create procedures on your clients’ production system. Just don’t. Run your procedural logic as an anonymous block if you can’t, or as a procedure if you can. Same jOOQ code.
  • If your procedural code changes very frequently (even dynamically), storing it might cause tricky issues. If you’ve ever worked with Oracle and ran into the dreaded latch free event, you know what I mean.

I’m by no means advocating you should use anonymous blocks over stored procedures in general. If you can, store your code in the database for better performance and re-use. But sometimes you can’t, and sometimes you shouldn’t.

So, jOOQ supports – as always – a mix of various procedural logic elements, including:

  • Blocks with variable declarations
  • IF statement
  • Loops including LOOP, WHILE, REPEAT, FOR loops
  • EXIT (or LEAVE) and CONTINUE (or ITERATE) for loop control flow
  • RETURN to return from procedures or functions
  • GOTO (ghasp!)
  • SIGNAL or RAISE
  • Labels
  • CALL statement to call other stored procedures
  • EXECUTE statement (for running dynamic SQL from within procedural logic. Which level of inception is that?)

And we’re adding more support all the time. The Java code might look something like this:

Variable<Integer> i = var(name("i"), INTEGER);
ctx.begin(
  for_(i).in(1, 10).loop(
    insertInto(T).columns(T.COL).values(i)
  )
).execute();

Assuming you cannot run a bulk insert statement for some reason, this might be the way to go. It translates to various dialects as follows.

Db2 and MySQL (which doesn’t support anonymous blocks, but statement batches)

begin
  declare I bigint;
  set I = 1;
  while I <= 10 do
    insert into T (COL) values (I);
    set I = (I + 1);
  end while;
end;

PostgreSQL

do $$
begin
  for I in 1 .. 10 loop
    insert into T (COL) values (I);
  end loop;
end;
$$

Oracle

begin
  for I in 1 .. 10 loop
    insert into T (COL) values (I);
  end loop;
end;

SQL Server

begin
  declare @I bigint = 1;
  while @I <= 10 begin
    insert into T (COL) values (I);
    set @I = (@I + 1);
  end;
end;

As always with jOOQ, you don’t have to start out with writing jOOQ API based code. While this is the recommended approach when your procedural (or SQL) logic is dynamic, jOOQ can also parse and translate static SQL in string form. The babelfish of SQL. Play around with it here to learn more: https://www.jooq.org/translate/

Storing the code as a procedure

If you don’t have any of the above use-cases, you will want to store this code as a procedure (or function):

  • For greater re-use
  • For better performance

In that case, starting from jOOQ 3.15, you can use our CREATE PROCEDURE, CREATE FUNCTION, or even CREATE TRIGGER support.

Note: CREATE PACKAGE is high on our wish list, but might not make it into 3.15 anymore. If packages are used for namespacing only, they might be emulated using schemas in other dialects. Other package level features, such as package state may be more difficult to translate.

The previous anonymous block can be easily wrapped in a DSLContext.createProcedure() call

Variable<Integer> i = var("i", INTEGER);
Parameter<Integer> i1 = in("i1", INTEGER);
Parameter<Integer> i2 = in("i2", INTEGER);

ctx.createProcedure("insert_into_t")
   .parameters(i1, i2)
   // You may or may not wrap your block in BEGIN .. END.
   // jOOQ will figure it out per dialect...
   .as(for_(i).in(i1, i2).loop(
     insertInto(T).columns(T.COL).values(i)
   ))
   .execute();

Which would produce the following procedures:

Db2 and MySQL

create procedure INSERT_INTO_T(
  I1 integer,
  I2 integer
)
begin
  declare I bigint;
  set I = I1;
  while I <= I2 do
    insert into T (COL) values (I);
    set I = (I + 1);
  end while;
end;

MariaDB

create procedure INSERT_INTO_T(
  I1 int,
  I2 int
)
begin
  for I in I1 .. I2 do
    insert into T (COL) values (I);
  end for;
end;

Oracle

create procedure INSERT_INTO_T(
  I1 number,
  I2 number
)
as
begin
  for I in I1 .. I2 loop
    insert into T (COL) values (I);
  end loop;
end;

PostgreSQL

create procedure INSERT_INTO_T(
  I1 int,
  I2 int
)
language plpgsql
as
$$
begin
  for I in I1 .. I2 loop
    insert into T (COL) values (I);
  end loop;
end;
$$

SQL Server

create procedure INSERT_INTO_T
  @I1 int,
  @I2 int
as
begin
  declare @I bigint = @I1;
  while @I <= @I2 begin
    insert into T (COL) values (@I);
    set @I = (@I + 1);
  end;
end;

Play around with it here to learn more: https://www.jooq.org/translate/. We’re also looking forward to your bug reports and/or feature requests here: https://github.com/jOOQ/jOOQ/issues/new/choose.

Tricky transformations

Procedural languages are standardised via the ISO/IEC 9075-4 standard, and some RBDMS surprisingly agree to a large extent on the standard, including:

  • Db2
  • HSQLDB
  • MariaDB
  • MySQL

Others do less so, but all procedural languages agree on the fact that they are very simple languages, without any such “fancy” things like subtype or parametric polymorphism (OK, PL/SQL has some subtype polymorphism, but not a very sophisticated one. We won’t support it for now), lambda expressions, dynamic dispatch, algebraic data types, etc. etc.

What they do have in common is a tight integration with the SQL language, which is where they shine.

But there are subtle differences, nonetheless. For example, they differ in where you can declare variables. Some have block scope, others don’t. And some adhere to the standard, where LEAVE requires a label, others don’t.

Imagine you write this “fantasy” jOOQ code

Name t = unquotedName("t");
Name a = unquotedName("a");
Variable<Integer> i = var(unquotedName("i"), INTEGER);

ctx.begin(
     insertInto(t).columns(a).values(1),
     declare(i).set(2),
     loop(
       insertInto(t).columns(a).values(i),
       i.set(i.plus(1)),
       if_(i.gt(10)).then(loop(exit()), exit())
     )
   )
   .execute();

This is just a more complicated version of the original loop, which inserts values 1-10 into a table. There’s no reason other than to show off the transformation capabilities for the nesting of loop(exit()), as well as the infinite LOOP with EXIT usage, rather than the indexed FOR loop.

There are a few things that don’t always work exactly like this in some dialects!

Let’s look at what Db2 does with this.

begin
  -- Variable declarations need to be "pulled up" to the beginning
  -- of the block, i.e. before the INSERT statement
  declare i integer;
  insert into t (a) values (1);

  -- While irrelevant to this example, the init value for the
  -- variable declaration must remain at the original location
  set i = 2;

  -- We need a label to be able to leave this loop
  alias_1:
  loop
    insert into t (a) values (i);
    set i = (i + 1);
    if i > 10 then

      -- Same here, a label is required
      alias_2:
      loop
        leave alias_2;
      end loop;
      leave alias_1;
    end if;
  end loop;
end

If we don’t use EXIT on a loop, then there won’t be a label. Or, you can obviously label your loops explicitly, which is always recommended. But sometimes, you don’t have that in your original source code.

What does Oracle do with this?

Oracle has a slightly different syntax here:

declare
  i number(10);
begin
  insert into t (a) values (1);
  i := 2;
  loop
    insert into t (a) values (i);
    i := (i + 1);
    if i > 10 then
      loop
        exit;
      end loop;
      exit;
    end if;
  end loop;
end;

The main difference being that the declaration is also pulled up, but a separate DECLARE block is required to declare variables outside of BEGIN .. END. Label-less EXIT is supported natively, so nothing needs to be transformed here.

If you’re interested in how this transformations work, read this post.

Conclusion

Whether you’re migrating off one dialect onto another, or whether you’re supporting several dialects at once, or you’re writing dynamic SQL and dynamic procedural logic, or you just like writing things in Java rather than native SQL, or you suffer from Conway’s Law and cannot store your procedural code easily, jOOQ can help you with those endeavours.

For a while now, jOOQ has supported procedural statements as anonymous blocks for the most popular dialects. Starting from jOOQ 3.15, we’ll also support storing this logic in the database in a dialect agnostic way, as well as parsing / translating procedural code on our website, or as a library / CLI or JDBC proxy to replace your SQL / procedural code ad-hoc in a legacy JDBC application.

Stay tuned for more in this exciting area of jOOQ development!

jOOQ Internals: Pushing up SQL fragments

Over the past 13 years, jOOQ has accrued quite some internal features, which you, the user, are not exposed to. One very interesting feature is the capability for any arbitrary jOOQ expression tree element to push a SQL fragment up to a higher level.

How does it work?

The jOOQ expression tree model

When you write a jOOQ query, you’re really creating an expression tree of your SQL statement that just happens to look like actual SQL syntax. For example:

ctx.select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .where(BOOK.AUTHOR_ID.eq(1))
   .fetch();

In jOOQ, this is nothing else than an expression tree of this form

                     Select (Statement)
                             |
          +------------------+-----------------+
          |                  |                 |
        Select             From              Where
          |                  |                 |
    +-----+------+           |                 |
    |            |           |                 |
TableField  TableField     Table        CompareCondition
  |     |     |     |        |           |     |       |
Table Field Table Field    BOOK    TableField  EQ     Val
  |     |     |     |                |     |           |
BOOK    ID   BOOK TITLE            Table Field         1
                                     |     |
                                    BOOK AUTHOR_ID

When generating the SQL, jOOQ mostly just traverses the tree brepth first (just kidding. It’s mostly depth first, but some child elements on the same level are often considered, prior to going down one level), collecting each element into a StringBuilder to the expected form:

SELECT book.id, book.title
FROM book
WHERE book.author_id = 1

Each of these so-called QueryPart expression tree elements can decide for itself how it’s going to render its SQL. For example, the CompareCondition will roughly generate this sequence:

<lhs> <operator> <rhs>

… further delegating SQL generation to its children, whatever they may be. A TableField will decide whether to fully / partially / or not qualify its Field reference at all, etc., e.g. based on the schema mapping (multi tenancy) feature.

If you’re using functions, such as Substring, that function can decide on its own how it should generate its SQL. As can be seen in the manual, these are all the same:

-- ACCESS
mid('hello world', 7)

-- ASE, SQLDATAWAREHOUSE, SQLSERVER
substring('hello world', 7, 2147483647)

-- AURORA_MYSQL, AURORA_POSTGRES, COCKROACHDB, CUBRID, H2, 
-- HANA, HSQLDB, IGNITE, MARIADB, MEMSQL, MYSQL, POSTGRES, REDSHIFT, 
-- SNOWFLAKE, SYBASE, VERTICA
substring('hello world', 7)

-- DB2, DERBY, INFORMIX, ORACLE, SQLITE
substr('hello world', 7)

-- FIREBIRD, TERADATA
substring('hello world' FROM 7)

This way, a jOOQ expression tree can emulate any syntax in any dialect.

But what if the emulation is not local?

Non local emulations

Sometimes, a QueryPart needs to assume the presence of a non-local syntactic element to work. A recent case was https://github.com/jOOQ/jOOQ/issues/11366.

When writing this procedural logic in jOOQ:

for_(i).in(1, 10).loop(
    insertInto(t).columns(a).values(i)
)

Of course, you wouldn’t do this. You’d write a bulk insert statement instead and solve this with SQL only! But you have your reasons, right?

Then, the indexed FOR loop may have to be emulated in some dialects, using an equivalent WHILE statement. So, instead of this straight forward procedural SQL generation, which we might get in Oracle for example:

FOR i IN 1 .. 10 LOOP
  INSERT INTO t (a) VALUES (i);
END LOOP;

… we generate this in MySQL, more or less:

DECLARE i INT DEFAULT 1;
WHILE i <= 10 DO
  INSERT INTO t (a) VALUES (i);
  SET i = i + 1;
END WHILE;

This can still be done completely locally, as shown before. There’s a FOR expression tree element that can locally generate the DECLARE and WHILE query parts, instead. But what if local variables aren’t possible? What if there’s no block scope, as in Firebird?

In Firebird, all your local variables must be declared in a top level declaration section. If we run the above in an anonymous block, the correct generated procedural SQL would be this:

EXECUTE BLOCK AS
  DECLARE i INT;
BEGIN
  :i = 1; -- The loop variable must still be initialised locally
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (a) VALUES (i);
    :i = :i + 1;
  END
END

This is still the case when we further nest the loop in procedural control flow elements, such as:

for_(i).in(1, 10).loop(
    for_(j).in(1, 10).loop(
        insertInto(t).columns(a, b).values(i, j)
    )
)

Of course, you still wouldn’t do this. You’d write a bulk insert statement from a cartesian product instead and solve this with SQL only! But alas, let’s keep the example simple

We now have nested variable declarations, which still work well in MySQL

DECLARE i INT DEFAULT 1;
WHILE i <= 10 DO 
  BEGIN
    DECLARE j INT DEFAULT 1;
    WHILE j <= 10 DO
      INSERT INTO t (a, b) VALUES (i, j);
      SET j = j + 1;
    END WHILE;
  END
  SET i = i + 1;
END WHILE;

But in Firebird, the declarations will both have to be pushed up to the top:

EXECUTE BLOCK AS
  DECLARE i INT;
  DECLARE j INT;
BEGIN
  :i = 1; -- The loop variable must still be initialised locally
  WHILE (:i <= 10) DO BEGIN
    :j = 1; -- The loop variable must still be initialised locally
    WHILE (:j <= 10) DO BEGIN
      INSERT INTO t (a, b) VALUES (i, j);
      :j = :j + 1;
    END
    :i = :i + 1;
  END
END

This doesn’t handle all the edge cases yet (e.g. some dialects allow for “hiding” local variables, such as PL/SQL) but it goes a long way already for simple procedures, functions, and triggers, all of which will be supported starting in jOOQ 3.15.

How does it work?

Alternative 1: Expression tree transformation

There are numerous ways to make such an expression tree transformation work. In the long run, we will re-design our internal expression tree model, and make it available also as a public API to those who wish to use the jOOQ parser and expression tree transformation as a standalone product. To some extent, this is already possible using the VisitListener SPI as shown in this post about row level security, but the current implementation is complex.

Also, it is relatively rare (so far) for an expression tree to require non-local transformation. This means that eagerly trying to look for possible candidates every time is probably overkill.

Alternative 2: Lazy expression tree transformation

We could transform the expression tree “lazily”, i.e. still assuming it is unnecessary, and when it is, throw an exception and start again. We actually do this, the “pattern” in jOOQ is called a ControlFlowSignal, and it is mainly used to work around the maximum number of bind parameters per statement limit of different dialects. I.e. we just count bind values, and if there are more than 2000 in SQL Server (2100 are supported by SQL Server, but only 2000 by jtds), then we just re-generate the SQL from scratch using inline values in a static statement.

As always with jOOQ, you can re-configure these limits to your own value.

Another case is when you forgot to turn on the ROWNUM to LIMIT transformation when migrating off Oracle’s old ROWNUM filtering. It would be silly to eagerly search for ROWNUM instances every time. Instead, we just re-generate the entire SQL query when we encounter one, and when you’re not using Oracle.

The assumption here is that these things happens very rarely, and if they do, you haven’t thought of it, and you don’t want the query to fail in production. The fact that a probably already slow query takes just a bit more time for jOOQ to generate is a price worth paying for the query to still Just Work™

Alternative 3: Patching generated SQL strings

Now, this is what we are actually doing.

Better assume almost all SQL transformations are local (as in the Substring example), and patch the SQL in case they’re not. In the end, we’re just generating SQL strings! And as such, why not introduce an infrastructure where we can put special markers to special text areas, and then replace that area with alternative SQL content.

Without the fix #11366, the generated code may have looked like this:

EXECUTE BLOCK AS
  -- Special marker here
BEGIN 
  -- Entering scope
  DECLARE i INT DEFAULT 1;
  WHILE (:i <= 10) DO BEGIN
    DECLARE j INT DEFAULT 1;
    WHILE (:j <= 10) DO BEGIN
      INSERT INTO t (a, b) VALUES (i, j);
      :j = :j + 1;
    END
    :i = :i + 1;
  END
  -- Exiting scope
END

This doesn’t work in Firebird, so we apply the fix. Note there’s a cheap, special marker that is placed by the SQL generation of anonymous blocks, but also for procedures, functions, and triggers, e.g.

CREATE FUNCTION func1()
RETURNS INTEGER
AS 
  -- Special marker here
BEGIN
  -- Entering scope
  RETURN 1;
  -- Exiting scope
END

Now, whenever the org.jooq.impl.DeclarationImpl query part generates its SQL locally, then, instead of generating something like:

DECLARE i INT DEFAULT 1;
DECLARE j INT;

We generate (locally)

:i = 1;
-- j isn't initialised here, so nothing needs to be done locally

At the same time, we push the org.jooq.impl.DeclarationImpl into some context variable that is visible to the entire scope (see “entering scope” and “exiting scope” comments).

As soon as we exit the scope, we must render all collected declarations, this time without the defaults, e.g.

DECLARE i INT;
DECLARE j INT;

… and then insert that rendered SQL right where the marker was located. All subsequent markers, if any, are then shifted by the difference of text lengths, of course.

Applications in jOOQ

This is currently used a few times within jOOQ:

  • To emulate calling Oracle PL/SQL functions with BOOLEAN parameters / return values. We patch the generated SQL producing a synthetic WITH clause with some BOOLEAN to NUMBER translation logic. Since Oracle 12c, Oracle supports PL/SQL embedded in WITH, which is quite a nifty feature!
  • The entire implicit JOIN feature is implemented this way! Markers delimit each table in the FROM clause (e.g. FROM BOOK), and if any path originating from any such marked table is encountered in the query (e.g. SELECT BOOK.AUTHOR.FIRST_NAME), then 1) instead of generating the path, a synthetic alias for the path is used to qualify the column, 2) instead of generating the FROM BOOK table, a synthetic LEFT JOIN or INNER JOIN is generated, joining all the necessary to-one relationships. An example will be shown below.
  • The above Firebird (and possibly also T-SQL, let’s see) procedural local variable scoping fixes are implemented this way.
  • A few emulations that require prepending SQL to the complete statement, such as CREATE OR REPLACE PROCEDURE x emulations that prepend DROP PROCEDURE x to CREATE PROCEDURE x work in a similar fashion. These types of emissions are “special” in that they add another statement to the statement batch. This means we’ll also have to be careful to skip any possible result sets or update counts this produces when calling the batch from JDBC.

Future applications may include:

  • More top level CTE, which are quite handy for a variety of emulations

An example for implicit joins:

SELECT
  book.author.first_name,
  book.author.last_name
FROM book -- Special marker around book

The above SQL doesn’t work in any dialect, it’s just jOOQ specific. We generate an alias for each unique path, based on the path’s hash code, so the query might look like this, instead:

SELECT
  -- The path isn't generated, but an alias for it is
  alias_xyz.first_name,
  alias_xyz.last_name
FROM (
  -- The marked "book" table is replaced by a join tree
  book 
    LEFT JOIN author AS alias_xyz 
    ON book.author_id = author.id
)

We simply replace book by (book LEFT JOIN ...) in the resulting SQL string. Thanks to our infrastructure of being able to define scopes and register context and variables for each scope, this works for arbitrary levels of nesting. We can always identify the appropriate book identifier for each path expression, even for things like this:

SELECT
  book.author.first_name,
  book.author.last_name,

  -- Different book tables here, because the nested scope 
  -- hides the "book" identifier from the outer scope
  (SELECT count(*) FROM book),
  (SELECT count(DISTINCT book.author.first_name) FROM book),

  -- Outer book, again
  (SELECT book.author.first_name)
FROM 
  book

The above being emulated like this, by patching two marked occurrences of book by the same join graph:

SELECT
  alias_xyz.first_name,
  alias_xyz.last_name,

  -- No patching done to this book
  (SELECT count(*) FROM book),

  -- The alias_xyz alias is used again, the path is the same
  (SELECT count(DISTINCT alias_xyz.first_name) 

  -- And the book table is patched again with the same left join
   FROM (
     book 
       LEFT JOIN author AS alias_xyz 
       ON book.author_id = author.id
  )),

  -- Outer book, again
  (SELECT alias_xyz.first_name)
FROM (
  book 
    LEFT JOIN author AS alias_xyz 
    ON book.author_id = author.id
)

It sounds as sophisticated as it sounds complex, but it works really really well.

Perhaps, in the future, expression tree transformation will be preferred over patching the result string. So far, with the current applications, this was the path of least resistance, and highest performance.