How to Run a Bulk INSERT .. RETURNING Statement With Oracle and JDBC

When inserting records into SQL databases, we often want to fetch back generated IDs and possibly other trigger, sequence, or default generated values. Let’s assume we have the following table:

-- DB2
  j VARCHAR(50), 

-- PostgreSQL
  j VARCHAR(50), 

-- Oracle
  j VARCHAR2(50), 


DB2 is the only database currently supported by jOOQ, which implements the SQL standard according to which we can SELECT from any INSERT statement, including:

  VALUES ('a'), ('b'), ('c')

The above query returns:

I |J |K          |
1 |a |2018-05-02 |
2 |b |2018-05-02 |
3 |c |2018-05-02 |

Pretty neat! This query can simply be run like any other query in JDBC, and you don’t have to go through any hassles.

PostgreSQL and Firebird

These databases have a vendor specific extension that does the same thing, almost as powerful:

-- Simple INSERT .. RETURNING query
VALUES ('a'), ('b'), ('c')

-- If you want to do more fancy stuff
  VALUES ('a'), ('b'), ('c')

Both syntaxes work equally well, the latter is just as powerful as DB2’s, where the result of an insertion (or update, delete, merge) can be joined to other tables. Again, no problem with JDBC


In Oracle, this is a bit more tricky. The Oracle SQL language doesn’t have an equivalent of DB2’s FINAL TABLE (DML statement). The Oracle PL/SQL language, however, does support the same syntax as PostgreSQL and Firebird. This is perfectly valid PL/SQL

-- Create a few auxiliary types first

  -- These are the input values
  in_j t_j := t_j('a', 'b', 'c');
  out_i t_i;
  out_j t_j;
  out_k t_k;

  -- Use PL/SQL's FORALL command to bulk insert the
  -- input array type and bulk return the results
  FORALL i IN 1 .. in_j.COUNT
    INSERT INTO x (j)
    VALUES (in_j(i))
    RETURNING i, j, k
    BULK COLLECT INTO out_i, out_j, out_k;
  -- Fetch the results and display them to the console

A bit verbose, but it has the same effect. Now, from JDBC:

try (Connection con = DriverManager.getConnection(url, props);
    Statement s = con.createStatement();

    // The statement itself is much more simple as we can
    // use OUT parameters to collect results into, so no
    // auxiliary local variables and cursors are needed
    CallableStatement c = con.prepareCall(
        "DECLARE "
      + "  v_j t_j := ?; "
      + "BEGIN "
      + "  FORALL j IN 1 .. v_j.COUNT "
      + "    INSERT INTO x (j) VALUES (v_j(j)) "
      + "    RETURNING i, j, k "
      + "    BULK COLLECT INTO ?, ?, ?; "
      + "END;")) {

    try {

        // Create the table and the auxiliary types
            "CREATE TABLE x ("
          + "  j VARCHAR2(50),"
          + "  k DATE DEFAULT SYSDATE"
          + ")");
        s.execute("CREATE TYPE t_i AS TABLE OF NUMBER(38)");
        s.execute("CREATE TYPE t_j AS TABLE OF VARCHAR2(50)");
        s.execute("CREATE TYPE t_k AS TABLE OF DATE");

        // Bind input and output arrays
        c.setArray(1, ((OracleConnection) con).createARRAY(
            "T_J", new String[] { "a", "b", "c" })
        c.registerOutParameter(2, Types.ARRAY, "T_I");
        c.registerOutParameter(3, Types.ARRAY, "T_J");
        c.registerOutParameter(4, Types.ARRAY, "T_K");

        // Execute, fetch, and display output arrays
        Object[] i = (Object[]) c.getArray(2).getArray();
        Object[] j = (Object[]) c.getArray(3).getArray();
        Object[] k = (Object[]) c.getArray(4).getArray();

    finally {
        try {
            s.execute("DROP TYPE t_i");
            s.execute("DROP TYPE t_j");
            s.execute("DROP TYPE t_k");
            s.execute("DROP TABLE x");
        catch (SQLException ignore) {}

The above code will display:

[1, 2, 3]
[a, b, c]
[2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0]

Exactly what we wanted.

jOOQ support

A future version of will emulate the above PL/SQL block from the jOOQ INSERT .. RETURNING statement:

   .returning(X.I, X.J, X.K)

This will correctly emulate the query for all of the databases that natively support the syntax. In the case of Oracle, since jOOQ cannot create nor assume any SQL TABLE types, PL/SQL types from the DBMS_SQL package will be used

The relevant issue is here:

The Cost of JDBC Server Roundtrips

Or: Move That Loop into the Server Already!

This article will illustrate the significance of something that I always thought to be common sense, but I keep seeing people getting this (very) wrong in their productive systems. Chances are, in fact, that most applications out there suffer from this performance problem – and the fix is really easy.

Transferring data in bulk vs. transferring it row by row

This blog post is inspired by a recent Stack Overflow question that was asking about how to call Oracle’s DBMS_OUTPUT.GET_LINES from JDBC. The answer to that specific question can also be seen in a previous blog post here.

This time, I don’t want to discuss that particular Oracle-specific technical problem, but the performance aspect of whether to call:

  • DBMS_OUTPUT.GET_LINES: Which allows for fetching a bulk of server output into an array
  • DBMS_OUTPUT.GET_LINE: Which fetches a single line of server output into a string

While I will continue to discuss this Oracle-specific functionality, this blog post is in no way strictly related to Oracle. It is generally applicable (and again, it should be common sense) for any database, and in fact, any client-server architecture, or even any distributed architecture. The solution to such performance problems will almost always be that transferring a bulk of data is better than transferring it row by row.

Let’s run a benchmark!

The full benchmark logic can be seen in this gist. It includes the boring parts of actually calling the GET_LINE[S] procedures. The beef of the benchmark is this:

int max = 50;
long[] getLines = new long[max];
long[] getLine = new long[max];

try (Connection c = DriverManager.getConnection(url, properties);
    Statement s = c.createStatement()) {

    for (int warmup = 0; warmup < 2; warmup++) {
        for (int i = 0; i < max; i++) {
            s.executeUpdate("begin dbms_output.enable(); end;");
            String sql =
                "begin "
              + "for i in 1 .. 100 loop "
              + "dbms_output.put_line('Message ' || i); "
              + "end loop; "
              + "end;";
            long t1 = System.nanoTime();
            logGetLines(c, 100, () -> s.executeUpdate(sql));
            long t2 = System.nanoTime();
            logGetLine(c, 100, () -> s.executeUpdate(sql));
            long t3 = System.nanoTime();
            s.executeUpdate("begin dbms_output.disable(); end;");

            if (warmup > 0) {
                getLines[i] = t2 - t1;
                getLine[i] = t3 - t2;


What does it do in prose?

  • It contains a warmup loop whose first iteration doesn’t contribute to our measurements (this is always a good idea)
  • It runs the benchmarked logic 50 times
  • It generates 100 DBMS_OUTPUT.PUT_LINE messages for each run in an anonymous PL/SQL loop …
  • … and then fetches those 100 messages immediately with either 1 call to GET_LINES or 100 calls to GET_LINE
  • Finally, all the stored execution times are aggregated and printed out conveniently with Java 8 Stream’s summary statistics feature

So, in both cases, we’re generating and fetching 5000 messages.

Both methods using GET_LINES and GET_LINE respectively are functionally equivalent, i.e. the only difference is performance. Again the full benchmark can be seen here (which also benchmarks the effect of JDBC fetchSize, which is none, in this case).

The results are devastating:

{count=50, sum=  69120455, min= 1067521, average= 1382409.100000, max= 2454614}
{count=50, sum=2088201423, min=33737827, average=41764028.460000, max=64498375}

We’re gaining a factor of 30x in this benchmark run on my machine. The actual results may vary depending on your hardware and software (e.g. I’m running Oracle 12cR2 in Docker), but regardless of the setup, the results are always very significant.

Note that caching the prepared statement in the client yields better results (see the gist), but nowhere near as good as moving the loop to the server

Does this mean that GET_LINE is slow?

When looking at benchmark results, we must always be very very careful not to draw the wrong conclusions. There can be many reasons why benchmarks show differences in performance. One simple (albeit improbable) reason could be that the GET_LINE implementation simply sucks.

So, I’ve tried to re-implement this benchmark in pure PL/SQL. The full benchmark can be seen here. The beef of it is this:

FOR r IN 1..5 LOOP
  FOR i IN 1..v_repeat LOOP
    v_i := v_max;
    dbms_output.get_lines(v_array, v_i);
  INSERT INTO results VALUES (1, (SYSTIMESTAMP - v_ts));
  FOR i IN 1..v_repeat LOOP
    FOR j IN 1 .. v_max LOOP
      dbms_output.get_line(v_string, v_i);
  INSERT INTO results VALUES (2, (SYSTIMESTAMP - v_ts));

Where m() is:

  FOR i IN 1 .. v_max LOOP 
    dbms_output.put_line('Message ' || i);
END m;

The results are now rather different:

stmt    sum     avg      min     max
1       0.0609  0.01218  0.0073  0.0303
2       0.0333  0.00666  0.0063  0.007

This time, calling GET_LINE individually seems to have been 2x faster than the GET_LINES version. Again, it is important not to draw the wrong conclusions! This could be due to:

  • GET_LINES allocating an additional array copy of the original lines, which resides in the PGA, might be costly
  • GET_LINE might have profited from some additional optimisation because we’re never actually consuming the result in the benchmark

But the one thing we can conclude with certainty is: There’s no problem in GET_LINE, so calling it is not inherently worse than calling GET_LINES.

Which brings us back to the JDBC calls

While guess work and hypotheses are usually dangerous, in this case I’m certain of the reason why the JDBC based approach shows such drastic differences. Just watch this excellent talk by Toon Koppelaars from Oracle “NoPLSql and Thick Database Approaches with Toon Koppelaars”, where he explains this with some impressive flame graphs:

The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:

  • JVM overhead
  • JDBC logic
  • Network overhead
  • Various “outer” layers inside the Oracle database
  • Oracle’s API layers to get into the SQL and PL/SQL execution engines
  • The actual code running in the PL/SQL engine

In Toon’s talk (which again, you should definitely watch), the examples are running SQL code, not PL/SQL code, but the results are the same. The actual logic is relatively cheap inside of the database (as we’ve seen in the PL/SQL only benchmark), but the overhead is significant when calling database logic from outside the database.

Thus: It is very important to minimise that overhead

There are two ways to minimise that overhead:

  • The super hard way: Change and / or tweak the API technology, e.g. in Oracle’s case, using the C/OCI bindings can be much faster than JDBC
  • The easy way: Just move some data collection logic into the database and fetch data in bulk

Let me repeat this one more time:

Fetch (or send) data in bulk

… it’s almost always faster than processing things row-by-row (or as the Oracle folks call it: “slow-by-slow”).

And it does not matter at all, if that database logic is written in SQL or in a procedural language. The point is that accessing objects over the network (any network) is expensive, so you should minimise the access calls if ever possible.

As I’ve tweeted recently:

When calling logic over the network (any network), we should move logic to the data, not data to the logic. When working with RDBMS, we’re doing this through SQL (preferrably) or if SQL doesn’t suffice, we resort to using stored procedures.

When working with HTTP, we’re doing this with – well, it doesn’t have a name, but we should prefer making few physical HTTP calls that aggregate several logical API calls in order to transfer a lot of data in bulk.

When working with “map reduce” or “serverless” etc technology, we’re calling this “functions” or “lambdas”, which are just fancy, more modern names for stored procedures.


I’m not an expert in how to design complex distributed systems. This stuff is really hard. But I’ve spent many years working with RDBMS, which are also, in a very simple way, distributed systems (data on one server, client logic on another one).

A very significant amount of performance problems with RDBMS is related to the simple fact of clients making way too many calls to the database for what could be implemented in a single SQL query. Within the database, once your logic has reached the kernel, stuff gets executed really really fast. Adding more logic to a query is going to cause far less trouble than adding more queries.

Does your application take into account these things? Especially, if you’re using an ORM that generates the SQL for you, does it generate the right amount of queries, or are you suffering from “N+1 problems”? The main reason why ORM-based systems tend to be slow is because developers are not aware of the SQL their ORMs generate, e.g. due to excessive lazy loading, when a single SQL (or JPQL) query would have been a much better choice. It’s not the ORM’s fault. Most ORMs can get this right.

It’s the developer’s responsibility to think about where the logic should be executed. The rule of thumb is:

If you’re looping in the client to fetch individual things from the same server, you’re doing it wrong. Move the loop into the server.

And by doing so, you’ve written your first (ghasp) “stored procedure”. You’ll write many more, and you’ll love it, once you realise how much speed you’re gaining.

Or, in other words:

Update: Some criticism from the reddit discussion of this article

/u/cogman10 made good points in his comment warning about batching “too big” workloads, which is perfectly correct when batching write heavy tasks. Large batches may increase the contention inside of your database. If you’re working in an MVCC environment (such as Oracle), having transactions that span millions of updates will put a lot of pressure on the UNDO/REDO log, and all other sessions reading from the same data will feel that pain.

Also, when working with HTTP, beware of the fact that batches are harder to cache than individual requests. This article made the assumption that HTTP requests are:

  • Authorised – i.e. caching doesn’t even make sense as the authorisation might be revoked or the session terminated
  • Operating on similar resources, e.g. fetching a bulk of IDs in one go might be more sensible than fetching each ID individuall

… of course, as always, don’t follow advice you find on the internet blindly :) This article illustrated a common mistake. The fix isn’t always as simple as illustrated here, but often it really is.

Impress Your Coworkers With the Incredible NATURAL FULL OUTER JOIN!

There are already only very few real-world use-cases for FULL [ OUTER ] JOIN, but maybe, you have run into this beast in the past. But when was the last time you’ve seen a NATURAL JOIN? Right.

A quick reminder from our article about JOINs:


A FULL JOIN is a type of OUTER JOIN that retains data from both sides of the JOIN operation, regardless if the JOIN predicate matches or not. For example, querying the Sakila database:

SELECT first_name, last_name, title
FROM actor
FULL JOIN film_actor USING (actor_id)
FULL JOIN film USING (film_id)

The result will look something like this:

first_name   last_name   title
--[ LEFT JOIN ] -------------------------
Jon          Doe                          <-- Actors that didn't play in any film
Claire       Miller                      
--[ INNER JOIN ] ------------------------
Samuel       Jackson     Django Unchained <-- Actors played in many films
Samuel       Jackson     Pulp Fiction
John         Travolta    Pulp Fiction     <-- Films featured several actors
--[ RIGHT JOIN ]-------------------------
                         Meh              <-- Films that don't have actors

Note the special “markers” that delimit the parts that would have also been contributed by LEFT JOIN, INNER JOIN, RIGHT JOIN. In a way, a FULL JOIN combines them all.

Handy, occasionally


This one is less obviously useful. It looks useful at first, because if we carefully design our schemas as in the Sakila database used above, then we can simplify our joins as such:

SELECT first_name, last_name, title
FROM actor
NATURAL JOIN film_actor

Which is just syntax sugar for this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (<all common columns>)
JOIN film USING (<all common columns>)

So, a NATURAL JOIN will join two tables using all the columns they have in common. You’d think that this would be the useful outcome:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

Unfortunately, the outcome looks more like this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id, last_update)
JOIN film USING (film_id, last_update)

In other words: It makes no sense at all. In real world situations (and the Sakila database is already close enough to the real world), we’ll always have accidentally conflicting column names that will be considered for NATURAL JOIN, when they hsould not be.


… you’re asking. One is very rarely needed, and the other is pretty useless in the real world. Not so fast!

I recently helped a customer who had two very similar tables that both contained payments. Let’s call them: PAYMENT and STANDING_ORDER, the latter being a PAYMENT template which generates actual PAYMENTs periodically.

Now, there was some PL/SQL logic that operated on payments, which after some change request, needed to operate both on payments and/or on standing orders. If this was Java, we’d just (ab)use subtype polymorphism for the quick win, or factor out common logic into common component types for the thorough solution. For instance, these kinds of columns certainly appear in both cases:

“System” columns

  • ID – The surrogate key
  • CREATED_AT – Audit info
  • MODIFIED_AT – Audit info
  • DELETED – Soft deletion

“Business” columns


There are many more common columns. There are other columns that are specific to one or the other type:

Payment only

  • STANDING_ORDER_ID – Only payments reference the standing order that may have created them.

Standing order only

  • PERIODICITY – This is a standing order’s core feature.

Now, in order to refactor that PL/SQL logic, I needed a row type that combines all these columns easily. I didn’t want to manually track the exact columns of both tables, I wanted this to always work, even if the maintainers of the tables add/remove/rename columns.

NATURAL FULL JOIN to the rescue

I created a view like this:

CREATE VIEW like_a_payment
FROM payment
NATURAL FULL JOIN standing_order
WHERE 1 = 0

This view itself doesn’t do anything. It just sits there and produces the row type I need. Why? Because the NATURAL JOIN will join by ALL the common columns, namely ID, CREATED_AT, MODIFIED_AT, DELETED, AMOUNT, CURRENCY and then adds the columns from the left table, namely STANDING_ORDER_ID, and then from the right table, namely PERIODICITY.

I can now use this type in PL/SQL:

  paym like_a_payment%ROWTYPE;
  -- Do things...

But why FULL?

True, I didn’t need FULL yet, because I’m not producing any rows anyway. Correct. But observe how I will populate the PAYM local variable in the PL/SQL block:

  l_paym    like_a_payment%ROWTYPE;
  l_paym_id := 1;

  INTO l_paym
  FROM (
    SELECT * FROM payment WHERE id = l_paym_id
    SELECT * FROM standing_order WHERE id = l_paym_id

And I’m done! This could not have been done any more easily!

  • I’m using NATURAL because now I have to use it all the time to populate the record (or select individual columns, but meh)
  • I’m using FULL because only one of the two queries will produce a result, and the remaining columns will be NULL


Some observations

Of course, this is kind of a hack. Please don’t abuse this too often. Here’s some hints to follow:

  • The refactoring would have been substantial. I needed a quick win.
  • The logic actually needs almost all the columns from the payments. Otherwise, you might want to review that SELECT * usage.
  • The derived tables are necessary in Oracle for performance reasons.
  • The actual code contained two distinct queries, each one joining “one side” with an empty substitute query, as we know that the ID is either a payment or a standing order.

Having said so, now you know one of the very few real world use-cases of NATURAL FULL JOIN. Put this in your code base and impress your coworkers!

Use jOOQ to Read / Write Oracle PL/SQL RECORD Types

Some of the biggest limitations when working with Oracle PL/SQL from Java is the lack of support for a variety of PL/SQL features through the JDBC interface. This lack of support is actually not limited to JDBC, but also extends to Oracle SQL. For instance, if you’re using the useful PL/SQL BOOLEAN type as such:

  RETURN true;
END yes;

It would now be terrific if you could do this:

SELECT yes FROM dual;

But it’s not possible. You’ll be getting an error along the lines of the following:

ORA-06552: PL/SQL: Statement ignored
ORA-06553: PLS-382: expression is of wrong type
06552. 00000 -  "PL/SQL: %s"

It’s crazy to think that the Oracle SQL language still doesn’t support the SQL standard boolean type, which is so useful as I’ve shown in previous blog posts. Here’s where you can upvote the feature request:

BOOLEAN isn’t the only “inaccessible” SQL feature

Now, there are a couple of other data types, which cannot be “bridged” to the SQL engine, and thus (for some reason only the OJDBC driver gods can fathom) cannot be “bridged” to a JDBC client. Among them: The very useful PL/SQL RECORD type.

Very often, you want to do this:

  TYPE person IS RECORD (
    first_name VARCHAR2(50),
    last_name VARCHAR2(50)
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person;
END customers;

  FUNCTION get_customer(p_customer_id NUMBER) RETURN person IS
    v_person customers.person;
    SELECT c.first_name, c.last_name
    INTO v_person
    FROM customer c
    WHERE c.customer_id = p_customer_id;
    RETURN v_person;
  END get_customer;
END customers;

(we’re running this on the SAKILA database).

As in any language with the least bit of sophistication, we can define “structs” or records in PL/SQL, which we can now frequently reuse. Everyone knows what a PERSON is and we can pass them around between procedures and functions.

For instance, in PL/SQL client code:

  v_person customers.person;
  v_person := customers.get_customer(1);

… which yields:


What about JDBC client code?

After having added support for PL/SQL BOOLEAN types in jOOQ 3.9, with jOOQ 3.9, we now finally support PL/SQL record types in stored procedures as well, at least in standalone calls, which are not embedded in SQL statements. The jOOQ code generator will pick up all of these package-level PL/SQL record types and their structures and generate the boring boiler plate code for you. E.g. (simplified):

package org.jooq.demo.sakila.packages.customers.records;

public class PersonRecord extends UDTRecordImpl<PersonRecord> {
    public void   setFirstName(String value) { ... }
    public String getFirstName()             { ... }
    public void   setLastName(String value)  { ... }
    public String getLastName()              { ... }

    public PersonRecord() { ... }
    public PersonRecord(String firstName, String lastName) { ... }

Notice how jOOQ doesn’t really make any difference in its API between the generated code for an Oracle SQL OBJECT type or an Oracle PL/SQL RECORD type. They’re essentially the same thing (from a jOOQ API perspective).

More interesting, what happened to the generated package and the function? This code is generated (simplified):

public class Customers extends PackageImpl {
    public static PersonRecord getCustomer(
        Configuration configuration, Long pCustomerId
    ) { ... }

That’s all! So all we now need to do is pass the ubiquitous jOOQ Configuration (which contains information such as SQLDialect or JDBC Connection) and the actual stored function parameter, the P_CUSTOMER_ID value, and we’re done!

This is how jOOQ client code might look:

PersonRecord person = Customers.getCustomer(configuration(), 1L);

As you can see, this is just the same thing as the corresponding PL/SQL code. And the output of this println call is this:


A fully qualified RECORD declaration with schema, package, and type name.

How does it work?

Let’s turn on jOOQ’s built-in TRACE logging to see what jOOQ did behind the scenes:

Calling routine          : 
    ? := r1."FIRST_NAME";
    ? := r1."LAST_NAME";
Binding variable 1       : 1 (class java.lang.Long)
Registering variable 2   : class java.lang.String
Registering variable 3   : class java.lang.String
Fetched OUT parameters   : +-----------------+
                         : |RETURN_VALUE     |
                         : +-----------------+
                         : |('MARY', 'SMITH')|
                         : +-----------------+

So, jOOQ usually doesn’t use JDBC’s very limited escape syntax to call stored procedures, it just produces an anonymous PL/SQL block with a local variable of type CUSTOMER.PERSON, i.e. of our RECORD type. The function call is then assigned to this local variable, and the local variable is descructured into its individual parts.

In the TRACE log, you can see the individual bind variables, i.e. there’s an IN variable at index 1 of type Long, and two OUT variables of type String at indexes 2 and 3.

How does jOOQ know the record types?

At runtime, all the information is hard-wired to the generated code. So, the magic is inside of the code generator. Warning: Some serious SQL ahead!

This beauty here queries the dictionary views for PL/SQL record types:

AllArguments a ="a");
AllArguments x ="x");
Field<BigDecimal> nextSibling = field(name("next_sibling"), x.SEQUENCE.getDataType());

       a.ARGUMENT_NAME                                               .as(ALL_TYPE_ATTRS.ATTR_NAME),
       a.SEQUENCE                                                    .as(ALL_TYPE_ATTRS.ATTR_NO),
       a.TYPE_OWNER                                                  .as(ALL_TYPE_ATTRS.ATTR_TYPE_OWNER),
       nvl2(a.TYPE_SUBNAME, a.TYPE_NAME, inline(null, a.TYPE_NAME))  .as("package_name"),
       coalesce(a.TYPE_SUBNAME, a.TYPE_NAME, a.DATA_TYPE)            .as(ALL_TYPE_ATTRS.ATTR_TYPE_NAME),
       a.DATA_LENGTH                                                 .as(ALL_TYPE_ATTRS.LENGTH),
       a.DATA_PRECISION                                              .as(ALL_TYPE_ATTRS.PRECISION),
       a.DATA_SCALE                                                  .as(ALL_TYPE_ATTRS.SCALE))
           min(a.OWNER        ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.OWNER),
           min(a.PACKAGE_NAME ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.PACKAGE_NAME),
           min(a.SUBPROGRAM_ID).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SUBPROGRAM_ID),
           min(a.SEQUENCE     ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SEQUENCE),
           min(nextSibling    ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(nextSibling),
           min(a.DATA_LEVEL   ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.DATA_LEVEL))
              lead(a.SEQUENCE, 1, inline(new BigDecimal(99999999))).over(
                  partitionBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.DATA_LEVEL)
      .and(a.DATA_TYPE.eq("PL/SQL RECORD"))

This is a nice little jOOQ query, which corresponds to the following equially impressive SQL query, which you can run directly in your SQL developer, or some other SQL client for Oracle:

  nvl2("a"."TYPE_SUBNAME", "a"."TYPE_NAME", NULL) "package_name",
    MIN("next_sibling") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "next_sibling",
  FROM (
	  lead("a"."SEQUENCE", 1, 99999999) OVER (
	  ) "next_sibling",
    WHERE "a"."OWNER" IN ('SAKILA', 'SYS')     -- Possibly replace schema here
    ) "a"
  WHERE ("a"."TYPE_OWNER" IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."OWNER"         IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."DATA_TYPE"      = 'PL/SQL RECORD')
  ) "x"
 = (("x"."OWNER", "x"."PACKAGE_NAME", "x"."SUBPROGRAM_ID"))
AND "a"."SEQUENCE" BETWEEN "x"."SEQUENCE" AND "next_sibling"
AND "a"."DATA_LEVEL" = ("x"."DATA_LEVEL" + 1))
  "x"."TYPE_NAME" ASC,


The output shows that we got all the required information for our RECORD type:


All of this also works for:

  • Nested types
  • Multiple IN and OUT parameters

I’ll blog about a more advanced use-case in the near future, so stay tuned.

A Hidden jOOQ Gem: Foreach Loop Over ResultQuery

A recent question on Stack Overflow about jOOQ caught my attention. The question essentially asked:

Why do both of these loops work?

// With fetch()
for (MyTableRecord rec : DSL
    .fetch()) { // fetch() here


// Without fetch()
for (MyTableRecord rec : DSL
    .orderBy(MY_TABLE.COLUMN)) { // No fetch() here


And indeed, just like in PL/SQL, you can use any jOOQ ResultQuery as a Java 5 Iterable, because that’s what it is. An Iterable<R> where R extends Record.

The semantics is simple. When Iterable.iterator() is invoked, the query is executed and the Result.iterator() is returned. So, the result is materialised in the client memory just like if I called fetch(). Unsurprisingly, this is the implementation of AbstractResultQuery.iterator():

public final Iterator<R> iterator() {
    return fetch().iterator();

No magic. But it’s great that this works like PL/SQL:

FOR rec IN (SELECT * FROM my_table ORDER BY my_table.column)

Note, unfortunately, there’s no easy way to manage resources through Iterable, i.e. there’s no AutoCloseableIterable returning an AutoCloseableIterator, which could be used in an auto-closing try-with-resources style loop. This is why the entire result set needs to be fetched at the beginning of the loop. For lazy fetching, you can still use ResultQuery.fetchLazy()

try (Cursor<MyTableRecord> cursor = DSL
    .fetchLazy()) {

    for (MyTableRecord rec : cursor)

Happy coding!

jOOQ Tuesdays: Chris Saxon Explains the 3 Things Every Developer Should Know About SQL

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month (today, exceptionally on a Wednesday because of technical issues) where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Chris Saxon who has worked with Oracle forever, and who is one of the brains behind the famous Ask Tom website.

Chris, you’re part of the famous Ask Tom team. Everyone working with Oracle has wound up on Ask Tom’s website at least once. You’ve answered an incredible amount of questions already. What’s it like to work for such a big community as Oracle’s?

It’s an amazing experience! My first real job was as a PL/SQL developer. My only knowledge of SQL was a couple of vaguely remembered lectures at university. Ask Tom was the main place I learned about SQL and Oracle Database. So it’s a huge honor to be on the other side, helping others get the best out of the technology.

The best part has to be the positive comments when you help someone solve a problem that’s been troubling them for days. That’s why we’re here. To help developers learn more about Oracle and improve their SQL skills. When you use the database to its full extent, you can write better, faster applications with less code!

What were your three most interesting questions, so far?

Any question that has a clear definition and a complete test case is interesting! ;) Personally I enjoy using SQL to solve complex problems the best. So the first two do just that:

1. Finding the previous row in a different group

The poster had a series of transactions. These were grouped into two types. For each row, they wanted to show the id of the previous transaction from the other group.

At first this sounds like a problem you can solve using LAG or LEAD. But these only search for values within the same group. So you need a different method.

I provided a solution using the model clause. Using this, you can generate columns based on complex, spreadsheet-like formulas. Rows in your table are effectively cells in the sheet. You identify them by defining dimensions which can be other columns or expressions. By setting the transaction type as a dimension, you can then easily reference – and assign – values from one type to the other.

This worked well. But commenters were quick to provide solutions using window functions and 12c’s match_recognize clause. Both of which were faster than my approach!

I like this because it shows the flexibility of SQL. And it shows the value of an engaged community. No one knows everything. By sharing our knowledge and workin together we can all become better developers.

2. Improving SQL that deliberately generates cartesian product

The poster had a set of abbreviations for words. For example, Saint could also be “St.” or “St”. They wanted to take text containing these words. Then generate all combinations of strings using these abbreviations.

The “obvious” solution the user had is to split the text into words. Then for each word, join the abbreviation table, replacing the string as needed. So for a five word string, you have five joins.

There are a couple of problems with this method. The number of joins limits the number of words. So if you have a string with seven words, but only six table joins you won’t abbreviate the final word.

The other issue is performance. Joining the same table N times increases the work you do. If you have long sentences and/or a large number of abbreviations, the query could take a long time to run.

To overcome these you need to ask: “how can I join to the abbreviation table just once?”

The solution to do this starts the same as the original. Split the sentence into a table of words. Then join this to the abbreviations to give a row for each replacement needed.

You can then recursively walk down these rows using CTEs. These build up the sentence again, replacing words with their abbreviations as needed. A scalable solution that only needs a single pass of each table!

The final question relates to performance. Tom Kyte’s mantra was always “if you can do it in SQL, do it in SQL”. The reason is because a pure SQL solution is normally faster than one which combines SQL and other code. Yet a question came in that cast doubt on this:

3. Difference in performance SQL vs PL/SQL

The poster was updating a table. The new values came from another table. He was surprised that PL/SQL using bulk processing came out faster than the pure SQL approach.

The query in question was in the form:

update table1
set col1 = (select col2 from table2 where t1.code = t2.code);

It turned out the reason was due to “missing” indexes. Oracle executes the subquery once for every row in table1. Unless there’s an index on table2 (code), this will full scan table2 once for every row in table1!

The PL/SQL only approach avoided this problem by reading the whole of table2 into an array. So there was only one full scan of table2.

The problem here is there was no index on the join condition (t1.code = t2.code). With this in place Oracle does an index lookup of table2 for each row in table1. A massive performance improvement!

The moral being if your SQL is “slow”, particularly in compared to a combined SQL + other language method, it’s likely you have a missing index (or two).

This question again showed the strength and value of the Oracle community. Shortly after I posted the explanation, a reviewer was quick to point out the following SQL solution:

merge into table1
using  table2
on   (t1.code = t2.code)
when matched
  then update set t1.col = t2.col;

This came out significantly faster than both the original update and PL/SQL – without needing any extra indexes!

You’re running a YouTube channel called “The Magic of SQL”. Are SQL developers magicians?

Of course they are! In fact, I’d say that all developers are magicians. As Arthur C Clarke said:

“Any sufficiently advanced technology is indistinguishable from magic”

The amount of computing power you carry around in your phone today is mind blowing. Just ask your grandparents!

I think SQL developers have a special kind of magic though :). The ability to answer hard questions with a few lines of SQL is amazing. And for it to adapt to changes in the underlying data to give great performance without you changing it is astounding.

Your Twitter account has a pinned tweet about window functions. I frequently talk to Java developers at conferences, and few of them know about window functions, even if they’ve been in databases like Oracle for a very long time. Why do you think they’re still so “obscure”?

Oracle Database has had window functions has had them since the nineties. But many other RDBMSes have only fully supported them recently. So a combination of writing “database independent” code and people using other databases is certainly a factor.

Use of tools which hide SQL from developers is also a problem. If you’re not actively using SQL, it’s easy to overlook many of its features.

Fortunately I think this is changing. As more and more developers are realizing, SQL is a powerful language. Learning how to use it effectively is a key skill for all developers. Window functions and other SQL features mean you can get write better performing applications with less code. Who doesn’t want that? ;)

What are three things that every developer should know about SQL?

1. Understand set based processing

If you find yourself writing a cursor loop (select … from … loop), and inside that loop you run more SQL, you’re doing it wrong.

Think about it. Do you eat your cornflakes by placing one flake in your bowl, adding the milk, and eating that one piece? Then doing the same for the next. And the next. And so on? Or do you pour yourself a big bowl and eat all the flakes at once?

If you have a cursor loop with more SQL within the loop, you’re effectively doing this. There’s a lot of overhead in executing each SQL statement. This will slow you down if you have a large number of statements that each process one row. Instead you want few statements that process lots of rows where possible.

It’s also possible to do this by accident. As developers we’re taught that code reuse is A Good Thing. So if there’s an API available we’ll often use it. For example, say you’re building a batch process. This finds the unshipped orders, places them on shipments and marks them as sent.

If a ship_order function exists, you could write something like:

select order_id from unshipped_orders loop
  ship_order ( order_id );
end loop;

The problem here is ship_order almost certainly contains SQL. SQL you’ll be executing once for every order awaiting postage. If it’s only a few this may be fine. But if there’s hundreds or thousands this process could take a long time to run.

The way to make this faster is to process all the orders in one go. You can do this with SQL like:

insert into shipments
  select … from unshipped_orders;

update unshipped_orders
set  shipment_date = sysdate;

You may counter there’s other, non-SQL, processing you need to do such as sending emails. So you still need a query to find the order ids.

But you can overcome this! With update’s returning clause, you can get values from all the changed rows:

update unshipped_orders
set  shipment_date = sysdate
returning order_id bulk collect into order_array;

This gives you all the order ids to use as you need.

2. Learn what an execution plan is and how to create and read one

“How can I make my SQL faster” is one of the most common classes of questions posted on Ask Tom. The trouble is there’s scant one-size-fits-all advice when it comes to SQL performance. To help we need to know what your query is, what the tables and indexes are and details about the data. But most importantly we need to know what the query is actually doing!

For example, say you want me to help you figure out a faster route to work. To do this I need to know which route you currently use and how long each part of it takes. We can then compare this against other routes, checking how far they are, expected traffic and predicted speeds. But we need the data to do this!

So when answering performance questions, the first thing we look for is an execution plan. Many confuse this with an explain plan. An explain plan is just a prediction. Often it’s wrong. And even when it’s right, we still don’t know how much work each step does.

An execution plan shows exactly what the database did. It also gives stats about how much work, how often and how long it took to process each step. Combine this with a basic understanding of indexes and join methods and you can often solve your own performance problems.

3. Use bind variables

Sadly data breaches are all too common. There hardly seems to be a week that goes by without news of a major company leaking sensitive data. And the root cause of these attacks is often SQL injection.

This is a simple, well known attack vector. If you write vulnerable SQL on a web enabled application, eventually you’ll be attacked.

And this isn’t something you can avoid by using NoSQL databases. SQL injection like attacks are possible there too!

Fortunately the solution is easy: use bind variables. Not only do these secure your application, they can improve performance too.

Make sure your company is not tomorrow’s data leak headline. Use bind variables!

Last but not least: When will Oracle have a BOOLEAN type? :)

We have a BOOLEAN type! It’s just only in PL/SQL ;P

There’s currently a push in the community to for us to add a SQL BOOLEAN type. If this is a feature you’d like to see, you can vote for it on the Database Ideas forum. The more support there is, the more likely we are to implement it! But no promises ;)

Why I Completely Forgot That Programming Languages Have While Loops

I’ve recently made an embarassing discovery:

Yes. In all of my professional work with PL/SQL (and that has been quite a bit, in the banking industry), I have never really used a WHILE loop – at least not that I recall. The idea of a WHILE loop is simple, and it is available in most languages, like PL/SQL:

WHILE condition

Or Java:

while (condition)

So, why have I simply never used it?

Most loops iterate on collections

In hindsight, it’s crazy to think that it took Java until version 5 to introduce the foreach loop:

String[] strings = { "a", "b", "c" };
for (String s : strings)

It is some of Java’s most useful syntax sugar for the equivalent loop that uses a local loop variable that we simply don’t care about:

String[] strings = { "a", "b", "c" };
for (int i = 0; i < strings.length; i++)

Let’s be honest. When do we really want this loop variable? Hardly ever. Sometimes, we need to access the next string along with the current one, and we want to stick to the imperative paradigm for some reason (when we could do it more easily with functional programming APIs). But that’s it. Most loops simply iterate the entire collection in a very dumb and straightforward way.

SQL is all about collections

In SQL, everything is a table (see SQL trick #1 in this article), just like in relational algebra, everything is a set.

Now, PL/SQL is a useful procedural language that “builds around” the SQL language in the Oracle database. Some of the main reasons to do things in PL/SQL (rather than e.g. in Java) are:

  • Performance (the most important reason), e.g. when doing ETL or reporting
  • Logic needs to be “hidden” in the database (e.g. for security reasons)
  • Logic needs to be reused among different systems that all access the database

Much like Java’s foreach loop, PL/SQL has the ability to define implicit cursors (as opposed to explicit ones)

As a PL/SQL developer, when I want to loop over a cursor, I have at least these options:

Explicit cursors

  -- The loop variable
  row all_objects%ROWTYPE;

  -- The cursor specification
  CURSOR c IS SELECT * FROM all_objects;
  OPEN  c;
    FETCH c INTO row;
  CLOSE c;

The above would correspond to the following boring Java code that we wrote time and again prior to Java 5 (in fact, without the generics):

Iterator<Row> c = ... // SELECT * FROM all_objects
while (c.hasNext()) {
    Row row =;

The while loop is absolutely boring to write. Just like with the loop variable, we really don’t care about the current state of the iterator. We want to iterate over the whole collection, and at each iteration, we don’t care where we’re currently at.

Note that in PL/SQL, it is common practice to use an infinite loop syntax and break out of the loop when the cursor is exhausted (see above). In Java, this would be the corresponding logic, which is even worse to write:

Iterator<Row> c = ... // SELECT * FROM all_objects
for (;;) {
    if (!c.hasNext())
    Row row =;

Implicit cursors

Here’s how many PL/SQL developers do things most of the time:

  FOR row IN (SELECT * FROM all_objects)

The cursor is really an Iterable in terms of Java collections. An Iterable is a specification of what collection (Iterator) will be produced when the control flow reaches the loop. I.e. a lazy collection.

It’s very natural to implement external iteration in the above way.

If you’re using jOOQ to write SQL in Java (and you should), you can apply the same pattern in Java as well, as jOOQ’s ResultQuery type extends Iterable, which means it can be used as an Iterator source in Java’s foreach loop:

for (AllObjectsRecord row : ctx.selectFrom(ALL_OBJECTS))

Yes, that’s it! Focus on the business logic only, which is the collection specification (the query) and what you do with each row (the println statement). None of that cursor noise!

OK, but why no WHILE?

If you love SQL as much as me, you probably do that because you like the idea of having a declarative programming language to declare sets of data, just like SQL does. If you write client code in PL/SQL or Java, you will thus like to continue working on the entire data set and continue thinking in terms of sets. The imperative programming paradigm that operates on the intermediate object, the cursor, is not your paradigm. You don’t care about the cursor. You don’t want to manage it, you don’t want to open / close it. You don’t want to keep track of its state.

Thus, you will choose the implicit cursor loop, or the foreach loop in Java (or some Java 8 Stream functionality).

As you do that more often, you will run into less and less situations where the WHILE loop is useful. Until you forget about its mere existence.

WHILE LOOP, you won’t be missed.