Impress Your Coworkers With the Incredible NATURAL FULL OUTER JOIN!

There are only very few real-world use-cases for FULL [ OUTER ] JOIN to begin with, but maybe you have run into this beast in the past. But when was the last time you saw a NATURAL JOIN? Right.

A quick reminder from our article about JOINs:

FULL JOIN

A FULL JOIN is a type of OUTER JOIN that retains data from both sides of the JOIN operation, regardless of whether the JOIN predicate matches. For example, querying the Sakila database:

SELECT first_name, last_name, title
FROM actor
FULL JOIN film_actor USING (actor_id)
FULL JOIN film USING (film_id)

The result will look something like this:

first_name   last_name   title
--[ LEFT JOIN ] -------------------------
Jon          Doe                          <-- Actors that didn't play in any film
Claire       Miller                      
--[ INNER JOIN ] ------------------------
Samuel       Jackson     Django Unchained <-- Actors played in many films
Samuel       Jackson     Pulp Fiction
John         Travolta    Pulp Fiction     <-- Films featured several actors
--[ RIGHT JOIN ]-------------------------
                         Meh              <-- Films that don't have actors

Note the special “markers” that delimit the parts that would have also been contributed by LEFT JOIN, INNER JOIN, RIGHT JOIN. In a way, a FULL JOIN combines them all.
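
If your database doesn’t support FULL JOIN natively (e.g. MySQL), the same result can be emulated by combining a LEFT JOIN with the right-hand rows that it missed. A minimal sketch for a single join step, assuming the Sakila schema used above:

-- Left part: all actors, with their films (if any)
SELECT a.actor_id, a.first_name, a.last_name, fa.film_id
FROM actor a
LEFT JOIN film_actor fa ON fa.actor_id = a.actor_id
UNION ALL
-- Right part: only those film_actor rows that have no matching actor
SELECT fa.actor_id, NULL, NULL, fa.film_id
FROM film_actor fa
WHERE NOT EXISTS (
  SELECT 1 FROM actor a WHERE a.actor_id = fa.actor_id
)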

Handy, occasionally

NATURAL JOIN

This one is less obviously useful. It looks useful at first, because if we carefully design our schemas as in the Sakila database used above, then we can simplify our joins as such:

SELECT first_name, last_name, title
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film

Which is just syntax sugar for this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (<all common columns>)
JOIN film USING (<all common columns>)

So, a NATURAL JOIN will join two tables using all the columns they have in common. You’d think that this would be the useful outcome:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

Unfortunately, the outcome looks more like this:

SELECT first_name, last_name, title
FROM actor
JOIN film_actor USING (actor_id, last_update)
JOIN film USING (film_id, last_update)

In other words: It makes no sense at all. In real-world situations (and the Sakila database is already close enough to the real world), we’ll always have accidentally conflicting column names that will be considered for NATURAL JOIN when they should not be.
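
To see which columns two tables have in common (and thus what a NATURAL JOIN would silently join by), you could, for instance, query the Oracle dictionary:

SELECT column_name
FROM user_tab_columns
WHERE table_name = 'ACTOR'
INTERSECT
SELECT column_name
FROM user_tab_columns
WHERE table_name = 'FILM_ACTOR'

On the Sakila schema, this lists ACTOR_ID along with the accidental LAST_UPDATE column.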

OK, so why NATURAL FULL JOIN?

… you’re asking. One is very rarely needed, and the other is pretty useless in the real world. Not so fast!

I recently helped a customer who had two very similar tables that both contained payments. Let’s call them: PAYMENT and STANDING_ORDER, the latter being a PAYMENT template which generates actual PAYMENTs periodically.

Now, there was some PL/SQL logic that operated on payments, which, after a change request, needed to operate on payments and/or on standing orders. If this were Java, we’d just (ab)use subtype polymorphism for the quick win, or factor out common logic into common component types for the thorough solution. For instance, these kinds of columns certainly appear in both cases:

“System” columns

  • ID – The surrogate key
  • CREATED_AT – Audit info
  • MODIFIED_AT – Audit info
  • DELETED – Soft deletion

“Business” columns

  • AMOUNT
  • CURRENCY

There are many more common columns. There are also columns that are specific to one type or the other:

Payment only

  • STANDING_ORDER_ID – Only payments reference the standing order that may have created them.

Standing order only

  • PERIODICITY – This is a standing order’s core feature.

Now, in order to refactor that PL/SQL logic, I needed a row type that combines all these columns easily. I didn’t want to manually track the exact columns of both tables; I wanted this to always work, even if the maintainers of the tables add/remove/rename columns.

NATURAL FULL JOIN to the rescue

I created a view like this:

CREATE VIEW like_a_payment AS
SELECT *
FROM payment
NATURAL FULL JOIN standing_order
WHERE 1 = 0

This view itself doesn’t do anything. It just sits there and produces the row type I need. Why? Because the NATURAL FULL JOIN joins by ALL the common columns, namely ID, CREATED_AT, MODIFIED_AT, DELETED, AMOUNT, CURRENCY, and then adds the remaining columns from the left table (STANDING_ORDER_ID) and from the right table (PERIODICITY).
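
If you want to double-check which columns the view ended up with, you could query the dictionary, for instance:

SELECT column_name, data_type
FROM user_tab_columns
WHERE table_name = 'LIKE_A_PAYMENT'
ORDER BY column_id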

I can now use this type in PL/SQL:

DECLARE
  paym like_a_payment%ROWTYPE;
BEGIN
  -- Do things...
END;

But why FULL?

True, I didn’t need FULL yet, because the view isn’t producing any rows anyway. But observe how I will populate the L_PAYM local variable in the following PL/SQL block:

DECLARE
  l_paym_id like_a_payment.id%TYPE;
  l_paym    like_a_payment%ROWTYPE;
BEGIN
  l_paym_id := 1;

  SELECT *
  INTO l_paym
  FROM (
    SELECT * FROM payment WHERE id = l_paym_id
  )
  NATURAL FULL JOIN (
    SELECT * FROM standing_order WHERE id = l_paym_id
  );
END;

And I’m done! This could not have been done any more easily!

  • I’m using NATURAL because now I have to use it all the time to populate the record (or select individual columns, but meh)
  • I’m using FULL because only one of the two queries will produce a result, and the remaining columns will be NULL

Excellent!

Some observations

Of course, this is kind of a hack. Please don’t abuse it too often. Here are some hints to follow:

  • The refactoring would have been substantial. I needed a quick win.
  • The logic actually needs almost all the columns from the payments. Otherwise, you might want to review that SELECT * usage.
  • The derived tables are necessary in Oracle for performance reasons.
  • The actual code contained two distinct queries, each one joining “one side” with an empty substitute query (sketched below), as we know whether a given ID identifies a payment or a standing order.
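
For illustration, such a “one side” query might have looked along these lines (a hypothetical sketch, keeping an empty substitute on the STANDING_ORDER side):

SELECT *
INTO l_paym
FROM (SELECT * FROM payment WHERE id = l_paym_id)
NATURAL FULL JOIN (SELECT * FROM standing_order WHERE 1 = 0);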

Having said so, now you know one of the very few real world use-cases of NATURAL FULL JOIN. Put this in your code base and impress your coworkers!

Use jOOQ to Read / Write Oracle PL/SQL RECORD Types

One of the biggest limitations when working with Oracle PL/SQL from Java is the lack of support for a variety of PL/SQL features through the JDBC interface. This lack of support is actually not limited to JDBC, but also extends to Oracle SQL. For instance, if you’re using the useful PL/SQL BOOLEAN type like this:

CREATE OR REPLACE FUNCTION yes RETURN boolean AS
BEGIN
  RETURN true;
END yes;
/

It would now be terrific if you could do this:

SELECT yes FROM dual;

But it’s not possible. You’ll be getting an error along the lines of the following:

ORA-06552: PL/SQL: Statement ignored
ORA-06553: PLS-382: expression is of wrong type
06552. 00000 -  "PL/SQL: %s"
*Cause:    
*Action:

It’s crazy to think that the Oracle SQL language still doesn’t support the SQL standard boolean type, which is so useful as I’ve shown in previous blog posts. Here’s where you can upvote the feature request:

https://community.oracle.com/ideas/2633

BOOLEAN isn’t the only “inaccessible” SQL feature

Now, there are a couple of other data types, which cannot be “bridged” to the SQL engine, and thus (for some reason only the OJDBC driver gods can fathom) cannot be “bridged” to a JDBC client. Among them: The very useful PL/SQL RECORD type.

Very often, you want to do this:

CREATE PACKAGE customers AS
  TYPE person IS RECORD (
    first_name VARCHAR2(50),
    last_name VARCHAR2(50)
  );
  
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person;
END customers;
/

CREATE PACKAGE BODY customers AS
  FUNCTION get_customer(p_customer_id NUMBER) RETURN person IS
    v_person customers.person;
  BEGIN
    SELECT c.first_name, c.last_name
    INTO v_person
    FROM customer c
    WHERE c.customer_id = p_customer_id;
    
    RETURN v_person;
  END get_customer;
END customers;
/

(we’re running this on the SAKILA database).

As in any language with the least bit of sophistication, we can define “structs” or records in PL/SQL, which we can now frequently reuse. Everyone knows what a PERSON is and we can pass them around between procedures and functions.

For instance, in PL/SQL client code:

SET SERVEROUTPUT ON
DECLARE
  v_person customers.person;
BEGIN
  v_person := customers.get_customer(1);
  
  dbms_output.put_line(v_person.first_name);
  dbms_output.put_line(v_person.last_name);
END;
/

… which yields:

MARY
SMITH

What about JDBC client code?

After having added support for PL/SQL BOOLEAN types in jOOQ 3.9, we now finally support PL/SQL RECORD types in stored procedures as well, at least in standalone calls, which are not embedded in SQL statements. The jOOQ code generator will pick up all of these package-level PL/SQL RECORD types and their structures and generate the boring boilerplate code for you. E.g. (simplified):

package org.jooq.demo.sakila.packages.customers.records;

public class PersonRecord extends UDTRecordImpl<PersonRecord> {
    public void   setFirstName(String value) { ... }
    public String getFirstName()             { ... }
    public void   setLastName(String value)  { ... }
    public String getLastName()              { ... }

    public PersonRecord() { ... }
    public PersonRecord(String firstName, String lastName) { ... }
}

Notice how jOOQ doesn’t really make any distinction in its API between the generated code for an Oracle SQL OBJECT type and an Oracle PL/SQL RECORD type. They’re essentially the same thing (from a jOOQ API perspective).

More interestingly, what happened to the generated package and the function? This code is generated (simplified):

public class Customers extends PackageImpl {
    public static PersonRecord getCustomer(
        Configuration configuration, Long pCustomerId
    ) { ... }
}

That’s all! So all we now need to do is pass the ubiquitous jOOQ Configuration (which contains information such as SQLDialect or JDBC Connection) and the actual stored function parameter, the P_CUSTOMER_ID value, and we’re done!

This is how jOOQ client code might look:

PersonRecord person = Customers.getCustomer(configuration(), 1L);
System.out.println(person);

As you can see, this is just the same thing as the corresponding PL/SQL code. And the output of this println call is this:

SAKILA.CUSTOMERS.PERSON('MARY', 'SMITH')

A fully qualified RECORD declaration with schema, package, and type name.

How does it work?

Let’s turn on jOOQ’s built-in TRACE logging to see what jOOQ did behind the scenes:

Calling routine          : 
  DECLARE
    r1 "CUSTOMERS"."PERSON";
  BEGIN
    r1 := "CUSTOMERS"."GET_CUSTOMER"("P_CUSTOMER_ID" => ?);
    ? := r1."FIRST_NAME";
    ? := r1."LAST_NAME";
  END;
Binding variable 1       : 1 (class java.lang.Long)
Registering variable 2   : class java.lang.String
Registering variable 3   : class java.lang.String
Fetched OUT parameters   : +-----------------+
                         : |RETURN_VALUE     |
                         : +-----------------+
                         : |('MARY', 'SMITH')|
                         : +-----------------+

So, jOOQ usually doesn’t use JDBC’s very limited escape syntax to call stored procedures; it just produces an anonymous PL/SQL block with a local variable of type CUSTOMERS.PERSON, i.e. of our RECORD type. The result of the function call is then assigned to this local variable, and the local variable is destructured into its individual parts.

In the TRACE log, you can see the individual bind variables, i.e. there’s an IN variable at index 1 of type Long, and two OUT variables of type String at indexes 2 and 3.

How does jOOQ know the record types?

At runtime, all the information is hard-wired into the generated code. So, the magic is inside of the code generator. Warning: Some serious SQL ahead!

This beauty here queries the dictionary views for PL/SQL record types:

AllArguments a = ALL_ARGUMENTS.as("a");
AllArguments x = ALL_ARGUMENTS.as("x");
Field<BigDecimal> nextSibling = field(name("next_sibling"), x.SEQUENCE.getDataType());

DSL.using(configuration)
   .select(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME)
   .select(
       a.ARGUMENT_NAME                                               .as(ALL_TYPE_ATTRS.ATTR_NAME),
       a.SEQUENCE                                                    .as(ALL_TYPE_ATTRS.ATTR_NO),
       a.TYPE_OWNER                                                  .as(ALL_TYPE_ATTRS.ATTR_TYPE_OWNER),
       nvl2(a.TYPE_SUBNAME, a.TYPE_NAME, inline(null, a.TYPE_NAME))  .as("package_name"),
       coalesce(a.TYPE_SUBNAME, a.TYPE_NAME, a.DATA_TYPE)            .as(ALL_TYPE_ATTRS.ATTR_TYPE_NAME),
       a.DATA_LENGTH                                                 .as(ALL_TYPE_ATTRS.LENGTH),
       a.DATA_PRECISION                                              .as(ALL_TYPE_ATTRS.PRECISION),
       a.DATA_SCALE                                                  .as(ALL_TYPE_ATTRS.SCALE))
   .from(a)
   .join(table(
       select(
           a.TYPE_OWNER,
           a.TYPE_NAME,
           a.TYPE_SUBNAME,
           min(a.OWNER        ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.OWNER),
           min(a.PACKAGE_NAME ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.PACKAGE_NAME),
           min(a.SUBPROGRAM_ID).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SUBPROGRAM_ID),
           min(a.SEQUENCE     ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.SEQUENCE),
           min(nextSibling    ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(nextSibling),
           min(a.DATA_LEVEL   ).keepDenseRankFirstOrderBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.SEQUENCE).as(a.DATA_LEVEL))
      .from(table(
          select(
              lead(a.SEQUENCE, 1, a.SEQUENCE).over(
                  partitionBy(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID, a.DATA_LEVEL)
                 .orderBy(a.SEQUENCE)
              ).as("next_sibling"),
              a.TYPE_OWNER,
              a.TYPE_NAME,
              a.TYPE_SUBNAME,
              a.OWNER,
              a.PACKAGE_NAME,
              a.SUBPROGRAM_ID,
              a.SEQUENCE,
              a.DATA_LEVEL,
              a.DATA_TYPE)
         .from(a)
         .where(a.OWNER.in(getInputSchemata()))
      ).as("a"))
      .where(a.TYPE_OWNER.in(getInputSchemata()))
      .and(a.OWNER.in(getInputSchemata()))
      .and(a.DATA_TYPE.eq("PL/SQL RECORD"))
      .groupBy(a.TYPE_OWNER, a.TYPE_NAME, a.TYPE_SUBNAME)
   ).as("x"))
   .on(row(a.OWNER, a.PACKAGE_NAME, a.SUBPROGRAM_ID).eq(x.OWNER, x.PACKAGE_NAME, x.SUBPROGRAM_ID))
   .and(a.SEQUENCE.between(x.SEQUENCE).and(nextSibling))
   .and(a.DATA_LEVEL.eq(x.DATA_LEVEL.plus(one())))
   .orderBy(x.TYPE_OWNER, x.TYPE_NAME, x.TYPE_SUBNAME, a.SEQUENCE)
   .fetch();

This is a nice little jOOQ query, which corresponds to the following, equally impressive SQL query that you can run directly in SQL Developer, or some other SQL client for Oracle:

SELECT 
  "x"."TYPE_OWNER",
  "x"."TYPE_NAME",
  "x"."TYPE_SUBNAME",
  "a"."ARGUMENT_NAME" "ATTR_NAME",
  "a"."SEQUENCE" "ATTR_NO",
  "a"."TYPE_OWNER" "ATTR_TYPE_OWNER",
  nvl2("a"."TYPE_SUBNAME", "a"."TYPE_NAME", NULL) "package_name",
  COALESCE("a"."TYPE_SUBNAME", "a"."TYPE_NAME", "a"."DATA_TYPE") "ATTR_TYPE_NAME",
  "a"."DATA_LENGTH" "LENGTH",
  "a"."DATA_PRECISION" "PRECISION",
  "a"."DATA_SCALE" "SCALE"
FROM "SYS"."ALL_ARGUMENTS" "a"
JOIN (
  SELECT 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME",
    MIN("a"."OWNER") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "OWNER",
    MIN("a"."PACKAGE_NAME") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "PACKAGE_NAME",
    MIN("a"."SUBPROGRAM_ID") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SUBPROGRAM_ID",
    MIN("a"."SEQUENCE") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "SEQUENCE",
    MIN("next_sibling") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "next_sibling",
    MIN("a"."DATA_LEVEL") KEEP (DENSE_RANK FIRST
      ORDER BY "a"."OWNER" ASC, "a"."PACKAGE_NAME" ASC, "a"."SUBPROGRAM_ID" ASC, "a"."SEQUENCE" ASC) "DATA_LEVEL"
  FROM (
    SELECT
      lead("a"."SEQUENCE", 1, "a"."SEQUENCE") OVER (
        PARTITION BY "a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID", "a"."DATA_LEVEL"
        ORDER BY "a"."SEQUENCE" ASC
      ) "next_sibling",
      "a"."TYPE_OWNER",
      "a"."TYPE_NAME",
      "a"."TYPE_SUBNAME",
      "a"."OWNER",
      "a"."PACKAGE_NAME",
      "a"."SUBPROGRAM_ID",
      "a"."SEQUENCE",
      "a"."DATA_LEVEL",
      "a"."DATA_TYPE"
    FROM "SYS"."ALL_ARGUMENTS" "a"
    WHERE "a"."OWNER" IN ('SAKILA', 'SYS')     -- Possibly replace schema here
    ) "a"
  WHERE ("a"."TYPE_OWNER" IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."OWNER"         IN ('SAKILA', 'SYS') -- Possibly replace schema here
  AND "a"."DATA_TYPE"      = 'PL/SQL RECORD')
  GROUP BY 
    "a"."TYPE_OWNER",
    "a"."TYPE_NAME",
    "a"."TYPE_SUBNAME"
  ) "x"
ON (("a"."OWNER", "a"."PACKAGE_NAME", "a"."SUBPROGRAM_ID")
 = (("x"."OWNER", "x"."PACKAGE_NAME", "x"."SUBPROGRAM_ID"))
AND "a"."SEQUENCE" BETWEEN "x"."SEQUENCE" AND "next_sibling"
AND "a"."DATA_LEVEL" = ("x"."DATA_LEVEL" + 1))
ORDER BY 
  "x"."TYPE_OWNER" ASC,
  "x"."TYPE_NAME" ASC,
  "x"."TYPE_SUBNAME" ASC,
  "a"."SEQUENCE" ASC

Whew.

The output shows that we got all the required information for our RECORD type:

TYPE_OWNER  TYPE_NAME   TYPE_SUBNAME   ATTR_NAME    ATTR_TYPE_NAME  LENGTH
SAKILA      CUSTOMERS   PERSON         FIRST_NAME   VARCHAR2        50
SAKILA      CUSTOMERS   PERSON         LAST_NAME    VARCHAR2        50

All of this also works for:

  • Nested types
  • Multiple IN and OUT parameters

I’ll blog about a more advanced use-case in the near future, so stay tuned.

A Hidden jOOQ Gem: Foreach Loop Over ResultQuery

A recent question on Stack Overflow about jOOQ caught my attention. The question essentially asked:

Why do both of these loops work?

// With fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetch()) { // fetch() here

    doThingsWithRecord(rec);
}

// Without fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)) { // No fetch() here

    doThingsWithRecord(rec);
}

And indeed, just like in PL/SQL, you can use any jOOQ ResultQuery as a Java 5 Iterable, because that’s what it is: an Iterable<R> where R extends Record.

The semantics are simple. When Iterable.iterator() is invoked, the query is executed and the Result.iterator() is returned. So, the result is materialised in the client memory, just as if I had called fetch(). Unsurprisingly, this is the implementation of AbstractResultQuery.iterator():

@Override
public final Iterator<R> iterator() {
    return fetch().iterator();
}

No magic. But it’s great that this works like PL/SQL:

FOR rec IN (SELECT * FROM my_table ORDER BY my_table.column)
LOOP
  doThingsWithRecord(rec);
END LOOP;

Note that, unfortunately, there’s no easy way to manage resources through Iterable, i.e. there’s no AutoCloseableIterable returning an AutoCloseableIterator, which could be used in an auto-closing try-with-resources style loop. This is why the entire result set needs to be fetched at the beginning of the loop. For lazy fetching, you can still use ResultQuery.fetchLazy():

try (Cursor<MyTableRecord> cursor = DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetchLazy()) {

    for (MyTableRecord rec : cursor)
        doThingsWithRecord(rec);
}

Happy coding!

jOOQ Tuesdays: Chris Saxon Explains the 3 Things Every Developer Should Know About SQL

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month (today, exceptionally on a Wednesday because of technical issues) where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Chris Saxon who has worked with Oracle forever, and who is one of the brains behind the famous Ask Tom website.

Chris, you’re part of the famous Ask Tom team. Everyone working with Oracle has wound up on Ask Tom’s website at least once. You’ve answered an incredible amount of questions already. What’s it like to work for such a big community as Oracle’s?

It’s an amazing experience! My first real job was as a PL/SQL developer. My only knowledge of SQL was a couple of vaguely remembered lectures at university. Ask Tom was the main place I learned about SQL and Oracle Database. So it’s a huge honor to be on the other side, helping others get the best out of the technology.

The best part has to be the positive comments when you help someone solve a problem that’s been troubling them for days. That’s why we’re here. To help developers learn more about Oracle and improve their SQL skills. When you use the database to its full extent, you can write better, faster applications with less code!

What were your three most interesting questions, so far?

Any question that has a clear definition and a complete test case is interesting! 😉 Personally I enjoy using SQL to solve complex problems the best. So the first two do just that:

1. Finding the previous row in a different group

The poster had a series of transactions. These were grouped into two types. For each row, they wanted to show the id of the previous transaction from the other group.

At first this sounds like a problem you can solve using LAG or LEAD. But these only search for values within the same group. So you need a different method.

I provided a solution using the model clause. Using this, you can generate columns based on complex, spreadsheet-like formulas. Rows in your table are effectively cells in the sheet. You identify them by defining dimensions which can be other columns or expressions. By setting the transaction type as a dimension, you can then easily reference – and assign – values from one type to the other.

This worked well. But commenters were quick to provide solutions using window functions and 12c’s match_recognize clause. Both of which were faster than my approach!
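
For two transaction types 'A' and 'B', such a window function solution might look roughly like this (a sketch with made-up table and column names, not the original poster’s schema):

SELECT id, trx_type,
       CASE trx_type
         WHEN 'A' THEN last_value(CASE WHEN trx_type = 'B' THEN id END) IGNORE NULLS
                       OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
         ELSE          last_value(CASE WHEN trx_type = 'A' THEN id END) IGNORE NULLS
                       OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
       END AS prev_other_type_id
FROM   transactions;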

I like this because it shows the flexibility of SQL. And it shows the value of an engaged community. No one knows everything. By sharing our knowledge and working together we can all become better developers.

2. Improving SQL that deliberately generates cartesian product

The poster had a set of abbreviations for words. For example, Saint could also be “St.” or “St”. They wanted to take text containing these words. Then generate all combinations of strings using these abbreviations.

The “obvious” solution the user had was to split the text into words. Then, for each word, join the abbreviation table, replacing the string as needed. So for a five-word string, you have five joins.

There are a couple of problems with this method. The number of joins limits the number of words. So if you have a string with seven words, but only six table joins, you won’t abbreviate the final word.

The other issue is performance. Joining the same table N times increases the work you do. If you have long sentences and/or a large number of abbreviations, the query could take a long time to run.

To overcome these you need to ask: “how can I join to the abbreviation table just once?”

The solution to do this starts the same as the original. Split the sentence into a table of words. Then join this to the abbreviations to give a row for each replacement needed.

You can then recursively walk down these rows using CTEs. These build up the sentence again, replacing words with their abbreviations as needed. A scalable solution that only needs a single pass of each table!

The final question relates to performance. Tom Kyte’s mantra was always “if you can do it in SQL, do it in SQL”. The reason is because a pure SQL solution is normally faster than one which combines SQL and other code. Yet a question came in that cast doubt on this:

3. Difference in performance SQL vs PL/SQL

The poster was updating a table. The new values came from another table. He was surprised that PL/SQL using bulk processing came out faster than the pure SQL approach.

The query in question was in the form:

update table1 t1
set col1 = (select t2.col2 from table2 t2 where t1.code = t2.code);

It turned out the reason was due to “missing” indexes. Oracle executes the subquery once for every row in table1. Unless there’s an index on table2 (code), this will full scan table2 once for every row in table1!

The PL/SQL only approach avoided this problem by reading the whole of table2 into an array. So there was only one full scan of table2.
 

The problem here is that there was no index on the join condition (t1.code = t2.code). With this in place, Oracle does an index lookup of table2 for each row in table1. A massive performance improvement!
 

The moral being: if your SQL is “slow”, particularly compared to a combined SQL + other language approach, it’s likely you have a missing index (or two).
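
For reference, the “missing” index is a one-liner along these lines (using the table and column names from the example):

create index table2_code_idx on table2 (code);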

This question again showed the strength and value of the Oracle community. Shortly after I posted the explanation, a reviewer was quick to point out the following SQL solution:

merge into table1 t1
using  table2 t2
on   (t1.code = t2.code)
when matched
  then update set t1.col1 = t2.col2;

This came out significantly faster than both the original update and PL/SQL – without needing any extra indexes!

You’re running a YouTube channel called “The Magic of SQL”. Are SQL developers magicians?

Of course they are! In fact, I’d say that all developers are magicians. As Arthur C Clarke said:

“Any sufficiently advanced technology is indistinguishable from magic”

The amount of computing power you carry around in your phone today is mind blowing. Just ask your grandparents!

I think SQL developers have a special kind of magic though :). The ability to answer hard questions with a few lines of SQL is amazing. And for it to adapt to changes in the underlying data to give great performance without you changing it is astounding.

Your Twitter account has a pinned tweet about window functions. I frequently talk to Java developers at conferences, and few of them know about window functions, even if they’ve been in databases like Oracle for a very long time. Why do you think they’re still so “obscure”?

Oracle Database has had window functions since the nineties. But many other RDBMSes have only fully supported them recently. So a combination of writing “database independent” code and people using other databases is certainly a factor.

Use of tools which hide SQL from developers is also a problem. If you’re not actively using SQL, it’s easy to overlook many of its features.

Fortunately I think this is changing. As more and more developers are realizing, SQL is a powerful language. Learning how to use it effectively is a key skill for all developers. Window functions and other SQL features mean you can write better performing applications with less code. Who doesn’t want that? 😉

What are three things that every developer should know about SQL?

1. Understand set based processing

If you find yourself writing a cursor loop (select … from … loop), and inside that loop you run more SQL, you’re doing it wrong.

Think about it. Do you eat your cornflakes by placing one flake in your bowl, adding the milk, and eating that one piece? Then doing the same for the next. And the next. And so on? Or do you pour yourself a big bowl and eat all the flakes at once?

If you have a cursor loop with more SQL within the loop, you’re effectively doing this. There’s a lot of overhead in executing each SQL statement. This will slow you down if you have a large number of statements that each process one row. Instead you want few statements that process lots of rows where possible.

It’s also possible to do this by accident. As developers we’re taught that code reuse is A Good Thing. So if there’s an API available we’ll often use it. For example, say you’re building a batch process. This finds the unshipped orders, places them on shipments and marks them as sent.

If a ship_order function exists, you could write something like:

for r in (select order_id from unshipped_orders) loop
  ship_order ( r.order_id );
end loop;

The problem here is that ship_order almost certainly contains SQL. SQL you’ll be executing once for every order awaiting postage. If it’s only a few, this may be fine. But if there are hundreds or thousands, this process could take a long time to run.

The way to make this faster is to process all the orders in one go. You can do this with SQL like:

insert into shipments
  select … from unshipped_orders;

update unshipped_orders
set  shipment_date = sysdate;

You may counter that there’s other, non-SQL processing you need to do, such as sending emails. So you still need a query to find the order ids.

But you can overcome this! With update’s returning clause, you can get values from all the changed rows:

update unshipped_orders
set  shipment_date = sysdate
returning order_id bulk collect into order_array;

This gives you all the order ids to use as you need.
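
A minimal sketch of how that might look in a PL/SQL block (the collection type and the put_line placeholder are assumptions for illustration):

declare
  type order_ids_t is table of unshipped_orders.order_id%type;
  order_array order_ids_t;
begin
  update unshipped_orders
  set    shipment_date = sysdate
  returning order_id bulk collect into order_array;

  for i in 1 .. order_array.count loop
    -- e.g. send the confirmation email for order_array(i) here
    dbms_output.put_line('shipped order ' || order_array(i));
  end loop;
end;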

2. Learn what an execution plan is and how to create and read one

“How can I make my SQL faster?” is one of the most common classes of questions posted on Ask Tom. The trouble is there’s scant one-size-fits-all advice when it comes to SQL performance. To help, we need to know what your query is, what the tables and indexes look like, and details about the data. But most importantly, we need to know what the query is actually doing!

For example, say you want me to help you figure out a faster route to work. To do this I need to know which route you currently use and how long each part of it takes. We can then compare this against other routes, checking how far they are, expected traffic and predicted speeds. But we need the data to do this!

So when answering performance questions, the first thing we look for is an execution plan. Many confuse this with an explain plan. An explain plan is just a prediction. Often it’s wrong. And even when it’s right, we still don’t know how much work each step does.

An execution plan shows exactly what the database did. It also gives stats about how much work, how often and how long it took to process each step. Combine this with a basic understanding of indexes and join methods and you can often solve your own performance problems.
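
In Oracle, one common way to capture such an actual execution plan is along these lines (a sketch; the query and table name are placeholders):

select /*+ gather_plan_statistics */ *
from   orders
where  customer_id = 42;

-- Plan of the last statement run in this session, with actual
-- row counts, starts and timings per step
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));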

3. Use bind variables

Sadly, data breaches are all too common. Hardly a week goes by without news of a major company leaking sensitive data. And the root cause of these attacks is often SQL injection.

This is a simple, well-known attack vector. If you write vulnerable SQL in a web-enabled application, eventually you’ll be attacked.

And this isn’t something you can avoid by using NoSQL databases. SQL injection-like attacks are possible there too!

Fortunately the solution is easy: use bind variables. Not only do these secure your application, they can improve performance too.

Make sure your company is not tomorrow’s data leak headline. Use bind variables!
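
A minimal PL/SQL sketch of the difference (table and column names are made up for illustration):

declare
  l_name  varchar2(50) := 'SMITH';
  l_count number;
begin
  -- vulnerable: the value is concatenated into the SQL text
  execute immediate
    'select count(*) from customer where last_name = ''' || l_name || ''''
    into l_count;

  -- safe (and shareable in the cursor cache): the value is a bind variable
  execute immediate
    'select count(*) from customer where last_name = :name'
    into l_count
    using l_name;
end;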

Last but not least: When will Oracle have a BOOLEAN type? 🙂

We have a BOOLEAN type! It’s just only in PL/SQL ;P

There’s currently a push in the community for us to add a SQL BOOLEAN type. If this is a feature you’d like to see, you can vote for it on the Database Ideas forum. The more support there is, the more likely we are to implement it! But no promises 😉

Why I Completely Forgot That Programming Languages Have While Loops

I’ve recently made an embarrassing discovery:

Yes. In all of my professional work with PL/SQL (and that has been quite a bit, in the banking industry), I have never really used a WHILE loop – at least not that I recall. The idea of a WHILE loop is simple, and it is available in most languages, like PL/SQL:

WHILE condition
LOOP
   {...statements...}
END LOOP;

Or Java:

while (condition)
    statement

So, why have I simply never used it?

Most loops iterate on collections

In hindsight, it’s crazy to think that it took Java until version 5 to introduce the foreach loop:

String[] strings = { "a", "b", "c" };
for (String s : strings)
    System.out.println(s);

It is some of Java’s most useful syntax sugar for the equivalent loop that uses a local loop variable that we simply don’t care about:

String[] strings = { "a", "b", "c" };
for (int i = 0; i < strings.length; i++)
    System.out.println(strings[i]);

Let’s be honest. When do we really want this loop variable? Hardly ever. Sometimes, we need to access the next string along with the current one, and we want to stick to the imperative paradigm for some reason (when we could do it more easily with functional programming APIs). But that’s it. Most loops simply iterate the entire collection in a very dumb and straightforward way.

SQL is all about collections

In SQL, everything is a table (see SQL trick #1 in this article), just like in relational algebra, where everything is a set.

Now, PL/SQL is a useful procedural language that “builds around” the SQL language in the Oracle database. Some of the main reasons to do things in PL/SQL (rather than e.g. in Java) are:

  • Performance (the most important reason), e.g. when doing ETL or reporting
  • Logic needs to be “hidden” in the database (e.g. for security reasons)
  • Logic needs to be reused among different systems that all access the database

Much like Java’s foreach loop, PL/SQL has the ability to define implicit cursors (as opposed to explicit ones).

As a PL/SQL developer, when I want to loop over a cursor, I have at least these options:

Explicit cursors

DECLARE
  -- The loop variable
  row all_objects%ROWTYPE;

  -- The cursor specification
  CURSOR c IS SELECT * FROM all_objects;
BEGIN
  OPEN  c;
  LOOP
    FETCH c INTO row;
    EXIT WHEN c%NOTFOUND;
    dbms_output.put_line(row.object_name);
  END LOOP;
  CLOSE c;
END;

The above would correspond to the following boring Java code that we wrote time and again prior to Java 5 (in fact, without the generics):

Iterator<Row> c = ... // SELECT * FROM all_objects
while (c.hasNext()) {
    Row row = c.next();
    System.out.println(row.objectName);
}

The while loop is absolutely boring to write. Just like with the loop variable, we really don’t care about the current state of the iterator. We want to iterate over the whole collection, and at each iteration, we don’t care where we’re currently at.

Note that in PL/SQL, it is common practice to use an infinite loop syntax and break out of the loop when the cursor is exhausted (see above). In Java, this would be the corresponding logic, which is even worse to write:

Iterator<Row> c = ... // SELECT * FROM all_objects
for (;;) {
    if (!c.hasNext())
        break;
    Row row = c.next();
    System.out.println(row.objectName);
}

Implicit cursors

Here’s how many PL/SQL developers do things most of the time:

BEGIN
  FOR row IN (SELECT * FROM all_objects)
  LOOP
    dbms_output.put_line(row.object_name);
  END LOOP;
END;

The cursor is really an Iterable in terms of Java collections. An Iterable is a specification of what collection (Iterator) will be produced when the control flow reaches the loop. I.e. a lazy collection.

It’s very natural to implement external iteration in the above way.

If you’re using jOOQ to write SQL in Java (and you should), you can apply the same pattern in Java as well, as jOOQ’s ResultQuery type extends Iterable, which means it can be used as an Iterator source in Java’s foreach loop:

for (AllObjectsRecord row : ctx.selectFrom(ALL_OBJECTS))
    System.out.println(row.getObjectName());

Yes, that’s it! Focus on the business logic only, which is the collection specification (the query) and what you do with each row (the println statement). None of that cursor noise!

OK, but why no WHILE?

If you love SQL as much as I do, you probably do that because you like the idea of having a declarative programming language to declare sets of data, just like SQL does. If you write client code in PL/SQL or Java, you will thus like to continue working on the entire data set and continue thinking in terms of sets. The imperative programming paradigm that operates on the intermediate object, the cursor, is not your paradigm. You don’t care about the cursor. You don’t want to manage it, you don’t want to open / close it. You don’t want to keep track of its state.

Thus, you will choose the implicit cursor loop, or the foreach loop in Java (or some Java 8 Stream functionality).

As you do that more often, you will run into fewer and fewer situations where the WHILE loop is useful. Until you forget about its mere existence.

WHILE LOOP, you won’t be missed.

Using Stored Procedures With JPA, JDBC… Meh, Just Use jOOQ

The current edition of Java Magazine has an article about Big Data Best Practices for JDBC and JPA by Josh Juneau:
http://www.javamagazine.mozaicreader.com/MayJune2016

The article shows how to use a stored procedure with JDBC (notice how resources aren’t closed, unfortunately; this is commonly forgotten, even in Java Magazine articles):

// Using JDBC to call upon a database stored
// procedure
CallableStatement cs = null;
try {
    cs = conn.prepareCall("{call DUMMY_PROC(?,?)}");
    cs.setString(1, "This is a test");
    cs.registerOutParameter(2, Types.VARCHAR);
    cs.executeQuery();

    // Do something with result
    String returnStr = cs.getString(2);
} catch (SQLException ex){
    ex.printStackTrace();
}

And with JPA:

// Utilize JPA to call a database stored procedure
// Add @NamedStoredProcedureQuery to entity class
@NamedStoredProcedureQuery(
    name="createEmp", procedureName="CREATE_EMP",
    parameters = {
        @StoredProcedureParameter(
            mode= ParameterMode.IN,
            type=String.class,
            name="first"),
        @StoredProcedureParameter(
            mode = ParameterMode.IN,
            type=String.class,
            name="last")
    })

// Calling upon stored procedure
StoredProcedureQuery qry =
    em.createStoredProcedureQuery("createEmp");
qry.setParameter("first", "JOSH");
qry.setParameter("last","JUNEAU");
qry.execute();

Specifically the latter was also recently discussed in blog posts by Vlad Mihalcea and Thorben Janssen.

Do you like verbosity and complexity?

No? Neither do we. This is why we give you a third option instead: Just use jOOQ. Here’s the equivalent jOOQ code:

// JDBC example:
String returnStr = Routines.dummyProc(
    config, "This is a test");

// JPA example
Routines.createEmp(config, "JOSH", "JUNEAU");

Yes! That’s it. Don’t waste time manually configuring your bind variables with JDBC API calls, or JPA annotations. No one likes writing annotations for stored procedures. With jOOQ and jOOQ’s code generator, procedure calls are:

  • A one-liner
  • A no-brainer
  • A way to bring back the fun to stored procedures

Learn more about using Oracle stored procedures with nested collections and object types here:
https://blog.jooq.org/2014/11/04/painless-access-from-java-to-plsql-procedures-with-jooq

It’s the Little Things: The PL/SQL NULL Statement, and why Every Language Should have One

Syntax is one of those topics. One of those emotional topics that lead to very very very important discussions. I personally like PL/SQL. It is extremely verbose, and precise. It forces you to adhere to a very strong and rigid type system, slowing you down, which is likely to help you avoid mistakes.

There is one thing in PL/SQL that I like in particular. There is no such thing as an empty block.

While in Java, you could write:

// Just an empty block:
{}

// An empty block with a label:
label1: {}

// Or, in fact, the empty statement:
;
label2: ;

The problem with the above from a mere syntactic perspective is that an empty block may have been left unintentionally empty. An empty statement may not even be visible at all. Consider this in the context of an if statement:

if (something) {

}

if (somethingElse) ;

In PL/SQL, this isn’t possible. There is no such thing as an empty block. The following doesn’t compile:

BEGIN
END;

Nope:

ORA-06550: line 2, column 1:
PLS-00103: Encountered the symbol "END" when expecting ...

Or, take the IF statement. No emptiness allowed here, either:

IF 1 = 1 THEN
END IF;

Doesn’t work.

If you DO want to create empty blocks (e.g. as placeholders for code you’ll write later on), you’ll have to explicitly put a dummy statement there. PL/SQL has such a statement. The NULL statement:

BEGIN
  NULL;
END;

IF 1 = 1 THEN
  NULL;
END IF;

That makes sense. You immediately see: OK, nothing going on here.

Conclusion

Verbosity helps decrease the number of bugs in your code. The less people have to be concerned with syntax (see again, the discussion about very very very important topics), the more they can focus on what they really intend to write. Let’s swallow our pride. When we get used to a language, we’ll accept ANY language. They’re all flawed and quirky. It doesn’t matter. But at least, the language should keep us from arguing about different ways to style it.

Now, let me go reformat all that moron’s lower-case PL/SQL code. Who the hell would write lower-case begin and end!??