Should I Put That Table Alias or Not?

Infrequent SQL developers often get confused about when to put parentheses and/or aliases on derived tables. There has been this recent Reddit discussion about the subject, where user Elmhurstlol was wondering why they needed to provide an alias to the derived table (the subselect with the UNION) in the following query:

SELECT AVG(price) AS AVG_PRICE
FROM (
  SELECT price from product a JOIN pc b
  ON a.model=b.model AND maker='A'
  UNION ALL
  SELECT price from product a JOIN laptop c
  ON a.model=c.model and maker='A'
) hello

The question really was about why the "hello" table alias was necessary, because often it seems not to be required.

Here’s what the SQL standard states

If in doubt, it is often useful to consider the SQL standard about the rationale behind some syntax elements. In this case, let’s consider the freely available SQL 1992 standard text (for simplicity), and see how it specifies table references:

<table reference> ::=
    <table name> [ [ AS ] <correlation name>
        [ <left paren> <derived column list> 
          <right paren> ] ]
  | <derived table> [ AS ] <correlation name>
        [ <left paren> <derived column list>
          <right paren> ]
  | <joined table>

<derived table> ::= <table subquery>

<table subquery> ::= <subquery>

<subquery> ::= <left paren> <query expression>
               <right paren>

(for more information about the awesome and completely underused derived column list feature, read this article here)

The essence of the above syntax specification is this:

  • A derived table MUST always be aliased
  • The AS keyword is optional, for improved readability
  • The parentheses MUST always be put around subqueries

Following these rules, you’ll be pretty safe in most SQL dialects. Here are some deviations to the above, though:

  • Some dialects allow for unaliased derived tables. However, this is still a bad idea, because you will not be able to unambiguously qualify a column from such a derived table

Takeaway

Always provide a meaningful alias to your derived tables. As simple as that.

Oracle scalar subquery caching

The importance of being able to fully control executed SQL (using jOOQ, or plain JDBC) on large-scale systems becomes obvious every time you need to fine-tune your SQL queries for a specific RDBMS. In this case, we’re looking at Oracle and its miraculous scalar subquery caching mechanisms:

Usually, the context switch from SQL to PL/SQL and back is quite expensive for an average Oracle query, so we should normally omit putting stored functions as filtering, grouping, sorting criteria, even if they’re DETERMINISTIC. When we have large result sets, however, even functions in the query projection may turn out to be an execution plan nightmare. I recently raised the question on Stack Overflow, as I’m curious if there is any caching mechanism, that I could take advantage of for a concrete use-case:

http://stackoverflow.com/questions/7270467/is-there-a-pl-sql-pragma-similar-to-deterministic-but-for-the-scope-of-one-singl

And most interestingly, there are several. The accepted answer points to this very relevant article on Ask Tom’s Q/A:

http://www.oracle.com/technetwork/issue-archive/2011/11-sep/o51asktom-453438.html

Apparently, in scalar subqueries, function results may be cached! While this can be very useful for performance, it can also be very dangerous for consistency – as with any caching mechanism, but this one is highly implicit! So this will be cached:

-- my_function is NOT deterministic but it is cached!
select t.x, t.y, (select my_function(t.x) from dual)
from t

-- logically equivalent to this, uncached
select t.x, t.y, my_function(t.x) from t

Knowing this, I think I have to review quite a bit of SQL…