Let’s revise the SQL FROM clause

Intuitive SQL

SQL is extremely simple and yet at times, tricky. Most SQL developers have an intuitive (as opposed to formal) understanding of how the language works, for two reasons:
  1. It is designed “intuitively”, like a natural language. Maybe that keeps us from studying it more formally
  2. The formal language specification is not really freely available (except SQL-92), nor easy to read, nor necessary to get started
Of course, there are also those to whom SQL is a mystery. Let’s ignore those and have a look at how a simple SELECT statement works:

FROM my_table
JOIN my_other_table ON id1 = id2
JOIN my_third_table ON id2 = id3

This looks like we’re using 4 of the SELECT statement’s many clauses:
  • SELECT: specifying the projection
  • FROM: specifying the “main” table
  • JOIN: specifying several “joined” tables
  • WHERE: specifying the selection / filtering

Misinterpretation of the “JOIN clause”

But in fact, our intuition is slightly wrong. “Our intuition…” Of course, you might already know this but I feel it’s something we don’t think about every day… How are we wrong? By the fact that we tend to think there is a formal “JOIN” clause in a SELECT statement. There isn’t. Here’s a simplified T-SQL definition:

<query_specification> ::= 
SELECT [ ALL | DISTINCT ] < select_list > 
    [ FROM { <table_source> } [ ,...n ] ] 
    [ WHERE <search_condition> ] 
    [ <GROUP BY> ] 
    [ HAVING < search_condition > ] 

As you can see, there is no explicit JOIN clause in a SELECT statement, even if we tend to phrase our SQL in a way that would indicate so. Let’s have a closer look at the (simplified) definition of <table_source>

<table_source> ::= 
    table_or_view_name [ [ AS ] table_alias ]
    | user_defined_function [ [ AS ] table_alias ]
    | derived_table [ AS ] table_alias [ ( column_alias [ ,...n ] ) ] 
    | <joined_table> 
<joined_table> ::= 
    <table_source> <join_type> <table_source> ON <search_condition> 
    | <table_source> CROSS JOIN <table_source> 
    | [ ( ] <joined_table> [ ) ] 
<join_type> ::= 
    [ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]

So in fact, the true reading of our above SQL statement goes like this:

FROM ((my_table JOIN my_other_table ON id1 = id2)
                JOIN my_third_table ON id2 = id3)

We’re actually creating a tree of table sources, first by joining “my_other_table” to “my_table” and then by joining “my_third_table” to the previously created table source. When this is understood, the following statement will no longer be surprising:

FROM t1 JOIN t2 ON id1 = id2,
     t3 JOIN t4 ON id3 = id4,
     t5 JOIN t6 ON id5 = id6

Here, we select from an overall cross product of 5 table sources, some of which are actual tables, some of which are created from joined tables…

Implications for jOOQ

In jOOQ, this is about to be supported for release 2.0.3. The org.jooq.Table API will be enhanced by several join methods. You will be able to write things like this:

// Create a specific type of table source
Table<Record> tableSource =

// Use the above table source in a select statement

// Use several "complex" table sources

The existing syntax will be maintained as a fully compatible convenience API. These additions are an important step forward for creating more complex table sources, also involving Oracle (and SQL Server) PIVOT and UNPIVOT tables, about which I will blog in one of the next posts. Stay tuned!

Leave a Reply