Selecting all Columns Except One in PostgreSQL

Google’s BigQuery has a very interesting SQL language feature, which I’ve missed many times in other databases:

select:
    SELECT  [{ ALL | DISTINCT }]
        { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
            [ REPLACE ( expression [ AS ] column_name [, ...] ) ]
        | expression [ [ AS ] alias ] } [, ...]
    [ FROM from_item  [, ...] ]
    [ WHERE bool_expression ]
    ...

Notice the two keywords EXCEPT and REPLACE that can be used along with an asterisked expression.

An Example

For example, when running a query like this one (which fetches the longest film(s) every actor in the Sakila database played in):

SELECT *
FROM (
  SELECT 
    a.*, 
    f.*, 
    RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk
  FROM film f
  JOIN film_actor fa USING (film_id)
  JOIN actor a USING (actor_id)
) t
WHERE rk = 1
ORDER BY first_name, last_name

This is one way to apply TOP-N per category filtering in SQL, which works with most modern databases, including MySQL 8.0. Essentially, we’re calculating the rank of a film per actor ordered by the film’s length.

The result looks like this:

actor_id |first_name  |last_name    |..|title                  |length|..|rk |
---------|------------|-------------|..|-----------------------|------|--|---|
71       |ADAM        |GRANT        |..|GLADIATOR WESTWARD     |   173|..|1  |
71       |ADAM        |GRANT        |..|BALLROOM MOCKINGBIRD   |   173|..|1  |
132      |ADAM        |HOPPER       |..|TORQUE BOUND           |   179|..|1  |
165      |AL          |GARLAND      |..|JACKET FRISCO          |   181|..|1  |

Let’s assume for a moment that we really need the entire projection of the ACTOR and FILM tables (so, SELECT * is fine), but we certainly don’t need the useless RK column, which is always 1.

Sometimes, having some excess columns is not going to be a problem, but sometimes it is. How to remove it? We can’t reference the ACTOR and FILM tables anymore in the outer query:

SELECT a.*, f.* -- Would be great, but wrong syntax
FROM (
  SELECT 
    a.*, 
    f.*, 
    RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk
  FROM film f
  JOIN film_actor fa USING (film_id)
  JOIN actor a USING (actor_id)
) t
WHERE rk = 1
ORDER BY first_name, last_name

The outer query only has one table, and that’s the (derived) table T.

How to Solve This

In BigQuery syntax, we could now simply write

SELECT * EXCEPT rk
FROM (...) t
WHERE rk = 1
ORDER BY first_name, last_name

Which is really quite convenient! We want to project everything, except this one column. But none of the more popular SQL databases support this syntax.

Luckily, in PostgreSQL, we can use a workaround: Nested records:

SELECT (a).*, (f).* -- Unnesting the records again
FROM (
  SELECT 
    a, -- Nesting the actor table
    f, -- Nesting the film table
    RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk
  FROM film f
  JOIN film_actor fa USING (film_id)
  JOIN actor a USING (actor_id)
) t
WHERE rk = 1
ORDER BY (a).first_name, (a).last_name;

Notice how we’re no longer projecting A.* and F.* inside of the derived table T, but instead, the entire table (record). In the outer query, we have to use some slightly different syntax to unnest the record again (e.g. (A).FIRST_NAME), and we’re done.

How Does This Work?

Informix, Oracle, PostgreSQL, and maybe a few lesser known ones, have implemented the SQL standard’s ORDBMS features to various degrees. ORDBMS attempted to combine relational and object oriented features in the SQL language (and in the storage model). For all practical purposes, this essentially just means that we can nest records and collections.

For instance, in PostgreSQL, we can write:

-- Explicit ROW constructor
SELECT 1, ROW(2, ROW(3, 4))

-- Implicit ROW constructor
SELECT 1, (2, (3, 4))

And we’ll get:

x        |row       |
---------|----------|
1        |(2,(3,4)) |

Along with ordinary “scalar” values, we can have nested rows (or nested tuples) constructed very easily. Conveniently, we can also reference a table without its column names in the projection, such as:

SELECT a, f
FROM film f
JOIN film_actor fa USING (film_id)
JOIN actor a USING (actor_id)

Which produces the aforementioned result:

a    |f    |
-----|-----|
(...)|(...)|
(...)|(...)|
(...)|(...)|
...

Similar things are possible in Oracle as well, except that Oracle doesn’t support structural row/tuple types, only nominal ones. We’d have to create some types first, prior to being able to use them:

CREATE TABLE film_t AS OBJECT (...);

Bonus

Of course, if you’re using SQL Server or Oracle, you wouldn’t have this problem, because then you could use the totally underrated WITH TIES clause along with CROSS APPLY:

SQL Server

SELECT *
FROM actor a
CROSS APPLY (
  SELECT TOP 1 WITH TIES f.*
  FROM film f
  JOIN film_actor fa 
    ON f.film_id = fa.film_id
	AND fa.actor_id = a.actor_id
  ORDER BY length DESC
) f
ORDER BY first_name, last_name;

Oracle

(Do check performance on this!)

SELECT *
FROM actor a
CROSS APPLY (
  SELECT f.*
  FROM film f
  JOIN film_actor fa 
    ON f.film_id = fa.film_id
	AND fa.actor_id = a.actor_id
  ORDER BY length DESC
  FETCH FIRST ROW WITH TIES
) f
ORDER BY first_name, last_name;

PostgreSQL and DB2 support the LATERAL keyword, which could be used with FETCH FIRST ROW ONLY semantics (so, no ties are selected).

For more details about TOP N per category queries, see this blog post

A Typesafety Comparison of SQL Access APIs

SQL is a very expressive and distinct language. It is one of the few declarative languages which are used by a broad audience in everyday work. As a declarative language, SQL allows to specify what we’re expecting as output, not how this output should be produced. As a side-effect of this, ad-hoc record data types are created by every statement. An example:

-- A (id: integer, title: varchar) type is created
SELECT id, title
FROM book;

The above statement generates a cursor whose records have a well-defined record type with these properties:

  • The degree of the record is 2
  • The column names are id and title
  • The column types are integer and varchar
  • The column id can be accessed at index 1. The column title can be accessed at index 2

In other words, SQL records combine features from records (access by name) and tuples (access by index). They can be seen like typesafe associative “map-arrays”, where map keys are formally bound to array indexes and their associated key/index type.

Another, more complex example shows how these ad-hoc record types can be reused within a SQL statement!

-- A (c: bigint) type is created
SELECT count(*) c
FROM book

-- A (..: integer, ..: integer) type is created and compared with...
WHERE (author_id, language_id) IN (

  -- ... another, compatible (..: integer, ..: integer) type
  SELECT a.id, a.language_id
  FROM author a
)

This query counts books written by authors in their native language.

In the above example, the projected record type is a bit simpler. It contains only one column. The interesting part is the row value expression IN comparison predicate, which compares two compatible (integer, integer) types. In SQL, you can typesafely create ad-hoc record types and immediately compare them with other ad-hoc record types. In these comparisons, column names are not important, but column indexes (and associated column types) are.

Comparing various SQL access APIs

The previous examples show how SQL allows for the formal declaration of record types including record degree, column names, column indexes, column types. While SQL is very expressive in that matter, many client languages accessing SQL are less expressive. When comparing expressiveness and typesafety, two features should be taken into consideration:

  1. Are the records produced into the client language typesafe?
  2. Are the SQL statements produced from the client language typesafe and syntax-safe?

Let’s have a look at various accessing techniques, and how expressive they are in terms of the above typesafety requirements:

JDBC: Least typesafety

JDBC offers the least expressiveness and typesafety. This isn’t surprising, as JDBC is a very low-level API. It offers:

  1. No typesafety whatsoever when accessing result records.
  2. No typesafety or syntax-safety whatsoever when producing SQL statements.

Here is an example:

PreparedStatement stmt = null;
ResultSet rs = null;

try {

  // SQL statements are just strings. Constructing them is not
  // typesafe or syntax-safe
  stmt = connection.prepareStatement(
    "SELECT id, title FROM book WHERE id = ?");

  // Bind values are set by index. There is no typesafety or
  // "index safety"
  stmt.setInt(1, 15);

  rs = stmt.executeQuery();
  while (rs.next()) {

    // There is no typesafety or "index safety" when accessing
    // result record values
    System.out.println(
      "ID: " + rs.getInt(1) + ", TITLE: " + rs.getString(2));
  }
}
finally {
  closeSafely(stmt, rs);
}

Now, this wasn’t surprising. JDBC makes up for the lack of typesafety by being absolutely general. It is possible to implement a JDBC driver for any type of relational database, no matter what kinds of SQL and JDBC features they really support.

JPA: Some typesafety

JPA has implemented quite a bit of typesafety mostly on top of JPQL, but also slightly on top of SQL. With JPA, you can have:

  1. Some typesafety when accessing records.
  2. Some typesafety and syntax-safety when producing JPQL statements through the CriteriaQuery API (not SQL statements).

Record access typesafety can be guaranteed when you project the outcome of your statements onto your JPA-annotated entities. While the mapping itself isn’t really typesafe, the outcome is, as a Java class is the closest match to a SQL record. A Java class, much like a SQL record, has:

  • A degree, expressed in the number of properties
  • Column names, expressed as property names
  • Column types, expressed as property types
  • But: No column indexes. Properties have no explicit order

JPA record mapping has additional features that exceed the expressiveness of SQL, as “flat”, tabular result sets can be mapped onto object hierarchies. In any case, you will have to create one record / entity type per query to profit from this typesafety. If you’re not projecting all columns from every table, but ad-hoc records (including values derived from functions), you will lose this typesafety again.

When it comes to statement typesafety, JPA offers the CriteriaQuery API to produce typesafe JPQL statements. The CriteriaQuery API is often criticised for its verboseness and for the fact that resulting client code is hard to read. Here is an example taken from the CriteriaQuery API docs:

CriteriaQuery<String> q = cb.createQuery(String.class);
Root<Order> order = q.from(Order.class);
q.select(order.get("shippingAddress").<String>get("state"));
 
CriteriaQuery<Product> q2 = cb.createQuery(Product.class);
q2.select(q2.from(Order.class)
            .join("items")
            .<Item,Product>join("product"));

It can be seen that there is only a limited amount of typesafety in the above query construction:

  • Columns are accessed by string literals, such as "shippingAddress".
  • Generic entity types are not really checked. The <Item,Product> generic parameters might as well be wrong.

Of course, there are more typesafe API parts in JPA’s CriteriaQuery API. Using those API parts quickly lead to the aforementioned verbosity, though, as can be seen in this Stack Overflow question, or in the Java EE 6 Tutorials.

LINQ: Much typesafety (in .NET)

LINQ goes very far in offering typesafety in both dimensions:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing LINQ-to-SQL statements (not SQL statements).

As LINQ is formally integrated into various .NET languages, it has the advantage of being able to produce formally defined record types, directly into the target language (e.g. C#). Not only can typesafe records be produced, the LINQ-to-SQL statement is formally verified by the compiler as well. An example

// Typesafe renaming (aliasing with "AS" in SQL)
From p In db.Products
// Typesafe (named!) variable binding
Where p.UnitsInStock <= ReorderLevel AndAlso Not p.Discontinued
// The typesafe projection will produce a Products record
Select p

Another example from Stack Overflow can be seen here:

// Producing a C# tuple
var r = from u in db.Users
        join s in db.Staffs on u.Id equals s.UserId
        select new Tuple<User, Staff>(u, s);

// Producing an anonymous record type
var r = from u in db.Users
    select new { u.Name, 
                 u.Address,
                 ...,
                 (from s in db.Staffs 
                  select s.Password where u.Id == s.UserId) 
               };

LINQ has many obvious advantages when it comes to typesafety. In the case of LINQ, this comes at the price of losing actual SQL expressivity and syntax, as LINQ-to-SQL is not really SQL (just as JPQL is not really SQL either). The SQL querying API is partially shared with other, heterogeneous querying targets, such as LINQ-to-Entities, LINQ-to-Collections, LINQ-to-XML. This will reduce LINQ’s feature scope (see also a previous blog post, and I will soon blog about this again).

But C# offers all typesafety aspects that a SQL record offers as well: degree, column name (anonymous types), column index (tuples), column types (both types and tuples).

SLICK: Much typesafety (in Scala)

SLICK has been inspired by LINQ, and can thus offer a lot of typesafety as well. It offers:

  1. Much typesafety when accessing tuples (not records).
  2. Much typesafety when producing SLICK statements (not SQL statements).

SLICK takes advantage of Scala’s integrated tuple expressions. This is best shown by example:

// "for" is the "entry-point" to the DSL
val q = for {

    // FROM clause   WHERE clause
    c <- Coffees     if c.supID === 101

// SELECT clause and projection to a tuple
} yield (c.name, c.price)

The above example shows that the projection onto a (String, Int) tuple is done typesafely by the yield method. At the same time, the whole query expression is formally validated by the compiler, as SLICK makes heavy use of Scala’s language features in order to introduce an internal DSL for querying. Much more than LINQ, SLICK has a unique syntax that doesn’t remind of SQL any more. It is not obvious how subqueries, complex joins, grouping and aggregation can be expressed.

jOOQ: Much typesafety

jOOQ is mainly inspired by SQL itself and embraces all the features that SQL offers. It has thus:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing SQL statements.

jOOQ offers similar capabilities as JPA when it comes to mapping SQL result sets onto records, although JPA’s mapping type hierarchies are not supported by jOOQ. But jOOQ also allows for typesafe tuple access, the way SLICK has implemented it. Ad-hoc records produced by arbitrary query projections will maintain their various column types through generic Record1<T1>, Record2<T1, T2>, Record3<T1, T2, T3>, … record types. Unlike in Java, this can be leveraged extensively in Scala, where these typesafe Record[N] types can be used just like Scala’s tuples.

On the other hand, just like LINQ-to-SQL, which has formally integrated querying as a first-class citizen into .NET languages, jOOQ allows for heavy type-checking and syntax-checking, when writing SQL statements in Java.

In SQL, you can typesafely write things like:

SELECT * FROM t WHERE (t.a, t.b) = (1, 2)
SELECT * FROM t WHERE (t.a, t.b) OVERLAPS (date1, date2)
SELECT * FROM t WHERE (t.a, t.b) IN (SELECT x, y FROM t2)
UPDATE t SET (a, b) = (SELECT x, y FROM t2 WHERE ...)
INSERT INTO t (a, b) VALUES (1, 2)

In jOOQ 3.0, you can (also typesafely!) write

select().from(t).where(row(t.a, t.b).eq(1, 2));
// Type-check here: ----------------->  ^^^^
 
select().from(t).where(row(t.a, t.b).overlaps(date1, date2));
// Type-check here: ------------------------> ^^^^^^^^^^^^
 
select().from(t).where(row(t.a, t.b).in(select(t2.x, t2.y).from(t2)));
// Type-check here: -------------------------> ^^^^^^^^^^
 
update(t).set(row(t.a, t.b), select(t2.x, t2.y).where(...));
// Type-check here: --------------> ^^^^^^^^^^

insertInto(t, t.a, t.b).values(1, 2);
// Type-check here: ---------> ^^^^

This also applies for existing API, which doesn’t involve row value expressions:

select().from(t).where(t.a.eq(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^
 
select().from(t).where(t.a.eq(any(select(t2.x).from(t2)));
// Type-check here: -------------------> ^^^^
 
select().from(t).where(t.a.in(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^

select(t1.a, t1.b).from(t1).union(select(t2.a, t2.b).from(t2));
// Type-check here: -------------------> ^^^^^^^^^^

jOOQ is not SQL, but unlike other attempts of introducing SQL as an internal domain-specific language into host languages like Java, Scala, C#, jOOQ looks very much like SQL thanks to its unique fluent API technique, which informally follows an underlying BNF notation.

Even if Java offers less expressiveness than other languages like C# or Scala, jOOQ probably comes closest to both result record typesafety and SQL syntax safety in the Java world.

SQL Trouble with dummy tables

As I’m mostly using Oracle for work projects, the concept of the DUAL dummy table has become quite intuitive. I hardly ever think about the days when I was playing around with that table to find out its purpose (e.g. writing into it when DUAL was still a physical object, and thus killing the whole database…)

In many other RDBMS, there is no need for dummy tables, as you can issue statements like these:

SELECT 1;
SELECT 1 + 1;
SELECT SQRT(2);

These are the RDBMS, where the above is generally possible:

  • H2
  • MySQL
  • Ingres
  • Postgres
  • SQLite
  • SQL Server
  • Sybase ASE

In other RDBMS, dummy tables are required, like in Oracle. Hence, you’ll need to write things like these:

SELECT 1       FROM DUAL;
SELECT 1 + 1   FROM DUAL;
SELECT SQRT(2) FROM DUAL;

These are the RDBMS and their respective dummy tables:

  • DB2: SYSIBM.DUAL
  • Derby: SYSIBM.SYSDUMMY1
  • H2: Optionally supports DUAL
  • HSQLDB: INFORMATION_SCHEMA.SYSTEM_USERS
  • MySQL: Optionally supports DUAL
  • Oracle: DUAL
  • Sybase SQL Anywhere: SYS.DUMMY

Trouble when avoiding dummy tables

While some might find avoiding dummy tables in H2 or MySQL better as SQL becomes more readable, it is worth mentioning that you can run into trouble when doing so:

MySQL’s avoiding DUAL

Clauses such as the following one seem to cause trouble in MySQL in some contexts:

-- this can cause trouble
exists (select 1 where 1 = 0)

-- this will work
exists (select 1 from dual where 1 = 0)

Other similar clauses exist

Ingres has no DUAL, but would actually need it

In Ingres, you cannot have a WHERE, GROUP BY or HAVING clause without a FROM clause. Without dummy tables, you will have to create your own dummy subquery:

SELECT 1 WHERE 1 = 1

-- this is a workaround with a nested select
SELECT 1 FROM (SELECT 1) AS DUAL WHERE 1 = 1

In general, jOOQ will hide these facts from client code, allowing to always use the simple form without dummy table. You don’t have to worry about overly restrictive syntactic rules in some SQL dialects