The SQL standard is a nice thing. But it’s a standard. And as such, while it may provide good guidelines on how to implement some SQL features, most dialects deviate from it in one way or another (sometimes drastically, cheers, MySQL).
But that doesn’t have to be a bad thing. Innovation is not driven by standards, but by individual vendors’ attempts to tackle a problem from a different perspective. And then, sometimes, the innovation becomes the standard. One example for this is Oracle’s very very interesting MATCH_RECOGNIZE feature, on which Markus Winand did an excellent presentation.
Other functionality is not standardised, such as Oracle/SQL Server’s PIVOT and UNPIVOT.
In many cases, vendor-specific functionality can be translated to equivalent standard SQL, or to other vendor-specific SQL. That’s one of jOOQ’s core features: The “standardisation” of currently 21 SQL dialects into a single Java API. Since jOOQ 3.9, the Java API can also be hidden behind a parser, which makes visualising the differences much simpler. If you want to play around with the following examples, do visit https://www.jooq.org/translate to see our online SQL translator in action!
Here are Top 10 SQL Dialect Emulations Implemented in jOOQ:
1. Empty FROM clause
The SQL standard doesn’t allow this, but many databases do. You have to specify a FROM
clause in at least these databases
- Access
- CUBRID
- DB2
- Derby
- Firebird
- HANA
- HSQLDB
- Informix
- Ingres
- MariaDB
- MySQL (not always)
- Oracle
- Sybase SQL Anywhere
These ones don’t really need a FROM
clause:
- H2
- PostgreSQL
- Redshift
- SQL Server
- SQLite
- Sybase ASE
- Vertica
An example of such a query could be the following query that retrieves the server timestamp:
SELECT current_timestamp
In Oracle, you’d have to add the DUAL
table:
SELECT current_timestamp FROM dual
There are other possibilities to emulate this in other databases. If you want to see how jOOQ does it, again, either visit our online SQL translator, or run this little code snippet yourself, locally (be sure to report any issues you may find, greatly appreciated!):
import org.jooq.Query;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
public class Translate {
public static void main(String[] args) {
Query query = DSL.using(SQLDialect.DEFAULT)
.parser()
.parseQuery("SELECT current_timestamp");
for (SQLDialect family : SQLDialect.families()) {
System.out.println(String.format(
"%1$-13s: %2$s", family,
DSL.using(family).render(query)
));
}
}
So, running the above snippet will yield (and observe, free current_timestamp
translation:
ACCESS : select now() from (select count(*) dual from MSysResources) as dual ASE : select current_bigdatetime() CUBRID : select current_timestamp() from "db_root" DB2 : select current_timestamp from "SYSIBM"."DUAL" DERBY : select current_timestamp from "SYSIBM"."SYSDUMMY1" FIREBIRD : select current_timestamp from "RDB$DATABASE" H2 : select current_timestamp() HANA : select current_timestamp from "SYS"."DUMMY" HSQLDB : select current_timestamp from ( select 1 as dual from information_schema.system_users limit 1 ) as dual INFORMIX : select current from ( select 1 as dual from systables where tabid = 1 ) as dual INGRES : select current_timestamp from (select 1 as dual) as dual MARIADB : select current_timestamp() from dual MYSQL : select current_timestamp() from dual ORACLE : select current_timestamp from dual POSTGRES : select current_timestamp REDSHIFT : select current_timestamp SQLITE : select current_timestamp SQLSERVER : select current_timestamp SYBASE : select current timestamp from [SYS].[DUMMY] VERTICA : select current_timestamp
See also the jOOQ manual’s section about the dual table.
2. LIMIT .. OFFSET
First off, don’t use OFFSET
.
Since you didn’t listen and you’re still using OFFSET
, let’s discuss how to emulate it (and the much more useful LIMIT
in many database dialects.
The SQL:2016 standard syntax is:
<result offset clause> ::= OFFSET <offset row count> { ROW | ROWS } <fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES } <fetch first quantity> ::= <fetch first row count> | <fetch first percentage> <offset row count> ::= <simple value specification> <fetch first row count> ::= <simple value specification> <fetch first percentage> ::= <simple value specification> PERCENT
So, there are a few interesting features:
- The
OFFSET
(which is the least interesting) - The number of rows to
FETCH
- Whether tied rows should be fetched, too (
TIES
). This will be covered in the next section - Whether the number of rows is really a
PERCENT
age
Oracle currently is the only database (I’m aware of) that does it all and with standard syntax.
FETCH without OFFSET
For instance, when querying the Sakila database, we can get the TOP 3 longest films:
SELECT film_id, title, length
FROM film
ORDER BY length DESC
FETCH NEXT 3 ROWS ONLY
Yielding:
FILM_ID TITLE LENGTH ------------------------------- 212 DARN FORRESTER 185 182 CONTROL ANTHEM 185 141 CHICAGO NORTH 185
(In the next section we’ll look at the WITH TIES
clause to find the other films of length 185)
But what do these queries look like in other databases? Here’s the translation of the ROWS ONLY query, according to jOOQ:
ACCESS : select top 3 film_id, title, length from film order by length desc ASE : select top 3 film_id, title, length from film order by length desc CUBRID : select film_id, title, length from film order by length desc limit 0, 3 DB2 : select film_id, title, length from film order by length desc fetch first 3 rows only DERBY : select film_id, title, length from film order by length desc offset 0 rows fetch next 3 rows only FIREBIRD : select film_id, title, length from film order by length desc rows (0 + 1) to (0 + 3) H2 : select film_id, title, length from film order by length desc limit 3 HANA : select film_id, title, length from film order by length desc limit 3 HSQLDB : select film_id, title, length from film order by length desc limit 3 INFORMIX : select first 3 film_id, title, length from film order by length desc INGRES : select film_id, title, length from film order by length desc offset 0 fetch first 3 rows only MARIADB : select film_id, title, length from film order by length desc limit 3 MYSQL : select film_id, title, length from film order by length desc limit 3 ORACLE : select film_id, title, length from film order by length desc fetch next 3 rows only POSTGRES : select film_id, title, length from film order by length desc limit 3 REDSHIFT : select film_id, title, length from film order by length desc limit 3 SQLITE : select film_id, title, length from film order by length desc limit 3 SQLSERVER : select top 3 film_id, title, length from film order by length desc SYBASE : select top 3 film_id, title, length from film order by length desc VERTICA : select film_id, title, length from film order by length desc limit 3
So, there are essentially 3 families:
- The standard family using
FETCH
, including DB2 (doesn’t supportOFFSET
), Derby, Ingres (although missing a keyword), Oracle - The MySQL family using
LIMIT
, including CUBRID, H2, HANA, HSQLDB, MariaDB, MySQL, PostgreSQL, Redshift, SQLite, Vertica - The T-SQL family using
TOP
, inculding Access, ASE, SQL Server, Sybase
There are also exotic syntaxes implemented by Firebird and Informix.
FETCH with OFFSET
You’ll find tons of blog posts on the web on how to emulate OFFSET .. LIMIT
, including jOOQ’s manual. Things do get a bit more tricky when adding an offset, as can be seen here:
CUBRID : select film_id, title, length from film order by length desc limit 3, 3 DB2 : select "v0" film_id, "v1" title, "v2" length from ( select film_id "v0", title "v1", length "v2", row_number() over (order by length desc) "rn" from film order by "v2" desc ) "x" where "rn" > 3 and "rn" <= (3 + 3) order by "rn" DERBY : select film_id, title, length from film order by length desc offset 3 rows fetch next 3 rows only FIREBIRD : select film_id, title, length from film order by length desc rows (3 + 1) to (3 + 3) H2 : select film_id, title, length from film order by length desc limit 3 offset 3 HANA : select film_id, title, length from film order by length desc limit 3 offset 3 HSQLDB : select film_id, title, length from film order by length desc limit 3 offset 3 INFORMIX : select skip 3 first 3 film_id, title, length from film order by length desc INGRES : select film_id, title, length from film order by length desc offset 3 fetch first 3 rows only MARIADB : select film_id, title, length from film order by length desc limit 3 offset 3 MYSQL : select film_id, title, length from film order by length desc limit 3 offset 3 ORACLE : select film_id, title, length from film order by length desc offset 3 rows fetch next 3 rows only POSTGRES : select film_id, title, length from film order by length desc limit 3 offset 3 REDSHIFT : select film_id, title, length from film order by length desc limit 3 offset 3 SQLITE : select film_id, title, length from film order by length desc limit 3 offset 3 SQLSERVER : select film_id, title, length from film order by length desc offset 3 rows fetch next 3 rows only SYBASE : select top 3 start at 4 film_id, title, length from film order by length desc VERTICA : select film_id, title, length from film order by length desc limit 3 offset 3
Interesting to note:
- MS Access, and Sybase ASE do not support offsets at all (maybe a good thing).
- The more recent versions of SQL Server support the SQL standard
OFFSET .. FETCH
clause (althoughOFFSET
, unfortunately, is mandatory), which is great. Older versions can emulateOFFSET
just like DB2 below - Sybase SQL Anywhere enhanced the T-SQL
TOP
syntax to something intuitive:TOP .. START AT
. Why not? - DB2 doesn’t support the syntax, and we have to emulate it using
ROW_NUMBER()
window functions:
select "v0" film_id, "v1" title, "v2" length from (
select
film_id "v0", title "v1", length "v2",
row_number() over (order by length desc) "rn"
from film order by "v2" desc
) "x"
where "rn" > 3 and "rn" <= (3 + 3)
order by "rn"
Notice how, over the years, we’ve learned to do it right and prevent all sorts of side-effects from wrong emulations:
- In the nested query, all columns have to be renamed to some enumerated column names to prevent problems from possibly duplicate column names in the user SQL query. It is perfectly OK for top-level
SELECT
statements to have duplicate / ambiguous column names, but not for subqueries - The top level
SELECT
statement should not project the auxiliaryROW_NUMBER()
value. While this might not be causing trouble in ordinary queries, it is certainly causing trouble in subqueries. Imagine emulating something like:
WHERE x IN (
SELECT id
FROM table
OFFSET 1 ROW
FETCH NEXT ROW ONLY
)
In this case, we must be very careful that the subquery continues to project only exactly one column.
3. WITH TIES
The previous approach to getting TOP 3 films is dangerous, because the ranking is rather random. There are other films of length 185, and they didn’t make it into the TOP 3. We could add another ordering criteria to make the ordering deterministic (e.g. FILM_ID), or we can use WITH TIES
to get 3 films and all the tied films, too.
The query is now:
SELECT film_id, title, length
FROM film
ORDER BY length DESC
FETCH NEXT 3 ROWS WITH TIES
And we’re getting:
FILM_ID TITLE LENGTH ----------------------------------- 212 DARN FORRESTER 185 872 SWEET BROTHERHOOD 185 817 SOLDIERS EVOLUTION 185 991 WORST BANGER 185 690 POND SEATTLE 185 609 MUSCLE BRIGHT 185 349 GANGS PRIDE 185 426 HOME PITY 185 182 CONTROL ANTHEM 185 141 CHICAGO NORTH 185
There are no more films of length 185 than the above. For more information about doing TOP N SQL queries, see this blog post.
For the sake of simplicity, let’s remove again the OFFSET
clause (because it behaves inconsistently when combined with WITH TIES
, at least in Oracle). Let’s look at WITH TIES
only. jOOQ currently doesn’t emulate this clause for all dialects as it is hard to get right without window functions.
DB2 : select "v0" film_id, "v1" title, "v2" length from ( select film_id "v0", title "v1", length "v2", rank() over (order by length desc) "rn" from film ) "x" where "rn" > 0 and "rn" <= (0 + 3) order by "rn" HANA : ... see DB2 MYSQL : ... see DB2 ORACLE : select film_id, title, length from film order by length desc fetch next 3 rows with ties POSTGRES : ... see ORACLE REDSHIFT : ... see DB2 SQLSERVER : select top 3 with ties film_id, title, length from film order by length desc SYBASE : ... see DB2
There are 3 ways to implement WITH TIES
:
- Oracle implements the SQL standard
- SQL Server has a vendor-specific
TOP N WITH TIES
clause - All the others can emulate this feature using window functions
4. Nested set operations
Granted, this isn’t something you might run into every day, but when you need it, it’s really a PITA if your database doesn’t support it. Nested set operations. There are three set operations in SQL and relational algebra:
UNION
INTERSECT
EXCEPT
(orMINUS
, in Oracle)
All of the above come in two flavours:
OP
orOP DISTINCT
OP ALL
(most databases support this only forUNION
)
Where ALL
turns the set operation into a multiset operation, allowing duplicate results. ALL
is fully supported (including on INTERSECT
and EXCEPT
) in:
- CUBRID
- DB2
- Derby
- HSQLDB
- PostgreSQL
Now, the query. What if you want to find all actor names and all customer names, but you don’t want e.g. ADAM GRANT: In PostgreSQL, you could write:
SELECT first_name, last_name
FROM actor
UNION
SELECT first_name, last_name
FROM customer
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2
In this case, we can simply hope that all these operators are left-associative, which means we’ll add customers to actors, and then remove ADAM GRANT. In fact, according to the standard, this is the case. But perhaps, not all databases implement things this way, and as soon as you mix in INTERSECT
, things change, as INTERSECT
has higher operator precedence.
Want to be sure? Put parentheses around the expressions, e.g.
(
SELECT first_name, last_name
FROM actor
UNION
SELECT first_name, last_name
FROM customer
)
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2
Still valid in PostgreSQL (and if you add FROM dual, and replace EXCEPT by MINUS, then also in Oracle), but won’t work e.g. in MySQL. How can we get this to work in all the databases?
Here’s how:
ASE : ... like MySQL CUBRID : ... like PostgreSQL (plus, add the dual table) DB2 : ... like PostgreSQL (plus, add the dual table) DERBY : select first_name, last_name from ( select first_name, last_name from ( select first_name, last_name from actor ) x union select first_name, last_name from ( select first_name, last_name from customer ) x ) x except select "ADAM", "GRANT" from ( select 'ADAM', 'GRANT' from "SYSIBM"."SYSDUMMY1" ) x order by 1, 2 H2 : ... like PostgreSQL HANA : ... like PostgreSQL (plus, add the dual table) HSQLDB : ... like PostgreSQL (plus, add the dual table) INFORMIX : ... like PostgreSQL (plus, add the dual table) INGRES : ... like PostgreSQL (plus, add the dual table) MARIADB : ... like MySQL MYSQL : select * from ( select * from ( select first_name, last_name from actor ) x union select * from ( select first_name, last_name from customer ) x ) x except select * from ( select 'ADAM', 'GRANT' from dual ) x order by 1, 2 ORACLE : ... like PostgreSQL (add dual and replace EXCEPT by MINUS) POSTGRES : ( (select first_name, last_name from actor) union (select first_name, last_name from customer) ) except (select 'ADAM', 'GRANT') order by 1, 2 REDSHIFT : SQLITE : ... like MySQL SQLSERVER : ... like PostgreSQL SYBASE : ... like PostgreSQL (plus, add the dual table) VERTICA : ... like PostgreSQL
Some observations:
- Access doesn’t support EXCEPT
- Firebird has a bit of trouble with these operators – I simply haven’t figured out how to work around them yet. Probably due to some parser bugs
- PostgreSQL (and many others) get it right
- MySQL (and a few others) get it wrong, and we have to wrap the various set operation subqueries in derived tables, when suddenly things work well again. This really seems to be just a parser problem, not actually missing functionality. But it’s really a pain if you have to rewrite your SQL manually to the MySQL version
- Derby is like MySQL, but in addition to lacking parser support for standard SQL nested set operations, it also suffers from these nasty bugs: https://issues.apache.org/jira/browse/DERBY-6983 and https://issues.apache.org/jira/browse/DERBY-6984. Luckily, you have jOOQ to work around all these hassles for you!
5. Derived column lists
A really cool standard feature is called “derived column lists”. It allows for renaming a table AND its columns in one go, for instance in PostgreSQL:
SELECT a, b
FROM (
SELECT first_name, last_name
FROM actor
) t(a, b) -- Interesting feature here
WHERE a LIKE 'Z%'
Yielding
A B ---------- ZERO CAGE
The utility of this functionality is most important when:
- You generate SQL dynamically, and perhaps you’re not entirely sure what your derived table’s column names are – just rename them and be sure again
- You call a table-valued function, i.e. a function that returns a table, and again, you’re not really sure what it’s columns are
- You simply don’t like the column names of a table. This might not be the most important use-case, as with the above syntax, you have to rename ALL (except in PostgreSQL) the columns, in the right order, and we don’t like to depend on such ordering
Again, not all databases support this feature. So, what to do if they don’t? Use this one weird trick with a UNION ALL
subquery to emulate it!
ACCESS : ... like PostgreSQL ASE : ... like PostgreSQL CUBRID : ... like PostgreSQL DB2 : ... like PostgreSQL DERBY : ... like PostgreSQL FIREBIRD : ... like PostgreSQL H2 : select a, b from ( (select null a, null b where 1 = 0) union all (select first_name, last_name from actor) ) t where a like 'Z%' HANA : ... like H2 (plus, add dual table) HSQLDB : ... like PostgreSQL INFORMIX : ... like PostgreSQL INGRES : ... like H2 (plus, add dual table) MARIADB : ... like H2 (plus, add dual table) MYSQL : ... like H2 (plus, add dual table) ORACLE : ... like H2 (plus, add dual table) POSTGRES : select a, b from ( select first_name, last_name from actor ) as t(a, b) where a like 'Z%' REDSHIFT : ... like PostgreSQL SQLITE : ... like H2 SQLSERVER : ... like PostgreSQL SYBASE : ... like PostgreSQL VERTICA : ... like PostgreSQL
Not a lot of magic here. Either the database supports the feature, or it doesn’t. If it’s not supported, then the derived table whose columns should be aliased must be prefixed by a zero-row-returning UNION ALL
subquery, which defines the column names. Because if you use set operations, then the first subquery defines the column names. Cool, eh?
select a, b from (
-- Dummy subquery defining the column names
(select null a, null b where 1 = 0)
union all
-- Actually interesting subqeury
(select first_name, last_name from actor)
) t
where a like 'Z%'
Please, forgive me… Actually, it was all Bill Karwin’s idea.
6. VALUES clause
Did you know that VALUES()
is a clause that can be used outside of INSERT
statements? Yes. In PostgreSQL, you can just write:
VALUES ('Hello', 'World'), ('Cool', 'eh?')
And you’re getting the following result:
column1 column2 ---------------- Hello World Cool eh?
Of course, we should never rely on such generated column names, thus again, derived column lists. In PostgreSQL, this is only possible when actually using a derived table in this context:
SELECT *
FROM (
VALUES ('Hello', 'World'), ('Cool', 'eh?')
) AS t(a, b)
Do all the databases support this clause? Of course not! But at least, it can be emulated in all databases:
ACCESS : ... like Oracle ASE : ... like PostgreSQL CUBRID : ... like PostgreSQL DB2 : ... like PostgreSQL DERBY : ... like PostgreSQL FIREBIRD : ... like Sybase SQL Anywhere H2 : select "v"."c1", "v"."c2" from ( (select null "c1", null "c2" where 1 = 0) union all (select * from ( values ('Hello', 'World'), ('Cool', 'eh?') ) "v") ) "v" HANA : ... like Oracle HSQLDB : ... like PostgreSQL INFORMIX : ... like Sybase SQL Anywhere INGRES : ... like Oracle MARIADB : ... like Oracle MYSQL : ... like Oracle ORACLE : select "v"."c1", "v"."c2" from ( (select null "c1", null "c2" from dual where 1 = 0) union all (select * from ( (select 'Hello', 'World' from dual) union all (select 'Cool', 'eh?' from dual) ) "v") ) "v" POSTGRES : select "v"."c1", "v"."c2" from ( values ('Hello', 'World'), ('Cool', 'eh?') ) as "v"("c1", "c2") REDSHIFT : ... like PostgreSQL SQLITE : ... like H2 SQLSERVER : ... like PostgreSQL SYBASE : select [v].[c1], [v].[c2] from ( (select 'Hello', 'World' from [SYS].[DUMMY]) union all (select 'Cool', 'eh?' from [SYS].[DUMMY]) ) [v]([c1], [c2]) VERTICA : ... like PostgreSQL
There are 4 flavours of how this is supported:
- PostgreSQL and others: Support both VALUES and derived column lists
- H2 and others: Support only VALUES, not derived column lists
- Sybase SQL Anywhere and others: Do not support VALUES, but derived column lists
- Oracle and others: Support neither feature
Clearly, this is only syntactic sugar for other, more verbose SQL, but it’s really cool when you don’t actually need any real table. In fact the whole optional FROM
clause discussion from the beginning of this article is unnecessary, once you have VALUES()
, which would be the standard way to “avoid” the FROM
clause.
7. Predicates using Row Value Expressions
Once you’ve started using these, you will not want to miss them. Row value expressions. They’re essentially just tuple expressions, like:
SELECT *
FROM customer
WHERE (first_name, last_name)
= ('MARY', 'SMITH')
Or, according to the standard and to PostgreSQL, also:
SELECT *
FROM customer
WHERE ROW (first_name, last_name)
= ROW ('MARY', 'SMITH')
The functionality doesn’t seem very useful when using equality predicates, but it is much more interesting when using IN
predicates:
-- Any customer named the same way as an actor?
SELECT *
FROM customer
WHERE (first_name, last_name) IN (
SELECT first_name, last_name
FROM actor
)
Or, when doing keyset pagination, through non-equality predicates:
SELECT *
FROM customer
WHERE (first_name, last_name)
> ('JENNIFER', 'DAVIS')
Again, not all databases support these. And those that do, have various levels of support. PostgreSQL is again the only database that goes “all in” on all the predicates, including funky things like the DISTINCT
predicate:
SELECT *
FROM customer
WHERE (first_name, last_name)
IS DISTINCT FROM ('JENNIFER', 'DAVIS')
But luckily, again, these things can be emulated. Let’s look at all 3 of the above examples, and save ourselves the DISTINCT
predicate for the next list item:
Equality on row value expressions
This is trivial. Either it’s supported, or it isn’t. Or you’re Oracle, and require a special syntax, to prevent ORA-00920 invalid relational operator (I would really love to hear that story. Must be funky):
ACCESS : ... like SQL Server ASE : ... like SQL Server CUBRID : ... like PostgreSQL DB2 : ... like PostgreSQL DERBY : ... like SQL Server FIREBIRD : ... like SQL Server H2 : ... like PostgreSQL HANA : ... like SQL Server HSQLDB : ... like PostgreSQL INFORMIX : select * from customer where row (first_name, last_name) = row ('MARY', 'SMITH') INGRES : ... like SQL Server MARIADB : ... like PostgreSQL MYSQL : ... like PostgreSQL ORACLE : select * from customer where (first_name, last_name) = (('MARY', 'SMITH')) POSTGRES : select * from customer where (first_name, last_name) = ('MARY', 'SMITH') REDSHIFT : ... like PostgreSQL SQLITE : ... like SQL Server SQLSERVER : select * from customer where (first_name = 'MARY' and last_name = 'SMITH') SYBASE : ... like SQL Server VERTICA : ... like PostgreSQL
Note that Informix requires the ROW()
constructor, which should be optional. And again, Oracle is… Oracle :-)
IN predicate
Emulating this is much more tricky if it is not supported. Remember that IN
and EXISTS
predicates can be equivalent, so there’s always a way to transform them into each other.
ACCESS : ... like SQLite ASE : ... like SQL Server CUBRID : ... like SQL Server DB2 : ... like SQL Server DERBY : ... like SQL Server FIREBIRD : ... like SQL Server H2 : select * from customer where (first_name, last_name) in ( select (first_name, last_name) from actor ) HANA : ... like SQLite HSQLDB : ... like PostgreSQL INFORMIX : ... like SQL Server INGRES : ... like SQLite MARIADB : ... like PostgreSQL MYSQL : ... like PostgreSQL ORACLE : select * from customer where (first_name, last_name) in (( select first_name, last_name from actor )) POSTGRES : select * from customer where (first_name, last_name) in ( select first_name, last_name from actor ) REDSHIFT : ... like PostgreSQL SQLITE : select * from customer where exists ( select x.c1, x.c2 from ( (select null c1, null c2 where 1 = 0) union all (select first_name, last_name from actor) ) x where (first_name = x.c1 and last_name = x.c2) ) SQLSERVER : select * from customer where exists ( select x.c1, x.c2 from (select first_name, last_name from actor) x(c1, c2) where (first_name = x.c1 and last_name = x.c2) ) SYBASE : ... like SQL Server VERTICA : ... like SQL Server
Observations:
- At this point, it’s worth mentioning that these things work “by accident” in H2. H2 unfortunately decided to use the (a, b, …, n) syntax for arrays, which are similar to tuples, but not the same thing. You can see in the H2 syntax that we have to wrap the two columns of the subquery in parentheses as well for the IN predicate to work as expected.
- The transformation to an
EXISTS()
predicate requires the derived column list feature again. This is why some emulations are even more verbose than others.
Non-equality predicate
This predicate can be expanded to its standard definition again, easily, if it is not natively supported:
ACCESS : ... like Oracle ASE : ... like PostgreSQL CUBRID : ... like Oracle DB2 : ... like PostgreSQL DERBY : ... like Oracle FIREBIRD : ... like Oracle H2 : ... like PostgreSQL HANA : ... like Oracle HSQLDB : ... like PostgreSQL INFORMIX : ... like Oracle INGRES : ... like Oracle MARIADB : ... like PostgreSQL MYSQL : ... like PostgreSQL ORACLE : select * from customer where ( first_name >= 'JENNIFER' and ( first_name > 'JENNIFER' or ( first_name = 'JENNIFER' and last_name > 'DAVIS' ) ) ) POSTGRES : select * from customer where (first_name, last_name) > ('JENNIFER', 'DAVIS') REDSHIFT : ... like Oracle SQLITE : ... like Oracle SQLSERVER : ... like Oracle SYBASE : ... like Oracle VERTICA : ... like PostgreSQL
Observation:
- Strictly speaking, it is not necessary to have one of the two predicates redundant in the emulation, but unfortunately, many databases have trouble when the top boolean operator of a boolean expression is
OR
rather thanAND
8. The DISTINCT predicate
In the previous section, we’ve briefly mentioned the DISTINCT
predicate, a useful predicate that helps handling NULL
values as we’s mostly expect.
A quick summary in PostgreSQL:
WITH t(v) AS (
VALUES (1),(2),(null)
)
SELECT v1, v2, v1 IS DISTINCT FROM v2
FROM t t1(v1), t t2(v2)
This yields:
v1 v2 d ----------------- 1 1 false 1 2 true 1 true 2 1 true 2 2 false 2 true 1 true 2 true false
Conveniently, this never returns NULL
when comparing anything with NULL
, so simply spoken NULL IS NOT DISTINCT FROM NULL
is TRUE
. Quite some syntax, but hey, it’s SQL.
Regrettably, only few databases support the standard syntax, and MySQL and SQLite have a much more concise, non-standard operator. Let’s emulate the query from the previous section in our databases:
SELECT *
FROM customer
WHERE (first_name, last_name)
IS DISTINCT FROM ('JENNIFER', 'DAVIS')
Observe, there’s a really cool way to emulate this operation using INTERSECT
, because interestingly, set operations also treat two NULL
values as “the same”, i.e. non-DISTINCT
. We have:
ACCESS : ... like SQL Server (plus, add the dual table) ASE : ... like SQL Server (plus, add the dual table) CUBRID : ... like SQL Server (plus, add the dual table) DB2 : ... like SQL Server (plus, add the dual table) DERBY : ... like SQL Server (plus, add the dual table) FIREBIRD : ... like PostgreSQL H2 : ... like PostgreSQL HANA : ... like SQL Server (plus, add the dual table) HSQLDB : ... like PostgreSQL INFORMIX : ... like SQL Server (plus, add the dual table) INGRES : ... like SQL Server (plus, add the dual table) MARIADB : ... like MySQL MYSQL : select * from customer where (not((first_name, last_name) <=> ('JENNIFER', 'DAVIS'))) ORACLE : ... like SQL Server (plus, add the dual table) POSTGRES : select * from customer where (first_name, last_name) is distinct from ('JENNIFER', 'DAVIS') REDSHIFT : ... like PostgreSQL SQLITE : select * from customer where ((first_name, last_name) is not ('JENNIFER', 'DAVIS')) SQLSERVER : select * from customer where not exists ( (select first_name, last_name) intersect (select 'JENNIFER', 'DAVIS') ) SYBASE : ... like SQL Server (plus, add the dual table) VERTICA : ... like SQL Server
Want to try it yourself? The original PostgreSQL truth-table producing query can be transformed to this one:
WITH t(v) AS (
VALUES (1),(2),(null)
)
SELECT v1, v2, NOT EXISTS (
SELECT v1 INTERSECT SELECT v2
)
FROM t t1(v1), t t2(v2)
It produces the same truth table. Cool, eh?
9. DDL statements
This is one of the main reasons why we’re doing all of this. We want to allow for SQL text based migration scripts (e.g. run with Flyway) to be translatable to any kind of SQL dialect. Because DDL is really the most boring part of SQL to keep vendor-agnostic.
Just two short examples:
Copying a table structure into a new table
A quick and dirty way to copy a table structure is this:
CREATE TABLE x AS
SELECT 1 AS one
WITH NO DATA
Looks cool, hm? Unfortunately, there is some trouble with the syntax as you will see in the emulations:
DB2 : create table x as (select 1 one from "SYSIBM"."DUAL") with no data H2 : ... like Oracle MARIADB : ... like Oracle MYSQL : ... like Oracle ORACLE : create table x as select 1 one from dual where 1 = 0 POSTGRES : create table x as select 1 as one with no data SQLSERVER : select 1 one into x where 1 = 0
I’ve left out a couple of dialects, as this hasn’t been integration tested everywhere yet, being work in progress. There are 4 flavours:
- PostgreSQL: Actual support for the
WITH [ NO ] DATA
clause - DB2: Actual support for the
WITH [ NO ] DATA
clause (but funky requirement to wrap the source query in parentheses - Oracle: No support for the clause (easy to emulate with dummy predicate), but at least support for CTAS (CREATE TABLE AS SELECT)
- SQL Server: Vendor specific alternative to CTAS
The inverse is equally fun to emulate, let’s actually add the data:
CREATE TABLE x AS
SELECT 1 AS one
WITH DATA
And we’re getting:
DB2 : begin execute immediate ' create table x as (select 1 one from "SYSIBM"."DUAL") with no data '; execute immediate ' insert into x select 1 one from "SYSIBM"."DUAL" '; end ORACLE : create table x as select 1 one from dual POSTGRES : create table x as select 1 as one with data SQLSERVER : select 1 one into x
Let’s focus on the interesting bits only.
- Oracle, PostgreSQL, SQL Server as before
- DB2 actually cannot copy the data with the table structure. Huh!
As can be seen above, in cases like this, we might need to split a single DDL statement in a statement batch or anonymous block containing several statements. This is work in progress as not all databases support anonymous blocks or statement batches.
There are many other interesting DDL emulations, and a lot of it is not yet implemented. Just play around with them on https://www.jooq.org/translate and send us your feature requests / ideas to https://github.com/jOOQ/jOOQ/issues/new
10. Built-in Functions
Last but not least, there are a ton of built-in functions, such as the boring LPAD()
function. (Left pad is known for various things). Migrating these between databases is really really tedious. We’re here to help! Let’s emulate:
SELECT lpad('abc', ' ', 5)
Translation:
ACCESS : replace(space(' ' - len('abc')), ' ', 5) & 'abc' ASE : (replicate(5, (' ' - char_length('abc'))) || 'abc') CUBRID : lpad('abc', ' ', 5) DB2 : lpad('abc', ' ', 5) DERBY : lpad('abc', ' ', 5) FIREBIRD : lpad('abc', ' ', 5) H2 : lpad('abc', ' ', 5) HANA : lpad('abc', ' ', 5) HSQLDB : lpad('abc', ' ', 5) INFORMIX : lpad('abc', ' ', 5) INGRES : lpad('abc', ' ', 5) MARIADB : lpad('abc', ' ', 5) MYSQL : lpad('abc', ' ', 5) ORACLE : lpad('abc', ' ', 5) POSTGRES : lpad('abc', ' ', 5) REDSHIFT : lpad('abc', ' ', 5) SQLITE : substr(replace(replace(substr(quote(zeroblob(((' ' - length('abc') - 1 + length("5")) / length("5") + 1) / 2)), 3), '''', ''), '0', "5"), 1, (' ' - length('abc'))) || 'abc' SQLSERVER : (replicate(5, (' ' - len('abc'))) + 'abc') SYBASE : (repeat(5, (' ' - length('abc'))) || 'abc') VERTICA : lpad('abc', ' ', 5)
Forgive me again for the SQLite version. It was a suggestion made by an unknown user on Stack Overflow, the place where I tend to nerd-snipe people into solving such problems for me for free.
Conclusion
jOOQ standardises SQL into a type safe, embedded internal DSL in Java. With jOOQ 3.9+, we’ve added a parser (which is also publicly available on https://www.jooq.org/translate), which removes the need to express everything in the jOOQ API. Just parse your random SQL string and translate it to some other SQL dialect. This list could easily be extended to 50 items and more, but it is much more fun to play around with our website and try this on your own.
Please, if you do, do report any issue, feature request that you’d like to see at: https://github.com/jOOQ/jOOQ/issues/new to help us make this new tool even more valuable for you. In the near future, we’re going to more closely integrate this parser with other tools, such as Flyway, as we think there’s a lot of value in vendor-agnostic, standardised SQL.