Tag Archive | jooq

How to Emulate the MEDIAN() Aggregate Function Using Inverse Distribution Functions

Some databases are awesome enough to implement the MEDIAN() aggregate function. Remember that the MEDIAN() is sligthly different from (and often more useful than) the MEAN() or AVG() (average).

While the average is calculated as the SUM(exp) / COUNT(exp), the MEDIAN() tells you that 50% of all values in the sample are higher than the MEDIAN() whereas the other 50% of the set are lower than the MEDIAN().

So, in other words, if you take the following query:

WITH t(value) AS (

… then both average and median are the same:

avg   median
2     2

But if you heavily skew your data like this:

WITH t(value) AS (

Then your average will also be skewed, whereas your median will still indicate where most of the values are in your sample

avg      median
34.333   2

The above sample is of course statistically insignificant, but you can easily see that the effect can be dramatic and relevant, if you have more data:

The skewing effect is very important in statistics and in order to make any interesting claim about anything, using percentiles is most often more useful than using averages. Take the average income vs. the median income in a country, for instance. While the average income in the U.S. (and in many other countries) has been steadily increasing, the median income has seen a decline over the past decade. This is due to wealth being heavily skewed towards the super-rich more and more.

This blog is not about politics but about Java and SQL, so let’s get back into calculating the actual facts.

Using precentiles in SQL

As we’ve seen before, the MEDIAN() divides a sample into two equally-sized groups and takes the value “between” those two groups. This particular value is also called the 50th percentile because 50% of all values in the sample are lower than the MEDIAN(). We can thus establish:

  • MIN(exp): The 0-percentile
  • MEDIAN(exp): The 50th-percentile
  • MAX(exp): The 100th-percentile

All of the above are special cases of percentiles, and while MIN() and MAX() are supported in all SQL databases (and the SQL standard), MEDIAN() is not in the SQL standard and only supported by the following jOOQ databases:

  • Oracle
  • Sybase SQL Anywhere

There is another way of calculating the MEDIAN() in particular and any sort of percentile in general in the SQL standard, and since PostgreSQL 9.4 also in PostgreSQL using …

Ordered-set aggregate functions

Interestingly, apart from window functions, you can also specify ORDER BY clauses to certain aggregate functions that aggregate data based on ordered sets.

One such function is the SQL standard percentile_cont function, which takes the percentile as an argument, and then accepts an additional WITHIN GROUP clause that takes an ORDER BY clause as an argument. These particular ordered-set functions are also called inverse distribution functions, because we want to find where a particular percentile is located in the distribution of all values in the sample (if you’re not scared by the math, check out the wikipedia article)

So, in PostgreSQL 9.4+, the MEDIAN() function can be emulated like this:

WITH t(value) AS (
  SELECT 100
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value)

This interesting syntax is standardised and may be known to some of you from Oracle’s LISTAGG(), which allows to aggregate values into concatenated strings:

WITH t(value) AS (
  listagg(value, ', ') WITHIN GROUP (ORDER BY value)

This query yields simply:

1, 2, 100

On a side-note: LISTAGG() is, of course, completely useless, because it returns VARCHAR2, which again has a max length of 4000 in Oracle. Useless…

Emulation out-of-the-box with jOOQ

As always, jOOQ will emulate these kinds of things out of the box. You can either use the DSL.median() function, or with the upcoming jOOQ 3.6, the new DSL.percentileCont() function to produce the same value:


jOOQ is the best way to write SQL in Java

The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions

Sometimes when aggregating data with SQL, we’d love to add some additional filters. For instance, consider the following world bank data:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

And the table structure:

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL

Now, let’s assume we’d like to find the number of countries with a GDP higher than 40,000 for each year.

With standard SQL:2003, and now also with the newly released PostgreSQL 9.4, we can now take advantage of the new FILTER clause, which allows us to write the following query:

  count(*) FILTER (WHERE gdp_per_capita >= 40000)

The above query will now yield:

year   count
2012   4
2011   5
2010   4
2009   4

And that’s not it! As always, you can use any aggregate function also as a window function simply by adding an OVER() clause to the end:

    FILTER (WHERE gdp_per_capita >= 40000) 
    OVER   (PARTITION BY year)

The result would then look something like this:

year   code   gdp_per_capita   count
2009   CA           40764.00       4
2009   DE           40270.00       4
2009   FR           40488.00       4
2009   GB           35455.00       4

jOOQ 3.6 will also support the new FILTER clause for aggregate functions

Good news for jOOQ users. You can write the same query with jOOQ intuitively as such:


… and


And the best thing is that jOOQ (as usual) emulates the above clause for you if you’re not using PostgreSQL. The equivalent query would be:

  count(CASE WHEN gdp_per_capita >= 40000 THEN 1 END)

jOOQ: The best way to write SQL in Java

Read more about what’s new in PostgreSQL 9.4 here

Really Too Bad that Java 8 Doesn’t Have Iterable.stream()

This is one of the more interesting recent Stack Overflow questions:

Why does Iterable not provide stream() and parallelStream() methods?

At first, it might seem intuitive to make it straight-forward to convert an Iterable into a Stream, because the two are really more or less the same thing for 90% of all use-cases.

Granted, the expert group had a strong focus on making the Stream API parallel capable, but anyone who works with Java every day will notice immediately, that Stream is most useful in its sequential form. And an Iterable is just that. A sequential stream with no guarantees with respect to parallelisation. So, it would only be intuitive if we could simply write:


In fact, subtypes of Iterable do have such methods, e.g.


Brian Goetz himself gave an answer to the above Stack Overflow question. The reasons for this omittance are rooted in the fact that some Iterables might prefer to return an IntStream instead of a Stream. This really seems to be a very remote reason for a design decision, but as always, omittance today doesn’t mean omittance forever. On the other hand, if they had introduced Iterable.stream() today, and it turned out to be a mistake, they couldn’t have removed it again.

Well, primitive types in Java are a pain and they did all sorts of bad things to generics in the first place, and now to Stream as well, as we have to write the following, in order to turn an Iterable into a Stream:

Stream s = StreamSupport.stream(iterable.spliterator(), false);

Brian Goetz argues that this is “easy”, but I would disagree. As an API consumer, I experience a lot of friction in productivity because of:

  • Having to remember this otherwise useless StreamSupport type. This method could very well have been put into the Stream interface, because we already have Stream construction methods in there, such as Stream.of().
  • Having to remember the subtle difference between Iterator and Spliterator in the context of what I believe has nothing to do with parallelisation. It may well be that Spliterators will become popular eventually, though, so this doubt is for the magic 8 ball to address.
  • In fact, I have to repeat the information that there is nothing to be parallelised via the boolean argument false

Parallelisation really has such a big weight in this new API, even if it will cover only around 5%-10% of all functional collection manipulation operations. While sequential processing was not the main design goal of the JDK 8 APIs, it is really the main benefit for all of us, and the friction around APIs related to sequential processing should be as low as possible.

The method above should have just been called

Stream s = Stream.stream(iterable);

It could be implemented like this:

public static<T> Stream<T> stream(Iterable<T> i) {
    return StreamSupport.stream(i.spliterator(), false);

Obviously with convenience overloads that allow for the additional specialisations, like parallelisation, or passing a Spliterator

But again, if Iterable had its own stream() default method, an incredible number of APIs would be so much better integrated with Java 8 out of the box, without even supporting Java 8 explicitly!

Take jOOQ for instance. jOOQ still supports Java 6, so a direct dependency is not possible. However, jOOQ’s ResultQuery type is an Iterable. This allows you to use such queries directly inline in foreach loops, as if you were writing PL/SQL:


FOR book IN (
  SELECT * FROM books ORDER BY books.title
  -- Do things with book


for (BookRecord book : 
) {
  // Do things with book

Now imagine the same thing in Java 8:

   .map / reduce / findAny, etc...

Unfortunately, the above is currently not possible. You could, of course, eagerly fetch all the results into a jOOQ Result, which extends List:

   .map / reduce / findAny, etc...

But it’s one more method to call (every time), and the actual stream semantics is broken, because the fetch is done eagerly.

Complaining on a high level

This is, of course, complaining on a high level, but it would really be great if a future version of Java, e.g. Java 9, would add this missing method to the Iterable API. Again, 99% of all use-cases will want the Stream type to be returned, not the IntStream type. And if they do want that for whatever obscure reason (much more obscure than many evil things from old legacy Java APIs, looking at you Calendar), then why shouldn’t they just declare an intStream() method. After all, if someone is crazy enough to write Iterable<Integer> when they’re really operating on int primitive types, they’ll probably accept a little workaround.

How to Integrate Commercial Third-Party Artefacts into Your Maven Build

According to a recent survey by ZeroTurnaround’s RebelLabs, Maven is still the leading Java build platform. The current market share distribution, according to RebelLabs is:

  • Maven with 64%
  • Ant + Ivy with 16.5%
  • Gradle with 11%

Yet, at the same time, Maven is often criticised for being a bit obscure and intrusive. Compared to runner-ups Ant and Gradle, Maven allows for only little flexibility with respect to interpretation and thus custom adaptation of the build model. Or as Tim Berglund from Data Stax would put it:

But let’s cut the jokes and have a look at a real-world issue:

Integrating Third-Party Commercial Artefacts

Not all third party artefacts that you would like to depend upon are available for free from Maven Central. Examples for this are commercial JDBC drivers, or the commercial jOOQ editions. There are essentially three ways to integrate such artefacts into your build:


Often, you only need the commercial dependency for a small test project or demo. You want to be sure that it works when you run it without depending on your local repository setup, or on network connectivity. This is a good use-case for <scope>system</scope>:

For instance: jOOQ


For instance: Microsoft SQL JDBC


  <!-- Notice that we can still put "optional"
       on commercial JDBC driver dependencies -->

Advantages of this approach

This is really a very easy solution when you want to have a local, self-contained module that is guaranteed to run immediately after checkout from source control, without additional configuration and setup. Don’t forget to check in the libraries into source control first, of course.

Disadvantages of this appraoch

The system dependencies are never transitively inherited. If your module depends on jOOQ this way, your module’s dependencies won’t see the jOOQ API.

Details about system dependencies can be seen in the Maven documentation. Citing from the documentation:

Dependencies with the scope system are always available and are not looked up in repository. They are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus, system dependencies are especially useful for resolving dependencies on artifacts which are now provided by the JDK, but where available as separate downloads earlier. Typical example are the JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).

A bit more robust

An approach that might appear to be a bit more robust is to check out the dependencies from your version control system and then “manually” import them to your local repository. This will make them available to your own local build. The following shell scripts show how you can import, for instance, the jOOQ artefacts into your local repository

Windows Batch

@echo off
set VERSION=3.4.4

if exist jOOQ-javadoc\jooq-%VERSION%-javadoc.jar (
  set JAVADOC_JOOQ=-Djavadoc=jOOQ-javadoc\jooq-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_META=-Djavadoc=jOOQ-javadoc\jooq-meta-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN_MAVEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-maven-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_SCALA=-Djavadoc=jOOQ-javadoc\jooq-scala-%VERSION%-javadoc.jar

if exist jOOQ-src\jooq-%VERSION%-sources.jar (
  set SOURCES_JOOQ=-Dsources=jOOQ-src\jooq-%VERSION%-sources.jar
  set SOURCES_JOOQ_META=-Dsources=jOOQ-src\jooq-meta-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN=-Dsources=jOOQ-src\jooq-codegen-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN_MAVEN=-Dsources=jOOQ-src\jooq-codegen-maven-%VERSION%-sources.jar
  set SOURCES_JOOQ_SCALA=-Dsources=jOOQ-src\jooq-scala-%VERSION%-sources.jar

call mvn install:install-file -Dfile=jOOQ-pom\pom.xml                          -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=%VERSION% -Dpackaging=pom
call mvn install:install-file -Dfile=jOOQ-lib\jooq-%VERSION%.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ%               %SOURCES_JOOQ%              -DpomFile=jOOQ-pom\jooq\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-meta-%VERSION%.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_META%          %SOURCES_JOOQ_META%         -DpomFile=jOOQ-pom\jooq-meta\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-%VERSION%.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN%       %SOURCES_JOOQ_CODEGEN%      -DpomFile=jOOQ-pom\jooq-codegen\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-maven-%VERSION%.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN_MAVEN% %SOURCES_JOOQ_CODEGEN_META% -DpomFile=jOOQ-pom\jooq-codegen-maven\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-scala-%VERSION%.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_SCALA%         %SOURCES_JOOQ_SCALA%        -DpomFile=jOOQ-pom\jooq-scala\pom.xml

Linux Shell


if [ -f jOOQ-javadoc/jooq-$VERSION-javadoc.jar ]; then

if [ -f jOOQ-src/jooq-$VERSION-sources.jar ]; then

mvn install:install-file -Dfile=jOOQ-pom/pom.xml                         -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=$VERSION -Dpackaging=pom
mvn install:install-file -Dfile=jOOQ-lib/jooq-$VERSION.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ               $SOURCES_JOOQ              -DpomFile=jOOQ-pom/jooq/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-meta-$VERSION.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_META          $SOURCES_JOOQ_META         -DpomFile=jOOQ-pom/jooq-meta/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-$VERSION.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN       $SOURCES_JOOQ_CODEGEN      -DpomFile=jOOQ-pom/jooq-codegen/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-maven-$VERSION.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN_MAVEN $SOURCES_JOOQ_CODEGEN_META -DpomFile=jOOQ-pom/jooq-codegen-maven/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-scala-$VERSION.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_SCALA         $SOURCES_JOOQ_SCALA        -DpomFile=jOOQ-pom/jooq-scala/pom.xml

The above scripts essentially check if any of Javadoc, Sources, and/or binaries are available in the distribution, and then install:

  • The parent pom.xml
  • The various artefact binaries, sources, javadocs, and pom.xml files

Advantages of this approach

Dependencies can now be referenced like any other type of dependency, as the artefacts are registered in your local repository. Moreover, they’re also available to your module’s own dependencies, transitively – which is probably what you want when you’re using jOOQ. Here’s how you’d then specify the dependencies:



Disadvantages of this approach

There is a manual step involved in the installation of the dependencies. If you don’t have the above scripts readily available, it can be quite tedious to figure out exactly how to import all those dependencies step by step into your repository. Specifically if you’re running a demo or prototype, this may lead to unexpected compilation failure in the worst moments.

The way to go

In an actual project setup, obviously, neither of the above approaches will be sufficient, and you’ll probably import the libraries into your local Nexus or Bintray or whatever repository you’re using. Just beware of potential restrictions on distribution that commercial deliverables may have.

A small tutorial about how to install artefacts into Nexus can be found here.

jOOQ Newsletter: November 28, 2014 – Black Friday jOOQ Sale – Only Today!

Subscribe to this newsletter here

Tweet of the Day and Webinar with Arun Gupta from Red Hat

Today, we have a very special Tweet of the Day by Oliver Hubaut who expresses what we believe is a general feeling in the industry. He says:

There is a lot of truth in his statement, albeit perhaps not the one he intended. Many people have misinterpreted JPA in the past, believing that it will be a full replacement for SQL. This couldn’t be farther from the truth.

Gavin King, the creator of Hibernate has told us the following:

… and this is also the point we’re trying to make. Join us next week on Wednesday, December 3 when we meet with Arun Gupta from Red Hat for his Webinar about JPA and jOOQ. If you have any questions that you’d like us to talk about, ask them here:


Black Friday Sale: Get 20% off any jOOQ purchase, today!

We’re participating in the Black Friday sale tradition and give you an incredible 20% off your purchase of any jOOQ license that you order today, Black Friday, November 28, 2014.

Ask your manager today to treat you to a wonderful pre-christmas gift! Don’t waste time, act quickly and order jOOQ licenses right now:


jOOQ 3.5 released

If you’ve been following the jOOQ User Group, you’ve heard it already. Last Friday, we’ve shipped the awesome jOOQ 3.5 with loads of new features!

The new Binding SPI

The main improvement of this exciting release is the new org.jooq.Binding SPI which can be used to fully control all aspects of a user-type’s JDBC interaction. This goes much further than the existing org.jooq.Converter SPI that can be used to map standard JDBC types to user-types. With the new Binding SPI, virtually *ALL* vendor-specific types can be supported now. Examples include PostgreSQL’s JSON or HSTORE types, or Oracle’s DATE type – which is really incorrectly represented via java.sql.Timestamp, which is why we have retrofitted the existing <dateAsTimestamp/> feature to use such bindings, now.

Stored procedures are everywhere

Stored procedure support was generally improved in this release. This includes lots of new little features and conveniences for use with PL/SQL or Transact-SQL. For instance, jOOQ 3.5.0 now supports cross-schema references of PL/SQL OBJECT and TABLE types, which allows for binding directly to Oracle Spatial. We’ve blogged about this exciting improvement here:

And while we were at it, we’ve also added basic support for Oracle AQ, which integrates very nicely with our OBJECT type support!

In Transact-SQL and MySQL, we now support fetching arbitrary numbers of Results from stored procedures, and we’ve also implemented support for Firebird PSQL, including Firebird’s very interesting syntax for table-valued functions.

By the way, we support user-defined aggregate functions for a variety of databases, including Oracle, PostgreSQL, and HSQLDB. Definitely something you should look into!

SQL improvements

In this release, we’ve finally got support for UNION, INTERSECT, and EXCEPT right with respect to nesting such operations, as well as combining them with ORDER BY and LIMIT .. OFFSET.

Let’s talk some more DDL

We’ve continued to add support for DDL statements, including


We’ll continue to add support for more DDL statements also in the future.

Code generation improvements

We’ve added support for the new XMLDatabase, a code generation configuration that allows to read meta information from XML formats, e.g. from a standard INFORMATION_SCHEMA.xml, or from Vertabelo’s XML export format:

Future versions of jOOQ will include built-in support for a variety of XML formats.

We’ve had an awesome contribution by Etienne Studer from Gradleware to help our users integrate the jOOQ code generation with their Gradle builds.

Last but not least: Informix!

Oh, and by the way, we now also support IBM’s second most popular database: Informix. Support for this database will be included in the jOOQ Enterprise Edition.

More information can be found here:

jOOQ 3.2 End of Life

While 3.5 is out, 3.2 is now more than one year old, which means that it has reached its end of life. We’ll be shipping a last patch update 3.2.7 in early December. After that, we advise all our customers and users to upgrade to a newer minor release.

Do you want to stay on the 3.2 release? No problem, contact our sales team and we’ll find a solution for you.

Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()

If you’re using a commercial database or PostgreSQL / Firebird / CUBRID, you will be able to take advantage of the full power of window functions. We’ve blogged about window functions’ awesomeness a couple of times, in particular about ROW_NUMBER(), RANK(), DENSE_RANK().

Today, we’re going to look into some awesome window functions that produce values of other rows that are positioned before or after the current row.

Setting up the test data

We’re going to do some interesting statistics today using publicly available data from the World Bank. To keep things simple, we’ll only do analyses for the G8 countries:

  • Canada (CA)
  • France (FR)
  • Germany (DE)
  • Italy (IT)
  • Japan (JP)
  • Russian Federation (RU)
  • United Kingdom (GB)
  • United States (US)

And for those countries, let’s consider the following data points for the years 2009-2012:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

Central government debt, total (% of GDP)

          2009    2010    2011    2012
CA        51.3    51.4    52.5    53.5	
DE        47.6    55.5    55.1    56.9	
FR        85.0    89.2    93.2   103.8	
GB        71.7    85.2    99.6   103.2	
IT       121.3   119.9   113.0   131.1	
JP       166.8   174.8   189.5   196.5	
RU         8.7     9.1     9.3     9.4	
US        76.3    85.6    90.1    93.8	

Let’s put all that data into a fact table like so (PostgreSQL syntax):

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL,
  govt_debt DECIMAL(10, 2) NOT NULL

INSERT INTO countries
VALUES ('CA', 2009, 40764, 51.3),
       ('CA', 2010, 47465, 51.4),
       ('CA', 2011, 51791, 52.5),
       ('CA', 2012, 52409, 53.5),
       ('DE', 2009, 40270, 47.6),
       ('DE', 2010, 40408, 55.5),
       ('DE', 2011, 44355, 55.1),
       ('DE', 2012, 42598, 56.9),
       ('FR', 2009, 40488, 85.0),
       ('FR', 2010, 39448, 89.2),
       ('FR', 2011, 42578, 93.2),
       ('FR', 2012, 39759,103.8),
       ('GB', 2009, 35455,121.3),
       ('GB', 2010, 36573, 85.2),
       ('GB', 2011, 38927, 99.6),
       ('GB', 2012, 38649,103.2),
       ('IT', 2009, 35724,121.3),
       ('IT', 2010, 34673,119.9),
       ('IT', 2011, 36988,113.0),
       ('IT', 2012, 33814,131.1),
       ('JP', 2009, 39473,166.8),
       ('JP', 2010, 43118,174.8),
       ('JP', 2011, 46204,189.5),
       ('JP', 2012, 46548,196.5),
       ('RU', 2009,  8616,  8.7),
       ('RU', 2010, 10710,  9.1),
       ('RU', 2011, 13324,  9.3),
       ('RU', 2012, 14091,  9.4),
       ('US', 2009, 46999, 76.3),
       ('US', 2010, 48358, 85.6),
       ('US', 2011, 49855, 90.1),
       ('US', 2012, 51755, 93.8);

Start the querying fun

People who are used to SQL-92 syntax will be able to quickly find the highest GDP per capita or the highest debt from the table. It’s an easy query like this one:

SELECT MAX(gdp_per_capita), MAX(govt_debt)
FROM countries;

Which will return:

52409.00    196.50

But that’s not interesting. We don’t even know what countries and what years these values are associated with.

A standard SQL-92 (and also a standard relational) query to return all of these values would look something like this:

  'highest gdp per capita' AS what,
FROM countries c1
  FROM countries c2
  WHERE c1.gdp_per_capita < c2.gdp_per_capita
  'highest government debt' AS what,
FROM countries c1
  FROM countries c2
  WHERE c1.govt_debt < c2.govt_debt

In essence, we select those rows for which there doesn’t exist any other row with a higher value for either gdp_per_capita (first subselect) or govt_debt (second subselect).

Trick! Use quantified comparison predicates!

If your database supports quantified comparison predicates, then you can write this a bit more concisely like this:

  'highest gdp per capita' AS what,
FROM countries
WHERE gdp_per_capita >= ALL (
  SELECT gdp_per_capita FROM countries
  'highest government debt' AS what,
FROM countries
WHERE govt_debt >= ALL (
  SELECT govt_debt FROM countries

Which is essentially the same as…

  'highest gdp per capita' AS what,
FROM countries
WHERE gdp_per_capita = (
  SELECT MAX(gdp_per_capita) FROM countries
  'highest government debt' AS what,
FROM countries
WHERE govt_debt = (
  SELECT MAX(govt_debt) FROM countries

The output would be:

what                     code year       gdp    debt
highest gdp per capita   CA   2012  52409.00   53.50
highest government debt  JP   2012  46548.00  196.50

That’s a lot of SQL for only little analysis capability, and somehow, it just doesn’t feel entirely right to query the same table four times with all these subselects!


This is where window functions come into play, and in this particular case, FIRST_VALUE() or LAST_VALUE(). For now, let’s focus on calculating the maximum GDP per capita from the data set:

  FIRST_VALUE (code)           OVER (w_gdp) AS max_gdp_code,
  FIRST_VALUE (year)           OVER (w_gdp) AS max_gdp_year,
  FIRST_VALUE (gdp_per_capita) OVER (w_gdp) AS max_gdp_gdp,
  FIRST_VALUE (govt_debt)      OVER (w_gdp) AS max_gdp_debt
  w_gdp  AS (ORDER BY gdp_per_capita DESC)
  code, year

Notice how we make use of the SQL standard WINDOW clause, which is only currently supported by PostgreSQL and Sybase SQL Anywhere.

If you’re using Oracle or any other commercial database, you can simply substitute the window reference w_gdp into the various OVER() clauses to achieve equivalent behaviour – or you can use jOOQ’s WINDOW clause support and let jOOQ do the same for you.

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

The above query will not produce any aggregates, but it will add the values for the country / year with the highest GDP per capita to every row in the table:

each country             highest per year
CA 2009 40764.00 51.30   CA 2012 52409.00 53.50
CA 2010 47465.00 51.40   CA 2012 52409.00 53.50
CA 2011 51791.00 52.50   CA 2012 52409.00 53.50
CA 2012 52409.00 53.50   CA 2012 52409.00 53.50

This is extremely interesting because the data is not yet aggregated – the original data set remains unchanged, enriched with new computed columns.

You can then further process things, e.g. compare each country / year with the highest GDP per capita and with the highest debt per GDP of that country / year:

  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp) , '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt      / FIRST_VALUE (govt_debt)      OVER (w_debt), '999.99 %') debt_rank
  w_gdp  AS (PARTITION BY year ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY year ORDER BY govt_debt DESC)
  code, year

Notice how I’ve added PARTITION BY to the window definitions of the WINDOW clause. I’ve done this because I want to partition the data set by year, in order to find the highest GDP / debt values for each year, not for the whole data set.

The outcome of the above query can then be seen here:

country                   percentages
CA   2009  40764   51.3    86.73%   30.76%
CA   2010  47465   51.4    98.15%   29.41%
CA   2011  51791   52.5   100.00%   27.70%
CA   2012  52409   53.5   100.00%   27.23%
DE   2009  40270   47.6    85.68%   28.54%
DE   2010  40408   55.5    83.56%   31.75%
DE   2011  44355   55.1    85.64%   29.08%
DE   2012  42598   56.9    81.28%   28.96%
FR   2009  40488   85.0    86.15%   50.96%
FR   2010  39448   89.2    81.57%   51.03%
FR   2011  42578   93.2    82.21%   49.18%
FR   2012  39759  103.8    75.86%   52.82%
GB   2009  35455  121.3    75.44%   72.72%
GB   2010  36573   85.2    75.63%   48.74%
GB   2011  38927   99.6    75.16%   52.56%
GB   2012  38649  103.2    73.74%   52.52%
IT   2009  35724  121.3    76.01%   72.72%
IT   2010  34673  119.9    71.70%   68.59%
IT   2011  36988  113.0    71.42%   59.63%
IT   2012  33814  131.1    64.52%   66.72%
JP   2009  39473  166.8    83.99%  100.00%
JP   2010  43118  174.8    89.16%  100.00%
JP   2011  46204  189.5    89.21%  100.00%
JP   2012  46548  196.5    88.82%  100.00%
RU   2009  8616     8.7    18.33%    5.22%
RU   2010  10710    9.1    22.15%    5.21%
RU   2011  13324    9.3    25.73%    4.91%
RU   2012  14091    9.4    26.89%    4.78%
US   2009  46999   76.3   100.00%   45.74%
US   2010  48358   85.6   100.00%   48.97%
US   2011  49855   90.1    96.26%   47.55%
US   2012  51755   93.8    98.75%   47.74%

We could say that among the G8 countries, Canada has really improved the most in the last years, decreasing their debt compared to the GDP on a global comparison, while at the same time increasing their GDP per capita on a global comparison.

Instead of partitioning the data set by year, we could also partition it by country, and find the best / worst year for each country over the years:

  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp), '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt / FIRST_VALUE (govt_debt) OVER (w_debt), '999.99 %') debt_rank
  w_gdp  AS (PARTITION BY code ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY code ORDER BY govt_debt DESC)
  code, year

The result would now look quite different:

country                    percentages
CA   2009  40764   51.3    77.78%   95.89%
CA   2010  47465   51.4    90.57%   96.07%
CA   2011  51791   52.5    98.82%   98.13%
CA   2012  52409   53.5   100.00%  100.00%
DE   2009  40270   47.6    90.79%   83.66%
DE   2010  40408   55.5    91.10%   97.54%
DE   2011  44355   55.1   100.00%   96.84%
DE   2012  42598   56.9    96.04%  100.00%
FR   2009  40488   85.0    95.09%   81.89%
FR   2010  39448   89.2    92.65%   85.93%
FR   2011  42578   93.2   100.00%   89.79%
FR   2012  39759  103.8    93.38%  100.00%
GB   2009  35455  121.3    91.08%  100.00%
GB   2010  36573   85.2    93.95%   70.24%
GB   2011  38927   99.6   100.00%   82.11%
GB   2012  38649  103.2    99.29%   85.08%
IT   2009  35724  121.3    96.58%   92.52%
IT   2010  34673  119.9    93.74%   91.46%
IT   2011  36988  113.0   100.00%   86.19%
IT   2012  33814  131.1    91.42%  100.00%
JP   2009  39473  166.8    84.80%   84.89%
JP   2010  43118  174.8    92.63%   88.96%
JP   2011  46204  189.5    99.26%   96.44%
JP   2012  46548  196.5   100.00%  100.00%
RU   2009   8616    8.7    61.15%   92.55%
RU   2010  10710    9.1    76.01%   96.81%
RU   2011  13324    9.3    94.56%   98.94%
RU   2012  14091    9.4   100.00%  100.00%
US   2009  46999   76.3    90.81%   81.34%
US   2010  48358   85.6    93.44%   91.26%
US   2011  49855   90.1    96.33%   96.06%
US   2012  51755   93.8   100.00%  100.00%

As you can see, most countries have now generally performed better in terms of GDP per capita over the years, and also most countries have almost strictly increased their own debt per GDP (except for Germany, France and Italy), except for the (United Kingdom). Russia and Canada have seen the most growth.

In the above examples, we’ve been mainly using FIRST_VALUE(). LAST_VALUE() is almost the opposite function with respect to ordering, much like MAX() is the opposite function of MIN(). I’m saying almost because there is a caveat when using LAST_VALUE() with ORDER BY, because a window definition that uses ORDER BY is implicitly equivalent to a window definition that uses ORDER BY with a so-called “frame clause”:

-- Find the "last" year over the complete data set
-- This may not behave as expected, so always provide
-- an explicit ORDER BY clause

-- These two are implicitly equivalent. We're not
-- looking for the "last" year in the complete data
-- set, but only in the frame that is "before" the
-- current row. In other words, the current row is
-- always the "last value"!
  ORDER BY year 
           AND CURRENT ROW

-- Find the "last" year in the complete data set with
-- explicit ordering
  ORDER BY year 

LEAD() and LAG()

The previous functions were about comparing values with the maximum / minimum (FIRST_VALUE() and LAST_VALUE()) within a data set. But using window functions, you can also compare things with the next / previous value. Or with the second next / second previous, etc. The functions used for this are called LEAD() (for the next value) and LAG() (for the previous value).

This is best explained by example:

-- Use this view as a data source containing
-- all the distinct years: 2009-2012
WITH years AS (
  FROM countries
  FIRST_VALUE (year)    OVER w_year AS first,
  LEAD        (year, 2) OVER w_year AS lead2,
  LEAD        (year)    OVER w_year AS lead1,
  LAG         (year)    OVER w_year AS lag1,
  LAG         (year, 2) OVER w_year AS lag2,
  LAST_VALUE  (year)    OVER w_year AS last
  w_year AS (
    ORDER BY year DESC

The result is now simply:

first  lead2  lead1  year   lag1   lag2   last
2012                 2009   2010   2011   2009
2012          2009   2010   2011   2012   2009
2012   2009   2010   2011   2012          2009
2012   2010   2011   2012                 2009

LEAD() and LAG() are really the best window functions to help understand the whole concept of window functions. For each year, you can see immediately how the previous and next year in the same window and frame can be generated using very simple function calls.

This could be used, for instance, to find the “neighboring” countries in terms of GDP per capita for every country / year:

  LEAD (code)           OVER w_gdp AS runner_up_code,
  LEAD (gdp_per_capita) OVER w_gdp AS runner_up_gdp,
  LAG  (code)           OVER w_gdp AS leader_code,
  LAG  (gdp_per_capita) OVER w_gdp AS leader_gdp
  w_gdp AS (PARTITION BY year ORDER BY gdp_per_capita DESC)
ORDER BY year DESC, gdp_per_capita DESC

Which returns:

year   country      runner-up    leader
2012   CA  52409    US  51755
2012   US  51755    JP  46548    CA  52409
2012   JP  46548    DE  42598    US  51755
2012   DE  42598    FR  39759    JP  46548
2012   FR  39759    GB  38649    DE  42598
2012   GB  38649    IT  33814    FR  39759
2012   IT  33814    RU  14091    GB  38649
2012   RU  14091                 IT  33814

2011   CA  51791    US  49855
2011   US  49855    JP  46204    CA  51791
2011   JP  46204    DE  44355    US  49855
2011   DE  44355    FR  42578    JP  46204
2011   FR  42578    GB  38927    DE  44355
2011   GB  38927    IT  36988    FR  42578
2011   IT  36988    RU  13324    GB  38927
2011   RU  13324                 IT  36988

2010   US  48358    CA  47465
2010   CA  47465    JP  43118    US  48358
2010   JP  43118    DE  40408    CA  47465
2010   DE  40408    FR  39448    JP  43118
2010   FR  39448    GB  36573    DE  40408
2010   GB  36573    IT  34673    FR  39448
2010   IT  34673    RU  10710    GB  36573
2010   RU  10710                 IT  34673

2009   US  46999    CA  40764
2009   CA  40764    FR  40488    US  46999
2009   FR  40488    DE  40270    CA  40764
2009   DE  40270    JP  39473    FR  40488
2009   JP  39473    IT  35724    DE  40270
2009   IT  35724    GB  35455    JP  39473
2009   GB  35455    RU   8616    IT  35724
2009   RU   8616                 GB  35455

If you want to do more fancy analyses, you could now compare percentages between leaders and runner-ups, etc. Another great use-case for LEAD() and LAG() can be seen in this article.


Window functions are an incredibly powerful feature that is available from all major commercial databases, and also from a couple of Open Source databases like PostgreSQL, Firebird, and CUBRID. There has essentially been SQL before window functions, and SQL after window functions.

With jOOQ, you can leverage window functions on a type safe level like anything else related to SQL. The last query we’ve seen can be written simply like this:

// Static import the generated tables and all
// of jOOQ's functions from DSL
import static org.jooq.example.db.postgres.Tables.*;
import static org.jooq.impl.DSL.*;

// Shorten the table reference by aliasing
Countries c = COUNTRIES;

// Specifiy a window definition
WindowDefinition w_gdp = 

// Write the query as if it were native SQL
           lead(c.CODE)          .over(w_gdp).as("runner_up_code"),
           lag (c.CODE)          .over(w_gdp).as("leader_code"),
           lag (c.GDP_PER_CAPITA).over(w_gdp).as("leader_gdp")
       .orderBy(c.YEAR.desc(), c.GDP_PER_CAPITA.desc())

The above program will output

|2012|CA  |      52409.00|US            |     51755.00|{null}     |    {null}|
|2012|US  |      51755.00|JP            |     46548.00|CA         |  52409.00|
|2012|JP  |      46548.00|DE            |     42598.00|US         |  51755.00|
|2012|DE  |      42598.00|FR            |     39759.00|JP         |  46548.00|
|2012|FR  |      39759.00|GB            |     38649.00|DE         |  42598.00|
|2012|GB  |      38649.00|IT            |     33814.00|FR         |  39759.00|
|2012|IT  |      33814.00|RU            |     14091.00|GB         |  38649.00|
|2012|RU  |      14091.00|{null}        |       {null}|IT         |  33814.00|
|2011|CA  |      51791.00|US            |     49855.00|{null}     |    {null}|
|2011|US  |      49855.00|JP            |     46204.00|CA         |  51791.00|
|2011|JP  |      46204.00|DE            |     44355.00|US         |  49855.00|
|2011|DE  |      44355.00|FR            |     42578.00|JP         |  46204.00|
|2011|FR  |      42578.00|GB            |     38927.00|DE         |  44355.00|
|2011|GB  |      38927.00|IT            |     36988.00|FR         |  42578.00|
|2011|IT  |      36988.00|RU            |     13324.00|GB         |  38927.00|
|2011|RU  |      13324.00|{null}        |       {null}|IT         |  36988.00|
|2010|US  |      48358.00|CA            |     47465.00|{null}     |    {null}|
|2010|CA  |      47465.00|JP            |     43118.00|US         |  48358.00|
|2010|JP  |      43118.00|DE            |     40408.00|CA         |  47465.00|
|2010|DE  |      40408.00|FR            |     39448.00|JP         |  43118.00|
|2010|FR  |      39448.00|GB            |     36573.00|DE         |  40408.00|
|2010|GB  |      36573.00|IT            |     34673.00|FR         |  39448.00|
|2010|IT  |      34673.00|RU            |     10710.00|GB         |  36573.00|
|2010|RU  |      10710.00|{null}        |       {null}|IT         |  34673.00|
|2009|US  |      46999.00|CA            |     40764.00|{null}     |    {null}|
|2009|CA  |      40764.00|FR            |     40488.00|US         |  46999.00|
|2009|FR  |      40488.00|DE            |     40270.00|CA         |  40764.00|
|2009|DE  |      40270.00|JP            |     39473.00|FR         |  40488.00|
|2009|JP  |      39473.00|IT            |     35724.00|DE         |  40270.00|
|2009|IT  |      35724.00|GB            |     35455.00|JP         |  39473.00|
|2009|GB  |      35455.00|RU            |      8616.00|IT         |  35724.00|
|2009|RU  |       8616.00|{null}        |       {null}|GB         |  35455.00|

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

No matter whether you’re using jOOQ for your database integration, or just plain SQL – start using window functions today.

Liked this article?

Read more about how ROW_NUMBER(), RANK(), and DENSE_RANK() work.

A RESTful JDBC HTTP Server built on top of jOOQ

The jOOQ ecosystem and community is continually growing. We’re personally always thrilled to see other Open Source projects built on top of jOOQ. Today, we’re very happy to introduce you to a very interesting approach at combining REST and RDBMS by Björn Harrtell.

bjorn-harrtellBjörn Harrtell is a swedish programmer since childhood. He is usually busy writing GIS systems and integrations at Sweco Position AB but sometimes he spends time getting involved in Open Source projects and contributing to a few pieces of work related to Open Source projects like GeoTools and OpenLayers. Björn has also initiated a few minor Open Source projects himself and one of the latest projects he’s been working on is jdbc-http-server.

We’re excited to publish Björn’s guest post introducing his interesting work:


Ever found yourself writing a lot of REST resources that do simple CRUD against a relational database and felt the code was repeating itself? In that case, jdbc-http-server might be a project worth checking out.

jdbc-http-server exposes a relational database instance as a discoverable REST API making it possible to perform simple CRUD from a browser application without requiring any backend code to be written.

A discoverable REST API means you can access the root resource at / and follow links to subresources from there. For example, let’s say you have a database named testdb with a table named testtable in the public schema you can then do the following operations:

Retrieve (GET), update (PUT) or delete (DELETE) a single row at:


Retrieve (GET), update (PUT) rows or create a new row (POST) at:


The above resources accepts parameters select, where, limit, offset
and orderby where applicable. Examples:

GET a maximum of 10 rows where cost>100 at:


jdbc-http-server is database engine agnostic since it utilizes jOOQ to generate SQL in a dialect suited to the target database engine. At the moment H2, PostgreSQL and HSQLDB are covered by automated tests. Currently the only available representation data format is JSON but adding more is an interesting possibility.

Feedback and, of course, contributions are welcome :)

Let’s Stream a Map in Java 8 with jOOλ

I wanted to find an easy way to stream a Map in Java 8. Guess what? There isn’t!

What I would’ve expected for convenience is the following method:

public interface Map<K, V> {

    default Stream<Entry<K, V>> stream() {
        return entrySet().stream();

But there’s no such method. There are probably a variety of reasons why such a method shouldn’t exist, e.g.:

  • There’s no “clear” preference for entrySet() being chosen over keySet() or values(), as a stream source
  • Map isn’t really a collection. It’s not even an Iterable
  • That wasn’t the design goal
  • The EG didn’t have enough time

Well, there is a very compelling reason for Map to have been retrofitted to provide both an entrySet().stream() and to finally implement Iterable<Entry<K, V>>. And that reason is the fact that we now have Map.forEach():

default void forEach(
        BiConsumer<? super K, ? super V> action) {
    for (Map.Entry<K, V> entry : entrySet()) {
        K k;
        V v;
        try {
            k = entry.getKey();
            v = entry.getValue();
        } catch(IllegalStateException ise) {
            // this usually means the entry is no longer in the map.
            throw new ConcurrentModificationException(ise);
        action.accept(k, v);

forEach() in this case accepts a BiConsumer that really consumes entries in the map. If you search through JDK source code, there are really very few references to the BiConsumer type outside of Map.forEach() and perhaps a couple of CompletableFuture methods and a couple of streams collection methods.

So, one could almost assume that BiConsumer was strongly driven by the needs of this forEach() method, which would be a strong case for making Map.Entry a more important type throughout the collections API (we would have preferred the type Tuple2, of course).

Let’s continue this line of thought. There is also Iterable.forEach():

public interface Iterable<T> {
    default void forEach(Consumer<? super T> action) {
        for (T t : this) {

Both Map.forEach() and Iterable.forEach() intuitively iterate the “entries” of their respective collection model, although there is a subtle difference:

  • Iterable.forEach() expects a Consumer taking a single value
  • Map.forEach() expects a BiConsumer taking two values: the key and the value (NOT a Map.Entry!)

Think about it this way:

This makes the two methods incompatible in a “duck typing sense”, which makes the two types even more different


Improving Map with jOOλ

We find that quirky and counter-intuitive. forEach() is really not the only use-case of Map traversal and transformation. We’d love to have a Stream<Entry<K, V>>, or even better, a Stream<Tuple2<T1, T2>>. So we implemented that in jOOλ, a library which we’ve developed for our integration tests at jOOQ. With jOOλ, you can now wrap a Map in a Seq type (“Seq” for sequential stream, a stream with many more functional features):

Map<Integer, String> map = new LinkedHashMap<>();
map.put(1, "a");
map.put(2, "b");
map.put(3, "c");

    tuple(1, "a"), 
    tuple(2, "b"), 
    tuple(3, "c")


What you can do with it? How about creating a new Map, swapping keys and values in one go:

     .toMap(Tuple2::v1, Tuple2::v2)

     .toMap(Tuple2::v2, Tuple2::v1)

Both of the above will yield:

{a=1, b=2, c=3}

Just for the record, here’s how to swap keys and values with standard JDK API:


It can be done, but the every day verbosity of standard Java API makes things a bit hard to read / write

Stop Claiming that you’re Using a Schemaless Database

One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task that you’ll have transforming your JSON to the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is about a misleading term as much as NoSQL is:

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema free data structure standard, as opposed to XML which allows you to specify XSD if you need, or operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimick XSD)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.

An example:

… and more:

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does


We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

jOOQ: The best way to write SQL in Java


Get every new post delivered to your Inbox.

Join 2,711 other followers

%d bloggers like this: