Tag Archive | jooq

How to Integrate Commercial Third-Party Artefacts into Your Maven Build


According to a recent survey by ZeroTurnaround’s RebelLabs, Maven is still the leading Java build platform. The current market share distribution, according to RebelLabs is:

  • Maven with 64%
  • Ant + Ivy with 16.5%
  • Gradle with 11%

Yet, at the same time, Maven is often criticised for being a bit obscure and intrusive. Compared to runner-ups Ant and Gradle, Maven allows for only little flexibility with respect to interpretation and thus custom adaptation of the build model. Or as Tim Berglund from Data Stax would put it:

But let’s cut the jokes and have a look at a real-world issue:

Integrating Third-Party Commercial Artefacts

Not all third party artefacts that you would like to depend upon are available for free from Maven Central. Examples for this are commercial JDBC drivers, or the commercial jOOQ editions. There are essentially three ways to integrate such artefacts into your build:

Quick-and-dirty

Often, you only need the commercial dependency for a small test project or demo. You want to be sure that it works when you run it without depending on your local repository setup, or on network connectivity. This is a good use-case for <scope>system</scope>:

For instance: jOOQ

<dependency>
  <groupId>org.jooq</groupId>
  <artifactId>jooq</artifactId>
  <version>${jooq.version}</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/jooq-${jooq.version}.jar</systemPath>
</dependency>

For instance: Microsoft SQL JDBC

<dependency>
  <groupId>com.microsoft.sqlserver</groupId>
  <artifactId>sqljdbc4</artifactId>
  <version>3.0</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/sqljdbc4.jar</systemPath>

  <!-- Notice that we can still put "optional"
       on commercial JDBC driver dependencies -->
  <optional>true</optional>
</dependency>

Advantages of this approach

This is really a very easy solution when you want to have a local, self-contained module that is guaranteed to run immediately after checkout from source control, without additional configuration and setup. Don’t forget to check in the libraries into source control first, of course.

Disadvantages of this appraoch

The system dependencies are never transitively inherited. If your module depends on jOOQ this way, your module’s dependencies won’t see the jOOQ API.

Details about system dependencies can be seen in the Maven documentation. Citing from the documentation:

Dependencies with the scope system are always available and are not looked up in repository. They are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus, system dependencies are especially useful for resolving dependencies on artifacts which are now provided by the JDK, but where available as separate downloads earlier. Typical example are the JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).

A bit more robust

An approach that might appear to be a bit more robust is to check out the dependencies from your version control system and then “manually” import them to your local repository. This will make them available to your own local build. The following shell scripts show how you can import, for instance, the jOOQ artefacts into your local repository

Windows Batch

@echo off
set VERSION=3.4.4

if exist jOOQ-javadoc\jooq-%VERSION%-javadoc.jar (
  set JAVADOC_JOOQ=-Djavadoc=jOOQ-javadoc\jooq-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_META=-Djavadoc=jOOQ-javadoc\jooq-meta-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN_MAVEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-maven-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_SCALA=-Djavadoc=jOOQ-javadoc\jooq-scala-%VERSION%-javadoc.jar
)

if exist jOOQ-src\jooq-%VERSION%-sources.jar (
  set SOURCES_JOOQ=-Dsources=jOOQ-src\jooq-%VERSION%-sources.jar
  set SOURCES_JOOQ_META=-Dsources=jOOQ-src\jooq-meta-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN=-Dsources=jOOQ-src\jooq-codegen-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN_MAVEN=-Dsources=jOOQ-src\jooq-codegen-maven-%VERSION%-sources.jar
  set SOURCES_JOOQ_SCALA=-Dsources=jOOQ-src\jooq-scala-%VERSION%-sources.jar
)

call mvn install:install-file -Dfile=jOOQ-pom\pom.xml                          -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=%VERSION% -Dpackaging=pom
call mvn install:install-file -Dfile=jOOQ-lib\jooq-%VERSION%.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ%               %SOURCES_JOOQ%              -DpomFile=jOOQ-pom\jooq\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-meta-%VERSION%.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_META%          %SOURCES_JOOQ_META%         -DpomFile=jOOQ-pom\jooq-meta\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-%VERSION%.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN%       %SOURCES_JOOQ_CODEGEN%      -DpomFile=jOOQ-pom\jooq-codegen\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-maven-%VERSION%.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN_MAVEN% %SOURCES_JOOQ_CODEGEN_META% -DpomFile=jOOQ-pom\jooq-codegen-maven\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-scala-%VERSION%.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_SCALA%         %SOURCES_JOOQ_SCALA%        -DpomFile=jOOQ-pom\jooq-scala\pom.xml

Linux Shell

#!/bin/sh
VERSION=3.4.4

if [ -f jOOQ-javadoc/jooq-$VERSION-javadoc.jar ]; then
  JAVADOC_JOOQ=-Djavadoc=jOOQ-javadoc/jooq-$VERSION-javadoc.jar
  JAVADOC_JOOQ_META=-Djavadoc=jOOQ-javadoc/jooq-meta-$VERSION-javadoc.jar
  JAVADOC_JOOQ_CODEGEN=-Djavadoc=jOOQ-javadoc/jooq-codegen-$VERSION-javadoc.jar
  JAVADOC_JOOQ_CODEGEN_MAVEN=-Djavadoc=jOOQ-javadoc/jooq-codegen-maven-$VERSION-javadoc.jar
  JAVADOC_JOOQ_SCALA=-Djavadoc=jOOQ-javadoc/jooq-scala-$VERSION-javadoc.jar
fi

if [ -f jOOQ-src/jooq-$VERSION-sources.jar ]; then
  SOURCES_JOOQ=-Dsources=jOOQ-src/jooq-$VERSION-sources.jar
  SOURCES_JOOQ_META=-Dsources=jOOQ-src/jooq-meta-$VERSION-sources.jar
  SOURCES_JOOQ_CODEGEN=-Dsources=jOOQ-src/jooq-codegen-$VERSION-sources.jar
  SOURCES_JOOQ_CODEGEN_MAVEN=-Dsources=jOOQ-src/jooq-codegen-maven-$VERSION-sources.jar
  SOURCES_JOOQ_SCALA=-Dsources=jOOQ-src/jooq-scala-$VERSION-sources.jar
fi

mvn install:install-file -Dfile=jOOQ-pom/pom.xml                         -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=$VERSION -Dpackaging=pom
mvn install:install-file -Dfile=jOOQ-lib/jooq-$VERSION.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ               $SOURCES_JOOQ              -DpomFile=jOOQ-pom/jooq/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-meta-$VERSION.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_META          $SOURCES_JOOQ_META         -DpomFile=jOOQ-pom/jooq-meta/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-$VERSION.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN       $SOURCES_JOOQ_CODEGEN      -DpomFile=jOOQ-pom/jooq-codegen/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-maven-$VERSION.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN_MAVEN $SOURCES_JOOQ_CODEGEN_META -DpomFile=jOOQ-pom/jooq-codegen-maven/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-scala-$VERSION.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_SCALA         $SOURCES_JOOQ_SCALA        -DpomFile=jOOQ-pom/jooq-scala/pom.xml

The above scripts essentially check if any of Javadoc, Sources, and/or binaries are available in the distribution, and then install:

  • The parent pom.xml
  • The various artefact binaries, sources, javadocs, and pom.xml files

Advantages of this approach

Dependencies can now be referenced like any other type of dependency, as the artefacts are registered in your local repository. Moreover, they’re also available to your module’s own dependencies, transitively – which is probably what you want when you’re using jOOQ. Here’s how you’d then specify the dependencies:

<dependency>
  <groupId>org.jooq</groupId>
  <artifactId>jooq</artifactId>
  <version>${jooq.version}</version>
</dependency>

<dependency>
  <groupId>com.microsoft.sqlserver</groupId>
  <artifactId>sqljdbc4</artifactId>
  <version>3.0</version>
  <scope>provided</scope>
</dependency>

Disadvantages of this approach

There is a manual step involved in the installation of the dependencies. If you don’t have the above scripts readily available, it can be quite tedious to figure out exactly how to import all those dependencies step by step into your repository. Specifically if you’re running a demo or prototype, this may lead to unexpected compilation failure in the worst moments.

The way to go

In an actual project setup, obviously, neither of the above approaches will be sufficient, and you’ll probably import the libraries into your local Nexus or Bintray or whatever repository you’re using. Just beware of potential restrictions on distribution that commercial deliverables may have.

A small tutorial about how to install artefacts into Nexus can be found here.

jOOQ Newsletter: November 28, 2014 – Black Friday jOOQ Sale – Only Today!


Subscribe to this newsletter here

Tweet of the Day and Webinar with Arun Gupta from Red Hat

Today, we have a very special Tweet of the Day by Oliver Hubaut who expresses what we believe is a general feeling in the industry. He says:

There is a lot of truth in his statement, albeit perhaps not the one he intended. Many people have misinterpreted JPA in the past, believing that it will be a full replacement for SQL. This couldn’t be farther from the truth.

Gavin King, the creator of Hibernate has told us the following:

… and this is also the point we’re trying to make. Join us next week on Wednesday, December 3 when we meet with Arun Gupta from Red Hat for his Webinar about JPA and jOOQ. If you have any questions that you’d like us to talk about, ask them here:

https://github.com/javaee-samples/webinars/issues/4

Black Friday Sale: Get 20% off any jOOQ purchase, today!

We’re participating in the Black Friday sale tradition and give you an incredible 20% off your purchase of any jOOQ license that you order today, Black Friday, November 28, 2014.

Ask your manager today to treat you to a wonderful pre-christmas gift! Don’t waste time, act quickly and order jOOQ licenses right now:

http://www.jooq.org/black-friday

jOOQ 3.5 released

If you’ve been following the jOOQ User Group, you’ve heard it already. Last Friday, we’ve shipped the awesome jOOQ 3.5 with loads of new features!

The new Binding SPI

The main improvement of this exciting release is the new org.jooq.Binding SPI which can be used to fully control all aspects of a user-type’s JDBC interaction. This goes much further than the existing org.jooq.Converter SPI that can be used to map standard JDBC types to user-types. With the new Binding SPI, virtually *ALL* vendor-specific types can be supported now. Examples include PostgreSQL’s JSON or HSTORE types, or Oracle’s DATE type – which is really incorrectly represented via java.sql.Timestamp, which is why we have retrofitted the existing <dateAsTimestamp/> feature to use such bindings, now.

Stored procedures are everywhere

Stored procedure support was generally improved in this release. This includes lots of new little features and conveniences for use with PL/SQL or Transact-SQL. For instance, jOOQ 3.5.0 now supports cross-schema references of PL/SQL OBJECT and TABLE types, which allows for binding directly to Oracle Spatial. We’ve blogged about this exciting improvement here:
http://blog.jooq.org/2014/11/04/painless-access-from-java-to-plsql-procedures-with-jooq/

And while we were at it, we’ve also added basic support for Oracle AQ, which integrates very nicely with our OBJECT type support!

In Transact-SQL and MySQL, we now support fetching arbitrary numbers of Results from stored procedures, and we’ve also implemented support for Firebird PSQL, including Firebird’s very interesting syntax for table-valued functions.

By the way, we support user-defined aggregate functions for a variety of databases, including Oracle, PostgreSQL, and HSQLDB. Definitely something you should look into!

SQL improvements

In this release, we’ve finally got support for UNION, INTERSECT, and EXCEPT right with respect to nesting such operations, as well as combining them with ORDER BY and LIMIT .. OFFSET.

Let’s talk some more DDL

We’ve continued to add support for DDL statements, including

  • CREATE TABLE
  • CREATE TABLE AS SELECT
  • CREATE VIEW and DROP VIEW
  • CREATE INDEX and DROP INDEX
  • CREATE SEQUENCE and DROP SEQUENCE
  • DROP .. IF EXISTS

We’ll continue to add support for more DDL statements also in the future.

Code generation improvements

We’ve added support for the new XMLDatabase, a code generation configuration that allows to read meta information from XML formats, e.g. from a standard INFORMATION_SCHEMA.xml, or from Vertabelo’s XML export format:
http://blog.jooq.org/2014/09/05/importing-your-erd-export-into-jooq/

Future versions of jOOQ will include built-in support for a variety of XML formats.

We’ve had an awesome contribution by Etienne Studer from Gradleware to help our users integrate the jOOQ code generation with their Gradle builds.

Last but not least: Informix!

Oh, and by the way, we now also support IBM’s second most popular database: Informix. Support for this database will be included in the jOOQ Enterprise Edition.

More information can be found here:
http://www.jooq.org/notes

jOOQ 3.2 End of Life

While 3.5 is out, 3.2 is now more than one year old, which means that it has reached its end of life. We’ll be shipping a last patch update 3.2.7 in early December. After that, we advise all our customers and users to upgrade to a newer minor release.

Do you want to stay on the 3.2 release? No problem, contact our sales team and we’ll find a solution for you.

Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()


If you’re using a commercial database or PostgreSQL / Firebird / CUBRID, you will be able to take advantage of the full power of window functions. We’ve blogged about window functions’ awesomeness a couple of times, in particular about ROW_NUMBER(), RANK(), DENSE_RANK().

Today, we’re going to look into some awesome window functions that produce values of other rows that are positioned before or after the current row.

Setting up the test data

We’re going to do some interesting statistics today using publicly available data from the World Bank. To keep things simple, we’ll only do analyses for the G8 countries:

  • Canada (CA)
  • France (FR)
  • Germany (DE)
  • Italy (IT)
  • Japan (JP)
  • Russian Federation (RU)
  • United Kingdom (GB)
  • United States (US)

And for those countries, let’s consider the following data points for the years 2009-2012:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

Central government debt, total (% of GDP)

          2009    2010    2011    2012
CA        51.3    51.4    52.5    53.5	
DE        47.6    55.5    55.1    56.9	
FR        85.0    89.2    93.2   103.8	
GB        71.7    85.2    99.6   103.2	
IT       121.3   119.9   113.0   131.1	
JP       166.8   174.8   189.5   196.5	
RU         8.7     9.1     9.3     9.4	
US        76.3    85.6    90.1    93.8	

Let’s put all that data into a fact table like so (PostgreSQL syntax):

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL,
  govt_debt DECIMAL(10, 2) NOT NULL
);

INSERT INTO countries
VALUES ('CA', 2009, 40764, 51.3),
       ('CA', 2010, 47465, 51.4),
       ('CA', 2011, 51791, 52.5),
       ('CA', 2012, 52409, 53.5),
       ('DE', 2009, 40270, 47.6),
       ('DE', 2010, 40408, 55.5),
       ('DE', 2011, 44355, 55.1),
       ('DE', 2012, 42598, 56.9),
       ('FR', 2009, 40488, 85.0),
       ('FR', 2010, 39448, 89.2),
       ('FR', 2011, 42578, 93.2),
       ('FR', 2012, 39759,103.8),
       ('GB', 2009, 35455,121.3),
       ('GB', 2010, 36573, 85.2),
       ('GB', 2011, 38927, 99.6),
       ('GB', 2012, 38649,103.2),
       ('IT', 2009, 35724,121.3),
       ('IT', 2010, 34673,119.9),
       ('IT', 2011, 36988,113.0),
       ('IT', 2012, 33814,131.1),
       ('JP', 2009, 39473,166.8),
       ('JP', 2010, 43118,174.8),
       ('JP', 2011, 46204,189.5),
       ('JP', 2012, 46548,196.5),
       ('RU', 2009,  8616,  8.7),
       ('RU', 2010, 10710,  9.1),
       ('RU', 2011, 13324,  9.3),
       ('RU', 2012, 14091,  9.4),
       ('US', 2009, 46999, 76.3),
       ('US', 2010, 48358, 85.6),
       ('US', 2011, 49855, 90.1),
       ('US', 2012, 51755, 93.8);

Start the querying fun

People who are used to SQL-92 syntax will be able to quickly find the highest GDP per capita or the highest debt from the table. It’s an easy query like this one:

SELECT MAX(gdp_per_capita), MAX(govt_debt)
FROM countries;

Which will return:

52409.00    196.50

But that’s not interesting. We don’t even know what countries and what years these values are associated with.

A standard SQL-92 (and also a standard relational) query to return all of these values would look something like this:

SELECT 
  'highest gdp per capita' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.gdp_per_capita < c2.gdp_per_capita
)
UNION ALL
SELECT
  'highest government debt' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.govt_debt < c2.govt_debt
)

In essence, we select those rows for which there doesn’t exist any other row with a higher value for either gdp_per_capita (first subselect) or govt_debt (second subselect).

Trick! Use quantified comparison predicates!

If your database supports quantified comparison predicates, then you can write this a bit more concisely like this:

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita >= ALL (
  SELECT gdp_per_capita FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt >= ALL (
  SELECT govt_debt FROM countries
)

Which is essentially the same as…

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita = (
  SELECT MAX(gdp_per_capita) FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt = (
  SELECT MAX(govt_debt) FROM countries
)

The output would be:

what                     code year       gdp    debt
----------------------------------------------------
highest gdp per capita   CA   2012  52409.00   53.50
highest government debt  JP   2012  46548.00  196.50

That’s a lot of SQL for only little analysis capability, and somehow, it just doesn’t feel entirely right to query the same table four times with all these subselects!

FIRST_VALUE() and LAST_VALUE()

This is where window functions come into play, and in this particular case, FIRST_VALUE() or LAST_VALUE(). For now, let’s focus on calculating the maximum GDP per capita from the data set:

SELECT
  countries.*,
  FIRST_VALUE (code)           OVER (w_gdp) AS max_gdp_code,
  FIRST_VALUE (year)           OVER (w_gdp) AS max_gdp_year,
  FIRST_VALUE (gdp_per_capita) OVER (w_gdp) AS max_gdp_gdp,
  FIRST_VALUE (govt_debt)      OVER (w_gdp) AS max_gdp_debt
FROM
  countries
WINDOW
  w_gdp  AS (ORDER BY gdp_per_capita DESC)
ORDER BY
  code, year

Notice how we make use of the SQL standard WINDOW clause, which is only currently supported by PostgreSQL and Sybase SQL Anywhere.

If you’re using Oracle or any other commercial database, you can simply substitute the window reference w_gdp into the various OVER() clauses to achieve equivalent behaviour – or you can use jOOQ’s WINDOW clause support and let jOOQ do the same for you.

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

The above query will not produce any aggregates, but it will add the values for the country / year with the highest GDP per capita to every row in the table:

each country             highest per year
-----------------------------------------------
CA 2009 40764.00 51.30   CA 2012 52409.00 53.50
CA 2010 47465.00 51.40   CA 2012 52409.00 53.50
CA 2011 51791.00 52.50   CA 2012 52409.00 53.50
CA 2012 52409.00 53.50   CA 2012 52409.00 53.50

This is extremely interesting because the data is not yet aggregated – the original data set remains unchanged, enriched with new computed columns.

You can then further process things, e.g. compare each country / year with the highest GDP per capita and with the highest debt per GDP of that country / year:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp) , '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt      / FIRST_VALUE (govt_debt)      OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY year ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY year ORDER BY govt_debt DESC)
ORDER BY
  code, year

Notice how I’ve added PARTITION BY to the window definitions of the WINDOW clause. I’ve done this because I want to partition the data set by year, in order to find the highest GDP / debt values for each year, not for the whole data set.

The outcome of the above query can then be seen here:

country                   percentages
------------------------------------------
CA   2009  40764   51.3    86.73%   30.76%
CA   2010  47465   51.4    98.15%   29.41%
CA   2011  51791   52.5   100.00%   27.70%
CA   2012  52409   53.5   100.00%   27.23%
DE   2009  40270   47.6    85.68%   28.54%
DE   2010  40408   55.5    83.56%   31.75%
DE   2011  44355   55.1    85.64%   29.08%
DE   2012  42598   56.9    81.28%   28.96%
FR   2009  40488   85.0    86.15%   50.96%
FR   2010  39448   89.2    81.57%   51.03%
FR   2011  42578   93.2    82.21%   49.18%
FR   2012  39759  103.8    75.86%   52.82%
GB   2009  35455  121.3    75.44%   72.72%
GB   2010  36573   85.2    75.63%   48.74%
GB   2011  38927   99.6    75.16%   52.56%
GB   2012  38649  103.2    73.74%   52.52%
IT   2009  35724  121.3    76.01%   72.72%
IT   2010  34673  119.9    71.70%   68.59%
IT   2011  36988  113.0    71.42%   59.63%
IT   2012  33814  131.1    64.52%   66.72%
JP   2009  39473  166.8    83.99%  100.00%
JP   2010  43118  174.8    89.16%  100.00%
JP   2011  46204  189.5    89.21%  100.00%
JP   2012  46548  196.5    88.82%  100.00%
RU   2009  8616     8.7    18.33%    5.22%
RU   2010  10710    9.1    22.15%    5.21%
RU   2011  13324    9.3    25.73%    4.91%
RU   2012  14091    9.4    26.89%    4.78%
US   2009  46999   76.3   100.00%   45.74%
US   2010  48358   85.6   100.00%   48.97%
US   2011  49855   90.1    96.26%   47.55%
US   2012  51755   93.8    98.75%   47.74%

We could say that among the G8 countries, Canada has really improved the most in the last years, decreasing their debt compared to the GDP on a global comparison, while at the same time increasing their GDP per capita on a global comparison.

Instead of partitioning the data set by year, we could also partition it by country, and find the best / worst year for each country over the years:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp), '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt / FIRST_VALUE (govt_debt) OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY code ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY code ORDER BY govt_debt DESC)
ORDER BY
  code, year

The result would now look quite different:

country                    percentages
------------------------------------------
CA   2009  40764   51.3    77.78%   95.89%
CA   2010  47465   51.4    90.57%   96.07%
CA   2011  51791   52.5    98.82%   98.13%
CA   2012  52409   53.5   100.00%  100.00%
DE   2009  40270   47.6    90.79%   83.66%
DE   2010  40408   55.5    91.10%   97.54%
DE   2011  44355   55.1   100.00%   96.84%
DE   2012  42598   56.9    96.04%  100.00%
FR   2009  40488   85.0    95.09%   81.89%
FR   2010  39448   89.2    92.65%   85.93%
FR   2011  42578   93.2   100.00%   89.79%
FR   2012  39759  103.8    93.38%  100.00%
GB   2009  35455  121.3    91.08%  100.00%
GB   2010  36573   85.2    93.95%   70.24%
GB   2011  38927   99.6   100.00%   82.11%
GB   2012  38649  103.2    99.29%   85.08%
IT   2009  35724  121.3    96.58%   92.52%
IT   2010  34673  119.9    93.74%   91.46%
IT   2011  36988  113.0   100.00%   86.19%
IT   2012  33814  131.1    91.42%  100.00%
JP   2009  39473  166.8    84.80%   84.89%
JP   2010  43118  174.8    92.63%   88.96%
JP   2011  46204  189.5    99.26%   96.44%
JP   2012  46548  196.5   100.00%  100.00%
RU   2009   8616    8.7    61.15%   92.55%
RU   2010  10710    9.1    76.01%   96.81%
RU   2011  13324    9.3    94.56%   98.94%
RU   2012  14091    9.4   100.00%  100.00%
US   2009  46999   76.3    90.81%   81.34%
US   2010  48358   85.6    93.44%   91.26%
US   2011  49855   90.1    96.33%   96.06%
US   2012  51755   93.8   100.00%  100.00%

As you can see, most countries have now generally performed better in terms of GDP per capita over the years, and also most countries have almost strictly increased their own debt per GDP (except for Germany, France and Italy), except for the (United Kingdom). Russia and Canada have seen the most growth.

In the above examples, we’ve been mainly using FIRST_VALUE(). LAST_VALUE() is almost the opposite function with respect to ordering, much like MAX() is the opposite function of MIN(). I’m saying almost because there is a caveat when using LAST_VALUE() with ORDER BY, because a window definition that uses ORDER BY is implicitly equivalent to a window definition that uses ORDER BY with a so-called “frame clause”:

-- Find the "last" year over the complete data set
-- This may not behave as expected, so always provide
-- an explicit ORDER BY clause
LAST_VALUE (year) OVER()

-- These two are implicitly equivalent. We're not
-- looking for the "last" year in the complete data
-- set, but only in the frame that is "before" the
-- current row. In other words, the current row is
-- always the "last value"!
LAST_VALUE (year) OVER(ORDER BY year)
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND CURRENT ROW
)

-- Find the "last" year in the complete data set with
-- explicit ordering
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND UNBOUNDED FOLLOWING
)

LEAD() and LAG()

The previous functions were about comparing values with the maximum / minimum (FIRST_VALUE() and LAST_VALUE()) within a data set. But using window functions, you can also compare things with the next / previous value. Or with the second next / second previous, etc. The functions used for this are called LEAD() (for the next value) and LAG() (for the previous value).

This is best explained by example:

-- Use this view as a data source containing
-- all the distinct years: 2009-2012
WITH years AS (
  SELECT DISTINCT year
  FROM countries
)
SELECT
  FIRST_VALUE (year)    OVER w_year AS first,
  LEAD        (year, 2) OVER w_year AS lead2,
  LEAD        (year)    OVER w_year AS lead1,
  year,
  LAG         (year)    OVER w_year AS lag1,
  LAG         (year, 2) OVER w_year AS lag2,
  LAST_VALUE  (year)    OVER w_year AS last
FROM
  years
WINDOW
  w_year AS (
    ORDER BY year DESC
    ROWS BETWEEN UNBOUNDED PRECEDING
             AND UNBOUNDED FOLLOWING
  )
ORDER BY year

The result is now simply:

first  lead2  lead1  year   lag1   lag2   last
----------------------------------------------
2012                 2009   2010   2011   2009
2012          2009   2010   2011   2012   2009
2012   2009   2010   2011   2012          2009
2012   2010   2011   2012                 2009

LEAD() and LAG() are really the best window functions to help understand the whole concept of window functions. For each year, you can see immediately how the previous and next year in the same window and frame can be generated using very simple function calls.

This could be used, for instance, to find the “neighboring” countries in terms of GDP per capita for every country / year:

SELECT
  year,
  code,
  gdp_per_capita,
  LEAD (code)           OVER w_gdp AS runner_up_code,
  LEAD (gdp_per_capita) OVER w_gdp AS runner_up_gdp,
  LAG  (code)           OVER w_gdp AS leader_code,
  LAG  (gdp_per_capita) OVER w_gdp AS leader_gdp
FROM
  countries
WINDOW
  w_gdp AS (PARTITION BY year ORDER BY gdp_per_capita DESC)
ORDER BY year DESC, gdp_per_capita DESC

Which returns:

year   country      runner-up    leader
------------------------------------------
2012   CA  52409    US  51755
2012   US  51755    JP  46548    CA  52409
2012   JP  46548    DE  42598    US  51755
2012   DE  42598    FR  39759    JP  46548
2012   FR  39759    GB  38649    DE  42598
2012   GB  38649    IT  33814    FR  39759
2012   IT  33814    RU  14091    GB  38649
2012   RU  14091                 IT  33814

2011   CA  51791    US  49855
2011   US  49855    JP  46204    CA  51791
2011   JP  46204    DE  44355    US  49855
2011   DE  44355    FR  42578    JP  46204
2011   FR  42578    GB  38927    DE  44355
2011   GB  38927    IT  36988    FR  42578
2011   IT  36988    RU  13324    GB  38927
2011   RU  13324                 IT  36988

2010   US  48358    CA  47465
2010   CA  47465    JP  43118    US  48358
2010   JP  43118    DE  40408    CA  47465
2010   DE  40408    FR  39448    JP  43118
2010   FR  39448    GB  36573    DE  40408
2010   GB  36573    IT  34673    FR  39448
2010   IT  34673    RU  10710    GB  36573
2010   RU  10710                 IT  34673

2009   US  46999    CA  40764
2009   CA  40764    FR  40488    US  46999
2009   FR  40488    DE  40270    CA  40764
2009   DE  40270    JP  39473    FR  40488
2009   JP  39473    IT  35724    DE  40270
2009   IT  35724    GB  35455    JP  39473
2009   GB  35455    RU   8616    IT  35724
2009   RU   8616                 GB  35455

If you want to do more fancy analyses, you could now compare percentages between leaders and runner-ups, etc. Another great use-case for LEAD() and LAG() can be seen in this article.

Conclusion

Window functions are an incredibly powerful feature that is available from all major commercial databases, and also from a couple of Open Source databases like PostgreSQL, Firebird, and CUBRID. There has essentially been SQL before window functions, and SQL after window functions.

With jOOQ, you can leverage window functions on a type safe level like anything else related to SQL. The last query we’ve seen can be written simply like this:

// Static import the generated tables and all
// of jOOQ's functions from DSL
import static org.jooq.example.db.postgres.Tables.*;
import static org.jooq.impl.DSL.*;

// Shorten the table reference by aliasing
Countries c = COUNTRIES;

// Specifiy a window definition
WindowDefinition w_gdp = 
  name("w_gdp").as(
    partitionBy(c.YEAR)
   .orderBy(c.GDP_PER_CAPITA.desc()
  )
);

// Write the query as if it were native SQL
System.out.println(
    DSL.using(conn)
       .select(
           c.YEAR,
           c.CODE,
           c.GDP_PER_CAPITA,
           lead(c.CODE)          .over(w_gdp).as("runner_up_code"),
           lead(c.GDP_PER_CAPITA).over(w_gdp).as("runner_up_gdp"),
           lag (c.CODE)          .over(w_gdp).as("leader_code"),
           lag (c.GDP_PER_CAPITA).over(w_gdp).as("leader_gdp")
       )
       .from(c)
       .window(w_gdp)
       .orderBy(c.YEAR.desc(), c.GDP_PER_CAPITA.desc())
       .fetch()
);

The above program will output

+----+----+--------------+--------------+-------------+-----------+----------+
|year|code|gdp_per_capita|runner_up_code|runner_up_gdp|leader_code|leader_gdp|
+----+----+--------------+--------------+-------------+-----------+----------+
|2012|CA  |      52409.00|US            |     51755.00|{null}     |    {null}|
|2012|US  |      51755.00|JP            |     46548.00|CA         |  52409.00|
|2012|JP  |      46548.00|DE            |     42598.00|US         |  51755.00|
|2012|DE  |      42598.00|FR            |     39759.00|JP         |  46548.00|
|2012|FR  |      39759.00|GB            |     38649.00|DE         |  42598.00|
|2012|GB  |      38649.00|IT            |     33814.00|FR         |  39759.00|
|2012|IT  |      33814.00|RU            |     14091.00|GB         |  38649.00|
|2012|RU  |      14091.00|{null}        |       {null}|IT         |  33814.00|
|2011|CA  |      51791.00|US            |     49855.00|{null}     |    {null}|
|2011|US  |      49855.00|JP            |     46204.00|CA         |  51791.00|
|2011|JP  |      46204.00|DE            |     44355.00|US         |  49855.00|
|2011|DE  |      44355.00|FR            |     42578.00|JP         |  46204.00|
|2011|FR  |      42578.00|GB            |     38927.00|DE         |  44355.00|
|2011|GB  |      38927.00|IT            |     36988.00|FR         |  42578.00|
|2011|IT  |      36988.00|RU            |     13324.00|GB         |  38927.00|
|2011|RU  |      13324.00|{null}        |       {null}|IT         |  36988.00|
|2010|US  |      48358.00|CA            |     47465.00|{null}     |    {null}|
|2010|CA  |      47465.00|JP            |     43118.00|US         |  48358.00|
|2010|JP  |      43118.00|DE            |     40408.00|CA         |  47465.00|
|2010|DE  |      40408.00|FR            |     39448.00|JP         |  43118.00|
|2010|FR  |      39448.00|GB            |     36573.00|DE         |  40408.00|
|2010|GB  |      36573.00|IT            |     34673.00|FR         |  39448.00|
|2010|IT  |      34673.00|RU            |     10710.00|GB         |  36573.00|
|2010|RU  |      10710.00|{null}        |       {null}|IT         |  34673.00|
|2009|US  |      46999.00|CA            |     40764.00|{null}     |    {null}|
|2009|CA  |      40764.00|FR            |     40488.00|US         |  46999.00|
|2009|FR  |      40488.00|DE            |     40270.00|CA         |  40764.00|
|2009|DE  |      40270.00|JP            |     39473.00|FR         |  40488.00|
|2009|JP  |      39473.00|IT            |     35724.00|DE         |  40270.00|
|2009|IT  |      35724.00|GB            |     35455.00|JP         |  39473.00|
|2009|GB  |      35455.00|RU            |      8616.00|IT         |  35724.00|
|2009|RU  |       8616.00|{null}        |       {null}|GB         |  35455.00|
+----+----+--------------+--------------+-------------+-----------+----------+

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

No matter whether you’re using jOOQ for your database integration, or just plain SQL – start using window functions today.

Liked this article?

Read more about how ROW_NUMBER(), RANK(), and DENSE_RANK() work.

A RESTful JDBC HTTP Server built on top of jOOQ


The jOOQ ecosystem and community is continually growing. We’re personally always thrilled to see other Open Source projects built on top of jOOQ. Today, we’re very happy to introduce you to a very interesting approach at combining REST and RDBMS by Björn Harrtell.

bjorn-harrtellBjörn Harrtell is a swedish programmer since childhood. He is usually busy writing GIS systems and integrations at Sweco Position AB but sometimes he spends time getting involved in Open Source projects and contributing to a few pieces of work related to Open Source projects like GeoTools and OpenLayers. Björn has also initiated a few minor Open Source projects himself and one of the latest projects he’s been working on is jdbc-http-server.

We’re excited to publish Björn’s guest post introducing his interesting work:

JDBC HTTP Server

Ever found yourself writing a lot of REST resources that do simple CRUD against a relational database and felt the code was repeating itself? In that case, jdbc-http-server might be a project worth checking out.

jdbc-http-server exposes a relational database instance as a discoverable REST API making it possible to perform simple CRUD from a browser application without requiring any backend code to be written.

A discoverable REST API means you can access the root resource at / and follow links to subresources from there. For example, let’s say you have a database named testdb with a table named testtable in the public schema you can then do the following operations:

Retrieve (GET), update (PUT) or delete (DELETE) a single row at:

/db/testdb/schemas/public/tables/testtable/rows/1

Retrieve (GET), update (PUT) rows or create a new row (POST) at:

/db/testdb/schemas/public/tables/testtable/rows

The above resources accepts parameters select, where, limit, offset
and orderby where applicable. Examples:

GET a maximum of 10 rows where cost>100 at:

/db/testdb/schemas/public/tables/testtable/rows?where=cost>100&limit=10

jdbc-http-server is database engine agnostic since it utilizes jOOQ to generate SQL in a dialect suited to the target database engine. At the moment H2, PostgreSQL and HSQLDB are covered by automated tests. Currently the only available representation data format is JSON but adding more is an interesting possibility.

Feedback and, of course, contributions are welcome :)

Let’s Stream a Map in Java 8 with jOOλ


I wanted to find an easy way to stream a Map in Java 8. Guess what? There isn’t!

What I would’ve expected for convenience is the following method:

public interface Map<K, V> {

    default Stream<Entry<K, V>> stream() {
        return entrySet().stream();
    }    
}

But there’s no such method. There are probably a variety of reasons why such a method shouldn’t exist, e.g.:

  • There’s no “clear” preference for entrySet() being chosen over keySet() or values(), as a stream source
  • Map isn’t really a collection. It’s not even an Iterable
  • That wasn’t the design goal
  • The EG didn’t have enough time

Well, there is a very compelling reason for Map to have been retrofitted to provide both an entrySet().stream() and to finally implement Iterable<Entry<K, V>>. And that reason is the fact that we now have Map.forEach():

default void forEach(
        BiConsumer<? super K, ? super V> action) {
    Objects.requireNonNull(action);
    for (Map.Entry<K, V> entry : entrySet()) {
        K k;
        V v;
        try {
            k = entry.getKey();
            v = entry.getValue();
        } catch(IllegalStateException ise) {
            // this usually means the entry is no longer in the map.
            throw new ConcurrentModificationException(ise);
        }
        action.accept(k, v);
    }
}

forEach() in this case accepts a BiConsumer that really consumes entries in the map. If you search through JDK source code, there are really very few references to the BiConsumer type outside of Map.forEach() and perhaps a couple of CompletableFuture methods and a couple of streams collection methods.

So, one could almost assume that BiConsumer was strongly driven by the needs of this forEach() method, which would be a strong case for making Map.Entry a more important type throughout the collections API (we would have preferred the type Tuple2, of course).

Let’s continue this line of thought. There is also Iterable.forEach():

public interface Iterable<T> {
    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }
}

Both Map.forEach() and Iterable.forEach() intuitively iterate the “entries” of their respective collection model, although there is a subtle difference:

  • Iterable.forEach() expects a Consumer taking a single value
  • Map.forEach() expects a BiConsumer taking two values: the key and the value (NOT a Map.Entry!)

Think about it this way:

This makes the two methods incompatible in a “duck typing sense”, which makes the two types even more different

Bummer!

Improving Map with jOOλ

We find that quirky and counter-intuitive. forEach() is really not the only use-case of Map traversal and transformation. We’d love to have a Stream<Entry<K, V>>, or even better, a Stream<Tuple2<T1, T2>>. So we implemented that in jOOλ, a library which we’ve developed for our integration tests at jOOQ. With jOOλ, you can now wrap a Map in a Seq type (“Seq” for sequential stream, a stream with many more functional features):

Map<Integer, String> map = new LinkedHashMap<>();
map.put(1, "a");
map.put(2, "b");
map.put(3, "c");

assertEquals(
  Arrays.asList(
    tuple(1, "a"), 
    tuple(2, "b"), 
    tuple(3, "c")
  ),

  Seq.seq(map).toList()
);

What you can do with it? How about creating a new Map, swapping keys and values in one go:

System.out.println(
  Seq.seq(map)
     .map(Tuple2::swap)
     .toMap(Tuple2::v1, Tuple2::v2)
);

System.out.println(
  Seq.seq(map)
     .toMap(Tuple2::v2, Tuple2::v1)
);

Both of the above will yield:

{a=1, b=2, c=3}

Just for the record, here’s how to swap keys and values with standard JDK API:

System.out.println(
  map.entrySet()
     .stream()
     .collect(Collectors.toMap(
         Map.Entry::getValue, 
         Map.Entry::getKey
     ))
);

It can be done, but the every day verbosity of standard Java API makes things a bit hard to read / write

Stop Claiming that you’re Using a Schemaless Database


One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
AS
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task that you’ll have transforming your JSON to the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is about a misleading term as much as NoSQL is:

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema free data structure standard, as opposed to XML which allows you to specify XSD if you need, or operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimick XSD)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.

An example:

… and more:

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does

Murphy

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

jOOQ: The best way to write SQL in Java

Don’t Miss out on Writing Java 8 SQL One-Liners with jOOλ or jOOQ


More and more people are catching up with the latest update to our platform by adopting functional programming also for their businesses.

At Data Geekery, we’re using Java 8 for our jOOQ integration tests, as using the new Streams API with lambda expressions makes generating ad-hoc test data so much easier.

However, we don’t feel that the JDK offers as much as it could, which is why we have also implemented and open-sourced jOOλ, a small utility library that patches those short-comings.

Note, we don’t aim to replace more sophisticated libraries like functionaljava. jOOλ is really just patching short-comings.

Putting lambdas to work with jOOλ or jOOQ

I’ve recently encountered this Stack Overflow question, which asked for streaming a result set with all columns into a single list. For example:

Input

+----+------------+------------+
| ID | FIRST_NAME | LAST_NAME  |
+----+------------+------------+
|  1 | Joslyn     | Vanderford |
|  2 | Rudolf     | Hux        |
+----+------------+------------+

Output

1
Joslyn
Vanderford
2
Rudolf
Hux

This is a typical school-book example for using functional programming rather than an iterative solution:

Iterative solution

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list = new ArrayList<>();

while (rs.next()) {
    for (int i = 0; i < meta.getColumnCount(); i++) {
        list.add(rs.getObject(i + 1));
    }
}

Truth is, the iterative solution isn’t all that bad, but let’s learn how this could be done with functional programming.

Using jOOλ

We’re using jOOλ for this example for a couple of reasons:

  • JDBC didn’t really adopt the new features. There is no simple ResultSet to Stream conversion, even if there should be.
  • Unfortunately, the new functional interfaces do not allow for throwing checked exceptions. The try .. catch blocks inside lambdas don’t exactly look nice
  • Interestingly, there is no way of generating a finite stream without also implementing an Iterator or Spliterator

So, here’s the plain code:

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list =
Seq.generate()
   .limitWhile(Unchecked.predicate(v -> rs.next()))
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))
   .toList()

So far, this looks about as verbose (or a bit more) than the iterative solution. As you can see, a couple of jOOλ extensions were needed here:

// This generate is a shortcut to generate an
// infinite stream with unspecified content
Seq.generate()

// This predicate-based stream termination
// unfortunately doesn't exist in the JDK
// Besides, the checked exception is wrapped in a
// RuntimeException by calling Unchecked.wrapper(...)
   .limitWhile(Unchecked.predicate(v -> rs.next()))

// Standard JDK flatmapping, producing a "nested"
// stream of column values for the "outer" stream
// of database rows
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))

// This is another convenience method that is more
// verbose to write with standard JDK code
   .toList()

Using jOOQ

jOOQ has even more convenience API to operate on result records of your SQL statement. Consider the following piece of logic:

ResultSet rs = ...;

List<Object> list =
DSL.using(connection)
   .fetch(rs)
   .stream()
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .collect(Collectors.toList());

Note that the above example is using standard JDK API, without resorting to jOOλ for convenience. If you want to use jOOλ with jOOQ, you could even write:

ResultSet rs = ...;

List<Object> list = 
Seq.seq(DSL.using(connection).fetch(rs))
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .toList();

Easy? I would say so! Let’s remember that this example:

  • Fetches a JDBC ResultSet into a Java Collection
  • Transforms each record in the result set into an array of column values
  • Transforms each array into a stream
  • Flattens that stream into a stream of streams
  • Collects all values into a single list

Whew!

Conclusion

We’re heading towards exciting times! It will take a while until all Java 8 idioms and functional thinking will feel “natural” to Java developers, also in the enterprise.

The idea of having a sort of data source that can be configured with pipelined data transformations expressed as lambda expressions to be evaluated lazily is very compelling, though. jOOQ is an API that encapsulates SQL data sources in a very fluent and intuitive way, but it doesn’t stop there. jOOQ produces regular JDK collections of records, which can be transformed out-of-the-box via the new streams API.

We believe that this will drastically change the way the Java ecosystem will think about data transformation. Stay tuned for more examples on this blog!

The Caveats of Dual-Licensing


We’ve been in business for more than one year now with our dual-licensing strategy for jOOQ. While this strategy has worked very well for us, it has also been a bit of a challenge for some of our customers. Today, we’re going to show you what caveats of dual-licensing we’ve run into.

Our dual-licensing strategy

For those of you not acquainted with our license model, just a brief reminder to get you into the subject. We mainly consider ourselves as a vendor of Open Source software. However, contrary to a variety of other companies like Gradleware or Red Hat, we don’t want to build our business model on support, which is a much tougher business than licensing. Why?

  • Support contracts need a lot more long-term trust by customers, and more outbound sales. There’s only little inbound interest for such contracts, as people don’t acquire support until they need it.
  • Vendor-supplied support competes with third-party support (as provided by UWS, for instance), which we want to actively encourage. We’d love to generate business for an entirely new market, not compete with our allies.

So we were looking for a solution involving commercial licensing. We wanted to keep an Open Source version of our product because:

  • We’ll get traction with Open Source licensing much much more quickly than with commercial licensing
  • While Open Source is a very tough competitor for vendors, it is also a great enabler for consumers. For instance, we’re using Java, Eclipse, H2, and much more. Great software for free!

It wouldn’t be honest to say that we truly believe in “free as in freedom” (libre), but we certainly believe in “free as in beer” (gratis) – because, who doesn’t. So, one very simple solution to meet the above goals was to offer jOOQ as Open Source with Open Source databases, and to offer jOOQ under a commercial license with commercial databases.

The Caveat

This was generally well received with our user base as a credible and viable dual-licensing model. But there were also caveats. All of a sudden, we didn’t have access to these distribution channels any more:

  • Maven Central
  • GitHub

… and our paying customers didn’t have access to these very useful OSS rights any more:

  • Source Code
  • Modifications

Solution 1 – Ship Source Code

Well, we actually ship our source code with the commercial binaries. At first, this was done merely for documentation purposes.

Regardless of the actual license constraints, when you’re in trouble, e.g. when your productive system is down and you have to urgently fix a bug, doesn’t it just suck if you don’t have access to the source code of third-party dependencies? You will just have to guess what it does. Or illegally de-compile it.

We don’t want to be that company. We trust our customers to deal responsibly with our source code.

Solution 2 – Allow Modifications

Our commercial licenses come in two flavours: Yearly and Perpetual. We quickly realised that some of our customers do not want to be dependent on us as a vendor. Perpetual licenses obviously help making customers more independent. But the disadvantage of perpetual licenses is the fact that vendors will not support old versions forever, and customers won’t have the right to upgrade to the next major release. While they are probably fine with not having access to new features, they would still like to receive an occasional bugfix.

The solution we’ve come to adopt is a very pragmatic one: Customers already have the source code (see above), so why not allow customers to also apply urgent fixes themselves? Obviously, such modifications will void the warranty offered by us, but if you buy jOOQ today and 5 years down the line, you discover a very subtle bug in what will then be an unsupported version of jOOQ… don’t you just want to fix it?

Conclusion

Dual-licensing is a tricky business. You partition your user-base into two:

  • The paying / premium customers
  • The “freemium” customers

By all means, you must prevent your premium customers from being at a disadvantage compared to your “freemium” customers. There are certain rights that are probably OK to remove (such as the right of free distribution). But there are rights that are just annoying not to have. And those rights are the rights that matter the most to the every day work of an engineer:

To fix that bloody bug :-)

We’re very curious: What are your opinions towards dual-licensing?

Asynchronous SQL Execution with jOOQ and Java 8’s CompletableFuture


Reactive programming is the new buzzword, which essentially just means asynchronous programming or messaging.

Fact is that functional syntax greatly helps with structuring asynchronous execution chains, and today, we’ll see how we can do this in Java 8 using jOOQ and the new CompletableFuture API.

In fact, things are quite simple:

// Initiate an asynchronous call chain
CompletableFuture

    // This lambda will supply an int value
    // indicating the number of inserted rows
    .supplyAsync(() -> DSL
        .using(configuration)
        .insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
        .values(3, "Hitchcock")
        .execute()
    )
                          
    // This will supply an AuthorRecord value
    // for the newly inserted author
    .handleAsync((rows, throwable) -> DSL
        .using(configuration)
        .fetchOne(AUTHOR, AUTHOR.ID.eq(3))
    )

    // This should supply an int value indicating
    // the number of rows, but in fact it'll throw
    // a constraint violation exception
    .handleAsync((record, throwable) -> {
        record.changed(true);
        return record.insert();
    })
    
    // This will supply an int value indicating
    // the number of deleted rows
    .handleAsync((rows, throwable) -> DSL
        .using(configuration)
        .delete(AUTHOR)
        .where(AUTHOR.ID.eq(3))
        .execute()
    )

    // This tells the calling thread to wait for all
    // chained execution units to be executed
    .join();

What did really happen here? Nothing out of the ordinary. There are 4 execution blocks:

  1. One that inserts a new AUTHOR
  2. One that fetches that same AUTHOR again
  3. One that re-inserts the newly fetched AUTHOR (throwing an exception)
  4. One that ignores the thrown exception and delets the AUTHOR again

Finally, when the execution chain is established, the calling thread will join the whole chain using the CompletableFuture.join() method, which is essentially the same as the Future.get() method, except that it doesn’t throw any checked exception.

Comparing this to other APIs

Other APIs like Scala’s Slick have implemented similar things via “standard API”, such as calls to flatMap(). We’re currently not going to mimick such APIs as we believe that the new Java 8 APIs will become much more idiomatic to native Java speakers. Specifically, when executing SQL, getting connection pooling and transactions right is of the essence. The semantics of asynchronously chained execution blocks and how they relate to transactions is very subtle. If you want a transaction to span more than one such block, you will have to encode this yourself via jOOQ’s Configuration and its contained ConnectionProvider.

Blocking JDBC

Obviously, there will always be one blocking barrier to such solutions, and that is JDBC itself – which is very hard to turn into an asynchronous API. In fact, few databases really support asynchronous query executions and cursors, as most often, a single database session can only be used by a single thread for a single query at a time.

We’d be very interested to learn about your asynchronous SQL querying requirements, so feel free to comment on this post!

Follow

Get every new post delivered to your Inbox.

Join 2,597 other followers

%d bloggers like this: