Use MySQL’s Strict Mode on all new Projects!


MySQL is a database that has been bending the SQL standard in ways that make it hard to move off MySQL. What may appear to be a clever technique for vendor lockin (or maybe just oversight of the standard) can be quite annoying in understanding the real meaning of the SQL language.

One such example is MySQL’s interpretation of how GROUP BY works. In MySQL, unlike any other database, you can put arbitrary expressions into your SELECT clause, even if they do not have a formal dependency on the GROUP BY expression. For instance:

SELECT employer, first_name, last_name
FROM employees
GROUP BY employer

This will work in MySQL, but what does it mean? If we only have one resulting record per employer, which one of the employees will be returned? The semantics of the above query is really this one:

SELECT employer, ARBITRARY(first_name), ARBITRARY(last_name)
FROM employees
GROUP BY employer

If we assume that there is such an aggregation function as ARBITRARY(). Some may claim that this can be used for some clever performance “optimisation”. I say: Don’t. This is so weakly specified, it is not even clear if the two references of this pseudo-ARBITRARY() aggregate function will produce values from the same record.

Just look at the number of Stack Overflow questions that evolve around the “not a GROUP BY expression” error:

I’m sure that parts of this damage that has been caused to a generation of SQL developers is due to the fact that this works in some databases.

ONLY_FULL_GROUP_BY

But there is a flag in MySQL called ONLY_FULL_GROUP_BY, and Morgan Tocker, the MySQL community manager suggests eventually turning it on by default.

MySQL community members tend to agree that this is a good decision in the long run.

While it is certainly very hard to turn this flag on for a legacy application, all new applications built on top of MySQL should make sure to turn on this flag. In fact, new applications should even consider turning on “strict SQL mode” entirely, to make sure they get a better, more modern SQL experience.

For more information about MySQL server modes, please consider the manual:

http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html

Don’t Migrate to MariaDB just yet. MySQL is Back!


Now that I have your attention, I’d like to invite you to a critical review of where we’re at in the MySQL vs. MariaDB debate. Around one month ago, I visited Oracle Open World 2014, and I’ve met with Morgan Tocker, the MySQL community manager at Oracle to learn about where MySQL is heading.

Who “is” MySQL

An interesting learning for myself is the fact that according to Morgan, there had been quite a few former MySQL AB employees that stayed with MySQL when Sun acquired the database, and that are still there now (perhaps after a short break), now that Oracle owns Sun and thus MySQL. In fact, right now as we speak, Oracle is pushing hard to get even more people on board of the MySQL team. When this article was written, there were 21 open MySQL jobs on this blog post dating February 25, 2014.

More details about Oracle’s plans to promote and push MySQL can be seen in this presentation by Morten Andersen:

oracle-mysql

So, if you’re still contemplating a migration to MariaDB, maybe give this all a second thought. MySQL is still the database that is being used by most large companies, including Facebook and LinkedIn. MySQL won’t go away and it will get much better in the next couple of years – maybe even to a point where it has a similar amount of features as PostgreSQL? We can only hope.

Need another convincing argument? You can now use Oracle Enterprise Manager for both Oracle and MySQL databases. How awesome is that!

enterprise-manager

Stay tuned for a couple of additional blog posts on this blog, about what’s new and what’s on the MySQL roadmap in the next years. See all of Morten Andersen’s slides here:

Have You Ever Wondered About the Difference Between NOT NULL and DEFAULT?


When writing DDL in SQL, you can specify a couple of constraints on columns, like NOT NULL or DEFAULT constraints. Some people might wonder, if the two constraints are actually redundant, i.e. is it still necessary to specify a NOT NULL constraint, if there is already a DEFAULT clause?

The answer is: Yes!

Yes, you should still specify that NOT NULL constraint. And no, the two constraints are not redundant. The answer I gave here on Stack Overflow wraps it up by example, which I’m going to repeat here on our blog:

DEFAULT is the value that will be inserted in the absence of an explicit value in an insert / update statement. Lets assume, your DDL did not have the NOT NULL constraint:

ALTER TABLE tbl 
  ADD COLUMN col VARCHAR(20) 
    DEFAULT "MyDefault"

Then you could issue these statements

-- 1. This will insert "MyDefault" 
--    into tbl.col
INSERT INTO tbl (A, B) 
  VALUES (NULL, NULL);

-- 2. This will insert "MyDefault" 
--    into tbl.col
INSERT INTO tbl (A, B, col) 
  VALUES (NULL, NULL, DEFAULT);

-- 3. This will insert "MyDefault"
--    into tbl.col
INSERT INTO tbl (A, B, col)
  DEFAULT VALUES;

-- 4. This will insert NULL
--    into tbl.col
INSERT INTO tbl (A, B, col)
  VALUES (NULL, NULL, NULL);

Alternatively, you can also use DEFAULT in UPDATE statements, according to the SQL-1992 standard:

-- 5. This will update "MyDefault"
--    into tbl.col
UPDATE tbl SET col = DEFAULT;

-- 6. This will update NULL 
--    into tbl.col
UPDATE tbl SET col = NULL;

Note, not all databases support all of these SQL standard syntaxes. Adding the NOT NULL constraint will cause an error with statements 4, 6, while 1-3, 5 are still valid statements. So to answer your question:

No, NOT NULL and DEFAULT are not redundant

That’s already quite interesting, so the DEFAULT constraint really only interacts with DML statements and how they specify the various columns that they’re updating. The NOT NULL constraint is a much more universal guarantee, that constraints a column’s content also “outside” of the manipulating DML statements.

For instance, if you have a set of data and then you add a DEFAULT constraint, this will not affect your existing data, only new data being inserted.

If, however, you have a set of data and then you add a NOT NULL constraint, you can actually only do so if the constraint is valid – i.e. when there are no NULL values in your column. Otherwise, an error will be raised.

Query performance

Another very interesting use case that applies only to NOT NULL constraints is their usefulness for query optimisers and query execution plans. Assume that you have such a constraint on your column and then, you’re using a NOT IN predicate:

SELECT *
FROM table
WHERE value NOT IN (
  SELECT not_nullable
  FROM other_table
)

In particular, when you’re using Oracle, the above query will be much faster when the not_nullable column has an index AND that particular constraint, because unfortunately, NULL values are not included in Oracle indexes.

Read more about NULL and NOT IN predicates here.

Don’t Miss out on Awesome SQL Power with FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG()


If you’re using a commercial database or PostgreSQL / Firebird / CUBRID, you will be able to take advantage of the full power of window functions. We’ve blogged about window functions’ awesomeness a couple of times, in particular about ROW_NUMBER(), RANK(), DENSE_RANK().

Today, we’re going to look into some awesome window functions that produce values of other rows that are positioned before or after the current row.

Setting up the test data

We’re going to do some interesting statistics today using publicly available data from the World Bank. To keep things simple, we’ll only do analyses for the G8 countries:

  • Canada (CA)
  • France (FR)
  • Germany (DE)
  • Italy (IT)
  • Japan (JP)
  • Russian Federation (RU)
  • United Kingdom (GB)
  • United States (US)

And for those countries, let’s consider the following data points for the years 2009-2012:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

Central government debt, total (% of GDP)

          2009    2010    2011    2012
CA        51.3    51.4    52.5    53.5	
DE        47.6    55.5    55.1    56.9	
FR        85.0    89.2    93.2   103.8	
GB        71.7    85.2    99.6   103.2	
IT       121.3   119.9   113.0   131.1	
JP       166.8   174.8   189.5   196.5	
RU         8.7     9.1     9.3     9.4	
US        76.3    85.6    90.1    93.8	

Let’s put all that data into a fact table like so (PostgreSQL syntax):

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL,
  govt_debt DECIMAL(10, 2) NOT NULL
);

INSERT INTO countries
VALUES ('CA', 2009, 40764, 51.3),
       ('CA', 2010, 47465, 51.4),
       ('CA', 2011, 51791, 52.5),
       ('CA', 2012, 52409, 53.5),
       ('DE', 2009, 40270, 47.6),
       ('DE', 2010, 40408, 55.5),
       ('DE', 2011, 44355, 55.1),
       ('DE', 2012, 42598, 56.9),
       ('FR', 2009, 40488, 85.0),
       ('FR', 2010, 39448, 89.2),
       ('FR', 2011, 42578, 93.2),
       ('FR', 2012, 39759,103.8),
       ('GB', 2009, 35455,121.3),
       ('GB', 2010, 36573, 85.2),
       ('GB', 2011, 38927, 99.6),
       ('GB', 2012, 38649,103.2),
       ('IT', 2009, 35724,121.3),
       ('IT', 2010, 34673,119.9),
       ('IT', 2011, 36988,113.0),
       ('IT', 2012, 33814,131.1),
       ('JP', 2009, 39473,166.8),
       ('JP', 2010, 43118,174.8),
       ('JP', 2011, 46204,189.5),
       ('JP', 2012, 46548,196.5),
       ('RU', 2009,  8616,  8.7),
       ('RU', 2010, 10710,  9.1),
       ('RU', 2011, 13324,  9.3),
       ('RU', 2012, 14091,  9.4),
       ('US', 2009, 46999, 76.3),
       ('US', 2010, 48358, 85.6),
       ('US', 2011, 49855, 90.1),
       ('US', 2012, 51755, 93.8);

Start the querying fun

People who are used to SQL-92 syntax will be able to quickly find the highest GDP per capita or the highest debt from the table. It’s an easy query like this one:

SELECT MAX(gdp_per_capita), MAX(govt_debt)
FROM countries;

Which will return:

52409.00    196.50

But that’s not interesting. We don’t even know what countries and what years these values are associated with.

A standard SQL-92 (and also a standard relational) query to return all of these values would look something like this:

SELECT 
  'highest gdp per capita' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.gdp_per_capita < c2.gdp_per_capita
)
UNION ALL
SELECT
  'highest government debt' AS what,
  c1.*
FROM countries c1
WHERE NOT EXISTS (
  SELECT 1
  FROM countries c2
  WHERE c1.govt_debt < c2.govt_debt
)

In essence, we select those rows for which there doesn’t exist any other row with a higher value for either gdp_per_capita (first subselect) or govt_debt (second subselect).

Trick! Use quantified comparison predicates!

If your database supports quantified comparison predicates, then you can write this a bit more concisely like this:

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita >= ALL (
  SELECT gdp_per_capita FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt >= ALL (
  SELECT govt_debt FROM countries
)

Which is essentially the same as…

SELECT 
  'highest gdp per capita' AS what,
  countries.*
FROM countries
WHERE gdp_per_capita = (
  SELECT MAX(gdp_per_capita) FROM countries
)
UNION ALL
SELECT
  'highest government debt' AS what,
  countries.*
FROM countries
WHERE govt_debt = (
  SELECT MAX(govt_debt) FROM countries
)

The output would be:

what                     code year       gdp    debt
----------------------------------------------------
highest gdp per capita   CA   2012  52409.00   53.50
highest government debt  JP   2012  46548.00  196.50

That’s a lot of SQL for only little analysis capability, and somehow, it just doesn’t feel entirely right to query the same table four times with all these subselects!

FIRST_VALUE() and LAST_VALUE()

This is where window functions come into play, and in this particular case, FIRST_VALUE() or LAST_VALUE(). For now, let’s focus on calculating the maximum GDP per capita from the data set:

SELECT
  countries.*,
  FIRST_VALUE (code)           OVER (w_gdp) AS max_gdp_code,
  FIRST_VALUE (year)           OVER (w_gdp) AS max_gdp_year,
  FIRST_VALUE (gdp_per_capita) OVER (w_gdp) AS max_gdp_gdp,
  FIRST_VALUE (govt_debt)      OVER (w_gdp) AS max_gdp_debt
FROM
  countries
WINDOW
  w_gdp  AS (ORDER BY gdp_per_capita DESC)
ORDER BY
  code, year

Notice how we make use of the SQL standard WINDOW clause, which is only currently supported by PostgreSQL and Sybase SQL Anywhere.

If you’re using Oracle or any other commercial database, you can simply substitute the window reference w_gdp into the various OVER() clauses to achieve equivalent behaviour – or you can use jOOQ’s WINDOW clause support and let jOOQ do the same for you.

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

The above query will not produce any aggregates, but it will add the values for the country / year with the highest GDP per capita to every row in the table:

each country             highest per year
-----------------------------------------------
CA 2009 40764.00 51.30   CA 2012 52409.00 53.50
CA 2010 47465.00 51.40   CA 2012 52409.00 53.50
CA 2011 51791.00 52.50   CA 2012 52409.00 53.50
CA 2012 52409.00 53.50   CA 2012 52409.00 53.50

This is extremely interesting because the data is not yet aggregated – the original data set remains unchanged, enriched with new computed columns.

You can then further process things, e.g. compare each country / year with the highest GDP per capita and with the highest debt per GDP of that country / year:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp) , '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt      / FIRST_VALUE (govt_debt)      OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY year ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY year ORDER BY govt_debt DESC)
ORDER BY
  code, year

Notice how I’ve added PARTITION BY to the window definitions of the WINDOW clause. I’ve done this because I want to partition the data set by year, in order to find the highest GDP / debt values for each year, not for the whole data set.

The outcome of the above query can then be seen here:

country                   percentages
------------------------------------------
CA   2009  40764   51.3    86.73%   30.76%
CA   2010  47465   51.4    98.15%   29.41%
CA   2011  51791   52.5   100.00%   27.70%
CA   2012  52409   53.5   100.00%   27.23%
DE   2009  40270   47.6    85.68%   28.54%
DE   2010  40408   55.5    83.56%   31.75%
DE   2011  44355   55.1    85.64%   29.08%
DE   2012  42598   56.9    81.28%   28.96%
FR   2009  40488   85.0    86.15%   50.96%
FR   2010  39448   89.2    81.57%   51.03%
FR   2011  42578   93.2    82.21%   49.18%
FR   2012  39759  103.8    75.86%   52.82%
GB   2009  35455  121.3    75.44%   72.72%
GB   2010  36573   85.2    75.63%   48.74%
GB   2011  38927   99.6    75.16%   52.56%
GB   2012  38649  103.2    73.74%   52.52%
IT   2009  35724  121.3    76.01%   72.72%
IT   2010  34673  119.9    71.70%   68.59%
IT   2011  36988  113.0    71.42%   59.63%
IT   2012  33814  131.1    64.52%   66.72%
JP   2009  39473  166.8    83.99%  100.00%
JP   2010  43118  174.8    89.16%  100.00%
JP   2011  46204  189.5    89.21%  100.00%
JP   2012  46548  196.5    88.82%  100.00%
RU   2009  8616     8.7    18.33%    5.22%
RU   2010  10710    9.1    22.15%    5.21%
RU   2011  13324    9.3    25.73%    4.91%
RU   2012  14091    9.4    26.89%    4.78%
US   2009  46999   76.3   100.00%   45.74%
US   2010  48358   85.6   100.00%   48.97%
US   2011  49855   90.1    96.26%   47.55%
US   2012  51755   93.8    98.75%   47.74%

We could say that among the G8 countries, Canada has really improved the most in the last years, decreasing their debt compared to the GDP on a global comparison, while at the same time increasing their GDP per capita on a global comparison.

Instead of partitioning the data set by year, we could also partition it by country, and find the best / worst year for each country over the years:

SELECT
  countries.*,
  TO_CHAR(100 * gdp_per_capita / FIRST_VALUE (gdp_per_capita) OVER (w_gdp), '999.99 %') gdp_rank,
  TO_CHAR(100 * govt_debt / FIRST_VALUE (govt_debt) OVER (w_debt), '999.99 %') debt_rank
FROM
  countries
WINDOW
  w_gdp  AS (PARTITION BY code ORDER BY gdp_per_capita DESC),
  w_debt AS (PARTITION BY code ORDER BY govt_debt DESC)
ORDER BY
  code, year

The result would now look quite different:

country                    percentages
------------------------------------------
CA   2009  40764   51.3    77.78%   95.89%
CA   2010  47465   51.4    90.57%   96.07%
CA   2011  51791   52.5    98.82%   98.13%
CA   2012  52409   53.5   100.00%  100.00%
DE   2009  40270   47.6    90.79%   83.66%
DE   2010  40408   55.5    91.10%   97.54%
DE   2011  44355   55.1   100.00%   96.84%
DE   2012  42598   56.9    96.04%  100.00%
FR   2009  40488   85.0    95.09%   81.89%
FR   2010  39448   89.2    92.65%   85.93%
FR   2011  42578   93.2   100.00%   89.79%
FR   2012  39759  103.8    93.38%  100.00%
GB   2009  35455  121.3    91.08%  100.00%
GB   2010  36573   85.2    93.95%   70.24%
GB   2011  38927   99.6   100.00%   82.11%
GB   2012  38649  103.2    99.29%   85.08%
IT   2009  35724  121.3    96.58%   92.52%
IT   2010  34673  119.9    93.74%   91.46%
IT   2011  36988  113.0   100.00%   86.19%
IT   2012  33814  131.1    91.42%  100.00%
JP   2009  39473  166.8    84.80%   84.89%
JP   2010  43118  174.8    92.63%   88.96%
JP   2011  46204  189.5    99.26%   96.44%
JP   2012  46548  196.5   100.00%  100.00%
RU   2009   8616    8.7    61.15%   92.55%
RU   2010  10710    9.1    76.01%   96.81%
RU   2011  13324    9.3    94.56%   98.94%
RU   2012  14091    9.4   100.00%  100.00%
US   2009  46999   76.3    90.81%   81.34%
US   2010  48358   85.6    93.44%   91.26%
US   2011  49855   90.1    96.33%   96.06%
US   2012  51755   93.8   100.00%  100.00%

As you can see, most countries have now generally performed better in terms of GDP per capita over the years, and also most countries have almost strictly increased their own debt per GDP (except for Germany, France and Italy), except for the (United Kingdom). Russia and Canada have seen the most growth.

In the above examples, we’ve been mainly using FIRST_VALUE(). LAST_VALUE() is almost the opposite function with respect to ordering, much like MAX() is the opposite function of MIN(). I’m saying almost because there is a caveat when using LAST_VALUE() with ORDER BY, because a window definition that uses ORDER BY is implicitly equivalent to a window definition that uses ORDER BY with a so-called “frame clause”:

-- Find the "last" year over the complete data set
-- This may not behave as expected, so always provide
-- an explicit ORDER BY clause
LAST_VALUE (year) OVER()

-- These two are implicitly equivalent. We're not
-- looking for the "last" year in the complete data
-- set, but only in the frame that is "before" the
-- current row. In other words, the current row is
-- always the "last value"!
LAST_VALUE (year) OVER(ORDER BY year)
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND CURRENT ROW
)

-- Find the "last" year in the complete data set with
-- explicit ordering
LAST_VALUE (year) OVER(
  ORDER BY year 
  ROWS BETWEEN UNBOUNDED PRECEDING 
           AND UNBOUNDED FOLLOWING
)

LEAD() and LAG()

The previous functions were about comparing values with the maximum / minimum (FIRST_VALUE() and LAST_VALUE()) within a data set. But using window functions, you can also compare things with the next / previous value. Or with the second next / second previous, etc. The functions used for this are called LEAD() (for the next value) and LAG() (for the previous value).

This is best explained by example:

-- Use this view as a data source containing
-- all the distinct years: 2009-2012
WITH years AS (
  SELECT DISTINCT year
  FROM countries
)
SELECT
  FIRST_VALUE (year)    OVER w_year AS first,
  LEAD        (year, 2) OVER w_year AS lead2,
  LEAD        (year)    OVER w_year AS lead1,
  year,
  LAG         (year)    OVER w_year AS lag1,
  LAG         (year, 2) OVER w_year AS lag2,
  LAST_VALUE  (year)    OVER w_year AS last
FROM
  years
WINDOW
  w_year AS (
    ORDER BY year DESC
    ROWS BETWEEN UNBOUNDED PRECEDING
             AND UNBOUNDED FOLLOWING
  )
ORDER BY year

The result is now simply:

first  lead2  lead1  year   lag1   lag2   last
----------------------------------------------
2012                 2009   2010   2011   2009
2012          2009   2010   2011   2012   2009
2012   2009   2010   2011   2012          2009
2012   2010   2011   2012                 2009

LEAD() and LAG() are really the best window functions to help understand the whole concept of window functions. For each year, you can see immediately how the previous and next year in the same window and frame can be generated using very simple function calls.

This could be used, for instance, to find the “neighboring” countries in terms of GDP per capita for every country / year:

SELECT
  year,
  code,
  gdp_per_capita,
  LEAD (code)           OVER w_gdp AS runner_up_code,
  LEAD (gdp_per_capita) OVER w_gdp AS runner_up_gdp,
  LAG  (code)           OVER w_gdp AS leader_code,
  LAG  (gdp_per_capita) OVER w_gdp AS leader_gdp
FROM
  countries
WINDOW
  w_gdp AS (PARTITION BY year ORDER BY gdp_per_capita DESC)
ORDER BY year DESC, gdp_per_capita DESC

Which returns:

year   country      runner-up    leader
------------------------------------------
2012   CA  52409    US  51755
2012   US  51755    JP  46548    CA  52409
2012   JP  46548    DE  42598    US  51755
2012   DE  42598    FR  39759    JP  46548
2012   FR  39759    GB  38649    DE  42598
2012   GB  38649    IT  33814    FR  39759
2012   IT  33814    RU  14091    GB  38649
2012   RU  14091                 IT  33814

2011   CA  51791    US  49855
2011   US  49855    JP  46204    CA  51791
2011   JP  46204    DE  44355    US  49855
2011   DE  44355    FR  42578    JP  46204
2011   FR  42578    GB  38927    DE  44355
2011   GB  38927    IT  36988    FR  42578
2011   IT  36988    RU  13324    GB  38927
2011   RU  13324                 IT  36988

2010   US  48358    CA  47465
2010   CA  47465    JP  43118    US  48358
2010   JP  43118    DE  40408    CA  47465
2010   DE  40408    FR  39448    JP  43118
2010   FR  39448    GB  36573    DE  40408
2010   GB  36573    IT  34673    FR  39448
2010   IT  34673    RU  10710    GB  36573
2010   RU  10710                 IT  34673

2009   US  46999    CA  40764
2009   CA  40764    FR  40488    US  46999
2009   FR  40488    DE  40270    CA  40764
2009   DE  40270    JP  39473    FR  40488
2009   JP  39473    IT  35724    DE  40270
2009   IT  35724    GB  35455    JP  39473
2009   GB  35455    RU   8616    IT  35724
2009   RU   8616                 GB  35455

If you want to do more fancy analyses, you could now compare percentages between leaders and runner-ups, etc. Another great use-case for LEAD() and LAG() can be seen in this article.

Conclusion

Window functions are an incredibly powerful feature that is available from all major commercial databases, and also from a couple of Open Source databases like PostgreSQL, Firebird, and CUBRID. There has essentially been SQL before window functions, and SQL after window functions.

With jOOQ, you can leverage window functions on a type safe level like anything else related to SQL. The last query we’ve seen can be written simply like this:

// Static import the generated tables and all
// of jOOQ's functions from DSL
import static org.jooq.example.db.postgres.Tables.*;
import static org.jooq.impl.DSL.*;

// Shorten the table reference by aliasing
Countries c = COUNTRIES;

// Specifiy a window definition
WindowDefinition w_gdp = 
  name("w_gdp").as(
    partitionBy(c.YEAR)
   .orderBy(c.GDP_PER_CAPITA.desc()
  )
);

// Write the query as if it were native SQL
System.out.println(
    DSL.using(conn)
       .select(
           c.YEAR,
           c.CODE,
           c.GDP_PER_CAPITA,
           lead(c.CODE)          .over(w_gdp).as("runner_up_code"),
           lead(c.GDP_PER_CAPITA).over(w_gdp).as("runner_up_gdp"),
           lag (c.CODE)          .over(w_gdp).as("leader_code"),
           lag (c.GDP_PER_CAPITA).over(w_gdp).as("leader_gdp")
       )
       .from(c)
       .window(w_gdp)
       .orderBy(c.YEAR.desc(), c.GDP_PER_CAPITA.desc())
       .fetch()
);

The above program will output

+----+----+--------------+--------------+-------------+-----------+----------+
|year|code|gdp_per_capita|runner_up_code|runner_up_gdp|leader_code|leader_gdp|
+----+----+--------------+--------------+-------------+-----------+----------+
|2012|CA  |      52409.00|US            |     51755.00|{null}     |    {null}|
|2012|US  |      51755.00|JP            |     46548.00|CA         |  52409.00|
|2012|JP  |      46548.00|DE            |     42598.00|US         |  51755.00|
|2012|DE  |      42598.00|FR            |     39759.00|JP         |  46548.00|
|2012|FR  |      39759.00|GB            |     38649.00|DE         |  42598.00|
|2012|GB  |      38649.00|IT            |     33814.00|FR         |  39759.00|
|2012|IT  |      33814.00|RU            |     14091.00|GB         |  38649.00|
|2012|RU  |      14091.00|{null}        |       {null}|IT         |  33814.00|
|2011|CA  |      51791.00|US            |     49855.00|{null}     |    {null}|
|2011|US  |      49855.00|JP            |     46204.00|CA         |  51791.00|
|2011|JP  |      46204.00|DE            |     44355.00|US         |  49855.00|
|2011|DE  |      44355.00|FR            |     42578.00|JP         |  46204.00|
|2011|FR  |      42578.00|GB            |     38927.00|DE         |  44355.00|
|2011|GB  |      38927.00|IT            |     36988.00|FR         |  42578.00|
|2011|IT  |      36988.00|RU            |     13324.00|GB         |  38927.00|
|2011|RU  |      13324.00|{null}        |       {null}|IT         |  36988.00|
|2010|US  |      48358.00|CA            |     47465.00|{null}     |    {null}|
|2010|CA  |      47465.00|JP            |     43118.00|US         |  48358.00|
|2010|JP  |      43118.00|DE            |     40408.00|CA         |  47465.00|
|2010|DE  |      40408.00|FR            |     39448.00|JP         |  43118.00|
|2010|FR  |      39448.00|GB            |     36573.00|DE         |  40408.00|
|2010|GB  |      36573.00|IT            |     34673.00|FR         |  39448.00|
|2010|IT  |      34673.00|RU            |     10710.00|GB         |  36573.00|
|2010|RU  |      10710.00|{null}        |       {null}|IT         |  34673.00|
|2009|US  |      46999.00|CA            |     40764.00|{null}     |    {null}|
|2009|CA  |      40764.00|FR            |     40488.00|US         |  46999.00|
|2009|FR  |      40488.00|DE            |     40270.00|CA         |  40764.00|
|2009|DE  |      40270.00|JP            |     39473.00|FR         |  40488.00|
|2009|JP  |      39473.00|IT            |     35724.00|DE         |  40270.00|
|2009|IT  |      35724.00|GB            |     35455.00|JP         |  39473.00|
|2009|GB  |      35455.00|RU            |      8616.00|IT         |  35724.00|
|2009|RU  |       8616.00|{null}        |       {null}|GB         |  35455.00|
+----+----+--------------+--------------+-------------+-----------+----------+

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

No matter whether you’re using jOOQ for your database integration, or just plain SQL – start using window functions today.

Liked this article?

Read more about how ROW_NUMBER(), RANK(), and DENSE_RANK() work.

Painless Access from Java to PL/SQL Procedures with jOOQ


PL/SQL is one of those things.

Most people try to stay clear of it. Few people really love it. I just happen to suffer from stockholm syndrome, since I’m working a lot with banks.

Even if the PL/SQL syntax and the tooling sometimes remind me of the good old times…

Fitzgerald, we need to rewind the tape and replace the PL/SQL cartridge.

“Fitzgerald, we’re cruisin’ for a bruisin’. I’ll rewind the tape.” – “Don’t have a cow, Lawrence. We can insert a new PL/SQL cartridge any time.”
Image in public domain

… I still believe that a procedural language (well, any language) combined with SQL can do miracles in terms of productiveness, performance and expressivity.

In this article, we’ll see later on, how we can achieve the same with SQL (and PL/SQL) in Java, using jOOQ.

But first, a little bit of history…

Accessing PL/SQL from Java

One of the biggest reasons why Java developers in particular refrain from writing their own PL/SQL code is because the interface between PL/SQL and Java – ojdbc – is a major pain. We’ll see in the following examples how that is.

Assume we’re working on an Oracle-port of the popular Sakila database (originally created for MySQL). This particular Sakila/Oracle port was implemented by DB Software Laboratory and published under the BSD license.

Here’s a partial view of that Sakila database.

Sakila-film-actor-category

ERD created with vertabelo.comlearn how to use Vertabelo with jOOQ

Now, let’s assume that we have an API in the database that doesn’t expose the above schema, but exposes a PL/SQL API instead. The API might look something like this:

CREATE TYPE LANGUAGE_T AS OBJECT (
  language_id SMALLINT,
  name CHAR(20),
  last_update DATE
);
/

CREATE TYPE LANGUAGES_T AS TABLE OF LANGUAGE_T;
/

CREATE TYPE FILM_T AS OBJECT (
  film_id int,
  title VARCHAR(255),
  description CLOB,
  release_year VARCHAR(4),
  language LANGUAGE_T,
  original_language LANGUAGE_T,
  rental_duration SMALLINT,
  rental_rate DECIMAL(4,2),
  length SMALLINT,
  replacement_cost DECIMAL(5,2),
  rating VARCHAR(10),
  special_features VARCHAR(100),
  last_update DATE
);
/

CREATE TYPE FILMS_T AS TABLE OF FILM_T;
/

CREATE TYPE ACTOR_T AS OBJECT (
  actor_id numeric,
  first_name VARCHAR(45),
  last_name VARCHAR(45),
  last_update DATE
);
/

CREATE TYPE ACTORS_T AS TABLE OF ACTOR_T;
/

CREATE TYPE CATEGORY_T AS OBJECT (
  category_id SMALLINT,
  name VARCHAR(25),
  last_update DATE
);
/

CREATE TYPE CATEGORIES_T AS TABLE OF CATEGORY_T;
/

CREATE TYPE FILM_INFO_T AS OBJECT (
  film FILM_T,
  actors ACTORS_T,
  categories CATEGORIES_T
);
/

You’ll notice immediately, that this is essentially just a 1:1 copy of the schema in this case modelled as Oracle SQL OBJECT and TABLE types, apart from the FILM_INFO_T type, which acts as an aggregate.

Now, our DBA (or our database developer) has implemented the following API for us to access the above information:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

This, ladies and gentlemen, is how you can now…

… tediously access the PL/SQL API with JDBC

So, in order to avoid the awkward CallableStatement with its OUT parameter registration and JDBC escape syntax, we’re going to fetch a FILM_INFO_T record via a SQL statement like this:

try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT rentals.get_film_info(1) FROM DUAL");
     ResultSet rs = stmt.executeQuery()) {

    // STRUCT unnesting here...
}

So far so good. Luckily, there is Java 7′s try-with-resources to help us clean up those myriad JDBC objects. Now how to proceed? What will we get back from this ResultSet? A java.sql.Struct:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    // And so on...
}

Now, the brave ones among you would continue downcasting the java.sql.Struct to an even more obscure and arcane oracle.sql.STRUCT, which contains almost no Javadoc, but tons of deprecated additional, vendor-specific methods.

For now, let’s stick with the “standard API”, though.

Interlude:

Let’s take a moment to appreciate JDBC in times of Java 8.

When Java 5 was introduced, so were generics. We have rewritten our big code bases to remove all sorts of meaningless boilerplate type casts that are now no longer needed. With the exception of JDBC. When it comes to JDBC, guessing appropriate types is all a matter of luck. We’re accessing complex nested data structures provided by external systems by dereferencing elements by index, and then taking wild guesses at the resulting data types.

Lambdas have just been introduced, yet JDBC still talks to the mainframe.

Rhonda said, she put that STRUCT right between jack 73 and 75 on array F-B4. I wonder if I need my AC/DC converter to plug it

Rhonda said, she put that STRUCT right between jack 73 and 75 on array F-B4. I wonder if I need my AC/DC converter to plug it
Image in public domain

And then…

And here be dragons. And STRUCTS

And here be dragons. And STRUCTS
Original image in public domain

OK, enough of these rants.

Let’s continue navigating our STRUCT

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    Struct film_t = (Struct) film_info_t.getAttributes()[0];
    String title = (String) film_t.getAttributes()[1];
    Clob description_clob = (Clob) film_t.getAttributes()[2];
    String description = description_clob.getSubString(1, (int) description_clob.length());

    Struct language_t = (Struct) film_t.getAttributes()[4];
    String language = (String) language_t.getAttributes()[1];

    System.out.println("Film       : " + title);
    System.out.println("Description: " + description);
    System.out.println("Language   : " + language);
}

From the initial STRUCT that we received at position 1 from the ResultSet, we can continue dereferencing attributes by index. Unfortunately, we’ll constantly need to look up the SQL type in Oracle (or in some documentation) to remember the order of the attributes:

CREATE TYPE FILM_INFO_T AS OBJECT (
  film FILM_T,
  actors ACTORS_T,
  categories CATEGORIES_T
);
/

And that’s not it! The first attribute of type FILM_T is yet another, nested STRUCT. And then, those horrible CLOBs. The above code is not strictly complete. In some cases that only the maintainers of JDBC can fathom, java.sql.Clob.free() has to be called to be sure that resources are freed in time. Remember that CLOB, depending on your database and driver configuration, may live outside the scope of your transaction.

Unfortunately, the method is called free() instead of AutoCloseable.close(), such that try-with-resources cannot be used. So here we go:

List<Clob> clobs = new ArrayList<>();

while (rs.next()) {
    try {
        Struct film_info_t = (Struct) rs.getObject(1);
        Struct film_t = (Struct) film_info_t.getAttributes()[0];

        String title = (String) film_t.getAttributes()[1];
        Clob description_clob = (Clob) film_t.getAttributes()[2];
        String description = description_clob.getSubString(1, (int) description_clob.length());

        Struct language_t = (Struct) film_t.getAttributes()[4];
        String language = (String) language_t.getAttributes()[1];

        System.out.println("Film       : " + title);
        System.out.println("Description: " + description);
        System.out.println("Language   : " + language);
    }
    finally {
        // And don't think you can call this early, either
        // The internal specifics are mysterious!
        for (Clob clob : clobs)
            clob.free();
    }
}

That’s about it. Now we have found ourselves with some nice little output on the console:

Film       : ACADEMY DINOSAUR
Description: A Epic Drama of a Feminist And a Mad 
             Scientist who must Battle a Teacher in
             The Canadian Rockies
Language   : English             

That’s about it – You may think! But…

The pain has only started

… because we’re not done yet. There are also two nested table types that we need to deserialise from the STRUCT. If you haven’t given up yet (bear with me, good news is nigh), you’ll enjoy reading about how to fetch and unwind a java.sql.Array. Let’s continue right after the printing of the film:

Array actors_t = (Array) film_info_t.getAttributes()[1];
Array categories_t = (Array) film_info_t.getAttributes()[2];

Again, we’re accessing attributes by indexes, which we have to remember, and which can easily break. The ACTORS_T array is nothing but yet another wrapped STRUCT:

System.out.println("Actors     : ");

Object[] actors = (Object[]) actors_t.getArray();
for (Object actor : actors) {
    Struct actor_t = (Struct) actor;

    System.out.println(
        "  " + actor_t.getAttributes()[1]
       + " " + actor_t.getAttributes()[2]);
}

You’ll notice a few things:

  • The Array.getArray() method returns an array. But it declares returning Object. We have to manually cast.
  • We can’t cast to Struct[] even if that would be a sensible type. But the type returned by ojdbc is Object[] (containing Struct elements)
  • The foreach loop also cannot dereference a Struct from the right hand side. There’s no way of coercing the type of actor into what we know it really is
  • We could’ve used Java 8 and Streams and such, but unfortunately, all lambda expressions that can be passed to the Streams API disallow throwing of checked exceptions. And JDBC throws checked exceptions. That’ll be even uglier.

Anyway. Now that we’ve finally achieved this, we can see the print output:

Film       : ACADEMY DINOSAUR
Description: A Epic Drama of a Feminist And a Mad 
             Scientist who must Battle a Teacher in
             The Canadian Rockies
Language   : English             
Actors     : 
  PENELOPE GUINESS
  CHRISTIAN GABLE
  LUCILLE TRACY
  SANDRA PECK
  JOHNNY CAGE
  MENA TEMPLE
  WARREN NOLTE
  OPRAH KILMER
  ROCK DUKAKIS
  MARY KEITEL

When will this madness stop?

It’ll stop right here!

So far, this article read like a tutorial (or rather: medieval torture) of how to deserialise nested user-defined types from Oracle SQL to Java (don’t get me started on serialising them again!)

In the next section, we’ll see how the exact same business logic (listing Film with ID=1 and its actors) can be implemented with no pain at all using jOOQ and its source code generator. Check this out:

// Simply call the packaged stored function from
// Java, and get a deserialised, type safe record
FilmInfoTRecord film_info_t = Rentals.getFilmInfo1(
    configuration, new BigInteger("1"));

// The generated record has getters (and setters)
// for type safe navigation of nested structures
FilmTRecord film_t = film_info_t.getFilm();

// In fact, all these types have generated getters:
System.out.println("Film       : " + film_t.getTitle());
System.out.println("Description: " + film_t.getDescription());
System.out.println("Language   : " + film_t.getLanguage().getName());

// Simply loop nested type safe array structures
System.out.println("Actors     : ");
for (ActorTRecord actor_t : film_info_t.getActors()) {
    System.out.println(
        "  " + actor_t.getFirstName()
       + " " + actor_t.getLastName());
}

System.out.println("Categories     : ");
for (CategoryTRecord category_t : film_info_t.getCategories()) {
    System.out.println(category_t.getName());
}

Is that it?

Yes!

Wow, I mean, this is just as though all those PL/SQL types and procedures / functions were actually part of Java. All the caveats that we’ve seen before are hidden behind those generated types and implemented in jOOQ, so you can concentrate on what you originally wanted to do. Access the data objects and do meaningful work with them. Not serialise / deserialise them!

Let’s take a moment and appreciate this consumer advertising:

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

Not convinced yet?

I told you not to get me started on serialising the types to JDBC. And I won’t, but here’s how to serialise the types to jOOQ, because that’s a piece of cake!

Let’s consider this other aggregate type, that returns a customer’s rental history:

CREATE TYPE CUSTOMER_RENTAL_HISTORY_T AS OBJECT (
  customer CUSTOMER_T,
  films FILMS_T
);
/

And the full PL/SQL package specs:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_CUSTOMER(p_customer_id INT) RETURN CUSTOMER_T;
  FUNCTION GET_CUSTOMERS RETURN CUSTOMERS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer_id INT) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer CUSTOMER_T) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

So, when calling RENTALS.GET_CUSTOMER_RENTAL_HISTORY we can find all the films that a customer has ever rented. Let’s do that for all customers whose FIRST_NAME is “JAMIE”, and this time, we’re using Java 8:

// We call the stored function directly inline in
// a SQL statement
dsl().select(Rentals.getCustomer(
          CUSTOMER.CUSTOMER_ID
      ))
     .from(CUSTOMER)
     .where(CUSTOMER.FIRST_NAME.eq("JAMIE"))

// This returns Result<Record1<CustomerTRecord>>
// We unwrap the CustomerTRecord and consume
// the result with a lambda expression
     .fetch()
     .map(Record1::value1)
     .forEach(customer -> {
         System.out.println("Customer  : ");
         System.out.println("- Name    : " 
           + customer.getFirstName() 
           + " " + customer.getLastName());
         System.out.println("- E-Mail  : " 
           + customer.getEmail());
         System.out.println("- Address : " 
           + customer.getAddress().getAddress());
         System.out.println("            " 
           + customer.getAddress().getPostalCode() 
           + " " + customer.getAddress().getCity().getCity());
         System.out.println("            " 
           + customer.getAddress().getCity().getCountry().getCountry());

// Now, lets send the customer over the wire again to
// call that other stored procedure, fetching his
// rental history:
         CustomerRentalHistoryTRecord history = 
           Rentals.getCustomerRentalHistory2(dsl().configuration(), customer);

         System.out.println("  Customer Rental History : ");
         System.out.println("    Films                 : ");

         history.getFilms().forEach(film -> {
             System.out.println("      Film                : " 
               + film.getTitle());
             System.out.println("        Language          : " 
               + film.getLanguage().getName());
             System.out.println("        Description       : " 
               + film.getDescription());

// And then, let's call again the first procedure
// in order to get a film's actors and categories
             FilmInfoTRecord info = 
               Rentals.getFilmInfo2(dsl().configuration(), film);

             info.getActors().forEach(actor -> {
                 System.out.println("          Actor           : " 
                   + actor.getFirstName() + " " + actor.getLastName());
             });

             info.getCategories().forEach(category -> {
                 System.out.println("          Category        : " 
                   + category.getName());
             });
         });
     });

… and a short extract of the output produced by the above:

Customer  : 
- Name    : JAMIE RICE
- E-Mail  : JAMIE.RICE@sakilacustomer.org
- Address : 879 Newcastle Way
            90732 Sterling Heights
            United States
  Customer Rental History : 
    Films                 : 
      Film                : ALASKA PHANTOM
        Language          : English             
        Description       : A Fanciful Saga of a Hunter
                            And a Pastry Chef who must
                            Vanquish a Boy in Australia
          Actor           : VAL BOLGER
          Actor           : BURT POSEY
          Actor           : SIDNEY CROWE
          Actor           : SYLVESTER DERN
          Actor           : ALBERT JOHANSSON
          Actor           : GENE MCKELLEN
          Actor           : JEFF SILVERSTONE
          Category        : Music
      Film                : ALONE TRIP
        Language          : English             
        Description       : A Fast-Paced Character
                            Study of a Composer And a
                            Dog who must Outgun a Boat
                            in An Abandoned Fun House
          Actor           : ED CHASE
          Actor           : KARL BERRY
          Actor           : UMA WOOD
          Actor           : WOODY JOLIE
          Actor           : SPENCER DEPP
          Actor           : CHRIS DEPP
          Actor           : LAURENCE BULLOCK
          Actor           : RENEE BALL
          Category        : Music

If you’re using Java and PL/SQL…

… then you should click on the below banner and download the free trial right now to experiment with jOOQ and Oracle:

jOOQ generates Java code from your database and lets you build type safe SQL queries through its fluent API.

The Oracle port of the Sakila database is available from this URL for free, under the terms of the BSD license:

https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila/oracle-sakila-db

Finally, it is time to enjoy writing PL/SQL again!

10 Things You Didn’t Know About Java


So, you’ve been working with Java since the very beginning? Remember the days when it was called “Oak”, when OO was still a hot topic, when C++ folks thought that Java had no chance, when Applets were still a thing?

I bet that you didn’t know at least half of the following things. Let’s start this week with some great surprises about the inner workings of Java.

1. There is no such thing as a checked exception

That’s right! The JVM doesn’t know any such thing, only the Java language does.

Today, everyone agrees that checked exceptions were a mistake. As Bruce Eckel said on his closing keynote at GeeCON, Prague, no other language after Java has engaged in using checked exceptions, and even Java 8 does no longer embrace them in the new Streams API (which can actually be a bit of a pain, when your lambdas use IO or JDBC).

Do you want proof that the JVM doesn’t know such a thing? Try the following code:

public class Test {
 
    // No throws clause here
    public static void main(String[] args) {
        doThrow(new SQLException());
    }
 
    static void doThrow(Exception e) {
        Test.<RuntimeException> doThrow0(e);
    }
 
    @SuppressWarnings("unchecked")
    static <E extends Exception> 
    void doThrow0(Exception e) throws E {
        throw (E) e;
    }
}

Not only does this compile, this also actually throws the SQLException, you don’t even need Lombok’s @SneakyThrows for that.

More details about the above can be found in this article here, or here, on Stack Overflow.

2. You can have method overloads differing only in return types

That doesn’t compile, right?

class Test {
    Object x() { return "abc"; }
    String x() { return "123"; }
}

Right. The Java language doesn’t allow for two methods to be “override-equivalent” within the same class, regardless of their potentially differing throws clauses or return types.

But wait a second. Check out the Javadoc of Class.getMethod(String, Class...). It reads:

Note that there may be more than one matching method in a class because while the Java language forbids a class to declare multiple methods with the same signature but different return types, the Java virtual machine does not. This increased flexibility in the virtual machine can be used to implement various language features. For example, covariant returns can be implemented with bridge methods; the bridge method and the method being overridden would have the same signature but different return types.

Wow, yes that makes sense. In fact, that’s pretty much what happens when you write the following:

abstract class Parent<T> {
    abstract T x();
}

class Child extends Parent<String> {
    @Override
    String x() { return "abc"; }
}

Check out the generated byte code in Child:

  // Method descriptor #15 ()Ljava/lang/String;
  // Stack: 1, Locals: 1
  java.lang.String x();
    0  ldc <String "abc"> [16]
    2  areturn
      Line numbers:
        [pc: 0, line: 7]
      Local variable table:
        [pc: 0, pc: 3] local: this index: 0 type: Child
  
  // Method descriptor #18 ()Ljava/lang/Object;
  // Stack: 1, Locals: 1
  bridge synthetic java.lang.Object x();
    0  aload_0 [this]
    1  invokevirtual Child.x() : java.lang.String [19]
    4  areturn
      Line numbers:
        [pc: 0, line: 1]

So, T is really just Object in byte code. That’s well understood.

The synthetic bridge method is actually generated by the compiler because the return type of the Parent.x() signature may be expected to Object at certain call sites. Adding generics without such bridge methods would not have been possible in a binary compatible way. So, changing the JVM to allow for this feature was the lesser pain (which also allows covariant overriding as a side-effect…) Clever, huh?

Are you into language specifics and internals? Then find some more very interesting details here.

3. All of these are two-dimensional arrays!

class Test {
    int[][] a()  { return new int[0][]; }
    int[] b() [] { return new int[0][]; }
    int c() [][] { return new int[0][]; }
}

Yes, it’s true. Even if your mental parser might not immediately understand the return type of the above methods, they are all the same! Similar to the following piece of code:

class Test {
    int[][] a = {{}};
    int[] b[] = {{}};
    int c[][] = {{}};
}

You think that’s crazy? Imagine using JSR-308 / Java 8 type annotations on the above. The number of syntactic possibilities explodes!

@Target(ElementType.TYPE_USE)
@interface Crazy {}

class Test {
    @Crazy int[][]  a1 = {{}};
    int @Crazy [][] a2 = {{}};
    int[] @Crazy [] a3 = {{}};

    @Crazy int[] b1[]  = {{}};
    int @Crazy [] b2[] = {{}};
    int[] b3 @Crazy [] = {{}};

    @Crazy int c1[][]  = {{}};
    int c2 @Crazy [][] = {{}};
    int c3[] @Crazy [] = {{}};
}

Type annotations. A device whose mystery is only exceeded by its power

Or in other words:

When I do that one last commit just before my 4 week vacation

When I do that one last commit just before my 4 week vacation

I let the actual exercise of finding a use-case for any of the above to you.

4. You don’t get the conditional expression

So, you thought you knew it all when it comes to using the conditional expression? Let me tell you, you didn’t. Most of you will think that the below two snippets are equivalent:

Object o1 = true ? new Integer(1) : new Double(2.0);

… the same as this?

Object o2;

if (true)
    o2 = new Integer(1);
else
    o2 = new Double(2.0);

Nope. Let’s run a quick test

System.out.println(o1);
System.out.println(o2);

This programme will print:

1.0
1

Yep! The conditional operator will implement numeric type promotion, if “needed”, with a very very very strong set of quotation marks on that “needed”. Because, would you expect this programme to throw a NullPointerException?

Integer i = new Integer(1);
if (i.equals(1))
    i = null;
Double d = new Double(2.0);
Object o = true ? i : d; // NullPointerException!
System.out.println(o);

More information about the above can be found here.

5. You also don’t get the compound assignment operator

Quirky enough? Let’s consider the following two pieces of code:

i += j;
i = i + j;

Intuitively, they should be equivalent, right? But guess what. They aren’t! The JLS specifies:

A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T)((E1) op (E2)), where T is the type of E1, except that E1 is evaluated only once.

This is so beautiful, I would like to cite Peter Lawrey‘s answer to this Stack Overflow question:

A good example of this casting is using *= or /=

byte b = 10;
b *= 5.7;
System.out.println(b); // prints 57

or

byte b = 100;
b /= 2.5;
System.out.println(b); // prints 40

or

char ch = '0';
ch *= 1.1;
System.out.println(ch); // prints '4'

or

char ch = 'A';
ch *= 1.5;
System.out.println(ch); // prints 'a'

Now, how incredibly useful is that? I’m going to cast/multiply chars right there in my application. Because, you know…

6. Random integers

Now, this is more of a puzzler. Don’t read the solution yet. See if you can find this one out yourself. When I run the following programme:

for (int i = 0; i < 10; i++) {
  System.out.println((Integer) i);
}

… then “sometimes”, I get the following output:

92
221
45
48
236
183
39
193
33
84

How is that even possible??

.

.

.

.

.

. spoiler… solution ahead…

.

.

.

.

.

OK, the solution is here (http://blog.jooq.org/2013/10/17/add-some-entropy-to-your-jvm/) and has to do with overriding the JDK’s Integer cache via reflection, and then using auto-boxing and auto-unboxing. Don’t do this at home! Or in other words, let’s think about it this way, once more

When I do that one last commit just before my 4 week vacation

When I do that one last commit just before my 4 week vacation

7. GOTO

This is one of my favourite. Java has GOTO! Type it…

int goto = 1;

This will result in:

Test.java:44: error: <identifier> expected
    int goto = 1;
       ^

This is because goto is an unused keyword, just in case…

But that’s not the exciting part. The exciting part is that you can actually implement goto with break, continue and labelled blocks:

Jumping forward

label: {
  // do stuff
  if (check) break label;
  // do more stuff
}

In bytecode:

2  iload_1 [check]
3  ifeq 6          // Jumping forward
6  ..

Jumping backward

label: do {
  // do stuff
  if (check) continue label;
  // do more stuff
  break label;
} while(true);

In bytecode:

 2  iload_1 [check]
 3  ifeq 9
 6  goto 2          // Jumping backward
 9  ..

8. Java has type aliases

In other languages (e.g. Ceylon), we can define type aliases very easily:

interface People => Set<Person>;

A People type constructed in such a way can then be used interchangably with Set<Person>:

People?      p1 = null;
Set<Person>? p2 = p1;
People?      p3 = p2;

In Java, we can’t define type aliases at a top level. But we can do so for the scope of a class, or a method. Let’s consider that we’re unhappy with the namings of Integer, Long etc, we want shorter names: I and L. Easy:

class Test<I extends Integer> {
    <L extends Long> void x(I i, L l) {
        System.out.println(
            i.intValue() + ", " + 
            l.longValue()
        );
    }
}

In the above programme, Integer is “aliased” to I for the scope of the Test class, whereas Long is “aliased” to L for the scope of the x() method. We can then call the above method like this:

new Test().x(1, 2L);

This technique is of course not to be taken seriously. In this case, Integer and Long are both final types, which means that the types I and L are effectively aliases (almost. assignment-compatibility only goes one way). If we had used non-final types (e.g. Object), then we’d be really using ordinary generics.

Enough of these silly tricks. Now for something truly remarkable!

9. Some type relationships are undecidable!

OK, this will now get really funky, so take a cup of coffee and concentrate. Consider the following two types:

// A helper type. You could also just use List
interface Type<T> {}

class C implements Type<Type<? super C>> {}
class D<P> implements Type<Type<? super D<D<P>>>> {}

Now, what do the types C and D even mean?

They are somewhat recursive, in a similar (yet subtly different) way that java.lang.Enum is recursive. Consider:

public abstract class Enum<E extends Enum<E>> { ... }

With the above specification, an actual enum implementation is just mere syntactic sugar:

// This
enum MyEnum {}

// Is really just sugar for this
class MyEnum extends Enum<MyEnum> { ... }

With this in mind, let’s get back to our two types. Does the following compile?

class Test {
    Type<? super C> c = new C();
    Type<? super D<Byte>> d = new D<Byte>();
}

Hard question, and Ross Tate has an answer to it. The question is in fact undecidable:

Is C a subtype of Type<? super C>?

Step 0) C <?: Type<? super C>
Step 1) Type<Type<? super C>> <?: Type (inheritance)
Step 2) C  (checking wildcard ? super C)
Step . . . (cycle forever)

And then:

Is D a subtype of Type<? super D<Byte>>?

Step 0) D<Byte> <?: Type<? super C<Byte>>
Step 1) Type<Type<? super D<D<Byte>>>> <?: Type<? super D<Byte>>
Step 2) D<Byte> <?: Type<? super D<D<Byte>>>
Step 3) List<List<? super C<C>>> <?: List<? super C<C>>
Step 4) D<D<Byte>> <?: Type<? super D<D<Byte>>>
Step . . . (expand forever)

Try compiling the above in your Eclipse, it’ll crash! (don’t worry. I’ve filed a bug)

Let this sink in…

Some type relationships in Java are undecidable!

If you’re interested in more details about this peculiar Java quirk, read Ross Tate’s paper “Taming Wildcards in Java’s Type System” (co-authored with Alan Leung and Sorin Lerner), or also our own musings on correlating subtype polymorphism with generic polymorphism

10. Type intersections

Java has a very peculiar feature called type intersections. You can declare a (generic) type that is in fact the intersection of two types. For instance:

class Test<T extends Serializable & Cloneable> {
}

The generic type parameter T that you’re binding to instances of the class Test must implement both Serializable and Cloneable. For instance, String is not a possible bound, but Date is:

// Doesn't compile
Test<String> s = null;

// Compiles
Test<Date> d = null;

This feature has seen reuse in Java 8, where you can now cast types to ad-hoc type intersections. How is this useful? Almost not at all, but if you want to coerce a lambda expression into such a type, there’s no other way. Let’s assume you have this crazy type constraint on your method:

<T extends Runnable & Serializable> void execute(T t) {}

You want a Runnable that is also Serializable just in case you’d like to execute it somewhere else and send it over the wire. Lambdas and serialisation are a bit of a quirk.

Lambdas can be serialised:

You can serialize a lambda expression if its target type and its captured arguments are serializable

But even if that’s true, they do not automatically implement the Serializable marker interface. To coerce them to that type, you must cast. But when you cast only to Serializable

execute((Serializable) (() -> {}));

… then the lambda will no longer be Runnable.

Egh…

So…

Cast it to both types:

execute((Runnable & Serializable) (() -> {}));

Conclusion

I usually say this only about SQL, but it’s about time to conclude an article with the following:

Java is a device whose mystery is only exceeded by its power

Found this article interesting?

How about this one: 10 Subtle Best Practices when Coding Java

A RESTful JDBC HTTP Server built on top of jOOQ


The jOOQ ecosystem and community is continually growing. We’re personally always thrilled to see other Open Source projects built on top of jOOQ. Today, we’re very happy to introduce you to a very interesting approach at combining REST and RDBMS by Björn Harrtell.

bjorn-harrtellBjörn Harrtell is a swedish programmer since childhood. He is usually busy writing GIS systems and integrations at Sweco Position AB but sometimes he spends time getting involved in Open Source projects and contributing to a few pieces of work related to Open Source projects like GeoTools and OpenLayers. Björn has also initiated a few minor Open Source projects himself and one of the latest projects he’s been working on is jdbc-http-server.

We’re excited to publish Björn’s guest post introducing his interesting work:

JDBC HTTP Server

Ever found yourself writing a lot of REST resources that do simple CRUD against a relational database and felt the code was repeating itself? In that case, jdbc-http-server might be a project worth checking out.

jdbc-http-server exposes a relational database instance as a discoverable REST API making it possible to perform simple CRUD from a browser application without requiring any backend code to be written.

A discoverable REST API means you can access the root resource at / and follow links to subresources from there. For example, let’s say you have a database named testdb with a table named testtable in the public schema you can then do the following operations:

Retrieve (GET), update (PUT) or delete (DELETE) a single row at:

/db/testdb/schemas/public/tables/testtable/rows/1

Retrieve (GET), update (PUT) rows or create a new row (POST) at:

/db/testdb/schemas/public/tables/testtable/rows

The above resources accepts parameters select, where, limit, offset
and orderby where applicable. Examples:

GET a maximum of 10 rows where cost>100 at:

/db/testdb/schemas/public/tables/testtable/rows?where=cost>100&limit=10

jdbc-http-server is database engine agnostic since it utilizes jOOQ to generate SQL in a dialect suited to the target database engine. At the moment H2, PostgreSQL and HSQLDB are covered by automated tests. Currently the only available representation data format is JSON but adding more is an interesting possibility.

Feedback and, of course, contributions are welcome :)

Let’s Stream a Map in Java 8 with jOOλ


I wanted to find an easy way to stream a Map in Java 8. Guess what? There isn’t!

What I would’ve expected for convenience is the following method:

public interface Map<K, V> {

    default Stream<Entry<K, V>> stream() {
        return entrySet().stream();
    }    
}

But there’s no such method. There are probably a variety of reasons why such a method shouldn’t exist, e.g.:

  • There’s no “clear” preference for entrySet() being chosen over keySet() or values(), as a stream source
  • Map isn’t really a collection. It’s not even an Iterable
  • That wasn’t the design goal
  • The EG didn’t have enough time

Well, there is a very compelling reason for Map to have been retrofitted to provide both an entrySet().stream() and to finally implement Iterable<Entry<K, V>>. And that reason is the fact that we now have Map.forEach():

default void forEach(
        BiConsumer<? super K, ? super V> action) {
    Objects.requireNonNull(action);
    for (Map.Entry<K, V> entry : entrySet()) {
        K k;
        V v;
        try {
            k = entry.getKey();
            v = entry.getValue();
        } catch(IllegalStateException ise) {
            // this usually means the entry is no longer in the map.
            throw new ConcurrentModificationException(ise);
        }
        action.accept(k, v);
    }
}

forEach() in this case accepts a BiConsumer that really consumes entries in the map. If you search through JDK source code, there are really very few references to the BiConsumer type outside of Map.forEach() and perhaps a couple of CompletableFuture methods and a couple of streams collection methods.

So, one could almost assume that BiConsumer was strongly driven by the needs of this forEach() method, which would be a strong case for making Map.Entry a more important type throughout the collections API (we would have preferred the type Tuple2, of course).

Let’s continue this line of thought. There is also Iterable.forEach():

public interface Iterable<T> {
    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }
}

Both Map.forEach() and Iterable.forEach() intuitively iterate the “entries” of their respective collection model, although there is a subtle difference:

  • Iterable.forEach() expects a Consumer taking a single value
  • Map.forEach() expects a BiConsumer taking two values: the key and the value (NOT a Map.Entry!)

Think about it this way:

This makes the two methods incompatible in a “duck typing sense”, which makes the two types even more different

Bummer!

Improving Map with jOOλ

We find that quirky and counter-intuitive. forEach() is really not the only use-case of Map traversal and transformation. We’d love to have a Stream<Entry<K, V>>, or even better, a Stream<Tuple2<T1, T2>>. So we implemented that in jOOλ, a library which we’ve developed for our integration tests at jOOQ. With jOOλ, you can now wrap a Map in a Seq type (“Seq” for sequential stream, a stream with many more functional features):

Map<Integer, String> map = new LinkedHashMap<>();
map.put(1, "a");
map.put(2, "b");
map.put(3, "c");

assertEquals(
  Arrays.asList(
    tuple(1, "a"), 
    tuple(2, "b"), 
    tuple(3, "c")
  ),

  Seq.seq(map).toList()
);

What you can do with it? How about creating a new Map, swapping keys and values in one go:

System.out.println(
  Seq.seq(map)
     .map(Tuple2::swap)
     .toMap(Tuple2::v1, Tuple2::v2)
);

System.out.println(
  Seq.seq(map)
     .toMap(Tuple2::v2, Tuple2::v1)
);

Both of the above will yield:

{a=1, b=2, c=3}

Just for the record, here’s how to swap keys and values with standard JDK API:

System.out.println(
  map.entrySet()
     .stream()
     .collect(Collectors.toMap(
         Map.Entry::getValue, 
         Map.Entry::getKey
     ))
);

It can be done, but the every day verbosity of standard Java API makes things a bit hard to read / write

Stop Claiming that you’re Using a Schemaless Database


One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
AS
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task that you’ll have transforming your JSON to the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is about a misleading term as much as NoSQL is:

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema free data structure standard, as opposed to XML which allows you to specify XSD if you need, or operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimick XSD)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.

An example:

… and more:

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does

- Murphy

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

jOOQ: The best way to write SQL in Java

Follow

Get every new post delivered to your Inbox.

Join 2,082 other followers

%d bloggers like this: