Java 8 Friday Goodies: java.io finally rocks!

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. We have blogged a couple of times about some nice Java 8 goodies, and now we feel it’s time to start a new blog series, the…

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub. tweet this

Interacting with the file system has been a bit of a pain in Java. CTMMC shows us an example of how to copy a file with Java. While some issues still remain, at least, we can now use lambdas and the new Streams API to traverse the file system and list files! Here is the FileFilterGoodies example that we’ve pushed to our GitHub repository:

public class FileFilterGoodies {

    public static void main(String args[]) {
        listRecursive(new File("."));
    }

    /**
     * This method recursively lists all
     * .txt and .java files in a directory
     */
    private static void listRecursive(File dir) {
        Arrays.stream(dir.listFiles((f, n) ->
                     !n.startsWith(".")
                  &&
                     (new File(f, n).isDirectory()
                  ||  n.endsWith(".txt")
                  ||  n.endsWith(".java"))
              ))
              .forEach(unchecked((file) -> {
                  System.out.println(
                      file.getCanonicalPath()
                          .substring(new File(".")
                              .getCanonicalPath()
                              .length()));

                  if (file.isDirectory()) {
                      listRecursive(file);
                  }
              }));
    }

    /**
     * This utility simply wraps a functional
     * interface that throws a checked exception
     * into a Java 8 Consumer
     */
    private static <T> Consumer<T>
    unchecked(CheckedConsumer<T> consumer) {
        return t -> {
            try {
                consumer.accept(t);
            }
            catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    @FunctionalInterface
    private interface CheckedConsumer<T> {
        void accept(T t) throws Exception;
    }
}

The output of the above programme is:

\LICENSE.txt
\out
\out\production
\out\production\jOOQ's Java 8 Goodies
\out\production\jOOQ's Java 8 Goodies\org
\out\production\jOOQ's Java 8 Goodies\org\jooq
\out\production\jOOQ's Java 8 Goodies\org\jooq\java8
\out\production\jOOQ's Java 8 Goodies\org\jooq\java8\goodies
\out\production\jOOQ's Java 8 Goodies\org\jooq\java8\goodies\io
\README.txt
\src
\src\org
\src\org\jooq
\src\org\jooq\java8
\src\org\jooq\java8\goodies
\src\org\jooq\java8\goodies\io
\src\org\jooq\java8\goodies\io\FileFilterGoodies.java

Now, that’s really awesome, isn’t it? Let’s decompose the above listRecursive() method:

// With this method, we wrap the File[] array
// into a new Java 8 Stream, which has awesome
// new methods.
Arrays.stream(

// The Java 1.2 File.listFiles() method luckily
// accepts a @FunctionalInterface, which can be
// instantiated using a lambda expression
// ...
// In this example, we'll just ignore the fact
// that listFiles can return null
              dir.listFiles((f, n) ->
             !n.startsWith(".")
          &&
             (new File(f, n).isDirectory()
          ||  n.endsWith(".txt")
          ||  n.endsWith(".java"))
      ))

// Each Stream (and also java.util.List) has this
// awesome forEach method, that accepts a Consumer
      .forEach(

// Unfortunately, Java 8 Consumers don't allow
// throwing checked exceptions. So let's quickly
// wrap it (see details below) ...
               unchecked(

// ... and pass another lambda expression to it,
// which prints the local path and recurses
                         (file) -> {
          System.out.println(
              file.getCanonicalPath()
                  .substring(new File(".")
                      .getCanonicalPath()
                      .length()));

          if (file.isDirectory()) {
              listRecursive(file);
          }
      }));

More goodies next week

Stay tuned for next week, when we show you how Java 8 improves using XML with jOOX

More on Java 8

In the mean time, have a look at Eugen Paraschiv’s awesome Java 8 resources page

Advanced Java Trickery for Typesafe Query DSLs

When browsing Hacker News, I recently stumbled upon Benji Weber’s most interesting attempt at creating typesafe database interaction with Java 8. Benji created a typesafe query DSL somewhat similar to jOOQ with the important difference that it uses Java 8 method references to introspect POJOs and deduce query elements from it. This is best explained by example:

Optional<Person> person = 
    from(Person.class)
        .where(Person::getFirstName)
        .like("%ji")
        .and(Person::getLastName)
        .equalTo("weber")
        .select(
            personMapper, 
            connectionFactory::openConnection);

The above query can then be transformed into the following SQL statement:

SELECT * FROM person 
WHERE first_name LIKE ? AND last_name = ?

This is indeed a very interesting approach, and we’ve seen similar ideas around, before. Most prominently, such ideas were implemented in:

  • JaQu, another very interesting competitor product of jOOQ, created by Thomas Müller, the maintainer of the popular H2 database
  • LambdaJ, an attempt to bring lambda expressions to Java long before Java 8
  • OhmDB, a new NoSQL data store with a fluent query DSL

What’s new in Benji’s approach is really the fact that Java 8 method references can be used instead of resorting to CGLIB and other sorts of bytecode trickery through instrumentation. An example of such trickery is JaQu’s experimental bytecode introspection to transform complex Java boolean expressions into SQL – called “natural syntax”:

Timestamp ts = 
  Timestamp.valueOf("2005-05-05 05:05:05");
Time t = Time.valueOf("23:23:23");

long count = db.from(co).
    where(new Filter() { public boolean where() {
        return co.id == x
            && co.name.equals(name)
            && co.value == new BigDecimal("1")
            && co.amount == 1L
            && co.birthday.before(new Date())
            && co.created.before(ts)
            && co.time.before(t);
        } }).selectCount();

While these ideas are certainly very interesting to play around with, we doubt such language and bytecode transformations will lead to robust results. People have criticised Hibernate’s use of proxying in various blog posts.

We prefer a WYSIWYG approach where API consumers remain in full control of what is going on. What are your thoughts about such clever ideas?

Lesser-Known SQL Features: DEFAULT VALUES

A lesser-known SQL feature is the DEFAULT keyword, which can be used in INSERT and UPDATE statements. Consider the following table, created using standard SQL syntax:

CREATE TABLE timestamps (
  id INTEGER   GENERATED BY DEFAULT 
               AS IDENTITY(START WITH 1),
  t  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT pk_values PRIMARY KEY (id)
)

Now, in order to generate a new record in this table, you could either explicitly set a timestamp as such:

INSERT INTO timestamps (t) 
  VALUES (CURRENT_TIMESTAMP);

Or, you just use the handy DEFAULT VALUES syntax:

-- Short syntax
INSERT INTO timestamps DEFAULT VALUES;

-- Explicit syntax
INSERT INTO timestamps (t) 
  VALUES (DEFAULT);
INSERT INTO timestamps (id, t) 
  VALUES (DEFAULT, DEFAULT);

The same can be done in an UPDATE statement:

-- Set all timestamps to CURRENT_TIMESTAMP
UPDATE timestamps SET t = DEFAULT;

SQL Compatibility for DEFAULT VALUES

As always with SQL, things aren’t as bright as the SQL-92 standard specifies. According to the standard, all of the above must be supported. In reality, this can be said:

Standards-compliant databases

These databases support the standard, fully

Almost compliant databases

These databases support the DEFAULT keyword, but not the DEFAULT VALUES clause for insert statements:

  • Access
  • DB2
  • Derby (we have created DERBY-6444 for this)
  • MariaDB
  • MySQL
  • Oracle

Support for DEFAULT VALUES in jOOQ 3.3

jOOQ 3.3 will support the DEFAULT VALUES syntax and also a very useful variant of it, when combining inserting DEFAULT values with returning them after the insert:

DSL.using(configuration)
   .insertInto(TIMESTAMPS)
   .defaultValues()
   .returning(TIMESTAMPS.ID, TIMESTAMPS.T)
   .fetch();

The above query will not only insert a new record with default values, but also return those values to your Java program for further processing. As with most jOOQ API elements, the above statement will transparently work with all databases, either through native syntax:

  • DB2: SELECT .. FROM FINAL TABLE (INSERT ..)
  • PostgreSQL: INSERT .. RETURNING

… or through JDBC’s Statement.getGeneratedKeys().

The DBMS of the Year 2013

We have recently blogged about the DB-Engines Ranking and how MongoDB was the only NoSQL store to make it into that ranking’s top 10. Today, this marketing platform offered by solid IT has announced MongoDB to be the DBMS of the year 2013, with PostgreSQL being a close runner-up, followed by Cassandra.

solid IT as a company is slightly biased towards NoSQL, so it’s not surprising that two NoSQL databases are in their top ranking, and the only successful ORDBMS in the market is number two. As we ourselves are “slightly” biased towards SQL, we would like to announce our own DBMS of the year 2013:

SQL Server is the DBMS of the year 2013

… because its SQL dialect Transact-SQL (which Microsoft “shares” with Sybase), is the first SQL-based programming language to make it into TIOBE’s top 10 programming languages.

Congratulations to SQL Server from the jOOQ team!

The SQL Language’s Most Missing Feature

SQL is also awesome in many ways. We can write out the most complex truths and facts and have the database tell us the answer in no time.

But the SQL language is arguably the most beautiful programming language out there. It has so many caveats that people like me get ridiculously rich selling consulting services just to merely explain its semantics. One of the best examples of how twisted the SQL language is, is Alex Bolenok’s article about SQL NULL on Tech.Pro.

Now, one of the biggest criticisms of SQL is its verbosity. Most SQL dialects have virtually no constructs to avoid repetition. It is a common use case in SQL to filter, group, select, and order by the same expression. An example:

SELECT   first_name || ' ' || last_name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY first_name || ' ' || last_name
ORDER BY first_name || ' ' || last_name

You might write a different query for the same, sure. But even if you didn’t doesn’t it bother you that there is hardly any way to re-use the first_name || ' ' || last_name concatenation expression?

Well, you can at least reuse it in the ORDER BY clause:

SELECT   first_name || ' ' || last_name AS name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY first_name || ' ' || last_name
ORDER BY name

And in MySQL, you can also reuse it in the GROUP BY clause, although we think that this is a bad idea:

SELECT   first_name || ' ' || last_name AS name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY name
ORDER BY name

Standard SQL solutions for column reuse

Sure, you could create a derived table:

SELECT   name
FROM (
  SELECT first_name || ' ' || last_name
  FROM   customers
) c(name)
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

… or a common table expression, which is basically a reusable derived table, or a local view:

WITH c(name) AS (
  SELECT first_name || ' ' || last_name
  FROM   customers
)
SELECT   name
FROM     c
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

But why do I even have to create a table to reuse / rename a simple column expression? Why can’t I just simply use something like a common column expression? Like this?

-- Common column expression that can be appended
-- to any table expression.
SELECT   name
FROM     customers
WITH     name AS first_name || ' ' || last_name
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

Note that the common column expression would be scoped to the table source. In other words, it only makes sense in the context of a FROM clause and the tables specified in the FROM clause. In a way, the following two expressions would be exactly equivalent:

-- Proposed syntax
FROM     customers
WITH     name AS first_name || ' ' || last_name

-- Existing syntax
FROM (
  SELECT customers.*,
         first_name || ' ' || last_name AS name
  FROM   customers
) AS customers

The proposed syntax could also be applied to joined tables:

-- Proposed syntax
FROM customers
  AS c
WITH name AS c.first_name || ' ' || c.last_name
JOIN addresses
  AS a
WITH address AS a.street || '\n' || a.city
  ON c.address_id = a.address_id
WITH full_address AS c.name || '\n' || a.address

-- Or alternatively, if you don't need to tightly 
-- scope these common column expressions to their
-- corresponding tables:
FROM customers AS c
JOIN addresses AS a
  ON c.address_id = a.address_id
WITH name AS c.first_name || ' ' || c.last_name,
     address AS a.street || '\n' || a.city,
     full_address AS name || '\n' || address

So in plain English, the new WITH clause could be appended to any type of table expression. With parentheses and comments, the first among the above examples would read:

FROM (
  customers AS c
  -- The "name" column is appended to "c"
  WITH name AS c.first_name || ' ' || c.last_name
)
JOIN (
  addresses AS a
  -- The "address" column is appended to "a"
  WITH address AS a.street || '\n' || a.city
) ON c.address_id = a.address_id
-- The "full_address" column is appended to the
-- above joined table expression
WITH full_address AS c.name || '\n' || a.address

The above syntax would then again be exactly equivalent to this:

FROM (
  SELECT 
    c.*, a.*, 
    c.name || '\n' || a.address AS full_address
  FROM (
    SELECT c.*, 
           c.first_name || ' ' || c.last_name
    FROM customers AS c
  ) c
  JOIN (
    SELECT a.*,
           a.street || '\n' || a.city
    FROM addresses AS a
  ) a
  ON c.address_id = a.address_id
)

SQL’s most missing language feature

Since SQL:1999, we luckily have common table expressions – the WITH clause that can be prepended to any SELECT statement (and to other DML statements in some dialects). Using common table expressions, we can avoid repeating commonly used derived tables.

But such a thing is not possible for columns. The one language feature SQL is in most dire need of are thus common column expressions.

Why Did SQLJ Die?

Every now and then, SQLJ pops up somewhere, mostly in a very dusty/enterprisey or in an academic context.

If you give SQLJ some thought, though, it isn’t such a bad idea. It is:

  • An ANSI and ISO standard
  • Part of the SQL standard
  • Quite easy to understand
  • Quite a powerful extension to JDBC

So why did it die (or rather, why did it never really take off)? This question was asked on Stack Overflow, and we gave an answer.

Let’s assume that you have already decided to embed your SQL (as opposed to externalising it through a templating mechanism, hiding it with an ORM, or with stored procedures). Here are a couple of reasons why SQLJ is not an optimal solution for embedding SQL:

IDE support

While Pro*C worked well for C and C++ in the 90s, Java really took off in the early 2000’s. With Java, there were also an increasing number of powerful IDEs such as Eclipse, NetBeans, JBuilder, and others. Java preprocessors and IDEs have never become friends, though, as parsing one language is hard enough. Parsing (and providing tooling) for two languages is much harder.

In fact, SQLJ made the surrounding Java code type-unsafe as IDEs and compilers couldn’t process .sqlj files before they had been pre-processed.

SQL popularity

There was a time when people started thinking that SQL itself was going to be dead. First, they did so with the advent of ORMs, then they did so with the advent of NoSQL. People thought that the DBA is dead. Again.

Well, this has been proven to be wrong a couple of times, but certainly not because of SQLJ.

Typesafety

In the late 2000’s, there had now been typesafe alternatives to SQLJ, such as jOOQ in Java, or LINQ-to.SQL in .NET, which leverage IDE features such as syntax autocompletion. By being internal domain-specific languages / query DSLs, these APIs not only bring typesafety to embedded SQL, but they also allow for dynamic SQL building, which SQLJ doesn’t support.

Predictions

While embedding SQL into other languages is a useful thing, SQLJ never solved this problem adequately. Hence, R.I.P., SQLJ

Why Your Boring Data Will Outlast Your Sexy New Technology

So you’re playing around with all those sexy new technologies, enjoying yourself, getting inspiration from state-of-the-art closure / lambda / monads and other concepts-du-jour

Now that I have your attention provoking a little anger / smirk / indifference, let’s think about the following. I’ve recently revisited a great article by Ken Downs written in 2010. There’s an excellent observation in the middle.

[…] in fact there are two things that are truly real for a developer:

  • The users, who create the paycheck, and
  • The data, which those users seemed to think was supposed to be correct 100% of the time.

He says this in the context of a sophisticated rant against ORMs. It can also be seen as a rant against any recent abstraction or database technology with a mixture of nostalgia and cynicism. But the essence of his article lies in another observation that I’ve made with so many software systems:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Having said so, we can quickly understand why Java has become the new COBOL to some, who are bored with its non-sexiness (specifically in the JEE corner). Yet, according to TIOBE, Java and C are still the most widely used programming languages in the market and many systems are thus implemented in Java.

On the other hand, many inspiring new technologies have emerged in the last couple of years, also in the JVM field. A nice overview is given in ZeroTurnaround’s The Adventurous Developer’s Guide to JVM Languages. And all of that is fine to a certain extent. As Bruno Borges from Oracle recently and adequately put it:

anything not mainstream has more odds to be “sexy” [than JSF]

Now, let’s map this observation back to a subsequent section of Ken’s article:

[…] the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user […] and takes instructions back from the user and puts them in the database […]. No matter how beautiful the code was, the user would only ever see the screen […] and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.

Think about an E-Banking system. None of your users really cares about the code you wrote to get it running. What they care about is their data (i.e. their money) and the fact that it is correct. This is true for many many systems where the business logic and UI layers can be easily replaced with fancy new technology, whereas the data stays around.

In other words, you are free to choose whatever sexy new UI technology you like as long as it helps your users get access to their data.

So what about sexy new database technology?

Now, that’s an entirely different story, compared to sexy new UI technology.

You might be considering some NoSQL-solution-du-jour to “persist” your data, because it’s so easy and because it costs so much less. Granted, the cost factor may seem very tempting at first. But have you considered the fact that:

  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Once your data goes into that NoSQL store, it may stay there much longer than you had wanted it to. Your developers and architects (who originally chose this particular NoSQL solution) may have left long ago. Parts of your system may have been replaced, too, because now you’re doing everything in HTML5. JavaScript is the new UI technology.

And all this time, you have “persisted” UI / user / domain model data in your database, from various systems, from various developers, through various programming paradigms. And then, you realise:

We’re not saying that there aren’t some use-cases where NoSQL databases really provide you with a better solution than the “legacy” alternatives. Most specifically, graph databases solve a problem that no RDBMS has really solved well, yet.

But consider one thing. You will have to migrate your data. Time and again. You will have to archive it. And maybe, migrate the archive. And maybe provide reports of the archive. And provide correct reports (which means: ACID!) And be transactional (which means: ACID!) And all the other things that people do with data.

In fact, your system will grow like any other system ever did before and it will have high demands from your database. While some NoSQL databases have started to get some decent traction, in a way that it is safe to say their vendors might still be around in 5-10 years, when the systems will have been replaced by developers who have replaced other developers.

In other words, there is one more bullet to this list:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it
  • Data might tend to outlast the vendors providing tools for operating the data

Beware of the fact that data will probably outlast your sexy new technology. So, choose wisely when thinking about sexy new technology to operate your data.