Java 8 Friday: Java 8 Will Revolutionize Database Access

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem. For our Java 8 series, we’re honoured to host a very relevant guest post by Dr. Ming-Yee Iu.

Dr. Ming-Yee Iu completed a PhD on Database Queries in Java at EPFL. He has created the open source project Jinq to demonstrate some new techniques for supporting database queries in Java.

Our editorial note:

Ever since Erik Meijer has introduced LINQ to the .NET ecosystem, us Java folks have been wondering whether we could have the same. We’ve blogged about this topic before, a couple of times:

While most LINQesque APIs in the Java ecosystem operate as internal domain-specific languages like jOOQ, some try to tackle the integration on a bytecode level, like JaQu.

JINQ formalises runtime bytecode transformations through what Dr. Ming-Yee Iu calls symbolic execution. We find this very interesting to a point that we wonder if we should start building a JINQ-to-jOOQ JINQ provider, where the expressive power of the Java 8 Streams API could be combined with our great SQL standardisation and transformation features…?

Convince yourselves:

Java 8 Goodie: Java 8 Will Revolutionize Database Access

Java 8 is finally here! After years of waiting, Java programmers will finally get support for functional programming in Java. Functional programming support helps streamline existing code while providing powerful new capabilities to the Java language. One area that will be disrupted by these new features is how programmers work with databases in Java. Functional programming support opens up exciting new possibilities for simpler yet more powerful database APIs. Java 8 will enable new ways to access databases that are competitive with those of other programming languages such as C#’s LINQ.

The Functional Way of Working With Data

Java 8 not only adds functional-support to the Java language, but it extends the Java collection classes with new functional ways of working with data. Traditionally, working with large amounts of data in Java requires a lot of loops and iterators.

For example, suppose you have a collection of Customer objects:

Collection<Customer> customers;

If you were only interested in the customers from Belgium, you would have to iterate over all the customers and save the ones you wanted.

Collection<Customer> belgians = new ArrayList<>();
for (Customer c : customers) {
    if (c.getCountry().equals("Belgium"))
        belgians.add(c);
}

This takes five lines of code. It is also poorly abstracted. What happens if you have 10 million customers, and you want to speed up the code by filtering it in parallel using two threads? You would have to rewrite everything to use futures and a lot of hairy multi-threaded code.

With Java 8, you can write the same code in one line. With its support for functional programming, Java 8 lets you write a function saying which customers you are interested in (those from Belgium) and then to filter collections using that function. Java 8 has a new Streams API that lets you do this.

customers.stream().filter(
    c -> c.getCountry().equals("Belgium")
);

Not only is the Java 8 version of the code shorter, but the code is easier to understand as well. There is almost no boilerplate. The code calls the method filter(), so it’s clear that this code is used for filtering customers. You don’t have to spend your time trying to decipher the code in a loop to understand what it is doing with its data.

And what happens if you want to run the code in parallel? You just have to use a different type of stream.

customers.parallelStream().filter(
    c -> c.getCountry().equals("Belgium")
);

What’s even more exciting is that this functional-style of code works with databases as well!

The Functional Way of Working with Databases

Traditionally, programmers have needed to use special database query languages to access the data in databases. For example, below is some JDBC code for finding all the customers from Belgium:

PreparedStatement s = con.prepareStatement(
      "SELECT * "
    + "FROM Customer C "
    + "WHERE C.Country = ? ");
s.setString(1, "Belgium");
ResultSet rs = s.executeQuery();

Much of the code is in the form a string, which the compiler can’t check for errors and which can lead to security problems due to sloppy coding. There is also a lot of boilerplate code that makes writing database access code quite tedious. Tools such as jOOQ solve the problem of error-checking and security by providing a database query language that can be written using special Java libraries. Or you can use tools such as object-relational mappers to hide a lot of boring database code for common access patterns, but if you need to write non-trivial database queries, you will still need to use a special database query language again.

With Java 8, it’s possible to write database queries using the same functional-style used when working with the Streams API. For example, Jinq is an open source project that explores how future database APIs can make use of functional programming. Here is a database query written using Jinq:

customers.where(
    c -> c.getCountry().equals("Belgium")
);

This code is almost identical to the code using the Streams API. In fact, future versions of Jinq will let you write queries directly using the Streams API. When the code is run, Jinq will automatically translate the code into a database query like the JDBC query shown before.

So without having to learn a new database query language, you can write efficient database queries. You can use the same style of code you would use for Java collections. You also don’t need a special Java compiler or virtual machine. All of this code compiles and runs using the normal Java 8 JDK. If there are errors in your code, the compiler will find them and report them to you, just like normal Java code.

Jinq supports queries that can be as complicated as SQL92. Selection, projection, joins, and subqueries are all supported. The algorithm for translating Java code into database queries is also very flexible in what code it will accept and translate. For example, Jinq has no problem translating the code below into a database query, despite its complexity.

customers
    .where( c -> c.getCountry().equals("Belgium") )
    .where( c -> {
        if (c.getSalary() < 100000)
            return c.getSalary() < c.getDebt();
        else
            return c.getSalary() < 2 * c.getDebt();
        } );

As you can see, the functional programming support in Java 8 is well-suited for writing database queries. The queries are compact, and complex queries are supported.

Inner Workings

But how does this all work? How can a normal Java compiler translate Java code into database queries? Is there something special about Java 8 that makes this possible?

The key to supporting these new functional-style database APIs is a type of bytecode analysis called symbolic execution. Although your code is compiled by a normal Java compiler and run in a normal Java virtual machine, Jinq is able to analyze your compiled Java code when it is run and construct database queries from them. Symbolic execution works best when analyzing small functions, which are common when using the Java 8 Streams API.

The easiest way to understand how this symbolic execution works is with an example. Let’s examine how the following query is converted by Jinq into the SQL query language:

customers
    .where( c -> c.getCountry().equals("Belgium") )

Initially, the customers variable is a collection that represents this database query

SELECT *
  FROM Customers C

Then, the where() method is called, and a function is passed to it. In this where() method, Jinq opens the .class file of the function and gets the compiled bytecode for the function to analyze. In this example, instead of using real bytecode, let’s just use some simple instructions to represent the bytecode of the function:

  1. d = c.getCountry()
  2. e = “Belgium”
  3. e = d.equals(e)
  4. return e

Here, we pretend that the function has been compiled by the Java compiler into four instructions. This is what Jinq sees when the where() method is called. How can Jinq make sense of this code?

Jinq analyzes the code by executing it. Jinq doesn’t run the code directly though. It runs the code ‘abstractly’. Instead of using real variables and real values, Jinq uses symbols to represent all values when executing the code. This is why the analysis is called symbolic execution.

Jinq executes each instruction and keeps track of all the side-effects or all the things that the code changes in the state of the program. Below is a diagram showing all the side-effects that Jinq finds when it executes the four lines of code using symbolic execution.

Symbolic execution example

Symbolic execution example

In the diagram, you can see how after the first instruction runs, Jinq finds two side-effects: the variable d has changed and the method Customer.getCountry() has been called. With symbolic execution, the variable d is not given a real value like “USA” or “Denmark”. It is assigned the symbolic value of c.getCountry().

After all the instructions have been executed symbolically, Jinq prunes the side-effects. Since the variables d and e are local variables, any changes to them are discarded after the function exits, so those side-effects can be ignored. Jinq also knows that the methods Customer.getCountry() and String.equals() do not modify any variables or show any output, so those method calls can also be ignored. From this, Jinq can conclude that executing the function produces only one effect: it returns c.getCountry().equals("Belgium").

Once Jinq has understood what the function passed to it in the where() method does, it can then merge this knowledge with the database query underlying the customers collection to create a new database query.

Generating a database query

Generating a database query

And that’s how Jinq generates database queries from your code. The use of symbolic execution means that this approach is quite robust to the different code patterns outputted by different Java compilers. If Jinq ever encounters code with side-effects that can’t be emulated using a database query, Jinq will leave your code untouched. Since everything is written using normal Java code, Jinq can just run that code directly instead, and your code will produce the expected results.

This simple translation example should have given you an idea of how the query translation works. You should feel confident that these algorithms can correctly generate database queries from your code.

An Exciting Future

I hope I have given you a taste for how Java 8 enables new ways of working with databases in Java. The functional programming support in Java 8 allows you write database code in a similar way to writing code for working with Java collections. Hopefully, existing database APIs will soon be extended to support these styles of queries.

To play with a prototype for these new types of queries, you can visit http://www.jinq.org

jOOQ’s Reason for Being

The below paragraphs were taken from the jOOQ preface from the manual. It is worth thinking about why you should (or should not) use jOOQ in a given project. Specifically, you might be choosing between jOOQ and JPA, jOOQ and Hibernate, or jOOQ and SLICK (in a Scala context). here’s some guidance (slightly biased towards jOOQ, of course…):

jOOQ’s reason for being – compared to JPA

Java and SQL have come a long way. SQL is an “ancient”, yet established and well-understood technology. Java is a legacy too, although its platform JVM allows for many new and contemporary languages built on top of it. Yet, after all these years, libraries dealing with the interface between SQL and Java have come and gone, leaving JPA to be a standard that is accepted only with doubts, short of any surviving options.

So far, there had been only few database abstraction frameworks or libraries, that truly respected SQL as a first class citizen among languages. Most frameworks, including the industry standards JPA, EJB, Hibernate, JDO, Criteria Query, and many others try to hide SQL itself, minimising its scope to things called JPQL, HQL, JDOQL and various other inferior query languages

jOOQ has come to fill this gap.

jOOQ’s reason for being – compared to LINQ

Other platforms incorporate ideas such as LINQ (with LINQ-to-SQL), or Scala’s SLICK to better integrate querying as a concept into their respective language. By querying, they understand querying of arbitrary targets, such as SQL, XML, Collections and other heterogeneous data stores. jOOQ claims that this is going the wrong way too.

In more advanced querying use-cases (more than simple CRUD and the occasional JOIN), people will want to profit from the expressivity of SQL. Due to the relational nature of SQL, this is quite different from what object-oriented and partially functional languages such as C#, Scala, or Java can offer.

It is very hard to formally express and validate joins and the ad-hoc table expression types they create. It gets even harder when you want support for more advanced table expressions, such as pivot tables, unnested cursors, or just arbitrary projections from derived tables. With a very strong object-oriented typing model, these features will probably stay out of scope.

In essence, the decision of creating an API that looks like SQL or one that looks like C#, Scala, Java is a definite decision in favour of one or the other platform. While it will be easier to evolve SLICK in similar ways as LINQ, SQL feature scope that clearly communicates its underlying intent will be very hard to add, later on (e.g. how would you model Oracle’s partitioned outer join syntax? How would you model ANSI/ISO SQL:1999 grouping sets? How can you support scalar subquery caching? etc…).

jOOQ has come to fill this gap.

jOOQ is different

SQL was never meant to be abstracted. To be confined in the narrow boundaries of heavy mappers, hiding the beauty and simplicity of relational data. SQL was never meant to be object-oriented. SQL was never meant to be anything other than… SQL!

Database Abstraction and SQL Injection

I have subscribed to various user groups of jOOQ’s competing database abstraction tools. One of which is ActiveJDBC, a Java implementation of Active Record design pattern. Its maintainer Igor Polevoy recently claimed that:

SQL injection is a web application problem, and not directly related to an ORM. ActiveJDBC will process any SQL that is passed to it.

(See the discussion here: https://groups.google.com/d/topic/activejdbc-group/5D2jhWuW4Sg/discussion)

Is that really true? Should the database abstraction layer delegate SQL injection prevention to the client application?

SQL Injection Background

SQL injection is a problem that most of us developers have had to deal with one time or another in their professional lives. Wikipedia explains the problem nicely. Given the following piece of Java code (or any other language):

statement = "SELECT * FROM users WHERE name = '" + userName + "';"

Imagine that “userName” is a variable taken from an HTTP request. Blindly pasting an HTTP request parameter gives way to simple attacks as these:

-- attacker sends this code in the userName field:
userName = "a';DROP TABLE users; SELECT * FROM userinfo WHERE 't' = 't"

-- resulting in the following statement:
statement = "SELECT * FROM users WHERE name = 'a';"
          + "DROP TABLE users;" +
          + "SELECT * FROM userinfo WHERE 't' = 't';"

This doesn’t happen to you?

Maybe not. But the problem is seen quite often on Stack Overflow. More than 2000 results when searching for “SQL injection”: http://stackoverflow.com/search?q=sql+injection. So even if you know how to prevent it, someone on your team may not. OK, but …

it’s not that bad if one out of 500 statements is badly written by some programmer that was oblivious of this threat?

Think again. Have you ever heard of a tool called sqlmap? This tool will find any problematic page within your application within a couple of seconds/minutes/hours, depending on the severity of your injection problem. Not only that, once it has found problematic pages, it will be able to extract ALL kinds of data from your database. I mean ALL kinds of data. A selection of sqlmap features:

  • Support to enumerate users, password hashes, privileges, roles, databases, tables and columns.
  • Support to search for specific database names, specific tables across all databases or specific columns across all databases’ tables. This is useful, for instance, to identify tables containing custom application credentials where relevant columns’ names contain string like name and pass.
  • Support to download and upload any file from the database server underlying file system when the database software is MySQL, PostgreSQL or Microsoft SQL Server.
  • Support to execute arbitrary commands and retrieve their standard output on the database server underlying operating system when the database software is MySQL, PostgreSQL or Microsoft SQL Server.

Yes! If you suffer from SQL-injection-unsafe code, an attacker can seize your server under some circumstances!! In our company, we’ve tried sqlmap in a sandbox environment against some open source blog software with known vulnerabilities. We’ve managed to seize the server in no time, without writing a single line of SQL

Database Abstraction and SQL Injection

OK, now that I have your attention, let’s think again about what Igor Polevoy said:

SQL injection is a web application problem, and not directly related to an ORM. ActiveJDBC will process any SQL that is passed to it.

Yes, he may be right. Given that ActiveJDBC is a slim wrapper for JDBC, allowing to do nice CRUD simplifications, such as these (taken from their website):

List<Employee> people =
Employee.where("department = ? and hire_date > ? ", "IT", hireDate)
        .offset(21)
        .limit(10)
        .orderBy("hire_date asc");

Did you spot the risk for SQL injection? Right. Even if it uses bind values for underlying PreparedStatements, this tool is as unsafe as JDBC. You can avoid SQL injection, if you’re careful. Or you can start concatenating strings all over. But you have to be aware of that! How does jOOQ handle situations like these? The jOOQ manual explains how bind values are handled explicitly or implicitly. Here are some examples:

// Implicitly creating a bind value for "Poe"
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal("Poe"));

// Explicitly creating a (named) bind value for "Poe"
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal(param("lastName", "Poe")));

// Explicitly inlining "Poe" in the generated SQL string
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal(inline("Poe")));

The above examples will yield

SELECT * FROM T_AUTHOR WHERE LAST_NAME = ?
SELECT * FROM T_AUTHOR WHERE LAST_NAME = ?
SELECT * FROM T_AUTHOR WHERE LAST_NAME = 'Poe'

In the case where “Poe” is inlined, escaping is handled by jOOQ, to prevent syntax errors and SQL injection. But jOOQ also supports injecting SQL strings directly in generated SQL. For instance:

// Inject plain SQL into jOOQ
create.select()
      .from(T_AUTHOR)
      .where("LAST_NAME = 'Poe'");

In this case, SQL injection can occur just as with JDBC

Conclusion

In essence, Igor is right. It is the (client) application developer’s responsibility to be aware of SQL injection issues created by their code. But if a database abstraction framework built on top of JDBC can avoid SQL injection as much as possible in its API, all the better. From a SQL injection perspective, database abstraction frameworks can be divided into three categories:

  • Simple utilities. These include Spring’s JdbcTemplate, or Apache’s DbUtils. They really just enhance the JDBC API with convenience (less exception handling, less verbosity, simpler variable binding, simpler data fetching). Of course, these tools won’t prevent SQL injection
  • Full SQL abstraction. These include jOOQ, JaQu, JPA’s CriteriaQuery and others. Their normal operation mode will always render bind values in generated SQL. This prevents SQL injection in most cases.
  • The others. Many other frameworks (including ActiveJDBC and Hibernate) are mainly based on (SQL or HQL) string operations. While they abstract many SQL-related things, they do not prevent SQL injection at all.

So, when you choose any SQL abstraction tool in your Java application, beware of the severity of SQL injection. And beware of the fact, whether your tool helps you prevent it or not!

Igor’s response

Note that Igor has posted this interesting response to this post here:

http://igorpolevoy.blogspot.ch/2012/07/defend-against-sql-injection-using.html

MyBatis’ Wicked Statement Builders

Now here’s one of the most wicked API’s I’ve seen in a while!

MyBatis is well-known as a database abstraction framework on top of JDBC, allowing for externalising SQL into files, loading them at appropriate places in your Java code. For those of you who like this approach, you may be used to statements similar to this one here, taken from the MyBatis documentation:

<select id="selectPerson" parameterType="int" resultType="hashmap">
  SELECT * FROM PERSON WHERE ID = #{id}
</select>

This is MyBatis alright, you think? You haven’t seen everything. MyBatis also has statement builders. Not just any type of statement builders. They make your Java code look like SQL. Check this out:

private String selectPersonSql() {
  BEGIN(); // Clears ThreadLocal variable
  SELECT("P.ID, P.USERNAME, P.PASSWORD, P.FULL_NAME");
  SELECT("P.LAST_NAME, P.CREATED_ON, P.UPDATED_ON");
  FROM("PERSON P");
  FROM("ACCOUNT A");
  INNER_JOIN("DEPARTMENT D on D.ID = P.DEPARTMENT_ID");
  INNER_JOIN("COMPANY C on D.COMPANY_ID = C.ID");
  WHERE("P.ID = A.ID");
  WHERE("P.FIRST_NAME like ?");
  OR();
  WHERE("P.LAST_NAME like ?");
  GROUP_BY("P.ID");
  HAVING("P.LAST_NAME like ?");
  OR();
  HAVING("P.FIRST_NAME like ?");
  ORDER_BY("P.ID");
  ORDER_BY("P.FULL_NAME");
  return SQL();
}

… jeez, that is one of the funniest ideas I’ve ever seen. A ThreadLocal to keep the SQL statement currently being rendered!! And an API just … err … adding Strings to an internal StringBuilder? Wow.

You can decide for yourself whether you like this or not. But you can’t say these guys aren’t creative 😉 Be sure to read the full docs to learn also about how to write INSERT, UPDATE and DELETE:

http://www.mybatis.org/core/statement-builders.html

jDBI: A simple convenience layer on top of JDBC

I’m always looking out for similar tools like jOOQ, or at least tools that work in the same domain – the domain of database access abstraction. jDBI looks lovely. It provides simple solutions for what JDBC is lacking in general. Here are a couple of features (taken from the intro):

Fluent API

JDBC is quite verbose in that it usually requires three steps to get to a result:

  1. Obtain a connection
  2. Prepare a statement
  3. Fetch results (meaning iterate over a result set, even if you only need one value)

Here’s how jDBI models its fluent API, to ease some of that pain:

// using in-memory H2 database
DataSource ds = JdbcConnectionPool.create("jdbc:h2:mem:test",
                                          "username",
                                          "password");
DBI dbi = new DBI(ds);
Handle h = dbi.open();
h.execute(
  "create table something (id int primary key, name varchar(100))");
h.execute(
  "insert into something (id, name) values (?, ?)", 1, "Brian");

String name = h.createQuery("select name from something where id = :id")
               .bind("id", 1)
               .map(StringMapper.FIRST)
               .first();
                    
assertThat(name, equalTo("Brian"));

h.close();

DAO layer simplification

In the DAO layer, you’re often writing the same SQL code again and again. Hibernate / JPA are quite convenient in handling this, but you don’t always want to have such big dependencies. So jDBI offers the essence of EJB 3.0. Simple annotations for named queries (although, I do think that Brian McCallister could use JPA annotations instead of his own ones):

public interface MyDAO
{
  @SqlUpdate(
    "create table something (id int primary key, name varchar(100))")
  void createSomethingTable();

  @SqlUpdate("insert into something (id, name) values (:id, :name)")
  void insert(@Bind("id") int id, @Bind("name") String name);

  @SqlQuery("select name from something where id = :id")
  String findNameById(@Bind("id") int id);

  /**
   * close with no args is used to close the connection
   */
  void close();
}

Here’s how to use the above DAO:

// using in-memory H2 database via a pooled DataSource
JdbcConnectionPool ds = JdbcConnectionPool.create("jdbc:h2:mem:test2",
                                                  "username",
                                                  "password");
DBI dbi = new DBI(ds);
MyDAO dao = dbi.open(MyDAO.class);

dao.createSomethingTable();
dao.insert(2, "Aaron");

String name = dao.findNameById(2);
assertThat(name, equalTo("Aaron"));

dao.close();
ds.dispose();

Summary

There are a few other very nice features, which I am now going to check for their usefulness in jOOQ. Read the manual here and discover this little gem:

http://jdbi.codehaus.org/archive.html

Or get the sources here:

https://github.com/brianm/jdbi