What’s Even Harder Than Dates and Timezones? Dates and Timezones in SQL / JDBC!


There was an interesting discussion recently on the jOOQ mailing list about jOOQ’s current lack of out-of-the-box support for TIMESTAMP WITH TIME ZONE data types.

No one said that date, time and timezones are easy! There’s an amusing piece here, which I recommend reading:
Falsehoods programmers believe about time

And when that’s not enough, read also:
More falsehoods programmers believe about time

I personally like the bit about programmers erroneously believing that “Unix time is the number of seconds since Jan 1st 1970”… Unix time has no way of representing leap seconds ;)

Back to JDBC

Here’s an interesting Stack Overflow answer by Mark Rotteveel, the Jaybird developer (Firebird JDBC driver):
Is java.sql.Timestamp timezone specific?

Mark’s explanation can be observed as follows (I’m using PostgreSQL here):

Connection c = getConnection();
Calendar utc = Calendar.getInstance(
    TimeZone.getTimeZone("UTC"));

try (PreparedStatement ps = c.prepareStatement(
    "select"
  + "  ?::timestamp,"
  + "  ?::timestamp,"
  + "  ?::timestamp with time zone,"
  + "  ?::timestamp with time zone"
)) {

    ps.setTimestamp(1, new Timestamp(0));
    ps.setTimestamp(2, new Timestamp(0), utc);
    ps.setTimestamp(3, new Timestamp(0));
    ps.setTimestamp(4, new Timestamp(0), utc);

    try (ResultSet rs = ps.executeQuery()) {
        rs.next();

        System.out.println(rs.getTimestamp(1) 
                 + " / " + rs.getTimestamp(1).getTime());
        System.out.println(rs.getTimestamp(2, utc)
                 + " / " + rs.getTimestamp(2, utc).getTime());
        System.out.println(rs.getTimestamp(3) 
                 + " / " + rs.getTimestamp(3).getTime());
        System.out.println(rs.getTimestamp(4, utc)
                 + " / " + rs.getTimestamp(4, utc).getTime());
    }
}

The above program uses all four permutations of using / not using time zones in Java and in the DB, and the output is always the same:

1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0

As you can see, in each case, the UTC timestamp 0 was correctly stored and retrieved from the database. My own time zone is Switzerland’s, thus CET / CEST, which was UTC+1 at Epoch, and that offset is what Timestamp.toString() renders.
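You can reproduce this rendering behaviour without a database. Here’s a minimal, self-contained sketch (the zone IDs are just examples; at Epoch, Europe/Zurich observed CET, i.e. UTC+1):

import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampToStringDemo {
    public static void main(String[] args) {

        // Timestamp.toString() renders the instant in the JVM's
        // default time zone
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Zurich"));
        System.out.println(new Timestamp(0)); // 1970-01-01 01:00:00.0

        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        System.out.println(new Timestamp(0)); // 1970-01-01 00:00:00.0
    }
}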

Things get interesting when you use timestamp literals, both in SQL and/or in Java. If you replace the bind variables as follows:

Timestamp almostEpoch = Timestamp.valueOf("1970-01-01 00:00:00");

ps.setTimestamp(1, almostEpoch);
ps.setTimestamp(2, almostEpoch, utc);
ps.setTimestamp(3, almostEpoch);
ps.setTimestamp(4, almostEpoch, utc);

This is what I’m getting on my machine, again in CET / CEST:

1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000

I.e. not Epoch, but the timestamp literal that I sent to the server in the first place. Observe that the four combinations of binding / fetching still always produce the same timestamp.
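The reason is that Timestamp.valueOf() interprets the literal in the JVM’s default time zone. A quick sketch to verify this (assuming the JVM runs in CET, e.g. with -Duser.timezone=Europe/Zurich):

import java.sql.Timestamp;

public class TimestampLiteralDemo {
    public static void main(String[] args) {

        // In CET (UTC+1 at Epoch), the literal "1970-01-01 00:00:00"
        // is one hour before Epoch: -3600000 ms
        System.out.println(
            Timestamp.valueOf("1970-01-01 00:00:00").getTime());
    }
}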

Let’s see what happens if the session writing to the database uses a different timezone (let’s assume you’re in PST) than the session fetching from the database (I’m using again CET or UTC). I’m running this program:

Calendar utc = Calendar.getInstance(
    TimeZone.getTimeZone("UTC"));

Calendar pst = Calendar.getInstance(
    TimeZone.getTimeZone("PST"));

try (PreparedStatement ps = c.prepareStatement(
    "select"
  + "  ?::timestamp,"
  + "  ?::timestamp,"
  + "  ?::timestamp with time zone,"
  + "  ?::timestamp with time zone"
)) {

    ps.setTimestamp(1, new Timestamp(0), pst);
    ps.setTimestamp(2, new Timestamp(0), pst);
    ps.setTimestamp(3, new Timestamp(0), pst);
    ps.setTimestamp(4, new Timestamp(0), pst);

    try (ResultSet rs = ps.executeQuery()) {
        rs.next();

        System.out.println(rs.getTimestamp(1)
                 + " / " + rs.getTimestamp(1).getTime());
        System.out.println(rs.getTimestamp(2, utc)
                 + " / " + rs.getTimestamp(2, utc).getTime());
        System.out.println(rs.getTimestamp(3)
                 + " / " + rs.getTimestamp(3).getTime());
        System.out.println(rs.getTimestamp(4, utc)
                 + " / " + rs.getTimestamp(4, utc).getTime());
    }
}

It yields this output:

1969-12-31 16:00:00.0 / -32400000
1969-12-31 17:00:00.0 / -28800000
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0

The first timestamp was Epoch, stored as PST local time (16:00): the plain timestamp type discarded the time zone information, turning Epoch into the local time you had at Epoch in PST (-28800 seconds / -8h), and that local time is the information that is really stored.

Now, when I’m fetching this time from my own timezone CET, I will still want to get the local time (16:00). But in my timezone, this is no longer -28800 seconds, but -32400 seconds (-9h). Quirky enough?

Things go the other way round when I’m fetching the stored local time (16:00), but I’m forcing the fetch to happen in UTC, which will produce the timestamp that you’ve stored, originally in PST (-28800 seconds). But when printing this timestamp (-28800 seconds) in my timezone CET, this will now be 17:00.

When we use the TIMESTAMP WITH TIME ZONE data type in the database, the timezone is maintained (PST), and when I fetch the Timestamp value, no matter if using CET or UTC, I will still get Epoch, which was safely stored to the database, printed out as 01:00 in CET.

Whew.

TL;DR:

When using jOOQ, if the correct UTC timestamp matters to you, use TIMESTAMP WITH TIME ZONE, but you’ll have to implement your own data type Binding, because jOOQ currently doesn’t support that data type. Once you use your own data type Binding, you can also use Java 8’s time API, which represents these different types much better than java.sql.Timestamp + the ugly Calendar.
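To give an idea of what that might look like with plain JDBC, here’s a sketch using java.time.OffsetDateTime. This assumes a JDBC 4.2 driver that implements the java.time mappings (not all drivers do, so your mileage may vary):

import java.sql.*;
import java.time.OffsetDateTime;

// ...

try (PreparedStatement ps = c.prepareStatement(
    "select ?::timestamp with time zone")) {

    // The offset is part of the value, no Calendar needed
    ps.setObject(1, OffsetDateTime.parse("1970-01-01T00:00:00Z"));

    try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        System.out.println(
            rs.getObject(1, OffsetDateTime.class));
    }
}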

If the local time matters to you, or if you’re not operating across time zones, you’re fine using TIMESTAMP and jOOQ’s Field<Timestamp>.

Lucky you, if you’re like me, operating in a very small country with a single time zone where most local software just doesn’t run into this issue.

How to Debug Your Maven Build with Eclipse


When running a Maven build with many plugins (e.g. the jOOQ or Flyway plugins), you may want to have a closer look under the hood to see what’s going on internally in those plugins, or in your extensions of those plugins. This may not appear obvious when you’re running Maven from the command line, e.g. via:

C:\Users\jOOQ\workspace>mvn clean install

Luckily, it is rather easy to debug Maven. In order to do so, just create the following batch file (e.g. mvn_debug.bat) on Windows:

@ECHO OFF

IF "%1" == "off" (
    SET MAVEN_OPTS=
) ELSE (
    SET MAVEN_OPTS=-Xdebug -Xnoagent -Djava.compile=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005
)

Of course, you can do the same on a Mac OS X or Linux box, by using export instead of SET.

Now, run the above batch file and proceed again with building:

C:\Users\jOOQ\workspace>mvn_debug

C:\Users\jOOQ\workspace>mvn clean install
Listening for transport dt_socket at address: 5005

Your Maven build will now wait for a debugger client to connect to your JVM on port 5005 (change to any other suitable port). We’ll do that now with Eclipse. Just add a new Remote Java Application that connects on a socket, and hit “Debug”:

(screenshot: an Eclipse “Remote Java Application” debug configuration, connecting via socket on port 5005)

That’s it. We can now set breakpoints and debug through our Maven process like through any other similar kind of server process. Of course, things work exactly the same way with IntelliJ or NetBeans.

Once you’re done debugging your Maven process, simply call the batch again with parameter off:

C:\Users\jOOQ\workspace>mvn_debug off

C:\Users\jOOQ\workspace>mvn clean install

And your Maven builds will no longer be debugged.

Happy debugging!

Implementing Client-Side Row-Level Security with jOOQ


Some time ago, we promised to follow up on our Constraints on Views article with a sequel showing how to implement client-side row-level security with jOOQ.

What is row-level security?

Some databases like Oracle or the upcoming PostgreSQL 9.5 provide native support for row-level security, which is awesome – but not every database has this feature.

Row-level security essentially means that a given database session can access only some rows in the database, but not others. For instance, this is what John can see in his session:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
3   John           secret poker stash  85193065.00

In the above example, assume that John and Jane are a married couple with access to each others’ bank accounts, except that John wants to hide his secret poker stash from Jane because he’s planning on running away with Jill to the Bahamas to live the good life. So, the above would model John’s view on the data set, whereas the below would model Jane’s:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00

The nice thing about row-level security is the fact that the data is completely hidden from a database session, as if it weren’t even there. This includes aggregations or filtering, as can be seen in the below examples of John’s view:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
3   John           secret poker stash  85193065.00
--------------------------------------------------
Total John+Jane                        85194865.00

vs. Jane’s view:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
--------------------------------------------------
Total John+Jane                            1800.00

If your database doesn’t support row-level security, you will need to emulate it somehow. One way to do that is by using views, and possibly, temporary tables or some context variables:

CREATE VIEW my_accounts AS
SELECT
  a.id,
  a.account_owner,
  a.account_name,
  a.amount
FROM
  accounts a
JOIN
  account_privileges p
ON
  a.id = p.a_id
WHERE
  p.privilege_owner = SYS_CONTEXT('banking', 'user');

In the above example, I’m using Oracle’s SYS_CONTEXT function, which allows accessing global context from the current session. This context could be initialised every time a JDBC Connection is fetched from the connection pool, for instance.
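Here’s a sketch of what that initialisation could look like in Java. Note that banking_pkg.set_user is a hypothetical PL/SQL procedure wrapping DBMS_SESSION.SET_CONTEXT('banking', 'user', ...), since Oracle allows setting a context only from the package associated with it via CREATE CONTEXT:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// ...

// Call this every time a Connection is checked out of the pool
static void initialiseBankingContext(
    Connection connection, String user)
throws SQLException {
    try (CallableStatement call = connection.prepareCall(
        "{ call banking_pkg.set_user(?) }")) {

        call.setString(1, user);
        call.execute();
    }
}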

What if your database doesn’t support these things?

jOOQ to the rescue! Since jOOQ 3.2, a VisitListener Service Provider Interface (SPI) has been introduced for precisely this reason, which allows jOOQ users to perform SQL AST transformations while the SQL statement is being generated.

We’re assuming:

CREATE TABLE accounts (
  id BIGINT NOT NULL PRIMARY KEY,
  account_owner VARCHAR(20) NOT NULL,
  account_name VARCHAR(20) NOT NULL,
  amount DECIMAL(18, 2) NOT NULL
);

CREATE TABLE transactions (
  id BIGINT NOT NULL PRIMARY KEY,
  account_id BIGINT NOT NULL,
  amount DECIMAL(18, 2) NOT NULL
);

INSERT INTO accounts (
  id, account_owner, account_name, amount
)
VALUES (1, 'John', 'savings', 500.0),
       (2, 'Jane', 'savings', 1300.0),
       (3, 'John', 'secret poker stash', 85193065.00);

INSERT INTO transactions (
  id, account_id, amount
)
VALUES (1, 1, 200.0),
       (2, 1, 300.0),
       (3, 2, 300.0),
       (4, 2, 800.0),
       (5, 2, 200.0),
       (6, 3, 85193065.00);

Adding a simple WHERE clause

This is the simplest transformation use-case:

-- turn this...
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts

-- ... into this
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.id IN (?, ?) -- Predicate, for simplicity

Adding an AND clause to an existing WHERE clause

The transformation should still work if there is already an existing predicate:

-- turn this...
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.account_owner = 'John'

-- ... into this
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.account_owner = 'John'
AND 
  accounts.id IN (?, ?)

Adding the predicate also for aliased tables

When doing self-joins, or when you have applied an alias for whatever other reason:

-- turn this...
SELECT 
  a.account_name, 
  a.amount
FROM 
  accounts a

-- ... into this
SELECT 
  a.account_name, 
  a.amount
FROM 
  accounts a
WHERE 
  a.id IN (?, ?)

Adding the predicate also in subqueries / joins / semi-joins

Of course, we shouldn’t restrict ourselves to patching top-level SELECT statements. The following transformation must be applied as well:

-- turn this...
SELECT 
  t.amount,
  t.account_id
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
  )

-- ... into this
SELECT 
  t.amount,
  t.account_id
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
    AND
      a.id IN (?, ?)
  )

Adding the predicate to foreign key references

This might be easily forgotten, but in fact, we also want to add an additional predicate to all foreign key references of accounts.id, namely transactions.account_id, especially when the transactions table is not joined to the accounts table:

-- turn this...
SELECT 
  t.amount,
  t.account_id
FROM 
  transactions t

-- ... into this
SELECT 
  t.amount,
  t.balance
FROM 
  transactions t
WHERE
  t.account_id IN (?, ?)

Long story short, we would like to find all queries that refer to the accounts table in some form, and add a simple predicate to it, implementing access control.

DISCLAIMER: As always with security, you should implement security on several layers. SQL AST transformation is not trivial, and the above scenarios are incomplete. Besides, they work only for queries that are built using the jOOQ AST, not for plain SQL queries, or for queries that are run via JDBC directly, via Hibernate, or via stored procedures, views (which in turn refer to the accounts table), or simple table synonyms.

So, read this post as a tutorial showing how to perform AST transformation, not as a complete solution to row-level security.

How to do it with jOOQ?

Now comes the interesting part. We’re going to do the whole thing with jOOQ. First off, remember that in order to implement backwards-compatible SPI evolution, we always provide a default implementation for our SPIs. In this case, we’re going to extend DefaultVisitListener instead of implementing VisitListener directly.

The base of our VisitListener will be the following:

public class AccountIDFilter extends DefaultVisitListener {

    final Integer[] ids;

    public AccountIDFilter(Integer... ids) {
        this.ids = ids;
    }
}

In other words, a filtering listener that filters for a given set of IDs to be put in an IN predicate.

Now, first off, we’ll need a couple of utility methods. The following two utilities push or pop some objects on top of a stack:

void push(VisitContext context) {
    conditionStack(context).push(new ArrayList<>());
    whereStack(context).push(false);
}

void pop(VisitContext context) {
    whereStack(context).pop();
    conditionStack(context).pop();
}

… and the two stacks can be seen here:

Deque<List<Condition>> conditionStack(
        VisitContext context) {
    Deque<List<Condition>> data = (Deque<List<Condition>>) 
        context.data("conditions");

    if (data == null) {
        data = new ArrayDeque<>();
        context.data("conditions", data);
    }

    return data;
}

Deque<Boolean> whereStack(VisitContext context) {
    Deque<Boolean> data = (Deque<Boolean>) 
        context.data("predicates");

    if (data == null) {
        data = new ArrayDeque<>();
        context.data("predicates", data);
    }

    return data;
}

In prose, the conditionStack maintains a stack of conditions for each subquery, whereas the whereStack maintains a stack of flags for each subquery. The conditions are the conditions that should be generated in the WHERE clause of the given subquery, whereas the flags indicate whether a WHERE clause is already present (i.e. whether new conditions should be appended using AND, rather than WHERE).

For convenience, we’ll also add the following utilities:

List<Condition> conditions(VisitContext context) {
    return conditionStack(context).peek();
}

boolean where(VisitContext context) {
    return whereStack(context).peek();
}

void where(VisitContext context, boolean value) {
    whereStack(context).pop();
    whereStack(context).push(value);
}

These utilities allow for accessing both the conditions and flags at the top of the stack (in the current subquery), as well as for replacing the flag at the top of the stack.

Why do we need a stack?

It’s simple. We want to apply the predicate only locally for the current subquery, while jOOQ transforms and generates your SQL statement. Remember, when we were transforming the following:

SELECT 
  t.amount,
  t.account_id
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
    AND
      a.id IN (?, ?)
  )
AND
  t.account_id IN (?, ?)

… in the end, we want two additional predicates generated in the above query. One in the subquery selecting from accounts, and another one in the top-level query selecting from transactions, but the two predicates shouldn’t interfere with each other, i.e. when jOOQ generates the subquery, we only want to see objects that are relevant to the subquery (top of the stack).

So, let’s see how and when we push stuff on the stack. First off, we need to listen for start and end events of SQL clauses:

Starting a SQL clause

This is straightforward. Every time we enter a new SQL statement, we want to push a new set of data (conditions, flags) on the stack. In a way, we’re creating a local scope for the subquery:

@Override
public void clauseStart(VisitContext context) {

    // Enter a new SELECT clause / nested select
    // or DML statement
    if (context.clause() == SELECT ||
        context.clause() == UPDATE ||
        context.clause() == DELETE ||
        context.clause() == INSERT) {
        push(context);
    }
}

Of course, this scope has to be cleaned up at the end of the same clause:

@Override
public void clauseEnd(VisitContext context) {
    // [ ... more code will follow ... ]

    // Leave a SELECT clause / nested select
    // or DML statement
    if (context.clause() == SELECT ||
        context.clause() == UPDATE ||
        context.clause() == DELETE ||
        context.clause() == INSERT) {
        pop(context);
    }
}

This was the easy part. Now, it gets a bit more interesting. When we end a clause, and that clause is a WHERE clause of a SELECT, UPDATE, or DELETE statement, then we want to render an additional keyword and predicate:

@Override
public void clauseEnd(VisitContext context) {

    // Append all collected predicates to the WHERE
    // clause if any
    if (context.clause() == SELECT_WHERE ||
        context.clause() == UPDATE_WHERE ||
        context.clause() == DELETE_WHERE) {
        List<Condition> conditions = 
            conditions(context);

        if (conditions.size() > 0) {
            context.context()
                   .formatSeparator()
                   .keyword(where(context) 
                   ? "and" 
                   : "where"
                   )
                   .sql(' ');

            context.context().visit(
                DSL.condition(Operator.AND, conditions)
            );
        }
    }

    // [ ... code from previous snippet ... ]
}

So, the only thing we still need to do is assemble that List<Condition> on the stack of the current query, such that we can append it to the WHERE clause, as well as the flag that decides between "and" and "where". This can be done by overriding visitEnd(), a method that is invoked at the end of visiting a QueryPart (as opposed to a Clause):

@Override
public void visitEnd(VisitContext context) {

    // We'll see what this means in a bit...
    pushConditions(context, ACCOUNTS, 
        ACCOUNTS.ID, ids);
    pushConditions(context, TRANSACTIONS, 
        TRANSACTIONS.ACCOUNT_ID, ids);

    // Check if we're rendering any condition within
    // the WHERE clause. In this case, we can be sure
    // that jOOQ will render a WHERE keyword
    if (context.queryPart() instanceof Condition) {
        List<Clause> clauses = clauses(context);

        if (clauses.contains(SELECT_WHERE) ||
            clauses.contains(UPDATE_WHERE) ||
            clauses.contains(DELETE_WHERE)) {
            where(context, true);
        }
    }
}

List<Clause> clauses(VisitContext context) {
    List<Clause> result = asList(context.clauses());
    int index = result.lastIndexOf(SELECT);

    if (index > 0)
        return result.subList(index, result.size() - 1);
    else
        return result;
}

At the end of each query part…

  • we’re trying to push the relevant conditions on the stack if applicable
  • we’re checking to see if a WHERE clause is present and set the relevant flag

So, finally, the only missing piece is the core logic contained in pushConditions():

<E> void pushConditions(
        VisitContext context, 
        Table<?> table, 
        Field<E> field, 
        E... values) {

    // Check if we're visiting the given table
    if (context.queryPart() == table) {
        List<Clause> clauses = clauses(context);

        // ... and if we're in the context of the current
        //  subselect's FROM clause
        if (clauses.contains(SELECT_FROM) ||
            clauses.contains(UPDATE_UPDATE) ||
            clauses.contains(DELETE_DELETE)) {

            // If we're declaring a TABLE_ALIAS...
            // (e.g. "ACCOUNTS" as "a")
            if (clauses.contains(TABLE_ALIAS)) {
                QueryPart[] parts = context.queryParts();

                // ... move up the QueryPart visit path to find the
                // defining aliased table, and extract the aliased
                // field from it. (i.e. the "a" reference)
                for (int i = parts.length - 2; i >= 0; i--) {
                    if (parts[i] instanceof Table) {
                        field = ((Table<?>) parts[i]).field(field);
                        break;
                    }
                }
            }

            // Push a condition for the field of the
            // (potentially aliased) table
            conditions(context).add(field.in(values));
        }
    }
}

And we’re done! Whew – well, that certainly wasn’t too easy. AST transformation never is. But the above algorithm is robust and can be used to run all of the aforementioned queries.

Testing the above

Configuration configuration = create().configuration();

// This configuration has full access to all rows
DSLContext fullaccess = DSL.using(configuration);

// This configuration has restricted access to IDs 1 and 2
DSLContext restricted = DSL.using(
    configuration.derive(
        DefaultVisitListenerProvider.providers(
            new AccountIDFilter(1, 2)
        )
    )
);

// Fetching accounts
assertEquals(3, fullaccess.fetch(ACCOUNTS).size());
assertEquals(2, restricted.fetch(ACCOUNTS).size());

Generating the following SQL:

select 
  "PUBLIC"."ACCOUNTS"."ID", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_NAME", 
  "PUBLIC"."ACCOUNTS"."AMOUNT"
from "PUBLIC"."ACCOUNTS"
---------------------------------------
select 
  "PUBLIC"."ACCOUNTS"."ID", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_NAME", 
  "PUBLIC"."ACCOUNTS"."AMOUNT"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ID" in (
  1, 2
)

// Fetching transactions
assertEquals(6, fullaccess.fetch(TRANSACTIONS).size());
assertEquals(5, restricted.fetch(TRANSACTIONS).size());

Generating the following SQL:

select 
  "PUBLIC"."TRANSACTIONS"."ID", 
  "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID", 
  "PUBLIC"."TRANSACTIONS"."AMOUNT"
from "PUBLIC"."TRANSACTIONS"
---------------------------------------
select 
  "PUBLIC"."TRANSACTIONS"."ID", 
  "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID", 
  "PUBLIC"."TRANSACTIONS"."AMOUNT"
from "PUBLIC"."TRANSACTIONS"
where "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID" in (
  1, 2
)

// Fetching John's accounts
assertEquals(asList(1, 3), fullaccess.fetchValues(
    select(ACCOUNTS.ID)
    .from(ACCOUNTS)
    .where(ACCOUNTS.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));
assertEquals(asList(1   ), restricted.fetchValues(
    select(ACCOUNTS.ID)
    .from(ACCOUNTS)
    .where(ACCOUNTS.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));

Generating the following SQL:

select "PUBLIC"."ACCOUNTS"."ID"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER" = 'John'
order by 1 asc
---------------------------------------
select "PUBLIC"."ACCOUNTS"."ID"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER" = 'John'
and "PUBLIC"."ACCOUNTS"."ID" in (
  1, 2
)
order by 1 asc

// Fetching John's accounts via an aliased table
Accounts a = ACCOUNTS.as("a");
assertEquals(asList(1, 3), fullaccess.fetchValues(
    select(a.ID)
    .from(a)
    .where(a.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));
assertEquals(asList(1   ), restricted.fetchValues(
    select(a.ID)
    .from(a)
    .where(a.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));

Generating the following SQL:

select "a"."ID"
from "PUBLIC"."ACCOUNTS" "a"
where "a"."ACCOUNT_OWNER" = 'John'
order by 1 asc
---------------------------------------
select "a"."ID"
from "PUBLIC"."ACCOUNTS" "a"
where "a"."ACCOUNT_OWNER" = 'John'
and "a"."ID" in (
  1, 2
)
order by 1 asc

// Fetching John's transactions
Transactions t = TRANSACTIONS.as("t");
assertEquals(asList(1, 2, 6), fullaccess.fetchValues(
    select(t.ID)
    .from(t)
    .where(t.ACCOUNT_ID.in(
        select(a.ID)
        .from(a)
        .where(a.ACCOUNT_OWNER.eq("John"))
    ))
    .orderBy(1)
));
assertEquals(asList(1, 2   ), restricted.fetchValues(
    select(t.ID)
    .from(t)
    .where(t.ACCOUNT_ID.in(
        select(a.ID)
        .from(a)
        .where(a.ACCOUNT_OWNER.eq("John"))
    ))
    .orderBy(1)
));

Generating the following SQL:

select "t"."ID"
from "PUBLIC"."TRANSACTIONS" "t"
where "t"."ACCOUNT_ID" in (
  select "a"."ID"
  from "PUBLIC"."ACCOUNTS" "a"
  where "a"."ACCOUNT_OWNER" = 'John'
)
order by 1 asc
---------------------------------------
select "t"."ID"
from "PUBLIC"."TRANSACTIONS" "t"
where "t"."ACCOUNT_ID" in (
  select "a"."ID"
  from "PUBLIC"."ACCOUNTS" "a"
  where "a"."ACCOUNT_OWNER" = 'John'
  and "a"."ID" in (
    1, 2
  )
)
and "t"."ACCOUNT_ID" in (
  1, 2
)
order by 1 asc

Conclusion

The above examples have shown how row-level security can be implemented relatively easily using jOOQ’s VisitListener, a very powerful Service Provider Interface for client-side SQL AST transformation.

The applications don’t stop there. It is easy to see how you can implement a VisitListener that throws an exception every time you run a DML query that does not have a WHERE clause.

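Here’s a minimal sketch of such a guard, reusing the clauses and callbacks from this article (same imports as in the appendix, plus org.jooq.exception.DataAccessException):

public class RequireWhereClause extends DefaultVisitListener {

    @Override
    public void visitEnd(VisitContext context) {

        // Remember whether any condition is rendered inside a
        // WHERE clause
        if (context.queryPart() instanceof Condition) {
            List<Clause> clauses = asList(context.clauses());

            if (clauses.contains(UPDATE_WHERE) ||
                clauses.contains(DELETE_WHERE))
                context.data("whereFound", true);
        }
    }

    @Override
    public void clauseEnd(VisitContext context) {

        // At the end of an UPDATE or DELETE statement, fail if no
        // such condition was ever visited
        if ((context.clause() == UPDATE ||
             context.clause() == DELETE) &&
            context.data("whereFound") == null)
            throw new DataAccessException(
                "UPDATE or DELETE statement without WHERE clause");
    }
}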

Another application would be to replace a table by a similar table, whenever a certain condition is true.
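That, too, can be sketched with the same SPI: VisitContext.queryPart(QueryPart) allows for replacing the query part that is currently being visited. Assuming a hypothetical generated table ACCOUNTS_ARCHIVE with the same row type as ACCOUNTS:

public class TableSwitcher extends DefaultVisitListener {

    @Override
    public void visitStart(VisitContext context) {

        // Whenever ACCOUNTS is about to be rendered, render
        // ACCOUNTS_ARCHIVE instead
        if (context.queryPart() == ACCOUNTS)
            context.queryPart(ACCOUNTS_ARCHIVE);
    }
}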

And the best thing is: With jOOQ, you don’t need to parse SQL in order to transform it (which is extremely hard, depending on the SQL dialect). You already manually build an Abstract Syntax Tree using jOOQ’s fluent API, so you get all these features for free.

So: Happy SQL transformation!

jOOQ: The best way to write SQL in Java

Appendix: Full code

import static java.util.Arrays.asList;
import static org.jooq.Clause.DELETE;
import static org.jooq.Clause.DELETE_DELETE;
import static org.jooq.Clause.DELETE_WHERE;
import static org.jooq.Clause.INSERT;
import static org.jooq.Clause.SELECT;
import static org.jooq.Clause.SELECT_FROM;
import static org.jooq.Clause.SELECT_WHERE;
import static org.jooq.Clause.TABLE_ALIAS;
import static org.jooq.Clause.UPDATE;
import static org.jooq.Clause.UPDATE_UPDATE;
import static org.jooq.Clause.UPDATE_WHERE;
import static org.jooq.test.h2.generatedclasses.Tables.ACCOUNTS;
import static org.jooq.test.h2.generatedclasses.Tables.TRANSACTIONS;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.jooq.Clause;
import org.jooq.Condition;
import org.jooq.Field;
import org.jooq.Operator;
import org.jooq.QueryPart;
import org.jooq.Table;
import org.jooq.VisitContext;
import org.jooq.impl.DSL;
import org.jooq.impl.DefaultVisitListener;

@SuppressWarnings("unchecked")
public class AccountIDFilter extends DefaultVisitListener {

    final Integer[] ids;

    public AccountIDFilter(Integer... ids) {
        this.ids = ids;
    }

    void push(VisitContext context) {
        conditionStack(context).push(new ArrayList<>());
        whereStack(context).push(false);
    }

    void pop(VisitContext context) {
        whereStack(context).pop();
        conditionStack(context).pop();
    }

    Deque<List<Condition>> conditionStack(VisitContext context) {
        Deque<List<Condition>> data = (Deque<List<Condition>>) context.data("conditions");

        if (data == null) {
            data = new ArrayDeque<>();
            context.data("conditions", data);
        }

        return data;
    }

    Deque<Boolean> whereStack(VisitContext context) {
        Deque<Boolean> data = (Deque<Boolean>) context.data("predicates");

        if (data == null) {
            data = new ArrayDeque<>();
            context.data("predicates", data);
        }

        return data;
    }

    List<Condition> conditions(VisitContext context) {
        return conditionStack(context).peek();
    }

    boolean where(VisitContext context) {
        return whereStack(context).peek();
    }

    void where(VisitContext context, boolean value) {
        whereStack(context).pop();
        whereStack(context).push(value);
    }

    <E> void pushConditions(VisitContext context, Table<?> table, Field<E> field, E... values) {

        // Check if we're visiting the given table
        if (context.queryPart() == table) {
            List<Clause> clauses = clauses(context);

            // ... and if we're in the context of the current subselect's
            // FROM clause
            if (clauses.contains(SELECT_FROM) ||
                clauses.contains(UPDATE_UPDATE) ||
                clauses.contains(DELETE_DELETE)) {

                // If we're declaring a TABLE_ALIAS... (e.g. "ACCOUNTS" as "a")
                if (clauses.contains(TABLE_ALIAS)) {
                    QueryPart[] parts = context.queryParts();

                    // ... move up the QueryPart visit path to find the
                    // defining aliased table, and extract the aliased
                    // field from it. (i.e. the "a" reference)
                    for (int i = parts.length - 2; i >= 0; i--) {
                        if (parts[i] instanceof Table) {
                            field = ((Table<?>) parts[i]).field(field);
                            break;
                        }
                    }
                }

                // Push a condition for the field of the (potentially aliased) table
                conditions(context).add(field.in(values));
            }
        }
    }

    /**
     * Retrieve all clauses for the current subselect level, starting with
     * the last {@link Clause#SELECT}.
     */
    List<Clause> clauses(VisitContext context) {
        List<Clause> result = asList(context.clauses());
        int index = result.lastIndexOf(SELECT);

        if (index > 0)
            return result.subList(index, result.size() - 1);
        else
            return result;
    }

    @Override
    public void clauseStart(VisitContext context) {

        // Enter a new SELECT clause / nested select, or DML statement
        if (context.clause() == SELECT ||
            context.clause() == UPDATE ||
            context.clause() == DELETE ||
            context.clause() == INSERT) {
            push(context);
        }
    }

    @Override
    public void clauseEnd(VisitContext context) {

        // Append all collected predicates to the WHERE clause if any
        if (context.clause() == SELECT_WHERE ||
            context.clause() == UPDATE_WHERE ||
            context.clause() == DELETE_WHERE) {
            List<Condition> conditions = conditions(context);

            if (conditions.size() > 0) {
                context.context()
                       .formatSeparator()
                       .keyword(where(context) ? "and" : "where")
                       .sql(' ');

                context.context().visit(DSL.condition(Operator.AND, conditions));
            }
        }

        // Leave a SELECT clause / nested select, or DML statement
        if (context.clause() == SELECT ||
            context.clause() == UPDATE ||
            context.clause() == DELETE ||
            context.clause() == INSERT) {
            pop(context);
        }
    }

    @Override
    public void visitEnd(VisitContext context) {
        pushConditions(context, ACCOUNTS, ACCOUNTS.ID, ids);
        pushConditions(context, TRANSACTIONS, TRANSACTIONS.ACCOUNT_ID, ids);

        // Check if we're rendering any condition within the WHERE clause
        // In this case, we can be sure that jOOQ will render a WHERE keyword
        if (context.queryPart() instanceof Condition) {
            List<Clause> clauses = clauses(context);

            if (clauses.contains(SELECT_WHERE) ||
                clauses.contains(UPDATE_WHERE) ||
                clauses.contains(DELETE_WHERE)) {
                where(context, true);
            }
        }
    }
}

Querying Your Database from Millions of Fibers (Rather than Thousands of Threads)


jOOQ is a great way to do SQL in Java, and Quasar fibers bring much improved concurrency

We’re excited to announce another very interesting guest post on the jOOQ Blog by Fabio Tudone from Parallel Universe.

Parallel Universe develops an open-source stack that allows developers to easily code extremely concurrent applications on the JVM. With the Parallel Universe stack you build software that works in harmony with modern hardware rather than fighting it at every turn, while keeping your programming language and your simple, familiar programming styles.

Fabio Tudone develops and maintains Quasar integration modules as part of the Comsat project. He was part of, and later led, the development of a cloud-based enterprise content governance platform for several years before joining the Parallel Universe team, and he has been writing mostly JVM software throughout his professional journey. His interests include Dev and DevOps practices, scalability, concurrent and functional programming, as well as runtime platforms. Naturally curious and leaning towards exploration, he enjoys gathering knowledge and understanding from people, places and cultures. He’s also interested in awareness practices and likes writing all sorts of stuff.

Quasar features an integration for JDBC and jOOQ as part of the Comsat project, so let’s have a look inside the box.

JDBC, jOOQ and Quasar

comsat-jdbc provides a fiber-blocking wrapper of the JDBC API, so that you can use your connection inside fibers rather than regular Java threads.

Why would you do that? Because fibers are lightweight threads and you can have many more fibers than threads in your running JVM. “Many more” means we’re talking millions versus a handful of thousands.

This means that you have a lot more concurrency capacity in your system to do other things in parallel while you wait for JDBC execution, be it concurrent / parallel calculations (like exchanging actor messages in your highly reliable Quasar Erlang-like actor system) or fiber-blocking I/O (e.g. serving web requests, invoking micro-services, reading files through fiber NIO or accessing other fiber-enabled data sources like MongoDB).

If your DB can stand it and a few more regular threads won’t blow up your system (yet), you can even increase your fiber-JDBC pool (see “Where’s the waiting line?” below) and send more concurrent jOOQ commands.

Since jOOQ uses JDBC connections to access the database, having jOOQ run on fibers is as easy as bringing in the comsat-jooq dependency and handing your fiber-enabled JDBC connection to the jOOQ context:

import java.sql.Connection;

import org.jooq.DSLContext;
import org.jooq.impl.DSL;

import co.paralleluniverse.fibers.jdbc.FiberDataSource;

// ...

Connection conn = FiberDataSource.wrap(dataSource)
                                 .getConnection();
DSLContext create = DSL.using(conn);

// ...

Of course you can also configure a ConnectionProvider to fetch connections from your FiberDataSource.

From this moment on you can use regular jOOQ and everything will happen in fiber-blocking mode rather than thread-blocking. That’s it.

No, really, there’s absolutely nothing more to it: you keep using the excellent jOOQ, only with much more efficient fibers rather than threads. Quasar is a good citizen and won’t force you into a new API (which is nice especially when the original one is great already).
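For illustration, here’s a sketch of running a query inside a fiber, reusing the create context from above (Fiber.join() declares checked exceptions, which the enclosing method needs to handle or declare):

import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.strands.SuspendableRunnable;

// ...

SuspendableRunnable query = () ->
    // Looks thread-blocking, but only the fiber is suspended,
    // not the underlying thread
    System.out.println(create.fetch("select * from accounts"));

new Fiber<Void>(query).start().join();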

Since the JVM at present supports neither native green threads nor continuations, which could be used to implement lightweight threads, Quasar implements continuations (and fibers on top of them) via bytecode instrumentation. This can be done at compile time, but often it’s just more convenient to use Quasar’s agent (especially when instrumenting third-party libraries), so here’s an example Gradle project based on Dropwizard that also includes the Quasar agent setup (don’t forget about Capsule, a really great Java deployment tool for every need, which, needless to say, makes using Quasar and agents in general a breeze). The example doesn’t use all of the jOOQ features; rather, it falls into the SQL-building use case (both for querying and for CRUD), but you’re encouraged to change it to suit your needs. The without-comsat branch contains a thread-blocking version, so you can compare and see the (minimal) differences with the Comsat version.

Where’s the waiting line?

You might be wondering now: OK, but JDBC is a thread-blocking API, how can Quasar turn it into a fiber-blocking one? Because JDBC doesn’t have an asynchronous mode, Quasar uses a thread pool behind the scenes, to which fibers dispatch JDBC operations, and by which they’re unfrozen and scheduled for resumption when the JDBC operation completes (have a look at Quasar’s integration patterns for more info).

Yes, here’s the nasty waiting line: JDBC commands waiting to be executed by the thread pool. Although you’re not improving DB parallelism beyond your JDBC thread-pool size, you’re not hurting your fibers either, even though you’re still using a simple and familiar blocking API. You can still have millions of fibers.

Is it possible to improve the overall situation? Without a standard asynchronous Java RDBMS API there isn’t much we can do. However, this may not matter at all if the database is your bottleneck. There are several nice posts and discussions about this topic and the argument amounts to deciding where you want to move the waiting line.

Bonus: how does this neat jOOQ integration work under the cover?

At present, Quasar needs the developer (or integrator) to tell it what to instrument, although fully automatic instrumentation is in the works (this feature depends on some minor JRE changes that won’t be released before Java 9). If you can conveniently alter the source code (or the compiled classes), then it’s enough to annotate methods with @Suspendable or to let them throw SuspendExecution, but this is usually not the case with libraries. Methods with fixed, well-known names to be instrumented can instead be listed in META-INF/suspendables and META-INF/suspendable-supers, respectively for concrete methods and for abstract / interface methods that can have suspendable implementations.

If there are a lot of them (or there’s code generation involved), you can write a SuspendableClassifier to ship with your integration and register it with Quasar’s SPI to provide additional instrumentation logic (see jOOQ’s). A SuspendableClassifier’s job is to examine signature information about each and every method in your runtime classpath during the instrumentation phase and to tell whether it’s suspendable, whether it can have suspendable implementations, whether for sure neither is the case, or whether it doesn’t know (some other classifier could perhaps say “suspendable” or “suspendable-super” later on).

Summing it all up

Well… Just enjoy the excellent jOOQ on efficient fibers!

JavaEE or Spring? Neither! We Call Out For a Fresh Competitor!


If you’ve been following some key Java people on Twitter or reading the “news” on Reddit, you could not have missed the hilarious “bitch fight” (pardon my French) between some Spring and JavaEE evangelists.

First, Jürgen Höller’s provocative article:

“Happy second birthday, Java EE 7! How is it going in production?”

Then, Reza Rahman’s response:

“The Ghosts of Java EE 7 in Production: Past, Present and Future”

And in between, there had been hundreds of equally amusing tweets.

If you haven’t already, take out your popcorn :)

EDIT: This discussion went on further than this blog post. Further discussions have also been going on elsewhere.

Troll marketing

At Data Geekery, we’re certainly not unacquainted with the fun and effective technique that we like to refer to as “troll marketing”, e.g. when we recently ranted against the (ab-)use of annotations by both Java EE and Spring, or when we compared Hibernate with the good old Nokia 3310.

It’s childish, yes. But it is also effective. The only reason why it is effective is because by doing so, you’re stirring up a hornets’ nest. You have a point, after all, and you start triggering a very polarised discussion that has no right/wrong answer, which all participants will fail to accept, and thus fight. While they’re fighting, you can put your brand in each and every message. Bingo.

Spring folks do exactly the same thing. If you’ve heard a talk by Josh Long from Pivotal, you will certainly have noticed that much of the talk is just ranting about things like Internet Explorer. If you’re not the kind of person that takes everything personally, seriously, and/or literally, then such a talk can be quite amusing.

The interesting bit here is the fact that the “standards” evangelists react to Spring’s provocative article way too much. At the bottom of Reza’s article, there is a linked presentation by Bert Ertman, in favour of Java EE (and it wouldn’t be wrong to say that Reza thus endorses this talk’s content and style).

The key point I’m trying to make here is not that two opposing technology vendors try to get a hold of their communities, convincing people that their tech is better (that’s what engineers have been doing since long before Tesla vs. Edison).

The key point I’m trying to make here is the fact that one of the vendors claims for themselves to be a “standard”, putting themselves (the JCP = Oracle) on eye level with real standards committees like ISO, DIN, W3C, IETF.

JavaEE is a Standard?

It is, in a way. It is a de facto standard, just like Spring. It is a very loose set of small interest groups that are governed mostly by Oracle (and other vendors) who try to promote and establish their interests and technology over alternative, “proprietary” tech.

JavaEE is a “standard” with a long history. It has always been a rather “weak” standard, which is why Spring was created in the first place. The fact that Reza from Oracle reacts so sensitively to Spring’s deliberate provocation can be interpreted as a sign that the “standard” is still weak and endangered by Spring as an alternative.

An interesting comparison could be made to Siebel vs. Salesforce. Siebel was the de facto “standard” in the CRM industry, but Salesforce managed to harass Siebel so much that Siebel had to acknowledge Salesforce as a real competitor in its business long before they even remotely met on eye level, revenue-wise. This ultimately led to Siebel’s demise and acquisition by – how ironic – Oracle, where much of the JCP is hosted.

The same thing has been happening with Java EE and Spring. Spring is now so extremely successful in the market, even if completely (= admittedly) proprietary, that Java EE will have to fight hard to get back on top of Spring. Much harder than merely calling out the old and lame “standards” argument. Java EE has to be better and faster than Spring, not just a “standard”. With all the JCP politics going on, I don’t see that happening, though.

So, will Spring win?

My prediction is: Nope.

This whole story just shows us that standards in our Java ecosystem are weak. There is an incredible number of players on the market. Niche products like Hibernate and jOOQ can shake up the database access market segment. Niche products like GWT, Vaadin, ZK, Spring MVC, Play Framework, etc. can shake up the HTML UI market segment (no way JSF or JSR 371 can prevent further market fragmentation in this area).

Heck, language-agnostic technologies like Akka or Vert.x show that not even the Servlet API is really sound and safe. And with Oracle having introduced Nashorn, the whole JavaScript ecosystem might just as well threaten Java EE. The only thing that can be taken for granted for another 5-10 years, I suspect, is the JVM and the Java SE JDK with its libraries, including JDBC.

Our industry is advancing so quickly right now, it seems to be impossible for a highly distributed, rather political committee like the JCP to keep up the pace and deliver the quality and innovation that we’re seeing elsewhere.

In other words, the Java EE “standard” is nothing more than a 5-10 year-behind body of, admittedly, proven technologies. Proven some time ago. Given that a single player like Pivotal has successfully challenged the “standard” for more than 10 years now, I suspect that new players might soon enter the market with new alternatives.

Who will be the new players?

One of them will be the soon-to-be-renamed Typesafe. Why do you think they’re getting rebranded after all? And what do you think they’re trying to prove with Akka + Play + ConductR? They’re trying to prove that you can do software entirely differently on top of the JVM, that’s what they’re trying to prove.

Besides, they’re using the same low-content, high-emotion troll marketing technique that Pivotal / Spring had been so successful with. Observe their use of the evil term “monolith” everywhere (= Java EE, what else).

Another one of them might be… Microsoft! Why not? Their recent open sourcing moves might help establish .NET on top of Linux, eventually. And once you’re on top of Linux, you might as well get on top of the JVM. The .NET ecosystem is much more coherent than Java’s, and this will exert a lot of innovation pressure on our ecosystem.

All of this is happening while the JCP is still recovering from the politics behind the game-changing data interchange APIs JSR-353, JSR-367, and JSR-374.

Interesting future

The above predictions might or might not come true. But one thing I’m very certain of: we will have even more competition, and thus more choices for our infrastructure software, in the near future. The current de facto standards will lose market share. Tough luck for them, better tech for us.

While the evangelists are fighting their little personal fights, I’m looking forward to a very interesting future on the JVM.

Read about a concurring opinion by our friends at Takipi here

jOOQ Tuesdays: Axel Fontaine Predicts Exciting Times as our Industry is Maturing


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

In this fourth edition, we have the pleasure of talking to Axel Fontaine, who will be telling us about the exciting times that are ahead as our industry is maturing.

Hi Axel – Citing your website: From “Software is what I do” to “Architecting for Continuous Delivery & Zero Downtime”. Where did your personal passion for devops come from?

Over the last 18 years in our industry I have seen a lot of projects for numerous clients in many different sectors. And it always struck me. The intersection between development and operations was almost invariably the place where the largest efficiency gains could be realized. Over time this then became the place where I naturally started to focus my attention, initially for consultancy and training, and then later also for public speaking and the product business.

Very interesting! I suspect that the Flyway download rates also confirm the increasing need for devops tooling, right?

I am glad you mentioned it. Yes, Flyway download numbers have been going through the roof. In March we had one download every 42 seconds and we are well on track to break the 1 million mark in 2015! This picture actually tells us a number of things. First of all it confirms that the need for database migrations is a very widespread itch and that Flyway’s no-nonsense pragmatic and lightweight approach to scratch it resonates with the industry. The other thing this tells us, is that our industry as a whole is maturing and continuous delivery and deployment in particular is becoming more mainstream. This increased pace of delivery comes with a pressing need for a higher level of automation. And modern lightweight tools like Flyway are a great fit for this.

Now, you are converting that passion into a product called Boxfuse. Can you tell us a little more about it?

I have always been a strong believer in lightweight and open solutions that are a pleasure to use and minimize lock-in. When it comes to deploying an application to the cloud, so far you had to choose one or the other. You could either have something that is a joy to use (PaaS), but leaving you stuck in a walled garden. Or you could have something open (IaaS) that unfortunately leaves you with a lot of operational concerns to deal with. Boxfuse aims for a fresh new middle ground: the convenience of PaaS with the freedom of IaaS. To achieve that, Boxfuse does away with a bunch of cargo cult. No more general-purpose operating system, no more traditional provisioning. Instead Boxfuse analyses your application and, in just a few seconds, fuses it into a tiny bootable image that is about 1% of the size of a regular system. Boxfuse then offers you a sophisticated blue/green deployment process for zero-downtime updates. All of this, out of the box and with a single command.

We’re entrepreneurs ourselves. Marketing a product is a long road. What are your own key takeaways from your experience as both a consultant and entrepreneur?

I couldn’t agree more. What marketing really is, is competing for attention, and turning that attention into mind share through a series of arguments, both technical and emotional. It is an exercise that is often overlooked in our technical circles, where the code is front and center. But the best technology in the world isn’t worth a dime if you can’t convince someone else to use it. And that takes perseverance, drive, and dedication. What seems easy on the outside is often the fruit of lots of hard labor behind the scenes. Eventually you will succeed, but give it a little time. Overnight success takes years.

Where do you see the future of our industry, from your perspective?

Believe me when I say these are exciting times. While it still feels like a big adventure with great new frontiers to explore, our industry is maturing rapidly. Overly complex tools are going the way of the dodo. User expectations are rising fast. And the only way to meet them is through modern lightweight tools and services that integrate seamlessly with our workflow and perform the undifferentiated heavy lifting for us. We need to eliminate all unnecessary complexity so that we can focus on what matters most: delighting our customers.

What Exactly are SQL Views?


You probably know about “ordinary views” already, but I’m sure you’ll find one or two things in this article that you haven’t thought about in this way yet…

What exactly are SQL views?

Views in SQL are a means of treating complex queries in the same way as “ordinary” tables. In fact, SQL is all about tables (which are bags of records), much like relational algebra is all about relations (which are sets of tuples).

There are different types of views:

“Ordinary” views

These are most commonly referred to as “views”. Most databases allow for declaring them using this syntax:

CREATE VIEW my_view AS 
SELECT col1, col2
FROM my_table
WHERE ...

These stored views are then part of the catalog and can be referenced by name just like tables, which is great for re-use. And what’s even greater: you can grant a different set of privileges to views than to tables, which allows you to implement a complete security layer using only views (e.g. hiding some columns or rows from certain users).

Some databases (including Oracle, PostgreSQL) even allow for updating them under certain circumstances – mostly when they’re unambiguous, 1-1 mappings of a single table that does not produce any calculations or denormalisations.

Materialized views

Just like the above “ordinary views”, materialized views can be used just like tables. In fact, they are tables, as their data is materialized on disk and refreshed whenever the underlying data is updated (depending on your refresh configuration). These are useful for frequent, complex queries on rarely updated data.

Just add the MATERIALIZED keyword and you’re set:

CREATE MATERIALIZED VIEW my_view AS 
SELECT col1, col2
FROM my_table
WHERE ...

Among others, Oracle and PostgreSQL support materialized views. Other databases like SQL Server know of “indexed views”, which are a bit less powerful as you have to explicitly “materialise” your view data in indexes.

“Snapshot” views

These aren’t really views, they’re real tables. But in the context of this blog post, you could think of them as a permanently materialized “snapshot” view of your data. You can create such views using different syntaxes:

Most databases, e.g. Oracle

CREATE TABLE my_view AS
SELECT col1, col2
FROM my_table
WHERE ...

Some databases, e.g. SQL Server

SELECT col1, col2
INTO my_view
FROM my_table
WHERE ...

The nice thing about this approach is the fact that like materialized views, these “views” can be very useful for frequent querying – you have to pre-calculate the data only once. But once you’ve calculated that data, you generate a “snapshot” of it, and the data can continue to live independently from your view – just like a snapshot! (don’t forget to add relevant indexes, though)

Note that some databases including DB2 and Oracle support real SQL:2011 standard “snapshots”, such as flashback query in Oracle, or time travel queries in DB2. That’s a different story, though.

Parameterized views

Few people refer to these views as “views”, but if you think about it, that’s what they really are. Table-valued functions are stored procedures that return tables that can then again be used in SQL. For example (using PostgreSQL syntax):

CREATE FUNCTION my_view (arg1 INTEGER, arg2 INTEGER)
RETURNS TABLE (
    col1 INTEGER
    col2 INTEGER
)
AS $$
BEGIN
    RETURN QUERY
    SELECT col1, col2
    FROM my_table
    WHERE v1 = arg1 AND v2 = arg2;
END
$$ LANGUAGE plpgsql;

And then…

SELECT *
FROM my_view (42, 1337)
WHERE ...

That’s quite powerful, isn’t it? Among others, Firebird, HANA, HSQLDB, Oracle, PostgreSQL, SQL Server support table-valued functions.
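Since such a function behaves like a table, consuming it from Java is plain JDBC, too. A sketch, assuming the my_view function from above and an open PostgreSQL connection:

import java.sql.PreparedStatement;
import java.sql.ResultSet;

// ...

try (PreparedStatement ps = connection.prepareStatement(
    "select col1, col2 from my_view(?, ?)")) {

    ps.setInt(1, 42);
    ps.setInt(2, 1337);

    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            System.out.println(
                rs.getInt(1) + " / " + rs.getInt(2));
    }
}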

Common Table Expressions

Like ordinary views, these views are named but they’re scoped only for a single statement – mostly a SELECT statement, although PostgreSQL or SQL Server also allow for common table expressions to be used with other DML statements. These “views” can be written as such:

WITH 
    my_view_a AS (
        SELECT ...
    ),
    my_view_b AS (
        SELECT ...
    )
-- To be consumed immediately by a statement
SELECT *
FROM my_view_a, my_view_b

While common table expressions are very useful for structuring code (they’re like “table variables”), they come with a price in Oracle or PostgreSQL, as the view is most often temporarily materialized, which prevents a lot of SQL transformations in the optimiser. On the flip side, common table expressions can be recursive / hierarchical, which is great for graph / tree traversal.

Derived tables

The most common type of views (although rarely called “views”) are derived tables, i.e. all nested select statements that are put in a FROM clause. E.g.:

SELECT *
FROM (
    SELECT ...
) my_view_a, (
    SELECT ...
) my_view_b

Unlike common table expressions, derived tables cannot easily be reused within a statement, but chances are high that the optimiser can transform them into a different, better-performing statement.

Conclusion

SQL is all about tables and recomposition of tables in ad-hoc queries. The most important clause of any SQL statement is the FROM clause. It specifies the set of tuples that you want to recompose, filter, group, project in various ways. As we have seen above, you can feed any such table transformation easily into yet another transformation via one of the above ways to create views.

Curious about more? Read our popular article “10 Easy Steps to a Complete Understanding of SQL”.

We’re Taking Bets: This Annotation Will Soon Show up in the JDK


This recent Stack Overflow question by Yahor has intrigued me: How to ensure at Java 8 compile time that a method signature “implements” a functional interface. It’s a very good question. Let’s assume the following nominal type:

@FunctionalInterface
interface LongHasher {
    int hash(long x);
}

The type imposes a crystal clear contract. Implementors must provide a single method named hash(), taking a long argument and returning an int value. When using lambdas or method references, the hash() method name is no longer relevant, and the structural type long -> int will be sufficient.
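
For instance, any lambda of that shape can be assigned to the type, regardless of any method names:

// The structural type long -> int is all that matters here
LongHasher hasher = x -> (int) (x ^ (x >>> 32));
int hash = hasher.hash(42L);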

In his question, Yahor wants to enforce the above type upon three static methods (example modified by me):

class LongHashes {

    // OK
    static int xorHash(long x) {
        return (int)(x ^ (x >>> 32));
    }

    // OK
    static int continuingHash(long x) {
        return (int)(x + (x >>> 32));
    }

    // Yikes
    static int randomHash(NotLong x) {
         return xorHash(x * 0x5DEECE66DL + 0xBL);
    }
}

And he would like the Java compiler to complain in the third case, as randomHash() does not “conform” to LongHasher.

A compilation error is easy to produce, of course, by actually assigning the static methods in their functional notation (method references) to a LongHasher instance:

// OK
LongHasher good = LongHashes::xorHash;
LongHasher alsoGood = LongHashes::continuingHash;

// Yikes
LongHasher ouch = LongHashes::randomHash;

But that’s not as concise as it could / should be. The type constraint should be imposed directly on the static method.

And what’s the Java way of doing that?

With annotations, of course!

I’m going to take bets that the following pattern will show up by JDK 10:

class LongHashes {

    // Compiles
    @ReferenceableAs(LongHasher.class)
    static int xorHash(long x) {
        return (int)(x ^ (x >>> 32));
    }

    // Compiles
    @ReferenceableAs(LongHasher.class)
    static int continuingHash(long x) {
        return (int)(x + (x >>> 32));
    }

    // Doesn't compile
    @ReferenceableAs(LongHasher.class)
    static int randomHash(NotLong x) {
         return xorHash(x * 0x5DEECE66DL + 0xBL);
    }
}

In fact, you could already implement such an annotation today, and write your own annotation processor (or JSR-308 checker) to validate these methods. Looking forward to yet another great annotation!
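
Such a hypothetical @ReferenceableAs annotation could be declared as simply as this (a sketch, of course – no such type exists in the JDK today):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Retained for the compiler / annotation processor only
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.SOURCE)
@interface ReferenceableAs {

    // The functional interface that the annotated method
    // must be referenceable as
    Class<?> value();
}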

So, who’s in for the bet that we’ll have this annotation by JDK 10?

Type Safe Queries for JPA’s Native Query API


When you’re using JPA, sometimes JPQL won’t do the trick and you’ll have to resort to native SQL. From the very beginning, ORMs like Hibernate kept an open “backdoor” for these cases, offering an API similar to Spring’s JdbcTemplate, Apache DbUtils, or jOOQ for plain SQL. This is useful, as you can keep using your ORM as your single point of entry for database interaction.

However, writing complex, dynamic SQL using string concatenation is tedious and error-prone, and an open door for SQL injection vulnerabilities. Using a type safe API like jOOQ would be very useful, but you may find it hard to maintain two different connection, transaction, and session models within the same application just for 10-15 native queries.

But the truth is:

You can use jOOQ for your JPA native queries!

That’s true! There are several ways to achieve this.

Fetching tuples (i.e. Object[])

The simplest way will not make use of any of JPA’s advanced features and simply fetch tuples in JPA’s native Object[] form for you. Assuming this simple utility method:

public static List<Object[]> nativeQuery(
    EntityManager em, 
    org.jooq.Query query
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(query.getSQL());

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    return result.getResultList();
}

Using the API

This is all you need to bridge the two APIs in their simplest form to run “complex” queries via an EntityManager:

List<Object[]> books =
nativeQuery(em, DSL.using(configuration)
    .select(
        AUTHOR.FIRST_NAME, 
        AUTHOR.LAST_NAME, 
        BOOK.TITLE
    )
    .from(AUTHOR)
    .join(BOOK)
        .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
    .orderBy(BOOK.ID));

books.forEach((Object[] book) -> 
    System.out.println(book[0] + " " + 
                       book[1] + " wrote " + 
                       book[2]));

Agreed, not a lot of type safety in the results – as we’re only getting an Object[]. We’re looking forward to a future Java that supports tuple (or even record) types like Scala or Ceylon.

So a better solution might be the following:

Fetching entities

Let’s assume you have the following, very simple entities:

@Entity
@Table(name = "book")
public class Book {

    @Id
    public int id;

    @Column(name = "title")
    public String title;

    @ManyToOne
    public Author author;
}

@Entity
@Table(name = "author")
public class Author {

    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    @OneToMany(mappedBy = "author")
    public Set<Book> books;
}

And let’s assume we’ll add an additional utility method that also passes a Class reference to the EntityManager:

public static <E> List<E> nativeQuery(
    EntityManager em, 
    org.jooq.Query query,
    Class<E> type
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(
        query.getSQL(), type);

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    // There's an unsafe cast here, but we can be sure
    // that we'll get the right type from JPA
    return result.getResultList();
}

Using the API

This is now rather slick, just put your jOOQ query into that API and get JPA entities back from it – the best of both worlds, as you can easily add/remove nested collections from the fetched entities as if you had fetched them via JPQL:

List<Author> authors =
nativeQuery(em,
    DSL.using(configuration)
       .select()
       .from(AUTHOR)
       .orderBy(AUTHOR.ID)
, Author.class); // This is our entity class here

authors.forEach(author -> {
    System.out.println(author.firstName + " " + 
                       author.lastName + " wrote");
    
    author.books.forEach(book -> {
        System.out.println("  " + book.title);

        // Manipulate the entities here. Your
        // changes will be persistent!
    });
});

Fetching EntityResults

If you’re extra-daring and have a strange affection for annotations, or you just want to crack a joke for your coworkers just before you leave on vacation, you can also resort to using JPA’s javax.persistence.SqlResultSetMapping. Imagine the following mapping declaration:

@SqlResultSetMapping(
    name = "bookmapping",
    entities = {
        @EntityResult(
            entityClass = Book.class,
            fields = {
                @FieldResult(name = "id", column = "b_id"),
                @FieldResult(name = "title", column = "b_title"),
                @FieldResult(name = "author", column = "b_author_id")
            }
        ),
        @EntityResult(
            entityClass = Author.class,
            fields = {
                @FieldResult(name = "id", column = "a_id"),
                @FieldResult(name = "firstName", column = "a_first_name"),
                @FieldResult(name = "lastName", column = "a_last_name")
            }
        )
    }
)

Essentially, the above declaration maps database columns (@SqlResultSetMapping -> entities -> @EntityResult -> fields -> @FieldResult -> column) onto entities and their corresponding attributes. With this powerful technique, you can generate entity results from any sort of SQL query result.

Again, we’ll be creating a small utility method:

public static <E> List<E> nativeQuery(
    EntityManager em, 
    org.jooq.Query query,
    String resultSetMapping
) {

    // Extract the SQL statement from the jOOQ query:
    Query result = em.createNativeQuery(
        query.getSQL(), resultSetMapping);

    // Extract the bind values from the jOOQ query:
    List<Object> values = query.getBindValues();
    for (int i = 0; i < values.size(); i++) {
        result.setParameter(i + 1, values.get(i));
    }

    // This implicit cast is a lie, but let's risk it
    return result.getResultList();
}

Note that the above API makes use of an anti-pattern, which is OK in this case, because JPA is not a type safe API in the first place.

Using the API

Now, again, you can pass your type safe jOOQ query to the EntityManager via the above API, passing the name of the SqlResultSetMapping along like so:

List<Object[]> result =
nativeQuery(em,
    DSL.using(configuration)
       .select(
           AUTHOR.ID.as("a_id"),
           AUTHOR.FIRST_NAME.as("a_first_name"),
           AUTHOR.LAST_NAME.as("a_last_name"),
           BOOK.ID.as("b_id"),
           BOOK.AUTHOR_ID.as("b_author_id"),
           BOOK.TITLE.as("b_title")
       )
       .from(AUTHOR)
       .join(BOOK).on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
       .orderBy(BOOK.ID)), 
    "bookmapping" // The name of the SqlResultSetMapping
);

result.forEach((Object[] entities) -> {
    Author author = (Author) entities[1];
    Book book = (Book) entities[0];

    System.out.println(author.firstName + " " + 
                       author.lastName + " wrote " + 
                       book.title);
});

The result in this case is again an Object[], but this time, the Object[] doesn’t represent a tuple with individual columns, but it represents the entities as declared by the SqlResultSetMapping annotation.

This approach is intriguing and probably has its use when you need to map arbitrary results from queries, but still want managed entities. If you want to know more, we can only recommend Thorben Janssen‘s interesting blog series about these advanced JPA features.

Conclusion

Choosing between an ORM and SQL (or between Hibernate and jOOQ, in particular) isn’t always easy.

  • ORMs shine when it comes to applying object graph persistence, i.e. when you have a lot of complex CRUD, involving complex locking and transaction strategies.
  • SQL shines when it comes to running bulk SQL for both read and write operations, e.g. when running analytics or reporting.

When you’re “lucky” (as in – the job is easy), your application is only on one side of the fence, and you can make a choice between ORM and SQL. When you’re “lucky” (as in – ooooh, this is an interesting problem), you will have to use both. (See also Mike Hadlow’s interesting article on the subject)

The message here is: You can! Using JPA’s native query API, you can run complex queries leveraging the full power of your RDBMS, and still map results to JPA entities. You’re not restricted to using JPQL.

Side-note

While we’ve been critical with some aspects of JPA in the past (read How JPA 2.1 has become the new EJB 2.0 for details), our criticism has been mainly focused on JPA’s (ab-)use of annotations. When you’re using a type safe API like jOOQ, you can provide the compiler with all the required type information easily to construct results. We’re convinced that a future version of JPA will engage more heavily in using Java’s type system, allowing a more fluent integration of SQL, JPQL, and entity persistence.

Do Not Make This Mistake When Developing an SPI


Most of your code is private, internal, proprietary, and will never be exposed to the public. If that’s the case, you can relax – you can refactor all of your mistakes, including those that incur breaking API changes.

If you’re maintaining a public API, however, that’s not the case. If you’re maintaining a public SPI (Service Provider Interface), things get even worse.

The H2 Trigger SPI

In a recent Stack Overflow question about how to implement an H2 database trigger with jOOQ, I encountered the org.h2.api.Trigger SPI again – a simple and easy-to-implement SPI for trigger semantics. Here’s how triggers work in the H2 database:

Use the trigger

CREATE TRIGGER my_trigger
BEFORE UPDATE
ON my_table
FOR EACH ROW
CALL "com.example.MyTrigger"

Implement the trigger

public class MyTrigger implements Trigger {

    @Override
    public void init(
        Connection conn, 
        String schemaName,
        String triggerName, 
        String tableName, 
        boolean before, 
        int type
    )
    throws SQLException {}

    @Override
    public void fire(
        Connection conn, 
        Object[] oldRow, 
        Object[] newRow
    )
    throws SQLException {
        // Using jOOQ inside of the trigger, of course
        DSL.using(conn)
           .insertInto(LOG, LOG.FIELD1, LOG.FIELD2, ..)
           .values(newRow[0], newRow[1], ..)
           .execute();
    }

    @Override
    public void close() throws SQLException {}

    @Override
    public void remove() throws SQLException {}
}

The whole H2 Trigger SPI is actually rather elegant, and usually you only need to implement the fire() method.

So, how is this SPI wrong?

It is wrong very subtly. Consider the init() method. It has a boolean flag to indicate whether the trigger should fire before or after the triggering event, i.e. the UPDATE. What if suddenly, H2 were to also support INSTEAD OF triggers? Ideally, this flag would then be replaced by an enum:

public enum TriggerTiming {
    BEFORE,
    AFTER,
    INSTEAD_OF
}

But we can’t simply introduce this new enum type because the init() method shouldn’t be changed incompatibly, breaking all implementing code! With Java 8, we could at least declare an overload like this:

    default void init(
        Connection conn, 
        String schemaName,
        String triggerName, 
        String tableName, 
        TriggerTiming timing, 
        int type
    )
    throws SQLException {
        // New feature isn't supported by default
        if (timing == TriggerTiming.INSTEAD_OF)
            throw new SQLFeatureNotSupportedException();

        // Call through to old feature by default
        init(conn, schemaName, triggerName,
             tableName, timing == TriggerTiming.BEFORE, type);
    }

This would allow new implementations to handle INSTEAD_OF triggers while old implementations would still work. But it feels hairy, doesn’t it?

Now, imagine, we’d also support ENABLE / DISABLE clauses and we want to pass those values to the init() method. Or maybe, we want to handle FOR EACH ROW. There’s currently no way to do that with this SPI. So we’re going to get more and more of these overloads, which are very hard to implement. And effectively, this has happened already, as there is also org.h2.tools.TriggerAdapter, which is redundant with (but subtly different from) Trigger.

What would be a better approach, then?

The ideal approach for an SPI provider is to provide “argument objects”, like these:

public interface Trigger {
    default void init(InitArguments args)
        throws SQLException {}
    default void fire(FireArguments args)
        throws SQLException {}
    default void close(CloseArguments args)
        throws SQLException {}
    default void remove(RemoveArguments args)
        throws SQLException {}

    final class InitArguments {
        public Connection connection() { ... }
        public String schemaName() { ... }
        public String triggerName() { ... }
        public String tableName() { ... }
        /** @deprecated Use {@link #timing()} instead. */
        @Deprecated
        public boolean before() { ... }
        public TriggerTiming timing() { ... }
        public int type() { ... }
    }

    final class FireArguments {
        public Connection connection() { ... }
        public Object[] oldRow() { ... }
        public Object[] newRow() { ... }
    }

    // These currently don't have any properties
    final class CloseArguments {}
    final class RemoveArguments {}
}

As you can see in the above example, Trigger.InitArguments has been successfully evolved with appropriate deprecation warnings. No client code was broken, and the new functionality is ready to be used, if needed. Also, close() and remove() are ready for future evolutions, even if we don’t need any arguments yet.

The overhead of this solution is at most one object allocation per method call, which shouldn’t hurt too much.

Another example: Hibernate’s UserType

Unfortunately, this mistake happens way too often. Another prominent example is Hibernate’s hard-to-implement org.hibernate.usertype.UserType SPI:

public interface UserType {
    int[] sqlTypes();
    Class returnedClass();
    boolean equals(Object x, Object y);
    int hashCode(Object x);

    Object nullSafeGet(
        ResultSet rs, 
        String[] names, 
        SessionImplementor session, 
        Object owner
    ) throws SQLException;

    void nullSafeSet(
        PreparedStatement st, 
        Object value, 
        int index, 
        SessionImplementor session
    ) throws SQLException;

    Object deepCopy(Object value);
    boolean isMutable();
    Serializable disassemble(Object value);
    Object assemble(
        Serializable cached, 
        Object owner
    );
    Object replace(
        Object original, 
        Object target, 
        Object owner
    );
}

The SPI looks rather difficult to implement. Probably, you can get something working rather quickly, but will you feel at ease? Will you think that you got it right? Some examples:

  • Is there never a case where you need the owner reference also in nullSafeSet()?
  • What if your JDBC driver doesn’t support fetching values by name from ResultSet?
  • What if you need to use your user type in a CallableStatement for a stored procedure?

Another important aspect of such SPIs is the way implementors can provide values back to the framework. It is generally a bad idea to have non-void methods in SPIs as you will never be able to change the return type of a method again. Ideally, you should have argument types that accept “outcomes”. A lot of the above methods could be replaced by a single configuration() method like this:

public interface UserType {
    default void configure(ConfigureArgs args) {}

    final class ConfigureArgs {
        public void sqlTypes(int[] types) { ... }
        public void returnedClass(Class<?> clazz) { ... }
        public void mutable(boolean mutable) { ... }
    }

    // ...
}

Another example, a SAX ContentHandler

Have a look at this example here:

public interface ContentHandler {
    void setDocumentLocator (Locator locator);
    void startDocument ();
    void endDocument();
    void startPrefixMapping (String prefix, String uri);
    void endPrefixMapping (String prefix);
    void startElement (String uri, String localName,
                       String qName, Attributes atts);
    void endElement (String uri, String localName,
                     String qName);
    void characters (char ch[], int start, int length);
    void ignorableWhitespace (char ch[], int start, int length);
    void processingInstruction (String target, String data);
    void skippedEntity (String name);
}

Some examples for drawbacks of this SPI:

  • What if you need the attributes of an element at the endElement() event? You’ll have to remember them yourself.
  • What if you’d like to know the prefix mapping uri at the endPrefixMapping() event? Or at any other event?

Clearly, SAX was optimised for speed, at a time when the JIT and the GC were still weak. Nonetheless, implementing a SAX handler is not trivial. Part of this is due to the SPI being hard to implement.
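
For example, to have an element’s attributes available again at the endElement() event, implementors have to stash them away themselves. A minimal sketch (the handler name is made up):

import java.util.ArrayDeque;
import java.util.Deque;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class RememberingHandler extends DefaultHandler {

    // The SPI won't remember attributes for us, so we
    // keep our own stack of defensive copies
    private final Deque<Attributes> stack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Attributes objects may be reused by the parser
        stack.push(new AttributesImpl(atts));
    }

    @Override
    public void endElement(String uri, String localName,
                           String qName) {
        Attributes atts = stack.pop();
        // ...the element's attributes are available again here
    }
}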

We don’t know the future

As API or SPI providers, we simply do not know the future. Right now, we may think that a given SPI is sufficient, only to find that we have to break it already in the next minor release. Or we don’t break it, and instead have to tell our users that we cannot implement their new feature requests.

With the above tricks, we can continue evolving our SPI without incurring any breaking changes:

  • Always pass exactly one argument object to the methods.
  • Always return void. Let implementors interact with SPI state via the argument object.
  • Use Java 8’s default methods, or provide an “empty” default implementation.
