Java 8’s Method References Put Further Restrictions on Overloading


Method overloading has always been a topic with mixed feelings. We’ve blogged about it and the caveats that it introduces a couple of times before.

There are two main reasons why overloading is useful:

  1. To allow for defaulted arguments
  2. To allow for disjunct argument type alternatives

Both reasons are motivated simply by the desire to provide convenience for API consumers. Good examples are easy to find in the JDK:

Defaulted arguments

public class Integer {
    public static int parseInt(String s) {
        return parseInt(s, 10);
    }

    public static int parseInt(String s, int radix) {}
}

In the above example, the first parseInt() method is simply a convenience method for calling the second one with the most commonly used radix.

Disjunct argument type alternatives

Sometimes, similar behaviour can be achieved using different types of parameters, which mean similar things but which are not compatible in Java’s type system. For example, when constructing a String:

public class String {
    public static String valueOf(char c) {
        char data[] = {c};
        return new String(data, true);
    }

    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }

    // and many more...
}

As you can see, the behaviour of the same method is optimised depending on the argument type. This does not affect the “feel” of the method when reading or writing source code, as the semantics of the two valueOf() methods are the same.

Another use-case for this technique is when commonly used, similar but incompatible types need convenient conversion between each other. As an API designer, you don’t want to make your API consumer goof around with such tedious conversions. Instead, you offer:

public class IOUtils {
    public static void copy(InputStream input, OutputStream output);
    public static void copy(InputStream input, Writer output);
    public static void copy(InputStream input, Writer output, String encoding);
    public static void copy(InputStream input, Writer output, Charset encoding);
}

This is a nice example showing both defaulted parameters (the optional encoding) as well as argument type alternatives (OutputStream vs. Writer, or String vs. Charset encoding representations).

Side-note

I suspect that the union type and defaulted argument ships have sailed for Java a long time ago – while union types might be implemented as syntax sugar, defaulted arguments would be a beast to introduce into the JVM as it would depend on the JVM’s missing support for named arguments.

As displayed by the Ceylon language, these two features cover about 99% of all method overloading use-cases, which is why Ceylon can do completely without overloading – on top of the JVM!

Overloading is dangerous and unnecessary

The above examples show that overloading is essentially just a means to help humans interact with an API. For the runtime, there is no such thing as overloading. There are only different, unique method signatures to which calls are linked “statically” in byte code (give or take more recent opcodes like invokedynamic). But the point is, there’s no difference for the computer if the above methods are all called copy(), or if they had been called unambiguously m1(), m2(), m3(), and m4().

On the other hand, overloading is real in Java source code, and the compiler has to do a lot of work to find the most specific method, or otherwise apply the JLS’s complex overload resolution algorithm. Things get worse with each new Java language version. In Java 8, for instance, method references add additional pain to API consumers, and require additional care from API designers. Consider the following simplification of an example by Josh Bloch:

You can copy-paste the code below into Eclipse to verify the compilation error (note that outdated compilers may report type inference side-effects instead of the actual error). The compilation error reported by Eclipse for this simplification:

static void pfc(List<Integer> x) {
    Stream<?> s = x.stream().map(Integer::toString);
}

… is

Ambiguous method reference: both toString() and 
toString(int) from the type Integer are eligible

Oops!

The above expression is ambiguous. It can mean either of the following two expressions:

// Instance method:
x.stream().map(i -> i.toString());

// Static method:
x.stream().map(i -> Integer.toString(i));

As can be seen, the ambiguity is immediately resolved by using lambda expressions rather than method references. Another way to resolve this ambiguity (towards the instance method) would be to use the super-type declaration of toString() instead, which is no longer ambiguous:

// Instance method:
x.stream().map(Object::toString);

Conclusion

The conclusion here for API designers is very clear:

Method overloading has become an even more dangerous tool for API designers since Java 8

While the above isn’t really “severe”, API consumers will waste a lot of time overcoming this cognitive friction when their compilers reject seemingly correct code. One big faux-pas to take away from this example is:

Never mix similar instance and static method overloads

And in fact, this amplifies when your static method overload overloads a name from java.lang.Object, as we’ve explained in a previous blog post.
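
To see why, here is a hypothetical API that violates the rule and reproduces the Integer.toString() clash (all names in this sketch are made up):

import java.util.List;
import java.util.stream.Stream;

public class Temperature {
    final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Instance method
    public String render() {
        return celsius + "°C";
    }

    // Static overload, eligible for the same method reference
    public static String render(Temperature t) {
        return t.render();
    }

    static Stream<String> renderAll(List<Temperature> temperatures) {
        // Compilation error: "Ambiguous method reference: both render()
        // and render(Temperature) from the type Temperature are eligible"
        // return temperatures.stream().map(Temperature::render);

        // Unambiguous, using a lambda expression instead:
        return temperatures.stream().map(t -> t.render());
    }
}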

There’s a simple reason for the above rule. Because there are only two valid reasons for overloading (defaulted parameters and incompatible parameter alternatives), there is no point in providing a static overload for a method in the same class. A much better design (as exposed by the JDK) is to have “companion classes” – similar to Scala’s companion objects. For instance:

// Instance logic
public interface Collection<E> {}
public class Object {}

// Utilities
public class Collections {}
public final class Objects {}

By changing the namespace for methods, overloading has been circumvented somewhat elegantly, and the previous problems would not have appeared.

TL;DR: Avoid overloading unless the added convenience really adds value!

NULL is Not The Billion Dollar Mistake. A Counter-Rant


A short while ago, I gave this answer on Quora. The question was “What is the significance of NULL in SQL?” and most of the existing answers went on about citing C.J. Date or Tony Hoare and unanimously declared NULL as “evil”.

So, everyone rants about NULL all the time. Let me counter-rant.

Academics

Of course, academics like C.J. Date will rant about NULL (see Greg Kemnitz’s interesting answer on Quora). Let me remind you that C.J. Date also ranted about UNION ALL, as pure relational theory operates only on sets, not on bags (like SQL does). While in theory, sets are probably much purer than bags, in practice, bags are just very useful.

These people probably also still mourn over the fact that SQL (useful) won over QUEL (pure), and I don’t blame them. Theory is always more beautiful than the real world, which is exposed to real world requirements.

Purists

There are also other kinds of purists who will run around and educate everyone about their black/white opinions that leave no room for “it depends…” pragmatic approaches. I like to display this witty comic strip for such occasions: New intern knows best: GOTO. Purists like extreme abstraction when they describe their world, and such abstraction asks for very simple models with no complexity. NULL adds tremendous complexity to the SQL “model”, and thus does not fit their view.

Fact is: It depends

The only factual opinion ever is one where there’s no clear opinion. NULL is an incredibly useful value, and some representation of NULL is inevitable in all languages / models that want to model cardinalities of the form:

  • 0 or 1 (here’s where NULL is useful)
  • exactly 1 (here, you don’t need NULL)
  • 0 .. many (here, you don’t need NULL)

Functional programming languages like to make use of the Optional “monad” (see Mario Fusco’s excellent explanation of what a monad is) to model the 0 or 1 cardinality, but that’s just another way of modelling NULL – the (possibly) absent value. Perhaps, if you like to discuss style (then you should read this), NULL vs. Optional may matter to you, but they’re really exactly the same thing. We’ve just been shifting whitespace and curly braces.
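
A minimal Java sketch of that equivalence (variable names are made up):

import java.util.Optional;

// Two spellings of the same "0 or 1" cardinality:
String middleName1 = null;                        // nullable reference
Optional<String> middleName2 = Optional.empty();  // explicit optional

// Both force the caller to handle absence; only the syntax differs:
int length1 = middleName1 != null ? middleName1.length() : 0;
int length2 = middleName2.map(String::length).orElse(0);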

The only way to really do without the absent value would be to disallow the optional cardinality and use 0 .. many instead, which would be much less descriptive.

So, regardless of what purists or academics say about a perfect world, we engineers need potent tools that help us get our work done, and NULL (or “Optional”) is one of these potent tools that allow us to do so.

Caveat: SQL NULL is not an absent value

Now, the caveat with SQL’s NULL is that it doesn’t behave like an absent value. It is the UNKNOWN value, as others have also explained. This subtle difference has severe impact on a variety of operations and predicates, which do not behave very intuitively if you’re not aware of this distinction. For example, the predicate NULL = NULL evaluates to NULL, not to TRUE, and a NOT IN predicate over a set containing NULL can never evaluate to TRUE (and there are many, many more such examples).

Even with this specification of SQL NULL being UNKNOWN, most people abuse SQL NULL to model the absent value instead, which works just nicely in most cases until you run into a caveat. It turns out that the UNKNOWN value is even more useful than the absent value, as it allows for modelling things with even more descriptiveness. One might think that having two “special” values would solve problems, like JavaScript, which distinguishes between null (UNKNOWN) and undefined (absent).

JavaScript itself is a beacon of usefulness that is inversely proportional to its purity or beauty, so long story short:

Pick your favourite spot on the useful <-> pure scale

Programming, languages, data models are always a tradeoff between purity and usefulness. Pick your favourite spot on that scale, but stop ranting about NULL being evil. Or as Simon Peyton Jones said:

Haskell is useless

What the sun.misc.Unsafe Misery Teaches Us


Oracle will remove the internal sun.misc.Unsafe class in Java 9. While most people are probably rather indifferent regarding this change, some other people – mostly library developers – are not. There have been a couple of recent articles in the blogosphere painting a dark picture of what this change will imply.

Maintaining a public API is extremely difficult, especially when the API is as popular as that of the JDK. There is simply (almost) no way to keep people from shooting themselves in the foot. Oracle (and previously Sun) have always declared the sun.* packages as internal and not to be used. Citing from the page called “Why Developers Should Not Write Programs That Call ‘sun’ Packages”:

The sun.* packages are not part of the supported, public interface.

A Java program that directly calls into sun.* packages is not guaranteed to work on all Java-compatible platforms. In fact, such a program is not guaranteed to work even in future versions on the same platform.

This disclaimer is just one out of many similar disclaimers and warnings. Whoever goes ahead and uses Unsafe does so… “unsafely”.

What do we learn from this?

The concrete solution to solving this misery is being discussed and still open. A good idea would be to provide a formal and public replacement before removing Unsafe, in order to allow for migration paths of the offending libraries.

But there’s a more important message to all of this. The message is:

When all you have is a hammer, every problem looks like a thumb

Translated to this situation: the hammer is Unsafe, and given that it’s a very poor hammer – but the only option – library developers just didn’t have much of a choice. They’re not really to blame. In fact, they took a gamble in one of the world’s most stable and backwards-compatible software environments (= Java) and they fared extremely well for more than 10 years. Would you have made a different choice in a similar situation? Or, let me ask differently: was betting on AWT or Swing a much safer choice at the time?

If something can somehow be used by someone, then it will be, no matter how obviously they’re gonna shoot themselves in the foot. The only way to currently write a library / API and really prevent users from accessing internals is to put everything in a single package and make everything package-private. This is what we’ve been doing in jOOQ from the beginning, knowing that jOOQ’s internals are extremely delicate and subject to change all the time.
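
A minimal sketch of that style (package and class names are made up):

// File: org/example/lib/Library.java
package org.example.lib;

// The single public entry point of the library:
public final class Library {
    private Library() {}

    public static String doWork() {
        return new InternalEngine().run();
    }
}

// File: org/example/lib/InternalEngine.java
package org.example.lib;

// Package-private: invisible outside org.example.lib,
// and thus free to change at any time:
final class InternalEngine {
    String run() {
        return "done";
    }
}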

For more details about this rationale, read also our previous posts on the topic.

However, this solution has a severe drawback for those developing those internals. It’s a hell of a package with almost no structure. That makes development rather difficult.

What would be a better Java, then?

Java has always had an insufficient set of visibilities:

  • public
  • protected
  • default (package-private)
  • private

There should be a fifth visibility that behaves like public but prevents access from “outside” of a module. In a way, that’s between the existing public and default visibilities. Let’s call this the hypothetical module visibility.

In fact, not only should we be able to declare this visibility on a class or member, we should be able to govern module inter-dependencies on a top level, just like the Ceylon language allows us to do:

module org.hibernate "3.0.0.beta" {
    import ceylon.collection "1.0.0";
    import java.base "7";
    shared import java.jdbc "7";
}

This reads very similar to OSGi’s bundle system, where bundles can be imported / exported, although the above module syntax is much much simpler than configuring OSGi.

A sophisticated module system would go even further. Not only would it match OSGi’s features, it would also match those of Maven. With the possibility of declaring dependencies on a Java language module basis, we might no longer need the XML-based Maven descriptors, as those could be generated from a simple module syntax (or Gradle, or ant/ivy).

And with all of this in place, classes like sun.misc.Unsafe could be declared as module-visible for only a few JDK modules – not the whole world. I’m sure the number of people abusing reflection to get a hold of those internals would decrease by 50%.
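
Incidentally, that is roughly the direction Project Jigsaw’s qualified exports take. A sketch of what such a declaration might look like (the module names here are hypothetical):

// module-info.java
module jdk.internal.misc {
    // sun.misc would be visible to the named friend modules only,
    // not to the whole world:
    exports sun.misc to java.base, jdk.management;
}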

Conclusion

I do hope that in a future Java, this Ceylon language feature (and also Fantom language feature, btw) will be incorporated into the Java language. A nice overview of Java 9 / Jigsaw’s modular encapsulation can be seen in this blog post:
http://blog.codefx.org/java/dev/features-project-jigsaw-java-9/#Encapsulation

Until then, if you’re an API designer, do know that all disclaimers won’t work. Your internal APIs will be used and abused by your clients. They’re part of your ordinary public API from day 1 after you publish them. It’s not your user’s fault. That’s how things work.

What’s Even Harder Than Dates and Timezones? Dates and Timezones in SQL / JDBC!


There was an interesting discussion recently on the jOOQ mailing list about jOOQ’s current lack of out-of-the-box support for TIMESTAMP WITH TIME ZONE data types.

No one said that date, time and timezones are easy! There’s an amusing piece here, which I recommend reading:
Falsehoods programmers believe about time

And when that’s not enough, read also:
More falsehoods programmers believe about time

I personally like the bit about programmers erroneously believing that “Unix time is the number of seconds since Jan 1st 1970”… Unix time doesn’t have a way to represent leap seconds ;)

Back to JDBC

Here’s an interesting Stack Overflow answer by Mark Rotteveel, the Jaybird developer (Firebird JDBC driver):
Is java.sql.Timestamp timezone specific?

Mark’s explanation can be observed as follows (I’m using PostgreSQL here):

Connection c = getConnection();
Calendar utc = Calendar.getInstance(
    TimeZone.getTimeZone("UTC"));

try (PreparedStatement ps = c.prepareStatement(
    "select"
  + "  ?::timestamp,"
  + "  ?::timestamp,"
  + "  ?::timestamp with time zone,"
  + "  ?::timestamp with time zone"
)) {

    ps.setTimestamp(1, new Timestamp(0));
    ps.setTimestamp(2, new Timestamp(0), utc);
    ps.setTimestamp(3, new Timestamp(0));
    ps.setTimestamp(4, new Timestamp(0), utc);

    try (ResultSet rs = ps.executeQuery()) {
        rs.next();

        System.out.println(rs.getTimestamp(1) 
                 + " / " + rs.getTimestamp(1).getTime());
        System.out.println(rs.getTimestamp(2, utc)
                 + " / " + rs.getTimestamp(2, utc).getTime());
        System.out.println(rs.getTimestamp(3) 
                 + " / " + rs.getTimestamp(3).getTime());
        System.out.println(rs.getTimestamp(4, utc)
                 + " / " + rs.getTimestamp(4, utc).getTime());
    }
}

The above program uses all permutations of using timezones and not using timezones in Java and in the DB, and the output is always the same:

1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0

As you can see, in each case, the UTC timestamp 0 was correctly stored in and retrieved from the database. My own timezone is Switzerland’s, thus CET / CEST, which was UTC+1 at Epoch, and that’s what gets output by Timestamp.toString().
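
That timezone dependence of Timestamp.toString() is easy to reproduce in isolation (a minimal, self-contained sketch):

import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampToString {
    public static void main(String[] args) {
        // The same UTC instant (Epoch), rendered in two default timezones:
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Zurich"));
        System.out.println(new Timestamp(0)); // 1970-01-01 01:00:00.0

        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        System.out.println(new Timestamp(0)); // 1970-01-01 00:00:00.0
    }
}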

Things get interesting when you use timestamp literals, both in SQL and/or in Java. If you replace the bind variables as such:

Timestamp almostEpoch = Timestamp.valueOf("1970-01-01 00:00:00");

ps.setTimestamp(1, almostEpoch);
ps.setTimestamp(2, almostEpoch, utc);
ps.setTimestamp(3, almostEpoch);
ps.setTimestamp(4, almostEpoch, utc);

This is what I’m getting on my machine, again in CET / CEST:

1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000
1970-01-01 00:00:00.0 / -3600000

I.e. not Epoch, but the timestamp literal that I sent to the server in the first place. Observe that the four combinations of binding / fetching still always produce the same timestamp.

Let’s see what happens if the session writing to the database uses a different timezone (let’s assume you’re in PST) than the session fetching from the database (I’m using again CET or UTC). I’m running this program:

Calendar utc = Calendar.getInstance(
    TimeZone.getTimeZone("UTC"));

Calendar pst = Calendar.getInstance(
    TimeZone.getTimeZone("PST"));

try (PreparedStatement ps = c.prepareStatement(
    "select"
  + "  ?::timestamp,"
  + "  ?::timestamp,"
  + "  ?::timestamp with time zone,"
  + "  ?::timestamp with time zone"
)) {

    ps.setTimestamp(1, new Timestamp(0), pst);
    ps.setTimestamp(2, new Timestamp(0), pst);
    ps.setTimestamp(3, new Timestamp(0), pst);
    ps.setTimestamp(4, new Timestamp(0), pst);

    try (ResultSet rs = ps.executeQuery()) {
        rs.next();

        System.out.println(rs.getTimestamp(1)
                 + " / " + rs.getTimestamp(1).getTime());
        System.out.println(rs.getTimestamp(2, utc)
                 + " / " + rs.getTimestamp(2, utc).getTime());
        System.out.println(rs.getTimestamp(3)
                 + " / " + rs.getTimestamp(3).getTime());
        System.out.println(rs.getTimestamp(4, utc)
                 + " / " + rs.getTimestamp(4, utc).getTime());
    }
}

It yields this output:

1969-12-31 16:00:00.0 / -32400000
1969-12-31 17:00:00.0 / -28800000
1970-01-01 01:00:00.0 / 0
1970-01-01 01:00:00.0 / 0

The first timestamp was Epoch stored as PST (16:00), then the timezone information was removed by the database, which turned Epoch into the local time you had at Epoch (-28800 seconds / -8h), and that’s the information that is really stored.

Now, when I’m fetching this time from my own timezone CET, I will still want to get the local time (16:00). But in my timezone, this is no longer -28800 seconds, but -32400 seconds (-9h). Quirky enough?

Things go the other way round when I’m fetching the stored local time (16:00), but I’m forcing the fetch to happen in UTC, which will produce the timestamp that you’ve stored, originally in PST (-28800 seconds). But when printing this timestamp (-28800 seconds) in my timezone CET, this will now be 17:00.

When we use the TIMESTAMP WITH TIME ZONE data type in the database, the timezone is maintained (PST), and when I fetch the Timestamp value, no matter if using CET or UTC, I will still get Epoch, which was safely stored to the database, printed out as 01:00 in CET.

Whew.

TL;DR:

When using jOOQ, if the correct UTC timestamp matters to you, use TIMESTAMP WITH TIME ZONE, but you’ll have to implement your own data type Binding, because jOOQ currently doesn’t support that data type. With your own data type Binding, you can also use Java 8’s time API, which represents these different types better than java.sql.Timestamp plus the ugly Calendar.

If the local time matters to you, or if you’re not operating across time zones, you’re fine using TIMESTAMP and jOOQ’s Field<Timestamp>.

Lucky you, if you’re like me, operating in a very small country with a single time zone where most local software just doesn’t run into this issue.

How to Debug Your Maven Build with Eclipse


When running a Maven build with many plugins (e.g. the jOOQ or Flyway plugins), you may want to have a closer look under the hood to see what’s going on internally in those plugins, or in your extensions of those plugins. This may not appear obvious when you’re running Maven from the command line, e.g. via:

C:\Users\jOOQ\workspace>mvn clean install

Luckily, it is rather easy to debug Maven. In order to do so, just create the following batch file on Windows (called mvn_debug.bat here, as that’s how we’ll invoke it below):

@ECHO OFF

IF "%1" == "off" (
    SET MAVEN_OPTS=
) ELSE (
    SET MAVEN_OPTS=-Xdebug -Xnoagent -Djava.compile=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005
)

Of course, you can do the same also on a Mac OS X or Linux box, by using export instead of SET.

Now, run the above batch file and proceed again with building:

C:\Users\jOOQ\workspace>mvn_debug

C:\Users\jOOQ\workspace>mvn clean install
Listening for transport dt_socket at address: 5005

Your Maven build will now wait for a debugger client to connect to your JVM on port 5005 (change to any other suitable port). We’ll do that now with Eclipse. Just add a new Remote Java Application that connects on a socket, and hit “Debug”:

(Screenshot: Eclipse “Remote Java Application” debug configuration, attaching to localhost on port 5005)

That’s it. We can now set breakpoints and debug through our Maven process like through any other similar kind of server process. Of course, things work exactly the same way with IntelliJ or NetBeans.

Once you’re done debugging your Maven process, simply call the batch again with parameter off:

C:\Users\jOOQ\workspace>mvn_debug off

C:\Users\jOOQ\workspace>mvn clean install

And your Maven builds will no longer be debugged.

Happy debugging!

Implementing Client-Side Row-Level Security with jOOQ


Some time ago, we promised to follow up on our Constraints on Views article with a sequel showing how to implement client-side row-level security with jOOQ.

What is row-level security?

Some databases like Oracle or the upcoming PostgreSQL 9.5 provide native support for row-level security, which is awesome – but not every database has this feature.

Row-level security essentially means that a given database session can access only some rows in the database, but not others. For instance, this is what John can see in his session:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
3   John           secret poker stash  85193065.00

In the above example, assume that John and Jane are a married couple with access to each others’ bank accounts, except that John wants to hide his secret poker stash from Jane because he’s planning on running away with Jill to the Bahamas living the good life. So, the above would model John’s view on the data set, whereas the below would model Jane’s:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00

The nice thing about row-level security is the fact that the data is completely hidden from a database session, as if it weren’t even there. This includes aggregations or filtering, as can be seen in the below examples of John’s view:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
3   John           secret poker stash  85193065.00
--------------------------------------------------
Total John+Jane                        85194865.00

vs. Jane’s view:

id  account_owner  account_name             amount
--------------------------------------------------
1   John           savings                  500.00
2   Jane           savings                 1300.00
--------------------------------------------------
Total John+Jane                            1800.00

If your database doesn’t support row-level security, you will need to emulate it somehow. One way to do that is by using views, and possibly, temporary tables or some context variables:

CREATE VIEW my_accounts AS
SELECT
  a.id,
  a.account_owner,
  a.account_name,
  a.amount
FROM
  accounts a
JOIN
  account_privileges p
ON
  a.id = p.a_id
WHERE
  p.privilege_owner = SYS_CONTEXT('banking', 'user');

In the above example, I’m using Oracle’s SYS_CONTEXT function, which allows accessing global context from the current session. This context could be initialised every time a JDBC Connection is fetched from the connection pool, for instance.
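
For instance, the initialisation on connection checkout could look like this (a sketch, assuming a PL/SQL procedure banking_pkg.set_user that wraps DBMS_SESSION.SET_CONTEXT for the 'banking' namespace – all of these names are hypothetical):

// Initialise the session context right after borrowing a connection
// from the pool:
try (CallableStatement call = connection.prepareCall(
        "{ call banking_pkg.set_user(?) }")) {
    call.setString(1, currentUserName);
    call.execute();
}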

What if your database doesn’t support these things?

jOOQ to the rescue! Since jOOQ 3.2, a VisitListener Service Provider Interface (SPI) has been introduced for precisely this reason, which allows jOOQ users to perform SQL AST transformations while the SQL statement is being generated.

We’re assuming:

CREATE TABLE accounts (
  id BIGINT NOT NULL PRIMARY KEY,
  account_owner VARCHAR(20) NOT NULL,
  account_name VARCHAR(20) NOT NULL,
  amount DECIMAL(18, 2) NOT NULL
);

CREATE TABLE transactions (
  id BIGINT NOT NULL PRIMARY KEY,
  account_id BIGINT NOT NULL,
  amount DECIMAL(18, 2) NOT NULL
);

INSERT INTO accounts (
  id, account_owner, account_name, amount
)
VALUES (1, 'John', 'savings', 500.0),
       (2, 'Jane', 'savings', 1300.0),
       (3, 'John', 'secret poker stash', 85193065.00);

INSERT INTO transactions (
  id, account_id, amount
)
VALUES (1, 1, 200.0),
       (2, 1, 300.0),
       (3, 2, 300.0),
       (4, 2, 800.0),
       (5, 2, 200.0),
       (6, 3, 85193065.00);

Adding a simple WHERE clause

This is the simplest transformation use-case:

-- turn this...
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts

-- ... into this
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.id IN (?, ?) -- Predicate, for simplicity

Adding an AND clause to an existing WHERE clause

The transformation should still work if there is already an existing predicate:

-- turn this...
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.account_owner = 'John'

-- ... into this
SELECT 
  accounts.account_name, 
  accounts.amount
FROM 
  accounts
WHERE 
  accounts.account_owner = 'John'
AND 
  accounts.id IN (?, ?)

Adding the predicate also for aliased tables

When doing self-joins, or when you have applied an alias for whatever other reason:

-- turn this...
SELECT 
  a.account_name, 
  a.amount
FROM 
  accounts a

-- ... into this
SELECT 
  a.account_name, 
  a.amount
FROM 
  accounts a
WHERE 
  a.id IN (?, ?)

Adding the predicate also in subqueries / joins / semi-joins

Of course, we shouldn’t restrict ourselves to patching top-level SELECT statements. The following transformation must be applied as well:

-- turn this...
SELECT 
  t.amount,
  t.balance
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
  )

-- ... into this
SELECT 
  t.amount,
  t.balance
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
    AND
      a.id IN (?, ?)
  )

Adding the predicate to foreign key references

This might be easily forgotten, but in fact, we also want to add an additional predicate to all foreign key references of accounts.id, namely transactions.account_id, especially when the transactions table is not joined to the accounts table:

-- turn this...
SELECT 
  t.amount,
  t.balance
FROM 
  transactions t

-- ... into this
SELECT 
  t.amount,
  t.balance
FROM 
  transactions t
WHERE
  t.account_id IN (?, ?)

Long story short, we would like to find all queries that refer to the accounts table in some form, and add a simple predicate to it, implementing access control.

DISCLAIMER: As always with security, you should implement security on several layers. SQL AST transformation is not trivial, and the above scenarios are incomplete. Besides, they work only for queries that are built using the jOOQ AST, not for plain SQL queries, or for queries that are run via JDBC directly, via Hibernate, or via stored procedures, views (which in turn refer to the accounts table), or simple table synonyms.

So, read this post as a tutorial showing how to perform AST transformation, not as a complete solution to row-level security

How to do it with jOOQ?

Now comes the interesting part. We’re going to do the whole thing with jOOQ. First off, remember that in order to implement backwards-compatible SPI evolution, we always provide a default implementation for our SPIs. In this case, we’re going to extend DefaultVisitListener instead of implementing VisitListener directly.

The base of our VisitListener will be the following:

public class AccountIDFilter extends DefaultVisitListener {

    final Integer[] ids;

    public AccountIDFilter(Integer... ids) {
        this.ids = ids;
    }
}

In other words, a filtering listener that filters for a given set of IDs to be put in an IN predicate.

Now, first off, we’ll need a bit of utility methods. The following two utilities push or pop some objects on top of a stack:

void push(VisitContext context) {
    conditionStack(context).push(new ArrayList<>());
    whereStack(context).push(false);
}

void pop(VisitContext context) {
    whereStack(context).pop();
    conditionStack(context).pop();
}

… and the stack can be seen here:

Deque<List<Condition>> conditionStack(
        VisitContext context) {
    Deque<List<Condition>> data = (Deque<List<Condition>>) 
        context.data("conditions");

    if (data == null) {
        data = new ArrayDeque<>();
        context.data("conditions", data);
    }

    return data;
}

Deque<Boolean> whereStack(VisitContext context) {
    Deque<Boolean> data = (Deque<Boolean>) 
        context.data("predicates");

    if (data == null) {
        data = new ArrayDeque<>();
        context.data("predicates", data);
    }

    return data;
}

In prose, the conditionStack maintains a stack of conditions for each subquery, whereas the whereStack maintains a stack of flags for each subquery. The conditions are the conditions that should be generated in the WHERE clause of the given subquery, whereas the flags indicate whether a WHERE clause is already present (i.e. whether new conditions should be appended using AND, rather than WHERE).

For convenience, we’ll also add the following utilities:

List<Condition> conditions(VisitContext context) {
    return conditionStack(context).peek();
}

boolean where(VisitContext context) {
    return whereStack(context).peek();
}

void where(VisitContext context, boolean value) {
    whereStack(context).pop();
    whereStack(context).push(value);
}

These utilities allow for accessing both the conditions and flags at the top of the stack (in the current subquery), as well as for replacing the flag at the top of the stack.

Why do we need a stack?

It’s simple. We want to apply the predicate only locally for the current subquery, while jOOQ transforms and generates your SQL statement. Remember, when we were transforming the following:

SELECT 
  t.amount,
  t.balance
FROM 
  transactions t
WHERE
  t.account_id IN (
    SELECT 
      a.id 
    FROM 
      accounts a
    WHERE 
      a.account_owner = 'John'
    AND
      a.id IN (?, ?)
  )
AND
  t.account_id IN (?, ?)

… in the end, we want two additional predicates generated in the above query. One in the subquery selecting from accounts, and another one in the top-level query selecting from transactions, but the two predicates shouldn’t interfere with each other, i.e. when jOOQ generates the subquery, we only want to see objects that are relevant to the subquery (top of the stack).

So, let’s see how and when we push stuff on the stack. First off, we need to listen for start and end events of SQL clauses:

Starting a SQL clause

This is straight-forward. Every time we enter a new SQL statement, we want to push a new set of data (conditions, flags) on the stack. In a way, we’re creating a local scope for the subquery:

@Override
public void clauseStart(VisitContext context) {

    // Enter a new SELECT clause / nested select
    // or DML statement
    if (context.clause() == SELECT ||
        context.clause() == UPDATE ||
        context.clause() == DELETE ||
        context.clause() == INSERT) {
        push(context);
    }
}

Of course, this scope has to be cleaned up at the end of the same clause:

@Override
public void clauseEnd(VisitContext context) {
    // [ ... more code will follow ... ]

    // Leave a SELECT clause / nested select
    // or DML statement
    if (context.clause() == SELECT ||
        context.clause() == UPDATE ||
        context.clause() == DELETE ||
        context.clause() == INSERT) {
        pop(context);
    }
}

This was the easy part. Now, it gets a bit more interesting. When we end a clause, and that clause is a WHERE clause of a SELECT, UPDATE, or DELETE statement, then we want to render an additional keyword and predicate:

@Override
public void clauseEnd(VisitContext context) {

    // Append all collected predicates to the WHERE
    // clause if any
    if (context.clause() == SELECT_WHERE ||
        context.clause() == UPDATE_WHERE ||
        context.clause() == DELETE_WHERE) {
        List<Condition> conditions = 
            conditions(context);

        if (conditions.size() > 0) {
            context.context()
                   .formatSeparator()
                   .keyword(where(context) 
                   ? "and" 
                   : "where"
                   )
                   .sql(' ');

            context.context().visit(
                DSL.condition(Operator.AND, conditions)
            );
        }
    }

    // [ ... code from previous snippet ... ]
}

So, the only thing we still need to do is assemble that List<Condition> on the stack of the current query, such that we can append it to the WHERE clause, as well as the flag that decides between "and" and "where". This can be done by overriding visitEnd(), a method that is invoked at the end of visiting a QueryPart (as opposed to a Clause):

@Override
public void visitEnd(VisitContext context) {

    // We'll see what this means in a bit...
    pushConditions(context, ACCOUNTS, 
        ACCOUNTS.ID, ids);
    pushConditions(context, TRANSACTIONS, 
        TRANSACTIONS.ACCOUNT_ID, ids);

    // Check if we're rendering any condition within
    // the WHERE clause In this case, we can be sure
    // that jOOQ will render a WHERE keyword
    if (context.queryPart() instanceof Condition) {
        List<Clause> clauses = clauses(context);

        if (clauses.contains(SELECT_WHERE) ||
            clauses.contains(UPDATE_WHERE) ||
            clauses.contains(DELETE_WHERE)) {
            where(context, true);
        }
    }
}

List<Clause> clauses(VisitContext context) {
    List<Clause> result = asList(context.clauses());
    int index = result.lastIndexOf(SELECT);

    if (index > 0)
        return result.subList(index, result.size() - 1);
    else
        return result;
}

At the end of each query part…

  • we’re trying to push the relevant conditions on the stack if applicable
  • we’re checking to see if a WHERE clause is present and set the relevant flag

So, finally, the core logic contained in pushConditions() is the only thing missing, and that’s:

<E> void pushConditions(
        VisitContext context, 
        Table<?> table, 
        Field<E> field, 
        E... values) {

    // Check if we're visiting the given table
    if (context.queryPart() == table) {
        List<Clause> clauses = clauses(context);

        // ... and if we're in the context of the current
        //  subselect's FROM clause
        if (clauses.contains(SELECT_FROM) ||
            clauses.contains(UPDATE_UPDATE) ||
            clauses.contains(DELETE_DELETE)) {

            // If we're declaring a TABLE_ALIAS...
            // (e.g. "ACCOUNTS" as "a")
            if (clauses.contains(TABLE_ALIAS)) {
                QueryPart[] parts = context.queryParts();

                // ... move up the QueryPart visit path to find the
                // defining aliased table, and extract the aliased
                // field from it. (i.e. the "a" reference)
                for (int i = parts.length - 2; i >= 0; i--) {
                    if (parts[i] instanceof Table) {
                        field = ((Table<?>) parts[i]).field(field);
                        break;
                    }
                }
            }

            // Push a condition for the field of the
            // (potentially aliased) table
            conditions(context).add(field.in(values));
        }
    }
}

And we’re done! Whew – well, that certainly wasn’t too easy. AST transformation never is. But the above algorithm is robust and can be used to run all of the aforementioned queries.

Testing the above

Configuration configuration = create().configuration();

// This configuration has full access to all rows
DSLContext fullaccess = DSL.using(configuration);

// This configuration has restricted access to IDs 1 and 2
DSLContext restricted = DSL.using(
    configuration.derive(
        DefaultVisitListenerProvider.providers(
            new AccountIDFilter(1, 2)
        )
    )
);

// Fetching accounts
assertEquals(3, fullaccess.fetch(ACCOUNTS).size());
assertEquals(2, restricted.fetch(ACCOUNTS).size());

Generating the following SQL:

select 
  "PUBLIC"."ACCOUNTS"."ID", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_NAME", 
  "PUBLIC"."ACCOUNTS"."AMOUNT"
from "PUBLIC"."ACCOUNTS"
---------------------------------------
select 
  "PUBLIC"."ACCOUNTS"."ID", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER", 
  "PUBLIC"."ACCOUNTS"."ACCOUNT_NAME", 
  "PUBLIC"."ACCOUNTS"."AMOUNT"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ID" in (
  1, 2
)

// Fetching transactions
assertEquals(6, fullaccess.fetch(TRANSACTIONS).size());
assertEquals(5, restricted.fetch(TRANSACTIONS).size());

Generating the following SQL:

select 
  "PUBLIC"."TRANSACTIONS"."ID", 
  "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID", 
  "PUBLIC"."TRANSACTIONS"."AMOUNT"
from "PUBLIC"."TRANSACTIONS"
---------------------------------------
select 
  "PUBLIC"."TRANSACTIONS"."ID", 
  "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID", 
  "PUBLIC"."TRANSACTIONS"."AMOUNT"
from "PUBLIC"."TRANSACTIONS"
where "PUBLIC"."TRANSACTIONS"."ACCOUNT_ID" in (
  1, 2
)
// Fetching John's accounts
assertEquals(asList(1, 3), fullaccess.fetchValues(
    select(ACCOUNTS.ID)
    .from(ACCOUNTS)
    .where(ACCOUNTS.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));
assertEquals(asList(1   ), restricted.fetchValues(
    select(ACCOUNTS.ID)
    .from(ACCOUNTS)
    .where(ACCOUNTS.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));

Generating the following SQL:

select "PUBLIC"."ACCOUNTS"."ID"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER" = 'John'
order by 1 asc
---------------------------------------
select "PUBLIC"."ACCOUNTS"."ID"
from "PUBLIC"."ACCOUNTS"
where "PUBLIC"."ACCOUNTS"."ACCOUNT_OWNER" = 'John'
and "PUBLIC"."ACCOUNTS"."ID" in (
  1, 2
)
order by 1 asc
// Fetching John's accounts via an aliased table
Accounts a = ACCOUNTS.as("a");
assertEquals(asList(1, 3), fullaccess.fetchValues(
    select(a.ID)
    .from(a)
    .where(a.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));
assertEquals(asList(1   ), restricted.fetchValues(
    select(a.ID)
    .from(a)
    .where(a.ACCOUNT_OWNER.eq("John"))
    .orderBy(1)
));

Generating the following SQL:

select "a"."ID"
from "PUBLIC"."ACCOUNTS" "a"
where "a"."ACCOUNT_OWNER" = 'John'
order by 1 asc
---------------------------------------
select "a"."ID"
from "PUBLIC"."ACCOUNTS" "a"
where "a"."ACCOUNT_OWNER" = 'John'
and "a"."ID" in (
  1, 2
)
order by 1 asc
// Fetching John's transactions
Transactions t = TRANSACTIONS.as("t");
assertEquals(asList(1, 2, 6), fullaccess.fetchValues(
    select(t.ID)
    .from(t)
    .where(t.ACCOUNT_ID.in(
        select(a.ID)
        .from(a)
        .where(a.ACCOUNT_OWNER.eq("John"))
    ))
    .orderBy(1)
));
assertEquals(asList(1, 2   ), restricted.fetchValues(
    select(t.ID)
    .from(t)
    .where(t.ACCOUNT_ID.in(
        select(a.ID)
        .from(a)
        .where(a.ACCOUNT_OWNER.eq("John"))
    ))
    .orderBy(1)
));

Generating the following SQL:

select "t"."ID"
from "PUBLIC"."TRANSACTIONS" "t"
where "t"."ACCOUNT_ID" in (
  select "a"."ID"
  from "PUBLIC"."ACCOUNTS" "a"
  where "a"."ACCOUNT_OWNER" = 'John'
)
order by 1 asc
---------------------------------------
select "t"."ID"
from "PUBLIC"."TRANSACTIONS" "t"
where "t"."ACCOUNT_ID" in (
  select "a"."ID"
  from "PUBLIC"."ACCOUNTS" "a"
  where "a"."ACCOUNT_OWNER" = 'John'
  and "a"."ID" in (
    1, 2
  )
)
and "t"."ACCOUNT_ID" in (
  1, 2
)
order by 1 asc

Conclusion

The above examples have shown how row-level security can be implemented relatively easily using jOOQ’s VisitListener, a very powerful Service Provider Interface for client-side SQL AST transformation.

The applications don’t stop there. It is easy to see how you can implement a VisitListener that throws an exception every time you run a DML query that does not have a WHERE clause.
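
Here’s a rough sketch of that idea, following the same patterns as the listener above (deliberately ignoring subqueries and other corner cases):

import static java.util.Arrays.asList;
import static org.jooq.Clause.DELETE;
import static org.jooq.Clause.DELETE_WHERE;
import static org.jooq.Clause.UPDATE;
import static org.jooq.Clause.UPDATE_WHERE;

import org.jooq.Condition;
import org.jooq.VisitContext;
import org.jooq.exception.DataAccessException;
import org.jooq.impl.DefaultVisitListener;

public class RequireWhereListener extends DefaultVisitListener {

    @Override
    public void visitEnd(VisitContext context) {
        // Remember that some condition was rendered inside a WHERE clause
        if (context.queryPart() instanceof Condition
                && (asList(context.clauses()).contains(UPDATE_WHERE)
                ||  asList(context.clauses()).contains(DELETE_WHERE)))
            context.data("has-where", true);
    }

    @Override
    public void clauseEnd(VisitContext context) {
        // At the end of the DML statement, fail if no condition was seen
        if ((context.clause() == UPDATE || context.clause() == DELETE)
                && context.data("has-where") == null)
            throw new DataAccessException(
                "UPDATE or DELETE without WHERE clause");
    }
}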

Another application would be to replace a table by a similar table, whenever a certain condition is true.

And the best thing is: with jOOQ, you don’t need to parse SQL in order to transform it (which is extremely hard, depending on the SQL dialect). You already build the Abstract Syntax Tree manually using jOOQ’s fluent API, so you get all these features for free.

So: Happy SQL transformation!

jOOQ: The best way to write SQL in Java

Appendix: Full code

import static java.util.Arrays.asList;
import static org.jooq.Clause.DELETE;
import static org.jooq.Clause.DELETE_DELETE;
import static org.jooq.Clause.DELETE_WHERE;
import static org.jooq.Clause.INSERT;
import static org.jooq.Clause.SELECT;
import static org.jooq.Clause.SELECT_FROM;
import static org.jooq.Clause.SELECT_WHERE;
import static org.jooq.Clause.TABLE_ALIAS;
import static org.jooq.Clause.UPDATE;
import static org.jooq.Clause.UPDATE_UPDATE;
import static org.jooq.Clause.UPDATE_WHERE;
import static org.jooq.test.h2.generatedclasses.Tables.ACCOUNTS;
import static org.jooq.test.h2.generatedclasses.Tables.TRANSACTIONS;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.jooq.Clause;
import org.jooq.Condition;
import org.jooq.Field;
import org.jooq.Operator;
import org.jooq.QueryPart;
import org.jooq.Table;
import org.jooq.VisitContext;
import org.jooq.impl.DSL;
import org.jooq.impl.DefaultVisitListener;

@SuppressWarnings("unchecked")
public class AccountIDFilter extends DefaultVisitListener {

    final Integer[] ids;

    public AccountIDFilter(Integer... ids) {
        this.ids = ids;
    }

    void push(VisitContext context) {
        conditionStack(context).push(new ArrayList<>());
        whereStack(context).push(false);
    }

    void pop(VisitContext context) {
        whereStack(context).pop();
        conditionStack(context).pop();
    }

    Deque<List<Condition>> conditionStack(VisitContext context) {
        Deque<List<Condition>> data = (Deque<List<Condition>>) context.data("conditions");

        if (data == null) {
            data = new ArrayDeque<>();
            context.data("conditions", data);
        }

        return data;
    }

    Deque<Boolean> whereStack(VisitContext context) {
        Deque<Boolean> data = (Deque<Boolean>) context.data("predicates");

        if (data == null) {
            data = new ArrayDeque<>();
            context.data("predicates", data);
        }

        return data;
    }

    List<Condition> conditions(VisitContext context) {
        return conditionStack(context).peek();
    }

    boolean where(VisitContext context) {
        return whereStack(context).peek();
    }

    void where(VisitContext context, boolean value) {
        whereStack(context).pop();
        whereStack(context).push(value);
    }

    <E> void pushConditions(VisitContext context, Table<?> table, Field<E> field, E... values) {

        // Check if we're visiting the given table
        if (context.queryPart() == table) {
            List<Clause> clauses = clauses(context);

            // ... and if we're in the context of the current subselect's
            // FROM clause
            if (clauses.contains(SELECT_FROM) ||
                clauses.contains(UPDATE_UPDATE) ||
                clauses.contains(DELETE_DELETE)) {

                // If we're declaring a TABLE_ALIAS... (e.g. "T_BOOK" as "b")
                if (clauses.contains(TABLE_ALIAS)) {
                    QueryPart[] parts = context.queryParts();

                    // ... move up the QueryPart visit path to find the
                    // defining aliased table, and extract the aliased
                    // field from it. (i.e. the "b" reference)
                    for (int i = parts.length - 2; i >= 0; i--) {
                        if (parts[i] instanceof Table) {
                            field = ((Table<?>) parts[i]).field(field);
                            break;
                        }
                    }
                }

                // Push a condition for the field of the (potentially aliased) table
                conditions(context).add(field.in(values));
            }
        }
    }

    /**
     * Retrieve all clauses for the current subselect level, starting with
     * the last {@link Clause#SELECT}.
     */
    List<Clause> clauses(VisitContext context) {
        List<Clause> result = asList(context.clauses());
        int index = result.lastIndexOf(SELECT);

        if (index > 0)
            return result.subList(index, result.size() - 1);
        else
            return result;
    }

    @Override
    public void clauseStart(VisitContext context) {

        // Enter a new SELECT clause / nested select, or DML statement
        if (context.clause() == SELECT ||
            context.clause() == UPDATE ||
            context.clause() == DELETE ||
            context.clause() == INSERT) {
            push(context);
        }
    }

    @Override
    public void clauseEnd(VisitContext context) {

        // Append all collected predicates to the WHERE clause if any
        if (context.clause() == SELECT_WHERE ||
            context.clause() == UPDATE_WHERE ||
            context.clause() == DELETE_WHERE) {
            List<Condition> conditions = conditions(context);

            if (conditions.size() > 0) {
                context.context()
                       .formatSeparator()
                       .keyword(where(context) ? "and" : "where")
                       .sql(' ');

                context.context().visit(DSL.condition(Operator.AND, conditions));
            }
        }

        // Leave a SELECT clause / nested select, or DML statement
        if (context.clause() == SELECT ||
            context.clause() == UPDATE ||
            context.clause() == DELETE ||
            context.clause() == INSERT) {
            pop(context);
        }
    }

    @Override
    public void visitEnd(VisitContext context) {
        pushConditions(context, ACCOUNTS, ACCOUNTS.ID, ids);
        pushConditions(context, TRANSACTIONS, TRANSACTIONS.ACCOUNT_ID, ids);

        // Check if we're rendering any condition within the WHERE clause
        // In this case, we can be sure that jOOQ will render a WHERE keyword
        if (context.queryPart() instanceof Condition) {
            List<Clause> clauses = clauses(context);

            if (clauses.contains(SELECT_WHERE) ||
                clauses.contains(UPDATE_WHERE) ||
                clauses.contains(DELETE_WHERE)) {
                where(context, true);
            }
        }
    }
}

Querying Your Database from Millions of Fibers (Rather than Thousands of Threads)


jOOQ is a great way to do SQL in Java, and Quasar fibers bring much improved concurrency

We’re excited to announce another very interesting guest post on the jOOQ Blog by Fabio Tudone from Parallel Universe.

Parallel Universe develops an open-source stack that allows developers to easily code extremely concurrent applications on the JVM. With the Parallel Universe stack you build software that works in harmony with modern hardware rather than fighting it at every turn, while keeping your programming language and your simple, familiar programming styles.

Fabio Tudone develops and maintains Quasar integration modules as part of the Comsat project. He’s been part of, and then led, the development of a cloud-based enterprise content governance platform for several years before joining the Parallel Universe team, and he’s been writing mostly JVM software throughout his whole professional journey. His interests include Dev and DevOps practices, scalability, concurrent and functional programming as well as runtime platforms. Naturally curious and leaning towards exploration, he enjoys gathering knowledge and understanding from people, places and cultures. He’s also interested in awareness practices and likes writing all sorts of stuff.

Quasar features an integration for JDBC and jOOQ as part of the Comsat project, so let’s have a look inside the box.

JDBC, jOOQ and Quasar

comsat-jdbc provides a fiber-blocking wrapper of the JDBC API, so that you can use your connection inside fibers rather than regular Java threads.

Why would you do that? Because fibers are lightweight threads and you can have many more fibers than threads in your running JVM. “Many more” means we’re talking millions versus a handful of thousands.

This means that you have a lot more concurrency capacity in your system to do other things in parallel while you wait for JDBC execution, be it concurrent / parallel calculations (like exchanging actor messages in your highly reliable Quasar Erlang-like actor system) or fiber-blocking I/O (e.g. serving web requests, invoking micro-services, reading files through fiber NIO or accessing other fiber-enabled data sources like MongoDB).

If your DB can stand it, and a few more regular threads won’t blow up your system (yet), you can even increase your fiber-JDBC pool (see “Where’s the waiting line?” below) and send more concurrent jOOQ commands.

Since jOOQ uses JDBC connections to access the database, having jOOQ run on fibers is as easy as bringing in the comsat-jooq dependency and handing your fiber-enabled JDBC connection to the jOOQ context:

import java.sql.Connection;

import org.jooq.DSLContext;
import org.jooq.impl.DSL;

import co.paralleluniverse.fibers.jdbc.FiberDataSource;

// ...

Connection conn = FiberDataSource.wrap(dataSource)
                                 .getConnection();
DSLContext create = DSL.using(conn);

// ...

Of course you can also configure a ConnectionProvider to fetch connections from your FiberDataSource.

From this moment on you can use regular jOOQ and everything will happen in fiber-blocking mode rather than thread-blocking. That’s it.
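
For illustration, running a query from a fiber could look like this, continuing the snippet above (a sketch; the SuspendableRunnable cast merely selects the right Fiber constructor, and the table name is made up):

import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.strands.SuspendableRunnable;
import static org.jooq.impl.DSL.table;

// ...

new Fiber<Void>((SuspendableRunnable) () -> {
    // This blocks the current fiber only, not an OS thread:
    System.out.println(
        create.select()
              .from(table("ACCOUNTS"))
              .fetch());
}).start();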

No, really, there’s absolutely nothing more to it: you keep using the excellent jOOQ, only with much more efficient fibers rather than threads. Quasar is a good citizen and won’t force you into a new API (which is nice especially when the original one is great already).

Since the JVM at present supports neither native green threads nor continuations, which could be used to implement lightweight threads, Quasar implements continuations (and fibers on top of them) via bytecode instrumentation. This can be done at compile time, but often it’s just more convenient to use Quasar’s agent (especially when instrumenting third-party libraries), so here’s an example Gradle project based on Dropwizard that also includes the Quasar agent setup (don’t forget about Capsule, a really great Java deployment tool for every need, which, needless to say, makes using Quasar and agents in general a breeze). The example doesn’t use all of the jOOQ features; rather, it falls into the SQL-building use case (both for querying and for CRUD), but you’re encouraged to change it to suit your needs. The without-comsat branch contains a thread-blocking version so you can compare and see the (minimal) differences with the Comsat version.

Where’s the waiting line?

You might be wondering now: OK, but JDBC is a thread-blocking API, so how can Quasar turn it into a fiber-blocking one? Because JDBC doesn’t have an asynchronous mode, Quasar uses a thread pool behind the scenes, to which fibers dispatch JDBC operations and by which they’re unfrozen and scheduled for resumption when the JDBC operation completes (have a look at Quasar’s integration patterns for more info).

Yes, here’s the nasty waiting line: JDBC commands waiting to be executed by the thread pool. Although you’re not improving DB parallelism beyond your JDBC thread-pool size, you’re not hurting your fibers either, even though you’re still using a simple and familiar blocking API. You can still have millions of fibers.

Is it possible to improve the overall situation? Without a standard asynchronous Java RDBMS API there isn’t much we can do. However, this may not matter at all if the database is your bottleneck. There are several nice posts and discussions about this topic and the argument amounts to deciding where you want to move the waiting line.

Bonus: how does this neat jOOQ integration work under the cover?

At present, Quasar needs the developer (or integrator) to tell it what to instrument, although fully automatic instrumentation is in the works (this feature depends on some minor JRE changes that won’t be released before Java 9). If you can conveniently alter the source code (or the compiled classes), then it’s enough to annotate methods with @Suspendable or to let them throw SuspendExecution, but this is usually not the case with libraries. Methods with fixed, well-known names to be instrumented can instead be listed in META-INF/suspendables and META-INF/suspendable-supers, respectively for concrete methods and for abstract / interface methods that can have suspendable implementations.
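
For illustration, a META-INF/suspendables file simply lists one fully qualified method per line (the entries below are made up):

com.example.repo.AccountRepository.findAll
com.example.service.AccountService.transfer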

If there are a lot of them (or there’s code generation involved), you can write a SuspendableClassifier to ship with your integration and register it with Quasar’s SPI to provide additional instrumentation logic (see jOOQ’s). A SuspendableClassifier’s job is to examine signature information about each and every method in your runtime classpath during the instrumentation phase and tell whether it’s suspendable, whether it can have suspendable implementations, whether for sure neither is the case, or whether it doesn’t know (some other classifier could still say “suspendable” or “suspendable-super” later on).

Summing it all up

Well… Just enjoy the excellent jOOQ on efficient fibers!

JavaEE or Spring? Neither! We Call Out For a Fresh Competitor!


If you’ve been following some key Java people on Twitter or reading the “news” on Reddit, you could not have missed the hilarious “bitch fight” (pardon my French) between some Spring and JavaEE evangelists.

First, Jürgen Höller’s provocative article:

“Happy second birthday, Java EE 7! How is it going in production?”

Then, Reza Rahman’s response:

“The Ghosts of Java EE 7 in Production: Past, Present and Future”

And in between, there had been hundreds of equally amusing tweets.

If you haven’t already, take out your popcorn :)

EDIT: This discussion went on further than this blog post, with more discussions going on elsewhere.

Troll marketing

At Data Geekery, we’re certainly not unacquainted with the fun and effective technique that we like to refer to as “troll marketing”, e.g. when we recently ranted against the (ab-)use of annotations by both Java EE and Spring, or when we compared Hibernate with the good old Nokia 3310.

It’s childish, yes. But it is also effective. The only reason why it is effective is because by doing so, you’re stirring up a hornets’ nest. You have a point, after all, and you start triggering a very polarised discussion that has no right/wrong answer, which all participants will fail to accept, and thus fight. While they’re fighting, you can put your brand in each and every message. Bingo.

Spring folks do exactly the same thing. If you’ve heard a talk by Josh Long from Pivotal, you will certainly have noticed that much of the talk is just ranting about things like Internet Explorer. If you’re not the kind of person that takes everything personally, seriously, and/or literally, then such a talk can be quite amusing.

The interesting bit here is the fact that the “standards” evangelists react to Spring’s provocative articles way too strongly. At the bottom of Reza’s article, there is a linked presentation by Bert Ertman, in favour of Java EE (and it wouldn’t be wrong to say that Reza thus endorses this talk’s content and style).

The key point I’m trying to make here is not that two opposing technology vendors are trying to get a hold of their communities, convincing people that their tech is better (that’s what engineers have been doing since long before Tesla vs. Edison).

The key point I’m trying to make here is the fact that one of the vendors claims for themselves to be a “standard”, putting themselves (the JCP = Oracle) at eye level with real standards committees like ISO, DIN, W3C, or IETF.

JavaEE is a Standard?

It is, in a way. It is a de facto standard, just like Spring. It is a very loose set of small interest groups that are governed mostly by Oracle (and other vendors) who try to promote and establish their interests and technology over alternative, “proprietary” tech.

JavaEE is a “standard” with a long history. It has always been a rather “weak” standard, which is why Spring was created in the first place. The fact that Reza from Oracle reacts so sensitively to Spring’s deliberate provocation can be interpreted as a sign that the “standard” is still weak and endangered by Spring as an alternative.

An interesting comparison could be made to Siebel vs. Salesforce. Siebel was the de facto “standard” in the CRM industry, but Salesforce managed to harass Siebel so much that Siebel had to acknowledge Salesforce as a real competitor in its business long before they even remotely met at eye level, revenue-wise. This ultimately led to Siebel’s demise and acquisition by – how ironic – Oracle, where much of the JCP is hosted.

The same thing has been happening with Java EE and Spring. Spring is now so extremely successful in the market, even if completely (= admittedly) proprietary, that Java EE will have to fight hard to get back on top of Spring. Much harder than merely calling out the old and lame “standards” argument. Java EE has to be better and faster than Spring, not just a “standard”. With all the JCP politics going on, I don’t see that happening, though.

So, will Spring win?

My prediction is: Nope.

This whole story just shows us that standards in our Java ecosystem are weak. There is an incredible number of players on the market. Niche products like Hibernate and jOOQ can shake up the database access market segment. Niche products like GWT, Vaadin, ZK, Spring MVC, Play Framework, etc. can shake up the HTML UI market segment (no way JSF or JSR 371 can prevent further market fragmentation in this area).

Heck, language-agnostic technologies like Akka or Vert.x show that not even the Servlet API is really sound and safe. And with Oracle having introduced Nashorn, the whole JavaScript ecosystem might just as well threaten Java EE. The only thing that can be taken for granted for another 5-10 years, I suspect, is the JVM and the Java SE JDK with its libraries, including JDBC.

Our industry is advancing so quickly right now, it seems to be impossible for a highly distributed, rather political committee like the JCP to keep up the pace and deliver the quality and innovation that we’re seeing elsewhere.

In other words, the Java EE “standard” is nothing more than a body of admittedly proven technologies that runs 5-10 years behind. Proven some time ago. Given that a single player like Pivotal has successfully challenged the “standard” for more than 10 years now, I suspect that new players might soon enter the market with new alternatives.

Who will be the new players?

One of them will be the soon-to-be-renamed Typesafe. Why do you think they’re getting rebranded after all? And what do you think they’re trying to prove with Akka + Play + ConductR? They’re trying to prove that you can do software entirely differently on top of the JVM, that’s what they’re trying to prove.

Besides, they’re using the same low-content, high-emotion troll marketing technique that Pivotal / Spring had been so successful with. Observe their use of the evil term “monolith” everywhere (= Java EE, what else).

Another one of them might be… Microsoft! Why not? Their recent Open Sourcing moves might help establish .NET on top of Linux, eventually. And once you’re on top of Linux, you might as well get on top of the JVM. The .NET ecosystem is much more coherent than Java’s, and this will exert a lot of innovation pressure on our ecosystem.

All of this is happening while the JCP is still recovering from the politics behind the game-changing data interchange APIs JSR-353, JSR-367, and JSR-374.

Interesting future

The above predictions might or might not become true. But one thing I’m very certain of: we will have even more competition and thus more choices for our infrastructure software in the near future. The current de facto standards will lose market share. Tough luck for them, better tech for us.

While the evangelists are fighting their little personal fights, I’m looking forward to a very interesting future on the JVM.

Read about a concurring opinion by our friends at Takipi here.

jOOQ Tuesdays: Axel Fontaine Predicts Exciting Times as our Industry is Maturing


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

In this fourth edition, we have the pleasure of talking to Axel Fontaine, who will be telling us about the exciting times ahead as our industry matures.

Hi Axel – Citing your website: From “Software is what I do” to “Architecting for Continuous Delivery & Zero Downtime”. Where did your personal passion for devops come from?

Over the last 18 years in our industry, I have seen a lot of projects for numerous clients in many different sectors. And one thing always struck me: the intersection between development and operations was almost invariably the place where the largest efficiency gains could be realized. Over time, this became the place where I naturally started to focus my attention, initially for consultancy and training, and later also for public speaking and the product business.

Very interesting! I suspect that the Flyway download rates also confirm the increasing need for devops tooling, right?

I am glad you mentioned it. Yes, Flyway download numbers have been going through the roof. In March we had one download every 42 seconds, and we are well on track to break the 1 million mark in 2015! This picture actually tells us a number of things. First of all, it confirms that the need for database migrations is a very widespread itch, and that Flyway’s no-nonsense, pragmatic and lightweight approach to scratching it resonates with the industry. The other thing this tells us is that our industry as a whole is maturing, and continuous delivery and deployment in particular are becoming more mainstream. This increased pace of delivery comes with a pressing need for a higher level of automation. And modern lightweight tools like Flyway are a great fit for this.

Now, you are converting that passion into a product called Boxfuse. Can you tell us a little more about it?

I have always been a strong believer in lightweight and open solutions that are a pleasure to use and minimize lock-in. When it comes to deploying an application to the cloud, so far you had to choose one or the other. You could either have something that is a joy to use (PaaS), but that leaves you stuck in a walled garden. Or you could have something open (IaaS) that unfortunately leaves you with a lot of operational concerns to deal with. Boxfuse aims for a fresh new middle ground: the convenience of PaaS with the freedom of IaaS. To achieve that, Boxfuse does away with a bunch of cargo cult. No more general-purpose operating system, no more traditional provisioning. Instead, Boxfuse analyses your application and, in just a few seconds, fuses it into a tiny bootable image that is about 1% of the size of a regular system. Boxfuse then offers you a sophisticated blue/green deployment process for zero downtime updates. All of this, out of the box and with a single command.

We’re entrepreneurs ourselves. Marketing a product is a long road. What are your own key takeaways from your experience as both a consultant and entrepreneur?

I couldn’t agree more. What marketing really is, is competing for attention, and turning that attention into mind share through a series of arguments, both technical and emotional. It is an exercise that is often overlooked in our technical circles, where the code is front and center. But the best technology in the world isn’t worth a dime if you can’t convince someone else to use it. And that takes perseverance, drive, and dedication. What seems easy on the outside is often the fruit of lots of hard labor behind the scenes. Eventually you will succeed, but give it a little time. Overnight success takes years.

Where do you see the future of our industry, from your perspective?

Believe me when I say these are exciting times. While it still feels like a big adventure with great new frontiers to explore, our industry is maturing rapidly. Overly complex tools are going the way of the dodo. User expectations are rising fast. And the only way to meet them is through modern lightweight tools and services that integrate seamlessly with our workflow and perform the undifferentiated heavy lifting for us. We need to eliminate all unnecessary complexity so that we can focus on what matters most: delighting our customers.

What Exactly are SQL Views?


You probably know about “ordinary views” already, but I’m sure you’ll find one or two things in this article that you haven’t thought about in this way yet…

What exactly are SQL views?

Views in SQL are a means of treating complex queries in the same way as “ordinary” tables. In fact, SQL is all about tables (which are bags of records), much like relational algebra is all about relations (which are sets of tuples).

There are different types of views:

“Ordinary” views

These are most commonly referred to as “views”. Most databases allow for declaring them using this syntax:

CREATE VIEW my_view AS 
SELECT col1, col2
FROM my_table
WHERE ...

These stored views are then part of the catalog and can be referenced by name just like tables, which is great for re-use. And what’s even greater, you can grant a different set of privileges to views than to tables, which allows you to implement a complete security layer using only views (e.g. hiding some columns or rows from certain users).
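
For example, a hypothetical staff table could hide its salary column behind a view (all names below are made up):

CREATE VIEW staff_public AS
SELECT id, first_name, last_name
FROM staff;

-- Readers get access to the view only, not to the underlying table
GRANT SELECT ON staff_public TO reporting_role;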

Some databases (including Oracle and PostgreSQL) even allow for updating them under certain circumstances – mostly when they’re unambiguous 1:1 mappings of a single table that do not produce any calculations or denormalisations.

Materialized views

Just like the above “ordinary views”, materialized views can be used just like tables. In fact, they really are tables, as their data is materialized on disk and refreshed either on demand or automatically when the underlying data changes, depending on the database. These are useful for frequent, complex queries on rarely updated data.

Just add the MATERIALIZED keyword and you’re set:

CREATE MATERIALIZED VIEW my_view AS 
SELECT col1, col2
FROM my_table
WHERE ...

Among others, Oracle and PostgreSQL support materialized views. Other databases like SQL Server know of “indexed views”, which are a bit less powerful as you have to explicitly “materialise” your view data in indexes.
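
Note that in some databases, e.g. PostgreSQL, the materialized data is not kept in sync automatically – you refresh it explicitly whenever you want to pick up changes from the underlying tables:

REFRESH MATERIALIZED VIEW my_view;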

“Snapshot” views

These aren’t really views – they’re real tables. But in the context of this blog post, you could think of them as a permanently materialized “snapshot” view of your data. You can create such views using different syntaxes:

Most databases, e.g. Oracle

CREATE TABLE my_view AS
SELECT col1, col2
FROM my_table
WHERE ...

Some databases, e.g. SQL Server

SELECT col1, col2
INTO my_view
FROM my_table
WHERE ...

The nice thing about this approach is that, like materialized views, these “views” can be very useful for frequent querying – you have to pre-calculate the data only once. Once you’ve calculated that data, you generate a “snapshot” of it, and the data can continue to live independently from your view – just like a snapshot! (don’t forget to add relevant indexes, though – see below)
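
For instance (index name and column chosen arbitrarily):

CREATE INDEX my_view_idx ON my_view (col1);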

Note that some databases including DB2 and Oracle support real SQL:2011 standard “snapshots”, such as flashback query in Oracle, or time travel queries in DB2. That’s a different story, though.

Parameterized views

Few people refer to these views as “views”, but if you think about it, that’s what they really are. Table-valued functions are stored procedures that return tables that can then again be used in SQL. For example (using PostgreSQL syntax):

CREATE FUNCTION my_view (arg1 INTEGER, arg2 INTEGER)
RETURNS TABLE (
    col1 INTEGER,
    col2 INTEGER
)
AS $$
BEGIN
    RETURN QUERY
    SELECT t.col1, t.col2
    FROM my_table t
    WHERE t.v1 = arg1 AND t.v2 = arg2;
END
$$ LANGUAGE plpgsql;

And then…

SELECT *
FROM my_view (42, 1337)
WHERE ...

That’s quite powerful, isn’t it? Among others, Firebird, HANA, HSQLDB, Oracle, PostgreSQL, SQL Server support table-valued functions.

Common Table Expressions

Like ordinary views, these views are named, but they’re scoped to a single statement only – mostly a SELECT statement, although PostgreSQL or SQL Server also allow common table expressions to be used with other DML statements. These “views” can be written as such:

WITH 
    my_view_a AS (
        SELECT ...
    ),
    my_view_b AS (
        SELECT ...
    )
-- To be consumed immediately by a statement
SELECT *
FROM my_view_a, my_view_b

While common table expressions are very useful for structuring code (they’re like “table variables”), they come at a price in Oracle or PostgreSQL, as the view is most often temporarily materialized, which prevents a lot of SQL transformations in the optimiser. On the flip side, common table expressions can be recursive / hierarchical, which is great for graph / tree traversal.
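
For example, here’s a minimal sketch of such a traversal, assuming a hypothetical nodes table with id and parent_id columns (using PostgreSQL’s WITH RECURSIVE syntax):

WITH RECURSIVE ancestors AS (
    -- Start with a single node...
    SELECT id, parent_id
    FROM nodes
    WHERE id = 42

    UNION ALL

    -- ... and repeatedly add each node's parent
    SELECT n.id, n.parent_id
    FROM nodes n
    JOIN ancestors a ON n.id = a.parent_id
)
SELECT *
FROM ancestors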

Derived tables

The most common type of views (although rarely called “views”) are derived tables, i.e. all nested select statements that are put in a FROM clause. E.g.:

SELECT *
FROM (
    SELECT ...
) my_view_a, (
    SELECT ...
) my_view_b

Unlike common table expressions, derived tables cannot be reused easily within a statement, but chances are high that the optimiser can transform them into a different, faster statement.
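
For instance, a predicate applied outside of a derived table can often be pushed down into it – a sketch of what an optimiser might do:

-- What you write...
SELECT *
FROM (
    SELECT col1, col2
    FROM my_table
) t
WHERE t.col1 = 42

-- ... may be transformed into the equivalent, flatter statement
SELECT col1, col2
FROM my_table
WHERE col1 = 42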

Conclusion

SQL is all about tables and recomposition of tables in ad-hoc queries. The most important clause of any SQL statement is the FROM clause. It specifies the set of tuples that you want to recompose, filter, group, project in various ways. As we have seen above, you can feed any such table transformation easily into yet another transformation via one of the above ways to create views.

Curious about more? Read our popular article “10 Easy Steps to a Complete Understanding of SQL”.
