Tag Archive | jooq

Stop Claiming that you’re Using a Schemaless Database


One of the main arguments used when evangelising MongoDB is the fact that it is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review MongoDB’s point of view again:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes. Not only does the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it is also moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
AS
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task of transforming your old JSON into the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is about as misleading a term as NoSQL is.

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to the statically typed schemas available from SQL databases. JSON is still a completely schema-free data structure standard, as opposed to XML, which allows you to specify an XSD if you need one, or to operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimic XSD)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.

So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some later time – a time when you might care more than today, or a time when you’re lucky enough to have a new job, and someone else does the work for you. You might have believed MongoDB when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does

Murphy

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

jOOQ: The best way to write SQL in Java

Don’t Miss out on Writing Java 8 SQL One-Liners with jOOλ or jOOQ


More and more people are catching up with the latest update to the Java platform by adopting functional programming also for their businesses.

At Data Geekery, we’re using Java 8 for our jOOQ integration tests, as using the new Streams API with lambda expressions makes generating ad-hoc test data so much easier.

However, we don’t feel that the JDK offers as much as it could, which is why we have also implemented and open-sourced jOOλ, a small utility library that patches those shortcomings.

Note, we don’t aim to replace more sophisticated libraries like functionaljava. jOOλ is really just patching shortcomings.

Putting lambdas to work with jOOλ or jOOQ

I’ve recently encountered this Stack Overflow question, which asked for streaming a result set with all columns into a single list. For example:

Input

+----+------------+------------+
| ID | FIRST_NAME | LAST_NAME  |
+----+------------+------------+
|  1 | Joslyn     | Vanderford |
|  2 | Rudolf     | Hux        |
+----+------------+------------+

Output

1
Joslyn
Vanderford
2
Rudolf
Hux

This is a typical schoolbook example for using functional programming rather than an iterative solution:

Iterative solution

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list = new ArrayList<>();

while (rs.next()) {
    for (int i = 0; i < meta.getColumnCount(); i++) {
        list.add(rs.getObject(i + 1));
    }
}

Truth is, the iterative solution isn’t all that bad, but let’s learn how this could be done with functional programming.

Using jOOλ

We’re using jOOλ for this example for a couple of reasons:

  • JDBC didn’t really adopt the new features. There is no simple ResultSet to Stream conversion, even if there should be.
  • Unfortunately, the new functional interfaces do not allow for throwing checked exceptions. The try .. catch blocks inside lambdas don’t exactly look nice
  • Interestingly, there is no way of generating a finite stream without also implementing an Iterator or Spliterator (see the JDK-only sketch right after this list)
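
To illustrate that last point, here is a hedged, JDK-only sketch of the same result-set streaming (all helper names are ours). Note how much ceremony the hand-written Iterator and the checked SQLException add:

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();
int columns = meta.getColumnCount();

// One element per database row; columns are flattened below
Iterator<List<Object>> rows = new Iterator<List<Object>>() {
    Boolean hasNext; // null = not yet fetched

    @Override
    public boolean hasNext() {
        try {
            if (hasNext == null)
                hasNext = rs.next();

            return hasNext;
        }
        catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public List<Object> next() {
        if (!hasNext())
            throw new NoSuchElementException();

        try {
            List<Object> row = new ArrayList<>();

            for (int i = 0; i < columns; i++)
                row.add(rs.getObject(i + 1));

            hasNext = null; // consume this row
            return row;
        }
        catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
};

List<Object> list = StreamSupport
    .stream(Spliterators.spliteratorUnknownSize(
        rows, Spliterator.ORDERED), false)
    .flatMap(List::stream)
    .collect(Collectors.toList());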

Compare that to the jOOλ version:

ResultSet rs = ...;
ResultSetMetaData meta = rs.getMetaData();

List<Object> list =
Seq.generate()
   .limitWhile(Unchecked.predicate(v -> rs.next()))
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))
   .toList();

So far, this looks about as verbose as (or even a bit more verbose than) the iterative solution. As you can see, a couple of jOOλ extensions were needed here:

// This generate is a shortcut to generate an
// infinite stream with unspecified content
Seq.generate()

// This predicate-based stream termination
// unfortunately doesn't exist in the JDK
// Besides, the checked exception is wrapped in a
// RuntimeException by calling Unchecked.wrapper(...)
   .limitWhile(Unchecked.predicate(v -> rs.next()))

// Standard JDK flatmapping, producing a "nested"
// stream of column values for the "outer" stream
// of database rows
   .flatMap(Unchecked.function(v -> IntStream
       .range(0, meta.getColumnCount())
       .mapToObj(Unchecked.intFunction(i ->
           rs.getObject(i + 1)
       ))
   ))

// This is another convenience method that is more
// verbose to write with standard JDK code
   .toList()

Using jOOQ

jOOQ has even more convenience API to operate on result records of your SQL statement. Consider the following piece of logic:

ResultSet rs = ...;

List<Object> list =
DSL.using(connection)
   .fetch(rs)
   .stream()
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .collect(Collectors.toList());

Note that the above example is using standard JDK API, without resorting to jOOλ for convenience. If you want to use jOOλ with jOOQ, you could even write:

ResultSet rs = ...;

List<Object> list = 
Seq.seq(DSL.using(connection).fetch(rs))
   .flatMap(r -> Arrays.stream(r.intoArray()))
   .toList();

Easy? I would say so! Let’s remember that this example:

  • Fetches a JDBC ResultSet into a Java Collection
  • Transforms each record in the result set into an array of column values
  • Transforms each array into a stream
  • Flattens the resulting stream of streams into a single stream
  • Collects all values into a single list

Whew!

Conclusion

We’re heading towards exciting times! It will take a while until all Java 8 idioms and functional thinking feel “natural” to Java developers, also in the enterprise.

The idea of having a sort of data source that can be configured with pipelined data transformations expressed as lambda expressions to be evaluated lazily is very compelling, though. jOOQ is an API that encapsulates SQL data sources in a very fluent and intuitive way, but it doesn’t stop there. jOOQ produces regular JDK collections of records, which can be transformed out-of-the-box via the new streams API.

We believe that this will drastically change the way the Java ecosystem will think about data transformation. Stay tuned for more examples on this blog!

The Caveats of Dual-Licensing


We’ve been in business for more than one year now with our dual-licensing strategy for jOOQ. While this strategy has worked very well for us, it has also been a bit of a challenge for some of our customers. Today, we’re going to show you what caveats of dual-licensing we’ve run into.

Our dual-licensing strategy

For those of you not acquainted with our license model, just a brief reminder to get you into the subject. We mainly consider ourselves as a vendor of Open Source software. However, contrary to a variety of other companies like Gradleware or Red Hat, we don’t want to build our business model on support, which is a much tougher business than licensing. Why?

  • Support contracts need a lot more long-term trust by customers, and more outbound sales. There’s little inbound interest in such contracts, as people don’t acquire support until they need it.
  • Vendor-supplied support competes with third-party support (as provided by UWS, for instance), which we want to actively encourage. We’d love to generate business for an entirely new market, not compete with our allies.

So we were looking for a solution involving commercial licensing. We wanted to keep an Open Source version of our product because:

  • We’ll get traction with Open Source licensing much much more quickly than with commercial licensing
  • While Open Source is a very tough competitor for vendors, it is also a great enabler for consumers. For instance, we’re using Java, Eclipse, H2, and much more. Great software for free!

It wouldn’t be honest to say that we truly believe in “free as in freedom” (libre), but we certainly believe in “free as in beer” (gratis) – because, who doesn’t. So, one very simple solution to meet the above goals was to offer jOOQ as Open Source with Open Source databases, and to offer jOOQ under a commercial license with commercial databases.

The Caveat

This was generally well received by our user base as a credible and viable dual-licensing model. But there were also caveats. All of a sudden, we didn’t have access to these distribution channels any more:

  • Maven Central
  • GitHub

… and our paying customers didn’t have access to these very useful OSS rights any more:

  • Source Code
  • Modifications

Solution 1 – Ship Source Code

Well, we actually ship our source code with the commercial binaries. At first, this was done merely for documentation purposes.

Regardless of the actual license constraints: when you’re in trouble, e.g. when your production system is down and you have to urgently fix a bug, doesn’t it just suck if you don’t have access to the source code of your third-party dependencies? You would just have to guess what they do. Or illegally de-compile them.

We don’t want to be that company. We trust our customers to deal responsibly with our source code.

Solution 2 – Allow Modifications

Our commercial licenses come in two flavours: Yearly and Perpetual. We quickly realised that some of our customers do not want to be dependent on us as a vendor. Perpetual licenses obviously help make customers more independent. But the disadvantage of perpetual licenses is the fact that vendors will not support old versions forever, and customers won’t have the right to upgrade to the next major release. While they are probably fine with not having access to new features, they would still like to receive an occasional bugfix.

The solution we’ve come to adopt is a very pragmatic one: Customers already have the source code (see above), so why not allow customers to also apply urgent fixes themselves? Obviously, such modifications will void the warranty offered by us, but if you buy jOOQ today and 5 years down the line, you discover a very subtle bug in what will then be an unsupported version of jOOQ… don’t you just want to fix it?

Conclusion

Dual-licensing is a tricky business. You partition your user-base into two:

  • The paying / premium customers
  • The “freemium” customers

By all means, you must prevent your premium customers from being at a disadvantage compared to your “freemium” customers. There are certain rights that are probably OK to remove (such as the right of free distribution). But there are rights that are just annoying not to have. And those are the rights that matter most to the everyday work of an engineer:

To fix that bloody bug :-)

We’re very curious: What are your opinions towards dual-licensing?

Asynchronous SQL Execution with jOOQ and Java 8’s CompletableFuture


Reactive programming is the new buzzword, which essentially just means asynchronous programming or messaging.

Fact is that functional syntax greatly helps with structuring asynchronous execution chains, and today, we’ll see how we can do this in Java 8 using jOOQ and the new CompletableFuture API.

In fact, things are quite simple:

// Initiate an asynchronous call chain
CompletableFuture

    // This lambda will supply an int value
    // indicating the number of inserted rows
    .supplyAsync(() -> DSL
        .using(configuration)
        .insertInto(AUTHOR, AUTHOR.ID, AUTHOR.LAST_NAME)
        .values(3, "Hitchcock")
        .execute()
    )
                          
    // This will supply an AuthorRecord value
    // for the newly inserted author
    .handleAsync((rows, throwable) -> DSL
        .using(configuration)
        .fetchOne(AUTHOR, AUTHOR.ID.eq(3))
    )

    // This should supply an int value indicating
    // the number of rows, but in fact it'll throw
    // a constraint violation exception
    .handleAsync((record, throwable) -> {
        record.changed(true);
        return record.insert();
    })
    
    // This will supply an int value indicating
    // the number of deleted rows
    .handleAsync((rows, throwable) -> DSL
        .using(configuration)
        .delete(AUTHOR)
        .where(AUTHOR.ID.eq(3))
        .execute()
    )

    // This tells the calling thread to wait for all
    // chained execution units to be executed
    .join();

What really happened here? Nothing out of the ordinary. There are 4 execution blocks:

  1. One that inserts a new AUTHOR
  2. One that fetches that same AUTHOR again
  3. One that re-inserts the newly fetched AUTHOR (throwing an exception)
  4. One that ignores the thrown exception and deletes the AUTHOR again

Finally, when the execution chain is established, the calling thread will join the whole chain using the CompletableFuture.join() method, which is essentially the same as the Future.get() method, except that it doesn’t throw any checked exception.
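
The difference is easy to see in a minimal, JDK-only illustration (future stands for any CompletableFuture):

CompletableFuture<Integer> future = ...;

// get() forces the caller to handle checked exceptions
try {
    future.get();
}
catch (InterruptedException | ExecutionException e) {
    // ...
}

// join() doesn't: failures are rethrown as an unchecked
// CompletionException instead
future.join();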

Comparing this to other APIs

Other APIs like Scala’s Slick have implemented similar things via “standard API”, such as calls to flatMap(). We’re currently not going to mimic such APIs, as we believe that the new Java 8 APIs will become much more idiomatic to native Java speakers. Specifically, when executing SQL, getting connection pooling and transactions right is of the essence. The semantics of asynchronously chained execution blocks and how they relate to transactions is very subtle. If you want a transaction to span more than one such block, you will have to encode this yourself via jOOQ’s Configuration and its contained ConnectionProvider.

Blocking JDBC

Obviously, there will always be one blocking barrier to such solutions, and that is JDBC itself – which is very hard to turn into an asynchronous API. In fact, few databases really support asynchronous query executions and cursors, as most often, a single database session can only be used by a single thread for a single query at a time.
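
Until that changes, it can make sense not to occupy the common ForkJoinPool with blocking JDBC calls, but to pass a dedicated ExecutorService to the various ...Async() methods instead. A minimal sketch, assuming the same configuration object as above and an illustrative query:

// A bounded pool for blocking JDBC work, instead of the
// default ForkJoinPool.commonPool()
ExecutorService executor = Executors.newFixedThreadPool(10);

CompletableFuture
    .supplyAsync(() -> DSL
        .using(configuration)
        .fetch("SELECT last_name FROM author"),
        executor
    )
    .thenAccept(System.out::println)
    .join();

executor.shutdown();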

We’d be very interested to learn about your asynchronous SQL querying requirements, so feel free to comment on this post!

Top 10 Most Popular Articles on the jOOQ Blog


What do people do when they run out of topics? They recycle previous topics and create top 10 lists. Here is a list of the top 10 most popular articles from the jOOQ blog:

  1. Top 10 Very Very VERY Important Topics to Discuss
     
    A fun, not so serious parody on what is being discussed on reddit’s /r/programming. Hint: Bikeshedding topics are the most popular. Like this one. That was so meta!
     
  2. 10 Subtle Best Practices when Coding Java
     
    This is a really interesting article about not-so-common advice that might be handy every once in a while.
     
  3. 10 Common Mistakes Java Developers Make when Writing SQL
     
    A classic and must-read for all SQL developers (not only those that usually write Java)
     
  4. SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT
     
    We’re surprised ourselves that this is so popular. But it appears that we’re really well ranked on Google when people are looking for ROW_NUMBER() and DENSE_RANK(). And the trick is very useful, of course!
     
  5. Why You Should NOT Implement Layered Architectures
     
    What a silly rant! And how it went up in the ranking within only two days! This is not really very serious advice. Obviously, you should (as always) do what fits best to your problem domain. But we wanted to make people think about the status quo and how it is often applied too rigidly, without thinking about all the options. Looks like we’ve hit a sweet spot with developers frustrated with overengineered applications…
     
  6. MIT Prof. Michael Stonebraker: “The Traditional RDBMS Wisdom is All Wrong”
     
    Michael Stonebraker is a very controversial person per se. In this article, we’re linking to a talk by Stonebraker where he claims (again) that the end of the RDBMS is nigh. A year later, we can see that NoSQL is still on the rise, whereas NewSQL is still nowhere. See also the next article…
     
  7. The 10 Most Popular DB Engines (SQL and NoSQL)
     
    This is an interpretation of a popular ranking of (R)DBMS, showing that even if Oracle, MySQL, and SQL Server are the most widespread databases, something’s about to change.
     
  8. Does Java 8 Still Need LINQ? Or is it Better than LINQ?
     
    Again, controversy is king. Of course, LINQ is awesome and often we wish we had something like LINQ in Java. In this article, however, we’re claiming that with Java 8’s Streams API and lambda expressions, we might no longer need LINQ, as collections transformation is already sufficiently covered, and LINQ-to-SQL is not what made LINQ popular (which is where jOOQ is more useful)
     
  9. 10 More Common Mistakes Java Developers Make when Writing SQL
     
    A follow-up article to the previous, very popular article about common SQL mistakes. Yes, there’s a lot to learn in this area.
     
  10. The Java Fluent API Designer Crash Course
     
    From a jOOQ perspective, this is one of the most interesting articles explaining the very simple and easy-to-apply rules that we’re using to produce our API in the form of an internal domain-specific language. If you want to build something like jOOQ for your own query language (e.g. Cassandra’s CQL), just follow these simple rules.
     

Thanks for reading our blog! We promise to keep you up to date with more interesting (and occasionally useless) content!

When the Java 8 Streams API is not Enough


Java 8 was – as always – a release of compromises and backwards-compatibility. A release where the JSR-335 expert group might not have agreed with parts of the audience on the scope or feasibility of certain features. See some concrete explanations by Brian Goetz about why …

But today we’re going to focus on the Streams API’s “shortcomings”, or as Brian Goetz would probably put it: things out of scope given the design goals.

Parallel Streams?

Parallel computing is hard, and it used to be a pain. People didn’t exactly love the new (now old) Fork / Join API when it first shipped with Java 7. Conversely, and clearly, the conciseness of calling Stream.parallel() is unbeatable.

But many people don’t actually need parallel computing (not to be confused with multi-threading!). In 95% of all cases, people would have probably preferred a more powerful Streams API, or perhaps a generally more powerful Collections API with lots of awesome methods on various Iterable subtypes.

Changing Iterable is dangerous, though. Even a no-brainer such as transforming an Iterable into a Stream via a potential Iterable.stream() method seems to risk opening Pandora’s box!
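
In the meantime, the JDK-only workaround is well known, if hardly discoverable – wrapping the Iterable’s Spliterator via StreamSupport:

// Turning an Iterable into a Stream with the JDK only
Iterable<String> iterable = ...;
Stream<String> stream =
    StreamSupport.stream(iterable.spliterator(), false);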

Sequential Streams!

So if the JDK doesn’t ship it, we create it ourselves!

Streams are quite awesome per se. They’re potentially infinite, and that’s a cool feature. Mostly – and especially with functional programming – the size of a collection doesn’t really matter that much, as we transform element by element using functions.

If we admit Streams to be purely sequential, then we could have any of these pretty cool methods as well (some of which would also be possible with parallel Streams):

  • cycle() – a guaranteed way to make every stream infinite
  • duplicate() – duplicate a stream into two equivalent streams
  • foldLeft() – a sequential and non-associative alternative to reduce()
  • foldRight() – a sequential and non-associative alternative to reduce()
  • limitUntil() – limit the stream to those records before the first one to satisfy a predicate
  • limitWhile() – limit the stream to those records before the first one not to satisfy a predicate
  • maxBy() – reduce the stream to the maximum mapped value
  • minBy() – reduce the stream to the minimum mapped value
  • partition() – partition a stream into two streams, one satisfying a predicate and the other not satisfying the same predicate
  • reverse() – produce a new stream in inverse order
  • skipUntil() – skip records until a predicate is satisfied
  • skipWhile() – skip records as long as a predicate is satisfied
  • slice() – take a slice of the stream, i.e. combine skip() and limit()
  • splitAt() – split a stream into two streams at a given position
  • unzip() – split a stream of pairs into two streams
  • zip() – merge two streams into a single stream of pairs
  • zipWithIndex() – merge a stream with its corresponding stream of indexes into a single stream of pairs

jOOλ’s new Seq type does all that

All of the above is part of jOOλ. jOOλ (pronounced “jewel”, or “dju-lambda”, also written jOOL in URLs and such) is an ASL 2.0 licensed library that emerged from our own development needs when implementing jOOQ integration tests with Java 8. Java 8 is exceptionally well-suited for writing tests that reason about sets, tuples, records, and all things SQL.

But the Streams API just feels slightly insufficient, so we have wrapped the JDK’s Streams in our own Seq type (Seq for sequence / sequential Stream):

// Wrap a stream in a sequence
Seq<Integer> seq1 = seq(Stream.of(1, 2, 3));

// Or create a sequence directly from values
Seq<Integer> seq2 = Seq.of(1, 2, 3);

We’ve made Seq a new interface that extends the JDK Stream interface, so you can use Seq fully interoperably with other Java APIs – leaving the existing methods unchanged:

public interface Seq<T> extends Stream<T> {

    /**
     * The underlying {@link Stream} implementation.
     */
    Stream<T> stream();

    // [...]
}

Now, functional programming is only half the fun if you don’t have tuples. Unfortunately, Java doesn’t have built-in tuples, and while it is easy to create a tuple library using generics, tuples are still second-class syntactic citizens when comparing Java to Scala, for instance, or C# and even VB.NET.

Nonetheless…

jOOλ also has tuples

We’ve run a code-generator to produce tuples of degree 1-8 (we might add more in the future, e.g. to match Scala’s and jOOQ’s “magical” degree 22).

And if a library has such tuples, the library also needs corresponding functions. The essence of these TupleN and FunctionN types is summarised as follows:

public class Tuple3<T1, T2, T3>
implements
    Tuple,
    Comparable<Tuple3<T1, T2, T3>>,
    Serializable, Cloneable {

    public final T1 v1;
    public final T2 v2;
    public final T3 v3;

    // [...]
}

and

@FunctionalInterface
public interface Function3<T1, T2, T3, R> {

    default R apply(Tuple3<T1, T2, T3> args) {
        return apply(args.v1, args.v2, args.v3);
    }

    R apply(T1 v1, T2 v2, T3 v3);
}
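
As a quick illustration of how the two fit together, here’s a small usage sketch with jOOλ’s tuple() factory (the numbers are arbitrary):

Function3<Integer, Integer, Integer, Integer> sum =
    (v1, v2, v3) -> v1 + v2 + v3;

// Both calls yield 6 - the second one goes through
// the default method shown above
sum.apply(1, 2, 3);
sum.apply(tuple(1, 2, 3));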

There are many more features in Tuple types, but let’s leave them out for today.

On a side note, I’ve recently had an interesting discussion with Gavin King (the creator of Hibernate) on reddit. From an ORM perspective, Java classes seem like a suitable implementation for SQL / relational tuples, and they are indeed. From an ORM perspective.

But classes and tuples are fundamentally different, which is a very subtle issue with most ORMs – e.g. as explained here by Vlad Mihalcea.

Besides, SQL’s notion of row value expressions (i.e. tuples) is quite different from what can be modelled with Java classes. This topic will be covered in a subsequent blog post.

Some jOOλ examples

With the aforementioned goals in mind, let’s see how the above API can be put to work by example:

zipping

// (tuple(1, "a"), tuple(2, "b"), tuple(3, "c"))
Seq.of(1, 2, 3).zip(Seq.of("a", "b", "c"));

// ("1:a", "2:b", "3:c")
Seq.of(1, 2, 3).zip(
    Seq.of("a", "b", "c"), 
    (x, y) -> x + ":" + y
);

// (tuple("a", 0), tuple("b", 1), tuple("c", 2))
Seq.of("a", "b", "c").zipWithIndex();

// tuple((1, 2, 3), (a, b, c))
Seq.unzip(Seq.of(
    tuple(1, "a"),
    tuple(2, "b"),
    tuple(3, "c")
));

This is already a case where tuples have become very handy. When we “zip” two streams into one, we want a wrapper value type that combines both values. Classically, people might’ve used Object[] for quick-and-dirty solutions, but an array doesn’t indicate attribute types or degree.

Unfortunately, the Java compiler cannot reason about the effective bound of the <T> type in Seq<T>. This is why we can only have a static unzip() method (instead of an instance one), whose signature looks like this:

// This works
static <T1, T2> Tuple2<Seq<T1>, Seq<T2>> 
    unzip(Stream<Tuple2<T1, T2>> stream) { ... }
	
// This doesn't work:
interface Seq<T> extends Stream<T> {
    Tuple2<Seq<???>, Seq<???>> unzip();
}

Skipping and limiting

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipWhile(i -> i < 3);

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipUntil(i -> i == 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitWhile(i -> i < 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitUntil(i -> i == 3);

Other functional libraries probably use different terms than skip (e.g. drop) and limit (e.g. take). It doesn’t really matter in the end. We opted for the terms that are already present in the existing Stream API: Stream.skip() and Stream.limit().

Folding

// "abc"
Seq.of("a", "b", "c").foldLeft("", (u, t) -> t + u);

// "cba"
Seq.of("a", "b", "c").foldRight("", (t, u) -> t + u);

The Stream.reduce() operations are designed for parallelisation. This means that the functions passed to them must be associative, stateless, and non-interfering.

But sometimes, you really want to “reduce” a stream with functions that do not have the above attributes, and consequently, you probably don’t care about the reduction being parallelisable. This is where “folding” comes in.
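
To see why these attributes matter, consider subtraction, which is not associative. A parallel reduce() is free to regroup the operations, whereas foldLeft() guarantees strict left-to-right evaluation. A small sketch:

// (((0 - 1) - 2) - 3) = -6, deterministic
Seq.of(1, 2, 3).foldLeft(0, (u, t) -> u - t);

// Once parallel, the grouping is unspecified, e.g.
// (0 - 1) - ((0 - 2) - 3) = 4
Stream.of(1, 2, 3).parallel().reduce(0, (a, b) -> a - b);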

A nice explanation about the various differences between reducing and folding (in Scala) can be seen here.

Splitting

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// tuple((1, 3, 5), (2, 4, 6))
Seq.of(1, 2, 3, 4, 5, 6).partition(i -> i % 2 != 0);

// tuple((1, 2), (3, 4, 5))
Seq.of(1, 2, 3, 4, 5).splitAt(2);

The above functions all have one thing in common: they operate on a single stream in order to produce two new streams that can be consumed independently.

Obviously, this means that internally, some memory must be consumed to keep buffers of partially consumed streams. E.g.

  • duplication needs to keep track of all values that have been consumed in one stream, but not in the other
  • partitioning needs to fast forward to the next value that satisfies (or doesn’t satisfy) the predicate, without losing all the dropped values
  • splitting might need to fast forward to the split index

For some real functional fun, let’s have a look at a possible splitAt() implementation:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    return seq(stream)
          .zipWithIndex()
          .partition(t -> t.v2 < position)
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

… or with comments:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    // Add jOOλ functionality to the stream
    // -> local Type: Seq<T>
    return seq(stream)
	
    // Keep track of stream positions
    // with each element in the stream
    // -> local Type: Seq<Tuple2<T, Long>>
          .zipWithIndex()
	  
    // Split the streams at position
    // -> local Type: Tuple2<Seq<Tuple2<T, Long>>,
    //                       Seq<Tuple2<T, Long>>>
          .partition(t -> t.v2 < position)
		  
    // Remove the indexes from zipWithIndex again
    // -> local Type: Tuple2<Seq<T>, Seq<T>>
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

Nice, isn’t it? A possible implementation for partition(), on the other hand, is a bit more complex. Here it is, implemented trivially with Iterator instead of the new Spliterator:

static <T> Tuple2<Seq<T>, Seq<T>> partition(
        Stream<T> stream, 
        Predicate<? super T> predicate
) {
    final Iterator<T> it = stream.iterator();
    final LinkedList<T> buffer1 = new LinkedList<>();
    final LinkedList<T> buffer2 = new LinkedList<>();

    class Partition implements Iterator<T> {

        final boolean b;

        Partition(boolean b) {
            this.b = b;
        }

        void fetch() {
            while (buffer(b).isEmpty() && it.hasNext()) {
                T next = it.next();
                buffer(predicate.test(next)).offer(next);
            }
        }

        LinkedList<T> buffer(boolean test) {
            return test ? buffer1 : buffer2;
        }

        @Override
        public boolean hasNext() {
            fetch();
            return !buffer(b).isEmpty();
        }

        @Override
        public T next() {
            return buffer(b).poll();
        }
    }

    return tuple(
        seq(new Partition(true)), 
        seq(new Partition(false))
    );
}

I’ll let you do the exercise and verify the above code.
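
As a quick sanity check, here’s a hypothetical usage of the above, consuming both partitions independently:

Tuple2<Seq<Integer>, Seq<Integer>> t =
    partition(Stream.of(1, 2, 3, 4, 5, 6), i -> i % 2 == 0);

System.out.println(t.v1.toList()); // [2, 4, 6]
System.out.println(t.v2.toList()); // [1, 3, 5]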

Get and contribute to jOOλ, now!

All of the above is part of jOOλ, available for free from GitHub. There is already a partially Java-8-ready, full-blown library called functionaljava, which goes much further than jOOλ.

Yet, we believe that all that’s missing from Java 8’s Streams API is really just a couple of methods that are very useful for sequential streams.

In a previous post, we’ve shown how we can bring lambdas to String-based SQL using a simple wrapper for JDBC (of course, we still believe that you should use jOOQ instead).

Today, we’ve shown how we can write awesome functional and sequential Stream processing very easily, with jOOλ.

Stay tuned for even more jOOλ goodness in the near future (and pull requests are very welcome, of course!)

Look no Further! The Final Answer to “Where to Put Generated Code?”


This recent question on Stack Overflow made me think.

Why does jOOQ suggest to put generated code under “/target” and not under “/src”?

… and I’m about to give you the final answer to “Where to Put Generated Code?”

This isn’t only about jOOQ

Even if you’re not using jOOQ, or if you’re using jOOQ but without the code generator, there might be some generated source code in your project. There are many tools that generate source code from other data, such as:

  • The Java compiler (ok, byte code, not strictly source code. But still code generation)
  • XJC, from XSD files
  • Hibernate from .hbm.xml files, or from your schema
  • Xtend translates Xtend code to Java code
  • You could even consider data transformations, like XSLT
  • many more…

In this article, we’re going to look at how to deal with jOOQ-generated code, but the same thoughts apply also to any other type of code generated from other code or data.

Now, the very very interesting strategic question that we need to ask ourselves is: Where to put that code? Under version control, like the original data? Or should we consider generated code to be derived code that must be re-generated all the time?

The answer is nigh…

It depends!

Nope, unfortunately, as with many other flame-war-prone discussions, this one doesn’t have a completely correct or wrong answer, either. There are essentially two approaches:

Considering generated code as part of your code base

When you consider generated code as part of your code base, you will want to:

  • Check in generated sources in your version control system
  • Use manual source code generation
  • Possibly even use partial source code generation

This approach is particularly useful when your Java developers are not in full control of or do not have full access to your database schema (or your XSD or your Java code, etc.), or if you have many developers that work simultaneously on the same database schema, which changes all the time. It is also useful to be able to track side-effects of database changes, as your checked-in database schema can be considered when you want to analyse the history of your schema.

With this approach, you can also keep track of the change of behaviour in the jOOQ code generator, e.g. when upgrading jOOQ, or when modifying the code generation configuration.

When you use this approach, you will treat your generated code as an external library with its own lifecycle.

The drawback of this approach is that it is more error-prone and possibly a bit more work as the actual schema may go out of sync with the generated schema.

Considering generated code as derived artefacts

When you consider generated code to be derived artefacts, you will want to:

  • Check in only the actual DDL, i.e. the “original source of truth” (e.g. controlled via Flyway)
  • Regenerate jOOQ code every time the schema changes
  • Regenerate jOOQ code on every machine – including continuous integration machines, and possibly, if you’re crazy enough, on production

This approach is particularly useful when you have a smaller database schema that is under full control by your Java developers, who want to profit from the increased quality of being able to regenerate all derived artefacts in every step of your build.

This approach is fully supported by Maven, for instance, which foresees special directories (e.g. target/generated-sources), and phases (e.g. <phase>generate-sources</phase>) specifically for source code generation.

The drawback of this approach is that the build may break in perfectly “acceptable” situations, when parts of your database are temporarily unavailable.

Pragmatic approach

Some of you might not like that answer, but there is also a pragmatic approach, a combination of both. You can consider some code as part of your code base, and some code as derived. For instance, jOOQ-meta’s generated sources (used to query the dictionary views / INFORMATION_SCHEMA when generating jOOQ code) are put under version control as few jOOQ contributors will be able to run the jOOQ-meta code generator against all supported databases. But in many integration tests, we re-generate the sources every time to be sure the code generator works correctly.

Huh!

Conclusion

I’m sorry to disappoint you. There is no final answer to whether one approach or the other is better. Pick the one that offers you more value in your specific situation.

In case you’re choosing your generated code to be part of the code base, read this interesting experience report on the jOOQ User Group by Witold Szczerba about how to best achieve this.

Stop Manually Importing Your ERD Export into jOOQ


ERD (Entity Relationship Diagrams) are a great way of designing and visualising your database model. There is a variety of vendors offering free and commercial ERD tools. Vertabelo by E-Point is a SaaS product where you can design and manage your database schema online. For instance, the jOOQ example database can be modelled as such:

jOOQ Sample Database


The most important aspect of such an ERD tool, however, is its import / export functionality. Not only can an existing schema be reverse-engineered, but you can also export it in SQL or XML format. This is great news for early adopters of the upcoming jOOQ 3.5, which will finally support importing file-based schema definitions, e.g. using an XML representation of your INFORMATION_SCHEMA.

XSLT to the rescue

In “modern” times where people put JSON everywhere and XML has become evil, people might have forgotten the power of XML and its tool chain. In this case, it turns out to be quite trivial to transform the Vertabelo export format into the jOOQ import format:

Export

<Tables>
    <Table Id="t1">
        <Name>LANGUAGE</Name>
        <Description></Description>
        <Columns>
            <Column Id="c1">
                <Name>ID</Name>
                <Type>integer</Type>
                <Nullable>false</Nullable>
                <PK>true</PK>
            </Column>
            <!-- ... -->

A full export file can be seen here.

Import

<information_schema>
    <schemata>
        <schema>
            <schema_name>PUBLIC</schema_name>
        </schema>
    </schemata>
    <tables>
        <table>
            <table_schema>PUBLIC</table_schema>
            <table_name>LANGUAGE</table_name>
        </table>
        <!-- ... -->
    </tables>
    <columns>
        <column>
            <table_schema>PUBLIC</table_schema>
            <table_name>LANGUAGE</table_name>
            <column_name>ID</column_name>
            <data_type>integer</data_type>
            <ordinal_position>1</ordinal_position>
            <is_nullable>false</is_nullable>
        </column>
        <!-- ... -->
    </columns>
    <table_constraints>
        <table_constraint>
            <constraint_schema>PUBLIC</constraint_schema>
            <constraint_name>PK_LANGUAGE</constraint_name>
            <constraint_type>PRIMARY KEY</constraint_type>
            <table_schema>PUBLIC</table_schema>
            <table_name>LANGUAGE</table_name>
        </table_constraint>
        <!-- ... -->

While the Vertabelo export format is organised hierarchically, the jOOQ import format is a flat XML representation of the relevant SQL standard INFORMATION_SCHEMA tables, as implemented by a variety of databases such as H2, HSQLDB, MySQL, PostgreSQL, and SQL Server. The transformation can be done easily with XSLT, looking something like this:

<xsl:key name="schema" 
    match="/DatabaseModel/Tables/Table/Properties/Property[Name = 'Schema']" use="." />

<xsl:template match="/">
    <information_schema xmlns="http://www.jooq.org/xsd/jooq-meta-3.5.0.xsd">
        <schemata>
            <xsl:apply-templates 
                select="/DatabaseModel/Tables/Table/Properties/Property[Name = 'Schema'][generate-id() = generate-id(key('schema', .)[1])]" 
                mode="schema"/>
        </schemata>
        
        <tables>
            <xsl:apply-templates 
                select="/DatabaseModel/Tables/Table" 
                mode="table"/>
        </tables>
        
        <columns>
            <xsl:apply-templates 
                select="/DatabaseModel/Tables/Table/Columns/Column" 
                mode="column"/>
        </columns>

<!-- ... -->
<xsl:template match="Table" mode="table">
    <table>
        <table_schema>
            <xsl:value-of select="Properties/Property[Name = 'Schema']/Value"/>
        </table_schema>
        <table_name>
            <xsl:value-of select="Name"/>
        </table_name>
    </table>
</xsl:template>

<xsl:template match="Column" mode="column">
    <xsl:variable name="Id" select="@Id"/>
    
    <column>
        <table_schema>
            <xsl:value-of select="ancestor::Table/Properties/Property[Name = 'Schema']/Value"/>
        </table_schema>
        <table_name>
            <xsl:value-of select="ancestor::Table/Name"/>
        </table_name>
        <column_name>
            <xsl:value-of select="Name"/>
        </column_name>
        <data_type>
            <xsl:value-of select="Type"/>
        </data_type>
        <ordinal_position>
            <xsl:value-of select="1 + count(preceding-sibling::Column)"/>
        </ordinal_position>
        <is_nullable>
            <xsl:value-of select="Nullable"/>
        </is_nullable>
    </column>
</xsl:template>

The full XSL file can be seen here.

Configuring Maven

All we need now is to put the Vertabelo export file somewhere in src/main/resources, and transform it with the Codehaus xml-maven-plugin like so:

<!-- The XSLT plugin transforming XML files prior to code generation -->
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>xml-maven-plugin</artifactId>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>transform</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <transformationSets>
            <transformationSet>
                <dir>src/main/resources</dir>
                <includes>
                    <include>vertabelo-export.xml</include>
                </includes>
                <stylesheet>src/main/resources/vertabelo-2-jooq.xsl</stylesheet>
            </transformationSet>
        </transformationSets>
    </configuration>
</plugin>

The output is then available to the jOOQ code generator in the target directory:

<plugin>
    <groupId>org.jooq</groupId>
    <artifactId>jooq-codegen-maven</artifactId>
    <version>${org.jooq.version}</version>

    <executions>
        <execution>
            <id>generate-h2</id>
            <phase>generate-sources</phase>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <generator>
                    <name>org.jooq.util.DefaultGenerator</name>
                    <database>
                        <!-- We're using the new jOOQ 3.5 XMLDatabase here -->
                        <name>org.jooq.util.xml.XMLDatabase</name>
                        <properties>
                            <property>
                                <key>dialect</key>
                                <value>H2</value>
                            </property>
                            <property>
                                <key>xml-file</key>
                                <value>target/generated-resources/xml/xslt/vertabelo-export.xml</value>
                            </property>
                        </properties>
                        <inputSchema>PUBLIC</inputSchema>
                    </database>
                    <generate>
                        <deprecated>false</deprecated>
                        <instanceFields>true</instanceFields>
                    </generate>
                    <target>
                        <packageName>org.jooq.example.db.h2</packageName>
                        <directory>target/generated-sources/jooq-h2</directory>
                    </target>
                </generator>
            </configuration>
        </execution>
    </executions>
</plugin>

That’s it!

This obviously works with ERD tools other than Vertabelo as well. It should be just as easy to write your own XSL file for your own tool. All you need to do is generate valid XML according to the new http://www.jooq.org/xsd/jooq-meta-3.5.0.xsd schema.

See the pom.xml file here, for details, or download the full example project from GitHub.

Integrating jOOQ with PostgreSQL: Partitioning


Introduction

jOOQ is a great framework when you want to work with SQL in Java without having too much ORM in your way. At the same time, it can be integrated into many environments, as it offers support for many database-specific features. One such database-specific feature is partitioning in PostgreSQL. Partitioning is mainly used for performance reasons, because it can improve query performance in certain situations. jOOQ has no explicit support for this feature, but it can be integrated quite easily, as we will show you.

This article is brought to you by the Germany based jOOQ integration partner UWS Software Service (UWS). UWS is specialised in custom software development, application modernisation and outsourcing with a distinct focus on the Java Enterprise ecosystem.

Partitioning in PostgreSQL

With the partitioning feature of PostgreSQL, you can split data that would otherwise form a huge table into multiple separate tables. Each of the partitions is a normal table that inherits its columns and constraints from a parent table. This so-called table inheritance can be used for “range partitioning”, where, for example, the data from one range does not overlap the data from another range in terms of identifiers, dates, or other criteria.

In the following example, the table “author” is partitioned such that all rows of a partition share the same foreign key to the table “authorgroup”.

CREATE TABLE author (
  authorgroup_id int,
  LastName varchar(255)
);

CREATE TABLE author_1 (
  CONSTRAINT authorgroup_id_check_1
    CHECK ((authorgroup_id = 1))
) INHERITS (author);

CREATE TABLE author_2 (
  CONSTRAINT authorgroup_id_check_2
    CHECK ((authorgroup_id = 2))
) INHERITS (author);

...

As you can see, we set up inheritance and – in order to have a simple example – we just add one constraint checking that each partition only contains rows with its respective “authorgroup_id”. Basically, this results in the “author” table only containing table and column definitions, but no data. However, when querying the “author” table, PostgreSQL will really query all the inheriting “author_n” tables and return a combined result.

A trivial approach to using jOOQ with partitioning

In order to work with the partitioning described above, jOOQ offers several options. You can use the default way, which is to let jOOQ generate one class per table. In order to insert data into multiple tables, you would have to use different classes. This approach is used in the following snippet:

// add
InsertQuery query1 = dsl.insertQuery(AUTHOR_1);
query1.addValue(AUTHOR_1.ID, 1);
query1.addValue(AUTHOR_1.LAST_NAME, "Nowak");
query1.execute();

InsertQuery query2 = dsl.insertQuery(AUTHOR_2);
query2.addValue(AUTHOR_2.ID, 1);
query2.addValue(AUTHOR_2.LAST_NAME, "Nowak");
query2.execute();

// select
Assert.assertTrue(dsl
    .selectFrom(AUTHOR_1)
    .where(AUTHOR_1.LAST_NAME.eq("Nowak"))
    .fetch().size() == 1);

Assert.assertTrue(dsl
    .selectFrom(AUTHOR_2)
    .where(AUTHOR_2.LAST_NAME.eq("Nowak"))
    .fetch().size() == 1);

You can see that multiple classes generated by jOOQ need to be used, so depending on how many partitions you have, generated classes can pollute your codebase. Also, imagine that you eventually need to iterate over partitions, which would be cumbersome to do with this approach. Another approach could be to use jOOQ to build fields and tables using string manipulation, but that is error-prone again and prevents generic type safety. Also, consider the case where you want true data separation in terms of multi-tenancy.

You see that there are some considerations to make when working with partitioning. Fortunately, jOOQ offers various ways of working with partitioned tables, and in the following we’ll compare approaches, so that you can choose the one most suitable for you.

Using jOOQ with partitioning and multi-tenancy

jOOQ’s runtime-schema mapping is often used to realize different database environments, such that, for example, during development one database is queried, but when deployed to production, the same queries go to another database. Multi-tenancy is another recommended use case for runtime-schema mapping, as it allows for strict partitioning and for configuring your application to only use databases or tables configured in the runtime-schema mapping. Running the same code would then result in working with different databases or tables depending on the configuration, which allows for true separation of data in terms of multi-tenancy.

The following configuration, taken from the jOOQ documentation, is executed when creating the DSLContext, so it can be considered a system-wide setting:

Settings settings = new Settings()
  .withRenderMapping(new RenderMapping()
  .withSchemata(
      new MappedSchema().withInput("DEV")
                        .withOutput("MY_BOOK_WORLD")
                        .withTables(
      new MappedTable().withInput("AUTHOR")
                       .withOutput("AUTHOR_1"))));

// Add the settings to the Configuration
DSLContext create = DSL.using(
  connection, SQLDialect.ORACLE, settings);

// Run queries with the "mapped" configuration
create.selectFrom(AUTHOR).fetch();

// results in SQL:
// “SELECT * FROM MY_BOOK_WORLD.AUTHOR_1”

Using this approach you can map one table to one partition permanently, e.g. “AUTHOR” to “AUTHOR_1” for the environment “DEV”. In another environment you could choose to map the “AUTHOR” table to “AUTHOR_2”.

Runtime-schema mapping only allows you to map to exactly one table on a per-query basis, so you cannot handle the use case where you want to manipulate more than one table partition. If you would like to have more flexibility, you might want to consider the next approach.

Using jOOQ with partitioning and without multi-tenancy

If you need to handle multiple table partitions without having multi-tenancy, you need a more flexible way of accessing partitions. The following example shows how you can do so in a dynamic and type-safe way, avoiding errors and staying in the same elegant style you are used to from jOOQ:

// add
for(int i=1; i<=2; i++) {
  Builder part = forPartition(i);
  InsertQuery query = dsl.insertQuery(part.table(AUTHOR));
  query.addValue(part.field(AUTHOR.ID), 1);
  query.addValue(part.field(AUTHOR.LAST_NAME), "Nowak");
  query.execute();
}

// select

for(int i=1; i<=2; i++) {
  Builder part = forPartition(i);
  Assert.assertTrue(dsl
      .selectFrom(part.table(AUTHOR))
      .where(part.field(AUTHOR.LAST_NAME).eq("Nowak"))
      .fetch()
      .size() == 1);
}

What you can see above is that the partition numbers are abstracted away, so that you can use the “AUTHOR” table instead of “AUTHOR_1”. Thus, your code won’t be polluted with many generated classes. Another thing is that the partitioner object is initialized dynamically, so you can use it, for example, in a loop like the one above. It also follows the Builder pattern, so that you can operate on it in the fluent style you are used to from jOOQ.

The code above is doing exactly the same as the first trivial snippet, but there are multiple benefits, like type-safe and reusable access to partitioned tables.
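
The partitioner sources are linked at the end of this article. To give you a rough idea, a minimal, hypothetical sketch of such a Builder – based purely on name-based re-qualification via jOOQ’s DSL.name() – could look like this:

import org.jooq.Field;
import org.jooq.Table;
import org.jooq.impl.DSL;

public class Builder {

    private final int partition;

    private Builder(int partition) {
        this.partition = partition;
    }

    public static Builder forPartition(int partition) {
        return new Builder(partition);
    }

    // "AUTHOR" -> "AUTHOR_1", "AUTHOR_2", ...
    public Table<?> table(Table<?> table) {
        return DSL.table(DSL.name(table.getName() + "_" + partition));
    }

    // Re-create the field with the same name and data type,
    // detached from the generated table
    public <T> Field<T> field(Field<T> field) {
        return DSL.field(DSL.name(field.getName()), field.getDataType());
    }
}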

Integration of jOOQ partitioning without multi-tenancy into a Maven build process (optional)

If you are using Continuous Integration, you can integrate the solution above so that jOOQ does not generate classes for the individual partition tables. This can be achieved using a regular expression that excludes certain table names when generating Java classes. When using Maven, your integration might look something like this:

<generator>
  <name>org.jooq.util.DefaultGenerator</name>
  <database>
    <name>org.jooq.util.postgres.PostgresDatabase</name>
    <includes>.*</includes>
    <excludes>.*_[0-9]+</excludes>
    <inputSchema>${db.schema}</inputSchema>
  </database>
  <target>
    <packageName>com.your.company.jooq</packageName>
    <directory>target/generated-sources/jooq</directory>
  </target>
</generator>

Then it’s just a matter of calling mvn install, and the jOOQ Maven plugin will generate the classes for your database schema at build time.

Conclusion

This article described how jOOQ in combination with the partitioning feature of PostgreSQL can be used to implement multi-tenancy and improve database performance. PostgreSQL’s documentation states that for partitioning “the benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.”

Achieving support for partitioning with jOOQ is as easy as adding some configuration or a small utility class. jOOQ is then able to support partitioning with or without multi-tenancy, and without sacrificing type safety. Apart from the Java-level integration, the described solution also smoothly integrates into your build and test process.

You may want to look at the sources of the partitioner utility class, which also include a test class, so that you can see the behaviour and integration in more detail.

Please let us know if you need support for this or other jOOQ integrations within your environment. UWS Software Service (UWS) is an official jOOQ integration partner.

A Wonderful SQL Feature: Quantified Comparison Predicates (ANY, ALL)


Have you ever wondered about the use-case behind SQL’s ANY (also: SOME) and ALL keywords?

You have probably not yet encountered these keywords in the wild. Yet they can be extremely useful. But first, let’s see how they’re defined in the SQL standard. The easy part:

8.7  <quantified comparison predicate>

Function

    Specify a quantified comparison.

Format

    <quantified comparison predicate> ::=
        <row value constructor> <comp op> 
            <quantifier> <table subquery>

    <quantifier> ::= <all> | <some>
    <all> ::= ALL
    <some> ::= SOME | ANY

Intuitively, such a quantified comparison predicate can be used like this:

-- Is any person of age 42?
42 = ANY (SELECT age FROM person)

-- Are all persons younger than 42?
42 > ALL (SELECT age FROM person)

Let’s stick with the useful ones. Observe that you have probably written the above queries with a different syntax, such as:

-- Is any person of age 42?
42 IN (SELECT age FROM person)

-- Are all persons younger than 42?
42 > (SELECT MAX(age) FROM person)

In fact, you’ve used the <in predicate>, or a greater-than predicate with a <scalar subquery> and an aggregate function.

The IN predicate

It’s not a coincidence that you might have used the <in predicate> just like the above <quantified comparison predicate> using ANY. In fact, the <in predicate> is specified just like that:

8.4 <in predicate>

Syntax Rules

2) Let RVC be the <row value constructor> and let IPV 
   be the <in predicate value>.

3) The expression

     RVC NOT IN IPV

   is equivalent to

     NOT ( RVC IN IPV )

4) The expression

     RVC IN IPV

   is equivalent to

     RVC = ANY IPV

Precisely! Isn’t SQL beautiful? Note, the implicit consequences of 3) lead to a very peculiar behaviour of the NOT IN predicate with respect to NULL, which few developers are aware of.

Now, it’s getting awesome

So far, there is nothing out of the ordinary with the <quantified comparison predicate>. All of the previous examples can be emulated with “more idiomatic”, or let’s say, “more everyday” SQL.

But the true awesomeness of <quantified comparison predicate> appears only when used in combination with <row value expression> where rows have a degree / arity of more than one:

-- Is any person called "John" of age 42?
(42, 'John') = ANY (SELECT age, first_name FROM person)

-- Are all persons younger than 55?
-- Or if they're 55, do they all earn less than 150'000.00?
(55, 150000.00) > ALL (SELECT age, wage FROM person)

See the above queries in action on PostgreSQL in this SQLFiddle.

At this point, it is worth mentioning that few databases actually support…

  • row value expressions, or…
  • quantified comparison predicates with row value expressions

Even though this was specified in SQL-92, it looks as if most databases are still taking their time to implement it, 22 years later.

Emulating these predicates with jOOQ

But luckily, there is jOOQ to emulate these features for you. Even if you’re not using jOOQ in your project, the following SQL transformation steps can be useful if you want to express the above predicates. Let’s have a look at how this could be done in MySQL:

-- This predicate
(42, 'John') = ANY (SELECT age, first_name FROM person)

-- ... is the same as this:
EXISTS (
  SELECT 1 FROM person 
  WHERE age = 42 AND first_name = 'John'
)

What about the other predicate?

-- This predicate
(55, 150000.00) > ALL (SELECT age, wage FROM person)

-- ... is the same as these:
----------------------------
-- No quantified comparison predicate with
-- Row value expressions available
(55, 150000.00) > (
  SELECT age, wage FROM person
  ORDER BY 1 DESC, 2 DESC
  LIMIT 1
)

-- No row value expressions available at all
NOT EXISTS (
  SELECT 1 FROM person
  WHERE (55 < age)
  OR    (55 = age AND 150000.00 <= wage)
)

Clearly, the EXISTS predicate can be used in pretty much every database to emulate what we’ve seen before. If you just need this for a one-shot emulation, the above examples will be sufficient. If, however, you want to more formally use <row value expression> and <quantified comparison predicate>, you better get SQL transformation right.
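
For reference, here’s a hedged sketch of what the original predicates could look like with jOOQ’s API – assuming generated PERSON table classes, a DSLContext called ctx, and the usual static imports from org.jooq.impl.DSL:

// 42 = ANY (SELECT age FROM person)
ctx.selectFrom(PERSON)
   .where(val(42).eq(any(select(PERSON.AGE).from(PERSON))))
   .fetch();

// (42, 'John') = ANY (SELECT age, first_name FROM person)
ctx.selectFrom(PERSON)
   .where(row(val(42), val("John")).eq(any(
        select(PERSON.AGE, PERSON.FIRST_NAME).from(PERSON))))
   .fetch();

jOOQ renders such predicates natively where the database supports them, and emulates them as shown above where it doesn’t.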

Read on about SQL transformation in this article here.
