Let’s Meet at JAX London and Other Events to Talk About SQL

Exciting times are ahead for Java/SQL developers. SQL is strong and popular as ever. It is the right language for relational databases just as much as for Big Data storage systems. Just this week, Pivotal has announced their open sourcing HAWQ and MADlib as a part of their Hadoop Native SQL strategy.

At Data Geekery, we’re promoting SQL with jOOQ – typesafe, embedded SQL for Java, and now also with our new advanced SQL workshop, which is available publicly and as an in-house event. Check it out here.

In the next couple of months, we’re touring Europe again, and among other places, we’ll be presenting at the JAX London where as an official media partner.

Apart from our own talk, be sure not to miss any of the following exciting talks

And don’t forget:

Get 10% off your JAX London ticket by entering the MP_jOOQ10 discount code

We’re looking forward to meeting you in London, or in any other upcoming jOOQ or SQL event

It is all about the JDBC Basics

We’re very happy to announce a guest post by Marco Behler, who has been blogging about jOOQ in the past.

img31Marco started out in programming (reverse-engineering, actually) and now mainly programmes on the JVM in his day-to-day work. He also always had a sweet tooth for strategy and marketing. Marco Behler GmbH is the result of that hybrid role.

It is all about the JDBC Basics

It is one of the days.

You are reading the Spring documentation’s @Transactional section and still don’t understand the difference between logical and physical transaction scopes. Simultaneously your app throws an
LazyInitializationException and you have no idea why. To top it off you see spontaneous database deadlocks in production and you suspect your connection pool is leaking connections..somehow.

Know what most likely would have helped instead of banging your head against the wall? Spending a couple (literally) of hours on learning the JDBC basics. Let’s find out why:

What are the JDBC basics?

The basics are opening up/closing database connections and then working with transactions. Also understanding how deadlocks, pessimistic and optimistic locking work on a plain JDBC level. A bit of isolation levels and savepoints and then directly on to connection pools and jdbc driver logging. That’s it. Seriously.

Why are the basics so important?

Everything you will encounter in frameworks like Spring, Hibernate, jOOQ etc. builds up on these basics. For example, there are a gazillion topics on the internet regarding Hibernate’s LazyInitializationException and I was scared of that particular exception myself many years ago. But what else would you expect trying to query the database without having a connection to the database open (which is basically all that this exeception is) ?

The same with Spring’s “transaction framework”. There is so much content, or shall we say (F)ear/(U)ncertainty/(D)oubt, out there on how to open up transactions with spring, be it programmatically, with annotations or xml. But what if you knew that under the hood, there is only one way (and actually one line of code) to open up transactions in the JDBC world?

Let me not even get started on the various (mis)configurations of connection pools you see in production in the wild. Or the unawareness of JDBC (driver) logging, which usually leads to debugging in the wild. All basics, which you can master in a couple of hours and which will help you for a lifetime!

Why do people not just learn the basics?

In every middle-sized project there is a ton of technologies involved and there usually is no clear-cut path on how to learn all of them or how they all work together. It simply takes a lot of time and effort to dig through everything.

There’s JPA sessions and JDBC connections and then Spring somehow provides those transactional proxies in 5 different ways and then some other colleague just put jOOQ into the mix, but then somehow my session doesn’t flush and my objects don’t get persisted and the HibernateTransactionManager is not working as expected.

With all of this, I would also hope for my database transactions just to commit – god forbid what happens on rollback 馃檪

But in the end, everything technology mentioned is just a layer on top of JDBC. If you understand transactions or deadlocks or savepoints on the basic level, then Spring or Hibernate or jOOQ will not throw you off.

So what do you recommend ?

If you want to get miles ahead in your day-to-day database programming, you have to start with the basics. Step-by-Step. And then you will see most of your problems automatically evaporate.

Out of my extensive database programming experience, I created an ebook with a ton of ready-to-run exercises, which will take you from Java database novice to expert. At your own pace. You can literally copy the source code of every chapter into your IDE, run it and (hopefully) learn from it. It covers plain JDBC, Spring, Hibernate, jOOQ (soon) and also distributed transactions.

You can read the whole book for free online here, and I would love to get your feedback! I would really like to let the community feedback flow back into future editions of the book. In addition, If you like what you see and the exercises help you, you can also show your support by getting a paid digital version (pdf, epub, mobi).

In any case…

…learn your JDBC basics – and you will profit from them for a lifetime!

Comparing Imperative and Functional Algorithms in Java 8

Mario Fusco’s popular tweet impressively shows what the main difference between imperative and functional approaches to similar algorithms really is:

Both algorithms do the same thing, they’re probably equally fast and reasonable. Yet, one of the algorithms is much easier to write and read than the other. The difference lies in the fact that in imperative programming, different algorithmic requirements are spread throughout the code block, when in functional programming, each requirement has its own little line of code. Compare:

  • Green: Error handling
  • Blue: Stop criteria
  • Red: IO operations
  • Yellow: “Business logic”

Functional programming doesn’t always beat imperative programming as displayed in other examples on the jOOQ blog:

But here’s an example from Stack Overflow by user Aurora_Titanium, where the difference is as clear as in Mario Fusco’s example:

Calculating the Duplicate Values in an Array

The idea is to calculate the sum of all those values that are duplicate in a set of values. For instance, the following array:

int[] list = new int[]{1,2,3,4,5,6,7,8,8,8,9,10};

… should yield as a result something like:

Duplicate: 8. Sum of all duplicate values: 24

The imperative approach

One of the answers by user Volkan Ozkan takes an imperative approach and calculates the sum as such:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

int sum = 0;
for (int j = 0; j < array.length; j++)
{
    for (int k = j + 1; k < array.length; k++) 
    {
        if (k != j && array[k] == array[j])
        {
            sum = sum + array[k];
            System.out.println(
                "Duplicate found: " 
              + array[k]
              + " " 
              + "Sum of the duplicate value is " + sum);
        }
    }
}

The approach works only for sorted arrays where duplicates appear right after one another. In that case, however, it is probably an optimal solution in terms of performance, if performance really matters to this algorithm.

The functional approach

If a slight decrease of performance is acceptable to you (boxing ints, collecting them into maps), and it probably is, you can replace the above difficult-to-read code with the following bit of functional Java-8-style logic, which communicates much more clearly what it does:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

IntStream.of(array)
         .boxed()
         .collect(groupingBy(i -> i))
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

or, with explanations:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

// Create a Stream<Integer> from your data
IntStream.of(array)
         .boxed()

// Group values into a Map<Integer, List<Integer>>
         .collect(groupingBy(i -> i))

// Filter out those map values that have only 
// 1 element in their group
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)

// Print the sum for the remaining groups
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

(note that the functional approach calculates sums for each duplicate value, not an overall sum, like the imperative approach. From the original question, this requirement wasn’t very clear)

As we’ve stated in a previous article on our blog, the power of functional programming via an API like the Java 8 Stream API is the fact that we’re approaching the expressive power of SQL-style declarative programming. We’re no longer concerned with remembering individual array indexes and how to calculate them and store intermediate results into some buffers. We can now focus on the really interesting logic, such as: “what’s a duplicate?” or “what sum am I interested in?”

Read on about how SQL compares to Java 8 Streams:

Common SQL clauses and their equivalents in Java 8 Streams

Don’t Format Dates in SQL. Use the DATE Literal!

I’m seeing people do this all the time. They want to hammer a date or timestamp constant into their SQL query, and the only function they know is the TO_DATE() or TO_TIMESTAMP() date parsing function:

SELECT TO_DATE ('20150801', 'yyyymmdd')
FROM DUAL;

As observed in this Stack Overflow question, for instance:

TO_DATE ('20150801', 'yyyymmdd') AS DAY_20150801_TOTAL,
TO_DATE ('20150802', 'yyyymmdd') AS DAY_20150802_TOTAL,
TO_DATE ('20150803', 'yyyymmdd') AS DAY_20150803_TOTAL,
TO_DATE ('20150804', 'yyyymmdd') AS DAY_20150804_TOTAL,
TO_DATE ('20150805', 'yyyymmdd') AS DAY_20150805_TOTAL,
TO_DATE ('20150806', 'yyyymmdd') AS DAY_20150806_TOTAL,
TO_DATE ('20150807', 'yyyymmdd') AS DAY_20150807_TOTAL,
TO_DATE ('20150808', 'yyyymmdd') AS DAY_20150808_TOTAL,
TO_DATE ('20150809', 'yyyymmdd') AS DAY_20150809_TOTAL,
TO_DATE ('20150810', 'yyyymmdd') AS DAY_20150810_TOTAL,
TO_DATE ('20150811', 'yyyymmdd') AS DAY_20150811_TOTAL, [...]

Date parsing is important only if your date input is really stored as strings in a table and in a custom format, and you want to manipulate the dates. If you are using date constants, please save some time and improve readability by using the DATE literal or TIMESTAMP literal.

Here’s an example:

SELECT 
  DATE '2015-08-01' AS d,
  TIMESTAMP '2015-08-01 15:30:00' AS ts
FROM DUAL;

Most major RDBMS support the above SQL standard literals. The date format that you should use with these is the ISO 8601 format. Advantages of these literals:

  • They’re really constants and as such, they’re parsed only by the SQL parser, not the execution engine (e.g. when put in a WHERE clause)
  • They’re much more readable than any other option

It’s the little things that sum up to make the big difference.

How to use Java 8 Functional Programming to Generate an Alphabetic Sequence

I’ve stumbled upon an interesting Stack Overflow question by user “mip”. The question was:

I’m looking for a way of generating an alphabetic sequence:

A, B, C, ..., Z, AA, AB, AC, ..., ZZ.

This can be quickly recognised as the headings of an Excel spreadsheet, which does precisely that:

excel.

So far, none of the answers employed any Java 8 functional programming, which I accepted as a challenge. We’re going to use jOO位, because the Java 8 Stream API does not offer enough functionality for this task. (I stand corrected – thank you Sebastian, for this interesting answer)

But first, let’s decompose the algorithm in a functional way. What we need are these components:

  1. A (reproducible) representation of the alphabet
  2. An upper bound, i.e. how many letters we want to produce. The requested sequence goes to ZZ, which means the upper bound would be 2
  3. A way to combine each letter of the alphabet with the previously generated combined letters in a cartesian product

Let’s look into some code:

1. Generating the alphabet

We could be writing the alphabet like this:

List<String> alphabet = Arrays.asList("A", "B", ..., "Z");

but that would be lame. Let’s generate it instead, using jOO位:

List<String> alphabet = Seq
    .rangeClosed('A', 'Z')
    .map(Object::toString)
    .toList();

The above generates a “closed” range (Java-8-Stream-speak for a range with inclusive upper bound) of characters between A and Z, maps characters to strings and collects them into a list.

So far so good. Now:

2. Using an upper bound

The requested sequence of characters includes:

A .. Z, AA, AB, .. ZZ

But we could easily imagine to extend this requirement generally to produce the following, or even more.

A .. Z, AA, AB, .. ZZ, AAA, AAB, .. ZZZ

For this, we’ll use again rangeClosed():

// 1 = A .. Z, 2 = AA .. ZZ, 3 = AAA .. ZZZ
Seq.rangeClosed(1, 2)
   .flatMap(length -> ...)
   .forEach(System.out::println);

The idea here is to produce a new stream for each individual length in the range [1 .. 2], and to flatten those streams into one single stream. flatMap() is essentially the same as a nested loop in imperative programming.

3. Combine letters in a cartesian product

This is the trickiest part: We need to combine each letter with each letter length times. For this, we’ll use the following stream:

Seq.rangeClosed(1, length - 1)
   .foldLeft(Seq.seq(alphabet), (s, i) -> 
       s.crossJoin(Seq.seq(alphabet))
        .map(t -> t.v1 + t.v2))
    );

We’re using again rangeClosed() to produce values in the range [1 .. length-1]. foldLeft() is the same as reduce(), except that foldLeft() is guaranteed to go from “left to right” in a stream, without requiring the folding function to be associative. Whew.

In other, more understandable words: foldLeft() is nothing else but an imperative loop. The “seed” of the loop, i.e. the loop’s initial value, is a complete alphabet (Seq.seq(alphabet)). Now, for every value in the range [1 .. length-1], we produce a cartesian product (crossJoin()) between the letters “folded” so far and a new alphabet, and we concatenate each combination into a single new string (t.v1 and t.v2).

That’s it!

Combining everything

The following simple program prints all the values from A .. Z, AA .. ZZ, AAA .. ZZZ to the console:

import java.util.List;

import org.jooq.lambda.Seq;

public class Test {
    public static void main(String[] args) {
        int max = 3;

        List<String> alphabet = Seq
            .rangeClosed('A', 'Z')
            .map(Object::toString)
            .toList();

        Seq.rangeClosed(1, max)
           .flatMap(length ->
               Seq.rangeClosed(1, length - 1)
                  .foldLeft(Seq.seq(alphabet), (s, i) -> 
                      s.crossJoin(Seq.seq(alphabet))
                       .map(t -> t.v1 + t.v2)))
           .forEach(System.out::println);
    }
}

Disclaimer

This is certainly not the most optimal algorithm for this particular case. One of the best implementations has been given by an unnamed user on Stack Overflow:

import static java.lang.Math.*;

private static String getString(int n) {
    char[] buf = new char[(int) floor(log(25 * (n + 1)) / log(26))];
    for (int i = buf.length - 1; i >= 0; i--) {
        n--;
        buf[i] = (char) ('A' + n % 26);
        n /= 26;
    }
    return new String(buf);
}

Unnecessary to say that the latter runs much much faster than the previous functional algorithm.