A True SQL Gem You Didn’t Know Yet: The EVERY() Aggregate Function


We’ve just added support for the EVERY() aggregate function (#1391) to jOOQ, and would like to take the opportunity of letting you know of this true SQL gem that can come in handy EVERY(now and then) (pun intended).

Let’s assume we have four books in our table:

INSERT INTO book VALUES (1, 1, '1984');
INSERT INTO book VALUES (2, 1, 'Animal Farm');
INSERT INTO book VALUES (3, 2, 'O Alquimista');
INSERT INTO book VALUES (4, 2, 'Brida');

Now the question is:

Is EVERY() ID lower than 10?

We’ll ask:

SELECT EVERY(id < 10)
FROM book

And the answer is:

every
-----
true

Does EVERY() book for each author end with the letter ‘a’?

We’ll ask:

SELECT author_id, EVERY(title LIKE '%a')
FROM book
GROUP BY author_id

And the answer is:

author_id   every
-----------------
1           false
2           true

Wonderful!

As with all aggregate functions, we can even use them as a window function!

SELECT 
  book.*, 
  EVERY(title LIKE '%a') OVER (PARTITION BY author_id)
FROM book

Which will produce

id  author_id   title          every
------------------------------------
1   1           1984           false
2   1           Animal Farm    false
3   2           O Alquimista   true
4   2           Brida          true

Who supports EVERY()

Well, the SQL standard has it:

10.9 <aggregate function>

<aggregate function> ::=
    COUNT <left paren> <asterisk> <right paren> [ <filter clause> ]
  | <general set function> [ <filter clause> ]
  | <binary set function> [ <filter clause> ]
  | <ordered set function> [ <filter clause> ]
  | <array aggregate function> [ <filter clause> ]

<general set function> ::=
    <set function type> <left paren> [ <set quantifier> ]
        <value expression> <right paren>

<set function type> ::=
    <computational operation>

<computational operation> ::=
    AVG
  | MAX
  | MIN
  | SUM
  | EVERY <-- yes, here! EVERY!
  | ANY
  | SOME
  | COUNT
  | STDDEV_POP
  | STDDEV_SAMP
  | VAR_SAMP
  | VAR_POP
  | COLLECT
  | FUSION
  | INTERSECTION

And, of course PostgreSQL!

But if your database is not PostgreSQL, don’t worry. EVERY() can be emulated on EVERY() database using SUM() and CASE expressions. Here’s how to emulate the first query:

-- SELECT EVERY(id < 10)
-- FROM book

SELECT CASE SUM(CASE WHEN id < 10 THEN 0 ELSE 1 END) 
         WHEN 0 THEN 1 
         ELSE 0 
       END
FROM book;

Or as window functions

-- SELECT 
--   book.*, 
--   EVERY(title LIKE '%a') OVER (PARTITION BY author_id)
-- FROM book

SELECT 
  book.*,
  CASE SUM(CASE WHEN title LIKE '%a' THEN 0 ELSE 1 END)
       OVER(PARTITION BY author_id)
    WHEN 0 THEN 1 
    ELSE 0
  END
FROM book;

And, as always on this blog, we’re happy to conclude that the upcoming jOOQ 3.6 will now handle EVERY(emulation) for you, so you can write:

DSL.using(configuration)
   .select(BOOK.fields())
   .select(every(BOOK.TITLE.like("%a"))
           .over(partitionBy(BOOK.AUTHOR_ID)))
   .from(BOOK)
   .fetch();

Have fun with this new function!

jOOQ: the best way to write SQL in Java

jOOQ Tuesdays: Yalım Gerger brings Git to Oracle


We’re excited to launch a new series on our blog: the jOOQ Tuesdays. In this series, we’ll publish an article on the third Tuesday every month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

yalim-gergerWe have the pleasure of starting this series with Yalım Gerger who will show us why he thinks that Oracle PL/SQL developers are more than ready for Git!

Hi Yalım – you’re the founder of Gerger, the company behind Formspider. What’s Formspider?

Hi. Formspider is an application development framework for Oracle PL/SQL developers. It enables PL/SQL developers to build high quality business applications using only PL/SQL as the programming language. No Java or JavaScript skills are needed to use Formspider.

Interesting, even from a Java developer’s perspective! Essentially, you’re offering a way to completely bypass Java as middleware (and of course HTML, JavaScript, CSS).

This is not entirely true. We still use Java in our product. Formspider has a middletier application that our customers can deploy to any JEE compliant application server. This middletier application helps us bridge Formspider JavaScript library running in the client’s browser to the PL/SQL running in the database. We also use Java Libraries to generate Excel files from the data stored in an application, a common use case for business applications. So yes, the applications are not coded in Java. Our customers are PL/SQL developers. But we use Java to improve our product. Same with HTML and Javascript. Our job is to understand these technologies and their capabilities really well and expose them as intuitive API’s to PL/SQL developers.

Do you also have customers that access their PL/SQL APIs both through Formspider as well as through their home-grown Java / .NET applications?

Yes. We have customers who have both PL/SQL teams that work with Formspider and Java teams that work with Java technologies. This requires a great deal of collaboration between two teams and that’s not always possible.

Usually, what happens is that Java/.NET teams try to move the application away from the database as much as possible. I was just talking to a friend who works at a large financial institution in which the OO guys are pushing hard to eliminate all the PL/SQL API’s. He was going mad. There are various reasons for this. It is partly political turf wars, partly pure ignorance about the capabilities of database software and PL/SQL.

We can feel the pain. There is no such thing as a magic bullet… So what should they do then? How should an application be architected? Do you think there is a “right” architecture?

No. I think it depends so much on the context. Are you building a consumer app or are you building a business app? Are you a company building a horizontal product or an IT department serving a business operating in a particular vertical? There are so many parameters to consider. At the risk of being too generic, I think an IT department serving a large enterprise should not try build a database agnostic application. That’s silly. On the contrary it should take full advantage of the database software, and other software it uses. You shouldn’t pretend to be seven different organizations building seven different layers of the application just happen to be collaborating. You are just one organization. Act like that. Cut through the fat. Integrate as deeply as you can. This is the most cost effective way to build well performing applications on time and on budget. Database agnostic software is for horizontal software companies.

We’ve recently blogged about the caveats of dual licensing, where we said that shipping our sources to paying customers is essential for a company that calls themselves an Open Source company like Data Geekery does. I’ve seen you ship your sources as well – but you’re not really doing “Open Source”. How would you describe your offering?

I loved that blog post by the way. I think the way jOOQ is licensed is brilliantly fair i.e. it is free if the database is free and it has a price tag if your database has a license fee. In our case, the database always has a license fee. So we don’t have a free option for Formspider. For the Oracle community and for the price tags that they are used to, our license fee is so small that it is practically free. Anyone who is thrown off by our price tag is probably not serious about using Formspider anyway.

Our customers who sign up for our highest level of support service may get the source code of our product for the duration of the professional service. This option is attractive for customers who invest a lot into the application they build with Formspider.

Yes, Oracle price tags have a reputation… Yalım, you seem to be an Oracle person. And as such, you are about to launch gitora. What is it?

Gitora is a free version control tool that integrates the Oracle database with Git. This is a little embarrassing to bring up in a blog mainly read by Java developers but very common version control tasks that most Java developers take for granted are very hard to do in PL/SQL. There is a good reason for this. PL/SQL has no concept of a working directory. PL/SQL is not a file based language i.e. source code units do not reside in the private file system of a developer but in the Oracle database as packages, procedures and functions globally  available to any developer. That makes version control very difficult if not impossible.

So what do people do?

Nothing mostly. Daily backups are used as a way to get back to a previous state of the code if needed.

Some teams create one working directory that is hooked to version control and store all their PL/SQL code in this directory by extracting the DDL’s, usually manually. That’s as sophisticated as it gets.

Proper team development and merging in PL/SQL is very difficult and I haven’t seen it done successfully very often. And I’ve interacted with a lot of PL/SQL teams all around the World. Gitora makes this very easy. It turns the database schema to a working directory. If you execute a Git command, any change to the working directory happens automatically in your database schema.

Interesting. We’ve recently implemented a home-grown “solution” for a customer, which implements automatic version control and installation from a Microsoft Team Foundation Server repository. Maybe, we should migrate to Gitora then?

I didn’t know that. That’s so cool. If you build a version control tool which works for PL/SQL and talks to TFS instead of Git, I think that is also very valuable. Essentially we build the same product but used different version control products. I encourage you to put it out there.

Why not. Maybe we’ll contribute!

Thanks for this very interesting insight, Yalım!

If this interview has triggered your interest, follow Yalım, FormSpider, or Gitora on Twitter:

For more information about Gitora, watch the Gitora tutorial:

Really Too Bad that Java 8 Doesn’t Have Iterable.stream()


This is one of the more interesting recent Stack Overflow questions:

Why does Iterable not provide stream() and parallelStream() methods?

At first, it might seem intuitive to make it straight-forward to convert an Iterable into a Stream, because the two are really more or less the same thing for 90% of all use-cases.

Granted, the expert group had a strong focus on making the Stream API parallel capable, but anyone who works with Java every day will notice immediately, that Stream is most useful in its sequential form. And an Iterable is just that. A sequential stream with no guarantees with respect to parallelisation. So, it would only be intuitive if we could simply write:

iterable.stream();

In fact, subtypes of Iterable do have such methods, e.g.

collection.stream();

Brian Goetz himself gave an answer to the above Stack Overflow question. The reasons for this omittance are rooted in the fact that some Iterables might prefer to return an IntStream instead of a Stream. This really seems to be a very remote reason for a design decision, but as always, omittance today doesn’t mean omittance forever. On the other hand, if they had introduced Iterable.stream() today, and it turned out to be a mistake, they couldn’t have removed it again.

Well, primitive types in Java are a pain and they did all sorts of bad things to generics in the first place, and now to Stream as well, as we have to write the following, in order to turn an Iterable into a Stream:

Stream s = StreamSupport.stream(iterable.spliterator(), false);

Brian Goetz argues that this is “easy”, but I would disagree. As an API consumer, I experience a lot of friction in productivity because of:

  • Having to remember this otherwise useless StreamSupport type. This method could very well have been put into the Stream interface, because we already have Stream construction methods in there, such as Stream.of().
  • Having to remember the subtle difference between Iterator and Spliterator in the context of what I believe has nothing to do with parallelisation. It may well be that Spliterators will become popular eventually, though, so this doubt is for the magic 8 ball to address.
  • In fact, I have to repeat the information that there is nothing to be parallelised via the boolean argument false

Parallelisation really has such a big weight in this new API, even if it will cover only around 5%-10% of all functional collection manipulation operations. While sequential processing was not the main design goal of the JDK 8 APIs, it is really the main benefit for all of us, and the friction around APIs related to sequential processing should be as low as possible.

The method above should have just been called

Stream s = Stream.stream(iterable);

It could be implemented like this:

public static<T> Stream<T> stream(Iterable<T> i) {
    return StreamSupport.stream(i.spliterator(), false);
}

Obviously with convenience overloads that allow for the additional specialisations, like parallelisation, or passing a Spliterator

But again, if Iterable had its own stream() default method, an incredible number of APIs would be so much better integrated with Java 8 out of the box, without even supporting Java 8 explicitly!

Take jOOQ for instance. jOOQ still supports Java 6, so a direct dependency is not possible. However, jOOQ’s ResultQuery type is an Iterable. This allows you to use such queries directly inline in foreach loops, as if you were writing PL/SQL:

PL/SQL

FOR book IN (
  SELECT * FROM books ORDER BY books.title
)
LOOP
  -- Do things with book
END LOOP;

Java

for (BookRecord book : 
  ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
) {
  // Do things with book
}

Now imagine the same thing in Java 8:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .stream()
   .map / reduce / findAny, etc...

Unfortunately, the above is currently not possible. You could, of course, eagerly fetch all the results into a jOOQ Result, which extends List:

ctx.selectFrom(BOOKS).orderBy(BOOKS.TITLE)
   .fetch()
   .stream()
   .map / reduce / findAny, etc...

But it’s one more method to call (every time), and the actual stream semantics is broken, because the fetch is done eagerly.

Complaining on a high level

This is, of course, complaining on a high level, but it would really be great if a future version of Java, e.g. Java 9, would add this missing method to the Iterable API. Again, 99% of all use-cases will want the Stream type to be returned, not the IntStream type. And if they do want that for whatever obscure reason (much more obscure than many evil things from old legacy Java APIs, looking at you Calendar), then why shouldn’t they just declare an intStream() method. After all, if someone is crazy enough to write Iterable<Integer> when they’re really operating on int primitive types, they’ll probably accept a little workaround.

The Inconvenient Truth About Dynamic vs. Static Typing


Sometimes there are these moments of truth. They happen completely unexpectedly, such as when I read this tweet:

David is the author of the lesser-known but not at all lesser-interesting Whiley programming language, a language that has a lot of static type checking built in it. One of the most interesting features of the Whiley language is flow sensitive typing (sometimes also simply called flow typing), which is mostly useful when used along with union types. An example from the getting started guide

function indexOf(string str, char c) => null|int:

function split(string str, char c) => [string]:
  var idx = indexOf(str,c)

  // idx has type null|int
  if idx is int:

    // idx now has type int
    string below = str[0..idx]
    string above = str[idx..]
    return [below,above]

  else:
    // idx now has type null
    return [str] // no occurrence

Remember, other languages like Ceylon also know flow-sensitive typing, and even Java does to a certain extent, because Java has union types, too!

try {
    ...
}
catch (SQLException | IOException e) {
    if (e instanceof SQLException)
        doSomething((SQLException) e);
    else
        doSomethingElse((IOException) e);
}

Granted, Java’s flow-sensitive typing is explicit and verbose. We could expect the Java compiler to infer all the types. The following should type-check and compile just as well:

try {
    ...
}
catch (SQLException | IOException e) {
    if (e instanceof SQLException)
        // e is guaranteed to be of type SQLException
        doSomething(e);
    else
        // e is guaranteed to be of type IOException
        doSomethingElse(e);
}

Flow typing or flow sensitive typing means that the compiler can infer the only possible type from the control flow of the surrounding program. It is a relatively new concept in modern languages like Ceylon, and it makes static typing extremely powerful, especially if the language also supports sophisticated type inference via var or val keywords!

JavaScript static typing with Flow

Let’s get back to David’s Tweet and have a look at what the article said about Flow:

http://sitr.us/2014/11/21/flow-is-the-javascript-type-checker-i-have-been-waiting-for.html

The presence of a use of length with a null argument informs Flow that there should be a null check in that function. This version does type-check:

function length(x) {
  if (x) {
    return x.length;
  } else {
    return 0;
  }
}

var total = length('Hello') + length(null);

Flow is able to infer that x cannot be null inside the if body.

That’s quite cunning. A similar upcoming feature can be observed in Microsoft’s TypeScript. But Flow is different (or claims to be different) from TypeScript. The essence of Facebook Flow can be seen in this paragraph from the official Flow announcement:

Flow’s type checking is opt-in — you do not need to type check all your code at once. However, underlying the design of Flow is the assumption that most JavaScript code is implicitly statically typed; even though types may not appear anywhere in the code, they are in the developer’s mind as a way to reason about the correctness of the code. Flow infers those types automatically wherever possible, which means that it can find type errors without needing any changes to the code at all. On the other hand, some JavaScript code, especially frameworks, make heavy use of reflection that is often hard to reason about statically. For such inherently dynamic code, type checking would be too imprecise, so Flow provides a simple way to explicitly trust such code and move on. This design is validated by our huge JavaScript codebase at Facebook: Most of our code falls in the implicitly statically typed category, where developers can check their code for type errors without having to explicitly annotate that code with types.

Let this sink in

most JavaScript code is implicitly statically typed

again

JavaScript code is implicitly statically typed

Yes!

Programmers love type systems. Programmers love to reason formally about their data types and put them in narrow constraints to be sure the program is correct. That’s the whole essence of static typing: To make less mistakes because of well-designed data structures.

People also love to put their data structures in well-designed forms in databases, which is why SQL is so popular and “schema-less” databases will not gain more market share. Because in fact, it’s the same story. You still have a schema in a “schema-less” database, it’s just not type checked and thus leaves you all the burden of guaranteeing correctness.

On a side note: Obviously, some NoSQL vendors keep writing these ridiculous blog posts to desperately position their products, claiming that you really don’t need any schema at all, but it’s easy to see through that marketing gag. True need for schemalessness is as rare as true need for dynamic typing. In other words, when is the last time you’ve written a Java program and called every method via reflection? Exactly…

But there’s one thing that statically typed languages didn’t have in the past and that dynamically typed languages did have: Means to circumvent verbosity. Because while programmers love type systems and type checking, programmers do not love typing (as in typing on the keyboard).

Verbosity is the killer. Not static typing

Consider the evolution of Java:

Java 4

List list = new ArrayList();
list.add("abc");
list.add("xyz");

// Eek. Why do I even need this Iterator?
Iterator iterator = list.iterator();
while (iterator.hasNext()) {
    // Gee, I *know* I only have strings. Why cast?
    String value = (String) iterator.next();

    // [...]
}

Java 5

// Agh, I have to declare the generic type twice!
List<String> list = new ArrayList<String>();
list.add("abc");
list.add("xyz");

// Much better, but I have to write String again?
for (String value : list) {
    // [...]
}

Java 7

// Better, but I still need to write down two
// times the "same" List type
List<String> list = new ArrayList<>();
list.add("abc");
list.add("xyz");

for (String value : list) {
    // [...]
}

Java 8

// We're now getting there, slowly
Stream.of("abc", "xyz").forEach(value -> {
    // [...]
});

On a side-note, yes, you could’ve used Arrays.asList() all along.

Java 8 is still far from perfect, but things are getting better and better. The fact that I finally do not have to declare a type anymore in a lambda argument list because it can be inferred by the compiler is something really important for productivity and adoption.

Consider the equivalent of a lambda pre-Java 8 (if we had Streams before):

// Yes, it's a Consumer, fine. And yes it takes Strings
Stream.of("abc", "xyz").forEach(new Consumer<String>(){
    // And yes, the method is called accept (who cares)
    // And yes, it takes Strings (I already say so!?)
    @Override
    public void accept(String value) {
        // [...]
    }
});

Now, if we’re comparing the Java 8 version with a JavaScript version:

["abc", "xyz"].forEach(function(value) {
    // [...]
});

We have almost reached as little verbosity as the functional, dynamically typed language that is JavaScript (I really wouldn’t mind those missing list and map literals in Java), with the only difference that we (and the compiler) know that value is of type String. And we know that the forEach() method exists. And we know that forEach() takes a function with one argument.

In the end of the day, things seem to boil down to this:

Dynamically typed languages like JavaScript and PHP have become popular mainly because they “just ran”. You didn’t have to learn all the “heavy” syntax that classic statically typed languages required (just think of Ada and PL/SQL!). You could just start writing your program. Programmers “knew” that the variables would contain strings, there’s no need to write it down. And that’s true, there’s no need to write everything down!

Consider Scala (or C#, Ceylon, pretty much any modern language):

val value = "abc"

What else can it be, other than a String?

val list = List("abc", "xyz")

What else can it be, other than a List[String]?

Note that you can still explicitly type your variables if you must – there are always those edge cases:

val list : List[String] = List[String]("abc", "xyz")

But most of the syntax is “opt-in” and can be inferred by the compiler.

Dynamically typed languages are dead

The conclusion of all this is that once syntactic verbosity and friction is removed from statically typed languages, there is absolutely no advantage in using a dynamically typed language. Compilers are very fast, deployment can be fast too, if you use the right tools, and the benefit of static type checking is huge. (don’t believe it? read this article)

As an example, SQL is also a statically typed language where much of the friction is still created by syntax. Yet, many people believe that it is a dynamically typed language, because they access SQL through JDBC, i.e. through type-less concatenated Strings of SQL statements. If you were writing PL/SQL, Transact-SQL, or embedded SQL in Java with jOOQ, you wouldn’t think of SQL this way and you’d immediately appreciate the fact that your PL/SQL, Transact-SQL, or your Java compiler would type-check all of your SQL statements.

So, let’s abandon this mess that we’ve created because we’re too lazy to type all the types (pun). Happy typing!

And if you’re reading this, Java language expert group members, please do add var and val, as well as flow-sensitive typing to the Java language. We’ll love you forever for this, promised!

Don’t be “Clever”: The Double Curly Braces Anti Pattern


From time to time, I find someone using the double curly braces anti pattern (also called double brace initialisation) in the wild. This time on Stack Overflow:

Map source = new HashMap(){{
    put("firstName", "John");
    put("lastName", "Smith");
    put("organizations", new HashMap(){{
        put("0", new HashMap(){{
            put("id", "1234");
        }});
        put("abc", new HashMap(){{
            put("id", "5678");
        }});
    }});
}};

In case you do not understand the syntax, it’s actually easy. There are two elements:

  1. We’re creating anonymous classes that extend HashMap by writing
    new HashMap() {
    }
    
  2. In that anonymous class, we’re using an instance initialiser to initialise the new anonymous HashMap subtype instance by writing things like:

    {
        put("id", "1234");
    }
    

    Essentially, these initialisers are just constructor code.

So, why is this called the Double Curly Braces Anti Pattern

There are really three reasons for this to be an anti pattern:

1. Readability

This is the least important reason, it’s readability. While it may be a bit easier to write, and feel a bit more like the equivalent data structure initialisation in JSON:

{
  "firstName"     : "John"
, "lastName"      : "Smith"
, "organizations" : 
  {
    "0"   : { "id", "1234" }
  , "abc" : { "id", "5678" }
  }
}

And yes. It would be really awesome if Java had collection literals for List and Map types. Using double curly braces to emulate that is quirky and doesn’t feel quite right, syntactically.

But let’s leave the area where we discuss taste and curly braces (we’ve done that before), because:

2. One type per instance

We’re really creating one type per double brace initialisation! Every time we create a new map this way, we’re also implicitly creating a new non-reusable class just for that one simple instance of a HashMap. If you’re doing this once, that might be fine. If you put this sort of code all over a huge application, you will put some unnecessary burden on your ClassLoader, which keeps references to all these class objects on your heap. Don’t believe it? Compile the above code and check out the compiler output. It will look like this:

Test$1$1$1.class
Test$1$1$2.class
Test$1$1.class
Test$1.class
Test.class

Where the Test.class is the only reasonable class here, the enclosing class.

But that’s still not the most important issue.

3. Memory leak!

The really most important issue is the problem that all anonymous classes have. They contain a reference to their enclosing instance, and that is really a killer. Let’s imagine, you put your clever HashMap initialisation into an EJB or whatever really heavy object with a well-managed lifecycle like this:

public class ReallyHeavyObject {

    // Just to illustrate...
    private int[] tonsOfValues;
    private Resource[] tonsOfResources;

    // This method almost does nothing
    public void quickHarmlessMethod() {
        Map source = new HashMap(){{
            put("firstName", "John");
            put("lastName", "Smith");
            put("organizations", new HashMap(){{
                put("0", new HashMap(){{
                    put("id", "1234");
                }});
                put("abc", new HashMap(){{
                    put("id", "5678");
                }});
            }});
        }};
        
        // Some more code here
    }
}

So this ReallyHeavyObject has tons of resources that need to be cleaned up correctly as soon as they’re garbage collected, or whatever. But that doesn’t matter for you when you’re calling the quickHarmlessMethod(), which executes in no time.

Fine.

Let’s imagine some other developer, who refactors that method to return your map, or even parts of your map:

    public Map quickHarmlessMethod() {
        Map source = new HashMap(){{
            put("firstName", "John");
            put("lastName", "Smith");
            put("organizations", new HashMap(){{
                put("0", new HashMap(){{
                    put("id", "1234");
                }});
                put("abc", new HashMap(){{
                    put("id", "5678");
                }});
            }});
        }};
        
        return source;
    }

Now you’re in big big trouble! You have now inadvertently exposed all the state from ReallyHeavyObject to the outside, because each of those inner classes holds a reference to the enclosing instance, which is the ReallyHeavyObject instance. Don’t believe it? Let’s run this program:

public static void main(String[] args) throws Exception {
    Map map = new ReallyHeavyObject().quickHarmlessMethod();
    Field field = map.getClass().getDeclaredField("this$0");
    field.setAccessible(true);
    System.out.println(field.get(map).getClass());
}

This program returns

class ReallyHeavyObject

Yes, indeed! If you still don’t believe it, you can use a debugger to introspect the returned map:

debug-output

You will see the enclosing instance reference right there in your anonymous HashMap subtype. And all the nested anonymous HashMap subtypes also hold such a reference.

So, please, never use this anti pattern

You might say that one way to circumvent all that hassle from issue 3 is to make the quickHarmlessMethod() a static method to prevent that enclosing instance, and you’re right about that.

But the worst thing that we’ve seen in the above code is the fact that even if you know what you are doing with your map that you might be creating in a static context, the next developer might not notice that and refactor / remove static again. They might store the Map in some other singleton instance and there is literally no way to tell from the code itself that there might just be a dangling, useless reference to ReallyHeavyObject.

Inner classes are a beast. They have caused a lot of trouble and cognitive dissonance in the past. Anonymous inner classes can be even worse, because readers of such code might really be completely oblivious of the fact that they’re enclosing an outer instance and that they’re passing around this enclosed outer instance.

The conclusion is:

Don’t be clever, don’t ever use double brace initialisation

Did you like this article?

We have more articles about best practices in Java:

Do You Really Understand SQL’s GROUP BY and HAVING clauses?


There are some things in SQL that we simply take for granted without thinking about them properly.

One of these things are the GROUP BY and the less popular HAVING clauses. Let’s look at a simple example. For this example, we’ll reiterate the example database we’ve seen in this previous article about the awesome LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE() functions:

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL,
  govt_debt DECIMAL(10, 2) NOT NULL
);

Before there were window functions, aggregations were made only with GROUP BY. A typical question that we could ask our database using SQL is:

What are the top 3 average government debts in percent of the GDP for those countries whose GDP per capita was over 40’000 dollars in every year in the last four years

Whew. Some (academic) business requirements.

In SQL (PostgreSQL dialect), we would write:

select code, avg(govt_debt)
from countries
where year > 2010
group by code
having min(gdp_per_capita) >= 40000
order by 2 desc
limit 3

Or, with inline comments

-- The average government debt
select code, avg(govt_debt)

-- for those countries
from countries

-- in the last four years
where year > 2010

-- yepp, for the countries
group by code

-- whose GDP p.c. was over 40'000 in every year
having min(gdp_per_capita) >= 40000

-- The top 3
order by 2 desc
limit 3

The result being:

code     avg
------------
JP    193.00
US     91.95
DE     56.00

Remember the 10 easy steps to a complete understanding of SQL:

  1. FROM generates the data set
  2. WHERE reduces the generated data set
  3. GROUP BY aggregates the reduced data set
  4. HAVING reduces the aggregated data set
  5. SELECT transforms the reduced aggregated data set
  6. ORDER BY sorts the transformed data set
  7. LIMIT .. OFFSET frames the sorted data set

… where LIMIT .. OFFSET may come in very different flavours.

The empty GROUP BY clause

A very special case of GROUP BY is the explicit or implicit empty GROUP BY clause. Here’s a question that we could ask our database:

Are there any countries at all with a GDP per capita of more than 50’000 dollars?

And in SQL, we’d write:

select true answer
from countries
having max(gdp_per_capita) >= 50000

The result being

answer
------
t

You could of course have used the EXISTS clause instead (please don’t use COUNT(*) in these cases):

select exists(
  select 1 
  from countries 
  where gdp_per_capita >= 50000
);

And we would get, again:

answer
------
t

… but let’s focus on the plain HAVING clause.

Not everyone knows that HAVING can be used all by itself, or what it even means to have HAVING all by itself. Already the SQL 1992 standard allowed for the use of HAVING without GROUP BY, but it wasn’t until the introduction of GROUPING SETS in SQL:1999, when the semantics of this syntax was retroactively unambiguously defined:

7.10 <having clause>

<having clause> ::= HAVING <search condition>

Syntax Rules

1) Let HC be the <having clause>. Let TE be the <table expression> that immediately contains
HC. If TE does not immediately contain a <group by clause>, then GROUP BY ( ) is implicit.

That’s interesting. There is an implicit GROUP BY ( ), if we leave out the explicit GROUP BY clause. If you’re willing to delve into the SQL standard a bit more, you’ll find:

<group by clause> ::=
    GROUP BY <grouping specification>

<grouping specification> ::=
    <grouping column reference>
  | <rollup list>
  | <cube list>
  | <grouping sets list>
  | <grand total>
  | <concatenated grouping>

<grouping set> ::=
    <ordinary grouping set>
  | <rollup list>
  | <cube list>
  | <grand total>

<grand total> ::= <left paren> <right paren>

So, GROUP BY ( ) is essentially grouping by a “grand total”, which is what’s intuitively happening, if we just look for the highest ever GDP per capita:

select max(gdp_per_capita)
from countries;

Which yields:

     max
--------
52409.00

The above query is also implicitly the same as this one (which isn’t supported by PostgreSQL):

select max(gdp_per_capita)
from countries
group by ();

The awesome GROUPING SETs

In this section of the article, we’ll be leaving PostgreSQL land, entering SQL Server land, as PostgreSQL shamefully doesn’t implement any of the following (yet).

Now, we cannot understand the grand total (empty GROUP BY ( ) clause), without having a short look at the SQL:1999 standard GROUPING SETS. Some of you may have heard of CUBE() or ROLLUP() grouping functions, which are just syntactic sugar for commonly used GROUPING SETS. Let’s try to answer this question in a single query:

What are the highest GDP per capita values per year OR per country

In SQL, we’ll write:

select code, year, max(gdp_per_capita)
from countries
group by grouping sets ((code), (year))

Which yields two concatenated sets of records:

code    year    max
------------------------
NULL    2009    46999.00 <- grouped by year
NULL    2010    48358.00
NULL    2011    51791.00
NULL    2012    52409.00

CA      NULL    52409.00 <- grouped by code
DE      NULL    44355.00
FR      NULL    42578.00
GB      NULL    38927.00
IT      NULL    36988.00
JP      NULL    46548.00
RU      NULL    14091.00
US      NULL    51755.00

That’s kind of nice, isn’t it? It’s essentially just the same thing as this query with UNION ALL

select code, null, max(gdp_per_capita)
from countries
group by code
union all
select null, year, max(gdp_per_capita)
from countries
group by year;

In fact, it’s exactly the same thing, as the latter explicitly concatenates two sets of grouped records… i.e. two GROUPING SETS. This SQL Server documentation page also explains it very nicely.

And the most powerful of them all: CUBE()

Now, imagine, you’d like to add the “grand total”, and also the highest value per country AND year, producing four different concatenated sets. To limit the results, we’ll also filter out GDPs of less than 48000 for this example:

select 
  code, year, max(gdp_per_capita), 
  grouping_id(code, year) grp
from countries
where gdp_per_capita >= 48000
group by grouping sets (
  (), 
  (code), 
  (year), 
  (code, year)
)
order by grp desc;

This nice-looking query will now produce all the possible grouping combinations that we can imagine, including the grand total, in order to produce:

code    year    max         grp
---------------------------------
NULL    NULL    52409.00    3 <- grand total

NULL    2012    52409.00    2 <- group by year
NULL    2010    48358.00    2
NULL    2011    51791.00    2

CA      NULL    52409.00    1 <- group by code
US      NULL    51755.00    1

US      2010    48358.00    0 <- group by code and year
CA      2012    52409.00    0
US      2012    51755.00    0
CA      2011    51791.00    0
US      2011    49855.00    0

And because this is quite a common operation in reporting and in OLAP, we can simply write the same by using the CUBE() function:

select 
  code, year, max(gdp_per_capita), 
  grouping_id(code, year) grp
from countries
where gdp_per_capita >= 48000
group by cube(code, year)
order by grp desc;

Compatibility

While the first couple of queries also worked on PostgreSQL, the ones that are using GROUPING SETS will work only on 4 out of 17 RDBMS currently supported by jOOQ. These are:

  • DB2
  • Oracle
  • SQL Server
  • Sybase SQL Anywhere

jOOQ also fully supports the previously mentioned syntaxes. The GROUPING SETS variant can be written as such:

// Countries is an object generated by the jOOQ
// code generator for the COUNTRIES table.
Countries c = COUNTRIES;

ctx.select(
       c.CODE,
       c.YEAR,
       max(c.GDP_PER_CAPITA),
       groupingId(c.CODE, c.YEAR).as("grp"))
   .from(c)
   .where(c.GDP_PER_CAPITA.ge(new BigDecimal("48000")))
   .groupBy(groupingSets(new Field[][] {
       {},
       { c.CODE },
       { c.YEAR },
       { c.CODE, c.YEAR }
   }))
   .orderBy(fieldByName("grp").desc())
   .fetch();

… or the CUBE() version:

ctx.select(
       c.CODE,
       c.YEAR,
       max(c.GDP_PER_CAPITA),
       groupingId(c.CODE, c.YEAR).as("grp"))
   .from(c)
   .where(c.GDP_PER_CAPITA.ge(new BigDecimal("48000")))
   .groupBy(cube(c.CODE, c.YEAR))
   .orderBy(fieldByName("grp").desc())
   .fetch();

jOOQ is the best way to write SQL in Java

… and in the future, we’ll emulate GROUPING SETS by their equivalent UNION ALL queries in those databases that do not natively support GROUPING SETS.

Try it out for yourself and download your free jOOQ trial now!

How to Integrate Commercial Third-Party Artefacts into Your Maven Build


According to a recent survey by ZeroTurnaround’s RebelLabs, Maven is still the leading Java build platform. The current market share distribution, according to RebelLabs is:

  • Maven with 64%
  • Ant + Ivy with 16.5%
  • Gradle with 11%

Yet, at the same time, Maven is often criticised for being a bit obscure and intrusive. Compared to runner-ups Ant and Gradle, Maven allows for only little flexibility with respect to interpretation and thus custom adaptation of the build model. Or as Tim Berglund from Data Stax would put it:

But let’s cut the jokes and have a look at a real-world issue:

Integrating Third-Party Commercial Artefacts

Not all third party artefacts that you would like to depend upon are available for free from Maven Central. Examples for this are commercial JDBC drivers, or the commercial jOOQ editions. There are essentially three ways to integrate such artefacts into your build:

Quick-and-dirty

Often, you only need the commercial dependency for a small test project or demo. You want to be sure that it works when you run it without depending on your local repository setup, or on network connectivity. This is a good use-case for <scope>system</scope>:

For instance: jOOQ

<dependency>
  <groupId>org.jooq</groupId>
  <artifactId>jooq</artifactId>
  <version>${jooq.version}</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/jooq-${jooq.version}.jar</systemPath>
</dependency>

For instance: Microsoft SQL JDBC

<dependency>
  <groupId>com.microsoft.sqlserver</groupId>
  <artifactId>sqljdbc4</artifactId>
  <version>3.0</version>
  <scope>system</scope>
  <systemPath>${basedir}/lib/sqljdbc4.jar</systemPath>

  <!-- Notice that we can still put "optional"
       on commercial JDBC driver dependencies -->
  <optional>true</optional>
</dependency>

Advantages of this approach

This is really a very easy solution when you want to have a local, self-contained module that is guaranteed to run immediately after checkout from source control, without additional configuration and setup. Don’t forget to check in the libraries into source control first, of course.

Disadvantages of this appraoch

The system dependencies are never transitively inherited. If your module depends on jOOQ this way, your module’s dependencies won’t see the jOOQ API.

Details about system dependencies can be seen in the Maven documentation. Citing from the documentation:

Dependencies with the scope system are always available and are not looked up in repository. They are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus, system dependencies are especially useful for resolving dependencies on artifacts which are now provided by the JDK, but where available as separate downloads earlier. Typical example are the JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).

A bit more robust

An approach that might appear to be a bit more robust is to check out the dependencies from your version control system and then “manually” import them to your local repository. This will make them available to your own local build. The following shell scripts show how you can import, for instance, the jOOQ artefacts into your local repository

Windows Batch

@echo off
set VERSION=3.4.4

if exist jOOQ-javadoc\jooq-%VERSION%-javadoc.jar (
  set JAVADOC_JOOQ=-Djavadoc=jOOQ-javadoc\jooq-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_META=-Djavadoc=jOOQ-javadoc\jooq-meta-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_CODEGEN_MAVEN=-Djavadoc=jOOQ-javadoc\jooq-codegen-maven-%VERSION%-javadoc.jar
  set JAVADOC_JOOQ_SCALA=-Djavadoc=jOOQ-javadoc\jooq-scala-%VERSION%-javadoc.jar
)

if exist jOOQ-src\jooq-%VERSION%-sources.jar (
  set SOURCES_JOOQ=-Dsources=jOOQ-src\jooq-%VERSION%-sources.jar
  set SOURCES_JOOQ_META=-Dsources=jOOQ-src\jooq-meta-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN=-Dsources=jOOQ-src\jooq-codegen-%VERSION%-sources.jar
  set SOURCES_JOOQ_CODEGEN_MAVEN=-Dsources=jOOQ-src\jooq-codegen-maven-%VERSION%-sources.jar
  set SOURCES_JOOQ_SCALA=-Dsources=jOOQ-src\jooq-scala-%VERSION%-sources.jar
)

call mvn install:install-file -Dfile=jOOQ-pom\pom.xml                          -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=%VERSION% -Dpackaging=pom
call mvn install:install-file -Dfile=jOOQ-lib\jooq-%VERSION%.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ%               %SOURCES_JOOQ%              -DpomFile=jOOQ-pom\jooq\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-meta-%VERSION%.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_META%          %SOURCES_JOOQ_META%         -DpomFile=jOOQ-pom\jooq-meta\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-%VERSION%.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN%       %SOURCES_JOOQ_CODEGEN%      -DpomFile=jOOQ-pom\jooq-codegen\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-codegen-maven-%VERSION%.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_CODEGEN_MAVEN% %SOURCES_JOOQ_CODEGEN_META% -DpomFile=jOOQ-pom\jooq-codegen-maven\pom.xml
call mvn install:install-file -Dfile=jOOQ-lib\jooq-scala-%VERSION%.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=%VERSION% -Dpackaging=jar %JAVADOC_JOOQ_SCALA%         %SOURCES_JOOQ_SCALA%        -DpomFile=jOOQ-pom\jooq-scala\pom.xml

Linux Shell

#!/bin/sh
VERSION=3.4.4

if [ -f jOOQ-javadoc/jooq-$VERSION-javadoc.jar ]; then
  JAVADOC_JOOQ=-Djavadoc=jOOQ-javadoc/jooq-$VERSION-javadoc.jar
  JAVADOC_JOOQ_META=-Djavadoc=jOOQ-javadoc/jooq-meta-$VERSION-javadoc.jar
  JAVADOC_JOOQ_CODEGEN=-Djavadoc=jOOQ-javadoc/jooq-codegen-$VERSION-javadoc.jar
  JAVADOC_JOOQ_CODEGEN_MAVEN=-Djavadoc=jOOQ-javadoc/jooq-codegen-maven-$VERSION-javadoc.jar
  JAVADOC_JOOQ_SCALA=-Djavadoc=jOOQ-javadoc/jooq-scala-$VERSION-javadoc.jar
fi

if [ -f jOOQ-src/jooq-$VERSION-sources.jar ]; then
  SOURCES_JOOQ=-Dsources=jOOQ-src/jooq-$VERSION-sources.jar
  SOURCES_JOOQ_META=-Dsources=jOOQ-src/jooq-meta-$VERSION-sources.jar
  SOURCES_JOOQ_CODEGEN=-Dsources=jOOQ-src/jooq-codegen-$VERSION-sources.jar
  SOURCES_JOOQ_CODEGEN_MAVEN=-Dsources=jOOQ-src/jooq-codegen-maven-$VERSION-sources.jar
  SOURCES_JOOQ_SCALA=-Dsources=jOOQ-src/jooq-scala-$VERSION-sources.jar
fi

mvn install:install-file -Dfile=jOOQ-pom/pom.xml                         -DgroupId=org.jooq -DartifactId=jooq-parent        -Dversion=$VERSION -Dpackaging=pom
mvn install:install-file -Dfile=jOOQ-lib/jooq-$VERSION.jar               -DgroupId=org.jooq -DartifactId=jooq               -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ               $SOURCES_JOOQ              -DpomFile=jOOQ-pom/jooq/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-meta-$VERSION.jar          -DgroupId=org.jooq -DartifactId=jooq-meta          -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_META          $SOURCES_JOOQ_META         -DpomFile=jOOQ-pom/jooq-meta/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-$VERSION.jar       -DgroupId=org.jooq -DartifactId=jooq-codegen       -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN       $SOURCES_JOOQ_CODEGEN      -DpomFile=jOOQ-pom/jooq-codegen/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-codegen-maven-$VERSION.jar -DgroupId=org.jooq -DartifactId=jooq-codegen-maven -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_CODEGEN_MAVEN $SOURCES_JOOQ_CODEGEN_META -DpomFile=jOOQ-pom/jooq-codegen-maven/pom.xml
mvn install:install-file -Dfile=jOOQ-lib/jooq-scala-$VERSION.jar         -DgroupId=org.jooq -DartifactId=jooq-scala         -Dversion=$VERSION -Dpackaging=jar $JAVADOC_JOOQ_SCALA         $SOURCES_JOOQ_SCALA        -DpomFile=jOOQ-pom/jooq-scala/pom.xml

The above scripts essentially check if any of Javadoc, Sources, and/or binaries are available in the distribution, and then install:

  • The parent pom.xml
  • The various artefact binaries, sources, javadocs, and pom.xml files

Advantages of this approach

Dependencies can now be referenced like any other type of dependency, as the artefacts are registered in your local repository. Moreover, they’re also available to your module’s own dependencies, transitively – which is probably what you want when you’re using jOOQ. Here’s how you’d then specify the dependencies:

<dependency>
  <groupId>org.jooq</groupId>
  <artifactId>jooq</artifactId>
  <version>${jooq.version}</version>
</dependency>

<dependency>
  <groupId>com.microsoft.sqlserver</groupId>
  <artifactId>sqljdbc4</artifactId>
  <version>3.0</version>
  <scope>provided</scope>
</dependency>

Disadvantages of this approach

There is a manual step involved in the installation of the dependencies. If you don’t have the above scripts readily available, it can be quite tedious to figure out exactly how to import all those dependencies step by step into your repository. Specifically if you’re running a demo or prototype, this may lead to unexpected compilation failure in the worst moments.

The way to go

In an actual project setup, obviously, neither of the above approaches will be sufficient, and you’ll probably import the libraries into your local Nexus or Bintray or whatever repository you’re using. Just beware of potential restrictions on distribution that commercial deliverables may have.

A small tutorial about how to install artefacts into Nexus can be found here.

jOOQ Newsletter: November 28, 2014 – Black Friday jOOQ Sale – Only Today!


Subscribe to this newsletter here

Tweet of the Day and Webinar with Arun Gupta from Red Hat

Today, we have a very special Tweet of the Day by Oliver Hubaut who expresses what we believe is a general feeling in the industry. He says:

There is a lot of truth in his statement, albeit perhaps not the one he intended. Many people have misinterpreted JPA in the past, believing that it will be a full replacement for SQL. This couldn’t be farther from the truth.

Gavin King, the creator of Hibernate has told us the following:

… and this is also the point we’re trying to make. Join us next week on Wednesday, December 3 when we meet with Arun Gupta from Red Hat for his Webinar about JPA and jOOQ. If you have any questions that you’d like us to talk about, ask them here:

https://github.com/javaee-samples/webinars/issues/4

Black Friday Sale: Get 20% off any jOOQ purchase, today!

We’re participating in the Black Friday sale tradition and give you an incredible 20% off your purchase of any jOOQ license that you order today, Black Friday, November 28, 2014.

Ask your manager today to treat you to a wonderful pre-christmas gift! Don’t waste time, act quickly and order jOOQ licenses right now:

http://www.jooq.org/black-friday

jOOQ 3.5 released

If you’ve been following the jOOQ User Group, you’ve heard it already. Last Friday, we’ve shipped the awesome jOOQ 3.5 with loads of new features!

The new Binding SPI

The main improvement of this exciting release is the new org.jooq.Binding SPI which can be used to fully control all aspects of a user-type’s JDBC interaction. This goes much further than the existing org.jooq.Converter SPI that can be used to map standard JDBC types to user-types. With the new Binding SPI, virtually *ALL* vendor-specific types can be supported now. Examples include PostgreSQL’s JSON or HSTORE types, or Oracle’s DATE type – which is really incorrectly represented via java.sql.Timestamp, which is why we have retrofitted the existing <dateAsTimestamp/> feature to use such bindings, now.

Stored procedures are everywhere

Stored procedure support was generally improved in this release. This includes lots of new little features and conveniences for use with PL/SQL or Transact-SQL. For instance, jOOQ 3.5.0 now supports cross-schema references of PL/SQL OBJECT and TABLE types, which allows for binding directly to Oracle Spatial. We’ve blogged about this exciting improvement here:
http://blog.jooq.org/2014/11/04/painless-access-from-java-to-plsql-procedures-with-jooq/

And while we were at it, we’ve also added basic support for Oracle AQ, which integrates very nicely with our OBJECT type support!

In Transact-SQL and MySQL, we now support fetching arbitrary numbers of Results from stored procedures, and we’ve also implemented support for Firebird PSQL, including Firebird’s very interesting syntax for table-valued functions.

By the way, we support user-defined aggregate functions for a variety of databases, including Oracle, PostgreSQL, and HSQLDB. Definitely something you should look into!

SQL improvements

In this release, we’ve finally got support for UNION, INTERSECT, and EXCEPT right with respect to nesting such operations, as well as combining them with ORDER BY and LIMIT .. OFFSET.

Let’s talk some more DDL

We’ve continued to add support for DDL statements, including

  • CREATE TABLE
  • CREATE TABLE AS SELECT
  • CREATE VIEW and DROP VIEW
  • CREATE INDEX and DROP INDEX
  • CREATE SEQUENCE and DROP SEQUENCE
  • DROP .. IF EXISTS

We’ll continue to add support for more DDL statements also in the future.

Code generation improvements

We’ve added support for the new XMLDatabase, a code generation configuration that allows to read meta information from XML formats, e.g. from a standard INFORMATION_SCHEMA.xml, or from Vertabelo’s XML export format:
http://blog.jooq.org/2014/09/05/importing-your-erd-export-into-jooq/

Future versions of jOOQ will include built-in support for a variety of XML formats.

We’ve had an awesome contribution by Etienne Studer from Gradleware to help our users integrate the jOOQ code generation with their Gradle builds.

Last but not least: Informix!

Oh, and by the way, we now also support IBM’s second most popular database: Informix. Support for this database will be included in the jOOQ Enterprise Edition.

More information can be found here:
http://www.jooq.org/notes

jOOQ 3.2 End of Life

While 3.5 is out, 3.2 is now more than one year old, which means that it has reached its end of life. We’ll be shipping a last patch update 3.2.7 in early December. After that, we advise all our customers and users to upgrade to a newer minor release.

Do you want to stay on the 3.2 release? No problem, contact our sales team and we’ll find a solution for you.

Use MySQL’s Strict Mode on all new Projects!


MySQL is a database that has been bending the SQL standard in ways that make it hard to move off MySQL. What may appear to be a clever technique for vendor lockin (or maybe just oversight of the standard) can be quite annoying in understanding the real meaning of the SQL language.

One such example is MySQL’s interpretation of how GROUP BY works. In MySQL, unlike any other database, you can put arbitrary expressions into your SELECT clause, even if they do not have a formal dependency on the GROUP BY expression. For instance:

SELECT employer, first_name, last_name
FROM employees
GROUP BY employer

This will work in MySQL, but what does it mean? If we only have one resulting record per employer, which one of the employees will be returned? The semantics of the above query is really this one:

SELECT employer, ARBITRARY(first_name), ARBITRARY(last_name)
FROM employees
GROUP BY employer

If we assume that there is such an aggregation function as ARBITRARY(). Some may claim that this can be used for some clever performance “optimisation”. I say: Don’t. This is so weakly specified, it is not even clear if the two references of this pseudo-ARBITRARY() aggregate function will produce values from the same record.

Just look at the number of Stack Overflow questions that evolve around the “not a GROUP BY expression” error:

I’m sure that parts of this damage that has been caused to a generation of SQL developers is due to the fact that this works in some databases.

ONLY_FULL_GROUP_BY

But there is a flag in MySQL called ONLY_FULL_GROUP_BY, and Morgan Tocker, the MySQL community manager suggests eventually turning it on by default.

MySQL community members tend to agree that this is a good decision in the long run.

While it is certainly very hard to turn this flag on for a legacy application, all new applications built on top of MySQL should make sure to turn on this flag. In fact, new applications should even consider turning on “strict SQL mode” entirely, to make sure they get a better, more modern SQL experience.

For more information about MySQL server modes, please consider the manual:

http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html

Follow

Get every new post delivered to your Inbox.

Join 2,190 other followers

%d bloggers like this: