jOOQ 3.10 Supports JPA AttributeConverter

One of the cooler hidden features in jOOQ is the JPADatabase, which allows for reverse engineering a pre-existing set of JPA-annotated entities to generate jOOQ code.

For instance, you could write these entities here:

@Entity
public class Actor {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    public Integer actorId;

    @Column
    public String firstName;

    @Column
    public String lastName;

    @ManyToMany(fetch = LAZY, mappedBy = "actors", 
        cascade = CascadeType.ALL)
    public Set<Film> films = new HashSet<>();

    public Actor(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

@Entity
public class Film {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    public Integer filmId;

    @Column
    public String title;

    @Column(name = "RELEASE_YEAR")
    @Convert(converter = YearConverter.class)
    public Year releaseYear;

    @ManyToMany(fetch = LAZY, cascade = CascadeType.ALL)
    public Set<Actor> actors = new HashSet<>();

    public Film(String title, Year releaseYear) {
        this.title = title;
        this.releaseYear = releaseYear;
    }
}

// Imagine also a Language entity here...

(Just a simple example. Let’s not discuss the caveats of @ManyToMany mapping).

For more info, the full example can be found on Github:

Now observe the fact that we’ve gone through all the trouble of mapping the database type INT for the RELEASE_YEAR column to the cool JSR-310 java.time.Year type for convenience. This has been done using a JPA 2.1 AttributeConverter, which simply looks like this:

public class YearConverter 
implements AttributeConverter<Year, Integer> {

    @Override
    public Integer convertToDatabaseColumn(Year attribute) {
        return attribute == null ? null : attribute.getValue();
    }

    @Override
    public Year convertToEntityAttribute(Integer dbData) {
        return dbData == null ? null : Year.of(dbData);
    }
}

Using jOOQ’s JPADatabase

Now, the JPADatabase in jOOQ allows you to simply configure the input entities (e.g. their package names) and generate jOOQ code from it. This works behind the scenes with this algorithm:

  • Spring is used to discover all the annotated entities on the classpath
  • Hibernate is used to generate an in-memory H2 database from those entities
  • jOOQ is used to reverse-engineer this H2 database again to generate jOOQ code

This works pretty well for most use-cases as the JPA annotated entities are already very vendor-agnostic and do not provide access to many vendor-specific features. We can thus perfectly easily write the following kind of query with jOOQ:

ctx.select(
        ACTOR.FIRSTNAME,
        ACTOR.LASTNAME,
        count().as("Total"),
        count().filterWhere(LANGUAGE.NAME.eq("English"))
          .as("English"),
        count().filterWhere(LANGUAGE.NAME.eq("German"))
          .as("German"),
        min(FILM.RELEASE_YEAR),
        max(FILM.RELEASE_YEAR))
   .from(ACTOR)
   .join(FILM_ACTOR)
     .on(ACTOR.ACTORID.eq(FILM_ACTOR.ACTORS_ACTORID))
   .join(FILM)
     .on(FILM.FILMID.eq(FILM_ACTOR.FILMS_FILMID))
   .join(LANGUAGE)
     .on(FILM.LANGUAGE_LANGUAGEID.eq(LANGUAGE.LANGUAGEID))
   .groupBy(
        ACTOR.ACTORID,
        ACTOR.FIRSTNAME,
        ACTOR.LASTNAME)
   .orderBy(ACTOR.FIRSTNAME, ACTOR.LASTNAME, ACTOR.ACTORID)
   .fetch()

(more info about the awesome FILTER clause here)

In this example, we’re also using the LANGUAGE table, which we omitted in the article. The output of the above query is something along the lines of:

+---------+---------+-----+-------+------+----+----+
|FIRSTNAME|LASTNAME |Total|English|German|min |max |
+---------+---------+-----+-------+------+----+----+
|Daryl    |Hannah   |    1|      1|     0|2015|2015|
|David    |Carradine|    1|      1|     0|2015|2015|
|Michael  |Angarano |    1|      0|     1|2017|2017|
|Reece    |Thompson |    1|      0|     1|2017|2017|
|Uma      |Thurman  |    2|      1|     1|2015|2017|
+---------+---------+-----+-------+------+----+----+

As we can see, this is a very suitable combination of jOOQ and JPA. JPA was used to insert the data through JPA’s useful object graph persistence capabilities, whereas jOOQ is used for reporting on the same tables.

Now, since we already wrote this nice AttributeConverter, we certainly want to apply it also to the jOOQ query and get the java.time.Year data type also in jOOQ, without any additional effort.

jOOQ 3.10 auto conversion

In jOOQ 3.10, we don’t have to do anything anymore. The existing JPA converter will automatically mapped to a jOOQ converter as the generated jOOQ code reads:

// Don't worry about this generated code
public final TableField<FilmRecord, Year> RELEASE_YEAR = 
    createField("RELEASE_YEAR", org.jooq.impl.SQLDataType.INTEGER, 
        this, "", new JPAConverter(YearConverter.class));

… which leads to the previous jOOQ query now returning a type:

Record7<String, String, Integer, Integer, Integer, Year, Year>

Luckily, this was rather easy to implement as the Hibernate meta model allows for navigating the binding between entities and tables very conveniently as described in this article here:

How to get the entity mapping to database table binding metadata from Hibernate

More similar features are coming up in jOOQ 3.11, e.g. when we look into reverse engineering JPA @Embedded types as well. See https://github.com/jOOQ/jOOQ/issues/6518

If you want to run this example, do check out our jOOQ/JPA example on GitHub:

Finding all Palindromes Contained in Strings with SQL

SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:

But whenever I tweet something like the above, the inevitable happened. I was nerd sniped. Oleg Šelajev from ZeroTurnaround challenged me to prove that SQL is so awesome:

Given a string, find all substrings from that string, which are palindromes. Challenge accepted! (For the moment, let’s forget about algorithmic complexity.)

Here’s the Full Query

Spoiler first.

Bear with me, I’ll explain it step by step afterwards. Here’s with PostgreSQL syntax:

WITH RECURSIVE
  words (word) AS (
    VALUES
      ('pneumonoultramicroscopicsilicovolcanoconiosis'),
      ('pseudopseudohypoparathyroidism'),
      ('floccinaucinihilipilification'),
      ('antidisestablishmentarianism'),
      ('supercalifragilisticexpialidocious'),
      ('incomprehensibilities'),
      ('honorificabilitudinitatibus'),
      ('tattarrattat')
  ),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  ),
  palindromes (word, palindrome, start, length) AS (
    SELECT word, substring(word, start, x), start, x
    FROM starts CROSS JOIN (VALUES(0), (1)) t(x)
    UNION ALL
    SELECT word, palindrome, start, length + 2
    FROM (
      SELECT 
        word, 
        substring(word, start - length / 2, length) AS palindrome,
        start, length
      FROM palindromes
    ) AS p
    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)
  )
SELECT DISTINCT 
  word, 
  trim(replace(word, palindrome, ' ' || upper(palindrome) || ' '))
    AS palindromes
FROM palindromes
WHERE length(palindrome) > 1
ORDER BY 2

(You can run it yourself on SQLFiddle)

The result being:

word                                          |palindromes                                    
----------------------------------------------|-----------------------------------------------
antidisestablishmentarianism                  |ant  IDI  sestablishmentarianism               
antidisestablishmentarianism                  |antidi  SES  tablishmentarianism               
floccinaucinihilipilification                 |flo  CC  inaucinihilipilification              
floccinaucinihilipilification                 |floccinauc  INI  hilipilification              
floccinaucinihilipilification                 |floccinaucin  IHI  lipilification              
floccinaucinihilipilification                 |floccinaucinih  ILI  p  ILI  fication          
floccinaucinihilipilification                 |floccinaucinih  ILIPILI  fication              
floccinaucinihilipilification                 |floccinaucinihi  LIPIL  ification              
floccinaucinihilipilification                 |floccinaucinihil  IPI  lification              
floccinaucinihilipilification                 |floccinaucinihilipil  IFI  cation              
honorificabilitudinitatibus                   |h  ONO  rificabilitudinitatibus                
honorificabilitudinitatibus                   |honor  IFI  cabilitudinitatibus                
honorificabilitudinitatibus                   |honorificab  ILI  tudinitatibus                
honorificabilitudinitatibus                   |honorificabilitud  INI  tatibus                
honorificabilitudinitatibus                   |honorificabilitudin  ITATI  bus                
honorificabilitudinitatibus                   |honorificabilitudini  TAT  ibus                
incomprehensibilities                         |incompr  EHE  nsibilities                      
incomprehensibilities                         |incomprehens  IBI  lities                      
incomprehensibilities                         |incomprehensib  ILI  ties                      
incomprehensibilities                         |incomprehensibil  ITI  es                      
pneumonoultramicroscopicsilicovolcanoconiosis |pneum  ONO  ultramicroscopicsilicovolcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopics  ILI  covolcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilic  OVO  lcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolca  NOCON  ios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcan  OCO  nios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcanoconio  SIS
pseudopseudohypoparathyroidism                |pseudopseudohy  POP  arathyroidism             
pseudopseudohypoparathyroidism                |pseudopseudohypop  ARA  thyroidism             
pseudopseudohypoparathyroidism                |pseudopseudohypoparathyro  IDI  sm             
supercalifragilisticexpialidocious            |supercalifrag  ILI  sticexpialidocious         
tattarrattat                                  |t  ATTA  rr  ATTA  t                           
tattarrattat                                  |t  ATTARRATTA  t                               
tattarrattat                                  |ta  TT  arra  TT  at                           
tattarrattat                                  |ta  TTARRATT  at                               
tattarrattat                                  |tat  TARRAT  tat                               
tattarrattat                                  |TAT  tarrat  TAT                               
tattarrattat                                  |tatt  ARRA  ttat                               
tattarrattat                                  |tatta  RR  attat                               
tattarrattat                                  |TATTARRATTAT                                    

This query uses a couple of nice features:

Common Table Expressions

They’re the only way to declare variables in SQL – i.e. you can “store” a query in such a table expression, assign a name to it, and reuse it several times. This is done with the WITH clause. The nice thing about common table expressions is that they are allowed to be RECURSIVE (depending on the database, the RECURSIVE keyword may be required / optional / not available).

VALUES() clause

The VALUES() clause is a very handy tool to create ad-hoc data in the form of tables. We did this to create a table called WORDS, which contains a couple of words inside of which we’d like to look for palindromes. In some databases (including DB2 and PostgreSQL), it’s totally possible to just use the VALUES() clause as a standalone clause instead of SELECT:

VALUES
  ('pneumonoultramicroscopicsilicovolcanoconiosis'),
  ('pseudopseudohypoparathyroidism'),
  ('floccinaucinihilipilification'),
  ('antidisestablishmentarianism'),
  ('supercalifragilisticexpialidocious'),
  ('incomprehensibilities'),
  ('honorificabilitudinitatibus'),
  ('tattarrattat')

Recursive Generation of Integers

In order to look for palindromes, the algorithm used here lists, for each word, the character index of each individual character in the word:

WITH RECURSIVE
  ...
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  ),
  ...

For example, if we used the word “word”…

WITH RECURSIVE
  words (word) AS (VALUES('word')),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  )
SELECT *
FROM starts

We’d get the following result:

word |start 
-----|------
word |1     
word |2     
word |3     
word |4     

The idea of the algorithm is that we start from a given character and “fan out” in both directions of a character, again recursively. The characters to the left and to the right of such a character must be the same and so on.

If you’re interested in the syntax, stay tuned. There will be a blog post coming up soon.

Using CROSS JOIN to run the Algorithm Twice

There are two types of palindromes:

  • Those with an even amount of characters: “”, “aa”, “abba”, “babbab”
  • Those with an odd amount of characters: “b”, “aba”, “aabaa”

In our algorithm, we’re treating them in almost the same way by running the algorithm twice. Whenever you think “hey, I need to run this twice in SQL”, CROSS JOIN could be a good candidate, as we’re creating a cartesian product between:

  • The previous “starts” table, giving the starting character index
  • A table containing values 0 (palindromes with even amounts of letters) and 1 (palindromes with odd amounts of letters)

For more information about CROSS JOIN, read our article about JOINs.

Run this query for illustration:

WITH RECURSIVE
  words (word) AS (VALUES('word')),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  )
SELECT *
FROM starts CROSS JOIN (VALUES(0),(1)) AS t(x)

Output:

word |start |x 
-----|------|--
word |1     |0  <-- Even length palindromes centering around position 1 length 0
word |1     |1  <-- Odd length palindromes centering around position 1 length 1
word |2     |0 
word |2     |1 
word |3     |0 
word |3     |1 
word |4     |0 
word |4     |1 

Trivially, a palindrome of length 0 (empty string) or length 1 (“w”, “o”, “r”, “d”) is acceptable in principle, but boring. We’ll filter them out later, but keep them in the algorithm as this simplifies the algorithm. If you want to tune the query, you could prevent generating them in the first place.

The Palindrome Algorithm

Thus far, we’ve just prepared utility data to calculate palindromes:

  • The words inside of which to search for palindromes
  • The individual character indexes to start fanning out from, in each individual word
  • The minimum palindrome length 0 (even) or 1 (odd)

Now comes the interesting part. Recursively fanning out from a starting character to find more palindromes, stopping the fanning out as soon as the new candidate is no longer a plaindrome:

WITH RECURSIVE
  ...
  palindromes (word, palindrome, start, length) AS (

    -- This part just creates the even/odd semantics
    SELECT word, substring(word, start, x), start, x
    FROM starts CROSS JOIN (VALUES(0), (1)) t(x)
    UNION ALL

    -- This part recurses by "fanning out"
    SELECT word, palindrome, start, length + 2
    FROM (
      SELECT 
        word, 
        substring(word, start - length / 2, length) AS palindrome,
        start, length
      FROM palindromes
    ) AS p
    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)
  )
  ...

It isn’t really so hard in fact. The recursion part selects recursively from the PALINDROMES table (the previously calculated palindromes). That table has 4 columns:

  • WORD: The word we’re looking for palindromes in. This is always the same per recursion
  • PALINDROME: The palindrome, i.e. the substring inside of a word. This changes per recursion
  • START: The start character index from which we started fanning out. This is always the same per recursion
  • LENGTH: The palindrome length. This increases by 2 per recursion

Let’s look at the result for “floccinaucinihilipilification”:

flo  CC  inaucinihilipilification              
floccinauc  INI  hilipilification              
floccinaucin  IHI  lipilification              
floccinaucinih  ILI  p  ILI  fication          
floccinaucinih  ILIPILI  fication              
floccinaucinihi  LIPIL  ification              
floccinaucinihil  IPI  lification              
floccinaucinihilipil  IFI  cation              

There are a total of 8 distinct palindromes contained in this word (and believe it or not, the word exists, too. Pronunciation is another thing).

Let’s “debug” the algorithm to get to this list (remember, SQL indexes are 1 based):

Start 1-4: No palindromes
Start 5: Even palindrome [4:5]
  flo  CC  inaucinihilipilification              

Start 6-11: No palindromes (I wont' repeat this further down)
Start 12: Odd palindrome [11:13]
  floccinauc  INI  hilipilification              

Start 14: Odd palindrome [13:15]
  floccinaucin  IHI  lipilification              

Start 16: Odd palindrome [15:17]
  floccinaucinih  ILI  p  ILI  fication          

Start 18: Odd palindrome [17:19], [16:20], [15:21] (Fanning out 3 times)
  floccinaucinihil  IPI  lification              
  floccinaucinihi  LIPIL  ification              
  floccinaucinih  ILIPILI  fication              

Start 20: Odd palindrome [19:17] (already found this)
  floccinaucinih  ILI  p  ILI  fication          

Start 22: Odd palindrome [21:23]
floccinaucinihilipil  IFI  cation              

The IPI, LIPIL, ILIPILI chain is the most interesting. We’ve succeeded to fan out 3 times adding new characters from WORD on both sides of the initial character.

When do we stop fanning out? Whenever one of these conditions hold true:

    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)

I.e.

  • When we’ve reached the beginning of WORD (no more characters to the left)
  • When we’ve reached the end of WORD (no more characters to the right)
  • When the letter to the left of the palindrome of the previous recursion doesn’t match the letter to the right of the palindrome

That last predicate could also simply read:

    AND palindrome = reverse(palindrome)

But that might be a bit slower as it compares something that we’ve already proven to be true in the previous recursion.

Finally, formatting the result

The final part isn’t too interesting anymore:

SELECT DISTINCT 
  word, 
  trim(replace(word, palindrome, ' ' || upper(palindrome) || ' '))
    AS palindromes
FROM palindromes
WHERE length(palindrome) > 1
ORDER BY 2

We’ll simply:

  • Select DISTINCT results only, as palindromes might appear several times in a word
  • Replace the palindrome substring by its upper case version and add some whitespace to better visualise it. That’s totally optional, of course
  • Remove the trivial palindromes of length 0 and 1

And we’re done! Again, feel free to play around with this on SQLFiddle, or, much better, provide a cool palindrome algorithm implementation in your favourite language in the comments section, or on Twitter:

More beautiful SQL in these articles here:

jOOQ Tuesdays: Oliver Gierke Talks About Spring Data

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Oliver Gierke, the Spring Data Project Lead at Pivotal with strong opinions on DDD and REST.

Hi Oliver – Since 2011, you have mostly been working for Pivotal (previously SpringSource) on Spring Data as the project lead. What is it that fascinates you most about working with data?

To be completely honest it’s not data in itself that got me into Spring Data, or even the predecessor of the JPA module which I was working on even before I joined SpringSource, but the interest in managing complexity in software. When you talk about that, there’s no way you can avoid talking about Domain-Driven Design and its building blocks like value objects, entities, aggregates and the concept of a repository which then again gets you into the realms of data access.

So we as the Spring Data team always have to trade different driving forces against each other: first, the level of abstraction and the programming model that you use in your application code and how well and easy it actually allows you to implement domain logic that solves your business problem at hand. Second, the different tradeoffs that different data stores have already made and in how we actually allow our users to leverage them and at the same time expose some commonalities in the programming model so that developers can transfer knowledge between projects that might use different stores for certain reasons easily. Spring Data is trying to bridge that gap, provide a low entry barrier in modelling aggregates and repositories but at the same time give users the tools to fall back to very efficient means of data access that require a lot of developer control for cases where that’s the top priority.

Spring Data has an impressive number of officially and unofficially supported modules that reach far beyond relational data models. What are the biggest challenges in working with so many models and technologies in a single API?

Definitely the diversity in approaches and tradeoffs by the underlying persistence technologies. Actually that’s one of the reasons, Spring Data does not try to provide a singular unifying API. It’s not even trying to do that for stores of a certain category, for example document databases. We’ve rather taken a general Spring ecosystem philosophy and implemented it in the repositories and data access space: let’s have a consistent programming model with repeatable patterns so that it’s easy to understand the purpose of the abstraction, but then let the abstraction expose store specific specialties so that we’re not abstracting away those but rather allow developers to leverage them when needed.

The general concept of an interface based repository abstraction is not the most revolutionary thing on earth, I admit. But we now look back at almost a decade of experience in designing the parts of the programming model that are indeed API and I think we’ve learned a lot from mistakes we made. Java 8 will be the baseline for the upcoming second generation of Spring Data which enriches our options in terms of APIs. Reactive programming is a hot topic at them moment, too. So there are a lot of balls to juggle in that space.

What’s your favourite module, and why?

That’s of course hard to say as it’s been awesome to see what the individual store modules have turned into over time. However, I’ve grown a bit of a special relationship for the Commons module which is the foundation for all of the store ones as it basically contains the heart and soul of Spring Data: the object mapping facilities, the repository proxy implementation etc. And it’s great to see how we often times can add some functionality there and that functionality is immediately available for repositories of all stores.

On the other end of the spectrum there’s Spring Data REST, a module that exposes RESTful resources based on your aggregate and repository definitions. I like that very much as well, as it works across all the Spring Data modules exposing a repository API and is a great showcase of what you can achieve on top of such an abstraction. Also, it has really helped us to make developers aware of a couple of often overseen aspects of REST, but I guess we’re gonna get to that in a second.

Maintaining a big and widely used API is hard, balancing tons of user requests, integrating third party functionality, maintaining backwards compatibility. What are some maintainer battle stories you’d like to share?

It certainly is. Especially with so many concurrent — sometimes contradicting — forces in play. That starts with the question of versioning the modules: what do we actually version here? User facing API. But which part of the API is that? That totally depends on how much the user is customizing behavior. Is a developer building a Spring Data module for some new data store a user, too? Of course, but a very different kind of user. We usually try to be very conservative with changes that could affect application developers but a bit more demanding when it comes to the implementors of a store module.

That’s all stuff we sort of had and have to deal with on a day to day basis. Interestingly, we’ve been the first ones in the broader Spring engineering team that have picked up the notion of a release train — we group together releases of all modules and name them after famous computer scientists —, that had been popularized by the Eclipse team. That approach worked well for us and has now been adopted by Spring Cloud, Reactor and other teams as well.

Oliver, I have to ask, why does everyone misunderstand REST?

I’m kind of surprised this question comes up in this context, but I guess I have build up some reputation on the internet to complain about people being from unspecific to — in my opinion — outright wrong about this topic :D.

I guess the fundamental problem that REST has is that some parts of it are moderately easy to understand and implement. These days everyone agrees that URIs are a cool thing and that using the right HTTP verb for a given task is a good idea. But even with the latter you’ll easily find people that don’t understand why it’s a good idea to prefer a PUT request over a POST one. Which already brings us to the second part.

Then there are parts that are harder to grasp and a bit harder to implement. The hypermedia aspect comes to mind. Unfortunately theses aspects are the ones that heavily influence whether what you build delivers on the promises that REST makes: being an architectural style that gives you e.g. scalability and evolvability. So people basically start ignoring these aspects, sometimes even outright arguing they don’t need them but then turn around and criticize REST for not delivering on its promises.

In my opinion that’s a way to common pattern observable in the wild, but I guess the only way to improve the situation here is to work on making it easier to implement those aspects and good examples of the benefits you get when you follow those advanced constraints.