jOOQ Tuesdays: Thorben Janssen Shares his Hibernate Performance Secrets


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

thorben-janssen

I’m very excited to feature today Thorben Janssen who has spent most of his professional life with Hibernate.

Thorben, with your blog and training, you are one of the few daring “annotatioficionados” as we like to call them, who risks diving deep into JPA’s more sophisticated annotations – like @SqlResultSetMapping. What is your experience with JPA’s advanced, declarative programming style?

From my point of view, the declarative style of JPA is great and a huge problem at the same time.

If you know what you’re doing, you just add an annotation, set a few properties and your JPA implementation takes care of the rest. That makes it very easy to use complex features and avoids a lot of boilerplate code.

But it can also become a huge issue, when someone is not that familiar with JPA and just copies a few annotations from stack overflow and hopes that it works.

It will work in most of the cases. JPA and Hibernate are highly optimized and handle suboptimal code and annotations quite well. At least as long as it is tested with one user on a local machine. But that changes quickly when the code gets deployed to production and several hundred or thousand users use it in parallel. These issues get then often posted on stack overflow or other forums together with a complaint about the bad performance of Hibernate…

Your training goes far beyond these rather esoteric use-cases and focuses on JPA / Hibernate performance. What are three things every ORM user should know about JPA / SQL performance?

Only three things? I could talk about a lot more things related to JPA and Hibernate performance.

The by far most important one is to remember that your ORM framework is using SQL to store your data in a relational database. That seems to be pretty obvious, but you can avoid the most common performance issues by analyzing and optimizing the executed SQL statements. One example for that is the popular n+1 select issue which you can easily find and fix as I show in my free, 3-part video course.

Another important thing is that no framework or specification provides a good solution for every problem. JPA and Hibernate make it very easy to insert and update data into a relational database. And they provide a set of advanced features for performance optimizations, like caching or the ordering of statements to improve the efficiency of JDBC batches.

But Hibernate and JPA are not a good fit for applications that have to perform a lot of very complex queries for reporting or data mining use cases. The feature set of JPQL is too limited for these use cases. You can, of course, use native queries to execute plain SQL, but you should have a look at other frameworks if you need a lot of these queries.

So, always make sure that your preferred framework is a good fit for your project.

The third thing you should keep in mind is that you should prefer lazy fetching for the relationships between your entities. This prevents Hibernate from executing additional SQL queries to initialize the relationships to other entities when it gets an entity from the database. Most use cases don’t need the related entities, and the additional queries slow down the application. And if one of your use cases uses the relationships, you can use FETCH JOIN statements or entity graphs to initialize them with the initial query.

This approach avoids the overhead of unnecessary SQL queries for most of your use cases and allows you to initialize the relationships if you need them.

These are the 3 most important things you should keep in mind, if you want to avoid performance problems with Hibernate. If you want to dive deeper into this topic, have a look at my Hibernate Performance Tuning Online Training. The next one starts on 23th July.

What made you focus your training mostly on Hibernate, rather than also on EclipseLink / OpenJPA, or just plain SQL / jOOQ? Do you have plans to extend to those topics?

To be honest, that decision was quite easy for me. I’m working with Hibernate for about 15 years now and used it in a lot of different projects with very different requirements. That gives me the experience and knowledge about the framework, which you need if you want to optimize its performance. I also tried EclipseLink but not to the same extent as Hibernate.

And I also asked my readers which JPA implementation they use, and most of them told me that they either use plain JPA or Hibernate. That made it pretty easy to focus on Hibernate.

I might integrate jOOQ into one of my future trainings. Because as I said before, Hibernate and JPA are a good solution if you want to create or update data or if your queries are not too complex. As soon as your queries get complex, you have to use native queries with plain SQL. In these cases, jOOQ can provide some nice benefits.

What’s the advantage of your online training over a more classic training format, where people meet physically – both for you and for your participants?

The good thing about a classroom training is that you can discuss your questions with other students and the instructor. But it also requires you to be in a certain place at a certain time which creates additional costs, requires you to get out of your current projects and keeps you away from home.

With the Hibernate Performance Tuning Online Training, I want to provide a similar experience to a classroom training in which you study with other students and ask your questions but without having to travel somewhere. You can watch my training videos and do the exercises from your office or home and meet with me, and other students in the forum or group coaching calls to discuss your questions.

So you get the best of both worlds without declaring any travel expenses😉

Your blog also includes a weekly digest of all things happening in the Java ecosystem called Java Weekly. What are the biggest insights into our ecosystem that you’ve gotten out of this work, yourself?

The Java ecosystem is always changing and improving, and you need to learn constantly if you want to stay up to date. One way to do that is to read good blog posts. And there are A LOT of great, small blogs out there written by very experienced Java developers who like to share their knowledge. You just have to find them. That’s probably the biggest insight I got.

I read a lot about Java and Java EE each week (that’s probably the only advantage of a 1.5-hour commute with public transportation) and present the most interesting ones every Monday in a new issue of Java Weekly.

jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Ming-Yee Iu

We have the pleasure of talking to Ming-Yee Iu in this eighth edition who will be telling us about how different people in our industry have tackled the integration of query systems into general purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.

In a recent discussion on reddit, you’ve mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet, LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may abandoned after a few years. Raymond Chen camp allows me to run 20 year old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and Typescript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30 year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks-in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have now for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features then? Perhaps, query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems are evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ has seen the biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk:

jOOQ Tuesdays: Glenn Paulley Gives Insight into SQL’s History


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

glenn-paulley

I’m very excited to feature today Glenn Paulley who has been working with and on SQL for several decades.

Glenn, you have been a part of the database ecosystem since the very early days, having been Director of Engineering at Sybase and representing SAP with the SQL Standard committee. What is it that fascinates you so much about databases?

Data management technology has been my favourite subject within Computer Science since I was an undergraduate student at the University of Manitoba, and so it has remained throughout much of my career. I was privileged to be part of the team that implemented the first online IBM DB2 application – using IMS/TM as the transaction monitor – in Canada at Great-West Life Assurance in the late 1980’s, and we hosted some of the DB2 development team from IBM’s San Jose lab – including Don Haderle – to celebrate that achievement. So yes, I guess you could say that I’ve been around for a while.

Much of my personal expertise lies in the realm of query processing and optimization, having had Paul Larson and Frank Wm. Tompa as my Ph.D. supervisors at the University of Waterloo. But I am interested in many related database subjects: query languages certainly, multidatabase systems, and information retrieval. Two closely-related topics within database systems are of particular interest to me.  One is scale. Companies and organizations have a lot of data; billion-row tables in a relational database are fairly routine today in many companies. Advances in other technologies, such as the Internet of Things, are going to dramatically increase the data management requirements for firms wanting to take advantage of these technologies. Implementing solutions to achieve the scale required is difficult, and is of great interest to me. The related issue, quite naturally, is processing queries over such vast collections and getting the execution time down to something reasonable. So query optimization remains a favourite topic.

We’ve had a very interesting E-Mail conversation about SQL’s experiments related to Object Orientation in the late 1990’s. Today, we have ORDBMS like Oracle, Informix, PostgreSQL. Why didn’t ORDBMS succeed?

ORDBMS implementations were not as successful as their developers expected, to be sure, though I would note that many SQL implementations now contain “object” features even though objects are specifically omitted from the ISO SQL standard. Oracle 12c is a good example – Oracle’s PL/SQL object implementation supports a good selection of object-oriented programming features, such as polymorphism and single inheritance, that when coupled with collection types provide a very rich data model that can handle very complex applications. There are others, too, of course: InterSystems’ Caché product, for example, is still available.

So, while object support in relational systems is present, in many instances, to me the significant issues are (1) the performance of object constructions on larger database instances and (2) the question of where do you want objects to exist in the application stack: in the server proper, or in another tier, implemented in a true object-oriented language? The latter issue is the premise behind object-relational mapping tools, though I think that their usage often causes as many problems as they solve.

How do you see the future of the SQL language – e.g. with respect to Big Data or alternative models like document stores (which have N1QL) or graph stores (which have Open Cypher)?

My personal view is that SQL will continue to evolve; having an independent query language that permits one to query or manipulate a database but avoid writing a “program” in the traditional sense is a good idea. Over time that language might evolve to something different from today’s SQL, but it is likely that we will still call it S-Q-L. I do not expect a revolutionary approach to be successful; there is simply far too much invested in current applications and infrastructure. Don’t get me wrong – I’m not saying SQL doesn’t need improvements. It does.

Unlike the work of e.g. the JCP or w3c, which are public and open, SQL still seems to be a more closed environment. How do you see the interaction between SQL and the end user? Can “ordinary” folks participate in the future of SQL?

The SQL standard is published by the International Standards Organization (ISO), whose member countries contribute to changes to the standard and vote on them on a regular basis. Member countries that contribute to the ISO SQL standard include (at least) the US, Canada, Germany, the United Kingdom, Japan, and Korea. Participation in the standards process requires individuals or companies to belong to these “national bodies” in order to view drafts of the standard and vote on proposed changes. Usually those meetings are held in-person – at least in Canada and the United States – so there is a real cost to participation in the process.

While having SQL as an international ISO standard is, I think, worthwhile, the ISO’s business model is based on activities such as collecting revenue from the sale of standards documents. Unless governments or other benefactors sponsor the development of those standards, then it is difficult to see how the standard can be made more freely available. That issue is a political one, rather than a technical one.

After a brief stop at Conestoga College, you’re heading back to SAP. What will you be working on in the next few years?

I will be continuing to focus on database technology now that I’ve returned to SAP. So far I’ve had a fantastic experience being back at the SAP Waterloo lab, and I have every expectation that will continue into the future.

You’ve spent a lot of time at SAP (and before SAP) with Sybase. How do the different Sybase distributions relate to SAP’s new flagship database HANA?

The various SAP database systems (SQL Anywhere, IQ, ASE) all contain database technology pertinent to the HANA platform.

Last question: What do SQL and Curling have in common?:-)

As a Level 3 curling coach, one of my tasks as a High Performance Coach for the Ontario Curling Council is to collect performance data on the athletes in our various programs. Naturally I use a database system to store that data, and perform various analyses on it, so the two are not as unrelated as you might think!

 

jOOQ Tuesdays: Rafael Winterhalter is Wrestling Byte Code with Byte Buddy


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

Rafael Winterhalter

We have the pleasure of talking to Rafael Winterhalter in this seventh edition who will be telling us about the depths of Java byte code, and about his library Byte Buddy, which makes working with byte code extremely easy.

Note that Byte Buddy won the 2015 Duke’s Choice award – congratulations to this from our side!

Hi Rafael – You’re the author of the popular Byte Buddy library. What does Byte Buddy do?

Byte Buddy is a code generation and manipulation library. It offers APIs for creating new Java classes at runtime and for changing existing classes before or after they were loaded.

At first glance, this might sound like a very esoteric thing to do, but runtime code generation is used in a large number of Java projects. Code generation tools are often used by library developers to implement aspect-oriented programming. For example, the mocking library Mockito adopted Byte Buddy to create subclasses of mocked classes at runtime. In order to implement a mock, Mockito overrides all methods of a class such that the user’s original code is not invoked when a method is called in a test. And there are plenty of other well-known users of code generation. Spring, for example, uses code generation to implement its annotation aspects such as security or transactions. And Hibernate uses code-generation to lazily load properties from getter methods by overriding those getters to query the database only if they are invoked.

Why is there a need for Byte Buddy when there are alternatives like ASM, CGLIB, AspectJ or Javassist?

Before I started working on Byte Buddy, I was involved in several other open-source projects as a contributor. As mentioned before, code generation is a typical requirement for implementing many libraries and so I got used to working with mostly CGLIB and Javassist. However, I became constantly frustrated with those libraries’ limitations and I wanted to resolve the problems I had discovered. Eventually, I started to write an alternative library that I later published as Byte Buddy.

To understand the limitations of alternative libraries, mocks are a good example use case. Mocks in Mockito were previously created using CGLIB. CGLIB is a rather mature library. It has been around for over 15 years and when it was originally developed, the library’s developers did of course not anticipate features such as annotations, generic types or defender methods. Annotations did however become an important part of many APIs which would not accept a mock instance because any annotations of overridden methods were lost. In Java, annotations on methods are never inherited when they are overridden. And annotations on types are only inherited if they are explicitly declared to be. To overcome this, Byte Buddy allows to copy any annotation to a subclass what is now a feature in Mockito 2.

In contrast, Javassist allows to copy annotations, but I do not personally like the approach of the library. In Javassist, all generated code is represented as Java code contained in strings. As a result, Javassist code evolves similarly unstructured to Java code that only describes SQL as concatenated strings. Besides creating code that is difficult to maintain, this approach also offers vulnerabilities such as Java code injection similar to SQL injection. It is sometimes possible to attack Javassist code by letting it compile arbitrary code what can cause sever damage to an application.

AspectJ is a powerful tool when manipulating existing code. However, Byte Buddy lets you do anything that AspectJ is capable of but in plain and simple Java. This way, developers do not need to learn a new syntax or programming metaphor or install tools for their build-process and IDEs. Furthermore, I do not find the join-point and point-cut terminology intuitive and decided to avoid it altogether. Instead, I decided to mimic terminology that developers already know from the Java programming language to ease the first steps with Byte Buddy.

ASM on the other hand is the basis on top of which Byte Buddy is implemented. ASM is a byte code parser rather than a code generation library. ASM processes single class files  and does not consider type hierarchies. ASM does neither have a concept of class loading and does not include higher-level concepts on top of byte code instructions. Byte Buddy offers however an adapter that exposes the ASM API to users that require the generation of very specific code.

How does one become so involved with low-level Java?

In the beginning, I set myself a goal of only creating a version of CGLIB with annotation support which was what I originally needed. But I quickly found out that a lot of developers were looking for the solution that Byte Buddy has become today. Therefore, I started to plan to make the full feature set of the Java virtual machine accessible. For doing so, learning all the gory details and corner cases of the class file format has become a necessity to implement these features. To be fair, the class file format is fairly trivial once you get the hang of it and I really enjoy to see my library mature.

Between Java byte code (2GL language) and SQL (4GL language), there are many levels of programmatic abstraction. Where do you feel at home the most?

I would want to use the right tool for the right job. Obviously, I enjoy working with byte code, but I would avoid handcrafting byte code when working in a production project. In the end, this is what higher-level abstractions such as Byte Buddy are made for.

Looking at the common use cases, Byte Buddy is however often used for implementing custom features by changing code based on annotations on methods. In a way, Byte Buddy enables developers to implement their own 4G abstraction. Declarative programming is a great abstraction for certain tasks, SQL being one of them.

You’ve become a famous speaker and domain expert in a very short time. What’s your most exciting story, being an influencer?

Mainly, I find it exciting to meet users of my library. I have met folks that implemented internal frameworks with large teams that is based on my software and obviously, it makes me proud that Byte Buddy proves to be that useful.

Thank you very much Rafael

If you want to learn more about Rafael’s work, about byte code or about Byte Buddy, check out his talk at JavaZone:

jOOQ Tuesdays: Markus Winand is on a Modern SQL Mission


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. markuswinandThis includes people who work with SQL, Java, Open Source, and a variety of other related topics.

We are excited to talk with Markus Winand in this sixth edition. Markus is the author of the popular book SQL Performance Explained and the even more popular website Use The Index, Luke, and we’re thrilled to see that he’s pulling off another stunt:

Hi Markus – You have recently launched modern-sql.com. What is your goal with this website?

My goal for modern-sql.com is to create a textbook and reference about the SQL goodies you didn’t learn in school or university. Interestingly, online manuals about these features are pretty sparse. They come in two fashions: blog posts and vendor documentation. Blog posts are usually one-off events covering a particular feature or use case. There are many great blogs out there – the jOOQ blog being one of them – but there is no one I could recommend to learn all about recent SQL features. Vendor documentation, on the other hand, is mostly a reference about syntax—quite often even a bad one: they often don’t mention standard compliance at all and tend to follow a “proprietary features first” approach.

The consequence is that SQL market is very fragmented: besides SQL-92, there is no obvious base that is common to all databases. This becomes particularly evident on the job market: job offers either require just SQL—meaning good old relational SQL—or they require experience with a specific product. That’s pretty much the norm nowadays and nobody questions it. However, how would you think about this job opening: “Google Chrome Web Developer.” Web developers can’t choose the client’s browser. Many tried, but failed. Remember “optimized for XYZ”? That’s why web developers demanded standard conformance from browsers over the past decades. Just having launched a new website I can say that CSS conformance has improved drastically over the past five years. Ultimately, I’d like the same thing to happen for SQL. I hope that modern-sql.com sparks interest in standard conforming SQL so that developers also start to demand standard conformance from the database vendors. Quite an ambitious goal.

Last year, you’ve gone into battle against the SQL OFFSET clause. Want to shed some light on the background of that campaign?

The most striking problem with OFFSET is that it is generally used for an invalid use case: pagination. In this case, OFFSET is used to skip over a number of rows in the intention to find the rows following the previously selected ones. However, OFFSET does per definition not return the rows following one that was selected earlier, but just discards the first N rows of the result. Coincidentally, OFFSET yields the expected result if the data has not changed in the meanwhile—a case that is pretty common during development. But as soon as rows are added or deleted, discarding a fixed number of rows just doesn’t give the right result. The correct approach is to remember the last row fetched and use this data in a WHERE clause to select the rows following. This approach is explained in detail at http://use-the-index-luke.com/no-offset.

Besides the fact that OFFSET cannot be used to implement correct pagination, OFFSET is also bad for performance. OFFSET is wrong and slow. What else do you need? As a matter of fact, the only valid use case I know for OFFSET it to implement SLEEP in SQL—not that I ever need that. Unfortunately, OFFSET made it into the SQL standard in 2011. I consider this the worst mistake in recent history of SQL because it can’t be corrected. The only good part is that it is an optional feature—vendors don’t need to implement for standard conformance. Nevertheless, Oracle and Microsoft just recently added OFFSET to their SQL databases.

You’ve written a very popular book on SQL Performance called SQL Performance explained. How does it compare to other SQL books and why should our readers buy it?

I’ll start with the second question. First of all you must know that the full content of SQL Performance Explained is available for free at http://use-the-index-luke.com/. Most people I’m asking why they bought the book did so because the like the web site. They bought the book either to support my work on Use The Index, Luke (greatly appreciated!) or, more importantly, to finally read the book from cover to cover. A typical answer I get goes along these lines: “I knew Use The Index, Luke for years and have read many articles there, but I finally wanted to read everything from the beginning to end.”

Now coming to the first question why the world needed another SQL performance tome: it didn’t. Therefore, I wrote a very small book that can be read in less than a day. I focused on the basic concepts, which are the same in most databases, and boldly skipped less common special cases. Its shortness is also most appreciated in the reviews. On the other hand, the book has occasionally being criticized as being incomplete—probably because the sub-title reads “Everything developers need to know about SQL performance”. Personally, I think these critiques somehow proof my point: Obviously, Java, PHP or .NET developers don’t need to know as much about SQL performance as database performance consultants. When writing for such an audience, you must skip a lot.

Where do you see SQL in 10 years from today?

I hope that the temporal features of SQL:2011 (see here) become commonly available—also in free open source databases. At the moment, they are only available in commercial databases—even there the completeness and standard conformance varies. I would also hope that the SQL standard finds a way to cope with the current trend that every database vendor adds its own proprietary set of JSON functions. Unfortunately, it might be too late for that already.

However, my greatest hope is that developers realize that SQL is not stuck in 1992. The standard has added many useful features since than. Most databases offer a good part of these features. It’s really just our perception of SQL that got stuck in 1992.

Learn more about Markus’s work

… Markus is giving his Modern SQL talk at conferences. Learn more about it here:

jOOQ Tuesdays: Thomas Müller Unveils How HSQLDB Evolved into the Popular H2 Database


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

tom_grayscale

We have the pleasure of talking to Thomas Müller in this fifth edition who will be telling us about the exciting history of Java’s most popular embedded database H2

Hi Thomas – Your H2 database is virtually everywhere. It has become many Java developers’ favourite integration test database. How did it grow so popular?

I guess because it’s easy to use, relatively fast, and, up to some point, compatible with popular databases.

I understand that you have a common, personal history with HSQLDB, previously also known as Hypersonic SQL. How and why did H2 evolve from HSQLDB?

Back in 1998, I wanted to learn Java. For fun, I implemented the relatively new skip list algorithm, added a SQL interface, and published it as open source. I got feedback from people who thought it’s useful, so I continued and gave it the name Hypersonic SQL. In 2000 I got a job offer from a Silicon Valley startup, PointBase Inc. The plan was to continue with Hypersonic SQL, and keep it open source. But after I started, the company decided it’s better if I stop. This was a surprise to me. I told them I can’t prevent others from continuing. And so Fred Toussi took the code and started HSQLDB with it. At around 2005, PointBase ran out of money, I wanted to go back to HSQLDB. But I felt more radical changes were needed, and it would be better to start a new project instead, which was then H2.

Very interesting historic facts! … and when will you replace Derby / JavaDB in the JDK?:-)

Hopefully never! If H2 is integrated in the JDK, Oracle would put constraints to the future of H2. I don’t want that. I want to keep H2 independent.

You’ve been in the database industry for a while. What is your most interesting anecdote that you’d like to share?

When Oracle bought MySQL, I was very surprised I got mail from the European Union with a large questionary about the merger, how it will affect the industry and competition. I don’t know how they found my work address, it is not on the web site, and they never asked me by email. And Switzerland is not even part of the European Union. H2 is a very small fish in the “database pond”, but it seems H2 does matter.

It’s a small world or small pond, I guess!

You’re one of the few developers I know who is working both on a SQL database (H2) and on a NoSQL database (JackRabbit, Adobe CRX). Tell us a little bit about how those databases compare.

They are quite different. H2 is a relational database with the traditional SQL and JDBC API, and Jackrabbit is a mix between a file system and a database, with a very different API, and a hierarchical data model. The query language is different as well: for Jackrabbit, XPath is more commonly used, even thought SQL is available as well. Both the relational and the hierarchical models have advantages and disadvantages. The hierarchical model more easily supports semi-structured, JSON style data. The relational model, on the other hand, is more “mature”, and there is more competition.

At Adobe / JackRabbit, you’re heavily involved with implementing storage algorithms in Java. Is the JVM even good at implementing low-level storage stuff?

Yes! The real advantage of Java is that a programmer can concentrate at the algorithms, and doesn’t have to spend so much time with memory management and low level stuff. That way, there is more time to improve the algorithms. And in relational databases, the most important aspect is using the best possible algorithms, for example to reduce I/O. Even for very low level CPU intensive stuff like data compression and encryption, things like concurrency and cache efficiency nowadays are more important than whether to use Java or C.

That’s an interesting thought along the lines of avoiding premature optimisation!

One last question: What problems are you working on right now?

Optimizing the database for solid state disks (SSDs) and new file system. Almost all relational databases still use algorithms optimized for rotating disks, where overwriting small blocks (for example 4 KB) was the way to go. With SSDs and Btrfs this doesn’t work well. H2 version 1.4.x (beta) already uses a new storage subsystem (MVStore) that should work well, however there is still some work needed before it is ready for production.

Other than database stuff, I’m interested in various programming topics. If I implement something that might be useful, I publish it as open source within my H2 database project, in the “tools” directory, until it is used in the database or moved to another project. I wrote a archiving utility (like zip, gzip) called “ArchiveTool” that combines de-duplication with regular compression. It is fast but compresses large directories (source code, databases) very well. As part of my work on the new storage subsystem, I came across minimal perfect hash tables. I invented a new algorithm that needs less space than all known ones (“MinimalPerfectHash”). I would like to publish a paper about that. There are plenty of interesting problems to solve.

jOOQ Tuesdays: Axel Fontaine Predicts Exciting Times as our Industry is Maturing


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

afontaine

We have the pleasure of talking to Axel Fontaine in this fourth edition who will be telling us about the exciting times that are ahead as our industry is maturing.

Hi Axel – Citing your website: From “Software is what I do” to “Architecting for Continuous Delivery & Zero Downtime”. Where did your personal passion for devops come from?

Over the last 18 years in our industry I have seen a lot of projects for numerous clients in many different sectors. And it always struck me. The intersection between development and operations was almost invariably the place where the largest efficiency gains could be realized. Over time this then became the place where I naturally started to focus my attention, initially for consultancy and training, and then later also for public speaking and the product business.

Very interesting! I suspect that the Flyway download rates also confirm the increasing need for devops tooling, right?

I am glad you mentioned it. Yes, Flyway download numbers have been going through the roof. In March we had one download every 42 seconds and we are well on track to break the 1 million mark in 2015! This picture actually tells us a number of things. First of all it confirms that the need for database migrations is a very widespread itch and that Flyway’s no-nonsense pragmatic and lightweight approach to scratch it resonates with the industry. The other thing this tells us, is that our industry as a whole is maturing and continuous delivery and deployment in particular is becoming more mainstream. This increased pace of delivery comes with a pressing need for a higher level of automation. And modern lightweight tools like Flyway are a great fit for this.

Now, you are converting that passion into a product called Boxfuse. Can you tell us a little more about it?

I have always been a strong believer in lightweight and open solutions that are a pleasure to use and minimize lock-in. When it comes to deploying an application to the cloud, so far you had to choose one or the other. You could either have something that is a joy to use (PaaS), but leaving you stuck in a walled garden. Or you could have something open (IaaS) that unfortunately leaves you with a lot of operational concerns to deal with. Boxfuse aims for a fresh new middle ground: the convenience of PaaS with the freedom of IaaS. To achieve that Boxfuse does away with a bunch of cargo cult. No more general purpose operation system, no more traditional provisioning. Instead Boxfuse analyses your application and in just a few seconds, fuses it into a tiny bootable image that is about 1% percent of the size of a regular system. Boxfuse then offers you a sophisticated blue/green deployment process for zero downtime updates. All of this, out of the box and with a single command.

We’re entrepreneurs ourselves. Marketing a product is a long road. What are your own key takeaways from your experience as both a consultant and entrepreneur?

I couldn’t agree more. What marketing really is is competing for attention, and turning that attention into mind share through a series of arguments, both technical and emotional. It is an exercise that is often overlooked in our technical circles, where the code is front and center. But the best technology in the world isn’t worth a dime if you can’t convince someone else to use it. And that takes perseverance, drive, and dedication. What seems easy on the outside of often the fruit of lots of hard labor behind the scenes. Eventually you will succeed, but give it a little time. Overnight success takes years.

Where do you see the future of our industry, from your perspective?

Believe me when I say these are exciting times. While it still feels like a big adventure with great new frontiers to explore, our industry is maturing rapidly. Overly complex tools are going the way of the dodo. User expectations are rising fast. And the only way to meet them is through modern  lightweight tools and services that integrate seamlessly with our workflow and perform the undifferentiated heavy lifting for us. We need to eliminate all unnecessary complexity and so we can focus on what matters most: delighting our customers.