jOOQ Tuesdays: Nicolai Parlog Talks About Java 9

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Nicolai Parlog, author of The Java Module System.

Nicolai, your blog is an “archeological” treasure trove for everyone who wants to learn about why Java expert group decisions were made. What made you dig out all these interesting discussions on the mailing lists?

Ha, thank you, didn’t know I was sitting on a treasure.

It all started with everyone’s favorite bikeshed: Optional. After using it for a few months, I was curious to learn more about the reason behind its introduction to Java and why it was designed the way it was, so I started digging and learned a few things:

  • Pipermail, the JDK mailing list archive, is a horrible place to peruse and search.
  • Mailing list discussions are often lengthy, fragmented, and thus hard to revisit.
  • Brian Goetz was absolutely right: Everything related to Optional seems to take 300 messages.

Consequently, researching that post about Optional’s design took a week or so. But as you say, it’s interesting to peek behind the curtain and once a discussion is condensed to its most relevant positions and peppered with some context it really appeals to the wider Java community.

I actually think there’s a niche to be filled, here. Imagine there were a site that did regularly (at least once a week) what I did with a few selected topics: Follow the JDK mailing list, summarize ongoing discussions, and make them accessible to a wide audience. That would be a great service to the Java community as it would make it much easier to follow what is going on and to chime in with an informed opinion when you feel you have something to contribute. Now we just need to find someone with a lot of free time on their hands.

By the way, I think it’s awesome that the comparatively open development of the JDK makes that possible.

I had followed your blog after Java 8 came out, where you explained expert group decisions in retrospect. Now, you’re mostly covering what’s new in Java 9. What are your favourite “hidden” (i.e. non-Jigsaw) Java 9 features and why?

From the few language changes, it’s easy pickings: definitely private interface methods. I’ve been in the situation more than once that I wanted to share code between default methods but found no good place to put it without making it part of the public API. With private methods in interfaces, that’s a thing of the past.
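
As a small, hypothetical sketch of what this looks like (the Validator interface below is made up for illustration):

interface Validator<T> {

    boolean isValid(T value);

    default Validator<T> and(Validator<T> other) {
        return combine(this, other, true);
    }

    default Validator<T> or(Validator<T> other) {
        return combine(this, other, false);
    }

    // Java 9: code shared between the default methods above,
    // without it leaking into the public API
    private static <T> Validator<T> combine(Validator<T> first, Validator<T> second, boolean both) {
        return value -> both
                ? first.isValid(value) && second.isValid(value)
                : first.isValid(value) || second.isValid(value);
    }
}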

When it comes to API changes, the decision is much harder as there is more to choose from. People definitely like collection factory methods and I do, too, but I think I’ll go with the changes to Stream and Optional. I really enjoy using those Java 8 features and think it’s great that they’ve been improved in 9.
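
A few of the Java 9 additions he is referring to, in a minimal sketch (takeWhile on Stream and ifPresentOrElse on Optional; there are more, e.g. dropWhile, Stream.ofNullable and Optional.or):

import java.util.Optional;
import java.util.stream.Stream;

Stream.of(1, 2, 3, 4, 1, 2)
      .takeWhile(i -> i < 4)                 // new in Java 9: yields 1, 2, 3
      .forEach(System.out::println);

Optional.ofNullable(System.getProperty("some.key"))
        .ifPresentOrElse(                    // new in Java 9
            value -> System.out.println("got " + value),
            () -> System.out.println("nothing there"));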

A JVM feature I really like is multi-release JARs. The ability to ship a JAR that uses the newest APIs, but degrades gracefully on older JVMs, will come in very handy. Some projects, Spring for example, already do this, but without JVM support it’s not exactly pleasant.
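
For readers who haven’t seen one, the layout of such a JAR (per JEP 238) looks roughly like this; the manifest declares Multi-Release: true and version-specific class files live under META-INF/versions (class names below are placeholders):

my-library.jar
  META-INF/MANIFEST.MF                      (contains "Multi-Release: true")
  com/example/Feature.class                 (baseline version, e.g. compiled for Java 8)
  META-INF/versions/9/
    com/example/Feature.class               (Java 9 variant, used only on Java 9+ runtimes)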

Can I go on? Because there’s so much more! Just two: Unified logging makes it much easier to tease out JVM log messages without having to configure logging for different subsystems, and compact strings and indified string concatenation make working with strings faster, reduce garbage and conserve heap space (on average, 10% to 15% less memory!). Ok, that was three, but there you go.

You’re writing a book on the Java 9 module system that can already be pre-ordered on Manning. What will readers get out of your book?

All they need to become module system experts. Of course it explains all the basics (declaring, compiling, packaging, and running modular applications) and advanced features (services, implied readability, optional dependencies, etc.), but it goes far beyond that. More than how to use a feature, it also explains when and why to use it, which nuances to consider, and what good defaults are if you’re not sure which way to go.
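
To give a flavour of the basics and of the advanced features he mentions, here is a hypothetical module declaration (all module and package names are made up):

// module-info.java of a hypothetical com.example.inventory module
module com.example.inventory {
    requires java.sql;                                   // plain dependency
    requires transitive com.example.inventory.api;       // implied readability
    requires static com.example.devtools;                // optional dependency (not needed at run time)
    exports com.example.inventory.model;                 // only this package is accessible to other modules
    provides com.example.inventory.spi.StockProvider     // service declaration
        with com.example.inventory.internal.DbStockProvider;
}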

It’s also full of practical advice. I migrated two large applications to Java 9 (compiling and running on the new release, not turning them into modules) and that experience, as well as the many discussions on the mailing list, informed a big chapter on migration. If readers are interested in a preview, I condensed it into a post on the most common Java 9 migration challenges. I also show how to debug modules and the module system with various tools (JDeps for example) and logging (that’s when I started using unified logging). Last but not least, I plan to include a chapter that simply lists error messages and what to do about them.

In your opinion, what are the good parts and the bad parts about Jigsaw? Do you think Jigsaw will be adopted quickly?

The good, the bad, and the ugly, eh? My favorite feature (of all of Java 9 actually) is strong encapsulation. The ability to have types that are public only within a module is incredibly valuable! This adds another option to the private-to-public axis and once people internalize that feature we will wonder how we ever lived without it. Can you imagine giving up private? We will think the same about exported.

I hope the worst aspect of the module system will be the compatibility challenges. That’s a weird way to phrase it, but let me explain. These challenges definitely exist and they will require a non-negligible investment from the Java community as a whole to get everything working on Java 9, in the long run as modules. (As an aside: This is well invested time – much of it pays back technical debt.)

My hope is that no other aspect of the module system turns out to be worse. One thing I’m a little concerned about is the strictness of reliable configuration. I like the general principle and I’m definitely one for enforcing good practices, but just think about all those POMs that busily exclude transitive dependencies. Once all those JARs are modules, that won’t work – the module system will not let you launch without all dependencies present.

Generally speaking, the module system makes it harder to go against the maintainers’ decisions. Making internal APIs available via reflection or altering dependencies now goes against the grain of a mechanism that is built deeply into the compiler and JVM. There are of course a number of command line flags to affect the module system but they don’t cover everything. To come back to excluding dependencies, maybe --ignore-missing-modules ${modules} would be a good idea…
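
Such a flag doesn’t exist today, but to illustrate the kind of switches that do, here are a few of the real command line escape hatches (the module and package names are just examples):

java --add-modules java.xml.bind \
     --add-exports java.base/jdk.internal.misc=ALL-UNNAMED \
     --add-opens java.base/java.lang=ALL-UNNAMED \
     -jar app.jar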

Regarding adoption rate, I expect it to be slower than Java 8. But leaving those projects aside that see every new version as insurmountable and are still on Java 6, I’m sure the vast majority will migrate eventually. If not for Java 9’s features, then surely for future ones. As a friend and colleague once said: “I’ll do everything to get to value types.”

Now that Java 9 is out and “legacy”, what Java projects will you cover next in your blog and your work?

Oh boy, I’m still busy with Java 9. First I have to finish the book (November hopefully) and then I want to do a few more migrations because I actually like doing that for some weird and maybe not entirely healthy reason (the things you see…). FYI, I’m for hire, so if readers are stuck with their migration they should reach out.

Beyond that, I’m already looking forward to primitive specialization, e.g. ArrayList<int>, and value types (both from Project Valhalla) as well as the changes Project Amber will bring to Java. I’m sure I’ll start discussing those in 2018.

Another thing I’ll keep myself busy with and which I would love your readers to check out is my YouTube channel. It’s still very young and until the book’s done I won’t do a lot of videos (hope to record one next week), but I’m really thrilled about the whole endeavour!

jOOQ Tuesdays: Oliver Gierke Talks About Spring Data

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Oliver Gierke, the Spring Data Project Lead at Pivotal with strong opinions on DDD and REST.

Hi Oliver – Since 2011, you have mostly been working for Pivotal (previously SpringSource) on Spring Data as the project lead. What is it that fascinates you most about working with data?

To be completely honest it’s not data in itself that got me into Spring Data, or even the predecessor of the JPA module which I was working on even before I joined SpringSource, but the interest in managing complexity in software. When you talk about that, there’s no way you can avoid talking about Domain-Driven Design and its building blocks like value objects, entities, aggregates and the concept of a repository which then again gets you into the realms of data access.

So we as the Spring Data team always have to trade different driving forces against each other: first, the level of abstraction and the programming model that you use in your application code and how well and how easily it actually allows you to implement domain logic that solves your business problem at hand. Second, the different tradeoffs that different data stores have already made, and how we actually allow our users to leverage them and at the same time expose some commonalities in the programming model so that developers can transfer knowledge between projects that might use different stores for certain reasons easily. Spring Data is trying to bridge that gap, provide a low entry barrier in modelling aggregates and repositories but at the same time give users the tools to fall back to very efficient means of data access that require a lot of developer control for cases where that’s the top priority.
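
To make that concrete, a typical Spring Data repository for a hypothetical Person aggregate might look like this; the derived query and the explicit @Query show the two ends of the abstraction-versus-control trade-off:

import java.util.List;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.CrudRepository;
import org.springframework.data.repository.query.Param;

public interface PersonRepository extends CrudRepository<Person, Long> {

    // derived query: the implementation is generated from the method name
    List<Person> findByLastname(String lastname);

    // dropping down to store-specific control where needed (JPA in this example)
    @Query("select p from Person p where p.age > :age")
    List<Person> findOlderThan(@Param("age") int age);
}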

Spring Data has an impressive number of officially and unofficially supported modules that reach far beyond relational data models. What are the biggest challenges in working with so many models and technologies in a single API?

Definitely the diversity in approaches and tradeoffs by the underlying persistence technologies. Actually, that’s one of the reasons Spring Data does not try to provide a singular unifying API. It’s not even trying to do that for stores of a certain category, for example document databases. We’ve rather taken a general Spring ecosystem philosophy and implemented it in the repositories and data access space: let’s have a consistent programming model with repeatable patterns so that it’s easy to understand the purpose of the abstraction, but then let the abstraction expose store specific specialties so that we’re not abstracting away those but rather allow developers to leverage them when needed.

The general concept of an interface based repository abstraction is not the most revolutionary thing on earth, I admit. But we now look back at almost a decade of experience in designing the parts of the programming model that are indeed API and I think we’ve learned a lot from mistakes we made. Java 8 will be the baseline for the upcoming second generation of Spring Data which enriches our options in terms of APIs. Reactive programming is a hot topic at the moment, too. So there are a lot of balls to juggle in that space.

What’s your favourite module, and why?

That’s of course hard to say as it’s been awesome to see what the individual store modules have turned into over time. However, I’ve grown a bit of a special relationship with the Commons module which is the foundation for all of the store ones as it basically contains the heart and soul of Spring Data: the object mapping facilities, the repository proxy implementation etc. And it’s great to see how we can oftentimes add some functionality there and that functionality is immediately available for repositories of all stores.

On the other end of the spectrum there’s Spring Data REST, a module that exposes RESTful resources based on your aggregate and repository definitions. I like that very much as well, as it works across all the Spring Data modules exposing a repository API and is a great showcase of what you can achieve on top of such an abstraction. Also, it has really helped us to make developers aware of a couple of often overlooked aspects of REST, but I guess we’re gonna get to that in a second.
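
A minimal sketch of that module in action: a repository like the one above, annotated for REST exposure (hypothetical names; with Spring Data REST on the classpath it is exposed as a hypermedia-driven resource under /people):

import org.springframework.data.repository.PagingAndSortingRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

@RepositoryRestResource(collectionResourceRel = "people", path = "people")
public interface PersonRepository extends PagingAndSortingRepository<Person, Long> {
}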

Maintaining a big and widely used API is hard, balancing tons of user requests, integrating third party functionality, maintaining backwards compatibility. What are some maintainer battle stories you’d like to share?

It certainly is. Especially with so many concurrent — sometimes contradicting — forces in play. That starts with the question of versioning the modules: what do we actually version here? User facing API. But which part of the API is that? That totally depends on how much the user is customizing behavior. Is a developer building a Spring Data module for some new data store a user, too? Of course, but a very different kind of user. We usually try to be very conservative with changes that could affect application developers but a bit more demanding when it comes to the implementors of a store module.

That’s all stuff we sort of had and have to deal with on a day-to-day basis. Interestingly, we’ve been the first ones in the broader Spring engineering team that have picked up the notion of a release train (we group together releases of all modules and name them after famous computer scientists), which had been popularized by the Eclipse team. That approach worked well for us and has now been adopted by Spring Cloud, Reactor and other teams as well.

Oliver, I have to ask, why does everyone misunderstand REST?

I’m kind of surprised this question comes up in this context, but I guess I have built up some reputation on the internet for complaining about people being anything from unspecific to, in my opinion, outright wrong about this topic :D.

I guess the fundamental problem that REST has is that some parts of it are moderately easy to understand and implement. These days everyone agrees that URIs are a cool thing and that using the right HTTP verb for a given task is a good idea. But even with the latter you’ll easily find people that don’t understand why it’s a good idea to prefer a PUT request over a POST one. Which already brings us to the second part.

Then there are parts that are harder to grasp and a bit harder to implement. The hypermedia aspect comes to mind. Unfortunately these aspects are the ones that heavily influence whether what you build delivers on the promises that REST makes: being an architectural style that gives you e.g. scalability and evolvability. So people basically start ignoring these aspects, sometimes even outright arguing they don’t need them, but then turn around and criticize REST for not delivering on its promises.

In my opinion that’s a way too common pattern observable in the wild, but I guess the only way to improve the situation here is to work on making it easier to implement those aspects and to provide good examples of the benefits you get when you follow those advanced constraints.

jOOQ Tuesdays: Gerald Sangudi and Keshav Murthy Reveal the Secrets of N1QL (SQL on JSON)

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Gerald Sangudi and Keshav Murthy, creators of the N1QL query language, Couchbase’s SQL-based JSON querying language.

Hi Gerald and Keshav – It’s great to have two people on the jOOQ Tuesdays interview at once! How did you guys meet?

Keshav: Gerald interviewed me for the job at Couchbase.

Gerald: Best decision we ever made.

You’re both working at CouchBase, one of the leading “JSON databases” in the market. What drove you towards JSON?

Keshav: I worked for IBM, specifically on the Informix database. In 2013, WebSphere teams needed JSON support. I was skeptical of JSON for data management, except for managing metadata. Years before, I had successfully opposed implementing XML and XQuery inside Informix. Then I went to a NoSQL conference where I saw enterprises using JSON in real-world applications: enterprises like eBay and Cisco developing their enterprise applications on NoSQL and JSON.

RDBMS customers had been asking for ALTER TABLE ONLINE when they needed to add new columns. They typically would have additional unused columns which they’d rename when needed. Obviously, this wasn’t an optimal or a good solution because you could run out of these columns, and the types or structure weren’t flexible.

For years, customers had been asking to ALTER TABLE while applications were online. I realized JSON was a good way to provide that. That’s when we added JSON as a type, added sharding, and started to extend SQL to support JSON. A good relational database can deliver a lot of value with JSON. So, we developed a mechanism to exploit JSON to provide schema flexibility on relational tables and we received a patent for it: http://bit.ly/2og1uXe.

At Couchbase, data is stored as JSON and N1QL was designed as SQL for JSON.  So, we keep calm and JSON.

Gerald: I am not at Couchbase any more, but I was there for four years. Couchbase has its roots in memcached and CouchDB, so Couchbase was already storing and managing JSON data before N1QL and before we arrived. N1QL was developed to enable our users to query and manipulate their JSON data.

You’re both working on, or have worked on N1QL, CouchBase’s innovative SQL-esque query language for JSON. Has the SQL language come full circle? Why is the SQL syntax such a great fit for you?

Keshav: Don Chamberlin, the co-inventor of SQL, said this. During the XQuery discussions, he argued against using SQL for manipulating XML. Now, with N1QL and SQL++ on JSON, he sees enormous possibilities for SQL to manipulate JSON effectively. So, you could say, it has come full circle.

From the application point of view, the requirement for SQL to support complex data models has always been there. SQL-99 added the structured type into the language, but didn’t recognize the need for schema flexibility.

What Gerald has done very nicely is to inherit SQL and extend not just the language but the underlying boolean logic.

SQL has three-valued logic: TRUE, FALSE, NULL. N1QL has four-valued logic: TRUE, FALSE, NULL and MISSING. SQL has the select-join-project operations. N1QL has those and adds NEST and UNNEST operations for handling arrays. Once we have the logic and the operations, the type system and the rest of the expressions for handling nested objects and arrays can be added.
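
Two small, made-up N1QL examples of those points, one for MISSING as distinct from NULL and one for UNNEST flattening an embedded array:

SELECT p.name
FROM   people AS p
WHERE  p.zipcode IS NOT MISSING;     /* the field may simply be absent from a document */

SELECT p.name, child.age
FROM   people AS p
UNNEST p.children AS child;          /* joins each document with the elements of its own array */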

There is another important change in N1QL compared to SQL. SQL is the query language to manipulate data. N1QL can discover the document metadata (names, structure and types) dynamically and operate on it.

Gerald: SQL is the greatest and most successful query language of all time. Our job was to enable our users to query data that is sometimes different from what standard SQL expects. We hope N1QL does that.

As an aside, the “N1” in N1QL stands for non-first normal form. SQL geeks like you Lukas may know that “non-first” is one primary difference between JSON data and relational data. The other primary differences are schema and uniformity.

Will your relational competition steal features from N1QL? Or will you steal more features from them?

Keshav: I do hope relational databases steal features from N1QL. Having common approaches to solving problems makes it easier for customers to choose the right database for the right problem. I do hope they choose Couchbase more often than RDBMS!

Gerald always says, we don’t differ from SQL unless there is a good reason. In that sense, we’ve taken a lot of the features from relational databases already. In addition, we learn from successful models in relational databases for things like index design, query optimization, security and monitoring. We stand on the shoulders of giants.

Gerald: We hope that both relational and non-relational vendors steal features from each other. There is a collaborative effort on something called SQL++, which is a superset of both SQL and N1QL. We hope SQL++ is the convergence point. One lesson from the success of SQL is that standards are great for both users and vendors.

CouchBase doesn’t have what you call a “static schema”. The biggest advantage of a static schema for the database optimiser is the many ways such a schema can be used to predict performance and choose optimal execution plans. How does optimisation work in a “schema-less” database?

Keshav: Actually, static schema gives you the structure, but not the data distribution, which is a major factor for calculating the execution cost. N1QL uses the information within the query and available indexes to glean the structure and decide on the plan. For example, if you have a query with a predicate: WHERE state = “CA” and zipcode = “94040”, and there is an index on either state, zipcode or both, we’d assume these key-values exist in the document and push down the predicates to the index scan. We give further details in an article on DZone: https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-optimization
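
As a small illustration of that pushdown (keyspace and index names made up), an index covering both fields lets the predicates be evaluated by the index scan instead of fetching whole documents:

CREATE INDEX idx_state_zip ON customers(state, zipcode);

SELECT c.name
FROM   customers AS c
WHERE  c.state = "CA" AND c.zipcode = "94040";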

Separately, we do have a mechanism to INFER the schema by sampling. Right now, we show the inferred schema so users can understand the structure. We also use the inferred schema to make query editing easier with hints and validations within the workbench. We do have plans to use this and to collect additional statistics to improve decisions in the optimizer.

Gerald: The N1QL optimizer is one of the joys of working on N1QL. As Keshav said, most of the SQL optimization techniques carry over to N1QL. The data may not have a static schema, but the query has an implicit schema, and the indexes have implicit schemas. That is, the query has predicates and other characteristics, and each index has keys and other characteristics. That is enough to keep an optimizer busy.

Working for a newer database vendor, you don’t have as much legacy, so you can innovate faster and more freely. What will be the next big thing in the database market?

Keshav: Our metric for innovation is the progress customers are trying to make in their business. So, we innovate within the constraints of a customer job. That helps us to innovate and measure success from the customer’s point of view.

We see customers deploying NoSQL databases for newer patterns like systems of engagement.

When you plan your vacation, you search a lot before you buy. You search for places, hotels, airlines, things to do. Then you compare costs and ratings before you make the final purchase. For a travel company, these require significant database infrastructure to support a high number of queries with low latencies and at a very low cost.

We see customers deploying NoSQL databases to support a lot of the customer engagement and information requirement use cases. RDBMS is still used as the system of record for buy-sell-cancel-check-in operations, but will integrate with system-of-engagement databases to enhance customer experience.

Doing this effectively requires innovations in every area: data platform designs, scale out, query processing, index designs and manageability.

Gerald: I read Jeff Bezos’ latest annual letter to shareholders.  He says that the things that don’t change are more important than the things that do change. Databases should store your data reliably and give you answers and updates quickly. If database vendors continue to do that, we’ll be ok.

jOOQ Tuesdays: Richard North Makes Database Testing More Reproducible with Testcontainers

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Richard North, creator of Testcontainers, a promising new database and UI testing tool for reproducible tests.

Hi Richard – you work at Skyscanner, which means you have tons of travel data to work with. What’s the most exciting thing about working with this data?

Skyscanner is a really data-led company, using data at all levels in decision-making. The volume of data we gather and process really helps us understand what travellers want and serve them better. For example, our destination recommender system helps people discover new and interesting places to go, based upon a vast amount of data, algorithms and experiments. But it’s interesting how this varies from recommendations in internet media companies – there are far fewer possible destinations than books, songs and movies, yet users’ reasons for travelling and tastes can be more nuanced and varied.

There’s yet more data-oriented work under the surface, too – for example, the infrastructure needed to gather and analyse such large amounts of data, and the running of experiments to help us improve. There are a lot of smart people working hard to make this all happen, and it’s an exciting place to be!

You’ve created an increasingly popular testing framework, Testcontainers. What made you do it? What itch does it scratch?

Well, I think it’s something that scratches several itches at the same time – things that previously we only had isolated solutions to. The common element, though, is reproducibility of test environments. Of all my time developing and writing tests for JVM-based systems, it’s always been the non-JVM dependencies that caused the most complexity, unreliability and maintenance overhead.

I remember my first day as a developer, years ago: I was given a desktop machine and 2 days’ worth of step-by-step instructions that I needed to follow – just so that I’d be able to develop and run tests with all dependencies in place. A few months later I had to repeat the same task many times over when building new CI servers.

A lot has changed since then in terms of how we deploy and manage our production infrastructure, and thankfully Docker has done a lot to further bring prod-parity to developers’ machines.

Testcontainers started out as my effort to bring the full power of Docker to integrated testing on the JVM in two areas that I’ve experienced the most pain: testing against a clean, representative database, and making browser-based selenium testing more reproducible, both for developers and on CI.

I’m mostly curious about testing databases: your documentation mentions Testcontainers as an alternative to using H2 as a test database. What are the disadvantages of emulating a database e.g. with H2? Do you have any personal experience with that?

Yes, definitely – it was one of the tipping point factors that triggered me to create Testcontainers. I do think H2 is a fantastic piece of work in what it manages to deliver, and it’s something I’ve used on a number of projects to good effect.

However, compatibility with real databases has often been a sticking point. Back in 2015, before I started Testcontainers, we were struggling with a few MySQL features that didn’t have equivalents in H2. We were facing the unpleasant prospect of having to constrain our implementation to what H2 would allow us to test against. It became fairly obvious that there was a gap in the market for an H2-like tool that was actually a facade to a Docker-based database container – and Testcontainers was born.
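
For readers who haven’t seen it, a minimal sketch of that idea with JUnit 4 (the image tag and class names are just examples):

import org.junit.Rule;
import org.junit.Test;
import org.testcontainers.containers.MySQLContainer;

public class CustomerRepositoryIT {

    // starts a throwaway MySQL instance in Docker for each test
    @Rule
    public MySQLContainer<?> mysql = new MySQLContainer<>("mysql:5.7");

    @Test
    public void findsCustomers() {
        String url = mysql.getJdbcUrl();        // hand these to your DataSource / jOOQ / JPA setup
        String user = mysql.getUsername();
        String password = mysql.getPassword();
        // ... run real queries against a real MySQL here
    }
}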

What do you think about mocking the database at any layer, including the DAO layer, the service layer, etc.?

I’m all in favour of keeping tests small, light and layered, and using mocks to accomplish this. This might sound strange coming from somebody who has developed an integrated testing tool, but it’s true!

Still, I feel that we need to be pragmatic about how we approach automated tests and how we make sure we’re testing the right thing – especially when crossing boundaries. Are we testing how this code behaves against reality, or are we testing against our own (potentially false) understanding of how external components work?

My feeling is that it’s quite straightforward to mock layers of your system that you yourself wrote, or where you can easily jump into the source code, a spec or documentation. With an external component, you can still produce a mock that behaves how you expect, or how you witness the real thing behaving. But does that mock continue to represent the real thing, especially after accretion of other features, or the additional perils of state that a database entails – schema changes and actual data?

My ideal is to mock the data access layer for consumption by higher layers, but to be quite careful about what the data access layer itself talks to in my tests. It should probably be a real database. Hopefully Testcontainers is one tool that helps make this particular thing a little less painful – so that when you find yourself needing to do this, there’s a way to do it easily.

What’s the biggest challenge you’ve faced when testing databases, or other things?

It’s not databases, but I’d say that by far the hardest testing challenge I’ve faced as a developer is mobile apps, especially iOS. I’ve always enjoyed mobile development as a whole, but when switching from a Java server-side/web project to mobile, it really feels like you’re going back in time. Some of the challenges are harder – such as asynchronicity and platform APIs that make it harder to structure software in a testable way. But it also feels like the tooling is much further behind, and until quite recently received far less attention. I feel the net result has been that developers have been discouraged from investing in automated tests, which is sad given that we know how valuable they can be.

Things are getting better, but I do greatly prefer the testing aspects of working on server-side JVM projects. For all its difficulties, we are actually quite lucky!

jOOQ Tuesdays: Brett Wooldridge Shows What it Takes to Write the Fastest Java Connection Pool

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Brett Wooldridge, creator of HikariCP, the fastest connection pool available for Java.

Brett, you’ve created one of the most popular connection pools for Java: HikariCP. What made your library so popular?

I’ll provide some backstory on HikariCP before I answer that, but I’ll tease the answer by saying “marketing”.

A few years ago I was creating a product prototype for the company I work for, and I needed a connection pool. Like most developers I just wanted to drop in a pool and move on, so I took to the web to find the most popular and actively maintained library. Unfortunately, while load testing the prototype we started encountering deadlocks, and exceptions indicating connection state bleed over between threads.

Because the pool was open source, I thought I’d just pull down the code, find and fix the problems, and contribute back. But when I opened the code, I found thousands of lines more code than I was expecting.  Added to the mix were many locks, nested, sometimes acquired in one method and released in some distant place. There was simply no way to reason about where potential deadlocks lurked, even if we found and fixed the ones we encountered.

I picked up another pool and inspected its code. The lock semantics were clearer, but the volume of code was still more than 2x what I expected, especially given that it was delegating the core pooling logic to a separate library.

In addition, all of the pools I studied violated JDBC contracts in multiple ways. Inasmuch as it is possible, a pool should return a Connection that is indistinguishable from one received in the absence of the pool. But these pools didn’t automatically close Statements when a connection was “closed” (returned), or clear warnings, or roll back uncommitted transactions, and they didn’t reset properties altered by the user such as auto-commit or transaction isolation level, and more; resulting in the next consumer getting a “dirty” connection.
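
In JDBC terms, the housekeeping a pool should perform when a connection is returned looks roughly like this (a simplified sketch, not HikariCP’s actual code):

import java.sql.Connection;
import java.sql.SQLException;

// simplified sketch of resetting a connection when it is returned to the pool
void resetOnReturn(Connection connection, boolean defaultAutoCommit, int defaultIsolation)
        throws SQLException {
    if (!connection.getAutoCommit()) {
        connection.rollback();                        // don't leak an open transaction
    }
    connection.clearWarnings();                       // don't leak warnings
    connection.setAutoCommit(defaultAutoCommit);      // undo changes made by the previous user
    connection.setTransactionIsolation(defaultIsolation);
    // any Statements created through the pooled connection must be tracked and closed as well
}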

I thought, “Really? This is the state of connection pools in the ecosystem after 20 years of Java?” Out of necessity and frustration, I created HikariCP.

To be fair, since I started HikariCP some pools have made some of these “correctness” behaviors configurable, but none of them do so by default and I suspect most users are running with the safety off.  At least two popular pools fail to complete our benchmark with OutOfMemory exceptions when they are enabled.  Conversely, HikariCP doesn’t support an unsafe mode of operation.

Returning to your question, as noted above there were many established pools available, so how did HikariCP become popular?  “Correctness” and reliability are a tough sell, so I focused on promoting performance, and started with a simple tweet. One follower led to another.  Some users tweeted about big performance gains, and improved reliability, and at some point in 2015 the Wix engineering team wrote a blog about switching to HikariCP.

In essence, simple word of mouth has led to HikariCP’s rising popularity, with an initial “marketing” push based on performance.  I do hope that over time more users will give equal weight to correctness and reliability, without which performance is meaningless, and for my part I plan to write more about those aspects of HikariCP.  

You quoted Edsger Dijkstra: “Simplicity is prerequisite for reliability.” – That reminds me of Antoine de Saint-Exupery’s “Perfection is Achieved Not When There Is Nothing More to Add, But When There Is Nothing Left to Take Away”. How do you manage to keep things simple when this world only ever gets more complicated?

Resisting complexity through feature-creep can be challenging.  I get a lot of requests for this or that feature, and while each may be simple in and of itself, if taken in totality would significantly increase complexity and code size.  Of course, that is not to say that I don’t add features.

For example, initial versions of HikariCP only supported a fixed size pool.  HikariCP was designed for systems with fairly constant load, and in that environment pools tend to stay at their maximum size, so I saw little need to complicate the code to support dynamic sizing.  Can you imagine a server at Google falling idle for several minutes?  Additionally, I feel like the more axes of configuration there are, the more difficult it is for users to optimally configure a pool.  However, eventually there were enough users who needed dynamic sizing, and its absence was a barrier to adoption, so support was added.  Principally, I did not want lack of dynamic sizing support to deprive users of the reliability and correctness benefits of HikariCP.
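
For reference, a typical HikariCP setup looks like this (URL and credentials are placeholders); maximumPoolSize and minimumIdle are the knobs that dynamic sizing introduced, and setting them to the same value effectively gives you the original fixed-size behaviour:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/app");   // placeholder
config.setUsername("app");
config.setPassword("secret");
config.setMaximumPoolSize(10);   // upper bound
config.setMinimumIdle(10);       // same as the maximum = effectively a fixed-size pool

HikariDataSource dataSource = new HikariDataSource(config);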

Still, I probably reject the vast majority of feature requests. As the custodian of HikariCP keeping it simple and true to that core philosophy is in the best interest of the community.  I always try to minimize the “surface area”, both in terms of code and configuration.  The larger the surface area of an API, the more difficult it is to comprehend.  Our brains have a limit for the amount of contextual information that can be held “in view” at one time; this is true in a lot of contexts.  For example, when reading code, methods larger than a certain size, or conditionals of more than a certain number of terms, are difficult to follow or reason about.  Generally, for users of HikariCP, the “surface area” is manifest in the number of configuration parameters.  While I can hardly say that “Perfection [has been] achieved”, I do feel like there is not much left to take away without cutting into functionality.

Few libraries go to the byte code level to optimise their code. While this helps in benchmarks, did it also help your users in production? What were the biggest caveats you found while micro-optimising?

Definitely.  Maybe some developers are dismissive of the potential gains, because in their minds they think, “What does it matter if connection acquisition takes 100ns or 100μs, the query is going to take 10ms anyway?”  However, pools intercept dozens of methods, and the “close()” path is typically slower than acquisition, so it’s not that simple.  I often get reports from users providing confirmation of real world performance improvements.  It’s anecdotal but one user initially commented in a bug report, “We’re testing HikariCP at the client and have had great initial success – an application loading 1 million records over multiple HTTP threads and putting them in the DB had its run time cut by 70% after moving from Tomcat CP to HikariCP!”  The follow-up comment on the bug was, “This was a bug in our side, using some unrelated non-threadsafe code.  No issue.  After fixing the bug, the code runs about 2x faster using HikariCP than Tomcat CP.”  That’s pretty good; and yet some reports surprise even me.

Regarding optimisation, and as long as we’re quoting famous thinkers, I would be remiss if I didn’t cite Knuth: “We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil.”  I think the key word here is “premature”.  It is definitely better to write the code as it naturally comes and then, based on detailed profiling and benchmarking, perform “peephole optimisations” (to hijack a word from compiler theory).  At the same time, I would estimate that half of the performance gains in HikariCP have come as the result of algorithmic changes, rather than low-level optimisations.

Regarding caveats to micro-optimising, it would be hard to convey how much I have learned, and am still learning.  I’d like to give a shout-out to Aleksey Shipilëv for his excellent JMH micro-benchmark framework.  Aleksey has become somewhat of a JVM performance oracle (no pun intended, he used to work for Oracle).  The JVM performs an amazing array of optimisations, and if one is not careful then what appears to be a clever optimisation in the code simply confuses the JIT’s pattern-based optimiser and the result is slower rather than faster.
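
To give an idea of what such a micro-benchmark looks like, a bare-bones JMH sketch (not HikariCP’s actual benchmark suite) might be:

import java.sql.Connection;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConnectionCycleBenchmark {

    DataSource dataSource;   // would be initialized in a @Setup method in a real benchmark

    @Benchmark
    public void acquireAndRelease() throws Exception {
        try (Connection connection = dataSource.getConnection()) {
            // measures the pool's acquire/return cycle, not a query
        }
    }
}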

In order to effectively optimise on the JVM you sometimes end up reading the JIT source code, and you must become familiar with concepts such as dead code elimination, loop invariant hoisting, constant propagation, virtual call inlining, and many more.  Even with a good grip on these concepts I am sometimes surprised by the JVM in my attempts at optimisation.  In addition to the JIT, you really must understand the Java Memory Model (JMM) and how it maps onto CPU architectures like x86.

Lastly, after the design of algorithms, contention for shared state is the source of most bottlenecks (see the aforementioned JMM), so recently the biggest gains (for example, in v2.6.0) have come from tricks that simply avoid it; the fastest code is code that is never executed.

If there is a main takeaway, it is “trust the benchmarks”; your assumptions and intuitions are wrong more often than you imagine.

Your fellow jOOQ Tuesdays interviewee Vlad Mihalcea talked to us about queueing theory. How does this compare to what you wrote about connection pool sizing?

I have great respect for Vlad, I think we’re both members of the Mutual Admiration Society.  His FlexyPool is trying to solve a difficult problem; that being how to automatically tune optimal pool settings for varying loads.  Ultimately, the upper-bound is constrained by the database’s optimal concurrent query capacity, which is where my write-up on pool sizing comes into play.  However, there is a large amount of configuration space in-between a minimally sized pool and that upper-bound, which is where FlexyPool is trying to add value, by ensuring that the pool is “right sized”, dynamically, for the load it is servicing.
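
For readers who haven’t read that write-up, a commonly cited rule of thumb for that upper bound, originally from the PostgreSQL community, is roughly:

connections ≈ (number_of_cores * 2) + effective_spindle_count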

I say it is a difficult problem, because connection pools on modern multi-core servers likely present as an M/G/k queue in queueing theory; arrivals have a Markovian distribution, service times have a General distribution, and there are k servers (where “server” is defined as an abstract single-threaded processor).  Quoting Wikipedia, “Most performance metrics for this queueing system are not known and remain an open problem.”  Modeling connection pools as an M/M/c queue might provide a decent approximation for the purposes of predicting queue lengths, but service times are not likely to have a Markovian distribution.  Of course, there are also non-Markovian stochastic models in queueing theory that could be applied.  Complicating everything is the fact that queued waiters (threads) can abandon the queue before service, for example when a timeout is reached.  That adds an additional twist when trying to predict queue lengths and wait times.  Hats off to Vlad for taking on this problem!

Anyway, what I wrote about setting the upper-bound on pool sizing translates to pinning the k (or c) value in those respective Markovian queueing theory models.

You chose a Japanese word in your product: 光 (Hikari, “Light”). What’s your connection to Japan?

I’ve lived and worked in Tokyo since 2008, though I think my Japanese is far behind where it should be given my time here.  I chalk that up to preferring time at the keyboard to language study.

As you mentioned, Hikari (pronounced Hi-ka-lee) translates to “Light” (as in sunlight).  In English, it is a double entendre in the context of HikariCP; though in Japanese it would not be.  “Light” in the sense of “the speed of…”, and “light” in the sense of being light in terms of code weight.

jOOQ Tuesdays: Mario Fusco Talks About Functional and Declarative Programming

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Mario Fusco, author of Lambdaj, working on Red Hat’s Drools, a Java Champion and frequent speaker at Java conferences on all topics related to functional programming.

Mario, a long time ago I stumbled upon your name when looking up the author of Lambdaj – a library that went to the extreme to bring lambdas to Java 5 or earlier. How does it work? And what’s the most peculiar hack you implemented to make it work?

When I started developing Lambdaj in 2007 I thought of it just as a proof of concept to check how far I could push Java 5. I never expected that it could become something that somebody other than myself may actually want to use. In reality, given the limited, or I should say non-existent, capabilities of Java 5 as a functional language, Lambdaj was entirely a big hack. Despite this, people started using and somewhat loving it, and this made me (and possibly somebody else) realize that Java developers, or at least part of them, were tired of the pure imperative paradigm imposed by the language and ready to experiment with something more functional.

The main feature of Lambdaj, and what made its DSL quite nice to use, was the possibility to reference the method of a class in a static and type safe way and pass it to another method. In this way you could for example sort a list of persons by their age doing something like:

sort(persons, on(Person.class).getAge());

As anticipated, what happened under the hood was a big hack: the on() method created a proxy of the Person class so you could safely call the getAge() method on it. The proxy didn’t do anything useful other than registering the method call. However, it had to return something of the same type as the value returned by the actual method to avoid a ClassCastException. For this purpose it had a mechanism to generate a reasonably unique instance of that type, an int in my example. Before returning that value it also associated it, using a WeakHashMap, to the invoked method. In this way the sort() method was actually invoked with a list and the value generated by my proxy. It then retrieved from the map the Java method associated with that value and invoked it on all the items of the list, performing the operation (a sorting in this case) that it was supposed to execute.
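
Purely as an illustration of that trick, here is a grossly simplified, interface-only sketch; the real Lambdaj also proxied concrete classes and generated placeholder values matching the return type of the invoked method:

import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

public class OnSketch {

    interface Person { int getAge(); }

    static final Map<Object, Method> RECORDED = new WeakHashMap<>();
    static int nextPlaceholder = 1_000;

    // returns a proxy that records which getter was called and hands back a placeholder value
    @SuppressWarnings("unchecked")
    static <T> T on(Class<T> type) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] { type },
                (proxy, method, args) -> {
                    Object placeholder = ++nextPlaceholder;   // "reasonably unique" int value
                    RECORDED.put(placeholder, method);
                    return placeholder;
                });
    }

    // looks the recorded getter up again via the placeholder and sorts by it
    @SuppressWarnings("unchecked")
    static <T> void sort(List<T> list, Object placeholder) {
        Method getter = RECORDED.get(placeholder);
        list.sort((a, b) -> {
            try {
                Comparable<Object> keyA = (Comparable<Object>) getter.invoke(a);
                Object keyB = getter.invoke(b);
                return keyA.compareTo(keyB);
            }
            catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        });
    }

    // usage: sort(people, on(Person.class).getAge());
}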

That’s crazy :) I’m sure you’re happy that a lot of Lambdaj features are now deprecated. You’re now touring the world with your functional programming talks. What makes you so excited about this topic?

The whole Lambdaj project is now deprecated and abandoned. The new functional features introduced with Java 8 just made it obsolete. Nevertheless it not only had the merit of making developers curious and interested in functional programming, but also of experimenting with new patterns and ideas that in the end also influenced the Java 8 syntax. Take for instance how you can sort a list of persons by age using a method reference

persons.sort(Comparator.comparing(Person::getAge))

It looks evident that method references have been at least inspired by Lambdaj’s on() method.

There are a number of things that I love about functional programming:

  1. The readability: a snippet of code written in functional style looks like a story, while too often the equivalent code in imperative style resembles a puzzle.
  2. The declarative nature: in functional programming it is enough to declare the result that you want to achieve rather than specifying the steps to obtain it. You only care about the what without getting lost in the details of the how (see the small comparison after this list).
  3. The possibility of treating data and behaviors uniformly: functional programming allows you to pass to a method both data (the list of persons to be sorted) and computation (the function to be applied to each person in the list). This idea is fundamental for many algorithms, like for example map/reduce: since data and computation are the same thing and the second is typically orders of magnitude smaller, you are free to send them to the machine holding the data instead of the opposite.
  4. The higher level of abstraction: the possibility of encapsulating computations in functions and passing them around to other functions allows both a dramatic reduction of code duplication and the design of more generic and expressive APIs.
  5. Immutability and referential transparency: using immutable values and having side-effect-free programs makes it far easier to reason about your code, test it and ensure its correctness.
  6. The parallelism friendliness: all the features listed above also enable the parallelization of your software in a simpler and more reliable way. It is no coincidence that functional programming started becoming more popular around 10 years ago, which is also when multicore CPUs began to be available on commodity hardware.
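
The small comparison mentioned in point 2, collecting the names of adult persons, first imperatively and then declaratively (Person is a made-up class with getAge() and getName()):

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// imperative: spell out the how
List<String> names = new ArrayList<>();
for (Person person : persons) {
    if (person.getAge() >= 18) {
        names.add(person.getName());
    }
}

// functional / declarative: state the what
List<String> adults = persons.stream()
                             .filter(person -> person.getAge() >= 18)
                             .map(Person::getName)
                             .collect(Collectors.toList());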

Our readers love SQL (or at least, they use it frequently). How does functional programming compare to SQL?

The most evident thing that FP and SQL have in common is their declarative paradigm. To some extent SQL, or at least the data selection part, can be seen as a functional language specialized to manipulate data in tabular format.

The data modification part is a totally different story though. Most SQL users normally change data in a destructive way, overwriting or even deleting the existing data. This is clearly in contrast with the immutability mantra of functional programming. However this is only how SQL is most commonly used, and nothing dictates that it couldn’t also be employed in a non-destructive, append-only way. I wish to see SQL used more often in this way in the future.

In your day job, you’re working for Red Hat, on drools. Business rules sound enterprisey. How does that get along with your fondness of functional programming?

From a user’s point of view, a rule engine in general and Drools in particular are the extreme form of declarative programming, second only to Prolog. For this reason developers who are only familiar with the imperative paradigm struggle to use it, because they try to force it to work in an imperative way. Conversely, programmers more used to thinking in functional (and then declarative) terms are more often able to use it correctly when they approach it for the first time.

As for me, my work as a developer of both the core engine and the compiler of Drools allows me to experiment every day in both fields of language design and algorithmic invention and optimization. To cut it short, it’s a challenging job and there’s lots of fun in it: don’t tell this to my employer, but I cannot stop being surprised that they allow me to play with this every day and they also pay me for that.

You’re also on the board of VoxxedDays Ticino, Zurich, and CERN (wow, how geeky is that? A large hadron collider Java conference!). Why is Voxxed such a big success for you?

I must admit that, before being involved in this, I didn’t imagine the amount of work that organizing a conference requires. However this effort is totally rewarded. In particular the great advantage of VoxxedDays is the fact of being local 1-day events made by developers for developers that practically anybody can afford.

I remember that the most common feedback I received after the first VoxxedDays Ticino that we did 2 years ago was something like: “This has been the very first conference I attended in my life and I didn’t imagine it could be such an amazing experience, both from a technical and even more from a social point of view. Thanks a lot for that, I eagerly wait to attend next year too”. Can you imagine something more rewarding for a conference organizer?

The other important thing for me is giving the possibility to speakers who aren’t rock stars (yet) to talk in public and share their experience with a competent audience. I know that for at least some of them this is only the first step to let themselves and others discover their capabilities as public speakers and launch them toward bigger conferences like Devoxx.

Thank you very much Mario

If you want to learn more about Mario’s insights on functional programming, please do visit his interesting talks at Devoxx from the recent past.

jOOQ Tuesdays: Daniel Dietrich Explains the Benefits of Object-Functional Programming

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Daniel Dietrich, whose popular library vavr is picking up a lot of momentum among functional programming aficionados working with Java.

Daniel, you created vavr – Object-Functional Programming in Java, a library that is becoming more and more popular among functional programmers. Why is vavr so popular?

Thank you Lukas for giving me the opportunity to share my thoughts.

I think that many users were disappointed with Java 8 on the whole, especially those who are already familiar with more advanced languages. The Java language architects did an awesome job. Java 8 brought groundbreaking new features like Lambdas, the new Stream API and CompletableFuture. But the new abstractions were only poorly integrated into the language from an API perspective.

There is already an increasing amount of write-ups about the disadvantages of Java 8, starting with the drawbacks of the Optional type. We read that we have to take care when using parallel Streams. These are self-made problems that keep us busy, stealing our valuable time. vavr provides us with alternatives.

There is no reason to reinvent the wheel. My vision is to bring as much as possible of the Scala goodness to Java. In fact Scala emerged from Java in the form of the Pizza language. Back in 2001 it had features like generics, function pointers (aka lambdas), case classes (aka value types) and pattern matching. In 2004 Java got generics, in 2014 came lambdas, and hopefully Java 10 will include value types. Scala left Java far behind. It used the last 15 years to evolve.

Object-functional programming is nothing new. It is the best of both worlds, object-oriented programming and functional programming. Scala is one of the better choices to do it on the JVM. Java’s Lambdas are an enabling feature. They allowed us to create a vavr API that is similar to Scala.

Java developers who get their hands on vavr often react in a way that I call the nice-effect: “Wow that’s nice, it feels like Scala”.

You have published a guest post on the jOOQ blog about vavr more than one year ago. Since then, vavr has moved forward quite a bit and you’ve recently published the roadmap for version 3.0. What have you done since then and where are you going?

Yes, that is true, it has changed a lot since then. We released vavr 1.2.2 two weeks before the first jOOQ guest post went online. Besides enriched functions, that release offered popular Scala features like Option for null-safety, Try for performing computations headache-free in the presence of exceptions, and a fluent pattern matching DSL. Also notably we shipped two common persistent collections, an eagerly evaluated linked List and the lazy form of it, also called Stream.

Roughly one year later we released vavr 2.0.0. We hardened the existing features and most notably included Future and Promise for concurrent programming and a full-fledged, Scala-like persistent collection library. Besides that, we replaced the pattern matching DSL with a more powerful pattern matching API that allows us to recursively match arbitrary object trees.
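
A tiny taste of those APIs (assuming a current vavr version with the io.vavr package; the values used here are placeholders):

import io.vavr.control.Option;
import io.vavr.control.Try;

// Option instead of null checks
String name = Option.of(System.getProperty("user.name"))
                    .map(String::toUpperCase)
                    .getOrElse("ANONYMOUS");

// Try instead of try/catch noise around code that may throw
int port = Try.of(() -> Integer.parseInt(System.getProperty("app.port")))
              .getOrElse(8080);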

I spent a significant amount of time and energy abstracting on the type level over the mentioned features, as far as this is possible in Java. For Java developers it is not important to call things monads, sum-types or products. For example we do not need to know group theory in order to calculate 1 + 1. My duty as library developer is to make it as simple as possible for users of vavr to reach their goals. The need to learn new APIs and DSLs should be reduced to the minimum. This is the main reason for aligning vavr to Scala.

Our efforts for the next release concentrate on adding more syntactic sugar and missing persistent collections beyond those of Scala. It will be sufficient to add one import to reach 90% of vavr’s API. There will be new persistent collections BitSet, several MultiMaps and a PriorityQueue. We are improving the performance of our collections, most notably our persistent Vector. It will be faster than Java’s Stream for some operations and have a smaller memory footprint than Java’s ArrayList for primitive elements.

Beyond library features we pay special attention to three things: backward compatibility, controlled growth and integration aspects. Web is important. Our Jackson module ensures that all vavr types can be sent over the wire as serialized JSON. The next release will include a GWT module, first tests already run vavr in the browser. However, the vavr core will stay thin. It will not depend on any other libraries than the JDK.

Towards the next major release 3.0.0 I’m starting to adjust the roadmap I sketched in a previous blog post. I’ve learned that it is most important to our users that they can rely on backward compatibility. Major releases should not appear often, following the 2.x line is a better strategy. We will start to deprecate a few APIs that will be removed in a future major release. Also I keep an eye on some interesting developments that will influence the next major release. For example a new major Scala release is in the works and there are new interesting Java features that will appear in Java 10.

Looking at the current issues I don’t have to be an oracle to foresee that the next minor release 2.1.0 will take some more time. I understand that users want to start using the new vavr features but we need the time and the flexibility to get things right. Therefore we target a first beta release of 2.1.0 in Q4 2016.

In the meantime, there is a variety of functional(-ish) libraries for Java 8, like our own jOOλ, StreamEx, Cyclops, or the much older FunctionalJλvλ. How do all these libraries compare and how is yours different?

This question goes a little bit in the philosophical direction, maybe it is also political. These are my subjective thoughts, please treat them as such.

Humans have the ability to abstract over things. They express themselves in various ways, e.g. with painting and music. These areas split into different fields. For example, in literature things are expressed in manifold ways, like rhythmic prose and poetry. Furthermore, different styles can be applied within these fields, like the iambic trimeter in poetry. The styles across different areas are often shaped by outer circumstances, bound to a time, like an epoch.

In the area of mathematics there are also several fields, like algebra and mathematical analysis. Both have a notion of functions. Which field should I take when I want to express myself in a functional style?

Personally, I’m not able to afford the time to write non-trivial applications in each of the mentioned libraries. But I took a look at the source code and followed the discussions. I see that nearly all of these libraries are shaped by the outer circumstance that lambdas finally made it to all curly-brace languages, especially to Java in our case. Library designers are keen to modernize their APIs in order to keep pace. But library designers are also interested in staying independent of third-party libraries, for reasons like stability and progression.

The field of jOOQ is SQL in Java; the field of Cyclops is asynchronous systems. Both libraries are similar in the way they adopted the new Java lambda feature. I already mentioned that the new Java features are only poorly integrated into the language. This is the reason why we see a variety of new libraries that try to close this gap.

jOOQ needs jOOλ in order to stay independent. On the technical level, StreamEx is similar to jOOλ in that both sit on top of Java’s Stream. They augment it with additional functionality that can be accessed through a fluent API. The biggest difference between them is that StreamEx supports parallel computations, while jOOλ concentrates on sequential computations only. The SQL-ish method names show that jOOλ is tailored to be used with jOOQ.
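
A rough sketch of that fluent, sequential style, assuming jOOλ’s Seq API:

import org.jooq.lambda.Seq;

public class SeqSketch {
    public static void main(String[] args) {
        // Seq wraps a sequential Stream and augments it with SQL-ish operators
        Seq.of("a", "b", "c")
           .zipWithIndex()                  // pairs each element with its position
           .forEach(System.out::println);   // (a, 0), (b, 1), (c, 2)

        // groupBy collects into a Map, much like SQL's GROUP BY
        System.out.println(Seq.of(1, 2, 3, 4).groupBy(i -> i % 2));  // {0=[2, 4], 1=[1, 3]}
    }
}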

Cyclops claims to be the answer to the Cambrian explosion of functional(-ish) libraries. It offers a facade that is backed by one of several integration modules. From the developer’s perspective, I view this with skepticism. The one-size-fits-all approach did not work well for me in the past because it does not cover all features of the backing libraries. An abstraction layer adds another source of errors, which is unnecessary.

Many of Cyclops’ names look unfamiliar to me, maybe because of the huge number of types. Looking at the API, the library seems to be a black hole, a Cambrian implosion of reactive and functional features. John McClean did a great job abstracting over all the different libraries and providing a common API, but I prefer to use a library directly.

FunctionalJλvλ is different. It existed long before the other libraries and has the noble goal of purely functional programming: if it compiles, it is correct. FunctionalJλvλ was originally driven by people well known from the Scala community, more specifically from the Scalaz community. Scalaz is highly influenced by Haskell, a purely functional language.

Haskell and Scala are much more expressive than Java, and porting the algebraic abstractions from Scalaz to Java turned out to be awkward. Java’s type system isn’t powerful enough; it does not allow us to reach that goal in a practical way. The committers seem disillusioned to me. Some state that Java is not the right language for functional programming.

vavr is a fresh take on porting Scala functionality to Java. At its core it is not as highly influenced by Scalaz and Haskell as FunctionalJλvλ is. However, for purely functional abstractions it offers an algebra module that depends on the core. The relation algebra/core can be compared to Scalaz/Scala.

vavr is similar to StreamEx in that it is not bound to a specific domain, in contrast to jOOλ and Cyclops. It is different from StreamEx in that it does not build on top of Java’s Stream. I understand vavr as a language addition that integrates well with existing Java features.

You have never spoken at conferences; you let other people do that for you. What’s your secret? :)

In fact I never attended a conference at all. My secret is to delegate the real work to others.

Joking aside, I feel more comfortable spending my time on the vavr source code than preparing conference talks and travelling. Currently I am working on vavr alongside my job, but I’m still looking for opportunities to do it full-time.

It is awesome to see other people jumping on the vavr train. We receive help from all over the world. Besides IntelliJ and YourKit, we recently got TouK as a new sponsor and produced vavr stickers that are handed out at conferences.

Because of the increasing popularity of vavr, there is also an increasing number of questions and pull requests. Besides conception and development, I concentrate on code reviews, discussions and managing the project.

Where do you see Java’s future with projects like Valhalla?

Java stands for stability and safety. New language features are added in moderation, like salt to a soup. This is what we can expect from future versions of Java.

In his recent mission statement, Brian Goetz gives us a great overview of the goals of Project Valhalla. From the developer’s point of view, I really love to see that the Java language architects attach great importance to improving the expressiveness of Java. Value types, for example, will reduce a lot of the redundant code and ceremony we are currently confronted with. It is also nice to see that value types will be immutable.

Another feature I’m really looking forward to is the extension of generics. It will allow us to remove several specializations that exist only for primitive types and void. Popular functional interfaces like Predicate, Consumer, Supplier and Runnable will become equivalent to Function. In vavr we currently provide additional API for performing side-effects. With extended generics, that API can be reduced to the general case, as it should have been from the beginning.
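
To illustrate the duplication, here is a sketch: the specialized interfaces below exist in java.util.function today, while the generalization in the comment is hypothetical future syntax under Project Valhalla:

import java.util.function.Function;
import java.util.function.IntUnaryOperator;

class Specializations {
    // Today: the generic version boxes its primitive argument and result...
    Function<Integer, Integer> boxed = x -> x + 1;

    // ...so the JDK ships hand-written primitive specializations to avoid the boxing.
    IntUnaryOperator unboxed = x -> x + 1;

    // With generics over primitive types (Project Valhalla, hypothetical syntax),
    // a single Function<int, int> could replace such specializations, and the extra
    // side-effect variants (Consumer, Runnable, ...) could collapse into Function as well.
}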

There are two more features I’m really interested in: local variable type inference, which will come to Java, and reified generics, which might come. Reified generics are needed when we want to get the type of a generic parameter at runtime. We already have type inference for lambdas. Extending it to local variables will increase the conciseness and readability of method and lambda bodies while preserving type safety. I think it is a good idea that we will still have to specify the return types of methods. That is clear documentation of an application’s API.

I’m deeply impressed by how Java and the JVM evolve over time without breaking backward compatibility. It is a safe platform we can rely on. The gap between Java and other, more modern languages is getting smaller, but Java is still behind. Some popular features might never come, and outdated APIs will most probably not get a complete refresh or a replacement. This is an area where libraries such as vavr can help.

jOOQ Tuesdays: Chris Saxon Explains the 3 Things Every Developer Should Know About SQL

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month (today, exceptionally on a Wednesday because of technical issues) where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Chris Saxon who has worked with Oracle forever, and who is one of the brains behind the famous Ask Tom website.

Chris, you’re part of the famous Ask Tom team. Everyone working with Oracle has wound up on Ask Tom’s website at least once. You’ve answered an incredible amount of questions already. What’s it like to work for such a big community as Oracle’s?

It’s an amazing experience! My first real job was as a PL/SQL developer. My only knowledge of SQL was a couple of vaguely remembered lectures at university. Ask Tom was the main place I learned about SQL and Oracle Database. So it’s a huge honor to be on the other side, helping others get the best out of the technology.

The best part has to be the positive comments when you help someone solve a problem that’s been troubling them for days. That’s why we’re here. To help developers learn more about Oracle and improve their SQL skills. When you use the database to its full extent, you can write better, faster applications with less code!

What were your three most interesting questions, so far?

Any question that has a clear definition and a complete test case is interesting! ;) Personally, I enjoy using SQL to solve complex problems best of all. So the first two do just that:

1. Finding the previous row in a different group

The poster had a series of transactions. These were grouped into two types. For each row, they wanted to show the id of the previous transaction from the other group.

At first this sounds like a problem you can solve using LAG or LEAD. But these only search for values within the same group. So you need a different method.

I provided a solution using the model clause. Using this, you can generate columns based on complex, spreadsheet-like formulas. Rows in your table are effectively cells in the sheet. You identify them by defining dimensions which can be other columns or expressions. By setting the transaction type as a dimension, you can then easily reference – and assign – values from one type to the other.

This worked well. But commenters were quick to provide solutions using window functions and 12c’s match_recognize clause, both of which were faster than my approach!

I like this because it shows the flexibility of SQL. And it shows the value of an engaged community. No one knows everything. By sharing our knowledge and working together we can all become better developers.

2. Improving SQL that deliberately generates cartesian product

The poster had a set of abbreviations for words. For example, Saint could also be “St.” or “St”. They wanted to take text containing these words. Then generate all combinations of strings using these abbreviations.

The “obvious” solution the user had was to split the text into words. Then, for each word, join the abbreviation table, replacing the string as needed. So for a five-word string, you have five joins.

There are a couple of problems with this method. The number of joins limits the number of words. So if you have a string with seven words but only six table joins, you won’t abbreviate the final word.

The other issue is performance. Joining the same table N times increases the work you do. If you have long sentences and/or a large number of abbreviations, the query could take a long time to run.

To overcome these you need to ask: “how can I join to the abbreviation table just once?”

The solution to do this starts the same as the original. Split the sentence into a table of words. Then join this to the abbreviations to give a row for each replacement needed.

You can then recursively walk down these rows using CTEs. These build up the sentence again, replacing words with their abbreviations as needed. A scalable solution that only needs a single pass of each table!

The final question relates to performance. Tom Kyte’s mantra was always “if you can do it in SQL, do it in SQL”. The reason is that a pure SQL solution is normally faster than one which combines SQL and other code. Yet a question came in that cast doubt on this:

3. Difference in performance SQL vs PL/SQL

The poster was updating a table. The new values came from another table. He was surprised that PL/SQL using bulk processing came out faster than the pure SQL approach.

The query in question was in the form:

update table1 t1
set col1 = (select t2.col2 from table2 t2 where t1.code = t2.code);

It turned out the reason was a “missing” index. Oracle executes the subquery once for every row in table1. Unless there’s an index on table2 (code), this will full-scan table2 once for every row in table1!

The PL/SQL-only approach avoided this problem by reading the whole of table2 into an array. So there was only one full scan of table2.

The problem here is that there was no index on the join condition (t1.code = t2.code). With this in place, Oracle does an index lookup of table2 for each row in table1. A massive performance improvement!

The moral being: if your SQL is “slow”, particularly compared to a combined SQL + other language method, it’s likely you have a missing index (or two).

This question again showed the strength and value of the Oracle community. Shortly after I posted the explanation, a reviewer was quick to point out the following SQL solution:

merge into table1 t1
using  table2 t2
on     (t1.code = t2.code)
when matched
  then update set t1.col1 = t2.col2;

This came out significantly faster than both the original update and PL/SQL – without needing any extra indexes!

You’re running a YouTube channel called “The Magic of SQL”. Are SQL developers magicians?

Of course they are! In fact, I’d say that all developers are magicians. As Arthur C Clarke said:

“Any sufficiently advanced technology is indistinguishable from magic”

The amount of computing power you carry around in your phone today is mind blowing. Just ask your grandparents!

I think SQL developers have a special kind of magic though :). The ability to answer hard questions with a few lines of SQL is amazing. And for it to adapt to changes in the underlying data to give great performance without you changing it is astounding.

Your Twitter account has a pinned tweet about window functions. I frequently talk to Java developers at conferences, and few of them know about window functions, even if they’ve been in databases like Oracle for a very long time. Why do you think they’re still so “obscure”?

Oracle Database has had window functions since the nineties. But many other RDBMSes have only fully supported them recently. So a combination of writing “database independent” code and people using other databases is certainly a factor.

Use of tools which hide SQL from developers is also a problem. If you’re not actively using SQL, it’s easy to overlook many of its features.

Fortunately I think this is changing. As more and more developers are realizing, SQL is a powerful language. Learning how to use it effectively is a key skill for all developers. Window functions and other SQL features mean you can write better-performing applications with less code. Who doesn’t want that? ;)

What are three things that every developer should know about SQL?

1. Understand set based processing

If you find yourself writing a cursor loop (select … from … loop), and inside that loop you run more SQL, you’re doing it wrong.

Think about it. Do you eat your cornflakes by placing one flake in your bowl, adding the milk, and eating that one piece? Then doing the same for the next. And the next. And so on? Or do you pour yourself a big bowl and eat all the flakes at once?

If you have a cursor loop with more SQL within the loop, you’re effectively doing this. There’s a lot of overhead in executing each SQL statement. This will slow you down if you have a large number of statements that each process one row. Instead, you want a few statements that each process lots of rows where possible.

It’s also possible to do this by accident. As developers we’re taught that code reuse is A Good Thing. So if there’s an API available we’ll often use it. For example, say you’re building a batch process. This finds the unshipped orders, places them on shipments and marks them as sent.

If a ship_order function exists, you could write something like:

for r in (select order_id from unshipped_orders) loop
  ship_order ( r.order_id );
end loop;

The problem here is that ship_order almost certainly contains SQL: SQL you’ll be executing once for every order awaiting postage. If there are only a few, this may be fine. But if there are hundreds or thousands, this process could take a long time to run.

The way to make this faster is to process all the orders in one go. You can do this with SQL like:

insert into shipments
  select … from unshipped_orders;

update unshipped_orders
set  shipment_date = sysdate;

You may counter that there’s other, non-SQL processing you need to do, such as sending emails. So you still need a query to find the order ids.

But you can overcome this! With update’s returning clause, you can get values from all the changed rows:

update unshipped_orders
set  shipment_date = sysdate
returning order_id bulk collect into order_array;

This gives you all the order ids to use as you need.

2. Learn what an execution plan is and how to create and read one

“How can I make my SQL faster” is one of the most common classes of questions posted on Ask Tom. The trouble is there’s scant one-size-fits-all advice when it comes to SQL performance. To help we need to know what your query is, what the tables and indexes are and details about the data. But most importantly we need to know what the query is actually doing!

For example, say you want me to help you figure out a faster route to work. To do this I need to know which route you currently use and how long each part of it takes. We can then compare this against other routes, checking how far they are, expected traffic and predicted speeds. But we need the data to do this!

So when answering performance questions, the first thing we look for is an execution plan. Many confuse this with an explain plan. An explain plan is just a prediction. Often it’s wrong. And even when it’s right, we still don’t know how much work each step does.

An execution plan shows exactly what the database did. It also gives stats about how much work, how often and how long it took to process each step. Combine this with a basic understanding of indexes and join methods and you can often solve your own performance problems.

3. Use bind variables

Sadly data breaches are all too common. There hardly seems to be a week that goes by without news of a major company leaking sensitive data. And the root cause of these attacks is often SQL injection.

This is a simple, well-known attack vector. If you write vulnerable SQL in a web-enabled application, eventually you’ll be attacked.

And this isn’t something you can avoid by using NoSQL databases. SQL-injection-like attacks are possible there too!

Fortunately the solution is easy: use bind variables. Not only do these secure your application, they can improve performance too.
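
For Java developers, the difference is easy to see with plain JDBC. A minimal sketch, assuming a java.sql.Connection and a hypothetical users table with username and email columns:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BindVariables {

    // DON'T: concatenating user input into the SQL text invites SQL injection
    // and forces the database to hard-parse a new statement for every value.
    // String sql = "select email from users where username = '" + input + "'";

    // DO: a bind variable keeps the input out of the SQL text and lets the
    // database reuse the parsed statement and its execution plan.
    static String findEmail(Connection con, String input) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "select email from users where username = ?")) {
            ps.setString(1, input);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}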

Make sure your company is not tomorrow’s data leak headline. Use bind variables!

Last but not least: When will Oracle have a BOOLEAN type? :)

We have a BOOLEAN type! It’s just only in PL/SQL ;P

There’s currently a push in the community for us to add a SQL BOOLEAN type. If this is a feature you’d like to see, you can vote for it on the Database Ideas forum. The more support there is, the more likely we are to implement it! But no promises ;)

jOOQ Tuesdays: Thorben Janssen Shares his Hibernate Performance Secrets

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


I’m very excited to feature today Thorben Janssen who has spent most of his professional life with Hibernate.

Thorben, with your blog and training, you are one of the few daring “annotatioficionados” as we like to call them, who risks diving deep into JPA’s more sophisticated annotations – like @SqlResultSetMapping. What is your experience with JPA’s advanced, declarative programming style?

From my point of view, the declarative style of JPA is great and a huge problem at the same time.

If you know what you’re doing, you just add an annotation, set a few properties and your JPA implementation takes care of the rest. That makes it very easy to use complex features and avoids a lot of boilerplate code.

But it can also become a huge issue when someone is not that familiar with JPA and just copies a few annotations from Stack Overflow and hopes that it works.

It will work in most cases. JPA and Hibernate are highly optimized and handle suboptimal code and annotations quite well, at least as long as it is tested with one user on a local machine. But that changes quickly when the code gets deployed to production and several hundred or thousand users use it in parallel. These issues then often get posted on Stack Overflow or other forums, together with a complaint about the bad performance of Hibernate…

Your training goes far beyond these rather esoteric use-cases and focuses on JPA / Hibernate performance. What are three things every ORM user should know about JPA / SQL performance?

Only three things? I could talk about a lot more things related to JPA and Hibernate performance.

By far the most important one is to remember that your ORM framework is using SQL to store your data in a relational database. That seems pretty obvious, but you can avoid the most common performance issues by analyzing and optimizing the executed SQL statements. One example of that is the popular n+1 select issue, which you can easily find and fix, as I show in my free, 3-part video course.
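
One low-effort way to start analyzing the executed statements is to switch on Hibernate’s SQL logging and statistics. A sketch, assuming a JPA setup where “my-unit” is a placeholder for your persistence unit name:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class ShowSql {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.show_sql", "true");             // log every SQL statement
        props.put("hibernate.format_sql", "true");           // pretty-print the logged SQL
        props.put("hibernate.generate_statistics", "true");  // count queries, fetches, cache hits

        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("my-unit", props);
        // A use case that suddenly logs n+1 similar SELECTs is a candidate for optimization.
    }
}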

Another important thing is that no framework or specification provides a good solution for every problem. JPA and Hibernate make it very easy to insert and update data in a relational database. And they provide a set of advanced features for performance optimization, like caching or the ordering of statements to improve the efficiency of JDBC batches.

But Hibernate and JPA are not a good fit for applications that have to perform a lot of very complex queries for reporting or data mining use cases. The feature set of JPQL is too limited for these use cases. You can, of course, use native queries to execute plain SQL, but you should have a look at other frameworks if you need a lot of these queries.

So, always make sure that your preferred framework is a good fit for your project.

The third thing you should keep in mind is that you should prefer lazy fetching for the relationships between your entities. This prevents Hibernate from executing additional SQL queries to initialize the relationships to other entities when it gets an entity from the database. Most use cases don’t need the related entities, and the additional queries slow down the application. And if one of your use cases uses the relationships, you can use FETCH JOIN statements or entity graphs to initialize them with the initial query.

This approach avoids the overhead of unnecessary SQL queries for most of your use cases and allows you to initialize the relationships if you need them.
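
A sketch of that approach, assuming hypothetical PurchaseOrder and OrderItem entities and the javax.persistence API:

import java.util.List;
import javax.persistence.*;

@Entity
class PurchaseOrder {
    @Id Long id;

    // Lazy: loading a PurchaseOrder does not automatically load its items
    @OneToMany(mappedBy = "purchaseOrder", fetch = FetchType.LAZY)
    List<OrderItem> items;
}

@Entity
class OrderItem {
    @Id Long id;
    @ManyToOne PurchaseOrder purchaseOrder;
}

class OrderRepository {
    // For the use cases that do need the items, fetch them together with the initial query
    List<PurchaseOrder> findWithItems(EntityManager em) {
        return em.createQuery(
                "select distinct o from PurchaseOrder o join fetch o.items", PurchaseOrder.class)
            .getResultList();
    }
}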

These are the 3 most important things you should keep in mind if you want to avoid performance problems with Hibernate. If you want to dive deeper into this topic, have a look at my Hibernate Performance Tuning Online Training. The next one starts on 23rd July.

What made you focus your training mostly on Hibernate, rather than also on EclipseLink / OpenJPA, or just plain SQL / jOOQ? Do you have plans to extend to those topics?

To be honest, that decision was quite easy for me. I’ve been working with Hibernate for about 15 years now and have used it in a lot of different projects with very different requirements. That gives me the experience and knowledge of the framework which you need if you want to optimize its performance. I have also tried EclipseLink, but not to the same extent as Hibernate.

And I also asked my readers which JPA implementation they use, and most of them told me that they either use plain JPA or Hibernate. That made it pretty easy to focus on Hibernate.

I might integrate jOOQ into one of my future trainings. Because as I said before, Hibernate and JPA are a good solution if you want to create or update data or if your queries are not too complex. As soon as your queries get complex, you have to use native queries with plain SQL. In these cases, jOOQ can provide some nice benefits.

What’s the advantage of your online training over a more classic training format, where people meet physically – both for you and for your participants?

The good thing about classroom training is that you can discuss your questions with other students and the instructor. But it also requires you to be in a certain place at a certain time, which creates additional costs, requires you to step away from your current projects and keeps you away from home.

With the Hibernate Performance Tuning Online Training, I want to provide a similar experience to a classroom training in which you study with other students and ask your questions but without having to travel somewhere. You can watch my training videos and do the exercises from your office or home and meet with me, and other students in the forum or group coaching calls to discuss your questions.

So you get the best of both worlds without declaring any travel expenses ;-)

Your blog also includes a weekly digest of all things happening in the Java ecosystem called Java Weekly. What are the biggest insights into our ecosystem that you’ve gotten out of this work, yourself?

The Java ecosystem is always changing and improving, and you need to learn constantly if you want to stay up to date. One way to do that is to read good blog posts. And there are A LOT of great, small blogs out there written by very experienced Java developers who like to share their knowledge. You just have to find them. That’s probably the biggest insight I got.

I read a lot about Java and Java EE each week (that’s probably the only advantage of a 1.5-hour commute with public transportation) and present the most interesting ones every Monday in a new issue of Java Weekly.

jOOQ Tuesdays: Ming-Yee Iu Gives Insight into Language Integrated Querying

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


We have the pleasure of talking to Ming-Yee Iu in this eighth edition, who will be telling us about how different people in our industry have tackled the integration of query systems into general-purpose languages, including his own library JINQ, which does so for Java.

Ming, everyone coming from C# to Java will google LINQ for Java. You have implemented just that with JINQ. What made you do it?

Jinq actually grew out of my PhD research at EPFL university in Switzerland. When I started a PhD there in 2005, I needed a thesis topic, and I heard that my supervisor Willy Zwaenepoel was interested in making it easier to write database code. I had a bit of a background with Java internals from when I was an intern with one of IBM’s JVM teams in 1997, so when I took a look at the problem, I looked at it from a lower-level systems perspective. As a result, I came up with the idea of using a bytecode rewriting scheme to rewrite certain types of Java code into database queries. There were other research groups looking at the problem at the same time, including the LINQ group. Different groups came up with different approaches based on their own backgrounds. The basic assumption was that programmers had difficulty writing database code because there was a semantic gap–the relational database model was so different from the object-oriented programming model that programmers wasted mental effort bridging the differences. The hope was that this semantic gap could be reduced by letting programmers write normal Java code and having the computer figure out how to run this code on a database. Different approaches would result in tools that could handle more complex database queries or could be more flexible in the style of code they accept.

Although I came up with an initial approach fairly quickly, it took me many years to refine the algorithms into something more robust and usable. Similar to the LINQ researchers, I found that my algorithms worked best with functional code. Because functional-style code has no side effects, it’s easier to analyze. It’s also easier to explain to programmers how to write complex code that the algorithms could still understand. Unfortunately, when I finished my PhD in 2010, Java still didn’t properly support functional programming, so I shelved the research to work on other things. But when Java 8 finally came out in 2014 with lambdas, I decided to revisit my old research. I adapted my research to make use of Java 8 lambdas and to integrate with current enterprise tools. And the result was Jinq, an open source tool that provided support for LINQ-style queries in Java.
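
To make that concrete, here is a sketch of the kind of query Jinq supports, assuming its JPA integration (JinqJPAStreamProvider) and a made-up Customer entity:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import org.jinq.jpa.JinqJPAStreamProvider;

@Entity
class Customer {
    @Id Long id;
    String country;
    public String getCountry() { return country; }
}

public class JinqSketch {
    static List<Customer> ukCustomers(EntityManagerFactory emf, EntityManager em) {
        JinqJPAStreamProvider streams = new JinqJPAStreamProvider(emf);
        return streams.streamAll(em, Customer.class)            // looks like ordinary stream code...
                      .where(c -> c.getCountry().equals("UK"))  // ...but the lambda is analyzed
                      .toList();                                // ...and executed as a database query
    }
}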

In a recent discussion on Reddit, you mentioned that the Java language stewards will never integrate query systems into the language, and that LINQ has been a mistake. Yet LINQ is immensely popular in C#. Why was LINQ a mistake?

My opinion is a little more nuanced than that. LINQ makes a lot of sense for the C# ecosystem, but I think it is totally inappropriate for Java. Different languages have different trade-offs, different philosophies, and different historical baggage. Integrating a query system into Java would run counter to the Java philosophy and would be considered a mistake. C# was designed with different trade-offs in mind, so adding a feature like query integration to C# is more acceptable.

C# was designed to evolve quickly. C# regularly forces programmers to leave behind old code so that it can embrace new ways of doing things. There’s an old article on Joel on Software describing how Microsoft has two camps: the Raymond Chen camp that always tries to maintain backwards compatibility and the MSDN Magazine camp that is always evangelizing shiny new technology that may be abandoned after a few years. The Raymond Chen camp allows me to run 20-year-old Windows programs on Windows 10. The MSDN Magazine camp produces cool new technology like C# and TypeScript and LINQ. There is nothing wrong with the MSDN philosophy. Many programmers prefer using languages built using this philosophy because the APIs and languages end up with less legacy cruft in them. You don’t have to understand the 30-year history of an API to figure out the proper way to use it. Apple uses this philosophy, and many programmers love it despite the fact that they have to rewrite all their code every few years to adapt to the latest APIs. For C#, adopting a technology that is immature and still evolving is fine because they can abandon it later if it doesn’t work out.

The Java philosophy is to never break backwards compatibility. Old Java code from the 1990s still compiles and runs perfectly fine on modern Java. As such, there’s a huge maintenance burden to adding new features to Java. Any feature has to be maintained for decades. Once a feature is added to Java, it can’t be changed or it might break backwards compatibility. As a result, only features that have withstood the test of time are candidates for being added to Java. When features are added to Java that haven’t yet fully matured, it “locks in” a specific implementation and prevents the feature from evolving as people’s needs change. This can cause major headaches for the language in the future.

One example of this lock-in is Java serialization. Being able to easily write objects to disk is very convenient. But the feature locked in an architecture that isn’t flexible enough for future use-cases. People want to serialize objects to JSON or XML, but can’t do that using the existing serialization framework. Serialization has led to many security errors, and a huge amount of developer resources were required to get lambdas and serialization to work correctly together. Another example of this premature lock-in is synchronization support for all objects. At the time, it seemed very forward-looking to have multi-threading primitives built right into the language. Since every object could be used as a multi-threaded monitor, you could easily synchronize access to every object. But we now know that good multi-threaded programs avoid that sort of fine-grained synchronization. It’s better to work with higher-level synchronization primitives. All that low-level synchronization slows down the performance of both single-threaded and multi-threaded code. Even if you don’t use the feature, all Java objects have to be burdened by the overhead of having lock support. Serialization and synchronization were both added to Java with the best of intentions. But those features are now treated like “goto”: they don’t pass the smell test. If you see any code that uses those features, it usually means that the code needs extra scrutiny.

Adding LINQ-style queries to Java would likely cause similar problems. Don’t get me wrong. LINQ is a great system. It is currently the most elegant system we have for integrating a query language into an object-oriented language. Many people love using C# specifically because of LINQ. But the underlying technology is still too immature to be added to Java. Researchers are still coming up with newer and better ways of embedding query systems into languages, so there is a very real danger of locking Java into an approach that would later be considered obsolete. Already, researchers have found many improvements to LINQ that Microsoft can’t adopt without abandoning its old code.

For example, to translate LINQ expressions to database queries, Microsoft added some functionality to C# that lets LINQ inspect the abstract syntax trees of lambda expressions at runtime. This functionality is convenient, but it limits LINQ to only working with expressions. LINQ doesn’t work with statements because it can’t inspect the abstract syntax trees of lambdas containing statements. This restriction on what types of lambdas can be inspected is inelegant. Although this functionality for inspecting lambdas is really powerful, it is so restricted that very few other frameworks use it. In a general-purpose programming language, all the language primitives should be expressive enough that they can be used as building blocks for many different structures and frameworks. But this lambda inspection functionality has ended up only being useful for query frameworks like LINQ. In fact, Jinq has shown that this functionality isn’t even necessary. It’s possible to build a LINQ-style query system using only the compiled bytecode, and the resulting query system ends up being more flexible in that it can handle statements and other imperative code structures.

As programmers have gotten more experience with LINQ, they have also started to wonder if there might be alternate approaches that would work better than LINQ. LINQ is supposed to make it easier for programmers to write database queries because they can write functional-style code instead of having to learn SQL. In reality though, to use LINQ well, a programmer still needs to understand SQL too. But if a programmer already understands SQL, what advantages does LINQ give them? Would it be better to use a query system like jOOQ, which matches SQL syntax more closely than Slick and can quickly evolve to encompass new SQL features? Perhaps query systems aren’t even necessary. More and more companies are adopting NoSQL databases that don’t even support queries at all.

Given how quickly our understanding of LINQ-style query systems is evolving, it would definitely be a mistake to add that functionality directly to a language like Java at the moment. Any approach might end up being obsolete, and it would impose a large maintenance burden on future versions of Java. Fortunately, Java programmers can use libraries such as Jinq and jOOQ instead, which provide most of the benefits of LINQ but don’t require tight language integration like LINQ.

Lightbend maintains Slick – LINQ for Scala. How does JINQ compare to Slick?

They both try to provide a LINQ-style interface for querying databases. Since Slick is designed for Scala, it has great Scala integration and is able to use Scala’s more expressive programming model to provide a very elegant implementation. To get the full benefits of Slick, you have to embrace the Scala ecosystem though.

Jinq is primarily designed for use with Java. It integrates with existing Java technologies like JPA and Hibernate. You don’t have to abandon your existing Java enterprise code when adopting Jinq because Jinq works with your existing JPA entity classes. Jinq is designed for incremental adoption. You can selectively use it some places and fall back to using regular JPA code elsewhere. Although Jinq can be used with Scala, it’s more useful for organizations that are using Scala but haven’t embraced the full Scala ecosystem. For example, Jinq allows you to use your existing Hibernate entities in your Scala code while still using a modern LINQ-style functional query system for them.

JINQ saw its biggest improvement when Java 8 introduced the Stream API. What is your opinion about functional programming in Java?

I’m really happy that Java finally has support for lambdas. It’s a huge improvement that really makes my life as a programmer much easier. Over time, I’m hoping that the Java language stewards will be able to refine lambdas further though.

From Jinq’s perspective, one of the major weaknesses of Java 8’s lambdas is the total lack of any reflection support. Jinq needs reflection support to decode lambdas and to translate them to queries. Since there is no reflection support, Jinq needs to use slow and brittle alternate techniques to get the same information. Personally, I think the lack of reflection is a significant oversight, and this lack of reflection support could potentially weaken the entire Java ecosystem as a whole in the long term.

I have a few small annoyances with the lack of annotation support and lack of good JavaDoc guidelines for how to treat lambdas. The Streams API and lambda metafactories also seem a little bit overly complex to me, and I wonder if something simpler would have been better there.

From a day-to-day programming perspective though, I’ve found that the lack of syntactic sugar for calling lambdas is the main issue that has repeatedly frustrated me. It seems like a fairly minor thing, but the more I use lambdas, the more I feel that it is really important. In Java 8, it’s so easy to create and pass around lambdas, that I’m usually able to completely ignore the fact that lambdas are represented as classes with a single method. I’m able to think of my code in terms of lambdas. My mental model when I write Java 8 code is that I’m creating lambdas and passing them around. But when I actually have to invoke a lambda, the lambda magic completely breaks down. I have to stop and switch gears and think of lambdas in terms of classes. Personally, I can never remember the name of the method I need to call in order to invoke a lambda. Is it run(), accept(), consume(), or apply()? I often end up having to look up the documentation for the method name, which breaks my concentration. If Java 8 had syntactic sugar for calling lambdas, then I would never need to break out of the lambda abstraction. I would be able to create, pass around, and call lambdas without having to think about them as classes.
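
The point is easy to see with the standard functional interfaces: creating and passing the lambdas looks uniform, but each interface names its single method differently. A small sketch:

import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class InvokingLambdas {
    public static void main(String[] args) {
        Runnable task = () -> System.out.println("side effect");
        Consumer<String> print = s -> System.out.println(s);
        Function<Integer, Integer> inc = x -> x + 1;
        Supplier<String> constant = () -> "value";

        // Creating the lambdas above looks the same; invoking them does not:
        task.run();                  // Runnable
        print.accept("hello");       // Consumer
        Integer two = inc.apply(1);  // Function
        String v = constant.get();   // Supplier
        System.out.println(two + " " + v);
    }
}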

Java 9 will introduce the Flow API for reactive interoperability. Do you plan to implement a reactive JINQ?

To be honest, I’m not too familiar with reactive APIs. Lately, I’ve been working mostly on desktop applications, so I haven’t had to deal with problems at a sufficient scale where a reactive approach would make sense.

You’ve mentioned to me in the past that you have other projects running. What are you currently working on?

After a while, it’s easy to accumulate projects. Jinq is mostly stable at the moment though I do occasionally add bug fixes and other changes. There are still a few major features that could be added such as support for bulk updates or improved code generation, but those are fairly major undertakings that would require some funding to do.

I occasionally work on a programming language called Babylscript, which is a multilingual programming language that lets you write code in a mix of French, Chinese, Arabic, and other non-English languages. As a companion project to that, I also run a website for teaching programming to kids called Programming Basics that teaches programming in 17 different languages. Currently, though, I’m spending most of my time on two projects. One is an art tool called Omber, which is a vector drawing program that specializes in advanced gradients. The other project involves using HTML5 as the UI front-end for desktop Java programs. All your UI code would still be written in Java, but instead of using AWT or Swing, you would just manipulate HTML using a standard DOM interface bound to Java. As a side benefit, all your UI code can be recompiled using GWT to JavaScript, so you can reuse your UI code for web pages too.

Further info

Thank you very much for this very interesting interview, Ming. Want to learn more about JINQ? Read about it in this previous guest post on the jOOQ blog, and watch Ming’s JVMLS 2015 talk: