Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.
I’m very excited to feature today Gerald Sangudi and Keshav Murthy, creators of the the N1QL query language, CouchBase’s SQL-based JSON querying language.
Hi Gerald and Keshav – It’s great to have two people on the jOOQ Tuesdays interview at once! How did you guys meet?
Keshav: Gerald interviewed me for my the job at Couchbase.
Gerald: Best decision we ever made.
You’re both working at CouchBase, one of the leading “JSON databases” in the market. What drove you towards JSON?
Keshav: I worked for IBM, specifically on the Informix database. In 2013, Websphere teams needed JSON support. I was skeptical of JSON for data management except for managing metadata. Years before, I had successfully opposed implementing XML and XQuery inside Informix. Then I went to a NoSQL conference where I saw enterprises using JSON in real-world applications. I saw enterprises like eBay, Cisco developing their enterprise applications on NoSQL and JSON.
RDBMS customers had been asking for ALTER TABLE ONLINE when they needed to add new columns. They typically would have additional unused columns which they’d rename them when needed. Obviously, this wasn’t an optimal or a good solution because you could run out these columns, types or structure wasn’t flexible.
For years, customers had been ALTER TABLE while application were online. I realized, JSON was a good way to provide it. That’s when we added JSON as a type, added sharding, and started to extend SQL to support JSON. A good relational database can deliver lot of value JSON. So, we developed a mechanism to exploit JSON to provide schema flexibility on relational tables and we received a patent for it: http://bit.ly/2og1uXe.
At Couchbase, data is stored as JSON and N1QL was designed as SQL for JSON. So, we keep calm and JSON.
Gerald: I am not at Couchbase any more, but I was there for four years. Couchbase has its roots in memcached and CouchDB, so Couchbase was already storing and managing JSON data before N1QL and before we arrived. N1QL was developed to enable our users to query and manipulate their JSON data.
You’re both working on, or have worked on N1QL, CouchBase’s innovative SQL-esque query language for JSON. Has the SQL language come full circle? Why is the SQL syntax such a great fit for you?
Keshav: Don Chamberlin, co-inventor of SQL said this. During XQuery discussion, he argued against using SQL for manipulating XML. Now, with N1QL and SQL++ on JSON, he sees enormous possibilities for SQL to manipulate JSON effectively. So, you could say, it has come a full circle.
From the Application point of view, requirement for SQL to support complex data models has been there. SQL-99 added the structured type into the language, but didn’t recognize need for schema flexibility.
What Gerald has very nicely is to inherit SQL and extend not just the language but the
underlying boolean logic.
SQL has three valued logic: TRUE, FALSE, NULL. N1QL has 4-valued logic: TRUE, FALSE, NULL and MISSING. SQL has the select-join-project operations. N1QL has those and adds NEST and UNNEST operations for handling arrays. Once we have the logic and operations, the type system, rest of the expressions for handling nested objects and arrays can be added.
There is another important change in N1QL compared to SQL. SQL is the query language to manipulate data. N1QL can discover the document metadata (names, structure and types) dynamically and operate on it.
Gerald: SQL is the greatest and most successful query language of all time. Our job was to enable our users to query data that is sometimes different from what standard SQL expects. We hope N1QL does that.
As an aside, the “N1” in N1QL stands for non-first normal form. SQL geeks like you Lukas may know that “non-first” is one primary difference between JSON data and relational data. The other primary differences are schema and uniformity.
Will your relational competition steal features from N1QL? Or will you steal more features from them?
Keshav: I do hope relational databases steal features from N1QL. Having common approaches solve problems makes it easier for customers to choose the right database for the right problem. I do hope they choose Couchbase more often than RDBMS!
Gerald always says, we don’t differ from SQL unless there is a good reason. In that sense, we’ve taken lot of the features from relational databases already. In addition, we learn from successful models in relational databases for things like index design, query optimization, security and monitoring. We stand on shoulders of giants.
Gerald: We hope that both relational and non-relational vendors steal features from each other. There is a collaborative effort on something called SQL++, which is a superset of both SQL and N1QL. We hope SQL++ is the convergence point. One lesson from the success of SQL is that standards are great for both users and vendors.
CouchBase doesn’t have what you call a “static schema”. The biggest advantage of a static schema for the database optimiser is the many ways such a schema can be used to predict performance and choose optimal execution plans. How does optimisation work in a “schema-less” database?
Keshav: Actually, static schema gives you the structure, but not the data distribution, which is a major factor for calculating the execution cost. N1QL uses the information within the query and available indexes to glean the structure and decide on the plan. For example, if you have a query with a predicate: WHERE state = “CA” and zipcode = “94040”, and there is an index on either state, zipcode or both, we’d assume these key-values exist in the document and push down the predicates to index scan. We given further details on an article on DZone: https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-optimization
Separately, we do have a mechanism to INFER the schema by sampling. Right now, we show the inferred schema so users can understand the structure. We also use the inferred schema to make query editing easier with hints and validations within the workbench. We do have plans to use this, collect additional statistics to improve decisions in the optimizer.
Gerald: The N1QL optimizer is one of the joys of working on N1QL. As Keshav said, most of the SQL optimization techniques carry over to N1QL. The data may not have a static schema, but the query has an implicit schema, and the indexes have implicit schemas. That is, the query has predicates and other characteristics, and each index has keys and other characteristics. That is enough to keep an optimizer busy.
Working for a newer database vendor, you don’t have as much legacy, so you can innovate faster and more freely. What will be the next big thing in the database market?
Keshav: Our metric for innovation is the progress customer trying to make in their business. So, we innovate within the constraint of a customer job. That helps us to innovate and measure its success from customer point of view.
We see customers deploying NoSQL databases for newer patterns like systems of engagement.
When you plan your vacation, you search a lot before you buy. Search for places, hotels, airlines, things-to-do. Then you compare costs, ratings, before you make the final purchase. For a travel company, these require significant database infrastructure to support high number of queries with low latencies and at a very low cost.
We see customers deploying NoSQL databases to support lot of the customer engagement and information requirement use cases. RDBMS is still used as system of record for buy-sell-cancel-checkin-etc operations but will integrate with system of engagement databases to enhance customer experience.
Doing this effectively requires innovations in every area: data platform designs, scale out, query processing, index designs and manageability.
Gerald: I read Jeff Bezos’ latest annual letter to shareholders. He says that the things that don’t change are more important than the things that do change. Databases should store your data reliably and give you answers and updates quickly. If database vendors continue to do that, we’ll be ok.