Site icon Java, SQL and jOOQ.

When to Choose SQL and When to Choose NoSQL

Some people make architecture decisions purely based on the loudest consultant:

For most others, however, decisions are not as simple as this. For instance: When should we start evaluating NoSQL storage systems as an alternative to RDBMS?

RDBMS as the hammer for all nails

This question obviously assumes that you’re starting out with an RDBMS, which is classically the database system that solves pretty much any problem decently enough not to be replaced easily. What does this mean? Simply put:

Having said so, sometimes you do have a niche problem. For instance a graph database problem. In fact, a graph is nothing fundamentally different from what you can represent in the relational model. It is easy to model a graph with a many-to-many relationship table.

The same is true for XML/JSON in the database (don’t forget, JSON is just XML with less syntax and less features, so it’s less awesome than XML). Sometimes you need to actually store document structures (hierarchical data) in their hierarchical form in your database, instead of normalising them. Sure, you could normalise a document, but that’s a lot of work and not necessarily beneficial. This was actually the question on Twitter by Gareth, that triggered the subsequent discussion:

Most modern RDBMS offer means to store (and much more importantly, query) XML/JSON data structures in simple forms. This includes PostgreSQL, Oracle, DB2, SQL Server and others.

So, when do we decide to switch?

As developers, we may be inclined to switch quite quickly. E.g. when working with graphs, we’d love to use Neo4j which implements the awesome Cypher query language. When working with JSON, we’d love to use something like Couchbase which implements the interesting N1QL query language. Both of these query languages are heavily inspired by SQL, which were wise vendor choices in my opinion (as opposed to MongoDB’s weird JSON-based query language), because in the end, the SQL language is still the most powerful and popular 4GL that was ever created.

But as developers, we shouldn’t make that decision lightly. Yes, at first, these specialised databases seem like a better fit. But the additional cost for operations teams to maintain, monitor, patch, tune production systems must not be underestimated. This is even true within the RDBMS ecosystem. A prominent recent example was Uber’s switch (back!) from PostgreSQL to MySQL:

Why Uber Engineering Switched from Postgres to MySQL

Do note that they switched the other way round before, only to regret things. Truth is, there are tons of reasons why your operations teams prefer to always use the same database, even if that’s quite expensive in terms of licensing. In many cases, however, it would be even more expensive to:

Ultimately, there’s a threshold:

In terms of using JSON in the database, it’s simple:

The same is true for graph problems. SQL can totally handle graphs and traverse them with recursive SQL. Here’s a funky statement that recursively calculates the subset sum problem:

Full article here: How to Find the Closest Subset Sum with SQL

If you only have a few tree/graph traversals to calculate (e.g. for a simple menu structure), don’t jump ship with RDBMS yet. If graphs are your business and that’s all you do, then probably RDBMS aren’t the right choice.

Conclusion

Whatever problem you’re solving. Do remember. Yes, all you have is a hammer, every problem starts looking like a nail. But don’t dismiss RDBMS as a silly hammer. It is a very mighty hammer and in 2016, it can do a lot of non-relational niche things decently well.

RDBMS is still the best default choice for all sorts of data problems. Only once you have gone above a certain threshold (or if you can foresee doing that), then you should look for alternatives. Because those alternatives will give you a much rougher time when you go outside of that niche (JSON, graphs, etc.) back to your “ordinary” relational business.

Exit mobile version