One for the weekend: Big Data
One for the weekend: Big Data
One for the weekend: Big Data
So you’re playing around with all those sexy new technologies, enjoying yourself, getting inspiration from state-of-the-art closure / lambda / monads and other concepts-du-jour…
Now that I have your attention provoking a little anger / smirk / indifference, let’s think about the following. I’ve recently revisited a great article by Ken Downs written in 2010. There’s an excellent observation in the middle.
[…] in fact there are two things that are truly real for a developer:
- The users, who create the paycheck, and
- The data, which those users seemed to think was supposed to be correct 100% of the time.
He says this in the context of a sophisticated rant against ORMs. It can also be seen as a rant against any recent abstraction or database technology with a mixture of nostalgia and cynicism. But the essence of his article lies in another observation that I’ve made with so many software systems:
Having said so, we can quickly understand why Java has become the new COBOL to some, who are bored with its non-sexiness (specifically in the JEE corner). Yet, according to TIOBE, Java and C are still the most widely used programming languages in the market and many systems are thus implemented in Java.
On the other hand, many inspiring new technologies have emerged in the last couple of years, also in the JVM field. A nice overview is given in ZeroTurnaround’s The Adventurous Developer’s Guide to JVM Languages. And all of that is fine to a certain extent. As Bruno Borges from Oracle recently and adequately put it:
anything not mainstream has more odds to be “sexy” [than JSF]
Now, let’s map this observation back to a subsequent section of Ken’s article:
[…] the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user […] and takes instructions back from the user and puts them in the database […]. No matter how beautiful the code was, the user would only ever see the screen […] and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.
Think about an E-Banking system. None of your users really cares about the code you wrote to get it running. What they care about is their data (i.e. their money) and the fact that it is correct. This is true for many many systems where the business logic and UI layers can be easily replaced with fancy new technology, whereas the data stays around.
In other words, you are free to choose whatever sexy new UI technology you like as long as it helps your users get access to their data.
Now, that’s an entirely different story, compared to sexy new UI technology.
You might be considering some NoSQL-solution-du-jour to “persist” your data, because it’s so easy and because it costs so much less. Granted, the cost factor may seem very tempting at first. But have you considered the fact that:
And all this time, you have “persisted” UI / user / domain model data in your database, from various systems, from various developers, through various programming paradigms. And then, you realise:
We’re not saying that there aren’t some use-cases where NoSQL databases really provide you with a better solution than the “legacy” alternatives. Most specifically, graph databases solve a problem that no RDBMS has really solved well, yet.
But consider one thing. You will have to migrate your data. Time and again. You will have to archive it. And maybe, migrate the archive. And maybe provide reports of the archive. And provide correct reports (which means: ACID!) And be transactional (which means: ACID!) And all the other things that people do with data.
In fact, your system will grow like any other system ever did before and it will have high demands from your database. While some NoSQL databases have started to get some decent traction, in a way that it is safe to say their vendors might still be around in 5-10 years, when the systems will have been replaced by developers who have replaced other developers.
In other words, there is one more bullet to this list:
Beware of the fact that data will probably outlast your sexy new technology. So, choose wisely when thinking about sexy new technology to operate your data.
A very interesting talk about the future of DBMS was recently given at EPFL by MIT Professor and VoltDB Co-founder and CTO Michael Stonebraker, who also gave us Ingres and Postgres. In a bit less than one hour, he explains his views with respect to the three main pillars of database management systems:
As a NewSQL vendor also actively involved with H-Store, he is of course heavily yet refreshingly biased towards traditional RDBMS storage models being obsolete (an interesting fact is that Oracle Labs representative Eric Sedlar also attended the talk. One might think that the talk was a slighly FUD-dy move against a VoltDB competitor). Unlike what has come to be known as the NoSQL movement, NewSQL relies on similar relational theory / set theory as “traditional SQL”, including support for ACID and structured data.
His claims mainly include that:
I specifically recommend the OLTP part of his talk, as it shows how various new techniques could heavily increase performance of traditional RDBMS already today:
The essence of Stonebraker’s talk is that the “elephants” who currently dominate the market are too slow to react to all the NewSQL vendors’ innovations. It is a very exciting time for a database professional (some refer to them as data geeks) to enter the market and publish new findings.
Another interesting thing to note is that SQL (call it NewSQL, OldSQL) will remain a dominant language for querying DBMS, both for column-stores as for row-stores. This is a strong statement for tools like jOOQ, which embrace SQL as a first-class citizen among programming languages.