One of MongoDB’s arguments when evangelising MongoDB is the fact that MongoDB is a “schemaless” database:
MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.
And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:
- The one in your head when you designed the data structures
- The one that your database really implemented to store your data structures
- The one you should have implemented to fulfill your requirements
Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review again MongoDB’s point of view here:
With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
Everything above is true as well.
“Schema-less” vs. “Schema-ful”
But let’s translate this to SQL (or use any other “schema-ful” database instead):
ALTER TABLE student ADD gpa VARCHAR(10);
And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:
ALTER TABLE student DROP gpa;
Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.
- We can do exactly the same in “schema-less” databases as we can in “schema-ful” ones
- We guarantee that a migration takes place (and it’s instant, too)
- We guarantee data integrity when we roll back the change
What about more real-world DDL?
Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop, book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes, and not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication, it’ll also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!
CREATE TABLE student_to_teacher AS SELECT id AS student_id, teacher_id FROM student; ALTER TABLE student DROP teacher_id;
… and we’re done! (of course, we’d be adding constraints and indexes)
Let’s face the truth
Schemalessness is about a misleading term as much as NoSQL is:
And again, MongoDB’s blog post is telling the truth (and an interesting one, too):
Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.
When you say “schemaless”, you actually say “dynamically typed schema” – as opposed to statically typed schemas as they are available from SQL databases. JSON is still a completely schema free data structure standard, as opposed to XML which allows you to specify XSD if you need, or operate on document-oriented, “schema-less” (i.e. dynamically typed) schemas.
(And don’t say there’s json-schema. That’s a ridiculous attempt to mimick XSD)
So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.
But let me tell you the ugly truth:
Anything that can possibly go wrong, does
We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.