But this is the single most appalling design decision they could have made while also claiming their product was a "database". (This has been discussed here before, so just do a search if you're interested.)
This wasn't a bug, it was a deliberate design decision. OK, that would have been all right if they had put a bright red warning on their front page: "We disabled durability by default (even though we call this product a database). If you run this configuration as a single server, your data could be silently corrupted. Proceed at your own risk." But they didn't. I call that being "shady". Yep, you read that correctly: you'd issue a write and no result would come back acknowledging that the data had made it even into the OS buffer. I can only guess their motive was to look good in the benchmarks that everyone likes to run and post on the web. But there were (and still are) real cases of people's data being silently corrupted, and they noticed only much later, after backups, and backups of backups, had already been made.
That's not what is meant by "safe" here. Safe is when you can be sure the data was written (into the journal, at least) by the time the write call in your code returns. Doing it up to 100ms later leaves a wide window during which the application believes the data is safely stored while it is in fact only in RAM.
"Safe" is a driver implementation detail, not really a server default change. The driver basically has to give the DB a command, then ask what just happened for "safe" writes. If the driver doesn't bother listening for the result, the database just does whatever it's going to quietly.
That said, I really wish all the drivers issued the getLastError command after writes by default. It's the first thing we tell customers to set in their apps.
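The sequence described above — send the write as a one-way message, then separately ask the server what happened — can be sketched with a toy in-memory server and driver. This is purely illustrative (the class and method names are made up, and it is not the real MongoDB wire protocol or driver API); it only shows why "safe" lives in the driver, not in a server default:

```python
# Toy sketch of the driver-side "safe write" pattern: the write itself never
# returns a result, so the driver must follow up with a getLastError-style
# query and surface whatever the server recorded. All names are hypothetical.

class ToyServer:
    def __init__(self):
        self.docs = {}
        self.last_error = None

    def handle_insert(self, doc):
        # The server records the outcome but never pushes it to the client.
        if doc["_id"] in self.docs:
            self.last_error = "duplicate key"
        else:
            self.docs[doc["_id"]] = doc
            self.last_error = None

    def get_last_error(self):
        # Separate round trip: "what happened to my last write?"
        return {"err": self.last_error}


class ToyDriver:
    def __init__(self, server, safe=False):
        self.server = server
        self.safe = safe

    def insert(self, doc):
        self.server.handle_insert(doc)        # fire...
        if not self.safe:
            return None                       # ...and forget
        reply = self.server.get_last_error()  # safe mode: ask for the result
        if reply["err"] is not None:
            raise RuntimeError(reply["err"])
        return reply
```

In the fire-and-forget mode, `insert` returns immediately with no information at all; only the safe mode pays for the extra round trip and can raise an error.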
I would bet that there are many apps out there that don't do any reconciliation, and the lost data will simply never be noticed. Sadly, only 1% of the customers will notice something weird, 0.001% of those will call support saying something is off, and then 99% of those calls will be dismissed as customer incompetence. Scary indeed.
I think they just assumed that people would run the database in clusters, not single instances.
If you've done enough research to choose a relatively off-the-beaten-path DBMS such as MongoDB, the assumption is that you've carefully weighed the tradeoffs made by the various alternatives in the space, and learned the best practices for deploying the one you chose to use.
you've carefully weighed the tradeoffs made by the various alternatives in the space, and learned the best practices for deploying the one you chose to use
It is unreasonable to expect everyone who uses your software to become an expert in its subtleties before deployment. Of course that's what you'd hope, and in a perfect world it would hold, but in the real world, no matter your market, a lot of your users are going to be lazy and irresponsible. You need to protect your users better than this. To do otherwise is itself lazy and irresponsible.
So that's not actually "safe". If you issue an insert in the default "fire and forget" mode and that insert causes an error (say a duplicate key violation), no exception will be thrown.
Even with journaling on, your code does not get an exception.
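A toy sketch of that failure mode (illustrative only — this is not real driver code, and the names are made up): in fire-and-forget mode a duplicate-key violation is recorded on the server side but never surfaced, so the application carries on believing the write succeeded.

```python
# Hypothetical illustration of a swallowed duplicate-key error. In the
# default mode the error is noted but never reported; only an opt-in "safe"
# flag turns it into an exception the application can see.

class ToyCollection:
    def __init__(self):
        self._by_id = {}
        self._last_error = None

    def _apply_insert(self, doc):
        if doc["_id"] in self._by_id:
            self._last_error = "duplicate key"   # recorded, not raised
            return
        self._by_id[doc["_id"]] = doc
        self._last_error = None

    def insert(self, doc, safe=False):
        self._apply_insert(doc)
        if safe and self._last_error:
            raise KeyError(self._last_error)


col = ToyCollection()
col.insert({"_id": 1, "v": "first"})
col.insert({"_id": 1, "v": "second"})  # duplicate: silently dropped, no exception
```

After the second call the stored document still holds `"first"`, yet the caller has no idea the write was rejected; with `safe=True` the same insert would raise instead.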
Journaling is a method for doing "fast recovery", flushing to disk on a regular basis. "Write safety" is a method for controlling how and where the data has been written. So these are really two different things.
I think that is fixed now.