I'm pretty sure you can yank the power cord from couchdb as soon as you get a (po...

Xylakant · on Dec 6, 2013

Being durable (data is reliably saved) and being hard to corrupt are different properties of a database. For example, Elasticsearch uses the Lucene index format in the background. It's write-once per segment: once a segment is written, the data is safe and (apart from disk corruption) impossible to corrupt, since the file is never opened for writing again. However, segments are not written immediately after a document is received, so if you yank the power cord right after a write to the cluster, you'll lose data, though without any danger of corruption, since the last, partially written segment is discarded. CouchDB behaves in a similar fashion: if the last bit of the storage file contains corrupt data, it is discarded. I'm not absolutely certain atm about the default durability settings in Couch, so I can't say whether the write happens before or after the "ack" from the server. However, since disk controllers cheat and sometimes a "flush" to disk doesn't actually flush, you can get data loss regardless of the promises your database makes.
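To make the "write before or after the ack" question concrete, here is a minimal Python sketch of a write that only returns (and so could only be acknowledged) after the data has been flushed. This is not how CouchDB or Elasticsearch are implemented; `durable_append` is a hypothetical helper, and as noted above, `os.fsync` can only ask the OS and the controller to flush, it cannot force hardware that lies about it.

```python
import os


def durable_append(path: str, data: bytes) -> int:
    """Append data to path and return only after fsync has completed.

    Hypothetical helper for illustration. A server that acks after this
    returns offers durability as far as the OS and disk are honest.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the OS to push its buffers for this file to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

A server that acks before the `os.fsync` call is faster, but it can lose the acknowledged write on power loss.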

bjerun · on Dec 6, 2013

Durability is indeed hard to achieve. As you pointed out, disk controllers sometimes simply lie to you.

With respect to corruption, ArangoDB behaves similarly: it uses an append-only log file with CRC checksums. So, if the last bit of storage contains corrupt data, it is discarded.
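As a rough illustration of that idea, here is a small Python sketch of an append-only log with per-record CRC32 checksums, where recovery stops at the first torn or corrupt record and discards the tail. The record layout (4-byte length, 4-byte CRC, payload) and the names `append_record` and `recover` are assumptions made up for this example, not ArangoDB's actual on-disk format.

```python
import struct
import zlib

# Assumed record header: payload length and CRC32 of the payload,
# both little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload))
    log += payload


def recover(log: bytes):
    """Return (valid_records, valid_length).

    Scans from the start and stops at the first record that is
    truncated or fails its checksum; everything after valid_length
    would be discarded (e.g. by truncating the file).
    """
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(log):
            break  # torn write: payload is incomplete
        payload = bytes(log[start:end])
        if zlib.crc32(payload) != crc:
            break  # corrupt payload
        records.append(payload)
        offset = end
    return records, offset
```

For example, if power is lost halfway through writing the third record, `recover` returns the first two records and the offset at which the log should be truncated. The third write is lost, but the earlier data stays intact.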