
> The previous person seems to claim it meant the disk was dead (i.e. no read works)

He made no such claim.

> while you seem to claim that it means an error caught by low level formatting.

A: There is no such thing as low level formatting in a modern drive, and B: No, I don't. I said he should do a full disk read, not a format.

The SMART built-in self-test does a full read of the drive, not a write.

> Those scrubs do nothing to catch errors that the drives do not report such as misdirected writes.

That's only true with RAID 5. Every other RAID level can compare disks and check that the data matches exactly. The Linux md software RAID does that automatically if you ask it to run a check, and it will then report how many mismatches it found.
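That kind of scrub can be sketched as a simple mirror comparison. This is a hypothetical Python illustration only; the real md check runs at the block layer (triggered via `echo check > /sys/block/mdX/md/sync_action`, with the result in `mismatch_cnt`):

```python
# Sketch of a RAID 1-style "check" pass: read every chunk from both
# mirror members and count the positions where the copies disagree.
def scrub_mirrors(disk_a: bytes, disk_b: bytes, chunk: int = 4) -> int:
    """Return the number of chunks whose copies do not match."""
    mismatches = 0
    for off in range(0, len(disk_a), chunk):
        if disk_a[off:off + chunk] != disk_b[off:off + chunk]:
            mismatches += 1
    return mismatches

good = bytes(range(16)) * 2
bad = bytearray(good)
bad[5] ^= 0xFF  # simulate a misdirected/torn write hitting one mirror only
print(scrub_mirrors(good, bytes(bad)))  # -> 1: one chunk differs
```

Note that this only counts mismatches, exactly as md does; deciding which copy is correct is a separate problem.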

If you look, he wrote: "I had to drop back to raid 5". He had a better RAID level before, with multiple-disk redundancy, which allows the RAID to check for mismatches and even correct them.

But because he never scheduled full disk reads the RAID never detected that many of the drives had problems.

> Consequently, there is no correct way to set up RAID in a way that makes data safe.

That is not correct. The only advice I would give is to avoid RAID 5. The other levels let you check for correctness.

> A check summing filesystem such as ZFS would handle this without a problem though.

Only if A: you actually run disk checks, and B: ZFS itself handles the RAID! ZFS on top of RAID will NOT detect such errors 50% of the time (randomly, depending on which disk is read from).



Low level formatting still exists on modern disks; you just are not able to redo it yourself. There is a diagram showing it here:

http://www.anandtech.com/show/2888

Doing a full read causes every sector's ECC in the low level formatting to be checked. If something is wrong, you get a read error that can be corrected by RAID, ZFS, or whatever else you are running on top of it, provided it offers redundancy. Without the ECC, the self-test mechanism would be pointless, as it would have no way to tell whether the magnetic signals being interpreted are right or wrong.
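The role of per-sector ECC during a full read can be sketched with a simple checksum stand-in. This is a hypothetical illustration; real drives use far stronger on-platter codes than a CRC, and surface errors to the host as unrecoverable read errors:

```python
import zlib

SECTOR = 8  # toy sector size

def format_disk(data: bytes) -> list:
    """'Low-level format': store a CRC with every sector, like on-disk ECC."""
    return [(data[i:i + SECTOR], zlib.crc32(data[i:i + SECTOR]))
            for i in range(0, len(data), SECTOR)]

def full_read(disk) -> list:
    """Self-test-style full read: report sectors whose ECC check fails."""
    return [n for n, (payload, crc) in enumerate(disk)
            if zlib.crc32(payload) != crc]

disk = format_disk(b"abcdefghijklmnop")
disk[1] = (b"ijklmnoX", disk[1][1])  # bit rot in sector 1; stored ECC unchanged
print(full_read(disk))  # -> [1]: the full read flags the bad sector
```

This is why a scheduled full read matters: the bad sector is only flagged when something actually reads it.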

As for other RAID levels catching things: with RAID 1 and only two mirrors, there is no way to tell which copy is right either. The same goes for RAID 10 with two-way mirrors and RAID 0+1 with two-way mirrors. You might be able to tell with RAID 6, but such things are assumed by users rather than guaranteed.

RAID was designed around the idea that uncorrectable bit errors and drive failures are the only failure states. It is incapable of handling silent corruption in general, and in the few cases where it might be able to handle it, whether it does is implementation dependent. RAID 6 also degrades to RAID 5 when a disk fails, and there is no way for a patrol scrub to catch a problem that occurs after one scrub and before the next. RAID will happily return incorrect data, especially since only one mirror member is read at a given time (for performance) and only the data blocks in RAID 5/6 are read (again for performance) unless there is a disk failure.
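The two-mirror ambiguity can be made concrete with a toy majority-vote sketch (hypothetical Python; real arrays do not vote like this, which is exactly the point):

```python
from collections import Counter

def pick_correct(copies):
    """Majority vote across mirror copies; None if there is no majority."""
    value, n = Counter(copies).most_common(1)[0]
    return value if n > len(copies) // 2 else None

good, corrupt = b"DATA", b"DAT4"
print(pick_correct([good, corrupt]))        # None: 2 copies, no majority
print(pick_correct([good, good, corrupt]))  # b'DATA': 3 copies can out-vote
```

With only two disagreeing copies there is no tiebreaker, so a scrub can count the mismatch but cannot repair it; you need either a third copy or an independent checksum.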

There is no reason to use RAID with ZFS. However, ZFS will always detect silent corruption in what it reads, even if it is on top of RAID; it just is not guaranteed to be able to correct it. Maybe you got the idea of "ZFS on top of RAID will NOT detect such errors 50% of the time" from thinking of a two-disk mirror. If you are using ZFS on RAID instead of letting ZFS have the disks and that happens, you really only have yourself to blame.
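The detection-versus-correction distinction can be sketched with ZFS-style block checksums. This is a hypothetical Python illustration: ZFS actually stores a checksum (fletcher4 or SHA-256) in the parent block pointer, which is why detection works no matter which underlying copy the lower layer happened to return:

```python
import hashlib

def write_block(data: bytes):
    """Write path: keep a checksum alongside the pointer to the block."""
    return data, hashlib.sha256(data).hexdigest()

def read_block(data: bytes, stored_sum: str) -> bytes:
    """Read path: verify against the independently stored checksum."""
    if hashlib.sha256(data).hexdigest() != stored_sum:
        raise IOError("checksum mismatch: silent corruption detected")
    return data

block, csum = write_block(b"payload")
print(read_block(block, csum) == b"payload")  # True: clean read verifies
try:
    read_block(b"pay1oad", csum)  # a corrupted copy served by the lower layer
except IOError:
    print("detected")  # detection always works; correction needs a good copy
```

Correction, unlike detection, requires ZFS to have a redundant copy it manages itself, which is the argument for giving ZFS the raw disks.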



