
So has anyone here used zfs and btrfs and would like to comment on the differences? I've been on a heavy zfs kick lately, but the performance loss is hard to stomach and the only research I found points to btrfs being faster (though of course they both take a hit).

Basically the reasons I'm drawn to zfs are:

- checksumming & self-healing

- ergonomics & flexibility of managing pools with zfs

- copy on write for cheap local copy/experimentation (i.e. just clone your DB folder and you have a new DB)

- zfs send/recv for very efficient incremental backups
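The last two bullets boil down to a handful of commands. A sketch, assuming a hypothetical pool `tank`, dataset `tank/db`, and backup host `backuphost` (none of these names are from a real setup):

```shell
# Cheap local copy/experimentation via CoW (hypothetical names):
zfs snapshot tank/db@experiment             # instant; shares blocks with the original
zfs clone tank/db@experiment tank/db-test   # writable copy, mounted at /tank/db-test

# Incremental backup: only blocks changed since @monday cross the wire
zfs snapshot tank/db@tuesday
zfs send -i tank/db@monday tank/db@tuesday | ssh backuphost zfs recv backup/db
```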

From what I can find it seems like btrfs does all that, and faster[0]. In addition to being faster, it also is in-kernel[1], and more flexible for the user in various ways, for example allowing resizing[2]. Looking around btrfs may not be blessed as stable but there are a lot of big orgs using it.

All that said, there are articles like this one[3] which are somewhat dated but paint ZFS really positively from a maintenance point of view. Very hard to pick between these two.

I'd really like to use ZFS -- the community seems very welcoming and amazing but I'm a little worried about picking the wrong tool for the job.

[EDIT] - there's also this old comparison from phoronix[4] which is confusing. I'm still leaning towards ZFS but sure would like to hear some strong opinions if anyone has em.

[0]: https://www.diva-portal.org/smash/get/diva2:822493/FULLTEXT0...

[1]: https://btrfs.wiki.kernel.org/index.php/FAQ#Is_btrfs_stable....

[2]: https://markmcb.com/2020/01/07/five-years-of-btrfs/

[3]: https://rudd-o.com/linux-and-free-software/ways-in-which-zfs...

[4]: https://www.phoronix.com/scan.php?page=article&item=freebsd-...



I've used both; the reasons I favour zfs over btrfs:

- I've corrupted btrfs filesystems with compression just through normal use, on hardware that is fine. This may have been fixed, but it was on relatively recent (post 5.x) kernels

- zfs's logical volume layer is rather more flexible than btrfs's - you can make a multi-device set of mixed disks. For example, my backup pool is 9x4TB and 7x3TB. These are individual raidz sets that are combined together into one pool. In ZFS this is all in one place and it means the fs is aware of the logical disk layout and where data is stored. To do this on btrfs I'd need to use lvm and I'd in theory lose a bunch of the self-healing ability

- btrfs's snapshotting seems excessively complicated - it requires you to create a non-trivial logical layout in the fs, and it's very easy to accidentally expose these snapshots to the system's view of the fs. It's more flexible, but more annoying for my usecase. zfs's on the other hand is really very simple, and much much easier to use

With that said, ZFS on linux is slightly awkward as it's out of tree, and most distros build the module with DKMS. I don't entirely trust this for / (so my / is just an md raid1) - I use zfs for bulk data instead. btrfs is in-kernel, so there's no real disadvantage to using it for /.


Other advantages:

-Unlike btrfs subvolumes, zfs datasets can be mounted with different zfs-specific mount options, such as compression algorithms or recordsizes.

-zfs can take atomic recursive snapshots of nested datasets, whereas btrfs snapshots of a subvolume do not include nested subvolumes.
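A sketch of both points, again with hypothetical pool/dataset names:

```shell
# Sibling datasets can carry different zfs-specific properties:
zfs create -o compression=zstd -o recordsize=1M tank/media
zfs create -o compression=off -o recordsize=16K tank/db

# One atomic, recursive snapshot of a dataset and everything nested under it:
zfs snapshot -r tank@nightly
```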

Overall, zfs treats datasets as first-class citizens, whereas the only purpose of btrfs subvolumes seems to be to exclude folders from snapshots.


> With that said, ZFS on linux is slightly awkward as it's out of tree, and most distros build the module with DKMS. I don't entirely trust this for using for / (so my / is just a md raid1) - I use zfs for bulk data instead. btrfs is in-kernel, so there's no real disadvantage with using it for /.

Is that true? RHEL/CentOS have both DKMS and kmod versions; in Ubuntu ZFS is a supported package. That's not most distros, certainly, but those are the ones most people use.


Based on using CentOS + DKMS, Debian + DKMS, and Arch + DKMS :) Some of these have prebuilt modules too, but as community projects they usually lag behind kernel releases to the main distro, and that can be a problem. So I always end up using DKMS.

Not used Ubuntu ZFS, admittedly. Haven't used Ubuntu since 2010 or earlier.


> zfs's logical volume layer is rather more flexible than btrfs's - you can make a multi-device set of mixed disks. For example, my backup pool is 9x4TB and 7x3TB.

You may wish to checkout ZFS dRAID, which recently got committed:

* https://www.youtube.com/watch?v=jdXOtEF6Fh0


> - zfs's logical volume layer is rather more flexible than btrfs's - you can make a multi-device set of mixed disks

Can you please elaborate on this? Working on mixed disks was always a killer feature of btrfs for me!


I do a pool called backup with two raidz vdevs - 7x3T in one raidz vdev, and 9x4T in one raidz vdev. Data is striped on each of the two raidz and I lose one 3T and one 4T's worth of capacity. This allows me to lose any one drive at a given time without data loss, and up to two drives (as long as they come out of separate vdevs).
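As a quick sanity check on that layout, the capacity math works out like this (each raidz1 vdev gives up one disk's worth of space to parity, and the vdevs stripe together):

```shell
# Two raidz1 vdevs striped into one pool; each loses one drive to parity.
vdev1_usable=$(( (7 - 1) * 3 ))   # 7x3T raidz1 -> 18T usable
vdev2_usable=$(( (9 - 1) * 4 ))   # 9x4T raidz1 -> 32T usable
echo "pool capacity: $(( vdev1_usable + vdev2_usable ))T"   # prints "pool capacity: 50T"
```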

As far as I could work out from the btrfs documentation, this isn't currently possible? Plus, RAID5/6 in btrfs is still of questionable stability?


Thanks for clarification!

Indeed, btrfs doesn't allow such a configuration. However, it allows using disks of unequal sizes within a single filesystem - like, 3+3+4T in RAID1 mode gives you 5T of usable space (imagine the 4T disk being split in half, and each half duplicated to a different 3T disk; the remaining space of the 3T disks duplicated to each other). But as I understand it, it's possible to achieve the same with manual partitioning and vdev allocation on ZFS, too.
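A rough model of that arithmetic, assuming every chunk is stored exactly twice (which is how btrfs RAID1 works): usable space is half the total, capped by what the other disks can mirror against the largest one:

```shell
# btrfs RAID1 usable space with mixed disk sizes (rough model).
disks="3 3 4"           # sizes in TB; the 3+3+4 example above
total=0; largest=0
for d in $disks; do
  total=$(( total + d ))
  if [ "$d" -gt "$largest" ]; then largest=$d; fi
done
half=$(( total / 2 ))            # everything stored twice
rest=$(( total - largest ))      # what can pair against the biggest disk
if [ "$half" -lt "$rest" ]; then usable=$half; else usable=$rest; fi
echo "usable: ${usable}T"        # prints "usable: 5T" for 3+3+4
```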

RAID5/6 is indeed a danger zone - in another thread I've already mentioned a nice write-up of "guidelines for users running btrfs raid5 arrays to survive single-disk failures without losing all the data" by Zygo Blaxell: https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@h...


I made a few BTRFS attempts a few years ago, and twice ended up with a suddenly unbootable system. I stress that this was a few years ago though.

ZFS needs serious tweaking if you have performance-critical workloads. I experienced this on databases, when comparing against ext4 - at the very least, one needs to move the ZIL to a separate disk. Also, until a short time ago, ZFS had performance problems with encrypted volumes, due to (let's call them) formal issues with the kernel - as a matter of fact, it was slow on a laptop of mine.
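Moving the ZIL is a one-liner; a sketch with a hypothetical pool name and device path (`log` vdevs are a standard zpool feature):

```shell
# Put the ZIL on a dedicated fast log device (a SLOG):
zpool add tank log /dev/nvme0n1
# Synchronous writes (fsync-heavy databases) then commit to the SLOG
# instead of the main vdevs; async writes are unaffected.
```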

All in all, I don't use BTRFS because of trust issues. BTRFS is in a sort of "never stable" camp, which is not a good indicator of engineering practices. The nail in the coffin for me was that in the official FAQ, at least up to some time ago, there was the tragicomic cop-out statement that the concept of stability in software is just a matter of labeling, because all software has bugs.


Three months ago I converted an old Core 2 Duo desktop machine to a server and also deliberated whether to use btrfs or ZFS. What put me off btrfs were the rough edges, e.g. some RAID setups are considered beta. In ZFS those have been stable for a long time. The fact that btrfs has been in development for quite a long time and still has such issues is not very assuring to me. That said, if you only use the features big orgs are using, you are probably fine.

Performance has been great so far even on my underpowered machine, even with just 4GB. I don't use deduplication though which makes a huge difference.

I chose Ubuntu 20.04 as OS since they are pushing ZFS support. Did consider FreeBSD as well where I had a positive experience in the past but since they are switching to OpenZFS anyway I stuck with Ubuntu since I'm running Debian derivatives on all my servers.


Thanks for sharing this insight!

> Performance has been great so far even on my underpowered machine, even with just 4GB. I don't use deduplication though which makes a huge difference.

So almost every OpenZFS community video/talk I've seen recently has included a line like "friends don't let friends dedup" or something to that effect... I think dedup is considered unnecessary/dangerous these days with how good compression is. Not sure exactly what the dangers were, but I know that I wouldn't even turn it on.

> I chose Ubuntu 20.04 as OS since they are pushing ZFS support. Did consider FreeBSD as well where I had a positive experience in the past but since they are switching to OpenZFS anyway I stuck with Ubuntu since I'm running Debian derivatives on all my servers.

Same, I used to run lots of different OSes but I've settled down on Ubuntu for everything now, and OpenZFS having good support is what makes it possible there.


> Not sure exactly what the dangers were

No expert either, but from what I've gathered the main issue is that the dedup tables require a _lot_ of memory that scales with the size of the pool, and if they can't fit in RAM, performance tanks. The other issue is that any blocks written while dedup was enabled will carry this dedup overhead even if the option is later turned off. That is, it's a "sticky" feature.
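To get a feel for the scale, a back-of-envelope calculation with assumed figures (roughly 320 bytes per dedup-table entry and the default 128 KiB recordsize; neither number comes from this thread):

```shell
# Rough DDT sizing: one table entry per unique block.
bytes_per_entry=320
recordsize=$(( 128 * 1024 ))                      # 128 KiB default records
data_bytes=$(( 10 * 1024 * 1024 * 1024 * 1024 )) # 10 TiB of unique data
blocks=$(( data_bytes / recordsize ))
ddt_gib=$(( blocks * bytes_per_entry / 1024 / 1024 / 1024 ))
echo "approx DDT size: ${ddt_gib} GiB of RAM"     # prints "approx DDT size: 25 GiB of RAM"
```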

So if you use dedup and notice performance tanks because the dedup tables are too big, well, you're screwed, because turning it off won't give you back the performance. The only way to recover is to send/receive the whole shebang to a separate pool.

In addition, they don't actually bring that much space saving on common workloads. Most people don't have a lot of truly identical blocks of data.


> In addition, they don't actually bring that much space saving on common workloads. Most people don't have a lot of truly identical blocks of data.

VM backing storage seems to be the biggest worthwhile use case, but that depends on whether snapshots and clones are used extensively. Installing 100 copies of Debian on empty VMs will likely get deduped quite a bit. But it's faster and provides almost the same benefits to install one VM, snapshot it, and produce the rest of the VMs by cloning from the snapshot.

The only other case I could imagine dedup being good for is storing a lot of genomic data: https://techtransfer.universityofcalifornia.edu/NCD/25080.ht...

But if the use-case is narrow enough for custom deduplication then it will probably be much more efficient than ZFS's block-based dedup.


My experience is that btfs has been quite stable over the last couple of years.

Things to consider:

While a volume can span multiple physical devices, historically this hasn't been the most stable feature, and many users just stick with md, so real-world testing is probably limited. Details are in the wiki.

When a volume starts getting full (let's say above 98% or something), performance suffers. This is also documented behaviour. Take monitoring seriously, even more than usual.

Pools and subvolumes can be a bit confusing at first, even if you've had experience with other volume managers. Read the documentation and make sure you know what you're doing.


Thanks for this -- I really haven't read all the btrfs wiki and documentation just yet (since zfs was my first foray into this), will keep these points in mind. Don't think I've seen "md" mentioned before...


md is the normal tool for doing RAID in software on Linux. It exposes a number of physical devices as one block device, which looks just like a normal hard disk.

btrfs is also a volume manager with RAID-like functionality of its own. But you can also use btrfs on an md device, just like you would with any other filesystem.
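A sketch of that stacking, with hypothetical device names:

```shell
# Mirror two disks with md, then put btrfs on the resulting block device:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mkfs.btrfs /dev/md0
mount /dev/md0 /mnt
# Caveat: btrfs then sees a single device, so its checksums can still
# detect corruption, but it cannot repair from the other mirror half,
# since the mirroring happens below it.
```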

Software RAID is something I have only used for personal use, and with the enormous consumer hard drives available now, striping them seems less necessary than before.


Ahhhhh I thought it was some sort of btrfs specific thing -- didn't recognize it as the 'md' in 'mdadm' (I've only dealt with software RAID before).

This is actually pretty interesting because one of the 'features' of the hosting provider I'm using is that they will software RAID your drives by default. Maybe btrfs is a better choice in that kind of environment if I don't have to undo the software raid on every machine and btrfs will interop well without too much abstraction.


I use btrfs, and the main reason I don't consider zfs instead is that zfs doesn't use the regular linux fs page cache. That, and the fact that I use the latest mainline kernel, so I don't want to have to deal with kernel updates breaking zfs or tanking its performance, as when it lost access to the SIMD functions. The main feature I would want from zfs is tiering, which can be had with btrfs on bcache, which is what I will likely use in the future. I think there was some issue with adding disks in zfs too? Don't exactly remember.


The fact that it doesn't use the page cache is one of the top reasons why I love zfs - it can make much smarter decisions (MRU+LRU instead of just LRU) and integrate it with L2ARC.

Of course it depends a lot on your read/write patterns what impact this has, though.

On adding disks, I don't know if that's what you're referring to, but you can't add new disks to an existing RAIDZ.

Personally I've only used mirrors and single-device vdevs so far, haven't seen any need for RAIDZ.


Thanks for sharing -- looks like zfs still doesn't support tiering and just has the ARC and SLOG for similar functionality...

And yeah, ZFS can't be expanded willy-nilly, found a good blog post with someone's adventures that was illuminating[0].

[0]: https://markmcb.com/2020/01/07/five-years-of-btrfs/


It sounds like you're mixing up SLOG and L2ARC. How is ARC/L2ARC/base dataset not tiered?


ah I did mix this up -- thanks for pointing that out


I’ve been using btrfs for about two years now on my primary machine. Single disk, btrfs-on-luks, on a 970 evo plus. I have never had any stability issues, and as far as I can tell performance is excellent.

I’m not taking any chances, though, seeing as it is “unstable” – I’ve got snapshots and local and remote backups set up and working flawlessly. It’s sooo awesome to be able to pluck older versions of any file on my system whenever I want, and it’s saved me numerous times.

I haven’t tried zfs but I really have no need to. Btrfs does the job for me.


I have a similar setup. It all works perfectly until the disk starts filling up. At 95%, performance goes off a cliff. You have to clean the disk and defrag. I'm running a 4-year-old version; I would guess things might have improved since.


This is true. I’ve had it happen a few times too. I try to keep an eye on disk usage and purge things if it gets too high.


I guess the difference for me is that BTRFS is dead easy (being mainline, as you said). The knock is reliability, but BTRFS has been rock solid every time I've used it, so from personal experience, it's a no-brainer.

Plus, I love snapper and the way subvolumes and snapshots are handled.

EDIT: I will add that I don't use any RAID. I use BTRFS for snapshots, cow and data integrity on my single-drive desktops. I think that's where it really shines. I don't think anyone should be using EXT4 anymore.


Thank you for this input -- yeah it's just so weird that BTRFS is in the mainline kernel, but no one wants to stand up for it, and people are evidently using it with great success (Facebook for example).

I'm also not necessarily going to do any RAID5/6 stuff -- I'm probably just going to keep it safe (for my level of understanding) and do a RAID1/mirror setup and call it a day. The snapshots/cow/data integrity bit is definitely what I'm interested in as well. It feels to me like as long as I run ZFS under my servers I am much safer than anything else (and it's easier/possible to go back in time and undo mistakes).

Unfortunately, there's that whole thing about it being hard to boot to.. Is that still a thing?


BTRFS hard to boot to? You mean, to have it as your boot partition? I guess that could be true. I still use vfat (I think) for boot, cus I just don't care. :D

But for the rest of it, I just have everything in my fstab and it works like anything else. Super easy.


OpenSUSE uses it as the default FS.


As does Fedora as of Fedora 33!


I don't have much experience with btrfs, but my understanding is that whilst they're generally equivalent, the multi-disk config of btrfs isn't really considered stable (think raid5/6 vs raidz1/2). Most production use is single disk.

Regarding performance, I'm guessing you won't be happy with either if you're not happy with performance. You want to be looking at ARC / SLOG if you want higher performance.

https://www.growse.com/2020/06/09/improving-zfs-write-perfor...


Using the built-in RAID of btrfs is unstable in some cases, but you can still use btrfs without its RAID on top of mdadm, with mdadm providing the RAID - like ReadyNAS and Synology NAS devices have been doing by default for years.


Thanks, I learned a thing. :)


thanks for the input -- so yeah I saw the RAID5/6 thing. I'm wondering if it's a bit of a moot point, because the servers basically all come with 2 identical drives of various speed.

Moving writes to ARC or SLOG-on-a-faster-thing would also definitely help, but I'm dealing with SSDs for the most part.

Also, talking of faster storage, NVMe looks really bad for ZFS (and probably btrfs), based on this reddit post[0](graphs[1]). It's not terrible of course, and some recommended that maybe actually turning ARC off would be better, since it might have been actually getting in the way of the NVMe drive.

[0]: https://www.reddit.com/r/zfs/comments/jmdxxx/openzfs_benchma...

[1]: https://64.media.tumblr.com/0d141001aa951a44063c2cac9d2b9cb7...


I've used ZFS for 4 years and really enjoyed it. It was simple to set up, and send/recv was very useful for making backups of the data on a separate machine on the network. I've now been using btrfs since I switched to OpenSuse, and frankly it also just works, and it's been easier to set up as I didn't need to do module installs etc. However, I've had to switch to rsync for my backups, but that wasn't hard. The other good thing is that I can grow the RAID and rebalance the disks, which isn't possible on ZFS.



