Any remarks/experiences about the "record size" of ZFS, maybe especially in relation to RAIDZx? I don't fully understand it.
I have a RAIDZ1 of 4 HDDs in a NAS, on which I set a recordsize of 1MB. It's currently 50% full and so far performance has been good with both big and small files.
I'll probably create a RAIDZ2/3 in the future using ~8 HDDs and I'll test various recordsizes, but I just wanted to know if anybody already has any positive/negative experiences with some combination of recordsize and RAIDZx.
The recordsize setting is a maximum; ZFS can use smaller blocks in some cases. The best value depends on the data you're writing and the ashift of the pool, so testing is best. Large recordsizes are helpful for large files (less metadata overhead).
I did read Arstechnica's article in the past but I did not feel comfortable with their results... (I'm not challenging them, I'm just not sure if they're relevant for me or not).
So, I just did a test (ashift 12, RAIDZ1 with 4× 8TB HDDs) and I got better performance in both cases with a 1MB recordsize vs. the default 128KB (all sequential I/O).
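If anyone wants to run a similar comparison, recordsize is a per-dataset property, so two datasets on the same pool can be tested side by side. A minimal sketch (pool/dataset names here are made up, adjust to your setup; write more data than you have RAM to limit ARC caching effects):

```shell
# Hypothetical pool "tank"; create one dataset per recordsize to compare.
zfs create -o recordsize=128K tank/rs128k
zfs create -o recordsize=1M   tank/rs1m

# Simple sequential write test (~16 GiB each; fdatasync forces the data
# to disk before dd reports its throughput).
dd if=/dev/zero of=/tank/rs128k/testfile bs=1M count=16384 conv=fdatasync
dd if=/dev/zero of=/tank/rs1m/testfile   bs=1M count=16384 conv=fdatasync
```

/dev/zero compresses perfectly, so disable compression on the test datasets (or use a random-data file as the source) if you want numbers representative of real workloads.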
> Maybe a small recordsize can have some benefits when overwriting parts of the files...mmmhhh...?
Right. People who've done more testing than me reckon 16KB is a good recordsize for transaction-processing database work, where tables see lots of small inserts and updates. (You might think matching the database's block size would be ideal, e.g. Postgres writes 8KB pages, but the rationale here is that you tend to get better compression at a 16KB recordsize than at 8KB, and that benefit outweighs the extra write amplification.)
But if database update performance isn't a big deal for you then you can probably just ignore this.
I've not done any testing of my own at the 1MB size, but I don't think I'd be inclined to try it unless I was fairly confident that there weren't going to be many small writes to big files.
In short: use a large recordsize where you think you've got a good case for it, and likewise a small one. Otherwise, just stick with the default.
Yeah, in my case the DBs "ClickHouse" and "MariaDB+MyRocks" might fit the 1MB case well (as they both never update existing files but keep writing new files, not just for inserts but for updates as well; ClickHouse barely supports update/delete anyway, heh).
On the other hand "PostgreSQL" and maybe "MariaDB+TokuDB" as well might need a small recordsize -> I'll have to test it. And anyway, giving each single DB its own dataset seems like a great idea :)
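The per-database split could look something like this (dataset names and the exact values are just illustrative assumptions, not recommendations; test against your own workload):

```shell
# Hypothetical layout: one dataset per database engine, each with a
# recordsize matched to its write pattern.
zfs create -o recordsize=1M  tank/db/clickhouse  # append-only data parts
zfs create -o recordsize=1M  tank/db/myrocks     # LSM: sequential SST writes
zfs create -o recordsize=16K tank/db/postgres    # 8K pages; 16K for compression
```

Since recordsize only applies to newly written blocks, set it before loading data; changing it later won't rewrite existing files.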
In some cases -- when the file is smaller.
For small files ZFS uses the smallest possible block size that can accommodate the file. Once the file grows beyond the recordsize (maximum block size), it uses recordsize-sized blocks.
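That behavior is easy to see on a live system: recordsize is a ceiling, not a fixed allocation, so a tiny file on a 1M-recordsize dataset still occupies only a small block (paths here assume the hypothetical dataset from earlier in the thread):

```shell
# A few bytes written to a dataset with recordsize=1M...
echo "small" > /tank/rs1m/tiny.txt

# ...still only allocates a small block on disk (size depends on
# ashift/compression), nowhere near 1M.
du -h /tank/rs1m/tiny.txt
```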
Thx