I don't think there's much for Amazon to gain from publishing these sorts of internal details. Amazon's services are used by developers who want to tightly optimize their usage, and if Amazon published detailed internals, folks would likely start optimizing applications around details that have the potential to change over time.
Secondly, I think that a lot of companies publish these "tech blogs" as a way to boost recruiting (look at the cool stuff that we're doing, don't you want to join us?). Amazon, of course, doesn't have a recruiting problem. If you want to work on the largest-scale systems, it's already a top destination for you.
I imagine (hope) that they are doing some kind of intelligent read-ahead in the frontend servers to optimize for sequential reads, which would keep this from looking terrible for applications.
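For anyone curious what that read-ahead could look like from the client side, here's a rough Python sketch using boto3 ranged GETs: prefetch the next byte range while the caller is still chewing on the current one. To be clear, this is not how AWS's frontend servers actually work (that isn't public), and the bucket/key names are placeholders.

    # Rough sketch of read-ahead over S3 using ranged GETs. Not AWS's actual
    # frontend implementation; it just illustrates prefetching the next chunk
    # while the caller is still consuming the current one.
    from concurrent.futures import ThreadPoolExecutor

    import boto3
    from botocore.exceptions import ClientError

    CHUNK = 8 * 1024 * 1024  # 8 MiB read-ahead window
    s3 = boto3.client("s3")

    def _fetch(bucket, key, offset):
        try:
            resp = s3.get_object(
                Bucket=bucket, Key=key,
                Range=f"bytes={offset}-{offset + CHUNK - 1}",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "InvalidRange":
                return b""  # read past the end of the object
            raise
        return resp["Body"].read()

    def sequential_read(bucket, key):
        """Yield the object in order, prefetching the next range in the background."""
        with ThreadPoolExecutor(max_workers=1) as pool:
            offset = 0
            future = pool.submit(_fetch, bucket, key, offset)
            while True:
                data = future.result()
                if not data:
                    break
                offset += len(data)
                # Start the next ranged GET before the caller processes this chunk.
                future = pool.submit(_fetch, bucket, key, offset)
                yield data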
Notably, this is going to manage your data in its native format (i.e. you can actually read/write the files out of the S3 bucket as if they were actual objects, mapping 1:1 to each file). The ZFS backend is (almost certainly) a block-based format that is persisted to S3, meaning that you cannot use it for existing data in S3, and you cannot access data written through ZFS via S3.
This is pretty different from s3fs, which is a FUSE file system backed by S3.
This means that all of the non-atomic operations you might want to do on S3 (including edits to the middle of files, renames, etc.) are run on the machine running s3fs. As a result, if your machine crashes, it's not clear what will show up in your S3 bucket, or whether things end up corrupted.
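To make the rename case concrete: S3 has no rename primitive, so a client-side layer like s3fs has to emulate it, roughly as copy-then-delete. A sketch (bucket/key names are placeholders, and this isn't s3fs's literal code):

    # S3 can't rename an object, so a FUSE layer has to fake it client-side.
    # A crash between the two calls leaves both keys in the bucket.
    import boto3

    s3 = boto3.client("s3")

    def emulated_rename(bucket, src_key, dst_key):
        s3.copy_object(
            Bucket=bucket,
            Key=dst_key,
            CopySource={"Bucket": bucket, "Key": src_key},
        )
        # <-- a crash right here and you have two copies, with no record of the rename
        s3.delete_object(Bucket=bucket, Key=src_key)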
s3fs is also slow, because the next stop after your machine is S3 itself, and that latency isn't suitable for many file-based applications.
What AWS has built here is different: using EFS as the middle layer means that there's a safe, durable place for your file system operations to land while they're being assembled into object operations. It also means that the performance should be much better than s3fs (it's talking to SSDs where data is 1ms away instead of HDDs where data is 30ms away).
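In case it helps, here's a minimal sketch of that staging pattern: small, non-atomic edits land on a fast, durable path first, and whole objects get assembled and uploaded to S3 later. This is just the general shape of the idea, not AWS's actual implementation; the paths and bucket name are made up.

    # Durable staging layer in front of object storage: edits hit fast storage,
    # assembled objects get flushed to S3 afterwards. Paths/names are hypothetical.
    import os
    import boto3

    STAGING_DIR = "/mnt/efs/staging"   # fast, durable staging area (~1ms away)
    BUCKET = "my-bucket"               # eventual home of the data (~30ms away)
    s3 = boto3.client("s3")

    def write(path, offset, data):
        """Apply an edit to the staged copy of the file; cheap and durable."""
        staged = os.path.join(STAGING_DIR, path)
        os.makedirs(os.path.dirname(staged), exist_ok=True)
        mode = "r+b" if os.path.exists(staged) else "wb"
        with open(staged, mode) as f:
            f.seek(offset)
            f.write(data)

    def flush(path):
        """Later (on close/fsync/a timer), upload the assembled file as one object."""
        s3.upload_file(os.path.join(STAGING_DIR, path), BUCKET, path)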
Well, I think this is what our company, Archil, is working on. We basically built an SSD clustering layer that proxies, caches, and assembles requests into object storage so that you can run a POSIX file system directly on top.
There are also some really great projects like SlateDB in this space, which could be more like what you're looking for (a ~RocksDB-like API that runs on S3).
We just released a driver that allows users of just-bash to attach a full Archil file system, synced to S3. This lets you run just-bash in an environment where you don't have a full VM and still get high-performance access to data that's already in your S3 bucket, for things like greps or edits.
It's 100% because the number of operations happening on GitHub has likely 100x'd since the introduction of coding agents. They built GitHub for one kind of scale, and all of a sudden they've found themselves with a new kind of scale.
That doesn't normally happen to platforms of this size.
ISTR that the lift-n-shift started like ... 3 years ago? That much of it was already shifted to Azure ... 2 years ago?
The only thing that changed in the last year (if my two assertions above are correct, which they may not be) is a much-publicised switch to AI-assisted coding.