
Lustre is big in the HPC/AI training world. Amazing performance and scalability, but not for the faint of heart.


Got any more details about pros and cons based on your experience?


Well, it's hard to be exhaustive in a quick comment, but roughly, the advantages:

- POSIX compliant, down to dotting the i's. As opposed to, say, NFS, which isn't cache coherent.

- Performance and scalability. 1 TB/s+ of sequential IO to a single file is what you'd expect on a large HPC system these days.

- Metadata performance has gotten a lot better over the past decade or so, beating most (all?) other parallel filesystems.

Downsides:

- Lots of pieces in a Lustre cluster (typically nodes are paired in sort-of active/active HA configs). And lots of cables, switches etc. So there's a fairly decent chance something breaks every now and then.

- When something breaks, Lustre is weird and different compared to many other filesystems. Tools are rudimentary and different.

To get a feel for what 'life with Lustre' can be like, see e.g. the various 'site reports' from workshops. A couple of somewhat recent ones: https://www.eofs.eu/wp-content/uploads/2024/09/cscs_site_rep... and https://www.eofs.eu/wp-content/uploads/2024/09/LAD-24-Luster...



