Hacker News

I can understand disk I/O limits, but I'm not buying the metadata argument on a read-mostly workload.
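For context, the metadata argument is that serving each photo through a POSIX filesystem costs several metadata operations (path resolution, inode reads) before the first data byte, whereas a read-mostly store can append photos to one large file and keep offsets in RAM, so a read touches disk exactly once. A minimal sketch of that idea (the class and layout are illustrative, not Facebook's actual design):

```python
import os
import tempfile

# Append photos into one large file; keep (offset, size) per photo in an
# in-memory index, so a read needs no per-photo filesystem metadata I/O.
class FlatPhotoStore:
    def __init__(self, path):
        self.index = {}  # photo_id -> (offset, size), held in RAM
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)

    def put(self, photo_id, data):
        offset = os.lseek(self.fd, 0, os.SEEK_END)
        os.write(self.fd, data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]      # RAM lookup, no disk metadata
        return os.pread(self.fd, size, offset)   # exactly one disk read

store = FlatPhotoStore(os.path.join(tempfile.mkdtemp(), "store.dat"))
store.put("p1", b"jpeg-bytes-1")
store.put("p2", b"jpeg-bytes-2")
print(store.get("p2"))  # b'jpeg-bytes-2'
```

The trade-off is that the index must fit in memory and must be rebuilt (or persisted separately) on restart, which is fine for billions of small immutable objects but not a general-purpose filesystem replacement.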

More detail about WAFL (NetApp's filesystem) features and design in this presentation: http://www.cs.tau.ac.il/~ohadrode/slides/WAFL.pdf

Great for corporate documents. Overkill for photo storage.



On a much smaller scale, the amount of metadata generated from the Git operations we do at GitHub was killing GFS until we instituted a more aggressive strategy to clear it out. I'm not surprised this was an issue for Facebook.


But if it's overkill for photo storage, why did Facebook use it in the first place?


Because early in a high-growth company's life, speed and ease to market are vastly more important than IOPS, $/TB, TB/kVA, or the many other measures that matter a lot at Facebook's current size and hardly at all at its size five years ago.

Back in 2004, buying an appliance that plugs right in and works the way you already understand (presenting a POSIX filesystem over NFS) meant they could get on with the Facebook-ness of Facebook: 20 hours of research and quoting, 2 hours to rack, network, and power it up, and you're in business.

Once you're at 1/2 TB/day (I actually can't believe it's that low unless they're counting the final storage requirement, not what the users are initially sending them), the linear factor dominates the constant factor, and they can now "afford" the luxury of doing a design that's better in $/TB terms.
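The constant-vs-linear point can be made concrete with a toy break-even model: an appliance has low setup effort but a high per-TB price, while a custom design front-loads engineering cost and pays less per TB. All the numbers below are invented placeholders, not real NetApp or Facebook figures:

```python
# Toy cost model: total = setup (constant factor) + per_tb * tb (linear factor).
# Every dollar figure here is an assumption for illustration only.
APPLIANCE_SETUP, APPLIANCE_PER_TB = 10_000, 3_000   # cheap to start, pricey per TB
CUSTOM_SETUP,    CUSTOM_PER_TB    = 500_000, 300    # engineering up front, cheap per TB

def cost(setup, per_tb, tb):
    return setup + per_tb * tb

def breakeven_tb():
    # appliance_setup + a*tb == custom_setup + c*tb
    # => tb = (custom_setup - appliance_setup) / (a - c)
    return (CUSTOM_SETUP - APPLIANCE_SETUP) / (APPLIANCE_PER_TB - CUSTOM_PER_TB)

print(f"break-even at ~{breakeven_tb():.0f} TB")  # ~181 TB under these assumptions
```

Under these made-up numbers, 1/2 TB/day crosses the break-even point in about a year, which is roughly when the linear factor starts dominating and the custom design pays for itself.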



