TL;DR: He wrote an OS X dedup app which finds files with the same contents and tells the filesystem that their contents are identical, so it can save space (using copy-on-write features).
He points out it's dangerous but could be worth it because of the space savings.
I wonder whether the implementation relies on a hash alone or takes an additional step to compare the actual contents, to rule out hash collisions.
It's not open source, so we'll never know. He chose a paid model instead.
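For what it's worth, here is a minimal sketch of the "hash first, then compare bytes" approach I have in mind. This is not the app's actual method (which is unknown); the function names `file_digest` and `find_duplicates` are made up for illustration, and SHA-256 is just an assumed choice of hash.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents are byte-for-byte identical.

    Step 1: bucket files by (size, digest), which is cheap to compare.
    Step 2: inside each bucket, confirm equality with a full byte
    comparison, so a hash collision cannot produce a false match.
    """
    buckets = defaultdict(list)
    for p in paths:
        buckets[(os.path.getsize(p), file_digest(p))].append(p)

    groups = []
    for candidates in buckets.values():
        confirmed = []  # lists of paths verified to be identical
        for p in candidates:
            for group in confirmed:
                # shallow=False forces a comparison of the file contents
                if filecmp.cmp(group[0], p, shallow=False):
                    group.append(p)
                    break
            else:
                confirmed.append([p])
        groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

The byte comparison only runs on files that already share a size and digest, so the extra cost is limited to files that are almost certainly duplicates anyway.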
Also, some files might not be identical but still share identical blocks, which could be explored too. Other filesystems support that through their tooling (offline), online at write time, or both.
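To make the block idea concrete, a rough sketch of finding shared blocks between two byte strings follows. It assumes fixed-size, aligned 4096-byte blocks (an arbitrary choice for illustration), and `shared_blocks` is a hypothetical helper, not anything taken from a real filesystem tool.

```python
import hashlib


def shared_blocks(a: bytes, b: bytes, block_size: int = 4096):
    """Return (offset_in_a, offset_in_b) pairs for identical aligned blocks.

    Blocks of `a` are indexed by digest; each block of `b` is looked up
    and then compared byte for byte before it is reported as a match.
    """
    index = {}
    for off in range(0, len(a), block_size):
        block = a[off:off + block_size]
        # Keep the first offset seen for each distinct block digest
        index.setdefault(hashlib.sha256(block).digest(), off)

    matches = []
    for off in range(0, len(b), block_size):
        block = b[off:off + block_size]
        hit = index.get(hashlib.sha256(block).digest())
        # Verify the bytes so a digest collision is never reported
        if hit is not None and a[hit:hit + block_size] == block:
            matches.append((hit, off))
    return matches
```

Fixed, aligned blocks miss matches once data is shifted by an insertion; handling that would need something like content-defined chunking, which is beyond this sketch.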