Story Detail of id 42010179 | Liveview Hacker News

anyfooThu, Oct 31, 2024, 06:54 PM | on: OpenZFS deduplication is good now and you shouldn't use it

I’m not sure that’s true, because the hypervisor can know which blocks are related to begin with? From what I quoted above it seems that the file system instead does a lookup based on the block content to determine if a block is a dupe (I don’t know if it uses a hash, necessitating processing the whole block, or something like an RB tree, which avoids having to read the whole block if it already differs early from candidates). Unless there is a way to explicitly tell the file system that you are copying blocks for that purpose, and that VMware is actually doing that. If not, then leaving it to the file system or even the storage layer should have a definite impact on performance, albeit in exchange for higher space efficiency because a lookup can deduplicate blocks that are identical but not directly related. This would give a space benefit if you do things like installing the same applications across many VMs after the cloning, but assuming that this isn’t commonly done (I think you should clone after establishing all common state like app installations if possible), then my gut feeling is very much that the performance benefit of more semantic-level hypervisor bookkeeping outweighs the space gains from “dumb” block-oriented fs/storage bookkeeping.

Dylan16807Thu, Oct 31, 2024, 07:35 PM | parent

Your phrasing sounds like you're unaware that filesystems can also do the same kind of cloning that a hypervisor does, where the initial data takes no storage space and only changes get written.

In fact, it's a much more common feature than active deduplication.

VM drives are just files, and it's weird that you imply a filesystem wouldn't know about the semantics of a file getting copied and altered, and would only understand blocks.

loading story #42011373

#visit	13,053,144
#session	74,665
#live-session	0