Hacker News

Tell HN: Deduplicating a 10.4 TiB game preservation archive (WIP)

It looks like your solutions so far expose the result as a plain, accessible filesystem. If you can relax that constraint, you might have an easier time:

"Precomp" is a program that losslessly expands a zip/gzip archive in a way that can be restored 1:1, while the expanded form is more amenable to solid compression.
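A rough Python sketch of the Precomp idea for a single zlib stream: decompress it, then brute-force zlib settings until one reproduces the original bytes exactly, so only the raw data plus a couple of parameters need to be kept. This illustrates the idea under assumptions and is not Precomp's actual code. It only tries the level and memLevel knobs, and `expand`/`restore` are hypothetical names. Streams written by other deflate encoders may not match any setting and would have to be stored as-is. gzip and zip carry the same deflate data in different wrappers (selectable through zlib's `wbits`), which this sketch ignores.

```python
import zlib

def _recompress(raw: bytes, level: int, mem_level: int) -> bytes:
    # Re-deflate with explicit settings (zlib wrapper, 32 KiB window).
    c = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, mem_level)
    return c.compress(raw) + c.flush()

def expand(blob: bytes):
    """Return (raw, (level, mem_level)) if the zlib stream can be rebuilt
    bit-for-bit from its decompressed form, else None (keep blob as-is)."""
    try:
        raw = zlib.decompress(blob)
    except zlib.error:
        return None  # not a zlib stream
    for level in range(10):
        for mem_level in range(1, 10):
            if _recompress(raw, level, mem_level) == blob:
                return raw, (level, mem_level)
    return None  # decompresses fine, but no tested setting reproduces it

def restore(raw: bytes, params) -> bytes:
    # Inverse of expand(): recompress with the recorded settings.
    level, mem_level = params
    return _recompress(raw, level, mem_level)
```

For example, a stream made with Python's own `zlib.compressobj(9)` round-trips through `expand` and `restore` to identical bytes, and the `raw` that `expand` returns is what you would hand to the solid compressor.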

Could you write a custom archiver for the MPQ files that does something similar, expanding them reversibly so the files look more alike to the compressor? Then long-range solid compression (e.g. zpaq) can take care of the rest.
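To see why expanding first helps, here is a small experiment with Python's `lzma` module standing in for a long-range solid compressor like zpaq (zpaq itself isn't in the standard library). Two synthetic "assets" differ only by a seven-byte patch. Stored as separate zlib streams, their compressed bytes typically diverge after the edit, so the solid pass finds little to share; the expanded versions are near-duplicates it can exploit. The data and the names `make_asset` and `solid_sizes` are made up for illustration, and real MPQ contents will behave differently.

```python
import lzma
import random
import zlib

def make_asset(seed: int, n_words: int = 20_000) -> bytes:
    # Synthetic, compressible "game file": random words from a small vocabulary.
    rng = random.Random(seed)
    vocab = [f"token{i}" for i in range(500)]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode()

def solid_sizes(a: bytes, b: bytes) -> tuple[int, int]:
    """Solid-compress two files stored (1) as zlib streams, (2) expanded."""
    as_compressed = zlib.compress(a) + zlib.compress(b)
    as_expanded = a + b
    return len(lzma.compress(as_compressed)), len(lzma.compress(as_expanded))

v1 = make_asset(0)
v2 = v1[:1000] + b"PATCHED" + v1[1007:]  # same length, small edit near the start
compressed_solid, expanded_solid = solid_sizes(v1, v2)
print(f"solid over zlib streams:  {compressed_solid} bytes")
print(f"solid over expanded data: {expanded_solid} bytes")
```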

Or, your custom archiver could extract the blobs to individual files, and then you'd only need to store each inner stream once, while still being able to reconstruct the original MPQ on demand.
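Storing each inner stream once is essentially content-addressed storage. Below is a minimal in-memory sketch, assuming some hypothetical MPQ parser has already split each archive into an ordered list of byte segments (headers, tables, file payloads) that concatenate back to the original. Each segment is keyed by its SHA-256, stored once, and an archive becomes a manifest of keys. `BlobStore`, `ingest` and `rebuild` are illustrative names, not an existing tool.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

@dataclass
class BlobStore:
    # Content-addressed store: SHA-256 hex digest -> segment bytes.
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # identical segments are stored only once
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

def ingest(store: BlobStore, segments: list[bytes]) -> list[str]:
    """Store an archive's segments; the returned key list is its manifest."""
    return [store.put(seg) for seg in segments]

def rebuild(store: BlobStore, manifest: list[str]) -> bytes:
    """Reassemble the original archive bytes from its manifest."""
    return b"".join(store.get(key) for key in manifest)

# Two hypothetical archives that share one large inner stream.
shared = b"texture data " * 1000
archive_a = [b"HEADER-A", shared, b"TABLES-A"]
archive_b = [b"HEADER-B", shared, b"TABLES-B"]

store = BlobStore()
manifest_a = ingest(store, archive_a)
manifest_b = ingest(store, archive_b)
assert rebuild(store, manifest_a) == b"".join(archive_a)
print(store.stored_bytes(), "bytes stored for",
      sum(map(len, archive_a + archive_b)), "bytes of input")
```

In practice the store would live on disk (e.g. one file per hash), and segments could first go through a Precomp-style expansion so that streams which are similar but not identical still benefit from the solid compressor.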
