Yes, that seems like a silly way to go about it if your goal is to store the whole web rather than a single scrape. Granted, anything that deduplicates data is more vulnerable to corruption (or at least corruption can have wider consequences), so it's not a trivial problem, but you'd think deduplicating identical resources would be something they'd add the first time they came close to their storage limits.
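
For illustration, here's a minimal sketch of what content-addressed deduplication could look like (the names store_resource, STORE, INDEX and the one-file-per-hash layout are hypothetical, not anyone's actual implementation): each unique blob is written once under its hash, and each capture only records a (url, timestamp) -> hash mapping, so a page fetched a million times unchanged costs one copy plus a million small index entries.

    import hashlib, os

    STORE = "store"   # one file per unique blob, named by its hash (hypothetical layout)
    INDEX = {}        # (url, timestamp) -> content hash; a real archive would persist this

    def store_resource(url, timestamp, body: bytes):
        digest = hashlib.sha256(body).hexdigest()
        path = os.path.join(STORE, digest)
        if not os.path.exists(path):          # only write content we haven't seen before
            os.makedirs(STORE, exist_ok=True)
            with open(path, "wb") as f:
                f.write(body)
        INDEX[(url, timestamp)] = digest      # every capture stays individually addressable

    def load_resource(url, timestamp) -> bytes:
        with open(os.path.join(STORE, INDEX[(url, timestamp)]), "rb") as f:
            return f.read()

The flip side is exactly the wider blast radius mentioned above: if that one stored copy rots, every capture pointing at it is gone, so you'd want replication and checksum auditing on the blob store to compensate.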