Hacker News new | past | comments | ask | show | jobs | submit
There are some hurdles preventing that flow from achieving reproducible builds. As the bad guys get more sophisticated, it's going to become more and more important that one party can say "we trust this build hash" and a separate party to say "us too".

That's not going to work if both parties get different hashes when they build the image, which won't happen as long as file modification timestamps (and other such hazards) are part of what gets hashed.

Recent versions of buildkit have added support for SOURCE_DATE_EPOC. I've been making the images reproducible before that with my own tooling, regctl image mod [1] to backdate the timestamps.

It's not just the timestamps you need to worry about. Tar needs to be consistent with the uid vs username, gzip compression depends on implementations and settings, and the json encoding can vary by implementation.

And all this assumes the commands being run are reproducible themselves. One issue I encountered there was how alpine tracks their package install state from apk, which is a tar file that includes timestamps. There are also timestamps in logs. Not to mention installing packages needs to pin those package versions.

All of this is hard, and the Dockerfile didn't make it easy, but it is possible. With the right tools installed, reproducing my own images has a documented process [2].

[1]: https://regclient.org/cli/regctl/image/mod/

[2]: https://regclient.org/install/#reproducible-builds

loading story #47296342
Does any of that matter if you’re not auditing the packages you install?

I’m more concerned about sources being poisoned over the build processes. Xz is a great example of this.

Both are needed, but you get more bang for your buck focusing on build security than on audited sources. If the build is solid then it forces attackers to work in the open where all auditors can work together towards spoiling the attack.

If you flip it around and instead have magically audited source but a shaky build, then perhaps a diligent user can protect themself, but they do so by doing their own builds, which means they're unaware that the attack even exists. This allows the attacker to just keep trying until they compromise somebody who is less diligent.

Getting caught requires a user who analyses downloaded binaries in something like ghidra... who does that when it's much easier to just build them from sources instead? (answer: vanishingly few people). And even once the attacker is found out, they can just hide the same payload a bit differently, and the scanners will stop finding it again.

Also, "maybe the code itself is malicious" can only ever be solved the hard way, whereas we have a reasonable hope of someday providing an easy solution to the "maybe the build is malicious" problem.

loading story #47296546