Hacker News new | past | comments | ask | show | jobs | submit

How many branches can your CPU predict?

https://lemire.me/blog/2026/03/18/how-many-branches-can-your-cpu-predict/
Hmm, that's interesting. The code as written only has one branch, the if statement (well, two, the while loop exit clause as well). My mental model of the branch predictor was that for each branch, the CPU maintained some internal state like "probably taken/not taken" or "indeterminate", and it "learned" by executing the branch many times.

But that's clearly not right, because apparently the specific data it's branching off matters too? Like, "test memory location X, and branch at location Y", and it remembers both the specific memory location and which specific branch branches off of it? That's really impressive, I didn't think branch predictors worked like that.

Or does it learn the exact pattern? "After the pattern ...0101101011000 (each 0/1 representing the branch not taken/taken), it's probably 1 next time"?

loading story #47441226
loading story #47440749
loading story #47440726
I guess the generate_random_value function uses the same seed every time, so the expectation is that the branch predictor should be able to memorize it with perfect accuracy.

But the memorization capacity of the branch predictor must be a trade-off, right? I guess this generate_random_value function is impossible to predict using heuristics, so I guess the question is how often we encounter 30k long branch patterns like that.

Which isn’t to say I have evidence to the contrary. I just have no idea how useful this capacity actually is, haha.

loading story #47440377
loading story #47440647
AMD CPUs have been killing it lately, but this benchmark feels quite artificial.

It's a tiny, trivial example with 1 branch that behaves in a pseudo-random way (random, but fixed seed). I'm not sure that's a really good example of real world branching.

How would the various branch predictors perform when the branch taken varies from 0% likely to 100% likely, in say, 5% increments?

How would they perform when the contents of both paths are very heavy, which involves a lot of pipeline/SE flushing?

How would they perform when many different branches all occur in sequence?

How costly are their branch mispredictions, relative to one another?

Without info like that, this feels a little pointless.

loading story #47441342
Before switching to a hot and branchless code path, I was seeing strangely lower performance on Intel vs. AMD under load. Realizing the branch predictor was the most likely cause was a little surprising.
Enlarging a branch predictor requires area and timing tradeoffs. CPU designers have to balance branch predictor improvements against other improvements they could make with the same area and timing resources. What this tells you is that either Intel is more constrained for one reason or another, or Intel's designers think that they net larger wins by deploying those resources elsewhere in the CPU (which might be because they have identified larger opportunities for improvement, or because they are basing their decision making on a different sample of software, or both).
Does any JIT/AOT/hot code optimization/techniques/compilers/runtime takes into account whether the branch prediction is saturated and try to recompile to go branchless
loading story #47442271
Using random values defeats the purpose of the branch predictor. The best branch predictor for this test would be one that always predicts the branch taken or not taken.
loading story #47440764
loading story #47440521
{"deleted":true,"id":47440409,"parent":47438490,"time":1773931526,"type":"comment"}