Yeah, I’m also not an expert in this. Just had enough architecture classes to know that all three companies are using cleverer branch predictors than I could come up with, haha.
Another possibility is that the memorization capacity of the branch predictors is a bottleneck, but a bottleneck that they aren’t often hitting. As the design is enhanced, that bottleneck might show up. AMD might just have most recently widened that bottleneck.
Super hand-wavey, but to your point about data, without data we can really only hand-wave anyway.