Its also inevitable given that we still don't even really know how these models work or what they do at inference time.
We know input/output pairs, when using a reasoning model we can see a separate stream of text that is supposedly insight into what the model is "thinking" during inference, and when using multiple agents we see what text they send to each other. That's it.