> once all the code seems okay, you will run THREE parallel sub-agents for code review: each looking at ALL changed code
I did some evals with a prompt like this a few months ago, when I had some subscription tokens to burn. I think I was using Opus 4.5. What I found was:
1. Running two subagents was somewhat useful
2. Running three started to get redundant
3. Any more than three was pointless (at least when using the same model)
However, even with just two, about 60% of the findings were the same.
Much, much more effective was splitting out into audits through different lenses:
* One looking for security issues
* One looking for whether the task was completed successfully
* One looking for performance issues
* One looking for contract/maintainability issues
* One looking at test coverage
Etc.
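A rough sketch of what the lens split could look like when building the sub-agent prompts. The lens names come from the list above; the prompt wording, `LENSES`, and `build_audit_prompts` are made up for illustration and aren't the prompts I actually ran.

```python
# Hypothetical lens list: names follow the bullets above, wording is invented.
LENSES = {
    "security": "Look only for security issues.",
    "completion": "Check only whether the task was completed successfully.",
    "performance": "Look only for performance issues.",
    "maintainability": "Look only for contract/maintainability issues.",
    "tests": "Look only at test coverage.",
}


def build_audit_prompts(diff: str) -> dict[str, str]:
    """Return one review prompt per lens, each covering the whole diff."""
    return {
        name: f"{focus}\nReview ALL of the changed code below.\n\n{diff}"
        for name, focus in LENSES.items()
    }
```

Each sub-agent still sees all the changed code, same as in the quoted prompt. The only thing that differs is what it's told to look for.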
You can get reasonably close with fewer, but more agents give better signal. E.g. if 3/3 flag something as an issue, the outer agent that orchestrates them can treat it as something to give more attention to, whereas if it's just 1/3, it probably needs a closer look before acting on it. Ofc more agents agreeing doesn't always mean they're right.
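The agreement counting the orchestrator does could be sketched like this. It assumes each sub-agent's findings have already been normalized to comparable issue keys (in practice that matching is the hard part), and `rank_by_agreement` is a hypothetical helper, not something from a real framework.

```python
from collections import Counter


def rank_by_agreement(findings_per_agent: list[set[str]]) -> list[tuple[str, int]]:
    """Count how many agents flagged each issue, most-agreed-on first.

    Assumes issue keys are already normalized so the same issue reported
    by two agents compares equal. Ties are broken alphabetically.
    """
    counts = Counter(issue for findings in findings_per_agent for issue in findings)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


reports = [
    {"sql-injection", "missing-test"},
    {"sql-injection"},
    {"sql-injection", "n+1-query"},
]
ranked = rank_by_agreement(reports)
# ranked == [("sql-injection", 3), ("missing-test", 1), ("n+1-query", 1)]
```

The count is only a prioritization hint for the outer agent: a 1/3 finding can still be real, and a 3/3 one can still be wrong.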