After some interrogation, here's how it organized the work:
1. Design workflow (rts-game-design, 11 agents, ~13 min) ran first and produced SPEC.md + DESIGN.md:
1.1 Proposals (3 parallel agents): each designed a complete RTS from a different philosophy.
1.2 Judge (1 agent): evaluated all three and synthesized one unified design, committing to specific numbers (costs, HP, map size, etc.).
1.3 Deep-dives (6 parallel agents): each wrote an implementation-ready spec for one subsystem, all consistent with the chosen design.
1.4 Synthesis (1 agent): merged the design + all six subsystem specs into one conflict-free master spec.
2. Code-review workflow (rts-code-review, 25 agents, ~5 min) ran after the main agent had written and tested the code:
2.1 Review (6 agents, read-only Explore type): each scrutinized one dimension and returned structured findings.
2.2 Verify (19 agents): every finding got its own skeptic agent, told to try to refute it. Result: 19 flagged → 16 confirmed, 3 rejected as non-bugs.
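For anyone curious about the shape of the review → verify step, here's a minimal sketch of the pattern, assuming each agent can be modeled as a plain function. `Finding`, `review_and_verify`, and the stub reviewers/skeptic are all hypothetical stand-ins, not the actual rts-code-review workflow or any real agent API: reviewers fan out in parallel (one per dimension), then every flagged finding gets its own independent skeptic, and only the unrefuted ones survive.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    dimension: str    # which review dimension flagged it
    description: str


# Hypothetical agent signatures; in the real setup each call would be an LLM sub-agent.
Reviewer = Callable[[str], list]        # reads the code, returns a list of Findings
Skeptic = Callable[[Finding], bool]     # returns True if it managed to refute the finding


def review_and_verify(code: str, reviewers: list, skeptic: Skeptic):
    with ThreadPoolExecutor() as pool:
        # Stage 1: one read-only reviewer per dimension, all in parallel.
        batches = list(pool.map(lambda review: review(code), reviewers))
        flagged = [f for batch in batches for f in batch]
        # Stage 2: one independent skeptic per finding, also in parallel.
        refuted = list(pool.map(skeptic, flagged))
    confirmed = [f for f, r in zip(flagged, refuted) if not r]
    rejected = [f for f, r in zip(flagged, refuted) if r]
    return confirmed, rejected


# Usage with placeholder stubs (made-up dimensions and findings, for illustration only).
def make_reviewer(dimension: str, issues: list) -> Reviewer:
    return lambda code: [Finding(dimension, issue) for issue in issues]


reviewers = [
    make_reviewer("dimension-a", ["issue 1", "issue 2"]),
    make_reviewer("dimension-b", ["issue 3"]),
]
skeptic = lambda f: f.description == "issue 2"  # pretend only this one is a non-bug

confirmed, rejected = review_and_verify("<index.html contents>", reviewers, skeptic)
print(len(confirmed), len(rejected))  # 2 1
```

The point of giving each finding its own skeptic (rather than one agent re-checking the whole list) is that every refutation attempt starts from a clean context, which is roughly how the 19 → 16 filtering above worked.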
What the main agent did in the main loop:
- Wrote all ~2,400 lines of index.html by hand from the spec.
- Did all browser testing/debugging via headless Chrome (I told it to use rodney by @simonw; love the tool :)
- Applied all 16 fixes from the review and re-verified them in the browser.
If you can stand a little AI expansion, here are a few points Gemini came up with. I think the idea has some merit:
https://g.co/gemini/share/b5b97867eeb1
(Maybe the better analogy is roulette vs. pinball machine.)
I don't think the Rube Goldberg analogy works if the agentic meandering is essential complexity required to get the results. Rube Goldberging it would be something like putting this loop inside some comically overengineered enterprise microservice web, which then turns out to be running inside a Windows 98 emulator or what have you.
So no extra guidance beyond the prompt.