Story Detail of id 48315260 | Liveview Hacker News

It's a combination of reasoning effort (max) + enabling the workflow feature, which orchestrates multiple sub-agents.

After some interrogation, here's how it organized the work:

1. Design workflow (rts-game-design, 11 agents, ~13 min) ran first, produced SPEC.md + DESIGN.md:

1.1 Proposals (3 parallel agents): each designed a complete RTS from a different philosophy.

1.2 Judge (1 agent): evaluated all three and synthesized one unified design, committing to specific numbers (costs, HP, map size, etc.).

1.3 Deep-dives (6 parallel agents): each wrote an implementation-ready spec for one subsystem, all consistent with the chosen design

1.4 Synthesis (1 agent): merged the design + all six subsystem specs into one conflict-free master spec

2. Code-review workflow (rts-code-review, 25 agents, ~5 min) ran after the main agent had written and tested the code:

2.1 Review (6 agents, read-only Explore type): each scrutinized one dimension and returned structured findings.

2.2 Verify (19 agents): every finding got its own skeptic agent told to try to refute it. Result: 19 flagged → 16 confirmed, 3 rejected as non-bugs.
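The fan-out/fan-in shape of the two workflows above can be sketched roughly as follows. This is only an illustration, not how Claude Code implements it: `run_agent`, `design_workflow`, `review_workflow` and the result classes are hypothetical names, and the "agents" are stubs that return strings instead of calling a model.

```python
import asyncio
from dataclasses import dataclass


async def run_agent(role: str, task: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real harness would call a model."""
    await asyncio.sleep(0)  # yield to the event loop so gathered agents interleave
    return f"[{role}] {task}"


@dataclass
class DesignResult:
    master_spec: str
    agents_used: int


async def design_workflow(philosophies: list[str], subsystems: list[str]) -> DesignResult:
    # 1.1 Proposals: one agent per design philosophy, run concurrently.
    proposals = await asyncio.gather(*(run_agent("proposal", p) for p in philosophies))
    # 1.2 Judge: a single agent folds the proposals into one design.
    design = await run_agent("judge", "; ".join(proposals))
    # 1.3 Deep-dives: one agent per subsystem, run concurrently.
    specs = await asyncio.gather(*(run_agent("deep-dive", s) for s in subsystems))
    # 1.4 Synthesis: a single agent merges the design and subsystem specs.
    master = await run_agent("synthesis", "\n".join([design, *specs]))
    return DesignResult(master, len(philosophies) + 1 + len(subsystems) + 1)


@dataclass
class ReviewResult:
    confirmed: list[str]
    rejected: list[str]
    agents_used: int


async def review_workflow(
    findings_by_dimension: dict[str, list[str]], non_bugs: set[str]
) -> ReviewResult:
    # Stub reviewer: returns the canned findings for its dimension.
    async def reviewer(dimension: str) -> list[str]:
        await asyncio.sleep(0)
        return findings_by_dimension[dimension]

    # Stub skeptic: True means the finding survived the refutation attempt.
    async def skeptic(finding: str) -> bool:
        await asyncio.sleep(0)
        return finding not in non_bugs

    # 2.1 Review: one read-only agent per dimension.
    batches = await asyncio.gather(*(reviewer(d) for d in findings_by_dimension))
    findings = [f for batch in batches for f in batch]
    # 2.2 Verify: one skeptic agent per finding.
    verdicts = await asyncio.gather(*(skeptic(f) for f in findings))
    confirmed = [f for f, ok in zip(findings, verdicts) if ok]
    rejected = [f for f, ok in zip(findings, verdicts) if not ok]
    return ReviewResult(confirmed, rejected, len(findings_by_dimension) + len(findings))
```

With 3 philosophies and 6 subsystems the design stage uses 3 + 1 + 6 + 1 = 11 agents, and with 6 review dimensions and 19 findings the review stage uses 6 + 19 = 25, which matches the agent counts reported above.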

What the main agent did in the main loop:

- Wrote all ~2,400 lines of index.html by hand from the spec.

- Did all browser testing/debugging via headless Chrome (I told it to use rodney by @simonw, love the tool :)

- Applied all 16 fixes from the review and re-verified them in the browser.

33MHz-i486 17 hours ago | parent | next

seems like a Rube Goldberg-esque way to consume 10x tokens. is this really where the industry is heading?

e12e 15 hours ago | root | parent | next

I like to think of it like the difference between dropping a ball on a roulette wheel (you get one random number, or a random sequence on repeated drops) vs. dropping a ball on a carved topographic map, where valleys guide the ball to a particular outcome.

If you can stand a little AI expansion - here are a few points Gemini came up with - I think the idea has some merit:

https://g.co/gemini/share/b5b97867eeb1

(Maybe the better analogy is roulette vs pinball machine)

derac 16 hours ago | root | parent

Why is it Rube Goldbergesque? The process doesn't seem arbitrary.

OJFord 9 hours ago | root | parent

Rube Goldberg machines (or Heath Robinson contraptions) aren't arbitrary, they're complicated or contrived ways of achieving the process; often a very literal interpretation of how an automatic machine might imitate an otherwise manual action – a robotic hand movement for example. I think it's quite a good analogy, even if agentic Goldberg works well.

sdfsdssdfsdf 8 hours ago | root | parent

Those machines are, to quote Wikipedia, "designed to perform a simple task in a comically overcomplicated way". This implies there is a much simpler way that works just as well.

I don't think the Rube Goldberg analogy works if the agentic meandering is essential complexity required to get at the results. Rube Goldberging it would be something like putting this loop inside some comically overengineered enterprise microservice web which is then found out to be running inside a Windows 98 emulator or what have you.


artur_makly 3 hours ago | parent | next

Just to confirm - you did not generate this plan/orchestration/harness - it did all that on its own?

senko 2 hours ago | root | parent

Correct, that's the "workflows" part they introduced in Claude Code alongside the new model.

chrisweekly 4 hours ago | parent | next

Did you start with a clean slate, or do you have a global ~/.claude/CLAUDE.md and/or specific skills, plugins, etc?

senko 2 hours ago | root | parent

I don't have a global CLAUDE.md, and the only non-default skill I have that was used here is the one for the rodney[0] headless browser. I didn't expressly tell Claude to do browser testing, it decided to do it on its own.

So no extra guidance beyond the prompt.

[0] https://github.com/simonw/rodney/

jmtame 15 hours ago | parent

Thanks for sharing this. Going to try it out on a game inspired by Rust. It's helpful re: the point on rodney - I've had a hard time getting the testing to work well in the browser.
