
Get Shit Done: A meta-prompting, context engineering and spec-driven dev system

https://github.com/gsd-build/get-shit-done
I was using this and superpowers but eventually, Plan mode became enough and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved but they burn 10x more tokens, in my experience. I was always hitting the Max plan limits for no discernable benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.
I ended up grafting the brainstorm, design, and implementation planning skills from Superpowers onto a Ralph-based implementation layer that doesn't ask for my input once the implementation plan is complete. I have to run it in a Docker sandbox because of the dangerously set permissions but that is probably a good idea anyway.

It's working, and I'm enjoying how productive it is, but it feels like a step on a journey rather than the actual destination. I'm looking forward to seeing where this journey ends up.

I use GitHub Copilot and unfortunately there has been a weird regression in the bundled Plan mode. When they added the new plan memory, it suddenly started getting both VERY verbose in the plan output and vague in the details. It adds a lot of steps like "design" and "figure out", and it railroads you into implementation without asking follow-up questions.
Just tried GSD and Plan Mode on the same exact task (prompt in an MD file). Plan Mode had a plan and then base implementation in twenty minutes. GSD ran for hours to achieve the same thing.

I reviewed the code from both and the GSD code was definitely written with the rest of the project and possibilities in mind, while the Claude Plan was just enough for the MVP.

I can see both having their pros and cons depending on your workflow and size of the task.

I've gone the other way recently, shifting from pure plan mode to superpowers. I was reminded of it due to the announcement of the latest version.

It is perhaps confirmation bias on my part but I've been finding it's doing a better job with similar problems than I was getting with base plan mode. I've been attributing this to its multiple layers of cross checks and self-reviews. Yes, I could do that by hand of course, but I find superpowers is automating what I was already trying to accomplish in this regard.

I've played around a bit with the plugins and as you've said, plan mode really handles things fine for the most part. I've got various workflows I run through in Claude and I've found that having CC create custom skills/agents for them gets me 80% of the way there. It's also nice that letting the Claude file refer to them, rather than trying to define entire workflows within it, goes a long way. It'll still forget things here and there, leading to wasted tokens as it realizes it's being dumb and corrects itself, but nothing too crazy. At least, it's more than enough to let me continue using it naturally rather than memorizing a million slash commands to invoke manually.
I have been using superpowers for Gryph development for a while. Love the brainstorming and exploration that it brings in. Haven’t really compared token usage, but it's something on my list.
Same experience. Superpowers are a little too overzealous at times. For coding especially I don’t like seeing a comprehensive design spec written (good) and then turning that into effectively the same doc but macro expanded to become a complete implementation with the literal code for the entire thing in a second doc (bad). Even for trivial changes I’d end up with a good and succinct -design.md, then an -implementation.md, then end with a swarm of sub agents getting into races while more or less just grabbing a block from the implementation file and writing it.

A mess. I still enjoy superpowers brainstorming but will pull the chute towards the end and then deliver myself.

Why are we using cli wrappers if you're using Claude Code? I get if you need something like Codex but they released sub agents today so maybe not even that, but it's an unnecessary wrapper for Claude Code.
What's happening with the other 90%?
I have an AI system I use. I'd like to release it so others can benefit, but at the same time it's all custom to me and to what I do and work on.

If I fork out a version for others that is public, then I have to maintain that variation as well.

Is anyone in a similar situation? I think most of the ones I see released are not particularly complex compared to my system, but at the same time I don't know how to convey how to use my system as someone who just uses it alone.

It feels like I don't want anyone to run my system; I just want people to point their AI system at mine and ask it what is valuable there to potentially add to their own system.

I don't want to maintain one for people. I don't want to market it as some magic cure. Just show patterns that others can use.

You don't have to maintain it. Especially in the age of AI, just giving people inspiration and something to vibe from is more than sufficient and appreciated.
I've had a good experience with https://github.com/obra/superpowers. At first glance this looks similar. Has anyone tried both who can offer a comparison?
I've used both. From my experience, gsd is a highly overengineered piece of software that unfortunately does not get shit done, burns limits, and takes ages while doing so. Quick mode does not really help because it kills the point of gsd: you can't build full software on ad-hocs. I've used plain markdown planning before, but it was limiting and not very stable. Superpowers looks like a good middle ground.
I tried Superpowers for my current project - migrating my blog from Hugo to Astro (with AstroPaper theme). I wrote the main spec in two ways - 1) my usual method of starting with a small list of what I want in the new blog and working with the agent to expand on it, ask questions and so on (aka Collaborative Spec) and 2) asked Superpowers to write the spec and plan. I did both from the working directory of my blog's repo so that the agent has full access to the code and the content.

My findings:

1. The spec created by Superpowers was very detailed (described the specific fonts, color palette), included the exact content of config files, commit messages etc. But it missed a lot of things like analytics, RSS feed etc.

2. Superpowers wrote the spec and plan as two separate documents which was better than the collaborative method, which put both into one document.

3. Superpowers recommended an in-place migration of the blog whereas the collaborative spec suggested a parallel branch so that Hugo and Astro can co-exist until everything is stable.

And a few more differences, written up in [0].

In general, I liked the aspect of developing the spec through discussion rather than one-shotting it, it let me add things to the spec as I remember them. It felt like a more iterative discovery process vs. you need to get everything right the first time. That might just be a personal preference though.

At the end of this exercise, I asked Claude to review both specs in detail, it found a few things that both specs missed (SEO, rollback plan etc.) and made a final spec that consolidates everything.

[0] https://annjose.com/redesign/#two-specs-one-project

It's one of those things where having a structure is really helpful - I've used some similar prompt scaffolds, and the difference is very noticeable.

Another great technique is to use one of these structures in a repo, then task your AI with overhauling the framework using best practices for whatever your target project is. It works great for creative writing, humanizing, songwriting, technical/scientific domains, and so on. In conjunction with agents, these are excellent to have.

I think they're going to be a temporary thing - a hack that boosts utility for a few model releases until there's sufficient successful use cases in the training data that models can just do this sort of thing really well without all the extra prompting.

These are fun to use.

I've tried both. Each has pros and cons. Two things I don't like about superpowers: first, it writes all the code into the implementation plan at the plan step, and the subagents then basically just rewrite that code back into the files. Second, I have to ask Claude to create a progress.md file to track progress if I want to work across multiple sessions. GSD pretty much solved these problems for me, but the downside of GSD is that it takes too many turns to get something done.
I don't get why people need a cli wrapper for this. Can't you just use Claude skills and create everything you need?
Yes, and IMO Superpowers is better when you want to Get Not-Shit Done.

Get Shit Done is best when you're an influencer and need to create a Potemkin SaaS overnight for tomorrow's TikTok posts.

I've been using GSD extensively over the past 3 months. I previously used speckit, which I found lacking. GSD consistently gets me 95% of the way there on complex tasks. That's amazing. The last 5% is mostly "manual" testing. We've used GSD to build and launch a SaaS product including an agent-first CMS (whiteboar.it).

It's hard to say why GSD worked so much better for us than other similar frameworks, because the underlying models also improved considerably during the same period. What is clear is that it's a huge productivity boost over vanilla Claude Code.

Same. Have had great results with it. I got sick of paying FreshBooks monthly for basic income/expense tracking for Schedule C reporting and used GSD to build a macOS Swift app with Codex 5.4 and Opus 4.6. It’s working great and I am considering releasing it on the App Store. It started as a web app, but then I wanted screen capture from other windows for receipts in email or whatever. Then I wanted physical receipts, and so used Apple continuity camera. All working now in my app. And I just added receipt auto-extract to pull salient info from receipts and determine the deduction category using the Anthropic API.

Yes this is how much paying FreshBooks annoyed me. Plus I hated they forced an emailed 2FA if you didn’t connect with Google.

I tried it once; it was incredibly verbose, generating an insane amount of files. I stopped using it because I was worried it would not be possible to rapidly, cheaply, and robustly update things as interaction with users generated new requirements.

The best way I have today is to start with a project requirements document and then ask for a step-by-step implementation plan, and then go do the thing at each step but only after I greenlight the strategy of the current step. I also specify minimal, modular, and functional stateless code.

I've compared this to superpowers and the classic prd->task generator. And I came away convinced that less is more. At least at the moment. gsd performed well, but took hours instead of minutes. Having a simple explanation of how to create a PRD followed by a slightly more technical task list performed much better. It wasn't that gsd or superpowers couldn't find a solution, it's just that they did it much slower and with a lot more help. For me, the lesson was that the workflow has changed, and that we can't apply old project-dev paradigms to this new/alien technology. There's a new instruction manual and it doesn't build on the old one.
I like openspec, it lets you tune the workflow to your liking and doesn’t get in the way.

I started with all the standard spec flow and as I got more confident and opinionated I simplified it to my liking.

I think the point of any spec driven framework is that you want to eventually own the workflow yourself, so that you can constrain code generation on your own terms.

I also like openspec.

I think these types of systems (gsd/superpowers) are way too opinionated.

It's not that they can't or don't work. I just think that the best way to truly stay on top of the crazy pace of changes is to not attach yourself to super opinionated workflows like these.

I'm building an orchestrator library on top of openspec for that reason.

I use openspec and love it. I’m doing 5-7x with close to 100% of code AI generated, and shipping to production multiple times a day. I work on a large sass app with hundreds of customers. Wrote something here:

https://zarar.dev/spec-driven-development-from-vibe-coding-t...

>large sass app

>hundreds of customers

I tried this for a week and gave up. Required far too much back and forth. Ate too many tokens, and required too much human in the loop.

For this reason I don’t think it’s actually a good name. It should be called planning-shit instead. Since that’s seemingly 80%+ of what I did while interacting with this tool. And when it came to getting things done, I didn’t need this at all, and the plans were just alright.

I gave it a shot, but won't be using it going forward. It requires a waterfall process. And I found it difficult, and in some cases impossible, to adjust phases/plans when bugs or feature changes arose. The execution prompts didn't do a good job of steering the code to be verified while coding, and relied on the user to manually test at the end of each phase.
I built a similar system myself, then ran evals on it and found that the planning ceremony is mostly useless. Claude can deal with simple prose, item lists, checkbox todos; anything works. The agent won't be a better coder because of how you deliver your intent.

But what makes a difference is running plan-review and work-review agents; they fix issues before and after work. Both pull their weight, but the most surprising is the plan-review one. The work-review judge reliably finds bugs to fix, but its insights are less surprising. They should run as separate subagents, not in the main one, because they need a fresh perspective.

Other things that matter are 1. testing enforcement, 2. cross task project memory. My implementation for memory is a combination of capturing user messages with a hook, append only log, and keeping a compressed memory state of the project, which gets read before work and updated after each task.
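
The memory pattern described above could look roughly like the sketch below. It assumes a JSONL file for the append-only log and a small JSON file for the compressed state. The file names and the functions `record_message`, `load_state`, and `update_state` are hypothetical, and the "compression" here is only a stand-in (keep the most recent task summaries), since the thread does not say how the state is actually compressed.

```python
import json
from pathlib import Path

LOG_PATH = Path("memory_log.jsonl")     # hypothetical append-only log
STATE_PATH = Path("memory_state.json")  # hypothetical compressed project state


def record_message(text: str, log_path: Path = LOG_PATH) -> None:
    """Append one user message to the log; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"message": text}) + "\n")


def load_state(state_path: Path = STATE_PATH) -> dict:
    """Read the compressed state before starting work on a task."""
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    return {"summaries": []}


def update_state(summary: str, state_path: Path = STATE_PATH, keep: int = 5) -> dict:
    """After a task, add its summary and keep only the most recent `keep` entries."""
    state = load_state(state_path)
    state["summaries"] = (state["summaries"] + [summary])[-keep:]
    state_path.write_text(json.dumps(state), encoding="utf-8")
    return state
```

In this sketch a hook would call `record_message` on every user message, and the agent would call `load_state` before a task and `update_state` after it.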

If you want some context about spec-driven development and how it could be used with LLMs I recommend [1]. Having some background like this helps me understand tools like this a bit more.

[1] https://www.riaanzoetmulder.com/articles/ai-assisted-program...

GSD has a reputation for being a token burner compared to something like Superpowers. Has that changed lately? Always open to revisiting things as they improve.
> If you know clearly what you want

This is the real challenge. The people I know that jump around to new tools have a tough time explaining what they want, and thus how new tool is better than last tool.

What do you think drives the tooling ecosystem aside from VC dollars?
Apart from GSD and superpowers, there's another system, called PAUL [1]. It apparently requires fewer tokens compared to GSD, as it does not use subagents, but keeps all in one session. A detailed comparison with GSD is part of the repo [2].

[1] https://github.com/ChristopherKahler/paul

[2] https://github.com/ChristopherKahler/paul/blob/main/PAUL-VS-...

Would love to migrate from GSD and try it, if there is a community around it.
I think the research / plan / execute idea is good but feels like you would be outsourcing your thinking. Gotta review the plan and spend your own thinking tokens!
I could not produce useful output from this. It was useful as a rubber duck because it asks good motivating questions during the plan phase, but the actual implementation was lacklustre and not worth the effort. In the end, I just have Claude Opus create plans, and then I have it write them to memory and update it as it goes along and the output is better.
No brother, the Claude plans aren't the right path, they're for hobbyists.
There should be an "Examples" section in projects like this one to show what has actually been made using it. I scrolled to the end and was really expecting an example the way it's being advertised.

If it were a game engine or a new web framework, for example, there would be demos or example projects linked somewhere.

I'm curious if anyone has used this (or similar) to build a production system?

I'm facing increasing pressure from senior executives who think we can avoid the $$$ B2B SaaS by using AI to vibe code a custom solution. I love the idea of experimenting with this but am horrified by the first-ever-case being a production system that is critical to the annual strategic plan. :-/

I would love to take up that challenge. With what I have learnt so far, I am raring to get opportunities to make custom solutions.
How come we have all these benchmarks for models, but none whatsoever for harnesses / whatever you'd call this? While I understand assigning "scores" is more nuanced, I'd love to see some website that has a catalog of prompts and outputs as produced with different configurations of model+harness in a single attempt.
250K lines in a month — okay, but what does review actually look like at that volume?

I've been poking at security issues in AI-generated repos and it's the same thing: more generation means less review. Not just logic — checking what's in your .env, whether API routes have auth middleware, whether debug endpoints made it to prod.

You can move that fast. But "review" means something different now. Humans make human mistakes. AI writes clean-looking code that ships with hardcoded credentials because some template had them and nobody caught it.

All these frameworks are racing to generate faster. Nobody's solving the verification side at that speed.
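
One of the checks listed above, hardcoded credentials, can be partly automated. The sketch below is illustrative only: the regex and the function name `find_hardcoded_secrets` are assumptions, not a tool mentioned in the thread, and a pattern this simple will miss many cases and flag some harmless ones.

```python
import re

# Illustrative pattern only: a secret-looking name assigned a quoted literal.
SECRET_ASSIGNMENT = re.compile(
    r"""(?ix)
    \b(\w*(?:password|secret|api_key|token)\w*)   # secret-looking variable name
    \s*[:=]\s*
    (["'])([^"']{8,})\2                           # quoted literal, 8 or more chars
    """
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, variable_name) for each suspicious assignment."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

A value read from the environment, such as `api_key = os.environ["API_KEY"]`, is not flagged, because the right-hand side is not a quoted literal.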

Code is a cost. It seems everyone's forgotten.

Saying "I generated 250k lines" is like saying "I used 2500 gallons of gas". Cool, nice expense, but where did you get? Because if it's three miles, you're just burning money.

250k lines is roughly SQLite or Redis in project size. Do you have SQLite-maintaining money? Did you get as far as Redis did in outcomes?

I agree with this to some degree. Agents often stub and take shortcuts during implementation. I've been working on this problem a little bit with open-artisan which I published yesterday (https://github.com/yehudacohen/open-artisan).

Rather than having agents decide to manage their own code lifecycle, define a state machine where code moves from agent to agent and isolated agents critique each other's code until the code produced is of excellent quality.

This is still a bit of a token-hungry solution, but it seems to be working reasonably well so far and I'm actively refining it as I build.

Not going to give you formal verification, but might be worth looking into strategies like this.
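
A minimal sketch of the state-machine idea, under stated assumptions: the states, the `run_pipeline` function, and the three callables standing in for agents are all hypothetical and are not taken from open-artisan. The point it illustrates is that the reviewer receives only the spec and the code, and that the loop is bounded.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    DRAFT = auto()
    REVIEW = auto()
    REVISE = auto()
    ACCEPTED = auto()
    REJECTED = auto()


def run_pipeline(
    write: Callable[[str], str],                # author agent: spec -> code
    critique: Callable[[str, str], list[str]],  # reviewer agent: (spec, code) -> issues
    revise: Callable[[str, list[str]], str],    # author agent: (code, issues) -> code
    spec: str,
    max_rounds: int = 3,
) -> tuple[State, str]:
    """Move one change through the states until the reviewer has no issues."""
    state, code, issues, rounds = State.DRAFT, "", [], 0
    while state not in (State.ACCEPTED, State.REJECTED):
        if state is State.DRAFT:
            code, state = write(spec), State.REVIEW
        elif state is State.REVIEW:
            issues = critique(spec, code)  # reviewer sees only spec and code
            if not issues:
                state = State.ACCEPTED
            elif rounds >= max_rounds:
                state = State.REJECTED     # stop instead of looping forever
            else:
                state = State.REVISE
        elif state is State.REVISE:
            code, rounds, state = revise(code, issues), rounds + 1, State.REVIEW
    return state, code
```

The `max_rounds` cap is one way to bound token spend; each extra review round costs another reviewer call and another revision.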

I have been ~obsessed~ with exactly this problem lately.

We built AI code generation tools, and suddenly the bottleneck became code review. People built AI code reviewers, but none of the ones I've tried are all that useful - usually, by the time the code hits a PR, the issues are so large that an AI reviewer is too late.

I think the solution is to push review closer to the point of code generation, catch any issues early, and course-correct appropriately, rather than waiting until an entire change has been vibe-coded.

I've been trying to beat this drum for a minute now. Your code quality is a function of validation time, and you have a finite amount of that which isn't increased by better orchestration.

My rant about this: https://sibylline.dev/articles/2026-01-27-stop-orchestrating...

You can use AI to audit and review. You can put constraints in place so that credentials never hit disk. In my case, AI uses sed to read my env files, so the credentials don't even show up in the chat.
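
The comment above uses sed for this; the sketch below is a Python stand-in for the same idea, not the commenter's command. The function name `redact_env` and the `<redacted>` placeholder are assumptions. It keeps keys visible so an agent can see which variables exist without seeing their values.

```python
def redact_env(text: str) -> str:
    """Return .env content with every value replaced by a placeholder."""
    redacted = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comments are passed through; this assumes that
        # comments in the file do not themselves contain secrets.
        if not stripped or stripped.startswith("#"):
            redacted.append(line)
            continue
        # A line with no "=" is not a normal assignment, so hide it entirely.
        if "=" not in stripped:
            redacted.append("<redacted>")
            continue
        key, _, _value = stripped.partition("=")
        redacted.append(f"{key.strip()}=<redacted>")
    return "\n".join(redacted)
```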

Things have changed quite a bit. I hope you give GSD a try yourself.

In my experience the issue is that when the same agent writes and reviews its own code it'll always think it's fine. I've been running a setup where the coder and reviewer are completely separate: different models, and the reviewer doesn't see any of the coder's context, just the spec and final output. It catches way more stuff than I expected, honestly.
I've tried it, and I'm not convinced I got measurably better results than just prompting claude code directly.

It absolutely tore through tokens though. I don't normally hit my session limits, but hit the 5-hour limits in ~30 minutes and my weekly limits by Tuesday with GSD.

Same experience on multiple occasions.
With the coding slot machine, I prefer to move fast and start over if anything goes off track. Maybe the number of tokens spent over several iterations is similar to using a more carefully planned system like GSD.
This looks like moving context from prompts into files and workflows.

Makes sense for consistency, but also shifts the problem:

how do you keep those artifacts in sync with the actual codebase over time?

it is very hard for me to take seriously any system that is not proven for shipping production code in complex codebases that have been around for a while.

I've been down the "don't read the code" path and I can say it leads nowhere good.

I am perhaps talking my own book here, but I'd like to see more tools that brag about "shipped N real features to production" or "solved Y problem in large-10-year-old-codebase"

I'm not saying that coding agents can't do these things and such tools don't exist, I'm just afraid that counting 100k+ LOC that the author didn't read kind of fuels the "this is all hype-slop" argument rather than helping people discover the ways that coding agents can solve real and valuable problems.

I’ve tried GSD several times. I actually like the verbosity and it’s a simple chore for Claude to refresh project docs from GSD planning docs.

Like most spec driven development tools, GSD works well for greenfield or first few rounds of “compound engineering.” However, like all others, the project gets too big and GSD can’t manage to deliver working code reliably.

Agents working GSD plans will start leaving orphans all over, it won’t wire them up properly because verification stages use simple lexical tools to search code for implementation facts. I tried giving GSD some ast aware tools but good luck getting Claude to reliably use them.

Ultimately I put GSD back on the shelf and developed my own “property graph” based planner that is closer to Claude “plan mode” but the design SOT is structured properties and not markdown. My system will generate docs from the graph as user docs. Agents only get tasked as my “graph” closes nodes and re-sorts around invariants, then agents are tasked directly.

Can you expand on that at all (or point to some reading on how Claude plan mode works etc?)

I think I have to get my head around a lot more than I thought.

"I am a super productive person that just wants to get shit done"

Looked at profile, hasn't done or published anything interesting other than promoting products to "get stuff done"

This is like the TODO list book gurus writing about productivity

Looking for 5 seconds at the github profile I see a bunch of music-related stuff, and also a bunch of contributions to private repos that we have no idea what they are. I get the productivity guru anti-pattern, but I honestly don't know what you're looking at that merits this kind of reflexive personal attack.
At the risk of sounding stupid what does the author mean by: “I’m not a 50-person software company. I don’t want to play enterprise theatre.” ?
Seems fairly obvious: Some agent harnesses play enterprise theater by creating jira-type tickets for you and moving them around silly swim lanes, instead of, of course, just simply getting sh!t done.
No idea but doesn’t it sound GREAT and filled with portentous meaning? Don’t be an enterprise clown! Be a gutsy hustle guy like me! Down with enterprise theatre, long live the vibe jam!
The author of that page seems to mostly be AI, not a human.
With GSD, I was able to write 250K lines of code in less than a month, without prior knowledge of claude.
That sounds awful.

I got a promotion once for deleting 250K lines of code in less than a month. Now that sounds better

I could copy 250k lines from github.

Faster than using ai. Cheaper. Code is better tested/more secure. I can learn/build with other humans.

250K? Could you expand your experience with details about your project and the lessons and issues you found?
yes vibecoding is fun.
The README recommends --dangerously-skip-permissions as the intended workflow. Looking at gsd-executor.md you can see why — subagents run node gsd-tools.cjs, git checkout -b, eslint, test runners, all generated dynamically by the planner. Approving each one kills autonomous mode.

There is a gsd-plan-checker that runs before execution, but it only verifies logical completeness — requirement coverage, dependency graphs, context budget. It never looks at what commands will actually run. So if the planner generates something destructive, the plan-checker won't catch it because that's not what it checks for. The gsd-verifier runs after execution, checking whether the goal was achieved, not whether anything bad happened along the way. In /gsd:autonomous this chains across all remaining phases unattended.

The granular permissions fallback in the README only covers safe reads and git ops — but the executor needs way more than that to actually function. Feels like there should be a permission profile scoped to what GSD actually needs without going full skip.
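
As a rough illustration of what a scoped profile might check, the sketch below allowlists command prefixes and rejects anything containing shell operators. This is a plain Python check, not Claude Code's permission configuration format and not part of GSD. The allowlist entries are assumptions based on the commands named above, with `npm test` standing in for "test runners" and `git status` added as an example of a safe read.

```python
import shlex

# Hypothetical allowlist; a real profile would be derived from what the
# executor actually runs.
ALLOWED_PREFIXES = [
    ["node", "gsd-tools.cjs"],
    ["git", "checkout", "-b"],
    ["git", "status"],
    ["eslint"],
    ["npm", "test"],
]

# Conservative: any of these characters could chain or redirect commands.
FORBIDDEN_CHARS = set(";|&<>$`\n")


def is_allowed(command: str) -> bool:
    """True if the command starts with an allowlisted prefix and has no shell operators."""
    if FORBIDDEN_CHARS & set(command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes
    if not tokens:
        return False
    return any(tokens[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)
```

Rejecting every command that contains an operator character is deliberately strict: it also blocks harmless quoted arguments, which seems an acceptable trade for an unattended run.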

Did anyone compare it with everything-claude-code (ECC)?
I've tried several of these sorts of things, and I keep coming away with the feeling that they are a lot of ceremony and complication for not much value. I appreciate that people are experimenting with how to work with AI and get actual value, but I think pretty much all of these approaches are adding complexity without much, or often any, gain.

That's not a reason to stop trying. This is the iterative process of figuring out what works.

Another heavily overengineered AND underengineered abomination. I'm convinced anyone who advocates for these types of tools would find just as much success just prompting claude code normally and taking a little bit to plan first. Such a waste of time to bother with these tools that solve a problem that never existed in the first place.
This seems like something I'd want to try but I am wholly opposed to `npx` being the sole installation mechanism. Let me install it as a plugin in Claude Code. I don't want `npx` to stomp all over my home directory / system configuration for this, or auto-find directories or anything like that.
I use Oh-My-Opencode (Now called Oh-My-OpenAgent), but it's effectively the same as GSD, but better imo
For me it was awesome. I needed a custom Pipeline for Preprocessing some Lab Data, including Visualization and Manipulation and it got me exactly what I wanted, as opposed to Codex Plan Mode, which just burned my weekly quota and produced Garbage
I honestly tried this a while back. Unless this is something else, it was not very useful at all.

If I remember correctly, it created a lot of changes and spent a lot of time doing something, and in the end it was all smoke and mirrors. If I ever used something like this, I would maybe use BMad, which suffers from the same issues, as do Speckit and others.

I don't know if they have some sponsorship with a bunch of YouTubers who are raving about how awesome this is... without any supporting evidence.

Anyhow, this is my experience. Superpowers, on the other hand, has been quite useful so far, but I haven't used it enough to claim anything.

terrible name, DOA
Nah it's entirely possible a project with a name like this starts to get traction and then changes its name to Get Stuff Done to go mainstream. Honestly it could be an asset to getting traction with a "move fast and break things" audience. It adds texture and a name change adds lore.
The whole gsd/agents folder is hilarious. Like a bunch of MD that never breaks. How do you know it is minimally correct? Subjective prose. Sad to see this on the front page.