How come we have all these benchmarks for models, but none at all for harnesses (or whatever you'd call them)? I understand that assigning "scores" here is more nuanced, but I'd love to see a website with a catalog of prompts and the outputs produced by different model+harness configurations in a single attempt.