Story Detail of id 48464498 | Liveview Hacker News

yaodub 8 hours ago | on: Claude Fable 5

SWE-Bench measures single tasks in isolation. In a real loop the model usually loses track of what I was trying to do long before code quality becomes the issue.
