Story Detail of id 48316790 | Liveview Hacker News

mordae 17 hours ago | on: Claude Opus 4.8

This is a terrible benchmark. It literally tests the models on their ability to track shifting line numbers. If they cannot keep up, no amount of abstract reasoning can redeem them.

lordmauve 9 hours ago | parent

Where did you get that idea? It uses mini-swe-agent, same as SWE-Bench.

https://github.com/datacurve-ai/deep-swe

mordae 9 hours ago | root | parent

[flagged]
