Story Detail of id 48313098 | Liveview Hacker News

ethanpil 22 hours ago | on: Claude Opus 4.8

The table comparing eval scores shows the following:

Agentic Terminal Coding (Terminal-Bench 2.1): Opus 4.8 74.6%, GPT 5.5 78.2%

Then, when you scroll all the way down to the Footnotes section at the bottom, it says:

"Terminal-Bench 2.1: We reported scores for all models using the Terminus-2 public harness. GPT-5.5’s reported score with the Codex CLI harness is 83.4%."

fastball 20 hours ago | parent

Seems reasonable? Presumably Claude also performs better under the Claude Code harness.

ethanpil 15 hours ago | root | parent

Why not state that?

