Story Detail of id 47685774 | Liveview Hacker News

gertlabs 10 hours ago | on: GLM-5.1: Towards Long-Horizon Tasks

Interesting idea. The metric I'd intuitively want to see is low variance between harnesses for a smarter model. But if a large sample of models performed statistically better with a certain harness, that's indeed a valuable signal for a developer.
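A rough sketch of what those two measurements could look like, in Python. Everything here is an assumption for illustration: the model and harness names are hypothetical, the scores are made-up numbers (not benchmark results), and the helper `harness_win_counts` is not from any real evaluation tool.

```python
from statistics import pvariance

# Hypothetical data, invented for illustration only:
# scores[model][harness] = task success rate in [0, 1].
scores = {
    "model_a": {"harness_x": 0.62, "harness_y": 0.60, "harness_z": 0.61},
    "model_b": {"harness_x": 0.48, "harness_y": 0.35, "harness_z": 0.41},
    "model_c": {"harness_x": 0.55, "harness_y": 0.44, "harness_z": 0.50},
}


def harness_win_counts(scores):
    """Count, per harness, how many models achieved their best score with it."""
    wins = {}
    for per_harness in scores.values():
        best = max(per_harness, key=per_harness.get)
        wins[best] = wins.get(best, 0) + 1
    return wins


# Per-model spread across harnesses: lower means the model is less
# sensitive to which harness it runs in.
variance_by_model = {
    model: pvariance(list(per_harness.values()))
    for model, per_harness in scores.items()
}

wins = harness_win_counts(scores)

for model, var in sorted(variance_by_model.items(), key=lambda kv: kv[1]):
    print(f"{model}: variance across harnesses = {var:.5f}")
print("best-harness counts:", wins)
```

A raw win count like this is only a starting point. With a genuinely large sample of models, a proper significance test would be needed before calling one harness statistically better.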
