Interesting idea. The metric I'd intuitively want to see is low variance between harnesses for a smarter model. But if a large sample of models performed statistically better with a particular harness, that would indeed be a valuable signal for a developer.
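To make the two signals concrete, here's a rough sketch of how I'd compute them. Everything in it is an assumption for illustration: the model and harness names are hypothetical, the scores are made-up placeholders rather than real benchmark results, and the helper names `variance_across_harnesses` and `harness_win_counts` are my own.

```python
from statistics import pvariance

# Hypothetical data: scores[model][harness] = pass rate.
# These numbers are invented placeholders, not measurements.
scores = {
    "model_a": {"harness_x": 0.62, "harness_y": 0.48, "harness_z": 0.55},
    "model_b": {"harness_x": 0.71, "harness_y": 0.66, "harness_z": 0.69},
    "model_c": {"harness_x": 0.81, "harness_y": 0.79, "harness_z": 0.80},
}


def variance_across_harnesses(scores):
    """Per-model population variance of the score across harnesses.

    A low value means the model performs about the same no matter
    which harness it is run in.
    """
    return {
        model: pvariance(list(by_harness.values()))
        for model, by_harness in scores.items()
    }


def harness_win_counts(scores):
    """Count, per harness, how many models scored highest with it."""
    wins = {}
    for by_harness in scores.values():
        best = max(by_harness, key=by_harness.get)
        wins[best] = wins.get(best, 0) + 1
    return wins


variances = variance_across_harnesses(scores)
wins = harness_win_counts(scores)

for model, var in variances.items():
    print(f"{model}: variance across harnesses = {var:.6f}")
print("wins per harness:", wins)
```

The win count is only a crude tally, not a significance test. With a genuinely large sample of models you'd want something like a paired test on per-model score differences before calling one harness better.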