Hacker News new | past | comments | ask | show | jobs | submit
> Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.

Empirically speaking, I have a set of evals with an objective pass/fail result and a prompt. I'm doing codegen, so I'm using syntax linting, tests passing, etc. to determine success. With chain-of-thought included in the prompting, the evals pass at a significantly higher rate. A lot of research has been done demonstrating the same in various domains.

If chain-of-thought can't improve quality, how do you explain the empirical results which appear to contradict you?

loading story #42010453