I saw an LLM having this kind of problem when I was doing some testing a ways back. I asked it to order three fruits from largest to smallest. I think it was orange, blueberry and grapefruit. It could do that easily with a simple prompt. When the prompting included something to the effect of “think step by step”, it would try to talk through the problem and it would usually get it wrong.
It's not thinking; it compressed the internet into a clever, lossy format with a nice interface, and it retrieves stuff from there.

Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.
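The premise of that analogy can be shown with a toy lossy codec. This is a minimal sketch and an assumption, not JPEG: `quantize` is a hypothetical helper that snaps each value to the nearest multiple of a step size. Once detail has been discarded, running the same lossy step again returns identical output, so nothing is recovered. Real JPEG re-encoding is not exactly idempotent and can add further generation loss. The sketch illustrates only the analogy, not how LLMs or chain of thought actually behave.

```python
def quantize(values: list[int], step: int) -> list[int]:
    """Toy lossy 'compression' (hypothetical stand-in for a codec):
    snap each value to the nearest multiple of step.

    Python's round() uses round-half-to-even, so exact midpoints
    go to the even multiple.
    """
    return [round(v / step) * step for v in values]


if __name__ == "__main__":
    original = [3, 7, 12, 18]
    once = quantize(original, 5)   # first lossy pass discards detail
    twice = quantize(once, 5)      # "re-compressing" the already-lossy data
    print(once)
    print(twice)
    print(once == original)  # the discarded detail does not come back
```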

  >It's not thinking



  >it compressed the internet into a clever, lossy format with nice interface and it retrieves stuff from there.

Humans do both; why can't LLMs?

  >Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.
More like pulling out a deep-fried meme, looking for context, then searching Google Images until you find the most "original" JPG with the fewest artifacts.

There is more data it can add with confidence; it just has to rethink the problem from a fresh perspective, through an abstracted, higher-level context/attention mechanism.

> Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.

Empirically speaking, I have a set of evals with an objective pass/fail result and a prompt. I'm doing codegen, so I'm using syntax linting, tests passing, etc. to determine success. With chain-of-thought included in the prompting, the evals pass at a significantly higher rate. A lot of research has been done demonstrating the same in various domains.
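A pass/fail comparison of this kind might be sketched as follows, under stated assumptions: `syntax_ok` uses Python's `ast.parse` as a stand-in for the syntax-linting step (a real harness would also run the tests), the sample outputs are made up for illustration, and the pooled two-proportion z-test is one common way to check whether a gap in pass rates exceeds chance, not necessarily what the commenter used. With only three cases per variant the statistic means little; real evals need many more.

```python
import ast
import math
from typing import Callable, Sequence


def syntax_ok(code: str) -> bool:
    """Stand-in for a linter: True if the generated code parses as Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def pass_count(outputs: Sequence[str], check: Callable[[str], bool]) -> int:
    """Count how many model outputs pass the objective pass/fail check."""
    return sum(1 for out in outputs if check(out))


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic; positive means variant B passes more often."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants all-pass or all-fail: no observable difference.
        return 0.0
    return (p_b - p_a) / se


if __name__ == "__main__":
    # Hypothetical outputs for the same three prompts, without and with CoT.
    plain = ["def f(:\n    pass", "x = 1", "def g(x) return x"]
    cot = ["def f():\n    pass", "x = 1", "def g(x): return x"]
    a, b = pass_count(plain, syntax_ok), pass_count(cot, syntax_ok)
    print(f"plain: {a}/{len(plain)}, CoT: {b}/{len(cot)}")
    print(f"z = {two_proportion_z(a, len(plain), b, len(cot)):.2f}")
```

Comparing the same prompts under both variants keeps the task set fixed, so any difference in pass rate comes from the prompting rather than from easier or harder cases.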

If chain-of-thought can't improve quality, how do you explain the empirical results which appear to contradict you?

Empirical results like those in OP’s paper, in which chain of thought reduces quality?