Story Detail of id 42000781 | Liveview Hacker News

gpsx1 month ago | on: Chain-of-thought can hurt performance on tasks where thinking makes humans worse

I saw an LLM having this kind of problem when I was doing some testing a ways back. I asked it to order three fruits from largest to smallest. I think it was orange, blueberry and grapefruit. It could do that easily with a simple prompt. When the prompting included something to the effect of “think step by step”, it would try to talk through the problem and it would usually get it wrong.

loading story #42003959

ajuc1 month ago | parent | next

It's not thinking, it compressed the internet into a clever, lossy format with nice interface and it retrieves stuff from there.

Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.

Jerrrrrrry1 month ago | root | parent | next

  >It's not thinking



  >it compressed the internet into a clever, lossy format with nice interface and it retrieves stuff from there.

Humans do both, why can't LLM's?

  >Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.

More like pulling out a deep-fried meme, looking for context, then searching google images until you find the most "original" JPG representation with the least amount of artifacts.

There is more data to add confidently, it just has to re-think about it with a renewed perspective, and an abstracted-away higher-level context/attention mechanism.

danenania1 month ago | root | parent | next

> Chain of thought is like trying to improve JPG quality by re-compressing it several times. If it's not there it's not there.

Empirically speaking, I have a set of evals with an objective pass/fail result and a prompt. I'm doing codegen, so I'm using syntax linting, tests passing, etc. to determine success. With chain-of-thought included in the prompting, the evals pass at a significantly higher rate. A lot of research has been done demonstrating the same in various domains.

If chain-of-thought can't improve quality, how do you explain the empirical results which appear to contradict you?

mplewis1 month ago | root | parent

The empirical results like OP’s paper, in which chain of thought reduces quality?

loading story #42010774

loading story #42007314

loading story #42007169

loading story #42006828

loading story #42007220

loading story #42007723

loading story #42007533

loading story #42004007

#visit	11157937
#session	45005
#live-session	0