Still fails the car wash question. I took the prompt from the title of this thread: https://news.ycombinator.com/item?id=47031580

The answer was "Walk! It would be a bit counterproductive to drive a dirty car 50 meters just to get it washed — you'd barely move before arriving. Walking takes less than a minute, and you can simply drive it through the wash and walk back home afterward."

I've tried several other variants of this question and I got similar failures.

I see a big focus on computer use - you can tell they think there is a lot of value there and in truth it may be as big as coding if they convincingly pull it off.

However, I am still mystified by the safety aspect. They say the model has greatly improved resistance to prompt injection. But their own safety evaluation [1] says that 8% of the time their automated adversarial system was able to one-shot a successful injection takeover, even with safeguards in place and extended thinking, and 50% (!!) of the time if given unbounded attempts. That seems wildly unacceptable - this tech is just a non-starter unless I'm misunderstanding this.

[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...

If the world becomes dependent on computer use, then the AI buildout will be more than validated. That will require all that compute.
How much power did it take to train the models?
Does it matter? How much power does it take to run Duolingo? How much power did it take to manufacture 300,000 Teslas? Everything takes power.
I think it does matter how much power it takes, but in the context of a power-to-"benefits humanity" ratio. Things that significantly reduce human suffering or improve human life are probably worth spending energy on.

However, if we frame the question this way, I would imagine there is much more low-hanging fruit to pick before we question the utility of LLMs. For example, should some humans be dumping 5-10 kWh/day into things like hot tubs or pools? That's just the most absurd one I was able to come up with off the top of my head. I'm sure we could find many others.

It's a tough thought experiment to continue, though. Ultimately, one could argue we shouldn't be spending any more energy than what is absolutely necessary to live (food, minimal shelter, water, etc.). Personally, I would not find that an enjoyable way to live.

The biggest issue is that the US simply Does Not Have Enough Power. We are flying blind into a serious energy crisis because the current administration has an obsession with "clean coal".