"Do we have any standard benchmarks for humanoids to do domestic tasks?" The answer is yes. Steve Wozniak proposed the Coffee Test. See https://www.youtube.com/watch?v=MowergwQR5Y

It's actually very clever. Despite the apparent simplicity, no current model could pass it.

Re your forecasts, I think they are optimistic in terms of timing but not ridiculously so.

I think the Coffee Test for robots will be like the Turing Test for LLMs, which was quietly passed and forgotten somewhere between GPT-3.5 and GPT-4. The real tests are tasks like cooking or plumbing; I expect those to come in 2-3 years.