Story Detail of id 43118625 | Liveview Hacker News

porphyra 1 day ago | on: Helix: A vision-language-action model for generalist humanoid control

It seems that end-to-end neural networks for robotics are really taking off. Can someone point me to where I can learn about these, what the state-of-the-art architectures look like, etc.? Do they just convert the video into a stream of tokens, run it through a transformer, and output a stream of tokens?
