Story Detail of id 43116118 | Liveview Hacker News

Hopefully in the next decade we’ll get there.

Vision+language multimodal models seem to solve some of the hard problems.