Executing programs inside transformers with exponentially faster inference

https://www.percepta.ai/blog/can-llms-be-computers
This seems way cooler than just computation (which is easy to hand off to a tool, and arguably more predictable that way). The broader point here is that you can have your model switch dynamically to/from a kind of attention that scales with the log of the token count, by only exploring the convex hull in a 2D space. A less capable version of attention, to be sure, but one capable of tracing a program’s execution with text representations of registers and stack - which is a meaningful level of flexibility, and one many humans would find difficult to do reliably!
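To make "tracing a program's execution with text representations of registers and stack" concrete, here's a toy sketch (my own illustration, not the paper's code): each printed line is a full text snapshot of machine state, and "executing" the program amounts to predicting the next snapshot from the previous one.

```python
# Hypothetical example: a tiny stack machine whose state is serialized
# as text, one snapshot per step. A model in "focus mode" would only
# need to emit the next snapshot given the current one.

PROGRAM = [("push", "a"), ("add", "a", "b"), ("pop", "c")]

def step(state):
    """Apply the instruction at pc to a (pc, regs, stack) snapshot."""
    pc, regs, stack = state
    op = PROGRAM[pc]
    if op[0] == "push":
        stack = stack + [regs[op[1]]]
    elif op[0] == "add":
        regs = {**regs, op[1]: regs[op[1]] + regs[op[2]]}
    elif op[0] == "pop":
        regs = {**regs, op[1]: stack[-1]}
        stack = stack[:-1]
    return (pc + 1, regs, stack)

state = (0, {"a": 2, "b": 3, "c": 0}, [])
while state[0] < len(PROGRAM):
    state = step(state)
    pc, regs, stack = state
    print(f"pc={pc} regs={regs} stack={stack}")
```

Each snapshot depends only on the one before it, which is what makes this tractable for a restricted, cheap form of attention.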

What could you do with an LLM that can go into “focus mode” and generate tokens extremely rapidly? How much more powerful would a reasoning-token-generation phase be that can explore and cull large numbers of paths/hypotheses, so long as they are well defined? Does this have implications for multi-modal models and spatial reasoning?

As the paper suggests:

> These models could be useful in several modes: as a dedicated fast path paired with a slower, more general model; as part of a fast/slow hybrid architecture inside a single system; or as a speculative execution model that proposes tokens quickly while a regular-attention model verifies and accepts them. Regardless of their eventual capability ceiling, they already suggest a powerful systems primitive for speeding up larger models.
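The speculative-execution mode the quote describes follows the usual draft-then-verify pattern. A minimal sketch (toy stand-in functions, not the paper's implementation): the fast model proposes a run of tokens, and the slow model accepts the longest agreeing prefix, falling back to its own token at the first disagreement.

```python
# Hypothetical speculative-decoding loop: `draft` is the cheap fast-path
# model, `verify` the expensive regular-attention model. Both map a
# token context to the next token.

def speculative_step(prompt, draft, verify, k=4):
    # Draft phase: propose k tokens cheaply.
    proposed, ctx = [], list(prompt)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    # Verify phase: accept the longest prefix the verifier agrees with.
    accepted, ctx = [], list(prompt)
    for t in proposed:
        v = verify(ctx)
        if v == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(v)  # fall back to the verifier's token
            break
    return accepted

# Toy models: the verifier counts up; the drafter agrees until step 3.
verify = lambda ctx: len(ctx)
draft = lambda ctx: len(ctx) if len(ctx) < 3 else 99

print(speculative_step([0], draft, verify, k=4))  # → [1, 2, 3]
```

When the drafter is usually right, several tokens are committed per expensive verification pass, which is where the speedup comes from.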

This seems a really interesting path for interpretability, especially if a big chunk of a model's behavior occurs pseudo-symbolically. I'd thought about integrating tools into a model's main computation path, but I never imagined it could be done efficiently with just a vanilla transformer.

Truly, attention is all you need (I guess).

This shows the downside of using AI to write up your project. I see the eloquent sentences, but don't get the message.

> This works, but the actual execution happened outside the model. The model specified the computation, then waited for an external system to carry it out.

> Our transformer also emits a program, but instead of pausing for an external tool, it executes that program itself, step by step, within the same transformer.

What's the benefit? Is it speed? Where are the benchmarks? Is it that you can backprop through this computation? Do you do so?

Why is it good that it's "inside" the model? Just making it more elegant and nice? The tool was already "inside" the overall hybrid system. What's the actual problem?
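For reference, the external-tool loop the quoted passage contrasts with looks roughly like this (a generic sketch with toy stand-ins, not the blog's code): the model only specifies the computation, generation pauses, and an outside interpreter does the actual work.

```python
# Hypothetical tool-call loop: `model` maps text to text, `execute`
# is the external system (here Python's eval as a toy sandbox).

def answer_with_tool(model, execute, question):
    code = model(question)          # model only *specifies* the computation
    result = execute(code)          # the actual work happens outside the model
    return model(f"{question}\nresult: {result}")

# Toy model: emits an expression, then reads the tool result back.
toy_model = lambda s: "2**10" if "result" not in s else s.split("result: ")[1]

print(answer_with_tool(toy_model, eval, "what is 2**10?"))  # → 1024
```

The article's claim is that the second `model` call and the `execute` call can collapse into one forward pass, which is exactly what the parent is asking to see benchmarked.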

I'd like to see this combined with reinforcement learning to optimize models to think computationally. Generating ideas with hypothetical results and then running them in the same thought. Their solution sounded like a lot of tokens though.
Interesting... But why? What is the benefit, other than increasing our understanding of model architectures?

Our brains can also simulate Turing machines, slowly. We automated that with computers that are faster and more reliable. So why not let a model use external tools that are much faster and more reliable, just as we do?

I really liked the article, but food for thought: is a transformer that offloads computation to Python really that different from Python code being read and then executed by an interpreter?

Both examples are of a system we created to abstract most of the hard work.

I think a more important concept here is that the term "AI" carries a lot of built-in assumptions, one of which is that it is (or will be) superintelligent, so folks like the author here think (correctly) that it's important for the AI to actually be doing the work itself.

It makes sense that a next token predictor could execute assembly code. This is fascinating work, especially with the memory implementation.
This is brilliant, game-changing even.

Hey, also give it access to a dump of its weights and a way to propose updates, so it can see and tinker with its brain directly.

One of the most interesting pieces I've read recently. Not sure I agree with all the statements there (e.g. that without execution the system has no comprehension), but extremely cool.
Besides being a very interesting conceptual exercise, the animated figures in this article are absolutely stunning - best I’ve ever seen.
This sounds so cool but I can’t tell if it’s a practical joke, even after sitting on it for 2-3 hours. Key points where I lose understanding/trust are when a WASM interpreter suddenly appears in the model, and when we’re representing code in weights.

It is unclear to me how this WASM interpreter is / could be deterministic.

Is this genius? Or just a new binary executable format? Can't tell.
Very cool idea. But the time savings don't hold for every tool call, and it's not clear to me yet whether this is batchable; also, intuitively, for most models that run on a GPU, you'd still want to offload the tool-execution part to the CPU, since it's much cheaper...
This is really important work.
The original title is "Can LLMs be computers?"

But the right question is, should they?

This looks like a hack. Yes, being able to interpret WebAssembly is a general oracle, but it still falls short of solving the real problem directly.
The big question is how efficient this is compared to executing assembly on a CPU.