Executing programs inside transformers with exponentially faster inference

https://www.percepta.ai/blog/can-llms-be-computers

261u1hcw9nx | 1 day ago | 99 | HN

This seems way cooler than just computation (which is easy to hand off to a tool, and arguably more predictable that way). The broader point here is that you can have your model switch dynamically to/from a kind of attention that scales with the log of the token count, by only exploring the convex hull in a 2D space. A less capable version of attention, to be sure, but one capable of tracing a program’s execution with text representations of registers and stack - which is a meaningful level of flexibility, and one many humans would find difficult to do reliably!

What could you do with an LLM that can go into “focus mode” and generate tokens extremely rapidly? How much more powerful would a reasoning-token-generation phase be that can explore and cull large numbers of paths/hypotheses, so long as they are well defined? Does this have implications for multi-modal models and spatial reasoning?

As the paper suggests:

> These models could be useful in several modes: as a dedicated fast path paired with a slower, more general model; as part of a fast/slow hybrid architecture inside a single system; or as a speculative execution model that proposes tokens quickly while a regular-attention model verifies and accepts them. Regardless of their eventual capability ceiling, they already suggest a powerful systems primitive for speeding up larger models.

loading story #47366572

loading story #47366288

loading story #47365994

loading story #47365256

andy12_1 day ago | parent | next

This seems a really interesting path for interpretability, specially if a big chunk of a model's behavior occurs pseudo-symbolically. This is an idea I had thought about, integrating tools into the main computation path of a model, but I never imagined that it could be done efficiently with just a vanilla transformer.

Truly, attention is all you need (I guess).

loading story #47364614

bonoboTP10 hours ago | parent | next

This shows the downside of using AI to write up your project. I see the eloquent sentences, but don't get the message.

> This works, but the actual execution happened outside the model. The model specified the computation, then waited for an external system to carry it out. > Our transformer also emits a program, but instead of pausing for an external tool, it executes that program itself, step by step, within the same transformer.

What's the benefit? Is it speed? Where are the benchmarks? Is it that you can backprop through this computation? Do you do so?

Why is it good that it's "inside" the model? Just making it more elegant and nice? The tool was already "inside" the overall hybrid system. What's the actual problem?

loading story #47362148

loading story #47363106

loading story #47362814

loading story #47362753

loading story #47362763

loading story #47362181

koolala11 hours ago | parent | next

I'd like to see this combined with reinforcement learning to optimize models to think computationally. Generating ideas with hypothetical results and then running them in the same thought. Their solution sounded like a lot of tokens though.

loading story #47363421

MattPalmer108610 hours ago | parent | next

Interesting... But why? What is the benefit, other than increasing our understanding of model architectures?

Our brains can also simulate turing machines, slowly. We automated that with computers that are faster and more reliable. So why not allow a model to use external much faster and reliable tools, just as we do?

loading story #47362909

loading story #47362540

loading story #47364138

loading story #47366526

deviation10 hours ago | parent | next

I really liked the article, but food for thought: is a transformer that offloads computation to python really that different from Python code being read and then executed by a compiler?

Both examples are of a system we created to abstract most of the hard work.

I think a more important concept here is that the term "AI" has a lot of built-in assumptions, one of which being that it is (or will be) super intelligent, and so folks like the author here think (correctly) that it's important for the AI to be actually doing the work itself.

pennomi1 day ago | parent | next

It makes sense that a next token predictor could execute assembly code. This is fascinating work, especially with the memory implementation.

loading story #47366319

mirekrusin10 hours ago | parent | next

This is brilliant, game changing level.

Hey, give it also access to the dump of its weights and way to propose updates so it can see and tinker its brain directly.

loading story #47365280

loading story #47365502

galsapir1 day ago | parent | next

one of the most interesting pieces I've read recently. Not sure I agree with all the statements there (e.g. without execution the system has no comprehension) - but extremely cool

loading story #47366292

loading story #47364950

loading story #47363943

loading story #47364211

loading story #47364003

loading story #47365068

plaidfuji8 hours ago | parent | next

Besides being a very interesting conceptual exercise, the animated figures in this article are absolutely stunning - best I’ve ever seen.

refulgentis3 hours ago | parent | next

This sounds so cool but I can’t tell if it’s a practical joke, even after sitting on it for 2-3 hours. Key points where I lose understanding/trust are when a WASM interpreter suddenly appears in the model, and when we’re representing code in weights.

It is unclear to me how this WASM interpreter is / could be deterministic.

behehebd11 hours ago | parent | next

Is this genius? Or just a new binary executable format? Can't tell.

loading story #47363727

yalok9 hours ago | parent | next

very cool idea. But, time savings are not true for every tool call, and it's not clear to me yet whether this is batch-able; also, intuitively, for most of the models that run on GPU, you'd still want to offload tool exec part to CPU since it's much cheaper...

RagnarD8 hours ago | parent | next

This is really important work.

rebolek8 hours ago | parent | next

The original title is "Can LLMs be computers?"

But the right question is, should they?

loading story #47366789

TedHerman8 hours ago | parent | next

This looks like a hack. Yes, being able to interpret webassembly is a general oracle. Still falls short of solving the real problem directly.

loading story #47364620

ndxone10 hours ago | parent | next

big question is how efficient is this compare to executing assembly on CPU

loading story #47363299