You are oversimplifying. They do produce one word per cycle. But they can also have context buffers carrying up to two million tokens, which is most definitely larger than your measly human short-term memory context buffers.

You, of course, wouldn't notice if your only experience of LLMs was chatting with the cheapest, smallest, least capable LLMs that you get through ChatGPT, or Google search.

It becomes pretty obvious when you use a coding AI on a daily basis. It is the context buffer in which the magic occurs, not the tokens that get spit out one at a time.

Every day, I watch my coding AI develop plans, search the web a half dozen times for documentation, grep through my entire codebase looking for pieces of related code and context, analyze relevant source code across multiple files, spit out an initial plan for implementing the fix before starting to execute it, run requests through some sort of advanced mathematics tool (they are EXTREMELY good at graduate-level calculus and linear algebra), implement fixes that extend across half a dozen files in two different computer languages (TypeScript and C++), and run trial compiles and fix coding errors in its output, sometimes developing sub-plans to deal with compile errors. I've seen it get halfway through a fix and revise its initial plan mid-flight as it encounters something in existing source code.

Not vibe coding, to be clear. Targeted use of a coding tool by a professional senior software developer with decades of experience, and a fair bit of expertise with the limits of what sort of problems my coding AI can and cannot do. Every line of code reviewed. Sometimes it needs additional prompts, telling it how it mis-implemented something, or specifying more carefully what I actually want but didn't properly express in the initial request.

All the time maintaining that context across multiple requests, so that I don't have to restate requests from scratch.

A particularly interesting revision: "You have misread the equation (13) on page 112 of 'Spice, the Manual 2nd ed.'. It should be ....". (It had previously identified the textbook as a source I was using, from comments in the source, in a preceding request, and had already read the cited pages in the PDF file, which it had found online.) And I had actually asked it to implement equation (13), which was, in fact, badly typeset. The error it had made was defensible, if not the best reading of the equation.

"You are correct. Let me fix that." (producing updates to the implementation of the equation in code, AND to the code that implements the symbolically-differentiated version of that equation 60 lines later, which is not explicitly given in the text). The text says "take the Lagrangian of equations (11), (12) and (13)" or something like that.

ALL of that information gets carried in context buffers, even though it's generating code one word at a time. The bulk of the magic occurs in the context buffer (which, for my coding AI, is I think about 250,000 tokens), not in spitting out words one at a time.

I think it's pretty safe to assume that my coding AI is working out of context buffers that may carry plans and research results consisting of tens or hundreds of thousands of arranged tokens through the multiple steps of the implementation and later revision. None of that would be possible if it were simply working one token ahead.

I kind of suspect that a lot of activity occurs in the first few words of its response: "Let me examine your current source code and develop a plan. Ok. I can see on line 131 where you want me to implement the equation." (An opportunity to perform about 27 updates of the context buffer.) And in the sometimes hundreds of lines of output it generates as it talks itself through what it needs to do.
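To make the "one token out, whole context in" point concrete, here is a toy sketch of the generation loop. It is not how any real coding AI is implemented: `generate`, `scripted_model` and the `<eos>` marker are made-up names, the "model" just replays a fixed script, and whitespace-split words stand in for real tokens (a real tokenizer would give a somewhat different count). The only thing it shows is that every step reads the entire context and then appends one token to it.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker for this toy example


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_new: int,
) -> Tuple[List[str], int]:
    """Toy autoregressive loop: one token out per step, whole context in."""
    context = list(prompt)
    updates = 0
    for _ in range(max_new):
        token = next_token(context)  # the model sees everything so far
        if token == EOS:
            break
        context.append(token)  # the new token becomes part of the context
        updates += 1
    return context, updates


def scripted_model(prompt_len: int, script: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a model: picks the next word from how much it has already said."""
    def next_token(context: List[str]) -> str:
        position = len(context) - prompt_len
        return script[position] if position < len(script) else EOS
    return next_token


OPENING = (
    "Let me examine your current source code and develop a plan. Ok. "
    "I can see on line 131 where you want me to implement the equation."
)

prompt = "implement equation (13)".split()
script = OPENING.split()
context, updates = generate(prompt, scripted_model(len(prompt), script), max_new=100)
print(updates)  # 26 with whitespace splitting
```

With this crude word-level split, the opening sentence alone is 26 context updates, which is in the same ballpark as the "about 27" above.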
