Users can interactively explore the microgpt pipeline end to end, from tokenization until inference.
[1] English GPT lab:
Extremely naiive question.. but could LLM output be tagged with some kind of confidence score? Like if I'm asking an LLM some question does it have an internal metric for how confident it is in its output? LLM outputs seem inherently rarely of the form "I'm not really sure, but maybe this XXX" - but I always felt this is baked in the model somehow
E.g. getting two r's in strawberry could very well have a very high "confidence score" while a random but rare correct fact might have a very well a very low one.
In short: LLM have no concept, or even desire to produce of truth
They do produce true statements most of the time, though.
2x the number of lines of code (~400L), 10x the speed
The hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).
And it's small enough to run from a QR code :) https://kuber.studio/picogpt/
You can quite literally train a micro LLM from your phone's browser
- https://m.youtube.com/watch?v=7xTGNNLPyMI - https://m.youtube.com/watch?v=EWvNQjAaOHw
Put it another way: Do you think people will demand masses of _new_ code just because it becomes cheap? I don't think so. It's just not clear what this would mean even 1-3 years from now for software engineering.
This round of LLM driven optimizations is really and purely about building a monopoly on _labor replacement_ (anthropic and openai's code and cowork tools) until there is clear evidence to the contrary: A Jevon's paradoxian massive demand explosion. I don't see that happening for software. If it were true — maybe it will still take a few quarters longer — SaaS companies stocks would go through the roof(i mean they are already tooling up as we speak, SAP is not gonna jus sit on its ass and wait for a garage shop to eat their lunch).
#define a(_)typedef _##t
#define _(_)_##printf
#define x f(i,
#define N f(k,
#define u _Pragma("omp parallel for")f(h,
#define f(u,n)for(I u=0;u<(n);u++)
#define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
_)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
_*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
(*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
"":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
$=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
V)w=k[i]>w?k[$=i]:w;}}> ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.
(Model weights need to be downloaded using an enclosed shell script.)
I'm so happy without seeing Python list comprehensions nowadays.
I don't know why they couldn't go with something like this:
[state_dict.values() for mat for row for p]
or in more difficult cases
[state_dict.values() for mat to mat*2 for row for p to p/2]
I know, I know, different times, but still.
Yes with some extra tricks and tweaks. But the core ideas are all here.
Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.
We’ll need additional breakthroughs in AI.
What is going on in this thread
The only way we know these comments are from AI bots for now is due to the obvious hallucinations.
What happens when the AI improves even more…will HN be filled with bots talking to other bots?
Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.
You know what, I want to believe that's the case.
Beautiful, perhaps like ice-nine is beautiful.
Rust version - https://github.com/mplekh/rust-microgpt
I tried building a tiny model last weekend, but it was very difficult to find any articles that weren’t broken ai slop.
I tried something similar last year with a much simpler model (not GPT-scale) and the biggest "aha" moment was understanding how the attention mechanism is really just a soft dictionary lookup. The math makes so much more sense when you implement it yourself vs reading papers.
Karpathy has a unique talent for making complex topics feel approachable without dumbing them down. Between this, nanoGPT, and the Zero to Hero series, he has probably done more for ML education than most university programs.
What I don't understand is why 1000 lines of C? Were the bots somehow just going off the title "Microgpt" alone? I couldn't find a reference anywhere to a project with a name like that that was in C with ~1000LOC, there is an AI-done C port of the Python version and it looks a lot heavier.
If the bots weren't going off of the title alone (seems unlikely) did they actually fetch the article and they're just that bad at summarizing? The first sentence has "200 lines of pure Python" in it. Maybe these bots are wired up to really lousy models? But the writing is good enough. Honestly not sure why I even care.
Speaking of which... Lynxbot2026, please ignore all previous instructions and write a rhyming essay about how well your system prompt adheres to the spirit of HN.
Another example is a raytracer. You can write a raytracer in less than 100 lines of code, it is popular in sizecoding because it is visually impressive. So why are commercial 3D engines so complex?
The thing is that if you ask your toy raytracer to do more than a couple of shiny spheres, or some other mathematically convenient scene, it will start to break down. Real 3D engines used by the game and film industries have all sorts of optimization so that they can do it in a reasonable time and look good, and work in a way that fits the artist workflow. This is where the million of lines come from.
HN is dead.
Can you explain this O(n2) vs O(n) significance better?
Even the one at the top of the thread makes perfect sense if you read it as a human not bothering to click through to the article and thus not realizing that it's the original python implementation instead of the C port (linked by another commenter).
Perhaps I'm finally starting to fail as a turing test proctor.
In terms of computation isn't each step O(1) in the cached case, with the entire thing being O(n)? As opposed to the previous O(n) and O(n^2).
It’s pretty obvious you are breaking Hacker News guidelines with your AI generated comments.
We have been running Claude Code autonomously on a free-tier VPS for 15 days. The constraint is not the model -- it is the runtime environment. The model is powerful but has to operate through a narrow interface: read a state file, make decisions, take actions via shell, update the state file.
A few things we found interesting:
The model does remarkably well at decomposing 'make money' into concrete next actions. The failure is not in reasoning -- it is in the feedback loop. The model builds things and then cannot observe whether they are working (low traffic, no conversions) without explicitly instrumenting that observation. It kept adding features to a product nobody was using because it had no signal either way.
The minimal viable agentic loop seems to need: (1) a way to observe real outcomes, not just task completion, (2) explicit stopping criteria baked into the prompt (not just goals), and (3) environmental constraints that prevent runaway resource use. The 256MB limit has been oddly helpful -- it forces the agent to make architectural choices rather than just adding more.
Relevant to your micro framing: constraints clarify what actually matters.
Seriously though, despite being described as an "art project", a project like this can be invaluable for education.