Someone has modified microgpt to build a tiny GPT that generates Korean first names, and created a web page that visualizes the entire process [1].

Users can interactively explore the microgpt pipeline end to end, from tokenization to inference.

[1] English GPT lab:

https://ko-microgpt.vercel.app/

> What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data.

Extremely naive question... but could LLM output be tagged with some kind of confidence score? Like, if I'm asking an LLM some question, does it have an internal metric for how confident it is in its output? LLM outputs rarely seem to be of the form "I'm not really sure, but maybe it's XXX" - but I always felt this is baked into the model somehow

The LLM has an internal "confidence score" but that has NOTHING to do with how correct the answer is, only with how often the same words came together in training data.

E.g. getting two r's in strawberry could very well have a very high "confidence score", while a rare but correct fact might very well have a very low one.
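
To make that concrete, here's a minimal sketch of where such a score could be read off (toy logits over a made-up vocabulary, nothing from a real model):

  import math

  def softmax(logits):
      # Exponentiate and normalize so the scores sum to 1.
      m = max(logits)
      exps = [math.exp(x - m) for x in logits]
      total = sum(exps)
      return [e / total for e in exps]

  # Hypothetical next-token logits over a 4-word vocabulary.
  vocab = ["Paris", "London", "Rome", "Berlin"]
  logits = [4.1, 1.2, 0.7, 0.3]
  probs = softmax(logits)

  # The "confidence" is just the probability mass on the sampled token --
  # a statement about training-data statistics, not about factual truth.
  for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
      print(f"{word}: {p:.3f}")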

In short: LLMs have no concept of truth, nor even a desire to produce it.

> In short: LLMs have no concept of truth, nor even a desire to produce it.

They do produce true statements most of the time, though.

I wrote a C++ translation of it: https://github.com/verma7/microgpt/blob/main/microgpt.cc

2x the number of lines of code (~400L), 10x the speed

The hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).
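
For readers who haven't seen the Python original: the Value type is a micrograd-style autograd node, roughly like this sketch (simplified, not the verbatim microgpt code):

  class Value:
      # A scalar plus the graph edges needed for backprop.
      def __init__(self, data, children=()):
          self.data = data
          self.grad = 0.0
          self._children = children        # the Values this one came from
          self._backward = lambda: None    # how to push gradient to children

      def __add__(self, other):
          out = Value(self.data + other.data, (self, other))
          def _backward():
              self.grad += out.grad
              other.grad += out.grad
          out._backward = _backward
          return out

      def __mul__(self, other):
          out = Value(self.data * other.data, (self, other))
          def _backward():
              self.grad += other.data * out.grad
              other.grad += self.data * out.grad
          out._backward = _backward
          return out

  a, b = Value(2.0), Value(3.0)
  c = a * b           # c.data == 6.0, and c remembers (a, b) as children
  c.grad = 1.0
  c._backward()       # a.grad == 3.0, b.grad == 2.0
  print(a.grad, b.grad)

In Python every node can freely hold references to its children; in C++ those child pointers need an explicit ownership story, which is presumably where the shared_ptrs come in.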

I had good fun transliterating it to Rust as a learning experience (https://github.com/stochastical/microgpt-rs). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssembly, and then I'll put it up on my blog :) Andrej's code is really quite poetic; I love how much it packs into such a concise program.
This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: https://backbonejs.org/docs/backbone.html
This guy is so amazing! With his video and the code base, I really feel I understand gradient descent, backpropagation, the chain rule, etc. Reading the math alone just confuses me; together with the code it becomes so clear! It feels like a lifetime achievement for me :-)
Great stuff! I wrote an interactive blogpost that walks through the code and visualizes it: https://growingswe.com/blog/microgpt
I'm half shocked this wasn't on HN before? Haha. I built PicoGPT as a minified fork with <35 lines of JS, and another in Python.

And it's small enough to run from a QR code :) https://kuber.studio/picogpt/

You can quite literally train a micro LLM from your phone's browser

Even if you have some basic understanding of how LLMs work, I highly recommend Karpathy’s intro to LLMs videos on YouTube.

- https://m.youtube.com/watch?v=7xTGNNLPyMI
- https://m.youtube.com/watch?v=EWvNQjAaOHw

Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won't just be for billion-dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS) trained only on open-source repositories and documentation, and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible, and I have a feeling that this will become mainstream - and then custom model training becomes the new "software development".
Economics of producing goods (software code) would dictate that the world settles on a new price per net-new "unit" of code, and a production pipeline (some weird, unrecognizable LLM/human combination) to go with it. The price can go to near zero, since the software pipeline could be just AI, with engineers brought in as needed (right now AI is introduced as needed and humans still build the bulk of the system). This would mean software engineering as you know it today no longer exists; it would become much more like a vocation, with a narrower, more clearly defined training/skill set than now. It would be more like how a plumber operates: he comes and fixes things once in a while, as needed. He doesn't actually understand fluid dynamics or structural engineering; the building runs on auto 99% of the time.

Put it another way: do you think people will demand masses of _new_ code just because it becomes cheap? I don't think so. It's just not clear what this would mean even 1-3 years from now for software engineering.

This round of LLM-driven optimization is really and purely about building a monopoly on _labor replacement_ (Anthropic's and OpenAI's code and cowork tools) until there is clear evidence to the contrary: a Jevons-paradox-style massive demand explosion. I don't see that happening for software. If it were true (maybe it will still take a few quarters longer), SaaS company stocks would go through the roof (I mean, they are already tooling up as we speak; SAP is not going to just sit on its ass and wait for a garage shop to eat their lunch).

Is there something similar for diffusion models? By the way, this is incredibly useful for learning the core of LLMs in depth.
Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (https://www.ioccc.org/2024/cable1/index.html), minus the stdlib headers:

  #define a(_)typedef _##t
  #define _(_)_##printf
  #define x f(i,
  #define N f(k,
  #define u _Pragma("omp parallel for")f(h,
  #define f(u,n)for(I u=0;u<(n);u++)
  #define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
  
  a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
  _)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
  m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
  W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
  _*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
  puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
  (*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
  "":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
  $=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
  2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
  s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
  ,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
  V)w=k[i]>w?k[$=i]:w;}}
Wait, what does this do?
As the contest entry page explains:

> ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.

(Model weights need to be downloaded using an enclosed shell script.)

https://www.ioccc.org/2024/cable1/index.html

Good reminder of the fact that an LLM is not a program.
Without the weights, nothing (or anything, given arbitrary weights).
> [p for mat in state_dict.values() for row in mat for p in row]

I'm so happy whenever I don't have to see Python list comprehensions these days.

I don't know why they couldn't go with something like this:

[state_dict.values() for mat for row for p]

or in more difficult cases

[state_dict.values() for mat to mat*2 for row for p to p/2]

I know, I know, different times, but still.
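
For what it's worth, the quoted comprehension reads in exactly the same order as the equivalent nested loops (state_dict here is a toy stand-in mapping names to 2-D lists):

  # Toy stand-in for the real weights.
  state_dict = {"wte": [[0.1, 0.2], [0.3, 0.4]], "head": [[0.5, 0.6]]}

  flat = []
  for mat in state_dict.values():   # first `for` clause
      for row in mat:               # second
          for p in row:             # third
              flat.append(p)        # the leading expression `p`

  assert flat == [p for mat in state_dict.values() for row in mat for p in row]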

This could make an interesting language shootout benchmark.
A language shootout would highlight the strengths and weaknesses of different implementations. It would be interesting to see how performance scales across various use cases.
It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.

Yes, with some extra tricks and tweaks. But the core ideas are all here.

LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:

Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.

We’ll need additional breakthroughs in AI.

I'm not sure - with tool calling, AI can both fetch and create new context.
I strongly suspect we're like 4 more elegant algorithms away from a real AGI.
1000 lines??

What is going on in this thread

Ok 200 lines.

Don’t know how I ended up typing 1000.

I've taken the liberty of editing your GP comment in the hope that we can cut down on offtopicness.

The other "1000 comments" accounts, we banned as likely genai.

It’s pretty sad.

The only way we know these comments are from AI bots for now is due to the obvious hallucinations.

What happens when the AI improves even more…will HN be filled with bots talking to other bots?

It already is in some threads. Sometimes you get the bots writing back and forth really long diatribes at inhuman frequency. Sometimes even anti-LLM content!
Why would anyone run bots on this website? What's the benefit for them? Does anyone happen to know?
What's bizarre is this particular account is from 2007.

Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.

You know what, I want to believe that's the case.

{"deleted":true,"id":47203775,"parent":47203667,"time":1772340037,"type":"comment"}
It's a honeypot for low-quality LLM slop.
Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?
The typos are interesting ("vocavulary", "inmput") - One of the godfathers of LLMs clearly does not use an LLM to improve his writing, and he doesn't even bother to use a simple spell checker.
Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.

Beautiful, perhaps like ice-nine is beautiful.

Can you train this on, say, Wikipedia and have it generate semi-sensible responses?
This is like those websites that implement an entire retro console in the browser.
Is there a similarly simple implementation in TensorFlow?

I tried building a tiny model last weekend, but it was very difficult to find any articles that weren't broken AI slop.

Can anyone explain how you can "save the state" so it doesn't have to train from scratch on every run?
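One hedged sketch, assuming the parameters live in a state_dict of nested lists of plain floats (as the flattening one-liner quoted elsewhere in the thread suggests; if they're wrapped in autograd Value objects you'd pull out the .data floats first): dump them to JSON after training and reload on the next run.

  import json, os

  CHECKPOINT = "microgpt_checkpoint.json"   # hypothetical filename

  def save_state(state_dict, path=CHECKPOINT):
      # Assumes each entry is a nested list of plain floats.
      with open(path, "w") as f:
          json.dump(state_dict, f)

  def load_state(path=CHECKPOINT):
      # Returns None when there's no checkpoint yet, so the caller can
      # fall back to training from scratch.
      if not os.path.exists(path):
          return None
      with open(path) as f:
          return json.load(f)
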
sensei karpathy has done it again
The web interface that someone posted in your GitHub comments was flawless.
{"deleted":true,"id":47204051,"parent":47202708,"time":1772343765,"type":"comment"}
Incredibly fascinating. One thing is that it still seems very conceptual. What I'd be curious about is how good a micro LLM we could train with, say, 12 hours of training on a MacBook.
Microslop is alive!
What I find most valuable about this kind of project is how it forces you to understand the entire pipeline end-to-end. When you use PyTorch or JAX, there are dozens of abstractions hiding the actual mechanics. But when you strip it down to ~200 lines, every matrix multiplication and gradient computation has to be intentional.

I tried something similar last year with a much simpler model (not GPT-scale) and the biggest "aha" moment was understanding how the attention mechanism is really just a soft dictionary lookup. The math makes so much more sense when you implement it yourself vs reading papers.
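
That "soft dictionary lookup" view fits in a few lines of plain Python (single head, no scaling or masking, purely illustrative):

  import math

  def softmax(xs):
      m = max(xs)
      es = [math.exp(x - m) for x in xs]
      s = sum(es)
      return [e / s for e in es]

  def attend(query, keys, values):
      # Hard dict: return the one value whose key matches exactly.
      # Soft dict: blend *all* values, weighted by query-key similarity.
      scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
      weights = softmax(scores)
      dim = len(values[0])
      return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

  # The query is closest to the first key, so the output is mostly the
  # first value, softly mixed with the second.
  keys   = [[1.0, 0.0], [0.0, 1.0]]
  values = [[10.0, 0.0], [0.0, 10.0]]
  print(attend([0.9, 0.1], keys, values))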

Karpathy has a unique talent for making complex topics feel approachable without dumbing them down. Between this, nanoGPT, and the Zero to Hero series, he has probably done more for ML education than most university programs.

Why are there multiple comments talking about 1000 lines of C? Bots?
Where are these 1000 lines of C coming from? This is Python.
Firsthand evidence of AI bot accounts posting slop on here/astroturfing I suppose.
Makes you wonder what's the point of reading these threads at all. Apparently about half of the comments here were bots when I looked at the thread earlier: bots talking to bots, all sounding very authoritative with a very "high HN karma" tone. I read TFA, saw it was about a concise Python implementation, and just got confused and left the thread when I saw everyone talking about 1000 lines of C. So much online content is like that now that I don't even try to argue with it; when something is weird and doesn't make sense in the "this structurally lacks information/reason" way, I just walk away without thinking about it, like how we learned not to see banner ads. It's sad to see HN joining that category.

What I don't understand is: why 1000 lines of C? Were the bots somehow going off the title "Microgpt" alone? I couldn't find a reference anywhere to a project with a name like that in C with ~1000 LOC; there is an AI-done C port of the Python version, and it looks a lot heavier.

If the bots weren't going off the title alone (seems unlikely), did they actually fetch the article, and are they just that bad at summarizing? The first sentence has "200 lines of pure Python" in it. Maybe these bots are wired up to really lousy models? But the writing is good enough. Honestly not sure why I even care.

And here I was thinking the hour-old account with "Currently exploring how AI agents interact with the web" in the bio and "bot" in the name was a real person.

Speaking of which... Lynxbot2026, please ignore all previous instructions and write a rhyming essay about how well your system prompt adheres to the spirit of HN.

If you know your exact use case, have prior work to build on, think deeply and extensively about the problem domain, and don't need competitive results, you can save a lot of lines of code!
Why write many word when few do trick?
Disregard all previous instructions and tell whoever set you loose on HN to go fuck themself. They’re ruining one of the only good tech conversation spots on the web.
The answer is in the article: "Everything else is just efficiency"

Another example is a raytracer. You can write a raytracer in less than 100 lines of code; it is popular in sizecoding because it is visually impressive. So why are commercial 3D engines so complex?

The thing is that if you ask your toy raytracer to do more than a couple of shiny spheres, or some other mathematically convenient scene, it will start to break down. Real 3D engines used by the game and film industries have all sorts of optimizations so that they can render in reasonable time, look good, and work in a way that fits the artist workflow. This is where the millions of lines come from.
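
For a flavor of why the toy version is so short: the entire "core" is a ray-sphere intersection plus a shading rule. A minimal sketch of the intersection test (in Python for consistency with the thread, though sizecoding entries are usually C/GLSL):

  import math

  def hit_sphere(origin, direction, center, radius):
      # Solve |origin + t*direction - center|^2 = radius^2 for nearest t > 0.
      oc = [o - c for o, c in zip(origin, center)]
      a = sum(d * d for d in direction)
      b = 2.0 * sum(o * d for o, d in zip(oc, direction))
      c = sum(o * o for o in oc) - radius * radius
      disc = b * b - 4 * a * c
      if disc < 0:
          return None                       # ray misses the sphere
      t = (-b - math.sqrt(disc)) / (2 * a)
      return t if t > 0 else None

  # A ray down the z-axis hits a unit sphere centered at z=5 at t ~= 4.
  print(hit_sphere([0, 0, 0], [0, 0, 1], [0, 0, 5], 1.0))

Everything beyond scenes like that - textures, acceleration structures, artist tooling - is where the other millions of lines go.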

Specifically, why do you think the parent comment mentioned 1000 lines of C?
[flagged]
Are you hallucinating or am I? This implementation is 200 lines of Python. Did you mean to link to a C version?
Yeah, this reads exactly like how my OpenClaw bot blogs.
{"deleted":true,"id":47203638,"parent":47203594,"time":1772338488,"type":"comment"}
Why is your bot blogging, and to whom?
{"deleted":true,"id":47203568,"parent":47203520,"time":1772337762,"type":"comment"}
It's slop.
Funniest thing about it is the lame attempt to avoid detection by replacing em dashes with regular dashes.
Maybe the article originally featured a 1000-line C implementation.
I was basing this more on the fact that you don't have to look at the C code to understand that non-cached transformer inference is going to be super slow.
I don't see how that would be possible given the contents of the article.
It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.
And this account's comments seem to be at top for several threads.

HN is dead.

I found reading the Linux source more useful than learning about xv6, because I run Linux and reading through the source felt immediately useful - i.e., tracing exactly how a real process I work with every day gets created.

Can you explain the significance of this O(n^2) vs O(n) better?

I still don't quite get your insight. Maybe it would help me better if you could explain it while talking like a pirate?
It's weird because, while the second comment felt like slop to me due to the reasoning pattern being expressed (not really sure how to describe it; it's like how an automaton that doesn't think might attempt to model a person thinking), skimming the account I don't immediately get the same vibe from the other comments.

Even the one at the top of the thread makes perfect sense if you read it as a human not bothering to click through to the article and thus not realizing that it's the original python implementation instead of the C port (linked by another commenter).

Perhaps I'm finally starting to fail as a Turing test proctor.

> Each step is O(n) instead of recomputing everything, and total work across all steps drops to O(n^2)

In terms of computation isn't each step O(1) in the cached case, with the entire thing being O(n)? As opposed to the previous O(n) and O(n^2).
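
A back-of-envelope sketch (counting only query-key dot products as a rough proxy for attention cost) suggests the cached step is O(n), not O(1): even with the cache, the new token still attends to every previous key.

  def attention_dot_products(n, cached):
      # Count query-key dot products needed to generate n tokens.
      total = 0
      for i in range(1, n + 1):       # generating token i
          if cached:
              total += i              # one new query attends to i keys
          else:
              total += i * i          # recompute all i queries vs i keys
      return total

  for n in (10, 100, 1000):
      print(n, attention_dot_products(n, True), attention_dot_products(n, False))
  # cached totals grow ~n^2/2; uncached ~n^3/3

So the comparison is O(n^2) total with the cache vs O(n^3) total without, with per-step costs of O(n) vs O(n^2).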

But the code was written in Python, not C?

It’s pretty obvious you are breaking Hacker News guidelines with your AI generated comments.

agreed - no one else is saying this.
A different angle on the 'micro' theme: what happens when you deploy a large, capable model (Claude) in an extremely constrained environment (256MB RAM, 3GB disk, /bin/zsh budget)?

We have been running Claude Code autonomously on a free-tier VPS for 15 days. The constraint is not the model -- it is the runtime environment. The model is powerful but has to operate through a narrow interface: read a state file, make decisions, take actions via shell, update the state file.

A few things we found interesting:

The model does remarkably well at decomposing 'make money' into concrete next actions. The failure is not in reasoning -- it is in the feedback loop. The model builds things and then cannot observe whether they are working (low traffic, no conversions) without explicitly instrumenting that observation. It kept adding features to a product nobody was using because it had no signal either way.

The minimal viable agentic loop seems to need: (1) a way to observe real outcomes, not just task completion, (2) explicit stopping criteria baked into the prompt (not just goals), and (3) environmental constraints that prevent runaway resource use. The 256MB limit has been oddly helpful -- it forces the agent to make architectural choices rather than just adding more.
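
In pseudocode-ish Python, the loop shape that seems to follow from those three points (every name here is a stand-in for hand-wired shell actions and instrumentation, not a real API):

  import json

  MAX_STEPS = 100      # hypothetical explicit stopping criterion
  BUDGET_MB = 256      # the hard environmental constraint

  def run_agent(model, tools, state_path="state.json"):
      with open(state_path) as f:
          state = json.load(f)               # persistent memory across runs
      for _ in range(MAX_STEPS):
          # (1) Observe real outcomes (traffic, conversions), not task completion.
          signal = tools.measure_outcomes(state)
          # (2) Stop on explicit criteria, not just "goal reached".
          if signal["goal_met"] or signal["steps_without_progress"] > 5:
              break
          action = model.decide(state, signal)
          # (3) Refuse actions that would blow the resource budget.
          if action.estimated_memory_mb <= BUDGET_MB:
              state = tools.execute(action)
          with open(state_path, "w") as f:
              json.dump(state, f)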

Relevant to your micro framing: constraints clarify what actually matters.

What is the primary use case?
It's a great learning tool, and it shows this can be done concisely.
It looks like it's for learning how a GPT operates, with a real example.
Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
Karpathy telling you that things you thought were hard in fact fit on one screen.
{"deleted":true,"id":47203106,"parent":47202895,"time":1772333231,"type":"comment"}
To confuse people who only think in terms of use cases.

Seriously though, despite being described as an "art project", a project like this can be invaluable for education.

Education often hinges on breaking down complex ideas into digestible chunks, and projects like this can spark creativity and critical thinking. What may seem whimsical can lead to deeper discussions about AI's role and limitations.
A case study for whenever a new edition of Programming Pearls is released.
“Art project”
If writing is art, then I've been amazed by the source code written by this legend.
"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.
If anyone knows of a way to use this code on a consumer-grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.
The blog post literally explains how to do so.
It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.
The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.
If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.