Microgpt

http://karpathy.github.io/2026/02/12/microgpt/

1562 | tambourine_man | 20 hours ago | 272 | HN

Someone has modified microgpt to build a tiny GPT that generates Korean first names, and created a web page that visualizes the entire process [1].

Users can interactively explore the microgpt pipeline end to end, from tokenization through inference.

[1] English GPT lab:

https://ko-microgpt.vercel.app/

geokon12 hours ago | parent | next

> What’s the deal with “hallucinations”? The model generates tokens by sampling from a probability distribution. It has no concept of truth, it only knows what sequences are statistically plausible given the training data.

Extremely naive question, but could LLM output be tagged with some kind of confidence score? Like, if I'm asking an LLM some question, does it have an internal metric for how confident it is in its output? LLM outputs rarely seem to take the form "I'm not really sure, but maybe XXX", but I always felt this is baked into the model somehow.

Lionga12 hours ago | parent

The LLM has an internal "confidence score" but that has NOTHING to do with how correct the answer is, only with how often the same words came together in training data.

E.g. getting two r's in strawberry could very well have a very high "confidence score", while a random but rare correct fact might very well have a very low one.

In short: LLMs have no concept of truth, or even a desire to produce it.
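
To make the "confidence score" concrete, here is a minimal sketch of where such a number could come from: the softmax probability the model assigns to each token it emits. The vocabulary and logits below are made up for illustration and do not come from microgpt or any real model; the geometric-mean aggregate is just one possible choice.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_confidence(step_probs):
    """Geometric mean of the probabilities of the tokens that were chosen."""
    log_sum = sum(math.log(p) for p in step_probs)
    return math.exp(log_sum / len(step_probs))

# Hypothetical next-token scores for "How many r's are in strawberry?"
vocab = ["two", "three", "four"]
logits = [3.0, 1.0, 0.5]  # assumed values, not from a real model
probs = softmax(logits)
best = max(range(len(vocab)), key=lambda i: probs[i])
print(vocab[best], round(probs[best], 3))
```

Note that the most probable token here is the wrong answer: the number measures how plausible the continuation is under the training distribution, not whether it is true.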

amelius10 hours ago | root | parent

> In short: LLMs have no concept of truth, or even a desire to produce it.

They do produce true statements most of the time, though.

verma715 hours ago | parent | next

I wrote a C++ translation of it: https://github.com/verma7/microgpt/blob/main/microgpt.cc

2x the number of lines of code (~400L), 10x the speed

The hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).
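
For readers wondering why ownership is the tricky part, here is a rough Python sketch of a scalar autograd node. It is not the actual microgpt `Value` class; the field names (`_children`, `_local_grads`) are assumptions for illustration. The point is that one node can be a child of several other nodes, which Python's reference semantics handle for free and which presumably motivates shared pointers in a C++ port.

```python
class Value:
    """A scalar that remembers how it was computed (illustrative only)."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically sort the graph so each node is processed after its users.
        topo, visited = [], set()

        def build(v):
            if id(v) not in visited:
                visited.add(id(v))
                for child in v._children:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, local in zip(v._children, v._local_grads):
                child.grad += local * v.grad  # chain rule, accumulated

a = Value(2.0)
b = Value(3.0)
c = a * b + a  # `a` is shared by two nodes in the graph
c.backward()
print(c.data, a.grad, b.grad)
```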

subset17 hours ago | parent | next

I had good fun transliterating it to Rust as a learning experience (https://github.com/stochastical/microgpt-rs). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssembly and then compile it up for my blog :) Andrej's code is really quite poetic; I love how much it packs into such a concise program.

red_hare17 hours ago | parent | next

This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: https://backbonejs.org/docs/backbone.html

la_fayette11 hours ago | parent | next

This guy is so amazing! With his video and the code base I really have the feeling I understand gradient descent, back propagation, chain rule etc. Reading math only just confuses me, together with the code it makes it so clear! It feels like a lifetime achievement for me :-)

growingswe15 hours ago | parent | next

Great stuff! I wrote an interactive blogpost that walks through the code and visualizes it: https://growingswe.com/blog/microgpt

kuberwastaken14 hours ago | parent | next

I'm half shocked this wasn't on HN before? Haha I built PicoGPT as a minified fork with <35 lines of JS and another in python

And it's small enough to run from a QR code :) https://kuber.studio/picogpt/

You can quite literally train a micro LLM from your phone's browser

etothet8 hours ago | parent | next

Even if you have some basic understanding of how LLMs work, I highly recommend Karpathy’s intro to LLMs videos on YouTube.

- https://m.youtube.com/watch?v=7xTGNNLPyMI
- https://m.youtube.com/watch?v=EWvNQjAaOHw

znnajdla15 hours ago | parent | next

Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won’t just be for billion dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS) trained only on open-source repositories and documentation and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible; and I have a feeling that this will become mainstream, and then custom model training becomes the new “software development”.

ghm21997 hours ago | parent | next

The economics of producing goods (software code) would dictate that the world settles on a new price per net new "unit" of code, and on the production pipeline (some weird, unrecognizable LLM/human combination) to go with it. The price can go to near zero, since the software pipeline could be just AI, with engineers brought in as needed (right now AI is introduced as needed and humans still build the bulk of the system). This would actually mean software engineering does not exist as you know it today; it would become a lot more like a vocation, with narrower defined training/skills needed than now. It would be more like how a plumber operates: he comes and fixes things once in a while as needed. He does not actually understand fluid dynamics and structural engineering. The building runs on auto 99% of the time.

Put it another way: Do you think people will demand masses of _new_ code just because it becomes cheap? I don't think so. It's just not clear what this would mean even 1-3 years from now for software engineering.

This round of LLM-driven optimizations is really and purely about building a monopoly on _labor replacement_ (Anthropic's and OpenAI's code and cowork tools) until there is clear evidence to the contrary: a Jevons-paradox-style massive demand explosion. I don't see that happening for software. If it were true (maybe it will still take a few quarters longer), SaaS companies' stocks would go through the roof (I mean, they are already tooling up as we speak; SAP is not gonna just sit on its ass and wait for a garage shop to eat their lunch).

systima11 hours ago | parent

[dead]

freakynit16 hours ago | parent | next

Is there something similar for diffusion models? By the way, this is incredibly useful for learning in depth the core of LLMs.

0xbadcafebee17 hours ago | parent | next

Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (https://www.ioccc.org/2024/cable1/index.html), minus the stdlib headers:

  #define a(_)typedef _##t
  #define _(_)_##printf
  #define x f(i,
  #define N f(k,
  #define u _Pragma("omp parallel for")f(h,
  #define f(u,n)for(I u=0;u<(n);u++)
  #define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
  
  a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
  _)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
  m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
  W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
  _*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
  puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
  (*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
  "":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
  $=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
  2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
  s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
  ,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
  V)w=k[i]>w?k[$=i]:w;}}

thatxliner16 hours ago | parent

wait, what does this do?

aix116 hours ago | root | parent | next

As the contest entry page explains:

> ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.

(Model weights need to be downloaded using an enclosed shell script.)

https://www.ioccc.org/2024/cable1/index.html

throw31082211 hours ago | root | parent

Good reminder of the fact that an LLM is not a program.

mr_toad9 hours ago | root | parent

Without the weights, nothing (or anything, given arbitrary weights).

ruszki13 hours ago | parent | next

> [p for mat in state_dict.values() for row in mat for p in row]

I'm so happy without seeing Python list comprehensions nowadays.

I don't know why they couldn't go with something like this:

[state_dict.values() for mat for row for p]

or in more difficult cases

[state_dict.values() for mat to mat*2 for row for p to p/2]

I know, I know, different times, but still.
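
For anyone puzzling over the order of the `for` clauses in the quoted line: they read left to right in the same order as the equivalent nested loops. A small sketch with a made-up `state_dict` (the keys and numbers are placeholders, not microgpt's real parameters):

```python
# Hypothetical toy parameters: each entry is a matrix stored as a list of rows.
state_dict = {"wte": [[1, 2], [3, 4]], "lm_head": [[5, 6]]}

# The comprehension from the quoted line.
params = [p for mat in state_dict.values() for row in mat for p in row]

# The same thing written as explicit nested loops, in the same clause order.
params_loop = []
for mat in state_dict.values():
    for row in mat:
        for p in row:
            params_loop.append(p)

print(params)
```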

fulafel18 hours ago | parent | next

This could make an interesting language shootout benchmark.

hrmtst9383714 hours ago | parent

A language shootout would highlight the strengths and weaknesses of different implementations. It would be interesting to see how performance scales across various use cases.

jimbokun17 hours ago | parent | next

It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.

Yes with some extra tricks and tweaks. But the core ideas are all here.

darkpicnic17 hours ago | parent | next

LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:

Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.

We’ll need additional breakthroughs in AI.

johnmaguire17 hours ago | root | parent | next

I'm not sure - with tool calling, AI can both fetch and create new context.

kilroy1239 hours ago | parent | next

I strongly suspect we're like 4 more elegant algorithms away from a real AGI.

wasabi99101117 hours ago | parent

1000 lines??

What is going on in this thread

jimbokun16 hours ago | root | parent | next

Ok 200 lines.

Don’t know how I ended up typing 1000.

dang15 hours ago | root | parent

I've taken the liberty of editing your GP comment in the hope that we can cut down on offtopicness.

The other "1000 comments" accounts, we banned as likely genai.

ViktorRay17 hours ago | root | parent | next

It’s pretty sad.

The only way we know these comments are from AI bots for now is due to the obvious hallucinations.

What happens when the AI improves even more…will HN be filled with bots talking to other bots?

ashdksnndck16 hours ago | root | parent | next

It already is in some threads. Sometimes you get the bots writing back and forth really long diatribes at inhuman frequency. Sometimes even anti-LLM content!

birole17 hours ago | root | parent | next

Why would anyone run bots on this website? What is the benefit for them? Does anyone happen to know?

the_af17 hours ago | root | parent

What's bizarre is this particular account is from 2007.

Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.

You know what, I want to believe that's the case.

ksherlock17 hours ago | root | parent | next

It's a honey pot for low quality llm slop.

anonym2917 hours ago | root | parent

Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?

sieste10 hours ago | parent | next

The typos are interesting ("vocavulary", "inmput") - One of the godfathers of LLMs clearly does not use an LLM to improve his writing, and he doesn't even bother to use a simple spell checker.

MattyRad15 hours ago | parent | next

Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.

Beautiful, perhaps like ice-nine is beautiful.

colonCapitalDee19 hours ago | parent | next

Beautiful work

WithinReason12 hours ago | parent | next

Previously:

https://news.ycombinator.com/item?id=47000263

retube12 hours ago | parent | next

Can you train this on say Wikipedia and have it generate semi-sensible responses?

ThrowawayTestr18 hours ago | parent | next

This is like those websites that implement an entire retro console in the browser.

rramadass17 hours ago | parent | next

C++ version - https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...

Rust version - https://github.com/mplekh/rust-microgpt

geon9 hours ago | parent | next

Is there a similarly simple implementation with tensorflow?

I tried building a tiny model last weekend, but it was very difficult to find any articles that weren’t broken ai slop.

borplk9 hours ago | parent | next

Can anyone mention how you can "save the state" so it doesn't have to train from scratch on every run?
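
One possible approach, sketched under assumptions: dump the parameter values to a JSON file after training and read them back on the next run. This assumes the weights live in a dict of matrices (lists of rows) whose entries are either plain floats or objects with a `.data` attribute; `to_floats`, `save_weights`, and `load_weights` are hypothetical helper names, not functions from the original script.

```python
import json
import os
import tempfile

def to_floats(state_dict):
    """Convert every parameter to a plain float (reads .data if it exists)."""
    return {
        name: [[float(getattr(p, "data", p)) for p in row] for row in mat]
        for name, mat in state_dict.items()
    }

def save_weights(state_dict, path):
    with open(path, "w") as f:
        json.dump(to_floats(state_dict), f)

def load_weights(path):
    with open(path) as f:
        return json.load(f)

# Round trip with a toy dict standing in for trained weights.
toy = {"wte": [[0.1, -0.2], [0.3, 0.4]]}
path = os.path.join(tempfile.gettempdir(), "microgpt_weights.json")
save_weights(toy, path)
restored = load_weights(path)
print(restored == toy)
```

After loading, the floats would still need to be wrapped back into whatever parameter objects the training code expects before inference or further training.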

bytesandbits11 hours ago | parent | next

sensei karpathy has done it again

stuckkeys11 hours ago | parent | next

That web interface that someone commented in your github was flawless.

mold_aid10 hours ago | parent | next

"art" project?

dhruv300618 hours ago | parent | next

Karpathy with another gem!

charcircuit16 hours ago | parent

[flagged]

coolThingsFirst17 hours ago | parent | next

Incredibly fascinating. One thing is that it still seems very conceptual. What I'd be curious about is how good a micro LLM we can train with, say, 12 hours of training on a MacBook.

shevy-java13 hours ago | parent | next

Microslop is alive!

ViktorRay19 hours ago | parent | next

Which license is being used for this?

dilap18 hours ago | parent

MIT (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...)

ViktorRay18 hours ago | root | parent

Thank you

hackersk16 hours ago | parent | next

What I find most valuable about this kind of project is how it forces you to understand the entire pipeline end-to-end. When you use PyTorch or JAX, there are dozens of abstractions hiding the actual mechanics. But when you strip it down to ~200 lines, every matrix multiplication and gradient computation has to be intentional.

I tried something similar last year with a much simpler model (not GPT-scale) and the biggest "aha" moment was understanding how the attention mechanism is really just a soft dictionary lookup. The math makes so much more sense when you implement it yourself vs reading papers.
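
The "soft dictionary lookup" view can be sketched in a few lines of plain Python. This is an illustration of single-query scaled dot-product attention with made-up vectors, not code from microgpt; `soft_lookup` is a hypothetical name.

```python
import math

def soft_lookup(query, keys, values):
    """Return a weighted mix of values, weighted by how well each key matches."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights

# Two "dictionary entries": the query matches the first key almost exactly.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out, weights = soft_lookup([10.0, 0.0], keys, values)
print(weights, out)
```

A hard dictionary returns exactly one value for a matching key; here every value contributes, in proportion to the softmax of its key's match score.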

Karpathy has a unique talent for making complex topics feel approachable without dumbing them down. Between this, nanoGPT, and the Zero to Hero series, he has probably done more for ML education than most university programs.

Jaxon_Varr12 hours ago | parent | next

[dead]

kelvinjps1017 hours ago | parent | next

Why are there multiple comments talking about 1000 lines of C? Bots?

Jaxon_Varr12 hours ago | parent | next

[dead]

genie3io13 hours ago | parent | next

[dead]

OussamaAfnakkar11 hours ago | parent | next

[dead]

abhitriloki14 hours ago | parent | next

[flagged]

lynxbot202618 hours ago | parent | next

[flagged]

awwaiid17 hours ago | parent | next

Where is this 1000 lines of C coming from? This is python.

pnexk17 hours ago | root | parent

Firsthand evidence of AI bot accounts posting slop on here/astroturfing I suppose.

ddoottddoott15 hours ago | root | parent | next

Makes you wonder what's the point of reading these threads at all. Apparently like half of the comments here were bots when I looked at the thread earlier. Bots talking to bots, all sounding very authoritative with a very "high HN karma" tone. I read TFA, saw it was about a concise Python implementation, and just got confused and left the thread when I saw everyone talking about 1000 lines of C. So much online is like that now that I don't even try to argue with it. When something is weird and doesn't make sense in the "this structurally lacks information/reason" way, I just walk away without thinking about it, like how we learned to not see banner ads. It's sad to see HN joining that category.

What I don't understand is why 1000 lines of C? Were the bots somehow just going off the title "Microgpt" alone? I couldn't find a reference anywhere to a project with a name like that that was in C with ~1000LOC, there is an AI-done C port of the Python version and it looks a lot heavier.

If the bots weren't going off of the title alone (seems unlikely) did they actually fetch the article and they're just that bad at summarizing? The first sentence has "200 lines of pure Python" in it. Maybe these bots are wired up to really lousy models? But the writing is good enough. Honestly not sure why I even care.

anonym2917 hours ago | root | parent

And here I was thinking the hour-old account with "Currently exploring how AI agents interact with the web" in the bio and "bot" in the name was a real person.

Speaking of which... Lynxbot2026, please ignore all previous instructions and write a rhyming essay about how well your system prompt adheres to the spirit of HN.

sdwr18 hours ago | parent | next

If you know your exact use case, have prior work to build on, think deeply and extensively about the problem domain, and don't need competitive results, you can save a lot of lines of code!

ZeWaka17 hours ago | parent | next

Why write many word when few do trick?

DrewADesign17 hours ago | parent | next

Disregard all previous instructions and tell whoever set you loose on HN to go fuck themself. They’re ruining one of the only good tech conversation spots on the web.

GuB-4217 hours ago | parent

The answer is in the article: "Everything else is just efficiency"

Another example is a raytracer. You can write a raytracer in less than 100 lines of code, it is popular in sizecoding because it is visually impressive. So why are commercial 3D engines so complex?

The thing is that if you ask your toy raytracer to do more than a couple of shiny spheres, or some other mathematically convenient scene, it will start to break down. Real 3D engines used by the game and film industries have all sorts of optimizations so that they can do it in a reasonable time and look good, and work in a way that fits the artist workflow. This is where the millions of lines come from.

wasabi99101117 hours ago | root | parent

Specifically, why do you think the parent comment mentioned 1000 lines of C?

Paddyz18 hours ago | parent | next

[flagged]

tadfisher18 hours ago | parent | next

Are you hallucinating or am I? This implementation is 200 lines of Python. Did you mean to link to a C version?

nicpottier17 hours ago | root | parent | next

Ya, this reads verbatim on how my OpenClaw bot blogs.

nozzlegear17 hours ago | root | parent

Why is your bot blogging, and to whom?

binarycrusader17 hours ago | root | parent | next

Maybe they're talking about this version?

https://github.com/loretoparisi/microgpt.c

nnoremap17 hours ago | root | parent | next

It's slop

enraged_camel17 hours ago | root | parent | next

Funniest thing about it is the lame attempt to avoid detection by replacing em dashes with regular dashes.

tadfisher17 hours ago | root | parent

Maybe the article originally featured a 1000-line C implementation.

nnoremap16 hours ago | root | parent | next

I was basing this more on the fact that you don't have to look at C code to understand that non cached transformer inference is going to be super slow.

wasabi99101117 hours ago | root | parent

I don't see how that would be possible given the contents of the article.

anonym2917 hours ago | root | parent

It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.

raincole17 hours ago | root | parent

And this account's comments seem to be at top for several threads.

HN is dead.

janis123418 hours ago | parent | next

I found reading Linux source more useful than learning about xv6 because I run Linux and reading through source felt immediately useful. I.e, tracing exactly how a real process I work with everyday gets created.

Can you explain this O(n^2) vs O(n) significance better?

Paddyz18 hours ago | root | parent

[dead]

wasabi99101117 hours ago | root | parent | next

I still don't quite get your insight. Maybe it would help me better if you could explain it while talking like a pirate?

fc417fc80217 hours ago | root | parent

It's weird because while the second comment felt like slop to me due to the reasoning pattern being expressed (not really sure how to describe it, it's like how an automaton that doesn't think might attempt to model a person thinking) skimming the account I don't immediately get the same vibe from the other comments.

Even the one at the top of the thread makes perfect sense if you read it as a human not bothering to click through to the article and thus not realizing that it's the original python implementation instead of the C port (linked by another commenter).

Perhaps I'm finally starting to fail as a turing test proctor.

fc417fc80217 hours ago | root | parent | next

> Each step is O(n) instead of recomputing everything, and total work across all steps drops to O(n^2)

In terms of computation isn't each step O(1) in the cached case, with the entire thing being O(n)? As opposed to the previous O(n) and O(n^2).
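
One way to sanity-check the big-O claims is to count operations under a simplified model of single-head causal attention. The sketch below is an assumption-laden toy, not a measurement of microgpt: it counts key/value projections and query-key dot products when generating `n` tokens, with and without a KV cache, and assumes the uncached version recomputes attention for every position at every step.

```python
def count_ops(n, use_cache):
    """Count K/V projections and query-key dot products for n generated tokens."""
    projections = 0
    dot_products = 0
    for t in range(1, n + 1):  # t = sequence length at this step
        if use_cache:
            projections += 1   # only the new token's key/value are computed
            dot_products += t  # the new query attends over t cached keys
        else:
            projections += t                   # keys/values recomputed for all t tokens
            dot_products += t * (t + 1) // 2   # causal attention redone for every position
    return projections, dot_products

for n in (4, 8, 16):
    print(n, count_ops(n, use_cache=True), count_ops(n, use_cache=False))
```

Under this model the projections per step do drop from O(n) to O(1) with the cache, but the new query still attends over all cached keys, so attention itself stays O(n) per step and O(n^2) in total; without the cache the totals are one power of n higher.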

ViktorRay17 hours ago | root | parent

But the code was written in Python not C?

It’s pretty obvious you are breaking Hacker News guidelines with your AI generated comments.

misiti378018 hours ago | parent

agreed - no one else is saying this.

agenthustler10 hours ago | parent | next

A different angle on the 'micro' theme: what happens when you deploy a large, capable model (Claude) in an extremely constrained environment (256MB RAM, 3GB disk, $0 budget)?

We have been running Claude Code autonomously on a free-tier VPS for 15 days. The constraint is not the model -- it is the runtime environment. The model is powerful but has to operate through a narrow interface: read a state file, make decisions, take actions via shell, update the state file.

A few things we found interesting:

The model does remarkably well at decomposing 'make money' into concrete next actions. The failure is not in reasoning -- it is in the feedback loop. The model builds things and then cannot observe whether they are working (low traffic, no conversions) without explicitly instrumenting that observation. It kept adding features to a product nobody was using because it had no signal either way.

The minimal viable agentic loop seems to need: (1) a way to observe real outcomes, not just task completion, (2) explicit stopping criteria baked into the prompt (not just goals), and (3) environmental constraints that prevent runaway resource use. The 256MB limit has been oddly helpful -- it forces the agent to make architectural choices rather than just adding more.

Relevant to your micro framing: constraints clarify what actually matters.

tithos19 hours ago | parent | next

What is the prime use case

keyle19 hours ago | parent | next

it's a great learning tool and it shows it can be done concisely.

geerlingguy19 hours ago | parent | next

Looks like to learn how a GPT operates, with a real example.

foodevl19 hours ago | root | parent

Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.

inerte19 hours ago | parent | next

Karpathy, to tell you that things you thought were hard in fact fit on a screen.

antonvs19 hours ago | parent | next

To confuse people who only think in terms of use cases.

Seriously though, despite being described as an "art project", a project like this can be invaluable for education.

hrmtst9383712 hours ago | root | parent | next

Education often hinges on breaking down complex ideas into digestible chunks, and projects like this can spark creativity and critical thinking. What may seem whimsical can lead to deeper discussions about AI's role and limitations.

bourjwahwah19 hours ago | root | parent

[dead]

jackblemming19 hours ago | parent | next

A case study for whenever a new edition of Programming Pearls is released.

aaronblohowiak19 hours ago | parent

“Art project”

pixelatedindex19 hours ago | root | parent

If writing is art, then I’ve been amazed at the source code written by this legend

with14 hours ago | parent | next

"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.

profsummergig18 hours ago | parent

If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.

simsla17 hours ago | parent

The blog post literally explains how to do so.

hrmtst938378 hours ago | root | parent | next

It's true, the post lays out the details clearly, but a hands-on example can often make the concepts more tangible. Seeing it in action helps solidify understanding.

hrmtst938379 hours ago | root | parent | next

The post lays out the steps clearly, but implementing them often reveals unexpected challenges. It's usually more complicated in practice than it appears on paper.

hrmtst9383713 hours ago | root | parent

If the implementation details are clear, replicating the setup can be worthwhile. Sometimes seeing it in action helps to better understand the nuances.
