This stuff smells like maybe the bitter lesson isn't fully appreciated.

You might as well just write instructions in English in any old format, as long as it's comprehensible. Exactly as you'd do for human readers! Nothing has really changed about what constitutes good documentation. (Edit to add: my parochialism is showing there, it doesn't have to be English)

Is any of this standardization really needed? Who does it benefit, except the people who enjoy writing specs and establishing standards like this? If it really is a productivity win, it ought to be possible to run a comparison study and prove it. Even then, it might not be worthwhile in the longer run.

Folks have run comparisons. From a Hugging Face employee:

  codex + skills finetunes Qwen3-0.6B to +6 on humaneval and beats the base score on the first run.

  I reran the experiment from this week, but used codex's new skills integration. Like claude code, codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats claude code.
https://xcancel.com/ben_burtenshaw/status/200023306951767675...

That said, it's not a perfect comparison because of the Codex model mismatch between runs.

The author seems to be doing a lot of work on skills evaluation.

https://github.com/huggingface/upskill
