Story Detail of id 48316891 | Liveview Hacker News

ACCount37 18 hours ago | on: Claude Opus 4.8

Full distributions are a fucking pain to save - at this point just save the hidden states. But there are lossy compression tricks there.

rao-v 11 hours ago | parent

To the previous poster's point, soft distributions are useful: even saving just the top 10 logits gives significantly more training signal than the final token alone.
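
A rough sketch of the top-k idea, in plain Python so it runs anywhere. Everything here is an assumption for illustration, not anyone's actual pipeline: the function names (`top_k_logits`, `soft_target_loss`) are hypothetical, the logits are made up, and renormalizing the teacher over only the saved k entries is just one possible way to handle the truncated tail.

```python
import heapq
import math


def top_k_logits(logits, k=10):
    """Keep only the k largest logits as (token_id, logit) pairs."""
    return heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, saved_top_k):
    """Cross-entropy of the student against the saved top-k teacher distribution.

    Assumption: the teacher's probability mass outside the top k is dropped
    and the saved entries are renormalized to sum to 1.
    """
    token_ids = [i for i, _ in saved_top_k]
    teacher_probs = softmax([v for _, v in saved_top_k])
    # Log-partition of the student over its full vocabulary.
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in student_logits))
    return -sum(
        p * (student_logits[i] - log_z)
        for i, p in zip(token_ids, teacher_probs)
    )


# Toy 6-token vocabulary; a real vocabulary is far larger.
teacher_logits = [2.0, 1.5, -1.0, 0.3, -3.0, 1.9]
saved = top_k_logits(teacher_logits, k=3)  # what gets written to disk
student_logits = [0.0] * 6                 # untrained, uniform student
loss = soft_target_loss(student_logits, saved)
```

With only the final token saved, the target collapses to a single token id. Here the saved pairs still tell the student that tokens 0 and 5 were nearly tied, which is the extra signal being described, at a storage cost of k (id, value) pairs per position instead of one value per vocabulary entry.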