Story Detail of id 47439961 | Liveview Hacker News

benob 6 hours ago | on: Pretraining Language Models via Neural Cellular Automata

Reminds me of "Universal pre-training by iterated random computation" https://arxiv.org/pdf/2506.20057, with a somewhat less formal approach.

I wonder if there is a closed-form solution for these kinds of initialization methods (call them pre-training if you wish): one that would let attention heads detect a wide variety of patterns while being more structured than random init.
