I did a similar project, but using 3D fractals I found on Shadertoy as input to ViTs. They are extremely simple iterative functions that produce a ton of scene-like complexity.
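To give a sense of how little code such a fractal needs, here is a rough sketch of a power-8 Mandelbulb escape-time test. The choice of fractal is an assumption for illustration (it is a common one on Shadertoy, not necessarily one I used), and the names `mandelbulb_escape` and `ascii_slice` are made up for this example.

```python
import math

def mandelbulb_escape(cx, cy, cz, power=8, max_iter=16, bailout=2.0):
    """Count iterations of z -> z^power + c before |z| exceeds bailout.

    "z^power" uses the usual Mandelbulb trick: convert to spherical
    coordinates, raise the radius to `power`, multiply both angles by it.
    Returns max_iter if the point never escapes (treated as inside).
    """
    x = y = z = 0.0
    for i in range(max_iter):
        r = math.sqrt(x * x + y * y + z * z)
        if r > bailout:
            return i
        # Clamp guards against tiny floating-point overshoot in z / r.
        theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0.0 else 0.0
        phi = math.atan2(y, x)
        rn = r ** power
        x = rn * math.sin(power * theta) * math.cos(power * phi) + cx
        y = rn * math.sin(power * theta) * math.sin(power * phi) + cy
        z = rn * math.cos(power * theta) + cz
    return max_iter

def ascii_slice(size=24, extent=1.2, cz=0.0):
    """Render one horizontal slice of the set as text ('#' = inside)."""
    rows = []
    for j in range(size):
        cy = -extent + 2.0 * extent * j / (size - 1)
        row = ""
        for i in range(size):
            cx = -extent + 2.0 * extent * i / (size - 1)
            inside = mandelbulb_escape(cx, cy, cz) == 16
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)
```

Calling `print(ascii_slice())` dumps a crude cross-section. A real shader would ray-march a distance estimate of the same iteration instead of sampling a grid, but the core loop is about this small.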
I have a pet theory that the visual cortex, while developing, is linked to some kind of mechanism like this. You just need proteins that create some sort of resonating signal that feeds into the neurons as they grow (obviously this is hand-wavy), but similar feedback loops guide nervous system growth in zebrafish, for example.
What were the results of the 3D fractal shader pretraining?
I like your funny words, magic man!