> There are no details about training
My understanding was that they are not training at all, which would explain that. They are compiling an interpreter down to a VM that has the shape of a transformer.
I.e., they are calculating the transformer weights needed to execute the operations of the machine they are generating code for, rather than learning those weights from data.
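To make "calculating the weights" concrete, here is a toy sketch of my own, not anything taken from their work: a single attention head whose query/key weights are written down by hand so that it behaves like a memory load instruction. The names `attention_load`, `N`, and `SCALE` are made up for illustration, and a real compiled machine would need far more than this.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

N = 4         # number of memory cells (hypothetical toy size)
SCALE = 50.0  # large, so softmax is very close to a hard argmax

# Weights are computed directly, with no gradient descent involved:
# W_K is the identity on one-hot addresses, W_Q is a scaled identity.
W_Q = [[SCALE if r == c else 0.0 for c in range(N)] for r in range(N)]
W_K = [[1.0 if r == c else 0.0 for c in range(N)] for r in range(N)]

def attention_load(address, memory):
    # The query encodes which address to read.
    q = matvec(W_Q, one_hot(address, N))
    # Each memory cell exposes a key encoding its own address.
    keys = [matvec(W_K, one_hot(j, N)) for j in range(N)]
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    # The values are the cell contents; the output is their weighted sum.
    return sum(w * v for w, v in zip(weights, memory))

memory = [7.0, 3.0, 9.0, 1.0]
print(attention_load(2, memory))  # very close to 9.0, the content of cell 2
```

The point is only that the "load" behavior comes from choosing the weights analytically, so there is no training run to report details about.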