I assume this is because reasoning is easy as long as it's just business-as-usual prediction based on reasoning examples the model was trained on. It's only when tackling a novel problem that the model needs to "reason for itself" (try to compose a coherent chain of reasoning). Generating synthetic data (R1 outputs) makes it easy to expand the amount of reasoning data in the training set, which turns more "reasoning" problems into plain prediction that a simple model can support.