That's where all the regressions and inconsistent experiences stem from: RL can still only go so far compared with having more parameters.