I'd have thought at least a tiny explicit penalty term for switching, to discourage messing around with the composition without any expected gains from it.
If one is to use these on hardware that can't keep everything loaded I guess someone should examine how it works out in practice. Interpretability may be be a too much to ask, but I can't spontaneously see any reason why the experts can't at least be pushed to incorporate what's needed to remain the good choice for a longer segment.