Unless you're systematically repeating the exact same task, the most parsimonious explanation is that you're seeing natural variation based on different tasks, random sampling of tokens, etc.
I don't think this explains the phenomenon as is more temporal in nature - not prompt to prompt. I'm sure the AI labs gracefully degrade to simpler models when resources are low - why wouldn't they?