It's worth it, and I can afford it, but I am not really the right type of user for token-based usage. It's all for personal and free work.
Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.
Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.
Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.