Cache is always there, it’s just that it only caches up to the point where an input token changes. So if the tools list is early in the prompt, changing it invalidates the cache for everything after it, which is most of the prompt. If the tools list is the last thing, you could still get 99% cache hits even if it changes every turn.
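
To make the prefix point concrete, here is a toy sketch (not any provider's actual implementation). It assumes the cache reuses exactly the longest shared token prefix between the previous and current request; the helper names and the token counts (10 system, 10 tools, 980 history) are made up for illustration.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_ratio(previous, current):
    """Fraction of the current prompt covered by the shared prefix (toy model)."""
    if not current:
        return 0.0
    return shared_prefix_len(previous, current) / len(current)


# Each string stands in for one token; the sizes are hypothetical.
system = ["sys"] * 10
history = ["msg"] * 980
tools_v1 = ["tool_a"] * 10
tools_v2 = ["tool_b"] * 10  # the tools list changed between turns

# Tools right after the system prompt: only the system prompt is reusable.
tools_early_hit = cache_hit_ratio(system + tools_v1 + history,
                                  system + tools_v2 + history)

# Tools at the very end: everything before them is reusable.
tools_late_hit = cache_hit_ratio(system + history + tools_v1,
                                 system + history + tools_v2)

print(f"tools early: {tools_early_hit:.0%}, tools late: {tools_late_hit:.0%}")
```

Under these assumptions the early placement reuses 10 of 1000 tokens and the late placement reuses 990 of 1000, which is where a figure like 99% comes from.
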
It depends on the service and how the harness is built. Some services allow only a few cache keys, so you won't necessarily get any cache hits if you edit recent messages: the cache is not per message, but covers big blocks of everything up to a cache key.
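
A rough sketch of the block behavior, assuming a service that only stores cache entries at a handful of explicit positions ("cache keys") in the previous request. The function name, the breakpoint positions, and the 1000-token prompt are hypothetical; real services differ in how many keys they allow and where they can go.

```python
def reusable_tokens(previous, current, breakpoints):
    """Tokens served from cache when only whole blocks ending at a cache key are stored.

    `breakpoints` are token positions in `previous` where a cache entry was written.
    A block is reusable only if every token up to its breakpoint is unchanged.
    """
    shared = 0
    for x, y in zip(previous, current):
        if x != y:
            break
        shared += 1
    usable = [b for b in breakpoints if b <= shared]
    return max(usable, default=0)


previous = list(range(1000))
current = list(previous)
current[900] = -1  # edit a recent message near the end of the history

# Only two cache keys: after the system prompt and at the end of the request.
few_keys = reusable_tokens(previous, current, [100, 1000])

# An extra key shortly before the edited message.
more_keys = reusable_tokens(previous, current, [100, 500, 890])

print(few_keys, more_keys)
```

In this toy model the edit at position 900 leaves 900 tokens unchanged, but with keys only at 100 and 1000 just the first 100 are reusable. Adding a key at 890 recovers most of the prefix.
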

This was actually surprising to me when I learned about it, as I have never worked with (or built) a cache that works like that before.

After a couple of turns the system prompt is a small part of the context. Not changing the system prompt at all is key so that the rest of the history is itself part of the prefix.