Is it because of caching? If the context changes arbitrarily every turn, then you would have to throw away the cache.
So use a block-based cache and tune the block size to maximize the hit rate? This isn’t rocket science.
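This seems misguided. You have to cache a prefix because of how causal attention works: each token's keys and values depend on every token before it, so a change anywhere in the context invalidates the cached entries for everything after that point.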
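To make the prefix point concrete, here is a rough sketch of how much of a cache survives an edit. It assumes the cache is keyed on token ids and that a block is only reusable if it sits entirely inside the shared prefix. The names reusable_prefix_len and reusable_blocks are made up for illustration and are not any particular inference engine's API.

```python
from itertools import takewhile


def reusable_prefix_len(cached_tokens, new_tokens):
    """Count leading tokens that are identical in both sequences.

    Under causal attention, only these positions can keep their cached
    keys/values; everything after the first mismatch must be recomputed.
    """
    matching = takewhile(lambda pair: pair[0] == pair[1],
                         zip(cached_tokens, new_tokens))
    return sum(1 for _ in matching)


def reusable_blocks(cached_tokens, new_tokens, block_size):
    """Count whole blocks that lie entirely inside the shared prefix."""
    return reusable_prefix_len(cached_tokens, new_tokens) // block_size


cached = [1, 2, 3, 4, 5, 6, 7, 8]
appended = cached + [9, 10]          # context only grows at the end
edited = [1, 2, 99, 4, 5, 6, 7, 8]   # one early token changed

print(reusable_blocks(cached, appended, block_size=4))  # all cached blocks survive
print(reusable_blocks(cached, edited, block_size=4))    # mismatch falls in block 0
print(reusable_blocks(cached, edited, block_size=2))    # smaller blocks save a little
```

In this sketch, tuning the block size only changes how much of the shared prefix gets rounded away. It does not bring back the blocks after the edit, even though their token ids are unchanged.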