That's the theory, and it does hold up in practice. When context is 70% raw logs and snapshots, the model starts losing track of the actual task. We haven't run formal benchmarks on answer quality yet; we've mostly focused on measuring token savings. But anecdotally, the biggest win is that sessions last longer before compaction kicks in, which means the model keeps its full conversation history and makes fewer mistakes from lost context.
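For readers unfamiliar with compaction: the idea is that once a conversation exceeds some token budget, older turns get collapsed into a summary so recent task context survives verbatim. A minimal sketch of that mechanism, assuming a crude characters-per-token heuristic and made-up thresholds (none of this is from any specific tool):

```python
def estimate_tokens(text):
    # Crude heuristic: roughly 4 characters per token.
    return len(text) // 4

def compact(turns, budget=1000, keep_recent=3):
    """Trim a list of conversation turns to fit a token budget.

    If the total exceeds the budget, collapse everything except the
    most recent `keep_recent` turns into a single summary stub.
    Values here are illustrative, not real defaults.
    """
    total = sum(estimate_tokens(t) for t in turns)
    if total <= budget:
        return turns
    recent = turns[-keep_recent:]
    dropped = len(turns) - keep_recent
    stub = f"[compacted: {dropped} earlier turns summarized]"
    return [stub] + recent

# Two huge log dumps plus a short task exchange: the logs get compacted,
# the recent task turns survive intact.
turns = ["log dump " * 300, "snapshot " * 300,
         "user: fix the bug", "assistant: ok"]
out = compact(turns)
```

The point of the sketch is the failure mode being discussed: when the log dumps dominate the budget, compaction fires sooner, and whatever summarization replaces them is where task context gets lost.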
> When context is 70% raw logs and snapshots, the model starts losing track of the actual task

Which frontier model will (re)introduce the radical idea of separating data from executable instructions?