Story Detail of id 47193074 | Liveview Hacker News

mksglu 1 day ago | on: MCP server that reduces Claude Code context consumption by 98%

Author here. I shared the GitHub repo a few days ago (https://news.ycombinator.com/item?id=47148025) and got great feedback. This is the writeup explaining the architecture.

The core idea: every MCP tool call dumps raw data into your 200K context window. Context Mode spawns isolated subprocesses, and only their stdout enters context. It makes no LLM calls and is purely algorithmic (SQLite FTS5 with BM25 ranking and Porter stemming).
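A minimal sketch of that output-side idea, assuming Python's built-in sqlite3 module was compiled with FTS5 (true of most current builds). The function names, the 20-line chunk size and the summary format are hypothetical and not taken from context-mode's source. The command runs in a subprocess, its full stdout is indexed with the Porter tokenizer, and only a one-line summary is returned for the model; later questions go through a BM25-ranked search instead of the raw output.

```python
import sqlite3
import subprocess
import sys


def open_index() -> sqlite3.Connection:
    # In-memory FTS5 table; 'porter unicode61' applies Porter stemming on top of unicode61.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body, tokenize='porter unicode61')")
    return db


def run_and_index(db: sqlite3.Connection, source: str, argv: list[str], chunk_lines: int = 20) -> str:
    """Run argv in a subprocess, index its full stdout in chunks, return only a short summary."""
    out = subprocess.run(argv, capture_output=True, text=True, check=False).stdout
    lines = out.splitlines()
    for i in range(0, len(lines), chunk_lines):
        db.execute(
            "INSERT INTO chunks(source, body) VALUES (?, ?)",
            (source, "\n".join(lines[i:i + chunk_lines])),
        )
    db.commit()
    # Only this string would be handed back to the model.
    return f"[{source}] indexed {len(lines)} lines ({len(out)} chars); query the index for details"


def search(db: sqlite3.Connection, query: str, limit: int = 3) -> list[tuple[str, str]]:
    # bm25() gives better matches lower (more negative) scores, so sort ascending.
    return db.execute(
        "SELECT source, body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    db = open_index()
    script = "for i in range(100): print(f'request {i} ok')\nprint('connection failed: timeout')"
    print(run_and_index(db, "demo", [sys.executable, "-c", script]))
    # "failing" and "failed" both stem to "fail", so the error chunk is found.
    for source, body in search(db, "failing"):
        print(source, body.splitlines()[-1])
```

The stemming is what lets a query for "failing" hit a log line that says "failed" without any model in the loop.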

Since the last post we've seen 228 stars and some real-world usage data. The biggest surprise was how much subagent routing matters: auto-upgrading Bash subagents to general-purpose ones so they can use batch_execute instead of flooding context with raw output.

Source: https://github.com/mksglu/claude-context-mode

Happy to answer any architecture questions.

lkbm 19 hours ago

Small suggestion: link to the Cloudflare Code Mode post[0] in the blog post where you mention it. It's linked in the README, but when I saw it in the blog post, I had to Google it.

[0] https://blog.cloudflare.com/code-mode-mcp/

re5i5tor 1 day ago

Really intrigued and def will try, thanks for this.

If I'm connecting the dots correctly (and please help me check that I am), context-mode _does not address MCP context usage at all_, correct? You are instead suggesting we refactor or eliminate MCP tools, or apply concepts similar to context-mode in our MCPs where possible?

Context-mode is still very high value, even if the answer is "no," just want to make sure I understand. Also interested in your thoughts about the above.

I write a number of MCPs that work across all Claude surfaces, so the usual "CLI!" answer isn't as viable (though with code execution it sometimes can be)...

Edit: typo

mksglu 1 day ago

Right, context-mode doesn't change how MCP tool definitions get loaded into context. That's the "input side" problem that Cloudflare's Code Mode tackles by compressing tool schemas. Context-mode handles the "output side," the data that comes back from tool calls. That said, if you're writing your own MCPs, you could apply the same pattern directly. Instead of returning raw payloads, have your MCP server return a compact summary and store the full output somewhere queryable. Context-mode just generalizes that so you don't have to rebuild it per server.
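One way to apply that pattern inside your own server, sketched in plain Python so as not to assume any particular MCP SDK's API. A decorator stores the full payload under a content hash and hands back only a size line and a short preview. `compact`, `STORE` and `list_notes` are hypothetical names; a real server would likely persist the store (SQLite, files) and expose a separate tool to resolve the handle.

```python
import functools
import hashlib
import json
from typing import Any, Callable

# Hypothetical in-process store: handle -> full payload that never goes back to the model.
STORE: dict[str, str] = {}
PREVIEW_LINES = 3


def compact(handler: Callable[..., Any]) -> Callable[..., str]:
    """Wrap a tool handler so it returns a short summary plus a handle to the full payload."""

    @functools.wraps(handler)
    def wrapped(*args: Any, **kwargs: Any) -> str:
        payload = handler(*args, **kwargs)
        text = payload if isinstance(payload, str) else json.dumps(payload, indent=2)
        handle = hashlib.sha256(text.encode()).hexdigest()[:12]
        STORE[handle] = text
        lines = text.splitlines()
        preview = "\n".join(lines[:PREVIEW_LINES])
        return (
            f"handle={handle} lines={len(lines)} chars={len(text)}\n"
            f"{preview}\n... (full output stored; fetch it by handle)"
        )

    return wrapped


@compact
def list_notes(folder: str) -> list[dict[str, str]]:
    # Stand-in for an expensive tool call that returns a large payload.
    return [{"path": f"{folder}/note-{i}.md", "title": f"Note {i}"} for i in range(500)]


if __name__ == "__main__":
    print(list_notes("vault"))
```

Hashing the content gives a stable handle, so repeated identical calls reuse the same entry instead of growing the store.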

re5i5tor 23 hours ago

Hmmm. I was talking about the output side. When data comes back from an MCP tool call, context-mode is still not in the loop, not able to help, is it?

Edit: clarify "MCP tool"

re5i5tor 21 hours ago

I dug into this further. Tested empirically and read the code.

Confirmed: context-mode cannot intercept MCP tool responses. The PreToolUse hook (hooks/pretooluse.sh) matches only Bash|Read|Grep|Glob|WebFetch|WebSearch|Task. When I called my Obsidian MCP's obsidian_list tool, the response went straight into context, with zero entries in context-mode's FTS5 database. The web fetches from the same session were all indexed.
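A small sketch of why the hook never fires for the MCP call. It assumes Claude Code's convention of naming MCP tools `mcp__<server>__<tool>` and that the server was registered as `obsidian`, so the exact tool name is hypothetical. The name is chosen so the outcome is the same whether the matcher is applied as a full match or as a substring search.

```python
import re

# The matcher string quoted above from hooks/pretooluse.sh.
MATCHER = re.compile(r"Bash|Read|Grep|Glob|WebFetch|WebSearch|Task")


def hook_fires(tool_name: str) -> bool:
    # Assumption for this sketch: the matcher has to cover the whole tool name.
    return MATCHER.fullmatch(tool_name) is not None


for name in ["Bash", "WebFetch", "mcp__obsidian__obsidian_list"]:
    verdict = "intercepted" if hook_fires(name) else "passes straight to context"
    print(f"{name:30} -> {verdict}")
```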

The context-mode skill (SKILL.md) actually acknowledges this at lines 71-77 with an "after-the-fact" decision tree for MCP output: if it's already in context, use it directly; if you need to search it again, save to file then index. But that's damage control — the context is already consumed. You can't un-eat those tokens.

The architectural reason: MCP tool responses flow via JSON-RPC directly to the model. There's no PostToolUse hook in Claude Code that could modify or compress a response before it enters context. And you can't call MCP tools from inside a subprocess, so the "run it in a sandbox" pattern doesn't apply.

So the 98% savings are real but scoped to built-in tools and CLI wrappers (curl, gh, kubectl, etc.) — anything replicable in a subprocess. For third-party MCP tools with unique capabilities (Excalidraw rendering, calendar APIs, Obsidian vault access), the MCP author has to apply context-mode's concepts server-side: return compact summaries, store full output queryably, expose drill-down tools. Which is essentially what you suggested above.
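The "expose drill-down tools" step could look like this sketch: two plain functions a server might register as extra tools, letting the model pull a window of lines or only the matching lines from a stored result instead of the whole thing. `STORED`, `read_slice`, `grep_stored` and the calendar data are hypothetical; how results get into the store is out of scope here.

```python
import re

# Hypothetical server-side store: handle -> full tool output kept out of context.
STORED: dict[str, str] = {
    "events-1": "\n".join(f"2025-03-{d:02d} standup" for d in range(1, 29))
    + "\n2025-03-14 dentist appointment",
}


def read_slice(handle: str, start: int = 0, count: int = 10) -> str:
    """Drill-down tool: return a small window of lines from a stored result."""
    lines = STORED[handle].splitlines()
    window = lines[start:start + count]
    if not window:
        return f"no lines at offset {start} (total {len(lines)})"
    return f"lines {start}-{start + len(window) - 1} of {len(lines)}:\n" + "\n".join(window)


def grep_stored(handle: str, pattern: str, limit: int = 5) -> str:
    """Drill-down tool: return only the stored lines matching a regex, capped at limit."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [f"{i}: {line}" for i, line in enumerate(STORED[handle].splitlines()) if rx.search(line)]
    more = f"\n(+{len(hits) - limit} more)" if len(hits) > limit else ""
    return "\n".join(hits[:limit]) + more


if __name__ == "__main__":
    print(grep_stored("events-1", "dentist"))  # 28: 2025-03-14 dentist appointment
    print(read_slice("events-1", 0, 2))
```

Capping matches and reporting the remainder keeps even a broad query from flooding context.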

Still very high value for the built-in tool side. Just want the boundary to be clear.

Correct any misconceptions please!

nextaccountic 19 hours ago

Can this be used with other agents? I'm specifically looking at the Zed Agent.

nitinreddy88 22 hours ago

Any reason why it doesn't support Codex? I believe the idea and implementation are pretty much agent-independent.

esafak 1 day ago

Does your technique break the cache? edit: Thanks.

doctorpangloss 4 hours ago

The LLM that the "author" is using has no idea what it's talking about, and the reply you got is nonsense.

@dang it's really bad lately.

mksglu 1 day ago

Nope. The raw data never enters the conversation history in the first place, so there's nothing to invalidate. Tool output runs in a sandbox, a short summary comes back, and the full data sits in a local FTS5 index. The conversation cache stays intact because the context itself doesn't change after the fact.
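A toy model of the argument above, assuming (as with prefix-based prompt caching) that a cached prefix is only reusable while the earlier messages stay byte-identical. Messages are plain strings here and `reusable_prefix` is a hypothetical helper; real caches operate on tokens and cache breakpoints, not message lists. It shows why appending a short summary keeps the cache warm, while rewriting an earlier tool result after the fact would not.

```python
from typing import Sequence


def reusable_prefix(cached: Sequence[str], new: Sequence[str]) -> int:
    """Count the leading messages shared by the cached request and the new one."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


history = ["system", "user: run the tests", "tool: 5 passed, 1 failed (full log indexed)"]

# Append-only: the next turn extends the history, so the whole old prefix is reusable.
appended = history + ["user: why did it fail?"]
# Rewriting an earlier tool result changes the prefix from that message onward.
rewritten = history[:2] + ["tool: <summary>", "user: why did it fail?"]

print(reusable_prefix(history, appended))   # 3
print(reusable_prefix(history, rewritten))  # 2
```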
