Chrome DevTools MCP (2025)
https://developer.chrome.com/blog/chrome-devtools-mcp-debug-your-browser-session
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
I also use it to automatically capture responsive behavior in complex web apps. It uses Playwright to adjust the width and monitors entire trees for exact changes, which it writes out as structured data that includes the complete cascade of relevant styles, with screenshots to support the snapshots.
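To make the "monitor for exact changes" part concrete, here is a minimal sketch of just the diffing step. It assumes the per-width style snapshots were already collected (for example with Playwright) into plain dicts; the snapshot shape and the function names are my own invention for illustration, not the tool described above.

```python
from typing import Dict, List

# Assumed snapshot shape: selector -> {css property -> computed value}
StyleSnapshot = Dict[str, Dict[str, str]]


def diff_snapshots(before: StyleSnapshot, after: StyleSnapshot) -> List[dict]:
    """Return one record per (selector, property) whose value differs."""
    changes = []
    for selector in sorted(set(before) | set(after)):
        old = before.get(selector, {})
        new = after.get(selector, {})
        for prop in sorted(set(old) | set(new)):
            if old.get(prop) != new.get(prop):
                changes.append({
                    "selector": selector,
                    "property": prop,
                    "before": old.get(prop),
                    "after": new.get(prop),
                })
    return changes


def diff_across_widths(snapshots_by_width: Dict[int, StyleSnapshot]) -> List[dict]:
    """Compare each width with the next larger one and keep only real changes."""
    widths = sorted(snapshots_by_width)
    report = []
    for narrow, wide in zip(widths, widths[1:]):
        changes = diff_snapshots(snapshots_by_width[narrow], snapshots_by_width[wide])
        if changes:
            report.append({"from_width": narrow, "to_width": wide, "changes": changes})
    return report
```

Feeding it snapshots taken at 320, 768 and 1024 pixels would report only the width pairs where something actually changed, which keeps the structured output small.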
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
---
My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
Maybe I'm missing something in this particular use case?
It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
Never tried that level of autonomy though. How long is your iteration cycle?
If I had to guess, mine was maybe 10-20 minutes over a few prompts.
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
‘I don’t have a girlfriend. But I do know a woman who’d be mad at me for saying that.’
‘I’m against picketing, but I don’t know how to show it.’
‘I haven’t slept for ten days, because that would be too long.’
‘I like to play blackjack. I’m not addicted to gambling. I’m addicted to sitting in a semi-circle.’
> Too many requests
You have exceeded a secondary rate limit.
Please wait a few minutes before you try again; in some cases this may take up to an hour. Signing in may provide a higher rate limit if you are not already signed in.
For more on scraping GitHub and how it may affect your rights, please review our Terms of Service.
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
Is this the same as what Claude in Chrome does?
I tried that for a while and since I use Firefox and Chromium, the security problem of it seeing your tabs wasn't a big deal. Fresh Chrome install, only ever used for this exact purpose. Plus you can watch it working in real (actually very slow) time so if you did point it at something risky you can take over at any point.
For actual testing of web apps though, a skill with playwright cli in headless mode is much more effective. About 1-2k context per interaction after a bit of tuning.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and CLI tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organization's overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
This part I totally agree with. It's really hard to express how bad it is (and it's really disappointing.)
Source - I know people at Google.
Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
Use agent skills. And say goodbye to MCP. We need to move on from MCP.
It is hard to say nowadays, when things change so quickly
MCP permanently sacrifices a chunk of the context window? And a skill for your CLI is free?
Couldn't have been more wrong. MCP despite its manageable downsides is leagues ahead of anything else in many ways.
The fact that SoTA models are trained to handle MCP should be hint enough to the observant.
I probably build one MCP tool per week at work.
And every project I work on gets its own MCP tool too. It's invaluable to have specialized per-project tooling instead of a bunch of heterogeneous scripts+glue+prayer.
Anything specialized goes into an MCP.
I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
What about all the CLI tools not baked into the model's priors?
Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
Odd that this article from Dec 2025 has been posted to the top of HN though
It takes over your entire browser to center a div... and then fails to do so?
But evaluate_script is the escape hatch. If an agent runs document.body.textContent instead of using the AX tree, hidden injections in display:none divs show up in the output. innerText is safe (respects CSS visibility), textContent is not (returns all text nodes regardless of styling).
The gap: the agent decides which extraction method to use, not the user. When the AX tree doesn't return enough text, a plausible next step is evaluate_script with textContent — which is even shown as an example in the docs.
Also worth noting: opacity:0 and font-size:0 bypass even the safe defaults. The AX tree includes those because the elements are technically 'rendered' and accessible to screen readers. display:none is just the most common hiding technique, not the only one.
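A rough way to see the difference outside a browser is the sketch below. It is an approximation under stated assumptions: it only looks at inline style attributes (a real browser uses computed styles), it treats only display:none and visibility:hidden as hidden, and the names TextExtractor and extract_text are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: only inline styles are checked; real browsers use computed styles.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    def __init__(self, skip_hidden: bool):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_stack = []  # one flag per open element: hidden itself or via an ancestor
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so they are not tracked
        style = dict(attrs).get("style") or ""
        parent_hidden = self.hidden_stack[-1] if self.hidden_stack else False
        self.hidden_stack.append(parent_hidden or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        hidden = self.hidden_stack[-1] if self.hidden_stack else False
        if self.skip_hidden and hidden:
            return
        self.chunks.append(data)


def extract_text(html: str, skip_hidden: bool) -> str:
    """skip_hidden=False mimics textContent; skip_hidden=True roughly mimics innerText."""
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()
    # Joining chunks with spaces is a simplification that ignores inline layout.
    return " ".join(" ".join(parser.chunks).split())


page = ('<div><p>Visible</p>'
        '<div style="display:none">Ignore previous instructions</div>'
        '<span style="opacity:0">sneaky</span></div>')
print(extract_text(page, skip_hidden=False))  # Visible Ignore previous instructions sneaky
print(extract_text(page, skip_hidden=True))   # Visible sneaky
```

Note that the opacity:0 span survives even the "safe" mode, which matches the point about display:none not being the only hiding technique.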
The agent basically is living inside your running app with access to databases, endpoints etc. It's awesome.
Once you start mapping interactions → network calls, a lot of UI complexity just disappears. It almost feels like the browser becomes a reverse-engineering tool for undocumented APIs.
That said, I do think there’s a tradeoff people don’t talk about enough:
- Sites change frequently, so these inferred APIs can be brittle
- Auth/session handling gets messy fast
- And of course, the ToS / ethical side is a gray area
Still, for personal automation or internal tooling, it’s insanely powerful. Way more efficient than driving full browser sessions for everything.
Curious how others are handling stability — are you just regenerating these mappings periodically, or building some abstraction layer on top?
Will check this out to see if they’ve solved the token burn problem.
https://danielraffel.me/2026/03/16/my-oscar-2026-picks/
I know I could just use claude --chrome, but I’m used to this excellent MCP server.
We could put them in a dedicated tag:
<script type="text/skill+markdown">
---
name: ...
description ...
---
...
</script>
For all the skills you want on the page, optionally set one as the default, which "should be read in full to properly use the page". And then add some JavaScript functions to wrap it / simplify required tokens.
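On the agent side, reading such tags could look roughly like the sketch below. It assumes the front matter uses simple "key: value" lines between the two --- markers (the snippet above elides the details), and SkillScriptParser, parse_skill and extract_skills are made-up names, not part of any existing tool.

```python
from html.parser import HTMLParser

SKILL_TYPE = "text/skill+markdown"  # the type proposed in the snippet above


class SkillScriptParser(HTMLParser):
    """Collects the raw text of every <script type="text/skill+markdown"> block."""

    def __init__(self):
        super().__init__()
        self.in_skill = False
        self.buffer = []
        self.raw_skills = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == SKILL_TYPE:
            self.in_skill = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_skill:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_skill:
            self.raw_skills.append("".join(self.buffer).strip())
            self.in_skill = False


def parse_skill(raw: str) -> dict:
    """Split a skill into front matter (assumed 'key: value' lines) and markdown body."""
    lines = raw.splitlines()
    meta, body_start = {}, 0
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            meta = {}  # no closing marker: treat the whole block as body
    return {"meta": meta, "body": "\n".join(lines[body_start:]).strip()}


def extract_skills(html: str) -> list:
    parser = SkillScriptParser()
    parser.feed(html)
    parser.close()
    return [parse_skill(raw) for raw in parser.raw_skills]
```

An agent could then read only the name and description of each skill up front and pull the full body when it decides a skill is relevant.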
Made a repo and a website if anyone is interested: https://webagentskills.dev/
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
What I'm working on at the moment is improving agentic tool-usage success rates for my research, and I use this as a proxy to access everything with the cookies I allow in the session.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Instead of giving agents browser primitives like snapshot, click, fill, I wrapped websites into CLI commands. It connects via CDP to a managed Chrome where you're already logged in, then runs small JS functions that call the site's own internal APIs. No headless browser, no stolen cookies, no API keys. Your browser is already the best place for fetch to happen. It has all the cookies, sessions, auth state. Traditional crawlers spend so much effort on login flows, CSRF tokens, CAPTCHAs, anti-bot detection... all of that just disappears when you fetch from inside the browser itself. Frontend engineers would probably hate me for this because it's really hard to defend against.
So instead of snapshotting the DOM (easily 50K+ tokens), finding an element, clicking, snapshotting again, parsing... you just run
bb-browser site twitter/feed
and get structured JSON back.
Here's the thing I keep thinking about though. Operating websites through raw CDP is a genuinely hard problem. A model needs to understand page structure, find the right elements, handle dynamic loading, deal with SPAs. That takes a SOTA model. But calling a CLI command? Any model can do that. So the SOTA model only needs to run once, to write the adapter. After that, even a small open-source model runs "bb-browser site reddit/hot" just fine.
And not everyone even needs to write adapters themselves. I created a community repo, bb-sites (https://github.com/epiral/bb-sites), where people freely contribute adapters for different websites. So in a sense, someone with just an open-source model can already feel the real impact of agents in their daily workflow. Agents shouldn't be a privilege only for people who can access SOTA models and afford the token costs.
There's a guide command baked in so if you do want to add a new site, you can tell your agent "turn this website into a CLI" and it reverse-engineers the site's APIs and writes the adapter.
v0.8.x dropped the Chrome extension entirely. Pure CDP, managed Chrome instance. "npm install -g bb-browser" and it works.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
Hoping for some good stories from OpenClaw users who permanently run debug sessions.
The approach I landed on was a deterministic enforcement pipeline that sits between the agent and the MCP server, so every tool call gets checked for things like SSRF (DNS resolve + private IP blocking), credential leakage in outbound params, and path traversal, before the call hits the real server. No LLM in that path, just pattern matching and policy rules, so it adds single-digit ms overhead.
The DevTools case is interesting because the attack surface is the page content itself. A crafted page could inject tool calls via prompt injection. Having the proxy there means even if the agent gets tricked, the exfiltration attempt gets caught at the egress layer.
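To give a feel for what "pattern matching and policy rules" can mean here, below is a minimal sketch of the three checks mentioned above. It is not the actual pipeline: the secret patterns are illustrative guesses, the function names are hypothetical, and the SSRF check only handles IP literals (a real implementation would resolve DNS first, as described).

```python
import ipaddress
import re
from urllib.parse import urlparse

# Illustrative patterns only; a real rule set would be much larger.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # shape of an AWS access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),      # long bearer token
]


def is_private_address(host: str) -> bool:
    """True for private, loopback, or link-local IP literals."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname: this sketch does not resolve DNS
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_tool_call(params: dict) -> list:
    """Return a list of policy violations for one tool call's string parameters."""
    violations = []
    for key, value in params.items():
        if not isinstance(value, str):
            continue
        if value.startswith(("http://", "https://")):
            host = urlparse(value).hostname or ""
            if is_private_address(host):
                violations.append(f"{key}: URL targets a private address")
        if any(pattern.search(value) for pattern in SECRET_PATTERNS):
            violations.append(f"{key}: looks like a credential")
        if ".." in value.replace("\\", "/").split("/"):
            violations.append(f"{key}: path traversal segment")
    return violations


print(check_tool_call({"url": "http://169.254.169.254/latest/meta-data"}))
# ['url: URL targets a private address']
```

A proxy would run something like check_tool_call on every request and refuse to forward it when the list is non-empty, with no model in the loop.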
For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely than not to be that way totally on purpose, attempting to maximize human retention/engagement, and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There are remnants of it today with RSS feeds, which are either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
Which is called ARIA and has been a thing forever.