Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
I use playwright CLI. Wrote a skill for it, and after a bit of tuning it's about 1-2k context per interaction which is fine. The key was that Claude only needs screenshots initially and then can query the dev tools for logs as needed.
Mostly, yes: https://github.com/microsoft/playwright-cli
my workaround for this was to make a wrapper mcp server which uses claude haiku to summarize the page snapshot returned in the response of each playwright mcp call, and that has worked pretty well for me: https://github.com/jsdf/playwright-slim-mcp