For personal stuff I use forgecode with OpenRouter. First, forgecode is a much better harness than Claude Code (IMHO).
Anyway, regarding the models, my experience is that there isn't much difference in quality, but the cost difference is insane, at least for how I use agents. Yesterday's example: I am developing a small DSL for searching across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through $13, and while it delivered the solution, it wasn't objectively better than what DeepSeek v4 did for $1.70 (on the exact same task, because I was curious).
For full disclosure, I ask agents for piecemeal stuff. In the DSL case, for example, I designed the operators and then asked agents to implement them one by one. If I had asked for the whole thing to be designed from these complex documents, Fable would probably shine, but every time I give agents broader-scope tasks, they burn through millions of tokens and generate questionable code that I then have to spend time familiarizing myself with.
It is very basic and I am no DSL expert, but my idea was to build a graph from those complex documents (maintenance manuals) and use it to decide which tools can be used for a given part on a given piece of equipment in a given situation. If there is a path from A to Z, you can use that tool under those circumstances. Basically, the DSL is about pruning the graph as you specify things. I could very well have done without it, but it is a fun project for trying out Rust, so I said, why not :)
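A minimal Rust sketch of the idea, assuming a plain directed graph in which some edges are only valid under a given condition. The node names, the `requires` guard, and the `prune`/`reachable` functions are all hypothetical; the actual DSL isn't shown in the thread. Specifying facts drops the conditional edges that don't hold, and a breadth-first search then checks whether a tool node is still reachable from the equipment node.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A directed edge, optionally guarded by a condition such as "situation:cold".
/// Hypothetical model: the real DSL's data structures are not shown.
#[derive(Clone, Debug)]
struct Edge {
    to: String,
    requires: Option<String>,
}

#[derive(Default)]
struct Graph {
    adj: HashMap<String, Vec<Edge>>,
}

impl Graph {
    fn add_edge(&mut self, from: &str, to: &str, requires: Option<&str>) {
        self.adj.entry(from.to_string()).or_default().push(Edge {
            to: to.to_string(),
            requires: requires.map(str::to_string),
        });
    }

    /// Return a new graph keeping only edges that are unconditional
    /// or whose condition is among the specified facts.
    fn prune(&self, facts: &HashSet<String>) -> Graph {
        let adj: HashMap<String, Vec<Edge>> = self
            .adj
            .iter()
            .map(|(from, edges)| {
                let kept: Vec<Edge> = edges
                    .iter()
                    .filter(|e| match &e.requires {
                        None => true,
                        Some(c) => facts.contains(c),
                    })
                    .cloned()
                    .collect();
                (from.clone(), kept)
            })
            .collect();
        Graph { adj }
    }

    /// Breadth-first search: is `goal` reachable from `start`?
    fn reachable(&self, start: &str, goal: &str) -> bool {
        let mut seen = HashSet::new();
        let mut queue = VecDeque::from([start.to_string()]);
        while let Some(node) = queue.pop_front() {
            if node == goal {
                return true;
            }
            if !seen.insert(node.clone()) {
                continue; // already expanded
            }
            if let Some(edges) = self.adj.get(&node) {
                for e in edges {
                    if !seen.contains(&e.to) {
                        queue.push_back(e.to.clone());
                    }
                }
            }
        }
        false
    }
}

fn main() {
    let mut g = Graph::default();
    g.add_edge("equipment:pump-7", "part:seal", None);
    g.add_edge("part:seal", "tool:torque-wrench", Some("situation:dry"));
    g.add_edge("part:seal", "tool:heat-gun", Some("situation:cold"));

    // Specifying a fact prunes every edge whose condition doesn't hold.
    let facts = HashSet::from(["situation:dry".to_string()]);
    let pruned = g.prune(&facts);

    println!("{}", pruned.reachable("equipment:pump-7", "tool:torque-wrench")); // true
    println!("{}", pruned.reachable("equipment:pump-7", "tool:heat-gun")); // false
}
```

Here `prune` returns a new graph instead of mutating in place, so each query can start again from the full graph and narrow it with its own facts.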
If you look at a file like:
https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...
you can see that I attribute the models used. What I found was that 4.7 was not very good at Go code, which is why you start to see `Gemini 3 Flash` in the attributions.
4.7 is what Cerebras provides, and for me, speed of iteration is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.
There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is that, if you know how to code, the models are honestly good enough.
Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program but want AI to help amplify what you know. What I will consider now is which provider can offer the fastest inference.
MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly, and at burning through your money just as fast.
I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.
(Maybe fixable with prompting. I tried, and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days: "+good -bad"!)
For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.
But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)
--
[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)
When I tried GLM, I found it way, way slower (with omlx as the runtime).
Use the DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.