During off-peak hours, a simple 3-line CSS change took over 50 minutes, and it routinely times out mid-tool-call, leaving dangling XML and tool calls everywhere, badly overwriting files or patching duplicate lines into them.
Starting an hour or two ago, GLM's API endpoint has been failing 7 out of 8 times for me. My editor is retrying every request with backoff, over a dozen times before one succeeds, and wildly simple changes are taking over 30 minutes per step.
But it's all casual side projects.
Edit: I often /compact at around 100,000 tokens or switch to a new session. Maybe that is why.
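
For context, the "retrying with backoff" my editor does is just the standard exponential-backoff-with-jitter pattern. A minimal sketch, assuming a generic HTTP endpoint; the URL, retry limits, and status handling here are illustrative, not anything GLM-specific:

```python
import random
import time

import requests

def post_with_backoff(url, payload, max_retries=12, base_delay=1.0, cap=60.0):
    """Retry a POST with exponential backoff and jitter until it succeeds.

    All the parameters here are hypothetical defaults for illustration.
    """
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json=payload, timeout=120)
            # Retry only on rate limits (429) and server errors (5xx).
            if resp.status_code < 500 and resp.status_code != 429:
                return resp
        except requests.RequestException:
            pass  # connection error / timeout: fall through and retry
        # Sleep 1s, 2s, 4s, ... capped at `cap`, with jitter on top.
        time.sleep(min(cap, base_delay * 2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError(f"gave up after {max_retries} attempts")
```

The jitter matters: without it, every stuck client retries in lockstep and hammers an already-struggling endpoint.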
For the price this is a pretty damn impressive model.
Providers like DeepInfra are already giving access to 5.1 https://deepinfra.com/zai-org/GLM-5.1
$1.40 in / $4.40 out / $0.26 cached, per 1M tokens.
That's more expensive than other models, but not terrible; it will come down over time, and it's far, far cheaper than Opus, Sonnet, or GPT.
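
To put those rates in perspective, here's a back-of-the-envelope cost for a single agentic coding step at those prices. The token counts are made up for illustration, not measured:

```python
# DeepInfra's listed GLM-5.1 rates, USD per 1M tokens.
IN_RATE, OUT_RATE, CACHED_RATE = 1.40, 4.40, 0.26

def step_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request at the rates above."""
    return (input_tokens * IN_RATE
            + output_tokens * OUT_RATE
            + cached_tokens * CACHED_RATE) / 1_000_000

# e.g. a 50k-token prompt (40k of it a cache hit) producing 2k output tokens:
print(f"${step_cost(10_000, 2_000, cached_tokens=40_000):.4f}")  # $0.0332
```

So even a long agentic session with hundreds of such steps stays in the single-digit dollars, which is where the "far cheaper than Opus" comparison comes from.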
I haven't had any bad luck with DeepInfra in particular, whether with quantization or rate limiting. But I've heard only bad things from people who used z.ai directly.
Devil's advocate: why shouldn't they do it if OpenAI, Anthropic and Google get away with playing this game?