I use them for research on new features. If my feature is going to interact with a frontier language model in prod, I start with these free local ones, which are all competent enough to produce structured output, make tool calls, interact with MCP, etc. I don’t care much about the content in the early phase of engineering; I care about the schema and failure modes.
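To make "schema and failure modes" concrete, here is a minimal sketch of the kind of check that works the same against any model. The field names in `REQUIRED_FIELDS` and the failure labels are hypothetical, chosen only for illustration; a real feature would use its own schema.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"tool": str, "arguments": dict}


def classify_reply(raw: str) -> str:
    """Return "ok" or a short label naming how the model reply failed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The model answered in prose or emitted truncated JSON.
        return "invalid_json"
    if not isinstance(data, dict):
        return "not_an_object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return f"missing_field:{name}"
        if not isinstance(data[name], expected):
            return f"wrong_type:{name}"
    return "ok"
```

Counting these labels over many local calls shows which failure modes the feature has to handle, regardless of how good the generated content is.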
Then when I’m getting close to feature-complete, I’ll move to a hosted frontier model for the final integration.
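One way to keep that move cheap is to put the backend choice behind a single setting, so the feature code never changes. This is only a sketch: the `LLM_BACKEND` variable, the URLs, the model names, and the `HOSTED_API_KEY` variable are all placeholders, not the endpoints of any real server or provider.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Backend:
    base_url: str
    model: str
    api_key_env: Optional[str]  # name of the env var holding the key, if any


# Placeholder values; substitute your own local server and hosted provider.
BACKENDS = {
    "local": Backend("http://localhost:8000/v1", "local-model", None),
    "hosted": Backend("https://api.example.com/v1", "frontier-model", "HOSTED_API_KEY"),
}


def pick_backend(env: Optional[Mapping[str, str]] = None) -> Backend:
    """Choose a backend from LLM_BACKEND, defaulting to the free local one."""
    env = os.environ if env is None else env
    name = env.get("LLM_BACKEND", "local")
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return BACKENDS[name]
```

Under these assumptions, development runs with no setting at all, and final integration only needs `LLM_BACKEND=hosted` plus the key.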
The cost savings are enormous if you’re making dozens of language model calls a minute.