(1) OpenAI & Anthropic are absolutely cooked; it's obvious they have no moat
(2) Local/private inference is the future of AI
(3) There's *still* no killer product yet (so get to work!)
1) OpenAI and Anthropic are killing it, and continue to do so; their coding tools are unmatched for professionals.
2) Local models don't hold a candle to SOTA models and there's nothing on the horizon that indicates that consumers will be able to run anything close to what you can get in a data center.
3) Coding is a killer product, and OpenAI and Anthropic are raking in the cash. The top 3 apps in the app store are AI. Everyone who knows anything is using AI, every day, across the economy.
On (2), I agree with you for local models. BUT, there are also the open source Chinese models accessible via OpenRouter. Your argument ("don't hold a candle to SOTA models") does not hold if the comparison is between those and the SOTA models.
On (1), I agree more with the grandparent than with your assessment. Yes, OpenAI and Anthropic are killing it for now, but the time horizon is very short. I use codex and claude daily, but it's also clear to me that open source is catching up quickly, both w.r.t. the models and the agentic harnesses.
I thought so myself, but after burning a lot of money on OpenRouter in a few days I just subscribed to Z.ai's Coding Pro plan, and using the subscription is much, much friendlier to my wallet.
And? They aren't as good as SOTA models. Even the SOTA model providers' small models aren't worth using for many of my coding tasks.
(1): You don't have to be an Ed Zitron disciple to infer that OpenAI and Anthropic are likely overvalued and that Nvidia is selling everyone shovels in a gold rush. AI is a game-changing technology, but a shitty chat interface does not a company make. OpenAI and Anthropic need to recoup the astronomical costs of training these models. Models that are now being distilled[1] and are quickly becoming commoditized. (And frankly, models that were trained by torrenting copyrighted data[2], anyway.) Many have been calling this out for years: the model cannot be your product. And to be clear, OpenAI/Anthropic most definitely know this: that's why they've been acquihiring like crazy, trying to find that one team that will make the thing.
(2): Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this. Go use an almost-SOTA model (a big Deepseek or Qwen model) offered by many bare-metal providers and you'll see what "true" token prices should look like. The end-state here is likely some models running locally and some running in the cloud. But the current state of OpenClaw token-vomit on top of Claude is fiscally untenable (in fact, this is why Anthropic shut it down).
(3): This is typical Dropbox HN snark[3], of which I am also often guilty. I really don't think AI coding is a killer product, and this seems very myopic: engineers are an extreme minority. Imo, the closest we've seen to something revolutionary is OpenClaw, but it's janky, hard to set up, full of vulnerabilities, and you need to buy a separate computer. But there's certainly a spark there. (And that's personally the vertical I'm focusing on.)
[1] https://www.anthropic.com/news/detecting-and-preventing-dist...
[2] https://media.npr.org/assets/artslife/arts/2025/complaint.pd...
Anthropic is up to $30B annual recurring revenue. I wish I had failing business models like that.
> Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this. Go use an almost-SOTA model (a big Deepseek or Qwen model) offered by many bare-metal providers and you'll see what "true" token prices should look like.
I'm not sure what you think you are saying here, but if you look at the providers for an "almost-SOTA model (a big Deepseek or Qwen model)", or at the price for Claude on AWS Bedrock, Azure or GCP, you will quickly see that inference is very profitable.
Landing a man on the moon is way more impressive. Finding several vaccines for a once-in-a-century pandemic within a year of its outbreak is an achievement that in its impact and importance dwarfs what the entire LLM industry put together has achieved. The near-complete eradication of polio is, once again, way more important and impactful.
I'd like to think the superior product wins. But Windows still thrives despite widespread Linux availability. I think sometimes we can underestimate the resilience of the tech oligopolies, particularly when they're VC-funded.
If I want to switch from Windows to Linux, I have to reconsider a whole variety of applications, learn a different UX, migrate data, all sorts of annoyances.
When I switch between Codex and Claude Code, there is literally no difference in how I interact with them. They and a number of other competitors are drop in replacements for each other.
That's because by most metrics Linux is inferior to Windows.
I can totally see the same happening here; on-device LLMs are a toy, and then they eat the world and everyone has their own personal LLM running on their own device and the cloud LLMs are a niche used by large institutions.
GLM 5.1 has 754B parameters tho. And you still need RAM for context too. You'll want much more than 96GB ram.
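Rough arithmetic behind that, as a back-of-the-envelope sketch: it counts only the bytes needed to hold the weights at a given precision, using the 754B figure quoted above. The function name and the choice of 16/8/4-bit precisions are my own assumptions for illustration; real quantization formats add some overhead, and context (KV cache) comes on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 754e9  # parameter count quoted in the comment above

# Illustrative precisions (assumed, not taken from any model card).
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Even at 4 bits per parameter the weights alone come out several times larger than 96GB, before any memory for context.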
That's a valuable guarantee. So valuable, in fact, that you won't get it from Anthropic, OpenAI, or Google at any price.
Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.
Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:
Framework Desktop with 128GB for $3k, or mainboard-only for $2700 (could just swap it into my gaming PC). Or any other Strix Halo (Ryzen AI 385 and above) mini PC with 64/96/128GB; more is better of course. Most integrated GPUs are constrained by memory bandwidth. Strix Halo has a wider memory bus, so it's a good way to get lots of high-bandwidth shared system/video RAM for relatively cheap. 380=40%; 385=80%; 395=100% GPU power.
I was also considering doing a much hackier build with 2x Tesla P100s (16GB HBM2 each, for about $90 each) in a Precision 5820 (cheap, with lots of space and power for GPUs). Total about $500 for 32GB HBM2 + 32GB system RAM, but it's all 10-year-old used parts, the GPUs need a DIY fan setup, and software support is very spotty. Definitely a tinker project; here there be dragons.
For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo, which with 128GB currently costs $2-3k. None of this will replace a Claude subscription.
We're probably talking about a year's difference in progress.
It's also still quite expensive for an average person to consume any of it, whether due to hardware investment, energy cost or API cost.
Also, professionally, I don't think anyone will really choose to save a little money by running the third-best model if they can run the best model.
I'm happy that we're reaching levels where this becomes an alternative if you value openness and control, though.
(2) is probably true but with caveats. Top-tier models will never run on desktop machines, but companies should (and do) host their own models. The future is open-weight though, that much is for sure.
(3) This is so ignorant that others have already responded to it. Look outside of your own bubble, please.
Sorry, but you don't know that
Every time I asked a question it generated an interactive geometry graph on the fly in JavaScript. Sometimes it spent minutes compiling and testing code on the server so it could make sure it was correct. I was really impressed.
Anyway, I couldn't really learn anything, since when the code didn't work I wasn't sure whether I had ported it wrong or the AI had gotten it wrong, so I ended up learning how to calculate SDFs and pixel-to-hex-grid conversion from tutorials I found on Google instead.
I think big corporations will continue to use them no matter how cheap and good other models are. There's a saying: nobody was fired for buying IBM.