Most privacy talk folds on contact with a quote. Latency and convenience beat philosophy fast once someone wants a dashboard next week, and a lot of "data sensitivity" talk is just the corporate version of buying "organic" food until the price tag shows up.
If private inference is actually non-negotiable, then sure, put GPUs in your colo and enjoy the infra pain, vendor weirdness, and the meeting where finance learns what those power numbers meant.
The real case for private inference is not "organic", it's "slow food". Offering slow-but-cheap inference is an afterthought for the big model providers, e.g. OpenRouter doesn't support it, not even as a way of redirecting to existing "batched inference" offerings. This is a natural opening for local AI.
But how slow is too slow (faster than you’d think) and even then, you’re in for $25,000 for even the most basic on-premise slow LLM.