The same thing applies to US models. Check out the various system prompt leak repos on GitHub. There are also prompt injections from various parallel "alignment" models, themselves running on questionable guidance, that pre-process the prompt before it's sent to the main one.

You'd be surprised how much bias exists in easily extractable information. Now imagine how much of it gets in during training, where you can't easily extract it.

So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume the IP is leaked, whether it's a Chinese model that clearly tells me they do use the data or a US model that pinky promises they don't.