Story Detail of id 48506703 | Liveview Hacker News

torginus4 hours ago | on: Kimi K2.7-Code: open-source coding model with better token efficiency

I wonder why it's the natural tendency of models to BS or do stuff like this when they don't have the correct answer - it's clear that they can program refusal into them, but for some reason, refusal has to be injected after the fact, and models can't really arrive at the conclusion that they can't answer properly.

Eridrus3 hours ago | parent

I assume it's a lack of care when RLing them.

RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution.

So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.

#visit	13,790,357
#session	74,665
#live-session	0