Hacker News
The H in RLHF stands for human. If humans didn't use the expression, then the LLM wouldn't.
In practice, though, RLHF isn't a survey of every living human's personal style or preferences. Its purpose is to make the model more useful in the eyes of the vendor, mainly by getting cheap third-world labor to nudge the model according to the vendor's instructions. You don't get a subservient, sycophantic and "safe" chat interface out of unstructured data without putting your thumb on the scale, hard.