Why wouldn't someone's subjective experience outweigh someone else's subjective experience?
Regardless, I do wonder how accurate those reports of success are. Do people take LLM output, use it verbatim, not notice subtle bugs, and report that as success?