Pendulum: A Benchmark for Assessing Sycophancy in MLLMs

https://arxiv.org/abs/2512.19350