Story Detail of id 48607545 | Liveview Hacker News

xlii 10 hours ago | on: GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

My anecdotal experience differs (though I maintain that LLM evaluations are highly subjective and that benchmarks are about as useful for LLMs as they are for dating-website users).

GLM 5.2 tends to stray way more than 5.1. It also hallucinates things subtly: it morphs requirements and draws unfounded conclusions. This is not something I've experienced in any model I've seen so far.

In coding it's especially annoying because it steers the whole request. E.g. I give the instruction "make me a Rust-WASM-Canvas app" and GLM 5.2 goes like "Oh, the user surely doesn't mean that. I'd better build a Dioxus app instead".


oshrimpton 10 hours ago | parent

Yeah, the benchmark for sure isn't perfect, and without super rigid prompting it is far too easy for the model to get off course. A 28% hallucination rate isn't nothing either.
