Story Detail of id 48386262 | Liveview Hacker News

wolttam 1 day ago | on: Gemma 4 12B: A unified, encoder-free multimodal model

I think the idea is that the model is seeing embeddings that map directly to underlying pixel data, rather than being fed semantically rich embeddings from an encoder model which itself had seen the raw pixel data.
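To make that distinction concrete, here is a rough sketch of what "embeddings that map directly to pixel data" could look like, assuming a ViT-style patch split followed by a single linear projection. Nothing here is taken from the Gemma implementation: `patchify`, `project`, the patch size, and the embedding width are all hypothetical names and values chosen for illustration.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            patches.append(tile)
    return patches

def project(patches, weights):
    """Apply one linear map to each patch (assumed stand-in for an encoder-free input layer)."""
    return [[sum(p * w for p, w in zip(patch, row)) for row in weights]
            for patch in patches]

random.seed(0)
# Toy 8x8 "image" with pixel values in [0, 1].
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
patch, dim = 4, 3  # hypothetical patch size and embedding width
# One weight row per output dimension, each spanning a full flattened patch.
weights = [[random.gauss(0, 0.1) for _ in range(patch * patch)] for _ in range(dim)]

# Each token is a linear function of the raw pixels in one patch.
tokens = project(patchify(image, patch), weights)
```

In an encoder-based setup, `project` would instead be a separately trained vision model, so the tokens reaching the language model would already carry learned semantic features rather than a near-direct view of the pixels.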
