For instance, in coding tasks, Sonnet 3.5 has benchmarked below other models for some time now, but there is fairly prevalent view that Sonnet 3.5 is still the best coding model.
Come onnnnnn, when someone releases something and claims it’s “infinite speed up” or “better than the best despite being 1/10th the size!” do your skepticism alarm bells not ring at all?
You can’t wave a magic wand and make an 8b model that good.
I’ll eat my hat if it turns out the 8b model is anything more than slightly better than the current crop of 8b models.
You cannot, no matter hoowwwwww much people want it to. be. true, take more data, the same architecture and suddenly you have a sonnet class 8b model.
> like an insane transfer of capabilities to a relatively tiny model
It certainly does.
…but it probably reflects the meaninglessness of the benchmarks, not how good the model is.