It's interesting they only included 6 metrics this time. Opus 4.7 had 12, and 4.6 had 13.
Of the metrics they reported for 4.7, the 4.8 release drops BrowseComp, CharXiv Reasoning, CyberGym, GPQA Diamond, MCP Atlas, MMMLU, and SWE-bench Verified. The last 4 were almost always mentioned in previous Opus releases.