The only thing I have Fable do now is build UIs or other front-ends for systems where correctness doesn't matter as much.
Anthropic models definitely lead at making nice-looking UIs, but when it comes to making sure my Rust code is actually 100% correct and uses 1% of CPU most of the time, Codex is king.
For me, Claude makes boneheaded decisions all the time: glaring errors, not even particularly subtle ones.
But the more obvious red flag is the amount of irrelevant code and tests that Fable writes. It regularly writes 2X or 3X the amount of code and tests that are needed. It’s an expert at writing plausible but entirely useless tests.
But I think that if you’re a more junior engineer or haven’t been around the block, you can easily think that “more code equals smarter”. Claude ends up creating a massive, hard-to-manage codebase, and if you look at the Claude Code codebase (which was leaked), you can see I’m right!
The Claude Code codebase is terrible. And presumably Anthropic has been using their smartest models to work on Claude Code. I wrote my own coding harness with Codex (as a fun experiment), which uses a fraction of the code and is about 100X more performant and memory efficient than Claude Code!