I am just testing it on stuff I know intimately myself. I would probably not understand a proof of Collatz if it was dansing in front of me!
Sorry to belabor this but it's basically pointless saying you have nuts it can't crack without showing us the nuts.
The curse of the 'use case' comes in here too. When people think that everything should have a use case, that's a lot of training data suggesting to a model that things should only be used for what someone has already thought of.
A couple of times I have had to manually code proof of concept pieces so that the model breaks out of that "unpossible" mode and actually helps me.
I can't remember if it was chatGPT or Claude, but when I showed it how to get a MessagePort in its JavaScript executor through to the artifact/canvas, it quickly went from "That can't be done" to positively enthusiastic about the possibilities. I suspect those shenanigans will be well off the table for Fable though.