Conceptually, this seems a good direction.
The other piece that has always struck me as a huge inefficiency with current usage of LLMs is the hoops they have to jump through to make sense of existing file formats - especially making sense of (or writing) complicated semi-proprietary formats like PDF, DOC(X), PPT(X), etc.
Long-term prediction: for text, we'll move away from these formats and towards alternatives that are designed to be optimal for LLMs to interact with. (This could look like variants of markdown or JSON, but could also be Base64 [0] or something we've not even imagined yet.)
If LLMs can't deal with those legacy file formats, I don't trust them to be able to deal with anything. The idea that LLMs are so sophisticated that we have a need to dumb down inputs in order to interact with them is self-contradictory.
loading story #47352523