For example, you can get a markdown version of most OpenAI documentation pages by appending .md, like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
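The trick above amounts to rewriting the page URL's path. A minimal sketch (the helper name is mine, and it assumes the site actually serves a `.md` twin for each page, which is not guaranteed):

```python
from urllib.parse import urlsplit

def to_markdown_url(url: str) -> str:
    """Append .md to a docs page path to request the markdown variant.

    Hypothetical helper; only works on sites that expose such twins.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")  # normalize trailing slash before appending
    return parts._replace(path=path + ".md").geturl()

print(to_markdown_url("https://developers.openai.com/api/docs/libraries"))
```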
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely than not that way on purpose: they're trying to maximize human retention/engagement, and they're also the most likely to require strict anti-bot measures like proof-of-work challenges to be usable at all.
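The proof-of-work schemes in question are typically hashcash-style: the server hands the client a challenge, the client burns CPU finding a nonce whose hash meets a difficulty target, and the server verifies it with a single hash. A minimal sketch (names and parameters are illustrative, not any particular product's API):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce so sha256(challenge + nonce) starts
    with `difficulty` zero hex digits. Cost grows as 16**difficulty."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve("example-challenge", 4)
assert verify("example-challenge", nonce, 4)
```

The asymmetry is the point: the check is cheap for the server and for one human page load, but expensive for a crawler making millions of requests.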
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out, for reasons that were mostly non-technical. There are remnants of it today in RSS feeds, which most websites either don't support or support badly.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a website. Captchas are broken, and bot-detection algorithms (often "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
Which is called ARIA and has been a thing forever.